The Oxford Handbook of Developmental Linguistics 9780199601264, 0199601267

In this handbook, renowned scholars from a range of backgrounds provide a state-of-the-art review of key developmental findings.


English, 1041 pages, 2016


Table of contents :
Cover
Series
The Oxford Handbook of Developmental Linguistics
Copyright
Contents
Contributors
List of Abbreviations
1. Introduction
Part I The Acquisition Of Sound Systems
2. The Acquisition of Phonological Inventories
3. Phonotactics and Syllable Structure in Infant Speech Perception
4. Phonological Processes in Children’s Productions: Convergence with and Divergence from Adult Grammars
5. Prosodic Phenomena: Stress, Tone, and Intonation
Part II The Acquisition of Morphology
6. Compound Word Formation
7. Morpho-phonological Acquisition
8. Processing Continuous Speech in Infancy: From Major Prosodic Units to Isolated Word Forms
Part III The Acquisition of Syntax
9. Argument Structure
10. Voice Alternations (Active, Passive, Middle)
11. On the Acquisition of Prepositions and Particles
12. A-Movement in Language Development
13. The Acquisition of Complements
14. Acquisition of Questions
15. Root Infinitives in Child Language and the Structure of the Clause
16. Mood Alternations
17. Null Subjects
18. Case and Agreement
19. Acquiring Possessives
Part IV The Acquisition of Semantics
20. Acquisition of Comparative and Degree Constructions
21. Quantification in Child Language
22. The Acquisition of Binding and Coreference
23. Logical Connectives
24. The Expression of Genericity in Child Language
25. Lexical and Grammatical Aspect
26. Scalar Implicature
Part V Theories of Learning
27. Computational Theories of Learning and Developmental Psycholinguistics
28. Statistical Learning, Inductive Bias, and Bayesian Inference in Language Acquisition
29. Computational Approaches to Parameter Setting in Generative Linguistics
30. Learning with Violable Constraints
Part VI Atypical Populations
31. Language Development in Children with Developmental Disorders
32. The Genetics of Spoken Language
33. Phonological Disorders: Theoretical and Experimental Findings
References
Index
Series


The Oxford Handbook of

DEVELOPMENTAL LINGUISTICS

OXFORD HANDBOOKS IN LINGUISTICS

Recently published

THE OXFORD HANDBOOK OF THE HISTORY OF LINGUISTICS Edited by Keith Allan

THE OXFORD HANDBOOK OF LINGUISTIC TYPOLOGY Edited by Jae Jung Song

THE OXFORD HANDBOOK OF CONSTRUCTION GRAMMAR Edited by Thomas Hoffmann and Graeme Trousdale

THE OXFORD HANDBOOK OF LANGUAGE EVOLUTION Edited by Maggie Tallerman and Kathleen Gibson

THE OXFORD HANDBOOK OF ARABIC LINGUISTICS Edited by Jonathan Owens

THE OXFORD HANDBOOK OF CORPUS PHONOLOGY Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

THE OXFORD HANDBOOK OF LINGUISTIC FIELDWORK Edited by Nicholas Thieberger

THE OXFORD HANDBOOK OF DERIVATIONAL MORPHOLOGY Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons

THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second Edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF THE WORD Edited by John R. Taylor

THE OXFORD HANDBOOK OF INFLECTION Edited by Matthew Baerman

THE OXFORD HANDBOOK OF LANGUAGE AND LAW Edited by Peter M. Tiersma and Lawrence M. Solan

THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF LEXICOGRAPHY Edited by Philip Durkin

THE OXFORD HANDBOOK OF NAMES AND NAMING Edited by Carole Hough

[For a complete list of Oxford Handbooks in Linguistics, please see pp 1007–1008]

The Oxford Handbook of

DEVELOPMENTAL LINGUISTICS

Edited by

JEFFREY LIDZ, WILLIAM SNYDER, and JOE PATER


Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© editorial matter and organization Jeffrey Lidz, William Snyder, and Joe Pater 2016
© the chapters their several authors 2016

The moral rights of the authors have been asserted

First Edition published in 2016
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2016933496

ISBN 978-0-19-960126-4

Printed in Great Britain by Clays Ltd, St Ives plc

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Contents

Contributors ix
List of Abbreviations xvii

1. Introduction (Jeffrey Lidz, William Snyder, and Joe Pater) 1

Part I. The Acquisition of Sound Systems
2. The Acquisition of Phonological Inventories (Ewan Dunbar and William Idsardi) 7
3. Phonotactics and Syllable Structure in Infant Speech Perception (Tania S. Zamuner and Viktor Kharlamov) 27
4. Phonological Processes in Children's Productions: Convergence with and Divergence from Adult Grammars (Heather Goad) 43
5. Prosodic Phenomena: Stress, Tone, and Intonation (Mitsuhiko Ota) 68

Part II. The Acquisition of Morphology
6. Compound Word Formation (William Snyder) 89
7. Morpho-phonological Acquisition (Anne-Michelle Tessier) 111
8. Processing Continuous Speech in Infancy: From Major Prosodic Units to Isolated Word Forms (Louise Goyet, Séverine Millotte, Anne Christophe, and Thierry Nazzi) 133

Part III. The Acquisition of Syntax
9. Argument Structure (Joshua Viau and Ann Bunger) 157
10. Voice Alternations (Active, Passive, Middle) (Maria Teresa Guasti) 179
11. On the Acquisition of Prepositions and Particles (Koji Sugisaki) 206
12. A-Movement in Language Development (Misha Becker and Susannah Kirby) 230
13. The Acquisition of Complements (Jill de Villiers and Tom Roeper) 279
14. Acquisition of Questions (Rosalind Thornton) 310
15. Root Infinitives in Child Language and the Structure of the Clause (John Grinstead) 341
16. Mood Alternations (Kamil Ud Deen) 367
17. Null Subjects (Virginia Valian) 386
18. Case and Agreement (Paul Hagstrom) 414
19. Acquiring Possessives (Theodoros Marinis) 435

Part IV. The Acquisition of Semantics
20. Acquisition of Comparative and Degree Constructions (Kristen Syrett) 463
21. Quantification in Child Language (Jeffrey Lidz) 498
22. The Acquisition of Binding and Coreference (Sergio Baauw) 520
23. Logical Connectives (Takuya Goro) 547
24. The Expression of Genericity in Child Language (Ana T. Pérez-Leroux) 565
25. Lexical and Grammatical Aspect (Angeliek van Hout) 587
26. Scalar Implicature (Anna Papafragou and Dimitrios Skordos) 611

Part V. Theories of Learning
27. Computational Theories of Learning and Developmental Psycholinguistics (Jeffrey Heinz) 633
28. Statistical Learning, Inductive Bias, and Bayesian Inference in Language Acquisition (Lisa Pearl and Sharon Goldwater) 664
29. Computational Approaches to Parameter Setting in Generative Linguistics (William Gregory Sakas) 696
30. Learning with Violable Constraints (Gaja Jarosz) 725

Part VI. Atypical Populations
31. Language Development in Children with Developmental Disorders (Andrea Zukowski) 751
32. The Genetics of Spoken Language (Jennifer Ganger) 771
33. Phonological Disorders: Theoretical and Experimental Findings (Daniel A. Dinnsen, Jessica A. Barlow, and Judith A. Gierut) 790

References 817
Index 963

Contributors

Sergio Baauw is Assistant Professor in the Spanish Language and Culture program at Utrecht University. His research is focused on first language acquisition from a cross-linguistic (Dutch/Spanish) perspective. He has also worked on agrammatism and bilingualism. Baauw has published papers on the acquisition of pronouns, reflexives, tense, and determiners.

Jessica A. Barlow is Professor of Speech, Language, and Hearing Sciences at San Diego State University. Her research focuses specifically on phonological acquisition and phonological theory, with the goal of documenting universal properties of language sound systems in order to better inform our understanding of, and theories about, language, language acquisition, and language disorders, as well as the clinical management of those disorders.

Misha Becker is Associate Professor of Linguistics at UNC Chapel Hill. Her recent research has focused on how children learn to distinguish syntactic constructions with argument displacement (e.g. raising and tough-movement) from similar constructions without displacement (control), and the role that animacy plays in this process. She has conducted psycholinguistic studies with both children and adults using a variety of methodologies, including sentence judgment, novel verb learning, and reaction time.

Ann Bunger is Lecturer in Linguistics at Indiana University. Her work examines the relation between what speakers of different age groups and different language backgrounds understand about events happening in the world around them and the way they talk about those events. She has published papers on how children use syntactic cues to learn verb meanings and how conceptual and linguistic representations interact in real time during language production.

Anne Christophe is Research Director at CNRS as well as director of the Laboratoire de Sciences Cognitives et Psycholinguistique, in Paris (Ecole Normale Supérieure / PSL Research University / EHESS / CNRS). Her research focuses on how young children may bootstrap early lexical and syntactic acquisition by relying on sources of information available early on from the acoustic signal, namely phrasal prosody and function words. She uses experimentation with infants and toddlers as well as computational modeling.

Jill de Villiers is Professor of Psychology and Philosophy at Smith College. Her research on child language acquisition has had a particular focus on the syntax of questions, and also on the relation of language and Theory of Mind. More recently she has worked on refining tests of child language assessment in English and several other languages. She is a co-editor with Tom Roeper of the recent Handbook of Generative Approaches to Language Acquisition.

Kamil Ud Deen is Associate Professor of Linguistics at the University of Hawaii. His interests focus on the acquisition of understudied languages and how such data might bear on theories of child and adult grammar. Topics he has worked on include binding, the passive, mood, and relative clauses, drawing on data from languages such as Swahili, Thai, Korean, Japanese, Tagalog, and Serbian. A former associate editor for the Journal of Child Language, he is currently editor of Brief Articles for Language Acquisition: A Journal of Developmental Linguistics.

Daniel A. Dinnsen is Chancellor's Professor Emeritus of Linguistics at Indiana University-Bloomington. His research brings the latest developments in phonological theory to bear on the analysis of young children's developing sound systems and phonological learning patterns, with special emphasis on phonological (non-organic) disorders. Dinnsen's research has been funded by the National Institutes of Health for the last 30 years.

Ewan Dunbar is a postdoctoral researcher at the Laboratoire de Sciences Cognitives et Psycholinguistique (ENS–CNRS–EHESS) at the Ecole Normale Supérieure / PSL Research University. His work focuses on speech development, at both the low-level phonetic level and the abstract level that records those sounds in long-term memory. He uses statistical machine learning techniques to model acquisition and typological data to narrow down the universal formal constraints on sound systems.

Jennifer Ganger is Lecturer in Developmental Psychology at the University of Pittsburgh. She focuses on undergraduate teaching, especially in language development, behavior genetics, and research methods. Her research examines the relationship between genes and environment in language development, incorporating both twin studies and direct modulation of children's language input.

Judith A. Gierut is Professor of Speech and Hearing Sciences at Indiana University-Bloomington. Her research examines the linguistic structure of the phonologies of children with phonological disorders and the psycholinguistic variables that affect language learning in this population. The work is translational in that Gierut's findings from basic research have direct clinical application in the validation of phonological treatment efficacy. Her research has had 30 years of support from the National Institutes of Health.

Heather Goad is Associate Professor of Linguistics at McGill University. Her research focuses on prosodic structure in development: how syllable complexity and stress are acquired; how prosodic structure impacts the acquisition of functional morphology; and how prosodic constraints shape development in the segmental domain. Goad was formerly an associate editor of Language Acquisition: A Journal of Developmental Linguistics. She is currently on the editorial board for this journal, for Benjamins' Language Acquisition & Language Disorders series, and for Oxford Studies in Phonology.

Sharon Goldwater is Reader in the School of Informatics at the University of Edinburgh. Her research explores the nature of the language acquisition problem and the computational constraints and mechanisms needed to solve it. She has published widely in cognitive science, natural language processing, and machine learning, on topics including word and morpheme segmentation, phonetic and phonological learning, and the integration of multiple cues (e.g. the role of prosody and semantics in syntactic acquisition).

Takuya Goro is Associate Professor in the Department of English at Tsuda College. His research interest lies in first language acquisition, especially the acquisition of quantification, logical connectives, and scope assignment. Many of his studies adopt a cross-linguistic approach, involving close comparison between child Japanese and child English.

Louise Goyet is Lecturer at the Laboratoire Paragraphe, University Paris 8 Vincennes-Saint-Denis. Her work mainly examines phonological and lexical acquisition: word segmentation in monolingual French-learning infants and lexical categorization in bilingual infants. Her current focus is on the recognition of verbally labeled facial expressions (the conceptualization process) in mono- and bilingual children.

John Grinstead is Associate Professor of Hispanic Linguistics at The Ohio State University. His work primarily addresses language development in typically developing Spanish-speaking children as well as in children with specific language impairment. His current projects investigate the development of syntax, semantics, pragmatics, and their interfaces. He has recently published two edited volumes on developmental linguistics, as well as studies in the Journal of Speech, Language, and Hearing Research, Language, and Applied Psycholinguistics.

Maria Teresa Guasti is Professor of Psycholinguistics and Linguistics at the University of Milano-Bicocca. Her work is concerned with specific language impairment, dyslexia, and complexity in child language ranging from phonology to semantics. She has worked on Romance and Germanic languages and Chinese. She has published papers on prosody, morphosyntax, relative clauses, questions, binding, and scalar implicatures, and is the author of a textbook on language acquisition with MIT Press and co-author of a book on the acquisition of Italian.

Paul Hagstrom is Associate Professor at Boston University. His work focuses on the interactions between the syntax, semantics, and morphology of interrogatives and focus constructions, and he has also published work on the acquisition of morphology in French, Chinese, and Korean, and on existential constructions in second language acquisition.

Jeffrey Heinz is Associate Professor in the Linguistics and Cognitive Science department at the University of Delaware. His research lies at the intersection of theoretical linguistics, theoretical computer science, and computational learning theory. He has published papers in these areas in the journals Science, Phonology, Linguistic Inquiry, Theoretical Computer Science, and Topics in Cognitive Science. He serves on the steering committee for the International Conference on Grammatical Inference.

William Idsardi is Professor and Chair of Linguistics at the University of Maryland. His research focuses on the architecture of phonology and its relation to other components of grammar. In his work he uses a variety of methods including grammatical analysis of phonological patterns, computational modeling, behavioral experiments, and neural measures of speech processing.

Gaja Jarosz is Associate Professor of Linguistics at the University of Massachusetts, Amherst. Her work examines phonological acquisition and representation from computational, theoretical, and developmental perspectives. She has published on hidden structure learning in phonology, the subset problem in phonology, phonological opacity, word segmentation, acquisition of syllable structure, the inference of phonological and morphological structure, and the statistical properties of child-directed speech.

Viktor Kharlamov is Assistant Professor of Linguistics at Florida Atlantic University, where he directs an experimental linguistics lab. His primary areas of expertise include acoustic and articulatory phonetics, laboratory phonology, and psycholinguistics. He has published papers on syllabification, consonantal deletion, and phonetic voicing. He also investigates the role of orthographic knowledge in speech production and perception and documents the phonetic system of Southern Ute, a Native American language.

Susannah Kirby has a Ph.D. in linguistics from the University of North Carolina at Chapel Hill, and has held positions at UNC-CH, the University of British Columbia, and Simon Fraser University. Her research in linguistics focused primarily on the acquisition of raising and control verbs and the syntactic distinctions between those verbs in adult English, and used both nativist/generativist and emergentist/constructionist models. She is currently completing a degree in computer science.

Jeffrey Lidz is Professor of Linguistics at the University of Maryland. His work examines the relation between grammatical theory, on-line understanding mechanisms, and learning. Bringing data to bear from languages as diverse as English, French, Korean, Kannada, and Tsez, he has published papers on quantification, argument structure, morphosyntax, A-bar movement, and reference relations. Lidz is currently editor-in-chief of Language Acquisition: A Journal of Developmental Linguistics.

Theodoros Marinis is Professor of Multilingualism & Language Development at the University of Reading. His research focuses on cross-linguistic first and second language acquisition and processing in adults, typically developing children, and children with developmental language disorders, with the aim of uncovering the nature of language processing in typical and atypical language development. He is well known for using on-line reaction-time experiments with children. He is a member of the Centre for Literacy & Multilingualism.

Séverine Millotte is Associate Professor at the Laboratoire d'Etude de l'Apprentissage et du Développement, CNRS, University of Burgundy, where she studies lexical and syntactic acquisition in infants as well as language processing in adults. She also trains future school teachers in the Ecole Supérieure du Professorat et de l'Education (ESPE) of Dijon and recently began a new research project examining the role of school timetables and the impact of new digital technologies on learning in school-aged children.

Thierry Nazzi is Research Director at the Laboratoire Psychologie de la Perception, CNRS-University Paris Descartes. His work mainly examines the mechanisms underlying phonological and lexical acquisition and processing, and their interactions, from birth to adulthood. Although his work focuses on French-learning infants, his interest in cross-linguistic variation has led him to compare acquisition across languages, including English, German, Hungarian, Japanese, and Cantonese. Thierry Nazzi is currently associate editor of Language and Speech.

Mitsuhiko Ota is Reader (Associate Professor) in Linguistics at the University of Edinburgh. His main work is in the area of phonological acquisition, with particular reference to prosodic structure and to the relationship between phonological and lexical development. His research on these issues relates to both first and second language acquisition. Ota is currently an associate editor of Language Acquisition: A Journal of Developmental Linguistics.

Anna Papafragou is Associate Professor in the Department of Psychological and Brain Sciences and the Department of Linguistics and Cognitive Science at the University of Delaware, where she directs the Language and Cognition Lab. Her research, funded by NSF and NIH, investigates how children acquire language, how language is used and understood online by both children and adults, and how language interfaces with human perceptual/conceptual systems cross-linguistically.

Joe Pater is Professor of Linguistics at the University of Massachusetts. His work explores phonological theory and acquisition. His current research focuses on the use of weighted constraints for the modeling of phonology and its learning.

Lisa Pearl is Associate Professor of Cognitive Sciences at the University of California at Irvine. She investigates the knowledge that can be inferred from language data, including how children infer linguistic knowledge and how adults infer subtle non-linguistic knowledge. Using computational modeling as her main tool, she has published papers on word order, reference relations, relative clause semantics, syntactic islands, word segmentation, metrical phonology, linguistic parameters, authorship, and tone identification.

Ana T. Pérez-Leroux is Professor of Spanish and Linguistics, and Director of the Cognitive Science Program at the University of Toronto. Her work examines how children learn the syntax and semantics of the smallest and silent components of sentence grammar. Her work on the acquisition of noun phrases, tense, mood and aspect markers, null subjects, implicit objects, and movement phenomena is based on comparative studies in Spanish, French, English, German, and Japanese.

Tom Roeper is Professor of Linguistics at UMass and has worked on theoretical and empirical approaches to language acquisition and morphology. His work has focused on complex syntax, long-distance rules, quantification, and recursion, which has led to joint projects in eight countries on languages such as Afrikaans, Pirahã, and German dialects. His work extends to a test (DELV) in Communication Disorders and African American English, and a general book (The Prism of Grammar, MIT Press, 2007). He was a founding editor of Language Acquisition and is a current co-editor of Studies in Theoretical Psycholinguistics.

William Gregory Sakas is Associate Professor of Computer Science and Linguistics at the City University of New York (CUNY) Graduate Center, where he was the founding director of the Computational Linguistics Program. He is currently the chair of the Computer Science Department at Hunter College, CUNY. His research focuses on computational modeling of human language: What are the consequential components of a computational model, and how do they correlate with psycholinguistic data and human mental capacities?

Dimitrios Skordos is Lecturer of Psychology with a joint appointment in the Department of Psychology and the Linguistics Program at the College of William and Mary. His main research interests include experimental pragmatics and the development of pragmatic inference in children; the relationship between language and cognition with a focus on spatial language; and the acquisition of syntax and semantics.

William Snyder is Professor of Linguistics at the University of Connecticut. His work examines the time course of acquisition against patterns of grammatical variation to identify the child's learning procedures and hypothesis space. Research topics include argument structure (datives, resultatives, particles, path phrases), A- and A-bar movement (passives, reflexive-clitic constructions, P-stranding, comparatives), compound words, and syllable structure. A past editor of the journal Language Acquisition: A Journal of Developmental Linguistics, Snyder is author of Child Language: The Parametric Approach.

Koji Sugisaki is Professor of Linguistics at Mie University, Japan. His work investigates the nature of parameters from both cross-linguistic and acquisitional perspectives. Major topics covered in his papers include preposition stranding, covert wh-movement, argument ellipsis, and sluicing.

Kristen Syrett is currently Assistant Professor in the Department of Linguistics and the Center for Cognitive Science at Rutgers, The State University of New Jersey–New Brunswick. In her language acquisition research and psycholinguistic investigations with adults, she addresses key issues in semantics, pragmatics, and the syntax–semantics interface, including word learning, gradability/scales, ambiguity, ellipsis, measurement, comparison, degree expressions, prosody, and at-issueness. Her investigations have included data from languages such as English, French, Spanish, and Mandarin.

Anne-Michelle Tessier is Associate Professor at the University of Alberta (Ph.D. UMass Amherst, 2007). She studies many aspects of child phonology among both L1 and L2 learners, using data from existing corpora, non-word repetition, artificial language experiments, and simulations. Most of her work centers on the developmental predictions of algorithms for learning in constraint-based frameworks. Her forthcoming textbook, Phonological Acquisition: Child Language and Constraint-based Grammar, will be published by Palgrave Macmillan in 2015.

Rosalind Thornton is Associate Professor of Linguistics at Macquarie University, Sydney, Australia. Her work examines children's acquisition of syntactic and semantic knowledge in typically developing children and, more recently, in children with specific language impairment, with a view to understanding which aspects of our linguistic knowledge are innate and which aspects are learned. Rosalind has published papers on wh-movement, quantification, binding theory, structures with ellipsis including VP ellipsis, morphosyntax, control, and negation.

Virginia Valian is Distinguished Professor at Hunter College and the CUNY Graduate Center. She is Director of the Language Acquisition Research Center and Co-director of the Gender Equity Project. Her research has been funded by NSF, NIH, and the Sloan Foundation. Her main interests in language include: logical and empirical arguments for nativism; theories of acquisition; children's early abstract syntactic knowledge; the role of input; variability in early syntax acquisition; bilingualism and executive function.

Angeliek van Hout is Professor of Linguistics at the University of Groningen. Her work focuses on the acquisition of form–meaning associations, connecting grammatical theory and experimental methods, especially with cross-linguistic designs. Comparing Germanic, Romance, and Slavic languages and beyond, Van Hout has investigated many themes in tense–aspect. Her research on the acquisition of Dutch syntax and semantics furthermore covers topics such as definiteness, quantification, wh-questions, embedding, pronouns, and unaccusatives.

Joshua Viau received his Ph.D. in Linguistics from Northwestern University, and subsequently pursued postdoctoral research at Johns Hopkins University and the University of Delaware. His work probes the intersection of the conceptual domains of possession and location and the extent to which child language collapses across them in encoding recipients and spatial goals in transfer events. Currently, Viau is focused on integrating psycholinguistics into secondary school curricula in South Florida.

Tania S. Zamuner is Associate Professor of Linguistics at the University of Ottawa. Her research focuses on psycholinguistics, developmental speech perception and production, lexical acquisition, and spoken word recognition. She is the Director of the Centre for Child Language Research / Centre de recherche sur le langage des enfants.

Andrea Zukowski is Assistant Research Scientist in Linguistics at the University of Maryland. Her work in developmental disorders is aimed at understanding the effects and non-effects of cognitive limitations on the development and use of language. She has published papers on the knowledge and use of a variety of syntactic phenomena in both typically developing children and children and adults with Williams syndrome.

List of Abbreviations

1sPN  first-person singular pronoun
1sSM  first-person singular subject marker
ACC  accusative
ACD  antecedent-contained deletion
ACDH  A-chain Deficit Hypothesis
ACE  additive genetic variance + shared environmental variance + non-shared environmental variance
AGR  subject–verb agreement
AIC  Akaike's Information Criterion
AIH  Argument Intervention Hypothesis
ANS  Approximate Number System
APL  applicative
ASL  American Sign Language
AT  Actor-Topic
ATOM  Agreement Tense Omission Model
BCD  Biased Constraint Demotion
CAH  Canonical Alignment Hypothesis
CC  consonant+consonant
CD  Constraint Demotion
CFG  context-free grammar
CG  consonant+glide
CG  Construction Grammar
CH  consonant harmony
CL  clitic
CL  consonant+liquid
COMP  comparative
CP  cardinal principle
CT  Circumstantial-Topic
CV  consonant+vowel
D  determiner
DS  Down Syndrome
DAT  dative
Deg  degree
DET  determiner
DIM  diminutive
DIR  directional
DMCMC  decayed Markov Chain Monte Carlo
DP  determiner phrase
DPBE  Delay of Principle B Effect
DR  double-reversal
DZ  dizygotic (twins)
EARH  External Argument Requirement Hypothesis
EC  empty category
ECM  exceptional case-marking
ECP  Empty Category Principle
EDCD  Error-Driven Constraint Demotion
EM  Expectation-Maximization
EP  Elicited Production
EPP  Extended Projection Principle
ER  embedded-reversal
ERPs  evoked response potentials
ESL  Error-Selective Learning
F0  fundamental frequency
FI  faire-infinitive
FP  faire-par
FRUs  first of repeated uses
FUT-PRT  future particle
FV  final vowel (mood)
GA  genetic algorithm
GA  gradable adjective
GB  Government-Binding
GEN  genitive
GLA  Gradual Learning Algorithm
GPSG  Generalized Phrase Structure Grammar
HAS  high amplitude sucking
HG  Harmonic Grammar
HPP  head-turn preference procedure
HPSG  Head-driven Phrase Structure Grammar
IDL  Inconsistency Detection Learner
IF  Intermediate Faith
IMPERF  imperfect
IND  indicative
INFL  infinitival
INSTR  instrumental
LF  logical form
LFCD  Low Faithfulness Constraint Demotion
LFG  Lexical-Functional Grammar
LOC  locative
LP  long passive
MCDI  MacArthur Communicative Development Inventory
MGL  Minimal Generalization Learner
MLG  Maximum Likelihood Learning of Lexicons and Grammars
MLU  mean length of utterance
MLUW  mean length of utterance in words
MMN  mismatch negativity
MP  Minimalist Program
MPs  measure phrases
MR  matrix-reversal
MRCD  Multi-Recursive Constraint Demotion
MRE  Modal Reference Effect
MZ  monozygotic (twins)
NEG  negative
NEUT  neuter
NOM  nominative
NP  noun phrase
NPI  negative polarity item
OCP  Obligatory Contour Principle
OT  Optimality Theory
P&P  principles and parameters
PAC  Probably Approximately Correct
PART  particle
PASS  passive
PAST  past tense
PCFG  probabilistic context-free grammar
PERF  perfect tense
PERFV  perfective
PERS.DET  personal determiner
PF  phonetic form
PIC  Phase Impenetrability Condition
PL  plural
PoS  poverty of the stimulus
PP  prepositional phrase
PPI  positive polarity item
PRET  preterite
PRS  present tense
PTPs  phoneme transition pairs
PWd  prosodic word
QR  Quantifier Raising
QNP  quantified noun phrase
QTL  quantitative trait locus
RCD  Recursive Constraint Demotion
r.e.  recursively enumerable
RI  root infinitive
RIP  Robust Interpretive Parsing
RMSEA  root-mean-square error of approximation
RT  reaction time
R-to-O  raising-to-object
SEM  structural equation modeling
SG  singular
SGA  Stochastic Gradient Ascent/Descent
SLI  Specific Language Impairment
SMT  Strong Minimalist Thesis
SP  short passive
SS  Spontaneous Speech
STL  Structural Triggers Learner
SUBJ  subjunctive
SVC  Single Value Constraint
TCP  The Compounding Parameter
TD  typically developing
TEDS  Twin Early Development Study
TLA  Triggering Learning Algorithm
TOP  topic
TP  transitional probability
TT  Theme-Topic
TVJ  truth value judgment
UCC  Unique Checking Constraint
UG  Universal Grammar
UN  unmarked for case
UPR  Universal Phase Requirement
UPRH  Universal Phase Requirement Hypothesis
UR  underlying representation
UTAH  Uniformity of Theta Assignment Hypothesis (Baker 1988)
VOT  voice onset time
VP  verb phrase
VPE  verb phrase ellipsis
WS  Williams Syndrome

Chapter 1

Introduction

Jeffrey Lidz, William Snyder, and Joe Pater

Modern linguistics asks three fundamental questions, articulated by Chomsky (1986): What exactly do you know when you know your native language? How did you come to know it? And, how do you put that knowledge to use? Investigation of the second question is the concern of developmental linguistics.

Leading research questions within developmental linguistics include the following: What do newborn children bring to the task of language acquisition in the form of prior knowledge, information processing capacity, and extralinguistic cognitive resources? What information must children extract from their linguistic input? How does biological maturation interact with the child's developing linguistic abilities? What can the child's process of language acquisition tell us about the nature of linguistic competence in an adult? How are children's linguistic abilities informed by the study of extralinguistic cognition or by the information processing mechanisms that underlie on-line understanding?

Research findings in developmental linguistics are extremely difficult to interpret, for several reasons. First, as in all areas of linguistics, children's performance reflects a combination of their grammatical knowledge with the production or comprehension mechanisms through which they produce behavior. Because children appear to be generally more susceptible to the contribution of performance factors than adults, it is difficult to know whether to assign credit or blame to the grammar or to the performance systems in any single case. Similarly, because so little is understood about speech production processes, even in adults, children's utterances may be susceptible to interference from these factors. Of course, examining children's behavior through a psycholinguistic lens may help to alleviate the credit-assignment problem.
Some of this difficulty is also lessened when children are compared against explicit models of the acquired knowledge and of the range and limits of cross-linguistic variation. A concrete model of linguistic knowledge can often provide specific predictions that help to more accurately identify the contribution of grammatical knowledge. The chapters in this handbook aim to use the relation between developmental findings and the core generalizations that any linguistic theory must account for as a guiding principle. Where linguistic theory provides constraints on possible grammars, these inform our analyses of children's linguistic development. As a whole, the book also aims to explore the role of experience in shaping children's linguistic development. How does the structure of the learner interact with features of the environment to drive learning?

The book is divided into six parts. Parts I–IV, which make up the bulk of the book, address key developmental findings in the core linguistic domains of phonology, morphology, syntax, and semantics/pragmatics. These chapters are fundamentally empirical in scope, while also tying the empirical and developmental phenomena to key insights in linguistic theory and to questions of learnability. Part V addresses computational approaches to learning, providing the state of the art in statistical and knowledge-driven approaches to learning. Finally, Part VI addresses learning in atypical populations.

In Parts I–IV, each chapter addresses a single area of grammatical knowledge, such as syllable structure, negation, or binding theory. Authors were asked to address three fundamental issues. First, what is the current state of our understanding of the core linguistic generalizations in the domain under investigation? Second, what is the current state of our understanding of children's development with respect to the phenomena under discussion? Third, given what we know about these generalizations and the nature of the linguistic environment, how might a learner deduce the correct grammatical structure in a given language?

We asked each author to provide an overview of the fundamental generalizations that guide current linguistic analyses and the features of grammatical representation that these generalizations entail. Within that area of grammar, the chapters ask what characteristics are plausibly shared by all human languages, and what characteristics are known to vary.
Universals might (or might not) reflect innate characteristics of the human brain, but where grammars vary, the child necessarily deduces the correct option for her target language through analysis of her linguistic input.

Next, each author was asked to review the relevant acquisition literature. Each literature review aims to organize the principal findings according to target language, age range of the child, and research methodology (such as naturalistic observation, elicited production, or truth value judgment). This systematic approach automatically highlights any differences in findings according to research method. It also brings to light any informational gaps in the literature, such as lack of naturalistic data from children in a particular age range, or lack of evidence concerning a theoretically central class of target languages.

Finally, authors were asked to raise considerations of language learnability. Given what we know about the nature of the child's input, how in principle might a child deduce the correct grammatical options for her target language? To what extent do explicit representational theories make the input more or less informative? To what extent are such representational presuppositions necessary to garner learning benefits from the input? Do the experimental findings favor a particular approach to the logical problem of language acquisition? In what ways, if any, does the child's knowledge surpass the information directly available from the input? In what ways can innate structure make the input more informative? Likewise, are there ways in which the child's knowledge seems more limited than expected, given the richness of the available input? These questions have particular interest for the developmental linguist, because they bear directly on the prior knowledge that the newborn child does (and does not) bring to the task of language acquisition, and on the child's computational capacities and limitations at different stages of development.

Of course, for different phenomena, the relative proportion of linguistic, developmental, and learning-theoretic knowledge varies. Consequently, the chapters vary in the degree to which each of these is emphasized. Readers of individual chapters, however, should be able to take away the current state of our understanding in a particular linguistic domain. And readers of the entire book will see the full range of issues addressed in considerable depth across the chapters.

In Part V, the emphasis shifts away from particular grammatical phenomena and towards theories of learning. These chapters address the space of possible grammars and the learner's ability to traverse that space from a computational perspective. In these chapters, readers will be able to learn about the key findings and methods in learnability theory, statistical inference, parameter setting, and constraint ranking. These chapters help to place the more phenomenologically grounded chapters of Parts I–IV in a broader perspective and point to tools that researchers can use to investigate the link between observations of development and theories of acquisition.

Part VI turns to language acquisition in atypical populations, asking how these populations help us to better understand the language faculty in the typical context. These chapters examine phonological and syntactic disorders and place them in a rich genetic and developmental linguistic context.

We hope that the book as a whole will guide future research in several ways. First, it will provide a comprehensive survey that can serve as the basis for subsequent research. The comprehensive reviews found here will be a valuable first stop for researchers looking to begin their exploration of a new area. Second, by identifying gaps in our knowledge, the chapters provide obvious jumping-off points for future research. Third, by including discussion of typological variation and the role of input in acquisition, the chapters will lay the groundwork for future studies that examine how the learner interacts with the input in acquiring various features of the grammar. Finally, by bringing together, in a single volume, the work of scholars who study a diverse range of languages and developmental linguistic phenomena, we aim to provide a definitive statement of (a) the set of phenomena to which a theory of language development must be responsive, (b) the overarching learning-theoretic issues posed by the complexity of grammar acquisition, and (c) a picture of the constraints on grammatical theory that are determined by our understanding of language acquisition.

Part I

The Acquisition of Sound Systems

Chapter 2

The Acquisition of Phonological Inventories

Ewan Dunbar and William Idsardi

2.1  The Traditional Views

The acquisition of phonological inventories is a subject which has been studied by both linguists and psychologists, and rightly so—there is no question from the point of view of the theoretician studying universal grammar that the path from the initial state to the adult state is of interest, and there is no question from the developmental psychologist's point of view that the child's phonological capacities are undergoing substantial development in the early years. Beyond similar titles, however, the linguist and the psychologist researching "phonological inventory development" traditionally have little to share, because by "inventory development" they usually mean quite different things.

The tradition among linguists began with Roman Jakobson, who placed the empirical focus on the child's improving capacities for producing sounds. For psychologists, however, the seminal work of Eimas et al. (1971) shifted the focus from observational studies of production to laboratory work in perception. Since then, there have been two different traditions, child phonology and infant speech perception, both of which use the term "inventory development," but which have proceeded independently.

With two traditions, there come two sets of received facts. The received facts here both come in the form of developmental sequences, and traditional ways of understanding those developmental sequences. We begin by summarizing these traditional views of inventory development from the point of view of the linguist and the psychologist.


2.1.1 Production: The Linguist's View

A child phonologist asked about the seminal works in the field will likely cite Jakobson's (1941) Kindersprache, Aphasie und allgemeine Lautgesetze, first published in English in 1968 as Child Language, Aphasia, and Phonological Universals. The Kindersprache presented an enticing theory of inventory development that elegantly tied together the three elements of its title. It was so enticing, and had such scope, that it became the standard theory in the study of phonological acquisition, gaining a central place in language acquisition textbooks, attracting a search for counterexamples to the empirical claims it contained, and detracting attention from the areas of development it did not cover (see Menn 1980 for a brief summary of this period).

Jakobson's view of acquisition was that phonological inventories were acquired by repeated division of the phonological space into two-way contrasts (see also Dresher 2009). The child would first distinguish vowels from consonants; any further details would be missing until the next contrast was established—for vowels, for example, the next contrast was said to be between low vowels and high-front vowels, and for consonants, the second contrast was said to be between nasal and oral stops. The order of acquisition was claimed by Jakobson to be universal, a claim that was based on a survey of the then-available empirical literature.

At the center of the Kindersprache theory was a set of "structural laws" that gave priority to one contrast over another; intuitively, the structural laws gave an abstract complexity measure distinguishing "less-" from "more-structured" sounds. The elegance of the theory was that the same structural laws were said to govern all three of the title areas. The order in which sounds were acquired was said to be the reverse of the order in which sounds were lost in aphasia ("last in, first out").
The structural laws, too, gave rise to a set of cross-linguistic tendencies of the kind that would later come to be called "implicational universals" following Greenberg's (1963) paper: "the opposition of a stop and an affricate in the languages of the world implies the presence of a fricative of the same series," wrote Jakobson (1968: 56)—we conclude that the stop/fricative contrast has priority over the fricative/affricate (or stop/affricate) contrast. It follows from this that fricatives should appear earlier than affricates in language acquisition, and affricates should be lost prior to fricatives in aphasia.

The Kindersprache theory was simple and powerful and, for researchers wishing to pursue it empirically, the theory cut an obvious path, essentially predicting a universal sequence in segmental acquisition. First, was there really a universal order of acquisition? If not, the Kindersprache theory as stated was too strong. Second, if there was a universal order of acquisition, what was it? Subsequent questions would depend on having answers to these questions, and so an empirical literature emerged reporting longitudinal data on the child's changing set of contrasts.

Over the following decades, longitudinal production data, combined with similar data from clinical studies of abnormally developing children, was often pooled in survey papers which attempted to find the commonalities in the observed data. We give an outline compiled from some of these surveys, which relate strictly to the development of consonant contrasts (Grunwell 1981, 1982; Dinnsen 1992), in Table 2.1.[1]

Table 2.1  Sequence of consonant contrasts, drawing on previous reviews and summaries. Example inventories are given for a typical English-speaking child

Oral/nasal, obstruent/sonorant, coronal/labial
    p      t
    m w    n j
        ↓
Oral/nasal, obstruent/sonorant, coronal/labial, voiced/voiceless
    pʰ b   tʰ d
    m w    n j
        ↓
Oral/nasal, obstruent/sonorant, coronal/labial/dorsal, voiced/voiceless
    pʰ b   tʰ d   kʰ g
    m w    n j    ŋ h
        ↓
Oral/nasal, obstruent/sonorant, coronal/labial/dorsal, voiced/voiceless, stop/fricative
    pʰ b f v   tʰ d s z   kʰ g
    m w        n j        ŋ h
        ↓
Oral/nasal, obstruent/sonorant, coronal/labial/dorsal/palatal, voiced/voiceless, stop/fricative
    pʰ b f v   tʰ d s z   ʧ ʤ ʃ ʒ   kʰ g
    m w        n j                  ŋ h
        ↓
l > r > θ, ð
    . . .

[1] The most comprehensive set of empirical claims about vowel development is still Jakobson's: first, a high/low split ([i] versus [ɑ]), followed by a front/back split (adding [u]), or a secondary height contrast (adding [e]). Jakobson's empirical claims are worthy of serious scrutiny, however, and thorough work describing the emergence of vowel productions is more scarce than work on consonants. The phonetic study of Lieberman (1980) does not distinguish words from babbling, and, more importantly, contains no record of the intended pronunciations in words, meaning that we cannot evaluate the child's contrastive inventory; other work is not longitudinal (Davis and MacNeilage 1990) or is unfortunately confounded (the subject in Major 1976 is English–Portuguese bilingual). The most usable data for English are to be found in Otomo and Stoel-Gammon (1992); the main finding in that paper, which examines only unrounded English vowels, is that the lax vowels [ɪ] and [ɛ] become contrastive only relatively late, while tense [i] and [ɑ] are contrasted earliest (corroborating Jakobson), with [e] and [æ] falling somewhere in between.

Several questions arise from Table 2.1. First, we should ask how reliable the original sources are, but the authors have nothing to say about this. Second, we might ask how comparable the studies are. The short answer is that they are fairly inconsistent in their methodology. For example, many generalizations are taken from clinical studies, while others are taken from studies of normally developing children, a bias that simply reflects the greater contact with speech-language pathology practitioners for atypical children. These are not necessarily compatible; see Chapter 33 by Daniel A. Dinnsen, Jessica A. Barlow, and Judith A. Gierut in this volume for discussion of this and related issues.

Another important methodological difference between studies is the criterion for adding a contrast to the table. Should we take the contrast to be in place when the sound is produced? When it is used "appropriately"? If so, what does "appropriately" mean? Do children "have a contrast" between [m] and [w] if they only ever use [w] as a substitute for [l]? Do they have a contrast between [w] and [l] if they substitute [w] for [l] in certain environments, but not others? How far should we go in attempting to find these environments, if the substitutions do turn out to be systematic, and we take this to be crucial in determining contrast?

One somewhat radical answer to the question of what it means for a child to have acquired a contrast was suggested by Smith (1973), who constructed a full phonological grammar for the child's productions, implying that the absence of a particular contrast on the surface could be treated as an epiphenomenon of a neutralization rule, and not the other way around, as a simple reading of Jakobson would suggest.
This would imply that the answer to the question of when children "have a contrast" is really "when they produce each segment in all and only the environments that adults do," so that tables like Table 2.1, inspired by Jakobson, would really be only approximations to the full set of relevant data. This really represents a different theory (see the discussion of Smith in section 2.2.1); we need not move to this extreme, however, to recognize that compiling a table such as Table 2.1 requires some set of clear criteria. Many of the original sources do not make their criteria explicit, and the rest are generally inconsistent with each other.

Granting that the studies are comparable and reliable, the question of whether the data confirm or disconfirm Jakobson's claims is of substantial interest. The highlights of Jakobson's partial order on consonant contrast development are: nasal and oral stops should be distinguished early; labials and dentals should be distinguished later; these should be distinguished from velars and palatals still later; fricatives should be neutralized to stops early, but not the other way around, as noted earlier in this section; affricates and fricatives should be distinguished after stops and fricatives; [l] and [r] are distinguished late.

Table 2.1 supports the idea that nasal and oral stops, as well as labial and dental stops, are distinguished early, but it is unclear whether there is an ordering between these two developments. Table 2.1 supports the idea that the velar/non-velar contrast, as well as the palatal/non-palatal contrast, are made later than the labial/dental contrast (except for the glides). Finally, Table 2.1 also supports the idea that stops precede fricatives, and that the [l]/[r] distinction is late.
There are a few other generalizations that emerge about which Jakobson has nothing to say, such as the relative ordering between the acquisition of velars and the acquisition of palatals (Table 2.1 suggests that velars precede palatals), the place in the order of acquisition of a voiced/voiceless distinction (Table 2.1 suggests it is relatively early), and the fact that, in addition to the relatively late emergence of the contrast between the two, neither [l] nor [r] appears at all before some late stage. On the whole, there is nothing in Table 2.1 that contradicts Jakobson.

Inevitably, however, there are published exceptions to even the general progression given in Table 2.1 (for example, Prather et al. (1975) report consistent acquisition of [ŋ] before the velar obstruents, [w] much later than [j], and [r] and [l] before [z]; Olmsted (1971) reports consistent acquisition of [ŋ] only much later than the velar obstruents, and also later than the other nasals; and Vihman et al. (1986) report a child with [s] and [ʃ] before any velar stops).

Of course, Jakobson did not just claim that these orders could be found in some language; he claimed they were universal, whereas the studies that underpin Table 2.1 are taken only from English-speaking children. Similar studies of other languages are rare, but, where they exist, they reveal potentially problematic differences for the Kindersprache theory of universality. Macken's studies of Spanish acquisition (Macken 1978, 1979) showed that Spanish children make a continuant–non-continuant distinction early, but not a voicing distinction; she concluded that Spanish children do make a voicing distinction at roughly the same time as English children, but initially realize it as a continuancy distinction because of the allophonic status of the fricatives in Spanish (voiced but not voiceless stops are subject to non-contrastive spirantization). Pye et al. (1987) report that Quiché-learning children acquire [ʧ] much earlier than [ʃ], and perhaps even earlier than [p], [t], and [k]; they learn [x] long before [s], which seems to be later than [ʃ]; they learn [l] very early; and [p] and [t] appear to be fairly synchronous, whereas [m] is acquired later than [n].
Similar facts can be adduced for Finnish ([d] is late and [r] early; see Itkonen 1977). Some of these facts are in conflict with generalizations in Table 2.1, while others are also in conflict with generalizations of Jakobson's. There is room for some such conflicts, because the theory does not state that all orderings must be universal; these questions were not satisfactorily resolved before attention shifted to other types of theories, however (see section 2.2.1).

Finally, we might ask whether the source of data, child productions, is the only one we might use. It did not have to be the case that empirical research following Jakobson's program focused only on production, although it did. The core idea of the Kindersprache theory makes reference to contrast; but the idea of a hierarchy of contrasts underpinning phonological acquisition, loss, and typology is viable whether we are talking about perception, production, or memory. It is only because Jakobson dismissed the idea of studying perceptual development, pointing out that children could easily distinguish (and presumably remember) minimal pairs of words differing crucially by contrasts they could not yet produce, that he took all his evidence from production; as is often the case, this detail of the original author's views shaped the understanding of the theory.

Importantly, Jakobson was correct about certain generalities: although exact ages for individual contrast developments in production studies are variable, one thing which is broadly consistent is that the least developed production inventories are typically seen around 1;5, and acquisition of all contrasts can last years. As we shall see in this chapter, however, perceptual development for these basic contrasts is largely adult-like by the time native-like productions begin to take shape.

In summary, to the extent that the primary empirical claim of the Kindersprache—that there is a universal acquisition sequence—has been assessed, research has revealed that, while there are generalizations to be made, there are exceptions which are worthy of explanation. In light of this variability, and particularly in light of the cross-linguistic variability that appears to exist in the ordering of contrasts, it would be reasonable to conclude that the simplest version of the Jakobsonian hypothesis is long disconfirmed: whatever the substantive content of the learning mechanism, it does not consist of a simple "checklist." (See Ingram 1988b and Edwards and Beckman 2008 for evidence that frequency might be able to explain some of the variance.)

In defense of the theory, however, it is worth pointing out that the differences in methodology across studies make it difficult to assess the facts at all. In what is perhaps a more equivocal defense of the theory, we must point out that the inconsistent notion of what it means for a child to "have a contrast" makes it hard to compare generalizations. The fact that Jakobson is ambiguous on this point means that it is difficult in principle to evaluate the theory. Recent work (e.g. Edwards and Beckman 2008) is encouraging in its use of controlled methodology (laboratory-elicited word productions), but the simple percent-accuracy measure used there leaves open the question of what should qualify as contrast acquisition; see Ingram (1988a) for some suggestions.

2.1.2 Perception: The Psychologist's View

If the Kindersprache theory was a model for elegance of theory in mid-twentieth-century phonology, the emerging speech perception literature was surely the corresponding model of empirical rigor. When experiments aimed at uncovering the psychoacoustic basis of speech perception revealed that speakers of different languages respond differently to the same sounds on low-level perceptual tasks, it became difficult to dispute the psychological reality of phonological contrast (Abramson and Lisker 1970). If perception was influenced by the linguistic environment in which a person was brought up, then the natural questions were how and when the ambient language came to impress itself upon the perceptual systems.

The paper that broke the empirical ground in infant speech perception was Eimas et al. (1971). Armed with the high-amplitude sucking technique of Siqueland and DeLucia (1969), the researchers were able to measure the discrimination abilities of 1- and 4-month-old infants, who, astonishingly, showed the same pattern of discontinuous perception for VOT (voice onset time) as English-speaking adults. The insight that measures of infants' habituation and recovery could be used as measures of discrimination abilities gave Eimas et al. (1971) as much methodological cachet as the Kindersprache had the theoretical.

It took some time for a clear picture to begin to emerge. Streeter (1976) demonstrated that 2-month-old infants raised in a Kikuyu-speaking environment showed discontinuous perception for stop voicing with an English-like boundary, despite the fact that the Kikuyu VOT contrast is between prevoiced and unaspirated, not between unaspirated and aspirated like English; Lasky et al. (1975) reported a similar result using a heart-rate measure for 4- and 5-month-old Spanish-learning infants, although their ambient language also had a non-English-like VOT boundary, suggesting that the Eimas et al. discontinuity might not have been due to the influence of the ambient language. (Crucially, adults from language backgrounds with short-lag VOT boundaries do not show English-like perceptual boundaries; see Lisker and Abramson 1970.) The discovery by Kuhl and Miller (1975) that chinchillas also showed an English-like discontinuity for perception of English stops dealt a serious blow to the idea that very young infants showed true categorical (not simply discontinuous) perception, that is, that their discrimination abilities tracked knowledge of linguistic categories.[2]

It was not until Werker and Tees (1984, which established the use of head-turn procedures, rather than sucking procedures, for infants of suitable age) that a timeline began to be established. The now-famous 10-month developmental milestone, at which infants show a precipitous drop in perceptual sensitivity to non-native contrasts, was not as early as previous results had suggested, but researchers were still surprised at just how early categorical perception was evident (Werker 1989). With a powerful experimental paradigm, and, now, a powerful experimental result, the study of infant speech perception had become, for the initiated, the study of phonological inventory development. In this field of phonological inventory development, the timelines looked different.
Having started from the speech perception literature, and not from the Kindersprache, the question of a hierarchy of contrasts never arose, and was never systematically investigated, although clear differences from contrast to contrast in the onset of native-​ language effects were evident from the start. Instead, the next milestone that was sought was the development of a different type of knowledge—​lexical categories, that is, categories used in long-​term memory storage, rather than categories which might only be perceptual. The question of when children begin to acquire their native-​language lexical categories, and how to elicit behavior that reveals lexical and not phonetic categorization, was first asked in the literature by Shvachkin (1948). To encourage young children (between 0;10 and 1;6) to construe strings of speech sounds as “words,” Shvachkin trained them on novel names for objects by presenting the names alongside the objects in play. In this way, the strings would presumably be encoded in the same way as any item in the lexicon. The children were then asked by the experimenter to pick up the object, to use it in play, and so on. After presenting several objects in this way, two of which formed a minimal pair, the experimenter would make a request for one of these two items; performance in retrieving the item was evaluated. From his data, Shvachkin attempted to find a sequence of lexical-​receptive contrasts along the lines of Jakobson’s. This literature 2  The fact that Streeter (1976) also reported Kikuyu 2-​month-​olds’ sensitivity to a Kikuyu-​like boundary not found in English infants of the same age has been cast off as an anomaly in light of later research (and indeed, earlier research: Lasky et al. did not find a Spanish-​adult-​like VOT boundary in older Spanish-​learning infants), but it was taken to be significant at the time.

14    Ewan Dunbar and William Idsardi

saw a resurgence in the 1970s, with experiments carried out by Garnica (1973), Eilers and Oller (1976), and Barton (1976). Some tendencies, like an early contrast between sonorants and obstruents and very late acquisition of stop voicing, did begin to emerge, but the new interest in these experiments was too short-lived for anything clear to be determined. Eilers and Oller (1976) and Barton (1976) also reported better performance on mispronunciations of familiar words than on trained nonce words. Perhaps the most important point about these results, however, is that there is an enormous discrepancy with respect to perceptual category development. The youngest children tested by Shvachkin and Garnica were 0;10 and 1;5 respectively, and many of these children performed well only on a fairly restricted set of contrasts (obstruent versus sonorant, along with some distinctions among sonorants). Later researchers, with knowledge of the relatively early onset of native-like speech perception, would thus have been forgiven for thinking that the word-learning tasks were simply too complicated and the measures too indirect to be informative. There was some surprise, therefore, when Stager and Werker (1997) reported that 14-month-olds performed poorly in a modern, laboratory version of the word-learning task. In their methodology, the infant was presented with an auditory nonce-word label, paired with a visually presented novel object. In the two-object variant, a habituation phase with two different word–object pairs was followed by a test pair in a 'same' condition, in which one of the previously presented objects was presented with its previous label, or a 'switch' condition, in which one of the previously presented objects was presented with the other previously presented label. A difference in looking times in the two conditions implied successful discrimination.

Interestingly, English-learning 14-month-olds failed in this task when the labels were minimal pairs (/bɪ/ versus /dɪ/). Meanwhile, they succeeded when the words were presented without the objects during habituation (that is, with a checkerboard pattern rather than object pictures, with the transition now being from a one-word habituation phase to a second item at test). Even more interestingly, 8-month-olds performed well in a similar task, a one-object variant, in which only one word–object pair was presented in the habituation phase. (The younger infants could not do the two-object task at all.) Numerous variants on this experiment were subsequently reported, with the original result often, but not always, corroborated (Pater et al. 1998; Werker et al. 1998; Swingley and Aslin 2000, 2002; Werker et al. 2002; Fennell and Werker 2003; Fennell 2004, 2006; Wales and Hollich 2004; Fennell et al. 2007; Thiessen 2007; Yoshida et al. 2009). The difficulties did not seem on the surface to be as severe as those of Garnica's subjects, who struggled on many contrasts up to at least 2;0; in this task, infants seemed to have fully recovered by 1;8, at least on the limited set of contrasts tested (Werker et al. 2002; Thiessen 2007). Furthermore, small task differences (including the difference between novel and familiar words explored in the earlier literature) made a big difference in performance. The consensus quickly emerged that the results had something to do with word learning (the "word effect"), but whether there was a true representational failure or some other problem in lexical access or learning was up for debate.

The speech perception paradigm for studying inventory development has now made definitive empirical progress; a standard timeline, with the standard understanding of the results, is given in Table 2.2. The usual understanding of this literature is that infants begin as "universal listeners," capable of distinguishing any contrast perceptually, and then learn by "learning to ignore"—but what is the explanation? The speech perception tradition in phonetic inventory learning research is quite different from the Jakobsonian one, and it has a rather different story to tell. Most importantly, the difference between the slow grind of production development and the quick transition from newborn to native-like hearer is empirically undeniable. This simple difference seems to undermine the very premise of the acquisition of the phonological inventory.

Table 2.2 Milestones in infant speech perception

| Age | Perceptual changes | Contrasts tested |
| --- | --- | --- |
| 0;1–0;2 | Sensitivity to non-native contrasts | English, Kikuyu infants show a categorical VOT boundary for labials at around 25–30 ms, despite the absence of such a boundary in Kikuyu |
| 0;6 | Adult-like warping of vowel perception, as revealed by different directional asymmetries in discrimination | [i]–[y], for English versus Swedish infants |
| 0;6 | Continued sensitivity to non-native consonant contrasts, as revealed by discrimination performance similar to adults or older infants from language environments in which the contrasts are native | [tʰ]–[ṭ], [k′]–[q′], [k]–[k′], [b]–[b], [ɬ]–[ɮ] in English infants; [r]–[l] in Japanese infants |
| 0;6–0;8 | Poor sensitivity to certain difficult native-language contrasts | [f]–[θ], for English-learning infants; syllable-initial [n]–[ŋ], for infants learning Filipino (Tagalog) |
| 0;8 | Ability to detect changes in novel native-language minimal pairs associated with objects | [bɪ]–[dɪ], for English-learning infants |
| 0;10–0;12 | Decline in sensitivity to non-native consonant contrasts to adult-like or near-adult-like levels | Contrasts listed at 0;6 |
| 0;10–1;2 | Improvement in sensitivity to difficult native-language contrasts | Contrasts listed at 0;6–0;8 |
| 1;2–1;3 | Decline in ability to distinguish novel minimal pairs associated with objects | [bɪ]–[dɪ]; [bɪn]–[dɪn]; [bɪn]–[pʰɪn]; [dɪn]–[pʰɪn]; [dɪn]–[ɡɪn]; [dɑ]–[tʰɑ], all for English-learning infants |
| 1;5–1;8 | Improvement in ability to distinguish minimal pairs when associated with objects | [bɪ]–[dɪ]; [dɑ]–[tʰɑ], for English-learning infants |

2.2 Revisiting Tradition

What does it mean to acquire a phonological inventory? "At first glance," writes Jusczyk (1992), "[i]t would seem to be a matter of identifying the elementary sound units that are used to form words in the language." This statement might seem innocuous and neutral. In fact, however, it represents only one answer to the question of what it means to acquire a phonological inventory. One can easily identify at least three others. We find it helpful to think of human speech-sound cognition (phonology, in a broad sense) in terms of three inescapable facts: humans have ears; humans have mouths; and humans have memories. Phonological processing includes, minimally, some receptive processing system, some productive processing system, and some storage system. Processing information entails having some way of encoding that information. Minimally, then, there are three encoding formats used by the brain to process speech; these three formats are conceptually distinct, even if they are not all actually distinct. Add to this the observation that the storage system must interface with both the productive and the receptive systems, and we obtain four different possible senses of the term "phonological inventory": a set of possible distinctions that can be made when perceiving speech (the phonetic–receptive inventory); a set of distinctions that can be made when storing perceived speech in the lexicon (the lexical–receptive inventory); a set of distinctions that can be made when producing speech (the phonetic–productive inventory); and a set of distinctions among speech sounds that can be made when storing instructions for producing words in the lexicon (the lexical–productive inventory). These four senses of "phonological inventory" are summarized in Table 2.3.

Table 2.3 Four conceptually distinct representations that have been called "phonological inventories" in the developmental literature

|  | receptive | productive |
| --- | --- | --- |
| lexical | Memory encoding (perceived speech) | Memory encoding (speech to be produced) |
| phonetic | Information used in speech processing | Information used in speech production |

For example, linguists will be familiar with the lexical versus phonetic distinction made in Table 2.3 (vertical). Grammatical descriptions constructed by linguists are usually viewed as descriptions of a mapping between these two cognitively distinct

types of representations, and it is for this reason that finding an adequate theory of these grammars is a part of cognitive science (Chomsky and Halle 1968; Prince and Smolensky 2004). On the other hand, most theoretical works in phonology assume a single set of features common to perception and production, implying that the receptive versus productive dimension in Table 2.3 (horizontal) is not relevant in the lexicon. Psychologists, however, are likely to be familiar with theories on which the lexicon contains two separate sub-stores, one for recognizing language, and one (linked, but in some sense distinct) for producing language (Straight 1980; Caramazza 1988; see Menn 1992 for such a proposal approached from within theoretical linguistics), which would require two separate encodings at the lexical level, and thus in principle two separate developmental tracks. The phonetic levels are simply those representational formats that the brain uses to encode information about perceived speech and to control motor systems. A multitude of logically possible distinctions other than these come to mind, but these four senses are the ones most frequently found in the literature and in current thinking on the development of phonological representations.3 Having reviewed some of the classic literature on inventory development, we encourage the reader to reconsider some of the principal results and claims in terms of this four-way distinction. As was made clear above in section 2.1, for example, the child phonology literature has focused on productive representations, while the infant speech perception literature has focused on receptive representations; we might wonder how we can relate the two. More difficult questions arise when we ask whether particular

3. All researchers will have heard the term phoneme used in one or more of these contexts, but we have avoided it. In certain schools of early twentieth-century phonology in which many researchers considered their enterprise to be only loosely connected to cognition (notably, American Structuralism), there were two senses of the term, corresponding to the minimal units on two different levels of analysis (the taxonomic phonemic level and the abstract phonemic level). Early generative linguistics (e.g. Chomsky 1964; Chomsky and Halle 1968) avoided the term in favor of a distinction between "lexical" (or "underlying") and "phonetic" representations, except when referring to the work of others. The term "phoneme" quickly worked its way into work in generative phonology, however. These early generative uses of "phoneme" seem to refer to the unit of lexical representation; but, in more recent work, it is sometimes difficult to tell whether the term is defined by the level of representation it refers to or by certain properties that have become associated with that level of representation, and, if so, which properties (for example, division into segments, degree of abstractness, or assumption of a finite inventory). Outside linguistics, the situation is more confusing, as the term is often used to refer to any or all of the four representations in Table 2.3. For these reasons, we avoid the term to prevent misunderstanding.

We also stress that the term "inventory" in this chapter could usually be replaced with "representational capacity"; most of the facts and theories discussed could be restated without assuming a finite inventory of segments (for example, the same themes would be stressed if representational development were considered from the point of view of exemplar theory); nor do we assume that what is developing in the infant is a system for representing individual sounds, like [s] or [i], since what we have presented is too general to bear on the nature of the features that represent individual sounds. We leave it to the reader to consider these issues. Although there is some controversy over these fundamental assumptions, understanding these debates is not crucial to considering most of the developmental literature, and attempting to be entirely neutral would have led to confusing terminological awkwardness. For some discussion of these issues, see Dunbar and Idsardi 2010.

theories or pieces of evidence relate to lexical or phonetic levels, and, although some attempt has been made to dissociate the two in the speech perception literature, we will find reasons to doubt how well we have dissociated them up to this point. In this section, we consider a few of the issues raised.

2.2.1 What Level Is This? Jakobson and Smith

Current evidence suggests that the child's perceptual abilities are relatively adult-like during the period of interest to Jakobson, in which production seems to be still in flux. Jakobson's discussion and his examples from the observational literature of the time (for example, the 1-year-old son of Serbian linguist Milivoj Pavlović (1920), who understood the difference between the words for his father, tata, and his excreta, kaka, but called them both tata) are meant to suggest that both the phonetic–receptive inventory and the lexical–receptive inventory are in place. What inventory was it, then, that Jakobson was watching develop? Following our four-way distinction, it could be the lexical–productive inventory, or the phonetic–productive inventory, or both, or a mapping between them. In any case, Jakobson's inventory was a productive inventory, meaning that, for Jakobson, the productive and receptive inventories were necessarily distinct on some level. Furthermore, according to Jakobson, the productive inventory is not limited just by the child's motor skills, since, in babbling, the child can produce a wide range of native and non-native speech sounds (see also Hale and Reiss 2008). Empirically, this is an overstatement: while children do seem to have (somewhat) larger babbling repertoires than they show in word production, the set of babbling sounds is fairly similar to the set of sounds they use in words (Vihman et al. 1986); nevertheless, there is a protracted arc of productive inventory development after babbling ceases which needs to be explained regardless of how and whether it is related to babbling, and Jakobson does not tell us enough to know exactly which parts of the cognitive system this development should be attributed to. An answer to the question of just what was developing, if not motor skills, arose in the generative tradition shortly after the English publication of the Kindersprache.
Stampe (1969) and Smith (1973) replaced Jakobson's theory of representational development with a theory of grammatical development. They attempted to characterize a level of grammatical preprocessing, prior to the articulatory level, that would map from adult-like stored forms, assumed to be fairly accurate, to child-like productions. They claimed that this computation was the same type of computation that went on in adult phonology (at the time, the kind of computation laid out in Chomsky and Halle's (1968) Sound Pattern of English). Both Stampe and Smith presented further arguments against the motor-failure hypothesis. Particularly compelling was Smith's so-called puzzle-puzzle. Smith's son, Amahl, at some stage, mapped all adult /d/ to [ɡ] in a certain context (before syllabic [l̩]). Under a motor-failure theory, this would be because Amahl was unable to coordinate

his muscles to pronounce /d/ in this context. But, in fact, Amahl mapped adult /z/ in this context to [d], giving the chain-shift outcome in (1):

(1)  /pʌdl̩/ → [pʌɡl̩]
     /pʌzl̩/ → [pʌdl̩]

Smith reports thoroughly testing that Amahl was in fact able to perceive this difference and map it on to different lexical items (though Macken (1980) objects to the claim that Amahl's perception of the contrast was adult-like); his data also demonstrate that these substitutions were systematic. Two similar cases are presented by Smith in his 1973 book, and other cases have been noted before and since (Aleksandrov 1883; Smith 2010). The new insight here was that the absence of a contrast between, say, [d] and [ɡ] did not entail that a child would simply always select one, or choose one at random. If the child was systematic (therefore, consistent), the limitation could not be in the ability to produce a particular segment, or even in the ability to produce it consistently, and, as the puzzle-puzzle demonstrated, the limitation could not be in the ability to produce a particular sequence, either. (Smith also had a way of ruling out the possibility that productive development was due to changing lexical inventories, which would have manifested itself in the form of mislearned lexical items. For details, see Smith 1973; for a response, see Menn 1992; for review of the issue, see Smith 2010.) Though the details of Smith's analysis were criticized (Braine 1976b; Macken 1980; Menn 1980), and though Smith was not the first to attempt a similar analysis (see, for example, Chao 1951), the idea that child productions were the product of systematic substitution rules similar to adult phonological rules, and that misperception was not the source of the majority of children's phonological errors, stimulated a large amount of research. Furthermore, since Smith made his full longitudinal lexicon available in an appendix, many subsequent papers have reanalyzed the Amahl data (Macken 1980; Goad 1997; Dinnsen et al. 2001; Vanderweide 2006).
One consequence of Smith's conclusions is that, regardless of whether the lexical–productive inventory is distinct from the lexical–receptive inventory, the phonetic–productive inventory is not the same as the lexical–productive inventory. Under this theory, a Jakobson-style expansion in phonetic–productive representational capacity either drives, or is an epiphenomenon of, the development of the production grammar. Clearly, if the phonetic–productive inventory is epiphenomenal, then it is not an object of study by itself; recently, however, some theories have begun to treat phonetic–productive inventory development as a representational expansion again. The formal treatment has been in terms of a changing set of available features, with restrictions either stated as Optimality-Theoretic (OT) markedness constraints (Boersma and Levelt 2003; Kager et al. 2004) or stated directly in a more Jakobson-like theory specialized for markedness relations among features (Rice and Avery 1995; Dresher 2009; related child language analyses are to be found in Levelt 1989; Fikkert 1994; Dinnsen 1996; Brown and Matthews 1997).

Another of these representational approaches (Vihman and Croft 2007; Altvater-Mackensen and Fikkert 2010) locates Jakobson's developmental arc in the lexical–productive inventory, and, furthermore, claims that there is only one type of lexical encoding, shared by the lexical–receptive and lexical–productive functions. The obvious challenge for this approach is to explain why children are capable of discriminating contrasts they cannot produce. One response has been to claim that, unlike children's simple phonetic discrimination performance, children's performance in word-learning tasks does in fact mirror production; see section 2.2.2 for further discussion. Finally, it is worth remembering that our four-way division of logically possible inventories is not complete, nor can it ever be; one can always think of finer sub-divisions of each cell, and of theories of phonology and speech processing that would make additional nuances critical. Although Smith assumed the child's phonological grammar to be a sequence of rules manipulating discrete feature values, there is evidence for learned "phonetic processes" operating on sub-symbolic information (Sledd 1966; Dyck 1995; Flemming 2001), in addition to learned cross-linguistic differences in the phonetic implementation of individual phonological features (Pierrehumbert 2003). Taking into account either of these two parts of phonological processing would suggest that theoreticians might distinguish between two types of phonetic–productive inventories, one of which is the output of a Smith-type grammar operating over discrete feature values, and one of which is the output of subsequent phonetic implementation and phonetic adjustment processes. Since some studies have reported sub-phonemic changes in children's productions over time (Macken and Barton 1980b; Scobbie et al. 2000), there are very likely many theoretically important facts waiting to be discovered in the phonetics of child productions.

2.2.2 Word Learning Tasks

The standard interpretation of the result of Stager and Werker (1997) discussed in section 2.1.2 is that infants can be good at perceiving contrasts without being good at storing them in, or retrieving them from, the lexicon. If we accept that these tasks probe something lexical, then we need to ask how it could be that the storage step seems to fail when discrimination is at ceiling. A familiar explanation is that there are two kinds of encoding under development: both the phonetic–receptive encoding and the lexical–receptive encoding are in flux during infancy, changing in response to the linguistic environment. The explanation here is straightforward: discrimination tasks show the development of the phonetic–receptive encoding (thought of as a set of phonetic categories, distinct from the child's phonological categories); word-learning tasks show the development of a second encoding, onto which phonetic information must be mapped in the lexicon. A variant of this view maintains that there is really only one linguistic level that develops in response to the environment—the lexical–receptive level—and early changes in discrimination performance reflect low-level adaptation of the auditory system. Researchers may decide

for themselves whether they believe that "phonetic" and "auditory" mean the same thing in receptive processing, but the choice does not change the general shape of the explanation: if and only if children fail in a word-learning task, their lexical–receptive inventory must be insufficiently developed to represent the contrast being tested. Taking the novel word-learning data as the primary source of evidence about lexical inventory development implies that the lexical–receptive inventory is acquired relatively late; in particular, although infants apparently fail at Stager and Werker word-learning tasks at 14 months, their first words typically come at around 12 months. How can early lexical knowledge exist without an encoding scheme for storing words in the lexicon? A resolution is to be found in a theory that claims that phonemic representations develop in response to an enlarging lexicon (Brown 1973; Charles-Luce and Luce 1990). The claim is essentially that the infant brain is equipped by default with a system which can encode a few words, but which is insufficient to encode a full human lexicon. All of these views restrict the interpretation of word-learning results to the lexical–receptive inventory. An attempt to relate the word-learning results to the development of the productive inventory has been put forward by proponents of phonological underspecification. This is the view that the encoding format for speech sounds to some degree follows the Saussurean "nothing but differences" principle, encoding certain speech sounds by systematically leaving certain encoding features (dimensions) unspecified, with none of the legal contentful values (Lahiri and Reetz 2002; Dresher 2009).

For example, it is commonly held that place of articulation can be specified as coronal (using the tip or blade of the tongue), labial (using the lips), or dorsal (using the back of the tongue), but underspecification theories often contend that the feature coronal acts as a universal default, and that only labial and dorsal are specified explicitly. Predictions about misperception asymmetries have been derived from these types of claims in the psycholinguistic and neurolinguistic literature: for example, a larger mismatch negativity (MMN, a neurophysiological change-detection response; Näätänen et al. 1978) should be observed when a repeated sound with a marked feature value is changed to another sound than when a repeated sound with an unmarked feature value is changed, because in the latter case there is no feature to restrict the listener's expectations about the following sound, and so a mismatch should not be detected (Friedrich et al. 2008; Scharinger et al. 2012). Underspecification of a unified lexical–receptive–productive inventory has been proposed as an alternative explanation for some of the behavior seen in word-learning tasks (Fikkert 2005; Fikkert and Levelt 2008). For example, Altvater-Mackensen and Fikkert (2010) propose that children go through a stage in which all the consonants in a word must share the place feature of the vowel (both alveolar consonants and high front vowels are coronal, both velar consonants and low vowels are dorsal, and so on, under the feature theory of Halle et al. 2000); if children are in this stage, they may also fail to detect certain minimal-pair contrasts, but not others. In particular, a change from /ba/ to /da/ would be detected, because the "correct"

feature value for the consonant would be dorsal (the harmonic form would be /ɡa/); a change from /bɪ/ to /dɪ/ would not be detected, however, because the difference between the marked labial feature and the unmarked harmonic coronal feature would be undetectable. Surveying the previous literature, it is certainly true that researchers have not taken the place features of the vowel into account when constructing materials. A survey of all known published studies following the Stager and Werker methodology reveals only the following crucial pairs: /bɪ/–/dɪ/ (Stager and Werker 1997; Werker et al. 2002), /lɪf/–/nim/ (Stager and Werker 1997; Werker et al. 1998), /bɑl/–/dɑl/ (Fennell and Werker 2003; Fennell 2004), /dɑl/–/ɡɑl/ (Fennell and Werker 2004; Fennell 2004), /dɪn/–/ɡɪn/ (Fennell 2004), /bɪn/–/dɪn/ (Pater et al. 2004; Fennell 2006; Fennell et al. 2007), /bɪn/–/pʰɪn/ (Pater et al. 2004), /dɪn/–/pʰɪn/ (Pater et al. 2004), and /tʰɔ/–/dɔ/ (Thiessen 2007). Interestingly, although it is certainly not the case that all of the place contrasts tested with the coronal (front) vowels /i/ and /ɪ/ have always failed to be detected using the Stager and Werker methodology, it is notable that both /bɑl/–/dɑl/ and /dɑl/–/ɡɑl/ are apparently detectable by 14-month-olds, suggesting a previously overlooked confound between vowel place and word familiarity, the factor to which previous authors attributed infants' success on this task. Nevertheless, there remain numerous other experimental factors which can give rise to success on many pairs with coronal vowels in the Stager and Werker paradigm, and at least one paradigm (the visual choice paradigm of Swingley and Aslin 2000) in which 14-month-olds seem to perform well overall (Yoshida et al. 2009 tested infants in a word-learning task; Swingley and Aslin 2002 tested a wide variety of contrasts, but used known words).
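The detectability logic of this underspecification account can be sketched as a toy model. This is our own illustration of the reasoning above, not an implementation from the literature; the feature assignments and the harmonic-default rule are simplifying assumptions based on the description of Altvater-Mackensen and Fikkert's proposal.

```python
# Toy model of the underspecification account sketched above. Feature
# assignments and the detectability rule are illustrative assumptions.

CONSONANT_PLACE = {"b": "labial", "d": "coronal", "g": "dorsal"}
VOWEL_PLACE = {"i": "coronal", "ɪ": "coronal", "a": "dorsal"}

def lexical_place(consonant):
    """Coronal is the universal default: stored with no place feature (None)."""
    place = CONSONANT_PLACE[consonant]
    return None if place == "coronal" else place

def detectable(c1, c2, vowel):
    """A switch between c1 and c2 before `vowel` is predicted detectable
    unless one member is stored placeless (coronal) and the vowel is also
    coronal, so that the placeless consonant is indistinguishable from the
    harmonic default."""
    for c in (c1, c2):
        if lexical_place(c) is None and VOWEL_PLACE[vowel] == "coronal":
            return False
    return True

print(detectable("b", "d", "a"))  # True: /ba/–/da/ predicted detectable
print(detectable("b", "d", "ɪ"))  # False: /bɪ/–/dɪ/ predicted undetectable
print(detectable("d", "g", "a"))  # True: /dɑ(l)/–/ɡɑ(l)/ predicted detectable
```

Under these assumptions, the model reproduces the pattern in the survey above: pairs with coronal vowels and a coronal member are the ones predicted to go undetected.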
Finally, there is yet another way of looking at the data, which denies the tight link between word-learning performance and lexical encoding development. Storing and retrieving lexical items clearly involves more than simply encoding a set of sounds; for one, it involves encoding a sequence of sounds, and, for another, it involves exploring the links to the corresponding semantic and morphosyntactic entries (for relevant psycholinguistic theories, see Caramazza 1988; Bock and Levelt 1994). Failure in the broader lexical encoding step, not a deficit in the ability to encode individual segments, was proposed as an explanation for infants' poor performance by Stager and Werker (1997). We are not aware of any experimental study attempting to disentangle these two factors. This view might help to resolve some empirical puzzles in the word-learning data. For example, there is a difference in performance in word–object pairing tasks between known and unknown words, with performance on known-word/non-word minimal pairs much better than performance on nonce-word minimal pairs (Barton 1976; Swingley and Aslin 2002). If the infant has a perfect ability to store words, up to the limits of her lexical representational capacity, then, once learned, novel words should behave the same as known words; the facts could be predicted by a theory positing a deficit in the ability to store words, without any faults in lexical representational capacity. Correctly constructed, such a theory might also account for the fact that seemingly small changes to laboratory word-learning tasks can affect performance greatly, so that performance is not always as bad as might be expected if there were an outright failure to encode particular contrasts (Fennell 2004; Thiessen 2007; see Werker and Fennell 2004 for discussion).


2.3 The Future

To some extent, the goals in the study of phonological inventory acquisition overlap with the goals of the speech sciences as a whole: how is speech represented in memory? For production? For perception? Are some of these representations really the same? If not, how do they interact? Are they categorical or not? Absent answers to these questions, we can only hope that child data will be informative in roughly the same way adult data are. For example, children can recruit surprisingly fine phonetic detail in speech processing (McMurray and Aslin 2005); but the presence of phonetic detail does not entail the absence of coarser encodings in speech processing, or even imply the primacy of more detailed encodings. We would also like to see experiments that attempt to determine under what circumstances infants do not pay attention to phonetic detail, and, ideally, in what types of representations. (For adults, these have sometimes taken the form of priming studies along the lines of Pallier et al. 1999.) Even issues which seem to be strictly developmental are at heart issues about language processing more generally. When do infants construct higher-order abstractions of speech sounds, and how? Are the word-learning results relevant? This gets at a more fundamental issue: where is the abstraction in speech processing? Is it mainly in the lexicon, with phonetic representations full of detail, or are coarse representations formed early in receptive processing? Similarly, when we ask about the relation between perception and production in infancy, and the mystery of children's defective pronunciations, we touch on more basic questions: are lexical representations stated in a receptive alphabet, a productive alphabet, or both, or neither? In answering these questions, there is only so far the current type of empirical and theoretical literature can go. Although timelines are interesting, the real question in language development is how it takes place.
In recent years, research on language learnability—​the traditional term in linguistics for the theoretical study of learning mechanisms for language—​has made inroads into previously uncharted territory by turning to well-​understood principles and tools from statistics, machine learning, and the areas of computer science and mathematics related to optimization and search—​ related fields which could perhaps be collectively referred to as the inference sciences. During the 1990s, most learnability research presented algorithms and principles which were built to order for language acquisition problems, and rarely drew explicit connections to these other fields (Dresher and Kaye 1990; Gibson and Wexler 1994; Boersma 1997; Tesar and Smolensky 1998). However, deeper analysis of these and related algorithms (Niyogi and Berwick 1996; Boersma and Pater 2008; Pater 2008; Magri 2012), as well as new applications of standard techniques from the inference sciences (Yang 2002; Goldwater and Johnson 2003; Hayes and Wilson 2008), have helped to underscore the close connection between linguistics and these other fields. The study of receptive inventory acquisition has been greatly advanced by the simple observation that, at least for receptive inventories, the learner’s problem is one of clustering the auditory input. Clustering is a standard problem in machine learning in which, presented with a collection of tokens, the learner must sort out the tokens into some number
of categories (Hastie et al. 2009); the harder (and unfortunately more realistic) version of this problem also requires determining how many categories to posit. A mixture model is the statistical term for the generative model obtained by clustering, in which each token in the input is assumed to be an instantiation of one of a discrete set of categories. Clustering is different from classification, in which the problem is, given a description of some set of categories, to predict the categories of new points, but the two problems are intimately related: the solution to a clustering problem forms the input to a classification problem. Therefore, armed with some classification behavior (say, phoneme classification in adults or infants), plus some hypothesis about what method of classification is being used, we can try to work backwards to determine what techniques for clustering might have been used to arrive at this classification, or what types of information might be useful. Research in cognitive modeling of phoneme acquisition is in its infancy, however, and has up to now taken a less ambitious approach, simply attempting to discover statistical methods that find any sort of phoneme-like categories in acoustic data at all, rather than attempting to match the fine details of phoneme classification. As such, research in the field has often taken the ideal learner approach. This means ignoring many of the real-life constraints on the learning algorithm (like memory and speed) and attempting to determine how well a learner could do without these constraints (reminiscent of Chomsky's (1965) "instantaneous acquisition").
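The clustering problem described above can be made concrete with a minimal sketch. The algorithm below is a bare-bones k-means (one of the standard techniques taken up later in this section); the "formant-like" token values and the choice of two categories are invented for illustration, not drawn from any of the studies cited.

```python
import random

def kmeans(tokens, k, iters=20, seed=0):
    """Bare-bones k-means: sort acoustic tokens (2-D points) into k categories."""
    rng = random.Random(seed)
    centers = rng.sample(tokens, k)
    for _ in range(iters):
        # Assignment step: each token joins the nearest category center.
        clusters = [[] for _ in range(k)]
        for t in tokens:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(t, centers[i])))
            clusters[nearest].append(t)
        # Update step: each center moves to the mean of its assigned tokens.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = tuple(sum(vals) / len(c) for vals in zip(*c))
    return centers, clusters

# Invented tokens: two vowel-like categories in an (F1, F2)-style space.
gen = random.Random(1)
tokens = ([(gen.gauss(300, 30), gen.gauss(2200, 80)) for _ in range(50)]
          + [(gen.gauss(700, 30), gen.gauss(1200, 80)) for _ in range(50)])
centers, clusters = kmeans(tokens, k=2)
```

Note that this sketch sidesteps the harder problem flagged in the text: the number of categories k is handed to the learner rather than discovered.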
In practice, it means applying standard clustering techniques which are known to find some good or optimal clustering solution and experimenting with different model assumptions and different inputs; if an ideal learner appears to find categories like those posited by a "gold standard" linguistic analysis under a certain set of modeling assumptions, then a real learner would presumably also benefit from taking the same approach. Assuming that we can determine the correct statement of the learner's input at the level of auditory cortex, the clustering problem for receptive inventories is fairly well specified, given the last half century of productive research into the perceptually relevant acoustic dimensions for speech. For vowels, for example, the first through the third formants plus the duration can today be measured readily and fed into one of any number of off-the-shelf clustering algorithms. This simple approach is the one taken by DeBoer and Kuhl (2003; for three English vowels, using Expectation Maximization to fit a mixture of Gaussians), by Hall and Smith (2006; for Greek vowels, using k-means clustering), by Vallabha et al. (2007; for Japanese vowels, using both an incremental version of expectation maximization for a mixture of Gaussians and a non-parametric extension of the same algorithm), and by McMurray et al. (2009; for English VOT, using essentially the same parametric mixture estimation algorithm as Vallabha et al. 2007). Even for vowels, where the relevant acoustic parameters are thought to be well understood, the clustering problem on raw data remains very hard for current models once a system has more than a few categories. One promising change to the assumptions of the model has been to construct categories which partial out the effects of allophonic rules (or, more generally, any effects of context or other variables), thereby removing some of the noise from the input (Dillon et al. 2013).
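In the same spirit as the parametric mixture estimation used in the studies just cited, the sketch below fits a two-component Gaussian mixture to one-dimensional data by expectation maximization; the "VOT" values are synthetic stand-ins for illustration, not the stimuli of any study discussed here.

```python
import math
import random

def em_mixture(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (e.g. a VOT continuum)."""
    mu = [min(data), max(data)]                      # crude initialization
    mean = sum(data) / len(data)
    gvar = sum((x - mean) ** 2 for x in data) / len(data)
    var = [gvar, gvar]                               # start with the pooled variance
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: each category's "responsibility" for each token.
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk + 1e-6
    return mu, var, pi

# Synthetic voice onset times (ms): a short-lag and a long-lag category.
gen = random.Random(0)
vot = [gen.gauss(10, 5) for _ in range(200)] + [gen.gauss(60, 10) for _ in range(200)]
mu, var, pi = em_mixture(vot)
```

With two well-separated categories the estimated means land near the generating means (roughly 10 ms and 60 ms); real acoustic data, with overlapping and context-conditioned categories, is exactly where this simple setup breaks down.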
Another potentially promising approach is to add an extra layer to the model corresponding to a set of known words (that is, a lexicon), thereby using context in another way, to help recover misclassified
acoustic material by attracting each token (now a word) to a known lexical item (Feldman et al. 2009a). Both these approaches assume that the encoding of interest is a lexical inventory, suggesting that, from a learnability perspective, it may be unnecessary and even counter-productive to attempt to discover a set of phonetic–receptive categories, rather than simply discovering a set of lexical–receptive categories directly. Of course, the view that there are two receptive inventories is the one we would obtain if we directly translated the standard tools for phonemic (lexical category) analysis taught to linguistics undergraduates into a learning mechanism. The analyst must first determine what the possible segments of the language are, including all positional variants (phone discovery); the analyst then discovers phonemes by grouping phones in some way. This might be done agglomeratively, by collapsing certain predictable distinctions—for example, by looking for complementary distributions between pairs of phones (Harris 1951; Peperkamp et al. 2006)—or it might be done divisively, by searching for evidence (for example, minimal pair evidence) that a pair of phones is contrastive (Dresher 2009). Grouping phonetic categories in this way is difficult to do well given current approaches to discovering phones (Dillon et al. 2013). A further insight gained by taking a computational perspective is that the problem of inventory discovery is inherently a statistical one—one that requires reasoning under uncertainty—in two senses: first, the discovery procedure must sort out signal from noise, and thus must allow for some uncertainty about the correctness of its solution or the relevance of an individual data point; second, the resulting inventories, at least receptive inventories, seen as classifiers, clearly have regions of uncertainty, a commonplace result in adult vowel identification tasks.
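The agglomerative strategy mentioned above, grouping phones that stand in complementary distribution, can be sketched directly: collect the contexts each phone occurs in and test whether two phones' context sets are disjoint. The toy corpus below, in which one phone occurs only word-initially and the other only between vowels, is invented for illustration and is not the procedure of Peperkamp et al. (2006).

```python
def contexts(corpus, phone):
    """Collect the set of (preceding, following) contexts a phone occurs in."""
    found = set()
    for word in corpus:
        for i, seg in enumerate(word):
            if seg == phone:
                prev = word[i - 1] if i > 0 else "#"      # '#' marks a word edge
                nxt = word[i + 1] if i < len(word) - 1 else "#"
                found.add((prev, nxt))
    return found

def complementary(corpus, a, b):
    """True if phones a and b never occur in the same context,
    i.e. they are candidates for collapsing into one category."""
    return not (contexts(corpus, a) & contexts(corpus, b))

# Toy corpus, segments as strings: [h] word-initial only, [x] intervocalic only.
corpus = [["h", "a", "t"], ["h", "o", "p"], ["a", "x", "a"], ["o", "x", "a"]]
```

On this corpus, `complementary(corpus, "h", "x")` holds while `complementary(corpus, "a", "o")` does not; with realistic data, noisy phone discovery makes exactly this kind of test unreliable, which is the difficulty noted in the text.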
Most of the techniques which have been applied in this domain (with the exception of k-means and k-nearest neighbors) are statistical in both these respects. A quickly growing trend in statistical modeling is the application of the Bayesian perspective, which treats probability as a quantification of uncertainty of knowledge rather than the limit of relative frequency that students are typically introduced to (see Knill and Richards 1996; Doya et al. 2007 for discussion of Bayesian models in perception and brain science; see Chapter 28 by Lisa Pearl and Sharon Goldwater in this volume for more detailed discussion of the relevance to acquisition). Taking a Bayesian perspective makes it licit to state probabilities not only of observable events, but also of hypotheses, and thus to use probability theory as a very generic method for changing information states—that is, learning. The crucial relation in this context is Bayes' Rule, given in (2.2), which states that the probability of a hypothesis H given some set of data D (the posterior probability of H) is proportional to the probability of the data given the hypothesis (the likelihood of the data) times the probability of the hypothesis before seeing the data (the prior).

(2.2)  Pr(H|D) ∝ Pr(D|H) Pr(H)
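As a toy numerical illustration of (2.2), consider a learner weighing two hypothesized categories for a single ambiguous acoustic token; all of the probabilities below are invented for the example.

```python
# Two hypothesized categories for one ambiguous acoustic token.
priors = {"i": 0.6, "e": 0.4}        # Pr(H): e.g. category frequency before the data
likelihoods = {"i": 0.2, "e": 0.5}   # Pr(D|H): how well each category fits the token

# Bayes' Rule: the posterior is proportional to likelihood times prior.
unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
evidence = sum(unnormalized.values())
posterior = {h: p / evidence for h, p in unnormalized.items()}
# The prior favored "i", but the token's better fit to "e" reverses the ranking:
# posterior["e"] = 0.2 / 0.32 = 0.625.
```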

There are some simple but powerful consequences that can be drawn about learning if humans are Bayesian learners. For one thing, each hypothesis is in principle associated with a different likelihood function, and many likelihood functions will assign lower
probability to the observed data simply because they are more general: since probabilities must integrate to one, if a hypothesis assigns a large amount of probability to many unobserved events, there is less probability remaining for the observed events. This effect drives a Bayesian learner to be conservative, and the strength of the effect increases as a function of the number of data points (Tenenbaum 1999). However, a Bayesian treatment of learning can also be hierarchical, with complex learning problems depending on quantities which themselves must also be learned; in this case, the same type of principle applies to the intermediate level of the hierarchy, giving an automatic Occam's Razor-like bias toward simpler intermediate-level hypotheses. Learning the parameters of a mixture model while learning the number of categories is one such problem. The likelihood term favors solutions with more categories, because such solutions can provide a more fine-grained (thus, conservative) description of the data; however, most sensible priors will imply for a similar reason a penalty on more complex solutions (that is, with more categories) just by virtue of the fact that they provide more ways of describing the same data. For details of the general point, see MacKay (2003). Beyond this interesting theoretical point, however, Bayesian modeling is highly practical, in that it allows researchers, in principle, to explore the consequences of learning extremely complex and nuanced models in a relatively "plug and play," non-ad hoc manner (see Griffiths et al. 2008 for a review). Given the large number of unresolved theoretical questions discussed, explicitly or implicitly, in this chapter, Bayesian models will surely find a prominent place in future inventory acquisition literature.
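The conservatism effect described in this section can be seen in a two-line calculation. Both hypotheses below spread uniform probability over a range of invented VOT values; the broader hypothesis wastes probability on values never observed, so its likelihood loses ground exponentially as tokens accumulate.

```python
def likelihood(data, width):
    """Uniform likelihood over a range of the given width:
    every in-range token receives probability 1/width."""
    return (1.0 / width) ** len(data)

data = [5, 12, 8, 17]          # all tokens fit inside both ranges (ms)
narrow, broad = 20, 100        # hypothesized category ranges: 0-20 ms vs 0-100 ms

ratio = likelihood(data, narrow) / likelihood(data, broad)
ratio_more = likelihood(data * 3, narrow) / likelihood(data * 3, broad)
# The narrow hypothesis is favored 5-to-1 per token (100/20), so four tokens
# give a 5**4 = 625-to-1 preference, and twelve tokens give 5**12-to-1.
```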
One potential application is in reconciling perception and production: systems in which uncertainty can be quantified seem particularly appropriate for exploring analysis-​by-​synthesis type models of speech perception, in which speech is perceived by determining the most likely gesture to have generated it. There are countless other informative projects waiting to be carried out. We cannot always reach a level of explicitness in our theories which makes them implementable by machine, although there will be a day when this is much easier than it is today. In the meantime, we hope only that acquisition researchers will keep in mind the inescapable facts—​that the children under discussion all have ears, mouths, and memories—​and will state their objects of study clearly.

Acknowledgments

This work was partially supported by NSF IGERT DGE-0801465 to the University of Maryland, NIH 7R01DC005660-07 to William Idsardi and David Poeppel, and SSHRC Doctoral Fellowship 752-2011-0293 to Ewan Dunbar.

Chapter 3

Phonotactics and Syllable Structure in Infant Speech Perception

Tania S. Zamuner and Viktor Kharlamov

3.1 Introduction

Phonotactics and syllable structure form an integral part of phonological competence and may be used to discover other aspects of language. Learners who are equipped with knowledge of which phonotactic structures are allowed in the native language may apply this knowledge during online language processing to locate potential word boundaries, which in turn helps them build the lexicon (Mattys and Jusczyk 2001b). Learners may also simultaneously use their knowledge about distributions of speech segments to classify lexical items into different word categories, such as nouns and verbs (Onnis and Christiansen 2008; Christiansen et al. 2009; Lany and Saffran 2010). Hence, given the importance of such knowledge to the process of language acquisition, numerous studies have investigated the development of phonotactic and syllabic knowledge in infancy and attempted to determine the point in development at which infants become sensitive to sound patterns or how knowledge of phonotactics and the syllable could be used as a strategy to solve the word segmentation problem (among others, Friederici and Wessels 1993; Jusczyk et al. 1993a, 1994; Chambers et al. 2003, 2011; Zamuner 2003, 2006, 2009a, 2009b; Seidl and Buckley 2005; White et al. 2009). The present chapter aims to introduce the reader to the relevant literature on the development of phonotactics and syllable structure. Considering that infants' first exposure to linguistic structures comes from speech perception, our main goal is to provide an overview of the perception-related issues that have been investigated experimentally and to point out those questions that have not yet been addressed in the literature. We begin with phonotactic development, examining a wide range of sound patterns, followed by a discussion of the acquisition of syllable structure and a brief summary of
various outstanding issues that may be of interest to the reader, including production-related investigations and phonological modeling studies.

3.2 Phonotactics

The term phonotactics refers to language-specific restrictions on sequencing of speech sounds (Haugen 1956a; Hill 1958). For example, while English words are allowed to end in ŋ (as in 'sing'), the ŋ is phonotactically illegal in word-initial position, as no English word may begin with this sound (Whorf 1940). The same constraint is not found in other languages where words are allowed to begin with ŋ. Hopi, for example, has the words ŋɨmni 'flour' and ŋɨhɨ 'medicine' (Jeanne 1992). Such phonotactic patterns are often described by referring to the notion of the syllable (e.g. prohibition against specific onsets; Haugen 1956b; Fudge 1969) and can be classified as "absolute" versus "probabilistic" or "first-order" versus "second-order" (Chambers et al. 2011). Absolute phonotactic patterns (also called "categorical phonotactic constraints") refer to those restrictions that are never violated in a given language. Absolute patterns may involve either individual segments (e.g. the constraint against the word-initial ŋ in English) or sequences of sounds (e.g. the constraint against the word-initial sequence bn in English). Cross-linguistic differences in the distribution of segments have long received attention in the literature (Haugen 1951; Saporta and Olson 1958), and distributional analyses of where phonemes can and cannot occur have been used as a tool and diagnostic for identifying phonemic inventories (Hill 1958). In contrast to the absolute phonotactic constraints, probabilistic phonotactic patterns refer to the statistical likelihood of a sound or a sequence of sounds occurring in a specific environment (e.g. in languages such as Hopi, the ŋ may have different likelihoods of occurring word-initially versus word-finally).
Such descriptions of the relative frequency of segments have a long tradition in linguistic literature, dating back to the Prague school of linguistics (Zipf 1935; Trubetzkoy 1939/​1969; Saporta 1955; Keller and Saporta 1957). With respect to the “order” factor, first-​order patterns typically involve restrictions on the position of a segment or a feature within a syllabic frame (e.g. a [+voiced] segment cannot be in a coda regardless of its preceding or following environment). Second-​order patterns refer to positional restrictions that are dependent on another property, such as the type of preceding or following segment (e.g. a fricative can only be in coda position if preceded by a high vowel). As first-​order patterns do not depend on the appearance of any other segment or feature, they can be viewed as less complex than second-​order patterns (Chambers et al. 2011). In the following sections, we review the relevant literature that has examined the acquisition of these types of phonotactic knowledge.
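Before turning to those studies, the first-order/second-order distinction can be made concrete with two toy well-formedness checks. The constraints are the hypothetical examples from the text (no voiced codas; coda fricatives only after high vowels), and the segment classes are small invented stand-ins, not a real language's inventory.

```python
VOICED = {"b", "d", "g", "z", "v"}
FRICATIVES = {"s", "z", "f", "v"}
HIGH_VOWELS = {"i", "u"}

def ok_first_order(word):
    """First-order constraint: no voiced segment in coda (word-final)
    position, regardless of the surrounding segments."""
    return word[-1] not in VOICED

def ok_second_order(word):
    """Second-order constraint: a coda fricative is legal only when the
    preceding segment is a high vowel, so legality depends on context."""
    if word[-1] in FRICATIVES:
        return len(word) > 1 and word[-2] in HIGH_VOWELS
    return True
```

Here `ok_first_order` rejects "pad" but accepts "pat", while `ok_second_order` accepts "pif" but rejects "paf": the second check cannot be stated without looking at a neighboring segment, which is what makes second-order patterns more complex.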

3.2.1 Absolute Phonotactics

As just described above, absolute phonotactic patterns are compulsory restrictions on individual segments or sequences of speech sounds. One of the first studies looking at
the acquisition of such patterns was Friederici and Wessels (1993), who tested Dutch infants' sensitivity to legal versus illegal onset and offset sequences in Dutch (e.g. sɡ, rt). The sound sequences were embedded in non-words, with phonotactically illegal stimuli created by reversing legal onsets and offsets. In a series of studies using the head-turn preference procedure (HPP), 9-month-old Dutch-learning infants preferred to listen to those lists that had legal onsets and offsets, whereas 4- and 6-month-old infants showed no significant listening preference to either the legal or the illegal list. Crucially, 9-month-olds also showed no preference for either list when the stimuli were low-pass filtered to remove the segmental content from the signal, demonstrating that the listening preference for the phonotactically legal stimuli was driven by their segmental properties. Friederici and Wessels offered two possible accounts of their results. One interpretation of the findings was that they reflected infants' knowledge about the frequency of segmental sequences in different prosodic positions. The other possibility was that the observed results reflected more abstract or rule-governed phonotactic knowledge. Although these alternative interpretations concern the issue of the level of abstraction in learner's phonotactic knowledge and Friederici and Wessels' study was not designed to address this question, the authors suggested that "frequency of occurrence is the first ground upon which to build up initial language-specific knowledge" (1993: 297) and that more abstract or rule-governed knowledge emerges from exposure to such patterns. Friederici and Wessels demonstrated that there were differences in sensitivity to phonotactic regularities at different ages (no effect at 4 and 6 months, significant preference for legal phonotactic patterns at 9 months).
This finding indicates that some type of learning of phonotactic patterns has taken place. However, when interpreting these results and considering how they might generalize to learners of other languages, it is important to keep in mind that phonotactic learning is traditionally thought to be dependent on language-​specific knowledge of the native phonemic inventory and how these phonemes are allowed to combine, yet learners may also display sensitivity to certain types of phonotactic patterns without necessarily experiencing them in the ambient language. In other words, infants may prefer one pattern over another based on more language-​general processing abilities (where sensitivity to different structures may emerge at different stages in development) or innate linguistic knowledge (which may reflect factors such as phonetic knowledge and/​or knowledge of how speech is perceived; Hayes and Steriade 2004). Crucially, an important limitation of Friederici and Wessels’ stimuli is that through the juxtaposing of legal onset and offset clusters they created syllables that were ill-​formed not only for Dutch but also cross-​linguistically. For example, while the onset rt is phonotactically illegal in Dutch, it also violates the Sonority Sequencing Principle (Steriade 1982; Selkirk 1984; Clements 1990) and it goes against the tendencies that are found cross-​linguistically (Kawasaki-​Fukumori 1992). Hence, the stimuli were not only phonotactically illegal for Dutch but also marked cross-​linguistically and the observed preference for the legal phonotactic patterns could have been driven by language-​general knowledge rather than learning of the patterns in the ambient language. As such, the case for language-​specific phonotactic learning would be much stronger if differences in learner’s performance were found depending on the learner’s input. For example, it would be revealing to examine the development
of phonotactic knowledge in a group of infants acquiring a language like Russian, which has syllables that violate the Sonority Sequencing Principle (e.g. rta 'mouth,' bobr 'beaver'). However, very few studies have looked at phonotactic acquisition and directly compared infants' performances across languages. These investigations, described in this chapter, have generally found that a wide range of factors comes into play in the acquisition of phonotactics. Almost ten years after the original study, Sebastian-Galles and Bosch (2002) extended Friederici and Wessels' (1993) results to a group of 10-month-old infants from monolingual Catalan families, monolingual Spanish families, and Catalan-Spanish bilingual households. Catalan and Spanish have different phonotactic restrictions on word-final consonants. Catalan permits word-final consonant clusters (e.g. ve[rt] 'green (masculine)'; Wheeler 2005), while Spanish is more restrictive and does not allow final clusters. Sebastian-Galles and Bosch tested both monolingual Catalan and monolingual Spanish infants on their preference for words that ended in consonant clusters that were either legal or illegal according to Catalan phonotactics (legal: birt, dort, gurt, nast; illegal: ketr, datl, bitl, bepf). If the two lists were discriminated on the basis of language-general knowledge or perceptual saliency rather than experience with the ambient language, both groups of infants may be expected to show a preference for the legal phonotactic lists. However, Sebastian-Galles and Bosch found that only the Catalan-learning infants showed a listening preference for the legal list over the illegal list and that Spanish monolingual infants did not prefer either list (note, however, that the exact interpretation of the Catalan results remains a matter of debate since the illegal clusters violated both Catalan phonotactics and the Sonority Sequencing Principle). Jusczyk et al.
(1993a) is another central study on phonotactic acquisition, which set out to establish when learners show evidence of language-​specific, absolute phonotactic patterns. English-​and Dutch-​learning infants were presented with lists of either English or Dutch words that contained phonemes found in both languages, but the legality of sequencing of these phonemes differed across English and Dutch. For example, the Dutch word zweten ‘to sweat’ begins with a zw cluster. Although both z and w are phonemes of English and Dutch alike, only Dutch allows zw word-​initially. Jusczyk and his colleagues found that infants at 9 months of age, but not at 6 months, listened longer to word lists from their ambient language. When the segmental information was removed by using low-​pass filtered stimuli, infants no longer showed any preference for the stimuli from their native language. The authors concluded that knowledge of language-​ specific phonotactics emerges around 9 months of age. Note, however, that although the lists were controlled to contain phonemes that are found in both languages, the frequency and acoustic patterns of the phonemes in question differ across English and Dutch. This opens the possibility that the lists may also have been distinguished on the basis of factors other than phonotactics (see Zamuner 2006: 81). Furthermore, one question the authors raise is how infants acquire this phonotactic knowledge. Although phonotactic knowledge is characterized as patterns that are abstracted across the learner’s lexicon, this does not seem plausible given that infants at 9 months have small lexicons. Instead, infants’ phonotactic knowledge likely reflects sublexical levels of representation
(Jusczyk et al. 1993a; Mattys et al. 1999), which is also supported by the findings of the studies on the acquisition of phonotactic probabilities that are described in the next section. A recent study by Archer and Curtin (2011) explored whether 6- and 9-month-old English-learning infants showed sensitivity to the type and token frequency of legal versus illegal onset clusters (e.g. kla, bla, tla). The study revealed that infants at 6 months of age were not sensitive to either the type or the token frequency of onset clusters. In contrast, 9-month-old infants showed sensitivity to type but not token frequencies of onset clusters. Archer and Curtin proposed that the lack of sensitivity to phonotactic patterns at 6 months is related to the lack of sensitivity to type frequencies and that the emergence of phonotactic knowledge at 9 months reflects a newly developed sensitivity to type frequencies, which underpins the learner's ability to determine which sound combinations are legal and illegal in the language the learner is acquiring. The importance of type but not token frequency is also consistent with usage-based theories of language (e.g. Bybee 2001) which use type frequency rather than token frequency to predict the learnability of a given pattern. Other factors, such as the acoustic saliency of a phonotactic pattern, may also interact with the acquisition of phonotactics. In a study by Narayan et al. (2010), English-learning and Filipino-learning infants were tested on two contrasts which differed in their acoustic saliency and their cross-linguistic phonotactic legality. The first contrast was ma ~ na, which is both acoustically salient and phonotactically legal in English and Filipino alike (Narayan 2008). The second contrast was na ~ ŋa, which is less salient and also involves an important cross-linguistic difference.
While English has both n and ŋ in its sound inventory, only n can occur word-​initially. In Filipino, both sounds are phonotactically legal in word-​initial position. Narayan and colleagues found that English-​learning infants showed discrimination of ma ~ na at 6–​8 and 10–​12 months of age (Filipino infants were not tested on this contrast). At the same time, both English and Filipino infants failed to discriminate the na ~ ŋa contrast at 6–​8 months, and only Filipino learning-​infants discriminated the na ~ ŋa contrast at 10–​12 months. Although one would expect the contrast to be discriminated by both groups of infants at 6–​8 months (based on studies showing that language specific phonemic contrasts are acquired later; e.g. Werker and Tees 1984), the less salient contrast appears to be discriminated only with linguistic experience. Thus, the learning of a phonotactic pattern may also vary depending on its acoustic saliency. Other potential limitations on the types of phonotactic patterns that learners can acquire are discussed in sections 3.2.2–​3.2.6.

3.2.2 Probabilistic Phonotactics

The notion of phonotactic probability refers to the statistical likelihood that a sound or a sequence of sounds will occur in a given environment (e.g. Vitevitch and Luce 2004). In a study analogous to the works looking at absolute phonotactics, Jusczyk et al. (1994) examined whether infants are sensitive to the statistical frequency of sound patterns
that are all legal in the ambient language. English-learning infants were tested on their listening preferences for non-words which contained either frequent or infrequent sound combinations. Jusczyk and colleagues found that infants at 9 months of age listened longer to a list of non-words of high phonotactic probability (fʌl, kit, mep) than to lists of non-words of low probability (ðʌʃ, zidʒ, θeθ). In contrast, 6-month-old infants displayed no preference for either list (also, see Zamuner 2003 who extended Jusczyk et al.'s findings to 7-month-old English-learning infants and found a similar preference for high phonotactic probabilities). These findings indicate that infants in the second half of their first year of life are sensitive to the distribution of sound patterns in the ambient language. This type of knowledge may then be used in word segmentation and the "development of word recognition abilities and the organization of the mental lexicon" (Jusczyk et al. 1994: 642). As mentioned earlier, studies on phonotactic acquisition have not necessarily addressed the question of whether phonotactic knowledge reflects specific knowledge about the frequency of phonotactic patterns in a language or, alternatively, more abstract or rule-governed phonotactic knowledge. Given that Jusczyk and colleagues demonstrated that infants are in fact sensitive to the statistical frequency of sound patterns, their results suggest that, at this stage in development, phonotactic knowledge is likely to be more specific than abstract (however, see Chambers et al. (2011) who showed that infants are able to generalize certain first-order patterns and may therefore have access to abstract knowledge). Another study by Mattys et al. (1999) found that infants are sensitive to the occurrence of consonant+consonant (CC) sequences within and across word boundaries.
The CC clusters in their stimuli all had roughly the same type and token frequency in English (e.g. ŋk, ft, ŋt, fh). However, the frequency of the consonantal sequences varied within and across word boundaries. Some clusters had a high probability of occurrence within word boundaries but a low probability of occurrence across word boundaries (e.g. nɔŋ.kʌθ, zuf.tʌdʒ). Other clusters had a low probability within word boundaries but a high probability across-​word boundaries (e.g. nɔŋ.tʌθ, zuf.hʌdʒ). Infants listened longer to non-​words containing high probability within-​word clusters. Thus, not only do infants demonstrate knowledge about the frequency of sound patterns, but they are also sensitive to how sound combinations are distributed both within and across word boundaries (see also Gonzalez-​Gomez and Nazzi, 2016 for evidence that infants are capable of tracking probabilistic patterns that involve non-​adjacent segments). To explain the results for between-​word patterns, Mattys et al. argued that learner’s phonotactic knowledge cannot be based exclusively on individual, stored lexical items, since infants also showed sensitivity to between-​word sequences.
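A frequency analysis in the style of Mattys et al. can be sketched by tallying consonant clusters inside words and at word junctures separately. The miniature "utterances" below and the treatment of everything outside a five-vowel set as a consonant are simplifications for illustration, not the stimuli or corpus of the study.

```python
from collections import Counter

VOWELS = set("aeiou")

def cluster_counts(utterances):
    """Tally CC sequences within words and across word boundaries."""
    within, across = Counter(), Counter()
    for utt in utterances:
        for word in utt:
            # Within-word clusters: adjacent consonants inside one word.
            for a, b in zip(word, word[1:]):
                if a not in VOWELS and b not in VOWELS:
                    within[a + b] += 1
        # Across-word clusters: a word-final consonant followed by the
        # next word's initial consonant.
        for w1, w2 in zip(utt, utt[1:]):
            if w1[-1] not in VOWELS and w2[0] not in VOWELS:
                across[w1[-1] + w2[0]] += 1
    return within, across

utterances = [["loft", "tide"], ["soft", "dog"], ["ring", "top"]]
within, across = cluster_counts(utterances)
```

On this toy input, "ft" occurs only inside words while "td" occurs only at a word boundary; a learner tracking the two tallies separately therefore has a cue to where word boundaries fall, which is the segmentation use discussed above.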

3.2.3 Acquisition of Novel Patterns

In more recent years, research on phonotactic acquisition has looked beyond infants' sensitivity to phonotactic patterns that are legal versus illegal or frequent versus infrequent in the ambient language and has started investigating the learning mechanisms
that are involved in the acquisition of phonotactics, the type of information that is relevant for learners, and the way learners might begin to analyze, represent, and generalize phonotactic knowledge. Such studies aim to determine what kinds of novel first- and second-order phonotactic patterns can be learned by infants during a brief familiarization stage and how the learnability of such patterns is affected by phonetic naturalness, the phonemic status of segments, and infants' age. Using the head-turn preference procedure (HPP), Chambers et al. (2003) explored infants' ability to track an arbitrary pattern involving positional restrictions on initial and final consonants. English-learning 16.5-month-olds were familiarized with C1VC2 non-word items in which initial and final consonants were drawn from two different segmental sets that could not be characterized using a single phonetic feature (e.g. b, k, m, t, f as C1; p, g, n, tʃ, s as C2). During testing, infants were presented with novel words that followed either the familiarized pattern (C1VC2) or a new pattern in which initial and final consonants were reversed (C2VC1). Infants showed a preference for the juxtaposed non-words, confirming that they were capable of tracking an arbitrary first-order phonotactic pattern that was not present in their ambient language (for comparable findings for 10.5- and 16.5-month-olds, see Chambers et al. 2011). However, it remained to be determined whether infants acquired the complex restrictions on both word-initial and word-final consonants or whether the observed effect was due primarily to their ability to detect word-initial or word-final patterns alone (see, among others, Saffran and Thiessen 2003; Weitzman 2007).
Further evidence for infants’ ability to learn first- and second-order patterns came from Seidl and Buckley (2005), who tested whether 8.5- to 9.5-month-old infants could learn phonetically natural and phonetically unnatural, arbitrary phonotactic patterns involving manner and place of articulation. Infants were exposed to sound patterns that were either common (unmarked) or rare (marked) in the world’s languages. For manner of articulation, the unmarked pattern involved words containing intervocalic fricatives and affricates but no intervocalic stops (e.g. pasat, mitʃa). The marked pattern involved items with word-initial fricatives and affricates and intervocalic stops (e.g. sapat). For place of articulation, the unmarked pattern was the co-occurrence of labial consonants with round vowels and coronal consonants with front vowels (e.g. vogo, sike). The marked pattern involved co-occurrence of labial consonants with high vowels and coronal consonants with mid vowels (e.g. vigo, soke). Infants did not show any significant preference for phonetically natural sequences for either the first-order pattern involving manner of articulation or the second-order pattern involving place of articulation. Seidl and Buckley concluded that infants were capable of acquiring both marked and unmarked patterns and that there was no specific preference for phonetic naturalness in the grammars of 9-month-old learners. However, this interpretation of the findings rests on the assumption that intervocalic affrication (i.e. stops becoming affricates in intervocalic position) is phonetically natural and that infants define both fricatives and affricates as [+continuant] and oral and nasal stops as [–continuant], which may or may not be the case. Seidl and Buckley also acknowledged that, for consonantal manner, infants could have learned that word-initial segments must be stops rather than learning that word-medial segments must be fricatives and affricates. For place features, potentially conflicting cues that could have affected infants’ ability to recognize the pattern were present in the data, as phonetic naturalness was only manipulated in the first syllable of bisyllabic items (e.g. infants were exposed to items such as sidu, in which coronal segments in the second syllable were not followed by front vowels).

34    Tania S. Zamuner and Viktor Kharlamov

In contrast to the studies discussed earlier in this section, some previous investigations showed that infants were not able to learn certain types of phonotactic patterns. Saffran and Thiessen (2003), for example, found that English-learning 9-month-old infants failed to acquire abstract phonotactic regularities involving phonetically unnatural classes of segments. Saffran and Thiessen examined whether infants could acquire first-order restrictions on the occurrence of voiced and voiceless consonants in disyllables (voiceless p, t, k in onsets, voiced b, d, ɡ in codas, and vice versa; e.g. todkad and dakdot) and whether infants could also learn a similar pattern that did not involve sets of sounds that could be defined as a natural class using a single phonological feature (p, d, k in onsets, b, t, k in codas, and vice versa; e.g. dotkat and taktod). Infants acquiring the pattern involving a single feature listened longer to the lists which did not follow the familiarized pattern, which is a novelty preference. Given the complementary distribution of voiced and voiceless consonants in the single-feature pattern, it remained unclear what kind of regularities the infants were actually tracking (e.g. they could have learned that word-initial segments had to be voiceless and could have disregarded the voicing of coda consonants and/or word-medial onsets).
However, those infants who were familiarized with arbitrary sets of segments that could not be characterized with a single feature failed to acquire the pattern. Saffran and Thiessen also noted the role of age, since their findings contrasted with the results obtained by Chambers et al. (2003) for 16.5-month-olds, who showed successful acquisition of an arbitrary pattern (note, however, that recent work by Chambers et al. 2011 extended their 2003 findings to 10.5-month-old infants). The suggestion that age may have an important effect on infants’ ability to acquire phonotactic patterns has also been supported by White et al. (2009). White and colleagues explored the ability of English-learning 8.5- and 12-month-old infants to acquire phonotactic patterns involving obstruent voicing. Infants were familiarized with datasets involving stop or fricative voicing alternations at word boundaries (e.g. initial consonants were voiceless after voiceless segments, and voiced elsewhere, as in rot#pevi ~ na#bevi). While both 8.5- and 12-month-olds could learn the voicing pattern, younger infants appeared to rely on transitional probabilities of segments, whereas older infants seemed to group segments into functional categories (although the relevant interaction was not statistically significant). Similar developmental changes in the acquisition of phonotactic patterns have also been reported in the work by Cristià and colleagues (Cristià and Seidl 2008; Cristià et al. 2011). Cristià and Seidl (2008) trained 7-month-old infants on a phonotactic pattern involving nasals and oral stops or a pattern involving nasals and fricatives. Only the infants trained on the pattern involving nasals and stops were able to generalize the phonotactic regularity to novel test items. At the same time, Cristià et al. (2011) showed that 4-month-old infants were able to generalize both patterns when tested on the same stimuli. This suggests that infants’ sensitivity to phonotactic regularities becomes more focused as their exposure to the ambient language continues to increase.

Finally, Seidl et al. (2009) investigated whether the phonemic status of segments could also affect the acquisition of phonotactic patterns. Seidl and colleagues familiarized 11-month-old infants from French Canadian families and 4- and 11-month-old infants from English-speaking households with a second-order phonotactic pattern in which the selection of C2 in C1VC2 items was dependent on the type of the preceding vowel (e.g. oral vowels were followed by stops, nasal vowels were followed by fricatives). During testing, French Canadian infants (for whom vowel nasality was phonemically salient) showed a head-turn preference for the items that did not follow the familiarized pattern and were also capable of transferring the familiarized pattern to a novel set of vowels. In contrast, English-learning 11-month-olds (for whom vowel nasality was not phonemically relevant) did not acquire the phonotactic regularity. English-learning 4-month-olds (who have had limited experience with the phonological system of English) patterned like the French-learning 11-month-old infants and learned the phonotactic pattern, displaying a familiarity preference for items that followed the original pattern. Thus, by 11 months, phonotactic learning appears also to be constrained by the phonemic status of segments, with less attention given to contrasts that are allophonic in the ambient language.

3.2.4 Phonotactics and Word Segmentation

In the past few years, several works have looked at learners’ ability to use phonotactic knowledge in word segmentation, in order to determine whether infants are able to apply statistical knowledge of a language’s phoneme distributions and phonemic transitions to discover word boundaries. Mattys and Jusczyk (2001b), for example, tested 9-month-old infants on their ability to segment a CVC word from fluent speech. The target words varied in whether they were preceded and/or followed by good versus poor phonotactic cues to a word boundary. The word ‘gaffe,’ for example, creates the clusters nɡ and fh when embedded in the sequence ‘The old pine gaffe house tends to break too often.’ When ‘gaffe’ is embedded in ‘The old tong gaffe tends to break too often,’ this creates the clusters ŋɡ and ft. While the frequency of these four clusters is approximately the same in English, they vary in how often they occur across versus within word boundaries. The clusters nɡ and fh are considered good phonotactic cues to a word boundary because they do not tend to occur within the same word in English. In contrast, ŋɡ and ft are poor phonotactic cues to a word boundary because in English these clusters tend to be found within a word. Mattys and Jusczyk found that infants segmented the words when they were preceded by good phonotactic cues that aligned with across-word boundaries. Similarly, at the level of syllables, work using artificial languages (Saffran et al. 1996a, 1996b) found that infants are more likely to posit a word boundary at the location of a low-probability transition (indicative of a word boundary) than a high-probability transition (indicative of within-word sequences).
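The statistical mechanism attributed to infants in this line of work — positing a word boundary where the forward transitional probability between adjacent syllables is low — can be sketched in a few lines. The toy two-word stream and the boundary threshold below are illustrative assumptions of ours, not stimuli or parameters from the original studies:

```python
from collections import Counter

def transitional_probs(syllables):
    """Forward transitional probability P(next | current) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(syllables, threshold=0.8):
    """Insert a word boundary wherever the transitional probability dips below threshold."""
    tps = transitional_probs(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:          # low-probability transition: boundary
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# A toy 'language' with two words, tupiro and golabu, concatenated in varied order
# so that within-word transitions are deterministic and between-word ones are not.
stream = ["tu", "pi", "ro", "go", "la", "bu", "tu", "pi", "ro", "tu", "pi", "ro",
          "go", "la", "bu", "go", "la", "bu", "tu", "pi", "ro"]
print(segment(stream))
# → ['tupiro', 'golabu', 'tupiro', 'tupiro', 'golabu', 'golabu', 'tupiro']
```

In this stream every within-word transition has probability 1.0 while between-word transitions fall to 2/3 or below, so any threshold between those values recovers the words; real infant stimuli make the contrast less clean, which is the empirical point of the studies cited above.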

While a full review of the literature on word segmentation is beyond the aims of this chapter, it is worth noting that a large amount of research has been dedicated to understanding how distributional information may be used in the word segmentation task (Church 1987; Brent 1999b) and how much can be gained on the basis of phonotactic cues alone versus when phonotactic information is combined with other cues, including knowledge of lexical stress (Christiansen et al. 1998), phonological cues (Onnis et al. 2005), or universal phonotactic constraints, such as the prior knowledge that well-formed words consist of a syllabic sound or a nucleus (Blanchard et al. 2010). For an in-depth discussion of word segmentation, the reader is referred to Chapter 8 by Louise Goyet, Séverine Millotte, Anne Christophe, and Thierry Nazzi in this volume.

3.2.5 Phonotactics and Lexical Acquisition

Beyond word segmentation, phonotactic knowledge is also known to be applied to the learning of lexical items (Saffran and Graf Estes 2006; Onnis and Christiansen 2008; Christiansen et al. 2009; Graf Estes 2009; Lany and Saffran 2010). Specifically, researchers have asked whether the phonological shape of a word has an impact on whether or not that word is acquired, that is, whether already established phonological knowledge can facilitate the acquisition of new lexical items. To address this question, studies have manipulated the phonotactic patterns of novel words and tested whether learners showed an advantage in acquisition depending on the type of phonotactic patterns involved. While most of this research has looked at older children (e.g. Messer 1967), some studies have examined the potential role of phonotactic patterns for lexical acquisition in infancy. Graf Estes et al. (2011), for example, showed that word learning is affected by the phonological patterning of novel words for infants as young as 19 months of age. In their study, infants learned non-words that conformed to English phonotactics (dref, sloob) but did not learn non-words with illegal phonotactic patterns (dlef, sroob). Friedrich and Friederici (2005) reported neurophysiological data which largely parallel the behavioral findings of Graf Estes and colleagues. Friedrich and Friederici compared the processing of phonotactically legal versus illegal non-words by 19-month-old German-learning infants. They found that infants show an ERP component (N400) for phonotactically legal non-words but not for illegal tokens. The N400 is known to be an indicator of semantic processing or semantic integration (Kutas and Hillyard 1980). Because an N400 was only found in the processing of non-words containing legal phonotactic patterns, this suggests that the degree of semantic integration of a word is influenced by the word’s phonological shape (Friederici 2005). Furthermore, other work that has manipulated the phonotactic probabilities of non-word stimuli has found that young children are also better at learning non-words with frequent versus infrequent phonological patterns (e.g. Storkel 2001). Together, these studies demonstrate that the phonological properties of words impact lexical acquisition and provide a connection between phonotactics and lexical acquisition.

Experimental findings can vary depending on the methodology adopted by the researchers. For example, the role of phonotactic probabilities and neighborhood densities in the acquisition of real words and non-word stimuli has been investigated either by looking at corpora of child-directed speech or children’s first word productions (Coady and Aslin 2003; Storkel 2009; Zamuner 2009b) or by subjecting learners to various experimental procedures (Hollich et al. 2002; Swingley and Aslin 2007). The corpus studies find a benefit for new words that overlap with already acquired words. In contrast, experimental studies find that the same words are at a disadvantage (see discussion of these issues by Saffran and Graf Estes 2006; Graf Estes 2009). As pointed out by Saffran and Graf Estes (2006), examinations of the relationship between phonological knowledge and lexical acquisition are still relatively recent, and many research questions are yet to be addressed in the literature (see also Stoel-Gammon 2011). For example, although work on infant speech perception has already examined learners’ sensitivity to phonotactically legal versus illegal patterns, experimental work on the production of phonotactically illegal structures is so far limited to a few older studies (e.g. Messer 1967).
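The two constructs named in this paragraph — phonotactic probability and neighborhood density — can be made concrete with a short sketch. The mini-lexicon of orthographic strings below is an invented assumption purely for illustration; actual studies compute these measures over phonemic transcriptions of large corpora, and other operationalizations (positional probabilities, addition/deletion neighbors) are common:

```python
from collections import Counter

# Hypothetical toy lexicon; real work would use a corpus of child-directed speech.
LEXICON = ["cat", "bat", "mat", "can", "cap", "dog", "dig", "bag"]

def biphone_probability(word, lexicon=LEXICON):
    """Mean relative frequency of the word's adjacent segment pairs (biphones),
    one simple estimate of phonotactic probability."""
    pair_counts = Counter(p for w in lexicon for p in zip(w, w[1:]))
    total = sum(pair_counts.values())
    pairs = list(zip(word, word[1:]))
    return sum(pair_counts[p] / total for p in pairs) / len(pairs)

def neighbors(word, lexicon=LEXICON):
    """Lexicon words differing from `word` by exactly one substituted segment,
    a minimal notion of a phonological neighbor."""
    return [w for w in lexicon
            if len(w) == len(word)
            and sum(a != b for a, b in zip(w, word)) == 1]

print(biphone_probability("cat"))   # high: 'ca' and 'at' recur in this lexicon
print(neighbors("cat"))             # → ['bat', 'mat', 'can', 'cap']
```

On this toy lexicon, a dense-neighborhood, high-probability item like "cat" contrasts with a sparse item like "dog"; the corpus versus experimental discrepancy discussed above concerns whether such overlap helps or hinders the learning of a new word.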

3.2.6 Phonotactics and Prosodic/Word Domains

What many of the different types of phonotactic patterns discussed in the previous section have in common is that they refer to specific prosodic or word domains. For example, the restriction in English against ŋ in word-initial position can only be learned if ŋ is perceived correctly and its distribution is tracked across different environments. Therefore, any study investigating the acquisition of phonotactics must consider potential differences in infants’ discrimination abilities as well as their ability to learn patterns that occur in different prosodic positions. Most studies have focused on the discrimination of contrasts in word-initial position, starting with some of the earliest work on infant speech perception (Eimas et al. 1971). While our knowledge of the types of contrasts that infants can discriminate in word-initial position is vast, there are only a handful of studies that have examined infants’ discrimination abilities in positions beyond the word-initial environment, such as word-finally (e.g. Jusczyk 1977; Zamuner 2006; Fais et al. 2009). These latter studies have varied in their experimental research aims, the types of contrasts that they tested, and the methodologies they used (for a summary of the relevant literature, see table 1 in Fais et al. 2009: 290). Generally, the discrimination of contrasts in final position varies depending on the nature of the stimuli. Jusczyk (1977b) found that 2-month-old infants are able to discriminate a word-final d~ɡ contrast (bad ~ baɡ) and an m~ɡ contrast (bam ~ baɡ). Fais et al. (2009) reported that English-learning infants at 6, 12, and 18 months of age are able to discriminate between word-final singleton consonants and word-final consonant clusters (as in neek ~ neeks). Similar effects have also been found at the cross-linguistic level, with infants’ reactions to phonotactic patterns being affected by the phonotactic legality of the sequences in question in the ambient language (e.g. English versus

Japanese; Kajikawa et al. 2006; Mugitani et al. 2007). Not all contrasts, however, appear to be equally salient. Zamuner (2006) found that 10- and 16-month-old Dutch-learning infants can discriminate place of articulation-based contrasts in word-final position (kep ~ ket) but do not discriminate between legal and illegal voicing phonotactics in the same word-final environment (ked ~ ket). Another general finding is that infants’ performance depends on the nature of the experimental task. Discrimination studies tend to find that infants are able to perceive final contrasts (e.g. Eilers et al. 1977; Jusczyk 1977b; Hayes et al. 2000, 2009; Fais et al. 2009). Similarly, sensitivity to phonotactic patterns in final position has also been found in preference studies (Friederici and Wessels 1993; Mattys and Jusczyk 2001b; Sebastián Gallés and Bosch 2002), word-segmentation tasks (Mattys and Jusczyk 2001b; Tincoff and Jusczyk 2003), and studies looking at word recognition and word learning (Nazzi and Bertoncini 2009; Swingley 2009b). Swingley (2009b), for example, found that 14- to 22-month-olds are equally sensitive to mispronunciations in word-initial and word-final positions (e.g. “book” pronounced as “[d]ook” or “boo[p]”). Nazzi and Bertoncini (2009) showed that French-learning infants at 20 months of age are able to learn minimal-pair non-words that differ in a single consonant in either initial position (but ~ put) or final position (pid ~ pit). In contrast, studies that have required infants to categorize stimulus items, such as finding commonalities across experimental tokens, have shown that infants are less sensitive to contrasts that occur word-finally (Jusczyk et al. 1999c; Zamuner 2006; Fais et al. 2009). Jusczyk et al. (1999c), for example, compared infants’ abilities to detect similarities in different word positions. At 9 months of age, infants preferred to listen to lists of words that shared the initial segment (fɛt, fɛm, fɛt, fɛɡ) but not to lists of words that had the same sound in word-final position (bad, pad, mad, tad). Zamuner (2006) found that Dutch-learning infants at 9 and 11 months did not prefer to listen to lists of non-words ending in legal voicing phonotactics over non-words ending in illegal voicing phonotactics. In sum, there is a great deal of variability across the different studies, which differ in the ages of the infants tested and/or the methodologies used. Some evidence suggests that contrasts in different prosodic positions are equally learnable. Other studies find that learners are more sensitive to initial position than to medial or final position (Karzon 1985; Walley et al. 1986; Jusczyk et al. 1999c; Swingley 2005a; Zamuner 2006; Levelt 2012) and, as in the case of the discrimination studies described at the start of this section, it remains to be explored whether the observed effects stem primarily from differences in the acoustic salience of the contrasts under investigation.

3.3 Syllable Structure

The notion of the syllable has a long tradition in the linguistic literature. Syllable structure, for example, is often used to describe phonotactic patterns that are found in the world’s languages (see, among others, Fudge 1969). The syllable has also been argued

to function as a unit of processing that plays a role in both the production and perception of speech (Spoehr and Smith 1973; Mehler et al. 1981; Levelt and Wheeldon 1994; Ferrand et al. 1996; Cholin et al. 2006). Not surprisingly, the notion of the syllable is also frequently encountered in infant studies. However, very few works have attempted to directly address the question of what role (if any) the syllable plays in infants’ ability to perceive speech inputs and, so far, no work from infant studies has been specifically dedicated to the question of how syllable structure is acquired.

Most research on the role of the syllable has concentrated on the issue of early representations. Namely, such studies have attempted to address the question of whether infants perceive speech inputs as decomposable sequences (strings of phonemes, bundles of phonetic features, etc.) or, alternatively, as non-decomposable syllable-sized chunks. Jusczyk and Derrah (1987), for example, used a modified high-amplitude sucking (HAS) procedure to habituate 2-month-old infants to a set of monosyllables bi, bo, ba, and bər. During testing, infants were exposed to du and bu. Jusczyk and Derrah assumed that a stronger reaction to du (which involves a new consonant and a new vowel) than to bu (which involves a new vowel but has the same consonant as the familiarization set) would indicate that infants’ early representations are decomposable into individual segments. Infants reacted to both changes in the same way. In the absence of any positive evidence for segmental representations, Jusczyk and Derrah concluded that perceptual inputs are not represented as phonemic sequences but rather as non-decomposable ‘global’ units reminiscent of the syllable. This finding was in contrast to some earlier claims of infants’ sensitivity to individual phones (e.g. Miller and Eimas 1979; Eimas and Miller 1981; Hillenbrand 1983, 1985), but was in line with a number of studies that advocated for syllable-sized units in infant speech perception (Bertoncini et al. 1988; Jusczyk et al. 1995a, 1995b; Houston et al. 2003).

Another question seen in the literature on infant speech perception is whether the presence of syllable structure in the inputs facilitates perceptual processing. Using the HAS and a habituation–dishabituation paradigm, Bertoncini and Mehler (1981) tested 2-month-old infants’ ability to detect consonantal metathesis in CVC, CCC, and VCCCV sequences (e.g. tap ~ pat, tʃp ~ pʃt, or utʃpu ~ upʃtu). Bertoncini and Mehler found a significant discrimination rate for syllable-like CVC stimuli, a weaker but still significant rate for seemingly bisyllabic VCCCV items, and no significant differences for the non-syllabic CCC control items. On the basis of these results, Bertoncini and Mehler argued that the syllable is a unit of speech processing in infants. However, as pointed out in Bijeljac-Babic et al. (1993), infants’ ability to better discriminate a given contrast in a syllable-like environment does not necessarily entail that the syllable is a unit of speech perception. Furthermore, consonantal place differences are known to be most perceptible in pre-vocalic environments, less perceptible in post-vocalic environments, and least perceptible when adjacent to other consonantal segments (see, among others, Steriade 1999; Côté 2000). Hence, no explicit reference to syllable structure is needed to account for the fact that infants are quite good at perceiving the change from ta- to pa- (which involves a pre-vocalic position), slightly worse at discriminating the ut- versus up- contrast (which involves a post-vocalic contrast), and have the most difficulty

discriminating tʃ- from pʃ- (when t, p are pre-consonantal). It is also not known whether infants tracked both consonants simultaneously or whether they paid attention to the initial or the final consonantal segment only, which would implicate a unit smaller than the syllable (e.g. tracking a rhyme or a coda but disregarding the onset).

Lastly, a few studies have aimed to determine whether infants are equally capable of noticing phonotactic patterns involving differences in abstract syllable structures and those involving differences in segmental composition or moraicity. Bijeljac-Babic et al. (1993), for example, used the HAS to test whether 4-day-old newborns could discriminate between two stimulus lists on the basis of (i) syllabicity (bisyllabic versus trisyllabic; e.g. rifo, zuti ~ mazopu, kesopa) and (ii) the number of segments (4 versus 6; e.g. rifo, iblo, gria ~ suldri, treklu, alprim). Infants noticed the difference between bisyllabic and trisyllabic sets even when the duration of stimulus items was modified to create a substantial overlap in the two distributions. At the same time, infants did not show any evidence of sensitivity to differences in the number of segments. Similarly, Bertoncini et al. (1995) tested whether 3-day-old newborns were capable of discriminating speech inputs on the basis of syllable structure and moraicity. Results revealed that while newborns register the difference between bisyllabic and trisyllabic sets (e.g. iga, tema ~ hekiga, temari), they fail to discriminate between bimoraic and trimoraic lists (e.g. kago, tomi, seki, buke ~ kango, tomin, sekki, buuke). Bertoncini and colleagues concluded that syllables are salient units even for newborns and that neonates use global representations and do not perceive syllable-internal complexity. Saffran and Thiessen (2003, Experiment 1) also showed that 9-month-olds can be sensitive to syllabic differences. Using the HPP, Saffran and Thiessen familiarized infants with either CVCV or CVCCVC sequences (e.g. boga ~ bikrub). During testing, infants listened longer to those items that conformed to the familiarized pattern, which led Saffran and Thiessen to conclude that infants are capable of acquiring knowledge about syllabic structure. However, these conclusions were based on stimulus items that were not necessarily controlled for exogenous variability, such as the presence of durational differences between preshift and postshift sets. In addition, instead of noticing differences in syllable structure, infants could have perceived differences in the number of vowel peaks or the presence versus absence of consonant clusters within stimulus items. In the case of Saffran and Thiessen’s study, the observed preference for a familiar syllabic frame has also been explained in terms of priming rather than learning (Seidl and Buckley 2005).

Thus, the investigations described in this section provide at best limited evidence for the syllable playing a decisive role in perceptual processing. Furthermore, these studies often used monosyllabic items and did not attempt to de-correlate different levels of representation, which leaves open the possibility of another unit of processing (e.g. a word, a rhyme, a nucleus) being responsible for the observed effects. As such, the role of the syllable in infant speech processing and the question of how syllable structure is acquired (or whether it is acquired at all) remain to be explored, especially in light of a growing body of psycholinguistic literature challenging the status of the syllable as a unit of processing in adult speakers (among others, Jared and Seidenberg 1990; Schiller 1998, 1999, 2000; Perret et al. 2006).


3.4 Outstanding Issues

While the focus of the present chapter is on speech perception, it is important to point out that there exists a body of literature on other aspects of the acquisition of phonotactics and syllable structure, such as production-related issues and phonological modeling of the learning process. The literature on the production of phonotactic patterns and the acquisition of syllable structure in children’s early speech outputs is very limited, especially for children under the age of three (for general reviews, see Bernhardt and Stemberger 1998; Fikkert 2007; Demuth 2011; Stoel-Gammon 2011). The central themes of this research have been the nature of children’s phonological and lexical representations and the role of frequency in phonological acquisition (e.g. Beckman and Edwards 2000; Munson 2001; Coady and Aslin 2004; Edwards et al. 2004; Zamuner et al. 2004; Munson et al. 2005a; Munson et al. 2005b; Stokes et al. 2006; Coady and Evans 2008; Zamuner 2009b; Munson et al. 2012), the influence of knowledge about the syllable and its organization (Jakobson 1941/1968; Demuth 1995b; Ohala 1999), and the emergence of different syllable structures and the developmental paths followed by speakers (Moskowitz 1973; Fikkert 1994; Levelt et al. 1999). Several studies have also used production data to examine how different types of segmental patterns interact with prosodic knowledge (Fudge 1969; Fikkert and Freitas 2004) or to investigate the structure of the learner’s syllables, arguing for specific linguistic representations (Fikkert 1994; Goad 2002; Ning 2005). Issues of phonological processing and representations are also discussed in this volume, in Chapter 4 by Heather Goad and Chapter 33 by Daniel Dinnsen, Jessica Barlow, and Judith Gierut.

In addition to production-related issues, a growing number of works have looked into the question of modeling the acquisition process. Many earlier studies relied on Optimality Theory (Prince and Smolensky 2004) to show how phonotactics, syllable structure, and other similar phonological regularities may be acquired in the process of constraint (re)ranking (among others, Jusczyk et al. 2002, 2003; Hayes 2004). More recently, various computational models have also been proposed, such as the learning algorithm for acquiring full phonotactic systems in Hayes and Wilson (2008) and the model of learning sonority-based regularities in Daland et al. (2011). For further discussion of the phonological and computational modeling of language acquisition, we refer the reader to Chapter 28 on word segmentation by Pearl and Goldwater in this volume and to the general reviews of the modeling literature in Albright and Hayes (2011) and Moreton and Pater (2011).

3.5 Conclusion

The review in this chapter focused on the development of phonotactics and syllable structure in infancy. In sum, at around 9 months of age infants begin to demonstrate sensitivity to the legality of the distribution of phonemes and the usage frequency of individual sounds and sound combinations in the ambient language. While infant research has not centered on the question of how this learning takes place, it is generally accepted that phonotactic awareness starts with non-abstract, language-specific knowledge (knowledge of sound inventories, frequency counts, etc.) and that more abstract or rule-governed knowledge emerges at later stages in development. This phonotactic knowledge is also thought to reflect sublexical levels of representation. Beyond that, a number of more recent studies have begun to explore a range of factors that may come into play in the acquisition of phonotactic knowledge. These factors include the acoustic salience and legality of phonotactic patterns, the phonemic status of segments in infants’ ambient language, and the role of prosodic/word domains. Other studies have started investigating how learners analyze and represent phonotactic knowledge by exploring the types of novel first- and second-order phonotactic patterns that can be learned by infants. These studies have found that infants are capable of learning arbitrary first-order phonotactic patterns and that learners appear to have no specific preference for phonetic naturalness. This ability might be limited to older infants, as younger learners appear to rely more on statistical patterns in the language, whereas older infants seem to have access to more abstract phonotactic knowledge. Studies have also begun to examine the relationship between phonotactics and lexical acquisition, exploring the potential connection between already established phonological knowledge and the acquisition of new lexical items.
Lastly, work on the syllable in infant speech perception has focused on issues of early representations of speech, arguing that syllable-sized units play a role already at the early stages of infant speech perception. However, this type of work is relatively sparse, and the role of the syllable in infant speech processing and the development of syllable structure largely remain to be explored in more detail. Notably, the many findings from developmental speech production studies looking at the acquisition of phonotactics and syllable structure are yet to be integrated with those coming from the domain of speech perception. More detailed investigation is also needed into how phonotactic and syllable structure knowledge develop in learners acquiring more than one language and what kinds of differences are found cross-linguistically. It is also important to understand what it means to acquire a phonological inventory in the first place, a question that is discussed in the current volume in Chapter 2 by Ewan Dunbar and William Idsardi. Future research into these and other areas pointed out in this chapter will undoubtedly improve our understanding of how phonotactics, syllable structure, and linguistic knowledge in general are represented and processed in the human brain.

Acknowledgments

Thank you to Kyle Chambers, Alejandrina Cristià, Amanda Seidl, and an anonymous reviewer for providing helpful feedback on earlier drafts of this chapter.

Chapter 4

Phonological Processes in Children’s Productions
Convergence with and Divergence from Adult Grammars
Heather Goad

4.1 Introduction

A body of evidence shows that children’s early phonological grammars are unmarked vis-à-vis adult grammars (Jakobson 1941/1968; Stampe 1969); and although learners may take different paths toward the target grammar, the patterns they show largely reflect the typological options displayed in end-state grammars (Gnanadesikan 1995/2004). Accordingly, most research in the generative tradition has examined children’s behavior in relation to some adult grammar, typically the target grammar. This perspective is particularly evident in Optimality Theory (OT) (Prince and Smolensky 2004), where children’s initial grammars, stages in development, and variation observed within and across learners are formally expressed in the same manner as cross-linguistic variation in adult grammars: through differences in constraint ranking. Children’s grammars are thus typically viewed as “possible grammars” (White 1982; Pinker 1984), as systems that respect the same principles and constraints as do adult grammars, even if they bear only some resemblance to the target grammar. Indeed, coupled with the idea that development involves minimal constraint reranking over time, this approach predicts the existence of intermediate grammars that are neither entirely unmarked nor completely target-like (e.g. Levelt et al. 2000; Rose 2000).

The OT approach has been fruitful in accounting for certain types of patterns that motivated some researchers to abandon the view that developing grammars are on a trajectory toward the adult grammar in favour of the alternative that children’s grammars are self-contained systems subject to their own constraints (Stoel-Gammon and Cooper 1984; Vihman 1996). At the same time, however, this approach has generally placed the burden of explanation for productions that deviate from adult forms on a non-target-like phonological system. A central problem with this emerges in situations where children display patterns in development which, when viewed from the perspective of adult grammars, are unexpected and possibly universally illicit (Drachman 1978; Hale and Reiss 1998; Buckley 2003). The question that arises, thus, is whether children possess “rogue grammars” in the phonological domain (Goad 2006), systems that differ in fundamental ways from adult grammars, or whether rogue behavior can be explained through examining how perceptual and motor development interface with the acquisition of an adult-like grammar.

This chapter overviews some of the ways that children’s phonological processes converge with and diverge from those attested in adult grammars. Although we will consider children’s productions in relation to some adult target, sources of explanation outside the grammar proper for mismatches between child and adult outputs take center stage, specifically, the developing perceptual system and immature vocal tract (Locke 1993). Development in perception is not complete at the onset of production (Shvachkin 1948/1973; Garnica 1973; Edwards 1974; Brown and Matthews 1993; Fikkert 2010). Nonetheless, in production studies, it is typically assumed (since Smith 1973) that, in the absence of evidence to the contrary, children accurately perceive the ambient input. Accordingly, the stored forms linguists propose for children correspond to adult outputs.
In cases where development in production mirrors development in perception (Pater 2004), this supposition is not fatal: mispronunciations may at early stages have as their source misperception, but the parallels observed between the perception and production components of the grammar mean that when the perceptual challenges are overcome at a later stage, mispronunciations will be due to the same constraints operating in production. However, there are cases that do not fit this parallel perception-production view of the grammar. We consider, in this context, velar substitution: Amahl’s production of ‘puddle’-type words as [pᴧɡǝl] (Smith 1973). In this case, misproduction, if stemming from target-like perception, will lead to characterization of the child’s system as a rogue grammar. Instead, following Macken (1980), it will be argued that some rogue behavior like this has as its source misperception alone; once the perception problem is overcome, target-like production emerges.

Poor motor control is considered as another source of explanation for rogue behavior. On one hand, vocal tract immaturity may impact the shape of an otherwise adult-like phonological system, yielding non-adult-like behavior. We will examine consonant harmony and velar fronting in this context. We will also consider the possibility that some unexpected substitutions or deletions in children’s productions do not involve substitution or deletion at all but, instead, reflect “covert contrasts” (Kornfeld 1976; Macken and Barton 1980a; Scobbie 1998; Munson et al. 2010). Covert contrasts are genuine contrasts produced by a non-adult-like vocal tract that are detectable through instrumental analysis but are misanalyzed by transcribers because they are filtered through the mature perceptual system of the adult. Here, we will focus on children who produce putative glides in place of target liquids; as will be discussed, if these substitutions are authentic and are attested in branching onsets (e.g. ‘try’ → [twaɪ]), the result will be a rogue grammar. If, however, instrumental analysis were to reveal that the putative glides have liquid properties, this unwanted conclusion would be avoided.

Since this chapter views children’s phonological processes from the perspective of whether they parallel or depart from those attested in adult grammars, it sees linguistic theory as essential both to interpret data from developing grammars and to predict which patterns are and are not analyzable as grammar-driven. I will adopt OT, as core assumptions from this theory, combined with assumptions about acquisition, lead to predictions about the types of behavior that could—and should not—be observed in developing grammars. We turn to these assumptions next.

4.2 Children’s Grammars as Possible Grammars

The following OT premises are of interest. (i) All constraints are universal in the sense that they are universally present in the grammars of all languages.1 Since constraints refer to structures (e.g. hierarchical relationships holding in prosodic constituency) and primitives (features, prosodic constituents) and express restrictions on representations (e.g. deletion, feature agreement), these must be universal as well. (ii) Constraints are rankable and therefore violable. Following from this, cross-linguistic variation is primarily captured through differences in constraint ranking. (iii) Constraints are principally of two types: markedness constraints, which strive for structurally- and/or phonetically-defined well-formedness, and faithfulness constraints, which strive to maintain identity between input and output.

Some core assumptions about acquisition are as follows. (iv) As mentioned, children’s grammars are possible grammars in the sense that at every stage in development, they abide by the same constraints as do adult grammars; while children’s grammars may contain processes not present in the target language, these processes must have direct correlates in other adult languages. (v) Children learn through exposure to positive evidence only (Wexler and Culicover 1980; Pinker 1984). (vi) Early grammars are unmarked (Jakobson 1941/1968; Stampe 1969).

1  Universal is typically interpreted as innate (e.g. Tesar and Smolensky 1993; Gnanadesikan 1995/2004). However, the position that (all) constraints are innate has been challenged. Hayes (1999), for example, proposes that phonetically-grounded markedness constraints are induced by language learners as they process the ambient data and thereby emerge from the learner’s articulatory and perceptual limitations. Given the position that children’s grammars are possible grammars, I assume at the outset that all constraints are innate. We return to this issue in section 4.3.2.
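The evaluation logic behind premises (i)–(iii) can be made concrete in a short script. The sketch below is purely illustrative and not part of the chapter’s analysis: `no_coda` stands in for a markedness constraint, `max_io` for a faithfulness constraint, and `eval_ot` picks the candidate whose ranked violation profile is lexicographically smallest. Reranking the same two constraints switches the winner, which is how differences in constraint ranking capture variation.

```python
# Illustrative sketch of OT evaluation: ranked, violable constraints
# compared lexicographically. The constraint definitions and candidate
# sets are toy assumptions, not the chapter's own formalization.

def no_coda(inp, out):
    """Markedness: one violation if the output ends in a consonant."""
    return 1 if out and out[-1] not in "aeiou" else 0

def max_io(inp, out):
    """Faithfulness: one violation per input segment type deleted."""
    return sum(1 for seg in set(inp) if seg not in out)

def eval_ot(inp, candidates, ranking):
    """Return the candidate with the smallest ranked violation profile."""
    return min(candidates, key=lambda c: [con(inp, c) for con in ranking])

candidates = ["bed", "be"]
# Markedness >> faithfulness (the assumed initial state): codas deleted.
print(eval_ot("bed", candidates, [no_coda, max_io]))   # be
# Faithfulness >> markedness (an English-like end state): codas survive.
print(eval_ot("bed", candidates, [max_io, no_coda]))   # bed
```

The lexicographic comparison of violation vectors mirrors strict domination: a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones.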

If we pair the OT premises in (i)–(iii) with the acquisition assumptions in (iv)–(vi), testable predictions about the shapes of early grammars follow. If we couple (i) with (iv), we predict that all linguistic systems, both developing and end-state, have access to the same toolkit. Children’s grammars contain no more than adult grammars; that is, only constraints present in end-state grammars should be present in the child’s initial grammar: there should be no child-specific constraints (cf. Pater 1997; McAllister Byun 2011) or child-specific interpretation of constraints (cf. Pater and Werle 2003). Conversely, children’s grammars contain no less than adult grammars: there should be no late emergence of constraints or of the structures, operations, and primitives they refer to (cf. Demuth 1995b; Goad 1996). If we combine (ii) with (v), we predict that development involves minimal constraint reranking over time (Pater 1997; Bernhardt and Stemberger 1998), parallel to what is observed in adult language typology (Rose 2000; Levelt and van de Vijver 2004), through exposure to positive evidence. Finally, if we pair (iii) with (vi), we conclude that there is an initial ranking of constraints, markedness >> faithfulness, the starting point assumed by most researchers (following Demuth 1995b; Gnanadesikan 1995/2004; Smolensky 1996b; cf. Hale and Reiss 1998; Buckley 2003).

In the following sections, we examine these predictions. We will see that the relation between developmental behavior and grammatical theory, as envisioned by OT, is well-supported on some fronts and questioned on others. We begin with the positive results (sections 4.2.1–4.2.2), then turn to the challenges (sections 4.3–4.5).

4.2.1 High-ranking Markedness

As mentioned in section 4.1, it has commonly been observed that early grammars are unmarked. We pointed out earlier in section 4.2 that in the optimality-theoretic literature, this has formally been expressed through high-ranking markedness constraints whose satisfaction comes at the expense of faithfulness. In what follows, we demonstrate that a frequently attested pattern in development—onset selection in cluster reduction—finds parallels in adult language behavior that is, itself, deemed to be cross-linguistically unmarked.

Parallels between early grammars and unmarked adult systems and the consequent formal expression in OT as markedness over faithfulness was first empirically demonstrated by Gnanadesikan (1995/2004) in her analysis of G’s patterns of cluster reduction (age 2;3–2;9). When learning languages with left-edge clusters, perhaps all children initially reduce the cluster to a singleton. Although children could select either member of the cluster, the pattern that G follows is particularly common: the least sonorous member survives, as in (1a) (see also Fikkert 1994; Barlow 1997; Gierut 1999; Ohala 1999; Goad and Rose 2004). Gnanadesikan shows that this pattern finds an exact parallel in Sanskrit perfect reduplication: left-edge clusters, which are permitted in bases, are reduced to the least sonorous consonant in the reduplicant; see (1b) (Whitney 1889; glosses provided by Brendan Gillon p.c.).

(1) a. G’s grammar:
       [fεn]   ‘friend’
       [bɪw]   ‘spill’
       [so]    ‘snow’
       [sɪp]   ‘slip’

    b. Sanskrit reduplication:
       [pa-prach]   ‘asked’
       [pu-spʰuṭ]   ‘burst’
       [sa-snaː]    ‘bathed’
       [ši-šliṣ]    ‘clasped’

The patterns in (1) are consistent with the cross-​linguistically supported observation that onsets of low sonority are favored over those of high sonority and that singleton onsets are preferred over clusters (Clements 1990). In OT terms, this supports a ranking of the markedness constraint(s) responsible for reducing clusters over segmental faithfulness, with other constraints responsible for selecting the onset of lowest sonority. As schematized in (2) this holds in the initial state and is maintained in (some components of) some adult grammars. In the case of Sanskrit, it holds only in reduplication and thus reflects emergence of the unmarked (McCarthy and Prince 1994). In Sanskrit more generally and in English, the target grammar for G, faithfulness to input clusters prevails and no effect of sonority is therefore visible. (2)

Markedness constraints yield:

                               Initial state:   End-state grammars:
                               G                Sanskrit        Sanskrit        English
                                                reduplicants    bases
  Cluster reduction            respected        respected       not respected   not respected
  Sonority-driven
  onset selection              respected        respected       —               —

Although we have only scratched the surface (see Gnanadesikan 1995/2004 for details), sonority-driven onset selection in cluster reduction demonstrates that cross-linguistically unmarked patterns also characterize children’s grammars. This holds even when there is no evidence that the particular constraints involved must be satisfied in the grammar being acquired. In the next section, we formally exemplify markedness >> faithfulness, coupled with an examination of minimal demotion of markedness constraints over the course of development.
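The onset-selection pattern in (1) reduces to picking the least sonorous member of the cluster. A minimal sketch, assuming an illustrative numeric sonority scale along the obstruent < nasal < liquid < glide hierarchy (Clements 1990); the particular values and segment inventory are invented for the example:

```python
# Sonority-driven onset selection in cluster reduction, as in (1):
# the least sonorous member of a left-edge cluster survives.
# The scale below is illustrative: obstruents < nasals < liquids < glides.

SONORITY = {"p": 0, "t": 0, "k": 0, "b": 0, "d": 0, "g": 0,
            "f": 1, "s": 1, "m": 2, "n": 2, "l": 3, "r": 3,
            "w": 4, "j": 4}

def reduce_cluster(cluster):
    """Return the least sonorous consonant of an onset cluster."""
    return min(cluster, key=SONORITY.__getitem__)

# G's pattern: 'friend' fr -> f, 'snow' sn -> s, 'slip' sl -> s,
# 'spill' sp -> p (the stop, surfacing as [b] in G's output).
for cluster in ["fr", "sn", "sl", "sp"]:
    print(cluster, "->", reduce_cluster(cluster))
```

Note that for s+stop clusters the stop, not /s/, is selected, which matches G’s [bɪw] ‘spill’ in (1a).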

4.2.2 Development as Minimal Constraint Reranking

As mentioned in section 4.2, if acquisition involves minimal constraint reranking over time, parallel to what is observed in adult language typology, children’s intermediate grammars can contain patterns not present in the ambient input, as long as these patterns are attested in other end-state grammars. Here, we provide evidence for this from the development of branching onsets in Québec French (Rose 2000). We will see that minimal demotion of the markedness constraint against branching onsets yields an intermediate grammar which is neither completely unmarked nor target-like, yet is attested in other adult languages.

The typology in (3) shows that three options are found concerning the distribution of branching onsets in relation to stress. Japanese and French fall at opposite ends in either forbidding or permitting branching onsets regardless of stress, while Southeastern Brazilian Portuguese falls in the middle: branching onsets are only permitted in stressed syllables (Harris 1997).

(3)

                              Stressed syllables    Unstressed syllables
    Japanese                  no                    no
    SE Brazilian Portuguese   yes                   no
    French                    yes                   yes
    Unattested                no                    yes

The Southeastern Brazilian Portuguese pattern will be of central interest when we examine development, so we exemplify it in (4). In diminutive constructions, stress shifts rightward, which affects whether or not an underlying branching onset in the root can surface.

(4) [ˈpratu]                   ‘plate’
    [paˈtʃiɲu], *[praˈtʃiɲu]   ‘small plate’

    [ˈlivu], *[ˈlivru]         ‘book’
    [liˈvretu]                 ‘small book’

Consider now the acquisition data. The forms in (5) reveal that Théo, a learner of Québec French, goes through three stages in the development of branching onsets (Rose 2000). At Stage 1, the cluster is reduced to the first consonant; at Stage 3, clusters are produced as target-​like. Most interesting is the intermediate stage in (5) where Théo produces branching onsets in stressed syllables only, parallel to Southeastern Brazilian Portuguese. Théo is uniformly exposed to French, where branching onsets are robustly attested in unstressed syllables. Clearly, we are not seeing an effect of the ambient input; rather, we are observing a grammar-​driven effect that reflects positional constraints on the distribution of branching onsets. (5)

Acquisition of branching onsets:

                                  Target      Théo
    Stage 1 (1;10.27–2;05.11):    [kle]       [ke]        ‘key’
                                  [bʁiˈze]    [pɪˈz̯e]     ‘broken’
    Stage 2 (2;05.29–2;11.29):    [plœʁ]      [plœʊ]      ‘cry’-3sg
                                  [ɡʁyˈjo]    [kʰœˈjɔ]    ‘oatmeal’
    Stage 3 (from 3;00.07):       [pʁǝˈne]    [pʁǝˈne]    ‘take’-2pl

This pattern involves minimal demotion of the constraint against branching onsets, *Complex(Onset), relative to two faithfulness constraints opposing deletion: Max-IO, which states that every segment in the input has a correspondent in the output, and a positional faithfulness constraint, StressMax-IO, which forbids deletion from an output stressed syllable.2 The tableau in (6) shows that, at Stage 1, *Complex dominates both faithfulness constraints, ensuring that candidate (b) is selected over the faithful (a).3

(6) Stage 1:

    /kle/          *Complex    StressMax    Max
       a. ˈkle     *!
    ☞ b. ˈke                   *            *

Although *Complex must end up below both faithfulness constraints in French, minimal demotion of this constraint will result in children passing through the Southeastern Brazilian Portuguese pattern en route to target French. See (7); with *Complex ranked between StressMax and Max, branching onsets only survive in stressed syllables.

(7) Stage 2:

    Stressed syllables:
    /plœʁ/          StressMax    *Complex    Max
    ☞ a. ˈplœʊ                   *
       b. ˈpœʊ      *!                       *

    Unstressed syllables:
    /ɡʁyjo/         StressMax    *Complex    Max
       a. kʁœˈjɔ                 *!
    ☞ b. kʰœˈjɔ                              *

At Stage 3, the target grammar is reached, indicating that *Complex has been demoted to its appropriate place for adult French:

(8) Stage 3:

    /pʁǝne/         StressMax    Max    *Complex
    ☞ a. pʁǝˈne                         *
       b. pǝˈne                  *!

2  Rose uses MaxHead(Foot) instead of StressMax, which demands that every element in the head of the foot in the input have a correspondent in the head of the foot in the output. This constraint requires that inputs be prosodified, counter to Richness of the Base (Prince and Smolensky 2004). For the data at hand, StressMax, which makes no claim about input prosodification, is sufficient (cf. Pater 1997).
3  An additional constraint requiring the head of the onset (Rose 2000) or least sonorous member of the cluster (section 4.2.1) to survive is needed to select [ke] over [le].

Théo’s data support the position that children’s phonological processes mirror those observed in adult grammars, as was seen earlier for G. Moving beyond G’s data, however, the present case shows that development as minimal reranking reflects the typological options that adult languages display, in spite of no evidence in the ambient data for the intermediate pattern observed.4
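The demotion trajectory in tableaux (6)–(8) can be run as a small script. In the sketch below each candidate is paired with the violation marks read off the tableaux, and the only thing that changes across stages is the position of *Complex in the ranking; the dict-based encoding (and the ASCII candidate names for the unstressed case) are illustrative conveniences, not Rose’s implementation.

```python
# Stages 1-3 of Théo's branching onsets as minimal demotion of
# *Complex. Violation counts are read off tableaux (6)-(8).

def winner(tableau, ranking):
    """Candidate with the lexicographically smallest ranked profile."""
    return min(tableau, key=lambda c: [tableau[c].get(con, 0) for con in ranking])

kle = {"kle": {"*Complex": 1},             # faithful: keeps the cluster
       "ke":  {"StressMax": 1, "Max": 1}}  # reduced: deletes from a stressed syllable

stage1 = ["*Complex", "StressMax", "Max"]  # (6): *Complex undominated
stage2 = ["StressMax", "*Complex", "Max"]  # (7): *Complex between the two
stage3 = ["StressMax", "Max", "*Complex"]  # (8): *Complex at the bottom

print(winner(kle, stage1))  # ke  (cluster reduced)
print(winner(kle, stage3))  # kle (target-like)

# Stage 2: clusters survive under stress only, as in (7).
unstressed = {"gryjo": {"*Complex": 1},    # faithful cluster, unstressed syllable
              "gyjo":  {"Max": 1}}         # reduced: deletion, but no StressMax mark
print(winner(kle, stage2), winner(unstressed, stage2))  # kle gyjo
```

Each demotion step moves *Complex past exactly one faithfulness constraint, which is why the Southeastern Brazilian Portuguese pattern falls out as the intermediate stage.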

4.3 Do Children’s Grammars Contain the Same Toolkit as Adult Grammars?

As mentioned in section 4.2, the observation that languages differ from one another in limited and systematic ways is formally expressed in OT through all systems having access to the same toolkit: they manipulate the same set of constraints and have access to the same primitives and principles of organization. If children’s grammars are possible grammars, they should employ the same toolkit as adult grammars. In the preceding sections, we provided examples that are consistent with this position. In the rest of the chapter, we examine some challenges: processes in early grammars which appear not to show convergence with those in adult grammars. We first discuss processes which suggest that developing and end-state grammars may be formally different: children’s grammatical toolkits may contain both less (section 4.3.1) and more (section 4.3.2) than those of adults. We then turn, in section 4.4, to other types of divergent behavior which we argue are very likely consistent with the view that children have access to the same grammatical apparatus as adults, once additional factors are considered: the developing perceptual system and motor control.

4  Modeling the intermediate stage presents challenges. The Gradual Learning Algorithm (Boersma and Hayes 2001), a probabilistic version of OT, runs into difficulties because *Complex must be demoted below a positional faithfulness constraint (StressMax) before its general counterpart (Max). The problem is that the number of violations of the positional constraint will always be a subset of that of its general counterpart. Jesney and Tessier (2011) provide a solution where the strictly-ranked constraints of standard OT are replaced with weighted constraints: lower-weighted constraints (Max and StressMax) can ‘gang up’ and triumph over higher-weighted constraints (*Complex).

4.3.1 Children’s Grammars may Contain less than Adult Grammars

If children’s grammars are possible grammars, they should contain no less than adult grammars: there should be no late emergence of constraints or the structures, primitives, and operations they refer to. The “CV stage” in acquisition, during which some children tolerate a significant number of CV outputs, challenges this view.

Children’s ability to control a variety of prosodic shapes develops in a systematic way over time, from less to more complex, seemingly consistent with an initial markedness >> faithfulness ranking. If children’s grammars contain the same toolkit as adult grammars, unmarked outputs in the former should arise from high-ranking markedness constraints targeting particular prosodic constituents; it should not be necessary to posit that children begin acquisition with an impoverished prosodic hierarchy (cf. Demuth and Fee 1995). Support for this comes from the “minimal word stage,” when the binary foot (σσ or μμ) defines an upper and lower bound on learners’ outputs (see Kehoe and Stoel-Gammon 1997 for a review; Chapter 5 by Ota in this volume). For children learning English-like languages that respect word minimality (lexical words do not fall below a binary minimum), the minimal word stage typically manifests itself through truncation of longer words (e.g. [teːdo] for ‘potato’ (Trevor at 1;10.02) (Pater 1997)). For children learning Japanese-like languages that tolerate monomoraic words, truncation appears alongside augmentation of CV targets (e.g. [tadaima]→[meda] ‘I’m back,’ [ki]→[ɟiː] ‘tree’ (Kenta at 1;11.02, 2;2.27) (Ota 2003b)). The minimal word stage can be straightforwardly captured in OT if markedness constraints regulating the shapes of words (FtBin: feet are binary; Parse-σ: syllables are parsed into feet; All-Feet-Left: the left edge of every foot is aligned with the left edge of the prosodic word) dominate Max as well as Dep, which prohibits epenthesis (cf. Pater 1997). The result is that all prosodic words (PWds) are exactly one binary foot in size:

(9) FtBin, Parse-σ, All-Ft-Left >> Max:  [(teːdo)Ft]PWd, [(meda)Ft]PWd
    FtBin, Parse-σ, All-Ft-Left >> Dep:  [(ɟiː)Ft]PWd

Prior to the minimal word stage, many children’s productions are maximally CV in size. Although such outputs form optimal syllables, because they are not binary, they do not form optimal feet nor, therefore, optimal PWds.
In the following paragraphs we show that a solution to this problem employing high-ranking markedness constraints is spurious. An alternative solution is thus entertained: children begin acquisition without the full set of prosodic constituents available to adult grammars.

At the CV stage, only core syllables are attested (e.g. Jakobson 1941/1968; Ingram 1978; Fikkert 1994; Demuth and Fee 1995). Representative examples from Mollie (Holmes 1927) are in (10). While the ban on complexity yields syllables that are relatively unmarked, CVX(C) targets typically undergo deletion of offending material rather than triggering epenthesis and are thus rendered as CV words.5

(10) CV stage:
        Target:    Mollie:
     a. [ɡʊd]      [ɡu], [ɡuˑdː]    ‘good’
        [wɑnt]     [wɑ]             ‘want’
        [kout]     [ko]             ‘coat’
     b. [bεd]      [bε], [bεˑtː]    ‘bed’
        [ðæt]      [dæ]             ‘that’

5  Epenthesis, which would yield bisyllabic feet/words, is not robustly attested in development (Demuth et al. 2006), but it does occur regularly in some children’s grammars (e.g. Padmint: [ˈtɔpɔ] ‘top’; [ˈbihi] ‘beach’ (Ross 1937)).

Not all children begin acquisition with a significant number of productions of this shape. Some English-speaking children (e.g. Kyle (Salidis and Johnson 1997)) have few CV outputs. The same holds for the Japanese children studied by Ota (2003b) and for French-speaking Clara (Goad and Buckley 2006), although both adult languages contain several CV words. Given the variability observed across children, we must question whether the CV stage is genuine. Perhaps for some learners, CV outputs are limited to a handful of words only (see Ota 2003b). This is not, however, the case for Mollie. Holmes (1927) provides a complete list of single-word utterances produced by Mollie on one day at age 18 months, from which the examples in (10) are drawn. There were 46 words, of which 25 were produced as monosyllabic. Of these, 18 (72 percent) were CV in shape! Perhaps for some children, the transcriptions are too broad to reflect what was actually produced, that forms transcribed as CV were actually produced as CVV. Consider again Mollie. It could be that Mollie’s outputs in (10a) with tense vowels are not subminimal, that Holmes intended, for example, [kou] or [koː] for ‘coat.’ This seems unlikely: concerning [kou], the word ‘no’ is transcribed by Holmes as [nou], suggesting that [ko] indeed contains a monophthong; concerning [koː], Holmes transcribes the difference between short, half-long, and long segments, as seen in the alternative pronunciations for some words in (10), suggesting that he is particularly sensitive to differences in length. We infer that the CV stage is genuine for at least some children, including Mollie.

We turn, then, to a formal analysis of this stage. We begin by supplementing the ranking in (9) for the minimal word stage with undominated NoCoda and NoLongVowel, which prohibit syllables from having codas and long vowels respectively; see (11).
With this ranking, we incorrectly predict some type of CVCV output to be selected as optimal.6

(11) /ðæt/                 FtBin    NoCoda    NoLongV    Dep    Max
     a. [(dæt)Ft]PWd                *!
     b. [(dæː)Ft]PWd                          *!
     c. [(dæ)Ft]PWd        *!                                   *
   ☹ d. [(dædæ)Ft]PWd                                    *      *

6  NoCoda is undominated in (11) because the larger dataset in Holmes (1927) suggests that the CVC outputs in (10) reflect the next developmental stage.

Clearly, with FtBin undominated, the subminimal output in (11c) will never surface. If we introduce All-Syll-Left, which requires the left edge of every syllable to be aligned with the left edge of the PWd, and rank it above FtBin, as in (12), the subminimal form will be selected. As long as All-Syll-Left is demoted below FtBin at the minimal word stage, binary outputs will correctly be selected as optimal at that stage.

(12) /ðæt/                 All-Syll-Left    NoCoda    NoLongV    FtBin    Dep    Max
     a. [(dæt)Ft]PWd                        *!
     b. [(dæː)Ft]PWd                                  *!
   ☞ c. [(dæ)Ft]PWd                                              *               *
     d. [(dædæ)Ft]PWd      *!                                             *      *

However, although All-Syll-Left capitalizes on the fact that alignment constraints can take any prosodic constituents as arguments, it receives no cross-linguistic support.7 We must therefore reject the analysis in (12). A second possibility is that CV outputs are not footed at this developmental stage. This would appear to violate Parse-σ, which we saw in (9) is undominated at the minimal word stage. However, if there were simply no foot projection in the grammar at the CV stage (Fikkert 1994; Demuth 1995b; Demuth and Fee 1995; Goad 1996), then FtBin and Parse-σ would be vacuously satisfied. Consider (13); with Dep ranked over Max in Mollie’s grammar, the CV output is selected as optimal for monosyllabic inputs.

(13) /ðæt/            FtBin    Parse-σ    NoCoda    NoLongV    Dep    Max
     a. [dæt]PWd                          *!
     b. [dæː]PWd                                    *!
   ☞ c. [dæ]PWd                                                        *
     d. [dædæ]PWd                                              *!      *

Evidence for this analysis comes from Holmes’s (1927: 221) comment on Mollie’s two-syllable words at 18 months: “Each of these syllables received an equal stress.” With no foot projection, equal prominence on both syllables is exactly what is expected. Equal prominence could also arise from Mollie not knowing whether feet are left- or right-headed (simultaneous satisfaction of FootShape(Troch) and FootShape(Iamb)). However, if bisyllabic forms with equal prominence are truly footed, then FtBin, along with these two constraints, would predict monosyllabic forms to be realized as bisyllabic with equal prominence, counter to what was observed in (10).

Under the approach outlined here, absence of a foot projection is what defines the CV stage. Assuming that this constituent can be projected in the course of acquisition based on positive evidence (greater prominence associated with stressed syllables), does this mean that children’s early grammars are deviant? Yes, unless it can be shown that there are adult languages that are footless. Although it is standardly accepted that all languages contain the foot (Selkirk 1995: 189–90), McCarthy and Prince (1995b: 323) mention the possibility of footless languages, and the literature on some languages contains analyses of prosodic phenomena assuming no foot (e.g. Jun and Fougeron 2000 on French; Özçelik 2009 on Turkish).8 Clearly, more work is required, but if the foot proves to be universally present in adult grammars and if the CV stage is genuine, we must accept the possibility that early grammars do not contain exactly the same toolkit as adult grammars: behavior that cannot be captured solely through constraint ranking may require that children begin acquisition with an impoverished set of (prosodic) primitives.

7  Mandarin is often raised as a language that may appear to motivate All-Syll-Left. However, although many Mandarin words are monosyllabic, this arises not from All-Syll-Left but from the fact that morphemes are predominantly monosyllabic and Chinese languages have limited bound morphology (Yip 1992). Concerning the shapes of prosodic words, which All-Syll-Left refers to, there is no pressure toward monosyllabic PWds in Mandarin. Indeed, less than 30 percent of Mandarin words are monosyllabic and most newly introduced words are bisyllabic (Duanmu 2007).

8  If French is truly footless, we should not expect learners of this language to evidence the foot. This, however, is contrary to what has been observed (Vihman et al. 1998; Goad and Buckley 2006; Demuth and Tremblay 2008; Goad and Prévost 2010).

4.3.2 Children’s Grammars may Contain More than Adult Grammars

Consider now the other side of the coin. If children have access to the same toolkit as adults, their grammars should contain no more than adult grammars: only constraints present in end-state systems should be evidenced in development; there should be no child-specific constraints or child-specific interpretation of constraints. Some processes in children’s grammars, notably consonant harmony (CH), challenge this view. In this case, vocal tract immaturity may be responsible: it impacts the shape of an otherwise adult-like phonological system, yielding non-adult-like behavior. The question is how to formally express this, given that, in some ways, CH looks like ordinary phonology.

In CH, consonants at a distance agree in primary place features (Vihman 1978; see Levelt 2011 for a review). Representative examples from Amahl’s regressive coronal-to-dorsal harmony at age 2.60 (years.days) are in (14) (Smith 1973) ([ɡ̇] = voiceless unaspirated lenis).

(14) Coronal-to-dorsal CH:
        Target:    Amahl:
        [daːk]     [ɡ̇aːk]     ‘dark’
        [sneik]    [ŋeik]     ‘snake’
        [sɔk]      [ɡ̇ɔk]      ‘sock’
        [lεɡǝu]    [ɡ̇εɡuː]    ‘Lego’

Across children, this type of harmony is most widely attested: (i) coronals are the optimal targets (Spencer 1986; Stemberger and Stoel-Gammon 1991; Bernhardt and Stemberger 1998); (ii) right-to-left directionality is preferred (Menn 1971; Smith 1973; Vihman 1978, Stoel-Gammon 1996); (iii) velars are better triggers than labials (Smith 1973; Pater 1997). All of these observations also define adult assimilation (Pater and Werle 2003): (i) coronals are prime targets in assimilation (Paradis and Prunet 1991); (ii) right-to-left directionality is preferred: local place assimilation between consonants targets codas from following onsets (Jun 1995) and long-distance adult CH is typically regressive (Hansson 2001); (iii) in some languages, notably Korean, velars trigger place assimilation targeting labials (Jun 1995).

Before turning to the formal challenges that child CH presents, we briefly address whether it involves “real” phonology. As Levelt (2011) points out, systematic child CH may be less widely attested than the literature implies: it is clearly not always productive, and may for some children represent a speech error rather than rule-governed behavior. The speech error account cannot, however, always hold: Gormley’s (2003) articulatory and acoustic examination of CH reveals fully harmonic forms, which would be unexpected if harmony were the result of poor motor planning like speech errors (see also Vihman 1978). Further, Amahl’s forms in (14) reflect a fully productive process: of the 25 coronal-dorsal targets at age 2.60, 100 percent undergo CH; over the three-month period from 2.60 to 2.152, 95/113 (84 percent) of coronal-dorsal targets undergo CH; and during the last stage when the process is productive, three months later (2.233–2.242), 21/38 (55 percent) of suitable targets still undergo CH. Clearly, CH represents productive rule-governed behavior for some children.

We now address the formal challenges that arise from this conclusion. CH of the type exhibited in children’s grammars is not attested in end-state grammars (Vihman 1978), an observation confirmed by recent surveys of adult CH (Hansson 2001; Rose and Walker 2004).
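The productivity figures for Amahl’s harmony follow directly from the counts quoted above (Smith 1973); a quick check, with the periods labeled in Smith’s years.days notation:

```python
# Productivity of Amahl's coronal-to-dorsal harmony: harmonized forms
# over eligible coronal-dorsal targets, using the counts quoted in the
# text (Smith 1973).

counts = [("2.60", 25, 25), ("2.60-2.152", 95, 113), ("2.233-2.242", 21, 38)]
for period, harmonized, targets in counts:
    print(f"{period}: {harmonized}/{targets} = {100 * harmonized / targets:.0f}%")
# -> 100%, 84%, 55%
```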
Although consonant-to-consonant assimilation for primary place features is widespread in adult languages, it is limited to string-adjacent consonants. Long-distance patterns hold only for secondary place features, for example, in Barbareño Chumash, where apical and laminal sibilants assimilate right-to-left within a word (Beeler 1970): [kiškín] ‘I save it, store it up,’ [kiskínus] ‘I saved it for him’; [kasúnan] ‘I command,’ [kaʔalašúnaš] ‘he’s the boss.’

Two general approaches to child CH have been proposed. On one hand are analyses that involve the same formal mechanisms available to adult grammars. Most of these predict that CH should be freely attested in adult languages (Smith 1973; Stemberger and Stoel-Gammon 1991; Rose 2000; Goad 2001); others predict more restricted CH, limiting it either to languages with a particular profile (McDonough and Myers 1991) or to emergence-of-the-unmarked scenarios (Goad 1997). As all of these proposals assume that learners have access to the same toolkit as adults, they cannot confine CH to development. On the other hand are proposals that assume that some formal change takes place in children’s grammars, which accounts for the discontinuity observed as concerns CH (Menn 1978; Pater 1997; Pater and Werle 2003; see also Fikkert and Levelt 2008). We focus principally on Pater (1997).

As mentioned, in some ways CH resembles ordinary phonology, notably in the preference for coronal targets and velar triggers. Pater proposes that this follows from a fixed ranking of faithfulness constraints: Faith(Dor) >> Faith(Lab) >> Faith(Cor)

56   Heather Goad

(see also Goad 1997). Harmony arises from a child-specific constraint, Repeat, which requires successive consonants to agree in primary place (cf. Menn 1978). Repeat is proposed to be induced by the child in response to the articulatory pressures that arise from the developing phonological system (cf. Hayes 1999). Kiparsky and Menn (1977), Vihman (1978), Fikkert and Levelt (2008), and Becker and Tessier (2011) point out that CH is an emergent process, which may indicate that a child-induced constraint like Repeat is along the right lines.9 Repeat can be interranked with “regular” constraints and demoted through the course of acquisition like other markedness constraints. These properties are illustrated in the tableaux in (15) and (16) for Trevor. At Stage 1, both coronal- and labial-initial forms assimilate to following velars, driven by the ranking of Repeat above Faith(Lab) and Faith(Cor); see (15).

(15) Stage 1:

     /dɔɡ/ ‘dog’   | Faith(Dor) | Repeat | Faith(Lab) | Faith(Cor)
     a.   [dɔɡ]    |            |   *!   |            |
     b. ☞ [ɡɔɡ]    |            |        |            |     *
     c.   [dɔd]    |     *!     |        |            |

     /bæk/ ‘back’  | Faith(Dor) | Repeat | Faith(Lab) | Faith(Cor)
     a.   [bæk]    |            |   *!   |            |
     b. ☞ [ɡæk]    |            |        |     *      |
     c.   [bæp]    |     *!     |        |            |

At Stage 2, only coronal-initial forms continue to undergo CH. With Repeat demoted below Faith(Lab), ‘back’ is now faithfully rendered as [bæk]:

(16) Stage 2:

     /dɔɡ/ ‘dog’   | Faith(Dor) | Faith(Lab) | Repeat | Faith(Cor)
     a.   [dɔɡ]    |            |            |   *!   |
     b. ☞ [ɡɔɡ]    |            |            |        |     *
     c.   [dɔd]    |     *!     |            |        |

     /bæk/ ‘back’  | Faith(Dor) | Faith(Lab) | Repeat | Faith(Cor)
     a. ☞ [bæk]    |            |            |   *    |
     b.   [ɡæk]    |            |     *!     |        |
     c.   [bæb]    |     *!     |            |        |

9  If CH is the child’s response to articulatory pressures, we must question why Repeat is not induced at the earliest stage. Perhaps these pressures arise only at the point when the child must contend with a rapidly expanding lexicon (cf. Vihman 1978: 328).
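The effect of demoting Repeat can be illustrated with a minimal constraint-evaluation sketch. The evaluator below is a generic one (ours, not a formalism from this chapter); the constraint names and violation profiles follow the ‘back’ tableaux in (15) and (16):

```python
# Minimal OT evaluation: each candidate carries a violation profile, and
# the ranking decides the winner by lexicographic comparison.
def evaluate(candidates, ranking):
    """Return the candidate whose violations, read in ranking order,
    are lexicographically smallest (the OT-optimal candidate)."""
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Violation profiles for /bæk/ 'back', as in tableaux (15)-(16).
back = {
    "bæk": {"Repeat": 1},       # faithful, but successive Cs disagree in place
    "ɡæk": {"Faith(Lab)": 1},   # harmonized
    "bæp": {"Faith(Dor)": 1},   # reverse (progressive) harmony
}

stage1 = ["Faith(Dor)", "Repeat", "Faith(Lab)", "Faith(Cor)"]
stage2 = ["Faith(Dor)", "Faith(Lab)", "Repeat", "Faith(Cor)"]

print(evaluate(back, stage1))  # -> ɡæk: harmony wins while Repeat is high
print(evaluate(back, stage2))  # -> bæk: faithful form wins once Repeat is demoted
```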

What is appealing about this approach is that it isolates the child-​specific part of CH by means of Repeat, allowing other aspects of the process to be handled through independently-​needed constraints.10 The merits of this approach, however, are also its downfall: we must ensure that CH effects are not inadvertently predicted to occur in adult grammars. Demotion of Repeat to the bottom of the constraint hierarchy may appear to be ideal, as the child–​adult asymmetry would be reduced to a difference in ranking. Demotion, however, cannot guarantee that Repeat will not rear its head in emergence of the unmarked scenarios. Instead, Repeat must be purged from the grammar. This approach to CH must therefore accept the position that children begin with a richer toolkit than that available to adult grammars.11

4.4  Rogue Substitutions: Is Children’s Divergent Behavior always Grammar-driven?

We have hitherto discussed cases of divergence between developing and end-state grammars that may require that children begin acquisition with a set of primitives and constraints that are not identical to those manipulated by adult systems. On one hand, accounting for the CV stage may require that children’s grammars contain less than adult grammars, specifically, that they lack the full set of prosodic constituents available to adults. On the other hand, explaining consonant harmony may require that children’s grammars contain more than adult grammars: constraints that are confined solely to development. In this section, we examine types of divergent behavior that, on closer examination, appear to be amenable to formal explanation using the same primitives and constraints that define adult grammars.

10  Pater and Werle (2003) propose that Repeat be replaced by Agree, an independently-motivated constraint responsible for assimilation in adult grammars (Lombardi 1999; Baković 2000a). What distinguishes consonant-to-consonant assimilation in child and adult systems is that Agree has wider scope in the former: it is not limited to applying between string-adjacent segments. The discussion in the text focuses on Repeat, although the same issues arise for wide-scope Agree.

11  Child-specific constraints can conceivably be purged from the grammar once the articulatory limitations which compelled their creation are overcome (Hayes 1999; McAllister Byun 2011). The problem remains nonetheless: children’s grammars draw on different formal resources than adult grammars.

We consider that some rogue substitutions may arise from physiological constraints: for anatomical reasons, children simply miss the adult target. Others may reflect covert contrasts where, because of poor gestural timing, learners’ productions are misinterpreted by adult transcribers. Others may arise from children misperceiving target material. And others may indicate that learners have adopted an analysis of the ambient data that differs from that assumed by the linguist. As we will see, all four possibilities potentially lead to explanations for rogue behavior; consequently, children’s grammars may not, in the end, stray far from what is formally observed in adult systems.

4.4.1 Velar Fronting

We begin with velar fronting (VF), where velars surface as coronals across the board or, surprising as it first seems, in prosodically-strong positions (Ingram 1974; Chiat 1983; Stoel-Gammon 1996; Morrisette et al. 2003; Inkelas and Rose 2007). The examples in (17a) show that English-speaking E restricted VF to word-initial onsets and word-medial onsets in stressed syllables; velars in medial unstressed onsets and in codas were realized faithfully (17b) (Inkelas and Rose 2007).

(17) E’s positional VF:
     a. Prosodically-strong positions:
        [thᴧp]        ‘cup’        (1;09.23)
        [tᴧnˈdᴧktǝ]   ‘conductor’  (2;01.21)
        [ˈhεksǝˌdɔn]  ‘hexagon’    (2;02.22)
     b. Prosodically-weak positions:
        [ˈbejɡu]      ‘bagel’      (1;09.23)
        [ˈɑktǝɡɑn]    ‘octagon’    (2;01.05)
        [bʊkh]        ‘book’       (1;07.22)

Inkelas and Rose (2007) propose that anatomical constraints and grammatical considerations together account for this child-specific process, including its restriction, for children like E, to prosodically-strong positions (see also McAllister Byun 2012). Physiological constraints on young children’s vocal anatomy, including a relatively larger tongue and shorter palate, result in velars being articulated in a more anterior position than for adults. The restriction to prosodically-strong positions follows from the greater gestural amplitude exhibited in these positions, which disproportionately affects velars, resulting in their being realized with greater palatal contact (Fougeron and Keating 1996; Fougeron 1999 on adult languages).

Although Inkelas and Rose do not provide a formal analysis of VF, they propose that the physiological constraints distinguishing the child’s vocal tract from that of the adult are sufficient to restrict this process to developing grammars. That is, the productions of velar-fronting children are only “cosmetically unfaithful” to target velars (Rose 2009: 340). If VF can be reduced to physiological considerations, no child–adult asymmetry needs to be formally expressed by the grammar. We conclude, then, that some types of child-specific behavior, like VF, may be amenable to explanations outside the grammar, while others, like CH, require a grammar-internal explanation.12

12  Although see Qu (2011), who provides a unified grammar-driven account of both processes tied to the development of place features.


4.4.2 Glide Substitution

We consider now the robustly-observed process of glide substitution for target liquids. Representative data from English-speaking Jake (age 2;1) are in (18) (Bleile 1991). Jake produced no liquids at all: onset liquids were realized as glides, in both clusters (18a) and singletons (18b); coda liquids were deleted or vocalized.

(18) Glide substitution: Jake:
     a. [fwεnz]      ‘friends’
        [fwɔɡ]       ‘frog’
        [pwiz]       ‘please’
        [twækʊ]      ‘tractor’
        [stwɔbnwiz]  ‘strawberries’
        [kwaɪ]       ‘cry’
        [kwoʊz]      ‘close’
     b. [wεd]        ‘red’
        [wum]        ‘room’
        [waɪt]       ‘light’
        [woʊsɪn]     ‘lotion’

We refer to the point in development where glides consistently substitute for liquids in clusters as the “CG stage.” At this stage, derived CG clusters are typically assumed to be represented as branching onsets (Fikkert 1994; Barlow 1997; Jongstra 2003). However, adult grammars where the only branching onsets are CG in shape are unattested: CG onsets are permitted only in languages that also allow CL (consonant+liquid) onsets (Clements 1990). Although there are end-state grammars that lack liquids, for example Blackfoot (Algonquian: southern Alberta and northwestern Montana; see Frantz 2009), in contrast to Jake, languages of this type do not permit branching onsets, consistent with Clements’s (1990) implicational universal. Jake’s grammar at the CG stage thus appears to be a rogue grammar. Further, glide substitution contrasts sharply with G’s treatment of left-edge clusters discussed earlier: sonority-driven onset selection had clear adult parallels (section 4.2.1). Given both of these observations, we pursue other explanations for glide substitution: misanalysis and covert contrast.13

4.4.2.1 Misanalysis

As mentioned, in languages with branching onsets, CG onsets (19a) are only allowed if the language also allows CL (Clements 1990). In languages where clusters are CG in shape only, distributional evidence—available to the learner—reveals that the glide represents a secondary articulation (19b) (e.g. Clements 1986 on Luganda) or forms part of a light diphthong (19c) (e.g. Lee 1998 on Korean).

13  Misperception is not considered. Although there are children who misperceive liquids as glides (Aungst and Frick 1964; Monnin and Huntington 1974; McReynolds et al. 1975; Strange and Broen 1980), this does not offer a solution to the CG stage. If developing grammars are possible grammars, we would expect children who misperceive CL as CG to: (i) repair the putative CG cluster through deletion of the glide until the perceptual problem is overcome and the child realizes that what s/he thought were CG strings are in fact CL; (ii) select words for production that lack CG clusters (“selection and avoidance”; see Schwartz and Leonard 1982); or (iii) assign an analysis other than branching onset to putative CG strings. Clearly, neither (i) nor (ii) is the solution taken by children like Jake, who produce CL clusters as CG. We consider (iii) in the text.

(19) The three candidate representations (the original tree diagrams linearized; x = timing slot):
     a. Branching onset:            Ons = [x x] dominating C G; Nuc = [x] dominating V
     b. Secondarily-articulated C:  Ons = [x] dominating C G (i.e. Cᴳ); Nuc = [x] dominating V
     c. Light diphthong:            Ons = [x] dominating C; Nuc = [x] dominating G V

We must therefore consider whether (19b) or (19c) could be motivated for derived CG clusters in acquisition (Goad 2006; Kehoe et al. 2008). In the following paragraphs, we focus on distributional constraints present in derived CGV strings in Jake’s grammar, assessing them against the possibilities in (19).

We first consider place constraints holding between C and G. Place constraints are observed when CG is syllabified as a branching onset (19a) or as a secondarily-articulated consonant (19b): branching onsets typically forbid place identity between C and G (*[pw]), parallel to what is observed with CL (*[tl]); place constraints are often observed for secondarily-articulated consonants such that labialized labials, labialized coronals, and palatalized dorsals are dispreferred (Maddieson 1984; Ladefoged and Maddieson 1996). Light diphthongs, by contrast, do not enter into place constraints between C and G, because the two segments are in different constituents (e.g. French [pwa] ‘pea’). Jake’s grammar does not display place constraints for derived CG strings: the data in (18a) show that he freely permits labial obstruents to be followed by [w]. This supports the light diphthong analysis in (19c).

Turning to sonority, constraints hold between C and G when G is syllabified inside a branching onset (19a) or as a secondarily-articulated consonant (19b): in the former case, C is virtually always an obstruent; in the latter, C prefers to be a stop (Maddieson 1984). Fewer constraints hold when a constituent boundary separates C and G, as in the light diphthong analysis (19c) (e.g. French [nwaʁ] ‘black’). Given that CL clusters must be obstruent-initial in English, it would be surprising to find children producing derived CG clusters with initial sonorants.
However, Jake does produce one such cluster in [stwɔbnwiz] ‘strawberries.’ Although this is consistent only with the light diphthong analysis (19c), we do not know how representative this form is of his grammar overall; we thus cannot definitively rule out (19a) on sonority grounds. The presence of fricatives in the CG clusters of [fwεnz] and [fwɔɡ], however, eliminates (19b). We might not expect children learning Germanic languages to constrain CG outputs to stop-initial clusters only, in support of (19b), but there are children in the CG stage who have this profile: English-speaking Kylie (Bleile 1991) and Dutch-speaking Elke (Fikkert 1994) reduce all fricative+liquid clusters to the fricative, rather than allowing these to surface as fricative+glide; this is consistent with a secondary articulation analysis for these children. For Jake, though, sonority constraints support either (19a) or (19c).

Considering finally place constraints holding between G and the following vowel, constraints are observed when G and V are internal to the same constituent, as in (19c). Jake has no GV restrictions, supporting any analysis other than (19c). Again, we might

not expect to find such constraints, given that none hold in the target LV strings. There are learners, however, who do observe constraints on GV place: Kylie permits [w] in derived CG strings to be followed by a front vowel only.

Taking the distributional evidence together, we can see that no single analysis emerges as definitive for Jake; the only option we can eliminate is (19b). This problem does not hold solely for Jake (Goad 2006), suggesting that misanalysis of derived CG as something other than a branching onset is not a likely solution for the rogue CG stage.
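The diagnostics applied above amount to simple tabulations over transcribed forms. The sketch below is ours, with illustrative segment classes; the C–G pairs are those in Jake’s (18a) clusters:

```python
# Distributional diagnostics for derived CG clusters (segment classes are
# broad-transcription assumptions, not an exhaustive phonology of English).
LABIAL = set("pbfmw")
OBSTRUENT = set("pbtdkɡfvsz")

# (C, G) pairs from Jake's forms in (18a), including the [nw] of 'strawberries'.
cg_pairs = [("f", "w"), ("f", "w"), ("p", "w"), ("t", "w"),
            ("n", "w"), ("k", "w"), ("k", "w")]

# Diagnostic 1: labial C + [w] (place identity). Branching onsets and
# secondarily-articulated consonants disprefer this; light diphthongs do not.
place_identity = any(c in LABIAL and g == "w" for c, g in cg_pairs)
print("labial C + [w] attested:", place_identity)        # True: favors (19c)

# Diagnostic 2: sonorant-initial CG. Branching onsets are virtually
# always obstruent-initial.
sonorant_initial = any(c not in OBSTRUENT for c, g in cg_pairs)
print("sonorant-initial CG attested:", sonorant_initial)  # True: favors (19c)
```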

4.4.2.2 Covert Contrast

In view of this, we take a different tack and consider the possibility that some processes in acquisition, including glide substitution, are not genuine. Rather, children are producing a covert liquid–glide contrast (Kornfeld 1976; Scobbie 1998), but the contrast is not perceptible to adult transcribers. This would arise because of poor gestural timing: the cues used by children to express the liquid–glide contrast are not sufficient for linguists to perceive these classes of segments as different. If instrumental analysis were to reveal that putative glides in derived CG clusters in fact have liquid properties, this would avoid the unwanted conclusion that children’s grammars formally diverge from what is possible adult behavior.

For some children, we might expect poor gestural timing to hold for all onset liquids; for others, the problem may be confined to clusters, where the child must quickly transition from the gestures involved in the closure to those required for the following liquid. As seen earlier, in Jake’s grammar, glide substitution applies across the board. Dutch-speaking Catootje and Elke, however, produce singleton onset liquids as target-like during the CG stage (Fikkert 1994). To the extent possible, then, we must examine the results on covert contrast for both contexts.

Spectrographic studies show that for English-speaking children judged to be substituters, productions of derived [w] and target [w] are measurably different, in both singleton contexts (Klein 1971; Menyuk 1971; Dalston 1972; Hawkins 1973; Chaney 1978) and clusters (Kornfeld 1971; Menyuk 1971; Hawkins 1973; Chaney 1978). Klein (1971), for example, found that F2 origin is higher for derived [w] than for target [w], indicating the presence of [r]-like qualities in the former.
Chaney (1978) observed that, in some contexts, children had different F2 frequencies and/or F2 transition rates for target [w] and [w] substitutes (for [r] and [l]), although no child judged to be a substituter did this reliably. Ultrasound imaging, which has more recently become available, has also revealed physical evidence of covert contrast (Richtsmeier 2007): analysis of two-dimensional tongue postures in two children’s productions of initial [r] and [w] targets, both of which were consistently transcribed as [w], reveals different tongue shapes.

The possibility that children are representing the [r]–[w] contrast covertly is also supported by perceptual studies. Goehl and Golden (1972, cited in Kornfeld and Goehl 1974) observe that English-speaking children whose productions are judged to involve substitution of [w] for [r] can perceive these same productions of derived [w] as distinct from their productions of target [w]. Similar results are reported in Bryan’s (2009) study on the perception and production of the English [r]–[w] contrast and the French [ʁ]–[w] and [ʁ]–[j] contrasts. Some English learners whose productions of target [r]

were transcribed as [w] could perceive the [r]–[w] contrast. Acoustic analysis of these learners’ outputs revealed a difference between their productions of derived and target [w]: derived [w] had lower F3 values than target [w]. Similarly, some French children whose outputs for target [ʁ] were transcribed as [w] or [j] could perceive the difference between [ʁ] and [w] and between [ʁ] and [j]. Acoustic analysis revealed differences in F2 and F3 for these children’s derived and target glides.

Not all researchers who have phonetically compared derived and target [w] among English learners have found evidence of distinct articulations (Locke and Kutz 1975; Kuehn and Tomblin 1977; see also Menyuk and Anderson 1969; Hoffman et al. 1983). Importantly, though, we have no information on how the children in these studies produced branching onsets. If the children reduced branching onsets to singletons, then neutralization of the liquid–glide contrast would not yield a rogue grammar but, rather, a grammar like that found in Blackfoot. Clearly, more research must be undertaken, but the evidence thus far available suggests that the explanation for the CG stage in acquisition may lie in covert contrast: if all children who seemingly produce a glide in CL targets in fact show evidence of producing liquids covertly, this would be significant.

Before concluding this section, we briefly return to Jake’s data in (18), where liquids are uniformly transcribed as glides. As Smit (1993a) points out, it is difficult to accurately transcribe substitutions for English [r]. In view of this, we must question whether the glides in Jake’s outputs are genuine or whether the data are transcribed too broadly to reflect any evidence of covert contrast. For this child, we will never know.
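The acoustic logic behind these studies is a comparison of formant values for derived versus target [w]. A minimal sketch, with invented token values (not measurements from Bryan 2009 or any study cited here):

```python
# Hypothetical F3 measurements (Hz); a real diagnosis would use many tokens
# per child and an appropriate statistical test.
from statistics import mean

target_w_f3 = [3100, 3050, 3200, 3150]   # tokens of target /w/ words
derived_w_f3 = [2400, 2500, 2350, 2450]  # [w]-transcribed tokens of /r/ targets

diff = mean(target_w_f3) - mean(derived_w_f3)
print(f"mean F3 difference: {diff:.0f} Hz")  # prints: mean F3 difference: 700 Hz
if diff > 0:
    # Lowered F3 is the classic acoustic correlate of English /r/, so
    # derived [w] retains a covert liquid property in this toy data set.
    print("derived [w] has lower F3: consistent with a covert contrast")
```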
What we expect, though, is that if adult misanalysis of children’s liquids as glides is due to poor gestural coordination, significant variation should be observed in children’s production of glides, some of which should be reflected in narrowly transcribed data. That this variation may indeed be observed for some learners is revealed by the range of outputs for Richard, who went through an extended CG stage. Examining his 88 CG outputs for CL targets at 2;5 uncovers significant variation in the way the glide was phonetically transcribed. Although no instrumental data are available, the transcriptions reveal liquid properties much of the time (e.g. labio-rhotic glide: [dr̯aiv̹] ‘drive’) (see Goad 2006).

We have concluded that the most promising avenue to pursue for the rogue CG stage is that the transcribed glides in children’s CG outputs are not truly glides but, instead, are covertly produced liquids misanalyzed by transcribers because of children’s poor gestural coordination. If vocal tract immaturity is responsible for the misanalysis, then this would be a case, like VF, where rogue behavior can be reduced entirely to physiological constraints and no child–adult asymmetry need be formally expressed by the grammar.

4.4.3 Velar Substitution

Another possible source of explanation for rogue behavior is children’s misperception of adult forms. In this section, we examine velar substitution in Amahl’s outputs (Smith 1973) from this perspective. From age 2.60 to 2.333, Amahl produced strident fricatives as stops; the examples in (20a) exhibit this for words of the shape under focus. From age 2.60 to 3.282, target [t,d,nt,nd,n] in medial position before dark l surfaced as velar; see (20b).

(20) a. Stopping:
        [pᴧdǝl]   ‘puzzle’   (2.207–2.215)
        [paːtǝl]  ‘parcel’   (2.317–2.333)
        [witǝl]   ‘whistle’  (2.233–2.242)
     b. Velar substitution:
        [pᴧɡǝl]   ‘puddle’   (2.247–2.256)
        [bɔɡ̇ul]   ‘bottle’   (2.207–2.215)
        [kεŋǝl]   ‘kennel’   (3.159–3.206)

Words of certain types were protected from undergoing velar substitution: those with derived coronal stops (20a), and those with target [st], even though [s] was deleted from clusters during much of the relevant period; see (21).

(21) Target [st]:
     [nɔtil]  ‘nostril’  (2.207–2.215)
     [pitǝl]  ‘pistol’   (2.247–2.256)

We return to these “exceptions” shortly. We first consider whether velar substitution like (20b) can be motivated for adult grammars. We begin with the following assumptions. One, given that laterals are velarized in English rhymes, we assume that the process in (20b) involves place assimilation: coronals acquire velarity from [ɫ]. Two, we assume that the process applies between an onset consonant and a following nuclear trigger (syllabic [ɫ̩]); that is, it applies locally rather than long-distance, which we have already seen is not attested for primary place in adult grammars. Given these assumptions, a rule-based formulation of this process would be: /t,d,n/ → [k,ɡ,ŋ] / __ [ɫ̩].

Let us now address whether adult grammars contain such processes. Although place sharing between onset and nuclear consonants is attested, these assimilations seem to be restricted to cases where the nucleus acquires place from the preceding onset rather than the other way around (e.g. Gronings Dutch: lachen → [laχŋ̩] ‘to laugh’ (Humbert 1997)). And while we commonly observe cases where vowels spread place features to a preceding consonant, they never seem to effect a change in primary place; rather, they involve the addition of secondary place (Ní Chiosáin and Padgett 1993) (e.g. Fante Akan: /abε/ → [abjε], *[adε] ‘palm’ (Clements 1984)). It appears, thus, that assimilations like /t,d,n/ → [k,ɡ,ŋ] / __ [ɫ̩] are not attested in adult grammars; for the string of segments under focus, we should rather expect to derive [tɣ,dɣ,nɣ].

Must we then conclude that Amahl’s grammar is a rogue grammar? Not if an alternative explanation holds. Following Macken (1980), we suggest that the explanation lies in perception: the forms in (20b) reflect perceptual miscoding rather than the operation of a rule or set of constraints. The discussion in this section largely follows Macken.
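For concreteness, the rule just shown to be unattested in adult grammars can be stated as a string rewrite. The sketch is ours; segments are single characters in a broad transcription, with syllabic dark l written as ɫ:

```python
# /t,d,n/ -> [k,ɡ,ŋ] before syllabic dark l, as a naive string rewrite.
CORONAL_TO_VELAR = {"t": "k", "d": "ɡ", "n": "ŋ"}

def velar_substitution(form):
    """Replace a coronal stop or nasal immediately preceding [ɫ]."""
    segs = list(form)
    for i in range(len(segs) - 1):
        if segs[i] in CORONAL_TO_VELAR and segs[i + 1] == "ɫ":
            segs[i] = CORONAL_TO_VELAR[segs[i]]
    return "".join(segs)

print(velar_substitution("pᴧdɫ"))   # 'puddle' -> pᴧɡɫ
print(velar_substitution("kεnɫ"))   # 'kennel' -> kεŋɫ
print(velar_substitution("pᴧdǝl"))  # unchanged: no syllabic ɫ in this spelling
```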
Smith (1973) proposes that mismatches between adult surface representations and children’s outputs are generally due to constraints on production. Macken (1980) challenges this view for the data in (20b), beginning with the chain shift in (20b,a): ‘puddle’ → [pᴧɡǝl] while ‘puzzle’ → [pᴧdǝl]. Although formal accounts of chain shifts have been provided (involving rule ordering (Smith 1973) or judicious use of constraints (Dinnsen et al. 2001; Jesney 2007)), it remains mysterious why a production difficulty should yield [pᴧɡǝl] for ‘puddle’ when intervocalic [d] is perfectly well-formed in [pᴧdǝl] from ‘puzzle.’

The perceptual account makes specific predictions, which Macken shows are supported. (i) Underapplication: perceptual encoding problems should not be attested across the board; they should not be observed in contexts where perceptual confusion would not arise. This is supported: intervocalic dorsal is found in the “puddle”-type words in (20b) but not in the “pistol”-type words in (21).14 (ii) Overapplication: when the perceptual-encoding problem is overcome, the correction should overapply to some words which were appropriately stored with underlying velars, until such words are heard again. This is supported, as the examples in (22a) show. Those in (22b) are consistent with this, although there are no productions for these words before age 3.286–3.355.

(22) a. ‘pickle(s)’  [pikǝl]     (3.45–3.70)
                     [pitǝl]     (3.286–3.355)
        ‘circle’     [sǝːkǝl]    (3.45–3.70)
                     [sǝːtǝl]    (3.286–3.355)
     b. ‘winkle’     [wintǝl]    (3.286–3.355)
        ‘Trugel’     [ˈtruːdrεl] (3.286–3.355)

In sum, the perceptual account of velar substitution yields dividends of two types. One, the chain shift (20), underapplication (21), and overapplication (22) patterns are no longer surprising. Two, we avoid the unwanted conclusion that Amahl has a rogue grammar: under a perceptual account, medial velars in “puddle”-type words are not derived; they are stored as such.

The rogue substitutions examined in this and the preceding sections have thus been explained by probing how perceptual and motor development interface with the acquisition of an adult-like grammar.

4.5  Unexpected Complexity

We consider one remaining type of mismatch between children’s and adults’ grammars: cases where children’s outputs are not puzzling from the perspective of adult typology but where they seem to involve complexity beyond what is attested in the target grammar and, thus, complexity in the absence of positive evidence. If learners are on a trajectory from unmarked outputs to outputs that more faithfully reflect the adult grammar, markedness considerations should be responsible for shaping intermediate grammars. Accordingly, we must carefully examine cases of unexpected complexity at these intermediate stages.

14  Presumably, [st] is not misperceived before [ɫ] because [s] has robust internal cues for place (Wright 2004), which ensure its perceptibility even in non-optimal contexts (before stops) (Goad 2011).

We consider here unexpected syllable complexity in Mollie’s grammar (Holmes 1927). At Stage 1, Mollie’s bisyllabic forms are CVCV in shape; see (23a) (age in months indicated in parentheses). At Stage 2, a different pattern emerges: the first syllable is heavy, regardless of its target weight (23b). The forms in (23b.i), in particular, require explanation.

(23) Bisyllabic targets: Mollie:
     a. Stage 1:
        [nænæ]   ‘dinner’  (18)
        [bɪbi]   ‘bib’     (18)
        [dædæ]   ‘dadda’   (18), (20)
     b. Stage 2:
        i.  [tᴧnni]/[tᴧnˑti]  ‘tummy’    (22)
            [bɪbbi]           ‘bib’      (23)
            [dɑddi]           ‘dolly’    (22)
        ii. [wæŋˑki]          ‘hanky’    (22)
            [pεnˑtǝ]          ‘pencil’   (22)
            [kwᴧnˑtri]        ‘country’  (23)

As mentioned in section 4.3.1, Holmes observes that Mollie has equal stress on both syllables in her bisyllabic words at 18 months, revealing a lack of understanding of English stress and possibly the absence of foot structure. This observation is made only in reference to Mollie’s outputs at 18 months, suggesting that the Stage 2 words have the target stressed–unstressed profile. In view of this, foot well-formedness may be responsible for the pattern in (23b): if, by this point, Mollie understands that English builds moraic trochees and that final syllables in nouns are extrametrical, the first syllable must be augmented to respect foot binarity (Goad 2001). The adult analysis of “tummy”-type words with light penults violates either FtBin, (tᴧ)Ft, or the extrametricality requirement (NonFinality), (tᴧmi)Ft. If Mollie’s Stage 2 grammar must respect both of these constraints, the first syllable must be rendered heavy, as shown in (24).

(24)

     /tᴧmi/        | FtBin | NonFinality | Faith
     a.   (tᴧni)Ft |       |     *!      |
     b.   (tᴧ)Ft   |  *!   |             |
     c. ☞ (tᴧn)Ft  |       |             |   *

Under this analysis, the forms in (23b.i), which seemingly involve complexity beyond what is attested in the target grammar, do not imply a more complex grammar built in the absence of positive evidence. Rather, the added complexity stems from Mollie needing to satisfy requirements imposed by the English grammar that conflict in CVCV words. In sum, competing demands on the grammar can lead to unexpected behavior. Whether this type of approach can account for other cases of unexpected complexity we leave to future research.

4.6 Conclusion

We began this chapter with some predictions that arise from Optimality Theory in the context of a theory of acquisition. The foundation of the latter was that children’s grammars are possible grammars. Because all constraints in OT, along with the primitives, structures, and operations they refer to, are present in the grammar of every language, it followed that children’s grammars should manipulate all and only the constraints that adult systems do. We provided support for this from sonority-driven onset selection in English and the development of branching onsets in Québec French.

We turned next to examine several types of divergent behavior, where children’s grammars seem to depart from what is expected from adult behavior. One type, which we discussed at the end of the chapter, included cases of unexpected complexity in children’s grammars, unexpected because the particular patterns observed arise in the absence of positive evidence. The case under examination, augmentation of stressed light syllables in Mollie’s grammar, was argued to be due to the child needing to satisfy competing requirements imposed by the target grammar.

Explanations for the other types of unexpected behavior required that we look beyond the target grammar. One type seemed amenable to formal explanation using the same primitives and constraints that define adult grammars. Specifically, the patterns observed could be explained by examining how perceptual and motor development interface with an adult-like phonological grammar. Velar substitution in Amahl’s grammar was shown to be likely due to misperception of intervocalic alveolar stops before dark l. Velar fronting and glide substitution were proposed to arise from vocal tract immaturity. In the latter case, poor gestural timing resulted in the cues used by the child to express a contrast not being sufficient for the transcriber to perceive the sounds involved in the contrast as different.
The final type of unexpected behavior included the CV stage and consonant harmony, two phenomena that suggest that children’s grammars do not manipulate exactly the same toolkit as adult grammars: the CV stage seemed to require that children begin acquisition without the full set of prosodic constituents available to adults; consonant harmony seemed to require that early grammars contain constraints that cannot be motivated for adult grammars.

This chapter has focused entirely on unusual patterns that are present in early grammars. In the literature, significantly less attention has been paid to processes that are commonly attested in adult grammars, yet absent from development. A striking comparison here involves two types of place harmony (Drachman 1978). As we have seen, consonant harmony, where consonants at a distance agree for primary place, is commonly attested in children’s grammars yet absent from adult grammars. Vowel harmony, where vowels at a distance agree for place, is common in adult grammars yet seemingly absent from the grammars of children learning languages without this process. Spontaneous creation of vowel harmony would not be surprising, because, like consonant harmony, it is

articulatorily advantageous in reducing the number of gestural changes required to produce a word. In view of this, does the absence of vowel harmony suggest that children’s grammars lack the mechanisms required to formally represent this process (see Goad 2001)? Perhaps, although any explanation for the absence of vowel harmony must, at the same time, permit the presence of the seemingly similar consonant harmony. We have tried to come to some understanding of processes that unexpectedly occur in children’s grammars. We leave exploration of the unexpected absence of cross-linguistically common patterns from early grammars to future research.

Acknowledgments Thanks to an anonymous reviewer and Joe Pater for helpful comments on an earlier version of this chapter. Many of the ideas contained in this chapter were presented at the Boston University Conference on Language Development, Universität Hamburg, Rutgers University, Université de Montréal, Université Paris 8, and Memorial University of Newfoundland. Thanks to the audiences for questions and comments. This research was supported by grants from SSHRC and FRQSC.

Chapter 5

Prosodic Phenomena: Stress, Tone, and Intonation

Mitsuhiko Ota

5.1 Introduction

This chapter provides an overview of research on the development of prosodic phonology, or the phonological organization beyond segments, particularly at or above the level of the word. The focus will be on three prosodic phenomena that have received much attention in developmental linguistics: namely, stress, tone, and intonation. To avoid overlap in coverage, this chapter will not discuss how these prosodic features are related to other areas of learning, such as speech and word segmentation, phonological processes, or morpho-phonological acquisition. The reader is referred to the relevant portions of Chapter 3 by Zamuner and Kharlamov, Chapter 4 by Goad, and Chapter 7 by Tessier for these issues. For each of these prosodic phenomena, the chapter first describes what is known about its course of development, including perceptual precursors in newborns and very young infants, and the subsequent emergence of general and language-specific properties. Next, it presents the outcomes of research that attempts to interpret the developmental observations within the frameworks of metrical stress theory and autosegmental phonology, models that have been central to theoretical research on prosodic phonology in the past few decades. This is followed by a critical assessment of the evidence and arguments for this approach to understanding the development of prosodic phenomena. The chapter concludes with suggestions for future directions.


5.2 Acquisition of Stress

5.2.1 Development of Stress

For the purpose of this chapter, stress is understood as a lexically assigned property of a syllable that renders the syllable a potential position of prominence (Hayes 1995; Sluijter 1995; Ladd 2008). Working from this definition, stress is a structural notion that does not necessarily translate to actual phonetic prominence; the realization of the latter depends on various factors, such as whether the stressed syllable is in or out of the focus position of the utterance. Stress is also a separate matter from the presence and shape of particular pitch patterns (e.g. a high pitch), whose association with a stressed syllable is dictated by the intonational system of the language. Despite the lack of isomorphic phonetic characteristics of stress, an underlyingly stressed syllable can be differentiated from an unstressed one as being phonetically salient in some contexts, and it is the acoustic signal of such prominence that the learner must be using to learn the structure and function of stress.

There is much evidence that sensitivity to acoustic differences associated with stress is already present in very young infants. A study using the high-amplitude sucking paradigm shows that newborns can discriminate natural samples of Italian disyllables and trisyllables differing in stress position (e.g. /ˈmama/ versus /maˈma/, /ˈtacala/ versus /taˈcala/) (Sansavini et al. 1997).1 Similarly, English-exposed 1-month-olds can detect a change in synthesized disyllables with different stress patterns such as /ˈbada/ versus /baˈda/ (Spring and Dale 1977; Jusczyk and Thompson 1978). As the stressed syllables in these studies differed from the unstressed ones in either duration (Sansavini et al. 1997; Spring and Dale 1977) or a combination of duration, amplitude, and fundamental frequency (Jusczyk and Thompson 1978), the results indicate that neonates are able to differentiate syllables based at least on duration, which is one of the key phonetic correlates of stress in mature systems (Sluijter and van Heuven 1995; Gussenhoven 2004; Kochanski et al. 2005).

One of the first signs of language-specific stress development we see in infants is their recognition of the predominant stress pattern of the ambient language. In English, stress frequently falls on the initial syllable of a word (Cutler and Carter 1987), a tendency that is particularly strong in infant-directed speech, in which stress can coincide with the beginning of a word as much as 95 percent of the time (Kelly and Martin 1994). This characteristic of English stress is picked up by infants some time between 7 and 9 months.

1  A superscript vertical line is placed before the location of primary stress and a subscript vertical line before the location of secondary stress, if any.

Experiments using the head-turn preference procedure show that 9-month-old infants exposed to English (but not 6- or 7-month-olds) prefer to listen to initially-stressed disyllables over finally-stressed disyllables (Jusczyk et al. 1993a; Echols et al. 1997). In German, another language that has an overall tendency toward initial stress, ERP experiments with the mismatch negativity paradigm have shown that a similar bias for initially-stressed disyllables may emerge as early as 4 to 5 months of age (Weber et al. 2004; Friederici et al. 2007a). The interpretation that such preferences reflect language-specific input rather than a universal bias toward initial stress is reinforced by the lack of comparable effects in infants learning languages without an extremely skewed distribution of stress patterns, such as Spanish and Catalan (Pons and Bosch 2007).

Around the same time, infants’ behaviors also begin to reflect cross-linguistic differences in the lexical contrastiveness of stress. While infants respond differently to initially- versus finally-stressed words if they are exposed to a language that uses stress contrastively in lexical items (e.g. English, Spanish, German), they do not if they are learning a language in which stress is not lexically contrastive (e.g. French) (Höhle et al. 2009; Skoruppa et al. 2009, 2011; Pons and Bosch 2010). However, there is evidence that infants learning the latter type of language, such as French, do retain sensitivity to acoustic correlates of stress; their failure to respond to stress differences imposed on words thus suggests not a decline in auditory abilities but a functional reorganization due to the non-lexical nature of stress in the language (Skoruppa et al. 2009). During the first year, infants also gradually become capable of learning stress patterns specific to individual lexical items.
In English, the first indication of this process appears in 7-​month-​olds, who can detect a stress shift in a novel word form they have been familiarized to (e.g. doˈpita → ˈdopita) (Curtin et al. 2005). Evidence that infants can link such a stress difference to a referential contrast emerges several months later. In experiments using novel word forms and unfamiliar objects, 12-​month-​olds can learn distinct word–​object pairings, even when the word forms differ only in the syllables that are stressed (e.g. ˈbedoka vs. beˈdoka, Curtin 2009, 2011), and 14-​month-​olds can learn novel pairings when the word forms differ in the position of the stressed syllable (e.g. ˈbedoka versus doˈbeka, Curtin 2010). Recognition of familiar words by English-​or French-​exposed 11-​month-​olds is slowed down when the stress is shifted to the wrong syllable (e.g. ˈbaby → baˈby), indicating that the representations of real words learned before 1 year of age already contain information associated with stress (Vihman et al. 2004). While the evidence from perception and recognition experiments may suggest that much of stress acquisition is achieved during the first year, production data tell a different story. In spontaneous real-​word production as well as non-​word imitation, children older than 1 year produce many stress errors, which usually reflect the predominant patterns of the language (English: Kehoe 1997, 1998, 2001, Klein 1984; Dutch: Fikkert 1994, Lohuis-​Weber and Zonneveld 1996; Spanish: Hochberg 1988a, 1988b). In English and

Dutch, initial primary stress is often imposed on words that begin with a syllable with no stress or secondary stress, as illustrated by the examples in (1):

(1)  Adult target word                 Child’s production    Age
     /bɑˈlɔn/ (Dutch ‘balloon’)        [ˈmɔmə], [ˈbɔmə]      1;7    (Fikkert 1994)
     /ʃiːˈrɑf/ (Dutch ‘giraffe’)       [ˈɦɑfə]               2;0    (Fikkert 1994)
     ˌkangaˈroo                        [ˈkæŋˌwu]             1;10   (Kehoe 1998)
     /ˌbɛnəˈsiː/ (English non-word)    [ˈbɛˌsiː]             1;10   (Kehoe 1998)

In slightly older children, the conflict between the predominant and more specific stress patterns can result in word productions with two locations of primary stress.

(2)  Adult target word                 Child’s production    Age
     /bɑˈlɔn/ (Dutch ‘balloon’)        [ˈbɑnˈdɔn]            2;1    (Fikkert 1994)
     /ʃiːˈrɑf/ (Dutch ‘giraffe’)       [ˈsiːˈaːf]            2;1    (Fikkert 1994)
     ˌkangaˈroo                        [ˈkæŋgɑˈwuə]          2;4    (Kehoe 1998)
     /ˌbɛnəˈsiː/ (English non-word)    [ˈbɛəˈsiː]            2;4    (Kehoe 1998)

Such errors gradually disappear from children’s production during the third and fourth years. However, stress patterns involving morphologically complex words and higher levels of prosodic domains continue to undergo development in school-aged children. An example of morphologically conditioned stress patterns would be derivational affixes that induce a predictable stress shift, such as -ic and -ity, which place primary stress on the preceding syllable (e.g. ˈmetal → meˈtallic, ˈpersonal → persoˈnality). These stress shift patterns are not fully acquired by 7- to 9-year-olds (Jarmulowicz 2006). An example of a stress pattern that operates above the level of a simple word is the contrast between compound (a ˈhotˌdog) and phrasal (a ˌhot ˈdog) stress in English. The comprehension of this contrast does not approximate adult performance until children pass the age of 9 years (Atkinson-King 1973; Vogel and Raimy 2002). The later development of these aspects of stress is not surprising, since their mastery requires not only a purely phonological understanding of stress but also an understanding of how stress interacts with morphology (e.g. affixation, compounding) and syntax (e.g. noun phrase structure).

5.2.2 Metrical Phonology and Stress Acquisition

From the review of the developmental process of stress in the previous section, it should be evident that infants and children do not simply learn stress on an item-by-item basis. They engage in some level of generalization, as indicated by the biases toward language-specific regular patterns in perception and production. This raises questions about the nature of the knowledge of stress in young learners as well as the learning mechanisms involved in the acquisition of stress. The infants’ developmental behaviors may emerge from the piecemeal learning of the distributional characteristics observed

within numerous instances of individual stress patterns that the learners encounter. Alternatively, the behaviors may be a manifestation of some abstract structural principles that underlie any human-language stress system. A related issue is the extent to which the paths of learning are guided by a priori principles of stress organization. Successful convergence on the adult state may require certain constraints on the range of possible stress systems learners entertain, or it may be sufficiently accomplished through a process that integrates the input data without a predetermined learning space.

In addressing these questions, many researchers have examined the extent to which the development of stress involves the same structural organization of mature phonological systems proposed in metrical stress theory. A central tenet of metrical phonology is that stress reflects a hierarchical structure that governs the positions of prominence in phonological forms (Liberman and Prince 1977; Hayes 1995; Kager 2007). One common way of representing this underlying structure is through grids of beats that express temporal sequencing along the horizontal axis and prominence along the vertical axis, as in (3):

(3)   (       x )              (           x )          Word level
      (x .)  (x .)             (x .)    (x .)           Foot level
       x  x   x  x   x          x  x  x  x  x   x       Syllable level
       L  L   L  L   H          L  L  L  H  L   L
      ˌhippoˈpotamus           ˌmemoraˈbilia

According to the representation in (3), syllables, potential bearers of stress, are grouped into feet, although for English nouns the final syllable is left out of the grouping (i.e. it is “extrametrical”). Feet are composed of a strong position, or its head (“x” at the foot level structure in (3)) and a weak position (“.” in (3)). In English, the head of the rightmost foot is the position of word-​level prominence (i.e. primary stress). Feet in English are also “quantity sensitive.” That is, their formation takes the internal structure of the syllable into consideration. A syllable containing a coda or a long vowel (i.e. a “heavy” syllable, marked H in (3)) can form a foot on its own; otherwise (i.e. if it is “light,” marked L in (3)) it needs to be combined with another syllable to form a foot. Metrical representations capture many fundamental characteristics of stress systems in human language, such as the tendency for stress to occur on alternating syllables, and the cumulativeness by which one syllable in a word is singled out to carry the highest prominence. Metrical representations also allow us to describe cross-​linguistic variation systematically. Languages can vary in the type of feet they have in terms of head direction (trochaic or left-​headed versus iambic or right-​headed) and quantity sensitivity. Within quantity-​sensitive languages, some treat closed syllables as heavy while others do not. Languages also differ in the direction in which syllables are parsed into feet (left-​to-​right versus right-​to-​left), what unit (e.g. syllable, consonant) or position (final or initial) is extrametrical (if any), whether the rightmost or leftmost foot is the most prominent, and whether feet that do not satisfy their size requirements are still admitted if no other options are available.
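As a concrete illustration, the left-to-right footing just described for English nouns can be sketched procedurally. This is only an expository toy (the function, its L/H weight encoding, and its outputs are our own illustration, not part of metrical theory itself):

```python
def parse_feet(weights):
    """Parse syllable weights ('L' = light, 'H' = heavy) into left-headed
    (trochaic) feet, English-noun style: the final syllable is set aside
    as extrametrical, a heavy syllable may form a foot on its own, and
    pairs of syllables are otherwise grouped left to right."""
    body = weights[:-1]                  # final syllable is extrametrical
    feet, i = [], 0
    while i < len(body):
        if body[i] == 'H':               # heavy syllable: a foot by itself
            feet.append((i,))
            i += 1
        elif i + 1 < len(body) and body[i + 1] == 'L':
            feet.append((i, i + 1))      # two syllables form a trochee
            i += 2
        else:
            i += 1                       # stray light syllable left unfooted
    heads = [f[0] for f in feet]         # trochaic: leftmost syllable is head
    return feet, heads

# hippopotamus: four light syllables plus an extrametrical final 'mus'
feet, heads = parse_feet(['L', 'L', 'L', 'L', 'H'])
print(feet, heads)   # [(0, 1), (2, 3)] [0, 2]
# primary stress is the head of the rightmost foot, syllable 2 ('pot');
# syllable 0 ('hip') carries secondary stress
```

The returned head positions correspond to the x-marks at the foot level of the grid in (3).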

On this account, the task of a language learner is to find out the specific pattern of metrical setup adopted by the target language. In a parameter-setting approach (Dresher and Kaye 1990; Dresher 1999), the dimensions of cross-linguistic difference are binary parameters (e.g. parsing direction) that can be set to different values (e.g. right-to-left or left-to-right). The child is modeled as a learner who sets the value of each parameter based on the available cues in the input. In order for the learning to settle on the correct parameter settings, some parameters have been hypothesized to have a default value (the value that is retained in the absence of evidence to the contrary), and the order in which the parameters are set has been prescribed such that the parameters whose values crucially dictate the settings of other parameters are fixed first.

In a constraint-based approach, such as Optimality Theory (Prince and Smolensky 2004), stress acquisition has been modeled as algorithmic reranking of violable constraints such as Parse (“a syllable must be footed”) and Align-Feet-Right (“each foot must be aligned with the end of a word”). An example of such a model is Robust Interpretive Parsing/Constraint Demotion (RIP/CD; Tesar 1998; Tesar and Smolensky 2000). In RIP/CD, the stress pattern assigned by the current constraint ranking is compared with the attested pattern. Whenever a mismatch is observed, the algorithm recursively modifies the metrical grammar by demoting certain constraints in the ranking until no such mismatches are detected.2 For a fuller discussion of this and other approaches to modeling phonological acquisition with constraints, the reader is referred to Chapter 30 by Jarosz in this volume.
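The demotion step at the core of such reranking models can be sketched in a few lines. This is a simplified sketch in the spirit of Tesar and Smolensky’s Constraint Demotion, not their full RIP/CD algorithm, and the constraint names and violation counts below are toy examples of our own:

```python
def demote(ranking, viol_winner, viol_loser):
    """One step of error-driven constraint demotion (simplified sketch).
    ranking: constraint names, highest-ranked first.
    viol_*: constraint -> violation count for the attested winner and
    for the form the current grammar wrongly prefers (the loser)."""
    prefers_winner = [c for c in ranking if viol_winner[c] < viol_loser[c]]
    prefers_loser = [c for c in ranking if viol_loser[c] < viol_winner[c]]
    if not prefers_winner:
        return ranking
    # highest-ranked constraint favoring the attested winner
    pivot = min(ranking.index(c) for c in prefers_winner)
    # demote every loser-preferring constraint currently ranked above it
    demotees = [c for c in prefers_loser if ranking.index(c) < pivot]
    if not demotees:
        return ranking
    new = [c for c in ranking if c not in demotees]
    at = new.index(ranking[pivot]) + 1
    return new[:at] + demotees + new[at:]

# the grammar wrongly prefers a fully footed, truncated form; the attested
# adult form violates Parse, so Parse is demoted below Max
print(demote(['Parse', 'Max'],
             {'Parse': 1, 'Max': 0},     # attested winner's violations
             {'Parse': 0, 'Max': 1}))    # current grammar's output
# -> ['Max', 'Parse']
```

After the demotion, the faithfulness constraint Max outranks Parse, so the grammar now selects the attested form, mirroring the developmental shift described in the text.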

5.2.3 Evidence for Metrical Organization in Stress Development

There are two important empirical questions regarding the proposed involvement of metrical phonology in the development of stress. First, to what extent do patterns in developmental data support the hypothesis that stress learning progresses on the basis of metrical structures? Second, does successful learning of stress necessarily require the structures prescribed by metrical theory and its allied learning models (i.e. parameter-setting, constraint reranking)? This section will address the first question.

In support of the empirical fit between developmental data and metrical phonology, several studies have shown that stages of stress development can be matched up with a systematic progression of metrical settings (Fikkert 1994; Demuth 1996; Kehoe and Stoel-Gammon 1997a; Kehoe 1998). For example, Fikkert (1994) bases her account of Dutch stress acquisition on the parameter-setting model of Dresher and Kaye (1990), and presents stage-wise analyses of production data in child Dutch. The stage during which weak–strong targets are produced as strong–weak forms (as illustrated in (1)) is explained as one in which the parameters are set to allow only one left-headed disyllabic

2  Tesar (2004b) proposes an augmentation (Inconsistency Detection) to this procedure, which allows the learner to resolve ambiguities in the data (i.e. when the attested pattern is consistent with more than one interpretation) by comparing a range of attested forms.

foot. The next stage is characterized by level stress (as illustrated in (2)), which exemplifies a metrical stage that allows more than one foot (which is now a left-headed moraic foot), but lacks a setting for the main stress parameter that assigns primary stress.3 Crucially, the unset main stress parameter should result in phonetic forms with level stress, which are not attested in the input. The presence of such forms cannot be directly explained by appealing to convergence to a dominant stress pattern.

A related but slightly different type of argument uses the size and shape of early word production as a source of evidence for metrical organization of developmental patterns. One robust observation of children’s word production is that there is an initial stage where syllables are omitted from long target words in such a way that the resulting word production is limited to two syllables at most. Furthermore, the child form only admits the dominant prosodic pattern of one- or two-syllable words in the language (e.g. a strong(–weak) pattern in English). This pattern has been attested in a number of languages including English (Allen and Hawkins 1978; Echols and Newport 1992; Schwartz and Goffman 1995; Salidis and Johnson 1997), Catalan (Prieto 2006), Dutch (Fikkert 1994; Wijnen et al. 1994), Hungarian (Fee 1995), K’iche Maya (Pye 1992), and Japanese (Ota 2003b). This stage of development follows from the general developmental hypothesis in Optimality Theory that all markedness constraints are ranked above faithfulness constraints in the initial state (Smolensky 1996b; Davidson et al. 2004). Such a ranking scheme predicts that the faithfulness constraint that prohibits deletion of input materials (Max) should be outranked by markedness constraints that require every syllable to belong to a foot (Parse-σ), feet to be either disyllabic or bimoraic (FtBin), and every foot to be aligned with a prosodic word (Align-Ft) (Pater 1997).
As demonstrated by the tableau in (4), the result is that no structure larger than a prosodic word consisting of a single binary foot can be an optimal output, hence the disyllabic maximality effect.4 Crucially, this argument rests on the assumption that children’s words have metrical constituents consisting of binary feet that group syllables into a hierarchical structure.

(4)  Input: hippopotamus       Align-Ft   Parse-σ   FtBin   Max
     a.  [(híppo)(pòta)mus]    *!         *
     b.  [(híppo)(pòta)]       *!                           ***
     c.  [(póta)mus]                      *!                ****
     d.  [(pótamus)]                                *!      ****
   ☞ e.  [(pómus)]                                          ******
3  One problem with this approach to modeling prosodic acquisition is that the putative “stages” of development in reality are not discrete as assumed in the models, but rather overlap in time. This issue has been addressed in a more recent simulation of the same Dutch data using a probabilistic, gradual learning model of constraint reranking (i.e. Boersma’s (1998) Gradual Learning Algorithm) (Curtin and Zuraw 2002).

4  The tableau is adapted from Pater (1997). Round brackets indicate footing, and square brackets boundaries of the prosodic word.
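The evaluation logic behind tableau (4) amounts to a lexicographic comparison of violation vectors under the ranking Align-Ft >> Parse-σ >> FtBin >> Max. The sketch below transcribes the tableau’s violation counts (constraint names ASCII-ized and candidate accents omitted for convenience):

```python
# Violation profiles from tableau (4), after Pater (1997)
RANKING = ['Align-Ft', 'Parse-syl', 'FtBin', 'Max']

candidates = {
    '[(hippo)(pota)mus]': {'Align-Ft': 1, 'Parse-syl': 1, 'FtBin': 0, 'Max': 0},
    '[(hippo)(pota)]':    {'Align-Ft': 1, 'Parse-syl': 0, 'FtBin': 0, 'Max': 3},
    '[(pota)mus]':        {'Align-Ft': 0, 'Parse-syl': 1, 'FtBin': 0, 'Max': 4},
    '[(potamus)]':        {'Align-Ft': 0, 'Parse-syl': 0, 'FtBin': 1, 'Max': 4},
    '[(pomus)]':          {'Align-Ft': 0, 'Parse-syl': 0, 'FtBin': 0, 'Max': 6},
}

def optimal(cands, ranking):
    # Lexicographic comparison: violations of higher-ranked constraints
    # decide first, so faithfulness (Max) is sacrificed freely here.
    return min(cands, key=lambda c: [cands[c][k] for k in ranking])

print(optimal(candidates, RANKING))   # [(pomus)]
```

With all three markedness constraints dominating Max, the heavily truncated candidate wins, reproducing the disyllabic maximality effect.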

Finally, findings that infants can generalize the underlying stress patterns of nonsense words beyond the surface forms of the familiarization stimuli may be interpreted as evidence that stress acquisition involves abstract principles of metrical structure. For instance, Gerken (2004) and Gerken and Bollt (2008) exposed 9-month-olds to 3- and 5-syllable nonsense words derived from two artificial languages, and then tested their response to tokens whose surface stress forms are different from those of the familiarization items but still in accordance with the stress assignment grammar of each language. Both artificial languages followed two metrical principles: the Weight-to-Stress principle (i.e. heavy syllables are stressed) and avoidance of stress clash (i.e. no two consecutive syllables are stressed), the latter of which took precedence over the former when their demands conflicted. They also assigned stress according to iterative parsing of syllables into disyllabic feet, albeit in two opposite directions. In Language 1, stress fell on alternating syllables beginning with the initial one, but only when it did not interfere with the Weight-to-Stress principle or avoidance of stress clash. In Language 2, the alternating pattern started with the final syllable, again subject to the Weight-to-Stress principle and avoidance of stress clash. The crucial test items were the words in (5), where the syllables in capital letters were stressed.

(5) a. do TON re MI fa
    b. do RE mi TON fa

The stress pattern in (5a) is consistent with the grammar of Language 1 but not with that of Language 2 (which would have generated “do TON re mi FA”). Conversely, (5b) is consistent with Language 2 but not with Language 1 (which would have generated “DO re mi TON fa”). The listening times for these test items differed depending on which language the infants were familiarized with.
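For concreteness, the stress-assignment logic of the two artificial languages can be emulated procedurally. The following is our own reconstruction, not Gerken and Bollt’s materials or code: the Weight-to-Stress principle applies first, and alternating stresses are then filled in directionally, skipping any syllable that would create a clash:

```python
def assign_stress(syllables, direction='L-to-R'):
    """Toy rendering of the two artificial stress grammars.
    A syllable ending in a consonant counts as heavy (simplified check)."""
    heavy = [s[-1] not in 'aeiou' for s in syllables]
    stressed = list(heavy)                 # Weight-to-Stress applies first
    n = len(syllables)
    order = range(n) if direction == 'L-to-R' else reversed(range(n))
    for i in order:                        # fill in alternating stresses
        clash = (i > 0 and stressed[i - 1]) or (i < n - 1 and stressed[i + 1])
        if not stressed[i] and not clash:
            stressed[i] = True
    return [s.upper() if st else s for s, st in zip(syllables, stressed)]

word = ['do', 'ton', 're', 'mi', 'fa']
print(' '.join(assign_stress(word, 'L-to-R')))   # do TON re MI fa  (Language 1)
print(' '.join(assign_stress(word, 'R-to-L')))   # do TON re mi FA  (Language 2)
```

Run on the test word containing the heavy syllable TON, the left-to-right pass yields (5a) and the right-to-left pass yields the Language 2 form, matching the contrast described in the text.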
The effects were observed not only when the heavy stressed syllables had the same segmental composition in the training and the test items (e.g. TON; Gerken 2004), but also when they were different, as long as the training set contained more than two examples of heavy stressed syllables (e.g. BOM, KEER, SHUL; Gerken and Bollt 2008). These results suggest that 9-month-olds are able to generalize beyond the stress pattern in the individual nonsense words to new words and new stress patterns that reflect a metrical system.

However, it has also been argued that many of the observations cited in this section as evidence for metrical phonological development can also be simulated in neural networks (Gupta and Touretzky 1994; Shultz and Gerken 2005) or exemplar-based models (Daelemans et al. 1994; Eddington 2000) without any recourse to formal metrical mechanisms. Learning simulations carried out with these computational models resemble real developmental data in important ways. First, the distribution of errors produced by the simulator during the early stages of learning matches that in children’s word production. Eddington’s (2000) simulation of Spanish stress acquisition yielded the largest number of errors for antepenultimate-stressed target words, far fewer for final-stressed targets, and the fewest for penultimate targets, correctly predicting the order of error frequencies in Hochberg’s (1988a, 1988b) experiments with 3- and 4-year-old

Spanish-speaking children. Second, these simulations often exhibit apparent patterns suggestive of metrical organization. In Daelemans et al. (1994), a simulator trained on Dutch words learned to favor stress assignment to heavy syllables, just as predicted by the Weight-to-Stress principle. In fact, the effect was more pronounced in a model using simple phonemic representations of syllables than in a model in which syllables were annotated for weight. Third, the simulations yield results that correspond with the markedness predictions of metrical theory. For example, Dresher and Kaye (1990) propose that the default (thus, unmarked) setting of foot parsing is iterative. The learner can reset the value of this parameter to the marked setting of non-iterative feet when absence of secondary stress is observed. Consistent with this prediction, the simulators in Gupta and Touretzky (1994) took longer to learn languages with non-iterative feet, even though no explicit bias against non-iterative systems was engineered into the models.

5.2.4 Metrical Theory and the Learnability of Stress

A second question related to the developmental evidence for the involvement of linguistic principles is whether the acquisition of stress necessarily requires a priori knowledge of metrical phonology in order for the learner to always arrive at the correct target system. Metrical theory limits the types of stress systems that are allowed in human language, and such restrictions on the learner’s hypothesis space may be necessary for successful learning to occur. One prediction that follows from this model is that stress patterns that are not licensed by principles of metrical theory should be unlearnable. This prediction has been tested by Gerken and Bollt (2008). Recall that in one experiment in this study, 9-month-olds were shown to generalize weight sensitivity when they were familiarized with words that favored placement of stress on closed (and hence “heavy” in metrical terms) syllables such as BOM, KEER, and SHUL. In another experiment, they familiarized 9-month-olds with a system in which stress was attracted to open syllables with a /t/ onset (TU, TO, TI). In this case, the infants did not generalize this pattern to novel items. By contrast, 7-month-olds who participated in the same experiment did learn the pattern. Given the standard assumption in metrical phonology (e.g. Hayes 1989) that syllable weight is only sensitive to rhyme structure (i.e. the syllable minus the onset), this is a rather surprising result. The interpretation offered by Gerken and Bollt (2008) is that “constraints on generalization” are not inherent in the learners, but develop over time as infants become familiar with the regularities in the ambient input, which in English include the tendency for closed syllables to attract stress. There is an alternative interpretation of these outcomes, however.
Recent research shows that, although rare, some languages do exhibit onset-​based distinctions that can be equated with syllable weight (Gordon 2005). Interestingly, in these languages, low sonority onsets (such as /​t/​, as in the Gerken–​Bollt language) tend to attract more stress than high sonority onsets (such as /​n/​). The onset-​based language in Gerken and Bollt (2008), then, might after all be a possible pattern in natural language, and metrical phonology may have to be revised to incorporate this pattern. The results

can, therefore, be reinterpreted as a demonstration of 7-month-olds’ readiness to learn this metrical option after limited exposure, because they are not yet as committed to the specific ambient pattern (i.e. the rhyme-only syllable weight system of English) as 9-month-olds are.

Another way to investigate what successful acquisition of stress depends on is through computer simulations. The number and types of constraints or biases a simulation needs in order to succeed in learning a possible stress pattern, or to fail to learn an unattested (hence presumed impossible) pattern, can tell us what types of structure must be hard-wired into the learning model. For example, Hayes and Wilson (2008) demonstrate that a learning model using only structurally adjacent (or local) information cannot succeed in acquiring non-local aspects of stress, such as the assignment of main stress to the rightmost stressed syllable, unless it is augmented by metrical representations of the kind illustrated in (3). The critical insight is that metrical representations make long-distance relationships, such as the positions of stressed syllables, local at some level of analysis. Also using model simulations, Pearl (2011) argues that even when learners adopt parametric metrical phonology, they cannot successfully converge on the target stress system through probabilistic learning unless there is some built-in bias in the learning process. In Pearl’s analysis, such a bias must guide the learner toward the input data that unambiguously lead to the correct system—a kind of data that turns out to be a small minority of the input data. Here again, the caveat raised for the connection between developmental data and metrical theory in the previous section may apply.
As pointed out by Gupta and Touretzky (1994), the existence of formal architectures such as metrical principles cannot be proven by demonstrating that they turn an otherwise unlearnable attested system into a learnable one because degrees of learnability can also be changed by models that do not impose formal constraints on the learning space. Furthermore, there is a possibility that the child’s initial target system is not the same as the adult stress grammar (Pearl 2011). If so, then any demonstration of learnability or lack thereof based on the description of the adult system may not hold true. In recent years, the issue of learnability has also been readdressed within a programmatic approach that links language acquisition with typology. Instead of asking what needs to be hard-​wired into learners for them to successfully converge on the target system based on finite samples of the language, this approach asks whether basic characteristics of language (such as the range of attested stress systems) can be explained as a product of children’s “analytic bias” (Moreton 2008). Biases in data interpretation will limit the types of generalizations that children make over a sample of stress patterns, and the outcome of that learning will result in stress systems with common properties that reflect those biases. In an attempt to explore this bidirectional relationship between learning and language universals, Heinz (2009) examined the typology of stress systems in Gordon (2002), and noted that, with a few possible exceptions, all systems are “neighborhood distinct.” Informally put, this means that if words are construed as a string of steady states (i.e. the beginning, the end, and any position between two syllables) and transitions (i.e. types of syllables, such as stressed and unstressed), no two steady states

have exactly the same transition pattern before and after them (for a technical definition of neighborhood distinctness, refer to Heinz 2009). By postulating a learner who only learns neighborhood distinct systems, we can explain why the system children arrive at is at least a typologically possible one, even though many more logically possible systems would account for the examples they encounter. In turn, the restriction on learning can explain why stress systems in human language share some basic characteristics.
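The merging idea behind this kind of learner can be sketched in a few lines of Python. This is my own toy illustration, not Heinz's formal construction: words are strings of syllable types ('u' for unstressed, 's' for stressed), each position is labelled by its "neighborhood" (the syllable type before and after it), and positions with the same neighborhood are collapsed into a single state of a finite-state machine.

```python
def neighborhoods(word):
    """Label every position (beginning, junctions, end) with the
    syllable type before and after it; None marks a word edge."""
    n = len(word)
    return [((word[i - 1] if i > 0 else None),
             (word[i] if i < n else None)) for i in range(n + 1)]

def learn(sample):
    """Merge positions that share a neighborhood into one state, and
    collect the resulting start states, final states, and transitions."""
    starts, finals, trans = set(), set(), set()
    for word in sample:
        nbs = neighborhoods(word)
        starts.add(nbs[0])
        finals.add(nbs[-1])
        for i, syll in enumerate(word):
            trans.add((nbs[i], syll, nbs[i + 1]))
    return starts, finals, trans

def accepts(machine, word):
    """Standard nondeterministic acceptance over the learned machine."""
    starts, finals, trans = machine
    current = set(starts)
    for syll in word:
        current = {q2 for (q1, s, q2) in trans if q1 in current and s == syll}
    return bool(current & finals)
```

Merging the identical interior positions of a word like "uuus" creates a loop in the machine, so a learner trained on a few final-stress words generalizes the pattern to words of any length while still rejecting, say, initial stress: after `m = learn(["us", "uus", "uuus"])`, the unseen `accepts(m, "uuuuuus")` is accepted and `accepts(m, "suu")` is not.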

5.3  Tone and Intonation

This section begins by discussing tone and intonation together, as the distinction between the two is a matter of linguistic function. In general terms, tone refers to the linguistic use of pitch to mark lexical items, and intonation refers to the non-lexical use of pitch to indicate, for example, utterance-level pragmatic distinctions (statement versus question) and phrase boundaries. All languages are known to have intonation, and some languages (about 60–70 percent of the world's languages, according to Yip 2002) also have tonal marking of lexical items. Languages that are tonal can differ in how densely they specify the pitch configuration of lexical items, ranging from nearly every syllable (or mora), as in some so-called "lexical tone languages" such as Mandarin and Yoruba, to only one location per word, as in so-called "pitch accent languages" such as Japanese, Serbo-Croatian, and Swedish.

5.3.1 Sensitivity to Pitch as a Linguistic Phenomenon

The main phonetic feature of tone and intonation is pitch, the psychophysical correlate of fundamental frequency (F0). From birth, infants exhibit sensitivity to fundamental frequency in nonlinguistic stimuli, and can discriminate pure tones that differ only in F0 (Wormith et al. 1975). But the perception of pitch cannot be straightforwardly equated with that of F0. On one hand, adult listeners perceive sounds that share the same F0 as having the same tonal height regardless of the composition of the harmonic overtones. On the other hand, if tonal complexes contain harmonics derived from a single F0, they are perceived as having the same pitch even when the F0 itself is missing from the signal. Research using operant conditioning has shown that both of these basic properties of pitch perception are present by 7 months (Clarkson and Clifton 1985; Montgomery and Clarkson 1997). By contrast, 4-month-olds are incapable of extracting pitch from the combination of overtones, indicating that the perception of pitch is still developing during the first several months (Bundy et al. 1982).

Sensitivity to pitch differences in early infancy has also been demonstrated with linguistic stimuli. Nazzi et al. (1998b) used the high-amplitude sucking procedure to show that newborns in France can discriminate two lists of disyllabic Japanese words differing

in F0 contour (ascending versus descending). Experiments using synthesized speech stimuli show that by 1–2 months, infants can also discriminate contour differences within a syllable (Morse 1972; Kuhl and Miller 1982). During the same period of development, there also appears to be a more general underlying neural change that differentiates responses to pitch differences that are linguistically relevant from those that are not. Using near-infrared spectroscopy with Japanese-learning infants, Sato et al. (2010) measured hemodynamic brain responses to falling versus rising pitch contours in pure tones and also in words. At 4 months, responses were bilateral for both the pure-tone and word-form stimuli, but at 10 months, responses were stronger in the left hemisphere only when the infants heard the contours embedded in words.

5.3.2 Development of Tone

Some perceptual reorganization with respect to linguistic pitch occurs between 6 and 9 months of age. In experiments using the head-turn preference paradigm or the stimulus alternating preference procedure, infants learning English, French, or Mandarin all respond to the rising versus low tone contrast in Thai at 6 months (Mattock and Burnham 2006; Mattock et al. 2008). But at 9 months, only the Mandarin-learning infants demonstrate sensitivity to this tonal difference. Similarly, 6- to 8-month-old Yoruba-learning infants attend more to F0 differences among monosyllables than do their English-learning counterparts (Harrison 2000). These findings suggest that infants exposed to lexical tone languages maintain a higher degree of sensitivity to certain types of pitch patterns than infants learning a language without lexical tone. However, the effect does not appear to be caused simply by a general typological difference in prosodic systems, as English-learning infants continue to show fairly good discrimination of other tonal differences (e.g. Thai rising versus falling contours). It is more likely that the perceptual difference arises from the phonetic details of the pitch contours that have linguistic functions in the ambient language. For example, the Thai contrast between rising and low tone bears some resemblance to the Mandarin contrast between rising and low dipping tone, but such a contour difference may not play a major role in English intonation. Conversely, a difference similar to the Thai rising versus falling contrast signals the difference between rising and falling intonation in English.

A number of studies have been carried out on the production of tones in languages with lexical tone, including Mandarin (Li and Thompson 1977; Clumeck 1980; Hua and Dodd 2000; Wong et al. 2005), Cantonese (Tse 1978; So and Dodd 1995), Taiwanese (Tsay 2001), and Sesotho (Demuth 1993, 1995a).
Many of these, particularly those studying Asian languages, use transcribed data or adult-listener judgments of spontaneous word production to establish an order of acquisition in the tonal inventory by comparing the (impressionistic) accuracy levels of different tones. Although there is some consistency of order within each language, no apparent universal order of acquisition can be identified among comparable tones (for example, the high level tone is acquired first

in Mandarin, but last in Thai). One general observation that can be made, however, is that tones that are variable in their surface realizations tend to be acquired later than those that are not subject to alternations. Thus, as pointed out by Clumeck (1980), the confusion between the rising tone and the low dipping tone in Mandarin-learning children, recorded as late as 3½ years, can be attributed to the variable realizations of an underlying low dipping tone, which can surface as a rising tone when it follows another dipping tone, or as a low level tone elsewhere in non-final position. In a similar vein, the so-called subject marker tone in Sesotho, which invariably differentiates first/second person (no tone) from third person (high tone), is produced fairly accurately by the age of 2 years, while the phonetically similar tonal contrast that underlies verbal roots, highly variable due to its interaction with various phonological processes, is not fully acquired until the age of 3 years (Demuth 1995a).

The development of prosodic phonology in languages with a lexical pitch accent system has been studied primarily using spontaneous speech data from Japanese (Hallé et al. 1991; Ota 2003a) and Swedish (Engstrand et al. 1991; Kadin and Engstrand 2005; Ota 2006; Peters and Strömqvist 1996). These languages feature complex pitch contours that are made up of lexical and intonational components. Studies by Kadin and Engstrand (2005) and Ota (2006) show that the speech production of Swedish children exhibits both components by 18 months, but the complex contours of Japanese children before the age of 18 months sometimes lack a critical intonational feature, namely phrase-initial lowering (Ota 2003a). In languages with a lexical pitch accent system, therefore, there is a certain degree of independence between the development of lexical pitch features and intonational pitch features.

5.3.3 Development of Intonation

By 4 to 5 months, infants begin to show evidence of sensitivity to intonational units in speech. In head-turn preference experiments, 4.5-month-olds prefer to listen to passages with pauses inserted at clausal boundaries rather than to passages with the pauses inserted in other places (Jusczyk et al. 1995). A similar preference for phrasal boundaries appears around 9 months (e.g. Jusczyk et al. 1992). Further evidence that infants have precocious sensitivity to the global prosodic well-formedness of utterances comes from findings showing that infants as young as 2 months remember a list of words better when it is said as an intonational phrase rather than as a prosodically disconnected sequence (Mandel et al. 1994; Mandel et al. 1996).

A crucial functional feature of intonation is that variations in pitch patterns signal non-lexical differences. In an extensive review of the literature on early spontaneous production, Snow and Balog (2002) conclude that there is no clear evidence that children acquire the form–meaning/context mapping of intonational pitch patterns before the onset of word production. Much of what might be perceived as "intonation" before word production is essentially paralinguistic; in other words, it consists of modulations of pitch, amplitude, and/or speech rate that indicate the emotional states of the speaker,

rather than linguistic contrasts. It has also been suggested that the pitch contours of early words may be lexically bound (Halliday 1975; Crystal 1979; Galligan 1987); that is, pitch patterns are learned as if they were part of the lexical properties of the word. However, even these studies report productive non-lexical use of pitch from around 17 to 18 months. Children's understanding of the non-lexical nature of intonational phonology has also been demonstrated in experimental work using the novel-word/novel-object pairing paradigm. For example, 2½-year-old English-learning children treat novel word forms as different words when they have different vowels, but not when they have different pitch contours (rise–fall versus low–fall) (Quam and Swingley 2010).

The range of structures and meanings signaled by intonation is quite wide, and not surprisingly, the development of the different functions of intonation is not uniform. Functions that approximate adult-like performance before school age include the use of intonation to differentiate illocutionary acts such as statement versus question (Patel and Grigos 2006), and to mark information structure such as newness (MacWhinney and Bates 1978; Wonnacott and Watson 2008), topic (Chen 2011), and contrastive focus (Hornby and Hass 1970; Wells et al. 2004; Müller et al. 2006; Chen 2007). Müller et al. (2006), for example, show that German 4-year-olds produce focused elements ("Peter bakes a cake" contrasted with "Eva wants to bake cookies") with a higher F0 than non-focused elements. Not all aspects of intonation for information structure are acquired at the same pace, however. For instance, 4- to 5-year-old Dutch-speaking children are capable of producing adult-like contours for topic-marking, but not for focus-marking in sentence-final position (Chen 2011).
In general, the use of intonation to demarcate the phrasal structure of sentences seems to be acquired later than the functions mentioned above. Comprehension studies show that English-speaking 4- to 5-year-olds fail to reliably use prosodic cues to disambiguate structures such as [Tap][the frog with the flower] versus [Tap the frog][with the flower] (Snedeker and Trueswell 2004). Although 5- and 7-year-olds understand the syntactic difference between [[pink and green] and white] versus [pink and [green and white]] as indicated by intonational phrasing (Beach et al. 1996), studies of their ability to produce the same prosodic difference have yielded mixed results (Katz et al. 1996; Wells et al. 2004).

5.3.4 Evidence for Autosegmental Representation in Tone and Intonation Acquisition

There are two key properties of tone and intonation that may dictate the ways in which they are acquired. First, there is considerable evidence that the phonological elements behind tone and intonation are inherently independent of other phonological structures. Thus, lexical tones can be mobile (e.g. a particular tone can move from one segmental position to another) or stable (e.g. a particular tonal pattern can remain even when the associated segment or syllable is deleted), and can participate in one-to-many or many-to-one associations with other structures (e.g. a particular tone can be 'spread' over many

segments or syllables) (Yip 2002). Second, because all tonal and intonational phonology has a single phonetic correlate (i.e. pitch), the mapping between the acoustic signal and the underlying phonology can be complex. A particular pitch configuration can be a marker of a lexical distinction, a phrase boundary, or an utterance type, but may also be a composite of all of these. These properties of tone and intonation have been successfully modeled in autosegmental phonology, which postulates discrete tonal elements lined up along a separate tier from the rest of the phonological representation, both for lexical tone (Leben 1975; Liberman 1975; Goldsmith 1976) and for intonation (Pierrehumbert 1980; Beckman and Pierrehumbert 1986; Ladd 2008). The independence of the tonal tier from the segmental tier can be illustrated by the following examples from Mende. The three words shown in (6) differ in the number of syllables as well as in the surface contour of pitch.5 Nevertheless, autosegmental representations allow us to see that they have the same underlying tonal structure: H(igh)–L(ow).

(6)  mbû     kényà     félàmà
     HL      H  L      H  L
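The one-melody, variable-syllable-count pattern in (6) follows from the classical association conventions (associate tones to syllables one-to-one, left to right; spread the final tone over leftover syllables; dock leftover tones onto the final syllable). The following is a toy sketch of those conventions, with my own ASCII stand-ins for the Mende forms:

```python
def associate(syllables, melody):
    """Autosegmental association: one-to-one, left to right; then
    spread the final tone rightward over extra syllables, or dock
    leftover tones onto the final syllable."""
    links = [[] for _ in syllables]
    for i in range(max(len(syllables), len(melody))):
        s = min(i, len(syllables) - 1)  # clamp to last syllable (docking)
        t = min(i, len(melody) - 1)     # clamp to last tone (spreading)
        links[s].append(melody[t])
    return list(zip(syllables, links))
```

All three words in (6) then realize the same H–L melody: a falling contour on the single syllable of mbû (`associate(["mbu"], ["H", "L"])` links both tones to one syllable), one tone per syllable for kényà, and the L spread over the last two syllables of félàmà.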
As an example of an autosegmental analysis of the interaction between different types of tonal and intonational phonology, (7) shows the two 'word accents' of Stockholm Swedish, which exhibit variable contour realizations depending on whether or not they appear in a focus position (including their citation forms). This complex pattern can be explained as a combination of lexical pitch accents (either H*L or HL*, where the asterisk indicates the tone that is associated with the stressed syllable), an intonational phrasal accent (H), and an utterance-final L% tone lined up on the same tonal tier. The lexical pitch accents are always present, but the phrasal accent occurs only when the word is in a focus position (Bruce 1977, 1987).

(7)              Accent I      Accent II
    Focus        nummer        nunnor
                 HL*HL%        H*LHL%
    Non-focus    nummer        nunnor
                 HL*           H*L
                 'numbers'     'nuns'

5  A circumflex indicates a falling tone, an acute accent a level high tone, and a grave accent a level low tone.
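The compositional logic of (7) can be stated in a few lines. This is my own toy rendering of Bruce's analysis as summarized in the table: tones are written as strings, and, following (7), the phrasal H and the utterance-final L% appear only in the focus forms.

```python
def word_contour(accent, focused):
    """Compose the surface tonal string for a Stockholm Swedish word
    from its lexical pitch accent plus (under focus) a phrasal accent
    and an utterance-final boundary tone."""
    lexical = {"I": ["H", "L*"], "II": ["H*", "L"]}[accent]
    phrasal = ["H"] if focused else []    # phrasal accent only under focus
    boundary = ["L%"] if focused else []  # boundary tone as shown in (7)
    return "".join(lexical + phrasal + boundary)
```

For example, `word_contour("II", focused=True)` yields "H*LHL%", the focus contour of nunnor in (7), while `word_contour("II", focused=False)` yields just the lexical accent "H*L".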

One can see how an understanding of pitch-related phonology in terms of atomic tonal units might assist the learning of tone and intonation. If quantitatively continuous F0 information is abstracted into strings of discrete units, this can constrain the possible phonological structures that can be postulated for the attested data.6 Complex contours that reflect patterns associated with a range of diverse functions, including lexical contrasts, phrasal boundaries, and discourse semantics, can be decomposed into units of mapping between tonal sequences and their functions. General patterns behind alternations in pitch can be learned as autosegmental processes (e.g. spreading of tonal associations, avoidance of identical adjacent tones). In this way, autosegmental phonology provides a plausible model of the acquisition of tone and intonation. Whether the development of these phenomena actually involves autosegmental mechanisms is, of course, an empirical question. The literature offers several types of supporting evidence.

The first type of evidence for autosegmental structure comes from various observations pointing to the separation of pitch patterns from segmentals. If tone were acquired as an inherent feature of vowels, sensitivity to non-native tonal contrasts should begin to attenuate around the same time as sensitivity to non-native vowel contrasts. Infants' sensitivity to non-native vowel (quality) contrasts typically declines before 6 months (Kuhl et al. 1992; Polka and Werker 1994). However, the results of Mattock and Burnham (2006) and Mattock et al. (2008) indicate that the analogous perceptual reorganization for tone takes place later, some time between 6 and 9 months. Such findings suggest that the perceptual development of tones is independent of that of vowel quality.
Some non-adult-like pitch patterns found in early production also indicate the separation of tonal and intonational phonology from segmental structures (Demuth 1993, 1995a; Ota 2003a). One such example in Demuth's analysis of early Sesotho involves the application of the Obligatory Contour Principle (OCP; the ban on adjacent identical tones on the tonal tier). In Sesotho, when two underlying high tones become adjacent, one of them becomes a low tone. In autosegmental terms, delinking of a high tone occurs to satisfy the OCP, and the toneless tone-bearing unit receives a default low tone. The example in (8a) is an utterance produced by a 2½-year-old Sesotho speaker, who omitted the subject marker from the target structure (given in (8b)) (Demuth 1993: 297).7 The omission would have made the high tone of /ná/ adjacent to the high tone on the first syllable of the verb stem (/bídíkìsà/). Instead, the stem-initial syllable was produced with a low tone (/bìdíkísà/), resulting in a structure that respects the OCP. Crucially, the tonal specification of a syllable in the adult model has changed in the child's production, indicating the independence of tonal and segmental representations in the child's phonological system.
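The delink-and-fill repair can be sketched as a toy rule over abstract tone strings. This is my own illustration of the general mechanism, not a model of the full Sesotho morphology:

```python
def apply_ocp(tones):
    """Repair adjacent identical high tones: the second H is delinked
    and its tone-bearing unit surfaces with a default low tone."""
    out = []
    for tone in tones:
        if tone == "H" and out and out[-1] == "H":
            out.append("L")  # delinked H realized as default L
        else:
            out.append(tone)
    return out
```

For example, an underlying H-H-L sequence surfaces as H-L-L: the second high tone is delinked and its syllable receives the default low, just as the stem-initial syllable in (8a) surfaces low after the high tone of /ná/.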

6  How learners can translate the continuous and time-varying acoustic signal of pitch into discrete phonological units is not a trivial question. But recent computational work promises a solution (Yu 2011).
7  In Demuth's transcription, only high tones are marked (by an acute accent), and all other vowels are assumed to have low tone. For the sake of clarity, low-tone vowels have been marked here with a grave accent. 1sPN = first-person singular pronoun; 1sSM = first-person singular subject marker; PRS = present tense.

(8) a.  ná bìdíkísà
    b.  nná kè-à-bídíkìsà
        1sPN 1sSM-PRS-turn
        'Me, I'm revolving (it)'

Occasionally, children come up with an idiosyncratic system of phonology that they appear to have created spontaneously. Such original language games or word plays offer unique evidence for autosegmental representations of tonal structures. Yue-Hashimoto (1980), for instance, discusses the case of a Mandarin-speaking child who productively engaged in a word game from the age of 2 years. The word game involved the application of a fixed tonal pattern to real words, as shown in (9).8 All disyllabic words received a HL pattern, regardless of the original tones (9a). Monosyllabic words were reduplicated and also forced into the HL template (9b), unless the nucleus contained two vowels, in which case the vowels were split into separate syllables with an LH pattern (9c). Although data like these may not generalize to 'normal' language development, they serve as a striking demonstration of children's ability to manipulate tonal elements separately from segments.

(9)    Adult target word    Word play
  a.   t'au35 wan35         t'au55 wan11
  b.   yən35                yən55 yən11
  c.   k'ua53               k'ɯ11 a55
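The child's game, as described in (9), is regular enough to state as a toy procedure. This is my own sketch: the vowel inventory is a hypothetical stand-in, and the u→ɯ vowel adjustment seen in (9c) is simplified away.

```python
VOWELS = set("aeiouəɯ")  # assumed toy vowel inventory

def word_play(syllables):
    """Apply the child's fixed tonal template (Chao tone numbers):
    disyllables -> 55 11; monosyllables reduplicate into 55 11,
    unless the nucleus has two vowels, which split into 11 55."""
    if len(syllables) == 2:
        return [(syllables[0], 55), (syllables[1], 11)]
    syll = syllables[0]
    vowels = [ch for ch in syll if ch in VOWELS]
    if len(vowels) == 2:
        split = syll.index(vowels[1])  # break before the second vowel
        return [(syll[:split], 11), (syll[split:], 55)]
    return [(syll, 55), (syll, 11)]
```

The point of the sketch is that the game is stated entirely over tonal templates: the segmental material is carried along unchanged while the tones are overwritten, which is exactly the kind of manipulation an autosegmental tonal tier makes natural.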

Another type of argument for autosegmental representations in early phonology relates to the idea that pitch movements in (adult) tonal and intonational phonology are best represented as sequences of discrete tonal elements rather than as the shapes and slopes of contours. A corollary of this model is that pitch contours tend to have phonetically stable turning points that anchor the pitch movement. For example, the so-called 'Accent II' in citation forms of Swedish words (given in (7)) has been analyzed as having four underlying tones: H*LHL%. Bruce (1977, 1987) shows that the phonetic constants in such pitch configurations are the F0 values of the points that correspond to the L and H of the hypothesized underlying tonal structure (i.e. the turning points superimposed as black dots on the contour in (10); the contour itself is not reproduced here).

(10)  núnnor  'nuns'
      H*LHL%

Using second-degree polynomials defined by high and low turning points, Ota (2006) examined the spontaneous speech of Swedish-speaking 16- to 18-month-olds, and found that more high–low–high turning point sequences can be identified in Accent II words than in other productions. Furthermore, the higher the F0 peak of the stressed syllable was, the larger the drop after the peak, demonstrating the relative stability of the low F0 point, a presumed phonetic realization of the L tone between the two H tones.

8  Tones in these examples are marked with the standard number notation used for Asian languages: 35 = 'high-rise,' 55 = 'high level,' 11 = 'low level.'
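The notion of a high–low–high turning point sequence can be made concrete with a small detector over sampled F0 values. This is my own sketch, far simpler than Ota's polynomial fitting:

```python
def turning_points(f0):
    """Return (index, kind) for points where the F0 track changes
    direction: 'H' for a local peak, 'L' for a local valley.
    Plateaus (equal neighboring samples) are not counted."""
    points = []
    for i in range(1, len(f0) - 1):
        if (f0[i] - f0[i - 1]) * (f0[i + 1] - f0[i]) < 0:
            points.append((i, "H" if f0[i] > f0[i - 1] else "L"))
    return points

def has_high_low_high(f0):
    """True if the contour contains a peak-valley-peak sequence,
    as expected for an Accent II (H*LHL%) word."""
    kinds = "".join(kind for _, kind in turning_points(f0))
    return "HLH" in kinds
```

A stylized Accent II track such as [100, 130, 110, 125, 95] yields the turning-point sequence H, L, H, whereas a simple fall like [130, 120, 110, 100] has no turning points at all.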

5.4  Summary and Future Directions

The purpose of this chapter was to review some key descriptive findings on the development of stress, tone, and intonation, and to discuss the extent to which the acquisition of these phenomena can be understood in light of the structural representations and formal organizational principles proposed within metrical theory and autosegmental theory. There now exists a substantial body of descriptive work in this area that provides some understanding of how stress, tone, and intonation develop over time, from the pre-linguistic sensitivities shown by very young infants to late-acquired aspects of these prosodic properties.

Further progress in this area is contingent on better developmental data, in terms of both coverage and quality. There is a noticeable lack of work outside the usual stock of familiar languages (e.g. English, Dutch, German, French, Chinese, Japanese). Particularly problematic is the paucity of information on the acquisition of languages with an iambic stress system or an African-type tone system. Without access to developmental data from such systems, it is difficult to test the full range of typological predictions that follow from metrical approaches to prosodic acquisition. The quality of developmental data also needs to improve. Much of the production data we currently have is based on transcriptions, which are not only difficult to verify but also prone to adult-listener bias. This is a particularly serious concern for stress, which has a complex relationship with its phonetic correlates. As we do not fully understand how stress is acoustically signaled in early production, adult transcribers who listen for stress using its adult phonetic correlates may miscode the data. The issue of phonetic realization also applies to laboratory-based developmental research.
For instance, most experimental studies do not fully control for the acoustic dimensions that vary between the stressed and unstressed syllables in the stimuli. As a result, we cannot be sure what cues the participant infants or children use to determine the presence or position of stress.9

Turning now to more theoretical issues, it remains an open question whether the acquisition of prosody is best understood within metrical and autosegmental phonology, despite the many attempts to make the case. With respect to the development of stress, we have seen that the affinity between the predictions of metrical theory and developmental data does not in itself warrant a causal relationship between them, since

9  A recent study that addresses this failing is Lintfert and Möbius (2010).

at least some of the developmental observations are also consistent with models that do not assume the abstract structures featured in metrical phonology. On the other hand, it is clear that these frameworks provide extremely useful representational devices for describing and characterizing the developing system and its differences from the adult system. It is an added advantage of the metrical approach that the hypothesized developing stress systems can be subsumed within a general architecture of phonology that has been proposed to account for mature systems. The issues of learnability and its connection to metrical structure spawn empirical questions that can direct our exploration of the potentially inherent mechanisms (constraints or biases) involved in the development of stress. Furthermore, they provide an impetus to examine how language acquisition is related to the typological properties of phonological systems attested in human languages.

Similarly, several types of arguments can be put forward to support the idea that tone and intonation are acquired as sequences of tonal elements linearly organized independently of other phonological structures, as proposed in autosegmental theory. Unlike research on stress development, however, these claims have not been systematically pitted against developmental models that do not rely on pre-wired structural units. This is probably because our understanding of tone and intonation per se (and hence of their development) has generally lagged behind that of stress. But this state of affairs is rapidly changing with recent developments in prosodic modeling. Most of the work in this area explores ways to best model pitch contours, using, in some cases, the same discrete and static tonal categories as autosegmental phonology (e.g. ToBI; Silverman et al. 1992), but in other cases more articulatorily or acoustically motivated targets or parameters (e.g.
PENTA, Xu and Wang 2001; INTSINT, Hirst and Di Cristo 1998; Tilt, Taylor 2000). Computationally informed work is also emerging in the area of tonal and intonational acquisition, addressing non-​trivial questions such as how tonal categories can be learned from continuous speech signals, and what acoustic information, types of learning functions, and potential structure in the hypothesis space may be necessary for tonal and intonational learning to succeed (Gauthier et al. 2007, 2009; Yu 2011). Both types of research are likely to shed new light on the representational requirements and learning mechanisms involved in the development of tone and intonation.

Part II

THE ACQUISITION OF MORPHOLOGY

Chapter 6

Compound Word Formation
William Snyder

Languages differ in the mechanisms they provide for combining existing words into new, "compound" words. This chapter will focus on two major types of compound: synthetic -ER compounds, like English dishwasher (for either a human or a machine that washes dishes), where "-ER" stands for the cross-linguistic counterparts of agentive and instrumental -er in English; and endocentric bare-stem compounds, like English flower book, which could refer to a book about flowers, a book used to store pressed flowers, or many other types of book, as long as there is a salient connection to flowers. For both types of compounding we find systematic cross-linguistic variation, and a literature that addresses some of the resulting questions for child language acquisition. In addition to these two varieties of compounding, a few others will be mentioned that look like promising areas for coordinated research on cross-linguistic variation and language acquisition.

6.1  Compounding—A Selective Review

6.1.1 Terminology

The first step will be defining some key terms. An unfortunate aspect of the linguistic literature on morphology is a remarkable lack of consistency in what the "basic" terms are taken to mean. Strictly speaking one should begin with the very term "word," but as Spencer (1991: 453) puts it, "One of the key unresolved questions in morphology is, 'What is a word?'" Setting this grander question to one side, a word will be called a "compound" if it is composed of two or more other words, and has approximately the same privileges of

occurrence within a sentence as do other word-level members of its syntactic category (N, V, A, or P). A given compound will be called "synthetic" if it contains morphemes corresponding to both a verb and a VP-internal argument of the verb. Thus, dishwasher is synthetic because it corresponds to the VP [wash(es) dish(es)].1

A compound will be called "endocentric" if it contains a "head" morpheme that determines its morphosyntactic features and general semantic type. In flower book, for example, the head is book. Hence the compound is a noun (like book), and names a type of book. Alternatively, a compound is "exocentric" if there is an understood head that is not pronounced. For example, in the English exocentric compound pick-pocket, which means a person who picks pockets, the understood head would be person. Still another possibility is that a compound is "doubly headed," as in Spanish hombre lobo, literally 'man wolf,' meaning something that is simultaneously a man and a wolf (i.e. a werewolf). In this particular example the plural form is hombres lobos 'men wolves,' which makes it especially clear that the two nouns are both functioning as heads. (The word hombre lobo is also an example of an "appositional" compound, because it is composed of two Ns that are both possible descriptions of the individuals named by the compound.)

The term "bare stem" will be used for any form that (i) could be used as an independent word (or at least, could be so used after the addition of inflectional morphology), and (ii) is the form that inflectional morphology would combine with, but (iii) does not yet bear any inflection. Thus, the English word plum is a bare stem. The regular plural suffix -s can be attached directly to this form to make plum-s, which is no longer a bare stem. An example of a "bare-stem compound" is the English word worm cans (e.g. 'tin cans used for storing fishing bait'), where the modifier worm is a bare stem (cf. worm-s).
In contrast to English, many languages require a morphosyntactically more complex structure, involving more than bare stems, to express a comparable meaning. For example, French would need to substitute an expression like boîte aux vers ('can for-the worm-s'), which includes a prepositional phrase. Following Bauer (1978), this type of expression will be called a "compound phrase." Another example involving a structure more complex than a bare-stem compound is the Hebrew expression kufsat tola'im 'can-of worm.' Here the head N kufsat appears in a form known as the "construct state." For each noun in Hebrew there exists an inflected, construct-state form, which is similar to a genitive-case form but involves the word that refers to the (literal or metaphorical) possession, not the possessor. Note that in many cases the construct form happens to be homophonous with the bare-stem form, but unlike bare-stem compounds, construct-state expressions permit the syntax to see and manipulate the words contained within them. For example, two nouns in the construct state can share a single modifier in a way that is impossible in bare-stem compounds (cf. English *a black -bird and -board, for 'a blackbird and blackboard').2

1  Note that some authors refer to non-synthetic compounds (including what will here be called "bare-stem compounds") as "root compounds."
2  Along with compound phrases and constructs, a number of further alternatives to bare-stem compounding can be seen in Snyder (2001: Appendix A). The seminal work on the Hebrew construct state, with its mixture of phrasal and word-like properties, is Borer (1988). A brief synopsis of some of her key evidence and arguments can be found in Spencer (1991: 449–50).

6.1.2 Creativity

The next step will be to highlight a distinction that deserves far greater attention than it currently receives: a process of word formation either is, or is not, creative. Here "creative" is being used in the sense of "the creative aspect of human language": the fact that we can freely create a new sentence, potentially one that has never been used before, and reasonably expect a listener to understand it. A specific type of compound word formation is likewise creative, in a given language, if it is available for automatic, impromptu use whenever a new word is needed to fit the occasion.3

In contrast, it can happen that the lexicon of a language includes numerous compound words of a given type, even though there is no corresponding process of compound word formation available for creative use. This is the situation with bare-stem endocentric compounding in French. In a detailed analysis of French speakers' use of compounds, Bauer (1978: especially 83–4) highlights the fact that French speakers only create new endocentric compounds when they are deliberately trying to coin a new word. He also notes that endocentric compounds in the Germanic languages, like English frog man (in the sense of an underwater diver), sometimes give rise to "calques" like homme grenouille (lit. 'man frog') in French. Crucially, where frog man in English could mean any number of things, depending on the context (e.g. a person who sells frogs), in French it can only mean a diver.

Unfortunately the standard term in the morphology literature, "productive," is highly ambiguous: some authors use "productive" to mean creative, while others, for example, define the productivity of a given type of compound in terms of the number of instances they find listed in a dictionary. An example of the confusion that can result is discussed in Snyder (2012: 281–2).4

3  Spencer (1991: 322–4), referring to creativity as "productivity," comments on morphologists' surprising lack of attention to cross-linguistic variation in the creativity of compounding: "In general, the problem of productivity (in its various senses) is not raised in the theoretical discussion of root compounds, …" (1991: 322); "One particularly interesting, but largely unexplored, question is what governs differences in root compounding between languages" (1991: 323–4); "Finally, not all the root compound types found in English can properly be said to be productive. The question of what governs productivity and whether it's necessary to distinguish productive from non-productive compounding types has not been discussed extensively in the theoretical literature" (1991: 324).
4  Bauer (2001) provides a book-length treatment of the various notions of "productivity," and argues that a distinction is crucially needed between what he calls "availability" and "profitability." Availability refers to

6.1.3 Cross-linguistic Variation

Thus, for any specific type of compound word formation, a first point of cross-linguistic variation is whether it is creative. A second major point of variation is whether it can

92   William Snyder apply recursively. For example in English, the endocentric compound flower book may be handed back as an input to the process of endocentric compounding, to obtain a new compound like [[flower book] collection]. Namiki (1994) proposes the generalization that endocentric compounding is creative, in a given language, if and only if it is recursive.5 In Snyder (1999) I built on this idea and proposed that evidence of recursion in endocentric compounding may be significant for the child learning a language, as an indication that endocentric compounding is creative. Roeper and Snyder (2004) took the idea further, and suggested that evidence of recursion might be used by the learner much more broadly, as an indication that any particular grammatical process is creative. Namiki’s generalization has held up well in the years since it was proposed, but the details of recursive endocentric compounding have turned out to be a bit more complex. As discussed in Roeper and Snyder (2005), Swedish is a language with endocentric bare-​ stem compounding similar to that found in English, and much as in English, the process is both creative and recursive. Yet, the recursion in Swedish compounds is subject to a restriction: Any left-​branching node that is itself branching must be followed by the “linking element” -​s- ​(Josefsson 1997: 60). This can be seen in (1) (based on Roeper and Snyder’s ­example 5): (1) a.   barn [bok klub] ‘child [book club],’ or ‘book club for children’ b.  *[barn bok] klub ‘[child book] club,’ or ‘club for (collectors of) children’s books’ c.  [barn bok]-​s klub ‘[child book]’s club,’ or ‘club for (collectors of) children’s books’ Here we see that morphologically simplex modifiers, as in (1a), can combine with either simplex or complex right branches, and no linker occurs. Yet a morphologically complex (i.e. 
branching) modifier, as in (1b,c), triggers the insertion of an -​s-​between the complex modifier (barn bok) on the left and its corresponding head (klub). In (1c) the Swedish element -​s-​has been translated into English as the “Saxon genitive,” -​’s, but the correspondence is only approximate. For one thing the more direct English translation of (1c), namely children’s book club, has the element in a different location than Swedish. More to the point, there is no general need for a genitive marker in English left-​branching compounds, as can be seen from familiar examples like [student film] committee. German likewise has endocentric bare-​stem compounding that is both creative and recursive, and like Swedish it sometimes employs a linking element that is realized as -​s-​. Yet the distribution of this element is clearly different than in Swedish, because it is not required in left-​branching compounds. For example it is absent from the direct counterpart to (1c), Kinderbuchclub (literally ‘children book club’), on the relevant interpretation, ‘club for people interested in children’s books.’ The rules governing the distribution whether a process of word-​formation is available to the speaker whenever the need for it might arise. Thus, availability corresponds to what I am calling “creativity,” and is likewise taken to be a binary property. 5  Cross-​linguistic variation in the creativity (and recursivity) of bare-​stem endocentric compounding will be taken up in section 6.3 under the heading of “The Compounding Parameter.”

of linking elements in the different Germanic languages, and children's acquisition of these rules, would be an interesting pair of topics to investigate in tandem.6

Another point of cross-linguistic variation in endocentric compounding is the linear position of the head. The Germanic languages all follow a Right-hand Head Rule (cf. Williams 1981): Whenever the structure of a compound branches, the daughter-node on the right is the head. Williams proposed his version of this rule as a cross-linguistic universal, but its universality appears to be contradicted by languages like Khmer and Thai, which are the mirror image of English: They permit creative, endocentric, bare-stem compounding, but with the head on the left. In other words, exactly where an English speaker might create a compound like [[fish sauce] bottle], a Khmer speaker can create the compound [daup [tuk trey]] '[bottle [sauce fish]]' to mean the same thing. Hence, in addition to determining that the target language has creative (and fully recursive), endocentric bare-stem compounding, a child acquiring English or Khmer will need to determine whether an endocentric compound's head appears to the left or the right of a modifier.

Quite interestingly, Beard (1995, 1996) has observed the following pattern across languages: The order of the modifier and the head in an endocentric compound is almost always the same as the (default) order of an attributive adjective and the noun that it modifies within a noun phrase. For example, in English both an attributive adjective in a noun phrase (the red house), and the modifier in an (endocentric) N-N compound (flower book), appear on the left. In Khmer the attributive adjective (pteah krahom, literally 'house red'), like the modifier in an endocentric compound, appears on the right.

Yet, there do exist exceptions: In a survey of more than 50 languages, Beard found that the pattern was fairly robust, but he discovered several apparent counterexamples among the languages of the Americas. Of particular concern are Dakota, Kiowa, Koasati, Navajo, and the Michoacán variety of Nahuatl, all of which require attributive adjectives to appear on the right in a noun phrase, but require the modifier to appear on the left in a compound.7

In Snyder (2012: 286–7) I noted that the same problem arises for Basque, and suggested one type of solution: Perhaps Beard's Generalization really only applies to bare-stem compounding. In this case Basque would no longer be a counterexample, because the type of endocentric compounding it employs does not in fact contain a bare-stem modifier. While the modifier in a Basque compound is often homophonous with a bare stem, de Rijk (2008: 853–7) has demonstrated that in many other cases the two are not homophonous. Compounding in Basque is thus reminiscent of construct-state expressions in Hebrew. More precisely, every Basque noun has a special morphological form that we might call "modificational," and for many nouns the modificational form is distinct from the bare-stem form (i.e. the form to which inflectional affixes attach). Moreover, in the Basque version of endocentric compounding, any noun that is not the head must appear in its modificational form. For example, Basque has a noun meaning 'human,' which is gizon in its bare-stem form. (Thus the ergative-indefinite form, for example, is gizon + ek = gizonek 'humans.') Yet, the modificational form is giza-, and this is the form that must be used in an N-N compound: giza kuntza 'human language.' According to de Rijk, the modificational form can usually be derived from the bare-stem form by applying one of a small number of morpho-phonological rules, but exactly which rule will apply to a particular noun is somewhat unpredictable. Hence, Basque does not in fact have a grammatical process of bare-stem compounding, and would not be a counterexample to the amended version of Beard's Generalization.8

Interestingly, Beard (1996: 24) observes that in all four of Dakota, Kiowa, Koasati, and Navajo, a possessor noun (unlike an attributive adjective, but just like the modifier in a compound) precedes the head of its noun phrase.9 This observation raises a possibility that to my knowledge has not been tested: Perhaps what Beard is glossing as bare-stem compounds in each of these "exception" languages will, on closer examination, turn out to be something akin to a construct-state expression in Hebrew, or an "izafet"

6  While the German linker -s- is homophonous with the -s ending of the German genitive singular, it is clearly distinct. Genitive -s is restricted to masculine and neuter nouns, while linking -s- also occurs with feminine nouns (e.g. Hochzeitstorte 'wedding-s-cake,' where Hochzeit 'wedding' is feminine). Nübling and Szczepaniak (2008), who examined a one-billion-word sample of written German, report that (i) in most cases, German compounds do not contain a linker; (ii) German provides a variety of linking elements, but most of them (e.g. -es-, -er-, -e-, -ens-) are restricted to specific, lexicalized compounds; and (iii) only -s- and -n- are "productive" linkers in present-day German. They propose a phonological account: "[T]he occurrence of linking elements, above all -s-, strongly depends on the quality of the preceding pword: The more distant it is from the ideal pword (i.e. a trochee, the second syllable containing [ə] or [ɐ]), the more probable a linking -s- becomes." Note too that Kinderbuchclub contains an -er that could be a plural marker or a linking element; either way it brings the pword Kind 'child' much closer to Nübling and Szczepaniak's ideal. (The -s- in Hochzeitstorte, however, may be present to prevent adjacency of the two [t] sounds.)
7  Bauer (2001) conducted an independent cross-linguistic survey on the position of modifiers in nominal compounds, and the position of attributive adjectives in noun phrases. Unlike Beard he reports only a weak association, and suggests it might be stronger (though still imperfect) if one looks at possessors rather than adjectives. This outcome may be due to Bauer's methodological choices: (i) heavy reliance on descriptive grammars, rather than native speakers (cf. discussion of West Greenlandic and Turkana, 2001: 698); and (ii) extreme inclusiveness (i.e. when in doubt, call it a compound), as seen in his decision to ignore differences in creativity (cf. discussion of Hebrew, 2001: 698). Bauer arrives at some surprising conclusions: (i) endocentric compounds are all but universal (2001: 697: "This is the majority pattern … in the languages of the world, and there are very few languages which do not have compounds of this type"); and (ii) the order of head and modifier is highly unstable (2001: 697: "Although it might be expected that [the ordering of modifier and head in compounds] would be fixed in any individual language, that is the case only in about half of my sample from any of the areas used"). For present purposes I will instead rely on Beard's work.
8  The linking elements of German compounds may be reminiscent of Basque modificational forms, because their likelihood of occurrence depends on the particular modifier, but they differ in important ways: (i) in German the linker is usually optional (except in certain lexicalized compounds, and after specific derivational suffixes); (ii) the linker has no effect on the bare-stem form that precedes it; and (iii) Nübling and Szczepaniak (2008) found that the likelihood of a linker is predictable from the modifier's phonological shape. Where Basque has a process combining a modificational form with a bare-stem head, German has bare-stem compounding plus the possibility of something akin to epenthesis.
9  Beard does not clearly indicate the position of possessional Ns in Michoacán.

construction in Turkish, where the head and/or the modifier is actually in a special, possession-related form. As discussed by Spencer (1991: 314–19 and 449–50), both Hebrew constructs and Turkish izafets (especially the so-called "indefinite" izafets) share a number of properties with bare-stem compounds, but neither type of expression can be constructed out of bare stems. In fact, Spencer indicates that neither Turkish nor Hebrew has any "productive" (i.e. creative) bare-stem compounding whatsoever. Hence, it seems possible that tying Beard's Generalization more closely to bare-stem compounding (or perhaps to creative bare-stem compounding) would exclude the problem cases identified in Beard (1996).10

Turning now to synthetic compounds, two important types are exemplified by (i) the English compound dishwasher, and (ii) the corresponding French compound lave-vaisselle, literally 'wash(es) dishware,' which functions as a noun meaning 'dishwasher' (usually in the sense of a dish-washing machine). Type (i) is creative in English, but unattested in the Romance languages. Type (ii), in contrast, is widely attested in the Romance languages, where it represents a creative process of word formation. English, however, allows only a few, lexicalized examples of this kind, such as scarecrow and killjoy.11

In fact, Beard (1995, 1996) argues that his generalization about head-modifier order in compounds can explain the heavy reliance on V-N compounding in Romance. The reasoning goes as follows. In Romance languages, the default position for an attributive adjective is normally to the right of the N heading the NP. Therefore, by Beard's Generalization, the modifier in a compound should also be on the right side. At the same time, according to Beard, any derivational suffix like English -er, or its counterpart -eur in French, is normally subject to two positional requirements: It must be affixed to a V, and it must appear at the rightmost edge of its word. A compound like *lav-eur vaisselle is ruled out in French, because the stem vaisselle (which for present purposes is functioning as a "modifier" inside the compound word) needs to follow its head laveur, but if it does, the suffix -eur is no longer at the word's right edge. (Of course, suffixing -eur to vaisselle would solve the one problem but create another.) According to Beard, the solution in French and other Romance languages is usually to leave the morpheme -eur unpronounced: hence, lave-vaisselle. Beard suggests that V-N compounds in French are masculine precisely because the unpronounced suffix -eur is masculine.

I would add that this account might also explain why V-N compounding is not a creative process of word formation in English: Perhaps the suppression (i.e. non-pronunciation) of an overt morpheme like -eur is allowed only as a last resort. In English one can readily create a compound like dishwasher (i.e. the modifier dish and the suffix -er are on opposite sides of the V), hence there can be no suppression of -er, and a compound noun like *dish-wash or *wash-dish cannot be created (except perhaps as a conscious coinage).

10  Note that Beard was not specifically looking for bare-stem compounds. Hence, where Beard (1996: 22) glosses the Dakota expression čhą-luí as the bare-stem compound "tree skin" (meaning "bark"), for example, this does not necessarily signal a bare-stem compound. In fact, Beard's gloss for another of his nominal compounds contains a possessive form, and strongly resembles a Turkish indefinite izafet: hociɬí im-layki, 'star its-dung' (= 'meteor'), from Koasati (1996: 21, ex. 26b, emphasis mine). Note also that for these problematic languages Beard sketches a different approach, in which (for example) the position of a nominal modifier relative to a deverbal nominal head in a compound is determined either by the position of a nominal modifier (i.e. argument) in relation to a V (the "input" category), or by the position of a nominal modifier in relation to an N (the "output" category), depending on a parameter-setting. In the problematic languages the "input" setting (unlike the "output" setting) yields appropriate word order. Interestingly, Lardiere (1998) reports that this input–output parameter can account for certain synthetic compounds she obtained from adult English-learners in an elicited production study.
11  Somewhat closer to English dishwasher (type i) is the French phrasal compound laveur de vaisselle 'washer of dishware.' The head noun is laveur, i.e. laver 'to wash' plus agentive -eur, and a preposition de ('of') is inserted, probably for case reasons. The definite article la would ordinarily accompany vaisselle, but is omitted in laveur de vaisselle, just as it is in lave-vaisselle. The gender of a French compound like lave-vaisselle is consistently masculine (like laveur), even when the noun on the right-hand side is feminine (like vaisselle).

6.2 The Acquisition of Synthetic -ER Compounding

6.2.1 What Must Be Acquired?

Turning now to acquisition, consider synthetic compounds like English dishwasher and French lave-vaisselle. What exactly does the child acquire? Based on the material in the preceding section, the forms of synthetic compounding that are potentially part of the child's target language will be connected to several broader characteristics of the language. Assuming (for expository purposes) that the specific proposals in the preceding section are entirely correct, some of the things the child will need to determine are whether the target language normally positions modifiers to the left or the right of a head noun, and whether derivational morphology (like an agentive affix) is prefixal or suffixal. If the answers are "Left, Suffixal" as in English, then it becomes possible that the language will have compounds like dishwasher, where the modifier dish and the suffix -ER are on opposite sides of the V. Moreover, if suppression of the agentive suffix (as in French lave-vaisselle) is grammatically possible only as a last-resort operation, as I speculated at the end of section 6.1.3, then it should not be available as a creative process of word formation in a Left-Suffixal language. Nonetheless, such a language could have forms like English washer of dishes, where the logical object dishes is outside the complex word, in a separate phrase. (Note too that a Right-Prefixal language like Swahili should have the option of allowing -ER style synthetic compounds, with a linear order of "ER-Wash-Dish.")

If the broader characteristics are "Right, Suffixal" as in French, compounds like dishwasher should be blocked, but V-N compounds like lave-vaisselle might exist in the language. Forms like French laveur de vaisselle ('wash-er of dishware') might also exist, because in this type of expression the agentive suffix -eur is adjacent to both the V and the right edge of its word. This is because the modifier vaisselle is now outside the word laveur, in a separate phrase. (Note that in a Left-Prefixal language—if such languages exist—an -ER synthetic compound should likewise be disallowed, since both the modifier and the prefix would be competing for the left edge of the compound word. Such a language might use suppression of the derivational prefix to obtain a form like "Dish-Wash," the mirror image of lave-vaisselle.)

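The parameter space just described can be summarized in a small schematic sketch of my own (the function name and the string notation, with "(er-)" marking a suppressed affix, are expository conventions, not claims from the chapter):

```python
# Schematic sketch (my own; not code from the chapter) of section 6.2.1's
# predictions about -ER synthetic compounds, given two broader properties
# of a language. "(er)" in parentheses marks a suppressed (unpronounced) affix.

def predicted_er_compound(modifier_side, affix_position):
    """modifier_side: 'left' (English) or 'right' (French) of the head noun.
    affix_position: 'suffix' or 'prefix' for derivational -ER morphology."""
    table = {
        ("left", "suffix"): "Dish Wash-er",     # English dishwasher
        ("right", "suffix"): "Wash-(er) Dish",  # French lave-vaisselle: -ER suppressed
        ("right", "prefix"): "er-Wash Dish",    # predicted for a Right-Prefixal language
        ("left", "prefix"): "Dish (er-)Wash",   # hypothetical Left-Prefixal language
    }
    return table[(modifier_side, affix_position)]

print(predicted_er_compound("left", "suffix"))   # Dish Wash-er
```

The table simply encodes the two constraints discussed in the text: the modifier must sit on the language's modifier side, and the derivational affix must sit at its designated word edge, with suppression as the last-resort repair when the two conflict.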
6.2.2 Puller-wagons?

To examine children's acquisition of synthetic compounding in English, Clark et al. (1986) ran an Elicited Production (EP) experiment on 48 children, aged 3–6 (12 children in each of four age groups). Surprisingly, they found that 3-year-olds sometimes produced forms like puller-wagon in place of the target wagon-puller. At first glance this suggests that children might, for a certain period, have a grammar that differs from adult English (and from other languages that have been examined) by allowing a derivational suffix (-er) to be separated from the right edge of its word by intervening material (i.e. wagon). It also suggests that the child's grammar at this point places "modifiers" (in terms of Beard's system) to the right of the head, at least in the case of synthetic compounds—even though there are no corresponding reports of children putting attributive adjectives after the noun (e.g. the book blue is over there), as Beard's proposals would lead us to expect. Indeed, Lardiere (1998: 298) presents these specific child-language findings as an especially serious empirical threat to Beard's Generalization.

Before sounding the alarm, however, we should carefully inspect the nature of Clark et al.'s evidence. Each research method has its own strengths and weaknesses, and these need to be kept firmly in mind when evaluating the results of any study. In the case of EP, one of the weaknesses is that a child may respond to uncertainty about the point of grammar being tested, or to excessive processing demands of the experimental task, by producing forms that are not genuinely permitted by her grammar. In places where we can perform side-by-side comparisons (cf. Snyder 2007: 96–104), this is a far more common occurrence in EP studies than in studies of Spontaneous Speech (SS). In SS it seems the child can much more easily confine herself to the types of utterances that her current grammar actually allows.
(On the other hand, if one is interested in a linguistic structure that is used only rarely, even by children who definitely know it, then EP can provide a vastly greater quantity of data than SS.)

In Clark et al.'s (1986) study, the experimenter asked the child questions of the form, "What could you call a girl who pulls wagons?" This may seem like a simple enough question, but the task of constructing the target form, wagon-puller, at the very least requires the child's language-processing system to identify the relevant portion of the

prompt's hierarchical structure (i.e. [VP [V pull-s] [DP [N wagon-s]]]), remove all inflectional morphology (yielding [VP [V pull] [NP [N wagon]]]), add an agentive, category-changing suffix onto the verbal element (yielding [NP [N [V pull] -er] [NP wagon]]), and then extract the head ([N wagon]) of the complement phrase in order to left-adjoin it to the next-higher head in the structure, yielding the complex noun [N [N wagon] [N [V pull] -er]]. If a 3-year-old child were to become fatigued during all this computation, the occasional production of a derivationally intermediate form, like [NP pull-er [NP wagon]], would be unsurprising (and would constitute a performance error, not a signal that the child's grammar is non-target-like). On the other hand, if an individual child consistently gave answers of the precise form seen in puller-wagon, then it would be quite important to test the hypothesis that this child (and probably other children as well) had temporarily adopted an incorrect grammar, indeed a grammar that Beard's approach to synthetic compounding would not have predicted possible. Evidence that children routinely adopted such a grammar (at least temporarily) during the acquisition process would support Clark's (1993: 150) proposal that this error-type is characteristic of a middle stage along the normal path to the adult-English system. Yet, Clark et al. found no such children. Of the 48 children in their study, 33 exhibited a consistent response pattern (i.e. at least 75 percent use of a single pattern) on at least one of agent nouns or instrument nouns (12 nouns of each type were elicited from every child). Of the 33 consistent children, the younger ones (3–4 years) predominantly used bare-stem endocentric compounds (e.g.
water person and garbage machine, as well as the less adult-​like hugger man for a man who hugs, and feeder machine for a machine that feeds), while the older ones (5–​6 years) predominantly used adult-​like synthetic compounds of the form O V-​er (e.g. wall-​builder, box-​mover). Crucially, not a single child met the 75 percent-​consistency criterion with compounds of the form V-​er O. This strongly suggests that the 3-​and 4-​year-​olds who occasionally produced V-​er O compounds were not doing so because they had temporarily adopted an incorrect grammar, in which “V-​er O” was the correct way to build an -​ER compound. Rather, the error most likely reflected the child’s limited processing capacity.
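The sequence of processing steps described above can be made concrete with a toy sketch (my own, for exposition; the helper names and the simplistic de-inflection rule are assumptions, not an implementation from the chapter):

```python
# Toy sketch (my own illustration) of the processing steps that the prompt
# "a girl who pulls wagons" demands of the child, per the derivation above.

def strip_inflection(word):
    """Toy de-inflection rule: remove a regular -s (pulls -> pull, wagons -> wagon)."""
    return word[:-1] if word.endswith("s") else word

def derive_synthetic_compound(inflected_verb, inflected_object):
    verb = strip_inflection(inflected_verb)    # [V pull]
    noun = strip_inflection(inflected_object)  # [N wagon]
    agent = verb + "-er"                       # [N [V pull] -er]
    # Finally, left-adjoin the object noun to the derived head,
    # per the Right-hand Head Rule of English compounds:
    return f"{noun} {agent}"                   # [N [N wagon] [N [V pull] -er]]

# A child who fatigues before the final adjunction step is left with the
# derivationally intermediate form "puller wagon" -- the attested error type.
print(derive_synthetic_compound("pulls", "wagons"))  # wagon pull-er
```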

6.2.3 Rat(*s)-eaters

Returning to the issue of exactly what information is acquired during the acquisition of synthetic compounding, note that so far the cross-linguistic generalizations from section 6.1.3 have suggested strong possibilities, but not certainties, about the types of synthetic compounding that will be allowed if the target language has certain broader characteristics. This is because a type of compounding that is possible in principle might be blocked by independent lexical or grammatical characteristics of the language. In addition, while certain types of information, like the position of attributive adjectives relative to the noun they modify, should be readily available from any number of different sentence-types that occur frequently in child-directed speech, other essential

information, like the specific derivational morphemes that can be used to create synthetic compounds, may be impossible to determine except from examples of actual synthetic compounds. Hence, knowing the broader characteristics of the language may help the child to know what to look for, but cannot provide all the answers. Given that simple memorization will be a large part of learning the realization of -ER in the child's target language, why should we suppose that a child is ever doing anything more than simple memorization? For example, how can one exclude the possibility that early on, the child is simply "muddling through" with superficial, non-grammatical strategies, like memorizing specific compounds that have occurred in the input, and modifying them only slightly if at all? How can we tell when the child is using the same, grammatical system that we attribute to the adult? A very elegant answer is provided by Peter Gordon's seminal (1985) "rat-eater" experiment. The study was based on the observation that adult English-speakers usually find it unacceptable to include regular inflectional morphology on the noun inside a synthetic compound. For example, a monster who loves to eat rats can certainly be described as a rat-eater, but is quite unlikely to be called a rats-eater.12 This contrast seems to disappear, however, if we use a noun whose plural form is irregular: a monster who loves to eat mice could be either a mice-eater or a mouse-eater. Crucially, the child acquiring English seldom if ever hears irregular plurals inside compounds, and therefore receives little or no evidence to suggest that irregular plurals, in contrast to regular plurals, are generally acceptable in compounds.
Gordon searched for five specific high-frequency nouns with irregular plurals (mouse, man, tooth, foot, goose) in the Kučera and Francis (1967) word count, which was based on a carefully constructed sample of approximately one million words of printed English. Within the sample, these particular nouns were used a total of 153 times as non-head elements in compounds of various types, but 150 of those uses employed the singular form. Hence, if young children do not yet have the adult grammar of English synthetic compounding, and are simply modeling their speech on frequent surface characteristics of their input, they should strongly prefer the singular form of both regular nouns and irregular nouns when used inside a synthetic compound. In contrast, if young children already follow the same grammar of synthetic compounding as do adult English-speakers, then it may be possible to elicit irregular nouns (in contrast to regular nouns) as either a singular or a plural inside a synthetic compound.

To test these predictions, Gordon ran an EP study on 33 3- to 5-year-olds (divided by age into three groups of 11). The task used a Cookie Monster puppet and a variety of small objects. Elicitation of rat-/rats-eater or mouse-/mice-eater went roughly as follows:

experimenter:  Do you know who this is? . . . It's the Cookie Monster. Do you know what he likes to eat?
child:         Cookies.
experimenter:  Yes—and do you know what else he likes to eat?—He likes to eat all sorts of things . . . . [Experimenter shows a single toy rat / mouse.]
experimenter:  Do you know what this is?
child:         A rat. / A mouse.
experimenter:  [Showing four rats / mice] Here we have a bunch of . . . what?
child:         Rats. / Mice.
experimenter:  What do you call someone who eats X? [where X is the child's plural form]
child:         A rat-eater. / A rats-eater. // A mouse-eater. / A mice-eater.
experimenter:  Do you think Cookie Monster is a X-eater? [where X is whatever the child said]
child:         Yes!

12  Note, however, that regular plurals are sometimes acceptable within compounds, when certain conditions are met. This point will be discussed later in this section.

Notice that the experimenter's prompt, "What do you call someone who eats rats/mice?," contains a plural form of the noun, and can be expected to bias the child very strongly towards using a plural in his or her response, whenever this is possible. The experiment used the same five irregular nouns already mentioned (mouse, man, tooth, foot, goose), five matched regular nouns (rat, baby, bead, hand, duck), four pluralia tantum (clothes, pants, (sun)glasses, scissors) and another four matched regular nouns (toy, shirt, shoe, knife). Pluralia tantum, like irregular plurals, are acceptable as non-heads inside compounds.

The children at all ages made a clear distinction between the regular plurals on the one hand, and the irregular plurals and pluralia tantum on the other. Out of 297 opportunities (33 children × 9 regular items), the children produced in total only six regular plurals inside compounds (2.02 percent of the opportunities). Interestingly, five of these six were produced by 5-year-olds, one was produced by a 4-year-old, and none were produced by the 3-year-olds. Gordon speculates that the older children's metalinguistic awareness may have played a role. In contrast, out of 165 opportunities (33 children × 5 irregular items), the children produced a total of 36 irregular plurals inside compounds (21.8 percent of the opportunities). With pluralia tantum, out of 132 opportunities (33 children × 4 pluralia tantum items) the children produced a total of 68 pluralia tantum inside compounds (51.5 percent of the opportunities). In most other responses to pluralia tantum items, and especially in responses to glasses and scissors, the children used a morphologically singular form, as in scissor-eater, which is also possible for adult speakers. Thus, the findings provide powerful evidence that very young children are already using the same grammar of synthetic compounding as adults.13
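As a quick arithmetic check (my own, for exposition, not part of Gordon's paper), the reported percentages follow directly from the raw counts:

```python
# Reproducing the response rates reported from Gordon's (1985) study
# from the raw counts given in the text.

children = 33

# Regular nouns: 5 matched regulars + 4 regular controls = 9 items per child.
regular_opportunities = children * 9                  # 297
print(round(100 * 6 / regular_opportunities, 2))      # 2.02 (six regular plurals)

# Irregular nouns: 5 items per child.
irregular_opportunities = children * 5                # 165
print(round(100 * 36 / irregular_opportunities, 1))   # 21.8

# Pluralia tantum: 4 items per child.
pt_opportunities = children * 4                       # 132
print(round(100 * 68 / pt_opportunities, 1))          # 51.5
```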

13  The contrasts between regulars versus irregulars and pluralia tantum were found to be statistically reliable using chi-square tests, where each subject was classified according to his or her dominant response pattern. Gordon (1985: 85) indicates there was only a single error of the puller-wagon type in his entire study. (The child’s age is not specified.) Hence, even the youngest group of children in his study (N = 11, age 3;02 to 4;00, 18 items per child) made fewer such errors (at most 1; i.e. 0.5% of opportunities) than did the 3-year-olds (N = 12, age 3;00 to 3;10, 24 items per child) in Clark et al.’s (1986) study (10 errors; i.e. 3.5% of opportunities). This may result from Gordon’s holding the “_-eater” frame constant, and thereby reducing the processing demands on the child. Clahsen et al. (1992) successfully replicated Gordon’s study in German, where nominal inflection is more complex than in English. For adults, possible plural markers include -s, -en, -er, and -e. Clahsen et al. argue that adults take the “regular” plural ending to be -s, but certain children in their study took it to be the more frequent ending -en. Either way, each child avoided his or her “regular” ending inside compounds, and allowed the others.

Compound Word Formation    101

Where then does the “mice-eater / *rats-eater” contrast come from? The general answer is that it must somehow follow from Universal Grammar. The best specific answer available in the early 1980s, when Gordon began his research, was that of Paul Kiparsky, who had been the first to point out the phenomenon. Kiparsky’s (1982) account was based on Level Ordering (building on earlier work of Siegel 1979 and Allen 1978), within the framework of Lexical Phonology. Roughly speaking, the idea was that there are three distinct, ordered levels of word formation processes in the lexicon. Level 1 is the access-point for stored forms, including those with irregular inflection. Compounding and regular derivational morphology (i.e. derivational affixation that does not alter the phonological form of the base) occur at Level 2. Regular inflectional morphology is inserted at Level 3. Thus, Level 1 can supply eat and any of the forms mouse, mice, and rat to Level 2. There, the verb eat can be combined with the derivational morpheme -er; and then any of the compounds rat-, mouse-, and mice-eater can be constructed and passed onward. At Level 3 the stem rat could in principle be combined with the regular plural-marker -s, but not if it is already contained inside a larger word. Hence, *rats-eater cannot be generated. Gordon, in his 1985 paper, was careful to point out that the theoretical importance of his findings would remain, even if Kiparsky’s Level-Ordering story came to be replaced by another account of the regular/irregular contrast. Indeed, the general Level Ordering thesis has since fallen out of favor. (See Spencer 1991: 179–183 for a brief synopsis of the principal reasons.) One challenge to the thesis comes directly from plurals-within-compounds. On the one hand, Senghas et al. (1991) have demonstrated that adult English-speakers judge irregular plurals to be far more acceptable than regular plurals in the modifier position of a novel compound. Furthermore, as we have just seen, children as young as 3 years old exhibit the same pattern in Gordon’s EP task. On the other hand, there exist a great many exceptions to this pattern, and they belong to several different categories. Two cases were actually addressed in Kiparsky’s original proposals. First, as discussed earlier in this section, pluralia tantum include morphologically regular plural marking, but are necessarily lexically listed and therefore must be handled at Level 1. As expected, this type of plural-marking is unproblematic within compounds. A second type of exception concerns examples like Human Resources Department. Here Kiparsky proposed an explanation in terms of recursion. The idea is that a morphologically complex output of the word-formation system (e.g. the plural compound

102   William Snyder

human resources) sometimes comes back and gets listed as a single unit in the lexicon. When it does, it can be included within another compound in the same way as a plurale tantum. Also, as a consequence of being lexically listed, the form can have either a more specialized meaning, or an entirely different meaning, than its compositional semantics would indicate (cf. human resources, meaning “personnel”).14

Alegre and Gordon (1996) discuss a number of additional types of exception. One concerns modification by an inherently quantificational noun. For example, a week-long seminar can last only one week, while a weeks-long seminar must last longer than one week. Another type of exception, which the authors term the “heterogeneous” type, includes examples like publications catalog, where the use of plural-marking seems to indicate that the publications listed in the catalog are heterogeneous in nature (i.e. that multiple publication-types are represented). This intuition is brought out fairly sharply by the observation that a mineralogist, who presumably studies many different kinds of rocks and is interested in the differences among them, can be called a ?rocks expert much more readily than a simple pile of rocks can be called a *rocks pile. The main type of exception that Alegre and Gordon investigate is a case where the plural modifier in a nominal compound crucially needs to be modified by an adjective: [new books] shelf versus *books shelf. The authors’ analysis of this case is as follows. First, they argue that Kiparsky’s proposal of a recursive loop is on the right track, but needs to allow full-on syntactic objects to be stored as lexical items. This position is supported by examples like the how-can-he-be-a-seat-of-the-pants-executive-if-he-needs-experience absurdity.
Second, they propose that the human sentence-processing system prefers not to posit the type of structure that is required in order to have a syntactic object inside a morphological object inside a syntactic object. Hence, faced with a choice between [N [NP red rat] eater] (= ‘eater of red rats’) and [NP red [N rat eater]] (= ‘red eater of rats’), it favors the latter parse. Yet, if the modifier bears regular plural marking, then a parse involving the compound noun *rats-eater (i.e. [NP red [N rats eater]] (= ‘red eater of rats’)) is strongly dispreferred, and the parse that interleaves syntactic and morphological structure becomes the best available option: [N [NP red rats] eater] (= ‘eater of red rats’).15 Alegre and Gordon argue that far from undermining the work in Gordon (1985), the discovery of these apparent exceptions to “Kiparsky’s Generalization” simply indicates that the contributions of UG must be even richer than previously thought, in order to account for the complex system (whatever it is exactly) that adults are using. Moreover, the discovery of even finer-grained patterns in the adult data, such as the interaction of regular plural-marking with the presence/absence of an attributive adjective, creates wonderful new opportunities to assess how early the adult system is actually present in the child.

To this end the authors carried out a new child-language experiment, this time checking children’s preferred interpretation of structurally complex compounds like red rat eater and red rats eater. The method was picture-selection, where the child chose between two side-by-side images. For example, in one image there might be a red monster eating a blue rat, while in the other there would be a blue monster eating a red rat. The subjects were 36 children (12 at each age) at the ages of 3, 4, and 5 years. Each child judged four test items, receiving either four items with plural marking (red rats-eater) or four items without it. The main findings were as follows. When there was no plural marker in the compound, children in all three age groups preferred the “non-recursive” interpretation (e.g. red eater of blue rats), as expected if children have the adult system in place very early. When the plural marker was present, the children’s preference reversed, again as expected if children master the system very early. At age 3, the children on average selected the “recursive” interpretation 1.0 times (out of four opportunities) when the modifier was singular, and 3.33 times when it was plural. For 4-year-olds the corresponding averages were 0.67 and 2.5, and for 5-year-olds they were 1.5 and 2.5. (The contrast between the singular and plural conditions was robustly significant by ANOVA; there was no significant effect of Age and no significant interaction of Age with Condition.)

14  A crucial assumption is that this type of recursion is constrained in some way, because otherwise forms like *rats-eater would be (or at least, would rapidly become) fully acceptable, as a result of feeding the NP rats back into the lexicon as a stored expression.

15  As noted above, a third proposal is needed to block structures like [N [NP rats] eater]. If these were allowed we would have the unwanted prediction that *rats-eater is fully acceptable (or at least as acceptable as red rats eater, given that the derivation would be essentially the same). The authors’ proposal on this point is not entirely clear to me, but seems to involve the idea that an unmodified nominal modifier with plural marking is necessarily given the “heterogeneity” interpretation (which would then be anomalous in the case of a rat-eater).
In sum, the work of Gordon and Alegre has provided powerful evidence that English-learning 3-year-olds already know a great deal about the morphology and syntax of synthetic compounds—indeed, far beyond what they could realistically have inferred from their input. Clark et al.’s (1986) well-known finding that English-learning preschoolers occasionally produce compounds of the form puller-wagon (section 6.2.2) seems best explained in terms of task demands, rather than erroneous decisions about the grammar of -ER compounds, given that none of the children in the study produced the error consistently. On the other hand, Clark et al. obtained a robust finding that when English-learning children are asked to create a novel word to name a type of person or a type of physical instrument, younger children (3- and 4-year-olds) mainly employ bare-stem compounds like water-person, while older children (5- and 6-year-olds) mainly employ adult-like synthetic compounds (e.g. box-mover).

6.2.4 Synthetic Compounds in Romance Languages

Turning to languages other than English, we have seen that French (together with the other major Romance languages) does not allow a morpheme-by-morpheme counterpart to English synthetic compounds like dishwasher. Recall that this fact can be explained in terms of Beard’s Generalization. If the “unmarked” position for French

adjectives is post-nominal, then modifiers in compounds should likewise follow the compound’s head, and will end up competing with the derivational suffix -eur for the right edge of the word. Hence, as soon as the child recognizes that the target language puts modifiers to the right of a head noun, and that derivational morphology is suffixal, he or she should (ideally) be on the lookout for exocentric V-N compounds like lave-vaisselle, and for phrasal compounds like laveur de vaisselle, both of which are in principle possible in this language type. We might therefore expect one or both of these forms to be relatively early acquisitions. Clark (1993: ch. 10) reviews studies eliciting novel agent nouns and/or instrument nouns that had been conducted with children acquiring English, Icelandic, Hebrew, French, or Italian. For French there were no EP data available for children younger than 5, but for Italian there was work by Lo Duca (1990) with children ranging from 3;03 to 7;10. Clark (1993: 194–195) reports that Lo Duca used an elicitation prompt of the form, “Come si chiama quello che fa le pizze?” (‘What do you call a person who makes pizzas?’), and that more than two-thirds of the responses from 3-year-olds used exocentric V-N compounds. As one looks at progressively older children, one finds increasing use of derivational suffixes, which is consistent with adult practice. (Adults reportedly find V-N compounds more appropriate for instruments than for agents.) Thus, Italian V-N compounding appears to be well-established by around age 3 (and quite possibly earlier) as a fully creative process of word-formation. As the child gradually acquires the derivational morphemes that Italian provides for creating novel agent nouns, it seems the use of V-N compounds for agents is increasingly preempted by the more specific terms that become available. 
The earliness of V-​N compounding, and the fact that children initially use it even more extensively than adults do, are both consistent with the idea that broader characteristics of the language may have “primed” the child to acquire V-​N compounding very early.

6.3  The Acquisition of Bare-Stem Endocentric Compounding

Turning now to bare-stem compounds, some of the questions that a language learner will need to answer are the following:

(2) a. Is bare-stem endocentric compounding a creative process in my target language?
    b. If so, can it be used recursively?
    c. Is the head on the left or the right side?
    d. Are there any linking elements that occur inside the compounds?
    e. If so, what determines their distribution?

Here I will focus on (2a,b). Given that Namiki’s Generalization (discussed in section 6.1.3) has held up well over the years since it was proposed, I will also assume that creativity and recursivity can be treated as a package.

6.3.1 Origins of “The Compounding Parameter”

Beginning with Snyder (1995), and in subsequent work up to the present, I have been investigating what I refer to as “The Compounding Parameter” (TCP). In its current formulation (e.g. Snyder 2012), TCP concerns the availability of a mechanism that is essential for, among other things, the semantic interpretation of novel endocentric compounds. To a first approximation, however, we might think of TCP as a simple yes/no specification of whether bare-stem endocentric compounding is creative. In Snyder (1995) I first proposed that certain “complex predicates,” including adjectival resultatives (e.g. wipe the table clean) and separable-particle constructions (pull the lid off), are possible only in [+TCP] languages. This hypothesis was suggested by work on Dutch (Neeleman 1994) and Afrikaans (LeRoux 1988) arguing that in those languages, both adjectival resultatives and verb-particle combinations often have the morphological status of compound words. Even though the same is not true in English, I began to explore the possibility that there is nonetheless, even in English, a more abstract connection between complex predicates and compounding. A comparison of Germanic languages with Romance languages suggested that creative, bare-stem endocentric compounding (which is available in all the Germanic languages but none of the major Romance languages) might be a relevant type of word formation, because all the Germanic languages have adjectival resultatives and separable-particle constructions that are comparable to the ones in English, while none of the Romance languages do.
A small-scale survey of the world’s languages lent plausibility to the idea, because for the languages sampled, there was at least a one-way implication: Every language with adjectival resultatives and/or separable particles had bare-stem endocentric compounding as a creative process of word formation.16 My next step was to check for a connection between complex predicates and compounding in language acquisition. I decided to focus on the acquisition of English, because at the time the CHILDES database (MacWhinney and Snow 1990) already included more than a dozen longitudinal corpora for English, while for any other language there were considerably fewer. I also decided to focus on separable particles with transitive verbs, because unlike adjectival resultatives they are used frequently by both adults and older children; and because with transitive verbs there is often a direct object intervening between the verb and the particle, which reduces the likelihood that a child’s verb-particle combination is simply an unanalyzed “chunk.”

16  For details, see Snyder (2001) and the update regarding Basque in Snyder (2012).

Given the high frequency of particle constructions, and as it turned out, novel N-N compounds, it was possible to identify a fairly precise point in each child’s corpus where the child went from never using the given structure, to using it frequently, correctly, and with a variety of lexical items. (In Snyder 2007 this point is referred to as the age of FRU, for “First use, followed soon after by Regular Use with varied lexical items.”) In the case of novel compounds, there was a worry that the frequency might be too low to see a sharp change at the point of acquisition, but fortunately the children went through what Brown and Hanlon (1970: 33) termed a “brief infatuation”: when they discovered bare-stem endocentric compounding, they treated it like a new toy. Hence there was no difficulty in identifying any child’s FRU. The result was an extremely tight correlation, with a best-fit line that closely approximated an identity function: the FRUs for compounding and particles were consistently very close together in time, and often occurred during the same recording session. This pattern has held up quite well as more longitudinal corpora have become available from children acquiring English. In Snyder (2007: 92–93) I reported an updated version of the analysis, based on the 19 highest-quality longitudinal corpora available at that time. The ages of FRU for compounding ranged from 1.85 to 2.59 years, and for particles ranged from 1.85 to 2.56 years. Pearson’s r was .937, indicating that 88 percent of the variability in ages of FRU for either compounds or particles was predicted by the ages of FRU for the other (t(17) = 11.1, p < .001). Of course, one needs to be careful when interpreting this type of correlation, because children go rapidly from knowing very little about their target language to knowing a great deal. As a consequence, many different measures of language ability will show some degree of correlation. 
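The reported figures are internally consistent: for a Pearson correlation, the conventional significance test is t = r·sqrt(n−2)/sqrt(1−r²) on n−2 degrees of freedom, and r² gives the proportion of shared variance. The following sketch (our own check, not part of the original analysis) recovers the published values:

```python
import math

n, r = 19, 0.937   # number of children; reported Pearson correlation

shared_variance = r ** 2                           # proportion of variance explained
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t statistic on n-2 df

print(f"r^2 = {shared_variance:.2f}, t({n - 2}) = {t:.1f}")
```

This prints r² ≈ 0.88 and t(17) ≈ 11.1, matching the figures in the text.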
I therefore obtained a quantitative measure of general linguistic development, and applied the statistical technique of partial correlation. More precisely, I determined that the children, at the point of their FRU for verb-particle constructions, had an average MLU (mean length of utterance) of 1.919 morphemes. For each child I then determined the age at which the MLU had first reached 1.919 morphemes. This provided a general developmental predictor of when each child would begin using verb-particle combinations. Partial correlation then allowed me to see what would happen if I removed all the variability in the ages for particles and compounds that could be explained in terms of this general developmental predictor, and looked for a correlation in whatever variability was left over. The correlation between the MLU-based measure and the ages of FRU for verb-particle combinations was quite strong (r = .8690), which indicated that the MLU measure was a good control. Nonetheless, when all the variation that could be explained by the MLU-measure had been “partialed out,” there was still a robust correlation between compounds and particles: rpartial = .799, t(17) = 5.31, p < .001. Hence, the association of particles with creative, endocentric compounding goes well beyond what one would expect on general developmental grounds, and instead seems to be a deeper, grammatical connection.
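For readers unfamiliar with the technique: a first-order partial correlation can be computed directly from the three pairwise correlations. The sketch below gives the standard formula; since the compounds-MLU correlation is not reported in the text, the value 0.85 used in the example is purely hypothetical and will not reproduce the published rpartial = .799.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Reported values: compounds-particles r = .937; particles-MLU r = .869.
# The compounds-MLU correlation is NOT reported; 0.85 is a made-up stand-in.
example = partial_corr(r_xy=0.937, r_xz=0.85, r_yz=0.869)
print(f"partial correlation (hypothetical r_xz): {example:.3f}")
```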


6.3.2 Further Tests of the Syntax–Compounding Link: Japanese and German

The proposed link of certain complex predicates to creative endocentric compounding has been tested acquisitionally in both Japanese and German. Japanese is a [+TCP] language, with creative bare-stem endocentric compounding as well as adjectival resultatives, but no separable-particle construction. Japanese-learning children acquire endocentric compounding considerably later, on average, than English-learning children. This fact enabled Sugisaki and Isobe (2000) to use EP and truth value judgment (TVJ) tasks (both of which work best when the child is at least 36 months old) in their study testing predictions for the acquisition of Japanese. Specifically, Sugisaki and Isobe (2000) used a cross-sectional approach, and tested each of 20 children (age 3;04 to 4;11) in (i) an EP task for novel compounds, and (ii) a comprehension task for adjectival resultatives. The prediction was that children who understood adjectival resultatives would be able to produce novel compounds. Each child received four items on the EP task, and six items (three for each of two verbs) on the TVJ task. A child was classified as “passing” the EP task if she answered at least three out of four items successfully, and as passing the TVJ task if she answered all three items correctly for at least one of the two verbs. The results were that ten children passed on both, and six children failed on both. Of the remaining four children, two passed on compounding but failed on resultatives, and two passed on resultatives but failed on compounding. Curiously, these four children with discordant patterns were also the oldest children in the study. Sugisaki and Isobe report that these children were clearly more interested in the laptop that was used to present the stimuli than they were in the actual stimuli. 
Nonetheless, the association between compounding and resultatives reached statistical significance, p = .019 by two-tailed Fisher Exact Test, and thus supported the proposed link between resultatives and compounding.17

Turning to German, Hanink and Snyder (2014) have analyzed longitudinal corpora from ten children acquiring German as their first language. German has creative endocentric compounding much like that in English, and in addition to adjectival resultatives, has a separable-particle construction that is broadly speaking comparable to that found in English, although many of the details are different. In particular, German is a V2 language with underlying SOV order. Particles are preverbal, but are left behind if the V raises into V2 position: Jan hat das Buch aufgehoben, (literally) ‘Jan has the book up-lifted,’ versus Jan hob das Buch auf, ‘Jan lifted the book up.’
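The p-value reported above for Sugisaki and Isobe’s study can be recovered from the 2×2 table of outcomes (10 pass both, 6 fail both, 2 + 2 discordant). The following is our own minimal implementation of the two-tailed Fisher Exact Test, using only the Python standard library; it sums the probabilities of all tables with the same margins that are no more likely than the observed one:

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher Exact Test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that cell (1,1) equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# Sugisaki and Isobe (2000): rows = pass/fail on compounding,
# columns = pass/fail on resultatives.
p = fisher_exact_two_tailed(10, 2, 2, 6)
print(round(p, 3))  # 0.019
```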

17  Given the noisiness of the data, and the difficulties in holding some children’s attention, it would be highly desirable to do more research with Japanese-learning children. One possible change would be to use a narrower age range, so that the experimental materials can be age-appropriate for all the subjects. Another possible change would be to avoid analytic methods that require imposing pass/fail cut-off points on the data.

Children’s FRUs of novel compounds ranged in age from 1;11,02 (Schlümpf Buch ‘Smurf book’) to 2;06,14 (Babytuhl ‘baby chair’). Due to concern about children’s possible production of unanalyzed chunks, particles in immediately preverbal position were set aside. The FRUs of separated particles (including cases where the particle immediately followed the verb) ranged from 1;10,14 (des kippt um ‘that falls over’) to 2;08,00 (pa(sst) das nich hier hin (lit.) ‘fits that not here in’). The ages of FRU for compounds and particles were strongly intercorrelated (r = .97, t(8) = 11.34, p < .001). This correlation remained robust after partialling out the age of each child’s first lexical compound (r = .95, t(7) = 8.41, p < .001), and after partialling out the age of each child’s first attributive adjective modifying a noun (r = .81, t(7) = 3.64, p = .008). Lexical compounds were chosen as a control because they are closely matched to novel compounds in terms of factors such as length, stress pattern, and articulatory difficulty. Attributive adjectives were chosen because the relationship between an attributive adjective and its head noun is a closely matched control for the conceptual difficulty associated with the modifier–head relationship in a novel endocentric compound. Given that the correlation between the ages of FRUs for separated particles and novel bare-stem endocentric compounds had a best-fit line approximating an identity function, and remained robust even when closely matched developmental controls were removed by partial correlation, the German findings are highly similar to those in English, and lend strong support to the hypothesis that separable-particle constructions are closely connected to the availability of creative endocentric compounding.

6.3.3 Setting TCP

A few brief remarks are in order about how a child might arrive at the positive or negative setting of TCP. First, given that the negative setting does not appear to add any new types of utterances to the repertoire of a speaker, it very plausibly serves as an initial and/or default setting. Thus, in the case of a [–TCP] language it is possible that there is never any moment when the learner specifically decides that the [–TCP] setting is the correct one. Alternatively, it is also possible that the learner allows a specific window of opportunity (e.g. in terms of the number of utterances encountered so far, in the input for that particular language), and waits until the end of that period to make a determination either way. Interestingly, in the case of a [+TCP] language, it seems there must be at least two possible routes. In the case of children acquiring English there is an extraordinarily close association between the onset of novel endocentric compounding and the onset of V-particle combinations, but it could have turned out differently. This is because there exist languages like Japanese, where novel compounding is available but separable particles are not. Hence, it could have turned out that children acquiring English went through a Japanese-like stage along the way. The fact that they do not (at least in the cases of the children examined so far) strongly suggests that they discover compounding by way of the particle constructions. In other words, separable particles are frequent in the child’s input and have a highly distinctive surface form. As soon as the child works out what they are, [+TCP] is a necessary consequence.

In contrast, if the child were to rely on endocentric compounds directly, there would be a problem. Even the child hearing French or Spanish is likely to hear a non-trivial number of these, because they exist as frozen, lexicalized forms. It would be quite difficult for the learner to decide whether a specific N-N compound in the input was lexical or novel. Yet, there exists an alternative, given Namiki’s Generalization (from section 6.1.3): If the input includes a robust number of recursive endocentric compounds (e.g. [[Christmas tree] cookie]), the child can safely conclude that the language allows endocentric compounding as a creative process. Such recursive compounds definitely exist in child-directed English (in Snyder 1999 they were found in ten out of ten CHILDES corpora examined), although they have a considerably lower frequency than the particles. For a child acquiring Japanese, recursive compounds might very well be the best available indication that Japanese is a [+TCP] language, given that Japanese lacks separable particles. Indeed, aside from recursive compounds, the only cue available to the learner might be adjectival resultatives, which are also likely to have a low frequency, given that they are infrequent in adults’ English. Moreover, while endocentric bare-stem N-N compounding is certainly a creative process in adult Japanese, it competes with another form that has a very similar semantic range, namely the N-no-N construction. The genitive marker no can be used to create something highly similar to the “N of N” compound phrases used in the Romance languages. Hence, it seems the child acquiring Japanese will have to wait a good bit longer than the English-learning child in order to obtain conclusive evidence that the target language is [+TCP]. This is plausibly why the sample of 3- and 4-year-olds studied by Sugisaki and Isobe still included a considerable number of children who failed to produce novel compounds. 
Furthermore, there is an interesting implication for models of language acquisition. If Japanese-learning children set TCP using recursive compounds (or perhaps adjectival resultatives), while English-learning children set TCP using V-particle combinations, it speaks against acquisition models in which each parameter is innately connected to a single trigger.

6.3.4 The Nature and Scope of TCP

At present my working hypothesis is that TCP is a parameter of the syntax–semantics interface, and that the [+TCP] setting makes available an interpretive rule that I term “Generalized Modification” (GM), as shown in (3).

(3) The Compounding Parameter (TCP): The language (does / does not) permit Generalized Modification.

The proposal is that GM plays an essential role in the semantic interpretation of a novel compound, and also in the interpretation of a complex predicate (such as a V-Particle combination or an adjectival resultative) in which two syntactically autonomous constituents are jointly characterizing a single event. The proposed role of GM in the interpretation of compounds is very similar to proposals of Jackendoff (2002) and Kratzer (2010), among others, to the effect that the

relationship between the head and the modifier in an English endocentric compound is extremely flexible and context-dependent. A formal definition is given in (4).

(4) Generalized Modification (GM): If α and β are syntactic sisters under the node γ, where α is the head of γ, and if α denotes a kind, then interpret γ semantically as a subtype of α’s kind that stands in a pragmatically suitable relation to the denotation of β.

For example, I assume that a noun can be viewed as denoting a kind of individual, and that when GM applies to a novel N1-N2 compound in English (i.e. with N2 as the head), the resulting meaning will be a subtype of N2’s kind that stands in some contextually appropriate relation to the denotation of N1. This is extremely flexible, and is meant to be, so as to capture the fact that for an English-speaker, in the right context, a compound like frog man could mean the kind of man who looks like a frog, eats frogs, studies frogs, sells frogs, or wishes to be buried in a frog-shaped casket, for example. The definition in (4) is also intended to encompass verbal predicates, which I assume serve to specify a kind of event (or more precisely, a kind of eventuality—which might be either an event or a state). My proposal is that GM might be one of the only mechanisms available to combine two separate event descriptions into the description of a single event. For example, in the case of an adjectival resultative like wipe the table clean, we might assume that wipe has undergone movement, and that originally it was part of the constituent wipe clean. 
There GM could apply, taking the V wipe to be the head, and returning the meaning “a wiping event of the kind associated with a state of cleanliness.” Along the lines of Levin and Rappaport Hovav (1995: 54), I assume that the conceptual calculus of events is highly restricted, and that this denotation would have to be interpreted as “an accomplishment event, with wiping (of the patient) as its development subpart, and cleanliness (of the patient) as its culmination.” A more detailed exposition of the proposed semantics can be found in Snyder (2012). Acquisitional evidence that links a number of additional syntactic structures to the availability of creative bare-stem endocentric compounding can be found in the following works: Beck and Snyder (2001), for English telic path PPs with manner-of-motion verbs; Goodrich and Snyder (2013), for English atelic path PPs with manner-of-motion verbs; and Snyder (2001) for English make-causatives, perceptual reports, and put-locatives, though for a contrasting view see Eom and Snyder (2012). For a new acquisitional test of the English particle-compound link, using the Intermodal Preferential Looking paradigm, see Naigles et al. (2013) and Snyder et al. (2014). For evidence that children acquiring French never produce novel N–N compounds, but do produce novel compound phrases (N de N), see Snyder and Chen (1997).18

18  While the Romance languages are clearly [-TCP], there do seem to be a few types of compounding that are creative, and these merit greater attention from acquisitionists. These include the exocentric V-N compounds to name instruments and (occasionally) agents, which were discussed in section 6.2 (cf. French le mange-souris lit. 'the eat(s)-mice' for 'the mouse-eater'); and the doubly-headed, appositive compounds like Spanish la mujer araña, literally 'the woman spider,' for 'the spider woman.'

Chapter 7

Morpho-phonological Acquisition

Anne-Michelle Tessier

Morpho-phonology describes the sound patterns of a language that interact with its lexicon of morphemes. It includes how roots, stems, affixes, and the like are affected phonologically when combined to create words and phrases, and how morphological processes and generalizations rely on phonology in their application. The basic facts of morpho-phonology come from alternations: multiple surface forms that share semantic content but differ in their phonological realizations. Some phonological patterns are morphologically "blind," meaning that the alternations found in multi-morphemic words and phrases display exactly the same phonotactics as mono-morphemic ones, while other sound patterns are morphologically restricted to varying degrees. Acquiring a morpho-phonological system requires two kinds of interwoven knowledge: the learner must determine a set of lexical representations attached to meanings, and also control the phonological regularities involved in the combinations of those stored representations. This dual nature of the problem challenges both learner and researcher. To use morpho-phonology, a learner must control not just surface sound sequences but also the multiple morphemes they are derived from; to study the acquisition of morpho-phonology, the researcher must not only ascertain what the learner said but also what they meant and how many pieces of meaning they used to mean it. For at least this reason, the area of morpho-phonological acquisition is somewhat understudied; one of this chapter's goals is to highlight various gaps in the existing and growing literature. The chapter reports a current, basic understanding of the nature and facts of acquiring morpho-phonology—what needs to be learned, how it is observed to be learned, and how these observations might be explained by theories of learning.
It begins with adult typological data that address the question of how much of a language's alternations can or cannot be predicted from its phonology alone, and summarizes how this typology is treated, especially in rule-based and constraint-based phonological grammars. The second section presents some empirical observations about morpho-phonological acquisition and considers several important views of how alternations are acquired. This discussion is centered around two central themes: first, the extent to which an observed alternation is either stored by learners as an isolated lexical fact or taken as evidence of more abstract, generalized grammatical knowledge; second, the mechanisms by which morpho-phonological generalizations are made and extended by the learner, whether with rules, constraints, analogies, or other methods. The third section discusses the methodologies and results of experimental studies of morpho-phonological acquisition, and the final section raises a few of the many questions as yet unanswered in the field.

7.1  Three Kinds of Morpho-phonology

This section presents three kinds of morpho-phonological pattern that learners face: those driven purely by a language's phonotactics, those that are in some way morphologically-restricted, and those that include both "regular" and "irregular" morpho-phonological processes. Each pattern is illustrated and discussed in terms of its theoretical consequences, which in turn will serve to introduce theoretical approaches to their acquisition.

7.1.1 Phonotactically-motivated Alternations

In many languages such as Russian, obstruents alternate between voiced and voiceless in the following two phonological conditions: obstruents must be voiceless at the end of a phonological word (1a), and sequences of obstruents must voice or devoice to match the final obstruent's underlying voicing ((1b), compared to (1c)). These conditions drive voicing alternations in both affixes (1b) and roots (1a) (Halle 1959; Kiparsky 1985; data here from Padgett 2002):

(1) Russian alternations driven by voicing phonotactics

(a) final obstruents: [–voice] only
  'book'    knik (gen. pl.)    ~  knig-a (nom. sg.)
  'track'   slet (nom. sg.)    ~  sled-a (gen. sg.)
  'beach'   plʲaʃ (nom. sg.)   ~  plʲaʒ-a (gen. sg.)
  'squeal'  visk (nom. sg.)    ~  vizg-a (gen. sg.)
  'hut'     isp (gen. pl.)     ~  izb-a (nom. sg.)

(b) clusters: regressive [±voice] assimilation
  ot-stupitʲ 'to step back'           od-brositʲ 'to throw aside'
  pot-pisatʲ 'to sign'                pod-ʒetʃ 'to set fire to'
  is-klʲutʃatʲ 'to exclude, dismiss'  iz-gnatʲ 'to drive out'

(c) before sonorants: [±voice] contrast
  ot-jexatʲ 'to ride off'
  pod-nesti 'to bring (to)'
  iz-lagatʲ 'to state, set forth'
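The two conditions in (1) can be sketched procedurally. The following is a toy illustration only, not the chapter's analysis: transcriptions are reduced to one-character segments (palatalization marks omitted), and the segment inventory and function names are my own.

```python
# Toy sketch of the Russian voicing phonotactics in (1):
# (1a) word-final obstruents devoice;
# (1b) an obstruent assimilates in voicing to a following obstruent.

VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "z": "s", "ʒ": "ʃ"}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}
OBSTRUENTS = set(VOICED_TO_VOICELESS) | set(VOICELESS_TO_VOICED)

def surface(underlying):
    segs = list(underlying)
    # (1a) final devoicing
    if segs and segs[-1] in VOICED_TO_VOICELESS:
        segs[-1] = VOICED_TO_VOICELESS[segs[-1]]
    # (1b) regressive assimilation, applied right-to-left so that the final
    # obstruent of a cluster controls the voicing of the whole cluster
    for i in range(len(segs) - 2, -1, -1):
        nxt = segs[i + 1]
        if segs[i] in OBSTRUENTS and nxt in OBSTRUENTS:
            if nxt in VOICED_TO_VOICELESS:      # following segment is voiced
                segs[i] = VOICELESS_TO_VOICED.get(segs[i], segs[i])
            else:                               # following segment is voiceless
                segs[i] = VOICED_TO_VOICELESS.get(segs[i], segs[i])
    return "".join(segs)

# /knig/ 'book, gen. pl.' -> "knik"; /izb/ 'hut, gen. pl.' -> "isp";
# /ot+brosit/ -> "odbrosit"; sonorants neither trigger nor undergo:
# /ot+jexat/ -> "otjexat"
```

Note that ordering final devoicing before the right-to-left assimilation pass lets the devoiced final obstruent propagate leftward, as in /izb/ → [isp].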

Neutralization of obstruent voicing in these contexts is common cross-linguistically—patterns like (1a) are also found in e.g. Dutch (Booij 1995) and Polish (Rubach and Booij 1990); patterns like (1b) are found in Swedish (see Lombardi 1999 and references therein). A similar alternation is found among the English inflectional affixes indicating past tense, plurality, possessives, and third person singular agreement. These morphemes usually surface as voiced obstruents (e.g. past tense jog[d], burn[d], row[d]; plural jog[z], burn[z], row[z]), but devoice when they appear after a voiceless consonant (e.g. past tense kick[t], huff[t]; plural kick[s], huff[s]). In English this alternation also correlates with a ban on mismatched obstruent voicing in word-final clusters, banning underived words like *[kɪkd] or *[kɪkz]. A very different example of an alternation serving phonotactics comes from the Ekiti dialect of Yoruba (e.g. Archangeli and Pulleyblank 1989; Orie 2003), in which words are heavily restricted for the co-occurrence of [+ATR] and [–ATR] vowels. These restrictions affect in particular the [±ATR] specifications of sequences of high and mid vowels; this harmony holds both of the vowels within a root (2a) and also drives alternations in mid and high vowel prefixes (2b):

(2) Ekiti [±ATR] vowel alternations: data from Orie (2003)

(a) stem mid vowels agree for [±ATR]
  [òde]    'outside'  (*[odɛ])
  [ɛ̀dɔ̀]    'liver'    (*[edɔ])
  [ɔ̀rʊ̀kɔ]  'name'
  [ewìrì]  'bellows'

(b) prefix vowels alternate to match the stem
  [lɛ] 'indolent'    →  [ɔ̀-lɛ] 'lazy person'
  [bi] 'give birth'  →  [ò-bí] 'parent'
  [lɔ] 'go'          →  [ʊ̀-lɔ] 'going'
  [lo] 'use'         →  [ù-lò] 'using'
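Returning to the English inflectional affixes described above, the voicing-conditioned choice of allomorph can be stated as a simple selection procedure. This sketch is illustrative rather than the chapter's formalism; the segment sets are partial, and the [-əd]/[-əz] allomorphs after coronal stops and sibilants (cf. the [-t/-d/-əd] set mentioned in section 7.1.3) are included for completeness.

```python
# Illustrative selection of English inflectional allomorphs, keyed to the
# final segment of an IPA-transcribed stem.

VOICELESS_C = set("ptkfθsʃʧ")   # voiceless consonants (partial inventory)
SIBILANTS = set("szʃʒʧʤ")

def plural(stem):
    if stem[-1] in SIBILANTS:
        return stem + "əz"      # bus[əz]
    return stem + ("s" if stem[-1] in VOICELESS_C else "z")

def past(stem):
    if stem[-1] in "td":
        return stem + "əd"      # need[əd]
    return stem + ("t" if stem[-1] in VOICELESS_C else "d")

# plural("kɪk") -> "kɪks", matching kick[s]; past("ʤɔg") -> "ʤɔgd", jog[d]
```

The same two functions encode the word-final cluster restriction noted above: no output ever ends in a voicing-mismatched obstruent cluster like *[kd].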

Theoretical consequences. Morpho-​phonological alternations can in principle be driven by any of a language’s phonotactics: assimilations and dissimilations, local and long-​distance, cluster reduction, hiatus resolution, syncope, metathesis, and so on. One influential theoretical conclusion which has been drawn from this equivalence is that a single mechanism should enforce phonotactic restrictions both in the underlying representations of mono-​morphemic words and in morpho-​phonological alternations, casting some doubt on rule-​based approaches that used two different mechanisms to capture such restrictions (this has been termed the duplication problem, beginning with Kisseberth 1970; Sommerstein 1974; see also Goldsmith 1993). One alternative is the theory of Natural Phonology (Stampe 1969, 1979), in which the same set of natural simplification “processes” are applied initially by learners to surface words regardless of evidence from alternations. This point has also been marshaled in support of constraint-​based Optimality Theory (OT; Prince and Smolensky 2004; for an experimental investigation of this connection in L2 adult learning, see Pater and Tessier 2005).


7.1.2 Morphologically-sensitive Phonological Alternations

Some alternations appear to be driven by phonological demands that only hold in certain morphological contexts. In one case, a phonological process is not enforced in morphologically-derived words just when the process does not apply in a word's base, enforcing "paradigm uniformity" (starting with Vennemann 1972; Harris 1973; King 1973). As an example, the process in (3a) which voices /s/ → [z] between vowels in Northern Italian (Nespor and Vogel 1986: 126ff) applies in derived words across many root-affix boundaries (3b)1 except when the word's morphological base contains an unvoiced [s] as in (3c):

(3) Northern Italian s/z alternations: data from Kenstowicz (1997: 10–11)

(a) /s/ → [z] / V_V
  a[z]ola, *a[s]ola  'button hole'
  a[z]ilo, *a[s]ilo  'nursery school'

(b) /s/ → [z] / V_V across root-affix boundaries
  di[z]-onesto   'dishonest'
  di[z]-uguale   'unequal'
  ca[z]-a        'house'
  ca[z]-ina      'house, dim.'
  re-[z]istenza  'resistance'
  pre-[z]entire  'to have a presentiment'

(c) paradigm uniformity: [VsV] only if base has surface [s]
  [s]entire      'to hear'
  pre-[s]entire  'to hear again'
  [s]ociale      'social'
  a-[s]ociale    'asocial'
  bu[s]          'bus'
  bu[s]-ino      'bus, dim.'

Numerous such examples are found cross-linguistically; it is especially common for the stress patterns of a base word to over-rule phonological processes in its derived words, including further stress assignment, vowel reduction, syncope, and others (classic examples include English: Chomsky and Halle 1968; Palestinian Arabic, Maltese, and Spanish: Brame 1974; Catalan: Mascaró 1976). Similar morphological disruptions can result in the over- or under-application of a phonological restriction within morphological paradigms. One set of Polish diminutives illustrated in (4) provides a striking example. Some nominal paradigms like (4a) show a phonological alternation between [ɔ]~[u]; (4b) shows that this alternation is completely "leveled" in diminutives, with [ɔ] for masculine stems and [u] for feminine:2

1  Though not across a clitic-verb boundary; Nespor and Vogel (1986) argue that the process holds within the Prosodic Word.
2  As noted by McCarthy (2005a) and others, the (4a) raising process is quite morphologically-restricted: see Gussman (1980: ch. 4); Kraska-Szlenk (1995); Sanders (2002); and references therein.

(4) Polish vowel raising and diminutives (data from Kraska-Szlenk 1995)
(a) Raising: /ɔ/ → [u] in stem-final closed syllables (except before nasals)
(b) Paradigm leveling: diminutives (exceptional vowels underlined)

          'ditch' (masc.)     'cow' (fem.)         'ditch,' dim.         'cow,' dim.
          sing.    plur.      sing.    plur.       sing.     plur.       sing.     plur.
  nom.    duw      dɔwi       krɔva    krɔvɨ       dɔwek     dɔwki       krufka    krufki
  gen.    dɔwu     dɔwuf      krɔvɨ    kruf        dɔwka     dɔwkuf      krufki    kruvek
  dat.    dɔwovi   dɔwom      krɔvjɛ   krɔvom      dɔwkovi   dɔwkom      kruftsɛ   krufkom
  acc.    duw      dɔwi       krɔvɛ̃    krɔvɨ       dɔwek     dɔwki       krufkɛ̃    krufki
  instr.  dɔwem    dɔwami     krɔvɔ̃    krɔvami     dɔwkjem   dɔwkami     krufkɔ̃    krufkami
  loc.    dɔle     dɔwax      krɔvjɛ   krɔvax      dɔwku     dɔwkax      kruftsɛ   krufkax

Kraska-Szlenk's (1995) interpretation of this leveling is that all diminutive stems are leveled to match the diminutive's nominative singular form, where the (4a) pattern triggers raising for feminine forms like [kruf.ka] but not for masculine [dɔwek]; see also Kenstowicz (1997). Another way in which a language's phonology may be morphologically-sensitive is a differential application of processes or restrictions in different-sized morphological domains: roots, stems, affixes, clitics, and so on. One example from Turkish vowel harmony (e.g. Lees 1961; Clements and Sezer 1982) is shown in (5): while suffixes must agree with preceding stems for vowel features including [±back] (5a), root-internal disharmonic sequences are attested (5b) (unlike the Ekiti Yoruba vowel harmony from section 7.1.1):

(5) Turkish vowel harmony between roots and suffixes: data from Van Oostendorp (2004b)

(a) suffix vowel in [-ler/-lar] predicted by [±back] feature of final root V
  'rope'   [ip]   [ip-ler]
  'face'   [jyz]  [jyz-ler]
  'hand'   [el]   [el-ler]
  'stamp'  [pul]  [pul-lar]
  'stalk'  [sap]  [sap-lar]

(b) harmony imposed on suffixes, even while roots are disharmonic
  'spool'  [bobin]  [bobin-ler]
  'pilot'  [pilot]  [pilot-lar]
  'book'   [kitap]  [kitap-lar]
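The generalization in (5) is narrow enough to state directly: the suffix alternant is chosen by the backness of the root's final vowel, irrespective of any earlier, possibly disharmonic vowels. A minimal sketch, with a rough IPA vowel inventory and function names of my own:

```python
# Sketch of the (5) pattern: plural [-lar] after a root whose final vowel is
# [+back], [-ler] otherwise, regardless of root-internal disharmony.

BACK_VOWELS = set("aɯou")    # [+back]
FRONT_VOWELS = set("eiyø")   # [-back]; 'y' is IPA [y], as in [jyz] 'face'

def nominative_plural(root):
    """Choose -ler/-lar from the [±back] feature of the final root vowel."""
    for seg in reversed(root):
        if seg in BACK_VOWELS:
            return root + "-lar"
        if seg in FRONT_VOWELS:
            return root + "-ler"
    raise ValueError("no vowel found in root")

# nominative_plural("pul") -> "pul-lar"
# nominative_plural("bobin") -> "bobin-ler"  (disharmonic root: final V wins)
```

Scanning the root from the right makes the "stem-controlled" character of the harmony explicit: only the vowel nearest the suffix matters.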

The fact that Turkish's affix vowels alternate to match stem vowels, rather than the reverse, is a well-known cross-linguistic asymmetry (see Bakovic (2000a) and references therein on 'stem-controlled' vowel harmonies). More generally, languages may impose static asymmetries on morphological categories regarding the location of phonological contrasts (e.g. McCarthy and Prince (1994, 1995a); see also Smith (1999, 2011); with respect to child speech, see Jesney and Tessier (2009)).

In other cases, some structures created at morphological junctures are less susceptible to phonotactic pressures than within morphemes alone: an English example comes from affixes like un- and -less (see esp. Kiparsky 1985). First, the [n] of un- is not forced to share place of articulation with following stop codas (u[n]balanced, u[n]couth) as required in other contexts (cf. i[m]balance and monomorphemic handy, Bambi, a[ŋ]gry). In addition, these affixes can create geminate sonorants not otherwise tolerated in English (e.g. u[n:]atural; sou[l:]ess, as discussed in Borowsky 1986; Benus et al. 2004). See Wolf (2010) for recent examples and discussion of many similar patterns. Even more lexically-specific alternations can be triggered by individual morphemes, which obey cross-linguistic phonological restrictions but are not otherwise observed in the rest of the language. One Dutch example comes from two assimilatory allomorphs of the diminutive suffix, which change from /-tje/ to [-pje] and [-kje] after stems ending in an unstressed vowel followed by a homorganic nasal or obstruent (e.g. [bezəmpjə] 'broom-dim.'; [konɪŋkjə] 'king-dim.,' Kerkhoff (2007: 16)). While this assimilation is typologically unsurprising, the phonology of Dutch does not otherwise ban sequences such as [Vmtjə] or [Vktjə] (Trommelen 1983; Booij 2002). A different, phrase-level example from American English is the optional palatalization of coronal stops before /j/ in for example /nid + ju/ → [ˈnidʒə] 'need you' and /it + jor/ → [ˈitʃər] 'eat your.' Despite its phonetic naturalness, this process is not generally applied in English, and here only affects the initial /j/s of certain unstressed function words (cf. 'need youths': /nid juθs/ → *[nidʒuθs]3).

Theoretical consequences.
From the learning perspective, morphologically-sensitive patterns present a challenge because discovering the alternation requires identifying the allomorphs that share a common meaning despite their phonological differences. Once an alternation has been discovered, the learner has at least two options for analysis: either to impose some phonological pressures in limited morphological contexts, as discussed in section 7.2—or to not rely on phonology when choosing between these allomorphs at all, but rather to use enriched lexical representations or other mechanisms (see also section 7.1.3). Focusing on the first approach, morphological restrictions on phonological processes have been used as evidence for several theories of phonology that can selectively interweave phonological processes with morphological word-building (in contrast to Chomsky and Halle 1968, in which morphology is applied in its entirety before phonology). In the rule-based approach of Lexical Phonology (esp. Kiparsky 1982, 1985; Mohanan 1986), a language's ordered phonological rules are grouped into levels, themselves ordered according to the morphological affixes or processes (e.g. compounding) that apply before the level's phonology is brought to bear. Phonological similarity within paradigms comes about via level-ordering: in the (3c) example, intervocalic voicing does not affect a[s]ociale because it applies at a level prior to a-prefixation; the Dutch diminutive alternation can be captured with a rule of place assimilation which applies only in the stratum where the diminutive affix attaches. The Lexical Phonology view of morpho-phonological interactions has also been adapted into a constraint-based approach known as Stratal OT or DOT (Bermudez-Otero 1999, 2003; Kiparsky 2000; Blumenfeld 2003), discussed in section 7.2.3.

3  For a frequency-based argument as to why 'you' and not e.g. 'youths' triggers this process, see Bybee (2001: 71–2).

In cases of paradigm uniformity and leveling, many constraint-based analyses have been proposed which overtly compel morphologically-related forms to be phonologically similar (see Kenstowicz 1997; McCarthy 1998; Kager 1999b; Benua 2000; Steriade 2000; Burzio 2004). In cases like the Dutch diminutive, morphologically-indexed constraints can in some ways recapitulate the levels approach (see esp. Ito and Mester 2001; Flack 2007a; Pater 2009b; see Inkelas et al. 1997 for a somewhat different view). With respect to capturing morphological derived environment effects, see especially Wolf (2008: ch. 4) and Łubowicz (2002).

7.1.3 Regular and Irregular Morphology, from the Phonological Perspective

A different kind of morphologically-conditioned phonological alternation involves "regular" versus "irregular" markings of the same semantic distinction. One English example that has been extensively debated within the acquisition literature is past tense marking. The regular English past tense verbs undergo affixation of the [-t/-d/-əd] allomorphs, discussed in section 7.1.1 for their phonotactic regularity—but many other past tense verbs are irregular: taking an unexpected allomorph (burn~burnt), undergoing root-internal processes (run ~ ran; hang ~ hung; dive ~ dove), or some combination thereof (lean ~ leant; weep ~ wept; bring ~ brought). These alternative affixes and processes are considered irregular not merely because they are less frequent or less productive (see section 7.2.1) than the phonologically-regular suffixes but also because they are not driven by any independent phonological pressure in the synchronic grammar. A compelling illustration of this point, raised by Pinker and Prince (1988: 110), comes from pairs of homophonous present tense verbs like hang, with one regular past tense (The prisoner was hanged) and the other irregular (The painting was hung). At its most extreme, irregular morphological marking cannot be understood as phonological in any sense (as with am~was, go~went), so that the learner must store separate suppletive forms. Despite these degrees of arbitrariness, it is often the case that morphological irregulars show phonological sub-regularities. A Spanish example discussed by Clahsen et al. (2002) comes from irregular verb roots in first and third person singular past tense conjugations, as illustrated in (6): in these forms, regular verbs are stressed on their suffixal vowel (6a), whereas in irregular roots, stress predictably falls on the root vowel (6b):4

(6) Phonological regularity among Spanish irregular inflections

                      (a) regular verbs:   (b) irregular verbs:
                          suffixal stress      root stress
  infinitive          viv-ir 'to live'     pon-er 'to put'
  1st p. sing. past   viví                 púse
  3rd p. sing. past   vivió                púso

4  Spanish permits stress on any of the last three syllables, so that both the regular and irregular stress patterns must be contingent on morphological information.

On phonological regularities among irregulars for the English past tense, see especially Albright and Hayes (2003) on 'islands of regularity'; these are discussed in detail in section 7.2.2. Many languages have both regular and irregular methods of marking common morphological information, particularly inflectional distinctions (e.g. number, person, tense, plurality, possession, and so forth). These multiple morphological patterns may arise via diachronic processes, borrowings, and other influences, but they very commonly exist in tandem within the synchronic grammar, and so must be learned in one way or another by individual learners in each generation. Such data have been central to the development of multiple theories of morpho-phonological acquisition, and have fueled considerable debate about the mental representations of morphemes and the phonology that shapes them—especially with respect to the dividing line between memorized isolated forms and abstracted generalizations, discussed again in sections 7.2.2 and 7.2.5.

7.2  Approaches to Morpho-phonological Acquisition

7.2.1 Initial Observations

Several research perspectives have converged on the belief that much phonotactic learning can and does take place for children months before any concrete morphological knowledge is acquired. The compelling empirical evidence of this developmental trajectory comes from infant speech perception studies (beginning with Eimas et al. 1971; Werker and Tees 1983, 1984); this area has provided converging evidence that by 8–12 months babies are able to discriminate a wide range of phonological properties of their native language, including allophonic consonants and vowels, legal phonotactics, word stress, prosodic contours, and high- versus low-frequency sequences and structures (for one review, see Jusczyk 1997 and references therein). Recent experimental work on infants' receptive knowledge of morphological dependencies and proto-alternations suggests that morpho-phonological sensitivities may develop soon afterwards, perhaps between 12 and 17 months (see section 7.3.1). This staggered timeline accords well with the observation that those alternations which are first mastered in production are usually those in accordance with the language's independent phonotactics, as in section 7.1.1. This observation was substantiated in surveys of cross-linguistic studies reported by MacWhinney (1974, 1975, 1978), discussed in some detail in this section. Łukaszewicz's (2006) longitudinal study of A, a child learning Polish, provides a different example: by 3;8, A always produced alternations that accorded with Polish's phonotactic processes of word-final obstruent devoicing (7a)5 and voicing assimilation within and between words (7b). On the other hand,

5  Though with respect to word-final sonorants, A's final devoicing was not yet mastered: see Łukaszewicz (2006: 23).

A took much longer to master some of Polish's vowel-zero alternations ('jers'), which are conditioned by particular morphemes (see (7c)):

(7) The acquisition of Polish alternations: data from Łukaszewicz (2006)

A at 3;8: phonotactically-motivated voicing alternations produced correctly

(a) 'bread'    [xlep] (nom. sg.)     [xlebi] (nom. pl.)
    'picture'  [ɔbras] (nom. sg.)    [ɔbrazi] (nom. pl.)    [ɔbraski] (dim. nom. pl.)
    'tiger'    [tɨgrɨs] (nom. sg.)   [tɨgrɨsi] (nom. pl.)

(b) [tɨgrɨz gɔni] 'a tiger is chasing'
    [xlep s sɛrɛm] 'bread and cheese'

(c) A at 5;1: [ɛ]~zero still produced incorrectly in some paradigms
    'day'   [dʑɛɲ] (nom. sg.)   child: *[dʑɛɲɛ] (nom. pl.)   target: [dɲi] ~ [dɲɛ]
    'lion'  [lɛf] (nom. sg.)    child: *[lɛvɨ] (nom. pl.)    target: [lvɨ]
    'fire'  [ɔgɛɲ] (nom. sg.)   child: *[ɔg'ɛɲɛm] (instr.)   target: [ɔgɲɛm]

At the same time, difficulties with phonological production will necessarily impede the accuracy of an alternation when it appears in a difficult phonological context: for example, see Song et al. (2009) on increased omission rates of English plural marking on consonant-final roots, which create a coda cluster (e.g. do[gz]), compared to vowel-final roots, which create only singleton codas (e.g. bee[z]) (see also Stemberger and Bernhardt 1997; Demuth 2009; and section 7.2.4). A second well-known observation is that the acquisition of morphologically-sensitive phonology is often "U-shaped": an initial period of apparently correct use (Cazden 1968) is followed by a regression, often involving some overgeneralization of the alternation's scope, and finally by improvement and mastery. Bernhardt and Stemberger (1998) discuss a child who overgeneralized the phrasal palatalization of 'need you' [nidʒu] from section 7.1.2, applying it after all obstruents rather than just coronal stops: e.g. /lʌv + ju/ → *[lʌvʒu], 'love you.' Overgeneralization is also famously found in the morphological 'irregulars' of section 7.1.3, for example English *foots instead of feet. Zwicky (1970) reports a child at 4;6 overgeneralizing the restricted English morpheme -en as a means of creating past participles for irregular verbs: aten, gaven, roden, shooken, tooken. Several likely properties have been identified as contributors to the ease and speed of a morpho-phonological pattern's acquisition. One oft-mentioned predictor is the frequency of an alternation—both the number of lexical tokens and types in which it is evidenced in the ambient input, as well as the range of phonological contexts in which it applies.
For example, MacWhinney (1978) reports that a Hungarian vowel lengthening process affecting root-final [a] is learned productively by age 2;6–2;9, while [e]-lengthening is not fully productive until age 7; while both processes are fully regular, the former is much more frequently instantiated in the input because many more roots end in [a] rather than [e] (7 percent versus 1.8 percent of the relevant lexicon, respectively). Another factor is "transparency," both of meaning and sound pattern—see for example Demuth and Ellis (2009) on the rates of acquisition of Sesotho noun class affixes. Outside the acquisition context, such properties have been combined into a notion of morphological "productivity" (for a recent review, see Baayen 2008; see also especially Hay and Baayen 2005); however the ways in which these properties are understood to influence the learner either directly or indirectly differ widely among the theories discussed in this section. A useful starting point for a discussion of children's possible strategies in morpho-phonological acquisition comes from the "dialectic" model of MacWhinney (1978). This model identifies three ways that a child can go about learning a morpho-phonological pattern: (i) rote-learning, in which they memorize and reproduce observed associations between a meaning and an unanalyzed phonological form, e.g. [bʊks] = "plural of book"; (ii) grammatical combination, in which they produce morphologically-complex forms by choosing among allomorphs to combine in a phonologically-predictable way, for example a /k/-final word + /{-s, -z}/ → [ks];6 (iii) analogy, in which they produce complex words by making them sound like other memorized forms with similar meaning, for example producing the plural of [bʊk] as [bʊks] because the plural of [lʊk] is [lʊks], and look is (say) the stored root most similar to [bʊk].7 Among the grammatical 'combinations' of this model's option (ii) are both the phonological rules independently motivated in the language discussed in section 7.1.1 and the morphologically-restricted but phonologically-defined rules discussed in section 7.1.2; the former are understood to be learned at an earlier stage than the latter (corresponding roughly to MacWhinney's 1978 cycles 3 and 4, respectively).
If the ‘rules’ of the dialectic model are interpreted to include the use of any phonological grammar, whether via rules or other mechanisms, then perhaps all approaches to morpho-​phonological acquisition assume one or more of the model’s three strategies. In other words, the study of morpho-​phonological acquisition investigates the correct definition and relative importance of rote-​learning, grammatical combination, and analogy.
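The three strategies just described can be contrasted in a toy sketch. This is my own construal for illustration, not MacWhinney's implementation: the stored lexicons are invented, and in a real model grammatical combination can also fail (when the grammar lacks an applicable rule), at which point analogy would take over.

```python
# Toy contrast of the dialectic model's three strategies, for English plurals
# over IPA-transcribed roots.

ROTE = {"bʊk": "bʊks"}                    # memorized meaning-form wholes
STORED = {"lʊk": "lʊks", "dɔg": "dɔgz"}   # roots with known plural forms
VOICELESS = set("ptkfθsʃʧ")

def rote(root):
    # (i) rote-learning: reproduce a memorized whole, or fail (None)
    return ROTE.get(root)

def combine(root):
    # (ii) grammatical combination: choose /{-s, -z}/ by the final segment
    return root + ("s" if root[-1] in VOICELESS else "z")

def analogy(root):
    # (iii) analogy: pluralize like the stored root sharing the longest
    # final substring with this one
    def shared_suffix(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
            n += 1
        return n
    model = max(STORED, key=lambda r: shared_suffix(r, root))
    return root + STORED[model][len(model):]   # copy the model's suffix

def produce_plural(root):
    # strategies tried in the model's preferential order
    return rote(root) or combine(root) or analogy(root)
```

For instance, analogy("bʊk") returns [bʊks] on the model of [lʊk]~[lʊks], reproducing the example in (iii) above.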

6  This rule format is in the spirit of MacWhinney (1978) but see the original for somewhat different notational conventions.
7  See MacWhinney (1978: 2, 99) for a more precise description of the type of analogy assumed here.

7.2.2 Acquisition via Rote-memorization and Rules

As the name suggests, MacWhinney's Dialectic Model sees its three acquisition strategies as continually in mutual competition, and preferentially ordered in succession. Learners are hypothesized to begin producing words using rote-learning exclusively; when rote learning fails, they will use grammatical combinations, and only when both rote-learning and grammatical combinations fail will they use analogy. The failure of rote learning is

presumably driven by the need to apply morphology to forms the child has never heard in the necessary morphological context (e.g. pluralizing a noun that has only ever been heard in the singular), while the failure of rules occurs when the learner has exhausted the grammatical mechanisms available to them in capturing an observed alternation. Rote-learning of unanalyzed wholes is understood to be the explanation for the first stage of U-shaped development, before overgeneralizations emerge. Rote-learned morpho-phonology is indistinguishable from mastery of a pattern when observing known lexical items, but it is revealed by failure in experimental studies with nonce words (see section 7.3). As an example, MacWhinney (1974: 76) reports the results from seven children who used Hungarian plural markers correctly in spontaneous speech but "showed no ability to use plurals in the experimental condition" which asked children to produce the plurals of novel singulars. Theories of how the learner learns to control morpho-phonology vary in part according to the relative primacy of morphological versus phonological knowledge assumed to capture alternations. The dialectic model considers the morphology as the central focus of the process, in that rules are constructed only via evidence of alternations, without explicitly consulting any prior phonotactic knowledge. From this vantage point, one approach to learning about allomorphs from MacWhinney (1974) is to construct a rule built from all and only the commonalities among the allomorphs. As evidence for this strategy, MacWhinney (1975: 68) cites Hungarian learners acquiring the five plural allomorphs, all of the form -Vk (-k, -ök, -ek, -ok, and -ak), who go through a stage collapsing them all to -k.
Similarly, Kernan and Blount (1966) discuss Spanish learners who simplify the two plural allomorphs [-s] and [-es] to the single fricative, thus producing *papels instead of the correct papeles; see also Miller and Schmitt (2010). One explicit algorithm from the recent literature that follows somewhat similar reasoning (also following Pinker and Prince 1988, see section 7.2.5) is the Minimal Generalization Learner (MGL; Albright and Hayes 2003). The MGL begins in a rote-learning mode, building a single rule for each unique context observed, as in (8a,b) below. With multiple data points, however, it immediately begins to generalize across rules with the same change—such as 'add [t] to "walk" to make it past tense'—and creates featural abstractions over their environments, combining for example (8a) and (8b) to form the slightly more general (8c). This learner will eventually build very general rules that correspond well to a maximally simple linguistic environment (8d), but it will also build more specific rules like (8e) along the way:

(8) Examples of English past tense rules learned by the MGL
(a) walk ~ walked:      ∅ → [t] / wɑk __ [+past]
(b) talk ~ talked:      ∅ → [t] / tɑk __ [+past]
(c) collapsing (8a,b):  ∅ → [t] / Xɑk __ [+past]
(d)                     ∅ → [t] / [–voice] __ [+past]
(e)                     ∅ → [t] / [–voice fricative] __ [+past]
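The MGL's core generalization step can be sketched in a few lines. The sketch below is a deliberate simplification of Albright and Hayes (2003): it assumes a purely suffixing change, drops their featural system (so mismatched material collapses to a free variable X rather than a feature bundle), and the function names are my own.

```python
# Simplified sketch of MGL-style minimal generalization: two rules with the
# same change collapse into one whose context keeps only the shared
# word-final material, as in (8a) + (8b) -> (8c).

def rule_from_pair(stem, past):
    """Rote step: one rule per observed pair, e.g. wɑk ~ wɑkt."""
    assert past.startswith(stem), "sketch assumes a suffixing change"
    return {"change": past[len(stem):], "context": stem}

def collapse(r1, r2):
    """Keep the shared final span of two contexts; abstract the rest to X."""
    assert r1["change"] == r2["change"]
    a, b = r1["context"], r2["context"]
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    shared = a[len(a) - n:] if n else ""
    return {"change": r1["change"], "context": "X" + shared}

walk = rule_from_pair("wɑk", "wɑkt")   # (8a) add [t] / wɑk __
talk = rule_from_pair("tɑk", "tɑkt")   # (8b) add [t] / tɑk __
print(collapse(walk, talk))            # -> {'change': 't', 'context': 'Xɑk'}
```

In the full model, repeated collapsing over many pairs produces the whole family of rules in (8), from the item-specific (8a,b) up to the maximally general (8d).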

With many generalizations about marking a single morphological contrast, the MGL has two pressures that aid the choice of which rule to apply in any context, and in some sense they are both frequency measures. One prefers rules with a large scope: those

122   Anne-Michelle Tessier

whose context gives them the potential to apply to a large number of roots, because their phonological environment is either very frequent or very broad (e.g. 8d). The other prefers rules with a high reliability: those which correctly predict a high proportion of observed alternations among the forms in their scope. The rule in (8e) happens to have a very high reliability—in fact Albright and Hayes (2003) find it to be without exception among the 352 verbs to which it can apply in their lexicon—and the authors argue that English speakers demonstrate sensitivity to such reliability in their nonce word judgments. Like the dialectic model, the MGL does not directly build its rules with any prior phonotactic knowledge, though see Albright (2002) on the MGL’s use of “a limited capacity to discover phonological rules.” From the MGL’s perspective, regular and irregular morphological rules are learned and treated identically—the rules in (8) which control the regular past tense allomorph [-t] co-exist with all the rules driving irregular past tense changes like run ~ ran, sing ~ sang. This contrasts starkly with the prominent Dual Route approach to regular versus irregular morpho-phonology, which combines a grammatical, combinatorial route for regular morphology with the rote memorization of irregulars (see especially Pinker and Prince 1988, 1994; Clahsen et al. 1992; Pinker 1999; Clahsen et al. 2002; and references therein). Whereas the MGL’s rules each vie for the opportunity to apply, a dual route model sees irregular rules as blocking the application of a default regular rule. Its major empirical predictions concern asymmetries in overgeneralization, namely that learners should apply regular rules to stems which act as irregular in the target language, and not the reverse; this prediction crucially relies on the analytic decision as to which rules are in fact the “regular” ones (e.g. see Say and Clahsen’s 2001 response to Orsolini et al. 1998, in which the former argue for a dual route account of Italian verbal acquisition data, relying on the claim that the latter work incorrectly identified which Italian patterns count as “regular”). The dual route approach is also associated with the claim that pure frequency does not dictate the “regular” rule (see esp. Pinker and Prince 1988, and also Clahsen et al.’s 1992 claim that children’s treatment of German compounds including plurals suggests they do not simply choose the most frequent allomorphs as the regular ones). With overgeneralization central to the acquisition arguments for dual route accounts, it should be noted that some controversy has emerged in the literature about the true rates of morphological overgeneralization and how they should be interpreted (cf. Marcus et al. 1992; Maratsos 2000). One early study of relevance to this debate (though not associated with the dual route approach) is Kuczaj (1977), who found two different correlations between pre-school children’s accuracy on regular versus irregular past tense verbs. In a corpus of spontaneous speech, Kuczaj found MLU (mean length of utterance) to be a better predictor than chronological age of accuracy in producing regular past tense verbs: under the assumption that MLU increases as the grammar improves, this was interpreted as evidence that regular verbs are inflected for past tense by grammatical mechanisms. Conversely, this study also reports that chronological age is a better predictor of accuracy for irregular past tense forms than MLU; this suggests that irregular past tense formation merely improves with increased exposure to input. Along with the fact that the corpus

did not reveal any incorrect applications of irregular past tense vowel changes, the study interprets this correlation as evidence that irregulars are memorized one-by-one.
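The scope and reliability measures introduced earlier in this section can be sketched with a toy lexicon. The forms, the rule predicates, and the numbers below are all invented for illustration; the actual MGL computes confidence-adjusted reliability over featurally defined environments.

```python
# Toy versions of the MGL's rule-evaluation measures (invented mini-lexicon).

def scope(rule_matches, lexicon):
    """Forms whose phonological shape the rule could in principle apply to."""
    return [stem for stem in lexicon if rule_matches(stem)]

def reliability(rule_matches, rule_predicts, lexicon):
    """Proportion of in-scope forms whose attested past the rule predicts."""
    in_scope = scope(rule_matches, lexicon)
    if not in_scope:
        return 0.0
    correct = sum(1 for stem in in_scope if rule_predicts(stem) == lexicon[stem])
    return correct / len(in_scope)

# stem -> attested past tense (broad, ASCII-ized transcriptions)
lexicon = {"wak": "wakt", "tak": "takt", "kis": "kist", "go": "went"}

# A sketch of rule (8d): add [t] after a voiceless final segment
voiceless = set("ptksf")
matches = lambda stem: stem[-1] in voiceless
predicts = lambda stem: stem + "t"

print(len(scope(matches, lexicon)))              # 3 stems are in scope
print(reliability(matches, predicts, lexicon))   # 1.0: exceptionless here
```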

7.2.3 Acquisition via Constraint-based Phonology

In recent years, theories of morpho-phonology have been built around the prominent constraint-based views of phonology, most notably Optimality Theory (McCarthy and Prince 1995a; Prince and Smolensky 2004; also Boersma 1998) and more recently Harmonic Grammar (Legendre et al. 1990a; Smolensky and Legendre 2006; Potts et al. 2010). In these theories, phonological patterns are driven by two sets of conflicting pressures: one that restricts marked structures in surface forms, and one which preserves underlying contrasts. In contrast to the more morphologically-driven approaches of the previous section, a constraint-based view of phonology has often led to a more phonologically-driven understanding of alternations. Recalling the empirical timeline from section 7.2.1, the learner of a constraint-based grammar who begins the task of morpheme “discovery” and semantic decomposition in the second year of life already controls most aspects of their simple phonotactics, so their learning task in the case of alternations driven by “pure” phonotactics as in section 7.1.1 is greatly reduced. As an example, a 2- or 3-year-old learning a language with consistent syllable-final devoicing, such as German, has likely already acquired a ranking which tolerates obstruent voicing in syllable onsets but not codas, as exemplified in (9) with the analysis of Lombardi (1999):

(9)  German ranking I: final devoicing

      /gʁaz/ (hypothetical input) | Ident-Onset[Voice] | *VoicedObstruent | Ident[Voice]
        [gʁaz]                    |                    |       **!        |
      ☞ [gʁas]                    |                    |       *          |      *
        [kʁas]                    |        *!          |                  |      **
When this learner determines that two forms like [gʁas] ‘grass, sg.’ and [gʁæ.ze] ‘grass, plur.’ share a common morpheme, their remaining tasks are to choose this morpheme’s underlying form, and then to refine their ranking to derive the precise nature of its allomorphy. If choosing between the two surface alternants of the morpheme, namely /gʁaz-/ and /gʁas/,8 the learner can observe that the existing phonological grammar can drive the word-final /z/ → [s] change by feeding this underlying representation (UR) to the ranking in (9). To see that the opposite UR and mapping cannot explain this alternation, whereby underlying */gʁas + ə/ → [gʁazə], the learner can consult another previously-learned phonotactic fact shown in (10), namely that German phonotactics do not enforce intervocalic voicing:

8  Setting aside the additional alternation in vowel quality.

(10)  German ranking II: no intervocalic voicing

      /gʁasə/     | Ident-Onset[voice] | *V[-voice]V
        [gʁa.zə]  |         *!         |
      ☞ [gʁa.sə]  |                    |      *
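The UR-selection reasoning supported by tableaux (9) and (10), feeding each candidate UR to the already-acquired grammar and keeping the one that derives all observed surface forms, can be sketched as follows. The grammar here is an invented stand-in function for final devoicing, not a real constraint ranking, and forms are ASCII-ized (with 'e' standing in for the suffix vowel).

```python
# Sketch of UR choice via the existing grammar (toy stand-ins throughout).

DEVOICE = {"z": "s", "d": "t", "b": "p", "g": "k"}

def grammar(ur):
    """Toy German-like mapping: devoice a word-final voiced obstruent."""
    if ur[-1] in DEVOICE:
        return ur[:-1] + DEVOICE[ur[-1]]
    return ur

def choose_ur(candidate_urs, observed):
    """Keep the UR whose derived bare and suffixed forms match the data."""
    for ur in candidate_urs:
        if grammar(ur) == observed["bare"] and grammar(ur + "e") == observed["suffixed"]:
            return ur
    return None                      # no candidate derives all observed forms

observed = {"bare": "gras", "suffixed": "graze"}   # for [gʁas] ~ [gʁa.zə]
print(choose_ur(["graz", "gras"], observed))
# -> 'graz': only the voiced-final UR derives both surface forms
```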
Using the phonotactics represented in (9) and (10) to determine the correct UR for this morpheme requires an ability to compare across these multiple surface forms (as in e.g. Ito et al. 1995). An alternative approach is for the learner to compare not across multiple forms but over multiple successive grammars, and to detect if an incorrect UR assumption is causing the learner to repeatedly flip-flop between the same rankings (Kager 1999a). Other recent approaches in constraint-based learning have taken a fresh look at how underlying representations should be constructed from the observation of alternations (e.g. Jarosz 2006a; Apoussidou 2007; Jesney et al. 2010). With respect to morphologically-sensitive patterns, however, more constraint re-organization is in order. One general strategy is to bias the learner towards phonological assumptions that will prove maximally informative once morphological alternations have been discovered. In the case of paradigm uniformity, this approach has been articulated as a ranking bias for Output–Output faithfulness constraints to be undominated at the initial state (McCarthy 1998; Hayes 2004; Tessier 2007; see also Becker 2009: ch. 4). This hypothesis about the learner’s initial state is strikingly supported by anecdotal cases in which children appear to innovate uniformity within morphological paradigms, overriding their otherwise-correct phonotactics. Bernhardt and Stemberger (1998) provide an example from Gwendolyn, a child learning English who from 2;0 to 3;8 produced voiced stops intervocalically in mono-morphemic words like ‘water’ (similar to the context’s target alveolar tap), but produced root-final alveolar stops with the voicing of the base form: for example ‘sitting’ as *[ˈsɪtɪŋ], matching the voiceless stop of [sɪt], rather than [ˈsɪɾɪŋ]. Other examples are reported in Kazazis (1969), Hayes (2004), and Jesney and Tessier (2009, 2011).
Interestingly, it also appears that some learners impose paradigmatic leveling on affixes as well: Jesney (2009) presents evidence from one child, Trevor (Compton and Streeter 1977; Pater 1997), who at 2;3–2;5 produced past tense suffixes as [-d] even at the expense of cluster voicing agreement (e.g. [fɪnɪʃd] ‘finished’). To acquire other kinds of morphologically-specific phonology as in section 7.1.2, OT constraint-based learners may again revise either their grammar or their lexicon. In the first type of approach, a learner can use the evidence of conflicting errors to clone some morpheme-specific constraints: see especially Becker (2009) and Pater (2009b). Another method is to store multiple allomorphs, encode them all in underlying representations, and give the grammar some special faithfulness assumptions with which to determine outputs (see especially Wolf 2015 and references therein; see also a similar but rule-based view in MacWhinney 1978). One consequence of a theoretical shift from phonological rules to constraints is the change in emphasis from process-based to surface-based generalizations (see especially

Morpho-phonological Acquisition   125 McCarthy 2002). A  connection between surface-​based phonological generalizations and morpho-​phonological development comes from the considerable evidence that children omit affixes in words whose simplex forms appear to be already marked with an affix. A frequently-​noted example from English is that children tend to omit plural markers on roots that end in sibilants, and past tense markers on roots that end in coronal stops (Berko 1958; Bybee and Slobin 1982 and references therein; see also Kopcke 1988, 1998 for German.) Bybee and Slobin (1982) argue that this tendency can be understood as a learner using the surface-​based generalization “plurals end in sibilants” rather than a process of affixing a final sibilant or fricative, and also suggest that children apply such surface-​based generalizations differently than adults. Related evidence comes from Stemberger’s (1993) report that the frequency of a stem vowel influences the likelihood that learners will over-​regularize its past tense forms—​though how this surface-​based fact might be incorporated in the grammar is not clear. A quite different constraint-​based view of morphologically-​sensitive phonology is taken in Stratal OT and similar frameworks (Bermudez-​Otero 1999, 2003; Kiparsky 2000; Blumenfeld 2003), which combine the basics of a Lexical Phonology architecture of morphologically-​defined levels or strata with an OT grammar of ranked phonological constraints, re-​rankable at each stratum. Bermudez-​Otero (2003) discusses how a Stratal OT grammar could be used by a learner to acquire the morpho-​phonology of Canadian Raising and its famous interaction with flapping (Joos 1942; Chambers 1973, among others; for learning consequences, see also Hayes 2004; Pater 2014). 
In this approach, learners begin by acquiring the ranking of the last stratum, via the analysis of surface forms within a large morphological domain (such as the phrase), and then work downwards one level at a time to the very deepest morphological levels (such as the stem or root). A crucial mechanism in this process is the learner’s ability to remain agnostic about aspects of underlying representations which show evidence of neutralization at deeper, as-yet-unlearned morphological strata (a strategy termed “Archiphonemic Prudence” by Bermudez-Otero 1999, 2003; see also McCarthy 2005b).

7.2.4 Morpho-phonological Acquisition and Prosodic Representations

A different perspective on the phonological drives behind morphological acquisition is that children’s acquisition of functional morphemes can be predicted in part by their prosodic shape.9 Much research demonstrates that early child grammars of English and Dutch obey prosodic constraints requiring that words must contain a single foot, consisting either of two syllables or of one heavy syllable (Fikkert 1994; Kehoe and Stoel-Gammon 1997b; Pater 1997; Kehoe 2000; see also Adam and Bat-El 2008 on similar evidence from Hebrew). When inputs violate these prosodic pressures by being larger than a foot, truncation usually occurs: [ˈbæ.nʌ] for ‘banana’ or [ˈɛ.fɪnt] for ‘elephant.’ When input morphemes are too phonologically small, they must either be footed with an adjacent word (as frequently assumed in adult languages) or else they must be deleted entirely. This rationale has been used to understand a variety of asymmetries in the production of different morphological affixes, and the cross-linguistic order and rates of their acquisition. For example, Gerken (1994, 1996) provides evidence that English-learning children produce more functional morphemes such as the determiners the and a after monosyllabic roots than longer ones—that is, producing an overt determiner in ‘Tom hit the pig’ (in which ‘hit the’ could form a single foot) more often than in ‘Tom wanted (the) pig’; see also Carpenter (2005). Demuth and McCullough (2009) report that such patterns are observed only in early speech, disappearing between 2;0 and 2;6, also noting this as the age-range in which initial unstressed syllables begin to be produced. In cross-linguistic work, Lleó (1997) and others note that Spanish-learning children use “proto-determiners” from the onset of production, whereas German-learning children do not, and relate this to the nature and frequency of prosodic structures in each language (see also Lleó and Demuth 1999; Demuth 2001; Lleó 2001). Demuth and Ellis (2009) report somewhat similar results for Sesotho suffixes marking noun classes, which are used at the early stages (up to 2;3) more often with monosyllabic nouns than disyllabic ones.10

9  This view is not inherently tied to a particular theory of phonological grammar itself—in principle, a child’s morpho-phonological production could be influenced by a rule that restricts prosodic shape—but often it has come paired with a constraint-based account, using the prosodic constraints of especially Selkirk (1995); see also Selkirk (2011).

7.2.5 Morpho-phonological Acquisition via Analogy

A different approach to acquisition agrees that morpho-phonological learning begins with rote-memorized forms, but asserts that these full forms are in fact all that the learner needs to store. In a purely analogical model, learners store a very rich and redundant lexicon of all observed words, simple or complex, and all their knowledge of morpho-phonology comes from calculations of phonological and semantic similarities across this lexicon—see Bybee and Slobin’s (1982) pioneering work on phonological “schemas” for determining morpho-phonological alternations (see also Bybee 1985, 1995, 2001; Skousen 1989; Tomasello 1998b, 2005; Bybee and Hopper 2001; Pierrehumbert 2001; Skousen et al. 2002; Croft and Cruse 2004; among others). When attempting to produce a novel multi-morphemic form (e.g. with a nonsense word, see section 7.3), a learner with purely analogical knowledge of morpho-phonology will rely not on a grammatical mechanism, but on the statistical properties of their lexicon and its schemas. In some senses, the Minimal Generalization Learner (Albright and Hayes 2003) from section 7.2.2 represents a type of compromise between rule- and analogy-based learners: while the MGL uses discrete grammatical rules learned from observed alternations, it chooses

10  For further discussion of the view that the phonological treatment of different morphological categories comes from different prosodic representations, in both first and second language acquisition, see variously Inkelas (1993); Goad et al. (2003); Goad and White (2004); van Oostendorp (2004a).

Morpho-phonological Acquisition   127 which rule to apply to a nonce root depending on its (structured) similarity to known roots—​a more analogical approach than any strictly Dual Route model. One well known approach that relies fundamentally on analogy for producing surface forms—​multi-​morphemic or otherwise—​is the use of connectionist networks (e.g. Rumelhart and McClelland 1987; MacWhinney and Leinbach 1991; Plunkett and Marchman 1993; Daugherty and Seidenberg 1994; among many others). From the analyst’s perspective laid out in section 7.2.1, it is not immediately clear what kind of analogical knowledge a connectionist network uses to produce a complex word—​that is, whether it is analogizing to a single whole word, or a collection of subword forms to varying degrees, or something else (see Clark and Karmiloff-​Smith 1993; Hutchinson 1994; further discussion in Albright 2002.)

7.2.6 Finding Morphological Bases and Underlying Forms

One long-standing issue in capturing morpho-phonology, as well as its acquisition, is the method of choosing bases and underlying representations (URs). How do learners determine the underlying form for a set of surface allomorphs? An important subquestion is whether learners are forced to pick from among the surface alternants, such as [gʁas] or [gʁæz-] as assumed in the German example of section 7.2.3 (see especially Kenstowicz and Kisseberth 1979). Child production data may provide evidence in this debate, via over-regularizations and paradigm leveling errors. Evidence in one direction comes from Hale (1973), who found evidence in an apparently frequency-driven re-analysis of the passive Maori suffix, supporting the notion that learners are biased to build root URs which match their underived surface allomorphs (Kuryłowicz 1949; Mańczak 1957–58; Vennemann 1972; Hooper [Bybee] 1976; Albright 2002). A similar observation from the acquisition of Polish is that the errors children make with the distribution of vowel-zero alternations (‘jers’) tend strongly towards a constant stem allomorph (seen back in (7c) in data from Łukaszewicz 2006).11 Other evidence suggests some more creative over-generalization across paradigms: for example, Clahsen et al. (2002: 602) report a Spanish-learning child at 3;10 who used the past tense form *punieron ‘put, 3rd pl. past,’ rather than the correct pusieron. This verb is highly irregular (infinitive: poner), so overgeneralization is not surprising—however, this particular stem overgeneralization combines the [n] of the infinitive with the [u] of its irregularly-inflected forms, and no target surface form combines these two segments. Another Polish example involves A’s production of diminutive feminine nouns in the genitive plural: Łukaszewicz (2006: 17–18) demonstrates how A’s incorrect choice of the masculine genitive marker [-uf] causes the child’s

11  See also Wójtowicz (1959); Smoczyńska (1985: 639).

surface form to undergo further target-language processes, resulting in a completely innovative form: compare target [mru.vɛk] with A’s *[mruf.kuf], ‘ant, gen. pl. dim.’

7.3  Experimental Findings in the Acquisition of Morpho-phonology

As seen throughout the previous sections, naturalistic diary studies have revealed much foundational evidence about the nature of morpho-phonological learning. Longitudinal studies which follow a group of children over months or years are particularly crucial to observing and studying U-shaped development, over-regularization, fleeting intermediate stages which may only last a few days or weeks, and any number of infrequent processes that might not be observed in a smaller or cross-sectional sample. Nonetheless, diary studies have some necessary limits which experimental or laboratory studies can overcome. One widely acknowledged limitation is that the linguistic abilities observed in spontaneous speech underestimate a learner’s abilities in comprehension. In the present domain, several studies using Head-Turn Preference paradigms (Hirsh-Pasek et al. 1987; Kemler-Nelson et al. 1995) have found emerging morpho-phonological sensitivities early in the second year of life, despite very limited such evidence from production at this age. Soderstrom et al. (2007) found that at 16 months, English-learning infants are beginning to recognize correct nominal and verbal inflection—preferring to listen to passages like They used to sing in these chairs on the porch rather than *They used to sings in these chair on the porch—and that this preference is diminished but still discernible using nonce nouns and verbs (e.g. *They used to meeps in these nug on the porch).12 These results are particularly relevant because the English voicing alternations in section 7.1.1 create multiple allomorphs for precisely these inflections: compare the third person singular suffixes in sing[z] versus meep[s]. A more direct test of the ability to learn alternations comes from White et al.
(2008), in which infants heard pairs of nonsense words whose predictable segmental distributions could be interpreted as an alternation. After training on one of two possible “alternations,” 8.5-month-olds showed a preference for novel nonsense word pairs with the same alternation, and by 12 months this preference became more robust. These results suggest that infants can use distributional cues in their input (along with their phonotactic grammars) to identify morpho-phonological alternations shortly after the phonemic inventory has been acquired, and long before production shows control over such morpho-phonology. (See also Gómez 2002; Gómez and Maye 2005, on the acquisition of long-distance phonological regularities which might guide the identification of alternations; related studies are reported in Morgan and Demuth 1996a.)

12  Related work includes Santelmann and Jusczyk (1998); Tincoff et al. (2000); Soderstrom et al. (2002); Höhle et al. (2006).

A second difficulty in studying child speech corpora is establishing the intended meaning: a transcript of spontaneous speech can demonstrate that a child knows a derived word and can retrieve it online, but not whether they have decomposed some or all of its morphological structure. One well-established approach to studying morphological decomposition is the “Wug-test” (starting with Berko 1958, discussed in the next paragraph), which measures the online productivity of a morphological process using nonce words; from the present perspective, wug-testing a process with multiple phonologically-conditioned allomorphs can thus measure the productivity of morpho-phonology. Wug-tests have since been used to determine the rate and accuracy of acquisition in many different languages, usually, though not exclusively, testing the acquisition of affixing (see e.g. Gordon 1985 on the acquisition of English plurality in compounds). In Berko’s (1958) original study, children were asked to apply inflection like pluralization onto novel roots via experimenter prompting with reference to pictures: for example, “This is a wug. Now here are two. There are two ___” (to elicit “wug[z]”). This initial study demonstrated that English-learning children control the two fricative allomorphs with some accuracy by age 4, and with between 86 and 97 percent accuracy by first grade (ages 6 or 7), but that the vowel-initial allomorph (-əz) remained at only 33–39 percent accuracy even by ages 6 and 7 (see also Derwing and Baker 1986). In fact, Graves and Koziol (1971) report that third graders remain at only 85 percent accuracy at pluralizing sibilant-final singulars. Berko’s (1958) results also suggest that this delay is not purely phonological, since children were more accurate with the possessive [-əz] allomorph than with the homophonous plural.
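The phonologically-conditioned allomorphy that Berko's task probes can be sketched as a simple selector over the final segment of the stem. The segment classes below are simplified stand-ins for the real natural classes (sibilants versus other voiceless segments).

```python
# Toy selector for the English plural allomorphs probed by the Wug-test:
# [-əz] after sibilants, [-s] after other voiceless segments, [-z] elsewhere.

SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}   # simplified sibilant class
VOICELESS = {"p", "t", "k", "f", "θ"}          # non-sibilant voiceless class

def plural_allomorph(final_segment):
    if final_segment in SIBILANTS:
        return "əz"
    if final_segment in VOICELESS:
        return "s"
    return "z"

print("wug" + plural_allomorph("g"))    # -> 'wugz'
print("heaf" + plural_allomorph("f"))   # -> 'heafs'
```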
Some more recent research has focused on the acquisition of morpho-phonological alternations within roots, such as the devoicing of root-final segments in German or Dutch (section 7.2.3). A series of studies focused on final devoicing and alternations in Dutch (Zamuner et al. 2006, 2012; Kerkhoff 2007) suggests that this learning may be harder, despite its basis in a clear phonotactic restriction. Zamuner et al. (2012) ran a reverse Wug-test, in which 2- to 4-year-olds were given a novel plural word and asked to produce its singular. In this task, children were less likely to correctly provide a singular form for nonce plurals that required devoicing—for example when given a plural sladen whose singular would be [slat]—than for a plural like slaten. In the comprehension tests of Zamuner et al. (2006), however, 2½- and 3½-year-old Dutch-learning children were above chance at connecting singular slat with plural sladen; while learners were worse at reliably drawing this connection with alternating roots than with the uniformly voiceless slat~slaten, this might simply reflect the additional difficulty of perceiving and processing the alternation. One reason that root allomorphy should be harder to learn is that it runs counter to the paradigm uniformity pressures found in many adult languages (section 7.1.2). Tessier (2006, 2012) found some initial confirmation that child learners impose paradigm uniformity onto novel words, using an artificial language Wug-test in which 4-year-olds were taught count singular, count plural, and mass noun names in an “alien” language spoken by a puppet, including a novel plural suffix [-dəɫ]. Both plural and mass

words in the alien language were made up of two closed syllables, with infrequent or illegal medial consonant clusters designed to elicit production errors—children thus learned mass terms like [zɪtʃ.dɪn] and plurals like [wʌtʃ.dəɫ], the latter learned alongside the singular [wʌtʃ]. Results from 10 children suggest that clusters were produced differently in the two morphological conditions, such that initial syllable codas were more faithfully retained in count nouns like ‘wutchdel,’ where the [tʃ] forms part of the singular base ‘wutch,’ compared to the same [tʃ] in the mass term ‘zitchdin’ with no base *‘zitch.’ The anecdotal examples noted in section 7.2.3 which suggest that learners impose uniformity on affixes are also supported by some experimental findings. In the Wug-test context, Graves and Koziol (1971) report that 6-year-olds made significantly more errors with nonce stems which require the [-s] allomorph rather than [-z], and that target [-s] affixes were avoided by voicing the stem-final segment, e.g. /tɹok + ‘plural’/ → [tɹogz]; see also Anisfeld and Tucker (1967). An extension of the Wug-test from MacWhinney (1978) was designed to investigate the use of analogy in Hungarian nonce word formation. In this study, children were asked to produce novel morphologically-complex words immediately after hearing real words which could serve as phonological primes for a particular process. The results suggest that these primes were most likely to be followed by an analogy-driven response in the youngest age group (2;6–2;9) and in the oldest age group (6;8–7;5), while the middle groups (ages 3;0–5;1) showed little if any analogy. This latter difference between the middle and older age groups may be evidence that analogization becomes more relevant to morpho-phonological processes later in acquisition (see also Kerkhoff 2007 for a similar suggestion about Dutch learners of root-final devoicing).
One possible explanation for such a trend is that analogy cannot be used by learners until their lexicon has grown to a size over which analogies can be reliably calculated—see also the correlations between lexicon size and morpho-phonological over-regularization under discussion in Plunkett and Marchman (1991, 1993) and Bates et al. (1994). Some work has used a “receptive” Wug-test to address the potential differences between children’s production and comprehension: see for example Kuczaj (1978), in which children were asked to judge which morphologically complex forms “sounded silly” when spoken by a puppet. More recently, Skoruppa et al. (2013) have used a picture-pointing task with known words in phrases to study toddlers’ abilities to use morpho-phonology in lexical access. Skoruppa et al. found that 2½- and 3-year-olds were already able to use language-specific phrasal phonology to recognize and point to familiar objects: for example, English-speaking children were more likely to point to a “pen” after hearing the string pem when embedded in an English phonological context for place assimilation, ‘Can you show me the pe[mp]lease?’, than in a non-assimilatory context, ‘Can you show me the pe[md]ear?’ Experimental work has also investigated whether learners actually acquire proposed phonological relations among allomorphs. An early set of studies with 9- to 12-year-olds by Myerson (1976) used production, memory, and forced choice tasks to investigate English learners’ knowledge of vowel shifts as described in Chomsky and Halle (1968), exemplified in such Latinate stem alternations as serene~serenity and divine~divinity.

In part, these studies emphasize the different results found across tasks: in Myerson’s (1976) production task, children produced vowel shifts only 5 percent of the time; in comprehension tasks they chose a vowel-shifted form 63 percent of the time over an alternative with a lax root vowel; and in the memory task they also remembered vowel-shifted forms six weeks after testing at a considerably higher rate than vowel-laxing forms. Along with, for example, Moskovitz (1973), these results form a body of early critical work suggesting that SPE’s vowel shift rules are not in fact psychologically real in the minds of English learners, though the fact that the fairly idiosyncratic English vowel shift rules were better memorized than vowel laxing would appear to reflect some English-specific knowledge of alternations. Related work with adults has also been used to infer properties of child acquisition (though cf. Kiparsky and Menn 1977). These studies have usually aimed at discerning whether regularities in the lexicon are in fact recognized by learners by the time they reach adulthood, again including work on English’s vowel shift rules (Jaeger 1984, 1986; Derwing and Baker 1986). Such studies cast doubt on phonologically-complex analyses of some alternations, as these appear not to be internalized by learners; see for example Bybee and Pardo (1981) on the behavior of Spanish diphthongs in verbal roots. However, the implications of speakers’ indifference to a morpho-phonological generalization instantiated in their lexicon are not necessarily clear-cut. On the one hand, such results have been used to argue that particular alternations are not produced in the minds of speakers by a phonological grammar. On the other hand, other studies argue that it is precisely these failures of generalization which require phonological restrictions on morpho-phonological learning (see especially Becker et al.
2011; see also Albright 2007), with the phonology serving as the learners’ guide to the range of possible or natural alternations within morphological paradigms. A promising development in recent work is the use of computational modeling to interpret developmental trends and asymmetries in morpho-phonological acquisition. Many of these studies rely on constraint-based grammars with inherent variability, and on innovative views of how allomorphy is stored and how URs are constructed. One notable example is Jarosz (2009a), which uses the Maximum Likelihood learner of Jarosz (2006a) to interpret the cross-sectional results on Dutch devoicing and plural allomorphy from Zamuner et al. (2012) discussed earlier.

7.4  Open Questions

The results summarized in this chapter cover many basic aspects of morpho-phonological acquisition, primarily from the perspective of Indo-European languages, but much remains unknown or under-investigated. One area that has recently seen very promising investigation is the acquisition of languages like Hebrew whose basic morpho-phonology is fundamentally non-concatenative (see e.g. Adam 2002; Bat-El 2008 and references therein). Such studies make significant contributions both by

132   Anne-Michelle Tessier

deepening our understanding of the range of morpho-phonologies to be acquired by learners and by uncovering similarities among the early grammars of children learning very different target languages, which can provide hints about the initial state of morpho-phonological acquisition. A large class of empirical questions that remains relatively unanswered concerns the nature of more "exotic" morpho-phonological phenomena. How do children acquire languages which exhibit productive processes such as morphologically sensitive tone sandhi, morphemes consisting of "floating" phonological features (particularly given issues discussed in Wolf 2009), phonologically sensitive reduplication, infixation, consonant mutation, subtractive morphology, and so on? Case studies on the acquisition of individual languages provide initial evidence: see for example the protracted development of Welsh consonant mutation reported in Thomas and Gathercole (2007), and the early studies of tone sandhi in Mandarin (Li and Thompson 1977) and Cantonese (Tse 1978).13 Another under-investigated area is the acquisition of phrasal morpho-phonological alternations (though see initial work in Skoruppa et al. 2013, as well as e.g. Chevrot et al. 2009 on French phrasal alternations), which may reveal children's biases about the domains of morpho-phonological alternations and processes. Further systematic and cross-linguistic investigation of these individual phenomena will be crucial to our understanding of their position in theories of morphology and phonology, as well as in the minds of learners and adult speakers alike.

Acknowledgments Thanks especially to Joe Pater and an anonymous reviewer for careful reading and many important suggestions, challenges, corrections, and better ideas; thanks also to Adam Albright, John McCarthy, Matt Wolf, Katrin Skoruppa, Marcin Morzycki, and Jan Anderssen for numerous types of help with data.

13  While reduplication stands out as a morpho-​phonological process that has received considerable attention in the acquisition literature, it has more often been discussed as a child-​specific innovation in the development of languages that lack productive reduplication in the adult grammar: see e.g. the various acquisition studies reported in Hurch (2005).

Chapter 8

Processing Continuous Speech in Infancy
From Major Prosodic Units to Isolated Word Forms

Louise Goyet, Séverine Millotte, Anne Christophe, and Thierry Nazzi

8.1 Introduction

Infants acquiring language have to learn about the phonology, the lexicon, and the syntax of their native language. The issue we are going to discuss in the present chapter relates to some of the mechanisms involved in learning a lexicon. A word corresponds to the specific pairing between the mental representation of a sound pattern (word form) and the abstract representation (concept) of an object or event in the world which constitutes the meaning associated with that word form. The building of a lexicon will then rely on the development of three sets of abilities: the ability to elaborate, extract, and store word forms; the ability to build concepts for the objects and events in the world; and finally the ability to appropriately link word forms and concepts. Note that the acquisition of word forms (Jusczyk and Aslin 1995) and concepts (cf. Rakison and Oakes 2003 for an overview) starts before the onset of lexical acquisition per se around the ages of 6–12 months. It is thus likely that infants in their first year of life constitute a store of word forms and concepts that are later paired to make words. In the second year of life, all these levels of processing could occur simultaneously (see Nazzi and Bertoncini 2003). More specifically, we will focus here on the ability to extract word forms from speech (segmentation abilities) since there is evidence that it plays a critical role in the acquisition of the lexicon. This hypothesis is supported by the finding of positive correlations between word segmentation performance and later vocabulary levels (Newman et al.

2006) and by the demonstration that newly segmented words are easier to link to new objects at 17 months of age (Graf Estes et al. 2007). It also appears that word form segmentation is a prerequisite for the acquisition of syntax, given that all theories of syntax acquisition presuppose that infants have access to the segmented sequence of words constituting the utterances they hear (Newman et al. 2006). Accessing word forms would not be an issue if word boundaries were clearly marked at the acoustic level or if words were (often) presented in isolation. Regarding the first point, it has been known since Klatt (1979) that word boundaries are not clearly marked in adult-directed speech and that adults typically resort to multiple activation of possible word candidates to segment continuous speech into words (e.g. McQueen et al. 1994; McQueen et al. 1995). Regarding the second point, several studies evaluated the presence of isolated words in the input to English-learning infants (Aslin 1993; Brent and Siskind 2001) and to a Dutch/German bilingual infant (van de Weijer 1998). Their results showed that infant-directed speech consists mostly of multi-word utterances, words pronounced in isolation making up less than 10 percent of all words present in the analyzed corpora. Infrequent, isolated words might play a role in lexical acquisition, as suggested by the fact that exposure to a word in isolation before 12 months predicts later knowledge of that word (Brent and Siskind 2001). However, given that many words appearing in isolation correspond to fillers (yes, hmm), vocatives ("infant's first name"), and social expressions (hi!), segmentation procedures remain necessary for types of words that do not appear in isolation, especially grammatical words (van de Weijer 1998). Studies exploring segmentation procedures revealed that they emerge early in development.
In a pioneering behavioral study, Jusczyk and Aslin (1995) hypothesized that there might be subtle linguistic cues in the signal allowing infants to segment fluent speech, and showed that segmentation abilities emerge around 8 months of age. In their first experiment, 7.5-month-old English-learning infants were familiarized with two monosyllabic words (cup and dog, or bike and feet) and then heard four passages, each passage built around one of the four words (each of these words was repeated six times in its corresponding passage). The results revealed a preference for the passages corresponding to the familiarized words, indicating that infants had recognized the target words, which in turn implied that they had segmented the words from the passages. Having failed to extend this result to 6-month-olds, Jusczyk and Aslin (1995) concluded that word form segmentation abilities emerge between the ages of 6 and 7.5 months. Given these early segmentation results, many studies have started exploring how infants become sensitive to the various subtle linguistic cues that signal word boundaries and start using them for speech segmentation (see Table 8.1). The first part of this chapter will focus on information related to major prosodic/syntactic units such as sentences, clauses, and phrases. The boundaries of these units are marked by pauses, lengthening, and pitch variations, and might thus allow infants to parse fluent speech into units made up of just a few words each. In the second part, we will focus on various cues that are present at the word level. We will start by discussing the availability and use of distributional information (for example regarding the order of syllables in the speech stream),

Table 8.1 Cues used by infants for fluent speech segmentation. References are for studies on English-learning infants, unless otherwise specified

1. Phrasal prosody
   • Major prosodic boundaries (4.5 months): Hirsh-Pasek et al. (1987)
   • Intermediate prosodic boundaries (10.5 months): Gout et al. (2004); for French: Millotte et al. (2010)
   • Prosodic word boundaries (12 months): Johnson (2008)

2. Distributional information regarding syllable order (5.5–8 months): Saffran et al. (1996); Johnson and Jusczyk (2001); Thiessen and Saffran (2003); for Dutch: Johnson and Tyler (2009); for French: Mersad and Nazzi (2012)

3. Rhythmic unit cue
   • Trochaic unit in stress-based languages (English, German, Dutch, …) (7.5 months): Jusczyk et al. (1999b); for Dutch: Houston et al. (2000), Kooijman et al. (2005, 2009)
   • Syllabic unit in syllable-based languages (French, Spanish, Italian, …) (8 months): for French: Nazzi et al. (2006); Goyet et al. (2010); Goyet et al. (2013); Nishibayashi et al. (in press)

4. Phoneme-level cues
   • Coarticulation (8 months): Johnson and Jusczyk (2001)
   • Phonotactic information (9 months): Mattys and Jusczyk (2001a); Gonzalez Gomez and Nazzi (2013)
   • Allophonic information (10.5 months): Jusczyk et al. (1999a)
   • Nature of initial phoneme (consonant versus vowel) (8 months): Mattys and Jusczyk (2001b); Nazzi et al. (2005)

5. Known words
   • Known content words (6 months): Bortfeld et al. (2005); Mersad and Nazzi (2012)
   • Known function words (8 months): Shi et al. (2006); for French: Shi and Lepage (2008), Hallé et al. (2008)

since such information is often considered to be language-general. Following this, we will discuss the use of language-specific cues, starting with a second kind of prosodic cue, namely the rhythmic units that define the different overall linguistic rhythms. We will then turn to other bottom-up cues such as allophonic cues (corresponding to the fact that the acoustic realization of some phonemes depends on whether they are at the edge of or inside a word) or phonotactic cues (constraints on the phonetic sequences allowed at the lexical level, which provide information on the likelihood that two sounds belong to the same word). We will end this section with a discussion of the role of known or isolated words for segmentation, the idea being that the first words infants learn can help segmentation, probably by acting as anchor points indicating that an adjacent sound sequence in the sentence is a word-like unit. Finally, in the third section of this chapter, we will discuss studies that have explored the interactions between the use of various segmentation cues.

8.2  Segmentation of Prosodic Units

Phrasal prosody is carried by variations of melody, rhythm, and intensity in spoken sentences. It has been claimed that perceiving prosodic information could help young infants to bootstrap their acquisition of language (see for instance the prosodic bootstrapping hypothesis, Morgan and Demuth 1996b). Within sentences, words combine together to form syntactic phrases, and phrases constitute clauses. In the same way, within the hierarchical prosodic structure (see Shattuck-Hufnagel and Turk 1996 for a review), prosodic words combine to form phonological phrases, and phonological phrases constitute intonational phrases. Prosodic word boundaries always correspond to word boundaries, and phonological phrase boundaries always correspond to word and syntactic boundaries. Thus infants could rely on prosodic cues to find some word boundaries, as well as some syntactic boundaries, in fluent speech.

8.2.1 Infants' Perception of Phrasal Prosody

Previous research demonstrated that even newborn infants perceive the cues that correlate with prosodic units (Christophe et al. 1994; Christophe et al. 2001). Young infants are sensitive to the acoustic cues that correspond to clause boundaries. For instance, they have been shown to react to the disruption of intonational phrases in whole sentences by 4.5 months of age (see Jusczyk 1997 for a review). Using the pause-insertion manipulation, the first studies showed that young infants preferred listening to correctly segmented sentences, where pauses were inserted at intonational phrase boundaries that coincide with clause boundaries, rather than to incorrectly segmented sentences, where pauses were inserted within clauses (Hirsh-Pasek et al. 1987; Kemler Nelson et al. 1989). Further studies showed evidence that this sensitivity to major prosodic cues actually plays a role in how young infants process spoken sentences, in how they package, recognize, and memorize relevant units that are embedded in continuous speech (Mandel et al. 1994; Nazzi et al. 2000b; Seidl 2007). Using the Head-turn Preference Procedure, Nazzi et al. (2000b) familiarized 6-month-old American infants with two versions of the same strings of words (i.e. "rabbits eat leafy vegetables"), one corresponding to a well-formed prosodic unit (words were produced as a complete clause as in "Many

animals prefer some things. Rabbits eat leafy vegetables. Taste so good is rarely encountered"), the other one corresponding to an ill-formed version (words belonged to two different sentences and spanned a clause boundary as in "John doesn't know what rabbits eat. Leafy vegetables taste so good"). Next, infants listened to the two test passages and showed a preference for listening to the one that contained the prosodically well-formed unit over the one that contained the ill-formed sequence. This indicates that 6-month-olds package the speech stream into prosodically well-formed units which allow them to better recognize familiarized strings of words even when they are embedded in longer passages (see also Schmitz et al. 2003 for results with German infants, as well as Johnson and Seidl 2008 for Dutch infants). Other studies have replicated these findings with smaller units (phrases) and showed that young infants between 6 and 9 months of age also react to the disruption of phonological phrases embedded within whole sentences (Jusczyk et al. 1992; Gerken et al. 1994) and that they can use this sensitivity to recognize phrasal units, such as noun and verb phrases, in fluent speech (Soderstrom et al. 2003; Soderstrom et al. 2005).

8.2.2 Phrasal Prosody Constrains Word Segmentation

Infants' sensitivity to prosodic units could help them to segment continuous speech into words, since prosodic boundaries always align with word boundaries. Indeed, Seidl and Johnson (2006, 2008) used the head-turn preference paradigm to test English-learning infants' ability to segment words from continuous speech, and contrasted words that were either utterance-initial or utterance-final with words that occurred in the middle of an utterance. They observed that words occurring at utterance edges were more readily segmented than utterance-medial words. Thus, 8- and 11-month-old infants exploit their ability to perceive large prosodic units (here, intonational phrases) in order to segment speech into words. Further studies showed that infants were also able to rely on intermediate prosodic units, namely phonological phrases. Thus, Gout et al. (2004) used the conditioned head-turning technique and trained 10- and 13-month-old American infants to turn their head for a target bisyllabic word (e.g. "paper"): in the test session, infants responded more often to sentences that actually contained the target word (as in "[The scandalous paper] [sways him] [to tell the truth]") than to sentences that contained both of its syllables separated by a phonological phrase boundary (as in "[The outstanding pay] [persuades him] [to go to France]," where square brackets mark phonological phrases). Thus in infants, just as in adults, a phonological phrase boundary is perceived as signaling the end of the current word (see e.g. Christophe et al. 2004 for results on French adults). More recently, these results were confirmed with French infants of 16 months of age (Millotte et al. 2010). A final study demonstrated that 12-month-old infants were also capable of relying on prosodic word boundaries in order to figure out probable word boundaries (Johnson 2008). In this study, English-learning infants were familiarized with passages containing either a target word (e.g. toga from toga#lore), or a non-target for which both syllables were present but separated by a prosodic word boundary, so that the word itself was not present (e.g. dogma from dog#maligns). At test, infants listened longer to the target word (the word that was present) than to either the non-target (the word that was absent while both its syllables were present) or completely unfamiliar words. This study thus demonstrates the ability of 12-month-old American infants to use prosodic word boundaries to segment words from continuous speech (see also Millotte et al. 2007a for adult results on French). Overall, this series of studies demonstrates that young infants, within their first year of life, are able to process phrasal prosody, and that they exploit prosodic boundaries to segment words out of the continuous speech stream. Since prosodic boundaries correspond to syntactic boundaries, they may also provide infants with information regarding the syntactic organization of sentences (Christophe et al. 2008). Recent experimental work with adults shows that phonological phrase boundaries are exploited on-line to constrain syntactic parsing (Millotte et al. 2007a, 2008). Since infants have early access to prosodic boundaries and use them to constrain lexical access, it is thus possible that they also use them for syntactic parsing (Choi and Mazuka 2003). These prosodic units typically contain more than one word, and as a result, infants need to apply additional segmentation procedures in order to identify individual words. In the remainder of this chapter, we will examine these other word segmentation procedures, which apply within prosodic units.

8.3  Segmentation Cues at the Word Level

In this section, we will discuss how infants find word forms within prosodic units. It has been found that many cues are used, the two most studied being distributional information and rhythmic units. These two cues have attracted particular interest because they both offer a solution to the issue of the bootstrapping of segmentation abilities. Indeed, most cues used by infants to segment are language-specific, and would thus seem to require knowledge of some words in the native language before becoming operative. This is not the case for distributional-based segmentation, which is usually thought of as (at least) a language-general process. As for the use of rhythmic units, it has been proposed that the one appropriate to the native language could be acquired at the sentence level. After having presented data on these two cues, we will present studies exploring the use of other language-specific bottom-up cues, and then the use of top-down information.

Processing Continuous Speech in Infancy    139

8.3.1 Transitional Probability (TP) Cues to Word Segmentation

One much investigated kind of word boundary information corresponds to what has been called "statistical" or "distributional" information. This proposal is based on the assumption that there should be regularities in the order of appearance of sounds in the input, whether at the level of consecutive phonemes or that of consecutive syllables, which should indicate whether two sounds are likely to belong to the same word or to be at the boundary between two words. In this perspective, many studies have investigated different instantiations of this general idea. For example, Brent and Cartwright (1996) conducted computational simulations on child-directed corpora establishing that "sound sequences that occur frequently and in a variety of contexts are better candidates for the lexicon than those that occur rarely or in few contexts." Moreover, Perruchet and Vinter (1998) proposed that on-line attentional processing of the input, relying on basic laws of memory and associative learning, could parse word-like units from a continuous input; supporting data were obtained for artificial languages via PARSER, a computer-based instantiation of their model. Another distributional-based approach is illustrated by a study by Hockema (2006), investigating whether statistics computed over single phoneme transitions are informative about word boundaries. Phoneme transition pairs (PTPs) were identified in infant- and child-directed corpora, and classified along a continuum of probability of occurrence at a word boundary. The results showed that the vast majority of PTPs can be divided into two distinct groups: those that occur within words and those that occur across word boundaries. Furthermore, it was established that PTPs could be used reliably to find words in those corpora.
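The logic of such boundary statistics is easy to illustrate. The sketch below is not Hockema's (2006) implementation; it is a minimal, hypothetical illustration that tallies, in a toy orthographic corpus with known word boundaries, how often each adjacent-symbol pair straddles a boundary, and then uses those tallies to posit boundaries in unsegmented input:

```python
from collections import defaultdict

# Toy "corpus" with known word boundaries; letters stand in for phonemes.
corpus = [["the", "dog", "ate"], ["the", "cat", "sat"], ["a", "dog", "sat"]]

stats = defaultdict(lambda: [0, 0])  # pair -> [at-boundary, within-word]
for utt in corpus:
    flat = "".join(utt)
    # positions in `flat` that fall immediately after a word boundary
    bounds, i = set(), 0
    for w in utt[:-1]:
        i += len(w)
        bounds.add(i)
    for j in range(len(flat) - 1):
        pair = flat[j:j + 2]
        stats[pair][0 if j + 1 in bounds else 1] += 1

# For each attested pair, the probability that it straddles a boundary
p_boundary = {p: b / (b + w) for p, (b, w) in stats.items()}

def segment(unsegmented, threshold=0.5):
    """Insert a boundary wherever the straddling probability is high."""
    out = [unsegmented[0]]
    for j in range(len(unsegmented) - 1):
        if p_boundary.get(unsegmented[j:j + 2], 0.0) >= threshold:
            out.append(" ")
        out.append(unsegmented[j + 1])
    return "".join(out)

print(segment("thedogsat"))  # → "the dog sat"
```

On this toy corpus the pairs that only ever occur across word boundaries (here "ed" and "gs") cleanly signal boundaries; in real phonemic corpora the probabilities are graded, which is why Hockema's finding that most PTPs cluster near 0 or 1 is informative.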
However, the most influential instantiation of this distributional approach at present is the one based on transitional probabilities (TPs) at the syllabic level. In most studies on TPs, the definition is one of forward transitional probabilities between consecutive syllables (say, X and Y), in which TP is defined as the frequency with which Y follows X in the language divided by the frequency of X:

TP(XY) = Freq(XY) / Freq(X)
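As a minimal illustration of this computation (a hypothetical sketch, not the stimuli or procedure of any study discussed here), the following computes forward TPs over a toy syllable stream built, as in the Saffran et al. design, by concatenating four trisyllabic nonsense words in random order. Word-internal TPs come out at 1, while TPs across word boundaries hover around .25:

```python
import random
from collections import Counter

# Four made-up trisyllabic "words" (hypothetical, not the actual stimuli)
words = [("pa", "bi", "ku"), ("ti", "bu", "do"),
         ("go", "la", "tu"), ("da", "ro", "pi")]
random.seed(0)
stream = [syl for _ in range(100) for syl in random.choice(words)]

# TP(XY) = Freq(XY) / Freq(X), over consecutive syllable pairs
pair_freq = Counter(zip(stream, stream[1:]))
syl_freq = Counter(stream[:-1])
tp = {(x, y): f / syl_freq[x] for (x, y), f in pair_freq.items()}

# Within a word, each syllable fully predicts the next one (TP = 1);
# at a word boundary, any of the four words may follow (TP near .25).
print(tp[("pa", "bi")])  # → 1.0
print(min(tp[("ku", w[0])] for w in words if ("ku", w[0]) in tp))
```

The test phase of such experiments then contrasts these high-TP "words" with "part-words" that straddle a boundary (e.g. the sequence ku-ti-bu here), which contain at least one low-TP transition.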

Following this definition, Saffran et al. (1996a) investigated the prediction that infants use TPs to retrieve words and word boundaries. In that study, English-learning 8-month-olds heard two minutes of a continuous speech signal that was made by the concatenated repetition of four different trisyllabic words presented in randomized order. Then, infants were presented with words from that language versus either non-words or part-words (sequences of syllables that had occurred in the signal but straddled a word boundary) that had lower TPs. In both cases, infants listened less to the words, suggesting that they had retrieved the words from fluent speech and were processing

them differently than non-words or part-words. This basic finding was replicated in many studies in English (in particular, Aslin et al. 1998), as well as in other languages (for Dutch-learning 8-month-olds, Johnson and Tyler 2010; for French-learning 8-month-olds, Mersad and Nazzi 2012). Curtin et al. (2005) have also investigated the assumption that consecutive syllables should be more cohesive within than across words by calculating TPs for pairs of syllables corresponding to real words versus non-words in an infant-directed corpus. They found that TPs were much higher within words than across word boundaries (.64 versus .15 for tokens; .55 versus .12 for types). Additionally, they established that taking stress level (whether a syllable is stressed or not) into account led to increased differentiation of words and non-words. It thus appears that from a very young age, infants have the ability to use transitional probabilities between consecutive syllables to retrieve word boundaries and/or group syllables together. Importantly, TP-based word segmentation was found to be linked to subsequent word–object mapping. After hearing an artificial language for 2 minutes, 17-month-old infants were found to succeed at learning new label–object pairings when the labels corresponded to words in the artificial language, but not when they corresponded to non-words or part-words (Graf Estes et al. 2007). Following up on these original results, many studies have explored TP use in infant and adult processing. These studies, which for the most part used a methodology similar to that of Saffran et al. (1996a), found that TPs are used in both populations not only to process speech stimuli, but also to process musical tones (Saffran et al. 1999) or visual patterns (Fiser and Aslin 2002a). Use of TPs was also established in non-human mammals such as cotton-top tamarins (Hauser et al. 2001), and rats were found to be sensitive to co-occurrence frequencies (Toro et al. 2005). It thus appears that the use of TPs could be domain-general, so that TPs could apply to different kinds of stimuli, and not be specific to humans. These are important features since they open the possibility that TPs could be used very early in development, even before infants have started specifying language-specific features of their native language. In this perspective, TPs are regarded as a potentially powerful learning mechanism that could be used to bootstrap word segmentation abilities. However, in order to support this claim, one needs to establish that TPs can be used in more complex and natural language-like contexts than those originally used. Indeed, in the original studies, infants were presented with very simple languages, made up of only four words that were all trisyllabic. Thiessen et al. (2005) were the first to address this issue. In that study, the language presented consisted of four different words of variable length: two bisyllabic words and two trisyllabic words. Moreover, rather than presenting 2 minutes of concatenated speech, twelve sentences of continuous speech were constructed, so that each contained the four concatenated words. The sentences were presented separated by silences. Both features made those stimuli more akin to natural languages. The stimuli were spoken either in infant-directed or adult-directed speech. In these conditions, in which TPs and at least sentence-boundary cues were present, infants could segment the words, though only in the infant-directed condition and not in the adult-directed condition.

The failure to find positive segmentation evidence with the adult-directed speech was surprising, and points to potential limits in the use of TPs to segment speech. However, the authors pointed out that a possible cause of this failure was the fact that the target words were repeated less often in their study than in previous studies. More recently, the question of possible limits to the use of TPs was reassessed. Johnson and Tyler (2010) presented Dutch-learning 5.5- and 8-month-olds with a continuous artificial language made up of four words of mixed length (two bisyllabic words, two trisyllabic words). Although the procedure was similar to that of Saffran et al. (1996a), that is, the words were repeated as often as in that previous study, no segmentation effect was found. Similar results were found for French-learning 8-month-olds (Mersad and Nazzi 2012). Taken together, both studies establish limits in the use of TPs in infancy, and suggest that TPs might be a less powerful cue than initially proposed. In more difficult languages (with words of mixed length rather than uniform length, and without any phrasal prosody information), infants appear to have trouble using TPs. Does this mean that TPs are not useful for word segmentation outside the laboratory, when using complex languages? Several studies suggest the contrary, at least when using infant-directed speech. For example, the study of Jusczyk et al. (1999b), intended to establish the role of prosody in early word segmentation, also indirectly points to 7.5-month-old infants' use of TPs. This is illustrated, for example, by the fact that 7.5-month-old infants tested with passages containing trochaic words such as DOCtor showed a segmentation effect if they had been familiarized with the whole word, but not if they had been familiarized solely with its initial syllable DOC. A possible interpretation of this finding is that 7.5-month-olds detect the co-occurrence of the two syllables and treat them as a bisyllabic unit (the pattern for weak–strong words at the same age is different, for reasons that will be clarified later on). More recently, two studies by Pelucchi et al. (2009a, 2009b) presented English-learning 8-month-olds with passages of Italian containing pairs of syllables that had internal TPs of either 1 (they constituted a word) or .33 (they straddled boundaries between different words). In the first study, forward TPs were manipulated, as done in other studies; in the second study, backward TPs were contrasted. In both cases, infants treated the two types of syllable pairs differently, thus showing that they were sensitive to the TP properties of the stimuli. These last studies show that TPs can be used to retrieve words in complex natural language stimuli, in contrast to the finding that they could not be used in "complex" artificial languages (made up of words of different lengths). The comparison of both lines of research suggests that while there are limitations to TP use, the richness of natural language (which complements its complexity), and in particular of infant-directed speech, enables the use of TPs to segment words. Therefore, TP processing might be a sufficiently robust process to scale up to aspects of natural language acquisition when presented in the presence of other cues, such as phrasal prosody or stress pattern information (Pelucchi et al. 2009a). In the next section, we consider other cues present in language that are also used to segment speech. Following this, we will address the issue of how the different cues are used in combination.
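The forward/backward contrast at issue here can be made concrete in a few lines. This is a hypothetical sketch over a toy syllable stream (not Pelucchi et al.'s Italian materials): forward TP conditions on the first syllable of a pair, backward TP on the second, so the same pair can be highly predictable in one direction but not the other:

```python
from collections import Counter

# Toy syllable stream (hypothetical illustration only)
stream = "ca sa bi ca ro bi me lo bi ca sa bi me lo ne ca ro ne".split()

pair_freq = Counter(zip(stream, stream[1:]))
first = Counter(stream[:-1])   # frequency as first member of a pair
second = Counter(stream[1:])   # frequency as second member of a pair

def forward_tp(x, y):
    # Forward TP: Freq(XY) / Freq(X) -- how well X predicts the following Y
    return pair_freq[(x, y)] / first[x]

def backward_tp(x, y):
    # Backward TP: Freq(XY) / Freq(Y) -- how well Y predicts the preceding X
    return pair_freq[(x, y)] / second[y]

# "sa" is always followed by "bi" (forward TP = 1) even though "bi" is
# often preceded by other syllables (backward TP = .5); the reverse
# asymmetry holds for the pair ("ca", "sa").
print(forward_tp("sa", "bi"), backward_tp("sa", "bi"))   # → 1.0 0.5
print(forward_tp("ca", "sa"), backward_tp("ca", "sa"))   # → 0.5 1.0
```

Infants' sensitivity to both quantities, as reported above, suggests they are not merely tracking the predictiveness of a syllable for what follows it.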


8.3.2 Rhythmic Units

A second word boundary cue that has received a lot of attention relates to the rhythmic unit of the native language. Its use for segmentation has been formalized in the early rhythmic segmentation hypothesis (Nazzi et al. 1998a, 2000a, 2006). This hypothesis states that infants could learn the rhythmic segmentation procedure appropriate to the rhythmic unit of their native language before they know any words of their language. This acquisition would first entail the acquisition of the rhythmic unit of the native language, and then the emergence of segmentation abilities based on that unit. The early rhythmic segmentation hypothesis is based, first, on evidence that there are different rhythmic classes of languages (Pike 1945; Abercrombie 1967), three rhythmic classes having usually been considered: the stress-based class (including languages such as English, Dutch, German, …), the syllable-based class (including French, Spanish, Korean, …), and the mora-based class (including Japanese, Telugu, …). An underlying rhythmic unit is associated with each of these language classes: the stress unit, the syllable, and the mora,1 respectively. Importantly, there is a hierarchical relationship between these three units: stress units are made up of syllables that are themselves made up of morae. Second, this hypothesis relies on data showing that adults segment speech according to the rhythmic unit of their native language, and that these segmentation procedures are so deeply embedded in adults' native language-specific abilities that they are used even when listening to stimuli in a foreign language (Mehler et al. 1981; Cutler et al. 1986; Cutler and Norris 1988; Otake et al. 1993; McQueen et al. 1994). These procedures are thus likely to have been acquired at an early age. Third, it is supported by the fact that newborns and young infants show sensitivity to rhythm, as attested by language discrimination abilities (Mehler et al. 1988; Moon et al. 1993; Mehler et al. 1996; Nazzi et al. 1998a; Nazzi et al. 2000a), and this early sensitivity to linguistic rhythm at the utterance level might allow the rapid acquisition of the underlying rhythmic unit of the native language. It was thus proposed that rhythmic segmentation procedures (maybe in conjunction with distributional information) would allow infants to start segmenting their first sound patterns (some being erroneous, as shown by Jusczyk et al. 1999b) and then progressively start specifying other language-specific word boundary cues (allophonic, phonotactic, …). This early rhythmic segmentation hypothesis thus offers a solution to the problem of the emergence of different segmentation abilities for different linguistic backgrounds. Crucially, it predicts different developmental trajectories of segmentation abilities for languages of different rhythmic types. In the following, we review evidence in favor of this hypothesis. First, we will present data showing the acquisition of the rhythmic unit of the native language. Then, we will focus on cross-linguistic studies,

¹ The mora is a rhythmic unit that can be either syllabic or subsyllabic. In English, a mora roughly corresponds to a CV syllable with a short vowel (e.g. "the" as opposed to "thee," which has a long vowel). In Japanese, CV syllables with long vowels and syllables with final nasals (like the first syllable in "Honda") or final geminate consonants (like the first syllable in "Nissan") have two morae.

Processing Continuous Speech in Infancy    143

which attest infants' use of different rhythmic units for segmentation in stress-based and syllable-based languages.

8.3.2.1 Rhythmic Unit Acquisition

At the word level, prosody is related to the way stress and intonation are affected by position within the word. For example, a majority of words are stressed in initial position in English (Cutler and Carter 1987; Kelly and Bock 1988; Cassidy and Kelly 1991) and in German (Höhle and Weissenborn 2003; Höhle et al. 2009), while in French, syllable lengthening is observed at the end of words in phrase-final position (Delattre 1966; Fletcher 1991). This is compatible with the fact that the rhythmic unit of English and German is the trochaic unit (corresponding to word-initial stress in bisyllabic words such as PORter), while the unit of French is the syllable. Existing studies show that infants learning a stress-based language become sensitive to the rhythmic unit of their native language at a young age. For instance, a preference for trochaic words over iambic words (which have word-final stress, such as rePORT) emerges between the ages of 6 and 9 months in English-learning infants (Jusczyk et al. 1993a; Turk et al. 1995). New evidence from German- and French-learning infants suggests that this preference might actually emerge earlier (between 4 and 6 months) and result from exposure to the native language rather than from the emergence of a language-general trochaic bias (Friederici et al. 2007b; Höhle et al. 2009). The Höhle et al. (2009) study shows that the trochaic unit is acquired before the onset of segmentation abilities in German-learning infants, a crucial prediction of the rhythmic segmentation hypothesis. In the following, we present further studies evaluating the second prediction of this proposal, namely that different rhythmic segmentation procedures are put in place according to the infants' linguistic background, that is, the rhythmic class of their native language. Such an evaluation inherently requires a cross-linguistic approach.
We start with data on stress-​based languages, and then present data on syllable-​based languages.

8.3.2.2 Segmentation of Trochaic Units in Stress-based Languages

Behavioral studies

The research showing English-learning infants' early use of rhythmic information for segmentation can be illustrated by Jusczyk et al.'s (1999b) study. Their results show that 7.5-month-olds segment trochaic (strong–weak) words such as DOCtor, whether words or passages are presented first. However, 7.5-month-olds, though not 10.5-month-olds, missegment iambic (weak–strong) words such as guiTAR, placing a word boundary between the initial/weak and final/strong syllables (e.g. gui/TAR). Many other studies have confirmed that English-learning infants use prosodic information by 8 months of age to segment fluent speech into trochaic units (Morgan and Saffran 1995; Echols et al. 1997; Johnson and Jusczyk 2001; Houston et al. 2004; Curtin et al. 2005; Nazzi et al. 2005). An advantage for trochaic words was found even for English verbs, although most English bisyllabic verbs have an iambic stress pattern (Nazzi et al. 2005).

144    GOYET, MILLOTTE, CHRISTOPHE, AND NAZZI

This last finding suggests that although some acoustic and phonological properties distinguish nouns and verbs (Kelly 1992), the trochaic bias is applied to all lexical categories in this language. With respect to other stress-based languages, segmentation of monosyllabic words by German-learning infants emerges between 6 and 8 months (Höhle and Weissenborn 2003). More recent research further suggests the existence of a trochaic bias in German, similar to the one previously found for English: German-learning infants have been found to segment trochaic words by 9 months, while they still fail to segment iambic words by 11 months (Höhle and Weissenborn 2003). For Dutch, infants have been found to start segmenting trochaic words between 7.5 and 9 months of age (Kuijpers et al. 1998; Houston et al. 2000). However, no behavioral study has investigated the segmentation of iambic words in this language, for which the rhythmic hypothesis predicts a later segmentation onset age. This prediction of the early rhythmic segmentation hypothesis was nevertheless recently evaluated using an electrophysiological method (event-related potentials, or ERPs), a method that also provides information regarding the on-line time course of cognitive processes.

Electrophysiological (ERP) studies

One series of experiments used ERPs to study word segmentation abilities in infants learning a stress-based language, Dutch (Kooijman et al. 2005, 2009; Kooijman 2007). In those experiments, infants heard sequences of ten repetitions of an isolated word, followed by eight sentences containing either that target word or a different word (the control word). For trochaic words, differences in ERP responses to target versus control words time-locked to word onset were already observed at 7 months (whereas segmentation is not found before 9 months with HPP; Kuijpers et al. 1998). However, the results were much more robust at 10 months. At that age, ERPs to target words showed a greater negative deflection in the 350–500 ms window after word onset than did ERPs to control words, in particular on frontal electrodes and electrodes over the left hemisphere. This ERP pattern has been identified as a marker of word form recognition in infancy (e.g. Mills et al. 1993; Thierry et al. 2003; Friedrich and Friederici 2008). The Kooijman et al. (2005) results thus attest to rapid recognition, following segmentation, of words previously presented in familiarization, at least for trochaic words. However, different results were found for iambic words, raising the question of whether these Dutch-learning 10-month-olds were segmenting onsets of words or onsets of trochaic units (Kooijman et al. 2009). Indeed, while a familiarity effect with the same polarity as the one found for trochaic words was found, that effect appeared to be time-locked not to the onset of the initial weak syllable, but to the onset of the final strong syllable (a significant familiarity effect in the 370–500 ms window after the onset of the strong syllable, whereas the effect time-locked to the onset of the initial weak syllable was only marginally significant, in the 680–780 ms window).
These results show that Dutch-learning 10-month-olds segment trochaic and iambic words differently, and provide indirect support for the rhythmic-based proposal of trochaic segmentation in stress-based languages.


8.3.2.3 Segmentation of Syllables in Syllable-based Languages (French)

Behavioral studies

Taken together, the results on stress-based languages reported in section 8.3.2.2 are compatible with the proposal that the trochaic unit is used by infants learning a stress-based language at a young age. In this section, we present studies exploring the role of the syllabic unit in early segmentation by infants learning a syllable-based language: French. Two studies investigated the emergence of segmentation abilities in either Parisian (Gout 2001) or Canadian French (Polka and Sundara 2012) infants. Gout (2001) found that Parisian infants segment monosyllabic words at 7.5 months, but found no evidence that they segment bisyllabic words between 7.5 and 11 months of age, providing initial support for syllabic segmentation in syllable-based languages. In the studies with infants learning Canadian French, segmentation of bisyllabic words was evaluated at 8 months in two separate experiments, by presenting stimuli in either Canadian or Parisian French (Polka and Sundara 2012). This study suggested that Canadian French infants segment whole bisyllabic words as early as 8 months of age, indirectly challenging the rhythmic-based hypothesis. In this context, Nazzi et al. (2006) conducted a study to directly evaluate the hypothesis that French-learning infants use the rhythmic unit of French (the syllable). In particular, they evaluated the prediction that bisyllabic words are first segmented as independent syllables and only later as whole units, by investigating how bisyllabic words, inserted in fluent passages, are segmented at different ages: 8, 12, and 16 months. Infants were familiarized either with two bisyllabic words (e.g. putois and toucan), or with their final syllables (e.g. tois and can), or with their initial syllables (e.g. pu and tou). At 8 months, no segmentation effects were obtained.
At 12 months, no segmentation effect was found following whole-word familiarization, while a segmentation effect was found following final-syllable familiarization and, under certain conditions, initial-syllable familiarization. Moreover, at 16 months, a segmentation effect emerged after whole-word familiarization but could no longer be found after final-syllable familiarization. Taken together, these results show that at 12 months, French-learning infants independently segment the two syllables of a bisyllabic word, in spite of the fact that these syllables always appear consecutively in the signal. Yet, by 16 months, this distributional information (and probably other segmentation cues) seems to be taken into account. These results support the rhythmic hypothesis, but appear at first sight to contradict those of Polka and Sundara (2012), given that the data on Canadian French infants established segmentation of bisyllabic words as early as 8 months. A parallel investigation (Nazzi et al. 2014), in which Parisian infants were tested on Polka and Sundara's (2012) Canadian and Parisian French stimuli, confirmed the failure to find whole-word segmentation effects at 8 months. However, when 8-month-olds were tested in the reversed passage–word order (Nazzi et al. 2014), whole-word segmentation was found for the Parisian stimuli containing the bisyllabic words. Success in this condition might result from the

fact that hearing the passages during familiarization for 30 seconds gave infants time, after performing an initial syllable-based segmentation, to compute some distributional analysis of syllable order. Thus, syllabic segmentation would be computed at 8 months of age in French, but consecutive syllables could also be processed as a bisyllabic unit given enough distributional information. This interpretation was assessed in a new behavioral study with 8-month-old Parisian infants (Goyet et al. 2013). Infants were familiarized with two passages, and tested with two lists of isolated target syllables and two lists of isolated control syllables. In one condition, each passage contained four repetitions of two different bisyllabic words that shared the same target syllable (e.g. "di"), in either initial ("diva" 'diva') or final position ("radis" 'radish'); in the second condition, eight different words sharing the same syllable were used ("dîner" 'dinner,' "dizain" 'ten-line stanza,' "divan" 'couch,' "ditto" 'ditto,' "caddie" 'caddie,' "bandit" 'outlaw,' "taudis" 'slum,' "radis" 'radish'). Infants were then tested both with the target syllable ("di") and with a control syllable (e.g. "po"). The goal of this study was to manipulate TP information by altering the number of syllables that preceded or followed the target syllables. Segmentation was found only when eight different words were used, that is, in the condition with the lowest TPs. These results first unambiguously establish syllabic segmentation abilities in French-learning 8-month-olds (syllabic segmentation was later extended to French-learning 6-month-olds; Nishibayashi et al. in press). Second, together with the results of Nazzi et al. (2014), they show that these syllabic units can already be recombined into bisyllabic units at that age when other cues (such as TPs) are present.
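The TP logic behind this manipulation can be made concrete with a small computation. The sketch below is our illustration, not the authors' procedure: the syllable streams and counts are invented, but it shows why a syllable with eight different successors yields lower forward transitional probabilities, TP(x → y) = count(xy) / count(x), than one with only two recurring frames.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TPs over a syllable stream: TP(x -> y) = count(xy) / count(x)."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])  # count x only when a successor exists
    return {(x, y): n / firsts[x] for (x, y), n in bigrams.items()}

# Two-word condition (schematic): "di" recurs in only two frames,
# so its successors are relatively predictable.
two_words = ["di", "va", "ra", "di"] * 4
tp_two = transitional_probabilities(two_words)

# Eight-word condition (schematic): "di" has eight different successors,
# so every TP out of "di" is low (1/8).
eight_words = []
for successor in ["ner", "zain", "van", "to", "ba", "ka", "ta", "ru"]:
    eight_words += ["di", successor]
tp_eight = transitional_probabilities(eight_words)
```

On these toy streams, every TP out of "di" in the eight-word condition is lower than the dominant TP out of "di" in the two-word condition, mirroring the low-TP condition in which segmentation was observed.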
Taken together, these new studies establish that the segmentation advantage of Canadian French infants over Parisian infants is less important than initially suggested by Polka and Sundara’s (2012) and Nazzi et al.’s (2014) results. Because the HPP technique seems to be highly sensitive to small methodological changes, such as the order of presentation of the stimuli (word-​passage versus passage-​word) or the duration of the familiarization phase, additional studies were conducted using an electrophysiological method: ERP.

Electrophysiological (ERP) studies

Two ERP studies (Goyet et al. 2010) were conducted in French in order to further evaluate the early rhythmic segmentation hypothesis for this syllable-based language. These studies also re-evaluated the issue of whole-word segmentation by testing French-learning 12-month-olds (the age at which syllabic segmentation was found; Nazzi et al. 2006) on bisyllabic word segmentation. Accordingly, infants were familiarized either with whole bisyllabic words or with the final syllables of these words, and then presented with sentences containing these target words or control words. The ERP data showed evidence of whole-word segmentation (see Figure 8.1, left panel), while also confirming that French-learning infants rely on syllables to segment fluent speech (see Figure 8.1, right panel). Thus, they establish that French-learning 12-month-olds are able to use both syllabic- and TP-based segmentation (complementing the behavioral data at 8 months of Goyet et al. 2013). They further establish that segmentation and recognition

of words/syllables happen within 500 milliseconds of their onset, as found by Kooijman et al. (2005) for Dutch. Taken together, the results on English, German, Dutch, and French establish that infants use the rhythmic unit specific to their native language to segment fluent speech from the onset of segmentation abilities. Future studies should extend this work to more languages of different rhythmic types, including more syllable-based languages (at present there are only data on French), as well as some mora-based languages. In the two following sections, we turn to other bottom-up cues (allophonic, phonotactic, and coarticulation cues) and to top-down cues (isolated words), which were also found to play a role in word segmentation.

8.3.3 Phoneme-level Bottom-up Cues

While TP- and rhythmic-based segmentation procedures have both been proposed as possible solutions to the bootstrapping problem for segmentation abilities, many studies show that from 8 months onwards, infants use many other bottom-up cues to word segmentation. Allophonic variation constitutes one kind of cue to word boundaries: the acoustic realization of some phonemes depends on whether they occur at the edge of or inside a word. For example, in English, the realizations of the phonemes /t/ and /r/ differ between the word nitrate and the sequence night rate, and infants as young as 2 months of age are sensitive to such variations (Hohne and Jusczyk 1994). By 10.5 months of age, infants start to use allophonic information alone to segment word forms, while younger infants can only use such information if it is also supported by distributional information (Jusczyk et al. 1999a). Phonotactic constraints, which refer to the phoneme sequences that are allowed at the lexical level, provide another cue to word boundaries. For example, the sequence /zt/ in English, or the sequences /kf/ or /vg/ in French, cannot be found within words. Their presence in the speech stream thus signals a word boundary. Infants become sensitive to the phonotactic properties of their native language between the ages of 6 and 9 months, showing a preference for legal (or frequent) sequences of phonemes over illegal ones (Jusczyk et al. 1993b; Jusczyk et al. 1994; Mattys et al. 1999; see also Friederici and Wessels 1993, for Dutch; Sebastián-Gallés and Bosch 2002, for Catalan; Nazzi et al. 2009, and Gonzalez Gomez and Nazzi 2012, for French). By 9 months of age, infants start to use phonotactic information for segmentation (Mattys and Jusczyk 2001a, for English; Gonzalez Gomez and Nazzi 2013, for French). Coarticulation refers to the fact that the production of a phoneme is influenced by neighboring phonemes.
Coarticulation information was found to play a role in segmentation by the age of 8 months in English-learning infants (Johnson and Jusczyk 2001). It might also be that the observation that French-learning infants show a syllabic effect for monosyllables at 7.5 months (Gout 2001) but no syllabic effect for the individual syllables of bisyllabic words at 8 months (Nazzi et al. 2006) in the word-passage

[Figure 8.1 appears here: grand-average ERP panels for whole-word and final-syllable familiarization, time-locked to word/initial-syllable onset and final-syllable onset, showing target versus control items by anterior and posterior quadrants; see the figure caption below.]

condition is the result of increased coarticulation between the two syllables of the bisyllabic words.
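The phonotactic cue described in section 8.3.3 amounts to a simple boundary-detection rule, sketched below. This is a toy illustration of ours, not a model from the literature; the cluster inventories contain only the two examples given in the text (/zt/ for English, /kf/ and /vg/ for French) and are not exhaustive.

```python
# Toy phonotactic boundary detector: a cluster that is illegal word-internally
# forces a word boundary between its two phonemes. Cluster inventories are
# illustrative only.
ILLEGAL_WITHIN_WORDS = {"English": {"zt"}, "French": {"kf", "vg"}}

def phonotactic_boundaries(phonemes, language):
    """Return the positions at which an illegal cluster forces a word boundary."""
    illegal = ILLEGAL_WITHIN_WORDS[language]
    return [i + 1 for i in range(len(phonemes) - 1)
            if phonemes[i] + phonemes[i + 1] in illegal]
```

For instance, in a phoneme stream containing the sequence /z/ /t/, the detector places a boundary between the two phonemes, since /zt/ cannot occur within an English word.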

8.3.4 Top-down Cues: Known Nouns and Function Words

So far, the discussion has focused on infants' use of bottom-up segmentation procedures relying on the presence of acoustic/phonetic cues. However, as mentioned earlier, infants do hear a few (though not many) words in isolation, and a few function words are very frequent in infant-directed speech. It is conceivable that these word forms are stored and later used to perform another kind of segmentation: top-down segmentation. The Incdrop model (Brent and Cartwright 1996) was proposed from this top-down segmentation perspective. The model states that infants will memorize an incoming utterance as a whole unit (e.g. dopuneribo) unless it contains a sequence that has been previously memorized (e.g. ne). In that case, the memorized item is used to segment the new utterance, resulting in the memorization of the complementary units (e.g. dopu and ribo). Evidence for this model was obtained through computer simulation (Brent and Cartwright 1996), through analysis of the link between isolated words in maternal input and later word production (Brent and Siskind 2001), and through studies of adults' acquisition of artificial languages (Dahan and Brent 1999). More recently, two studies provided evidence that known content words can facilitate word-form segmentation by infants. First, English-learning infants were found to segment unfamiliar words by 6 months of age (as opposed to 7.5 months in Jusczyk and Aslin 1995) if these words were preceded by very familiar words, such as the infant's own name or the word mommy (Bortfeld et al. 2005). Second, while 8-month-olds were found to be unable to use TPs to segment fluent speech made up of words differing in number of syllables (Johnson and Tyler 2010; Mersad and Nazzi 2012), they could do so if one of the words of the stream was the known word maman ('mommy' in French; Mersad and Nazzi 2012).
While this type of segmentation based on known content words necessarily plays a limited role at the onset of lexical acquisition, when infants have memorized only a few words, its role is likely to grow as infants' vocabulary increases.
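The Incdrop principle described above can be sketched as a short procedure. This is our simplified illustration of the idea, assuming greedy left-to-right matching of known items; it is not Brent and Cartwright's (1996) actual algorithm, and the function and variable names are invented.

```python
def incdrop(utterance, lexicon):
    """Split an utterance at known items and memorize the leftover chunks.

    `lexicon` is a set of previously memorized word forms; it is updated
    in place with the newly memorized chunks.
    """
    chunks, buffer, i = [], "", 0
    while i < len(utterance):
        # Greedily match the longest known item starting at position i.
        match = max((w for w in lexicon if utterance.startswith(w, i)),
                    key=len, default=None)
        if match:
            if buffer:            # close off the chunk before the known item
                chunks.append(buffer)
                buffer = ""
            i += len(match)
        else:
            buffer += utterance[i]
            i += 1
    if buffer:                    # trailing chunk (the whole utterance if nothing matched)
        chunks.append(buffer)
    lexicon.update(chunks)
    return chunks
```

With `lexicon = {"ne"}`, calling `incdrop("dopuneribo", lexicon)` returns `["dopu", "ribo"]` and adds both chunks to the lexicon, matching the chapter's example; an utterance containing no known item is memorized as a whole unit.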

Figure 8.1  Segmentation results using ERPs, following familiarization with whole bisyllabic words (left panel) or their final syllables alone (right panel). The ERPs are time-locked to the onset of the word/initial syllable (top panels) and the final syllable (bottom panels) of the bisyllabic words in their sentence contexts. For each panel: (1) mean (and SE) ERP amplitude difference (target minus control words) for the 350–500 ms window, broken down by quadrant; (2) grand-average ERPs at electrode 58 [FC8] (right anterior quadrant) for target and control words; the grey area indicates the 350–500 ms window.

In addition, infants may also rely on their ability to identify function words to segment neighboring content words. Indeed, function words are extremely frequent and typically occur at the edges of syntactic units, and therefore also at the edges of prosodic units (Shi et al. 1998; Gervain et al. 2008; Hochmann et al. 2010). Infants may thus be able to identify the function words of their native language early on, and several studies have demonstrated knowledge of function words in infants starting at 8 months of age (Shi and Gauthier 2005; Shi et al. 2006a; Shi et al. 2006b). Once function words are identified, infants may store the syllables following them in the signal as possible words (Christophe et al. 1997). Two recent experimental studies provide evidence that young infants do precisely this (Hallé et al. 2008; Shi and Lepage 2008). Hallé et al. (2008) showed that 11-month-old French-learning infants recognize known words when they are presented in the context of an appropriate function word (i.e. an article, as in les chaussures 'the shoes'), but not when they are presented in the context of a nonsense function word (e.g. mã chaussures, where /mã/ is a nonsense word). Even more strikingly, Shi and Lepage (2008) showed that this function-word-based segmentation procedure works even for unfamiliar words: French-learning 8-month-olds familiarized with an "article + noun" string (e.g. des preuves 'some evidence,' where preuves does not belong to the infants' vocabulary) subsequently listened longer to the isolated noun (preuves 'evidence'), in contrast to when they were familiarized with a "nonsense item + noun" string (e.g. ké preuves, where /ké/ is a nonsense item). An additional bonus of this segmentation procedure is that infants may be able to gather information not only about the phonological form of the segmented word, but also about its syntactic category (Höhle et al. 2004; Kedar et al.
2006; Zangl and Fernald 2007; Christophe et al. 2008; Bernal et al. 2010; Cauvet et al. 2010; Shi and Mélançon 2010). Overall, since function words appear to be acquired before most content words (due to their extremely high frequency and specific phonological and distributional properties), they may provide young infants with a rather useful segmentation procedure for content words.

8.4  Interactions and Combinations between the Various Segmentation Cues

So far, we have discussed infants' use of the various segmentation cues in isolation. However, none of these cues provides a systematic marking of word boundaries, although their use in combination would allow infants to correctly segment the speech stream (Christiansen et al. 1998). To date, only a few studies have explored infants' combined use of different cues in a controlled way, mostly focusing on the combined use of distributional and rhythmic information.

When cues converge, evidence shows that from 8 months onward, infants perform better when presented with two convergent cues rather than a single cue (distributional and allophonic cues, Jusczyk et al. 1999a; distributional and rhythmic cues, Johnson and Jusczyk 2001, Curtin et al. 2005; distributional cues and known words, Mersad and Nazzi 2012). The first studies investigating the use of conflicting cues supported a precedence of rhythmic cues over distributional cues (Johnson and Jusczyk 2001; Johnson and Seidl 2009), as shown in an artificial language paradigm that pitted prosody and syllabic distributional information against one another. This was more indirectly supported by data obtained by Jusczyk et al. (1999b) using natural language stimuli, which suggested that English-learning 7.5-month-olds could only use distributional information if it involved consecutive syllables within a trochaic unit, but not across two adjacent trochaic units (as is the case for iambic words). The conclusion regarding the relative importance of rhythmic and distributional cues was later challenged by Thiessen and Saffran (2003). While these authors did replicate Johnson and Jusczyk's (2001) finding with 9-month-olds, they found the opposite pattern at 7 months, which led them to the conclusion that distributional information is used earlier than rhythm. However, as discussed earlier, even though Thiessen and Saffran's (2003) results show that infants can track syllabic distributional information and use it after a few minutes of exposure to a very simplified language, it is unclear whether infants would benefit from syllabic order information at such a young age in the context of a natural language made up of thousands of words of varied syllabic length.
But again, recent studies on this issue have brought contradictory evidence, with some data suggesting that this may actually not be the case (Johnson and Tyler 2010; Mersad and Nazzi 2012), and others suggesting it may be possible (Pelucchi et al. 2009a, 2009b). Note that while more infant studies will have to be conducted on these issues, there is also a growing body of studies on adult speech segmentation aiming to establish how these cues are used in combination in adulthood (Mattys et al. 2005; Cunillera et al. 2006, 2008; Cunillera et al. 2010; Mersad and Nazzi 2011). Future studies will have to continue investigating this issue, in both infants and adults, using as a theoretical framework the hierarchical model of word segmentation proposed by Mattys et al. (2005) to account for adults' segmentation abilities, although this model does not yet discuss the use of distributional information, nor that of phrasal prosody. This hierarchical model postulates three tiers of segmentation cues. Tier 1 refers to sublexical suprasegmental cues such as the stress pattern. Tier 2 refers to sublexical segmental cues such as phonotactics, allophony, and coarticulation. Tier 3 refers to lexical cues (which broadly include semantic, syntactic, and pragmatic information). In optimal listening conditions, adults were found to rely mostly on the lexical level. However, they were found to rely on segmental information when the speech signal was degraded, and on stress pattern information when the speech signal was severely degraded. The data on early word form segmentation presented earlier support the notion that word segmentation information from Tier 1 (rhythmic information) becomes available before information from Tier 2 (allophonic and phonotactic information). Information from Tier 3 (lexical

level), which appears to be usable as early as 6 months of age for a very limited set of highly familiar words, would become more crucial for segmentation as the size of infants' vocabulary increases. Phrasal prosody is at the top of the hierarchy for prosodic constituents that are at least the size of phonological phrases, since phonological phrase boundaries were found to outrank lexical information in adults (Salverda et al. 2003; Christophe et al. 2004). As for transitional probabilities, Mersad and Nazzi (2011) have recently proposed that they are situated at the bottom of the hierarchical model of Mattys et al. (2005), which is congruent with the findings that the weight given to prosody compared to TPs increases from 7 to 9/11 months of age (Johnson and Jusczyk 2001; Thiessen and Saffran 2003; Johnson and Seidl 2009). Indeed, transitional probabilities may have a crucial role in early infancy when no other cues are available, but they should be outranked in adulthood when pitted against other cues, a prediction supported by recent data (Shukla et al. 2007; Finn and Kam 2008).
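The precedence pattern of the tier hierarchy can be summarized in a small sketch. This is our own schematic rendering of the behavior reported for adults (lexical cues dominate in clear speech, segmental cues under degradation, stress cues under severe degradation); the tier labels follow the text, but the function, thresholds, and quality scale are invented for illustration.

```python
# Schematic cue hierarchy (after Mattys et al. 2005, as summarized in the
# text). The numeric thresholds on signal_quality (0 = unintelligible,
# 1 = clear speech) are invented.
TIERS = {
    3: "lexical cues (semantic, syntactic, pragmatic)",
    2: "sublexical segmental cues (phonotactics, allophony, coarticulation)",
    1: "sublexical suprasegmental cues (stress pattern)",
}

def dominant_tier(signal_quality):
    """Which tier adults were found to rely on, as a function of signal quality."""
    if signal_quality >= 0.8:   # optimal listening conditions
        return 3
    if signal_quality >= 0.4:   # degraded signal
        return 2
    return 1                    # severely degraded signal
```

The developmental claim in the text runs in the opposite direction: Tier 1 (rhythmic) information becomes available to infants before Tier 2, with Tier 3 growing in importance as the vocabulary expands.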

8.5 Conclusions

The studies reviewed in the present chapter demonstrate that infants start to cut utterances into prosodically well-formed units, like clauses and phrases, or smaller units like words, from about 8 months of age, by relying on the various cues instantiated in their native language. Some of the segmentation procedures we reviewed are expected to work universally, such as the use of transitional probabilities (both English- and French-learning infants use transitional probabilities to segment words at 8 months) or the ability to exploit phonological phrase boundaries to constrain lexical access. Other segmentation procedures are instantiated differently in different languages: for instance, English-learning infants and French-learning infants rely on different rhythmic units to segment continuous speech (the trochaic unit and the syllabic unit, respectively). These facts raise questions, to be investigated further, about the mechanisms underlying segmentation abilities, the cues they rely on, and the relative weight of those cues across development and in adulthood. Crucially, further cross-linguistic exploration of the emergence of segmentation abilities is needed. Thus, while the studies on English have revealed that infants can use different cues (transitional probabilities, rhythmic units, allophonic and phonotactic cues, phrasal prosody, known words, …) for segmentation, and that the relative weight given to these cues changes with development, there is very little work on these issues in other languages. Given the likelihood that the developmental trajectory of these cues' use will vary across languages, it is important to study their use by infants learning different languages. As revealed by the comparison of the data obtained with HPP and ERPs in both Dutch- and French-learning infants, such work will probably benefit from the use of different experimental techniques.
These future studies will not only establish the different developmental trajectories of segmentation abilities in different languages but, even more importantly, will contribute to our understanding of the mechanisms of early and

adult word segmentation, and help us refine models of segmentation abilities such as the one proposed by Mattys et al. (2005). To conclude, the ability of young infants to segment continuous speech into word-sized units is a crucial aspect of language learning. The body of work that started in the 1990s and that we reviewed here shows that infants under one year of age already use a variety of complementary segmentation procedures, relying on very diverse cues, and go on to refine these procedures during their second year of life. They are thus well equipped to work on the development of their lexicon, which undergoes a sizable boost in the second year of life. Whereas the vision of language learning used to be extremely "linear," with the idea that infants would start by working on individual phonemes, then segment words and learn their meanings, and only then start working out the syntax of their native language, the scientific community now recognizes that these different components of language are, to some extent, acquired simultaneously, with the progress made in one domain feeding back into other domains. For instance, recent proposals suggest that phonological acquisition would benefit greatly from a 'proto-lexicon,' that is, a list of phonological forms for plausible word candidates from the language (Swingley 2005b; Martin et al. 2010). In particular, Martin et al. (2010) showed that such a proto-lexicon greatly improved the performance of an algorithm aimed at discovering pairs of allophones. Future work should aim at establishing how well infants can segment the speech stream by relying on all of these segmentation procedures simultaneously.
Indeed, while none of the various segmentation procedures presented here provides a full solution to the word segmentation problem, taken together they may well be sufficient to promote both lexical acquisition and the acquisition of other parts of language (such as phonology and syntax). This work may have to rely on more realistic models of infant segmentation that start from actual acoustic input, rather than pre-processed input such as transcriptions, as is the case for current work on the usefulness of transitional probabilities to segment corpora of child-directed speech.

Acknowledgments The writing of this chapter and the research on French undertaken by the authors were partly supported by a grant from the European Science Foundation (EUROCORES programme, “The Origin of Man, Language and Languages”), a grant from Fonds National de la Science # ACI-​04-​3-​22, and two grants from the Agence Nationale de la Recherche, # ANR-​07-​BLAN-​0014-​01 to Thierry Nazzi and # ANR-​2010-​BLAN-​1901 to Anne Christophe.

Part III

THE ACQUISITION OF SYNTAX

Chapter 9

Argument Structure

Joshua Viau and Ann Bunger

9.1 Introduction

The study of the acquisition of argument structure has long figured prominently in debates about learning and abstractness. In what follows, our goals are to highlight the major positions taken and to provide a much-needed synthesis of the evidence supporting them. Section 9.1 introduces important terms and distinctions in the study of argument structure, addresses the balance of power between verbs and constructions with respect to the encoding of thematic relations, and briefly discusses various approaches to linking thematic relations with syntactic positions. In section 9.2, competing theoretical approaches to argument structure are compared with an eye toward issues of learnability. Section 9.3 forms the heart of the chapter, presenting a comprehensive review of developmental research pertaining to argument structure organized chronologically and, within age group, by methodology and (where possible) language. Finally, section 9.4 sums up the current state of our understanding of argument structure and offers suggestions for future research.

9.1.1 Terms and Preliminaries

At an intuitive level, we think of verbs as labeling events in the world and verbal arguments, in turn, as naming individuals/entities that stand in some principled relation to an event. From this perspective, any account of the development of argument structure presupposes knowledge of event structure and its organization on the part of the learner. Indeed, our linguistic and conceptual representations of events are undoubtedly linked, though opinions vary as to how intimate the connection is between them. Levin and Rappaport Hovav (2005) rightly point out that the linguist's attention in this domain should naturally be drawn to those semantic properties of events that are grammatically relevant, that is, potentially affecting subject and object selection, characterizing classes of verbs defined in part by distinctive morphosyntactic behavior, etc. They highlight three main approaches to the conceptualization of events that have been proposed in the literature concerning argument realization (2005: 78). On the first, the localist approach, all events are thought of as being construed at some level of abstraction in terms of spatial motion and location (Jackendoff 1972, 1976, 1983, 1987, 1990). A second, aspectual approach places more emphasis on temporal properties such as telicity, measure, and incremental change (Vendler 1957; Dowty 1979, 1991; Tenny 1994; Jackendoff 1996). The third approach to event conceptualization takes causal notions to be central (Talmy 1976, 1988; Croft 1990, 1991, 1994, 1998). Abstracting away from the details of how these approaches diverge, one finds general agreement among their proponents that certain broad classes of events (e.g. location-oriented/static versus motion-oriented/dynamic, simple versus complex) and event participants (roughly, agent/initiator versus patient/theme versus location/goal versus instrument) are especially salient and often distinguished linguistically. Here, for the sake of concreteness, it may be helpful to pause and consider a scene involving a boy rolling a ball to his mother at a birthday party held outdoors. There are many sentences one might use to describe this dynamic event, including the following: (1)

a. The ball rolled (cf. #The boy rolled)¹          intransitive (unaccusative)
b. The boy rolled the ball                         transitive
c. The boy rolled the ball to his mother           ditransitive

Among the various entities that participate in this scene—including (but not limited to) the boy, the ball, the grass, the mother, the other party guests watching, and the various colorful decorations placed here and there—all bear some relation to the rolling event. However, only a few of these participants—agent, theme, goal/recipient—could ever qualify as arguments² of the manner-of-motion verb roll. To know which event participants may be encoded as arguments, and in which syntactic contexts, is to know the argument structure of roll, that is, the requirements imposed by the verb roll (or its non-English equivalent) on those participants of a rolling event that the verb selects for the purpose of linguistic encoding in a given construction. More specifically, to know the argument structure of roll is to know that in an intransitive construction the verb takes the thing that gets rolled (the Theme) as its subject; in a transitive construction it takes the causer of the rolling action (the Agent) as its subject and the Theme as its object; and

¹ The # symbol is used to mark sentences that, while grammatical, are incongruent with the event under consideration. In contrast, the * symbol indicates ungrammaticality.
² Throughout this chapter we rely on the definition of argument commonly used in mathematical logic, namely that of a variable relevant to a function. Since functions are inherently relational, it is only natural that we should largely concern ourselves with the argument structure of the most ubiquitous of relational concepts in natural language, verbs, as is standard in the literature. Still, it is important to remember that other lexical categories such as nouns and prepositions are also argument-taking, strictly speaking.

in a ditransitive construction, it maps the person or location that serves as the Recipient or Goal of the rolling action to a second post-verbal argument position (realized in (1c) as a prepositional phrase). Likewise, to know the argument structure of the instantaneously caused ballistic motion verb kick is to recognize that this verb is incompatible with the one-participant syntactic frame in (1a) unless its argument is the entity doing the kicking (the Agent), for example, The boy kicked / *The ball kicked. Furthermore, some of the dimensions along which aspects of events like the one just described might conceivably vary, such as color (Grimshaw 1993) and volume of speech (Pesetsky 1995), consistently fail to matter for argument structure generalizations. Despite these and other examples of opacity in the relationship between event structure and argument structure, it is fairly common to find appeals in the literature to principles requiring features of each event (or each subevent in a complex event) to have some sort of syntactic realization (Pustejovsky 1991; Grimshaw and Vikner 1993; van Hout 1996; Kaufmann and Wunderlich 1998; Rappaport Hovav and Levin 1998, 2001). One consequence of such a requirement is that complex events must be expressed by dyadic (i.e. two-participant) predicates at a minimum, frequently by transitive verbs. Speaking of transitivity, right off the bat we should take the opportunity to provide a few more examples of the various syntactic constructions to which we will often refer as we summarize what children know about argument structure in section 9.3. The intransitive, one-participant construction shown in (1a) can be subdivided into two classes, unaccusative and unergative. Unaccusative verbs typically encode the result of a complex causative event and take the Patient of that event as their single argument (e.g.
The ball rolled); unergative verbs encode an activity that an Agent is engaged in, and take that Agent as their single argument (e.g. The boy kicked).³ The transitive, two-participant construction in (1b) is also compatible with verbs labeling several different event types; in particular, it is labeled causative when used to describe events in which an agent performs some action that gives rise to a change of state in another entity, the patient (e.g. The girl broke the glass). Verbs encoding caused motion/possession are most often associated with the ditransitive, three-participant construction in (1c), which figures prominently in discussions of English object alternations, in particular those referred to as locatives (e.g. The farmer loaded {the truck with hay / hay onto the truck}) and datives (e.g. Kai gave {Zoe the guitar / the guitar to Zoe}). More generally, the umbrella term applicative refers to a class of valency-augmenting constructions, encompassing transitives and ditransitives, studied most notably in the Bantu family but also in German, English, and other languages (Pylkkänen 2008; Bosse 2011). Constructions involving more than three participants are rare enough to merit no further attention in this chapter but are not unattested (e.g. Sara traded her sandwich to Aidan for a slice of pizza). Finally, more detail is necessary concerning the range of roles, or thematic relations, that arguments are observed to play with respect to the events that verbs encode. Traditionally, these were thought of as composing a set of semantically unanalyzable

³ See Levin and Rappaport Hovav (1995) for detailed discussion of the syntax and semantics of the unaccusative/unergative distinction, first made by Perlmutter (1978).

roles defined independently of the verbs with which they were associated (Gruber 1965a; Fillmore 1968; Jackendoff 1972; Carter 1976). Some of the most prominent roles have already been mentioned—Agent (the instigator/causer of an event), Patient/Theme (the entity affected by an event), Location (either source or goal in a motion event)—and to those we could add Recipient, Beneficiary, Experiencer, Instrument, and probably others. One issue that has arisen with respect to the list approach to thematic roles involves defining the proper grain size for argument-related generalizations. Attempts at addressing this issue have often appealed to either generalized roles labeling clusters of entailments imposed by predicates on their arguments (e.g. the Proto-Agent and Proto-Patient concepts in Dowty 1991) or some form of predicate decomposition into primitives instantiating basic meaning components in representations of verbal lexical semantics (Jackendoff 1976, 1983, 1990; Wunderlich 1997, 2000; Rappaport Hovav and Levin 1998) or syntax (Hale and Keyser 1993, 2002; Marantz 1997; Harley and Noyer 2000). We should also perhaps be wary of getting ahead of ourselves here given that some have argued thematic roles must be built up gradually from verb-specific notions such as, for example, Roller and Rollee, into verb-general categories like Agent and Patient (e.g. Tomasello 1992).

9.1.2 Are Thematic Relations Verb-specific or Verb-general?

Decades of previous research have shown that, rather than varying arbitrarily in how they describe events, verbs with common semantic characteristics often have common syntactic characteristics. For example, hit-type verbs (e.g. hit, kick, pound) appear in the conative construction and the body-part possessor ascension construction but not in the middle construction, while break-type verbs (e.g. break, crack, rip, shatter) show the exact opposite pattern (Levin 1993: 6–7).

(2) a. Carla {hit/*broke} at the vase             conative
       cf. Carla {hit/broke} the vase
    b. Carla {hit/*broke} Bill on the back        body-part possessor ascension
       cf. Carla {hit/broke} Bill's back
    c. The vase {*hit/broke} easily               middle
       cf. Carla {hit/broke} the vase easily

Virtually all modern theories of lexical semantics have tried to account for this systematic aspect of verbal behavior in some way, with results varying widely in terms of representational detail. Paraphrasing Pylkkänen (2008), the fundamental question that anyone in this research area must grapple with concerns the nature of lexical complexity. One influential position has been that argument relations project from the verb. This position, which we will refer to as lexicalist because it places the burden of explanation on the lexicon (e.g. Chomsky 1981a; Bresnan 1982b; Dowty 1989; Jackendoff 1990; Steedman 1997; Rappaport Hovav and Levin 1998; Joshi 2004), requires—in addition to a precise theory of lexical representation—a theory of how the predicates and arguments in lexical semantic representations determine syntactic behavior and map onto a limited number of syntactic positions (the subject of section 9.3). Another equally important position, which we will call constructionalist following Folli and Harley (2002), has favored a significant role for verbal context (i.e. constructions) in argument realization (e.g. Marantz 1984, 1997; Hale and Keyser 1993; Goldberg 1995; Harley 1995; Michaelis and Ruppenhofer 2001; Borer 2003; Pietroski 2005; Williams 2005). In a constructionalist account, the mapping (or linking) problem is often minimized as the syntax is further articulated and the number of positions available to link thematic roles with increases. Consequently, the explanatory burden falls on the grammar, which accordingly requires more complex machinery than it might otherwise. In particular, the issue of how to integrate core verbal meaning (i.e. "root" meaning) with constructional meaning arises on this approach, whereas on a lexicalist approach a verb's compatibility with multiple constructions is typically seen as a function of its own polysemy (and not that of the construction). The strongest version of the constructionalist approach holds that argument structure is purely syntactic, that is, verbs have no arguments as part of their lexical representations (Pietroski 2005; Williams 2005). An example of how proponents of the lexicalist and constructionalist positions would represent the denotation of the verb give with respect to its arguments should clarify the general picture. From a lexicalist perspective, give projects an Agent, Goal, and Theme (3a), all of which are associated with their syntactic positions via the application of a linking rule or lexical event structure (not shown).
In contrast, from a strong constructionalist perspective, give does not project any arguments (3b); rather, basic thematic relations are introduced by the syntactic frame in which give occurs. There is also a weaker, intermediate constructionalist position (associated in particular with Marantz 1997) according to which the Goal and Theme arguments are part of give's lexical representation but the Agent argument is supplied by the syntax (3c).⁴

(3) a. Lexicalist lexical entry for give
       [[give]]: λe.λx.λy.λz. give(e) ∧ goal(e,x) ∧ theme(e,y) ∧ agent(e,z)
    b. Strong constructionalist lexical entry for give
       [[give]]: λe. give(e)
    c. Weak constructionalist lexical entry for give
       [[give]]: λe.λx.λy. give(e) ∧ goal(e,x) ∧ theme(e,y)

Note that there is evidence for both verb-specific and construction-specific knowledge on the part of children throughout development, so we will need to return to this issue on several occasions.
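The contrast among (3a–c) comes down to how many thematic roles the verb itself contributes versus how many the construction must supply, which a toy sketch can make concrete. The dictionaries and the frame inventory below are our own illustrative encoding, invented for this example and not part of any of the cited proposals.

```python
# Toy encoding of the lexical entries in (3): each entry lists the thematic
# roles that give itself projects; whatever is missing must be contributed
# by the syntactic construction the verb appears in.
LEXICALIST = {"give": {"agent", "theme", "goal"}}           # (3a)
STRONG_CONSTRUCTIONALIST = {"give": set()}                  # (3b)
WEAK_CONSTRUCTIONALIST = {"give": {"theme", "goal"}}        # (3c)

# Roles the ditransitive frame is able to introduce (illustrative).
DITRANSITIVE_FRAME = frozenset({"agent", "theme", "goal"})

def roles_from_syntax(lexicon, verb, frame=DITRANSITIVE_FRAME):
    """Roles the construction must supply, given what the verb projects."""
    return set(frame) - lexicon[verb]

print(roles_from_syntax(LEXICALIST, "give"))                # set(): the verb does all the work
print(roles_from_syntax(WEAK_CONSTRUCTIONALIST, "give"))    # {'agent'}: only the Agent is syntactic
print(roles_from_syntax(STRONG_CONSTRUCTIONALIST, "give"))  # all three roles come from the frame
```

The set difference makes the theoretical gradient visible at a glance: as one moves from (3a) to (3b), the explanatory burden shifts from the lexicon to the grammar.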

⁴ Kratzer (1996) first argued that Agents are not arguments of the verb based in part on the observation, attributed to Marantz (1984), that there are no idioms involving just the verb and its Agent. The argument holds if we assume along with Kratzer that idiomatic meanings must be stated over lexical representations.


9.1.3 Mapping/linking

The mapping between thematic roles and syntactic positions is not one-to-one: a given thematic role category may appear in multiple syntactic positions (e.g. Coffee spilled on my laptop versus The barista spilled coffee on my laptop), and multiple thematic roles can be mapped to a given syntactic position (e.g. The chef {roasted chicken versus visited Kitchen Stadium versus gave us a free appetizer versus intimidated his apprentice}).⁵ However, certain syntax–semantics correspondence patterns display considerable regularity across languages (e.g. Grimshaw 1981; Croft 1990; Jackendoff 1990; Dowty 1991) as well as in the "home sign" systems invented by deaf isolates (Feldman et al. 1978; Goldin-Meadow and Mylander 1998), for example, verbs like give that describe possession transfer consistently occur with three noun phrases. As a result, linguists' collective desire to bring semantics and syntax into formal correspondence, thereby capturing cross-linguistic generalizations pertaining to argument structure and allowing one to make predictions in one direction or the other, has been strong (e.g. Randall 2010). As mentioned in section 9.1.1, researchers have explored various ways of either reducing the potential number of semantic distinctions to be made (through generalized thematic roles) or increasing the number of syntactic positions to link them to (through predicate decomposition) in order to simplify accounts of mapping. Another influential approach has focused on positing and aligning semantic and syntactic prominence hierarchies, with the former defined over thematic relations/semantic primitives (4a) and the latter defined over grammatical relations/syntactic positions (4b) (Baker 1988, 1997; Larson 1988b, 1990).

(4) a. agent > theme > goal
    b. subject > object > indirect object/oblique

Given two such hierarchies, an algorithm along the lines of the one described in (5), a relativized version of Baker's (1988) Uniformity of Theta Assignment Hypothesis (UTAH), may be called into service to relate them.

(5) If a verb α determines θ-roles θ1, θ2, . . . , θn, then the lowest role on the Thematic Hierarchy is assigned to the lowest argument in constituent structure, the next lowest role to the next lowest argument, and so on. (Larson 1988b: 388, ex. P2)
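The pairing procedure in (5) can be sketched in a few lines. This is only an illustrative rendering under simplifying assumptions: the two hierarchies are represented as the flat ranked lists in (4), and syntactic prominence is a list of position labels rather than configurational structure.

```python
# Hierarchies from (4), ordered from most to least prominent.
THEMATIC = ["agent", "theme", "goal"]                    # (4a)
SYNTACTIC = ["subject", "object", "indirect object"]     # (4b)

def link(roles):
    """Relativized UTAH per (5): with both lists ranked high-to-low and
    truncated to the same length, pairing them off matches the lowest
    role with the lowest position, the next lowest with the next, etc."""
    ranked = sorted(roles, key=THEMATIC.index)           # most to least prominent
    positions = SYNTACTIC[:len(ranked)]                  # only as many positions as roles
    return dict(zip(ranked, positions))

print(link({"agent", "theme", "goal"}))  # ditransitive give
print(link({"agent", "theme"}))          # transitive roll, as in (1b)
print(link({"theme"}))                   # unaccusative roll, as in (1a): Theme surfaces as subject
```

Note that the unaccusative case falls out for free: with only a Theme to assign, the highest remaining position is the subject, matching The ball rolled in (1a).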

Importantly, the grammatical relations invoked in (4b) can be recast in terms of morphological case-marking (Carrier-Duncan 1985), topicality (Givón 1984), or otherwise such that the notion of syntactic prominence does not necessarily presuppose configurationally represented syntactic structure. Within the monostratal Construction Grammar (CG) framework, apparent idiosyncrasies of linking—such as the different syntactic positions in which the Theme argument the ball appears in (1a,b)—are traditionally encoded in construction-specific ways, where constructions amount

⁵ See Levin and Rappaport Hovav (2005: ch. 5) for more detailed discussion.

to conventionalized pairings of form and meaning stored, like words, in the lexicon. Broader linking generalizations are captured in this framework by prominence hierarchies among constructions; for instance, a skeletal transitive construction specifying that Agents map to subject position and Patients to object position is argued to dominate and extend its mapping scheme to more specific instances (or senses) of this construction (e.g. causative, non-causative) (Goldberg 1995: ch. 4).

9.2 Theories of Argument Structure and Learnability

Summarizing the range of theoretical approaches to argument structure from a developmental perspective is a challenge, but our task is simplified somewhat by the fact that the two most active research traditions stake out fairly opposite positions along the nature–nurture continuum. Of course, this sort of terminology is misleading in the sense that no serious researchers argue for either extreme. Instead, bodies of work typically vary in terms of the degree to which linguistic input is argued to play a decisive role in the formation of categories/representations relevant to argument structure and the generalizations made over them. On "early abstraction" (or, alternatively, "nativist") approaches, the causal role of the input in structure building is downplayed, and relatively more emphasis is given to the learner's use of the input in testing an innately constrained set of hypotheses about the relationship between form and meaning (Chomsky 1959, 1975, 1980, 1981a; Fodor 1966; Baker 1979; Pinker 1984; Crain 1991; inter alia). Importantly, positing innate abstract knowledge does not solve the learning problem. Rather, it shapes the learning mechanism to be a selective one rather than a strictly inductive one (Fodor 1966; Pinker 1979; Lightfoot 1982; Viau and Lidz 2011). For example, even if learners come fully loaded with innate knowledge about the range of abstract structures that are possibly utilized in language, they must still use evidence from the surface form of language to identify which particular abstract structures underlie any given sentence in the language to which they are exposed (Fodor 1966; Pinker 1979; Tomasello 2000b). This problem is made even more severe when we recognize that the very same aspect of syntactic representation may manifest itself differently in the surface form of different languages (Rizzi 1982; Baker 1988; Dresher and Kaye 1990; Clark 1992; Sakas and Fodor 2001).
Thus, a selective learning mechanism cannot comprise a simple triggering mechanism in which the learner is endowed with knowledge of which cues must be sought out in order to identify a particular syntactic structure in the exposure language (Lightfoot 1993, 1999; Gibson and Wexler 1994; Fodor 1998; Tomasello 2000b). Of course, the nativist’s conclusion that the acquisition of syntactic forms relevant to argument structure is achieved in large measure by some sort of selective learning mechanism is valid only to the extent that the arguments for abstract representation are

themselves valid. Alternatives to the "early abstraction" view come in two related varieties, which we will lump together under the "emergentist" (or "usage-based") rubric.⁶ One approach recognizes that syntactic representations in adult grammars are abstract, but posits that this abstractness is the result of a learning mechanism that drives the learner from concrete representations of particular experiences to increasingly abstract generalizations over those experiences (Elman et al. 1996; Bybee 1998; Tomasello 2000b; inter alia). A second approach denies that syntactic representations are so abstract, moving the explanatory burden of seemingly abstract phenomena to other areas of linguistic knowledge (Culicover and Jackendoff 2005). A growing body of research reflects a convergence of these alternatives, attributing less abstractness to syntactic representations and deriving what abstractness there is from domain-general processes of induction and categorization (e.g. Tomasello 1992, 2003; Goldberg et al. 2004, 2005; Goldberg 2006). As a way of focusing on the empirical findings in section 9.3 that might allow us to confirm or falsify aspects of these theories, we find it useful to reflect on Baker's Paradox (Baker 1979), as summarized by Pinker (1989). Pinker describes the acquisition of argument structure as a logical problem; since language is an open-ended set, but input is finite, the child learner must generalize on the basis of whatever hypothesis concerning argument structure seems to make the most accurate predictions. With respect to ditransitives, children learning English hear many verbs like give and tell that alternate between prepositional dative (X gave Y to Z) and double-object dative (X gave Z Y) structures.
Children also hear verbs like say and whisper that only appear in one of those structures, the prepositional dative, and it is perhaps only natural for them to assume that these two verbs alternate just as give and tell do, leading to non-adultlike utterances like say/whisper Sophie the secret. Given that children are not taught the dative alternation explicitly, and further that they are not systematically corrected or miscomprehended when they speak ungrammatically (e.g. Brown and Hanlon 1970), how is it that they manage to avoid or unlearn such mistakes? Three aspects of this problem contribute to its paradoxical status: (a) the apparent lack of negative evidence, (b) children's relative productivity, that is, their willingness to go beyond the input, and (c) the arbitrariness of argument structure, as illustrated by whisper's failure to participate in the dative alternation. The only way out of the paradox is to question one or more of these three assumptions; conveniently, proponents of the two broad theoretical approaches sketched here make different choices. Early abstractionists typically take issue with (c), pointing out systematic regularities that exist in the mapping between verb meaning and verb syntax and positing that children readily apprehend and take advantage of them in acquiring verbal argument structure. Indeed, there is much evidence supporting this idea in its various incarnations, whether we take syntax or semantics as the starting point. Syntactic bootstrapping (e.g. Landau and Gleitman 1985; Gleitman 1990; Fisher et al. 1991; Fisher 1994; Snedeker and Gleitman 2004) involves using syntactic cues such as the number

⁶ Certainly there are hybrid approaches combining elements of the early abstraction and emergentist approaches in various ways as well (e.g. Ninio 2011).

of arguments and general subcategorization patterns to constrain one's hypotheses about verb meaning. In comparison, semantic bootstrapping (Pinker 1984, 1989; Grimshaw 1990) involves using cues about verb meaning (extracted in part from analysis of situations in the world) to make aspects of the syntax predictable. Note that these heuristics, while often discussed independently, are not mutually exclusive and have increasingly been thought of as working in tandem (e.g. Hochmann et al. 2010). On the other hand, proponents of more emergentist, usage-based approaches commonly object to (b) and (a), emphasizing children's well-documented conservativity (i.e. relative unwillingness to generalize) in language production under certain circumstances (e.g. MacWhinney 1982; Tomasello 1992; Olguin and Tomasello 1993; Akhtar and Tomasello 1997; Lieven et al. 1997) and the potential influence of forms of indirect negative evidence such as entrenchment (Ambridge et al. 2008) / statistical preemption⁷ (Boyd and Goldberg 2011), respectively. As we will see in the next section, one issue in critically evaluating the early abstraction and emergentist approaches to argument structure is that work done in these traditions tends not to focus on the same tasks or age groups. Generally speaking, the former relies more on comprehension-based methods with younger populations, while the latter skews toward production-based methods with older populations. To the extent possible, we will attempt to correct for this asymmetry in reviewing the developmental literature.
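The two kinds of indirect negative evidence just mentioned differ chiefly in which frequency counts are taken to matter, a contrast that can be sketched with toy data. The counts and scoring functions below are invented for illustration; neither Ambridge et al. (2008) nor Boyd and Goldberg (2011) propose these exact formulas.

```python
# Invented counts of dative uses in a hypothetical input corpus.
COUNTS = {
    "give":    {"prepositional_dative": 120, "double_object": 80},
    "whisper": {"prepositional_dative": 40,  "double_object": 0},
}

def entrenchment_evidence(verb):
    """Entrenchment: the verb's total attested frequency, in any construction,
    counts against generalizing the verb to an unattested frame."""
    return sum(COUNTS[verb].values())

def preemption_evidence(verb, target="double_object", competitor="prepositional_dative"):
    """Statistical preemption: only uses of a functionally equivalent competitor
    (where the target frame could have appeared) count as negative evidence,
    and only while the target frame remains unattested."""
    return COUNTS[verb][competitor] if COUNTS[verb][target] == 0 else 0

print(entrenchment_evidence("whisper"))  # 40
print(preemption_evidence("whisper"))    # 40
print(preemption_evidence("give"))       # 0: give is attested in the double-object frame
```

For whisper the two measures coincide here only because whisper occurs in a single construction; for a verb attested in many constructions, entrenchment would keep accumulating evidence from all of them while preemption would count only the functionally comparable competitor.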

9.3 Review of Developmental Findings

Our approach in this section is to let the child's chronological progression through stages relevant to the acquisition of argument structure guide us in organizing the literature review. Section 9.3.1 covers the period between birth and 18 months of age, ending roughly at the onset of the two-word stage. The cut-off for section 9.3.2 is 3 years, approximately when we begin to see abundant evidence of children using verbs productively. In section 9.3.3, we focus on what is known about 3- and 4-year-olds' increasingly sophisticated language skills in this domain. Finally, in section 9.3.4 we take the rise of children's verb-compliant behavior as a milestone, dealing with lexical learning (or unlearning) at 5 years and beyond.

⁷ By entrenchment we mean the phenomenon whereby repeated presentation of a verb in one attested construction discourages or suppresses use of that verb in unattested constructions (e.g. Clark 1987; Braine and Brooks 1995). Statistical preemption, whereby speakers learn not to use a construction with a particular verb if an alternative construction with the same function is consistently witnessed, differs from entrenchment primarily in that the frequency of a verb in functionally comparable constructions (as opposed to in any alternative construction) is considered predictive of the strength of the negative evidence provided (see Boyd and Goldberg 2011: 61).


9.3.1 Infancy to 18 Months

In the earliest stages of language acquisition, prior to the two-word stage, children are unable to demonstrate knowledge of argument structure in production. Nevertheless, there is evidence from comprehension-based methods that they are sensitive to certain aspects of argument structure.

9.3.1.1 Representative Findings

In some respects, event representation in young children parallels argument structure, suggesting that early conceptual representations may provide a foundation for emerging linguistic representations. Gordon (2004), in work replicating and extending Scherf and Gordon (1998, 2000),⁸ habituated 6-, 8-, and 10-month-old infants to videos depicting three entities (a boy, a girl, and a stuffed camel) engaged in one of two events, giving or hugging. At test, participants were shown two trials of the same (habituated) video and two trials of a novel (altered) video in alternating order. In the altered videos, the camel had been removed, but the motions of the girl and the boy remained the same. Looking time data revealed that by 10 months of age, infants discriminated between the familiar and novel test events for giving events only, that is, they discriminated between giving something and not giving something, but not between hugging while holding something and hugging while not holding anything. Thus, well before they are able to describe giving and hugging events, 10-month-old infants arguably form nonlinguistic representations of these events that encode the number of grammatically relevant event participants. The findings of Brandone and colleagues (2006) suggest that by 17 months, children have taken the next step and begun to make use of these similarities between event structure and linguistic structure to predict the subcategorization frame of a novel verb. In a habituation study, Brandone and colleagues demonstrated that 17-month-olds expect a novel verb associated with a one-participant event to be used in an intransitive frame. Brandone and colleagues' findings are consistent with the idea that toddlers at this age are already able to bootstrap from semantics to syntax, inferring the syntactic frame(s) with which a novel verb is compatible on the basis of the structure of the event the verb describes.
In addition, there is suggestive evidence that 17.5-month-olds can bootstrap in the other direction, using their knowledge of the transitive frame to guide their interpretation of the event described. For instance, Hirsh-Pasek and Golinkoff (1996a), replicating and extending Golinkoff et al. (1987), used the intermodal preferential looking paradigm to demonstrate that toddlers at this age can interpret word order appropriately in sentences with familiar transitive verbs such as "Big Bird is tickling Cookie Monster." The Agent and Patient roles that these two Sesame Street characters played in their test events were manipulated such that participants had to choose (by directing their eyes)

See also Scherf (2005).

Argument Structure   167 which of two versions of the events matched each linguistic stimulus at test. Participants attended significantly more often to matching events regardless of their ability to produce multiword utterances, indicating that they recognize not only the order of phrases or entities in a given sentence, but also the semantic significance of this ordering. Along these lines, White and colleagues (2011) have recently shown an effect of syntactic bootstrapping even earlier using the same experimental paradigm. Presented with events depicting, for example, a hand using a truck to push a block during familiarization, 16-​month-​old toddlers were then observed to look significantly more often at the Patient (the block) than at the Instrument (the truck) when told to find the novel noun phrase “the tiv” if they had previously heard the pushing event described transitively (e.g. “She’s pushing the tiv”). Those who heard the same event described using a different syntactic frame (e.g. “She’s pushing with the tiv”) instead looked more often at the Instrument, provided that they had no productive verb vocabulary.9 We can conclude on the basis of these results that 16-​month-​olds are able to use their knowledge about the range of syntactic positions a novel noun phrase might occur in to infer the thematic role borne by the noun phrase.

9.3.2 18 Months to 3 Years

From the two-word stage through to approximately the end of the second year, child learners typically evince what’s known as a grammar explosion. This burst in linguistic sophistication brings with it an embarrassment of riches in terms of empirical research. Much of the work targeting this developmental stage—concerning language comprehension and, increasingly, production as well—suggests considerable knowledge on the part of the child with respect to argument structure. However, there is a great deal of disagreement over the nature and origin of this knowledge.

9.3.2.1 Representative Findings

In comprehension, studies have shown that children in this age range can extract quite a bit of information about the meaning of a novel verb from the context in which it occurs. For instance, the number of noun phrases co-occurring with a verb is apparently informative quite early in development. One piece of evidence for this comes from Yuan and colleagues (2007), who demonstrated that 21-month-olds hearing a novel verb in the transitive frame (e.g. “He’s gorping him”) looked reliably longer at a two-participant event depicting one boy causing another to bend than did those hearing the same novel verb in the intransitive frame (e.g. “He’s gorping”); the latter were more likely to look at a simultaneously presented one-participant event involving a boy moving his arms (see also Fisher 2002 for similar results using pointing as a dependent measure with 28-month-olds). In addition, knowledge of the semantic significance of word order appears abstract enough to be recruited in the interpretation of novel verbs presented in transitive frames by 21 months (Fisher 2000; Gertner et al. 2006; preferential looking) and in both transitive and intransitive frames by 28 months (Fernandes et al. 2006; forced-choice pointing). Note, however, that evidence for children’s comprehension of word order in sentences containing novel verbs used transitively is not found until after age 3 in act-out tasks (e.g. “Make Mickey dack Ernie”) (Akhtar and Tomasello 1997).

Within months, children can use the transitive/intransitive frame distinction to guide the meanings they posit for novel verbs in choosing among events with equal numbers of participants. For example, 26-month-olds are able to determine whether a verb taking two noun phrase arguments refers to a causative scene (e.g. a duck forcing a rabbit to squat by pushing down its head) or a non-causative scene (e.g. a duck and bunny each twirling one arm in circles) by noting whether the verb appears in a transitive (e.g. “The duck is gorping the bunny”) or intransitive frame (e.g. “The duck and the bunny are gorping”) (Naigles 1990; see also Naigles and Kako 1993; Fisher 2002). Thus, toddlers use the syntactic structures in which two noun phrases occur to infer the relation between two event participants that is encoded by the novel verb.

Recently, it has become clear that children can make similar inferences about novel verb meaning on the basis of syntactic context—that is, the number, identity, and syntactic position of accompanying arguments—even in the absence of accompanying information from visual scenes. Yuan and Fisher (2009) exposed 28-month-olds to dialogues in which a novel verb was mentioned eight times in either an intransitive frame (e.g. “The boy blicked”) or a transitive frame (e.g. “The boy blicked the girl”).
Participants then viewed two scenes, one depicting an event with one actor, the other an event with two actors. When asked to “Find blicking,” 28-month-olds looked reliably longer at the two-actor events in the transitive condition. Thus, even in non-referential contexts, 28-month-olds are able to glean the number of noun phrases occurring in sentences containing a novel verb and use this information to choose among subsequently presented visual scenes differing chiefly in terms of the number of participants. Arunachalam and Waxman (2010) presented converging evidence in 27-month-olds using a similar procedure (with pointing as their dependent measure), and there are signs of the same ability to take advantage of multiple syntactic frame cues without concurrent events in 22-month-olds in the work of Messenger and colleagues (Messenger et al. 2015).

There is evidence, moreover, that children in this age range can use semantic information about the arguments of a novel verb to infer the argument structure of that verb. Scott and Fisher (2009) revealed that unaccusative and unergative intransitive verbs can be distinguished in a corpus of child-directed speech by the animacy of the event participants appearing as their subjects and the degree of lexical overlap between the nouns used as subjects and objects in their transitive variants. Specifically, unergative verbs (e.g. “Anne dusted”) were more likely to occur with animate subjects than unaccusative verbs (e.g. “The lamp broke”), and unaccusative verbs were more likely to exhibit lexical overlap between subjects and objects in their transitive variant than unergative verbs (“Anne broke the lamp” versus “Anne dusted the lamp”). Scott and Fisher demonstrated, moreover, that 28-month-old children can use their knowledge of the distribution of these semantic features in the input to guide their interpretation of a novel verb. Similarly, Bunger (2006) showed that 24-month-old children can use semantic combinatorial information to determine which of the subparts of a causative event (means versus result) a novel verb labels. In a preferential looking study, both adults and 24-month-olds preferred to map novel verbs presented in the presence of a causative event (e.g. a girl bouncing a ball by hitting it repeatedly with a tennis racquet) onto the result of the causative event if they had been presented in an intransitive syntactic frame with an inanimate subject (e.g. “The ball is pimming”) and onto the agent’s activity if they had been presented in an intransitive frame with an animate subject (e.g. “The girl is pimming”).

At this early stage in the acquisition of argument structure, what children seem to know about the meanings associated with verbal frames is in some sense more impressive than what they know about verb-specific meanings. One set of experimental findings highlighting this asymmetry has relied on pitting frames and familiar verbs against one another directly in act-out tasks. For instance, Naigles and colleagues (1993) found that 2-year-old English speakers interpreted transitives causatively and intransitives non-causatively even with verbs that do not occur grammatically in these frames. Thus, children hearing “Noah comes the elephant to the ark” have been shown to interpret this as meaning that Noah brings the elephant to the ark, that is, that Noah causes the elephant to come to the ark.
These findings have been replicated in French at age 5 (Naigles and Lehrer 2002), and extended in Kannada, in which early 3-year-old speakers ignore causative verbal morphology and instead rely on argument number (a probabilistic cue to transitivity) in acting out the meanings of familiar verbs in unfamiliar syntactic environments (Lidz et al. 2003a); in addition, Mandarin speakers show a similar pattern at 32 months (Lee and Naigles 2008; but cf. Göksun et al. 2008 for complications in Turkish child data).

Relative to the comprehension-based findings that we have just summarized—which are by and large friendly to the idea that early learners make note of universal tendencies existing in the mapping between form and meaning across languages and possess relatively abstract knowledge of argument structure from the beginning—results from production studies in this age range paint a more mixed picture with respect to the degree of abstraction evident in children’s argument structure representations. Many researchers have noted children’s relative unwillingness to generalize verbs (familiar or novel) in spontaneous speech and elicited production before age 3. Concerning the former, some data from corpus and diary studies suggest that children’s early language is organized and structured entirely around individual verbs and other predicative terms, with argument structure initially consisting of little other than verb-specific constructions with open nominal slots, or so-called “verb islands.” For example, building on work by Bowerman (1976), Braine (1976a), and MacWhinney (1978) that emphasizes the item-specific nature of constructional knowledge, Tomasello (1992, 2000b) observed that his daughter’s verb usage up to 24 months was characterized by impoverished verbal morphology (e.g. tense/aspect marking), conservativity (with each verb typically limited to one or two constructions), and unevenness with respect both to how verbs were distributed across various constructional types and to how arguments were marked across verbs within a construction (e.g. Instruments were preceded by with or by inconsistently and Agents were sometimes omitted). Similar results have been reported for broader samples of English-speaking children up to approximately 3 years (Lieven et al. 1997; see also Pine and Lieven 1993, 1997; Pine et al. 1998; Goldberg et al. 2004) as well as for child learners of Italian (Pizzuto and Caselli 1994), Hebrew (Ninio 1999), Inuktitut (Allen 1996), and various other languages (see, e.g., Tomasello 2000b: 214). Furthermore, verbal overgeneralization errors (e.g. “Don’t giggle me”; “Mommy, I poured you … yeah, with water”; “Jay said me no”) appear to be much more frequent after age 3 than before, generally speaking (e.g. Bowerman 1982, 1988; Pinker 1989).

Especially in the longitudinal research summarized in this section, a potentially causal role for input frequency in caregiver speech throughout children’s subsequent development has often been touted;10 however, data from select cross-sectional studies relying on first use as a dependent measure should give us pause. For a microcosmic glimpse of this sort of general dispute with respect to the importance of input frequency in the acquisition of the English dative alternation, readers are encouraged to compare Campbell and Tomasello (2001) on the one hand to Gropen et al. (1989), Snyder and Stromswold (1997), and Viau (2007) on the other.
Finally, contra the observed verb-specificity in children’s early production mentioned briefly earlier in the section, a few corpus studies have revealed what look like verb-general effects in the acquisition of argument structure at age 2—notably Snyder’s (2001) work on a syntactic parameter argued to determine the availability of both compounding and a range of syntactic constructions and Viau’s (2007) work on the acquisition of possessional and locational semantic primitives, upon which the representations of datives and other verbs arguably depend. From a cross-linguistic perspective, Demuth (1998) observed that Sesotho-speaking children use applicative constructions (6) productively (and with few errors) by 30 months.

(6) Mosadi o-rek-el-a ngwana dijo11
    woman agr-buy-apl-fv child food
    ‘The woman is buying food for the child.’

Applicatives can apparently be derived from most (but not all) intransitive and transitive verbs, so the child learner must eventually learn to avoid overgeneralizing. Additionally, in Sesotho the learning task is complicated by the fact that the applicative argument may bear a number of different thematic roles and may appear either before or after the other internal object depending on animacy (see Demuth et al. 2010 for a demonstration that 4-year-old Sesotho speakers are sensitive to the latter).

Concerning elicited production, as a rule, children under 3 years of age tend to hew closely to the syntactic frames in which they have heard each verb used. Experimenters modeling novel verbs in intransitive frames (Tomasello and Brooks 1998; see Berman 1993 for Hebrew data), as passives (Brooks and Tomasello 1999), or as simple gerunds (“Tamming!”; Olguin and Tomasello 1993; Akhtar and Tomasello 1997) have found that children in this age range were reliably unlikely to use the novel verbs in active transitive frames. A related finding is that 32-month-old English-speaking children asked to describe novel events using novel verbs modeled in sentences with non-canonical word order—for example, either SOV (“Ernie the cow tamming”) or VSO (“Tamming Ernie the cow”)—were significantly less likely than older children to “correct” these odd forms by using canonical SVO word order in their own productions (Akhtar 1999; Abbot-Smith et al. 2001; for similar studies with French-speaking children, see Matthews et al. 2005, 2007). However, Franck and colleagues (2011) argue that this “Weird Word Order” paradigm is problematic for various reasons and underestimates 2-year-olds’ competence (see also Franck and Lassotta 2011). Their own experiment using the same method with a few modifications showed no difference between 35-month-olds and older children in terms of corrections.

10  About the issue of input richness, a few researchers have argued from production data that the construction of argument structure knowledge gets off the ground in various ways via observation of the input, which apparently contains a number of highly frequent and/or salient general-purpose light verbs—e.g. do, give, go, make, put—that happen to be among those acquired earliest (e.g. Goldberg et al. (2004) on English-speaking 28-month-olds, Ninio (1999) on Hebrew-speaking 18-month-olds). However, Brown (1998) discusses early production data from Tzeltal Mayan children that cast doubt on the cross-linguistic validity of the correlation between ease of acquisition and semantic lightness.

11  Ex. (3b) from Demuth et al. (2005); glosses are as follows: AGR: subject–verb agreement, APL: applicative, FV: final vowel (mood).

9.3.3 3 Years to 5 Years

Proponents of both main approaches to the development of argument structure typically agree that by age 3 a certain level of linguistic abstraction has been achieved. On average, learners are more willing and able to demonstrate productive knowledge of argument structure than in previous stages and, thus, more tolerant of a variety of additional research methods used to assess the character of this knowledge. The preponderance of work on language production relative to comprehension in this age range may reflect researchers’ focus on 3-year-olds’ newfound ability to express themselves, coupled with the fact that some comprehension-based methods used with younger populations are not appropriate for older children.

9.3.3.1 Representative Findings

A few studies have explored priming—the effect that the use of a particular construction or structure has on subsequent uses of the same structure (Bock 1986)—during online sentence comprehension at this stage. In the first study to do so (Thothathiri and Snedeker 2008), 3- and 4-year-old children heard unambiguous double-object datives (e.g. “Give the lion the ball”) or prepositional datives (e.g. “Give the ball to the lion”) before encountering temporarily ambiguous dative utterances (“Bring the monkey the hat”/“Bring the money to the bear”). During the ambiguous interval, participants who had been primed with double-object datives were more likely to look at the potential Recipient (monkey), while those who were primed with prepositional datives were more likely to look to the potential Theme (money); this effect was observed both when the same verb was used in prime and target sentences and when different verbs were used. As Thothathiri and Snedeker themselves note, the data are agnostic as to the locus of priming effects, with possibilities including (a) syntax alone, (b) the mapping between syntax and semantics (thematic roles), and (c) the mapping between syntax and conceptual structure (animacy or lack thereof). In subsequent work, these authors have found evidence supporting (b) with 4-year-olds by priming datives with locatives, either Goal-first (e.g. “They loaded the truck with the hay”) or Theme-first (e.g. “They loaded the hay on the truck”) (Thothathiri and Snedeker 2011).

In comparison, the literature on production priming is somewhat more extensive. Generally speaking, these studies have relied on the picture description paradigm, in which children either repeat a prime sentence uttered by the experimenter or simply listen to it and then describe a target picture. On the negative side of the ledger, two studies by Savage and colleagues explored the role of lexical overlap in priming in children from ages 3 to 6 using the active-passive alternation for transitive verbs (Savage et al. 2003, 2006). They found that children in all age groups showed priming when the prime sentence used pronouns that could be repeated in the target utterance; that is, “It is catching it” or “It got caught by it” facilitated the subsequent production of similar active and passive sentences, for example, “It is closing it” or “It got closed by it.” In contrast, when non-overlapping lexical nouns were used (e.g. “The ball was caught by the net”), only the 6-year-olds (and not the 3- and 4-year-olds) showed priming.
All other production studies have demonstrated robust structural priming across verbs in children as young as 3 years in the absence of overlapping content words. For instance, Huttenlocher and colleagues found priming of both passive and dative constructions in 3-, 4-, and 5-year-olds (Huttenlocher et al. 2004; Shimpi et al. 2007). Bencini and Valian (2008) observed a similar pattern with 3-year-olds who were given either active or passive primes and then asked to describe pictures of transitive actions with inanimate agents and patients.

Turning to spontaneous production, perhaps most notable is a burst in the number of overgeneralization errors produced by children that begins approximately after age 3. Such errors are generally interpreted as a tell-tale sign of abstract knowledge of argument structure. Causative overgeneralizations of intransitive verbs and adjectives (e.g. uttering “Can I glow him?” with the intended meaning “make the toy glow”) have been particularly well studied, thanks in no small part to Bowerman’s careful diary studies (Bowerman 1974, 1982, 1988; Bowerman and Croft 2008). In addition, we find extensive discussion of locative (e.g. “Can I fill some salt into the bear?”) and dative (e.g. “Button me the rest”) overgeneralizations in Pinker (1989).

Pinker’s account of how children retreat from such overgeneralizations has been influential and merits close attention, since we will need to presuppose some familiarity with it in the next section. Briefly, he argues that the rules for argument structure alternations apply to (and so must be sensitive to) the semantic properties of verbs; in particular, if a verb’s meaning is not compatible with the semantic change the rule brings about, then the verb will fall outside its scope and fail to participate in the alternation (Pinker 1989). Broad-range rules are thought of as defining necessary conditions for alternating verbs, while narrow-range rules provide sufficient conditions and serve to constrain overapplication of the broad-range rules by picking out classes of verbs that actually alternate. Learning is predicated on the assumption that knowledge of the mapping between meaning and form is innate; thus, arguments project correctly provided that verb meanings and the rules manipulating them come to be represented properly. On Pinker’s view, retreating from overgeneralizations amounts to cautiously nailing down verb meanings and refining narrow-range rules. To the extent that overgeneralizations on the part of the child appear constrained rather than random and/or prolific, we have support for Pinker-style criteria-governed productivity in the acquisition of argument structure. Indeed, children’s overgeneralizations do appear to be rare relative to grammatical uses based on available estimates (see Gropen et al. 1989 for a discussion of overgeneralization in datives and Kim et al. 1999 for overgeneralization of Korean locatives; see also Maratsos et al. 1987 for a discussion of Bowerman’s causatives, though cf. Bowerman and Croft 2008: 288–289). Problematic for Pinker’s account, however, are data of the sort Bowerman and Croft (2008) discuss showing that causativization errors appear to cross-cut or fall outside of the broad- and narrow-range verb classes described here.

Potentially complicating the learning process is the fact that arguments are often omitted in caregiver speech in many languages, including (but not limited to) Japanese (Rispoli 1995), Hindi (Narasimhan et al. 2005), and Inuktitut (Skarabela 2006). How do children determine the argument structure of verbs with elided arguments?
Helpful strategies could include attending to multiple utterances per verb to gather information on argument number (Korean: Clancy 1996; English: Medina 2007), noting contingencies in the input concerning argument number and verb class (Mandarin: Lee and Naigles 2005) or animacy of the Patient (Japanese: Rispoli 1987, 1995), and looking for other cues such as the likelihood of adverbial modification (Wittek 2008). An additional strategy with cross-linguistic support involves attending to discourse factors affecting argument realization. For instance, Allen (2000, 2008) found that Inuktitut-speaking 3-year-olds were more likely to omit arguments in their own speech when their referents had been previously mentioned in the discourse (but cf. Brown’s 2008 results with Tzeltal-speaking 3.5-year-olds). Given this tendency, Allen speculates that learners of Inuktitut can infer the existence of missing verbal arguments in caregiver speech from their understanding of the discourse context of an utterance. In any case, it certainly seems to be true that children’s omission of arguments from their own speech at this age is not the result of limited underlying conceptual capacity. For example, Bunger and colleagues (2010) observed that 4.5-year-old English-speaking children, unlike adults, tend to omit Goal information from their descriptions of motion events (e.g. “The boy was skating [into a soccer net]”). However, children’s eyegaze patterns during event viewing and performance on a subsequent change detection task suggested that, like adults, they had encoded goals as part of their conceptual representations of the events.

Concerning elicited production in this age range, relatively little work seems to have been done using novel verbs, with at least one notable exception: Conwell and Demuth (2007) assessed 3-year-olds’ ability to generalize across dative frames in English after teaching them novel verbs of transfer. Children did generalize, but they showed a strong preference for doing so in one particular direction, from the double-object dative frame to the prepositional dative frame (e.g. “Kim norped Petey the ball” > “Kim norped the ball to Petey”; 52 percent), rather than the other way around (9 percent). Interestingly, the double-object dative frame shows more semantic restrictions than its prepositional dative counterpart in the adult grammar (Green 1974; Oehrle 1976; Pinker 1989), so this asymmetry in extension could conceivably be due to children’s sensitivity to the relevant meaning differences between the frames.

In contrast, evidence of the effect of syntactic context on children’s identification of familiar verbs is easier to come by in this age range than it is before age 3, particularly with respect to potentially difficult verbs encoding alternate perspectives on events (e.g. give/get, chase/flee) or belief (e.g. think, know). Fisher and colleagues (1994) showed 3- and 4-year-olds videotaped scenes described by a puppet using novel verbs and then asked participants to paraphrase these puppet words. Importantly, the syntactic contexts in which the novel verbs were presented ranged from neutral (e.g. “Look! Ziking!”) to biased toward one perspective (e.g. “The elephant is ziking the ball to the bunny”) or the other (e.g. “The bunny is ziking the ball from the elephant”). In neutral contexts, children showed an agency bias, preferring to paraphrase with verbs whose subjects mapped to the agent in the videotaped event (e.g. give); this bias was enhanced in contexts with goal prepositions (to) taking the agent’s perspective and reversed in contexts with source prepositions (from) taking the recipient’s perspective.
Thus, structural properties of the sentence children heard influenced their perception of a single scene and even led children to override their default goal bias in event representation. Similarly, Papafragou and colleagues (2007) had 4.5-​year-​olds watch animated scenes at the end of which a character described what happened using a novel verb—​in either a transitive frame (e.g. “Matt gorps a basket of food”) or a sentential complement frame (e.g. “Matt gorps that his grandmother is under the covers”)—​and then asked participants to help an experimenter understand what the character had meant. Like adults, children chose to paraphrase using belief verbs significantly more often when novel verbs were embedded in sentential complement frames.

9.3.4 Beyond 5 Years

By age 5, what is left to discover with respect to argument structure? Primarily verb-specific lexical information, including exceptions to the generalizations made in previous stages, consideration of which brings issues of abstraction and the role of the input once again to the forefront.

9.3.4.1 Representative Findings

Children in this age range have typically begun to move away from predominantly frame-compliant behavior, as discussed in section 9.3.2, and toward a more nuanced, adultlike, verb-compliant stance that is responsive to the particulars of the exposure language. Trueswell and colleagues provide abundant evidence of this shift in their studies of 5-year-olds’ parsing preferences with respect to locative verbs. In the first study of this kind, upon hearing sentences like “Put the frog on the napkin in the box,” in which the prepositional phrase on the napkin is temporarily ambiguous between modifier and Goal interpretations, 5-year-olds were shown to strongly prefer to interpret the ambiguous PP as the Goal of put—as indicated by eye movements and act-outs—even when the accompanying referential scene provided disambiguating evidence (e.g. two frogs, only one of which was on a napkin) (Trueswell et al. 1999). Coupled with the fact that children’s observed Goal bias for put reflects how the verb is used in child-directed speech, the data suggest that the information that children gather on the number and types of phrases that probabilistically occur with verbs acquired earlier in development is not jettisoned after the verb meanings are learned; instead, the information is deployed on the fly as a sentence unfolds during future encounters with each particular verb in order to assist in recognizing the intended structures of utterances. Indeed, the influence of this knowledge of argument structure on children’s parsing commitments outweighs that of countervailing discourse factors for years (Trueswell et al. 1999, but cf. Choi and Trueswell 2010 for complications in Korean, a verb-final language).
Snedeker and Trueswell (2004) replicated and extended these findings, confirming that 5-year-olds demonstrate considerable sensitivity to the argument structure of known verbs—including whether with-PPs are typically linked to them as instruments (“The chef poked the pastry with the fork”) or to the direct objects as modifiers (“The chef poked the pastry with the flaky crust”)—and rely on this information in making parsing decisions.

Returning to Pinker’s criteria-governed productivity hypothesis (Pinker 1989), a number of studies have examined older children with an eye toward assessing the psychological reality of the narrow-range verb classes that are posited to constrain children’s overgeneralization errors. For example, Ambridge and colleagues had 5- to 6-year-olds, 9- to 10-year-olds, and adults rate the grammatical acceptability of causativized intransitive verbs (both familiar and novel) from three of Pinker’s semantic classes using a 5-point scale (Ambridge et al. 2008). In support of Pinker’s general approach, participants in all age groups showed the strongest dispreference for causative overgeneralizations with verbs belonging to the class associated with the lowest degree of direct external causation, that is, the class that is furthest in meaning from the transitive causative prototype (namely, “semivoluntary expression of emotion” verbs such as laugh and giggle). However, consistent with the entrenchment hypothesis mentioned toward the end of section 9.2, they also found that participants’ willingness to overgeneralize seemed to be conditioned by frequency, with high-frequency familiar verbs most likely to be used conservatively, low-frequency familiar verbs somewhat less so, and novel verbs least of all. Ambridge et al. (2009) replicated this pattern of results with slightly different stimuli, leading them (and Ambridge et al. 2011) to argue for a hybrid approach to solving Baker’s Paradox in which both verb semantics and entrenchment/pre-emption play a role.

If only to highlight the need for more work on how children retreat from argument structure overgeneralizations, it is interesting to juxtapose two sets of results from studies focusing on datives in this age range. On the one hand, Gropen et al. (1989) had relative success in eliciting double-object dative tokens of four novel verbs taught to older children (mean age 7;4). Children produced novel verbs in the double-object dative frame 50 percent of the time when the novel verbs had been taught in that construction, and 44 percent of the time when the novel verbs had been taught in the prepositional dative. Children also showed sensitivity to a morpho-phonological constraint argued elsewhere to govern the dative alternation in English by producing novel verbs in the double-object dative frame 54.7 percent of the time for monosyllabic verbs and 39.1 percent of the time for polysyllabic verbs.

In a follow-up experiment, Gropen and colleagues tried to elicit double-object dative uses of novel verbs more naturally, that is, by modeling the novel verb using a syntactically neutral gerund form (“This is norping”). Production was elicited with sentences like “Can you tell me, using the word ‘norp,’ what I’m doing with you?” Results again indicated success in eliciting double-object datives in general. Children produced unmodeled double-object datives with novel verbs in response to 41 percent of questions, with 75 percent of children producing at least one double-object dative. Children were significantly more likely to produce double-object datives if the recipient was a prospective possessor than if it was an inanimate location, with the highest rate of double-object dative responses occurring when the child was the recipient, followed by when toy animals were recipients.
Gropen and colleagues concluded on the basis of their results that children were productive in their use of double-​ object datives by virtue of respecting semantic and morpho-​phonological constraints on the permissible bounds of extension in English—​all this against a background of relative conservatism, whereby children preferred to use argument structures where they had heard a verb used. On the other hand, Mazurkewich and White (1984) investigated 9-​, 12-​, and 15-​year-​old English speakers’ knowledge of dative extension patterns using a grammaticality judgment task and found relatively widespread overgeneralization late in linguistic development. For instance, 46.7 percent of the 9-​year-​olds accepted utterances like “Tom reported the police the accident” as grammatical. At 12 years, 33 percent of participants accepted them, and at 15 years the acceptance rate was still measurable at 11 percent. It is not immediately obvious how one can square this sort of finding, if robust, with Gropen and colleagues’ results.12

9.4 Analysis

On balance, do the results summarized so far in this chapter favor one theoretical approach to argument structure over another? In our view, there is no definitive answer

12 Though see Fodor (1985) for discussion of possible methodological problems with the experimental work in Mazurkewich and White (1984).

to this question currently. Moreover, as stressed in section 9.2, the early abstraction and emergentist approaches are best thought of not as diametrically opposed but rather as positions along a continuum with respect to the potential role of the input in building/constraining generalizations related to argument structure. We can see at least two constructive ways forward, which we discuss in turn. The first involves highlighting how each approach advances our understanding of the process by which children master verbal argument structure as well as what issues each approach raises. The second involves considering the potential benefits and risks of a shift in focus away from what is acquired by a given stage in this domain and toward how that knowledge is acquired. Indeed, there are signs that the field might already be progressing in this direction. Fisher (2002) comments on the complementary (as opposed to contradictory) nature of work emphasizing either the role of abstraction in driving the learning process on the one hand or item-based learning about verbs and their arguments on the other, asserting that "to achieve a complete picture of how children learn their native languages, we must explore the interactions of lexical and more abstract syntactic knowledge in language acquisition" (2002: 275). What then do the two main approaches to argument structure contribute to that picture? To start with, it would be difficult to overestimate the significance of early abstraction work probing young children's developing comprehension of argument structure through the second year of life. Positive findings early in development such as the ones reviewed in sections 9.3.1–9.3.2 help to motivate a meaningful distinction between competence and performance, which in turn should suggest caution is in order when interpreting null results in production at these stages and later on.
Overall, however, the early abstraction approach to argument structure (with a few notable exceptions) has neglected to adequately address thorny learnability issues posed by overgeneralization errors past age 3 as well as challenges to universals of argument linking raised by cross-​linguistic data. In comparison, emergentists (using production-​based methods in particular) have quite successfully highlighted the importance of the type of lexical learning older children must do in order to arrive at an adultlike understanding of argument structure. That said, their investigations of younger children’s competence have yielded more mixed results. While the healthy skepticism toward the necessity of positing innate knowledge on the part of early learners that emergentists tend to emphasize is certainly valuable, one might find fault with the (at times) overly broad strokes in which their proposals of alternative explanations are sketched. Whether or not the domain-​ general learning strategies frequently attributed to children—​intention reading, analogical structure mapping, and/​or pragmatic (i.e. Gricean) reasoning (e.g. Tomasello 2000b; Goldberg 2006), to name a few—​are relevant to the acquisition of argument structure is an empirical question; however, valid empirical tests require the sort of predictions that only more explicit formal accounts of learning can generate. On that note, we close with a brief discussion of mechanisms of learning. As alluded to in this chapter, we perceive a trend in work on the acquisition of argument structure toward focusing on how the process unfolds. Three papers serve to illustrate this trend. First, recall from section 9.3.1 that White and colleagues (2011) demonstrated syntactic bootstrapping as early as 16 months in English-​speaking children

who knew few verbs. More broadly, these authors described a U-shaped curve along which 16-month-olds without verbs and 28-month-olds behaved as expected in the experiment outlined earlier, but 16-month-old verb users and 19-month-olds did not. They argued that learners' developing verbal knowledge, combined with the maturation of their ability to parse syntactic structure, accounts for the observed dip in performance. Abstracting away from the details of this particular study, we note the surprisingly early influence of verb-specific knowledge that it seems to reveal and its focus on early sentence processing. Second, Wonnacott and colleagues (2008) explored language learners' ability to acquire knowledge of argument structure when Pinker-style semantic bootstrapping was not an option. In a series of artificial language learning experiments with adult participants, they showed that, even without semantic cues to verb distribution, learners could acquire both verb-specific and verb-general constraints based on distributional information in the input. Concerning methodology, Wonnacott and colleagues found that participants' production and online comprehension data reflected either probabilistic knowledge or knowledge of absolute constraints, while their grammaticality judgment data reflected only knowledge of absolute constraints; for example, a verb was only judged to be more grammatical in one construction than the other when participants were convinced that the verb could never occur in the other. Thus, participants appeared to distinguish grammaticality from other factors potentially affecting verb usage. Planned follow-ups are reported to include taking a closer look at spoken corpora in order to gauge the relative naturalness of the statistical regularities that participants were able to track and adapting the artificial language learning paradigm for use with children.
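The kind of learning Wonnacott and colleagues describe, in which verb-specific and verb-general constraints are both extracted from distribution alone, can be sketched as follows. This is a toy model with invented verbs, constructions, and smoothing scheme, not their actual analysis: a verb's estimated construction preferences are its own usage counts, smoothed toward the verb-general distribution, so an unseen verb falls back entirely on verb-general statistics.

```python
from collections import Counter

# Toy sketch of distributional learning (illustrative; not Wonnacott et
# al.'s model). Each observation is a (verb, construction) pair; "A" and
# "B" stand for two artificial-language argument-structure frames.

def construction_probs(observations, verb, alpha=1.0):
    """Estimate P(construction | verb): the verb's own counts, smoothed
    toward the verb-general distribution by alpha pseudo-observations."""
    general = Counter(c for _, c in observations)      # verb-general counts
    specific = Counter(c for v, c in observations if v == verb)
    total_g = sum(general.values())
    total_s = sum(specific.values())
    return {c: (specific[c] + alpha * general[c] / total_g) / (total_s + alpha)
            for c in general}

data = [("blick", "A")] * 9 + [("blick", "B")] * 1 + [("dax", "A")] * 2

print(construction_probs(data, "blick"))  # dominated by blick's own usage
print(construction_probs(data, "norp"))   # unseen verb: verb-general stats only
```

A frequently attested verb is dominated by its own distribution (verb-specific knowledge), while a novel verb inherits the verb-general pattern; this is one simple way both levels of generalization can coexist in the same set of estimates.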
Third, Alishahi and Stevenson (2008) recently discussed a computational model for the representation, acquisition, and use of verbs and constructions. In the Bayesian framework they adopted, constructions were viewed as probabilistic associations between syntactic and semantic features. The authors found that the behavior of their model throughout acquisition largely mirrored the stages of learning through which children have been observed to progress. Of course, it is always important to closely examine what types of knowledge are assumed by/​built into models of this type, and Alishahi and Stevenson’s is no exception—​though in more recent work (Alishahi and Stevenson 2010) they claim to have managed to remove semantic role information from the “assumed knowledge” column. Regardless, we are encouraged to observe the mini-​renaissance that these studies and others have begun to form in terms of attention to learning mechanisms in this domain. A proper weighing of the twin roles of input and abstraction with respect to the development of knowledge of argument structure will no doubt continue to preoccupy researchers of all theoretical persuasions for years to come.

Chapter 10

Voice Alternations (Active, Passive, Middle)

Maria Teresa Guasti

10.1 Introduction

In the passive voice the mapping of thematic roles is non-canonical, in that the theme/patient, or another internal argument, becomes the grammatical subject, while the external argument generally becomes an adjunct. In addition, the morphosyntactic form of the verb changes. The pragmatic conditions for the use of the passive voice are also different than they are for the active. This is because the same event is being presented, but with the theme/patient (or another internal argument) highlighted. The examples in (1) illustrate these changes: (1a) is in the active voice, and (1b) is in the passive voice.

(1) a. John is washing the dog
    b. The dog is being washed (by John)

We refer to (1b) as a long passive (LP) when the by-phrase is expressed, and as a short passive (SP) when the by-phrase is omitted. Looking across languages, we find other structures in which an internal argument is promoted to the subject function and occurs in the subject position, for example Italian reflexives, under some analyses (Marantz 1984; Cinque 1988), as in (2a), where the inalienable possessor Gianni is taken to have been base-generated as an indirect object; or Italian impersonal constructions, as in (2b,c), where the internal argument occurs either in a post-verbal position (where subjects can be found in Italian) or in a preverbal position. Regardless of the position, the internal argument determines agreement on the verb:

(2) a. Gianni si è morso la lingua
       John SI is bitten the tongue
       'John bit his tongue.'

    b. A scuola si scrissero molte lettere
       At school SI wrote many letters
       'At school, they wrote many letters.'
    c. Molte lettere si scrissero a scuola
       Many letters SI wrote at school
       'They wrote many letters at school.'

In contrast to passives, however, there cannot be any by-phrase in (2). In Austronesian languages (Indonesian/Malay, Malagasy, Tagalog) there is a rich system of voices and, depending on which voice the verb bears, the surface subject will be the agent, the theme/patient, or the goal. Regardless of which argument is promoted to the subject function, the other arguments remain arguments and are not demoted to adjunct status, as they are in an English passive. As most of the research on voice alternations has concentrated on passives in the English style, this chapter will be mainly devoted to them, but the cross-linguistic dimension will not be neglected, and when relevant pieces of data are available from other voices, they will be discussed. Not only are passive sentences complex, they are also infrequent in the input that English-learning children receive. This is especially true of LP (Svartvik 1966; Brown 1973: 358) and non-actional passives (Gordon and Chafetz 1990). For these reasons their acquisition might have been expected to be delayed, relative to active sentences. Although there is a delay, the picture that is emerging nowadays offers a more articulated perspective, because, on the one hand, data have accumulated that provide a finer view of what children know about the passive voice and, on the other, new techniques are available that allow one to gather data not available through earlier methods. Finally, it should be clear that passive voice is a descriptive term that covers a number of operations independently present in a range of structures. Children may be poor at some operations, but better at others. Thus, the important point is to understand what precisely makes the passive voice more difficult.

10.2  Some Generalizations about the Passive Voice

Earlier studies in the 1960s and 1970s came to the conclusion that children below the age of 5 years have trouble comprehending (Fraser et al. 1963; De Villiers and De Villiers 1978b) and producing passives, especially with reversible verbs, whereas actives are produced from the first multiword utterances. Children mis-interpret passives (De Villiers and De Villiers 1973), and rarely use them in their spontaneous speech (Menyuk 1963). Studies carried out later in the 1970s and 1980s provided a more articulated picture, with some aspects of passives handled better than others.

In more recent years it has been shown that children comprehend active sentences before they even begin putting words together. Seventeen-month-old infants acquiring English can correctly associate an active sentence with a scene on the basis of word order. When they hear a sentence such as Cookie Monster is scratching Big Bird, they prefer to look at the picture displaying this event rather than a picture displaying Big Bird scratching Cookie Monster (Hirsh-Pasek and Golinkoff 1996b). This is not so for passives. More recent studies have confirmed that passives are not easy, but have also narrowed down the search space for the source of the difficulties. In addition, whilst earlier studies were based on English, other languages have come into the arena. The research carried out on passives invites the following generalizations:

(3) Some generalizations
    a. Around the age of 4–5 years, passives based on actional verbs (comb, scratch, touch, kiss, kick, hold) are produced and comprehended better than passives based on non-actional verbs (see, hear, love, remember) (Maratsos et al. 1985).
    b. Production and comprehension of SPs starts earlier (around the age of 3 years or a bit earlier) than production and comprehension of LPs (Horgan 1978).
    c. Get-passives in English are comprehended and produced earlier than be-passives (Turner and Rommetveit 1967a, 1967b; Harris and Flora 1982; Crain and Fodor 1993; Slobin 1994).

In what follows we are going to discuss each of these generalizations, and also report some more recent findings.

10.2.1 Actional and Non-actional Passives

Maratsos et al. (1985) were the first to notice the actional/non-actional asymmetry in the comprehension and production of passives, which has since been replicated in studies of comprehension (Sudhalter and Braine 1985; Gordon and Chafetz 1990; Fox and Grodzinsky 1998; Terzi and Wexler 2002; Hirsch and Wexler 2006; Manetti 2012) and of production (Pinker et al. 1987; Budwig 1990, 2001; Marchman et al. 1991). Children comprehend (4a), with an actional verb, better than (4b), with a psych(ological) verb.

(4) a. Jasmine was combed (by Wendy)
    b. Peter Pan was feared (by Captain Hook)1

1 Some speakers of Italian and English feel that the SP version of (4b) is less natural than the SP of (4a). A small-scale investigation, with a small set of actional and psychological verbs in Italian, suggests that with actional verbs SP are uniformly accepted, while with psychological verbs there seems to be variation. This fact deserves further investigation, as this asymmetry in the availability of SP between these two classes of verbs should have an impact on acquisition.

Spontaneous production of passives has been observed from as early as 3 years (Pinker et al. 1987), but interestingly most of the examples are based on actional verbs.

(5) So it can't be cleaned?                  (Adam, 3;2)
    I don't want the bird to get eated ...   (Adam, 3;7)
    it will be cooked in de minute           (Adam, 3;3)

In a very comprehensive study with six groups of English-speaking children aged 3;0 to 5;11, Hirsch and Wexler (2006) tested children's comprehension of actional verbs (hold, kick, kiss, push) and psych verbs (hate, love, remember, see) using a two-choice sentence–picture matching task. They found that active voice was unproblematic from the age of 3 years on, with both types of verbs. Developmental change was observed for all types of passives, however, with actional SPs understood earlier than actional LPs and psych passives. The psych passives were still not understood at the age of 5 years, regardless of whether the by-phrase was expressed (LP) or not (SP). These results are illustrated in Figure 10.1.

Figure 10.1  Percent correct responses by children (based on numbers from Hirsch and Wexler 2006). [Bar chart, "Comprehension of passives": percent correct responses (0–100%) for actional actives, psych actives, actional long and short passives, and psych long and short passives, broken down by age group (3–, 3+, 4–, 4+, 5–, 5+).]

In contrast to children, adults understood all sentences equally well. In another study it was found that passives based on psych verbs were not well understood until age 7 (Hirsch et al. 2006). Findings from other researchers are only partially consistent with those already discussed. Manetti (2012) (cf. also Volpato et al. 2015) presents Italian data confirming the asymmetry between actional and psych passives, with the former being comprehended at an adult-like level from the age of 4;6. Unlike Hirsch and Wexler (2006), however, she did not find a difference between SPs and LPs based on actional verbs. Fox and Grodzinsky (1998), based on English, found that psych SPs were comprehended earlier than psych LPs in a subgroup of 5-year-olds. However, this was not confirmed

in Orfitelli (2012). She found that actional passives (SP, LP) were comprehended earlier than psych passives and that the latter were generally well comprehended around 6 years, regardless of the presence or absence of the by-phrase (comprehension accuracy was around 80 percent in both cases). These contrasts certainly invite further exploration, possibly with cross-linguistic comparisons, and with an investigation designed to determine whether difficulties are confined to the use of psych verbs in passives, or extend to the use of psych verbs in other, related constructions. One area worth exploring is causative constructions in Romance languages. Guasti (1993, 1996b) has shown that in French and Italian, psych verbs can only be used in the faire-infinitive (FI) construction (6a) (where the causee is introduced by the preposition a 'to,' and usually must be present), and not in the faire-par (FP) construction (6b), which displays a close resemblance to passives: the causee is introduced by the preposition da 'by,' and is optional (Kayne 1985; Burzio 1986).2

(6) a. Quella lezione fece amare la matematica ai bambini        Faire-infinitive
       That class made love mathematics to the children
       'That class made children love mathematics.'
    b. *Quella lezione fece amare la matematica (dai bambini)    Faire-par
       That class made love mathematics by the children
       'That class made mathematics be loved by children.'

Such restrictions do not hold with actional verbs, which can appear in both types of causative. How do children handle causative constructions in these languages? If children have a general difficulty with psych verbs, we expect them to fail to produce FI based on these verbs. At the same time they should be able to produce FI based on actional verbs. Similarly, they should be able to produce FP based on actional verbs, with or without the by-​phrase. In the latter case, the preposition “by” would assign a thematic role to the NP that it governs. In sum, the FI and FP constructions involve a number of ingredients related to passives, and their study may therefore shed new light on the active/​passive alternation.

10.2.2 Short and Long Passives

As we saw in the previous section, SP tend to be comprehended earlier than LP, and they also tend to be produced earlier. (Note that the examples in (5) are all SP.) Slobin (1968) found that in retelling stories that contained both SP and LP, children were inclined to retell SP with passive sentences, but they used active sentences to retell LP. Messenger (2009) points out one weakness of these findings. She notes that retelling LP with active sentences may be natural as two arguments are mentioned, while retelling SP with active verbs requires the use of a pleonastic noun phrase, somebody, and this may sound unnatural: Bob was invited to the party should become Somebody invited Bob to the party. Other studies have reported that children comprehend SP and LP equally well (Maratsos and Abramovitch 1975). This seems to be correct from the age of 5 years on, based on Hirsch and Wexler (2006), but a small advantage for SP seems to be evident earlier. In addition, production of LP is observed in the literature from the age of 3½ (Budwig 1990, 2001). One important observation is that the production of LP (i.e. of the by-phrase) is subject to pragmatic conditions. Take example (7). It is relevant to mention "the red soldier" if there is someone else who could have hit Bob, and the actual hitter's identity is under discussion.

(7) Bob was hit by the red soldier

Crain et al. (1987, 2009) showed that 3- and 4-year-old children are more likely to produce LP in pragmatically appropriate contexts than in pragmatically inappropriate ones. Although these authors used only actional verbs, their findings support the claim that the pragmatic context matters for the production of the by-phrase. This observation, based on production, has also been investigated for comprehension in a small-scale study by O'Brien et al. (2006). These authors showed that children from 3;6 comprehend LP (actional and non-actional) better if the story contrasts the actual agent (or experiencer) with another potential agent (or experiencer). This contrast makes the use of the by-phrase felicitous, as it provides information about who actually did something as opposed to someone else who could have done it. These findings are very suggestive and indicate that the impact of felicity conditions on the use of a certain construction must not be neglected.

2 As will be discussed below, the prohibition on psych verbs makes FP especially similar to the English get-passive.
Since not only the use of a by-phrase but also the choice of a passive in the first place is governed by pragmatic conditions, the question arises as to whether manipulating the pragmatic context may have a wider impact on the comprehension of passives. This is an important question for further research.

10.2.3  Get- and be-passives

In English, get-passives tend to be preferred over be-passives, and this preference seems to be stronger in younger than in older children (Turner and Rommetveit 1967b), although Marchman et al. (1991) noticed this preference in all children from 3 to 11 years. In addition, Marchman et al. found that, like adults, children used get-passives more frequently with change-of-state verbs (lick, hit, bite) than with verbs like give, throw, and crawl under. Crain et al. (1987, 2009) found that many of the LPs produced by children were get-passives. Get-passives are more colloquial, while be-passives are more

often used in written English and may be boosted through school education. Thus, it is possible that this earlier preference for get-passives simply reflects these facts. We should notice that not all verbs can be used in the get-passive. Hirsch and Wexler (2006) observe that stative verbs are absolutely banned from get-passives, as shown in (8), and propose that this is due to the fact that get-passives impose an eventive interpretation on their complement.

(8) *John got loved by Mary
    *French got known by Mary

In addition, some degree of affectedness is necessarily present in get-passives, but not in be-passives. Thus, (9) is acceptable only if something happens to John as a consequence of Mary seeing him. This requirement is absent in (10).

(9) John got seen by Mary

(10) John has been seen by Mary

Interestingly, the resemblance between passives and the causative construction FP shows up again, as affectedness also plays a role in the formation of the FP construction, according to Guasti (1996b): only verbs with affected objects can be employed in this construction. As shown in (11a), the object of vincere 'win' (i.e. il premio 'the prize') is not affected, and this verb cannot appear in the FP construction. Stative verbs are likewise incompatible with the FP construction for the same reason, as seen in (6b) above and in (11b):

(11) a. *Maria fece vincere il premio da Paolo
        Maria made win the prize by Paolo
        'Maria made Paolo win the prize.'
     b. *Maria fece conoscere sua madre da Paolo
        Maria made know her mother by Paolo
        'Maria made Paolo know her mother.'

Based on these similarities and on what we know from the acquisition of passives, we might conjecture that the FP construction will be acquired earlier than FI, because FP is more similar to the get-passive. Future research may establish whether this prediction holds.

10.3  The Usage-based Account

Having discussed three main generalizations about the acquisition of passives, we are now ready to discuss some of the proposed explanations and further findings. One account for children's delay in passive acquisition stresses the role of the input, and

the idea that children learn passives, and indeed all aspects of language, on an item-by-item basis. Gordon and Chafetz (1990) analyzed child-directed speech and found that adults more frequently use passives based on actional than on non-actional verbs (93 percent versus 7 percent). In experimental work they likewise found that 3- and 5-year-old English learners comprehend actional better than non-actional passives, in line with what we discussed in the previous section. They also found that children's performance at test and at retest was very consistent; that is, at retest children still comprehended better those passives that they had already understood at test. They proposed that this consistency is due to children possessing an item-specific representation of passives.3 Results from studies using modeling or syntactic priming paint a more intricate picture. Whitehurst et al. (1974) used actional passives or actives to describe pictures to children. Then, they asked the children to describe other pictures. The results indicated that modeling passives to 4–5-year-old learners had a positive impact on both the production and the comprehension of passives, when compared with a control group who had not been exposed to any model at all. In a syntactic priming study, Savage et al. (2003) examined the production of actional passives in 3-, 4-, and 6-year-old children following an active or a passive prime. First, children were presented with a picture that was described with an active or passive sentence. The sentences included the verb and either two pronouns (high overlap condition: it pushed it or it got pushed by it) or two NPs (low overlap condition: the digger pushed the bricks or the bricks were pushed by the digger). Then, they were shown another picture and asked to describe it. Verbs used in the priming sentences and target sentences were different.
Results showed that 6-year-old children produced more passives when primed with passive sentences than when primed with active ones, regardless of whether the primes included pronouns or NPs. By contrast, 3- and 4-year-olds showed a priming effect only when the sentences contained two pronouns (and not when they included two NPs). From this, Savage et al. (2003) concluded that 6-year-olds, like adults, have an abstract representation of passives, independent of specific lexical items; by contrast, 3- and 4-year-olds have a representation that is, at least in part, item-specific. Thus, where Whitehurst et al. (1974) found a positive effect of modeling at age 4–5, Savage et al. (2003) seem to find a generalized effect only at 6 years (see also Savage et al. 2006). Moreover, while the studies reviewed here showed good performance on actional passives before the age of 6 (at 5 years they already understood 90 percent of LPs; see Figure 10.1), Savage et al. (2003) find evidence for generalization only at the age of 6 years. These contrasting findings raise doubts about the conclusion that children have an item-specific knowledge of passives at 3–4 years, especially because these children are indeed primed to produce passives, at least when pronouns are used. As pointed out in Messenger (2009), Savage et al.'s (2003) findings are compatible with the idea that children at this age already have an abstract syntactic representation of passives, but this is evident only when the processing effort is reduced, as when pronouns, rather than NPs, are employed in the prime. It is important to call attention to the use of different verbs in the priming and target sentences, as this indicates that children were not merely mimicking what they heard, but rather using the underlying structure of what they heard to generate a new, structurally similar sentence. Thus, although input must matter, it is unclear exactly in what way and how this should be interpreted. In fact, we could equally well reason as follows: children do indeed produce passives and, by hearing them in syntactic priming experiments, they are boosted in their use, despite the limited input they have received. In the same vein, item-based effects are expected on all accounts, as children need to learn specific properties of classes of verbs (e.g. whether these verbs participate in the passive construction). But this does not entail that children's knowledge is limited to specific items. In syntactic priming experiments, children produce passives more frequently, because the structure is available to them. As a matter of fact, on all accounts, by the age of 4–5 years children have some abstract knowledge of the passive. To better appreciate the challenges facing a frequency-based account, we will briefly discuss a piece of data presented in Hirsch and Hartman (2006). These authors reasoned as follows. If passives are difficult for children because they are infrequent in the input (especially passives based on psych verbs), it is expected that other infrequent constructions will also be difficult. Hirsch and Hartman examined the input received by Adam and Sarah (Brown 1973, available from CHILDES), and collapsed the data from the two corpora. They found that these children heard 16 object who-questions with actional verbs, 11 object who-questions with psych verbs, 7 psych passives, and 76 actional passives. A frequency-based account would thus predict difficulties with object questions, much as it does for passives. Instead, Hirsch and Hartman (2006) found that English-speaking children with a mean age of 4;1 (age range 3;1 to 5;8) comprehended subject and object wh-questions very well, with no difference as a function of the verb class employed. Moreover, even the youngest group of children had a high level of accuracy (around 90 percent accurate in all conditions).4 This is surprising given the rarity of object questions in the input. Although children may have difficulties with passives, the low number of passives they hear cannot be the cause. In fact, the low input frequency may be telling us that passives are hard to produce even for adults, who certainly have an abstract knowledge of the passive. From this perspective the interesting question is what makes passives difficult for adults (and what makes them even more so for children).

3 Vasilyeva et al. (2006) found that children who heard stories including a high number of passives subsequently used passives more frequently than children who heard the same stories told in the active. They also comprehended passives better than their peers and better when compared to their performance before the exposure to the stories with passives.

4  I should point out that object questions are more difficult than subject questions in a variety of languages, English included, but the difficulty is not evident at the age investigated by Hirsch and Hartman (2006). For a wider discussion, see Guasti et al. (2012).

188   MARIA TERESA Guasti

10.4  The A-chain Deficit Hypothesis

Another proposal that synthesizes the results from children's production and comprehension of passives was Borer and Wexler's (1987) A-chain Deficit Hypothesis (ACDH). This hypothesis holds that children do not succeed with passives because they are unable to form an A(rgument)-chain between the original position of the internal argument and the subject position to which it moves, as in (12).

(12) Horace_i was scratched t_i (by Aladdin)

As a consequence of this failure, children cannot correctly assign the thematic role to the displaced object, Horace in (12). The ability to form an A-chain and thus a verbal passive is biologically determined, and matures around 7 years. Of course, children younger than 7 do not fail at passives entirely, as actional passives are possible for younger children. Borer and Wexler (1987) proposed that this comes about because children parse what for adults are verbal passives as adjectival passives. The latter differ from the former in that they do not involve the critical formation of an A-chain. While in English the verbal and adjectival passives are homophonous (e.g. The door was closed), in other languages this distinction is morphologically expressed, as in German and Hebrew (Berman and Sagi 1981). In these languages, adjectival passives are employed from age 3, while verbal passives are only used from around age 9, thus supporting Borer and Wexler's (1987) hypothesis (Mills 1985; but see also Aschermann et al. 2004 for evidence showing that at the age of 5 years German-speaking children understand actional verbal passives; see also Abbot-Smith and Behrens 2006). Furthermore, evidence from Japanese (Sugisaki 1999) also suggests that children have difficulties with passives that involve A-chains, but not with those passives in which the object does not have to move. Whether these latter passives are adjectival is not discussed.
While earlier evidence seemed to support the ACDH, more recent evidence offers a more complex picture. An important ingredient of the ACDH is that passives based on actional verbs tend to make better adjectives (e.g. the combed doll) than do passives based on psych verbs (e.g. ?the seen doll). Thus, the former, but not the latter are amenable to an adjectival parse. English-​speaking children exploit this escape hatch to understand passives. However, this escape hatch is not available in languages like Greek, where verbal and adjectival passives are not homophonous. Thus, it is predicted that in these languages children will be weak in comprehending not only passives based on psych verbs but also passives based on actional verbs. Terzi and Wexler (2002) tested this prediction for Greek and found that 5-​year-​old Greek-​speaking children do poorly in the comprehension of verbal passives based on both actional and psychological verbs. According to the ACDH, this is due to the lack of the adjectival strategy to accommodate verbal passives. Greek-​speaking children did not have trouble with true adjectival passives. In a subsequent study, however, Driva and Terzi (2007) used a different design and different verbs, and found much better performance on actional passives (85 percent against 44 percent in Terzi and Wexler at the age of 5 years), with no difference

between SP and LP and, more importantly, with no difference with respect to adjectival passives. Poor performance was still found in passives based on psych verbs regardless of the presence or absence of the by-phrase. These newer findings make the picture somewhat more complex, as children do not seem to have problems with verbal passives based on actional verbs even in Greek, where the only analysis that can accommodate verbal passives is one that involves A-movement. These findings also confirm that passives based on psych verbs are more problematic, in line with other studies that we have already mentioned, and they show that the presence of the by-phrase with actional passives does not matter, as certain other studies have found. Of course these findings threaten the ACDH in the form we are discussing in this section. At this point, we would expect Hebrew- and German-speaking children to behave similarly to their Greek peers on verbal passives. For German, contrasting evidence exists in the literature. While the earliest studies found that German children did not produce verbal passives, more recent findings of Aschermann et al. (2004) show that 5-year-old German-speaking children do understand verbal passives. These were by all means verbal passives, as they employed the auxiliary werden 'become' (and not sein 'be'), a clear sign of verbal passives, as well as a by-phrase. Additional evidence in favor of the view that early actional passives are indeed verbal comes from Italian. In addition to 'be,' in Italian it is possible to use the auxiliary venire 'come,' which is similar to German werden. Manetti (2012) and Volpato et al. (2015) show that from age 4;6 Italian-speaking children understand passives based on actional verbs equally well with both auxiliaries. For passives with venire an adjectival parse is not available. Let us come back to the ACDH.
Under this hypothesis, the reason that SP are easier than LP is that SP are adjectival passives, and as such they often resist the addition of a by-phrase. For example, the door was closed on an adjectival reading means that the door was in a closed state. If we add a by-phrase it loses the adjectival reading and acquires a verbal passive meaning: there was an event in which the person named in the by-phrase closed the door. Finally, in some of the studies mentioned earlier it was found that passives produced by children tended to have a stative, rather than eventive, meaning. Under the ACDH this follows from the fact that only the stative interpretation is compatible with an adjectival passive. As we will see later, the acquisitional finding is somewhat overstated, and is likely to have been an artefact of using an experimental task in which children were invited to describe the result of what had happened.

10.5  From ACDH to Universal Phase Requirement (UPR)

Over the years the ACDH has been modified to accommodate new findings or new theoretical innovations, but the core idea has remained unchanged. Wexler (2004)

frames the idea in minimalist terms, by suggesting that the vP of passives is a phase in child grammar, but not in adult grammar (following in this respect Chomsky 2001), a hypothesis known as the Universal Phase Requirement hypothesis (UPRH). As the vP of passives is a phase, by Chomsky's (2001) Phase Impenetrability Condition (PIC) only material in Spec vP is visible to elements in the next higher phase. The logical object is not in Spec vP, but in the complement position of V, and as such it is not visible outside the vP. Thus, the T of passives in child grammar, but not in adult grammar, fails to check its phi-features with the logical object, as the latter is too deeply embedded. Hence, passives are ruled ungrammatical in child grammar. This proposal is schematized in (13).

(13) Verbal passives: [T [vP Spec [VP V DP-obj]]]
     vP is a phase in child grammar
     vP is not a phase in adult grammar

To account for the fact that passives based on actional verbs are apparently possible in child grammar, Wexler adopts Embick's (2004) analysis, in which adjectival passives have two readings: a stative reading (e.g. The door is opened) and a resultative reading, as in The tank is being filled, whereby the tank, as a result of an event of filling, has entered the "target state" of being full. In resultative passives, unlike verbal passives, it is proposed that the logical object is directly merged in Spec vP, and is thus accessible to movement, as in (14).

(14) Resultative passives: [T [vP Spec DP-obj [VP V]]]

Thus, immature children cannot form verbal passives (as passive vP is a phase in their grammar), but they can analyze actional passives as resultative passives. As these include an event, children can give an eventive interpretation to their actional passives. An advantage of the UPRH (coupled with the resultative view) over the ACDH is that it can accommodate the fact that children raise the subject from Spec vP to Spec TP/AgrSP in active sentences, thus forming an A-chain, as in (15).

(15) Active sentences: [Spec DP-subj_k T [vP Spec t_k [VP V DP-obj]]]

The subject of an active sentence is in the highest specifier of the vP phase and visible from outside: it can check the T features and move to Spec TP. The UPRH can also explain the fact that get-passives are easy, once we make the assumption that the logical object is merged in Spec vP (updating Haegeman's 1985 view; see also Guasti 1993) and thus visible from outside. Thus get-passives would be a sort of resultative passive, a conclusion that straightforwardly accounts for the eventive interpretation of get-passives, as seen in (16):

(16) The cow gonna get locked up (Adam, 4;0)

That children can assign to passives an eventive interpretation is further indicated by Dutch, where 3-year-old children produce passives of unergative intransitive verbs (as adults do) that do not refer to states, but to events (from Verrips 1996):

(17) a. Hier wordt gespuilt (2;6)
        here become sprayed
     b. D'r moet gespele worden (Luuk, 3;0)
        there must played become

The passives in (17) are not resultative passives but verbal passives, as indicated by the presence of the auxiliary worden. Notice, incidentally, that this piece of data indicates once more that children can also form verbal passives, as we have already extensively discussed. By itself, this result is not a problem for the UPRH or ACDH, as in (17) we have impersonal structures, where there is no logical object that needs to move out of the phase. However, the findings from Greek and Italian that were discussed in section 10.4 concerning verbal passives based on actional verbs are also problematic for the UPRH. Thus, evidence has accumulated showing that children are not treating passives as simple (non-resultative) adjectival passives, and the resultative analysis has opened new avenues for investigation. One consequence of this view is that children's comprehension of passives is modulated by the event structure of the passivized verb. Thus, children are expected to be better at comprehending passives that include a target/result state in their event structure than at comprehending those that lack one, a prediction that is supported by a recent comprehension study on Korean passives (Lee and Lee 2008). How far this analysis can be pushed and how it can handle the facts on verbal passives in languages like Greek and Italian is something that future research needs to explore.

10.6  Problems for the ACDH/UPR and Further Data

Although the ACDH or the UPRH fares reasonably well with most of the findings about passives, there are unsolved problems. One has been described in the preceding sections: for certain languages other than English, there is recent evidence that passives based on actional verbs are or can be full-fledged verbal passives, where no alternative parse is available. Other problems are discussed in this section.

10.6.1 Passive Participles are not Adjectives

In Dutch, passive participles, but not adjectives, can undergo V-raising and appear to the right of the auxiliary, as shown by the contrast in (18).

(18) a. Dat Jip gewassen wordt
        that Jip washed become
     b. Dat Jip wordt gewassen (after V-raising)
        that Jip become washed
     c. dat Jip rood wordt
        that Jip red become
     d. *dat Jip wordt rood
        that Jip become red

In an imitation experiment, Verrips (1996) showed that 5-year-old children who accept V-raising with active verbs also do so with passive participles (18b). This suggests that children do not treat passive participles as adjectives, and this threatens the adjectival analysis of early passives. The resultative analysis is of no help here, as the presence of the auxiliary wordt indicates that we are dealing with verbal passives. To clarify the implications of this result, it will be important to investigate whether children not only accept, but also produce, V-raising with passive participles.

10.6.2 Pattern of BE Omission

Another potential problem for the resultative analysis is presented by Caprin and Guasti (2006). These authors show that Italian-speaking children aged 2;0–3;6 omit the copula BE less than the homophonous passive auxiliary BE. In ambiguous contexts where a word like chiuso 'closed' can be either an adjective (il tavolo chiuso, lit. 'the table closed,' meaning 'the closed table') or a passive participle ('the table that has been closed'), omission of BE patterns with omission of the passive auxiliary BE and not with omission of the copula. This suggests that children prefer to treat the ambiguous word chiuso ('closed') not as an adjective but as a passive participle, and that what is omitted is the passive auxiliary BE. If so, ambiguous contexts cannot be instances of adjectival passives. The resultative analysis is not an option, because a resultative passive is an instance of an adjectival passive and as such requires the copula BE, not the auxiliary BE.

10.6.3 Unaccusative Verbs

Another problem is presented by clauses with unaccusative verbs. If the vP of such clauses is a phase, as Wexler assumes, then the internal argument should not be able to move out. Children are therefore expected to make errors with unaccusatives. Here the evidence is controversial. Chinese-speaking children comprehend sentences with unaccusative verbs from age 2, while they still do not comprehend passives at 5 years (Haiyun and Chunyan 2009). In addition, children speaking various languages produce sentences with unaccusative verbs. In English, the few cases where the child's surface subject follows the verb are sentences with unaccusative verbs (Guasti 2002), as shown in (19).

(19) come car     (Eve, 1;6)
     fall pants   (Nina, 1;11)

Pierce (1989, 1992a) further found that 75 percent of the sentences with VS order produced by French-​speaking children involved unaccusative verbs. Thus, unaccusatives are clearly produced, but in many cases, the subject is post-​verbal. This could be expected under the UPRH, as in this analysis the internal argument cannot move. However, an agreement relation still needs to be established between AgrS/​T and the internal argument. As this relation would violate the PIC, sentences with unaccusatives would still represent a problem for the UPRH. One solution could be to say that unaccusative sentences are parsed as unergative, that is, that the sole argument of the unaccusative verb is not merged as a sister of V in the VP, but in Spec vP, as an external argument. This is indeed the move adopted by Babyonyshev et al. (2001) to account for some difficulties children experience with unaccusative structures in Russian. Other difficulties are observed by Miyamoto et al. (1999) in Korean. Based on the speech of one child, they argue that the internal argument of unaccusatives fails to receive nominative case, as expected if it is unable to move. Other data, however, are problematic for the unergative parse hypothesis. Lorusso et al. (2005) found that 2;6–​3;0-​year-​old Italian-​speaking children clearly distinguished between unaccusative and transitive/​unergative verbs. First, subject omission, which is licit in Italian, was more frequent in clauses containing a transitive or unergative verb (87 percent) than it was for clauses containing an unaccusative verb (72 percent). Second, overt subjects were mostly preverbal in the former case (73 percent), while they were half preverbal and half post-​verbal in the latter case. The same distribution of overt subjects has been found in Catalan (Bel 2003), Hebrew (Friedmann 2007) and European Portuguese (Friedmann and Costa 2011). 
If sentences with unaccusative verbs were actually parsed as though they contained an unergative, it is unclear why these distinctions would have been found. The findings imply that forming an A-chain, or raising the logical object out of vP, is not a problem when unaccusative verbs are involved. But why should unaccusatives differ from passives, if in both cases the vP is a phase? The only viable solution within the UPR framework would be to say that the vP of unaccusatives is not a phase in child grammar, just as in adult grammar, while the vP of passives is a phase in child grammar. One implementation of this idea is found in Haiyun and Chunyan (2009), who suggest that the vP headed by unaccusative verbs is not a phase because there is no external argument, while the vP of passives is a phase in child grammar because the external argument is projected. Another path to pursue is to investigate whether children do equally well with all kinds of unaccusative verbs (see Sorace 1995 for discussion of an unaccusativity hierarchy).

10.6.4 Passives and Post-verbal Subjects

Pierce (1992a) tested comprehension of periphrastic passives in Spanish-speaking children with a picture-sentence matching task. In this language, the logical object of a

passive sentence can either move to the preverbal position or stay in the base position after the verb (as in Italian or Catalan), as shown in (20):

(20) a. Este libro fue escrito en Mexico
        'This book was written in Mexico.'
     b. Fue escrito este libro en Mexico
        'This book was written in Mexico.'

While the representation of (20a) requires an A-chain, the one in (20b) does not, and thus (20b) would be expected to be easier for children to understand than (20a). Yet, the results were not as expected. Children found (20a) easier than (20b), though still difficult. While these findings are incompatible with the ACDH, given that the logical object does not form an A-chain in (20b), and thus (20b) should be easier than (20a), they are less problematic for the UPRH (see also section 10.6.5). On the latter approach, the reason for children's difficulty with (20b) might be that an agreement relation needs to be established between AgrS/T and the logical object inside the VP. This relation cannot be established in child grammar, as vP is a phase. A proponent of the UPRH would still need to account for the asymmetry between (20b) and (20a), however, and it is not immediately clear how to do this.

10.6.5 Passive and Voice Systems

Other challenging results come from studies of non-Indo-European languages, where children produce passives from very early on. Lau (2011) showed that Cantonese-speaking children exhibit good comprehension of the bei-passive. This was so even though the frequency of this construction was very low in the input. Although the bei-construction is related to passives, it is closer to an English get-passive than a be-passive, as noticed in Orfitelli (referring to Hoshi 1994 and Huang et al. 2009). We already know that get-passives are not problematic for children. In addition, on some analyses the bei-construction is derived by A′-movement of an empty operator within the embedded clause that is controlled by the matrix subject (see Huang et al. 2009), and in this respect its earlier use has little impact on the acquisition of English-type passives. Allen and Crago (1996), based on the spontaneous speech of four Inuktitut-speaking children, and Demuth (1990), based on the speech of Sesotho-speaking children, showed that verbal passives are produced from the age of 3 years. Moreover, Demuth et al. (2010) showed that 3-year-old Sesotho-speaking children understand LP at a higher-than-chance level.5 One line of explanation for these findings is based on frequency. In both languages, passives are used much more frequently than in English. For example, in Sesotho passives are very commonly used to form subject questions. As we have already

5  Crawford (2009, 2012) found results in striking contrast with Demuth et al. (2010). Sesotho-speaking children displayed poor comprehension of passives up to 6 years. Orfitelli (2012) notes that Demuth et al. used three characters, as in O'Brien et al. (2006), while Crawford used only two, and that this may have rendered the context more felicitous for the use of passives in Demuth et al.

seen, however, frequency-based explanations are not satisfactory. If passives are more frequent in these languages, first and foremost we should ask why this is so in adult speech. Since it is hardly believable that speakers of these languages are more talented than English speakers at producing difficult structures, it must be that what is called passive has slightly different properties than what is called passive in English, or that there are other properties of the language that "facilitate" passives, as suggested by Allen and Crago (1996). One property holding in Inuktitut is that passives are morphologically and not periphrastically expressed. In addition, the use of passives decreases the number of inflections on the verb compared to an active transitive verb, which must show agreement with both the subject and the object. Thus, passive verbs are morphologically simpler than active verbs. Moreover, Inuktitut is an ergative language, in which the object typically moves to get case, even in active sentences. Thus, movement in passives would somehow be an extension of this possibility. A similar suggestion is offered in Aschermann et al. (2004), who found that German-speaking children were one year ahead of English-speaking children in understanding passives. They claim that this is due to the fact that German-speaking children exploit the presence in their language of topicalization of objects to Spec CP, due to the V2 nature of German. Although this idea is appealing, it should be pointed out that topicalization is an instance of A′-movement, while in the passive the internal object is A-moved. Some elaboration is needed to explain how A′-movement could affect A-movement. However, this line of thinking, according to which some properties of the language may "facilitate" passives, is worth pursuing.
In this connection, however, it may be worth investigating more deeply what is behind the use of a certain voice in the various languages. In Malagasy (and in other Austronesian languages), verbs bear specific voices that indicate which DP argument is the "trigger" or surface subject, that is, the referent on which the rest of the sentence is predicated. In (21a), the trigger is the actor, as indicated by the AT voice on the verb (AT stands for Actor-Topic); in (21b), it is the internal argument or the theme and the verb bears the TT (Theme-Topic) voice, while in (21c) it is the goal or the location, and the verb bears CT (Circumstantial-Topic) voice (Pearson 2005).

(21) a. Mamono   ny  akoho    amin'ny  antsy  ny  mpamboly
        AT.kill  det chicken  with-det knife  det farmer
        'The farmer is killing chickens with the knife.'
     b. Vonoin'  ny  mpamboly amin'ny  antsy  ny  akoho
        TT.kill  det farmer   with-det knife  det chicken
        'The chickens, the farmer is killing (them) with the knife.'
     c. Amonoan' ny  mpamboly ny  akoho    ny  antsy
        CT.kill  det farmer   det chicken  det knife
        'The knife, the farmer is killing the chickens (with it).'

The trigger or surface subject invariably moves to the final position of the clause. Whether this is an instance of A or A′-movement is a point of debate (see Pearson

2006 for discussion) and one that data from language acquisition may help to disentangle. Some preliminary data are offered by Hyams et al. (2006). These authors found that Malagasy-speaking children already employed the AT and TT voices between 2 and 3 years. Adhering to the premise that passive is problematic for children at this age, they argued that the TT voice should not be assimilated to passive, and that movement of the trigger to the sentence-final position is not an instance of A-movement, but A′-movement. Following Pearson (2005), they took it to be Topic movement. Whether what is called passive in Inuktitut or Sesotho is likewise not really a passive of the English type, and whether movement of the logical object in these languages is likewise not an instance of A-movement, are open questions. We should point out that Hyams et al. (2006) do not discuss in depth the position of the trigger in the sentences produced by children. It is possible that children use the TT voice, but do not move the theme to the sentence-final position. This would be a crucial piece of data for better understanding the role of the various voices in Malagasy. Also, Hyams et al.'s proposal is based on the premise that children have difficulties with passives. If we rejected this premise and adopted another one, whereby children in Malagasy do not have a problem in A-moving the internal argument, we would have to find a way to distinguish English from Malagasy that makes A-movement difficult in the former, but not in the latter case. Possibly some light on the avenue to follow may be shed by Standard Indonesian. In this language, both the TT voice (with subject promotion of the theme argument) and a morphological passive with a by-phrase are available (Cole et al. 2005). In the former case, the agent remains an argument, while in the latter it is demoted to adjunct status.
This is an ideal situation, as, under the UPRH and under Hyams et al.’s (2006) view on Malagasy, we expect Indonesian children to have difficulties with passive sentences, but not with sentences in the TT voice. We leave this as a direction for future research. Finally, the data from Malagasy, as well as Inuktitut and Sesotho, clearly show that passive is just a label. More data from languages with richer voice systems might help in discovering the various processes that are involved in those constructions that we label passive, and in establishing which processes are more problematic during acquisition.

10.6.6 Middle Voice

There have also been a few studies on the middle voice. Pierce (1992a) tested the acquisition of the middle voice, or more precisely of the Spanish medio-passive SE construction, as exemplified in (22), with a semi-imitation elicited production task. She found that children were more likely to produce sentences with the logical object in post-verbal position, as in (22b), than preverbal, as in (22a), in line with the ACDH (since no A-chain needs to be formed in (22b)).

(22) a. Este libro se escribió en Mexico
        this book SE wrote-3sg in Mexico
     b. Se escribió este libro en Mexico
        SE wrote-3sg this book in Mexico
        'This book was written in Mexico.'

This result is problematic for the UPRH. In (22b) the logical object is in post-verbal position, and in general in this kind of construction it determines agreement on the verb, as can be seen in (23), where a plural NP is used:

(23) Se escribieron muchos libros en Mexico
     SE wrote-3pl many books in Mexico
     'Many books were written in Mexico.'

The UPRH incorrectly predicts that (22b) will not be understood by children, because in (22b) an agreement relation has to be set up between T/AgrS and the object in its base position, across the vP phase. Pierce's (1992a) experiment did not include examples like (23). Therefore one could claim that in (22b) we have an instance of default agreement (third person singular), and that this does not require the establishment of an agreement relation (see also section 10.6.3). Thus, in future work it will be important to test the middle voice varying the number agreement on the verb. Another study looked at reflexive verbs in French and Italian (Snyder et al. 1995), which according to some analyses are unaccusative verbs with the internal argument moved to the preverbal position, as seen in (2), repeated below.

(24) Gianni si è lavato
     John SI is washed
     'John washed himself.'

The unaccusativity analysis is supported by the fact that in Italian and French the auxiliary BE is used in these sentences, as also happens with passives and unaccusatives in these languages. Snyder et al. (1995) found that 2–3-year-old children select the correct auxiliary with reflexive verbs and virtually no errors are produced. They also showed that this could not be due to lexical learning, as the same verb was used both with reflexive and non-reflexive clitics, which are quite similar.
Note that (24) forms a minimal pair with (25):

(25) Gianni lo ha lavato
     John him has washed
     'John washed him.'

The absence of errors in auxiliary selection has also been reported for Italian lexically unaccusative verbs (Snyder et al. 1995; Guasti 2002). Interestingly, there is also some evidence indicating that German-​speaking children select the correct auxiliary even with

nonsense verbs (Randall et al. 1994), indicating that selection of the auxiliary is not simply item-based. All in all, these findings indicate that children are not misrepresenting sentences with reflexive verbs, and that they are able to move the internal argument to the preverbal position in sentences including both reflexives and lexical unaccusatives. This is certainly a problem for the ACDH, but could be handled under the UPRH, if the assumption is made that the vP of reflexives is not a phase, as we also suggested for unaccusative verbs earlier in this section. However, this solution would render the UPRH somewhat ad hoc, as only in passives would the vP be a phase.

10.7  The Transmission of the Thematic Role

Passives are a problem for children up to 7 years, but the data reviewed in section 10.6 cast doubt on the hypothesis that moving the logical object is the core problem, because in other structures, moving the logical object seems unproblematic.6 An alternative proposal is that the source of difficulty is the transmission of the theta-role to the by-phrase, a proposal put forth by Fox and Grodzinsky (1998). These authors proposed that children under the age of 7 are unable to transfer the thematic role of the external argument to the NP in the by-phrase, and thus cannot properly process passives, the inability being due to maturational or processing limitations. This account squares well with the finding that children are better at SP than at LP, as the latter but not the former require transmission of the theta-role to the by-phrase. In order to account for the earlier comprehension of LP based on actional verbs, the assumption is made that in actional passives the preposition by can assign a default theta-role of Agent. This assumption also explains why get-passives are easier, as the NP in the by-phrase is agentive and thus can be assigned its theta-role through the default mechanism. This proposal is consistent with all the results showing that unaccusative and reflexive verbs are unproblematic, as there is no by-phrase in these cases. For the same reason, it is also in agreement with findings showing that children have no problem in understanding impersonal passives.

6  Another structure that involves A-movement is raising, as in (i). Evidence on the acquisition of this structure is also mixed, with some studies showing that children have problems with it, especially when the experiencer "to Mary" is present (Hirsch et al. 2007; Orfitelli 2012). Children do not have problems with raising structures that do not involve movement, as in (ii). Other authors contest this conclusion, suggesting that children do understand raising structures (Becker 2006, 2007; see also Becker and Kirby this volume).

(i) John_i seems (to Mary) t_i to have won the prize
(ii) It seems (to Mary) that John won the prize.

Voice Alternations (Active, Passive, Middle)    199

Although this proposal is very simple and attractive, it does not appear to be sound. First, it is at odds with Hirsch and Wexler's (2006) findings that psychological short passives are problematic for children. Fox and Grodzinsky (1998) found that 8 of the 13 children in their study comprehended all passives well (long and short actional passives, and short psych passives) except for psych LP. Yet they also found two children who failed to comprehend short, as well as long, psych passives. Thus, the claim that children comprehend short passives based on psych verbs awaits further confirmation. Second, Hirsch and Wexler examined children's comprehension of the agentive by-phrase in nominals, as in (26a), and contrasted this with sentences involving the preposition about, as in (26b):

(26) a. The story by Minnie had a mountain in it
     b. The story about Minnie had a mountain in it

They found that children had no problem in understanding (26b) from age 3, but were still poor at comprehending (26a) at the age of 5 years. Children did not understand that the by-phrase in nominals reflects an agent/creator interpretation, and took it to denote the theme (e.g. 'book about Minnie') of the noun to which it was adjoined. If children do not understand that by can assign an Agent theta-role in nominals, it is unclear how they could use this mechanism in passives. In this case again, causative constructions become relevant, as in the FP construction the NP in the by-phrase is assigned a default Agent theta-role by the preposition, according to Guasti (1996b). If children do not understand the by-phrase, we expect them not to understand the FP construction, a prediction for future investigation.

10.8  The Canonical Alignment Hypothesis

A third line of explanation is a development of the ACDH, and holds that the child's difficulty is limited to those A-chains that create a misalignment of thematic and grammatical hierarchies, as usually happens in passives, where the theme is mapped to the surface subject position and the agent is demoted to adjunct status. This hypothesis, known as the Canonical Alignment Hypothesis (CAH), was proposed by Hyams et al. (2006) in their paper about Malagasy, discussed in section 10.6.5. Like the ACDH, it is a maturational hypothesis in which children adhere to an especially rigid version of the linking rules from thematic roles to grammatical functions. According to the CAH, children are unable to form A-chains that result in a misalignment of thematic and grammatical hierarchies. Agents are typically mapped into the subject function; if there is no agent in the thematic grid of the verb, the experiencer is mapped into the subject function, and if there is no experiencer, the theme will be the subject. Thus, children obeying the CAH map the external argument (agent, experiencer), if there is one, onto the subject function. If there is no external argument, as

200   Maria Teresa Guasti

with unaccusative verbs, they will map the sole argument, the theme, into the subject function; this does not create a misalignment of the thematic and grammatical hierarchies, as there is a single argument. Thus, unaccusative structures are correctly predicted to be unproblematic. Similarly, adjectival passives are expected to be unproblematic, as in most cases they include only one argument (the exception being cases like The food is untouched by human hands).7 The same holds true for get-passives, if we maintain that the complement of get is a small clause or a VP without an external argument, and if we set aside those cases in which the get-passive includes a by-phrase (as in The car got hit by a train). In this case, there will only be one argument, the theme, which raises to Spec TP/AgrSP (see Haegeman 1985 for this proposal).8 Under the CAH, reflexive and impersonal SI constructions are all predicted to be understood by children before verbal passives. However, the CAH cannot explain why actional passives are understood more accurately than psychological passives at an earlier point of development. In fact, the A-chains of both actional and psychological passives involve a misalignment of the thematic and grammatical hierarchies, and thus both should fail to be understood. This is a point that research following the CAH needs to address. Hyams et al. (2006) state that children will have no trouble with active verbs, as typically the mapping from theta roles to grammatical functions respects the CAH. There is one case, however, where this does not happen, namely with object-experiencer verbs, like scare, irritate, amuse, surprise, which, unlike subject-experiencer verbs, have not been the topic of much research so far.9 With these verbs, in the active, the theme is mapped into the subject function and the experiencer into the object one, as shown in (27).
(27) Homer scares Marge
     Bunny amused Minnie

As experiencers are generally mapped into the subject function (when there is no Agent), the sentences in (27) are instances of misalignment of the thematic and grammatical hierarchies. Here Homer and Bunny, the themes, are in subject position. Thus, these active sentences are predicted to be problematic for children by the CAH. By

7  Hyams et al. (2006) claim that the CAH predicts that raising structures will be unproblematic for children, as the argument that raises is the sole argument. This is correct, but the CAH also predicts that a raising structure with a PP experiencer, as in (i), will be problematic, as the raised argument is a theme that fills the subject position and the PP is an experiencer. As we said in the previous footnote, the evidence for raising structures is controversial, but for sentences with the PP experiencer the only data available show poor comprehension by children (Hirsch et al. 2006).

(i) John_i seems (to Mary) t_i to have won the prize.

8  This proposal has some similarities to that of Maratsos et al. (1985), who argued that children do not acquire passives late, but restrict the use of these structures to certain classes of verbs, initially actional verbs, which are precisely those verbs whose external argument, an agent, is "mapped canonically," in their terms.
9  These verbs have both a stative reading and an agentive reading (cf. John scared Mary on purpose). Here it is the stative reading that is relevant.

contrast, their passive versions, in (28), should be well understood, as the experiencers are mapped into the subject function and the mapping between theta roles and grammatical functions is aligned.

(28) Marge is scared by Homer
     Minnie is amused by Bunny

These predictions are partially fulfilled. Messenger (2009), using a sentence–picture matching task, found that 3–4-year-old children are as accurate with passives of object-experiencer verbs (69 percent) as with passives of actional verbs (77 percent).10 Similarly, in unpublished research by Hirsch et al. (2006), it was found that the passive of scare is already mastered (100 percent) at age 3 years, before mastery of actional passives. These authors also found that children had some problems with active sentences involving scare (75 percent correct), although this finding was not replicated in the work of Messenger (2009), who reports equally good comprehension of active sentences based on actional verbs (83 percent) and of active sentences based on object-experiencer verbs (77 percent) (2009: ch. 4, experiment 4). However, she reports that adults perform slightly better on passives than on actives of object-experiencer verbs. In addition, Manetti (2012), based on Italian, found no difference in the comprehension of actives and passives of object-experiencer verbs at 4;6 (around 75 percent correct comprehension). We have to point out that the studies of Messenger and of Manetti, on the one hand, and of Hirsch et al. (2006), on the other, are not directly comparable, as different methodologies were used. Thus, the predicted asymmetry between passive and active of object-experiencer verbs calls for further investigation, in light of the various theoretical accounts and the possible cross-linguistic variation. Incidentally, notice that an asymmetry between the active and passive of object-experiencer verbs is expected under the UPRH.
According to Belletti and Rizzi (1988), object-experiencer verbs in Italian are unaccusative verbs (the preoccupare class). The theme and the experiencer are both generated inside VP or vP, as in (29), and, in the active, the theme raises from its base position to Spec,TP/AgrSP. Since, according to the UPRH, the vP of unaccusatives is a phase, this movement is illegitimate in child grammar, and this would explain why active sentences based on object-experiencer verbs are difficult, according to Hirsch et al.'s (2006) results:

(29) [TP/AgrSP Homer [vP [VP worries] Marge]]

The passive counterpart of (29) is expected to be unproblematic under the UPRH. According to Belletti and Rizzi (1988), object-experiencer verbs form adjectival passives, and we have learned that adjectival passives are not problematic for children. Pesetsky (1995), however, has challenged the conclusion that all object-experiencer verbs form adjectival passives and shown that a fair number indeed form verbal passives, at least in English. This conclusion is clearly a challenge for the UPRH.

10  Object-experiencer verbs included annoy, frighten, scare, shock, surprise, and upset.

202   MARIA TERESA Guasti Thus, the CAH can account for the good performance of children on passives of object-​experiencer verbs, while UPRH, instead, may not account for the whole range of facts and the theta-​transmission proposal also has problems in this respect. In John is scared by Mary, which is likely a verbal passive on Pesetsky’s (1995) view, the by-​phrase is incompatible with a default-​Agent interpretation, as Mary is an experiencer. Finally, it is interesting to notice that object-​experiencer verbs are also the verbs that elicit the greatest use of passive in adults (Ferreira 1994). Thus, it seems that generating passives that maintain an alignment of the thematic and grammatical hierarchies has some advantages even in adults.

10.9  The Argument Intervention Hypothesis

Another recent explanation, called the Argument Intervention Hypothesis (AIH), is offered in Orfitelli (2012), who builds on insights from unpublished research by Snyder and Hyams (2015). Following Collins (2005), Orfitelli assumes that the thematic subject is generated in a by-phrase that occupies the external argument position. Under passivization, the thematic object must move over the by-phrase to Spec IP; that is, the by-phrase intervenes between the surface subject in Spec IP and its copy in the VP. Under this approach, passivization involves a violation of strict locality. Adults can circumvent locality constraints on A-movement through some syntactic operation, while children cannot. For example, adults can smuggle, that is, move the VP including the verb and the direct object to Spec vP. In this way, the direct object is visible from outside. Children cannot avail themselves of the smuggling operation. This proposal is in line with the idea that children experience difficulty in any construction where the target grammar requires one to circumvent strict locality (see Friedmann et al. 2009; Guasti et al. 2012). However, the AIH has difficulty handling the difference between actional and non-actional verbs, unless some additional assumption is made.

10.10  Taking Stock and Future Development

We have examined one frequency-based or item-specific account that stresses the role of the input, and five grammatically based accounts that lay the emphasis on the role of syntactic or syntactic–semantic interface constraints (maturation of A-chains, of what counts as a phase, of thematic role transmission, of tolerance for non-canonical alignment, or of strategies for overcoming argument intervention). These theories have generated new expectations and new findings. We can agree on the conclusion that, generally speaking,

active sentences are easier than passive ones, but not all passives are equally hard. Whether this is evidence of an adult-like representation of passives or not is where theories diverge. For all theories but the CAH, a good early performance on actional passives is not a manifestation of adult-like knowledge of passives. However, for the grammar-based views, it is evidence for some kind of abstract linguistic knowledge, while for the frequency-based view it is merely evidence of the item-specificity of children's knowledge. All these theories create a discontinuity between child and adult grammar (although this may not hold for the thematic transmission hypothesis, if the source of difficulty is the processing burden), as the passive representations of children and adults are different (be it item-specific versus abstract, or just of a different abstract nature). The CAH, instead, is compatible with a continuity perspective between child and adult grammar, in that children are initially capable of forming A-chains that respect the canonical alignment of the thematic and grammatical hierarchies. What develops is an ability to extend A-chains to cases of misalignment.

Some technical innovations are starting to offer new results and to open new avenues for research. We have already mentioned studies using the syntactic priming method. Generally, priming effects are observed both when there is an overlap of lexical material between prime and target and when there is no such overlap. For our purposes, the latter situation is more informative. When there is overlap of lexical material, it is possible that what primes the target is the sequence of lexical items, and not the underlying representation.
In this respect, Huttenlocher et al.'s (2004) results are relevant, as they reveal priming effects on the production of actional passives in 4- and 5-year-olds, even when the lexical material in the prime and the target does not overlap. This suggests that what primes the target is not a lexically specific representation but rather an abstract, underlying syntactic representation. The results are confirmed by Shimpi et al. (2007) and Bencini and Valian (2008). These authors found syntactic priming of passives at the age of 3 years. Unlike other studies, their sentences included inanimate entities (e.g. The presents are carried by the wagon). Partially contrasting results are reported in Savage et al. (2006), who conclude that some limited kind of abstract syntactic representation of passives emerges between 4 and 5 years (see the discussion in section 10.3). Another relevant finding within this paradigm is that priming is observed both when the children are asked to repeat the prime before producing the target, and equally when they are not required to do so (Huttenlocher et al. 2004; Savage et al. 2006); this already holds for children between 3 and 4 years, according to Messenger (2009) and Messenger et al. (2011). This indicates that the priming effect does not depend on repetition of a given sentence, but only on its comprehension. That is, understanding a sentence already activates an abstract syntactic representation that may then be used in production. Thus, from the age of 3 years, children must have some kind of abstract linguistic knowledge related to passives. All the syntactic priming studies discussed so far tested actional passives. Thus, these findings are not in principle incompatible with the ACDH, the UPRH, the AIH, and the theta transmission hypothesis, all of which posit that children assign to actional passives some abstract representation, though not the adult one (or perhaps one relying on some default mechanism).

More problematic for all the theories considered here are the findings in Messenger (2009), based on English, showing that actional passives can be primed not only by actional passives but also by psychological passives (both object-experiencer and subject-experiencer), in children between 3 and 4 years. If children do not understand subject-experiencer passives, as discussed in section 10.2.1, how can these prime the production of passives? Similarly, Manetti (2012), based on Italian, found priming of passives after passive primes regardless of the semantic properties of the verbs in the prime. The assumption of the syntactic priming methodology is that by understanding a sentence, its underlying representation is activated and then re-used in subsequent production. This means that children, in understanding psychological passives, activate an abstract representation and use it to produce actional passives. Messenger also found that both be- and get-passives prime get-passives. This finding is quite interesting, as there are many reasons to believe that be- and get-passives do not have exactly the same underlying representation. For example, in be-passives the external argument is somehow present in the underlying representation, but this is not so in get-passives (Haegeman 1985).11 If this is correct, it means that priming occurs even when the underlying representations of the prime and the target are not exactly the same; it is enough that they share some relevant components, for example the A-chain connecting the surface subject to its underlying internal argument position. This conjecture is in line with another finding of Messenger's, that SP prime LP. In this case, the mere presence in the underlying representation of a null external argument primes the production of an adjunct.
Messenger did not study whether actional and psychological passives prime the production of psychological passives.12 On the basis of the results she reports, we expect that they do. But it is of interest to know whether the priming effect in this case is smaller than in the reverse case, because if it is, this means that there is still an asymmetry between actional and non-actional passives that somehow needs to be accounted for. It would also be interesting to see whether there is priming from middle voice to passives (and vice versa), as these structures also share many properties. The findings from comprehension and production indicate that the passive is not fully acquired by the age of 4 years and that adult-like competence is not attained until around 6–7 years. However, these results unequivocally show that children, by age 4 or even a bit earlier, access and use the syntax of verbal passives (see O'Brien et al. 2006; Bencini and Valian 2008; Crain et al. 2009; Messenger 2009; Manetti 2012; Volpato et al. 2015). The findings from syntactic priming (production) reveal that children already possess abstract knowledge of passives at age 3, with priming occurring between verbs belonging to different semantic classes. This clearly provides evidence that children's knowledge of passives is not lexically specific. It should also be pointed out, however,

11  In get-passives I assume the NP in the by-phrase gets an Agent/Cause role from the preposition by.

12  However, Messenger found that children sometimes produced psychological passives even if the target pictures depicted actional verbs. The verbs produced in the passive included frighten, annoy, and scare.

that children were less accurate than adults in all the studies that included adults. That is, although children exhibited some abstract knowledge of verbal passives by 3–4 years, their acquisition of passives is not completed before 6–7 years.

Data from comprehension reveal knowledge of passives between 4 and 5 years. Generally, comprehension is more advanced than production, but here we observe the reverse. Why is this so? One possibility is that the task matters, as suggested in Messenger (2009). In fact, comprehension has been tested using a sentence–picture matching task, wherein the child is presented with two pictures differing in that the two relevant characters swap roles (e.g. the cat is pulling the dog or the dog is pulling the cat). To accomplish the task, children have to visually inspect the pictures, understand what they have to pay attention to, spot the relevant difference, and choose the matching picture; all of this may not be easy. Moreover, pictures for actional verbs are easier to understand than pictures for psychological verbs. The picture representing "John is seen by Mary" may not be terribly different from the one depicting "Mary is seen by John," as one who sees is also seen, unless one of the two characters is blindfolded. In fact, Beilin (1975) found that children performed better in an act-out task than in a sentence–picture matching task. Giving a verbal description of the event depicted in the pictures, and thereby providing the child with relevant vocabulary, also seems to improve children's performance. This is something that Bencini and Valian (2008) did in their experiment. Compare also the better results of Adani (2009) to those of Arosio et al. (2009) on the comprehension of Italian relative clauses. The better performance in the former study is almost certainly due, at least in part, to the verbal description given to children prior to the test.
Thus, one possibility is that weaker performance on comprehension compared to production is due to the task. Future research should develop new methods to test comprehension, or should employ more ecologically sound, indirect methods like the visual-world paradigm. It would also be of interest to see whether enhanced production of passives through priming may improve comprehension of reversible passives. Bencini and Valian (2008) did not find such an effect, but their passives included inanimate entities in production and animate ones in comprehension; this asymmetry may have had an effect. Broadening the empirical base with investigations of other passive-like constructions, such as the causative, may also help in further narrowing the search space for the source of children's difficulty with passives.

Acknowledgments This chapter was prepared while I was a visiting fellow at the Department of English and Applied Linguistics in Cambridge, UK, which is kindly acknowledged for the facilities and the atmosphere provided. In particular, I would like to thank Teresa Parodi.

Chapter 11

On the Acquisition of Prepositions and Particles

Koji Sugisaki

11.1 Introduction

Languages of the world have a variety of ways to express spatial notions. For example, in Chalcatongo Mixtec, an Otomanguean language of Mexico, spatial configurations are classified via an extended and systematic body-part metaphor (Brugman 1983). Thus, in this language, the sentence "He sat down on the hill" is expressed as "He sat down the hill's face," as illustrated in (1).

(1) ni-ndukoo-ø    nuù-yuku
    perfv-sit-3sg  face-hill
    'He sat down on the hill.'

In a language like English, spatial relationships are expressed primarily with prepositions and prepositional particles, which of course have additional, non-spatial uses as well. Even when we restrict our attention to these prepositions (or slightly more generally, adpositions), however, as we will in this chapter, we still encounter substantial cross-linguistic variation at a number of different levels. Table 11.1 provides a brief overview of variation in adpositions and in particles. First, with regard to adpositions, there are languages in which the existence of P (prepositions and postpositions) as a syntactic category is in doubt. For example, Li and Thompson (1981: 360–369) argue that the morphemes serving as functional equivalents of P in present-day Mandarin are better analyzed as "co-verbs." These elements can often function as independent verbs in their own right, or occur with aspectual markers that are otherwise restricted to verbs. Huang et al. (2009) argue for a broadly similar view,

Table 11.1 Some points of cross-linguistic variation in adpositions and particles

              Adpositions                               Particles

General       Existence                                 Existence
Lexicon       Inventory; Cases assigned                 Inventory
Morphology    Use as case-marking; Suppletive forms     Ability to surface as a verbal affix
Syntax        Word-order options; A/A-bar extraction    Word-order options; Possibility of
              of object; Possibility of "swiping";      "stacking"
              Possibility of "stacking"; Use as a
              complementizer
Semantics     Event-type conversion                     Event-type conversion
according to which only a handful of the adposition-​like elements in Mandarin are plausibly members of the syntactic category P. In other languages (those that do employ clearcut adpositions), there is considerable variation in the inventory of adpositions provided. For example, where French has a preposition sous, meaning ‘under,’ Japanese instead requires the use of a spatial noun sita ‘area underneath,’ to express the same meaning (2)–​(3): (2)

Le the

crayon pencil

est is

sous la table. under the table

(3) Enpitsu-​wa teeburu-​no sita-​ni pencil-​top table-​gen “under-​area”-​loc ‘The pencil is under the table.’

aru. is

Furthermore, languages may differ as to which spatial relations are expressed by the same spatial adposition. In English, relationships involving contact with and support by a vertical surface, as in “the handle on a cupboard door,” are grouped together with relationships involving contact with and support by a horizontal space, as in “a cup on the table,” while a different preposition is needed for containment relations: “the apple in the bowl.” This classification pattern is shared with neither Dutch nor Spanish. As indicated in Table 11.2, Dutch distinguishes all three spatial situations, and employs a different preposition in each of them. Spanish, in sharp contrast, collapses all three together and uses a single preposition.

Table 11.2 Classification of three static spatial situations (Bowerman 1996)

                               Prepositions used
                               English   Dutch   Spanish

"handle on cupboard door"      on        aan     en
"cup on table"                 on        op      en
"apple in bowl"                in        in      en

In languages with a rich system of morphological case-marking, the case assigned by a particular adposition may (or may not) vary as a function of semantics, as seen in the alternation between dative (used for location, as in (4)) and accusative (used for path, as in (5)) in German:

(4) Wir  haben  in  dem      Saal  getanzt.
    we   have   in  the-dat  hall  danced
    'We danced in the hall.'

(5) Wir  sind  in  den      Saal  getanzt.
    we   are   in  the-acc  hall  danced
    'We danced into the hall.'

Certain languages use adpositions themselves as a form of case-marking. For example, English uses the adposition to both as a spatial preposition and as a marker of dative case (6):

(6) a. The teacher was walking to / towards the post office.  [spatial to]
    b. The book belongs to / *towards that woman.             [dative to]
    c. Something happened to / *towards the teacher.          [dative to]

The acquisition of spatial versus dative to in English is discussed in detail in Snyder and Stromswold (1997). Another point of morphological variation concerns the existence, in some languages, of suppletive forms for certain combinations of P+D. In French, for example, de 'of' + le 'the' (masc.sg.) becomes the combined form du. As will be discussed in section 11.2.4, the existence of P+D suppletion could be an indication that D in the given language consistently undergoes syntactic head-movement to P. When it comes to adpositional syntax, languages vary considerably. First, there is the issue of whether adpositions surface as prepositions or postpositions. A second issue is whether one may extract a P's complement. Extraction by A-movement, as

in (7), is known as the "pseudopassive," and is possible in English but extremely rare cross-linguistically.1

(7) That idea is often spoken of.

Only slightly more common cross-linguistically is "adposition-stranding" (typically called "preposition-stranding"), the extraction of a P's complement by A′-movement, as in (8a). Outside of English, adposition-stranding (henceforth "stranding") has been documented in the North Germanic languages (Icelandic, Norwegian, Danish, Swedish) and some of the Niger-Congo languages (Vata and Gbadi; Koopman 1984), and also seems to be available, at least to some speakers, in other West Germanic languages (especially Frisian; cf. Merchant 2002). Outside of these languages, stranding appears to be extremely rare.

(8) English:
    a.    Who was Peter talking with t?
    b. ?? With whom was Peter talking t?  [Odd, in spoken English]

(9) Spanish:
    a. *Quién  hablaba      Pedro  con   t?
        who    was-talking  Peter  with
    b.  Con   quién    hablaba      Pedro  t?
        with  who(m)   was-talking  Peter

Instead of stranding, most languages that form adpositional questions by means of wh-movement require "pied-piping" (henceforth "piping") of the adposition, as shown for Spanish in (9b). Cross-linguistic variation in the availability of stranding, and children's acquisition of prepositional questions (P-questions) in English and other languages, will be discussed in section 11.2. Table 11.1 includes several additional points of syntactic variation. The term "swiping" in the table refers to the possibility in some languages of inverting the order of a P and a wh-object, as in (10).

(10) John was obviously upset, but I don't know what about.

German has similar-looking expressions (known as R-pronouns) that involve the wh-word wo 'where,' as in (11).

(11) Hans  war  unmutig,     aber  ich  weiss  nicht  worüber.
     Hans  was  displeased,  but   I    know   not    where+about
     'Hans was displeased, but I don't know what about.'

1  Even within the Germanic family, the pseudopassive seems to be quite restricted. For example, according to Maling and Zaenen (1990) and Lødrup (1991), Norwegian has a productive pseudopassive, but other Scandinavian languages do not.

The form worüber is a combination of wo 'where' and the preposition über 'about, over,' with a "linking" segment -r- in the middle. Note that although wo is literally 'where,' its domain in an expression like worüber is far more general than location, and R-pronouns can therefore take on most of the meanings expressed by swiping in English. Yet, despite the similarities between swiping and German R-pronouns, Merchant (2002) gives evidence for a number of important differences in their syntactic properties. Hence, this is an area of considerable richness for studies of both syntactic variation and language acquisition.

The term "stacking" in Table 11.1 refers to the possibility, found at least in English, of inserting one or more particles on top of a preposition, as in (12).

(12) He stormed back on up over the hill.

Stacking seems to have received surprisingly little attention in the syntax literature, although it is discussed briefly (under the name "particle recursion") in work of den Dikken (1995: 80), whose examples suggest that it might be considerably more restricted in Dutch than in English. Another, quite exotic property of English is that it allows the use of a preposition (for, with) as a Case-assigning complementizer:

(13) a. John wants very much [CP for [TP Mary to leave now]].
     b. John wants [CP (??for) [TP Mary to leave now]].
     c. John would prefer [CP *(for) [TP Mary to leave now]].
     d. With [SC John in the kitchen], dinner might be late.

Note that for becomes phonetically null when adjacent to want, as in (13b), although it must remain overt when adjacent to prefer, as in (13c). Cross-linguistic variation in the availability of prepositional complementizers, and their acquisition in English, will be discussed in section 11.2.3. One final point of cross-linguistic variation in adpositions relates to compositional semantics, namely the possibility of "event-type conversion." In some languages, including English, a simple-Activity verb like run can combine with a spatial PP and yield a VP denoting an Accomplishment, as in (14).

(14) John ran *(through the tunnel) in five minutes.

In other languages, like Spanish, there is no such change in Aktionsart:

(15) *Juan corrió por el túnel en cinco minutos.
      John run-3sg.pret through the tunnel in five minutes
     ‘John ran through the tunnel in five minutes.’ [* on accomplishment reading]

In Spanish a VP headed by the activity verb corrió ‘ran’ denotes a simple Activity, even when there is a PP like “through the tunnel” that offers a natural endpoint (i.e. reaching the far end of the tunnel) for an Accomplishment event. This point of variation will be discussed in section 11.3.2.

On the Acquisition of Prepositions and Particles    211

Another prominent area of cross-linguistic variation, often connected with prepositions, is the availability of verb–particle constructions. As illustrated by the English examples in (16), a particle (e.g. down) is a spatial morpheme that works in concert with the verb to characterize an event, but is morphologically and syntactically a free word.

(16) a. Mary set the box down.
     b. Mary set down the box.

In the literature on German or Dutch one may find particles referred to as separable prefixes, because in these languages particles are sometimes free, and at other times are bound to the verb as a prefix, as illustrated for German in (17).

(17) a. Sie setzt es ab.
        she sets it down
        ‘She sets it down.’
     b. Sie hat es ab+ge+setzt.
        she has it down+perf+set
        ‘She (has) set it down.’

In a language like English, where this type of prefixation does not occur, the notion of separability is nonetheless key. This is because one of the distinctive properties of English particles is that they can be separated from their associated verb by phrasal material, such as the direct object in (16a). In English and other Germanic languages, most particles are homophonous with prepositions, and for this reason some syntactic analyses (e.g. Emonds 1985) regard them as intransitive prepositions. Yet even in Mandarin, where the existence of P as a syntactic category is doubtful, there are nonetheless verb–particle constructions comparable to those in (16) and (17). In Mandarin, the counterparts to English particles tend to be homophonous with verbs, namely verbs of directed motion.

While verb–particle constructions are certainly found outside the Germanic family, they are by no means universal. Indeed, true verb–particle constructions, with a particle that is fully separable from the verb, seem to be unavailable in the major Romance languages. This is illustrated in (18) for Spanish, where there is no direct counterpart to a particle like English off.

(18) a. María arrancó el tapón (*de/afuera/...).
        Mary ripped the lid off
     b. María arrancó (*de/afuera/...) el tapón.
        Mary ripped off the lid

In the languages that do have particles, their morphological and syntactic properties vary. For example, the possible placement of a particle in relation to a direct object varies considerably, even between closely related languages. Thus, in English one finds both I picked the book up and I picked up the book (though if the direct object is a pronoun, only the first order is possible). Norwegian allows the same two orders as English, while Swedish requires the particle to come before the direct object, and Danish strongly prefers for it to come after. Variation in the availability of particles, and in their syntactic distribution, will be addressed in sections 11.3.1 and 11.3.3.

Further points of cross-linguistic variation in particles include two with direct parallels to what we have already seen for adpositions. First, the possibility of “stacking” one or more particles above an English preposition has a counterpart in which the bottom of the stack is not a preposition but simply another particle:

(19) He stormed right on out.

Second, the possibility of event-type conversion, where an Activity is converted into an Accomplishment by adding a PP, may likewise exist with a particle, as in (20).

(20) Mary walked *(over) in an hour.

In sum, cross-linguistic variation in the syntax and semantics of adpositions and particles raises the important but difficult question of how children converge on the target grammar. As a first step towards an answer, the remainder of this chapter will focus primarily on syntactic variation, and on acquisition studies that consider the role of innate constraints on syntactic variation. Since languages permitting verb–particle constructions are somewhat uncommon, and languages permitting stranding are downright rare (seldom seen outside the Germanic family), evidence from child language acquisition is an extremely important source of insight into the variation permitted in these areas of syntax.

Therefore, in the remainder of this chapter, the scope of the discussion will be restricted in at least the following two ways. First, the chapter will not address acquisition of the lexical semantics of prepositions, nor the development of (nonlinguistic) spatial concepts. Readers who are interested in these topics are referred to Bowerman (1996) and references therein. Second, the chapter will gloss over many details concerning the internal structure of PPs and verb–particle constructions. Readers who would like to know more about current approaches to the syntax of PPs are referred to the papers collected in Cinque and Rizzi (2010). Those who would like to know more about the syntax of particle constructions may wish to consult den Dikken (1995) and the papers collected in Dehé et al. (2002), among many others.

11.2  Stranding: Cross-linguistic Variation and Acquisition

11.2.1 The Acquisition of Stranding and Piping

As we have seen in section 11.1, languages differ in the movement possibilities for adpositional complements. In English, wh-movement of a prepositional complement can strand the preposition, while in Romance languages like Spanish, the preposition must be piped with the wh-word. Cross-linguistically, stranding seems to be a highly marked option: most languages with overt wh-movement require piping (Abels 2003; van Riemsdijk 1978; Hornstein and Weinberg 1981; Kayne 1981; Stowell 1981, 1982a; among many others). In light of its marked status, early studies on the acquisition of stranding investigated whether the “unmarked” option (piping) precedes the marked option (stranding) during children’s acquisition of English.

French (1984) was one of the earliest studies to address this question. Two experiments were conducted with 33 English-speaking children, ranging in age from 2;11 to 5;6. In one experiment children were tested for comprehension of sentences like (21). If they understood the preposition, they were expected to point at the box that the boy was hiding in, but if they simply ignored the preposition, they were expected to point at the box that the boy had hidden. French further reasoned that if stranding is late to emerge, there should be a tendency to ignore the preposition more often in the stranded version (21a) of a test sentence.

(21) a. Show me the box which the boy hides in.
     b. Show me the box in which the boy hides.

Contrary to this prediction, however, the children performed identically on stranding and piping, with a mean success rate of 68 percent on either structure. A somewhat stronger finding emerged in the second experiment, which employed an elicited-imitation task. There the children were significantly more successful at producing stranding than piping: the 18 children who completed the imitation task scored only 29 percent correct on piping, but 55 percent correct on stranding. French (1984) took these results as evidence that children do not acquire piping before stranding, and argued against the hypothesis that the “unmarked” option of piping is adopted in advance of linguistic experience.
Hildebrand (1987) also addressed the question of whether the “marked” option of stranding appears late in the acquisition of English, albeit in a different way: her study investigated whether sentences like (22b), with stranding, are more difficult for children to produce than sentences like (22a), in which there is no stranding.

(22) a. What is the boy pulling?
     b. What is the boy drawing with?

Two experiments were conducted with 48 children (4-, 6-, 8-, and 10-year-olds, with 12 children in each group). One was an elicited-imitation task, in which the child was asked to repeat a model sentence when cued by the experimenter. The other was an elicited-production task, in which the experimenter showed a picture and uttered a statement about it. The child’s task was to convert the experimenter’s sentence into a cleft. Examples are given in (23) and (24).

(23) Type I:
     Stimulus: [picture of a car] The boy on the road is pulling a car.
     Expected Response: This is the car that the boy on the road is pulling.

(24) Type II:
     Stimulus: [picture of a crayon] The boy at the table is drawing with a crayon.
     Expected Response: This is the crayon that the boy at the table is drawing with.

The results of the elicited-imitation task suggested that children in the youngest group had not yet acquired the more marked option, Type II. Similarly, the results of the elicited-production task indicated that Type II is acquired later than Type I. The predominant error observed in these experiments was to omit the stranded preposition. In light of these findings, Hildebrand argued against French (1984), and claimed that unmarked options are indeed acquired before marked ones. A word of caution is in order, however: the two structures Hildebrand investigated are different from those in French’s study. While French compared the acquisition of stranding versus piping, Hildebrand compared the acquisition of sentences with stranding versus sentences with neither stranding nor piping (that is, direct-object wh-questions). This difference could perhaps be the source of the discrepancy in their results.

Stranding and piping can be observed not only in wh-questions but also in relative clauses, as shown in (25). McDaniel et al. (1998) ran two experiments on children’s knowledge of stranding and piping in English relative clauses. Their subjects were 115 children, aged 3;5 to 11;11. One experiment employed an elicited-production task, and the other used a grammaticality-judgment task. The results of the elicited-production task showed that even the youngest children tested (age range, 3;5–5;11) had stranding as an option in their grammars. In sharp contrast, piping was never produced by any child (nor, in fact, by the adult controls). In the judgment task, children were asked whether sentences like (25) sounded “right” or “wrong”:

(25) a. Stranding: This is the woman who Grover talked to.
     b. Piping: This is the woman to whom Grover talked.

The results of the judgment task were consistent with those of the production task: even the youngest children accepted sentences involving stranding more than 90 percent of the time. Piping, on the other hand, was generally rejected by the younger children (6 percent acceptance; age range, 3;5–5;11), and was accepted by only about half of the children in the older age group (54 percent acceptance; age range, 9;1–11;11). In contrast, adults generally accepted piping.

According to McDaniel and colleagues (1998), the results of these two experiments indicate that stranding is already an option in young children’s grammars, despite its marked status cross-linguistically. They argue that the lack of a piping option in child English is consistent with certain aspects of the Minimalist Program (Chomsky 1995): the stranding and piping alternatives involve the same numeration, but the stranding alternative involves movement of less material and hence is the more economical derivation. On their account, adults’ acceptance of the piping alternative is due to a prescriptive rule taught in school.

The studies reviewed in this section focused on stranding and piping in the acquisition of English, but others have examined children’s acquisition of non-stranding languages. Elicited-production studies with French- and Spanish-speaking children between the ages of 3 and 6 have suggested that children avoid the use of piping in their own speech, even though stranding is not an option in these languages (Labelle 1990; Pérez-Leroux 1993): the children did not strand prepositions, but instead employed alternatives like resumptive pronouns. In a study on Serbo-Croatian, which is also an obligatory-piping language, Goodluck and Stojanovic (1997) found that children around the age of 4–6 understood sentences with piping, but only a few, older children ever produced them. As in the studies of French and Spanish, no child ever produced a sentence with stranding.

In contrast to the studies summarized in this section, which employed experimental tasks, Sugisaki and Snyder (2003) examined the acquisition of prepositional questions (P-questions), in both English and Spanish, using longitudinal corpora of children’s spontaneous speech. The details of their study will be spelled out in some detail, because they are relevant for the interpretation of a number of other spontaneous-speech studies to be discussed in the rest of section 11.2 and in section 11.3.

First, Sugisaki and Snyder’s (2003) findings for English strongly supported the conclusions of French (1984) and McDaniel et al. (1998): there was no evidence that any child acquiring English had piping as an initial, “default” strategy. Of ten children acquiring English, none had P-questions at the beginning of his or her corpus, but nine had begun using P-questions by the end of the corpus. (The exception was a child whose corpus ended at age 2;10.) In these nine children, the age of onset for P-questions with stranding ranged from 2;2 to 3;3 (mean 2;7). Crucially, none of the ten children examined ever produced even a single question with piping.

Sugisaki and Snyder’s findings for Spanish indicated that children acquiring an obligatory-piping language sometimes master the syntax of piping quite early. Of four children acquiring Spanish, two were making regular use of P-questions by the end of their corpora. (The two who did not were a child whose corpus ended at 2;5, and a child whose corpus extended to 4;11 but contained very sparse data, especially after the age of 2;10.) The ages of onset for P-questions with piping were 2;1 and 2;4. None of the four children ever produced a P-question with stranding. While these data for Spanish are somewhat limited, they indicate that production of piping by children acquiring a piping language can begin much earlier than one would have thought, given the experimental findings for French, Spanish, and Serbo-Croatian reported in this section. One possibility is that children’s difficulties in the studies of elicited production indicate difficulty in sentence processing (or perhaps simply difficulty with the experimental tasks), rather than a lack of grammatical knowledge.

Sugisaki and Snyder’s combined data from English and Spanish corpora provide support for another point that is worth mentioning here, because of its bearing on spontaneous-speech studies more broadly. Namely, the children appear to have been “grammatically conservative,” in the sense discussed at length by Snyder (2008,

2011): from the perspective of their spontaneous speech, the children were making steady progress towards the adult grammar, with few grammatical errors. The children acquiring English did not experiment with (i.e. produce) various possible types of P-questions, nor did they begin with an “unmarked” type and correct this error later. Instead, they waited until they knew the correct option for English, and only then began producing P-questions. The children acquiring Spanish likewise showed no sign of experimentation with the stranding option before they began producing P-questions with piping, the correct option for Spanish.

Here and in other studies, the majority of errors in children’s spontaneous speech (though not in their elicited production) are errors of omission, where required words or morphemes are simply absent. It is far less common to find an actual error of commission, where the child puts words or morphemes together in a way that is disallowed in the target language. This observation suggests that when a child abruptly goes from never using a construction to using it frequently and correctly in her spontaneous speech, with a variety of lexical items, we are entitled to conclude that the child’s construction has the same grammatical basis (in terms of parameter settings and lexical information) as it has for the adult speaker. An important consequence of this “grammatical conservatism” is that the longitudinal records of children’s spontaneous speech provide an extremely valuable testing ground for predictions stemming from theories of cross-linguistic variation. This testing ground is directly exploited in a number of studies discussed in the rest of section 11.2 and in section 11.3.

To summarize this subsection, English-learning children appear to have no difficulty acquiring stranding, despite its “marked” status cross-linguistically. On the other hand, the “unmarked” option of piping seems to give children some difficulty, although this may be at the level of processing rather than grammar. The expected developmental change from the unmarked option (piping) to the marked option (stranding) was not observed in the acquisition of English. On the other hand, the phenomenon of grammatical conservatism makes it possible to test parametric proposals about stranding fairly directly, using longitudinal corpora of spontaneous speech. In the subsections that follow, we review a number of acquisition studies that pursued this general research strategy.

11.2.2 Stranding and the V-Particle-DP Construction

As mentioned in section 11.1, stranding and separable-particle constructions are both among the more exotic properties of English, and are absent from the major Romance and Slavic languages, for example. Building on this observation, Stowell (1981, 1982a) proposed to relate the two. On his account, stranding requires the availability of a “reanalysis” rule like (26), which creates a single, complex word from a string-adjacent V and P:

(26) V [P DP] ⇒ [V+P] DP

Stowell further proposed that a reanalysis rule like (26) is permitted to exist in English only because the result is weakly equivalent to the “[V+P] DP” form that is allowed independently by the English “[V+Particle] DP” construction. Stowell’s proposals were framed in a theory of grammar that is no longer current, but his intuition that stranding in English relies (at least in part) on whatever makes V–Particle–DP possible is an idea that could still be translated into a current framework. Whether such a translation is warranted will depend on whether Stowell’s intuition is supported by the available evidence from comparative syntax and child language acquisition.

Sugisaki and Snyder (2002) sought to answer this question. They began with a basic comparative survey. As shown in Table 11.3, the results were largely consistent with Stowell’s intuition: languages allowing stranding also permitted a prepositional particle to stand adjacent to a transitive V.2 Yet, given the cross-linguistic rarity of stranding, their survey was based on an extremely limited sample of languages. It was also based on extremely superficial diagnostics for the actual grammatical properties of each language examined. As a consequence, the apparent connection between stranding and the V–Particle–DP construction could easily be an accident.

To bring in some evidence from a different source, they conducted an acquisitional investigation. Such an investigation has the advantage that it can focus on one or two well-studied target languages, such as English, without the need for in-depth analysis of diverse languages that comes with a comparative approach. The prediction tested by Sugisaki and Snyder was that every child acquiring English will have the V–Particle–DP construction in place, in their spontaneous speech, by the time they begin to use stranding. Note that this prediction is independent of exactly how the two structures are thought to be related, and independent of how one would decide whether a structure in some language other than English is comparable to the one in English. As long as the properties of English grammar that make V–Particle–DP possible in English are a proper subset of the specific properties of English that make stranding possible, the prediction stands.3

2  Danish was the one potential counterexample they found to Stowell’s generalization. Danish allows stranding, but Herslund (1984: 40) and others have asserted that Danish systematically disallows the V-Particle-DP order. Yet, Thráinsson (2000: 166) reports that the following example with V-Particle-DP order is accepted:

(i) Jeg skrev op nummeret /​ *det.
    I wrote up number-the / it
    ‘I wrote the number / it down.’

Thus, the grammatical status of this order in Danish calls for further investigation.

3  Strictly speaking, it is rather unlikely, and in practice unnecessary, for the full set of grammatical prerequisites of the English-type V–Particle–DP construction to be needed for stranding. Rather, what is crucial is that the late-acquired prerequisites of the former be a proper subset of the prerequisites for the latter. To the extent that this is the situation, one may safely abstract away from “non-shared but early-acquired” prerequisites of the first construction.

Table 11.3 Cross-linguistic survey on P-stranding and the V-particle-NP construction

Language            Stranding?   V-Particle-DP?
North Germanic:
  Icelandic         Yes          Yes
  Norwegian         Yes          Yes
  Swedish           Yes          Yes
West Germanic:
  English           Yes          Yes
Greek:
  Greek             No           No
Romance:
  French            No           No
  Italian           No           No
  Spanish           No           No
Slavic:
  Bulgarian         No           No
  Russian           No           No
  Serbo-Croatian    No           No

To evaluate this prediction, Sugisaki and Snyder analyzed a sample of ten English-learning children for whom longitudinal corpora were available in the CHILDES database (MacWhinney 2000), and asked whether any child in the sample acquired stranding significantly earlier than the V–Particle–DP construction. To address this question, they needed a way to judge whether a temporal gap between the first uses of two constructions went beyond what could be attributed to simple differences in frequency of use. Their approach was to apply a Binomial Test to each child’s data, using frequency data from the same child’s speech slightly later in the corpus. Specifically, their approach to each child’s corpus was as follows: (i) Locate the child’s “first clear use” of either stranding or a V-Particle-DP construction.4 (ii) Identify each use of this first construction, up to the point when he or she begins using both constructions. (iii) Determine the relative frequency of the two constructions in the next four transcripts, or until the end of the child’s corpus, whichever comes first. (iv) Use the Binomial Test to calculate the probability of the child’s producing at least the observed number of examples of the first construction, before starting to use the second construction, simply by chance. The null hypothesis for this test was that the second construction was grammatically available at least as early as the first construction, and that the two constructions had the same relative frequency of use that was observed in the child’s own speech slightly later in the corpus (Stromswold 1996).

The results were as follows: seven of the ten children acquired the V–Particle–DP construction significantly earlier than stranding, and three acquired them both at approximately the same age (no significant difference by Binomial Test). Crucially, no child acquired stranding significantly earlier than the V–Particle–DP construction. Thus, evidence from child English provided a completely new type of support for Stowell’s proposal to link the availability of stranding to that of V–Particle–DP constructions.

4  “First clear use” is, more precisely, “first clear use, followed soon after by regular use.” See Stromswold (1996), Snyder and Stromswold (1997), and Snyder (2007) for detailed discussion of this criterion for acquisition.
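The logic of this Binomial Test can be sketched as a few lines of code. The function name, its signature, and the illustrative counts below are my own, not from the study; the sketch simply implements the test as described: under the null hypothesis, each early token is the first construction with the probability estimated from the later transcripts, so the chance that the first k productions all belong to it is that probability raised to the kth power.

```python
def binomial_gap_test(k_first: int, n_first_later: int, n_second_later: int) -> float:
    """Probability, under the null hypothesis, of observing at least
    `k_first` uses of construction A before any use of construction B.

    Null hypothesis: B was grammatically available as early as A, and each
    token is A with probability p, where p is the relative frequency of A
    observed later in the same child's corpus (Stromswold 1996).
    """
    p = n_first_later / (n_first_later + n_second_later)
    # With independent productions, the chance that the first k_first
    # tokens are all construction A is p ** k_first.
    return p ** k_first

# Hypothetical child: 12 V-Particle-DP tokens before the first stranding,
# with later frequencies of 30 (V-Particle-DP) vs. 10 (stranding).
p_value = binomial_gap_test(12, 30, 10)
```

With these made-up counts, p = 0.75 and the resulting probability is 0.75¹², which falls below .05, so the gap would be credited as a significantly earlier acquisition of the first construction rather than an accident of frequency.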

11.2.3 Stranding, Prepositional Complementizers, and Double Accusatives

Kayne (1981) proposed that the availability of stranding is a necessary condition for two further syntactic properties: the possibility of prepositional-complementizer (PC) constructions,5 in which an infinitival clause takes a lexical subject that is accompanied by an overt or null prepositional complementizer; and the possibility of double-accusative constructions, in which a single verb takes two accusative-marked objects.6 English, which allows stranding, clearly exhibits both of these other properties, while French, which requires piping, exhibits neither.

(27) English:
     a. Which candidate have you voted for?
     b. John wants (for) Mary to leave.
     c. John gave Mary a book.

(28) French:
     a. *Quel candidat as-tu voté pour?
     b. *Jean veut (de) Marie partir.
     c. *Jean a donné Marie un livre.

5  See Lasnik and Saito (1991: 337) for arguments that in sentences like (27b), when for is not overtly present, the infinitival subject is assigned Case not by the matrix verb, but by a null prepositional complementizer.

6  The term double-accusative construction is based on the fact that both of the two NPs that follow the verb in the English example (27c) bear morphological accusative case. We can observe this in the following example, in which both of the objects are pronouns:

(i) I showed him her.

Yet, it might be the case that one of the two objects bears a dative Case, and that the loss of the morphological distinction between accusative and dative in English masks this fact. In the analysis by Kayne (1981), it is crucially assumed that both of the DPs in fact bear accusative Case.

Table 11.4 Kayne’s (1981) cross-linguistic survey

Language    Stranding?   PC Construction?   Double accusatives?
English     Yes          Yes                Yes
Icelandic   Yes          No                 No
French      No           No                 No

Icelandic appears to have an intermediate status between English and French: it allows stranding in wh-questions as shown in (29), but does not have PC constructions or double-accusative constructions.

(29) Hann spurði hvern    ég hefði talað við.
     he   asked  whom.ACC I  had   talked to
     (Maling and Zaenen 1990: 155)

Since the properties listed in Table 11.4 can be found in very few languages, it is extremely difficult to evaluate Kayne’s parametric proposal through comparison of typologically diverse languages. Sugisaki and Snyder (2006) therefore conducted an acquisitional evaluation, in which they analyzed the longitudinal corpora of spontaneous speech samples from ten children acquiring American English.

The fundamental idea in Kayne’s (1981) proposal is that the parameter-settings required for stranding are a proper subset of the parameter-settings required for the PC construction and the double-accusative construction. The prediction from this proposal is that any English-learning child who uses the PC construction or the double-accusative construction should also allow stranding. This amounts to (30).

(30) Predictions from Kayne’s Parametric Proposal:
     a. Children learning English should never acquire the PC construction significantly earlier than stranding.
     b. Children learning English should never acquire the double-accusative construction significantly earlier than stranding.

The approach taken by Sugisaki and Snyder (2006) was basically the same as in Sugisaki and Snyder (2002). The results were as follows: seven of the ten children produced both stranding and the PC construction by the end of their corpora. Among these seven children, three acquired stranding significantly earlier than the PC construction. The remaining four children acquired stranding and the PC construction at approximately the same age (no significant difference by Binomial Test). Crucially, none of the ten children acquired the PC construction significantly earlier than stranding. Thus, the results are consistent with the prediction in (30a), and lend support to theories positing a direct implicational relationship from the existence of prepositional complementizers in a language to the possibility of stranding.

On the other hand, the results for stranding and double accusatives falsified the prediction. Nine of the ten children produced both stranding and the double-accusative construction by the end of their corpora, and five of these nine children actually acquired the double-accusative construction significantly earlier than stranding, by Binomial Test. For the remaining four children, the age-discrepancy did not reach significance (p > .05, by Binomial Test). These acquisitional findings directly contradict the prediction in (30b), and therefore constitute strong evidence against Kayne’s (1981) view that natural-language grammars permitting the type of double-accusative construction found in English are a proper subset of those permitting stranding.7

11.2.4 Piping and P+D Suppletion

In sections 11.2.2 and 11.2.3 we saw acquisitional evidence supporting Stowell’s (1981, 1982a) proposal of a link between stranding and the V–Particle–DP construction of English, and we saw additional evidence supporting Kayne’s (1981) proposal of a direct parametric link connecting PCs to stranding. These proposals share the fundamental idea that stranding is a marked option, and that the relevant parameter determines whether stranding is permitted or not in a given language.

In sharp contrast, Law (1998, 2006) and Salles (1997) regard stranding as an unmarked option, and attempt to account for why piping of prepositions is obligatory in a number of languages. Capitalizing on the observation that “piping” languages like French and Italian have the property that a preposition sometimes coalesces with the following determiner into a suppletive form, Salles (1997) and Law (1998, 2006) argue that the existence of such suppletive forms of prepositions and determiners (henceforth, P+D suppletive forms) is the source of obligatory piping. According to their analysis, the existence of P+D suppletive forms in a given language is an indication that D always incorporates to P in that language, and this D-to-P incorporation forces the movement of a wh-phrase to carry along the preposition. The sentences in (31) and (32) illustrate the P+D suppletive forms in French and Italian respectively, and Table 11.5 summarizes the cross-linguistic surveys conducted by Salles (1997) and Law (1998, 2006) of P+D suppletive forms and obligatory piping.

(31) French
     Jean a    parlé  du        sujet   le  plus difficile.
     Jean have talked about-the subject the most difficult
     ‘Jean talked about the most difficult subject.’

7  Evidence from child English thus bolsters Zhang’s (1990) point that Chinese is a problem for Kayne’s proposal, because it permits a double-accusative construction but disallows stranding:

(i) Wo song le  Lisi yi  ben  shu.
    I  give Asp Lisi one copy book
    ‘I gave Lisi a book.’

Note that this type of converging evidence from comparative syntax and acquisition is far stronger than either type of evidence alone, because the two have complementary strengths and weaknesses. For example, a weakness of (i) by itself is that the Chinese construction may or may not turn out to be equivalent, in relevant respects, to the one in English. See Snyder (2012) for detailed discussion.

222   Koji Sugisaki

Table 11.5 Cross-linguistic survey of P+D suppletive forms and obligatory piping

Language                    P+D suppletive forms?    Obligatory piping?
Romance:
  French                    Yes                      Yes
  Italian                   Yes                      Yes
  Portuguese                Yes                      Yes
Germanic:
  German                    Yes                      Yes
  English                   No                       No
  Scandinavian languages    No                       No

(32) Italian
     Gianni ha   parlato del       soggetto più  difficile.
     Gianni have talked  about-the subject  most difficult
     'Gianni talked about the most difficult subject.'

Under the system of Salles (1997) and Law (1998, 2006), the existence of P+D suppletion constitutes a sufficient condition for the obligatory piping of prepositions: in every language that has P+D suppletive forms, D-incorporation to P is obligatory and hence piping is required. Therefore, the following prediction can be made for the time course of acquisition.

(33) Prediction of the parametric proposal of Salles (1997) and Law (1998, 2006):
     Every child acquiring a language with P+D suppletion will exhibit piping in P-questions as soon as she acquires both overt wh-movement and at least one P+D suppletive form.

In order to evaluate this prediction, Isobe and Sugisaki (2002) analyzed two longitudinal corpora for French available in the CHILDES database. The results were as follows. One of the two children (Philippe) acquired overt wh-movement, P+D suppletive forms, and piping by the end of his corpus. Contrary to the prediction, however, P+D suppletive forms appeared earlier than piping, and the age discrepancy between the two was statistically significant by a binomial test. Thus, having both overt wh-movement and P+D suppletion was not sufficient to let Philippe know how to form P-questions. The findings are still compatible with an analysis in which the availability of P+D suppletion is only a necessary, not a sufficient, condition for piping. However, such an analysis is far from appealing, because it would seem to entail that adult grammars can have P+D suppletive forms and still permit stranding. Thus,

On the Acquisition of Prepositions and Particles    223

evidence from child language runs counter to the view that P+D suppletion and piping are tied to the same parameter-setting.

11.2.5 Summary

In this section we reviewed studies that addressed variation in stranding from both a comparative and an acquisitional perspective. We saw acquisitional evidence for the view that stranding (at least in English) is dependent on the availability of V–Particle–DP constructions, and for the view that the availability of prepositional complementizers depends on the possibility of stranding. We also saw acquisitional evidence against the view that P+D suppletion directly entails piping, and against the view that the English double-accusative construction would be impossible without the availability of stranding. Naturally, the hard work of figuring out the grammatical details of stranding and related structures still remains, as does the question of how stranding should be captured within a Minimalist approach to syntax (though see Abels 2003 for one attempt). Nonetheless, these findings from the time course of acquisition impose considerable constraints on the syntactic analyses, and the accounts of syntactic variation, that are empirically tenable.

11.3  The Verb–Particle Construction: Acquisition and Cross-linguistic Variation

In the previous section we saw evidence linking the availability of stranding (at least in English) to the availability of a V–Particle–DP construction. In this section we turn to the latter point of variation and address, from an acquisitional perspective, the question of what is necessary for the availability of separable-particle constructions.

11.3.1 Verb–Particle Construction and Productive Compounding

It has been observed at least since Talmy (1985) that the English-type of verb-particle construction, in which the prepositional particle can be separated from the verb by a phrasal constituent, is not available in Romance languages like Spanish, as illustrated in (18), repeated as (34). In Talmy's (1985) terminology, English is a "satellite-framed" language, meaning that path is normally expressed by a "satellite" such as a particle or a PP,

while Spanish is a "verb-framed" language, in which path is normally expressed not by a satellite but within the main verb.

(34) a. María arrancó el  tapón (*de/afuera/...).
        Mary  ripped  the lid     off
     b. María arrancó (*de/afuera/...) el  tapón.
        Mary  ripped     off           the lid

Snyder (1995, 2001) examined cross-linguistic variation in particle constructions within a parametric framework, and found evidence suggesting that the crucial prerequisite for a language to have English-style separable particles is the availability of endocentric, bare-root compounding as a fully "creative" process. While novel endocentric compounds can be created freely in English, this is not possible in Spanish, as illustrated in (35).

(35) a. English:  banana box  'box in which bananas are stored'
     b. Spanish:  *banana caja, *caja banana

According to Snyder, the separable-particle construction as in (34a) is possible only in those languages that allow productive endocentric compounds. The results of his comparative survey are summarized in Table 11.6. In order to test his proposal further, Snyder performed an acquisitional study. The prediction for child English was that no child should acquire the V–DP–Particle construction significantly earlier than novel compounding. As seen in Table 11.6, novel compounding is a necessary, but not sufficient, condition for the separable-particle construction. Rather more ambitiously, if it turns out that the availability of novel compounding is the last-acquired prerequisite for the V–DP–Particle construction, it is predicted that each child will acquire novel compounding and the V–DP–Particle construction right at the same point in time. The results of an analysis of ten longitudinal corpora for English from the CHILDES database bore out the latter, stronger prediction. The point at which a given child begins producing transitive V–DP–Particle constructions (e.g. throw the picture away) is almost exactly the point when the child suddenly starts producing novel endocentric compounds (e.g. zoo book, for 'book about the zoo'). Statistically speaking, the correlation is incredibly strong (r = .98, t(8) = 12.9, p < .001), and remains strong even when the variability that can be explained by control measures (such as the age at which a given child first used a lexicalized compound, like apple juice) has been subtracted out by means of a partial-correlation procedure. Moreover, these findings were confirmed in a larger version of the study (Snyder 2007), based on a total of 19 children who were acquiring either American or British English (r = .94, t(17) = 11.1, p < .001).

(1) a. |X ∩ Y| > |X – Y|
    b. Most of the crayons are broken.

Thus, (1b) is true just in case the number of broken crayons is greater than the number of unbroken crayons. Do children have to have acquired number words in order to represent the meanings of quantifiers that seem to have numerical content? Initial evidence may be consistent with this possibility. Children's acquisition of most has been shown to be protracted. Papafragou and Schwarz (2006) found that in contexts where children need to verify whether a dwarf "lit most of the candles," children do not appropriately select situations in which more than half of the candles are lit until well after their 5th birthday, an age at which they are already prodigious counters. Barner and colleagues (Barner, Libenson et al. 2009) have likewise shown that both English- and Japanese-speaking 4-year-olds often interpret most in a non-adult way, for example to mean what some means, and do not yet appreciate that most is only true in more-than-half situations. On the other hand, studies that provide children with clear alternatives in the question (e.g. "Are most of the crayons blue, or yellow?") seem to find better performance even in 4-year-olds (Halberda et al. 2008), suggesting that children have at least some meaning for most by this age, though it perhaps undergoes further development. Indeed, Halberda et al. (2008) asked about the dependence of most on number knowledge by examining correlations between performance on a most task ("Are most of the crayons blue, or yellow?") and performance on standard number-knowledge tasks (Gelman and Gallistel 1978; Wynn 1992). These authors found a significant number of children who acquired most prior to learning the cardinality principle. Moreover, these children, despite not knowing natural numbers, showed clear ratio-dependent performance, the critical signature of the ANS. Thus, prior to acquiring natural numbers, the ANS may provide the content over which children's understanding of most develops. Moreover, Odic et al.
(2014) showed that even when children acquire the cardinality principle, it still takes several months before they begin to use their number knowledge in verifying sentences containing most. They found that young CP-knowers use the ANS to verify most sentences, even directly after counting the items in the array and hearing the experimenter repeat the numbers (e.g. "There are 9 goats and 5 rabbits. Are most of the animals goats, or rabbits?"). These findings suggest that the initial meanings for quantifiers do not depend on knowledge of cardinality and that the ultimate representation of quantity in semantic representations must be broad enough to be verified both by precise cardinalities and by ANS representations (see also Odic et al. 2013 for evidence about the count–mass distinction in early comparative quantifiers).
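The cardinality-comparison analysis in (1a) can be made concrete with a short set-theoretic sketch (an illustration only; the function name and the toy crayon sets are mine, not the chapter's):

```python
def most(X, Y):
    # Truth conditions in (1a): 'most X are Y' iff |X ∩ Y| > |X − Y|
    return len(X & Y) > len(X - Y)

# Hypothetical scene: ten crayons, six of them broken
crayons = {f"crayon{i}" for i in range(10)}
broken = {f"crayon{i}" for i in range(6)}

print(most(crayons, broken))            # True: 6 broken > 4 unbroken
print(most(crayons, crayons - broken))  # False: 4 unbroken is not > 6 broken
```

Note that this verification procedure requires only a comparison of two quantities, which is exactly the kind of ratio-based judgment the ANS can support without exact counting.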

21.2.2.2 Possible Quantifier Meanings: The Case of Conservativity

In the framework of generalized quantifier theory (Mostowski 1957), sentences with the form in (2) express a relation between two sets: the set of dogs, and the set of brown things.

Quantification in Child Language    503

(2) a. every dog is brown
    b. some dogs are brown
    c. most dogs are brown

If we represent these sets by DOG and BROWN, respectively, the truth conditions of the three sentences in (2) can be expressed as in (3).

(3) a. 'every dog is brown' is true iff DOG ⊆ BROWN
    b. 'some dog is brown' is true iff DOG ∩ BROWN ≠ ∅
    c. 'most dogs are brown' is true iff |DOG ∩ BROWN| > |DOG – BROWN|

When we examine the set of determiners that occur in natural language, a striking generalization emerges about the relations they pick out: all natural language determiners are conservative (Barwise and Cooper 1981; Higginbotham and May 1981; Keenan and Stavi 1986). A determiner is conservative if the following biconditional holds:

(4) R(X,Y) ⇔ R(X, X∩Y)

For example, "every" expresses a conservative relation because (5a) and (5b) are mutually entailing:

(5) a. every dog is brown
    b. every dog is a brown dog

Perhaps more intuitively, conservative relations "live on" their internal argument. In determining whether (5a) is true, one need only consider the dogs; other things that are brown (e.g. cats, beavers, grizzly bears, etc.) are irrelevant. Assuming that innate properties of the language faculty are to be expressed in typological generalizations, a reasonable hypothesis to consider is that the lack of nonconservative determiners in the world's languages derives from the (in)ability of the human language faculty to associate sentences like those in (2) with the claim that a nonconservative relation holds between the set of dogs and the set of brown things (cf. Fox 2002; Pietroski 2005; Bhatt and Pancheva 2007 for analyses). Of course, it could also be the case that typological generalizations like the restriction to conservative determiners fall out not from constraints on possible meanings, but from the interaction between a less restrictive language faculty and other properties of experience.
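The biconditional in (4) can be checked mechanically. The sketch below (my own illustration; the toy denotations follow (3)) verifies exhaustively, over a small domain, that every, some, and most all satisfy R(X, Y) ⇔ R(X, X ∩ Y), while an "only"-like relation does not:

```python
from itertools import combinations

def subsets(domain):
    s = list(domain)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Toy denotations for three determiners, as relations between sets (cf. (3))
def every(X, Y): return X <= Y
def some(X, Y):  return bool(X & Y)
def most(X, Y):  return len(X & Y) > len(X - Y)

# A nonconservative relation for contrast ('only'-like): true iff Y ⊆ X
def only_like(X, Y): return Y <= X

def is_conservative(R, domain):
    """Check the biconditional in (4): R(X, Y) iff R(X, X ∩ Y), for all X, Y."""
    return all(R(X, Y) == R(X, X & Y)
               for X in subsets(domain) for Y in subsets(domain))

domain = {"a", "b", "c"}
for name, R in [("every", every), ("some", some), ("most", most)]:
    print(name, is_conservative(R, domain))       # all True
print("only-like", is_conservative(only_like, domain))  # False
```

The check for only_like fails because R(X, X ∩ Y) is trivially true (X ∩ Y is always a subset of X), while R(X, Y) is not, mirroring the intuition that such a relation does not "live on" its internal argument.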
Indeed, Inhelder and Piaget (1964) observed that some children will answer “no” to a question like (6) if there are blue non-​circles present. When prompted, these children will explain this answer by pointing to, for example, some blue squares. (6) Are all the circles blue? Taken at face value it appears that these children are understanding (6) to mean that (all) the circles are (all) the blue things. Since answering (6) on this interpretation requires

504   Jeffrey Lidz

paying attention to non-circles that are blue, this would mean that children had acquired a nonconservative meaning for all. Similar "symmetric responses" have been observed with questions like (7) involving transitive predicates. Some children will answer "no" to (7) if there are elephants not being ridden by a girl (Philip 1995); see Drozd (2000) and Geurts (2003) for review.

(7) Is every girl riding an elephant?

Consider the interpretation of every that these children appear to be using. If every is analyzed as a determiner with a conservative meaning, then answering (7) should require only paying attention to the set of girls (and which of the girls are riding an elephant), since this is the denotation of the internal argument girl. Clearly every is not being analyzed in this way by the children for whom the presence of unridden elephants is relevant. However, these children are also not analyzing every as a nonconservative determiner. Such a determiner would permit meanings that required looking beyond the set of girls denoted by the internal argument and take into consideration the entire set denoted by the external argument; but crucially, the external argument is [is riding an elephant] and denotes the set of elephant-riders, not the set of elephants. So allowing a nonconservative relation into the child's hypothesis space would leave room for an interpretation of (7) on which the presence of non-girl elephant-riders triggers a "no" response, but would do nothing to explain the relevance of unridden elephants. On the assumption, then, that the symmetric responses to (6) and (7) are to be taken as two distinct instances of a single phenomenon, this phenomenon is more general than (and independent of) any specific details of determiners and conservativity. Looking more directly into the source of children's symmetry errors, Crain et al.
(1996) have argued that these errors have more to do with the felicity of the question than with children associating an incorrect meaning with the quantifier. In contexts where it is clear which elements of the scene are relevant to answering the question, children do not make symmetry errors. Thus, children's symmetry errors do not speak directly to the question of the origins of the constraint on conservativity. Similarly, Sugisaki and Isobe (2001) show that children make fewer symmetry errors when there are a greater number of extra elements (e.g. three unridden elephants instead of one) in the display, suggesting a nonlinguistic source for symmetry errors. Keenan and Stavi (1986) suggest that learnability considerations may play a role in explaining the origins of conservativity. They argue that the size of the hypothesis space for possible meanings would be too large if it allowed both conservative and nonconservative relations. In a domain with n elements, there are 2^(4^n) possible determiner meanings, with 2^(3^n) of these being conservative. However, Piantadosi et al. (2012) argue, based on a computational model of quantifier learning, that even with a hypothesis space containing nonconservative determiners, it is possible to learn the English quantifiers. Nonetheless, this result leaves unexplained the typological generalization about extant determiner meanings, and does not address the question of whether children consider nonconservative relations when learning natural language quantifiers.
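Keenan and Stavi's counting argument can be reproduced by brute force for very small domains. The sketch below (my own illustration, not their procedure) enumerates every function from pairs of subsets to truth values and counts how many satisfy the conservativity biconditional; the totals match 2^(4^n) and 2^(3^n):

```python
from itertools import combinations, product

def subsets(domain):
    s = list(domain)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def count_determiners(n):
    """Brute-force count of all vs. conservative determiner meanings over an
    n-element domain: a meaning is any assignment of True/False to the 4^n
    pairs of subsets (X, Y); the conservative ones satisfy f(X,Y) = f(X, X∩Y)."""
    dom = range(n)
    pairs = [(X, Y) for X in subsets(dom) for Y in subsets(dom)]  # 4^n pairs
    total = conservative = 0
    for values in product([False, True], repeat=len(pairs)):
        f = dict(zip(pairs, values))
        total += 1
        if all(f[(X, Y)] == f[(X, X & Y)] for X, Y in pairs):
            conservative += 1
    return total, conservative

print(count_determiners(1))  # (16, 8):      2^(4^1) = 16,    2^(3^1) = 8
print(count_determiners(2))  # (65536, 512): 2^(4^2) = 65536, 2^(3^2) = 512
```

A conservative meaning is fixed once its values on the pairs (X, Z) with Z ⊆ X are chosen, and there are 3^n such pairs, which is why exactly 2^(3^n) of the 2^(4^n) candidate meanings survive the check.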

To assess this question more directly, Hunter and Lidz (2013) taught children two novel determiners, one which was conservative and one which was nonconservative. The determiners in Hunter and Lidz (2013), both pronounced gleeb, had a meaning like "not all." The conservative variant of this determiner is given in (8a) and its nonconservative counterpart in (8b):

(8) a. gleeb(X,Y) is true iff X ⊄ Y.
    b. gleeb′(X,Y) is true iff Y ⊄ X.

(9) gleeb girls are on the beach

So, on the conservative use of gleeb, (9) is true just in case the girls are not a subset of the beach-goers. Or, said differently, it is true if not all of the girls are on the beach. This relation is conservative because it requires only attention to the girls. Similarly, the biconditional in (10) is true.

(10) gleeb girls are on the beach ⇔ gleeb girls are girls on the beach

On its nonconservative use, (9) is true just in case the beach-goers are not a subset of the girls. It is true if not all of the beach-goers are girls. This relation is nonconservative because evaluating it requires knowing about beach-goers who are not girls. The biconditional (10) comes out false for this meaning because the sentence on the right can never be true whereas the one on the left can be. That is, gleeb girls are girls on the beach would mean "not all of the girls on the beach are girls," which can never be true. Hunter and Lidz (2013) found that 5-year-old children were able to successfully learn the conservative quantifier, but not the nonconservative one. Note that the conditions expressed by these two determiners are just the mirror image of each other, with the subset–superset relationship reversed. By any nonlinguistic measure of learnability or complexity, the two determiners are equivalent, since each expresses the negation of an inclusion relation.
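A small sketch (my own; the set names are hypothetical) makes the contrast between (8a) and (8b) concrete, including why the biconditional in (10) holds for the conservative gleeb but fails for its nonconservative counterpart:

```python
# Hypothetical denotations for the two 'gleeb' determiners in (8)
def gleeb(X, Y):        # conservative:    true iff X ⊄ Y ("not all X are Y")
    return not (X <= Y)

def gleeb_prime(X, Y):  # nonconservative: true iff Y ⊄ X ("not all Y are X")
    return not (Y <= X)

girls = {"g1", "g2", "g3"}
on_beach = {"g1", "g2", "b1"}   # two of the girls and one boy on the beach

# (9) "gleeb girls are on the beach"
print(gleeb(girls, on_beach))        # True: g3 is not on the beach
print(gleeb_prime(girls, on_beach))  # True: b1 is a beach-goer who is not a girl

# The conservativity biconditional (10): compare R(X, Y) with R(X, X ∩ Y)
print(gleeb(girls, on_beach) == gleeb(girls, girls & on_beach))              # True
print(gleeb_prime(girls, on_beach) == gleeb_prime(girls, girls & on_beach))  # False
```

The final line fails because X ∩ Y is always a subset of X, so gleeb′(X, X ∩ Y) is always false, just as the text notes that "gleeb girls are girls on the beach" can never be true on the nonconservative meaning.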
Thus, the observed difference should follow not from extralinguistic considerations, but rather from constraints on the semantic significance of being the internal or external argument of a determiner. An additional way of seeing the asymmetries between the internal and external arguments of a quantifier comes from the monotonicity profiles of real quantifiers. Consider, for example, the pattern of entailments among the sentences in (11).

(11) a. every dog bit a cat
     b. every chihuahua bit a cat
     c. every dog bit a siamese cat

The sentence in (11a) entails the sentence in (11b), but not the sentence in (11c). The sentence in (11c) entails the sentence in (11a), but the sentence in (11b) does not. These patterns reflect the fact that every is downward-entailing in its first argument, but not its second. Because chihuahuas are a subset of dogs, anything that is true of every dog will also be true of every chihuahua. Similarly, events of siamese

cat-biting are a subset of the events of cat-biting in general. However, every is not downward-entailing in its second argument, and so this subset–superset relation is unrelated to the meaning of every. Consequently, (11a) does not entail (11c). Note that different quantifiers show different properties with respect to these entailments. For example, some is not downward-entailing in its first argument: (12a) does not entail (12b).

(12) a. Some dogs are running.
     b. Some chihuahuas are running.

Gualmini et al. (2005) used these entailment patterns to show that children's early understanding of every is appropriately asymmetric. They showed that 5-year-olds correctly interpreted disjunction conjunctively when it occurred in the first argument of every, but not when it occurred in the second argument. That is, (13a) implies that every boy who ate cheese pizza got sick and every boy who ate pepperoni pizza got sick. But (13b) does not entail that every ghostbuster will choose both a cat and a pig (see Chapter 23 by Goro in this volume for further discussion of the interpretation of disjunction):

(13) a. every boy who ate cheese pizza or pepperoni pizza got sick
     b. every ghostbuster will choose a cat or a pig

In sum, children's knowledge of the asymmetries between the internal and external arguments of quantifiers appears to be in place by preschool. Moreover, these asymmetries, as illustrated through a constraint on possible determiner meanings, contribute to children's acquisition of novel quantifiers.
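The monotonicity contrast between every and some in (11)–(12) can likewise be checked exhaustively on a toy domain (an illustrative sketch, not from the chapter):

```python
from itertools import combinations

def subsets(domain):
    s = list(domain)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def every(X, Y): return X <= Y
def some(X, Y):  return bool(X & Y)

def downward_entailing_first_arg(R, domain):
    """R is downward-entailing in its first argument iff R(X, Y) entails
    R(X', Y) for every subset X' of X (checked exhaustively on a toy domain)."""
    return all(not R(X, Y) or R(Xp, Y)
               for X in subsets(domain)
               for Y in subsets(domain)
               for Xp in subsets(X))

dom = {"d1", "d2", "d3"}
print(downward_entailing_first_arg(every, dom))  # True: dogs -> chihuahuas
print(downward_entailing_first_arg(some, dom))   # False: cf. (12)
```

The check for some fails on instances like X = {d1}, Y = {d1} with X' = ∅, paralleling the failure of the entailment from (12a) to (12b).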

21.2.3 Learning Quantifiers: Syntactic Contributions

Having identified some semantic conditions on early quantifier meanings, we now turn to the question of how children identify which words are quantificational to begin with. Wynn (1992; see also Condry and Spelke 2008) found that children at the age of 2½ years, who do not yet understand the relationship between the words in the count list and exact cardinalities, nevertheless understand that the number words describe numerosity. This result is striking in light of the observation that it takes children another full year to gain the knowledge of which exact quantities are associated with any particular number word (Wynn 1992; Carey 2009). Examining the distribution of numerals in the CHILDES database of child-directed speech, Bloom and Wynn (1997) proposed that the appearance of an item in the partitive frame (e.g. as X in [X of the cows]) was a strong cue to number word meaning. Considering a sentence like (13) with the novel word gleeb, it is plain to the adult speaker of English that this word cannot describe anything but a numerical property of the set of cows (13a–e).

(13) Gleeb of the cows are by the barn.
     a. *Red of the cows are by the barn.      (*color)
     b. *Soft of the cows are by the barn.     (*texture)
     c. *Big of the cows are by the barn.      (*size)
     d. Many of the cows are by the barn.      (approximate number)
     e. Seven of the cows are by the barn.     (precise number)

In other grammatical frames, such strong intuitions are not observed: adult English speakers allow for the novel word in (14) to describe any number of properties that might be instantiated by a group of cows (14a–e).

(14) The gleeb cows are by the barn.
     a. The red cows are by the barn.
     b. The soft cows are by the barn.
     c. The big cows are by the barn.
     d. The many cows are by the barn.
     e. The seven cows are by the barn.

Adults, of course, have had a lifetime of language experience, and so their intuitions do not yet inform our understanding of what would compel a child to decide what meaning the speaker of the sentences in (13) or (14) had in mind. Syrett et al. (2012) tested the hypothesis that the partitive frame (i.e. [X of the cows]) is a strong cue to quantity-based meanings (cf. Jackendoff 1977). If it were, then embedding a novel word in this frame should lead children to pick a quantity-based interpretation in cases when both this and an alternative, quality-based interpretation were available. In Syrett et al.'s (2012) word-learning task, they restricted the potential referents for the novel word pim to the quantity TWO and the quality RED. They found that the partitive predicted quantity-based judgments only in restricted cases, casting doubt on the robustness of a syntactic bootstrapping account based on the partitive as a strong cue. These authors went on to observe that in child-directed speech, a great variety of non-quantity-referring expressions occur immediately preceding of. For example, nouns referring to a geometrical feature of an object naturally occur in that position:

(15) the back/front/side/top of the refrigerator

Similarly, measure expressions also occur immediately before of:

(16) a. an hour/mile of the race
     b. three pounds/buckets of fruit

They suggest, therefore, that occurring immediately before of may not be as strong a cue to quantity-based meanings as suggested by Bloom and Wynn (1997).

Wellwood et al. (2014) extend this work, arguing that the relevant cue is more abstract: being a determiner. A partitive construction in English of the form [ __ of the NP], where nothing occurs to the left of the open slot in the frame, is a strong cue to being a determiner, which in turn restricts the interpretation of a word in that slot to quantity-based meanings. Note that the geometrical or measure expressions in (15)–(16) do not occur naturally in this context; when they occur before of, they require a determiner to their left. Wellwood et al. suggest that this is a powerful cue to quantity-based meanings, but that children in Syrett et al.'s experiment failed to use the cue because the potential quantity meaning was a particular number, which we independently know is difficult for children at the relevant age to acquire (Gelman and Gallistel 1978; Carey 2009). Thus, Wellwood et al. examine whether children can use determiner syntax as a cue for quantity-based meanings that are not restricted to a specific number (cf. Barner, Chow, and Yang 2009). These authors exposed children to positive examples of a novel word, gleebest, in three syntactic contexts:

(16) a. Gleebest of the cows are by the barn.
     b. The gleebest cows are by the barn.
     c. The gleebest of the cows are by the barn.

Each use of gleebest was also paired with a scene that confounded two properties: the cows by the barn being most numerous (relative to a set of cows not by the barn) and the cows by the barn being the most spotted (again relative to cows not by the barn). These properties were later deconfounded such that the most numerous cows were not the spottiest, and children were asked to identify which set of cows was gleebest. They found that children systematically interpreted gleebest as referring to quantity when it occurred as a determiner and that they interpreted it as referring to spottiness when it occurred in the other syntactic frames.
Thus, they concluded that children are able to use the syntactic position of a novel word to determine whether it is a quantifier. In particular, identifying a novel word as being a determiner leads to the conclusion that it has a quantity-​based interpretation. Summarizing this section, we have seen that children have the cognitive resources to support quantificational meanings early in development. Approximate number representations and set representations are available to be recruited for possible quantifier meanings prior to the acquisition of precise number representations. Moreover, children are restricted in the space of possible relations between sets that they consider as potential quantifier meanings. Finally, children are able to use the syntax of a novel word in order to identify that it is a quantifier and to restrict its interpretation appropriately.


21.3  The Syntax and Semantics of Children's Quantifiers

Having established the cognitive and linguistic constraints on early quantifier acquisition, we now turn to the behavior of quantifiers in syntactic contexts to see the degree to which children's quantificational syntax and semantics align with adults'.

21.3.1 Beyond Exactness in the Acquisition of Number

Many sentences containing number words assert either lower bounded or upper bounded interpretations of the number (Horn 1972). For example, in (17a) the students who can go home early must have read at least two articles. Students who read three or more can also go, and so the sentence illustrates a lower bounded interpretation of two.

(17) a. Every student who read two articles can go home early.
     b. Prisoners are allowed to make three phone calls.

In (17b), the rule gives prisoners permission to make up to three phone calls and no more (Carston 1998). Hence, this illustrates an upper bounded interpretation of three. The sentential meanings in (17) involve an interaction between the lexical semantics of the number words, the lexical semantics of the other quantifiers or modals in the sentence, and the procedures for deriving pragmatic meaning from sentence meaning (Horn 1972; Carston 1998; Breheny 2008; Kennedy 2013). There is considerable debate about the lexical contribution of number words (Horn 1972, 1992; Barwise and Cooper 1981; Sadock 1984; Krifka 1998a; Breheny 2008; Kennedy 2013) but what is clear is that sentences containing number words do not always convey exact (i.e. simultaneously upper and lower bounded) interpretations. Papafragou and Musolino (2003) examined the meaning of number words in comparison to other scalar terms. They created contexts in which the use of a weaker term (some, two) was true but also compatible with a stronger term (all, three). For example, they showed contexts in which three horses attempted and succeeded at jumping over a fence and then asked participants to judge a puppet's utterance of "some/two horses jumped over the fence." Adults rejected both utterances, presumably because pragmatics dictates that the puppet should have used the stronger term. Interestingly, though, 4- and 5-year-old children rejected the numeral but not some.
That is, children required an exact use of the number word in this context, just like adults, even though they failed to enrich the meaning of some to some but not all in this context, unlike adults. Moreover, this result suggests that the pragmatic mechanisms that restrict a quantifier’s interpretation in context are different for numbers than they are for some (see Chapter 26 by Papafragou and Skordos in this volume).

Musolino (2004) went on to show that children are not restricted to exact interpretations of number words, however. He showed that children do allow lower bounded and upper bounded interpretations when those interpretations are licensed by an interaction between the number word and a modal. For example, children observed a game in which a troll tried to put hoops on a pole and a judge (Goofy) set the rules of the game. The troll successfully put 4 out of 5 hoops on the pole. In the at least condition, the experimenter said, "Goofy said that the troll had to put two hoops on the pole to win. Did the troll win?" In this case, adults interpreted the condition for winning as putting at least two hoops on the pole and so responded that the troll does win in this case. Four- and 5-year-old children showed the same pattern, indicating that nonexact readings are licensed for children. In this case, even though he put 4 and not (exactly) 2 hoops on the pole, the troll won the prize. In the at most condition, the experimenter said, "Goofy said that the troll could miss two hoops and still win. Did the troll win?" Here, adults interpreted the condition for winning to be missing no more than two hoops, and so answered "yes." And again, 4- and 5-year-old children showed the same pattern, accepting an outcome of missing one ring as compatible with the use of two. Thus, children are able to take into account interactions between number words and modals in determining whether a sentence expresses an at least or an at most reading of the number word. In still other cases, number words occur in sentences where reference to an exact number is not required, because of scope interactions (Barwise 1979). For example, a sentence like (18) has several possible readings.

(18) Three boys held two balloons.

This sentence is compatible with four readings. First, consider two (scope-independent) readings in which the total number of boys and balloons match those in the expression. On the "each-all" reading, there are three boys who, together, hold two balloons. That is, each of the three boys is holding all of the balloons and all of the balloons are held by each boy. This contrasts with the "cumulative" reading, where a total of two balloons are held by a total of three boys, though no one balloon is held by more than one boy. This reading is exemplified by a situation with one boy holding one balloon and another boy holding two. But there are two more readings in which, because of scopal interactions between the number words, the number of boys and balloons picked out may be different from the numbers used in the sentence. On the wide-scope subject interpretation, illustrated by a situation in which three boys hold two balloons each, the sentence identifies three boys but six balloons. Finally, on the wide-scope object interpretation, exemplified by a situation with two sets of three boys each holding one balloon, the sentence identifies two balloons but six boys. In these kinds of cases, the meanings of the number words interact in such a way as to hide their precise meanings in the final interpretation. That is, although a number word like three might lexically pick out sets of three things, when

these words occur in complex linguistic contexts, the situations they define may not always have a single set of three things in them. Musolino (2009) examined whether children allowed these kinds of interactions between quantifiers, or whether they were restricted to interpreting them in such a way that, for example, every use of "three" picked out precisely three things. To the extent that children allowed the interactions, this would show that their knowledge of number words included whatever syntactic or semantic properties make such interactions possible, beyond the lexical semantic contribution of the number word. Musolino found that children were able to access the wide-scope subject interpretation and the each-all interpretation at adultlike levels, but that they showed more difficulty with the wide-scope object and cumulative interpretations. Children's acceptance of the wide-scope subject interpretation shows that they are aware of the effects of scope in number word interpretation: in such contexts, even though the sentence mentions only two balloons, it ultimately refers to six. The fact that children are able to access this interpretation reveals the complexity of their knowledge of the syntax–semantics interface for number words. Interestingly, children's difficulties with the cumulative and wide-scope object interpretations appeared to be an exaggeration of adult preferences. Adults also preferred the each-all and wide-scope subject interpretations, though they were able to access all four readings. These preferences may be related to children's independently observed difficulties with inverse scope interpretations, an issue we turn to now.
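The two scope-dependent readings of (18) can be sketched in simplified first-order terms (our notation, glossing over the internal semantics of the numerals):

```latex
% Wide-scope subject: three boys each hold two (possibly different) balloons
\exists X\,\big[\,|X| = 3 \wedge \forall x \in X\ \exists Y\,[\,|Y| = 2 \wedge \mathrm{hold}(x, Y)\,]\,\big]

% Wide-scope object: two balloons each held by a (possibly different) group of three boys
\exists Y\,\big[\,|Y| = 2 \wedge \forall y \in Y\ \exists X\,[\,|X| = 3 \wedge \mathrm{hold}(X, y)\,]\,\big]
```

On the first formula the sentence commits the speaker to three boys but potentially six balloons; on the second, to two balloons but potentially six boys.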

21.3.2 Isomorphism and the Scope of Negation

Consider the ambiguous sentences in (19) and (20) along with their potential paraphrases.

(19) Every horse didn't jump over the fence.
     a. Every horse failed to jump over the fence.
     b. Not every horse jumped over the fence.

(20) The Smurf didn't catch two birds.
     a. It is not the case that the Smurf caught two birds.
     b. There are two birds that the Smurf didn't catch.

In each case, two scope readings are possible, as indicated by the paraphrases. In (19), when the quantified subject is interpreted outside the scope of negation, the sentence can be paraphrased as (19a), equivalent to none of the horses jumped over the fence. This reading is called an isomorphic interpretation, since the scope relation between the quantified subject and negation can be read directly off their surface syntactic positions. Example (19) can also be paraphrased as in (19b), in which the quantified subject is interpreted within the scope of negation. This is called a non-isomorphic interpretation, since in this case surface syntactic scope and semantic scope do not

coincide. Similarly, (20) also exhibits an isomorphic interpretation (20a) as well as a non-isomorphic interpretation (20b). Several studies on the acquisition of quantification have shown that when given a Truth Value Judgment (TVJ) task, preschoolers, unlike adults, display a strong preference for the isomorphic interpretation of sentences like (19)–(20) (Musolino 1998; Musolino et al. 2000; Lidz and Musolino 2002; Musolino and Gualmini 2004; Noveck et al. 2007; among others). This is what Musolino (1998) called "the observation of isomorphism." Isomorphism effects have been found in several languages (Lidz and Musolino 2002, 2006; Noveck et al. 2007; Han et al. 2007).1

Lidz and Musolino (2002) examined sentences containing a quantifier and negation in Kannada in order to determine whether isomorphism should be described in structural or linear terms. Kannada provided a good testing ground because in that language, unlike English, linear order and syntactic height can easily be deconfounded. For example, in (21) the quantifier in object position precedes negation, but negation c-commands the quantifier.

(21) vidyaarthi eraDu pustaka ooD-al-illa   (Kannada)
     student    two   book    read-inf-neg
     'The student didn't read two books.'

Hence, if isomorphism were structurally driven, we would expect wide scope for negation; if it were based on linear order, we would expect wide scope for the object. Lidz and Musolino found that children assigned wide scope to negation, suggesting that the isomorphism effect should be described in structural, not linear, terms.
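Schematically, the two construals of a sentence like (19) correspond to the following truth conditions (our simplified notation):

```latex
% Isomorphic (surface) scope: every > not, i.e. "none of the horses jumped"
\forall x\,[\mathrm{horse}(x) \rightarrow \neg\,\mathrm{jumped}(x)]

% Non-isomorphic (inverse) scope: not > every, i.e. "not every horse jumped"
\neg\,\forall x\,[\mathrm{horse}(x) \rightarrow \mathrm{jumped}(x)]
```

Note that the first reading asymmetrically entails the second, so a context in which some but not all horses jumped distinguishes them: it verifies only the non-isomorphic construal.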

21.3.2.1 Explaining Isomorphism

In order to account for the observation of isomorphism as it pertains to universally quantified NPs and negation, Musolino et al. (2000) observe that in Chinese, the equivalent of a sentence like (19) allows only an isomorphic, that is, "none" interpretation. They argue that learners should universally consider a Chinese-type grammar first, so as to avoid the potential problem of having to retract from a more permissive grammar (Berwick 1985; Wexler and Manzini 1987; Pinker 1989; Crain 1991; Crain and Thornton 1998; Goro 2007). For Chinese learners, this would be the correct grammar, but English learners at this stage must ultimately move to a more general grammar on the basis of experience. However, this analysis of children's isomorphism depends on the effect being due to the grammar and not to other factors having to do with the mechanics of ambiguity resolution. Indeed, Musolino and Lidz (2006) showed that English-learning 5-year-olds

1  One exception to this concerns indefinite object NPs in Dutch, which seem to be restricted to narrow scope for children, possibly because these indefinites are interpreted as property-denoting (Krämer 2000).

do not have a hard-and-fast ban against non-isomorphic interpretations. When such sentences occur in contrastive contexts like (22), the isomorphism effect is weakened.

(22) Every horse jumped over the log but every horse didn't jump over the fence.

Viau et al. (2010), building on Gualmini (2008) and Gualmini et al. (2008), argued that this weakening of isomorphism arose not from the form of (22), but rather from the pragmatics of negation. In particular, they argued that negative sentences are used to negate expectations that are established in the discourse context. The successful jumping events associated with the first conjunct in (22) are sufficient to create the expectation that every horse would also jump over the fence, the negation of which is the non-isomorphic interpretation of the second conjunct. Indeed, they found that such contexts, even without an explicit contrast, reduced the amount of isomorphism exhibited by preschoolers. Together, these studies argue that observations of isomorphism do not reflect grammatical knowledge tout court. Rather, they arise in children whose grammars generate both interpretations of such ambiguous sentences, and they reflect aspects of an immature ambiguity resolution process.

However, to say that the discourse context surrounding the use of sentences containing quantifiers and negation can impact ambiguity resolution does not yet tell us how discourse contributes to ambiguity resolution, or whether it is the only factor that does so. Viau et al. (2010) addressed the latter question by demonstrating that factors beyond discourse can impact children's interpretations. Specifically, they showed that experience with non-isomorphic interpretations can lead children to access those interpretations even in suboptimal discourse contexts. Using a priming manipulation, they showed that children who heard scopally ambiguous sentences in contexts that were highly supportive of the non-isomorphic interpretation both accessed those interpretations more often and carried that experience over to less supportive discourse contexts.
Similarly, they showed that experience with unambiguous sentences like (23a), which are synonymous with the non-isomorphic interpretation of (23b), also carried over to the ambiguous cases, leading to higher rates of non-isomorphic interpretations.

(23) a. Not every horse jumped over the fence.
     b. Every horse didn't jump over the fence.

Lidz and Musolino (2003) and Musolino and Lidz (2003, 2006) argued that the isomorphic interpretation is the first interpretation that children access, and that revising initial interpretations is difficult for children (Trueswell et al. 1999; Leddon and Lidz 2006; Conroy 2008; Omaki et al. 2014). Moreover, discourse factors can help the revision process. Support for this view comes from several adult studies demonstrating that children's only interpretation corresponds to adults' preferred or initial interpretation (Musolino and Lidz 2003; Conroy et al. 2008). For example, Musolino and Lidz (2003) presented sentences like (24) in contexts that were equally compatible with both interpretations, for example one in which there were a total of three birds, only one of which was caught. Adults in such contexts explained that the sentence was true because the Smurf only caught one

bird, illustrating that they had accessed the isomorphic interpretation. They did not say that the sentence was true because of the two uncaught birds, which verify the non-isomorphic interpretation.

(24) The Smurf didn't catch two birds.

Similarly, Conroy et al. (2008) asked adults to complete sentence fragments like (25), after hearing a story in which no boys painted the barn and only some of the boys painted the house.

(25) Every boy didn't paint the ___

When participants were asked to complete the sentence under time pressure, they gave 80 percent surface scope responses (completing the sentence with barn). But without time pressure, they were equally likely to say either barn (surface scope) or house (inverse scope). Together, these findings suggest that adults' initial interpretation of such sentences corresponds to the only interpretation that children arrive at, pointing to revision difficulty as a major contributor to children's bias (Trueswell et al. 1999; Conroy 2008; Omaki and Lidz 2014). The priming results of Viau et al. (2010), discussed earlier in this section, further support the view that isomorphic interpretations come first and need additional support to be overridden: by increasing the baseline likelihood of the non-isomorphic interpretation, priming makes revision away from the isomorphic interpretation easier.

Lidz et al. (2008) also showed that the isomorphism effect can be modulated by structure inside the quantified nominal, comparing sentences like (26a) and (26b). They found that in discourse contexts that made the isomorphic reading true and the non-isomorphic reading false, children accessed the isomorphic reading for both. However, in contexts that made the isomorphic reading false and the non-isomorphic reading true, only children who heard (26b) accessed the non-isomorphic interpretation.

(26) a. Piglet didn't feed two Koalas.
     b. Piglet didn't feed two Koalas that Tommy fed.
The content of the relative clause here seems to focus children's attention on the contrast between the Koalas that Tommy fed and those that Piglet fed, leading to a higher availability of the non-isomorphic interpretation. This result also fits with the view that isomorphic interpretations reflect children's initial interpretations: difficulty revising can be overcome by making the cues to revision salient, as the relative clause in (26b) does. Lidz and Conroy (2007) found a similar effect in the "split-partitive" construction in Kannada, illustrated in (27).

(27) avanu ii    seebu-gaL-alli eradu orey-al-illa
     he    these apple-pl-loc   two   peel-inf-neg
     'He didn't peel two apples.'

Here, the noun phrase that the number word quantifies over occurs with locative case outside of the VP and does not form a constituent with the number word. Nonetheless, such sentences are scopally ambiguous, though adults report a preference for the number to scope over negation. Lidz and Conroy (2007) compared the split-partitive (27) against canonical quantified sentences like (21) and found that (holding discourse context constant) children accessed only the non-isomorphic interpretation of the split-partitive sentence, and only the isomorphic interpretation of the canonical sentence. This suggests, first, that isomorphism effects are not merely effects of discourse context and, second, that drawing attention to a contrast set linguistically helps to override isomorphism effects. Moreover, in a priming design, these authors found that experience with the split-partitive helped children access the non-isomorphic interpretation of canonical sentences, but that the reverse did not hold: children who were primed with canonical sentences with an isomorphic interpretation did not show an increase in isomorphic interpretations of the split-partitive.

21.3.3 Scope Ambiguities without Negation

Goro (2007), building on Sano (2004) and Marsden (2004), examined children's interpretations of multiply quantified sentences in English and Japanese. Whereas such sentences are ambiguous in English, they are unambiguous in Japanese:

(28) a. Someone ate every food.
     b. Dareka-ga    dono tabemono mo tabeta.
        someone-nom  every food       ate
        'Someone ate every food.'

The surface scope interpretation, in which a single person eats all of the food, is acceptable in both languages. However, the inverse scope interpretation, in which each food is eaten by a different person, is possible in English but not in Japanese. Goro (2007) tested children's and adults' interpretations of these sentences in contexts that made the inverse scope reading true and the surface scope reading false, and which also made the surface scope reading relevant to the context. He found that both English-speaking adults and 5-year-olds accessed the inverse scope at a rate of about 40 percent. This finding suggests that there is a bias for surface scope interpretations in both adults and children, as discussed in this chapter (see also Kurtzman and MacDonald 1993; Marsden 2004; among others). Japanese-speaking adults and children differed, however. The adults, as expected, never accessed the inverse scope interpretation. Children, however, showed acceptance rates similar to those of English-speaking children and adults, suggesting that Japanese learners' early grammar of scope is more permissive than their exposure language.
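In simplified notation (a schematic rendering, not Goro's own formalization), the two construals of (28a) can be stated as:

```latex
% Surface scope: someone > every -- a single person ate all the food
\exists x\,\forall y\,[\mathrm{food}(y) \rightarrow \mathrm{ate}(x, y)]

% Inverse scope: every > someone -- each food eaten by a possibly different person
\forall y\,[\mathrm{food}(y) \rightarrow \exists x\,\mathrm{ate}(x, y)]
```

Since the surface scope reading entails the inverse scope reading, only a context that falsifies the former while verifying the latter (as in Goro's design) can diagnose access to inverse scope.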

The fact that children allow an overly general set of interpretations for multiply quantified sentences in Japanese could potentially introduce a subset problem, since evidence for the impossibility of the inverse scope interpretation is unlikely to occur. Goro (2007) argues that the subset problem is averted if the grammatical rules responsible for the lack of inverse scope are not explicitly represented as such. Instead, Goro argues that scope is restricted because of properties of the nominative case-marker, which when attached to certain indefinites enforces a specific interpretation. Other indefinites, which resist specific interpretations, do not block inverse scope:

(29) Hutari ijyou-no         gakusei-ga  dono kyoujyu-mo  hihan-sita
     two    greater-than-gen student-nom every professor  criticize-did
     'More than two students criticized every professor.'

Thus, the Japanese child who allows inverse scope with an indefinite subject does not have to learn to remove a covert scope-shifting operation from the grammar, but only needs to learn the additional interpretive properties of nominative case marking, from which the restriction against inverse scope follows (see Goro 2007 for details).

21.3.4 Quantifier Raising and Antecedent Contained Deletion

The potential for quantifiers to shift their scope can also be seen in the interaction between quantifiers and ellipsis. The relevant case concerns Antecedent Contained Deletion (ACD), first discussed by Bouton (1970). ACD is a special case of verb phrase ellipsis (VPE) and provides one of the strongest pieces of evidence for the covert displacement operation of QR (Fiengo and May 1994; Kennedy 1997). Elided VPs are generally interpreted as identical in reference to another VP in the discourse context (Hankamer and Sag 1976). For example, in (30), the elided VP (signaled by did) is interpreted as identical to the underlined VP (tense aside).

(30) Lola jumped over every frog and Dora did too.
     = Lola jumped over every frog and Dora did jump over every frog too.

What makes ACD unique, though, is that the elided VP is contained in its antecedent. As illustrated in (31), the elided VP is part of the underlined VP.

(31) Lola jumped over every frog that Dora did.

Thus, if we were to replace the elided VP with the matrix VP, the ellipsis site would remain in the replacement VP:

(32) Lola jumped over every frog that Dora did [jump over every frog that Dora did …]

Any attempt to resolve the ellipsis with this antecedent VP results in another elided VP, ad infinitum. As long as the elided VP is contained in its antecedent, the two VPs cannot possibly be identical and so the ellipsis cannot be properly resolved. The sentence therefore remains uninterpretable as long as the quantified noun phrase remains in situ. An operation of covert displacement, however, averts the infinite regress (May 1977). After movement of the QNP, the elided VP can find a suitable antecedent, as illustrated in (33).

(33) a. Lola jumped over [every frog that Dora did]        (next step: QR)
     b. [every frog that Dora did] Lola jumped over t      (next step: VPE resolution)
     c. [every frog that Dora did [jump over t]] Lola [jumped over t]

These examples illustrate that quantifier raising must apply in ACD environments, because if it did not, there would be no way to assign a meaning to the elided VP. Syrett and Lidz (2009) demonstrate that 4-year-olds successfully interpret simple sentences containing ACD and that they distinguish them from coordinate structures with deletion:

(34) a. Miss Red jumped over every frog that Miss Black did.
     b. Miss Red jumped over every frog and Miss Black did too.

Four-year-olds interpreted (34a) as requiring that the two characters jump over the same frogs, whereas in (34b) they allowed an interpretation on which each character jumps over all of the frogs that she was assigned to jump over, even if they jumped over disjoint sets of frogs. Kiguchi and Thornton (2004), adapting Fox (1999), used the interaction between ACD and the binding principles (Chomsky 1981a) to determine whether children correctly apply QR and whether they target the appropriate landing site for this operation. The authors showed that 4-year-olds, like adults, consistently reject coreference in sentences such as (35).

(35) a. *Darth Vader found her_i the same kind of treasure that the Mermaid_i did.
     b. *[the same kind of treasure that the Mermaid_i did find her_i t] Darth Vader found her_i t

To identify whether the source of this response pattern was a Principle C violation at S-structure (because the name is c-commanded by the pronoun) or a Principle B violation at LF (because the pronoun is c-commanded by the name), the authors showed that 4-year-olds, who typically obey Principle C (Crain and McKee 1985; see also Baauw, this volume, for review), allow coreference between a VP-internal pronoun and a name that it c-commands on the surface, as in (36).

(36) Dora gave him_i the same color paint the Smurf_i's father did.

Here, the only way to avert the violation of Principle C that would obtain at S-structure is to QR the QNP "the same color paint the Smurf's father did," so that at LF (after QR) the NP "the Smurf" is no longer in the c-command domain of the pronoun "him." Unlike in (35), the name (here, in possessor position) does not c-command the pronoun at LF. This derivation is illustrated in (37):

(37) a. Dora gave him_i [the same color paint the Smurf_i's father did]
     b. [the same color paint the Smurf_i's father did] Dora gave him_i t
     c. [the same color paint the Smurf_i's father did [give him_i t]] Dora [gave him_i t]

The authors argued that the lack of Principle C effects in such cases provides support for children's ability to apply QR. Children's responses to sentences like (35) must therefore derive from an LF Principle B violation and not from an S-structure Principle C violation. This conclusion, then, entails that children are able to apply QR in order to resolve ACD (cf. Fox 1999). Kiguchi and Thornton (2004) further argued that while children's grammars allow QR to target a VP-external landing site, this movement is restricted to a position lower than the subject. Support for this claim comes from the fact that children allow coreference in (38a) but reject it in (38b), where there is a Principle C violation at LF.

(38) a. *He_i jumped over every fence that Kermit_i tried to.
     b. *He_i [every fence that Kermit_i tried to jump over] [jumped over t]

Syrett and Lidz (2011) probed further the question of the landing site for quantifier raising. These authors asked whether, in multiclause sentences containing ACD, each of the VPs in the sentence is available as an antecedent of the elided VP. They examined both infinitival and finite complements, as in (39)–(40).

(39) Kermit wanted to drive every car that Miss Piggy did.
     = that Miss Piggy drove (embedded)
     = that Miss Piggy wanted to drive (matrix)

(40) Goofy said that Scooby read every book that Clifford did.
     = that Scooby read (embedded)
     = that Clifford said that Scooby read (matrix)

In the case of nonfinite complements (39), they found that children could access both potential interpretations, suggesting that both the matrix and embedded VPs are available as landing sites for QR. In the case of finite complements, these authors found that 4-year-olds are more permissive than adults in allowing a quantificational NP to scope outside of a tensed clause: 4-year-olds systematically accessed both interpretations of (40), unlike adults, who accessed only the embedded VP interpretation.
Syrett and Lidz (2011) argue that children's permissive interpretations do not result from their having acquired the wrong grammar, but rather reflect differences in the memory

processes that control antecedent retrieval in on-line understanding (cf. Martin and McElree 2008).

In sum, studies on the syntax of quantification and scope suggest that preschoolers have access to the same grammatical resources as adults, having placed their newly acquired quantifiers in a rich syntactic and semantic system that gives rise to complex interactions among those quantifiers. The cases where children appear to be less permissive than adults seem to derive from difficulties revising initial interpretations. Cases where children appear to be more permissive than adults also seem to involve the interaction of independent grammatical or processing factors.

21.4 Conclusions

The study of quantification in child language has revealed several important insights. First, children's initial hypotheses about quantifier meanings are informed by syntactic principles governing the link between word meanings and linguistic categories. Second, early acquisition of quantifiers is informed by constraints on possible quantifier meanings and by the cognitive mechanisms through which these meanings can be evaluated. Third, children's knowledge of quantifiers includes the capacity for meaning interactions implemented via syntactic movement. Cases where children differ from adults are explained by two aspects of development: (a) the on-line information processing mechanisms through which sentence interpretation is reached and (b) interaction with pragmatic reasoning. Immature processing mechanisms can limit the capacity to revise initial interpretations or affect the memory retrieval processes through which interpretations arise in real time. In addition, children's immature pragmatic abilities can lead them to fail to use information in the context or in the linguistic signal to revise or restrict their initial interpretations.

Chapter 22

The Acquisition of Binding and Coreference

Sergio Baauw

22.1 Introduction

Since the early 1980s, several studies have appeared in which children were tested on their knowledge of the principles that prevent pronouns from referring to the local c-commanding antecedent Peter in (1a), force reflexives to always refer to the local c-commanding antecedent Peter in (1b), and ban DPs, such as Peter in (1c), from referring to any c-commanding antecedent (Jakubowicz 1984; Wexler and Chien 1985; Chien and Wexler 1990; Koster 1993; among many others).

(1) a. John said that Peter hit him.        [him ≠ Peter]
    b. John said that Peter hit himself.    [himself ≠ John]
    c. He said that Mary hit Peter.         [Peter ≠ he]

In general, these studies showed that children have problems with the interpretation of pronouns, often allowing identification of the pronoun with a local c-commanding antecedent in sentences such as (1a), a phenomenon known as the Delay of Principle B Effect (DPBE). Performance on reflexives (1b), on the other hand, appeared to be much more adultlike. More recent studies indicate that children's interpretation of pronouns in so-called Principle C contexts, such as (1c), is highly adultlike too. In this chapter we will discuss several approaches offered in the literature to explain children's performance on referential dependencies. We will discuss cross-linguistic variation and the structure sensitivity of children's interpretation of pronouns. The discussion of children's performance on pronouns will be followed by a discussion of their performance on reflexives.


22.2 Binding Theory

Pronouns and DPs can establish referential dependencies with other elements in the same sentence. Chomsky (1981a) argued that the ability of pronouns, reflexive pronouns (anaphors), and DPs (R-expressions) to be identified with other DPs within the same sentence is constrained by three binding principles:

(2) Standard Binding Theory (Chomsky 1981a: 188)
    Principle A: an anaphor is bound within its governing category
    Principle B: a pronoun is free in its governing category
    Principle C: an R-expression must be free

Binding is defined as coreference under c-command; that is, the binder must c-command the bindee. This means that reflexive pronouns (anaphors) must always establish a referential dependency with a DP that is within the same governing category and c-commands the reflexive. This rules out binding of the reflexive by John in (3a), because of the lack of c-command, and in (3b), because the binder is outside the governing category of the reflexive.

(3) a. [John_j's neighbor]_i hit himself_i/*j.
    b. John_j thought that Peter_i criticized himself_i/*j.

Pronouns, on the other hand, can be bound by a c-commanding DP, as long as it is outside the pronoun's governing category (4a). They can also corefer with a DP if this DP does not c-command the pronoun, as in (4b). In that case, the pronoun is not bound by the DP, although it corefers with it.

(4) a. John_i thought that Peter_j hit him_i/*j.
    b. [John_i's neighbor]_j hated him_i/*j.

Finally, Principle C explains why (5b) is grammatical, while (5a) is not: in (5a) the pronoun c-commands the DP John, but not in (5b), to the effect that coreference is an option.

(5) a. *He_i said that Mary hit John_i.
    b. Because he_i hated Mary, John_i left the party before it was finished.

The status of Principle C as an independent grammatical principle is controversial. Chomsky (1982) suggested that Binding Theory may be limited to Principles A and B. Reinhart (1983) has argued that Principle C effects are caused by pragmatic constraints on pronoun reference.
More recently, Schlenker (2005) proposed that Principle C can be derived from a Gricean maxim of minimization. The grammaticality of the coreferential readings in (4b) and (5b) shows that not every referential dependency involves binding. In fact, a distinction is made between binding

and coreference. Technically, in the case of coreference, pronouns are interpreted as free variables, which receive their value from the (linguistic) context. When the semantic value of a pronoun and the value of a local DP refer to the same individual, as in (6a), we speak of coreference. Binding, on the other hand, involves QR (Quantifier Raising) of the binder, after which the binder, in this case the subject, binds its trace and the pronoun through λ-conversion, as in (6b) (Heim and Kratzer 1998):1

(6)

a. Mary is touching her        (Mary = M, her = M)
   Coreference: Mary and her happen to have the same semantic value, that is, they refer to the same individual in the world ("M").
b. λx (x is touching x) (Mary)   (Mary = M)
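Schematically, the bound reading in (6b) is obtained by λ-conversion (a simplified Heim-and-Kratzer-style rendering):

```latex
(\lambda x.\ \mathrm{touch}(x, x))(\mathrm{Mary}) \;\Rightarrow\; \mathrm{touch}(\mathrm{Mary}, \mathrm{Mary})
```

Because the subject's trace and the pronoun are translated as the same variable x, a single application of the function to the subject values both argument positions at once; unlike in the coreference case (6a), no appeal to context is needed.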

We can say, then, that while binding involves establishing a referential dependency between two lexical items in the semantics, coreference involves establishing a referential dependency at the level of pragmatics or discourse. Binding also requires c-command; coreference does not. In fact, coreference is not normally an option when a personal pronoun is identified with a local c-commanding DP, as in (1a). However, there are exceptions. In some contexts pronouns can be identified with a local c-commanding DP, as in (7):

(7) a. Everybody hates John. Mary hates him, Peter hates him, even John hates him.
    b. Do you know what Mary and John have in common? Mary admires him and John admires him too.

Cases of local coreference such as those in (7) are often called "pragmatic coreference," since they are pragmatically constrained. An influential formulation of this constraint

1  Evidence for the existence of two mechanisms for establishing referential dependencies is provided by the two different readings that pronominal elements have in VP-ellipsis sentences such as (i) (Keenan 1971; Sag 1976; Williams 1977):

(i) John pushed his sister and Paul did too.
    a. John_i pushed his_i sister and Paul_j pushed his_j sister too. (sloppy reading)
    b. John_i pushed his_i sister and Paul_j pushed his_i sister too. (strict reading)

In the sloppy reading, the possessive pronoun is a bound variable, which means that the pronoun in the overt VP and the pronoun in the elided VP are bound by the subjects of their own clauses, which directly determine the pronouns' interpretation (John binds his in the first clause, Paul binds his in the second clause). In the strict reading, the pronoun receives a fixed interpretation (John), which "accidentally" corresponds with the reference of the subject of the first clause. This means that the interpretation of his in the second clause corresponds with, but is not determined by, John. We therefore say that John and his corefer, but John does not bind his (Conroy et al. 2009).

is Grodzinsky and Reinhart's (1993) Rule I. We will discuss Rule I, and its relevance for acquisition, in section 22.3.1.

Although Chomsky's Binding Theory accounts relatively well for the distribution of English pronouns and reflexives, it does not fare so well when it has to account for the distribution of pronouns and reflexives in other languages (Everaert 1986). What is problematic for Chomsky's Binding Theory is the fact that many languages, including well-studied ones such as Dutch and Spanish, have more than one type of reflexive pronoun. Dutch, for instance, has so-called SE reflexives, such as zich, as well as complex or SELF reflexives, such as zichzelf, which have a different distribution and semantics (Jackendoff 1992; Lidz 1997). Problems also arise for long-distance anaphors, such as Japanese zibun and Chinese zìjǐ, whose behavior challenges the binding principles as formulated by standard Binding Theory. This has led to modifications of Binding Theory and to the formulation of alternatives, such as Reinhart and Reuland's (1993) Reflexivity model. This latter model in particular has been shown to be useful in explaining some binding facts in Dutch and other European languages, and it has also been successful in explaining some of the acquisition results in these languages, as we will see in section 22.3.7.

22.3  Interpretation and Production of Pronouns

22.3.1 The DPBE: Binding versus Coreference

An important claim about Principle B delays since Chien and Wexler (1990) is that children's problems with the interpretation of pronouns do not indicate a violation of Principle B. Chien and Wexler (1990) based this conclusion on an experiment in which they used a version of a Picture Verification Task. In this task 177 English-speaking children between 2;6 and 7 were presented with pictures of Mama Bear and Goldilocks performing several actions. In the items testing knowledge of Principle B one of the individuals, Mama Bear, was touching herself. When the children were asked Is Mama Bear touching her?, it turned out that 5- to 6-year-old children answered yes around 50 percent of the time. However, other pictures presented a situation in which several bears standing in front of Goldilocks were touching themselves. When asked Is every bear touching her?, 5- to 6-year-old children answered yes only 15 percent of the time.

(8) a. Mama Bear is touching her    (50% targetlike performance)
    b. Every bear is touching her    (85% targetlike performance)

524   Sergio Baauw

Chien and Wexler (1990) argue that these results indicate that children do not violate Principle B when they accept (8a). They argue that unlike referential DPs, such as Mama Bear, which can either bind a pronoun or corefer with it, quantified DPs, such as every bear, can only enter a binding relation with a pronoun. Since binding would result in a violation of Principle B, children will generally reject a reflexive interpretation of (8b). Since the reflexive interpretation of (8a) can also be obtained by coreference, acceptance of this reading will not automatically lead to a violation of Principle B. Chien and Wexler conclude that children's acceptance of the reflexive interpretation of (8a) indicates that they often violate the pragmatic constraints on local or "pragmatic" coreference, whereas they seem to respect syntactic constraints such as Principle B.

A similar line of reasoning was followed by Avrutin and Thornton (1994), on the basis of sentences such as (9a):

(9) a. [The Smurf and the clown] dried them.
    b. [The Smurf and the clown] dried Big Bird.

Following Heim et al. (1991), Avrutin and Thornton claim that in (9b) the Smurf and the clown can be interpreted collectively (the Smurf and the clown dry Big Bird together, with a big towel) or distributively (the Smurf and the clown are part of different drying events). Avrutin and Thornton further claim that in the distributive reading the Smurf and the clown behave as a quantified DP. As a result, the reflexive interpretation of (9a) can in principle be construed in two ways too: as a binding configuration, in which case the Smurf and the clown are interpreted distributively, or as a coreference construal, in which case the Smurf and the clown are interpreted collectively. In the adult language both interpretations are ruled out; the former because it violates Principle B, the latter because it violates constraints on local coreference. If children allow local coreference across the board, but respect Principle B, they are expected to allow a reflexive-collective interpretation, but to reject the reflexive-distributive interpretation.
Avrutin and Thornton tested this hypothesis with a Truth Value Judgment task (TVJ) in which 33 3- to 4-year-old children participated. The results showed that the 17 children who allowed the reflexive interpretation of (9a) accepted the reflexive interpretation 93 percent of the time in the collective contexts, but only 42 percent of the time in the distributive contexts, which according to Avrutin and Thornton confirms the idea that the DPBE involves non-adult-like coreference.

However, Chien and Wexler's (1990) claims were strongly weakened by Elbourne (2005). Elbourne argues that the contrast found between sentences with quantified subjects and sentences with referential subjects does not show up in all studies (Lombardi and Sarma 1989; Boster 1991).2 He further argues that the contrast found by Chien and Wexler (1990) and some other studies is an experimental artifact; the referential and quantificational test conditions differed considerably, to the effect that in the quantificational conditions the quantified DP was not a prominent antecedent. This had the effect that children avoided reference to it (see also Conroy et al. 2009 and our discussion

2  Philip and Coopmans (1996b) only found a mild improvement of the results in Dutch sentences containing quantified subjects. They attribute this to semantic properties of the Dutch quantifier iedere (every/​each/​any) that was used in the test sentences.

in section 22.3.4). Elbourne also argues that Avrutin and Thornton's 42 percent acceptance of the distributive reading is too high to claim that children reject local binding. Elbourne (2005) concludes that young children do go through a stage in which they accept violations of Principle B.

Opinions differ not only with respect to the nature of children's problems with pronoun interpretation, but also with respect to the cause of these difficulties. In general, three kinds of approach can be distinguished in the literature: (i) those that consider the DPBE the result of incomplete acquisition of some grammatical rule or feature, (ii) those that see the DPBE as the result of limited processing resources, and (iii) those that see the DPBE as (largely) the result of an experimental artifact.

22.3.2 DPBE as Incomplete Acquisition

Early accounts viewed the DPBE as the result of the absence of Principle B in the grammar of young children (Chien and Wexler 1987). However, by the end of the 1980s there was consensus about children's adherence to the Binding Principles. Errors in the realm of binding and coreference were attributed either to parametrically determined properties of the grammar or to extra-syntactic factors.

An example of the first type of approach is McKee's (1992) account of the DPBE. This account is based on Manzini and Wexler's (1987) parametrization of the notion Governing Category (GC) of the Standard Binding Theory. In Chomsky (1981a) the GC of a pronoun or DP was defined as the minimal XP that contains the pronoun or DP, a subject accessible to the pronoun or DP, and its governor. Manzini and Wexler (1987) propose that languages may differ with respect to the exact definition of the GC. As a result the domain in which a pronoun needs to be free may differ from language to language. McKee (1992) proposes that children's apparent violations of Principle B are the result of an incomplete parameter-setting process. She proposes that children initially consider the VP to be the GC of a pronoun. As a result the pronoun can be bound by the subject, which she assumes to be outside the VP, in [Spec, IP]. Children start performing adultlike when they start taking the IP as the language-specific GC of English.

A different approach is taken by Chien and Wexler (1990). They show, on the basis of children's different performance on (8a) and (8b), that the DPBE involves non-adult-like acceptance of coreference, not binding, as McKee assumes. The reason why children accept local coreference is the incomplete acquisition of Principle P, a pragmatic rule that regulates local coreference. Chien and Wexler argue that, unlike Principle B, which they take to be innate, pragmatic rules need to develop, which may take several years.
Elbourne (2003; cited by Elbourne 2005) argued that Principle B may be subject to parametrization, since in some languages, such as Old and Middle English, pronouns can refer to local c-commanding antecedents. Young children may initially have the same parameter setting as in Old English. A similar proposal was made by Fodor (1992; cited by Elbourne 2005), who proposed that English children might initially treat him/her as ambiguous between a real pronoun (sensitive to Principle B) and a weak reflexive

as zich in Dutch. In that sense, English children might resemble Frisian adults, since in Frisian pronouns can show up in reflexive contexts where Dutch requires weak reflexives (i.e. zich) instead of pronouns.

Accounts that assume incomplete acquisition often have difficulty explaining why children's rejection of local coreference is optional instead of 0 percent, which would be expected if some linguistic principle is absent or if a parameter is set incorrectly. Also, Bloom et al. (1994) have shown that the DPBE does not show up in spontaneous production. If children lacked a principle that constrains local coreference or chose a different parameter setting for the definition of the governing category, one would expect them to produce instances of locally coreferring pronouns at least occasionally. A recent experimental study by de Villiers et al. (2005) confirms Bloom et al.'s (1994) results. In this study 68 English-speaking children between the ages of 4;6 and 7;2, with a mean age of 6;2, were tested on a TVJ task and on a production task in which they had to describe pictures. The results showed that children's production was considerably better than their comprehension. In the conditions eliciting reflexives, they produced pronouns only between 3 and 14 percent of the time (depending on the kind of sentence). In trials eliciting pronouns they produced reflexives only between 0 and 2.8 percent of the time (depending on the kind of sentence).

Finally, incomplete learning approaches face a learnability problem. It is not easy to see how the constraints on local coreference can be acquired without negative evidence. Elbourne (2005) recognizes that a parameter-setting approach as proposed by Elbourne (2003) faces the problem that, in the absence of negative evidence, it is difficult to explain how children unlearn the use of pronouns in reflexive contexts. McKee's proposal faces similar problems.
Although she argues that there may be triggers in the data that lead English children to extend the GC to the IP (see McKee 1992: 50), one needs to explain why this trigger is not available from the beginning. On the other hand, it is inevitable that some aspects of pronoun interpretation require some kind of learning; children need to learn at least the partially language-specific referential properties of pronouns and reflexives.

22.3.3 DPBE as the Result of Processing Limitations

Grimshaw and Rosen (1990) already argued that children's apparent violations of constraints do not automatically imply absence of knowledge of the relevant constraints. Grodzinsky and Reinhart (1993) propose that the DPBE that shows up in (8a) is not the result of a lack of knowledge of Principle B or the absence of a pragmatic rule. Instead, they claim that it is due to a limited ability to apply Rule I, a cross-modular economy condition that regulates local coreference:

(10) Rule I (Grodzinsky and Reinhart 1993)
     NP A cannot corefer with NP B if replacing A with C, C a variable A-bound by B, yields an indistinguishable interpretation.

The rule is invoked whenever local coreference is an option, and states that coreference is only admitted if it leads to a different interpretation than a binding construal. This is the case in, for instance, (7b). In (7b) the binding construal of the second conjunct of the last sentence, John admires him too, corresponds to an interpretation in which the subject John is involved in self-admiration (as in John admires himself). However, this conflicts with the context of this sentence, which indicates that it is not about self-admiration, but about admiring John, a reading that corresponds to coreference of John and him.

Grodzinsky and Reinhart (1993) further argue that in order to determine whether coreference is a legitimate option, the speaker needs to compare both the binding construal and the coreference construal of the same sentence. According to Grodzinsky and Reinhart this computation often exceeds children's processing capacity, leading to a breakdown of Rule I. As a result children will guess to determine the reference of the pronoun. They argue that the 50 percent adultlike performance that is often found is the result of this guessing strategy.

The role of limited processing resources as an explanation for the errors that children make is also highlighted by Avrutin (1999, 2004, 2006). Following ideas by Reuland (2001), Avrutin argues that for conveying meaning, both syntax and context (visual, linguistic discourse) can in principle be used. However, for economy considerations, the use of syntax is usually preferred. In the case of referential dependencies this leads to a general preference for binding (syntax/semantics) over (local) coreference (pragmatics). Avrutin further argues that in populations with limited syntactic processing resources, such as children, syntax is no longer the cheapest option to convey meaning.
In the case of referential dependencies, this may lead children to allow local coreference as an equally cheap way of establishing referential dependencies as binding.

A processing approach is also defended by Van Rij et al. (2010), who work within an Optimality Theory (OT) framework. They exploit the "direction-sensitive" nature of OT grammars, and argue that the form–meaning relation of pronouns is different for hearers (who need to optimize from form to meaning) than for speakers (who need to optimize from meaning to form). They argue that from a hearer perspective, pronouns are referentially ambiguous, that is, they can have either coreferential or disjoint reference (Hendriks and Spenader 2005). As a result, in order to interpret pronouns correctly, the hearer needs to take into account the speaker's perspective. They argue that this step often exceeds children's processing capacity. In order to test their hypothesis they tested 75 4- to 6-year-old Dutch children with a variant of a Picture Verification experiment. In this experiment the test sentences were presented at different speech rates, assuming that a lower speech rate facilitates processing. The results showed that a reduction of the speech rate had a positive effect on the correct interpretation of pronouns.3

Processing approaches are often defended by citing experimental results from language breakdown, such as aphasia. It has been claimed that aphasic patients have a reduced (syntactic) processing ability, but they are not in the process of language

3  An additional interesting aspect of the "bidirectional OT approach" is that it provides a way to account for the observed production–comprehension asymmetry. See Hendriks and Spenader (2005).

acquisition. Grodzinsky et al. (1993) showed that patients with agrammatic Broca's aphasia exhibited a highly similar pattern with respect to their interpretation of pronouns (and reflexives). Similar correspondences between children and patients were found for Spanish and Dutch patients (Baauw and Cuetos 2003; Ruigendijk et al. 2006).

22.3.4 The DPBE as an Experimental Artifact

Recently, Conroy et al. (2009) argued that the DPBE that was found in several studies, including Thornton and Wexler (1999), is mainly due to deficiencies in the experimental setup. They argue that children's overacceptance of local coreference was due to the failure of most studies to respect the condition of plausible dissent (Crain and Thornton 1998). Applied to a TVJ task, this condition states that in order to evaluate the truth of a statement, both the situation in which the statement is true and the situation in which the statement is false should have been under consideration at some point in an experimental trial. In the case of a study that tests children's interpretation of pronouns in sentences such as The boy is touching him, this means that both the situation in which the boy is touching another male individual (which would make the sentence true) and the situation in which the boy is touching himself should have been presented to the child. By satisfying the condition of plausible dissent, both the sentence-internal antecedent (the boy) and the sentence-external antecedent (another male individual) are made equally prominent, so that the child's choice of one or the other antecedent is motivated only by constraints on binding and coreference (Principle B, Rule I) and not by the pragmatic prominence of one of the antecedents.

Following Elbourne (2005), Conroy et al. (2009) argue that many studies, including Thornton and Wexler (1999), made the sentence-internal antecedent of the test sentences containing a referential subject more salient than the sentence-external antecedent. To illustrate this, they discussed one of the trials, which tested children's interpretation of the sentence Bert brushed him. In this trial children were presented with a story in which Bert asked several reindeer to brush him, but all of them refused. In the final scene, Bert decides to brush himself.
Although at some point in the story the possibility of Bert brushing a reindeer was briefly mentioned, formally complying with the condition of plausible dissent, the story was about Bert trying to get brushed. This made Bert much more prominent as an antecedent of the pronoun than the reindeer. According to Conroy et al. (2009), this explains why children accepted local coreference in this study 58 percent of the time; children, unlike adults, often fail to inhibit the activated incorrect local coreference reading. On the other hand, as argued by Elbourne (2005), when the test sentence contained a quantified subject, as in Every reindeer brushed him, the prominence of Bert will lead children to avoid identifying the pronoun with the subject of the sentence, resulting in a higher rate of adultlike responses in sentences containing quantified subjects.

Conroy et al. (2009) present a new experiment (16 children between 4 and 5;6 years of age, mean age 4;6) in which the deficiencies in the experimental setup were corrected by satisfying the condition of plausible dissent, granting equal

prominence to both sentence-internal and sentence-external antecedents. The results were as predicted: the rate of non-adultlike yes-responses dropped to 11 percent, and the contrast between quantified and referential subjects disappeared (14 percent non-adultlike yes-responses).4

Conroy et al. (2009) do not deny that there is something real about the DPBE in child language, since otherwise it would be difficult to explain why there are cross-linguistic and structure-related differences with respect to the intensity of the DPBE, as we will see in the following sections. But they argue that the DPBE is much milder if the relevant pragmatic felicity conditions are satisfied in the experimental setup.5

22.3.5 Principle C Contexts

In Crain and McKee (1985) 62 English-speaking children with a mean age of 4;2 were tested with a TVJ task on their interpretation of pronouns in sentences such as (11):

(11) a. He washed Luke Skywalker.
     b. He ate the hamburger [when Smurf was in the fence].
     c. [When he stole the chickens], the lion was in the fence.

Identification of the pronoun he with Luke Skywalker in (11a) and Smurf in (11b) is ungrammatical, but he can be identified with the lion in (11c). Crain and McKee showed that children accepted the coreference reading 73 percent of the time in (11c), but rejected this reading 88 percent of the time in (11a–b). Crucially, only in (11a–b) does identification of the pronoun with the DP lead to a violation of Principle C. This was taken as evidence for children's early adherence to Principle C (see also Grimshaw and Rosen 1990; Lust et al. 1992).6 This adherence to Principle C is not limited to English, but has been found in children acquiring other languages, such as Italian (Guasti and Chierchia 2000) and Russian (Kazanina and Phillips 2001).7

However, children's highly adultlike performance is unexpected if we take into consideration that identification of the pronoun with the DP it c-commands can also be achieved through coreference instead of binding. In some contexts this is grammatical:

(12) Only HE thinks that Bill is nice. [HE = Bill]

4  A study by Hendriks et al. (2011), in which Dutch adults had to perform a Picture Verification task while reaction times and eye movements were registered, indicates that even adults are sensitive to the discourse context when they interpret pronouns; their reaction times were higher in those trials in which no single topic was presented in the context-setting sentence preceding the test sentence. A similar sensitivity to discourse contexts was found for children by Spenader et al. (2009).
5  Note that Conroy et al. (2009) attribute the residual DPBE to processing limitations of children.
6  Recently, using a preferential looking task, Lukyanenko et al. (2010) showed that infants as young as 30 months show sensitivity to Principle C.
7  Note that in Russian (11c) would be ungrammatical, which means that apart from Principle C, another language-specific restriction on backward anaphora is at work. Interestingly, Russian children accept coreference in this condition, although they rejected it in the conditions violating Principle C.

Rule I (or some other coreference rule) will prevent this option across the board, as in (11a–b). Since children have problems with ruling out coreference (probably as a result of the breakdown of Rule I), they are expected to accept coreference roughly 50 percent of the time in (11a–b), contrary to fact. It is important to note, though, that the status of Principle C as an independent syntactic principle is controversial (Chomsky 1982; Reinhart 1983; Schlenker 2005), and that it may be derived from constraints on working memory and pragmatic rules of reference assignment to pronouns. These may be relevant to account for children's highly targetlike performance. For proposals along these lines, see Avrutin (1994) and Thornton and Wexler (1999).

22.3.6 Cross-linguistic Variation: Clitics and Strong Pronouns

Although the earliest studies on the DPBE were conducted with English-speaking children (Jakubowicz 1984; Chien and Wexler 1987, 1990; Grimshaw and Rosen 1990; among others), these were soon followed by studies of other languages, such as Icelandic (Sigurjónsdóttir 1992), Italian (McKee 1992), Dutch (Koster 1993; Philip and Coopmans 1996a), Russian (Avrutin and Wexler 1992; Avrutin 1994), Spanish (Baauw et al. 1997; Baauw 2000), French (Hamann et al. 1997), Catalan (Escobar and Gavarró 1999), Norwegian (Hestvik and Philip 2000), Greek (Varlokosta 2000), and Hungarian (Margócsy 2000). The results of these studies showed that the DPBE does not universally show up in children's interpretation of sentences such as (1a). It has been found in the acquisition of languages such as English, Dutch, Russian, and Icelandic, but not in the acquisition of the Romance languages, Greek, Norwegian, and Hungarian. Spanish 4- to 6-year-olds, tested with a Picture Verification task, rejected a reflexive interpretation of (13a) 90 percent of the time (Baauw et al. 1997; Baauw 2000). Dutch children, tested with the same methodology, showed a similar DPBE to English children; the 5-year-olds rejected (13b) only 32 percent of the time, the 6-year-olds 50 percent of the time (Philip and Coopmans 1996a).

(13) a. La niña la señala
        the girl her points-at
     b. Jan wees naar hem.
        John pointed at him

The absence of a DPBE in some languages is usually related to the presence of syntactic clitics in these languages; hence we will call this phenomenon the Clitic Exemption Effect (CEE) (Baauw 2000). McKee (1992) argued that the absence of a DPBE in Italian is due to the fact that Italian clitic pronouns are in a VP-external position (14a), adjoined to INFL, while pronouns in English remain VP-internal (14b).

(14) a. [IP Lo gnomo [I' loi INFL [VP lava ti]]]
        the gnome him washes
     b. [IP The gnome [VP washed him]]

According to McKee (1992), English children can identify the pronoun with the subject in [Spec, IP] because they initially misanalyse the VP as the GC of the pronoun. Due to the VP-external position of clitic pronouns, Italian children, on the other hand, automatically analyze the IP as the GC of pronouns. Since binding of the clitic pronoun by the subject violates Principle B of the Binding Theory, children, like adults, reject such an interpretation. McKee seems to capture the correct generalization: the DPBE does not affect syntactic, that is, VP-external, clitic pronouns (Baauw 2000).8

A second group of explanations relates the CEE to the referential deficiency of clitics as elements that cannot (locally) corefer (Avrutin 1994; Cardinaletti and Starke 1995, 1996; Thornton and Wexler 1999). According to Avrutin (2004) and Thornton and Wexler (1999) a coreference construal is dependent on a pronoun's ability to refer deictically, that is, the ability to refer directly to objects or individuals in the visual context (often supported by pointing). This is something strong pronouns can do, but clitics cannot:

(15) Pointing: #Yo la besé
               I herclitic kissed
     Pointing: Yo hablé con ella
               I talked with herstrong pronoun
Following Heim (1982), Avrutin (2004) claims that the ability of an element to refer depends on its ability to introduce a "guise," a mental representation of an object or individual. NPs can introduce guises. Pronouns, on the other hand, normally do not introduce new guises, unless they refer deictically. When the "deictic" guise introduced by the strong pronoun and the guise introduced by an NP refer to the same individual or object in the world, we say that these two elements corefer. Avrutin argues that the DPBE results from children's ability to use pronouns deictically even outside a pointing context. They do so because they often fail to take into account the perspective of the hearer, who normally requires pointing in order to identify the object or person the pronoun refers to. Since clitics cannot refer deictically, coreference is not an option; hence the DPBE has no chance to show up in constructions containing clitics.

An interesting aspect of this approach is that it relates the DPBE to local coreference, instead of binding, and views the CEE as the result of the inability of clitics to corefer. However, a problematic aspect of this approach is that it predicts the CEE to

8  Note that McKee's (1992) account implies that the DPBE involves binding, not coreference. This means that McKee (1992) cannot easily explain English children's improved performance on sentences containing quantified subjects as found by Chien and Wexler (1990). However, as Elbourne (2005) shows, these results are likely the result of an experimental artifact.

show up in constructions with weak pronouns in general, not just syntactic clitics of the Romance kind, since regular weak pronouns, such as English 'em in I saw 'em, cannot be used deictically either. Baauw (2000) showed that Dutch children tested on their interpretation of the weak pronoun 'm (him) performed adultlike around 50 percent of the time, which is not much different from their performance on the full pronoun hem.

In Baauw (2000) a third account of the CEE is proposed, which attempts to combine McKee's (1992) insight that the CEE is limited to languages with syntactic clitics and Chien and Wexler's (1990) insight that the DPBE involves coreference, not binding. According to this account, which is based on Sportiche's (1992) analysis of clitics, syntactic clitics cannot enter into local coreference relations as a result of the movement of the null object to the specifier of a Clitic Phrase, a functional projection headed by the clitic (Sportiche 1992). Baauw (2000) further argues that the trace left by movement of the null object converts the structure c-commanded by the specifier of the clitic phrase into a predicate (Heim and Kratzer 1998; Neeleman and Weerman 1999). After the application of QR of the subject DP and identification of the subject trace with the object trace, the resulting representation of the reflexive interpretation of (16a) is (16b), a clear violation of Principle B:

(16) a. La niña la señala
        the girl her points-at
     b. [λx (x señala x)] (la niña)

Baauw (2000) further argues that since children respect Principle B, Spanish children will always reject a reflexive interpretation of (16a).9

Di Sciullo and Agüero-Bautista (2008) recently proposed an alternative to Baauw (2000). Like McKee (1992) and Baauw (2000) they claim that the CEE is the result of the VP-external position of syntactic clitics, and like Baauw (2000) they argue that syntactic clitics cannot establish local coreference relations.
They assume that subject DPs are generated inside the VP, and that in this position they cannot bind the VP-internal object pronoun. Following Heim and Kratzer (1998), they argue that for binding to take place, the subject has to move outside the VP. From that position the subject will bind both the VP-internal subject trace and the pronoun: [λx (x touches x)] (Mama Bear). This construal leads to a Principle B violation. However, the subject can also bind only the subject trace. This trace can then establish a coreference relation with the pronoun. This respects Principle B, but violates Rule I. But since children often fail to apply Rule I

9  According to Baauw (2000) clitics can still be used to refer to clause-external antecedents because the internal argument can be bound by a topic DP. This topic can be overt, as in the case of clitic left dislocations (ia), or covert (ib) (see also Delfitto 2002 for a further elaboration of the idea that clitic constructions are hidden left dislocation constructions).

(i) a. [Top A la madre] la niña la señala
       Acc. the mother the girl herclitic points-at
    b. [Top ec] la niña la señala
       the girl herclitic points-at

correctly, they will often accept this reading. Di Sciullo and Agüero-Bautista (2008) further argue that due to the VP-external position of syntactic clitics, this latter construal is impossible in languages such as Spanish, so that Spanish children will reject a reflexive interpretation of (16a).

If the absence of a DPBE is related to either the referential or the syntactic properties of clitic pronouns, it is predicted that it will show up in constructions with strong pronouns, even in languages that have syntactic clitics, such as Spanish. However, so far the evidence is inconclusive. Cardinaletti and Starke (1995) argue that Spanish children show a DPBE in constructions containing strong pronouns. The authors refer to Padilla (1990), who tested Spanish children on their interpretation of sentences such as (17), in which the strong pronoun is the complement of a preposition:

(17) María apuntó hacia ella
     Mary pointed at her

The problem with this type of sentence is that in Spanish pronominal complements of prepositions can be identified with the local subject much more easily than in English or Dutch, where this type of anaphoric reading is limited to the complements of locative prepositions such as next to or behind. In fact, Baauw (2000), using a Picture Verification task, has shown that not only do Spanish children accept a reflexive interpretation of sentences such as (17) around 50 percent of the time, but adults do too. Similar results were found by Escobar and Gavarró (1999) for Catalan children and adults; 70 percent of the adults accepted a reflexive interpretation, and 55 percent of the 5-year-olds. The problem of the acceptability of an anaphoric reading in the adult language is avoided when strong direct object pronouns are used.
Baauw (2000), using a Picture Verification task, tested 32 Spanish-speaking children with a mean age of 5;12 on sentences such as (18):

(18) El niño le dibujó a él
     the boy cl painted acc him

The results showed that Spanish children reject the coreferential reading of él with el niño 83 percent of the time, which was statistically no different from their performance on sentences such as (13a). Baauw (2000) relates the absence of a DPBE in sentences containing strong direct object pronouns to the fact that in Spanish a strong direct or indirect object pronoun is obligatorily doubled by a clitic pronoun, as in (18), where the strong pronoun él is doubled by the clitic pronoun le. The presence of clitic doubling could also account for the absence of a DPBE in Catalan and Greek sentences containing strong direct object pronouns (Escobar and Gavarró 1999; Varlokosta 2000).10

10  Note that Varlokosta (2000) also found highly adultlike performance on sentences containing direct object strong pronouns that were not doubled by clitics. However, Greek strong pronouns such as afton and aftin may have a different status than strong pronouns in Romance or Germanic languages, since they seem to behave more like demonstratives (Sanoudaki 2003).

534   Sergio Baauw

However, in some languages, such as Italian, there is no clitic doubling when strong direct object pronouns are used.

(19) a. Il ragazzo lo sta indicando.
        the boy him-clitic is pointing-at
     b. Il ragazzo sta indicando lui.
        the boy is pointing-at him-strong

Berger (1999) tested Italian children with a Picture Verification task on both (19a) and (19b). The results indicate that children rejected the reflexive interpretation of (19a), which contains a clitic pronoun, but accepted (19b), in which the direct object is a strong pronoun, 61 percent of the time. However, caution should be taken in interpreting these results as evidence for a DPBE in Italian. The same children also rejected the grammatical non-reflexive interpretation of (19b) 29 percent of the time. It may be the case that a methodological factor intervenes in children's interpretation of strong pronouns. Normally, strong direct object pronouns in Spanish and Italian are used in contrastive contexts. In these contexts the strong pronoun is usually stressed. As McDaniel and Maxfield (1992) have shown, contrastive stress facilitates local or "pragmatic" coreference. Researchers may try to avoid contrastive stress when they read the test sentence, but this may make the test sentence sound unnatural, which may affect the experimental results.

In sum, the cross-linguistic differences show that children have acquired the language-specific properties of pronouns before the age of 5. By that age Spanish and Italian children "know" that weak pronouns in their language are syntactic clitics, while Dutch and English children know by that time that weak pronouns in their language are "regular" pronouns that allow local coreference. Production data from Spanish and Italian acquisition confirm this. Spanish and Italian children make hardly any mistakes when they start to use clitics; they always place them correctly (Guasti 1994; Ezeizabarrena 1996; Schaeffer 1997; Lyczkowsky 1999).
If the referential properties of clitics are related to their position in the syntactic structure (Baauw 2000; Di Sciullo and Agüero-Bautista 2008), it is predicted that once the syntactic properties are acquired, the referential properties are acquired as well.

22.3.7 Structural Variation

Sigurjónsdóttir and Coopmans (1996), using a Picture Verification task, found that Dutch children's performance on pronominal anaphora was not uniform across sentences. They discovered that children performed less adultlike on (20a) than on (20b,c).

(20) a. Jan waste hem.
        John washed him
     b. Jan aaide hem.
        John stroked him
     c. Jan wees naar hem.
        John pointed at him

Sigurjónsdóttir and Coopmans reported that the reflexive interpretation of (20b) was rejected by 58 percent of the 5-year-olds and 50 percent of the 6-year-olds. For (20c) the results are similar: the reflexive interpretation was rejected by 32 percent of the 5-year-olds and 50 percent of the 6-year-olds. Sentence (20a), on the other hand, was rejected by only 17 percent of the 5- and 6-year-olds. In sum, whereas rejection rates for the reflexive reading of (20b-c) were at approximately the same level as found in English children (roughly 50 percent), the reflexive interpretation of (20a) yielded much lower rejection rates (roughly 20 percent). This variation could not easily be explained within the framework of the Standard Binding Theory, since this framework does not distinguish between (20a) and (20b-c) (Chomsky 1981a). However, in Reinhart and Reuland's (1993) Reflexivity model, the verbal predicates in (20a) and (20b-c) are different. In this alternative model, Principle B is an interface filter that constrains the formation of reflexive predicates to those that are reflexive-marked:

(21) Binding Principles (Reinhart and Reuland 1993)
     Principle B: a reflexively interpreted predicate must be reflexive-marked

In Dutch the ability of a verbal predicate to be reflexive-marked depends on the lexical-semantic properties of the verb. Some verbs, such as wassen (wash), can be reflexive-marked; others, such as aaien (stroke) or wijzen naar (point at), cannot (Everaert 1986). Verbs that can be reflexive-marked are easily recognizable in Dutch; they allow the reflexive pronoun zich to occupy the object position:

(22) a. Jan waste zich.
        John washed SE
     b. ??Jan aaide zich.
        John stroked SE
     c. *Jan wees naar zich.
        John pointed at SE

According to Reinhart and Reuland (1993), only zich can be used in the object position of a reflexively interpreted verb, since other third person pronouns such as hem (him) or haar (her) would violate a condition on A-Chains:

(23) A-Chain Condition (Reinhart and Reuland 1993)
     A maximal A-Chain (a1 … an) contains exactly one link, a1, that is both [+R] and case-marked.

Reinhart and Reuland (1993) argue that an A-Chain is formed between two elements that are in a local binding configuration. The A-Chain Condition states that the tail of an A-Chain, the bound element, needs to be referentially deficient, that is, [–R]. SE anaphors such as zich are [–R]. Pronouns such as hem or haar, on the other hand, are [+R], and would violate this condition. This explains why (20a) cannot be interpreted reflexively, in spite of the fact that wassen 'wash' is a predicate that can be reflexive-marked, which means that a reflexive interpretation would satisfy Principle B.

This led Sigurjónsdóttir and Coopmans (1996) to propose that the extra strong DPBE in (20a) is due to Dutch children's tendency to optionally interpret third person pronouns as [–R] pronouns, that is, elements similar to reflexive pronouns such as zich. When children interpret hem 'him' in (20a) as a [–R] element, the A-Chain Condition, which requires the tail of an A-Chain to be [–R], is no longer violated. However, a [–R] analysis of pronouns does not save (20b-c); in these structures Principle B would still be violated, in the same way as it is violated in (22b-c). This means that children optionally analyze pronouns as [–R] elements only in those contexts in which Principle B is satisfied. Note that if this analysis is on the right track, it means that the DPBE does not always involve coreference; when children analyze pronouns as [–R], the resulting construal involves Chain Formation, hence binding.11

There is another context in which children show an extra strong DPBE: ECM constructions. Philip and Coopmans (1996a) tested Dutch-speaking children, using a Picture Verification task containing pictures in which individuals performed actions in front of mirrors. They found that children performed in a less adultlike manner on (24) than on (20b-c):

(24) Het jongetje zag hem dansen
     the boy saw him dance

Even the 7-year-olds rejected the reflexive interpretation of (24) only 16 percent of the time, whereas the reflexive interpretation of (20b-c) was rejected 55 percent of the time. The scores for the 4- to 6-year-olds were even worse: 6 percent rejection of the reflexive interpretation in (24) and 36 percent in (20b-c).12 Philip and Coopmans (1996a) argue that the extra strong DPBE that shows up in ECM constructions can also be accounted for by the Reflexivity model. As argued in section 22.2, the main clause subject and the subject of the embedded sentence are not arguments of the same verbal predicate.
As a result, in (25) binding of the pronoun hem by the main clause subject Jan does not lead to the creation of a reflexive predicate, which makes Principle B irrelevant:

(25) Jan zag [hem dansen]
     John saw him dance

11  Note that a similar contrast between inherently reflexive verbs and non-inherently reflexive verbs was found for Norwegian, although there is no general DPBE in Norwegian. Hestvik and Philip (2000) reported that Norwegian children accepted the reflexive interpretation of inherently reflexive predicates, as in (i), 68% of the time.

(i) Driver mannen og wasker ham?
    'Is the man washing him?'

12  An extra strong DPBE in ECM constructions was found earlier by Jakubowicz (1984) for English children. However, Jakubowicz tested ECM sentences containing different main verbs, as in (i):

(i) Peter said that John wanted him to push the car.

Also, Jakubowicz used an Act-Out task instead of a TVJ task. This explains the overall higher rate of targetlike performance in this study. For task effects in the study of the DPBE, see Baauw et al. (2011).

The only reason why binding of hem by Jan is ungrammatical is that it leads to a violation of the A-Chain Condition. If children can interpret the pronoun as [–R], binding becomes grammatical. Interestingly, ECM constructions turned out to be the only syntactic context in which a DPBE shows up in languages with clitic pronouns, such as Spanish. Baauw and colleagues (Baauw et al. 1997; Baauw 2000) showed, using a Picture Verification task, that Spanish 4- to 6-year-old children accepted the reflexive interpretation of (26) 40 percent of the time.

(26) La niña la ve bailar
     the girl her sees dance

The appearance of a DPBE in languages such as Spanish seems to support the view that the kind of DPBE that shows up in ECM sentences is of a different nature than the DPBE that was found in Dutch and English simple transitive sentences. If clitic pronouns cannot corefer locally, and the DPBE always involves a local coreference construal, no DPBE is expected in Spanish.13 Importantly, the Spanish results have been replicated for many other languages that have object clitic pronouns or object-agreement morphemes, such as Italian (Berger 1999), Catalan (Escobar and Gavarró 1999), French (Hamann et al. 1997), Norwegian (Hestvik and Philip 2000; Philip p.c.), Greek (Varlokosta 2000), and Hungarian

13  It could be argued that the extra strong DPBE that shows up in ECM constructions is the result of the additional complexity of ECM sentences, or of specific syntactic properties of ECM constructions. However, Escobar and Gavarró (1999) showed that Catalan children exhibited problems with the interpretation of pronouns in ECM sentences (ia), but not in the equally complex control sentences (ib). A similar result was found for Spanish. Children rejected a reflexive interpretation of (ic) 85% of the time (Baauw 2000).

(i) a. La nena lai veu [ti saltar a corda].
       the girl her sees jump rope
       'The girl sees her jump rope.'
    b. La nena lai vol [PRO tocar ti amb la mà].
       the girl her wants touch with the hand
       'The girl wants to touch her with her hand.'
    c. El niño trata de [PRO lavarle].
       the boy tries to wash-him
       'The boy is trying to wash him.'

Additionally, if ECM were problematic, children would be expected to show problems not only with pronominal ECM sentences, but also with ECM sentences containing reflexives, as in (ii), contrary to fact.

(ii) La niña se ve bailar
     the girl SE sees dance

Alternatively, the extra strong DPBE effect in ECM sentences could be attributed to an experimental artifact resulting from the use of pictures containing mirrors in the trials testing ECM; children might interpret the reflection in the mirror as a different individual. However, if this were the case, children would be expected to reject (ii) in the context of a girl seeing herself dance in the mirror, contrary to fact.

[Figure 22.1 here: bar chart of the percentage of "no" responses per language (Dutch, Hungarian, Greek, Norwegian, French, Catalan, Italian, Spanish) in two conditions. Pron = simple transitive sentences (John hit him); Pron-ECM = pronominal ECM sentences (John saw him dance).]

Figure 22.1  Percent "no"-responses on "no"-conditions.

(Margócsy 2000). In Figure 22.1 the results of these studies are summarized, including the results for Dutch (Philip and Coopmans 1996a) and Spanish (Baauw 2000). All studies were carried out with some variety of the Picture Verification paradigm. The results represent the percentage of correct "no" responses on pronominal conditions eliciting "no" responses in adults (sentences containing accusative pronouns that were matched with pictures representing "reflexive" actions).

Philip and Coopmans (1996a) proposed that children's misanalysis of pronouns as [–R] elements may be due to incomplete feature acquisition. They follow Reinhart and Reuland's (1993) proposal that the [+R] property is related to full specification of phi-features, including structural case. They argue that children may not have fully acquired the structural case specification, due to ambiguity in the pronominal system: hem 'him' can be both accusative and dative, haar 'her' can be accusative, dative, or genitive. Baauw (2000) suggests that the missing feature may be [number]. However, accounts in terms of incomplete acquisition cannot easily account for the optionality; children do not always accept a reflexive interpretation of pronominal ECM sentences.14 A different approach is followed by Avrutin (2006) and Baauw et al. (2011). Following Reuland (2001) and Avrutin (2004), they propose that adults reject a reflexive

14  One way out of this problem is to propose that children have acquired the relevant pronominal features, but that they may have trouble accessing them. This has been proposed by Baauw and Cuetos (2003) in their account of the DPBE in Spanish children's and agrammatic patients' interpretation of pronouns in ECM sentences.

interpretation of (24) as a result of an economy condition on referential dependencies. According to this condition, dependencies built in (narrow) syntax (A-Chains) are cheaper than dependencies built in semantics (pronominal bound variable construal), which are in turn cheaper than dependencies built in discourse (coreference). Only zich establishes an A-Chain with a local c-commanding DP, as a result of a feature checking relation (Reuland 2001). This means basically that (27a) is rejected by Dutch adults because (27b) is cheaper.

(27) a. *Jani ziet [hemi dansen]   (bound variable)
        'John sees him dance'
     b. Jan ziet [zich dansen]    (A-Chain)
        'John sees SE dance'

However, in populations that have a reduced syntactic processing capacity, such as children, bound-variable relations may be as cheap as A-Chains, which leads to frequent acceptance of (27a). Although in this approach children make errors not as a result of incomplete acquisition but as a result of their limited processing capacity, this does not mean that acquisition does not play a role at all. In order to reject the bound variable construal at least sometimes, the properties of pronouns and reflexives that define them as such need to have been acquired already. According to Reuland (2001), SE reflexives differ from third person pronouns in that the former are underspecified for [number], whereas the latter are specified for this feature. This means that children should have acquired [number] in pronouns by the time they reach the age of 5.

The approaches followed by Philip and Coopmans (1996a) and Baauw (2000) have in common that they do not see the DPBE as a unitary phenomenon. They sharply distinguish between contexts in which Principle B is violated, and those in which it is satisfied (reflexive-marked verbs) or irrelevant (ECM sentences). The first context gives rise to a DPBE only in languages that do not have syntactic clitic pronouns, such as Dutch and English, as a result of children's overacceptance of local coreference. Since the roughly 50 percent adultlike performance in Romance ECM sentences is highly similar to the 50 percent performance in Dutch and English simple transitive sentences, Escobar and Gavarró (1999) and Di Sciullo and Agüero-Bautista (2008) proposed alternatives in which a single mechanism accounts for both results. Escobar and Gavarró (1999) propose that while the object clitic of sentences such as (26) is in the main clause, the subject position of the embedded sentence is filled with a pro. Since this pro is not clitic-like, it can corefer with the main clause subject.
A breakdown of Rule I accounts for the 50 percent performance pattern of Spanish and Catalan children in pronominal ECM sentences. Di Sciullo and Agüero-Bautista (2008), who follow Chomsky's (1981a) standard Binding Theory, argue that due to the movement of the clitic pronoun from the embedded subject position to a functional position in the main clause, the clitic and the main clause subject are in the same binding domain, which would exclude binding of the clitic by the local subject. However, if the clitic is reconstructed in its original position, binding

by the main clause subject would not violate Standard Principle B, but would violate the Scope Economy Principle. Like Rule I, this principle requires the comparison of two semantic representations of the same construction: the representation before reconstruction and the representation after reconstruction. Since pronouns are scopeless elements, the two representations do not differ in meaning. As a result, both the interpretation before reconstruction and the interpretation after reconstruction are barred; the former because it violates Principle B, the latter because it is barred by the Scope Economy Principle. Di Sciullo and Agüero-Bautista further argue that in children the Scope Economy Principle will often break down when they try to apply it, as a result of their more limited processing capacity. As a result, they will guess in order to determine the interpretation of the clitic, hence the 50 percent targetlike performance rate. Although Di Sciullo and Agüero-Bautista's (2008) approach is attractive in the sense that it proposes a single cause, that is, failing reference set computations, for all instances of the DPBE, it is not clear how it accounts for the extra strong DPBE in Dutch ECM constructions, nor does it account for the extra strong DPBE in Dutch sentences containing reflexive-marked verbs. These results seem to indicate the cumulative effect of two independent factors.15

22.4  Interpretation and Production of Reflexives

22.4.1 Reflexive Pronouns across Languages

Unlike pronouns, reflexives have been claimed to give rise to highly targetlike performance in young children, which points to an early adherence to Principle A. Chien and Wexler (1990) showed that English-speaking 5-year-olds tested with a Picture Verification task rejected the non-reflexive interpretation of (28) over 90 percent of the time.

(28) Mama bear is touching herself.

However, children younger than 5 often did accept a sentence-external interpretation of the reflexive. Children younger than 4 rejected sentence-external reference only

15

The differences in adultlike responses between Dutch and Spanish can be derived from the following probability calculation. Suppose Dutch children manage to reject a local coreference construal 50% of the time, and when they do so they also manage to reject a binding construal 50% of the time. This will lead to a 25% rejection rate of the reflexive interpretation of pronominal ECM sentences, which comes close to the roughly 20% rejection rate that was actually found. Spanish children, on the other hand, will never consider a local coreference construal, since Spanish weak pronouns are clitics, but a binding construal is a possibility. If for Spanish children the rejection rate of the binding construal is also 50%, we come close to the 40-60% rejection rate that was actually found (Baauw 2000; Baauw et al. 2011).
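The arithmetic in this footnote can be sketched as a short calculation (an illustrative check only; the function name and the assumption that the construals are rejected independently are mine, not the chapter's):

```python
# Illustrative check of the probability reasoning in footnote 15.
# Assumption (not the author's wording): a reflexive reading of a pronominal
# ECM sentence is rejected only if every available construal is independently
# rejected, so the per-construal rejection rates multiply.

def overall_rejection_rate(per_construal_rates):
    """Probability of rejecting all available construals, assuming independence."""
    p = 1.0
    for rate in per_construal_rates:
        p *= rate
    return p

# Dutch: both local coreference and binding are available, each rejected 50% of the time.
dutch = overall_rejection_rate([0.5, 0.5])

# Spanish: clitics exclude local coreference, so only the binding construal remains.
spanish = overall_rejection_rate([0.5])

print(dutch)    # 0.25, close to the roughly 20% rejection rate observed for Dutch
print(spanish)  # 0.5, within the 40-60% range observed for Spanish
```

The multiplication makes explicit why the Dutch rate is predicted to be roughly half the Spanish one: Dutch children face two independent hurdles, Spanish children only one.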

30 percent of the time. Children between 4 and 5 rejected this reading about 67 percent of the time (Chien and Wexler 1990: 270). This means that before the age of 4 children do not really show evidence of knowledge of Principle A. However, this does not necessarily mean that Principle A is not in place. Children need to acquire the morphosyntactic properties that define reflexives and pronouns, and it may be the case that English children below the age of 5 have not yet acquired these properties. However, McKee (1992), using a TVJ task, provides evidence for early adherence to Principle A. She tested 3- to 5-year-old children on two-clause sentences and showed that English children reject a reading of (29a) in which the reflexive pronoun himself refers to the extra-clausal DP the clown 81 percent of the time. The same applies to Italian children for the reflexive si in (29b); reference to the extra-clausal DP la gnoma was rejected 91 percent of the time.

(29) a. While the clown was sitting down, Roger Rabbit covered himself.
     b. Mentre la gnoma cantava, la puffetta si copriva.
        while the gnome sang the smurfette SE covered

McKee also notes that there was hardly any difference between the younger and the older children. In sum, the results indicate that children know that the antecedent of a reflexive must be local and must c-command the reflexive. However, the test does not allow us to determine whether children respect both locality and c-command. Avrutin and Cunningham (1997), using a Picture Verification task, show that when two potential antecedents are in the same clause, American English children correctly reject the non-c-commanding DP the boy as a possible antecedent of himself 96 percent of the time.

(30) The man near the boy was washing himself.

These results seem to confirm the conclusion that young children respect the grammatical principles that regulate the interpretation of reflexives.
However, evidence from child Dutch indicates that children do not always perform in an adultlike way on reflexives. Avrutin and Coopmans (1999), using the same experimental design as Avrutin and Cunningham (1997), showed that 5- to 6;6-year-old Dutch children accepted the non-c-commanding DP de prinses 'the princess' as the antecedent of the SE reflexive zich 74 percent of the time. Their performance on the SELF reflexive zichzelf was much better; 90 percent of the time they rejected de prinses as the antecedent of zichzelf.

(31) a. De boerin naast de prinses wast zich.
        the farmer's-wife next-to the princess washes SE
     b. De boerin naast de prinses wast zichzelf.
        the farmer's-wife next-to the princess washes SELF

This seems to indicate that Dutch children have no problems with SELF anaphors, just like English children, but they do have problems with SE reflexive zich, a type of reflexive

that is absent in English. Avrutin and Coopmans (1999) propose that Dutch children may optionally treat SE reflexives as [+R], as a result of incomplete feature acquisition (Philip and Coopmans 1996a; Reinhart and Reuland 1993). However, if zich can optionally be treated as a pronoun, it is difficult to explain why children perform highly adultlike on simple sentences such as (32).

(32) Jan wast zich.
     John washes SE

Interestingly, a more recent study by Coopmans et al. (2004) tried to replicate the results from Avrutin and Coopmans (1999) with a methodology that complied with the condition of plausible dissent (Crain and Thornton 1998). The methodological improvements led to considerably more adultlike responses; the 3;5- to 5;7-year-olds rejected the non-c-commanding DP as the antecedent of zich 94 percent of the time, and the 5;8- to 7;9-year-olds rejected it 100 percent of the time. These results seem to confirm the results from earlier studies on reflexives: children perform highly adultlike on reflexives. The poor results in Avrutin and Coopmans (1999) could be dismissed as an experimental artifact. However, this would fail to explain why Avrutin and Cunningham (1997) obtained highly adultlike results, although their experimental setup was similar to Avrutin and Coopmans' (1999). This suggests that Dutch reflexives involve a difficulty that leads Dutch children to violate syntactic constraints such as c-command when imbalances in the experimental setup favor the non-c-commanding antecedent. Baauw et al. (2006) presented evidence that seems to confirm Avrutin and Coopmans' (1999) earlier findings, using a Story Elicitation task. In this task 5;4- to 6;7-year-old Dutch children and 5;3- to 6;1-year-old Spanish children had to tell short stories on the basis of three-picture sequences.
These sequences were designed to elicit either SE reflexives (se in Spanish, zich in Dutch) or SELF reflexives (sí mismo in Spanish, zichzelf in Dutch).16 The Dutch results showed that Dutch children's performance was far from adultlike on the condition eliciting SE reflexives; whereas adults produced zich almost 70 percent of the time, children produced zich less than 40 percent of the time.17 Dutch children's performance on SELF reflexives, on the other hand, was highly

16  When in Dutch the verbal predicate allows the SE reflexive zich, the SELF reflexive zichzelf is mainly used in contrastive contexts. In Spanish, SELF reflexives, such as sí mismo, are also mainly used for contrastive purposes. The picture sequences represented contrastive situations to elicit SELF reflexives, and non-contrastive situations to elicit SE reflexives.

17  Instead of using zich, children sometimes used the sub-standard/dialectal form z'n eigen (lit. 'his own') (ia); sometimes they avoided using a reflexive sentence by omitting the object (ib), or by using a sentence containing a possessive (ic).

(i) a. Het jongetje wast z'n eigen.
       the boy washes his own
    b. Het jongetje is aan het wassen.
       the boy is washing
    c. Het jongetje wast zijn buik.
       the boy washes his belly

adultlike. Interestingly, the Spanish children's performance on se was highly adultlike; se was produced roughly 80 percent of the time both by adults and children. Baauw et al. (2006) argue that Dutch children's poor performance on zich can be explained by Avrutin's (2004, 2006) ideas on referential dependencies and economy. As argued in section 22.3.7, Avrutin follows Reuland's (2001) idea that referential dependencies can be established either in (narrow) syntax, by means of A-Chain formation, as in (27b), or in the semantics, by establishing a bound variable relation between a pronoun and a binder (27a). In healthy adults referential dependencies established in syntax are cheap. However, in populations with limited syntactic processing capacity, such as children, the ability to use syntax to code referential dependencies is weakened. As a result, the use of zich will be more difficult, and is often avoided. Spanish children's adultlike performance on se is due to the fact that it is a different element; whereas zich is a pronominal element whose morphosyntactic properties force it to establish an A-Chain, se is a morphological reflexive marker, whose function is to legitimate the transformation of a transitive verb into an intransitive or reflexive verb, by identification of the internal theta role with the external role: λxλy (xRy) → λx (xRx). Since there is no pronominal, zich-like element in object position, no A-Chain is established. SELF anaphors appear to be less problematic. According to Baauw et al. (2006) this is because they involve a different type of reflexivity, which is not established in narrow syntax (Heim 1982; Jackendoff 1992; Lidz 1997).18 Hence, their use and interpretation are less problematic for children.19

18  SELF reflexives are elements associated with a "guise," i.e., a mental representation that is highly similar, but not necessarily identical, to the guise associated with the antecedent. The reflexive interpretation is created by identification of this guise with the guise of the antecedent (Jackendoff 1992; Rooryck and van den Wyngaerd 1997; Reuland 2001). This explains the ability of SELF to refer to statues, pictures, and other representations of the referent. SE reflexives, on the other hand, do not allow such an interpretation:

Context: Ringo, visiting a wax museum, passes by a wax statue of himself.

(i) a. Plotseling begon Ringo zichzelf uit te kleden   [zichzelf = Ringo/statue]
       'All of a sudden, Ringo started undressing SELF'
    b. Plotseling begon Ringo zich uit te kleden   [zich = Ringo/*statue]
       'All of a sudden, Ringo started undressing SE'

19

Spanish children did produce SE reflexives 29% of the time in the condition eliciting SELF reflexives. Baauw et al. (2006) argue that this was due to some children's tendency to follow a "deictic" strategy in their interpretation of discourses (Karmiloff-Smith 1980). This often leads them to interpret the different pictures of the three-picture sequence as independent events, ignoring the contrastiveness of the situation evoked by the picture sequences eliciting SELF reflexives. Another finding was that neither children nor adults used the standard SELF reflexive sí mismo very often. They preferred alternative forms containing strong pronouns, with or without the mismo (SELF) morpheme, which are more usual in informal speech:

(i) a. La niña se lava ella (misma)
       the girl SE washes she (SELF)
    b. La niña se lava a ella (misma)
       the girl SE washes ACC her (SELF)
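The theta-role identification attributed to se above, λxλy (xRy) → λx (xRx), can be mimicked with a small higher-order function (my illustration, not part of the chapter's analysis; the names reflexivize and washes are invented):

```python
# Toy model of reflexivization as theta-role identification: a curried
# two-place relation R is turned into the one-place property of standing
# in R to oneself, i.e. lambda x. R(x)(x).

def reflexivize(R):
    """Identify the internal argument of curried relation R with the external one."""
    return lambda x: R(x)(x)

# A curried two-place predicate; a tuple stands in for a proposition here.
washes = lambda agent: lambda patient: (agent, "washes", patient)

washes_self = reflexivize(washes)
print(washes_self("la niña"))  # ('la niña', 'washes', 'la niña')
```

The point of the sketch is that the output predicate takes a single argument, mirroring the claim that se detransitivizes the verb rather than filling its object position with a pronominal element.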


22.4.2 Non-​local Reflexives It is well known that reflexives can sometimes be used to refer to entities outside what is normally considered the binding domain of reflexives, as in A picture of myself would be nice, in which myself is identified with the speaker. This non-​local use of reflexives is often called logophoric. Avrutin and Cunningham (1997) tested English speaking children not only on (33a) but also on sentences such as (33b), in which himself is not an argument of the verb hide, and is taken to be logophoric. (33) a.  The man near the boy was washing himself b.  The man near the boy hid a book behind himself. They found that whereas children rejected the non-​c-​commanding DP as an antecedent 96 percent of the time in (33a), they only did so 66 percent of the time in (33b). Avrutin and Cunningham argue that unlike the local use of himself, the logophoric use of himself is sensitive to discourse constraints, and that children have problems accessing these constraints. Coopmans et al. (2004) found similar results for Dutch children’s interpretation of zich; the 3;5-​to 5;7-​year-​olds accepted the non-​c-​commanding DP (Rose) as an antecedent in (34a) 94 percent of the time, but did so 69 percent of the time in (34b). (34) a.  De vader   van Rose waste zich. The father of Rose washed SE b.  De vader van Rose hield de paraplu boven zich The father of Rose held the umbrella above SE In some languages reflexive pronouns in embedded sentences can be bound by DPs in the main clause. This long-​distance use of reflexives shows up in Asian languages such as Japanese, Chinese, and Korean, but also in some European languages such as Icelandic. In Icelandic, the SE reflexive sig can be bound by either the embedded clause subject Pétur or the main clause subject Jón in (35) (Hyams and Sigurjónsdóttir 1990): (35) Jón segir að Pétur raki sig. 
     John said that Peter shaves(subjunctive) SE

The possibility for reflexives to have an extra-clausal antecedent is dependent on lexical and semantic factors, such as the choice of the main clause verb, and the mood of the embedded clause; in Icelandic the embedded clause must be subjunctive. Early studies on children's ability to access both the local and the long-distance antecedent of the Chinese reflexive zìjǐ indicate that children initially have a strong preference for the local antecedent of the reflexive. This reading is also preferred by Chinese adult speakers (Chien and Wexler 1987). Korean children have a similar preference for the local reading of the anaphor caki, although Korean adult speakers prefer the long-distance reading (Lee and Wexler 1987; Cho 2009). This was predicted by Manzini and Wexler's (1987) parametrization of binding domains and their idea that children follow the subset principle in determining the correct language-specific binding domain (Berwick 1985). Since a language in which only local antecedents are permitted is a subset of a language in which both local and non-local antecedents are permitted, children are expected to start with permitting only local antecedents. More recent experimental evidence confirms this. Su (2003) tested 25 children with a mean age of 4;10 with a TVJ task on their interpretation of Chinese zìjǐ in sentences such as (36).

(36) Zhangsan shuo Lisi xihuan zìjǐ.
     Zhangsan say  Lisi like   self
     'Zhangsan said Lisi liked self.'

The results indicated that both Chinese children and adults had a preference for the local antecedent, preferring Lisi over Zhangsan, although children's preference for the local antecedent was stronger. However, Hyams and Sigurjónsdóttir (1990) show that Icelandic children, who were tested with an Act-Out task, strongly prefer to identify sig with the non-local antecedent. In fact, they even prefer this reading when the embedded clause is in the indicative mood, which is ungrammatical in the adult language. Hyams and Sigurjónsdóttir propose that in Icelandic long-distance reflexives have pronominal properties, unlike Chinese long-distance reflexives. This may explain the developmental differences between Chinese children and Icelandic children. Hyams and Sigurjónsdóttir also argue that children develop sensitivity to the semantic and lexical factors governing long-distance binding only gradually, to the effect that they do not distinguish between subjunctive and indicative clauses. Another property of long-distance reflexives such as Chinese zìjǐ is that they can only refer to subjects, unlike English reflexives such as himself (Su 2003).
(37) Zhangsan(i) gei Lisi(j) yi-ben zìjǐ(i/*j)-de shu
     Zhangsan    give Lisi   a-CL   self-of     book
     'Zhangsan gave Lisi a book of self.'

Su claims that the subset principle would predict for this property that children initially limit reference to the subject, since languages that only allow subjects as antecedents for reflexives (such as Chinese) are a subset of languages that allow both subjects and non-subjects as antecedents (such as English). Su (2003) tested 19 children with a mean age of 5;2 with a TVJ task on their knowledge of this constraint. The results showed that, while the adult controls rejected reference to a non-subject 100 percent of the time, children did so only 51 percent of the time. This confirms earlier results by Chien (1992), and seems to defy the subset principle. A possible explanation of these results is provided by Chien and Lust (2006). They follow Huang and Liu (2001), who proposed that Chinese zìjǐ is ambiguous; when it refers to a local antecedent, it is subject to Principle A of the Binding Theory, but when it is used as a long-distance reflexive, it is a pragmatic logophor, whose antecedent corresponds to "a person whose (a) speech or thought, (b) attitude or state of consciousness, and/or (c) point of view or perspective is being reported" (Huang and Liu 2001: 156). Since the mastery of the logophoric use of zìjǐ requires the development of pragmatic knowledge, it is expected that children will initially treat zìjǐ as a local reflexive, subject to Principle A. As a result they will allow zìjǐ to refer to both local subjects and objects. Chien and Lust further argue that Chinese children learn that zìjǐ is subject-directed after they learn that zìjǐ adjoins to IP at LF, a position that is c-commanded by the subject, but not by the object (Huang and Liu 2001).

22.5  Summary and Conclusions

The studies on children's interpretation of pronouns and reflexives reviewed in this chapter point to a relatively early mastery of linguistic principles such as Principles A and B, provided a balanced experimental design is used. The cross-linguistic evidence also indicates that the language-specific syntactic and semantic properties of (clitic) pronouns are acquired before the age of 5. Recent developments in linguistic theory, particularly the Binding Theory, and cross-linguistic acquisition research have shown that children's difficulties with the interpretation of pronouns and reflexives are to be found at the interfaces between syntax and discourse or semantics, and may be due to limited (syntactic) processing resources. In addition, children's performance on logophors and long-distance reflexives suggests that the acquisition of pragmatics and of particular properties of lexical items takes place relatively late.

Chapter 23

Logical Connectives

Takuya Goro

Developmental psychologists have long been interested in the development of logical concepts and logical thinking in children (e.g. Inhelder and Piaget 1958, 1964; Piaget 1967, 1968). Studies on children's comprehension of logical connectives in natural languages, such as English conjunction and and disjunction or, have an equally long history. In this chapter, rather than trying to spell out the whole history of research, I will focus on presenting findings from more recent studies on the acquisition of logical connectives, and discuss some implications of those findings. This is because recent research in this field has made great advances that I believe constitute a fundamental leap forward. The key feature of this "paradigm shift" in the field is the incorporation of insights from theoretical linguistics. Over the last few decades, theoretical linguists have elaborated on the descriptions of logical connectives in natural language, proposing formal analyses that encompass syntax, semantics, and pragmatics. These formal models help elucidate the nature of the knowledge of logical connectives in natural language, thereby providing grounds for designing appropriate experiments to assess that knowledge in children. In what follows, I review some of the experimental investigations of the acquisition of logical connectives that are built on the results of theoretical linguistics. The experimental findings point in the same direction: preschool children have sophisticated knowledge of logical connectives, even though their behavior may sometimes deviate from adult behavior. I begin by describing how semantics and pragmatics affect interpretations of natural language disjunction.

23.1  Disjunction: Semantics and Pragmatics

In English, sentences containing a disjunctive phrase "A or B" often invoke the so-called exclusive interpretation of disjunction. For example, upon hearing the sentence in (1), the hearer would typically infer that it is not the case that John speaks both French and Spanish.

(1) John speaks Spanish or French.

It is a widely accepted view that this exclusive interpretation of disjunction arises from a scalar implicature (e.g. Horn 1972, 1989; Gazdar 1979). Building on the ideas of Grice (1975), it is assumed that in cooperative conversations the speaker is expected to provide the maximally specific information within the relevant context to the hearer. In the current case, using the conjunction and instead of or yields a more restricted interpretation, hence it provides more specific information. Therefore, if the speaker knows that John speaks both French and Spanish, he should have uttered “John speaks French and Spanish,” rather than using the form in (1). This allows the hearer to infer that the speaker does not believe John speaks both French and Spanish. Assuming that the speaker is cooperative and reliable, the hearer then concludes that John in fact does not speak both languages, giving rise to the exclusive interpretation of disjunction. In what follows, I will follow the pragmatic account of the exclusive interpretation of disjunction. I will assume English or corresponds to inclusive disjunction in classical logic (∨), and a derived exclusive implicature is computed and added onto the basic lexical meaning, due to the pragmatic principles that govern interpretations of sentences in conversation. Under this assumption, making judgments about the truth or falsity of sentences like (1) involves more than just knowing the meanings of each word and the rules of semantic composition: language users must also be aware of pragmatic principles that give rise to the relevant scalar implicature. This compound nature of the interpretation of disjunction or becomes particularly relevant when we examine children’s acquisition of the connective. In order to manifest the full range of adultlike behavior with or, a child must have acquired (a) knowledge of the lexical semantics of the word, and (b) the ability to compute scalar implicatures. 
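On the pragmatic account just sketched, the strengthened meaning is simply the basic inclusive disjunction conjoined with the negation of the stronger conjunctive alternative. A short enumeration (an illustrative sketch, not part of the chapter) confirms that this strengthening yields exclusive-or:

```python
# Strengthened "or" = inclusive disjunction plus the "not both" scalar implicature.
# Enumerating all truth-value assignments shows this equals exclusive-or.
for A in (False, True):
    for B in (False, True):
        strengthened = (A or B) and not (A and B)   # lexical meaning + implicature
        assert strengthened == (A != B)             # identical to XOR
print("strengthened disjunction = exclusive-or")
```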
Therefore, a study of the acquisition of the logical connective must appropriately distinguish the contributions of these two different components to children's behavior. It has been widely observed that young children do not reliably compute pragmatic implicature in general (i.e. they fail to reject test sentences on the basis of pragmatic implicature) in experimental setups (e.g. Noveck 2001; Papafragou and Musolino 2003; Guasti et al. 2005). For example, using the word some in a descriptive sentence invokes a scalar implicature "not every." However, in one of Papafragou and Musolino's (2003) experiments, 5-year-old Greek-speaking children almost consistently accepted the Greek version of the test sentence Some of the horses jumped over the fence under a situation where all of the horses jumped over the fence, while adults overwhelmingly rejected the same sentence in that situation. Given this background, the observation that young children often fail to assign the exclusive interpretation to disjunction or (e.g. Paris 1973; Chierchia et al. 2001) would most naturally be interpreted as reflecting their general difficulty in computing pragmatic implicatures. Put differently, children's non-adult behavior with respect to the exclusive interpretation of or does not endorse the conclusion that their knowledge of the meaning of the lexical item is flawed. An appropriate test for children's lexical semantics of or requires an experimental design that abstracts away from the exclusive interpretation. I will discuss such a design in the next section.

Another factor that can affect children's behavior in experimental setups is the felicity conditions for using a disjunctive statement such as (1). In normal conversation, a speaker would avoid using the sentence in (1) if he knows exactly what language John speaks; he would instead say something like (2) or (3).

(2) John speaks Spanish and French.
(3) John speaks French.

In the situations where (2) or (3) is appropriate, a disjunctive statement such as (1) is infelicitous. In other words, the use of or in a descriptive statement is pragmatically felicitous only when the speaker is not in a position to provide more specific information. Some examples of utterances where or is felicitously used are as follows. The disjunctive statement in (4) expresses the speaker's uncertainty, and the sentence in (5) expresses John's minimum requirement:

(4) I can't remember what foreign language John speaks. I believe it is French or Spanish.
(5) John is looking for a person who speaks French or Spanish.

In earlier studies on the acquisition of or (e.g. Suppes and Feldman 1969; Paris 1973; Beilin and Lust 1975; Braine and Rumain 1981), little attention was paid to the felicity requirements. Test sentences that involve or were often presented without contextual support that would justify the use of a disjunctive statement. However, the emerging consensus in more recent experimental studies on language acquisition is that failing to meet the felicity requirements for test sentences can cause young (especially pre-school) children to behave differently from adults (see Crain and Thornton 1998: ch. 35; Gualmini 2003). Therefore, the results from earlier studies, which involved younger children's non-adult-like performance with the disjunction or and gradual development of adult-like behavior, must be reevaluated against the data from recent experiments that controlled for the felicity of test sentences.

23.2  Boolean Disjunction in Preschool English

Since the early 2000s, Stephen Crain and his colleagues have developed a research paradigm that draws upon the Boolean property of the disjunction or (e.g. Crain et al. 2002; Gualmini and Crain 2002, 2004, 2005; Goro et al. 2005). This approach focuses on the semantic interaction between or and another logical word (in particular, negation) in the same sentence. When the disjunction or appears within the scope of negation, it allows an inference that closely resembles one of De Morgan's laws of Boolean logic. In (6), to illustrate, the truth conditions of the sentence that contains a negated disjunction can be recast with the conjunction and presiding over both of the disjuncts.

(6) John doesn't speak Spanish or French.
    → John doesn't speak Spanish AND doesn't speak French.

We call this interpretation the "conjunctive" interpretation of disjunction because it is logically equivalent to the conjunction of two negated expressions (e.g. Higginbotham 1991):

(7)  A   B   A∨B   ¬(A∨B)   ¬A   ¬B   ¬A∧¬B
     0   0    0      1       1    1     1
     0   1    1      0       1    0     0
     1   0    1      0       0    1     0
     1   1    1      0       0    0     0

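The equivalence in (7) can also be checked mechanically; the following enumeration (an illustrative sketch, not from the text) verifies the De Morgan equivalence over all four rows of the table:

```python
# Verify one of De Morgan's laws, not (A or B) == (not A) and (not B),
# over every assignment of truth values to A and B (the four rows of (7)).
from itertools import product

for A, B in product([False, True], repeat=2):
    assert (not (A or B)) == ((not A) and (not B))
print("all four rows check out")
```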
Therefore, in normal contexts, the sentence in (6) is judged to be false if John speaks either Spanish or French. The following pair of sentences from Crain et al. (2002) illustrates the interpretive contrast between the conjunctive interpretation of disjunction and the ordinary disjunctive interpretation:

(8) The girl who stayed up late will not get a dime or a jewel.
(9) The girl who didn't go to sleep will get a dime or a jewel.

Both of the sentences involve the disjunction operator or and negation. In addition, in both of the sentences negation precedes disjunction. However, the structural relations between negation and disjunction are different in the two sentences. In (8), the negation not appears in the matrix clause, and is structurally higher than the disjunction or. In contrast, the negation n't in (9) is embedded within a relative clause, and hence is not structurally higher than the disjunction or. This difference in the structural relations affects the interpretations of the sentences. In (8), or is interpreted under the scope of negation, yielding a conjunctive truth condition: the girl will not get a dime AND will not get a jewel. In (9), or is interpreted outside the scope of negation, and the sentence is not associated with conjunctive truth conditions. Therefore, under a situation where the girl ends up getting a jewel but not a dime, (8) is judged to be false, whereas (9) is judged to be true.

Notice that the exclusive interpretation of disjunction, which involves computation of scalar implicatures, is irrelevant in making these judgments. First, the conjunctively interpreted disjunction in (8) does not invoke a scalar implicature, presumably because it yields the most restrictive truth condition (see the truth table in (7)). Second, regardless of whether the disjunction or in (9) is interpreted exclusively or inclusively, the sentence should be judged true in the above situation.
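The scope contrast can be made concrete with a small sketch (the variable names are my own; the scenario is the one just described, in which the girl gets a jewel but no dime):

```python
# Outcome in the relevant situation: a jewel but no dime.
dime, jewel = False, True

# (8): negation scopes over "or", so the disjunction is read conjunctively.
sentence_8 = not (dime or jewel)   # "will not get a dime AND will not get a jewel"

# (9): the negation sits inside the relative clause; "or" is outside its scope.
sentence_9 = dime or jewel

print(sentence_8, sentence_9)  # False True: (8) is judged false, (9) true
```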
On the other hand, assigning distinct truth conditions to (8) and (9) requires sophisticated semantic knowledge. In order to derive the conjunctive truth condition for (8), or must be understood as the Boolean inclusive disjunction. In addition, knowledge of semantic compositional principles is necessary to correctly determine the scope relations between the disjunction and negation. Thus, the interpretive contrast between (8) and (9) provides an excellent testing ground for children's semantic knowledge of the disjunction or, abstracting away from the problem of pragmatic implicature computation.

Crain et al. (2002) investigated children's interpretation of sentences such as (8) and (9), using a Truth Value Judgment (TVJ) task (Crain and McKee 1985; Crain and Thornton 1998). In one of the experimental trials, children were told a story about two girls who had both lost a tooth, and were looking forward to getting a reward from the Tooth Fairy in exchange for their lost tooth. At the moment the Tooth Fairy arrived in the story, the puppet (manipulated by one experimenter) interrupted the story and presented his prediction about what would happen in the remainder of the story. The "prediction mode" was employed in order to satisfy the felicity conditions for using a disjunctive statement. Since the puppet was making a prediction about what would happen, rather than presenting a description of what had happened, or could be felicitously used to express his uncertainty. One group of children was presented with (8), while the other group heard (9). Then the story resumed, and the Tooth Fairy gave a jewel, but no dime, to the girl who had stayed up late. Following the completion of the story, the child participant was asked to judge whether the puppet's prediction was right or wrong. Under the adult interpretations of the test sentences, (8) is false in the situation, because of the conjunctive truth conditions associated with the sentence (i.e. the girl will not get a dime AND will not get a jewel).
In contrast, (9) is true in the situation, because the disjunction or is interpreted outside the scope of the negation. Crain et al. (2002) found that English-speaking children at age 4–5 rejected (i.e. said the puppet was "wrong") sentences like (8) 92 percent of the time, while they accepted (i.e. said the puppet was "right") sentences like (9) 87 percent of the time. This highly robust adultlike performance by preschool children has been replicated in other studies that employed a similar design (e.g. Gualmini and Crain 2002, 2004, 2005; Goro et al. 2005). Therefore, the available evidence strongly suggests that preschool children as young as age 4 can correctly compute the semantic interaction between the disjunction or and other logical words, for example, negation. Given the data, we conclude that the semantics of the disjunction or is fully developed at age 4, although children at this age may have problems in computing pragmatic implicatures, or in accommodating themselves to test sentences that fail to meet the relevant felicity conditions.

23.3  Cross-linguistic Variation and Universality

As we have seen in the previous section, English disjunction or yields the "conjunctive" interpretation within single-clause negative sentences such as (6). In contrast, the Japanese counterpart of (6) appears to lack the conjunctive interpretation. As illustrated in (10), a Japanese simple negative sentence that involves the disjunction ka is most naturally paraphrased by the disjunction of two negated expressions:

(10) John-wa supeingo ka furansugo-o hanasa-nai.
     John-top Spanish or French-acc  speak-neg
     Lit. 'John doesn't speak Spanish or French.'
     → 'John doesn't speak Spanish OR doesn't speak French.'

Thus, the sentence in (10) can be truthfully uttered in a situation where, for example, John speaks Spanish but not French. In order to convey the "neither" meaning of (6), Japanese speakers use the conjunction … mo … mo:

(11) John-wa supeingo mo furansugo mo hanasa-nai.
     John-top Spanish both French and speak-neg
     → 'John doesn't speak Spanish AND doesn't speak French.'

This cross-linguistic contrast was first pointed out by Szabolcsi (2002), on the basis of the observation that Hungarian disjunction vagy lacks the "conjunctive" interpretation in simple negative sentences.

(12) Nem csukt-uk be az ajtó-t vagy az ablak-ot.
     not closed-1pl in the door-acc or the window-acc
     Lit. 'We didn't close the door or the window.'
     → 'We didn't close the door OR didn't close the window.'
     (Szabolcsi 2002: 2)

However, when the disjunction and negation are separated by a clause boundary, the interpretive contrast between English and Hungarian evaporates. For example, (13) is interpreted as "I don't think we closed the door AND don't think we closed the window," just as in English.

(13) Nem hisz-em, hogy becsukt-uk volna az ajtó-t vagy az ablak-ot.
     not think-1sg that in-closed-1pl AUX the door-acc or the window-acc
     'I don't think we closed the door or the window.'
     (Szabolcsi 2002: 2)

A parallel observation can be made in Japanese. The following examples illustrate that identical conjunctive interpretations of ka and or emerge when they appear in a sentential complement (14) and in a relative clause (15), embedded under a matrix negation.

(14) a. English complement clause
        John didn't say that Mary speaks Spanish or French.
        → 'John didn't say that Mary speaks Spanish AND didn't say that Mary speaks French.'
     b. Japanese complement clause
        John-wa [Mary-ga supeingo ka furansugo-o hanasu-to] iwa-nakat-ta.
        John-top Mary-nom Spanish or French-acc speak-Comp say-neg-past
        → 'John didn't say that Mary speaks Spanish AND didn't say that Mary speaks French.'

(15) a. English relative clause
        John didn't see a student who speaks Spanish or French.
        → 'John didn't see a student who speaks Spanish AND didn't see a student who speaks French.'
     b. Japanese relative clause
        John-wa [supeingo ka furansugo-o hanasu] gakusei-o mi-nakat-ta.
        John-top Spanish or French-acc speak student-acc see-neg-past
        → 'John didn't see a student who speaks Spanish AND didn't see a student who speaks French.'

These data suggest that both Hungarian vagy and Japanese ka are Boolean disjunctions, just like English or. However, vagy and ka are subject to a language-specific constraint that blocks them from being interpreted under the scope of local negation. Due to the effect of the constraint, they take scope over negation in single-clause sentences such as (10) and (12), yielding the "disjunctive" truth conditions. Szabolcsi (2002) and Goro (2004a, 2007) argue that the relevant constraint on scope interpretation is lexical in nature. Specifically, they propose that disjunctions in natural language are divided into two classes. In one class, which includes Hungarian vagy and Japanese ka, disjunctions are Positive Polarity Items (PPIs). In the other class, which includes English or, disjunctions are not PPIs. The defining property of PPIs is that they cannot take scope under local negation, while they can be interpreted under the scope of extraclausal negation. The English existential quantifier some, which is assumed to be a typical PPI (e.g. Baker 1979; Progovac 1994; Szabolcsi 2004), shows this locality effect in scope interpretation. As illustrated in (16), the scope behavior of some parallels that of vagy and ka:

(16) a. John didn't call someone.                   *¬ >> ∃¹
     b. Taro didn't think John called someone.      OK ¬ >> ∃
     c. John didn't meet a boy who called someone.  OK ¬ >> ∃

Given this analysis, children learning a language with a PPI-disjunction (i.e. Hungarian, Japanese, etc.) must learn that the relevant lexical item is a Boolean inclusive disjunction, and also that the item is a PPI. However, an equally plausible hypothesis seems to be available for children: the hypothesis that the relevant disjunction operator is not a Boolean connective, and therefore it does not semantically interact with negation (or with any other logical words). As we have seen in (10) and (12) above, PPI disjunctions do not yield the conjunctive interpretation in simple negative sentences. Consequently, adults choose to use an alternative form (e.g. the conjunction … mo … mo as in (11)) to express the "neither" meaning. As a result, in the vast majority of input data to children, PPI disjunctions fail to manifest their Boolean nature. The direct evidence for the Boolean nature presumably involves sentential embedding, such as in (13), (14), and (15), which may not be included in the primary linguistic data used by children (e.g. Lightfoot 1999). Given these considerations, cross-linguistic research on the acquisition of disjunction bears on the issue of how experience affects language acquisition. If children are to learn the Boolean semantics of disjunction from experience, then it is expected that English-speaking children and Japanese-speaking children would go through different developmental courses, because of the difference in the availability of crucial evidence in the input.

1 Although the literature often claims that the narrow scope interpretation of some is impossible, there seem to be nontrivial numbers of native speakers who find the reading just fine. This may possibly be due to lexical variation among speakers: for those speakers who allow the narrow scope interpretation of some in simple negative sentences, some is not a PPI.

Goro and Akiba (2004) examined Japanese children's interpretation of disjunction in simple negative sentences. The experiment used the TVJ task. One experimenter acted out a short story about an "eating-game." In the game, there are 12 animals who are each asked to eat vegetables that they don't like: a carrot and a green pepper. Each animal gets a different prize depending on how well they did. First, if an animal eats not only cake, but also the vegetables, then it receives a shining gold medal. Second, if an animal eats cake but only one of the vegetables, then it receives a blue medal. Finally, if an animal only eats cake and does not eat any vegetables, then it gets a black cross (a symbol of failure in Japanese culture).
The story phase continued until all twelve animals finished their trials and were presented with their rewards. After the story, the puppet started to guess how well each animal did in the game, starting with the first animal. First, the puppet said that he didn't remember exactly what each animal ate; then he started to make guesses about this, based on the color of the prizes the animals had received as awards. The crucial test cases are the puppet's guesses about the animals with a blue medal, that is, those who ate only one of the vegetables. For example, the puppet uttered the test sentence (17) for the pig, who had a blue medal as his reward:

(17) Butasan-wa ninjin ka piman-wo tabe-nakat-ta.
     pig-top carrot or pepper-acc eat-neg-past
     Lit. 'The pig didn't eat the carrot or the pepper.'

Recall that a blue medal was awarded only to those animals who had eaten just one of the vegetables; it did not indicate which vegetable the animals had actually eaten. Therefore, the color of the prizes provides only incomplete information with respect to what the pig actually ate. Given this incompleteness of information, all that the puppet could reasonably guess was something like "he didn't eat the pepper or he didn't eat the carrot," which corresponds to the adult-Japanese interpretation of the target sentence. In this situation, the adult control group accepted the test sentences with disjunction, as in (17), 100 percent of the time. This is just as expected. In contrast to the pattern of results from adults, however, 30 Japanese-speaking children (age: 3;7–6;3, mean: 5;3) only accepted the crucial test sentences under the same situation 25 percent of the time. Among the 30 children, only four were adultlike in consistently accepting the test sentences. The remainder of the children rejected the test sentences 87 percent of the time. When these children were asked to explain the reason for their negative judgments, they said, for example, either "because the pig did eat one of the vegetables" or "because it is only one of the vegetables that the pig didn't eat." The negative judgments from the vast majority of children, combined with their explanations for those judgments, suggest that Japanese children were assigning the conjunctive interpretation to ka in simple negative sentences. That is, with respect to the interpretation of negated disjunction, Japanese children are more like English-speaking children/adults than like Japanese adults. This result has been replicated in other languages with a PPI-disjunction: Mandarin Chinese² (Jing et al. 2005) and Russian (Verbuk 2006a). Thus, the conjunctive interpretation of negated disjunction appears to be the "default" interpretation for children. This in turn suggests that children uniformly assign Boolean semantics to disjunction, regardless of whether the input data provide evidence that supports the hypothesis. Summarizing, available evidence suggests a lack of cross-linguistic variation in child language: children uniformly assign Boolean semantics to disjunction, and interpret negated disjunction conjunctively, even if the interpretation is not available in their target language.
In other words, the cross-linguistic variation with respect to the interpretation of negated disjunction in adult language fails to affect children's semantic knowledge of disjunction: experience does not seem to play a significant role in the acquisition of the semantics of disjunction. If children are to construct the meaning of disjunction from linguistic experience, then the difference between adult English and Japanese is expected to affect the outcome: children learning English should encounter a sizable amount of evidence that shows the disjunction or is Boolean (i.e. not … or expressing "neither"), but Japanese children are virtually deprived of such evidence. These results from cross-linguistic studies on the acquisition of disjunction have led researchers to propose that the semantics of disjunction is innately determined (Goro 2007; Crain and Khlentzos 2008, 2010; among others). On this view ("logical nativism" by Crain and Khlentzos 2010), humans have an innate logical faculty that structures thought, providing universal logical concepts that all natural languages draw on. The Boolean inclusive semantics of disjunction is specified in the innate endowment, and children learn an appropriate linguistic label for the innate concept. It is then predicted that all human languages associate the same truth conditions with an expression of disjunction. Crain and Khlentzos (2008, 2010) argue that this is indeed the case: the semantics of disjunction is uniform across languages.

2 See Jing (2008) for an alternative view on Mandarin disjunction.

Further support for the logical nativism account would come from a study of younger children. According to this view, the innate logical concepts including disjunction should be present even at the earliest stages of development. Therefore, with an appropriate task design, it should be possible to find evidence that younger children, possibly infants, engage in logical reasoning using such concepts. Studies on the development of logical capacity in children have found that, with tasks that do not require explicit conscious reasoning, children as young as 2.5 years of age show some success at making what looks to be logical inference (e.g. Pea 1982; Fabricius et al. 1987; Scholnick and Wing 1995; Watson et al. 2001). For example, Halberda (2006) showed a familiar object (e.g. a cup) and a novel object (e.g. a vacuum tube-looking object) to 3-year-old children, providing a novel label ("watch the dax"). Eye-tracking analysis revealed that upon hearing the novel label, children systematically switched their gaze back to the familiar object, rejected it, and then returned their gaze to the novel target. Halberda argues that this behavior reflects the computational structure of Disjunctive Syllogism (i.e. A or B, not A, therefore B) that the children worked through to motivate the mapping of the novel label to the novel object. Thus, 3-year-olds engage in logical inference that involves disjunction, suggesting that the concept is present and operative in the mind of children at that age.

Summarizing, we have reviewed studies on the acquisition of disjunction from a cross-linguistic perspective. Due to the existence of language-specific constraints on scope interpretation, simple negative sentences involving disjunction are assigned contrastive truth conditions in different languages. However, the cross-linguistic contrast is absent in child language. Children across different languages uniformly assign the Boolean, "De Morgan" interpretation to negated disjunction.
This universality in child language motivates logical nativism: the idea that logical concepts including disjunction are specified as part of the innate endowment of the species.
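As a concrete rendering of the inference pattern Halberda attributes to children, the following sketch checks the validity of Disjunctive Syllogism by enumerating truth assignments. It is an illustration only, not part of any cited study.

```python
from itertools import product

# Disjunctive Syllogism: from "A or B" and "not A", conclude B.
# The inference is valid iff every truth assignment that satisfies
# both premises also satisfies the conclusion.
assignments = list(product([True, False], repeat=2))
surviving = [(a, b) for a, b in assignments if (a or b) and not a]
valid = all(b for a, b in surviving)

assert valid  # the only assignment satisfying both premises is A=False, B=True
```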

23.4  Acquisition of Scope

In Goro and Akiba's (2004) experiment, Japanese children at around age 5 assigned the conjunctive interpretation to the disjunction ka in test sentences like (17). This means that they interpreted ka under the scope of local negation, even though that interpretation is not possible in adult Japanese. Given this observation, it is important to ask how Japanese children learn the scope constraint on ka. Since the scope constraint on disjunction is not universal, first language learners must somehow learn it from experience. In this section, we will discuss how the acquisition of the scope constraint is possible.

First, it must be pointed out that the non-adult scope interpretation of ka by Japanese children cannot be attributed to a general inability to access "inverse-scope" interpretations. The behavior of Japanese children is, at first glance, somewhat reminiscent of English-speaking children's non-adult performance observed by Musolino et al. (2000). In their TVJ task experiments, Musolino et al. found that young children often failed to assign "inverse scope" readings to test sentences like (18), resulting in a failure to accept the sentence in a situation where the detective found two of his friends but missed one.

(18) The detective didn't find someone/some guy.

Note that in adult English some is a PPI, and resists taking scope under local negation. However, young children adhered to the narrow-scope interpretation of the existential quantifier (i.e. "the detective didn't find anyone"), ignoring the constraint on scope interpretation. Given this background, Goro and Akiba (2004) carried out a control experiment to make sure that Japanese children at the relevant age (i.e. around age 5) can access inverse-scope interpretations. The control condition was established by replacing the disjunctive phrase in the target sentences, such as (17), with another quantificational expression, nanika. Nanika is an indefinite existential corresponding to English something:

(19) Butasan-wa nanika tabe-nakat-ta.
     pig-top  something eat-neg-past
     'The pig didn't eat something.'

In the control experiment, Japanese children (age: 3;7–6;3, mean: 5;4) were presented with target sentences such as (19) as the puppet's guess about the animals with a blue medal, that is, those who ate only one of the vegetables. In contrast to the experiment with the disjunction ka, children did not show the same non-adult performance: they correctly accepted the crucial test sentences 88 percent of the time. This result suggests that the Japanese children in Goro and Akiba's experiments did not have general problems accessing inverse-scope interpretations of object QPs. Given this result, Goro and Akiba concluded that the non-adult performance in interpreting sentences with ka must have to do specifically with that lexical item.

In Goro (2004a, 2007), I proposed that innate linguistic knowledge restricts a learner's possible hypotheses about the scope of natural language disjunctions, arguing that (20) is part of innate linguistic knowledge:

(20) Boolean disjunctions in natural language are associated with a lexical parameter with the following values: {+PPI, –PPI}.

Under this analysis, Hungarian vagy and Japanese ka are disjunctions with the value [+PPI], and English or has the value [–PPI]. The first language learner's task is then to determine which value of the parameter a particular disjunction in the target language has.

On the basis of learnability considerations, Goro (2004a, 2007) argued that the lexical parameter has a default value. Assuming that direct negative evidence does not play a crucial role in language acquisition, the learnability argument goes as follows. Suppose that a language L has the Boolean disjunction OR. In order to determine whether this item is +PPI or –PPI, the crucial datum is a single-clause negative sentence in which the form A OR B appears in the potential scope domain of sentential negation (e.g. the object position of a transitive sentence). With such a form as input, there are two different output truth conditions for the connective, corresponding to whether the item is +PPI or –PPI:

(21) Disjunction OR:
     a. OR [–PPI] → ¬(A ∨ B) = ¬A ∧ ¬B
     b. OR [+PPI] → ¬A ∨ ¬B

Notice that the two truth conditions stand in a subset/superset relation: the situations in which ¬A ∧ ¬B is true are a subset of the situations in which ¬A ∨ ¬B is true. Thus, in every logical situation where ¬A ∧ ¬B is true, ¬A ∨ ¬B is also true. Given this, the learnability argument contends that an incorrect hypothesis that yields the superset truth conditions can never be falsified by positive input data. Therefore, the relevant parameter must have a default value such that children always start with the hypothesis that yields the subset truth conditions. That default value for disjunction is [–PPI]; hence the conjunctive interpretation of negated disjunction is the default interpretation across children learning different languages. The experimental data from young children are compatible with the predicted default value for the parameter.

However, it must be pointed out that the conceptual underpinnings of this learnability argument have been challenged. Gualmini and Schwarz (2007), for example, argued that the semantic entailment problem could be circumvented either by taking pragmatic implicature into account, or by considering cases in which the relevant forms are embedded under a downward-entailing operator (see Gualmini and Schwarz 2007 for details). Furthermore, Jing (2008) pointed out that a –PPI disjunction (e.g. English or) may take scope over local negation, and showed that in appropriate experimental settings both adults and children can access the wide-scope interpretation of disjunction. Thus, the truth conditions associated with the two parameter values may not stand in a proper subset/superset relation, with the [–PPI] value yielding both ¬A ∧ ¬B and ¬A ∨ ¬B. But even if the conceptual argument has to be given up, the question of whether or not the relevant parameter has a default value remains a valid empirical issue for future research.
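The subset relation at the heart of the learnability argument can be verified by enumerating truth assignments. The sketch below is an illustration (not drawn from the cited works): it checks that the situations verifying the [–PPI] reading in (21a) are a proper subset of those verifying the [+PPI] reading in (21b).

```python
from itertools import product

def neither(a, b):
    # [-PPI]: negation takes scope over disjunction: the De Morgan "neither" reading
    return not (a or b)

def weak_reading(a, b):
    # [+PPI]: disjunction takes scope over negation: "not A or not B"
    return (not a) or (not b)

situations = list(product([True, False], repeat=2))
narrow = {s for s in situations if neither(*s)}
wide = {s for s in situations if weak_reading(*s)}

# The "neither" situations form a proper subset of the "not A or not B" situations,
# so positive evidence can never falsify an incorrect [+PPI] hypothesis.
assert narrow < wide
```

The single situation in `narrow` (both disjuncts false) also verifies the weak reading, which is why a learner who wrongly adopts [+PPI] would never meet disconfirming positive data.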

23.5  Using Disjunction as a Probe into Children's Semantic Capacity

As we have reviewed in the previous sections, there are strong empirical grounds for the conclusion that English-speaking children, at least by roughly age 4, have adultlike knowledge of the Boolean property of the disjunction or. This conclusion invites the research strategy of using children's interpretation of or as a diagnostic test of their knowledge of the semantics of different operators: if children assign correct adultlike interpretations to or within the immediate scope of a certain operator, that would suggest they identify the semantics of the operator correctly. This strategy can yield particularly insightful results when applied to an operator with which children have been reported to show non-adult performance. If children are able to compute the semantic interaction between the operator and or, that would strongly suggest that they have adultlike lexical semantics for the operator. This result would, in turn, narrow down the range of possible sources of children's non-adult behavior in interpreting sentences with the operator.

One such case concerns the universal quantifier every. Various studies have demonstrated that when presented with a sentence like "Is every boy riding a pony?" and a picture showing three boys, each riding a pony, plus an extra rider-less pony, 3- to 5-year-olds often respond "no," pointing to the extra pony (e.g. Philip 1995; cf. Inhelder and Piaget 1964). This non-adult behavior is often referred to as the symmetrical response. To account for it, Philip (1995) argues that children's grammar may allow the determiner every in sentences like every boy is riding a pony to quantify over events, rather than over individuals. Under that event quantification interpretation, the sentence means something like "For every event e that involves a boy or a pony, a boy is riding a pony." Thus, for the sentence to be true on that interpretation, it must be the case that every pony is ridden by a boy, in addition to every boy riding a pony.

Here, studies on children's interpretation of sentences that involve every and or become relevant. Within sentences containing every in subject position, the interpretation of or varies systematically according to its position. Specifically, when or appears within the noun phrase that every combines with (the "restrictor"), it yields the conjunctive entailment. In contrast, when or appears within the predicate phrase (the "nuclear scope"), it provides the ordinary "disjunctive" reading:

(22) Every troll who ordered French-fries or onion rings got some mustard.
     ⇔ Every troll who ordered French-fries got some mustard and every troll who ordered onion rings got some mustard.
(23) Every ghostbuster will choose a cat or a pig.
     *⇔ Every ghostbuster will choose a cat and every ghostbuster will choose a pig.
     (Gualmini et al. 2003a: 146–148)
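The asymmetry between (22) and (23) can be verified by brute-force model checking: if every is universal quantification over individuals, or in the restrictor yields the conjunctive entailment, while or in the nuclear scope does not. The sketch below is an illustration under that assumption, with abstract predicates P, Q, R standing in for the lexical material.

```python
from itertools import product

def every(domain, restrictor, scope):
    """'Every x [restrictor] [scope]': universal quantification over individuals."""
    return all(scope(x) for x in domain if restrictor(x))

# An individual is a triple of truth values for three abstract predicates P, Q, R.
individuals = list(product([False, True], repeat=3))
# All models containing one or two individuals.
models = [m for n in (1, 2) for m in product(individuals, repeat=n)]

# (22)-type: 'or' in the restrictor -- every x [P(x) or Q(x)] R(x)
restr_or = lambda m: every(m, lambda x: x[0] or x[1], lambda x: x[2])
restr_both = lambda m: (every(m, lambda x: x[0], lambda x: x[2])
                        and every(m, lambda x: x[1], lambda x: x[2]))
# Conjunctive entailment: it holds in every model that verifies the sentence.
assert all(restr_both(m) for m in models if restr_or(m))

# (23)-type: 'or' in the nuclear scope -- every x [P(x)] (Q(x) or R(x))
scope_or = lambda m: every(m, lambda x: x[0], lambda x: x[1] or x[2])
scope_both = lambda m: (every(m, lambda x: x[0], lambda x: x[1])
                        and every(m, lambda x: x[0], lambda x: x[2]))
# No conjunctive entailment: countermodels exist.
assert any(scope_or(m) and not scope_both(m) for m in models)
```

The first entailment is logically valid because P(x) implies P(x) ∨ Q(x), so every individual satisfying P alone already falls under the restrictor of (22).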

Experimental studies (Boster and Crain 1993; Gualmini et al. 2003a) observed that children assigned different interpretations to or in sentences like (22) and (23). Specifically, when or appeared within the restrictor of every, children assigned the conjunctive interpretation to or, and consistently rejected the test sentence in (22) when the trolls who ordered onion rings did not get any mustard. In contrast, children interpreted or in the nuclear scope of every disjunctively, and consistently accepted sentences such as (23) even when some ghostbusters did not choose a pig. These asymmetrical interpretations of or are unexpected if children interpret every as taking sentential scope, quantifying over events. This in turn suggests that children's semantic composition of sentences with every is essentially adultlike, and that the source of the non-adult, symmetrical response must reside somewhere else.

Another case involves the focus operator only. Previous research found that preschool children often assigned non-adult interpretations to sentences involving only. For example, Crain et al. (1994) observed that children often associate pre-subject only

with VP, interpreting, for example, Only John speaks Spanish as meaning that John only speaks Spanish. The exact nature of these non-adult interpretations is under ongoing debate (e.g. Paterson et al. 2003). Goro et al. (2005) sought to bring a new perspective to the issue by investigating children's interpretation of sentences with only and or. They pointed out that the meaning of sentences involving only and or, such as (24), decomposes into two parts, as in (25) (e.g. Horn 1969):

(24) Only Bunny Rabbit ate a carrot or a pepper.

(25) i.  Bunny Rabbit ate a carrot or a pepper, AND
     ii. Everyone other than Bunny Rabbit did not eat a carrot or a pepper.

Notice that the two meaning components affect the disjunction or differently. Within the first component of the decomposed propositions, or is interpreted disjunctively with respect to what Bunny Rabbit ate. Thus, for the sentence to be true, it must be the case that Bunny Rabbit ate a carrot or a pepper (but not necessarily both). In contrast, within the second component, or receives the conjunctive interpretation with respect to what everyone other than Bunny Rabbit did not eat. Hence, for the sentence to be true, it must be the case that everyone other than Bunny Rabbit did not eat a carrot and everyone other than Bunny Rabbit did not eat a pepper. In other words, the disjunction or in (24) is "two-faced": it is effectively interpreted twice, giving rise to distinct truth conditions.

In their experimental study, Goro et al. (2005) found that children at around age 4 correctly interpreted the two-faced or. In one of their experimental stories, Bunny Rabbit and two other characters were introduced, and the puppet made a prediction about what would happen next in the story: for example, "I think only Bunny Rabbit will eat a carrot or a pepper." In one condition, Bunny Rabbit proceeded to eat a carrot, but neither of the other characters ate anything.
In this condition, children consistently accepted the test sentence, showing that they interpreted or disjunctively with respect to what Bunny Rabbit ate. In the second condition, Bunny Rabbit proceeded to eat a carrot, and another character ate a pepper. In this condition, children consistently rejected the test sentence, showing that they interpreted or conjunctively with respect to what everyone else failed to eat. Taken together, the results suggest that adultlike semantic composition of the meaning of sentences containing or and only is fully operative at around age 4. The computation is quite complex: in interpreting the crucial test sentences, children must have created representations for two separate propositions and computed the truth conditions of each individually, in order to derive the distinct truth conditions of the "two-faced" or. Children's adultlike behavior thus suggests that the processing capacity of children at around age 4 is capable of carrying out this computation, and that the relevant semantic computation is not blocked by performance limitations.
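The two experimental conditions can be reproduced with a direct implementation of the Horn-style decomposition in (25). The sketch below is illustrative only; the names of the characters other than Bunny Rabbit are invented for the example.

```python
def only_ate_carrot_or_pepper(ate, focus="bunny"):
    """Horn-style decomposition of 'Only <focus> ate a carrot or a pepper'."""
    veggies = {"carrot", "pepper"}
    # Component (25i): the focused individual ate a carrot or a pepper -- disjunctive.
    prejacent = bool(ate[focus] & veggies)
    # Component (25ii): everyone else ate neither vegetable -- conjunctive.
    exclusive = all(not (ate[other] & veggies) for other in ate if other != focus)
    return prejacent and exclusive

# Condition 1: Bunny Rabbit ate a carrot; the other characters ate nothing.
cond1 = {"bunny": {"carrot"}, "froggy": set(), "kitty": set()}
# Condition 2: Bunny Rabbit ate a carrot, but another character ate a pepper.
cond2 = {"bunny": {"carrot"}, "froggy": {"pepper"}, "kitty": set()}

assert only_ate_carrot_or_pepper(cond1)       # children accepted
assert not only_ate_carrot_or_pepper(cond2)   # children rejected
```

The single occurrence of or in (24) contributes to both components, which is exactly the "two-faced" behavior the children's acceptance and rejection patterns track.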


23.6 Conjunction

In most of the earlier studies on the acquisition of logical connectives (e.g. Suppes and Feldman 1969; Paris 1973; Beilin and Lust 1975; Roberge 1975), the conjunction and was found to be easier than the disjunction or for young children. With preschool children, and almost always elicited a higher rate of adultlike responses than or did. In the 2000s, as we reviewed in sections 23.2 and 23.3 above, research on disjunction shifted its focus to the acquisition of the semantic interaction between the disjunction operator and another logical expression in the same sentence. This research strategy has proven fruitful, revealing that preschool children have adultlike semantic knowledge of the disjunction or, even though they may have problems in the domain of pragmatics. To date, however, there are only a handful of studies on conjunction that focus on the acquisition of the semantic interaction between conjunction and another logical element.

One reason behind this situation is that the Boolean character of natural language conjunction is not as stable as that of disjunction. For example, Szabolcsi and Haddican (2004) point out that the Hungarian conjunction és, when it conjoins two definite NPs, systematically resists the Boolean "not both" interpretation under the scope of negation:

(26) Nem látta Katit és Marit.
     Lit. 'He didn't see K and M.'
     * 'not both'  √ 'neither'

(27) Nem hiszem, hogy látta volna Katit és Marit.
     Lit. 'I don't think that he saw K and M.'
     * 'not both'  √ 'neither'
     (Szabolcsi and Haddican 2004: 222)

As illustrated in (27), the non-Boolean, "neither" interpretation of és persists even under extra-clausal negation. This behavior of és contrasts with that of the Hungarian disjunction vagy, which is a PPI and yields the Boolean interpretation when there is a clause boundary between negation and the disjunction (e.g. (13)). Szabolcsi and Haddican argue that definite conjunctions with és are given the same denotation as definite plurals and, therefore, yield a non-Boolean interpretation under negation.

Szabolcsi and Haddican further argue that the same definite plural interpretation is available for English definite conjunctions with and. In fact, the Boolean "not both" interpretation of and in simple negative sentences seems to be a rather marked reading for English-speaking adults, sometimes requiring focal stress on and. In an experiment by Goro et al. (2006), adult English speakers resisted assigning the "not both" interpretation to the conjunction and (without focal stress) in sentences like (28) over two-thirds of the time.

(28) The Smurf didn't jump over the tree and the pond.

The Japanese conjunction … mo … mo, according to Goro (2004a, 2007), displays yet another pattern. While it lacks the "not both" interpretation in simple negative sentences (e.g. (11)), the Boolean interpretation becomes systematically available in embedded contexts, suggesting that the lexical item is a Boolean PPI conjunction:

(29) John-wa [Mary-ga supeingo mo furansugo mo hanasu-to] iwa-nakat-ta.
     John-top Mary-nom Spanish also French also speak-Comp say-neg-past
     'John didn't say that Mary spoke both Spanish and French.'
     √ 'not both'

In short, the semantics of conjunction in natural language shows greater variation than that of disjunction, and the exact nature of the variation is not yet fully understood. A problem for language acquisition research is that it is not always possible to exploit the Boolean character of conjunction in order to assess children's knowledge of the lexical item. The English conjunction and, for example, seems to be ambiguous between Boolean and non-Boolean interpretations, and therefore most experimental designs built around the disjunction or cannot be straightforwardly carried over to the study of the acquisition of conjunction.

To resolve the problem, Goro et al. (2006) adopted different experimental materials, namely sentences with the focus operator only. In these sentences, the Boolean interpretation of the conjunction and systematically emerges even when the conjunction does not receive focal stress. For example, imagine that the sentence in (29) describes the result of a "jumping contest" between the Smurf and two other characters:

(29) Only the Smurf jumped over the tree and the pond.
Goro et al. (2006) found that English-speaking adults systematically accept the sentence when the Smurf jumped over both the tree and the pond, and the other characters each jumped over either the tree or the pond (but not both). This suggests that the sentence is interpreted as "everyone other than the Smurf didn't jump over both of the obstacles," with the Boolean "not both" interpretation of the conjunction and. Goro et al. further observed that English-speaking 4-year-olds showed exactly the same behavior: children consistently accepted the Boolean "not both" interpretations of and within sentences containing only.

Goro (2004a, 2007) employed these experimental materials to investigate Japanese children's knowledge of the conjunction … mo … mo. As we have just discussed, … mo … mo is a PPI, and is therefore forced to scope over local negation, failing to instantiate the "not both" interpretation in simple negative sentences. This property of … mo … mo parallels that of the disjunction ka in Japanese. However, children's behavior with … mo … mo contrasts with their behavior with ka. Remember that Japanese children, at around age 5, assigned non-adult Boolean (i.e. "neither") interpretations to ka in simple

negative sentences. The same children, however, systematically assigned adultlike interpretations to … mo … mo in sentences such as (30) (Goro and Akiba 2004):

(30) Butasan-wa ninjin mo piman mo tabe-nakat-ta.
     pig-top  carrot also pepper also eat-neg-past
     Lit. 'The pig didn't eat the carrot and the pepper.'

Children consistently rejected the sentence when the pig ate, for example, the carrot but not the pepper. This suggests that children interpreted the negated … mo … mo as 'neither.' This could be because Japanese children at this age have already learned that … mo … mo is a PPI. Another possibility is that they were assigning non-Boolean semantics to the conjunction.

To tease these possibilities apart, Goro (2004a, 2007) carried out an experiment using test sentences with … mo … mo and dake 'only.' In one condition, the child participant was introduced to three Pokemon characters (Pikachu, Zenigame, and Hitokage) who were going to attempt to open boxes with their PSI power. There was a blue box and a black box. Pikachu successfully opened both of the boxes, and the other two each opened only one of the boxes. Then the puppet presented the test sentence in (31) as his description of what happened:

(31) Pikachu-dake-ga aoi hako mo kuroi hako mo aketa.
     Pikachu-only-nom blue box also black box also opened
     'Only Pikachu opened the blue box and the black box.'

Under the adult interpretation, the conjunction … mo … mo in (31) receives the Boolean "not both" interpretation with respect to what everyone other than Pikachu opened. Hence the sentence is true in this condition, because nobody other than Pikachu succeeded in opening both boxes. The finding was that Japanese children (at age 5) systematically accepted the test sentence, just like adult controls.
This result suggests that Japanese children identify … mo … mo as a Boolean conjunction, and demonstrates that neither children's lexical semantics for … mo … mo nor their knowledge of its scope properties deviates from adult knowledge. It remains mysterious, however, how exactly Japanese children avoided the hypothesis that … mo … mo is a non-Boolean conjunction. Since natural languages do allow non-Boolean conjunctions (e.g. Hungarian és), the hypothesis cannot be ruled out by appeal to some innate restriction on the possible semantics of conjunction. This is an open issue for future research.
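The Pikachu condition can be checked with the same Horn-style decomposition used for only and or, this time with conjunction in the scope of dake 'only.' The sketch below follows the scenario in the text; the encoding of the scene is an assumption made for the example.

```python
def only_opened_both(opened, focus="Pikachu"):
    """'Only <focus> opened the blue box and the black box' (Horn-style decomposition)."""
    boxes = {"blue", "black"}
    prejacent = boxes <= opened[focus]  # the focused character opened both boxes
    # Everyone else failed to open both boxes: the Boolean "not both" reading.
    exclusive = all(not (boxes <= opened[other]) for other in opened if other != focus)
    return prejacent and exclusive

# Pikachu opened both boxes; the other two characters each opened only one.
scene = {"Pikachu": {"blue", "black"}, "Zenigame": {"blue"}, "Hitokage": {"black"}}
assert only_opened_both(scene)  # true: nobody besides Pikachu opened both boxes
```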

23.7 Conclusion

In this chapter, I have tried to sketch out the results of recent studies on the acquisition of logical connectives in natural language. My main goal has been to show how developmental psycholinguistics informed by theoretical linguistics has uncovered preschool children's knowledge of logical connectives. There are two important theoretical grounds that

greatly advanced developmental research. One is the dissociation of pragmatic implicature from the semantics of disjunction, and the other is the model of semantic interaction between a logical connective and another logical word in the same sentence. Studies that incorporated these perspectives have revealed that preschool children's knowledge of logical connectives, especially of disjunction, had been underestimated, and that children in fact have quite sophisticated semantic knowledge of these expressions. Furthermore, theoretical studies on cross-linguistic variation in logical connectives motivated cross-linguistic studies on the acquisition of connectives, which uncovered an otherwise unexpected universality in child language. These results illustrate the importance of combining experimental studies of children's knowledge with formal theories of adult language. I hope that our understanding of child development and adult knowledge of logical connectives will advance in tandem in the future.

Chapter 24

The Expression of Genericity in Child Language

Ana T. Pérez-Leroux

24.1 Introduction

At the age of 1;11 (1 year, 11 months), a little boy named Kendall once said the sentence doggy bark (extracted from the Braine corpus in the CHILDES database; MacWhinney 2000). Was Kendall referring to a certain dog? Or was he describing, not an actual dog or dogs, but "dogs" in general? To answer this, we need to consider not just the context of Kendall's utterance, but also the question of how and when children acquire the ability to convey generic propositions. The goal of this chapter is to present the answers to these questions that are currently emerging from recent work in psychology, linguistics, and philosophy.

Generic expressions convey information that pertains to species or kinds of objects, rather than to individuals (Carlson and Pelletier 1995). They have distinct semantic properties, which manifest in a range of verification conditions that vary across kinds and properties. They also occur in a variety of grammatical forms. Despite the complexity of the domain, even very young children produce sentences that clearly seem to contain generic expressions. Here are some examples, drawn both from the literature and from samples of children's spontaneous speech extracted from the CHILDES database.

(1) Doggy goes poop. (Gelman 2003)
    Eyes are like this. (Sarah, 4;03, Brown Corpus)
    The funny thing about numbers is that numbers go on for ever and letters only goes on for a while. (Ross, 5;11, MacWhinney Corpus)

Generics are among the linguistic means that allow us to use language to speak of things beyond the here and now. Researchers are very interested in children's use and comprehension of generic expressions, partly because of their importance for the study of conceptual development, and partly because of the linguistic complexity of the domain of generics. In this chapter, we will review data on children's patterns of use and experimental performance. The evidence suggests that, much like adults, young children understand that generic expressions go beyond the specific instances encountered in a given context.

In section 24.2 I introduce the basic questions of meaning and form in generic expressions, discussing the relevant grammatical markers in English and other types of languages. Section 24.3 explores children's early use of generics and the use of generics in the speech of parents, and examines the evidence on how much children know about the meaning and form of generic expressions. Section 24.4 considers the nature of the learnability problem, and discusses some ideas that have been proposed to explain learning in this complex domain. Section 24.5 presents some conclusions.

24.2  What is a Generic Expression?

24.2.1 Basic Definitions

Generic knowledge is about kinds of things rather than about specific objects. To verify the truth of a specific statement such as (2), all a speaker needs to do is inspect one individual animal in a given context. The more general statement in (3), in contrast, is not context-bound. To evaluate it, speakers need to consider the definition of the kind or species as a whole, as shown by (4):

(2) The animal in our yard is not a dog.
(3) Raccoons are not related to dogs.
(4) This chicken may have no feathers, but chickens do.

Generics are of various types. A linguistic expression is generic if it either makes direct reference to kinds or species, as in (5), or to regular properties, events, or facts, predicated of individuals or kinds, as in (6) (Krifka et al. 1995). The examples in (5) describe the species or kinds (potatoes, whales, and raccoons) as a whole. This type of generic expression is known as a kind noun phrase (kind-NP). These cases involve a special type of predicate, kind-selecting predicates. Characterizing sentences, as in (6), constitute a related but different type of generic expression. These involve object-level predications that apply to individuals. They are considered generic because they denote regular properties or generalizations that a speaker can make about kinds, about typical individuals or 'instances of a kind,' or about specific individuals. Here genericity is a property of the sentence, rather than of a specific NP.

(5) a.  The potato was first cultivated in South America.
    b.  Blue whales are in danger of extinction.
    c.  Raccoons are not related to dogs.
    (Kind reference)

(6) a.  My dog will eat anything. (Generalizations about a specific individual)
    b.  A potato will grow in many ecosystems. (Generalizations about instances of a kind)
    c.  The potato is highly digestible. (Generalizations about the kind as a whole)

24.2.2 The Meaning of Generic Expressions

Are generalizations different from quantifiable truths? At first glance, generic expressions appear nearly identical to the associated quantificational sentences, as shown in (7). However, closer examination reveals important differences between the two types of expressions. An important difference pertains to their behavior regarding the availability and prevalence of members of the species to which the property does not apply. If we consider the cases in (7), we see that their truth values differ in the presence of counterexamples:

(7) a.  All raccoons are gray-brown.  (Universal quantifier)
    b.  Raccoons are gray-brown.  (Generic sentence)
    c.  Most raccoons are gray-brown.  (Proportional quantifier)
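The differing exception-tolerance of (7a) and (7c) can be illustrated with a toy population. The generic in (7b) has no agreed-upon truth-conditional translation, which is the chapter's point, so the sketch below models only the two quantifiers, under an assumed proportion of exceptions.

```python
def all_q(domain, prop):
    """Universal quantifier: true only if the property holds without exception."""
    return all(prop(x) for x in domain)

def most_q(domain, prop):
    """Proportional quantifier: true if the property holds for more than half."""
    return sum(1 for x in domain if prop(x)) > len(domain) / 2

# A toy population: 99 gray-brown raccoons and one albino (proportions assumed).
raccoons = ["gray-brown"] * 99 + ["albino"]
gray_brown = lambda r: r == "gray-brown"

assert not all_q(raccoons, gray_brown)  # (7a): a single albino falsifies the universal
assert most_q(raccoons, gray_brown)     # (7c): the proportional claim survives
```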

The universal quantifier has the strongest meaning, so that finding a single albino raccoon renders (7a) false. In contrast, the generic in (7b) remains valid. Generics appear to be somewhat immune to counterexamples, possessing a flavor similar to sentences with proportional quantifiers such as (7c). Prevalence judgments of this latter type are true as long as the property holds for the specified proportion of the individuals in the species. This is not always true of generics. Some generics remain true even though the property does not hold for the majority of the individuals. The same speakers who believe (8) about tortoises will agree that (9) is true.

(8) Young tortoises, as soon as they are hatched, fall prey in great numbers to a variety of hazards: succumbing to predators, falling into cracks, being crushed by falling rocks, or excessive heat stress.
(9) Tortoises live a long life.

This example shows that the truth conditions of generic sentences are not the same as those of quantified prevalence sentences of the form most x, since it is true that tortoises live a long life, but most tortoises do not. By uttering (9) we express our belief that, given a chance, typical tortoises live long lives. Getting snatched by a predator is a circumstantial event; the potential for longevity is a general characteristic we believe to be true of tortoises.

The question is whether generic statements involve principled or statistical connections between properties and kinds (Prasada and Dillingham 2006). On this matter, the field is split. For some authors, the special property of generic knowledge is that it is normative, rather than statistical (Prasada 2000). Under this view, generic knowledge is knowledge of what properties ought to hold of a member of a kind, given normal conditions, and thus information represented generically has a different status. For instance, it is known that generics elicit different types of causal inferences than statistical prevalence judgments do. Children as young as 4 years of age invoke different types of causes for facts learned from generic and non-generic statements (Cimpian and Markman 2009). They use function to explain generically attributed properties (for instance, "snakes have holes in their teeth to chew better"), but explain individually or quantificationally attributed properties by means of accidental causes ("some snakes have holes in their teeth because of bug bites").

Under the opposing view, generics are generalizations or prevalence judgments that apply uniformly but not universally across the different types of members of a kind (Leslie 2008). The question is whether the conditions that make a generic statement appropriate vary across different conceptual domains and property types. We can say, for example, that having four legs in (10a) is an intrinsic property of dogs; that's just how dogs are. But being red is not an intrinsic property of barns, so we can say that (10b) simply reflects the prevalence of red barns.

(10) a.  Dogs have four legs.
     b.  Barns are red.

Many generics belong to these two broad types, but others are more complex. Consider the contrast in (11).
(11) a.  Mosquitoes carry the West Nile virus. (True)
     b.  Mosquitoes do not carry the West Nile virus. (False)

In these generic sentences the predicate does not reflect a principled characterization of the kind. Instead, such sentences seem to appeal to comparisons between kinds (Leslie 2008). Being a carrier of the West Nile virus is neither an intrinsic nor a prevalent property of mosquitoes in (11a), as it is estimated that 99 percent of mosquitoes are not carriers. Nevertheless, negating the proposition, as in (11b), yields a statement that speakers judge false.

According to Leslie (2008), the truth of (11a) depends not on the prevalence or necessity of the property as applied to members of the kind, but on an implicit comparison between the species of animals that are potential carriers of the virus and those that are not. The structure of the conceptual domain gives rise to contrasts in the acceptability of sentences about prevalent but non-essential properties. Generic sentences show interesting contrasts with parallel quantifier sentences. Speakers tend to reject (12), but accept (13). These cases show that generics do not allow the exceptions to constitute a salient subset of the domain. This requirement is absent from overtly quantified statements such as (13).

(12) ?Primary school teachers are female.

(13) a. Most primary school teachers are female.
     b. Primary school teachers are generally female.

According to Cohen (2004), what matters is not direct quantitative prevalence (how many mosquitoes actually carry the West Nile virus, or how many teachers are actually female), but whether the property applies (or can in principle apply) homogeneously across the salient partitions of the kind. The most important partition for humankind is gender itself, hence the oddity of (12). A recent experiment by Brandone and colleagues shows that children evaluate generic statements differently depending on the type of property attributed. Children, like adults, are likely to accept questions such as "Do lions have manes?," but are confused by others such as "Are lions males?" (Brandone et al. in press). These authors interpret the results to show that children attend to more than the proportion of category members for which a property holds true. In Cohen's terms, this experiment establishes that children are sensitive to the internal conceptual structure of domains, and to the requirement that generic propositions apply across the salient partitions of a given domain. Generic properties are part of our intuitive theories of the categories described, and represent default rules for how those categories behave. Generic knowledge is tied to the structure of a conceptual domain, and we treat various domains differently. We are more likely to agree with properties that emerge as part of the life cycle of a given kind, independent of statistical prevalence (Cimpian et al. 2010b), and to produce more generic statements about animals than about artifacts, even in the absence of prior experience with the kinds presented (Brandone and Gelman 2009).
This is also true of generic sentences uttered by children (Gelman 2003), despite the fact that their conceptual models of biological kinds are not as developed as those of adults (Gelman and Bloom 2007).

24.2.3 The Grammatical Form of Generic Expressions

Grammatically, generic NPs in a language like English appear in various forms, most commonly definite singulars, indefinite singulars, bare plurals, and mass nouns, as in (14):

(14) a. The potato was first cultivated in South America. (Definite singular)
     b. A potato will grow in many ecosystems. (Indefinite singular)
     c. Potatoes are highly digestible. (Bare plural)
     d. Milk is good for you. (Bare singular mass noun)

There is an intuition that indefinite singular NPs differ from the other three types. Speakers judge them inappropriate in kind-selecting contexts, except under a taxonomic reading (Carlson 1977). This suggests that indefinite singulars only function as generic expressions by introducing typical individuals as instances of a kind. The only interpretation available for (15) is derived in this way, interpreting subspecies within the kind as instances.

(15) A bird is dying out. → 'a type of bird'

Most of our understanding of generic expressions, and of children's acquisition of genericity, comes from studying English. But to apprehend the full dimensions of the acquisition problem, one must consider the range of cross-linguistic variation in the syntax of NPs and its consequences for the expression of genericity. Languages possess a variety of grammatical devices associated with the expression of generic meanings. They also draw on a variety of semantic and pragmatic cues that can contribute to the interpretation of a given sentence as generic rather than episodic (Behrens 2005). The whole range of grammatical features involved, and how they interact, has yet to be fully mapped out. However, some clear generalizations and parameters of variation emerge from the study of typological variation. The first is that no language has been identified as possessing a determiner specifically devoted to generic reference. Therefore, generic meanings always overlap with other senses in given syntactic contexts, which vary across languages (Dayal 2004, 2009). The availability of generic meanings for a given form thus depends on the overall inventory of forms available in a given language. The crucial factors seem to be the availability of determiners and grammatical number, and the extent and distribution of bare nouns (Chierchia 1998). These features interact with other associated markers such as case, topic, and tense and aspect markers. Generic reference is accommodated by bare forms to the extent that these are available in a language (see discussion in Gavarró et al. 2006). In languages without determiners, bare nouns can be interpreted as definites (i.e., linking to uniquely identified, previously introduced referents), as existentials, and as generics. In these bare noun languages, the availability of generic readings may depend on case, as in Finnish, or on the absence of classifier marking, as in Korean and Tagalog.
Discourse-wise, generics are often topics, so in these languages generic expressions frequently appear with topic markers, although they can also surface as bare forms (Behrens 2005). In languages without determiners but with number marking, both singular and plural NPs have generic readings (Dayal 2004). These bare plurals and singulars can appear with predicates that select kinds, such as die out and evolve, as well as with object-level predicates.

(16) a. kutta aam janvar hai (Hindi)
        dog common animal is
        'The dog is a common animal.'
     b. kutte yehaaN aam haiN
        dogs here common are
        'Dogs are common here.'

Languages that possess both determiners and number marking generally express singular kinds with definite determiners, but vary in how plural generics work. Depending on the language, plural generics pattern with indefinites or with definite NPs (Dayal 2009). As shown in (17), in English plural generics share the form of indefinite plural reference. In contrast, in the Romance languages, plural generics share a determiner with plural definites, as illustrated for Spanish in (18a-d) (Dobrovie-Sorin and Laca 1998). In these languages, existential interpretations are expressed by means of indefinite articles, as in Spanish (18d), or partitives, as in French (18e).

(17) a. The dog is a carnivore. (Singular generic)
     b. Dogs bark. (Plural generic)
     c. Dogs are barking outside. (Plural indefinite)
     d. #The dogs bark. (Generic interpretation not available)

(18) a. *Perros ladran. (Bare plural subjects not possible)
        dogs bark
     b. Los perros ladran. (Definite plural generic)
        the-plural dogs bark
     c. Los perros están ladrando. (Definite plural specific)
        the-plural dogs are barking
     d. Unos perros están ladrando. (Spanish)
        Indef-plural dogs are barking
     e. Des chiens sont en train d'aboyer. (French)
        Partitive-plural dogs are barking

This variation can be understood in terms of the mapping between syntax and semantics. Chierchia (1998), for instance, notes that definite plural generics are available in languages with a restricted distribution for bare nouns. To account for the typological variation, he proposes to refine the canonical mapping for the nominal domain, in which NPs are exclusively given a property or predicate interpretation and DPs are interpreted as entities. His point of departure is the observation that nominal expressions play an ambiguous role. In their role as quantifier restrictors, they function as property-denoting predicates. In their role as names of kinds, however, they are able to refer directly to entities, functioning as semantic arguments. Chierchia proposed to parameterize the mapping of NPs, arguing that languages vary in whether these are mapped directly as semantic arguments (type e) or as semantic predicates (type ⟨e,t⟩). He proposed a nominal mapping parameter to differentiate three language types. First, languages without determiners, such as Chinese, which map nominals directly as entities, have bare nouns that can be interpreted referentially, existentially, or generically. Second, in languages with generalized use of determiners (e.g. Romance), a DP layer is required for a nominal to serve as a semantic argument; since bare nouns cannot appear in subject position, definite plurals allow generic interpretations. Mixed languages such as English allow bare NPs to map directly as entities (mass nouns) or as semantic predicates that require a D layer to become referential (count nouns). Bare plurals can be interpreted existentially or generically, depending on the operator inserted. Chierchia's nominal mapping parameter predicts earlier acquisition of determiners in predicate-type languages than in mixed, English-type languages. This acquisition prediction appears to hold for the several languages studied (Guasti et al. 2008), and also to determine the direction of bilingual influence in the acquisition of determiners (Kupisch 2006). Other semanticists describe variation across these languages in terms of parameters of lexicalization of type-shifting operations, the procedures that change the semantic type of an expression to fit a given semantic context. For Dayal (2009), language differences depend on the cut-off points for the lexicalization of type-shifters on a universal scale of specificity.
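Chierchia's proposal is often stated in terms of two type-shifting operators; the following is a simplified sketch, with notation following common presentations of Chierchia (1998), where s ranges over situations/worlds and ι is the definiteness operator:

```latex
% "Down" (nominalization): maps a property P to the corresponding kind,
% i.e. the function picking out the maximal plurality of P-instances
% in each situation s (undefined where no such plurality exists):
{}^{\cap}P \;=\; \lambda s\,.\,\iota x\,[P_{s}(x)]

% "Up" (predicativization): maps a kind k back to the property of being
% a part (an instance) of that kind in s:
{}^{\cup}k \;=\; \lambda x\,[x \leq k_{s}]
```

On this view, bare arguments in Chinese-type languages denote kinds directly; English bare plurals reach kind or existential readings through covert application of these shifts; and in Romance, covert shifting is blocked where an overt determiner lexicalizes the corresponding operation.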

24.2.4 The Interaction between Syntax and Semantics in Generic Expressions

Generic sentences are analyzed as possessing a logical form (LF) similar to that of quantifier sentences. Their meaning is represented by a tripartite quantificational structure consisting of an operator (in this case Gen), a restrictor, and the nuclear scope where the operator applies. The syntax-to-LF mapping is straightforward in the case of sentences with overt quantifiers in subject position, as in (19).

(19) a. Tripartite structure: [Sentence [Quantifier] [Restrictor] [Scope]]
     b. Syntax: [S [NP [Quant Every] [N tiger]] [VP has stripes]]
                      [quantifier]  [restrictor]     [scope]

The representations in (19) indicate a direct correspondence between constituents and their interpretations. Deriving the logical form of sentences with adverbs of quantification or quantifiers in other positions requires additional structural assumptions, but still constitutes a fairly direct implementation of this approach. For generics, an implicit operator Gen, semantically close to the overt adverb generally, is assumed to bind any variable that remains free in the sentence (Carlson 1977).

A generic sentence such as Tigers are striped would have a logical form representation as in (20):

(20) Gen x [[x is a tiger] [x is striped]]

Gen works as a default operator, applying only when explicit quantifiers are absent. It applies unselectively to variables that are not overtly bound, in a manner similar to the existential closure proposed by Heim (1982) for indefinites. Diesing (1992) analyzes indefinite subjects by assuming that bare plurals are generic when introduced into the restrictive clause and bound by Gen, but are interpreted existentially when they remain in the nuclear scope and become bound by the existential operator (∃).

(21) a. Firemen are intelligent. (generic)
     b. Firemen arrived at the scene. (existential)

This proposed mapping from syntax to generic interpretation is relevant to the contrast between bare plurals with stage-level and individual-level predicates. With stage-level predicates, bare plurals are ambiguous between a generic and an existential interpretation, as in (22). With an individual-level predicate, only the generic reading is present, as in (23).

(22) Firemen are available (i.e., 'in general,' or 'right now at the station')
     Gen x [[x is a fireman] [x is available]]
     [∃x: x is a fireman & x is available]

(23) Firemen are intelligent (i.e., only in general)
     Gen x [[x is a fireman] [x is intelligent]]
     NOT [∃x: x is a fireman & x is intelligent]

Diesing proposes that these contrasts originate in the availability of two subject positions, one VP-internal and one VP-external. The VP domain maps onto the nuclear scope and is the domain of application of existential closure; the higher (IP) domain of the clause maps onto the restrictor clause, where Gen applies. Subjects of stage-level predicates can occupy either position, but subjects of individual-level predicates remain in IP and must be interpreted generically.
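Diesing's proposal can be summarized schematically; the bracketed structures below are an illustrative sketch (trace notation and category labels are simplified):

```latex
% Mapping Hypothesis (Diesing 1992):
%   material inside VP maps to the nuclear scope (existential closure);
%   material in the higher IP domain maps to the restrictor (where Gen applies).

% Subject interpreted in the IP domain -> restrictor -> generic reading:
[\,_{IP}\ \textit{firemen}_{i}\ [\,_{VP}\ t_{i}\ \textit{available}\,]\,]
\;\Rightarrow\; \mathrm{Gen}\,x\,[\mathrm{fireman}(x)]\,[\mathrm{available}(x)]

% Subject interpreted VP-internally -> nuclear scope -> existential reading:
[\,_{IP}\ [\,_{VP}\ \textit{firemen available}\,]\,]
\;\Rightarrow\; \exists x\,[\mathrm{fireman}(x) \wedge \mathrm{available}(x)]
```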

24.3 Children's Acquisition of Generic Expressions

24.3.1 The Problem of Learning Generics

What do children need to learn to master generics? Generic expressions present two separate learning problems. The first is a problem of conceptual development. How do children acquire generic knowledge, given that direct experience with the world is, by necessity, always of the instances and never of the kind? For any observation of a specific object, children must decide whether the characteristics of that object are representative of the kind or just of the individual. If a child sees a black dog with a tail, the child should generalize that dogs have tails, but avoid overgeneralizing that dogs are black. By necessity, generalizations are learned on the basis of a limited number of exemplars. Furthermore, the same learning mechanism must induce generalizations even in cases where the property is not prevalent within the kind, as previously discussed. One key question pertaining to this problem is what role language plays in supporting children's identification of generic properties.

The second learning problem involving generics is strictly a language acquisition problem. Given a generic sentence, how does a child know to interpret it generically? As discussed in section 24.2, there is substantial variation across languages, yet there are solid typological trends. Therefore, children must learn several things about the grammatical inventory of their target language to fully map out this domain. How difficult is it to solve this language acquisition problem? For each domain of grammar, the answer depends on two dimensions. The first is the complexity of the domain, which includes the complexity of the structure itself and the robustness of the form–meaning mappings. As we have already seen, generics have complex semantics and complex mapping relations to the functional system. One possibility is that genericity functions as a default form of reference, freely available to children. This would simplify the semantic problem but leave the problem of typological learning intact. The second dimension pertains to the availability of the relevant experience in the child's input: how frequent and transparent are the models available to children in the language input?

24.3.2 The Mapping Problem

How does a child learn which grammatical components of a sentence will restrict or allow the association with generic reference? Given the considerable problems of polysemy and ambiguity, the problem of mapping grammatical forms to generic meanings is not trivial. Taking into consideration the range of variation in the form of generic expressions, this is clearly a many-to-many mapping problem, and the target forms vary across languages. For now we concentrate on the best-studied language, English. In this type of language, the main grammatical ingredients relevant to generic interpretations are determiners and number.

(24) Meaning-to-form mappings
     Generic → Definite singular [the potato]
             → Indefinite singular [a potato]
             → Bare plural [potatoes]

(25) Form-to-meaning mappings
     Definite singulars:
       Anaphoric [The potato is in the microwave.]
       Generic [The potato is a common staple in the Andes.]
     Indefinite singulars:
       Non-specific indefinite [Everyone should have a cookie.]
       Specific indefinite [A boy is sitting next to her.]
       Generic [A cookie has sugar, eggs, and flour.]
     Bare plurals:
       Non-specific indefinite [I want cookies.]
       Specific indefinite [Boys are sitting next to her.]
       Generic [Cookies have lots of calories.]
These mappings are multilayered: several forms to a meaning, several meanings to a form. For generics, determiners and grammatical number interact. Singular definites can be generics but plural definites cannot.

24.3.3 The Input Problem

Several studies have investigated the availability of generic expressions in the input. In the speech parents address to children, generics appear consistently, although they are less frequent than other expressions of nominal reference (Gelman 2003). Parents' use of generics is particularly frequent in story-telling settings (Baby birds eat worms), compared with other activities such as play with toys or unstructured at-home interactions. Parents also produce more generic expressions referring to animals than to artifacts or other things. In a study eliciting generics with a picture-book task, Pappas and Gelman (1998) found that the majority of mothers (92 percent) used at least one generic expression, and that on average 11 percent of the target utterances produced were generic. Their goal was to explore the de-contextualized property of generics by examining parents' occasional mismatches between the number of tokens in the context and the grammatical number used when referring to the kind. They hypothesized that number marking in generic expressions would be more context-independent than number marking in referential expressions. When discussing a picture containing a single example, parents used almost exclusively singular NPs in non-generic sentences, but more plural NPs when making generic statements. A picture of a single squirrel prompted singular direct descriptions of the situation (The squirrel is on a tree), but plural generic characterizations (They eat acorns). Last, one study examined the distribution of generics across different forms and syntactic positions in the input: Sneed (2008) found that generics in parental speech tend to appear mainly in subject position, as bare plurals or indefinites, and are attested infrequently as objects of psychological verbs (love, fear, hate, etc.).

In sum, in a language like English, generics are used frequently enough to provide robust models, and they tend to be associated with specific forms and syntactic positions. They are more frequent with reference to animate kinds than to artifactual or other conceptual domains. Formally, they appear primarily in subject position (Sneed 2008). Last, there are interactions between context and generic expression (such as number mismatches) that can signal to the child that an utterance is generic. What does the input look like in other languages? Is it more difficult to acquire generics in languages with less explicit cues? Some languages do not have the same grammatical cues as are available in English. Mandarin, for instance, lacks three of the grammatical markers that help determine whether a sentence has generic reference: determiners, number, and tense. In this type of language, there is potentially more ambiguity as to whether an expression is generic or not. Gelman and Tardif (1998) studied the extent to which generic expressions could be identified in Chinese, and their frequency relative to English. In a study of English and Mandarin child-directed speech, they recorded parent–child pairs in a variety of activities, including a specifically designed story-telling task. Although generic expressions appeared in parental speech in both languages, they were more frequent in English than in Mandarin. These authors speculate that formal properties of the language may prompt speakers to notice and use generics relatively more when genericity is conveyed by means of obligatory grammatical cues. They leave open the possibility that the structure of a language may have an effect on the frequency of a speaker's use of generic expressions, and even on how frequently speakers consider abstract kinds.

This latter point of interpretation pertains to the hypothesis that the grammatical form of a language interacts with the conceptual representations most available to speakers: the linguistic relativity hypothesis, also known as the Sapir–Whorf hypothesis. While Gelman and Tardif do not endorse the Sapir–Whorf hypothesis, they note that others have raised it before for generics in Chinese, and that their own data reveal frequency differences. Their case study illustrates a potential methodological problem with this type of research, which arises from comparing forms directly while abstracting away from the fact that their relative distribution is determined by the overall structure of the language. In the specific case of measuring generic expressions in this study, the availability of null arguments in Mandarin was not factored into the measurement. Given that generics are often sentence topics, and that null topics are possible, even preferred, in these types of languages, the observed differences in the frequencies of (overt) generic NPs across the two languages should not come as a surprise: Chinese generics may frequently remain unexpressed. Is there really a difference between the frequencies of generic reference in a Chinese-type language and an English-type language? If so, would it imply that Chinese speakers should be less willing to interpret overt bare NPs as generic? The evidence in two recent studies indicates that the answer to both questions is negative. One is a comprehension study by Munn and colleagues that will be discussed in section 24.3.5. The other is a study testing the limits of the poverty of the stimulus, examining generic use in a population of children raised outside a language community.

Goldin-Meadow et al. (2005) observed deaf children whose hearing losses had prevented them from learning the speech of their hearing parents, and whose families were unable to provide them with a language community accessible to them in the visual modality, that is, a sign language. Without early exposure to a sign language, these children develop autochthonous gesture systems that have language-like structure at many different levels (home signing), while lacking full access to language experience. The study compared preschool hearing and deaf children in Taiwan and the US, to test whether the lack of access to language input would have an impact on the deaf children's use of generic reference. This study is important in two ways. First, the data from the hearing controls address the typological question of differences in rates of generic use across languages: they show no difference between Taiwanese and American children's rates of generic propositions. This suggests that the initial observations regarding the low frequency of generic propositions in Chinese were incorrect. The second, more important contribution of this study is that it extends the question of input to a special population that, by virtue of external, community-level factors, is growing up under atypical input conditions. In this regard, the results from the deaf children are striking. The researchers analyzed these children's home signs to identify generic statements, aiming to compare the prevalence and patterns of use of generic expressions in their systems to those in the speech of English- and Taiwanese-speaking children. The deaf children in both communities produced clear examples of generics in the gestural systems they had invented. In one instance, when responding to a toy fish, one child made the sign for "fish hurt," to indicate that fish hurt when they bite. In another instance, a child pointed from a gorilla to a toy cage to indicate that "gorillas go in cages." Clearly, these are generic, de-contextualized references. These deaf children produced generics at about the same rates as their hearing age-mates, and their productions also exhibited the same animacy bias found in other studies. The implications are clear: young children are not dependent on a language model to produce generic utterances, and their conceptual biases reflect the same general universal tendencies. This suggests that generic knowledge emerges spontaneously from experience, and that this is possible in the absence of exposure to generic language.

24.3.4 What Children Know about Generic Expressions: Evidence from Production

Children, in typical circumstances, are exposed to generic language, and they learn it. How fast can they learn generics? Studies on children's production suggest it is very fast: generic use is clearly documented by age 2 (Gelman 2003).

(26) Dogs go ruff-ruff and them have long tails.
     Milk comes from a cow.

Generics then appear in children's speech by the time their utterances have enough functional structure for researchers to be able to distinguish generic from non-generic NPs. At age 2, 4.5 percent of animate NPs are identifiable as generic expressions, and this doubles by age 3 and nearly triples by age 4, becoming as prevalent as in adult speech. Other features of early child generic use suggest there is continuity between the generic language of parents and their children. Like adults, children produce more generic NPs referring to animal kinds than to other kinds. Likewise, they express generics with number marking that is independent from the numerosity of the elements in the context, whereas their non-generic expressions follow the context in terms of number. Finally, young children also use a variety of NP forms from the outset to express generic reference (Gelman 2003).

24.3.5 What Children Know about Generic Expressions: Evidence from Comprehension

Does evidence of use equal evidence of mastery? Most of the research on this topic focuses not on the acquisition of the grammar of generic expressions, but on how language drives the acquisition of generic knowledge. For the latter purpose, psychologists have conducted a variety of comprehension experiments on how different grammatical cues may support children's recognition of generic reference. Gelman and Raman (2003) considered the potential role of morphosyntactic and pragmatic cues in children's access to generic interpretations. In one experiment, they tested whether children were sensitive to the contrast between definite and bare plurals. Children were shown pictures of two atypical entities (for instance, two penguins, birds that are atypical because they don't fly), and were asked questions about them (Do birds fly? versus Do the birds fly?). All the groups of children distinguished bare from definite NPs in their responses, giving generic answers to bare plurals ("Birds fly") but dissociating the generic property from definite plurals ("The birds don't fly"). The materials in this study included artifact kinds (such as telephones and hats) and non-essential properties (Are buttons big or small?), so even adults reached only 76 percent generic responses to bare plurals. Although this experiment had a non-generic bias, it showed that even 2-year-olds can understand that the question "Do birds fly?" is not about the birds in the immediate context, but about the abstract conceptual kind, which is not present. A similar experiment focusing on definite interpretations by Pérez-Leroux et al. (2004) also tested preschoolers' ability to distinguish definite plurals from bare plurals. These authors were interested instead in a linguistic question: whether children learning a language like English would treat definite plurals as generic, as is the case in the Romance (Chierchia Type II) languages. That study presented atypical entities after a story about how unusual they are, and followed up with questions of the given form (Are zebras/the zebras spotted?). Children performed at ceiling with bare nouns, but showed high rates of definite generic errors, particularly when the question was presented after a short delay. The results of these two studies converge on the conclusion that English-speaking children can use the grammatical contrast between definite and bare plurals to obtain a generic interpretation. However, they differ as to where the performance gaps relative to adults were found: in the specific definites or in the generic bare plurals. The difference in results suggests that the contextual presentation and the content of the properties play a role in determining the interpretation of these NP contrasts. Other studies, such as Chambers et al. (2008) and Cimpian and Markman (2008), have instead tested children's discrimination of sentences with bare plurals and demonstratives. Although there is no typological divide between languages with demonstrative versus bare plural generics, demonstratives can, under rather special circumstances, be used generically. Demonstrative NPs are generally linked to the situational context, as in (27a), but in (27b) the demonstrative is used to pick out a subtype of a kind while pointing at some exemplars.

(27) a. These cats are hungry, but that one is not. (Deictic reference)
     b. Those cars are expensive. (Subtype of the kind car)

However, the overwhelming majority of the uses of demonstratives are clearly referential and contextually-​bound. Using demonstrative plurals as control, Chambers et al. (2008) confirmed sensitivity to syntactic form in generic interpretations. Four-​year-​ olds were tested for generic comprehension by means of a story describing a novel kind with either a bare plural generic or a (non-​generic) plural demonstrative. After the novel animals were introduced (as in either (28a) or (28b)), the children were asked if the property would extend to a new exemplar (28c): (28) a.  These are pagons. Pagons are friendly. b.  These are pagons. These pagons are friendly. c.  Is this (new) pagon friendly? Positive verifications nearly doubled when the property had appeared with a bare plural as opposed to a demonstrative. Children’s generalizations were not affected by either the amount of positive evidence (number of friendly pagons), or by negative evidence (an unfriendly pagon). Instead, children’s willingness to generalize was determined by linguistic cues. This was also found for possessives with novel nouns (Do dobles/​ my dobles have claws) by Gelman and Bloom (2007), or for demonstratives with actual kinds (Cimpian and Markman 2008). The general conclusion is that children acquiring English have no difficulty in using determiners to block generic interpretation. The distinction in interpretation is already in place by the age of 2 (Graham et al. 2011) What about other language types and other types of cues? To date, two studies have tested generic expressions in other languages:  Pérez-​Leroux et  al.’s (2004) comparative study of definite plurals, and Munn et al.’s (2009) test of generic interpretation in Mandarin bare nouns. Pérez-​Leroux and colleagues tested whether definite determiners have a default generic interpretation, contrasting a language where these are ambiguous

580   Ana T. Pérez-Leroux

(Spanish) with one that disallows definite plural generics (English). The English children in their study, although clearly discriminating between bare plurals and definite plural NPs, frequently allowed a generic interpretation of the definite plural in English. In Spanish, where the definite is ambiguous, children and adults had high rates of generic responses, accepting typical properties (zebras and stripes) and rejecting atypical properties (zebras and spots). Children did not go through a stage where the generic interpretation was unavailable. As in other studies, children limited the generic interpretations when presented with control demonstrative NPs.

(29) ¿Las cebras/Esas cebras tienen manchas?
     'Do the zebras/those zebras have spots?'

A similar comprehension study by Munn et al. (2009) shows that Mandarin-speaking children, like their English and Spanish peers, have no difficulty with generic interpretations. These authors presented preschool and school-aged children with scenarios with atypical members of a kind (boys with wheels instead of legs), and a question probing generic interpretation (Do boys have legs?), contrasting bare nouns with nouns marked with the pluralizer -men. This marker has a distribution akin to classifiers, and restricts the NP to a definite interpretation. Chinese adults provided more generic responses to bare nouns than to marked NPs, showing sensitivity to grammatical form, but they clearly preferred the non-generic interpretation. Interestingly, younger children were not sensitive to grammatical form, and clearly preferred the generic reading in both cases. With age, the generic bias decreases abruptly, and the school-aged children provided almost no generic readings even when these were grammatically possible. The authors note that these older children had also exhibited a strong bias for linking bare NPs to discourse in specific contexts, preferring definite over indefinite interpretations of bare nouns.
They conclude that while generic readings are available to and preferred by younger children, older children become highly sensitive to discourse context, choosing interpretations that link to it, which results in a reduction of generic responses.

Lastly, two studies have considered the role of tense and aspect in restricting generic interpretations (Pérez-Leroux et al. 2004; Cimpian et al. in press). The generic interpretation of a characterizing sentence depends on the presence of a tense/aspect marker that can yield a habitual or characterizing interpretation. In English a generic reading is possible with the generic present tense, but is blocked by the present progressive. Although past tense itself is not incompatible with genericity, it often prevents a generic reading. When past events have affected the whole kind, it is appropriate to refer to the kind in the past (as in (5a)). But putting a property of current applicability such as (30) in the past signals that the NP is to be interpreted referentially. Otherwise, if interpreted generically, the sentence has the false implication that tigers either have evolved and no longer eat meat, or are now extinct.

(30) ¿Los tigres comían carne?
     'Were the tigers eating meat?'

Pérez-Leroux et al. (2004) presented Spanish and English children with past tense versions of the same stories about atypical kinds, followed up by present and past definite plural sentences. For Spanish-speaking children, the generic interpretations of the ambiguous definite plurals were significantly reduced when the question was in the past tense.

Cimpian et al. (in press) also tested tense. Their study focused on whether children could interpret indefinite singular NPs as generic, despite the fact that indefinite singulars do not appear as frequently as generics in the input. A picture of an atypical yellow strawberry was followed by a prompt such as (31). Children had no problem differentiating between the referent-linked definite singular and the generic indefinite singular.

(31) What color is a/the strawberry?

In a subsequent experiment children were given present, present progressive, and past statements about an indefinite subject.

(32) A bat sleeps/is sleeping/slept upside down.

Children were aware that tense is a constraining factor for generic reference, becoming less willing to attribute the property to 'a whole lot of bats' when they heard the introductory sentence in the present progressive or the simple past.

In sum, the comprehension evidence suggests that generic meanings are always available to children. Learning seems to consist of acquiring the relevant grammatical constraints. Children show early sensitivity to most grammatical markers that constrain or license generic interpretation. However, the studies also reveal a certain degree of development, with sensitivity to the interpretive restrictions of some grammatical markers increasing over time.

24.3.6 The Role of Pragmatics

Some of the studies in the literature have considered the extra-grammatical cues that speakers use to determine whether a given sentence can be interpreted generically. Gelman and Raman's (2003) study included an experiment testing whether number mismatch was a relevant cue for children. Pictures of an atypical object (e.g. one small elephant) were followed by a question that either matched or mismatched it in number (Is it small? versus Are they small?). Even the youngest children were able to distinguish between the matched and the mismatched condition, giving very few generic responses in the match condition. To follow up on whether children were responding to plurality or to number mismatch, these authors subsequently contrasted a mismatched plural target with a matched plural control. Two-year-olds were not able to rely on mismatch alone as a cue. They conclude that 2-year-olds can reliably use morphosyntactic but not contextual cues to distinguish generics.

Other studies have examined the role of speakers in children's willingness to accept properties as generic and to extend them to other entities. Stock et al. (2009) tested whether children can integrate grammatical sensitivity to generics with social cues. A non-obvious property of a novel kind was presented with either a bare plural or a demonstrative singular. The speaker was self-identified as uncertain, confident, or neutral about the information relayed.

(32) I think/I know this wug can see in the dark.
     This wug can see in the dark.

Children were subsequently asked if a second exemplar shared the relevant property. When the NP in the property description was a bare plural generic, children moderated their extensions according to speaker confidence. However, confidence had no effect on children's responses following non-generic (demonstrative) descriptions. Children, then, do not put blind faith in generic statements, but use speaker certainty when deciding whether to accept a generalization. Similarly, Cimpian and Markman (2008) tested whether children's interpretation of a sentence as generic depended on the identity of the speaker. As hypothesized, children interpreted sentences uttered by a teacher character (whose primary role is to provide generic knowledge) as generic more often than the same sentences uttered by a veterinarian (whose primary role is to make judgments about the health of specific animals).

24.4  Learnability Considerations

24.4.1 Addressing the Learnability Problem: The Semantics

The complexity of defining the semantics of the generic operator has led authors such as Leslie (2008) to suggest that, to acquire the meaning of generic sentences, the child cannot rely solely on direct experience of the association between sentences and their contexts of use. Instead, children must be able to use some core knowledge of grammar, such as the role of operators and sentence structure, to arrive at the right results. The generic operator Gen is a default operator that arises to bind any variables that remain free in a sentence's restrictor once all the articulated quantifiers in the sentence have bound their variables. How could a language learner acquire the meaning of Gen? At first glance, overtly realized quantifiers such as every and most might seem easier to acquire, because they have a phonological form, their meanings are more straightforward, and their denotations have a clear mapping in statistical terms. In contrast, the generic operator must be inferred from its very absence. Children are exposed to a complex set of experiences that may (but do not necessarily) support a generalization, and then hear sentences that (a) contain no overt quantifiers, and (b) have an interpretation that is not contextually bound.

Despite the complexity of the association between generic meanings and states of affairs in the world, children do not seem to experience difficulties. Generic meanings are available to them, and they use them early. This suggests that children do not learn the set of forms that map onto the complex set of truth conditions underlying generics in the same way they learn to map overt quantifiers. Instead, Leslie (2008) argues, it seems simpler to propose that generics are the linguistic counterpart of children's general tendency towards inductive learning. Children's fundamental ability to generalize beyond particular experience is manifested early and robustly in infancy. Linguistic labels for objects aid infants in their generalizations. Yet, in the absence of labels, infants can still generalize to similar objects (Graham et al. 2004). This ability goes beyond language, but can be easily represented in a linguistic format.

Leslie (2008) suggests that the relation between the extensions of the restrictor and the scope in a generic sentence is computed by the mind's default ability to generalize from instances. Thus generalizations are represented as the logical form of a generic judgment. Consider a situation in which a child is taken to the farm, and observes a farmer milking a cow. He hears his mother say "That's where milk comes from." Such a child is likely to make the proper generalization to the kind cow. When providing a propositional format for this generalization, the child is in fact formulating a generic judgment, which can be represented in the familiar tripartite structure described in section 24.2.4.

(33) Cows give milk.
     Gen [[x is a cow][x gives milk]]

Children, then, do not need to learn how to process the semantics of generic expressions: it is part of their innate ability to generalize.
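The tripartite representation in (33) can be spelled out schematically as follows; this is a standard rendering of quantificational tripartite structures, not any one author's official notation:

```latex
% Tripartite structure: Operator [Restrictor] [Nuclear scope].
% Gen binds the variable x left free once overt quantifiers have
% bound their variables.
\mathrm{Gen}\; x\;
  [\,\underbrace{\mathrm{cow}(x)}_{\text{restrictor}}\,]\;
  [\,\underbrace{\text{gives-milk}(x)}_{\text{nuclear scope}}\,]
```

On this view, the child's inductive generalization from observed cows simply is the relation between restrictor and nuclear scope that Gen expresses.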
All they need to know is two things: first, that all free variables in logical form need to be bound for the representation to be interpreted, and second, whether a given combination of grammatical markers leaves open the possibility for a given variable to be bound by the generic operator. The implication is that the default operator Gen does not have to be learned; overt quantifiers do.

To this end, Leslie considers the evidence in Hollander et al. (2002) on children's interpretation of all, some, and bare plural generics. Children were presented with yes/no questions about categories with respect to properties that applied universally within a category (Are fires hot?), properties that applied to a narrow segment of the category (Do bears have white fur?), or properties that did not apply to the category (Do pencils have noses?). Three-year-olds did not differentiate between the three types of noun phrases, showing problems with all and some, while accepting generics at rates comparable to 4-year-olds and adults. One possible interpretation of these findings is that children are ignoring the quantifiers all and some, and are treating all three versions of the questions as generic. Note that children's difficulties were not lexical: these children had no problem understanding sentences with some and all when reference was evaluated against a concrete context (Are all/some/none of the crayons in a box?). Their difficulties were specific to the task of attributing general properties to kinds of things in the abstract. Here they seem to revert to the generic baseline. This fact favors the view that generic interpretations are not difficult for young children, but instead are "built-in" meanings to be relied

on during difficult comprehension tasks. Interestingly, for adults too, universally quantified statements can become swept up into a generic interpretation when the properties mentioned in the sentence are of the right kind. This generic overgeneralization effect is predicted by the hypothesis that generics express primitive, default generalizations.

24.4.2 Addressing the Learnability Problem: The Syntax

Can children use the available input to learn which NPs are generic in their language? The literature reviewed in this chapter focuses on the problem that in English, parents use various NP forms for their generic expressions, so that NP form underdetermines interpretation. Sneed (2008) counters that the input is uninformative only if children look solely at NP form without considering NP distribution. She integrates Diesing's syntactic analysis of generics with a theory of learning that attends to differences in the distributions and frequencies of markers. She proposes a learning algorithm that shifts the burden of acquiring genericity to the clausal architecture, by taking advantage of children's knowledge of the mappings between syntax and semantics. The assumption is that universal, innate knowledge of language provides children with the basic principles for organizing the input. Under her proposal, children possess an understanding of:

a. the universal set of possible determiner meanings;
b. the ambiguity of indefinites as existentials or generics;
c. the structural consequences of the individual-level/stage-level distinction; and
d. the interaction between structural position at LF and the application of existential closure.

Under Diesing's (1992) mapping hypothesis, the position of the subject at Logical Form determines its interpretation. This is implemented in the last step of Sneed's (2008) learning algorithm. Indefinite subjects are interpreted existentially (in the nuclear scope) if they remain inside VP at Logical Form, and generically (in the restrictor) if they raise above VP. Children would use the input to sort out the mapping between morphological determiners and semantic determiners. Thus, in this system the key piece of knowledge is which NPs are indefinites versus definites.

To test this algorithm, Sneed analyzed parental input in corpora of storytelling and spontaneous interactions. Referential expressions were distributed among subject, object, and other syntactic positions, but generic and existential NPs appeared in specialized distributions. Generic NPs were found overwhelmingly in subject position, whereas existentials appeared there infrequently. The input was also sufficiently informative with regard to definiteness. Indefinites appeared twice as often at first mention of a referent as at second mention, whereas definites appeared in roughly equal proportions at first and second mention. Further classification of indefinites into singular indefinites

and bare plurals shows this pattern to be quite consistent, with bare plural subjects being used exclusively with the generic reading.

Sneed also tested the implications of her algorithm experimentally. Since existential and generic bare plurals have a (nearly) complementary distribution across syntactic positions, existential bare plural subjects rarely appear in the input. Sneed thus proposed that a purely input-driven mechanism would predict initial difficulties with existential bare plural subjects. In contrast, a learning mechanism that is sensitive to structure (as relevant to the mapping hypothesis) will link ambiguity to predicate type. In other words, her learning model predicts that bare plural subjects should remain ambiguous in clauses containing stage-level predicates. Four-year-olds were asked to judge ambiguous bare plural sentences (Alligators are in the desert). These were presented after story contexts that validated the existential reading and negated the generic one ("There are alligators in the desert today but generally, in this zoo, they are in the river," or vice versa). Children behaved like adults, accepting both interpretations of bare plurals. At the same time, these children were able to reject comparable unambiguous sentences with existential constructions (There are alligators in the river) and adverbs of quantification (Usually, alligators are in the desert). Children found the existential reading of bare plural subjects easy, even though this interpretation is not well supported in input frequencies.
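The interpretive core of this mapping can be sketched as a toy function (an illustration of the predictions just described, not Sneed's actual model; the function name and the string representation are my own):

```python
def subject_readings(predicate_type):
    """Readings the mapping hypothesis allows for a bare plural subject.

    Under Diesing's mapping hypothesis, a subject inside VP at LF falls
    in the nuclear scope (existential closure applies), while a subject
    above VP falls in the restrictor (bound by Gen).
    """
    if predicate_type == "individual-level":
        # Individual-level predicates (e.g. "Alligators are dangerous")
        # force the subject out of VP: only the generic reading survives.
        return {"generic"}
    if predicate_type == "stage-level":
        # Stage-level predicates (e.g. "Alligators are in the desert")
        # allow the subject in either position, so the sentence is ambiguous.
        return {"existential", "generic"}
    raise ValueError("expected 'individual-level' or 'stage-level'")
```

On this sketch, the experimental prediction falls out directly: bare plural subjects of stage-level predicates remain ambiguous, which is what the 4-year-olds' adult-like acceptance of both readings reflects.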

24.5  Expanding the Learnability Perspective: Other Languages

As we have seen, languages vary as to the ambiguities associated with generic sentences, and as to which markers are available to block generic readings. Beyond these formal differences in linguistic structure, languages of the world remain conceptually rather invariant. We have also seen that children can integrate grammatical and pragmatic cues to determine generic interpretation. Furthermore, when deciding whether to generalize from a sentence they have heard, children pay attention to the speaker's knowledge base. Across languages and input conditions, children use generics appropriately, early, and robustly. In comprehension experiments, children seem quite willing to interpret overt NPs generically, even for structures whose grammatical form precludes these readings. However, structural variation may yield language-specific variation in the frequency, age of acquisition, and conceptual implications of particular forms (Gelman and Tardif 1998).

The empirical evidence on children's comprehension and production of generic statements shows quite clearly that generic operators do not present a learning problem for the child. On the contrary, children can access these interpretations and appear aware of the robustness and contextual independence of generic expressions. They also seem able to exploit innate, complex considerations about the mapping between sentence structure and NP interpretation. The learning problem seems to be located, therefore, neither in the semantics, nor the pragmatics, nor the logical form of generics,

but in their morphosyntax. We have seen some evidence of generic overgeneralization, and also evidence that, with experience, children refine the meanings of the relevant grammatical ingredients that delimit generic interpretations.

Despite the remarkable complexity of generics, the structure of the domain is now sufficiently well understood to support the formulation of precise learning algorithms such as the one proposed by Sneed, where detailed learning steps are articulated and matched to an analysis of the input. However, the proposed learning algorithm is not yet general enough. It does not extend to definites, and therefore can only account for part of the domain of generics in English, that is, for bare plurals to the exclusion of singular definite generics. More importantly, it cannot account at all for languages with definite plural generic subjects (such as Spanish and other Romance languages). The essence of the approach could nonetheless be generalized: taking as a point of departure a distributional comparison of the forms available in each grammar, and integrating this distribution with the relationship between predicate types and the position of subjects. Thus, although much more work is needed to explain acquisition amid language diversity, this general approach provides a fruitful template for addressing the acquisition of generic language.

Despite the many questions that remain about the learning path, the general insight is clear: children do not need to actively "learn" generic meanings, as demonstrated by the striking fact that deaf children without a language community come to express them in the absence of explicit language models. This supports the view that generics are part of a basic, fundamental cognitive ability. More broadly, children do not seem to require much special experience to determine which markers allow generic meanings, despite the complexity and variation in this linguistic domain.
The evidence suggests that learning generics is a concurrent product of the basic task of language acquisition. Independently of the generic task, children must identify and map the basic grammatical elements of the target language. By integrating specific knowledge of morphosyntax with general knowledge of the articulation of sentence structure and interpretation, children acquire the linguistic expression of genericity.

Chapter 25

Lexical and Grammatical Aspect

Angeliek van Hout

25.1 Introduction

The temporality of a given situation "out there in the world" can be described in many ways. Tense and aspect offer the essential parameters. Lexical aspect characterizes the temporal profile of event descriptions; a situation with a sleeping child can be referred to as a state of affairs (be asleep) or as a happening (sleep, wake up). Grammatical aspect imposes a perspective by focusing on a particular time slice of a situation, such as the ongoing process (the baby was sleeping, mom was waking up the baby), the event as a whole (the baby slept, mom woke up the baby), or the resulting state (the baby has slept, mom has woken the baby up). Tense locates a situation at a certain time (was sleeping, is sleeping, will sleep). Temporality is thus determined by the three grammatical notions of lexical aspect, grammatical aspect, and tense. Tense anchors the time of an event vis-à-vis a reference time, often the moment of speech. Aspect imposes one or more layers of temporal structure on the event time, thus defining its temporal properties. Lexical aspect (also called "situation type," "inner aspect," or "Aktionsart") characterizes the temporal contour, while grammatical aspect (or "viewpoint aspect," "outer aspect") determines the temporal viewpoint on the run-time of the event.

The grammatical expression of aspect varies enormously: while there are languages with one, a few, or many types of aspectual markers, other languages have no dedicated aspect markers whatsoever. In some languages tense and aspect morphology is conflated. This cross-linguistic variation makes aspect an interesting domain of linguistic investigation, and a wide variety of acquisition studies have investigated aspect development in children, raising questions about its universal versus language-specific properties. In this chapter I review the acquisition literature on lexical and grammatical aspect; tense is beyond its scope.
Section 25.2 summarizes the fundamental generalizations and linguistic analyses of the two types of aspect, and presents the cross-linguistic variation in aspect expression, which raises questions of learnability. Section 25.3 presents the acquisition literature for lexical aspect, and section 25.4 does the same for grammatical aspect. Section 25.5 draws conclusions about the coverage of aspect acquisition research to date and presents an outlook on novel directions of research.

25.2  Aspectual Generalizations and Cross-linguistic Variation

Sections 25.2.1 and 25.2.2 review the fundamental generalizations of lexical and grammatical aspect, and the connections between these two aspectual notions. Section 25.2.3 presents the range of cross-linguistic variation. For recent overviews of aspect theories, I refer to Rothstein (2004), van Hout et al. (2005), Verkuyl (2005), Klein (2009), and Filip (2012). Dahl (2000) presents a comprehensive volume on tense–aspect typology.

25.2.1 Lexical Aspect

Situations "out there in the world" are not of a particular type and do not have any intrinsic temporal structure. It is the linguistic description of a situation that individuates an event or state in the world by "carving out" a certain "time slice" of a situation (Parsons 1990; Filip 1993; Partee 1999). Lexical aspect is a property of a linguistic description as given by the verb phrase, without tense and grammatical aspect. It involves lexical features as well as syntactic-semantic properties. Analyzing the lexical and formal-semantic properties of different aspectual classes in terms of event types, theories of aspect aim to explain why the different classes behave as they do in various linguistic tests.

Vendler (1957) distinguishes four "time schemata"—nowadays referred to as aspectual classes or event types: state, activity, accomplishment, and achievement. Smith (1991a) defines these aspectual classes with the temporal properties of dynamism, durativity, and telicity (Table 25.1). States involve a period of undifferentiated time, whereas dynamic situations involve temporal activity and change. Durativity refers to the length of an event time: extended or instantaneous. Telicity (from the Greek telos 'goal') has been characterized as "culmination," "natural endpoint," and "set terminal endpoint." Over the years researchers have proposed alternative classifications for the Vendler classes: collapsing the two telic classes into one category called "event"; taking accomplishments as a special subclass of activities; or adding other (sub)classes (e.g. "semelfactives").

Aspectual classes are a useful linguistic tool, because they behave differently linguistically. Dowty (1979) presents a comprehensive overview of tests that serve as diagnostics to establish aspectual class, a few of which are listed in Table 25.2. Instead of aspectual features, formal-semantic theories use the properties of homogeneity and cumulativity to describe different event types (Bach 1986; Krifka 1989b; Kamp and Reyle 1993). Atelic predicates are homogeneous and cumulative; telic predicates are not (a predicate X is homogeneous if any subpart of the denotation of X can also be referred to as X; a predicate X is cumulative if two or more events in the denotation of X taken together can also be referred to as X). Thus defined, the telic–atelic distinction brings out specific implications and entailments; see Table 25.2.

Table 25.1 Vendler classes characterized by aspectual features

                  Dynamic   Durative   Telic   Examples
State                –         +         –     be asleep, believe, trust, love
Activity             +         +         –     sleep, run, sing
Accomplishment       +         +         +     run a mile, make a chair, write a book
Achievement          +         –         +     wake up, reach the top, recognize, die

Table 25.2 Aspectual tests and entailments

Progressive -ing (diagnoses dynamicity)
  OK: sleeping; waking the baby up; making a chair
  not OK: *being asleep; *loving the baby; *knowing the answer

Durative adverbials, for an hour (diagnose telicity)
  OK: be asleep for an hour; sleep for an hour
  not OK: *make a chair for an hour; *wake the baby up for an hour

Time-frame adverbials, in an hour (diagnose telicity)
  OK: make a chair in an hour; wake the baby up in an hour
  not OK: *be asleep in an hour; *sleep in an hour

V-ing entails has V-ed (diagnoses telicity)
  OK: the baby was sleeping → she has slept
  not OK: she was waking up the baby ↛ she has woken her up

Subinterval property, V from 1–3 entails V from 1–2 (diagnoses homogeneity)
  OK: sleep from 1–3 → sleep from 1–2
  not OK: make a chair from 1–3 ↛ make a chair from 1–2

V from 1–2 and V from 2–3 entails V from 1–3 (diagnoses cumulativity)
  OK: sleep from 1–2 and sleep from 2–3 → sleep from 1–3
  not OK: make a chair from 1–2 and make a chair from 2–3 ↛ make a chair from 1–3

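The parenthetical definitions of homogeneity and cumulativity can be given a standard mereological rendering (in the spirit of Bach 1986 and Krifka 1989b; the notation here is schematic rather than any one author's, with ⊑ for the part-of relation and ⊕ for event sum):

```latex
% Homogeneity (divisive reference): every part of a P-event is a P-event
\forall e\, \forall e'\, [\, P(e) \wedge e' \sqsubseteq e \;\rightarrow\; P(e') \,]

% Cumulativity: the sum of two P-events is again a P-event
\forall e\, \forall e'\, [\, P(e) \wedge P(e') \;\rightarrow\; P(e \oplus e') \,]
```

Atelic sleep satisfies both conditions; telic make a chair satisfies neither, which is exactly what the subinterval and cumulativity rows of Table 25.2 diagnose.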
Even though the term "lexical aspect" is widely used, which is why I employ it in this chapter, the label is rather unfortunate, because aspectual class is not determined by the lexical properties of the verb on its own. "Compositional telicity" refers to the effect that the semantics of the verb's arguments (whether or not they specify a particular quantity) co-determines telicity; compare telic write a letter versus atelic write letters and write poetry (Verkuyl 1972, 1993; Bach 1986; Krifka 1989b). This effect is modulated by a lexical factor: it arises only with verbs whose objects undergo an incremental change (write a letter is telic; the letter comes into existence over the course of writing), but not when the object is not so affected (push a cart is atelic, even if the object is quantized). Other predicative elements in the VP furthermore contribute to "predicate telicity," including verb particles, prefixes, and directional PPs; compare atelic run versus telic run to the store, and atelic eat versus telic eat up (Brinton 1985; van Hout 1996; Filip 2003).

Finally, the mapping of the verb's argument structure onto syntax plays a role. Direct objects (but not indirect objects) contribute to compositional telicity. This generalization extends to unaccusatives (Borer 1994; van Hout 2000). It reveals that telicity is also determined by syntax (Tenny 1987, 1994). The aspectual function of direct objects and unaccusative subjects in the syntax of telicity has been analyzed in many different theories of syntax: with a special thematic role (Tenny 1987; Dowty 1991; Krifka 1992), object case (van Hout 1996; Schmitt 1996; Ramchand 1997), an aspect projection for aspect checking (Travis 1994; Demirdache and Uribe-Etxebarria 2000; Borer 2005; MacDonald 2008), and a telicity parameter (Slabakova 2001; Filip and Rothstein 2005).
Summarizing the fundamentals: lexical aspect is a property of VPs and depends compositionally on the verb, its arguments, and other predicative elements. Aspectual classes are characterized by aspectual features and behave differently in various linguistic tests. Different theories engage different sets of features, and propose various classifications. Lexical aspect, in particular the robust association between telicity and direct objecthood, involves the syntax–semantics interface.

25.2.2 Grammatical Aspect Theories

There are many possible ways of looking at a given situation "out there in the world." While lexical aspect presents a situation as a certain event type, grammatical aspect defines a particular viewpoint on it. Informally, grammatical aspect involves the "different ways of viewing the internal temporal constituency of a situation," with perfective aspect as "the view of the situation as a single whole," while imperfective aspect "pays essential attention to the internal structure of the situation" (Comrie 1976: 3, 16). Formal-semantic theories define aspectual viewpoints in terms of aspectual operators which specify a relation between the run-time of an event—the interval along which an event stretches out in time—and a reference time (Smith 1991a; Kamp and Reyle 1993; Klein 1994; de Swart 1998; Demirdache and Uribe-Etxebarria 2007). Thus, perfective aspect includes the event's run-time inside the reference time interval (Figure 25.1),

[Figure: the run-time interval contained within the reference time interval]
Figure 25.1 Perfective aspect.

[Figure: the reference time interval contained within the run-time interval]
Figure 25.2 Imperfective aspect.
whereas imperfective aspect includes the reference time inside the run-time interval (Figure 25.2). Different aspectual implications and entailments follow. With perfective aspect, the entire event, including its initial and final boundaries, falls within a reference time interval. As a result, telic predicates modified by perfective aspect have an entailment of completion, and atelic predicates have an entailment of termination: the event has culminated or terminated. In contrast, imperfective aspect makes an assertion about part of the event's run-time, and so there is no completion or termination entailment. The continuation test in (1) is based on this entailment; imperfective aspect in the first conjunct allows continuation in the second conjunct, (1a), but perfective aspect leads to a contradiction, indicated by # in (1b).

(1) a. John was making a chair, and he may still be making a chair.
    b. # John made a chair, and he may still be making a chair.

The definition of imperfective aspect as inclusion of the reference time in the run-time also explains the so-called "imperfective paradox," the fact that a telic predicate modified by imperfective aspect can refer to an incomplete event. If Mary is crossing the street, she may be interrupted and never reach the other side, and so the crossing is incomplete. Thus, the culmination given by the telicity of the predicate is cancelled by imperfective aspect. This follows because imperfective aspect lacks a completion entailment. Note that there is no imperfective paradox for atelic predicates: an atelic description modified by either perfective or imperfective aspect is true even when an event is interrupted. Incompleteness of an event matters only for telic predicates, not for atelic ones. This insight has often been used in experimental designs that test for telicity and/or perfective aspect. Other aspect operators define more aspects.
Progressive aspect focuses the ongoing progress excluding the initial and final boundaries of the run-​time. Perfect aspect focuses the resulting state immediately following the final boundary. Inchoative aspect focuses the start of the run-​time immediately at the initial boundary. Delimitative aspect specifies a certain duration of the run-​time. In addition to operators that yield episodic events, iterative and habitual aspect involve quantification over the run-​time variable.
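The two inclusion relations described above can be stated compactly. The formulation below is a standard Klein-style sketch assumed here for illustration; the notation (τ(e) for the run-time of event e, t_R for the reference time, and the operator labels pfv/ipfv) is mine, not the chapter's:

```latex
% Perfective: the event's run-time falls inside the reference time.
\textsc{pfv}(P) \;=\; \lambda t_{R}\,.\,\exists e\,[\,P(e) \wedge \tau(e) \subseteq t_{R}\,]

% Imperfective: the reference time falls inside the event's run-time.
\textsc{ipfv}(P) \;=\; \lambda t_{R}\,.\,\exists e\,[\,P(e) \wedge t_{R} \subseteq \tau(e)\,]
```

Because pfv places the final boundary of τ(e) inside t_R, the completion/termination entailments follow; ipfv asserts nothing about the boundaries, which is why the continuation in (1a) is consistent.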

592   Angeliek van Hout

Grammatical aspect is linguistically encoded by morphology, periphrasis, and aspectual adverbials, which modify the event description as determined by the verb phrase. There can be several grammatical aspect markers in one clause; the event description is then modified by several layers of aspectual modification, with two or more operators applying in succession (for example, progressive and perfect aspect in has been sleeping). Thus, lexical and grammatical aspect together define the temporal properties of an event description.

Tense marks the relation between the time of the described event and a deictically given reference moment, often the moment of speech, or else a moment established in the discourse. I do not review tense semantics (see Klein 1994, 2009), nor the acquisition of tense, in this chapter (see Weist 2002; Wagner 2012; Binnick 2012).

25.2.3 Cross-linguistic Variation

Cross-linguistically, the linguistic elements that determine lexical aspect and aspectual class do not vary essentially; the verb's lexical aspect plays the most crucial role. The way in which the quantificational properties of the direct object affect telicity seems fairly constant across languages; the same goes for the aspectual role of verb particles and prefixes, directional PPs, and resultative constructions, to the extent that a given language has those constructions.

As for grammatical aspect, all languages have ways to express notions such as stopping or starting an action, or being in the middle of it, if only with aspectual adverbials. However, there is a lot of cross-linguistic variability in grammatical encoding: whether it is encoded at all, and, if so, which aspects are expressed how (Comrie 1976; Dahl 1985, 2000). In some languages aspect is a morphological category, which means that verbs obligatorily carry aspect, as they come in aspectual pairs of perfective and imperfective forms. Another source of variation is how aspect is encoded: with morphology on the verb, with conflated aspect–tense morphology, with particles, with periphrasis, or in some other, more indirect way. Table 25.3 illustrates different types of perfective–imperfective marking.

Languages with morphological aspect, such as the Slavic languages and Greek, differ as to how the pairs are formed morphologically; in the number of different (im)perfectivizing prefixes or infixes and the extent to which these combine with any verb or are lexically restricted; in whether the aspectual opposition surfaces only in finite verbs or also in nonfinite verbs; in the possible uses of each member of a pair; etc. For many Slavic pairs the imperfective form of the verb is morphologically simple and the perfective form is derived from it by prefixation, stem change, or stress shift. Greek aspectual pairs are formed by stem alternation.
The Romance languages contrast perfective and imperfective aspect in the past tense, conflating aspect and tense morphology, for example, imperfective Pretérito imperfecto and perfective Pretérito indefinido in Spanish. Aspect particles (also called "converbs") present another type. These are invariable, short words that directly precede or follow the verb, for example, imperfective zai and perfective le in Mandarin Chinese. Their status is somewhere in between verb morphology and temporal adverbs.

Lexical and Grammatical Aspect    593

Table 25.3 Various types of perfective–imperfective marking

Aspectual pairs (Polish)
  Imperfective: Klaun budował most.              clown build.imperf.past bridge
  Perfective:   Klaun z-budował most.            clown perf-build.past bridge

Aspectual pairs (Greek)
  Imperfective: O kloun ehtize mia yefira.       the clown build.imperf.past a bridge
  Perfective:   O kloun ehtise mia yefira.       the clown build.perf.past a bridge

Aspectual tenses (Spanish)
  Imperfective: El payaso construía un puente.   the clown build.imperf.past a bridge
  Perfective:   El payaso construyó un puente.   the clown build.perf.past a bridge

Aspect particles (Mandarin Chinese)
  Imperfective: Zhe4-ge4 xiao3-chuo3 zai4 xiu1 qiao2.          the clown asp build bridge
  Perfective:   Zhe4-ge4 xiao3-chuo3 xiu1-le yi2-zuo4 qiao2.   the clown build-asp a-cl bridge

English gloss
  Imperfective: 'The clown was building a bridge.' / 'The clown used to build a bridge.'
  Perfective:   'The clown built a bridge.'

The meaning of some particles is specifically aspectual; with others, aspect is conflated with tense; for example, Mandarin le entails a past tense. Other languages with aspect particles include Bahasa Indonesia, Tagalog, and Yoruba; moreover, creole languages typically have aspect (and tense) particles, for example Papiamentu and Sranan. Perfective verb forms entail completion (with telic predicates) or termination (with atelic ones), whereas imperfective verbs are typically used for ongoing actions and for habitual events (hence the two English translations for the imperfective in Table 25.3).

Aspect marking can also involve analytical constructions, consisting of an auxiliary and a nonfinite form of the main verb. Many languages have a Perfect (formed by auxiliary have or be plus past participle), (2), and a Progressive (formed by auxiliary be plus gerund or a periphrastic form), (3), illustrated here for Italian, Dutch, and English. The meaning of the Progressive construction is essentially uniform across languages; it refers to the ongoing action. The Perfect, on the other hand, is more variable: while its form (auxiliary plus participle) and its essential meaning (focus on the result state after the event) are the same across languages, its use varies considerably, which affects possible combinations with adverbials and has effects in narratives (de Swart 2007).

(2) a. Il pagliaccio ha costruito un ponte.       (Italian)
    b. De clown heeft een brug gebouwd.           (Dutch)
    c. The clown has built a bridge.              (English)

(3) a. Il pagliaccio stava costruendo un ponte.   (Italian)
    b. De clown was een brug aan het bouwen.      (Dutch)
    c. The clown was building a bridge.           (English)

Some languages have no aspectual marking on the verb; instead, aspectual effects are associated with other elements in the clause. In Finnish and Estonian, different direct object cases mark a perfective–imperfective contrast, (4).

(4) a. Kloun ehitas silda.                        (Estonian)
       clown build.past bridge.part
       'The clown was building a bridge.'
    b. Kloun ehitas silla.
       clown build.past bridge.gen
       'The clown built a bridge.'

25.3  Learnability Issues in Aspect

This brief cross-linguistic review shows that acquiring aspect fundamentally involves learning the form–meaning associations of a given language. Given the variation in aspect expression, the learner can take little to nothing for granted, not even a cue such as "look for aspect markers on the verb" (cf. (4)). The learner's task therefore is to discover if, in her language, aspect is grammatically encoded at all, and if so, how: which aspects are expressed by which morphosyntactic form, and, vice versa, which forms carry which aspectual meaning.

How does the learner determine the proper meaning of a certain form? Does she initially posit incorrect mappings, and if so, what brings her to give them up? This relates to more general questions in form–meaning acquisition. Is the learner equipped with universal meaning primitives which guide her to discover the forms in her language that carry them? Alternatively, do certain aspect forms stand out, possibly because they are morphologically salient or occur frequently, and call out to be associated with a meaning? Are certain meanings more basic than others, and therefore easier to acquire? And how does cognitive development of temporality interfere in this process?

Two classic learnability theories are scaffolded on universal aspectual primitives: Bickerton's (1984) Language Bioprogram and Slobin's (1985) Basic Child Grammar. For Bickerton human grammars are designed with the semantic distinctions state/process and punctual/non-punctual, whereas for Slobin the notions result and process are the two major temporal perspectives, with result as a particularly salient cognitive category. These theories essentially argue that children face the acquisition task equipped with a universal and uniform set of pre-linguistic semantic notions, or "cognitive prerequisites" (Slobin 1973).
Such privileged concepts prompt learners to discover which grammatical forms map onto them in their language, thus setting up a semantic space which “provides the Basic Child Grammar with a level of organization that serves as an opening wedge to the acquisition of language-​specific grammatical distinctions, without at first biasing them towards any particular language” (Slobin 1985: 1184).

In their introduction to a special volume of First Language, Shirai et al. (1998) take stock of 25 years of tense–aspect acquisition research and the learnability issues involved. They observe that young children seem to be universally sensitive to salient event characteristics (such as "result" and "change of state"), which leads to early acquisition when a language marks this characteristic in a reliable way (by way of inflection, a verb form, or a particle). However, the authors warn against concluding that there is "a predetermined set of universal temporal-aspectual categories—whether innately specified or the result of prelinguistic cognitive and social development" (1998: 249). Rather, typological differences in the exact semantics of tense–aspect markers suggest that "individual languages draw upon the same general stock of distinctions, but with differing levels of granularity and with different 'packagings' for morphosyntactic expression" (1998: 249).

Bowerman (1985) finds that there is an immediate influence of the variable semantics of the target language in children learning different languages and concludes that the "starting semantic space is flexible, not tightly structured" (Bowerman 1985: 1303). Inspired by the increasing evidence for the cross-linguistic diversity of form–meaning mappings already manifested in the earliest meanings of children's utterances, Slobin (1997) radically revises his initial position, replacing it with a form of linguistic determinism: "Children use linguistic cues to discover the collections of semantic elements that are packaged in the lexical and grammatical items of the language" (Slobin 1997: 318). Shirai et al. (1998) concur that the cross-linguistic variation in the levels of "granularity" of temporal semantics reveals itself in the child's earliest constructions of temporal space, which develop into "particular 'packages' of features that receive linguistic expression…. Accordingly, learning a language is learning how to think in the terms of that language in order to speak and to understand" (Slobin 2008: 8, 11). Cross-linguistic differences in children's earliest tense–aspect forms show that learners establish this level of granularity in their language from early on. While Shirai and colleagues concede that there may exist aspectual primitives, albeit at a lower granular level than typically assumed in developmental theories, they conclude that "(l)inguistic marking guides the child to the formation of language-specific categories. It remains to be determined precisely what event characteristics are salient, and whether they are all universal and available to children everywhere" (Shirai et al. 1998: 250).

I now review the aspect acquisition literature in order to evaluate whether any such universal event characteristics have been established in developmental research (before 1998 and since). The central question is therefore: Are there any aspectual primitives, possibly innate, that help children in the task of acquiring the form–meaning relations of aspect? I focus in particular on telicity and on the perfective–imperfective distinction, as these are the most widely studied aspectual features in acquisition research. The literature review is organized by methodology: studies on tense–aspect marking in spontaneous speech in section 25.4, experimental studies on telicity comprehension in section 25.5, and experimental studies on grammatical aspect in section 25.6. For earlier reviews of the acquisition of aspect, I refer to Shirai et al. (1998), Li and Shirai (2000), Slabakova (2002), Weist (2002), and Wagner (2012).


25.4  Tense and Aspect in Spontaneous Speech

Since the mid-1970s many studies have appeared that analyze the acquisition of tense and aspect markers in spontaneous and elicited production in many different languages. Without exception, all studies find a remarkably consistent trend in the first stages of tense–aspect marking: perfective aspect or past tense appears mostly on telic verbs, whereas imperfective aspect or present tense appears mostly on atelic verbs. Children reserve particular tenses or aspects for verbs of certain aspectual classes, undergeneralizing tense–aspect forms according to the lexical-aspectual feature of telicity. And so most verbs appear in just one tense or one aspect. This phenomenon of skewed production is referred to as the Aspect-First pattern, because the first markings seem to reflect (lexical) aspect.

What varies from language to language is which particular tense–aspect forms appear in the Aspect-First pattern. In English, progressive -ing typically appears on activity verbs such as playing and holding, and not on telic verbs, while simple past tense (-ed and irregular forms) is mostly produced on telic verbs such as found, fell, and broke (Bloom et al. 1980). The French perfect (passé composé) is produced with telic verbs, a sauté 'has jumped,' while actions that do not lead to any result are mostly described in the present tense, flotte 'floats' (Bronckart and Sinclair 1973). The Italian past participle (passato prossimo) is initially only used (without auxiliary) with change-of-state verbs, caduto 'fallen' and trovato 'found,' while the present tense appears predominantly on atelic verbs, corre 'runs' and vola 'flies' (Antinucci and Miller 1976). Similar asymmetrical patterns are attested in languages as diverse as Brazilian Portuguese, Dutch, German, Greek, Hebrew, Japanese, Inuktitut, Mandarin Chinese, Turkish, Polish, and Russian (De Lemos 1981; Stephany 1981; Berman 1983; Weist et al. 1984; Aksu-Koç 1988; Behrens 1993b; Shirai and Andersen 1995; Li and Bowerman 1998; Shirai 1998; Swift 2004; Gagarina 2008; van Hout and Veenstra 2010).

The Aspect-First pattern is striking because children underuse their tense–aspect forms in very specific, non-target-like ways. The pattern occurs when children first start to use tense–aspect marking, and it lasts several years. In order to explain these patterns, many researchers concluded that children initially form mismappings between forms and meanings; in particular, the tense or aspect morphemes are claimed to carry the semantics of (a)telicity—a lexical aspect notion—instead of the temporal or grammatical aspect semantics that these morphemes carry in the target languages. This type of explanation is called the Primacy of Aspect hypothesis (alternatively, the Aspect-before-Tense hypothesis, Defective Tense hypothesis, or Aspect hypothesis). The origins of the various incarnations of the Primacy of Aspect explanation are related to equally various sources, and there is a deep tension between accounts that posit some form of aspectual primacy and input-driven, frequency-based theories.


(i) Immature cognitive development: Young children initially lack the cognitive category of time; therefore they cannot map tense inflection onto the target temporal category (Bronckart and Sinclair 1973; Antinucci and Miller 1976).

(ii) A predisposition: Children map the verb inflection system onto certain basic, aspectual notions, thus using aspect to learn tense (Bloom et al. 1980).

(iii) Incomplete grammar: Children's very initial grammars do not yet have a proper tense category (Weist et al. 1984).

(iv) Prototypes: The learner creates the best exemplars of the prototypical tense–aspect associations: [progressive, dynamic, atelic] and [past tense, result, telic]. These prototypical associations of tense–aspect features are not innate, but deduced from the input. The learner performs a distributional analysis of the input by extracting patterns of associations between tense and aspect markers (Shirai and Andersen 1995; Andersen and Shirai 1996; Li and Shirai 2000).1

The prototype explanation relies crucially on the assumption that the distribution of tense–aspect markers is also skewed in the adult language, possibly less prominently so, but at least as a weakly discernible pattern. Some studies of tense–aspect in adult speech indeed find similar patterns in the input (Shirai and Andersen 1995). However, after re-analyzing some of the same corpora used by Shirai and Andersen, Olsen and Weinberg (1999) find that the patterns of the caregivers do not completely match those of the children. Van Hout and Veenstra (2010) reach the same conclusion comparing a Dutch child's speech with her mother's speech. Wagner (1998), moreover, points out that the essence of temporal semantics is missing in the prototype explanation. Tense semantics has a classical and not a prototypical structure; past and present are defined very precisely with necessary and sufficient conditions.
Semantically, grammatical and lexical aspect are more closely aligned with each other than either is with tense (in fact, some semantic theories do not even distinguish the two aspect levels). Morphosyntactically, tense and grammatical aspect are more closely aligned than either is with lexical aspect. Furthermore, no language reliably grammaticizes the distinction between telic and atelic predicates. An explanation based on prototypes therefore does not seem plausible.

A different explanation of the Aspect-First phenomenon says that it reflects a learning strategy according to which a learner begins by assuming she is acquiring a grammatical-aspect-only language (Olsen and Weinberg 1999). Following the Subset Principle, this is the most restrictive hypothesis, given that in some languages perfective markers are indeed restricted to telic predicates and imperfective markers to atelic predicates (e.g. Mandarin Chinese, Smith 1991a). When the learner encounters positive counterevidence, she discovers that her language is in fact more liberal. She will then break the strict one-to-one rule, relaxing her initial undergeneralization. On this explanation it is unexpected, however, that the Aspect-First pattern lasts for a number of years, as counterexamples which can trigger the relaxation are readily available in the input.

1 Wagner (2009) finds that the same prototypical associations are processed more quickly by adults than non-prototypical associations, which suggests that there is some linguistic basis for the prototypes.

There are more general arguments against Primacy of Aspect type theories. All the explanations discussed in this section have in common that the presumed mappings are essentially redundant with the lexical-aspectual semantics of verbs. This raises theoretical suspicion: in the languages of the world (that we know of so far), telicity is not typically expressed by verbal inflections (section 25.2). Theories that posit incorrect form–meaning mappings need to explain why child grammars initially posit atypical mappings—tense or aspect inflections associated with the semantic notion of (a)telicity—which are moreover not strictly obeyed and will have to be abandoned later on in development. The early use of inflection is effectively a mismapping or a too restricted mapping, and so the child must unlearn her initial, incorrect form–meaning mappings in order to progress to the target ones. But no theory spells out the triggers for such a re-mapping (with the exception of Olsen and Weinberg 1999). Furthermore, the skewed form–meaning mappings in child language are not absolute associations between tense–aspect forms and lexical aspect class, but rather strong tendencies (Smith 1980; Weist et al. 1984). However, theories that posit an association between tense–aspect morphemes and lexical aspect predict an absolute pattern.

There are at least two other alternative explanations for the Aspect-First pattern in child languages, both of which are grounded in semantic theory. Bohnemeyer and Swift (2004) suggest that it can be explained with their formal-semantic notion of event realization. They relate the so-called natural connection between telicity and perfectivity to the imperfective paradox: for an event to be realized (i.e. to assert that the event occurred), a telic eventuality description is dependent on perfective aspect.
When the event is interrupted, it cannot be truthfully described with a telic predicate. Bohnemeyer and Swift argue that children's Aspect-First distribution pattern reflects a "preference for aspectual reference under event realization" (2004: 292), assuming that children are cognitively biased to talk about realized events.2 This preference triggers the use of perfective aspect marking for telic verbs, because for telics only perfective gives event realization. For atelics, however, imperfective marking as well as perfective marking yields event realization, so on this theory one would expect both markings on atelic predicates, contrary to the Aspect-First pattern.

The notion of aspect coercion (Kamp and Reyle 1993; de Swart 1998) is central in an explanation that relies on the complexity or extra processing costs involved in coercion. With overt perfective, progressive, or perfect aspect marking, there is regular aspect shift; a precisely defined operator applies to the event type given by the predicate. Aspect coercion, on the other hand, applies when the predicate is not of the right type, that is, when there is a mismatch between the requirements of an aspect or tense operator and the event type of the predicate. For example, when present tense combines with a telic predicate, the present tense operator wants to locate the event's run-time simultaneously with the moment of speech. This conflicts with the telic event type, which includes culmination, and so a hidden operator first coerces the predicate into an ongoing process, so that the present tense operator can apply. Another case of coercion applies when an atelic predicate appears with perfective aspect in order to create an event type with a final boundary. Aspect coercion is not straightforward: it is open to subtle meaning effects, since a hidden operator must be inferred, which moreover must fit in the context. Possibly, children initially avoid aspect coercion, either because it is semantically-pragmatically complex or because it involves costly processing resources (Slabakova 2002; van Hout 2007a, 2007b). If so, they will not mark telic verbs with present tense nor atelic verbs with perfective aspect, because both of these combinations require aspect coercion. This exactly reflects the Aspect-First pattern (van Hout and Veenstra 2010).

2 Nina Hyams (p.c.) points out that this assumption is challenged, however, given that children also produce verbs with a modal interpretation (Hoekstra and Hyams 1998; Hyams 2007).

Recently, the Aspect-First pattern, which is a generalization about finite verbs, has been compared to aspectual patterns in nonfinite verbs in spontaneous speech, that is, in root-infinitive clauses. The Aspect-First literature and the root-infinitive literature have been essentially disconnected. However, new findings reveal telicity and perfectivity effects in nonfinite clauses too, with remarkable parallels to the Aspect-First pattern (Hyams 2007, 2012). By far most of the bare verbs in English that appear in past-time contexts are telic, while most of the bare verbs that occur in present-time contexts are atelic (Torrence and Hyams 2004). Most root infinitives in Russian that occur in past-time contexts are perfective, while those that occur in present-time contexts are imperfective (Brun et al. 1999).
Thus, "[i]n finite clauses past tense/perfective morphology is largely restricted to telic predicates; in nonfinite clauses past tense interpretation is largely restricted to telic predicates. In finite clauses present/imperfective morphology is restricted to atelic predicates; in nonfinite clauses present ongoing interpretation is restricted to atelic predicates" (Hyams 2007: 261). These parallels suggest that there is one underlying cause for both the undergeneralization of tense–aspect morphology to restricted aspectual classes in the Aspect-First pattern and the temporal–aspectual interpretation of root infinitives. Hyams advances the notion of "event structure" (defined by the combination of lexical and grammatical aspect) as an explanation: in nonfinite clauses, which have no tense marking, event structure determines temporal reference, while in finite clauses, event structure influences the interpretation of tense–aspect markers. She argues that the Aspect-First pattern arises in the early grammars as an artifact of an alternative temporal reference system based on event structure; specifically, the topological property of event closure assigns temporal reference, and this in its turn spells out morphologically on the verb.

The Aspect-First explanations aim to explain a certain pattern in production, whether they posit alleged form–meaning mismappings, undergeneralization of rules, learnability restrictions, avoidance strategies, or an alternative temporal interpretation system. The question is whether the theories hold up more generally; in particular, do they extend to comprehension? Since the end of the 1990s a series of comprehension studies has emerged. The real test case for any Aspect-First theory is to formulate precise predictions and test them in a structured experimental setting; some comprehension studies were explicitly designed to test an Aspect-First theory (e.g. Wagner 2001, to be discussed in section 25.6).
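Among the accounts reviewed in this section, Olsen and Weinberg's (1999) Subset-Principle strategy is explicit enough to sketch procedurally. The toy program below is my illustration of the general logic only, not their implementation (class and method names are invented): the learner starts from the most restrictive one-to-one mapping (perfective with telic, imperfective with atelic) and relaxes it only on positive counterevidence.

```python
# Toy sketch of a Subset-Principle learner for aspect marking.
# Illustration of the general logic only, not Olsen and Weinberg's
# (1999) actual model; all names here are invented for the example.

class AspectLearner:
    def __init__(self):
        # Most restrictive initial hypothesis: each grammatical aspect
        # marker combines with exactly one lexical-aspect class.
        self.allowed = {"perfective": {"telic"}, "imperfective": {"atelic"}}

    def observe(self, gram_aspect, lex_aspect):
        # Positive evidence: an attested combination is added to the
        # grammar. Combinations never encountered stay excluded, so no
        # negative evidence is needed to retreat from overgeneration.
        self.allowed[gram_aspect].add(lex_aspect)

    def licenses(self, gram_aspect, lex_aspect):
        return lex_aspect in self.allowed[gram_aspect]

learner = AspectLearner()
print(learner.licenses("perfective", "atelic"))  # False: not yet attested
learner.observe("perfective", "atelic")          # e.g. hearing "John walked"
print(learner.licenses("perfective", "atelic"))  # True: mapping relaxed
```

The puzzle noted above carries over directly: once a single counterexample has been observed, the restriction is gone, so this sketch predicts a much shorter Aspect-First phase than the several years actually attested.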

25.5  The Acquisition of Telicity

The acquisition puzzle for telicity involves lexicon, syntax, and semantics, as it revolves around the following pieces: the lexical semantics of the verb, the quantization of the direct object, the event type-shifting contribution of particles and PPs, and the composition of verb and object. There are at least five questions about the process of telicity acquisition (van Hout 2007c). Two concern lexical learning: how does a child establish for each verb (i) whether or not it carves out an eventuality with a natural culmination point, and (ii) whether the theme participant undergoes the event in an incremental way? Two further issues lie at the syntax–semantics interface: (iii) the compositional telicity rule for verb and object, and (iv) the predicate telicity rule for combining a verb with verb particles or directional PPs. Finally, one may wonder (v) whether telicity is easier to acquire in some languages than in others, and if so, why.

How does a child deduce the aspectual properties of individual verbs from the input in the process of lexical learning? Wagner (2006, 2010) argues that syntactic bootstrapping helps with the acquisition of the telicity of individual verbs. Syntactic bootstrapping exploits the idea that lexical learning is aided by structural cues, including the number and position of a verb's arguments (Landau and Gleitman 1985; Gleitman 1990). Exploiting the relation between transitivity and telicity, Wagner claims that children follow a simple heuristic, "transitive sentences have telic meaning" (2006: 52), using it as a weak structural cue to learn verb meanings. Of course, not all transitive verbs in the adult lexicon are telic. Yet the rule can serve as a bootstrap into the system: it offers an initial hypothesis and reasonable first guess for transitive verbs, as well as a cue that intransitive verbs are atelic.
As for the origin of this transitivity–telicity heuristic, Wagner alludes briefly to typological work which shows telicity as one of the prototypical properties of transitivity (Hopper and Thompson 1980; Dowty 1991), suggesting that it may reflect an innate presupposition. As an alternative source, she points to statistical patterns in the input. One might add the universal role of the direct object in the calculation of telicity as further support for the possibly innate status of the heuristic (Tenny 1987, 1994; Verkuyl 1993; see section 25.2).

Employing the essentially different ways of event individuation of telic and atelic predicates, Wagner and Carey (2003) developed an event-counting task. The child must count how many times an action happens. For example, after showing an animated movie with a girl painting a flower in three brush strokes, the experimenter asks either "how many times does the girl paint a flower?" or "how many times does the girl paint?" For telic predicates, the criterion for deciding how many times is sensitive to their endpoint orientation; it is independent of the number of pauses in the action. For atelic predicates, there is no natural criterion to count events; instead, the spatio-temporal pauses in the action provide a natural counting criterion.

The results in Wagner (2006) show that the 2-year-olds indeed have a bias in the event-counting task, interpreting transitive verbs as implying culmination more often than the 3-year-olds. However, contrary to the predictions of a strict heuristic (transitivity implies telicity), the 2-year-olds did not assign all transitive clauses a telic interpretation equally often. There were fewer telic interpretations for atelic transitive predicates such as poke the balloon and push a ball than for telic transitives such as paint a flower and build a house. Wagner (2006) does not really discuss this finding, but Wagner (2010) presents a follow-up study with novel verbs based on the same event individuation principle. These results reveal more clearly the bias to interpret transitive verbs with a result orientation and intransitives with a process orientation, underscoring even more poignantly the idea that transitivity is taken to reflect culmination, at least initially. So the bootstrapping theory that transitivity implies telicity is supported.

If the association between transitivity and telicity is a driving force in learning a verb's inherent telicity, the prediction is that the learner initially takes all transitive verbs to be telic. Since not all verbs in the adult lexicon are telic, she will subsequently have to relax this bias. Transitive verbs of creation (make, build, draw) and destruction (eat, melt, erase) fit the transitivity–telicity cue, but verbs of the push class, which are transitive and atelic, do not, because their theme is not incremental. There are further constructions for which the telicity heuristic is incorrect, or which it does not cover: the subtle effects of count versus mass objects in compositional telicity, the class of telic intransitive verbs, and the "de-culminating" effect of imperfective aspect on telicity.
So, while syntactic bootstrapping can help with question (i) about lexical telicity, it is not clear what prompts the learner to give up the heuristic. This is the essence of question (ii) about incremental theme-hood, which has not received any attention in the literature.

Focusing on telicity issues at the syntax–semantics interface, (iii) and (iv), Van Hout (1997, 1998, 2000, 2007c) investigates which is easier to acquire: predicate telicity or compositional telicity? The question is inspired by typological generalizations: temporal and aspectual meanings are most naturally expressed on verbs, not on nouns, because they are relevant to the verb (Bybee 1985). Slobin connects the typological notion of relevance to acquisition in his theory about "local cues," which "can be interpreted without taking the entire sentence into account. The cue is local because it operates in a localized sentence element" (Slobin 1982: 162). This theory explains why certain forms are acquired earlier than others. One example that Slobin (1985) discusses is that tense–aspect marking on verbs is acquired early, whereas marking of gender and definiteness of objects on the verb is difficult to acquire; clearly, tense–aspect is more relevant to the verb than information about the object.

Pursuing the theory of relevance further for tense–aspect marking, Van Hout (2007c) argues that predicate telicity is easier to acquire than compositional telicity, because it is carried by verbs (morphology and other elements close to the verb: particles and prefixes), whereas compositional telicity relies on the integration of the quantificational properties of the direct object. I even offer the suggestion that this may be so because there is an innate bias towards expecting aspectual distinctions to be marked on the verb.
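The compositional telicity rule at stake in question (iii) can be stated schematically. The biconditional below is a simplified formulation assumed here for illustration (the predicates "incremental-theme" and "quantized" follow standard Krifka-style usage; it is not quoted from this chapter):

```latex
% Compositional telicity (simplified sketch):
\text{telic}(V + \mathrm{DP_{obj}})
  \iff \text{incremental-theme}(V) \wedge \text{quantized}(\mathrm{DP_{obj}})

% E.g. eat the sandwich: quantized object, hence telic;
%      eat bread: mass (non-quantized) object, hence atelic.
```

Predicate telicity, by contrast, is fixed by the verb plus a particle or prefix alone (eat up), which is what makes the local-cue account of its earlier acquisition plausible: the learner need not integrate information from the object DP.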

602   Angeliek van Hout

Table 25.4 Conditions and sample sentences in designs that compare predicate and compositional telicity

Dutch
  Predicate telicity:      Heeft de jongen het broodje opgegeten?
  Compositional telicity:  Heeft de jongen het broodje gegeten?
  Atelicity:               Heeft de jongen (brood) gegeten?

German
  Predicate telicity:      Hat der Junge aufgegessen?
  Compositional telicity:  Hat der Junge das Brötchen gegessen?
  Atelicity:               Hat der Junge gegessen?

Spanish
  Predicate telicity:      ¿Se comió el sándwich?
  Compositional telicity:  ¿Comió el sándwich?
  Atelicity:               —

English
  Predicate telicity:      Did the boy eat up the sandwich?
  Compositional telicity:  Did the boy eat the sandwich?
  Atelicity:               Did the boy eat (bread)?
The hypothesis that predicate telicity is easier than compositional telicity has been investigated by comparing particle verbs to their simple transitive counterparts. Several studies employ a paradigm that capitalizes on the incompatibility of telic predicates with incomplete or ongoing events: if a learner has acquired the telicity of a given form, she will answer no to a yes/no-question or reject the sentence in a Truth Value Judgment (TVJ) task. Some studies use static pictures, while the more recent studies use short movies. Table 25.4 illustrates two different telic conditions for the particle verb eat up and the simple transitive verb eat in Dutch, English, and German, and the atelic conditions with intransitive eat and transitive eat with a mass term object (as investigated by van Hout 1997, 1998; Schulz and Penner 2002; Penner et al. 2003; Schulz and Wittek 2003; Jeschull 2007; Ogiela 2007). In Spanish, verbs with and without aspectual se were contrasted; like particles, se also involves predicate telicity (Hodgson 2001). While just two verbs (eat, drink) were tested with and without particles in van Hout's (1998) study and the German replication in Schulz and Penner (2002), the other studies investigated a larger range of (particle) verbs (see Table 25.5).

The hypothesis about predicate telicity is supported in all mentioned studies, across different methods, materials, verbs, and languages.3 Children essentially know telicity in the Predicate Telicity condition, mostly rejecting incomplete situations, but not in the Compositional Telicity condition, where they sometimes accepted incomplete situations, to a degree that varied from study to study.4 The effect is already there for 3-year-olds, and becomes more robust with age (4-, 5-, and 6-year-olds). Children furthermore accepted atelic predicates for incomplete situations.

3  With the exception of Jeschull (2007), whose child participants did not always reject particle verbs for incomplete situations.
4  Interestingly, even the control adults occasionally or even regularly accepted simple transitives for incomplete situations, which suggests that simple transitives do not always yield compositional telicity, possibly because they are ambiguous between telic and atelic (Schulz and Penner 2002), or because telicity is merely a conversational implicature (Jeschull 2007; Hacohen 2010).

Table 25.5 Verbs and particle verbs in various telicity studies

Dutch
  van Hout (1997):  (op)eten 'eat up,' (op)drinken 'drink (up),' bouwen 'build'; lezen 'read,' schrijven 'write'
  van Hout (1998):  (op)eten 'eat up,' (op)drinken 'drink (up)'

German
  Schulz and Penner (2002):  essen 'eat,' trinken 'drink'
  Schulz and Wittek (2003):  aufmachen 'open,' abmachen 'take off,' anmachen 'turn on,' zumachen 'close'

Spanish
  Hodgson (2001):  comer(se) 'eat (up),' beber(se) 'drink (up),' hacer(se) 'do,' leer(se) 'read,' limpiar(se) 'clean'

English
  van Hout (1998):  eat (up), drink (up)
  Jeschull (2007):  eat (up), drink (up), fold (up), wrap (up)
  Ogiela (2007):    eat (up), drink (up), build, fix, carry (out), push (over)

The question about predicate versus compositional telicity can also be examined from a cross-linguistic perspective: is telicity easier to acquire in languages with predicate telicity than in languages with compositional telicity (van Hout 2007c)? Weist et al. (1991) find that Finnish learners at the age of 6 have not yet acquired the aspectual implications of the two object cases (accusative entails completion, partitive does not). Comparing Finnish to the telicity studies in Dutch, English, and German just discussed, Van Hout (2007c) argues that Finnish supports her theory further, because it presents another case of compositional telicity which is acquired late.

Children's ambiguous interpretation of simple transitive verbs in the studies discussed here raises questions about compositional telicity, in particular: what do children know about the role of quantization of the direct object in telicity acquisition? Two recent dissertations looked into this issue. Ogiela (2007) compares verbs which she hypothesized to be more (build, fix) or less (eat, drink) sensitive to the quantization of the object, and finds that children derive more completion entailments for the former type (as do adults). She furthermore investigated whether a cardinal number (build two bridges, eat two oranges) more explicitly quantizes the object than a definite determiner (build the bridges, eat the oranges). Children indeed gave more completion entailments with cardinal numbers, but only with eat-type verbs. Children are thus sensitive to the lexical-aspect properties of particular verbs (whether the relation between verb and object is more or less incremental), and moreover this sensitivity interacts with the explicitness of quantization on the object.
The conclusion seems to be that English learners are sensitive to the quantization of the object in deriving completion entailments, thus revealing some knowledge of compositional telicity.

Hacohen (2010) examined several properties of direct objects to see how these affect compositional telicity in Hebrew: mass/count (color cloth—the cloth), singular/plural (color the square—the squares), and definite/indefinite (color a square—squares). She also tested the children on independent tests of definiteness and mass/count. Even though the participants in the Hebrew study were much older than those in the other telicity studies (7;9 up to 17), none of them did well on the telicity task, accepting telic predicates for incomplete situations. Finding that the same children did not do well on the definiteness and mass/count tasks either, Hacohen concludes that the non-adultlike performance on telicity is caused by children's immature knowledge of quantization. This conclusion strengthens the earlier conclusions that compositional telicity is acquired late. The Hebrew learners in Hacohen (2010) did not show any sensitivity to quantization of the object, in contrast to the English learners in Ogiela (2007).

Taking stock, two universal rules have been proposed in the telicity acquisition literature: transitive sentences have telic meaning (Wagner 2006, 2010), and aspectual distinctions are marked on the verb (van Hout 2007c). The two heuristics seem at odds with each other: the transitivity heuristic assigns an important role to the object in compositional telicity, whereas the aspect-on-the-verb bias focuses on the verb (particles and prefixes) as the main cue for predicate telicity. The former predicts that transitivity provides an early cue for telic interpretation, whereas the latter predicts that compositional telicity will be acquired late. Despite these conflicting predictions, the studies reviewed in this section provide support for each heuristic. One possible explanation to resolve the conflict may be that the different heuristics apply at different stages in acquisition.
Note that only the youngest children in Wagner's event counting studies, the 2-year-olds, were sensitive to transitivity; the older children (mean age 3;6) no longer showed the transitivity bias. The children in the studies using the complete–incomplete paradigm (van Hout 1998, and follow-up studies) were 3 and older. It may well be that, after the transitivity bootstrap has functioned as a very first but overly general heuristic for acquiring some rough lexical-aspectual semantics of verbs, the aspect-on-the-verb rule sets the stage for fine-tuning and modifying verbs' lexical specifications. The acquisition theories for the two types of telicity will need to connect at some point, but it is as yet unclear how. Furthermore, there remain questions about many telicity details, specifically how learners acquire the subtle aspectual role of the object (mass versus count, incremental versus non-incremental theme, object case), most of which have not been (fully) investigated.

25.6  The Acquisition of Perfective–Imperfective Aspect

The learnability question for grammatical aspect concerns form–meaning mappings: which aspects are expressed by which morphosyntactic forms, and vice versa, which forms carry which aspectual meaning? The Aspect-First pattern has been explained as a mismapping: tense–aspect markers are initially incorrectly mapped onto the lexical aspect features telic and atelic (section 25.4). Alternatively, the Grammatical Aspect-First hypothesis claims that children initially use verbal morphology to mark grammatical aspect, instead of tense or lexical aspect; in particular, past forms carry the grammatical aspect of completion and present forms carry ongoingness, and neither carries temporal information.

In a set-up with props and a road drawn on the table with three marked locations, Wagner (2001) employs the relation between locations and real time. A puppet walks along the road and must play the same game at each location; for example, she must do three puzzles. As the puppet moves along the road, the first location becomes past time. While the puppet is playing at the second location, the present time, the experimenter asks a where-question. In control conditions with explicit adverbials (before, now, later), the children did fine. In the test conditions, tense was marked on the auxiliary (where was/is the puppet doing a puzzle?). Two-year-olds correctly distinguished past and present only when the action played out at the first, past-time location had run to completion. In the other condition, where the situation at the first location was incomplete, they did not differentiate the two tenses. Wagner concludes that children seem to take past forms to entail completion, thus supporting the Grammatical Aspect-First hypothesis. There is partial support from Van der Feest and van Hout (2002) with Dutch learners. The problem with the Grammatical Aspect-First hypothesis is the same learnability puzzle pointed out above for the Primacy of Aspect theories (section 25.4): what is the trigger for the child to re-map the forms onto their target meanings?
A different perspective on grammatical aspect learning derives from a theory about semantic complexity. The essence is that certain combinations of lexical and grammatical aspect are semantically simpler than others; for example, telic plus perfective is simpler than telic plus imperfective. Telic predicates implicate a culmination moment; hence a telic eventuality description creates an expectation of culmination. Perfect and perfective aspect are compatible with this expectation, but imperfective and progressive aspect cancel the culmination expectation. With the former combination, lexical and grammatical aspect thus line up in a transparent way, perfect(ive) aspect preserving the temporal character of a telic event description. In contrast, with a telic plus imperfective combination, there is an aspectual shift which "takes away" the culmination moment; the event description is changed by the semantic operation of aspect shift or aspect coercion (de Swart 1998; see section 25.4). Van Hout (2007b, 2007c, 2008) posits that the semantics of simple semantic operations is acquired early, and hypothesizes that children initially avoid aspect coercion and aspect shift.

This theory of Semantic Complexity predicts an asymmetry in the development of perfective and imperfective aspect as related to telicity. For telic predicates, perfective aspect is acquired earlier than imperfective aspect; telic plus imperfective is semantically more complex, because aspect shift or coercion takes away the final event boundary. There is a further, mirror-image prediction for atelic predicates: imperfective aspect is acquired earlier than perfective aspect. An atelic plus perfective combination is more complex, because aspect shift or coercion must add a final boundary which is not implicated in an atelic predicate's meaning.

While the latter prediction has not been investigated, several studies were designed to test the former prediction. Moreover, many other aspect studies contain relevant data. The relevant paradigms use complete situations in which an event reaches its natural culmination versus incomplete or ongoing versions of the same event, capitalizing on the completion entailment of telic plus perfect(ive), and the lack of this entailment for telic plus imperfective (section 25.2). Comprehension studies with Dutch, Polish, and Italian learners support the theory (van Hout 2005, 2007a, 2007b, 2008): children were target-like on perfect and perfective aspect, choosing the complete situation; furthermore, they were overly liberal with imperfective aspect or tense, choosing ongoing as well as complete situations. The fact that this effect held cross-linguistically shows that the Semantic Complexity theory is independent of the language-specific encoding of these aspects, be it as a dedicated morphological category in Polish, as a tense form in Dutch, or as an aspectual tense in Italian.

There are many more results in the aspect comprehension literature which show that learners have acquired the completion entailment of perfective or perfect aspect at an early age (2;6 or 3, depending on the study). Furthermore, some of these studies point to imperfect comprehension of imperfective aspect. For example, Russian and Greek learners do not always accept imperfective aspect for incomplete situations, even at the age of 4–6, whereas adults do (Kazanina and Phillips 2007; Gagarina 2008; Konstantzou et al. 2013).5 Table 25.6 presents an overview of studies which tested interpretation of telic predicates with perfective–imperfective aspect, using a relevant form that encodes this distinction in the language under investigation.
All studies used a paradigm which contrasted complete situations with ongoing or incomplete situations. I have indicated the youngest age at which the completion entailment of perfective aspect is in place and the age at which children still gave a non-targetlike interpretation of imperfective aspect. As a very rough criterion, I used 70 percent targetlike interpretation as a cut-off point. The overview shows that there is quite some variation in the age at which the completion entailment of perfective is acquired. Even so, targetlike interpretation of perfective comes earlier than of imperfective, as predicted by the Semantic Complexity hypothesis. All studies establish the result that the interpretation of imperfective aspect is still not targetlike at the ages of 5 and 6. However, it is difficult to draw general conclusions about the acquisition of perfective–imperfective aspect on the basis of this overview, as different studies used different paradigms, tasks, verbs, and aspect forms.

The Semantic Complexity theory has been tested in comprehension, but it also extends to production (section 25.4). Do children produce their aspects or aspectual tenses in a targetlike manner, using all possible combinations, or is there a similar

5  However, Kazanina and Phillips (2007) also found that Russian children accepted past-imperfective forms for incomplete situations when there was an explicit adverbial (e.g. while the boy was watering the flowers, the girl was cleaning the table). They conclude that children initially need help determining the intended interval against which they are supposed to evaluate imperfective aspect.

Table 25.6 Summary of aspect comprehension studies across languages
Age PERF: youngest age at which the completion entailment of perfective aspect is in place.
Age IMP: oldest age investigated at which imperfective is still non-target-like.

Language  Study                                           Ages tested  Age PERF  Age IMP
Basque    García del Real and Ezeizabarrena 2011          5;0–5;11     5a        —a
Chinese   Li and Bowerman 1998                            —            —         6b
Finnish   Weist et al. 1991                               2;6–6;6      —         6;6c
Dutch     van Hout 2007a, 2007b, 2008                     3–5          3d        5
English   Matsuo 2009                                     1;6–6;8      5e        4;7
          Wagner 2002                                     1;11–5;7     5e        5
          Weist et al. 1991                               2;6–6;6      2;6f      2;6f
Italian   van Hout and Hollebrandse 2001; van Hout 2008   3;0–6;1      4         5
Spanish   García del Real and Ezeizabarrena 2011          5;0–5;11     5a        5
          Hodgson 2003                                    3;0–8;0      5         6
Greek     Delidaki 2006                                   3;0–6;5      6;5       6;5
          Konstantzou 2014                                4;0–6;11     4         6
Polish    van Hout 2005, 2008                             2;0–4;11     3         5
          Weist et al. 1991                               2;6–6;6      2;6f      2;6f
Russian   Gagarina 2008                                   3;0–6;11     3         5
          Kazanina and Phillips 2007                      2;10–6;9     3         6
          Stoll 1998                                      2;0–6;11     2;6g      —g
          Vinnitskaya and Wexler 2001                     3;0–6;5      3         3

a This study did not test any children younger than 5. These 5-year-olds were on target for both PERF and IMP.
b This reflects the data for their lexical verb class categories Accomplishment Resultative and Locative.
c Not even the oldest Finnish learners of 6;6 interpreted accusative versus partitive case to appropriately distinguish complete versus incomplete situations.
d The Dutch 3-year-olds correctly rejected the perfect for incomplete situations by 63%, which was different from chance, but far from ceiling.
e The three studies with English learners find a sharply different age at which the completion entailment of perfective aspect seems to be in place (in English, -ed in the simple past or in a prenominal participle). In contrast to Weist et al. (1991), Wagner (2002) and Matsuo (2009) only showed the object at the end of the event, not the agent of the action. The presence of agent-oriented information may be required by the younger children to be able to succeed in this task (Wagner 2002).
f Weist et al. conclude that the English and Polish learners in their study with a forced-choice selection task correctly differentiate the perfective–imperfective distinction at the age of 2;6. However, since no percentages correct are given for each aspect, it is impossible to tell from these data how firmly these young learners have acquired the completion entailment of the perfective, nor can one say anything about the extent to which they still have problems with the imperfective.
g Stoll (1998) only tested perfective forms, no imperfective forms.

asymmetry as in comprehension? The absence of certain combinations in children's spontaneous speech, as established by the Aspect-First pattern, points to an asymmetry: children seem to avoid the combinations of telic plus imperfective or progressive aspect, and atelic plus perfect or perfective aspect. However, there is another twist to the Semantic Complexity theory. Modeling aspect in bidirectional Optimality Theory, Van Hout (2007a, 2007b, 2008) argues that children initially apply unidirectional optimization (de Hoop and Krämer 2005; Hendriks and Spenader 2005). On the one hand, this hypothesis explains why children have non-optimal comprehension of telic predicates plus imperfective aspect, as it involves aspect shift or coercion. On the other hand, it expects children to be targetlike in their production, thus predicting a surprising comprehension–production asymmetry (production ahead of comprehension).

On this account (Horn 1972), the quantifier some has a lower-bound semantics ("at least some and possibly all") that is upper-bound by the implicature. Evidence for the lower-bound semantics comes from the fact that the implicature can be explicitly canceled (e.g. the speaker in (1a) may continue: "In fact, Mary ate all of the cakes.") without logical contradiction. The rest of the examples above work in similar ways: (2b) rests on the scale <and, or> and (3b) on the scale <four, three, …> (see Horn 1972 and 1984 for more examples). In some cases, SIs can be derived from non-logical scales that are based on contextual information (Hirschberg 1985). For instance, the child's response in (4) implicates that she did not complete the action, based on a temporal scale:

(4) Mother: Did you make the fruit salad?
    Child:  I peeled the fruit.
Examples such as (1)–(4) have been studied extensively in philosophical and linguistic work on verbal communication, and several influential proposals about scalar inferences have been developed beyond Grice's original theory (Horn 1972, 2005; Harnish 1976; Gazdar 1979; Geurts 1998, 2010; Chierchia 2004; Sauerland 2004; Fox 2007; Spector 2007; Chierchia et al. 2009). Three proposals are worth mentioning in some detail. On certain neo-Gricean accounts, SIs based on logical scales such as those in (1b)–(3b) become stored in the lexicon and are available immediately every time the weak scalar term is accessed, only to be canceled later in case they are not supported by context (Levinson 2000). Other recent accounts take the generation of SIs to be a grammatical process contributing to truth-conditional content (Chierchia 2004; Chierchia et al. 2009): this grammatical account shares with Levinson's (2000) neo-Gricean model the assumption

1  We omit reference to the speaker's epistemic state from (1b) for simplicity.

614    Anna Papafragou and Dimitrios Skordos

that logical scales are part of the lexicon and are activated every time a weak scalar item is encountered. Relevance theory proposes that all pragmatic inferences, including implicatures, are guided by a general tradeoff between the projected cognitive gains from computing an inference and the amount of cognitive effort necessary to derive it (Sperber and Wilson 1986/1995; see also Carston 1995; Noveck and Sperber 2007). A crucial difference between these three accounts is that Relevance theory sides with traditional Gricean accounts in treating SIs as products of context-driven inference (cf. also Horn 1989), whereas Levinson's (2000) neo-Gricean account and the grammatical view hold that SIs from logical expressions are generated by default, independently of contextual input.

From a developmental perspective, SIs raise several questions about both the nature of early conversational inferences and the mechanisms whereby contextual inferences are computed: Do children generate pragmatic inferences? If not, how do they recover and become pragmatically savvy adults? Are the mechanisms underlying linguistic pragmatics fundamentally the same across children and adults? Moreover, if one assumes continuity across the two populations, which aspects of pragmatic ability actually develop? Finally, how do different scalars contribute to the derivation of SIs? Within semantic theory, there has been considerable discussion of the similarities and differences in the semantic contribution of different expressions which all seem to give rise to scalar effects (Sadock 1984; Horn 1992; Koenig 1993; Carston 1995; and section 26.4 below). Do different scalar expressions follow distinct acquisition paths? Even though these issues are far from theoretically or empirically settled, they have been the topic of intensive recent experimentation and theorizing.

26.3  SI Processing in Children

26.3.1 Developmental Evidence

Are SIs psychologically real for children (and adults)? Several findings from early studies designed to investigate children's knowledge of quantification and propositional connectives bear on this question. For example, Smith (1980b) observed that preschool children, who had mastered many of the syntactic aspects of quantifiers like some and all, were likely to respond affirmatively to questions such as "Do some birds have wings?" (i.e. they were likely to interpret some as "at least some and possibly all"). In another study, Braine and Rumain (1981) found that adults tended to weakly favor an exclusive interpretation of the disjunction operator or (i.e. they tended to interpret p or q as "either p or q but not both"); however, 7- and 9-year-old children favored a logical/inclusive interpretation of disjunction on which p or q is interpreted as "p or q and possibly both" (see also Paris 1973). However, these early studies mostly focused on the logical meaning of quantifiers/disjunction, and the significance of these findings for children's pragmatic development was largely overlooked.

Scalar Implicature   615

The first study to point out the relevance of this prior work and systematically investigate children's understanding of SIs was Noveck (2001). That study examined comprehension of the modal term might in 5-, 7-, and 9-year-olds, as well as adults, using a scenario that involved reasoning about the contents of a covered box. In a crucial trial, participants had to say whether they agreed or not with the statement "There might be a parrot in the box" when the evidence made it clear that there had to be a parrot in the box. Notice that the target statement was logically true (since might is compatible with have to) but pragmatically infelicitous (since the use of might could be grounds for excluding have to). Noveck found that adults tended to disagree with the statement, while children of all three age groups overwhelmingly agreed with it. In another experiment, Noveck tested French-speaking children's and adults' comprehension of the existential quantifier certains 'some.' He asked participants whether they agreed or not with statements of the form "Some elephants have trunks." He found that 8- and 10-year-old children typically treated certains logically, as compatible with tous 'all,' whereas adults were equivocal between the logical and the pragmatic interpretations.

As Noveck (2001) pointed out, these data cohere with results from earlier studies from the 1970s and 1980s (Smith 1980b; Braine and Rumain 1981; among others). Taken together, these results strongly indicate that children appear to be more logical than adults in reasoning tasks that involve the use of quantity terms. Specifically, it seems that otherwise linguistically competent children are oblivious to pragmatic inferences from the use of scalar terms such as modals and quantifiers.
What is the nature of children's failure with the calculation of scalar implicatures? One possibility is that this failure reflects a genuine inability to engage in the computations required to derive such implicatures. Another possibility is that the failure is due to the demands imposed by the experimental task on an otherwise pragmatically savvy child. As Noveck (2001) said, "[t]he tasks described here, which are typical of those found in the developmental literature, demand no small amount of work as they require children to compare an utterance to real world knowledge. This might well mask an ability to perform pragmatic inferencing at younger ages" (2001: 184). Under the second, but not the first, of these alternative hypotheses, children's ability to derive scalar implicatures could improve under certain experimental circumstances.

Evidence in support of the second hypothesis comes from a set of studies by Chierchia et al. (2001) and Gualmini et al. (2001). These studies investigated preschoolers' interpretation of the disjunction operator or. Extending and confirming previous studies, the authors found that adults were sensitive to the implicature of exclusivity from the use of disjunction (e.g. they interpreted an utterance such as "Every boy chose a skateboard or a bike" as excluding the possibility that both a skateboard and a bike were chosen); children, by contrast, often showed virtually no sensitivity to the exclusive reading of disjunction. Crucially, children were able to distinguish between a stronger statement containing the conjunction operator and and a weaker scalar statement containing the disjunction operator or (Chierchia et al. 2001). Specifically, when children were presented with two statements (produced by two puppets) and were asked to reward the puppet who "said it better," they overwhelmingly chose to reward the puppet who produced a stronger/more informative statement with and ("Every farmer cleaned a horse and a rabbit") over a puppet who offered a weaker/less informative statement with or ("Every farmer cleaned a horse or a rabbit") in a context that made the stronger statement true.

More recently, Ozturk and Papafragou (2014) obtained similar results with modals. In one of their experiments, children (and adults) overwhelmingly preferred logical interpretations of the modal may in a reasoning task that involved locating a hidden animal; for instance, both groups accepted the statement "The cow may be in the orange box" in a context in which, according to the available evidence, the cow had to be in the orange box. A second experiment found that, if given a choice between two statements in the very same context, both adults and 5-year-olds preferred statements with have to over statements with may.2 Taken together, these studies show that children have knowledge of the relative information strength of sentences with or versus and, or may versus have to, and they use information strength as the basis of their preference for stronger sentences over weaker ones in circumstances which make the stronger sentences true. Thus it seems that a critical prerequisite for calculating SIs (namely, recognizing strength differences between otherwise identical conversational contributions) is in place in young children. What appears to be problematic for children is the step of generating and comparing stronger alternatives to a weak statement on-line in cases where the alternatives are not explicitly presented to them.
More direct evidence for the conclusion that children are not entirely insensitive to pragmatic inferences but may be able to compute SIs under certain circumstances is provided by Papafragou and Musolino (2003). In one study, the authors tested Greek-speaking 5-year-olds and adults on three types of scales: the quantificational scale <all, some>, the numerical scale <three, two>, and the aspectual scale <finish, start>. Their method was a variation of the Truth Value Judgment (TVJ) task (Crain and McKee 1985) called the Acceptability Judgment task: participants watched a series of acted-out stories along with a puppet. At the end of the story, the puppet was asked to say "what happened" and participants had to say whether the puppet "answered well." On critical trials, the puppet produced a true but underinformative statement (e.g. "Some/Two of the horses jumped over the fence" in a story in which every horse in a group of three horses jumped over a fence). It was found that 5-year-olds were much more likely than adults to accept the logically true but pragmatically infelicitous statements. Papafragou and Musolino (2003) hypothesized that this adult–child difference might be due to the difficulty of reconstructing the experimenter's goal in this task: in asking whether the puppet "answered well," children (unlike adults) may have been more likely to base their

2  Other evidence shows that, even in the absence of background evidence, children can differentiate between stronger and weaker scalars. For instance, if presented with two conflicting statements about the possible location of an object ("The x may be under the cup" versus "The x must be under the box"), children typically choose the location associated with the stronger expression (Hirst and Weil 1982; Byrnes and Duff 1989; Moore et al. 1989; Noveck et al. 1996).

judgments on truth rather than on pragmatic infelicity. To test this hypothesis, in a second study Papafragou and Musolino modified their procedure in several ways. First, to enhance awareness of the goals of the experiment, they initially trained children to detect pragmatically anomalous statements produced by a "silly puppet" (e.g. children were encouraged to say that the statement "This is a small animal with four legs" is "silly" and that the puppet should simply say "This is a dog"). Second, to ensure there was a salient informativeness threshold, the experimental scenarios and test question were modified to focus on a character's performance in a task (e.g. in one of the stories, Mr Tough brought back three horses that had run away; when asked how Mr Tough did, the puppet gave the response "He caught some/two of the horses"). Under these conditions, 5-year-olds were more likely to compute scalar implicatures, even though still not at adultlike levels. These data show that, given contextual support, children show some ability to spontaneously generate SIs; furthermore, these results leave open the possibility that different experimental manipulations might reveal higher success with pragmatic inferences.3

Building on Papafragou and Musolino (2003), other studies have confirmed the role of training and context in older children's ability to calculate SIs. Guasti et al. (2005) showed that Italian-speaking 7-year-olds accepted underinformative but true statements of the type "Some giraffes have long necks" more often than adults when the statements were presented out of context (thereby replicating Noveck 2001). However, when the same statements were preceded by training in rejecting infelicitous statements, children behaved exactly like adults (even though the effects of training did not persist when the children were re-tested a week later).
618    Anna Papafragou and Dimitrios Skordos

Similarly, when underinformative statements were embedded within a story rather than presented out of context, children typically generated SIs at adultlike levels. Taken together, these studies make it increasingly clear that young children have the ability to make pragmatic inferences; however, they are limited in doing so by the cognitive resources they bring to bear on the process of utterance interpretation (see section 26.3.2 for details).

A particularly striking conclusion from this work is that the type of task used to elicit pragmatic responses affects children’s success with scalar inferences. Notice that most or all of the early studies documenting children’s limited awareness of SIs primarily relied on judgments about the acceptability of weak scalar expressions in contexts in which a stronger term is warranted. These tasks, however, differ from the actual circumstances in which SIs are computed during naturalistic conversations in several respects (see Papafragou and Tantalou 2004 for related discussion). First, experimental conditions did not make it clear whether (or why) SIs should be considered part of what the speaker actually intended to communicate. In ordinary cases of intentional communication, the speaker intends the addressee to compute the implicature (and further intends the addressee to recover this intention; cf. Grice 1989). But in the experimental designs discussed so far, the computation of SIs was not similarly constrained by the speaker’s intention. In some experiments (Papafragou and Musolino 2003), a “silly” puppet uttered an underinformative statement (probably because of incompetence) and might not have noticed that the statement carried the potential for conveying an SI; in others (Noveck 2001), underinformative statements appeared out of context and therefore invited participants to reconstruct a possible situation in which they could have been uttered by an actual communicator. In short, previous tasks measured children’s sensitivity to potential implicatures in an effort to approximate their performance with actual (communicated) implicatures.

Second, previous tasks typically involved situations in which an utterance containing a scalar term (e.g. “Some of the Xs Ved”) semantically conveyed a true proposition (e.g. “Some and possibly all of the Xs Ved”) but carried a (potential) implicature that was false (“Not all of the Xs Ved”). To perform correctly in these tasks (i.e. to reject the statement), hearers had to take the implicature (rather than simply the proposition expressed) as the basis for their assent/dissent with the target statement. In other words, participants had to estimate the experimenter’s goal in setting up the task. This step is non-trivial, especially since, in the absence of cues about whether a logical or a pragmatic response is required, either type of response is acceptable. In fact, adults, who are otherwise able to compute SIs, when presented with underinformative statements (e.g. “Some airplanes have wings”) without supporting linguistic or extralinguistic context, agree with the statements about half of the time (Braine and Rumain 1981; Noveck 2001; Guasti et al. 2005; cf. Papafragou and Schwarz 2006).

3  Papafragou and Musolino (2003) also discovered that 5-year-olds treated numbers differently from quantifiers in the generation of SIs. We postpone discussion of this finding and its theoretical significance until section 26.4.
When examined closely, this response pattern turns out to be due to the fact that adults consistently select either the logically true meaning of the utterance or the pragmatically enriched interpretation (Guasti et al. 2005).4 Given adults’ mixed pattern of responses, children’s tendency to accept underinformative statements in the same environments cannot offer solid evidence of indifference to pragmatic meaning.

4  When judgment tasks either explicitly (Noveck 2001, Exp. 2) or implicitly (Ozturk and Papafragou 2014, Exp. 1) encourage logical responses, adults overwhelmingly accept weak scalar statements (i.e. they appear to disregard pragmatic unacceptability).

Relatedly, when children are asked to evaluate logically true but pragmatically underinformative statements in a binary (Yes–No) task, they may notice pragmatic infelicity but not penalize it as heavily as logical falsehood. Evidence supporting this hypothesis comes from judgment tasks that elicited more fine-grained responses to scalar statements. When 6- to 7-year-olds were asked to offer a “small,” “big,” or “huge” strawberry as a reward to a speaker depending on how good the speaker’s responses were, children rewarded fully informative responses by giving the speaker 85 (out of 100 trials) “huge” strawberries, underinformative ones by giving 89 “big” strawberries, and false responses by giving 95 “small” strawberries; crucially, children of this age massively accepted underinformative statements in a standard Yes–No judgment task (Katsos and Bishop 2011). Thus what appears to be insensitivity on the part of children to violations of informativeness might be better explained as tolerance towards underinformative statements in a binary task. These data reinforce the suspicion that the observed adult–child differences in pragmatic judgment tasks might be at least partly due to different choices in the way task requirements and goals are understood by young and more mature communicators (for instance, whether the task is interpreted as targeting logical versus pragmatic responses, or whether logical falsehood should be treated on a par with pragmatic infelicity). Interpreting task demands seems less relevant to the processes underlying pragmatic processing in ordinary conversation but appears tied to metalinguistic awareness, an ability that is known to develop gradually over the school years (Ackerman 1981). These observations, taken together, suggest that the family of judgment tasks, even though useful as an initial tool in exploring awareness of SIs, may underestimate young children’s ability to compute implicatures “in the wild.”

More recently, several studies have explored different methods of evaluating children’s early pragmatic abilities. In one such study (Papafragou and Tantalou 2004), Greek-speaking children were shown acted-out stories in which animals were asked to perform different tasks. After spending some time “off-stage,” each animal was asked whether it performed the task and would sometimes answer with underinformative statements (e.g. Experimenter: “Did you color the stars?,” Animal: “I colored SOME”). The reasoning was that, if children interpreted the animals’ answer pragmatically, they would arrive at the conclusion that the animals had failed the task and would not give them a reward; but if children interpreted the statement logically, they would conclude that the animals might well have performed the task and should be rewarded. Children overwhelmingly interpreted underinformative statements pragmatically (i.e. they denied the animals the reward). Furthermore, children’s own reports as to why they had withheld the reward correctly made reference to the stronger scalar alternative (e.g.
“The animal did not color ALL the stars”). Interestingly, children succeeded in computing SIs from non-logical scales alongside the more standard quantificational scales (e.g. Experimenter: “Did you eat the sandwich?,” Animal: “I ate THE CHEESE”). Other work has confirmed that this more naturalistic method leads to higher rates of pragmatic responding than judgment tasks (Papafragou 2006; cf. also Verbuk 2006b for evidence that SIs are selectively generated in such question–answer pairs only when relevant).

In another study, Pouscoulous et al. (2007) tested French-speaking children’s ability to generate SIs spontaneously, that is, without the use of the training that had been employed in earlier work (Papafragou and Musolino 2003; Guasti et al. 2005). A first experiment used a truth-value judgment (TVJ) task: 9-year-olds and adults were shown a series of boxes with animals in or next to them and were asked to evaluate statements such as “Some elephants are in the boxes.” Some of the statements were both true and felicitous, some were true but infelicitous, and some statements were false. It was found that semantically competent 9-year-olds were overwhelmingly logical in their responses to infelicitous statements, while adults were mixed. A second experiment used a similar context but an action-based task. Participants were shown a series of boxes containing two, five, or zero tokens and heard a statement (e.g. “I want some/all/no boxes to contain a token”). Participants were expected to either add or remove tokens from the boxes, or leave the boxes unchanged, so as to satisfy the experimenter’s wish. In critical trials, a some-statement was uttered when all of the boxes contained a token. Participants’ actions (i.e. whether they would leave the boxes intact or remove a token from at least one box) would reveal whether they had generated a logical or a pragmatic interpretation of the statement. Under these circumstances, even 5- and 7-year-olds generated SIs with regularity, and adults became more pragmatically oriented compared to Pouscoulous et al.’s (2007) first experiment.

Not all paradigms that lack a judgment component reveal children’s success with SIs. Using a visual world task in which sets of objects were divided among different characters in the display, Huang and Snedeker (2009a) monitored participants’ eye movements to test on-line processing of scalars. In one of their trials, 5-year-olds and adults were asked to “point to the girl that has some of the soccer balls” while inspecting a scene that contained a girl with two soccer balls, a boy with two soccer balls, a girl with three socks, and a boy with nothing. If participants calculated the SI, they should be able to resolve the temporary ambiguity in the noun (sock … s versus socc … er balls) and infer that the sentence refers to the girl with the subset of soccer balls shortly after the onset of the quantifier some. Although adults generated SIs and used them to identify the correct girl shortly after hearing some, children failed to do so and had to wait until the disambiguating phoneme in the noun was heard to identify the referent. Further experimentation showed that children did not seem to be slowed down by contexts in which the SI was subsequently violated—a fact supporting the conclusion that children did not calculate the SI in the first place. It remains an open question how to properly account for children’s failures in these studies.
One possibility is that children failed to reconstruct the set–​subset relationship between soccer balls in the display, despite the experimenters’ efforts to highlight these relationships at the beginning of each trial when objects were distributed among characters. Another possibility is that children (and perhaps to an extent adults) initially resisted applying some to scenes with only two referents, which might have been more felicitously described by a numeral. The infelicity of some may have been heightened by the fact that other trials in Huang and Snedeker’s design involved numerals (“Point to the boy that has two of the soccer balls”); on these trials, both children and adults succeeded in quickly disambiguating the referent (see also section 26.4 below). Small modifications of Huang and Snedeker’s procedure that involved, among other things, removing numerally quantified statements from the trials have been shown to greatly increase the speed with which adults calculate the inference from some (Grodner et al. 2010;5 cf. Huang and Snedeker 2009b). It remains to be seen whether similar manipulations might affect children’s performance in the same task.

5  Grodner et al. (2010) used the term summa ‘some of’ instead of some, with alla and nonna being the other contextually available alternatives.

26.3.2 Pragmatic Inferences: What Develops?

As already discussed, children are not entirely insensitive to pragmatic inferences. Nevertheless, the data surveyed in the previous section present a complex pattern of early successes and failures with SIs. How can this pattern be explained? Based on the results reported, we propose that part of the answer lies with children’s difficulty in accessing and integrating different premises during the computation of speaker meaning—more specifically, the difficulty in inferring expectations of informativeness/relevance and evaluating a linguistic stimulus with respect to other possible alternatives that the speaker could have selected (cf. Papafragou and Tantalou 2004; Papafragou 2006).

Recall that, according to the standard Gricean model, the calculation of an SI requires the hearer to go through the following steps (cf. section 26.2):

(i) The speaker has uttered a sentence with a weak scalar item.
(ii) The speaker has thus violated the Quantity maxim, since he/she chose a relatively weaker term from among a range of items logically ordered in terms of informational strength.
(iii) Assuming that the speaker is trying to be cooperative and will say as much as he/she truthfully can that is relevant to the exchange, the fact that he/she chose the weaker term is reason to think that he/she is not in a position to offer an informationally stronger statement.
(iv) Thus, as far as the speaker knows, the stronger statement does not hold.

To perform these calculations, the hearer needs to access a set of ordered alternatives to the weak scalar expression used by the speaker (Step ii); consider whether any of the stronger members of the ordered set are relevant and true (Step iii); if so, negate the stronger alternative (Step iii); and add the negated proposition to what the speaker intended to communicate (Step iv). Children’s failure to compute SIs in previous tasks seems tied to the process of generating stronger relevant alternatives to a weak statement (Steps ii and iii).
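The Gricean steps above can be sketched as a toy procedure. This is a hypothetical illustration only: the scale, the relevance check, and all function names are our own simplifying assumptions, not a model from the acquisition literature.

```python
# Toy sketch of the Gricean derivation (steps i-iv); hypothetical code.

# Step (ii): scalemates logically ordered from weak to strong.
QUANTIFIER_SCALE = ["some", "most", "all"]

def scalar_implicature(term, scale, speaker_could_assert):
    """Return the implicated (negated) stronger alternatives.

    speaker_could_assert(alt) -> True if the stronger alternative is
    relevant and the speaker is in a position to assert it.
    """
    position = scale.index(term)      # step (i): a weak scalar was uttered
    stronger = scale[position + 1:]   # step (ii): its stronger scalemates
    # Steps (iii)-(iv): a cooperative speaker who avoided a stronger,
    # relevant alternative licenses the negation of that alternative.
    return [f"not {alt}" for alt in stronger if not speaker_could_assert(alt)]

# Hearing "Some of the horses jumped" when "all" was relevant but unasserted:
print(scalar_implicature("some", QUANTIFIER_SCALE, lambda alt: False))
# prints ['not most', 'not all']
```

The developmental claim in the text can then be restated in these terms: children command the ordering in `QUANTIFIER_SCALE` early on, but spontaneously constructing the `stronger` set of relevant alternatives is the fragile step.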
We know that, if alternative scalar statements are explicitly supplied, children consistently prefer the stronger alternative to the weaker one (Chierchia et al. 2001 on or versus and; Ozturk and Papafragou 2014 on may versus have to). Thus children seem to have access to information about the logical properties of different scalar items, more specifically, the relative ordering of these items in terms of logical entailment (e.g. they recognize that and is stronger than or). The ability to appreciate the informational ordering of scalar items is a crucial prerequisite to Step (ii) above; nevertheless, this ability does not guarantee that children can spontaneously compute relevant scalar alternatives in the course of conversation. In fact, as shown by the studies reviewed in the previous section, the ability to consider stronger relevant alternatives is fragile: children failed to compute SIs when asked to judge weak scalar statements without specific cues about whether judgments should target pragmatic felicity (for which the target statement needs to be compared to potential alternatives that the speaker could have uttered) or logical truth (for which the presence of alternatives is irrelevant; Gualmini et al. 2001; Noveck 2001; Papafragou and Musolino 2003; Guasti et al. 2005). In experiments that clearly targeted pragmatic felicity and/​or boosted the relevance of the stronger alternative, children’s performance with SIs improved (Papafragou and
Musolino 2003; Papafragou and Tantalou 2004; Guasti et al. 2005; Papafragou 2006; Pouscoulous et al. 2007).

If this line of reasoning is correct, children’s difficulty in identifying relevant alternatives to what the speaker said should extend beyond SIs to other environments involving alternative-generation. This prediction is borne out (Barner et al. 2011): when 4-year-olds were presented with three sleeping animals and asked whether “some/only some of the animals are sleeping,” they responded affirmatively about 66 percent of the time regardless of the form of the question. Children’s failure to respond with No to the question with bare some is not surprising, given that SIs typically do not arise in questions. But children’s failure to respond with No to the question with only reveals that children have difficulty generating scalar alternatives even when the generation of alternatives is triggered by the grammar (since the focus element only grammatically requires the generation—and negation—of relevant alternatives). Crucially, when members of the set of animals were explicitly identified within the same displays (thereby making the set of relevant alternatives more salient), children’s performance with only improved dramatically: when asked whether “only the cat and the dog are sleeping,” children correctly gave No-responses 86 percent of the time. When members of the animal set were explicitly identified but the grammatical need to generate alternatives was eliminated (i.e. when children were simply asked whether “the cat and the dog are sleeping”), they accurately responded with an affirmative answer 93 percent of the time. At present, the extent to which children can monitor linguistic alternatives in different contexts in accordance with informativeness/relevance expectations remains largely unknown (although see Skordos and Papafragou to appear).
Some evidence, however, suggests that the ability to consider contrastive alternatives is not only within young learners’ reach but is, in fact, very active in language acquisition (see Papafragou and Tantalou 2004). For instance, 2-year-olds can use the fact that an adult used a novel word (e.g. dax) rather than a known word (e.g. car) in a context that contains a car and a novel, unlabeled object to infer that the novel word must refer to the novel object (Carey 1978b; Halberda 2006). This inference stems from the assumption that word (especially noun) meanings are mutually exclusive or contrastive (Clark 1987, 1988; Markman 1989). The assumption of mutual exclusivity/contrast is suspended for words that belong to different levels of description or to different languages (Au and Glusman 1990; Diesendruck 2005), presumably because in these cases the known label is not considered a relevant alternative to the novel label (Barner et al. 2011). Inferences driven by mutual exclusivity/contrast are distinct from implicatures in several ways; nevertheless, both involve calculating speaker meaning (including speaker intention) and both require holding in mind and negating lexical alternatives to an expression used by the speaker. For these reasons, some researchers consider these processes in early word learning to be essentially Gricean in nature (Gathercole 1989; Clark 1990; Diesendruck and Markson 2001). Even though the precise affinities between the mechanisms underlying mutual exclusivity/lexical contrast and SIs remain open, the fact that young children successfully consult known lexical alternatives in conjecturing
meanings for novel words bolsters the conclusion that children’s difficulties with SIs are not due to a complete inability to reason about conversational pragmatics.

How do children come to organize lexical alternatives in the form of scales and use such alternatives to compute SIs? Obviously, the first step for children is to acquire the lexical semantics of individual scalar expressions. This step might take place quite early in development: for instance, very young children seem to know that some and all refer to distinct set relations (Barner, Chow, and Yang 2009). Beyond this semantic step, the ability to treat terms such as some and all as scalemates requires further learning. The linguistic literature suggests that the knowledge underlying which items form scales and which do not is quite complex: scalemates need to be expressions of equal length and complexity that are syntactically replaceable (Horn 1972; Levinson 2000; Katzir 2007), and scales obey several other constraints (Hirschberg 1985; Horn 1989; Matsumoto 1995).6 To develop sensitivity to what counts as a scalar alternative, children need to acquire these fine-grained restrictions on scales. One cue for grouping scale members together may come from explicit contrasts in adult speech (“You can eat some of the cookies, but not all of them”). Another cue may be offered by the syntactic properties of semantically related lexical items (Barner et al. in press). The ability to access scalar alternatives seems to be undergoing development until at least the age of 7 or 9, since some studies find that children are still not quite adultlike at these ages, especially if presented with scalar statements in isolation (e.g. Noveck 2001). Beyond these broad points, there are several possibilities regarding the precise theoretical and psychological status of scales.

6  According to this literature, synonyms are expected to give rise to similar SIs, since SIs are considered non-detachable (Horn 1972). Children seem to behave in accordance with this constraint (see Papafragou 2006 on inferences from start versus begin in Greek).
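The entailment ordering that defines a logical scale such as < all, some > can be illustrated with a small set-theoretic sketch. This is hypothetical code under simplifying assumptions: the quantifiers are modeled as bare relations between a restrictor set and a scope set, ignoring everything else about their semantics.

```python
# Hypothetical illustration: quantifiers as set relations, and the
# one-way entailment that orders <all, some> by informational strength.

def q_all(restrictor, scope):
    return restrictor <= scope       # every member of the restrictor is in the scope

def q_some(restrictor, scope):
    return bool(restrictor & scope)  # at least one member of the restrictor is in the scope

# "all" entails "some" (for non-empty restrictors): whenever the
# all-statement is true, the corresponding some-statement is true too.
horses = {"h1", "h2", "h3"}
jumped = {"h1", "h2", "h3"}
print(q_all(horses, jumped), q_some(horses, jumped))            # prints True True

# The reverse fails: a true some-statement leaves the all-statement open,
# which is the asymmetry the implicature "some -> not all" exploits.
jumped_partial = {"h1", "h2"}
print(q_all(horses, jumped_partial), q_some(horses, jumped_partial))  # prints False True
```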
Recall that contextualist accounts such as Relevance theory (Sperber and Wilson 1985/1996; Noveck and Sperber 2007) and traditional Gricean/neo-Gricean accounts (Horn 1989) treat all SIs as products of context-driven inference. In contrast, defaultist accounts such as Levinson’s (2000) neo-Gricean account and the grammatical view of Chierchia et al. (2009) hold that SIs from logical expressions are generated by default, independently of contextual input, and later canceled if necessary. These two types of account make distinct commitments about the nature of scales. According to contextualist theories, there is no difference between logical (entailment-based) scales such as < all, some > and encyclopedic/ad hoc contextual scales (see examples (1)–(3) versus (4) in section 26.2): in both cases, stronger scalemates are accessed only when contextually relevant. According to defaultist accounts, logical scales differ in several respects from contextual scales: for instance, the scale < all … some > forms part of the lexical entry of individual quantifiers, and stronger scalemates are accessed automatically every time a weak quantifier is encountered. One might hypothesize that such a scale might be easier to acquire than contextual scales because of the stable logical ordering of its members; furthermore, once established, it might lead to more reliable calculation of SIs
compared to more context-dependent scales. So far no advantage has been detected for logical as opposed to non-logical scales: some studies find no difference between logical and contextual scales in children’s calculation of SIs (Papafragou and Tantalou 2004), whereas others find that contextual scales are easier for children (Katsos and Smith 2010; Barner et al. 2011). Even within the logical class, there are differences in terms of how often children succeed in deriving SIs from expressions belonging to different scales (Noveck 2001; Papafragou and Tantalou 2004; Papafragou 2006; Verbuk 2006b; Pouscoulous et al. 2007; and section 26.4). Taken together, this evidence tentatively supports two conclusions: first, the ability to appreciate what counts as a relevant alternative develops in scale-specific ways; second, the fact that logical scales—unlike contextual ones—involve stable semantic orderings of lexical alternatives does not guarantee that children find it easier to build such scales or draw from them when they need to generate lexical alternatives.

The two theoretical positions sketched in this section also lead to different predictions about the online computation of SIs. According to contextualist theories, since SIs are always inferred as part of the speaker’s intended meaning with more or less cognitive effort depending on the literal meaning of the sentence and the available context, the generation of SIs should be cognitively costly: as a result, more time should be required to process a sentence where an SI needs to be generated than a sentence without an SI. According to defaultist accounts, since (entailment-based) SIs are generated by default and then canceled when necessary, other things being equal, it should take more time to process a sentence requiring the cancellation of an SI than a sentence where the SI is simply generated but not canceled.
Available evidence from adults appears to fall in line with contextualist accounts. In a reaction time study (Bott and Noveck 2004), adults who judged underinformative sentences such as “Some elephants are mammals” to be false took longer than those who judged them to be true. This shows that adults take longer to compute implicatures than to arrive at an utterance’s literal meaning and suggests that additional processes are involved (cf. Rips 1975; Noveck and Posada 2003; but see Feeney et al. 2004). In another study, adults reliably took more time to read sentences in which the generation of the SI was warranted by the context compared to sentences where no SI was warranted (Breheny et al. 2006). Further support for contextualist accounts comes from evidence showing that the amount of cognitive load from a concurrent task affects SI generation: when asked to judge underinformative sentences of the form “Some oaks are trees,” adults who were simultaneously engaged in a relatively easy secondary task were more likely to accept the sentences as true compared to other adults who were engaged in a relatively more demanding secondary task (de Neys and Schaeken 2007). Recently, studies of the time-​course of quantifier processing using a visual world paradigm have also concluded that the generation of SI-​strengthened interpretations of quantifiers involves additional time compared to literal interpretations in both adults and children (Huang and Snedeker 2009a, 2009b; but cf. Grodner et al. 2010 and the end of Section 26.3.1 above).


26.4  The Interface between Semantics and Implicature

As mentioned in the Introduction, a wide variety of expressions in natural language seem to form entailment-based scales. According to traditional accounts of SIs (Horn 1972; Grice 1975), all such scalar expressions should follow the same logic and should give rise to SIs in a similar way. A striking finding from past developmental studies is that not all scalar items are born equal: specifically, numerals seem to behave differently from quantifiers. This finding has important implications for semantic and pragmatic theories of scalars, so it is worth examining in some detail.

The main observation comes from a study by Papafragou and Musolino (2003) already discussed in section 26.3.1. In that study, 5-year-olds were asked to say whether they agreed or not with a puppet’s description of the outcome of a story. The study found that, in a story in which a set of three horses jumped over a fence, children were much more likely to reject the statement “Two of the horses jumped over the fence” than the statement “Some of the horses jumped over the fence.” Further studies showed that the tendency to interpret some as lower-bounded (“at least some”) but a number such as two as exact emerges early during language acquisition: in a sentence-to-picture matching task, 3-year-olds were more likely to select a picture in which an alligator had taken all four of a set of cookies after hearing the sentence “The alligator took some of the cookies” than the sentence “The alligator took two of the cookies” (Hurewitz et al. 2006; see also Barner, Chow, and Yang 2009; Huang et al. 2013). Moreover, as mentioned earlier, online studies of how children and adults process numbers versus quantifiers reveal that SI calculation begins to influence referential processing after a delay in the case of quantifiers but impacts the comprehension process more rapidly for numbers (Huang and Snedeker 2009a, 2009b).
Despite this preference for exact number meanings, both children and adults display flexibility in their interpretation of number terms, adopting, for instance, “at least” interpretations for numerals if the context warrants them (Musolino 2004). Finally, numerals are not the only scalar expressions that behave differently from weak scalars such as some: the proportional quantifier half also seems to preferentially give rise to exact interpretations. For instance, in a story in which a puppet finished building a tower, Greek-speaking children were much more likely to reject the statement “The puppet built half of the tower” than the statement “The puppet began to build the tower” (Papafragou 2006; cf. also Papafragou and Schwarz 2004 for a similar comparison between half and most).

How are these differences between numerals and quantifiers to be explained? One interpretation of these findings is that numbers (and half) have exact semantics, unlike other weak scalar terms such as the quantifier some that have a lower-bounded, “at least” semantics and are pragmatically upper-bounded by an SI (Papafragou and Musolino 2003; Musolino 2004; Hurewitz et al. 2006; Papafragou 2006). According to this view,
the meaning of two is plainly TWO and, depending on a contextual parameter, can yield “exact” or “at least” (and sometimes “at most”) interpretations (see Breheny 2008 for a full semantic proposal). The “exact” view of numbers is compatible with evidence that numbers behave differently from regular, lower-bounded scalars such as quantifiers on a number of linguistic tests (Sadock 1984; Koenig 1991; Horn 1992; Carston 1995, 1998; Breheny 2008). For example, the “at least” and the “exact” interpretations of numbers intuitively belong to the truth-conditional content of numerally modified statements. In the dialogs in (5) and (6) (based on Horn 2004), the cardinal two allows a disconfirmatory response in case the “at least two, possibly more” reading applies. Contrast this with some, where the responder may cancel the SI using an affirmative response:

(5) A: Does she have two children?
    B1: No, she has three.
    B2: ?Yes, (in fact) three.

(6) A: Are some of your friends vegetarians?
    B1: ?No, all of them.
    B2: Yes, (in fact) all of them.

The semantic reanalysis of numerals is consistent with proposals within developmental psychology according to which number words are mapped onto a dedicated magnitude system with exact semantics (a system which represents exact and unique numerosities; Gelman and Gallistel 1978; Gelman and Cordes 2001). On this picture, the processes underlying the acquisition of number words are distinct from the mechanisms responsible for learning and evaluating (non-numerical) quantified expressions. Numbers and quantifiers obey different principles: the numerical scale < …, four, three, two, one >, unlike the quantificational scale < all, most, many, … some >, has an ordering rule that strictly and completely determines the internal structure of the scale and the positioning of its members.
Specifically, the numerical scale is based on the successor function, which yields the next member of the scale through the rule “n + 1.” Children seem to recognize the distinct logic underlying numbers and quantifiers: 2- and 3-year-olds know that numbers such as six but not quantifiers such as a lot apply to unique specific quantities, even if they do not know any number word beyond one (Sarnecka and Gelman 2004). Furthermore, children begin to explicitly memorize the number list before they acquire number meanings, and even 2-year-olds are able to recite at least part of the list (e.g. “one, two, three”; Fuson 1988). The count list is essential for mapping number words onto number meanings. No comparable list exists for quantifiers or other scalars (Hurewitz et al. 2006).

A different interpretation of the quantifier–number asymmetry was recently put forth by Barner and Bachrach (2010). On this proposal, numbers receive a unified semantic analysis with quantifiers and other scalars, that is, they have lower-bounded (“at least”) semantics and are upper-bounded by an SI (see also Levinson 2000). The reason young children successfully compute upper-bounded quantity interpretations from the use of numbers such as two but not from the use of quantifiers such as some is simply that lexical alternatives for numbers are easier to retrieve than for quantifiers: as already

Scalar Implicature   627

pointed out, numbers belong to an explicitly memorized list of scalar alternatives (the count line) which is available to children even before individual number meanings are learned. More precisely, the count line provides easily accessible scalemates which serve to constrain the interpretation of novel numerals (i.e. numerals for which children have not acquired adultlike semantics). On this account, 2-year-olds initially assign a weak, "at least" interpretation to one. Once children acquire a weak ("at least") interpretation for two, they use the contrast with two to set an upper boundary on their meaning of one (i.e. "no more than one"). By this account, assigning one an exact interpretation depends on first acquiring two and deriving "exactly one" via pragmatic inference (and similarly for assigning two an exact interpretation). Thus it is argued that children as young as 2 are able to compute certain types of SIs, since they rely on pragmatic inference to derive exactness in number meanings (but not upper-boundedness in quantifier meanings). One piece of evidence consistent with this account is that very young children use known numerals to constrain the interpretation of unknown numerals: when 2-year-olds who know the meaning of the word one (but no higher number) are shown two sets (for example, one balloon and five balloons), they infer that the word five refers to the set of five objects, despite not knowing its meaning (Wynn 1992; Condry and Spelke 2008). Crucially, 2-year-olds do not use known numerals to restrict known quantifiers such as some (Condry and Spelke 2008) or novel quantifiers such as toma (Wynn 1992).7 At this point, the number data await a final synthesis. We simply note that a semantic account that treats numbers as lower-bounded expressions on a par with weak quantifiers leaves open several questions.
First, this account crucially relies on the assumption that the pragmatic inferences used to fix lexical meaning for unknown numerals are truly comparable to SIs, an assumption which requires further empirical scrutiny (see discussion of mutual exclusivity/contrast in section 26.3.2). Second, it is unclear how a unified semantic account of numbers and quantifiers can explain the fact that the two types of expression seem to make distinct truth-conditional contributions (see (5)–(6) above). Finally, an explanation of children's early success with exact meanings for numbers needs to extend to proportional quantifiers such as half, which nevertheless do not form part of the canonical number line or the standardly rehearsed counting routine. It is plausible that children associate half with its stronger scalemate all by hearing adults explicitly contrast the two terms ("Eat all your vegetables, not half of them"). Whether such contrasts are more frequent and potent than, for example, contrasts between some and all remains to be determined.

7  In Wynn's (1992) account, children consider known (exact) number labels in trying to figure out the meaning of novel number words. This proposal is consistent with the linguistic literature on SIs, which generally takes the alternatives invoked for implicature calculation to be other lexical items the speaker could have used. But in Barner and Bachrach's account (see also Sauerland 2004; Spector 2007), scalar alternatives for numbers are not known number words but pragmatically enriched interpretations of known number words. That is, children use implicature-enriched "exact" interpretations of known numbers such as one contrastively in trying to constrain the meaning of unknown number words such as two. In this sense, implicature calculation for numbers remains different from implicature calculation for quantifiers and other scalars.


26.5 Conclusions

Children's ability to compute scalar implicatures is currently a very active area of research, where developmental data and semantic-pragmatic theorizing mutually inform and constrain each other. The emerging consensus from this literature is that children are capable of deriving pragmatic inferences and that the mechanism for deriving such inferences is broadly similar to the mechanism underlying utterance interpretation in adults. Nevertheless, children's computation of SIs is limited in crucial respects, such as the ability to construct relevant alternatives from the use of a weak scalar expression, which seems to undergo development until well into the school years. An important direction for future work is to integrate the detailed findings on SIs with other findings in developmental pragmatics, including relevance implicatures (Verbuk 2006b; de Villiers et al. 2009; Shulze et al. 2010), pragmatic enrichment (Noveck et al. 2009), reference assignment (Nadig and Sedivy 2002), and various forms of non-literal speech (Vosniadou 1987; Bernicot et al. 2007), on which children's performance is similarly mixed. Future work also needs to relate results on SI calculation in older children to findings from early word learning that indicate (somewhat paradoxically) precocious abilities to interpret communicative intentions in toddlers (Tomasello 1992; Baldwin 1993b; Bloom 2000; cf. also section 26.3.2). A promising hypothesis in integrating this entire body of work is that non-linguistic cues such as eye gaze can be used early and accurately to make basic inferences about the speaker's referential intent in uttering a novel word; nevertheless, the ability to integrate linguistic and non-linguistic information to go beyond the words uttered and infer additional meanings the speaker had in mind (as is the case for SIs and other complex cases of inference) takes time and protracted linguistic experience to develop.
The present data on children's implicature calculation place important constraints on theories of semantic development. Most approaches to word learning to date have not addressed the fact that the interpretation of words is context-dependent or that words in use give rise to conversational inferences. Furthermore, the literature on semantic development sometimes implicitly assumes that pragmatic ("contextually enriched") interpretations are acquired at a later stage than "pure" semantic meaning, and hence that it is possible to study the acquisition of semantics independently from pragmatic development. The results on SI reviewed in this chapter show that this approach is unproductive. First, as shown here, pragmatic interpretations are well within the grasp of young children. Second, semantic content is not transparent but is bundled with pragmatic inference in the input to children: to infer the correct ("at least") meaning for scalar expressions, learners need to process adult uses of scalar items that may carry either lower-bounded or upper-bounded interpretations. Third, and most importantly, a pragmatically informed approach captures important generalizations about how natural-language meaning is structured and learned. In the case of scalars, a single, powerful mechanism is responsible for delivering SIs from a wide variety of expressions

(e.g. some, or, may) that all submit to a unified semantic-pragmatic treatment (lower-bounded semantics with upper-bounding implicatures). Furthermore, this mechanism applies universally: as the previous sections showed, implicatures from weak scalars arise cross-linguistically in much the same way (e.g. English some, Greek meriki, and French quelques all give rise to "not all" inferences). Thus the proper division of labor between semantics and SIs has wide explanatory potential, since it provides a unified account of what it is that learners need to know about scalar expressions in any language. Together these observations suggest that theories of the acquisition of word meaning should take into serious consideration the pragmatic inferences which words give rise to in context. In certain cases (such as number words), the lexical semantics and pragmatics of child language can even provide evidence for the contents of the adult lexicon.

Acknowledgments

Preparation of this chapter was partly supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NIH) grant 5R01HD55498-3 to A.P. We thank Ira Noveck for useful comments.

Part V

THEORIES OF LEARNING

Chapter 27

Computational Theories of Learning and Developmental Psycholinguistics

Jeffrey Heinz

27.1 Introduction

A computer is something that computes, and since modern theories of cognition assume that humans make computations when processing information, humans are computers. What kinds of computations do humans make when they learn languages? Answering this question requires the collaborative efforts of researchers in several different disciplines and sub-disciplines, including language science (e.g. theoretical linguistics, psycholinguistics, language acquisition), computer science, psychology, and cognitive science. The primary purpose of this chapter is to explain to developmental psycholinguists, and language scientists more generally, the main conclusions and issues in computational learning theories. This chapter is needed because:

1. the mathematical nature of the subject makes it largely inaccessible to those without the appropriate training (though hopefully this chapter shows that the amount of training required to understand the main issues is less than what is standardly assumed);
2. the literature contains a number of unfortunate, yet widely cited, misunderstandings of the relevance of work in computational learning for language learning. I will try to clarify these in this chapter.

Figure 27.1  Learners are functions from experience to languages. [Figure: a box labeled "Learner" mapping "Experience" to "Languages".]

The main points in this chapter are:

1. The central problem of learning is generalization.
2. Consensus exists that, for feasible learning to occur at all, restricted, structured hypothesis spaces are necessary.
3. Debates pitting statistical learning against symbolic learning are misplaced. To the extent meaningful debate exists at all, it is about the learning criterion; i.e. how "learning" ought to be defined. In particular, it is about what kinds of experience learners are required to succeed on in order to say that they have "learned" something.
4. Computational learning theorists and developmental psycholinguists can profitably interact in the design of meaningful artificial language learning experiments.

In order to understand how a computer can be said to learn something, a definition of learning is required. Only then does it become possible to ask whether the behavior of the computer meets the necessary and sufficient conditions of learning required by the definition. Computational learning theories provide definitions of what it means to learn and then ask, under those definitions: What can be learned, how, and why? Which definition is "correct" is of course where most of the issues lie. At the most general level, a language learner is something that comes to know a language on the basis of its experience. All computational learning theories consider learners to be functions which map experience to languages (Figure 27.1). Therefore, in order to define learning, both languages and experience need to be defined first.

27.2  Languages, Grammars, and Experience

27.2.1 Languages

Before we can speak of grammars, which are precise descriptions of languages, it will be useful to talk about languages themselves. In formal language theory, languages

are mathematical objects which exist independently of any grammar. They are usually defined as subsets of all logically possible strings of finite length constructible from a given alphabet. This can be generalized to probability distributions over all those strings, in which case they are called stochastic languages. The alphabet can be anything, so long as it is unchanging and finite. Elements of the alphabet can represent IPA symbols, phonological features, morphemes, or words in the dictionary. If desired, the alphabet can also include structural information such as labeled phrasal boundaries. It follows that any description of sentences and words that language scientists employ can be described as a language or stochastic language with a finite alphabet.1 It is useful to consider the functional characterizations of both languages and stochastic languages because they are the mathematical objects of interest to language scientists. As a function, a language L maps a string to one if the string is in the language; all other logically possible strings are mapped to zero. Stochastic languages, as functions, map all logically possible strings to real values between zero and one such that they sum to one. Figure 27.2 illustrates functional characterizations of English as a language and as a stochastic language. The functional characterization of English as a language only makes binary distinctions between well-formed and ill-formed sentences. On the other hand, the functional characterization of English as a stochastic language makes multiple distinctions. In both cases, the characterizations are infinite in the sense that both assign nonzero values to infinitely many possible sentences. This is because there is no principled upper bound on the length of possible English sentences.2 How stochastic languages are to be interpreted ought always to be carefully articulated.
For example, if the real numbers are intended to indicate probabilities of occurrence, then the functional characterization in Figure 27.2 says that "John sang" is twice as likely to occur as "John sang and Mary danced." On the other hand, if the real numbers are supposed to indicate well-formedness, then the claim is that "John sang" is twice as well-formed (or acceptable) as "John sang and Mary danced."3 As explained in the next section, from a computational perspective, the distinction between stochastic and non-stochastic languages is often unimportant. I use the word pattern to refer to both stochastic and non-stochastic languages in an intentionally ambiguous manner.
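To make the functional characterization concrete, here is a toy sketch in Python (the fragment and the particular numbers are illustrative only, not the chapter's): a language and a stochastic language over the same fragment, each represented as a function from strings to values.

```python
# A language maps every string to 1 (in the language) or 0 (not);
# a stochastic language maps every string to a probability.
# Toy fragment in the spirit of Figure 27.2; numbers are illustrative.

FRAGMENT = {
    "John sang": (1, 2.4e-12),
    "John and sang": (0, 0.0),
    "John sang and Mary danced": (1, 1.2e-12),
}

def english_language(s):
    """English as a language: binary well-formedness."""
    value, _ = FRAGMENT.get(s, (0, 0.0))
    return value

def english_stochastic(s):
    """English as a stochastic language: probability of the string.

    (A real stochastic language assigns nonzero values to infinitely
    many strings, summing to one; this finite table only gestures at it.)
    """
    _, p = FRAGMENT.get(s, (0, 0.0))
    return p
```

Both functions have the same domain (all logically possible strings); they differ only in their range, exactly as in the text.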

1  Languages with infinite alphabets are also studied (Otto 1985), but they will not be discussed in this chapter.
2  If there were, then there would be a value n such that "John sang and John sang … and John sang" (with "and John sang" repeated n−1 times) would be well-formed, but the version with "and John sang" repeated n times would be as ill-formed as "John and sang."
3  There is a technical issue here. If there are infinitely many nonzero values, then it is not always the case that they can be normalized to yield a well-formed probability distribution. For example, if each sentence were equally acceptable, we would expect a uniform distribution. But the uniform distribution cannot be defined over infinitely many elements, since the probability of each element goes to zero.

English as a language:
John sang → 1
John and sang → 0
John sang and Mary danced → 1

English as a stochastic language:
John sang → 2.4 × 10⁻¹²
John and sang → 0
John sang and Mary danced → 1.2 × 10⁻¹²

Figure 27.2  Fragments of functional characterizations of English as a language and a stochastic language.

27.2.2 Grammars

Grammars are finite descriptions of patterns. It is natural to ask whether every conceivable pattern has a grammar. The answer is no. In fact, most logically possible patterns cannot be described by any grammar of any kind. There is an analogy with real numbers. Real numbers are infinitely long sequences of digits, and some are unpredictable in an important kind of way: no algorithm exists (nor can ever exist) which can generate the real number correctly up to some arbitrary finite length; such reals are called uncomputable. Sequences for which such algorithms do exist (like π) are computable. More concretely, a real number is computable if and only if a Turing machine exists which can compute the exact value of the real number to any arbitrary degree of precision (and so can always provide the nth digit in its decimal expansion). A Turing machine is one of the most general kinds of computing device, and, by the Church–Turing thesis, Turing machines can instantiate any algorithm. Turing's (1937) discovery was that uncomputable real numbers turn out to be the most common kind of real number, and so most real numbers cannot be computed by any algorithm! Such a result may be initially hard to understand (after all, what is an example of an uncomputable real number?),4 but it is the foundation for the modern study of computation. Like real numbers, most logically possible patterns cannot be described by any Turing machine or other kind of grammar. Grammars are algorithmic in the sense that they are of finite length but describe potentially infinitely sized patterns. In this way, grammars are just like machines or any other computing device. The Chomsky Hierarchy classifies logically possible patterns into sets of nested regions (Figure 27.3). Recursively Enumerable (r.e.)
patterns are those for which there exists a Turing machine which answers affirmatively when asked, for any nonzero-valued string s belonging to the pattern, whether s in fact has a nonzero value (Turing 1937; Rogers 1967; Harrison 1978).5

4  See Chaitin (2004).
5  In contemporary theoretical computer science, the name "computably enumerable" is often used instead of "recursively enumerable." This class is also called "semi-decidable."

Figure 27.3  The Chomsky Hierarchy with natural language patterns indicated. [Figure: nested regions labeled Finite, Regular, Context-free, Mildly context-sensitive, Context-sensitive, Primitive recursive, Recursive, and Recursively enumerable, with example patterns placed within them: Chumash sibilant harmony (Applegate 1972), English consonant clusters (Clements and Keyser 1983), Kwakiutl stress (Bach 1975), English nested embedding (Chomsky 1957), Swiss German (Shieber 1985), and Yoruba copying (Kobele 2006).]

Recursive patterns are those for which a Turing machine exists which, when asked what value the pattern assigns to any logically possible string, returns the right value.6 Therefore, language scientists who attribute the ability to discriminate well-formed from ill-formed sentences to linguistic competence are tacitly asserting that sentence patterns in natural language are recursive. Recursive patterns are also called computable, or Turing-computable. Smaller regions correspond to patterns describable with increasingly less powerful machines (grammars). For example, the regular patterns are all those that can be described by machines that admit only finitely many internal states. In contrast, machines which generate nonregular patterns must have infinitely many internal states. The smallest region, the class of finite patterns, contains those patterns whose functional characterizations have only finitely many sentences with nonzero values. For further details regarding the Chomsky Hierarchy, readers are referred to Partee et al. (1993) and Sipser (1997).7

Figure 27.4  Learners are functions from experience to grammars. [Figure: a box labeled "Learner" mapping "Experience" to "Grammars".]

If the machines are probabilistic, then the stochastic counterpart of each class is obtained. Probabilistic machines are simply ones that may use random information (like coin flips) while running. Stochastic recursive languages are those describable with probabilistic Turing machines. By definition, such machines describe all computable probability distributions over all possible sentences. Similarly, regular stochastic languages are those describable by probabilistic machines which admit only finitely many states. Thus the crucial feature of regular patterns is not whether they are stochastic or not, but the fact that they require only grammars that distinguish finitely many states. It is of course of great interest to know what kinds of patterns natural languages are. Figure 27.3 shows where some natural language patterns fall in the Chomsky Hierarchy. For example, phonological patterns do not appear to require grammars that distinguish infinitely many states, unlike some syntactic patterns, which appear to require grammars that do.8 This distinction between these two linguistic domains is striking (Heinz and Idsardi 2011, 2013). It is also important to understand the goals of computational research on natural language patterns. In particular, establishing complexity bounds is different from advancing hypotheses which state both necessary and sufficient properties of possible natural language patterns. For example, the hypothesis that natural language patterns are mildly context-sensitive (Joshi 1985) seeks to establish an upper bound on the complexity of natural language. Joshi is not claiming, as far as I know, that any mildly context-sensitive pattern is a possible natural language one. In my opinion, it is much more likely that possible natural language patterns belong to subclasses of the major regions of the Chomsky Hierarchy. For example, Heinz (2010a) hypothesizes that all phonotactic patterns belong to particular subregular classes. I return to these ideas in section 27.5.

6  This class is also called "decidable" because for any recursive pattern, it is always possible to decide, for any input, what its value is (0 or 1 or something else). This is in contrast to the r.e. (or semi-decidable) class, where the machine may not answer (and may run forever) on inputs with zero values.
7  Harrison (1978), Hopcroft et al. (1979, 2001), and Thomas (1997) offer more technical treatments.
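The claim that a regular pattern needs only finitely many internal states can be made concrete with a small sketch (a toy recognizer in Python; the alphabet and the exact constraint are invented here, in the spirit of a sibilant-harmony pattern rather than Applegate's actual description of Chumash):

```python
# A toy regular pattern recognized with finitely many internal states:
# a word is accepted iff it does not mix the two sibilants "s" and "S".
# Only three states ("none", "s", "S") are ever needed, no matter how
# long the word is, which is what makes the pattern regular.

def harmonic(word):
    state = "none"               # which sibilant, if any, has occurred
    for segment in word:
        if segment == "s":
            if state == "S":
                return False     # disagreeing sibilants: reject
            state = "s"
        elif segment == "S":
            if state == "s":
                return False
            state = "S"
        # all other segments leave the state unchanged
    return True
```

By contrast, a nonregular pattern such as nested embedding would force the recognizer to track an unbounded amount of information, i.e. infinitely many states.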
Although from the perspective of formal language theory grammars are the mathematical objects of secondary interest, it does matter that learners return a grammar instead of a language. This is for the simple reason that, as mathematical objects, grammars are of finite length, while the functional characterizations of patterns are infinitely long. Thus while Figure 27.1 describes learners as functions from experience to languages, they are more accurately described as functions from experience to grammars (Figure 27.4).

8  For more on the hypothesis that all phonological patterns are regular, see Kaplan and Kay (1994), Eisner (1997), and Karttunen (1998). Readers are referred to Chomsky (1956) and Shieber (1985) for arguments concerning the nonregular nature of grammars for human syntax.

Figure 27.5  The learner's experience. [Figure: a finite sequence s0, s1, s2, …, sn unfolding in time.]

While the distinctions in the Chomsky Hierarchy can be used to classify the computational complexity of language patterns, they are much more general in the sense that they can be used to classify the complexity of many objects, such as real numbers, functions, or, as we will see, the kind of experience language learners receive in the course of learning.

27.2.3 Experience

There are many different kinds of experience learning theorists consider, but they agree that the experience is a finite sequence (Figure 27.5). It is necessary to decide what the elements si of the sequence are. In this chapter we distinguish four kinds of experience. Positive evidence refers to experience where each si is known to be a nonzero-valued sentence of the target pattern. Positive and negative evidence refers to experience where each si is given either as belonging to the target pattern (has a nonzero value) or as not belonging (has a zero value). Noisy evidence refers to experience some of which is incorrect. For example, perhaps the learner has the experience that some si belongs to the target language when in fact it does not (perhaps the learner heard a foreign sentence or someone misspoke). Queried evidence refers to experience learners may have because they specifically asked for it. In principle, there are many different kinds of queries learners could make. This chapter does not address these last two kinds; readers are referred to Angluin and Laird (1988) and Kearns and Li (1993) for noisy evidence, and to Angluin (1988a, 1990), Becerra-Bonache et al. (2006), and Tîrnauca (2008) for queries.

27.2.4 Learners as Functions

Armed with the basic concepts and vocabulary all learning theorists use to describe target languages, grammars, and experience, it is now possible to define learners. They are simply functions that map experience to grammars. For the most part, formal learning theorists are concerned with computable functions. This is because an uncomputable learning function cannot be instantiated on any known computing device, such as a human brain; furthermore, by the Church–Turing thesis, it is impossible for it to be instantiated on any computing device. The characterization of learners in this section is very precise, but it is also very broad. Any learning procedure can be thought of as a function from experience to grammars, including connectionist ones (e.g. Rumelhart and McClelland 1986b), Bayesian ones (Griffiths et al. 2008), learners based on maximum entropy (e.g. Goldwater and Johnson 2003), as well as those embedded within generative models (Wexler and Culicover 1980; Berwick 1985; Niyogi and Berwick 1996; Tesar and Smolensky 2000; Niyogi 2006). Each of these learning models, and I would suggest any learning model, takes as its input a finite sequence of experience and outputs some grammar, which defines a language or a stochastic language. Consequently, all of these particular proposals are subject to the results of formal learning theory.
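To illustrate this point, here is a toy "statistical" learner written explicitly as a function from experience to grammars (a hypothetical unigram estimator, invented here, not a proposal from the chapter): it takes a finite sequence of sentences and returns a grammar, namely a relative-frequency probability for each word.

```python
from collections import Counter

# Any learning proposal, statistical or symbolic, can be viewed as a
# function from a finite sequence of experience to a grammar. This toy
# learner's "grammar" is a unigram model: a probability for each word,
# estimated by relative frequency over the observed experience.

def unigram_learner(experience):
    counts = Counter(w for sentence in experience for w in sentence.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# The learner maps a finite sequence of experience to a grammar:
grammar = unigram_learner(["John sang", "Mary danced", "John danced"])
```

However different its internals, a connectionist or Bayesian learner has exactly the same type: finite experience in, grammar out, which is why the results of formal learning theory apply to all of them.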

27.3  What is Learning?

It remains to be defined what it means for a function which maps experiences to grammars to be successful. After all, there are many logically possible such functions, but we are interested in evaluating particular learning proposals. For example, we may be interested in those learning functions that are human-like, or which return human-like grammars.9

27.3.1 Learning Criteria

It is important to define what it means to learn so that it is possible to determine what counts as a success. The general idea in the learning theory literature is that learning has been successful if the learner has converged to the right language. Is there some point n after which the learner's hypothesis does not change (much)? Convergence can be defined in different ways, to which I return in the next paragraph. Typically, learning theorists conceive of an infinite stream of experience to which the learner is exposed, so that it makes sense to talk about a convergence point. Is there a point n such that for all m ≥ n, grammar Gm ≃ Gn (given some definition of ≃)? Figure 27.6 illustrates. The infinite streams of experience are also called texts (Gold 1967) and data presentations (Angluin 1988b). All three terms are used synonymously here.

9  This section draws on a large set of learning literature. Readers are referred to Nowak et al. (2002) for an excellent, short introduction to computational learning theory. Niyogi (2006), de la Higuera (2010), and Clark and Lappin (2011) provide detailed, accessible treatments, and Anthony and Biggs (1992), Kearns and Vazirani (1994), Jain et al. (1999), Lange et al. (2008), and Zeugmann and Zilles (2008) provide technical introductions. I have also relied on the following research: Gold (1967); Horning (1969); Angluin (1980); Osherson et al. (1986); Angluin (1988b); Angluin and Laird (1988); Vapnik (1995, 1998); Case (1999).

Figure 27.6  If, for all m ≥ n, it is the case that Gm ≃ Gn (given some definition of ≃), then the learner is said to have converged. [Figure: the learner ϕ and its hypotheses over time:
s0: ϕ(⟨s0⟩) = G0
s1: ϕ(⟨s0, s1⟩) = G1
s2: ϕ(⟨s0, s1, s2⟩) = G2
…
sn: ϕ(⟨s0, s1, s2, …, sn⟩) = Gn
…
sm: ϕ(⟨s0, s1, s2, …, sm⟩) = Gm
…]

Convergence has been defined in different ways, but there are generally two kinds. Exact convergence means that the learner's final hypothesis must be 100 percent correct. Alternatively, approximate convergence means the learner's final hypothesis need not be exact, but somehow "close" to 100 percent correct. Defining successful learning as convergence to the right language after some point n raises another question with respect to experience: on which infinite streams must a learner converge? Generally two kinds of requirements have been studied. Some infinite streams are complete; that is, every possible kind of information about the target language occurs at some point in the presentation of the data. For example, in the case of positive evidence, each sentence in the language would occur at some finite point in the stream of experience. The second requirement is about whether the infinite streams are computable. This has two aspects. First, there are as many infinite texts as there are real numbers, and so most of these sequences are not computable. Should learners be required to succeed on these? Or should learners only be required to succeed on those data sequences generable by Turing machines? The second aspect is more technical. Even if every sequence itself is computable, it may be the case that the set of all such sequences is not computable. This happens because, for each individual infinite sequence s in such a set, an algorithm exists which generates s, but no algorithm exists which can generate (all the algorithms for) all the sequences belonging to this set.10

10  As an example, consider the halting problem. This problem takes as input a program p and an input i for p, and asks whether p will run forever on i, or if p will eventually halt. It is known that there are infinitely many programs which do not halt on some inputs. For each such program p, choose some input ip on which it does not halt. Since ip is an input, it is finitely long and can be generated by some program. But no program exists which can generate every such ip. This is because if it could, it would follow that there is a solution to the halting problem. But in fact, the halting problem is known to be uncomputable; that is, no algorithm exists which solves it (Turing 1937).
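The convergence criterion can be made concrete with the simplest classical positive case (a sketch with an invented target language and text): the "memorizing" learner, which conjectures exactly the set of strings seen so far, converges on any complete text for a finite target language, and so identifies the class of finite languages in the limit from positive data.

```python
# The "memorizing" learner conjectures, as its grammar, exactly the set
# of strings observed so far. On a complete text for a finite target
# language, its hypotheses stop changing once every sentence of the
# target has appeared: this is convergence in the sense of Figure 27.6.

def memorizing_learner(experience):
    return frozenset(experience)

target = frozenset({"aa", "ab", "ba"})       # a finite target language
text = ["aa", "ab", "aa", "ba", "ab", "aa"]  # a complete presentation

hypotheses = [memorizing_learner(text[:i + 1]) for i in range(len(text))]
converged_at = min(i for i, h in enumerate(hypotheses) if h == target)
# Every hypothesis from converged_at onward equals the target grammar.
```

Note that this learner never generalizes beyond its experience, which is exactly why it fails on infinite languages; generalization, the central problem of learning, is what the richer frameworks below are designed to probe.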

The computability of the data presentations is much more important than it may initially appear. In fact, its importance has been largely overlooked in interpreting the results of computational learning theory. As we will see, requiring learners to succeed on either all or only computable data presentations has important consequences for learnability.

27.3.2 Definitions of Learning

Table 27.1 summarizes the kinds of choices to be made when deciding what learning means. The division of the choices into columns labeled "Makes learning easier" and "Makes learning harder" ought to be obvious. Learners only exposed to positive evidence have more work to do than those given both positive and negative evidence. Similarly, learners who have to work with noisy evidence will have a more difficult task than those given noise-free evidence. Learners allowed to make queries have access to more information than those not permitted to make queries. Exact convergence is a very strict demand, and approximate convergence is less so. Finally, requiring learners to succeed on every logically possible presentation of the data makes learning harder than requiring learners to succeed only on complete or computable presentations, simply because there are far fewer complete and/or computable presentations. Figure 27.7 shows the proper subset relationships among complete and computable presentations of data. Using the coarse classification provided by Table 27.1, I now classify several definitions of learning (these are summarized in Table 27.2). The major results of these definitions are discussed in the next section.

1. Identification in the limit from positive data. Gold (1967) requires that the learner succeed with positive evidence only (A), noiseless evidence (b), and without queries (C). Exact convergence (D) is necessary: even if the grammar to which the learner converges generates a language which differs in only one sentence from the target language, this is counted as a failure. On the other hand, this framework is generous in that learners are only required to succeed on complete data presentations (e), but they must succeed on any such sequence, not just computable ones (F).

2. Identification in the limit from positive and negative data.
This is the same except the learner is exposed to both positive and negative evidence (a) (Gold 1967). 3. Identification in the limit from positive data with probability p. In this learning paradigm (Wiehagen et al. 1984; Pitt 1985), learners are probabilistic (i.e. have access to coin flips). Convergence is defined in terms of whether learners can identify the target language in the limit given any text with probability p. Thus this learning criterion is less strict than identification in the limit from positive data because exact convergence is replaced with a kind of approximate convergence (d). Otherwise, it is the same as identification in the limit from positive data.

Computational Theories of Learning    643

Table 27.1 Choices providing a coarse classification of learning frameworks according to whether they make the learning problem easier or harder

  Makes learning easier                  Makes learning harder
  a. positive and negative evidence      A. positive evidence only
  b. noiseless evidence                  B. noisy evidence
  c. queries permitted                   C. queries not permitted
  d. approximate convergence             D. exact convergence
  e. complete infinite streams           E. any infinite sequence
  f. computable infinite streams         F. any infinite sequence

Figure 27.7  Subset relationships between all logically possible classes of texts, classes of complete texts, computable classes of texts, and both complete and computable classes of texts.

4. Identification in the limit from distribution-free positive stochastic data with probability p. Angluin (1988b) considers a variant of Pitt's framework immediately above where the data presentations are generated probabilistically from fixed, but arbitrary, probability distributions (including uncomputable ones). The term distribution-free refers to the fact that the distribution generating the data presentation is completely arbitrary. Like the previous framework, it is similar to identification in the limit from positive data but makes an easier choice with respect to convergence (d).

5. Identification in the limit from positive recursive data. Wiehagen (1977) considers a paradigm which is similar to identification in the limit from positive data except that the learner is only required to succeed on complete, computable streams (f), and not on any stream whatsoever. The particular streams that learners are required to succeed on are those generable by recursive functions.

6. Identification in the limit from positive primitive recursive data. This paradigm, also studied by Gold (1967), is similar to the one at point 5. In fact, in terms of the classification scheme in Table 27.1, it is exactly the same. However, this paradigm makes stronger assumptions about the nature of the experience language learners receive as input. Here the data presentations that learners are required to succeed on are only those generable by primitive recursive functions. This class is nested between the recursive class and the context-sensitive class (see Figure 27.3). Therefore, learning in this framework is "easier" than in the one at point 5 because there are fewer data presentations learners need to succeed on.

7. Identification in the limit from computable positive stochastic data. Horning (1969), Osherson et al. (1986), and Angluin (1988b) study learning stochastic languages from positive data. Horning studies stochastic languages generated by context-free grammars where the rules are assigned probabilities with rational values. I focus on Angluin's framework since she generalizes his study (and those of earlier researchers) to obtain the strongest result. Angluin studies approximately computable stochastic languages. Recall that a stochastic language, or distribution, D maps a string s to a real number, so D(s) = r. A distribution D is approximately computable if and only if there is a total recursive function f such that, for all strings s and for all positive rational numbers ε, f(s, ε) is a rational approximation of D(s) within ε; that is, |D(s) − f(s, ε)| < ε. The approximately computable stochastic languages properly include the context-free ones.
In Angluin (1988b), as in Horning (1969), the data presentations must be generated according to the target distribution, which is fixed and approximately computable. In this way, this definition of learning is like identification in the limit from positive recursive texts: learners need not succeed on every data presentation, but only on complete and computable ones (f).11 On the other hand, instead of exact convergence, convergence need only be approximate (d).

8. Probably Approximately Correct (PAC). This framework makes a number of different assumptions (Valiant 1984; Anthony and Biggs 1992; Kearns and Vazirani 1994). Both positive and negative evidence are permitted (a). Noise and queries are not permitted (b, c). Convergence need only be approximate (d), but the learner must succeed for any kind of data presentation, both non-complete and uncomputable (E, F). What counts as convergence is tied to the degree of "non-completeness" of the data presentation.

11  If a data presentation is being generated from a computable stochastic language, then it is also complete. This is because for any sentence with nonzero probability, the probability of this sentence occurring increases monotonically to one as the size of the experience grows. For example, it is certain that the unlikely side of a biased coin will appear if it is flipped enough times.

Figure  27.8  Learners which are constant functions map all possible experience to a single grammar.

To summarize this subsection, there have been many different definitions of what it means to “learn.” In the next section, the major results within each of these frameworks will be discussed. The factorization of these frameworks by the general properties listed in Table 27.1 makes it easier to interpret the results presented in the next section.
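To make these definitions concrete, consider the textbook learner that identifies the class of all finite languages in the limit from positive data (a result of Gold's discussed in section 27.4): it conjectures exactly the set of strings observed so far. The following sketch is mine, not the chapter's; it represents a text prefix as a Python list of strings and a grammar directly as a finite set.

```python
def finite_language_learner(experience):
    """Map a finite sequence of observed strings (a text prefix)
    to a grammar, here represented directly as a finite language:
    the set of all strings seen so far."""
    return frozenset(experience)

# A complete text for the finite language {"a", "ab"} must present
# every member at least once; repetitions are allowed.
text = ["a", "ab", "a", "a", "ab", "a"]

# Feed the learner ever-longer prefixes and record its conjectures.
conjectures = [finite_language_learner(text[:i + 1]) for i in range(len(text))]

# Once both members have appeared, the conjecture is exactly the
# target language and never changes again: identification in the limit.
target = frozenset({"a", "ab"})
assert conjectures[-1] == target
assert all(c == target for c in conjectures[1:])
```

On any complete text for a finite language, every member eventually appears, after which this learner's conjecture never changes. On a text for an infinite language, by contrast, its conjectures grow without bound, which is one way to see why it cannot handle a superfinite class.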

27.3.3 Classes of Languages

Before continuing to section 27.4, it is important to recognize that computational learning theories are concerned with learners of classes of languages, not just single languages. This is primarily because every single language can be learned by a constant function (Figure 27.8). For example, with any of the definitions given in the list above, it is easy to state a learner for English (and just English): just map all experience (no matter what it is) to a grammar for English. Even if we do not know what this grammar is yet, the learning problem is "solved" once we know it. Obviously, such "solutions" to the learning problem are useless, even if mathematically correct. For this reason, computational learning theories ask whether a collection of more than one language can be learned by the same learner. This more meaningfully captures the kinds of questions language scientists are interested in: Is there a single procedure that not only learns English, but also Spanish, Arabic, Inuktitut, and so on?
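The constant function of Figure 27.8 is trivial to write down. In this sketch (mine; GRAMMAR_FOR_ENGLISH is a hypothetical placeholder standing in for whatever the correct grammar of English turns out to be), the "learner" succeeds on every text for English precisely because it ignores its input, which is why learnability only becomes meaningful for classes of languages:

```python
# Hypothetical placeholder: some fixed, correct grammar for English.
GRAMMAR_FOR_ENGLISH = "<a grammar for English>"

def constant_learner(experience):
    """Ignore the experience entirely and return the fixed grammar.
    This 'identifies' English in the limit on every text for English,
    but it identifies nothing else."""
    return GRAMMAR_FOR_ENGLISH

# Whatever the data looks like, the conjecture is always the same.
assert constant_learner(["the cat sleeps"]) == constant_learner(["el gato duerme"])
```

A single learner required to succeed on, say, both English texts and Spanish texts can no longer ignore its input; that is the point of studying classes of languages.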

27.4  Results of Computational Learning Theories

Computational learning theorists have identified, given the definitions in the previous section, classes of languages that can and cannot be learned. Generally, formal learning theorists are interested in large classes of learnable languages because they want to see what is possible in principle. If classes of languages are learnable in principle, the next important question is whether they are feasibly learnable; that is, whether learners can succeed with reasonable amounts of time and effort, where reasonable is defined in

Table 27.2 Foundational results in computational learning theory. Letters in square brackets refer to properties in Table 27.1

1. Identification in the limit from positive data [A b c D e F]:
   Finite languages are learnable, but no superfinite class of languages is learnable, and hence neither are the regular, context-free, context-sensitive, recursive, nor r.e. languages.

2. Identification in the limit from positive and negative data [a b c D e F]:
   Recursive languages are learnable, but the regular languages are not feasibly learnable.

3. Identification in the limit from positive data with probability p [A b c d e F]:
   For all p > 2/3: same as those identifiable in the limit from positive data.

4. Identification in the limit from distribution-free positive stochastic data with probability p [A b c d e F]:
   For all p > 2/3: same as those identifiable in the limit from positive data.

5. Identification in the limit from positive recursive data [A b c D e f]:
   Same as those identifiable in the limit from positive data.

6. Identification in the limit from positive primitive recursive data [A b c D e f]:
   R.e. languages are learnable, but not feasibly.

7. Identification in the limit from computable positive stochastic data [A b c d e f]:
   Recursive stochastic languages are learnable, but not feasibly.

8. Probably Approximately Correct [a b c d E F]:
   The finite languages are not learnable, and hence neither are the regular, context-free, context-sensitive, recursive, nor r.e. languages.

standard ways according to computational complexity theory (Garey and Johnson 1979; Papadimitriou 1994).12 This section provides the largest classes known to be provably learnable under the different definitions of learning above. Where possible, I also indicate whether such classes can be feasibly learned. If one is not familiar with the regions in the Chomsky Hierarchy, it will be helpful to familiarize oneself with them before continuing (Figure  27.3). Table 27.2 summarizes the following discussion.

27.4.1 No Major Region of the Chomsky Hierarchy is Feasibly Learnable

Gold (1967) proved three important results. First, a learner exists which identifies the class of recursive languages in the limit from positive and negative data. Second,

12  How to measure the computational complexity of learning algorithms is discussed in detail in Valiant (1984); Pitt (1989); de la Higuera (1997, 2010); and Clark and Lappin (2011).

a learner exists which identifies the finite languages in the limit from positive data, but no learner exists which can identify any superfinite class in the limit from positive data. Superfinite classes of languages are those that include all finite languages and at least one infinite language. It follows from this result that none of the major regions of the Chomsky Hierarchy is identifiable in the limit from positive data by any learner which can be defined as mapping experience to grammars. It is this result with which Gold's paper has become identified. Gold's third (and usually overlooked) result is that if learning is defined so that learners need only succeed given complete, positive, primitive recursive texts, then a learner does exist which can learn the class of r.e. languages.

Wiehagen (1977) shows that if learning is defined so that learners need only succeed given complete, positive, recursive texts, then only those classes identifiable in the limit from positive data are learnable. Therefore no superfinite class is learnable in this setting. In other words, comparison of this result with Gold's third result shows that restricting the data presentations to recursive texts does not increase learning power, but restricting them to primitive recursive texts does (see also Case 1999).

Angluin (1988b), developing work begun in Horning (1969) and extended by Osherson et al. (1986), presents a result for stochastic languages similar in spirit to the ones above. She shows that if learners are only required to succeed on presentations of the positive data generable by the target stochastic language, then the class of recursive stochastic languages is learnable.13 This result contrasts sharply with those of other frameworks investigating the power of probabilistic learning. Wiehagen et al. (1984) and Pitt (1985) show that the class of languages identifiable in the limit from positive data with probability p is the same as the class of languages identifiable in the limit from positive data whenever p > 2/3. Angluin (1988b: 5) concludes: "These results show that if the probability of identification is required to be above some threshold, randomization is no advantage." Angluin also shows that for all p, the class of languages identifiable in the limit with probability p from distribution-free stochastic data is exactly the same as the class of languages identifiable in the limit from positive data with probability p. Angluin observes that the "assumption of positive stochastic rather than positive [data presentations] is no help, if we require convergence with any probability greater than 2/3." She concludes that "the results show that if no assumption is made about the probability distribution [generating the data presentations], stochastic input gives no greater power than the ability to flip coins." Finally, in the PAC learning framework (Valiant 1984), not even the class of finite languages is learnable (Blumer et al. 1989).

In the cases where learners are known to exist in principle, we may examine their feasibility. In the case of identification in the limit from positive and negative data,

13  Technically, she shows that the class of approximately computable distributions is learnable. The crucial feature of this class is that its elements are enumerable and computable, which is why I take some liberty in calling them recursive stochastic languages.

Gold (1978) shows that there are no feasible learners for even the regular class of languages. In other words, while learners exist in principle for the recursive class, they consume too much time and resources in the worst cases. In the case of identification in the limit from primitive recursive texts and identification in the limit from computable positive stochastic data, the learners known to exist in principle are also not feasible.14 Table 27.2 summarizes the results discussed in this section. It is worth examining this table to see exactly what makes learning the recursive class possible in principle. I will return to understanding this in section 27.5.
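The in-principle learners just mentioned work by enumeration (see n. 14). A minimal sketch of the strategy, under my own simplifying assumptions: grammars are represented as membership predicates, and a toy three-grammar list stands in for an infinite enumeration of, say, the primitive recursive languages.

```python
def enumeration_learner(grammar_list, experience):
    """Return the first grammar in the ordered list whose language
    contains every string observed so far.  With an enumeration of
    an infinite class in place of this toy list, this is the
    in-principle learner; it is infeasible because each new datum
    may force a fresh scan down the list."""
    for grammar in grammar_list:
        if all(grammar(s) for s in experience):
            return grammar
    return None  # no consistent grammar in the (toy) list

# Toy enumeration: three "grammars" as membership predicates,
# ordered from most to least restrictive.
only_a   = lambda s: set(s) <= {"a"}        # language a*
only_ab  = lambda s: set(s) <= {"a", "b"}   # language (a|b)*
anything = lambda s: True                   # all strings

grammars = [only_a, only_ab, anything]

assert enumeration_learner(grammars, ["aa", "a"]) is only_a
assert enumeration_learner(grammars, ["aa", "ab"]) is only_ab
```

As long as the enumeration eventually lists a correct grammar, and consistency with the data is decidable, the learner converges; the cost of the repeated scans is what makes it impractical.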

27.4.2 Other Results

The facts as presented appear to paint a dismal picture: either large regions of the Chomsky Hierarchy are not learnable even in principle, or, if they are, they are not feasibly learnable. However, there are many feasible learners for classes of languages even in the frameworks with the most demanding criteria, such as identification in the limit from positive data and PAC-learning. This rich literature includes Angluin (1980, 1982); Muggleton (1990); Garcia et al. (1990); Anthony and Biggs (1992); Kearns and Vazirani (1994); García and Ruiz (1996, 2004); Fernau (2003); Oates et al. (2006); Clark and Eyraud (2007); Heinz (2008, 2009, 2010a, 2010b); Yoshinaka (2008, 2011); Becerra-Bonache et al. (2010); Clark et al. (2010); Kasprzik and Kötzing (2010); Clark and Lappin (2011); and many others (see for example de la Higuera 2005, 2010). The language classes discussed in those works are not major regions of the Chomsky Hierarchy, but are subclasses of such regions. Some of these language classes are of infinite size and include infinite languages, but they crucially exclude some finite languages, so they are not superfinite language classes. Figure 27.9 illustrates the nature of these classes. I return to this point in section 27.5.4 when discussing why the fundamental problem of learning is generalization. Also, the proofs that these classes are learnable are constructive, so concrete learning algorithms whose behavior is understood exist. The algorithms are successful because they utilize the structure inherent in the class, or equivalently, in its defining properties, to generalize correctly. Often the proofs of an algorithm's success involve characterizing the kind of finite experience learners need in order to make the right generalizations.
To sum up, even though identification in the limit from positive data and PAC-learning make the learning problem harder by requiring learners to succeed for any data presentation, so that no learners exist for superfinite classes of languages even in principle, there are feasibly learnable language classes within these frameworks. Furthermore, many of the above researchers have been keen to point out patterns resembling natural language which belong to these learnable subclasses.
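As one concrete illustration of how such constructive algorithms exploit the structure of a restricted class, here is a sketch in the spirit of the string-extension learners discussed in work like Heinz (2010b). It is my own toy rendering, not code from any cited paper. The target class is the strictly 2-local languages: a grammar is a set of permitted adjacent symbol pairs (with word-boundary markers), and the learner simply accumulates the pairs it observes. The class includes infinitely many infinite languages yet excludes some finite ones, so it is not superfinite, and this learner identifies it in the limit from positive data.

```python
BOUNDARY = "#"  # word-boundary marker

def bigrams(word):
    """Adjacent symbol pairs of a word, with boundary markers."""
    padded = BOUNDARY + word + BOUNDARY
    return {(padded[i], padded[i + 1]) for i in range(len(padded) - 1)}

def sl2_learner(experience):
    """Grammar = union of all bigrams observed so far."""
    grammar = set()
    for word in experience:
        grammar |= bigrams(word)
    return grammar

def generates(grammar, word):
    """A strictly 2-local grammar accepts a word iff every
    bigram of the word is permitted by the grammar."""
    return bigrams(word) <= grammar

# From two observed forms the learner generalizes to an infinite
# language: seeing "ab" and "abab" licenses any (ab)^n with n >= 1.
g = sl2_learner(["ab", "abab"])
assert generates(g, "ababab")
assert not generates(g, "ba")
```

Correct generalization here comes for free from the defining property of the class: any word whose bigrams have all been observed is conjectured to be in the language, and once every bigram of the target language has appeared in the text, the conjecture never changes.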

14  These learners essentially compute an ordered list of grammars for the patterns within the target class. With each new data point, they find the first grammar in this list compatible with the experience so far.

Figure 27.9  A non-superfinite class of patterns which cross-cuts the Chomsky Hierarchy.

27.5  Interpreting Results of Computational Learning Theories

27.5.1 Wrong Reactions

How have the above results been interpreted? Are those interpretations justified? Perhaps the most widespread myth about formal learning theory is the oft-repeated claim that Gold (1967) is irrelevant because:

1. Gold's characterization of the learning problem makes unrealistic assumptions;
2. Horning (1969) showed that statistical learning is more powerful than symbolic learning.

As this section shows, these claims have been made by influential researchers in cognitive science, computer science, computational linguistics, and psychology. In this section I rebut these charges. The authors cited here repeatedly fail to distinguish different definitions of learnability, fail to identify Gold (1967) with anything other than identification in the limit from positive data, and/or make false statements

about the kinds of learning procedures Gold (1967) considers. With respect to the claim that identification in the limit makes unrealistic assumptions, I believe it is fair to debate the assumptions underlying any learning framework. However, the arguments put forward by the authors discussed in this section are not convincing, usually because they say very little about what the problematic assumptions are and how their proposed framework overcomes them without introducing unrealistic assumptions of their own. Before continuing, I would like to make clear that these criticisms are not leveled at the authors' research itself, which is often interesting, important, and valuable in its own right. Instead I am critical of how these authors have motivated their work within the context of formal learning theory. Consider how Horning is used to downplay Gold's work. For example, Abney (1996) writes:

though Gold showed that the class of context free grammars is not learnable, Horning showed that the class of stochastic context free grammars is learnable. (Abney 1996: 21)

The first clause only makes sense if, by "Gold," Abney is referring to identification in the limit from positive data. After all, Gold did show that the context-free languages are learnable not only from positive and negative data, but also from positive data alone if the learners are only required to succeed on positive, primitive recursive data presentations (#6 in Table 27.2). As for the second clause, Abney leaves it to the reader to infer that Gold and Horning are studying different definitions of learnability. Abney emphasizes the stochastic nature of Horning's target grammars as if that is the key difference in their results, but it should be clear from section 27.4 and Table 27.2 that the gain in learnability is not coming solely from the stochastic nature of the target patterns. The fact that the only data presentations learners are required to succeed on are computable ones also plays an important role. Several comparisons make this clear. First, approximate, probabilistic convergence itself does not appreciably increase learning power. This is made clear by comparing identification in the limit from positive data with identification in the limit from positive data with probability p (#1 and #3 in Table 27.2). Second, learning stochastic languages instead of non-stochastic languages also does not increase learning power. This is made clear by comparing identification in the limit from positive data with probability p with identification in the limit from distribution-free positive stochastic data with probability p (#3 and #4 in Table 27.2). Consideration of the PAC learning paradigm bolsters these comparisons. PAC allows approximate convergence and target classes of stochastic languages (in addition to positive and negative data), yet not even the finite class of languages is learnable. What is responsible for these results? In those frameworks, learners are required to succeed for any data presentation.
As Gold (1967) established in a non-​stochastic setting (identification in the limit from positive primitive recursive data), the picture changes dramatically

when learners are only required to succeed on data presentations which are not arbitrarily complex. Likewise, Horning's results follow in no small part from the fact that learners are only required to succeed on computable data presentations, instead of all arbitrary ones (choice f/F in Table 27.1). The same holds true for Angluin's (1988b) extension of Horning's work to recursive stochastic languages (approximately computable distributions). However, computability of the data presentations is not the only factor in Angluin's result. This is made clear by comparing identification in the limit from positive data with identification in the limit from positive recursive data (#1 and #5 in Table 27.2). In both cases, no superfinite class of languages is learnable. In non-stochastic settings, one has to reduce the complexity of the data presentations to primitive recursive ones for the r.e. class to become learnable (identification in the limit from positive primitive recursive data). In other words, in non-stochastic settings, reducing the complexity of the data presentations to the computable, recursive class is not sufficient to make the recursive class learnable, but in stochastic settings it is. The stochastic nature of the target patterns in combination with the reduced complexity of the data presentations is what makes the difference in Angluin's (and Horning's) results. However, most researchers fail to appreciate the distinctions drawn here. For example, in their introductory text on computational linguistics, Manning and Schütze (1999) write:

Gold (1967) showed that CFGs [context-free grammars] cannot be learned (in the sense of identification in the limit—that is whether one can identify a grammar if one is allowed to see as much data produced by the grammar as one wants) without the use of negative evidence (the provision of ungrammatical examples).
But PCFGs [probabilistic context-​free grammars] can be learned from positive data alone (Horning 1969). (However, doing grammar induction from scratch is still a difficult, largely unsolved problem, and hence much emphasis has been placed on learning from bracketed corpora . . .). (Manning and Schütze 1999: 386–​7)

Like Abney (1996), Manning and Schütze do not mention Gold's third result that CFGs can be learned if the data presentations are limited to primitive recursive ones. To their credit, they acknowledge the hard problem of learning PCFGs despite Horning's (and later Angluin's) results. Horning's and Angluin's learners are completely impractical and are unlikely to be the basis for any feasible learning strategy for PCFGs. For this reason, these positive learning results offer little insight into how PCFGs which describe natural language patterns may actually be induced from the kinds of corpus data that Manning and Schütze have in mind. Similarly, in his influential and important thesis on the unsupervised learning of syntactic structure, Klein (2005) writes:

Gold's formalization is open to a wide array of objections. First, as mentioned above, who knows whether all children in a linguistic community actually do learn the

same language? All we really know is that their languages are similar enough to enable normal communication. Second, for families of probabilistic languages, why not assume that the examples are sampled according to the target language's distribution? Then, while a very large corpus won't contain every sentence in the language, it can be expected to contain the common ones. Indeed, while the family of context-free grammars is unlearnable in the Gold sense, Horning (1969) shows that a slightly softer form of identification is possible for the family of probabilistic context-free grammars if these two constraints are relaxed (and a strong assumption about priors over grammars is made). (Klein 2005: 4–5)

Again, by "Gold's formalization," Klein must be referring to identification in the limit from positive data. Klein's first point is that it is unrealistic to use exact convergence as a requirement because we do not know if children in communities all learn exactly the same language, and it is much more plausible that they learn languages that are highly similar, but different in some details. Hopefully by now it is clear that Klein is misplacing the reason why it is impossible to identify superfinite classes of languages in the limit from positive data. It is not because of exact convergence; it is because learners are required to succeed for any complete presentation of the data, not just the computable ones. In frameworks that allow looser definitions of convergence (PAC-learning, identification in the limit from positive data with probability p), the main results are more or less the same as in identification in the limit from positive data. A crucial component of Horning's success is made clear in Angluin (1988b): identification in the limit from computable positive stochastic data only requires learners to succeed for data presentations which are computable. As for the unrealistic nature of exact convergence, is it not a useful abstraction? It lets one ignore the variation that exists in reality to concentrate on the core properties of natural language that make learning possible. Klein then claims that it is much more reasonable to assume that the data presentations are generated by a fixed unchanging probability distribution defined by the target PCFG. This idealization may lead to fruitful research, but it is hard to accept it as realistic. That would mean that for each of us, in our lives, every sentence we have heard up until this point, and will hear until we die, is being generated by a fixed unchanging probability distribution.
It is hard to see how this could be true given that what is actually said is determined by so many non-linguistic factors.15 So if realism is one basis for the "wide array of objections" that Klein mentions, the alternative proposed does not look any better.

Like Klein (2005), Bates and Elman (1996) also argue that Gold (1967) is irrelevant because of unrealistic assumptions. They write:

A formal proof by Gold [1967] appeared to support this assumption, although Gold's theorem is relevant only if we make assumptions about the nature of the learning

15  Even if we abstract away from actual words and ask whether strings of linguistic categories are generated by fixed underlying PCFGs, the claim is probably false. Imperative structures often have different distributions of categories than declaratives and questions, and the extent to which these are used in discourse depends entirely on nonlinguistic factors in the real world.

device that are wildly unlike the conditions that hold in any known nervous system [Elman et al. 1996]. (Bates and Elman 1996: 1849)

By now we are familiar with authors identifying Gold (1967) solely with identification in the limit from positive data. What assumptions does Gold make that are "wildly unlike the conditions that hold in any known nervous system"? Gold only assumes that learners are functions from finite sequences of experience to grammars. It is not clear to me why this assumption is not applicable to nervous systems, or any other computer. Perhaps Bates and Elman are taking issue with exact convergence, but as mentioned above, learning frameworks that allow looser definitions of convergence do not change the main results, and even Elman et al. (1996) employ abstract models. Perfors et al. (2010) partially motivate an appealing approach to language learning, one that balances preferences for simple grammars against good fits to the data, with the following:

The first sentence is simply false. While it is true that Gold does not specifically refer to learners which take either simplicity or degree of fit into account, that in no way implies his results do not apply to such learners. Gold's results apply to any algorithm that can be said to map finite sequences of experience to grammars, and the Bayesian models Perfors et al. propose are such algorithms. The fact that Gold does not specifically mention these particular traits emphasizes how general and powerful Gold's results are. If Perfors et al. really believe Bayesian learners can identify a superfinite class of languages in the limit from positive data, they should go ahead and try to prove it. (Unfortunately for them, Gold's proof is correct, so we already know it is useless to try.) I address Chater and Vitànyi's (2007) work in section 27.5.2, so let us move now to the statement that learners that are "sensitive to the statistics of language" can learn probabilistic grammars. This is attributed to Horning with no substantial discussion of the real issues. Readers are left believing in the power of statistical learning, unaware of the real issue of whether learning has been defined in a way that requires learners to succeed on complete and computable data presentations versus all complete data presentations. Again, Gold showed that any r.e. language can be learned

from positive primitive recursive texts. Angluin (1988b) showed that learners that are "sensitive to the statistics of language" are not suddenly more powerful (identification in the limit from distribution-free positive stochastic data with probability p). Finally, Perfors et al. (2010) hide another issue behind the phrase "reasonable sampling assumptions." As mentioned previously in the discussion of Klein, I think there is every reason to question how reasonable those assumptions are. But I would be happy if the debate could at least get away from "the sensitivity of the learner to statistics" rhetoric and turn to whether the assumption that actual data presentations are generated according to fixed, unchanging, computable probability distributions is reasonable. That would be progress, and would reflect one actual lesson from computational learning theory.

27.5.2 Chater and Vitànyi (2007)

Chater and Vitànyi (2007), who extend work begun in Solomonoff (1978), provide a more accurate, substantial, and overall fairer portrayal of Gold's (1967) paper than these others, and corroborate some of the points made in this chapter. However, a couple of inaccuracies remain. Consider the following passage:

Gold (1967) notes that the demand that language can be learned from every text may be too strong. That is, he allows the possibility that language learning from positive evidence may be possible precisely because there are restrictions on which texts are possible. As we have noted, when texts are restricted severely, e.g., they are independent, identical samples from a probability distribution over sentences, positive results become provable (e.g., Pitt, 1989); but the present framework does not require such restrictive assumptions. (Chater and Vitànyi 2007: 153)

This quote is misleading in a couple of ways. First, Gold (1967) goes much farther than just suggesting that learning from positive evidence alone may be possible if the texts are restricted; in fact, he proves it (identification in the limit from positive primitive recursive texts). Second, whether their framework really places no "restrictive" assumptions on the streams of experience which are the inputs to the "ideal language learner" depends on what one considers to be "restrictive."16 Section 2.1 of Chater and Vitànyi's (2007) paper explains exactly how the input to the learner is generated. They explain very clearly that they add probabilities to a Turing machine, in much the same way as probabilities can be added to any automaton. In this case, the consequence is that they are able to describe recursive stochastic languages. In fact, they conclude this section with the following sentence:

16  The reference to Pitt (1989) is also odd given that this paper does not actually provide the positive results the authors suggest as it discusses identification in the limit from positive data with probability p. Horning (1969), Osherson et al. (1986), or Angluin (1988b) are much more appropriate references here.

The fundamental assumption concerning the nature of the linguistic input outlined in this subsection can be summarized as the assumption that the linguistic input is generated by some monotone computable probability distribution μ_c(x). (Chater and Vitànyi 2007: 138)

Thus in one sense their assumption is restrictive, because the linguistic input is limited to computable presentations (option F in Table 27.1). One important lesson from computational learning theory that this chapter is trying to get across is that assuming the data presentations (the linguistic input in Chater and Vitànyi's terms) are drawn from a computable class is a primary factor in determining whether all recursive patterns can be learned in principle or whether no superfinite class can be.

On the other hand, the quoted passage from Chater and Vitànyi (2007: 153) above is correct that they are able to relax an assumption made by Angluin (1988b) (and Horning). The data presentations in Chater and Vitànyi's learning scenario do not need to be generated by a fixed probability distribution that does not change over time. Instead, they obtain their result even allowing non-stationary distributions. This means the probability distribution at any given point in the data presentation can depend on the sequence of data up to that point. In this way their learning framework overcomes the criticisms I leveled in earlier sections at other researchers who claim that Horning's learning framework is more realistic than identification in the limit from positive data. On these grounds, Chater and Vitànyi's result represents a real advance. But at what cost?

There is another important difference between the "ideal language learner" and Angluin's (1988b) learner which should not be overlooked. As Chater and Vitànyi state clearly in their introduction (2007: 136): "Indeed, the ideal learner we consider here is able to make calculations that are known to be uncomputable." In other words, not only is the ideal language learner not feasibly computable, it is not computable at all!
The fact that the "ideal language learner" can learn all recursive stochastic languages from data presentations generated by computable, non-stationary probability distributions therefore departs significantly from the learning results described in Table 27.2, all of which assumed that learners themselves must be computable functions! If uncomputable learners are worthy of discussion, then it is important to know that the picture changes dramatically in non-stochastic settings as well. In particular, uncomputable learners with recursive data presentations can learn the r.e. class (Jain et al. 1999: 183)! In other words, permitting uncomputable learners significantly changes the results for identification in the limit from positive recursive data (#5 in Table 27.2). Jain et al. write (1999: 183): "It should be noted that if caretakers and natural phenomena are assumed to be computer simulable, then there is no reason to consider … noncomputable scientists and children." Chater and Vitànyi also discuss the feasibility of the learner again towards the end of their paper, where they point to a "crucial set of open questions" regarding "how rapidly learners can converge well enough" with the kinds of data in a child's linguistic environment. Of course, it may be that there is some subclass of the recursive stochastic languages that the algorithm is able to learn feasibly, and which may include natural language patterns. In my view, research in this direction would be a positive development.
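The distinction between stationary and non-stationary computable presentations can be illustrated with a toy sketch (my own illustration, not from Chater and Vitànyi): in the second generator below, the distribution over the next datum depends on the history generated so far, yet the whole process remains computable.

```python
import random

def stationary_text(n, seed=0):
    """A stationary presentation: each datum is an i.i.d. draw from a
    fixed distribution over {"aa", "ab"}, as in Horning's and
    Angluin's settings."""
    rng = random.Random(seed)
    return [rng.choices(["aa", "ab"], weights=[0.5, 0.5])[0] for _ in range(n)]

def nonstationary_text(n, seed=0):
    """A non-stationary but still computable presentation: the
    probability of "ab" at each step depends on the sequence of data
    generated up to that point."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        # Probability of "ab" grows with how many times "aa" has occurred:
        p_ab = history.count("aa") / (len(history) + 1)
        history.append("ab" if rng.random() < p_ab else "aa")
    return history

text = nonstationary_text(20)
assert text[0] == "aa"  # with no history, P("ab") = 0
assert set(text) <= {"aa", "ab"}
```

Both generators are computable programs, so both fall under option F in Table 27.1; the point of the second is only that a computable presentation need not be a fixed distribution sampled i.i.d.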

27.5.3 Clark and Lappin (2011)

Let us now turn to the landmark text by Clark and Lappin (2011). This book provides a thorough and welcome discussion of different computational learning theories and natural language acquisition. Many of the learning frameworks discussed in this chapter are surveyed there, and in many instances Clark and Lappin (2011) presents the same facts presented here. Nonetheless, Clark and Lappin argue forcefully against identification in the limit from positive data as an insightful learning paradigm, instead favoring probabilistic learning frameworks. It is remarkable to me how the same set of facts can be interpreted so differently.

Clark and Lappin fault identification in the limit from positive data for making "overly pessimistic idealizing assumptions" (2011: 89). In particular, they identify "the major problem with the Gold paradigm" as the fact that "it requires learning under every presentation" (2011: 102), including "an adversarial presentation of the data designed to undermine learning" (2011: 97). As they emphasize throughout, this learning paradigm "does not rule out an adversarial teacher who organizes a presentation in a way designed to undermine learning, for example by presenting a string an indefinite number of times at the beginning of a data sequence" (2011: 208). Instead, Clark and Lappin come down squarely in favor of probabilistic learning paradigms. They write that "Recent work in probabilistic learning theory offers more realistic frameworks within which to explore the formal limits of human language acquisition" (2011: 98) and that "it is formally more convenient to model language acquisition in a probabilistic paradigm" (2011: 106). Also: "When we abstract away from issues of computational complexity, learning [within a probabilistic paradigm] is broadly tractable. The first results along these lines are from Horning (1969)" (2011: 109). I find many of Clark and Lappin's arguments selective.
For example, the last statement about ignoring issues of computational complexity is odd, because twelve pages earlier this very point was a criticism of the paradigms in Gold (1967): they "suffer from a lack of computational realism in that they disregard complexity factors and permit the learner unlimited quantities of time and data" (2011: 97). (An excellent discussion of computational complexity occurs in their chapter 7, to which I return in section 27.5.4.) Another example comes from a defense of Horning's research: "Horning's work is indeed limited, but it is not the endpoint of this research. Subsequent work greatly extends his results" (2011: 109). Surely such a defense applies to identification in the limit from positive data! Gold (1967) was certainly not the endpoint of research and has been extensively studied, extended, and used to better understand learning, notably by Angluin (1980, 1982, 1988a), and even by Clark in his own research with his colleagues (Clark and Eyraud 2007; Clark et al. 2010; among others). Chapter 8 of Clark and Lappin (2011) is titled "Positive Results in Efficient Learning," and highlights results set in the identification in the limit from positive data paradigm! Gold's (1967) research has led to many variants, as described in the books by Osherson et al. (1986) and Jain et al. (1999) and in the surveys by Lange et al. (2008) and Zeugmann and Zilles (2008), including variants that specifically address questions relevant to natural language acquisition, such as U-shaped learning (Carlucci et al. 2004, 2007, 2013).

Returning to the substantive argument regarding adversarial data presentations, it is true that requiring learners to succeed on every data presentation is a significant factor contributing to the result that no superfinite class of languages is learnable under the identification in the limit from positive data paradigm.17 But, as discussed above, this is a factor even in stochastic settings! Clark and Lappin are clearly aware of this. For example, on page 99, when comparing the results of identification in the limit from positive data with identification in the limit from positive and negative data and identification in the limit from distribution-free positive stochastic data with probability p (#1, #2, and #4 in Table 27.2), they write:

[Angluin 1988b] summarizes the situation with respect to various probabilistic models that we discuss later: "These results suggest that the presence of probabilistic data largely compensates for the absence of negative data." However, this conclusion must be qualified, as it depends heavily on the class of distributions under which learning proceeds. (Clark and Lappin 2011: 99)

And later, they discuss Angluin's (1988b) identification in the limit from distribution-free positive stochastic data with probability p and explain that:

allowing an adversary to pick the distribution over a presentation has the same effect as permitting an adversary to pick a presentation. This effect highlights an important fact: selecting the right set of distributions in a probabilistic learning paradigm is as important as selecting the right set of presentations in the [identification in the limit from positive data] paradigm. (Clark and Lappin 2011: 111)

This is one of the primary lessons of computational learning theory that this chapter has presented.18 Finally, it is worth emphasizing that frameworks which require learners to succeed only on complete and computable data presentations are weaker than frameworks which require learners to succeed on all complete data presentations, computable and uncomputable, for the simple reason that there are more data presentations of the latter type (Figure 27.7). Learners successful in these more difficult frameworks (mentioned in section 27.4.2) are more robust in the sense that they are guaranteed to succeed on any data presentation, even uncomputable ones. The fact that there are feasible learners which can learn interesting classes of languages under such strong definitions of learning underscores how powerful the positive learning results in these frameworks are.

17  However, the example given of an adversarial teacher is not persuasive to me, because the problem is not adversarial teachers per se, but adversarial teachers that can generate data presentations more complex than those generable by primitive recursive functions.

18  I suspect Clark and Lappin may have misread the sentence quoted on page 99 of their book from Angluin (1988b: 2). They present Angluin's sentence as a conclusion she has drawn from her study. But this sentence, which occurs in the introduction of Angluin's paper, refers to earlier results which suggest that probabilistic data play this kind of role. She is setting up the topic which her paper investigates. The next sentence in Angluin (1988b) reads "These results also invite comparison with a new criterion for finite learnability proposed by Valiant (Valiant 1984)." And she continues:

Our study is motivated by the question of what has to be assumed about the probability distribution in order to achieve the kinds of positive results on language identification. We define a variant of Valiant's finite criterion for language identification, and show that in this case, the assumption of stochastically generated examples does not enlarge the class of learnable sets of languages.

In other words, Angluin's actual conclusion is not what Clark and Lappin (2011: 99) suggest it is.

27.5.4 Right Reactions

Gold (1967) provides three ways to interpret his three main results:

1. The class of natural languages is much smaller than one would expect from our present models of syntax. That is, even if English is context-sensitive, it is not true that any context-sensitive language can occur naturally. . . . In particular the results on [identification in the limit from positive data] imply the following: The class of possible natural languages, if it contains languages of infinite cardinality, cannot contain all languages of finite cardinality.
2. The child receives negative instances by being corrected in a way that we do not recognize . . .
3. There is an a priori restriction on the class of texts [presentations of data; i.e. infinite sequences of experience] which can occur. . . (Gold 1967: 453–4)

The first possibility follows directly from the fact that no superfinite class of languages is identifiable in the limit from positive data. The second and third possibilities follow from Gold's other results on identification in the limit from positive and negative data and on identification in the limit from positive primitive recursive data (#2 and #6 in Table 27.2). Each of these research directions can be fruitful, if honestly pursued. For the case of language acquisition, Gold's three suggestions can be investigated empirically. We ought to ask:

1. What evidence exists that possible natural language patterns form subclasses of major regions of the Chomsky Hierarchy?
2. What evidence exists that children receive positive and negative evidence in some, perhaps implicit, form?
3. What evidence exists that each stream of experience a child is exposed to is guaranteed to be generated by a fixed, computable process (i.e. a computable probability distribution or a primitive recursive function)? More generally, what evidence exists that the data presentations are a priori limited?

My contention is that we have plenty of evidence with respect to question (1), some evidence with respect to (2), and virtually no evidence with respect to (3).

Consider question (1). Although theoretical linguists and language typologists repeatedly observe an amazing amount of variation in the world's languages, there is consensus that there are limits to the variation, though stating exact universals is difficult (Greenberg 1963, 1978; Mairal and Gil 2006; Stabler 2009). Even language typologists who are suspicious of hypothesized language universals, once aware of the kinds of patterns that are logically possible, agree that not just any logically possible pattern could be a natural language pattern. Here is a simple example: many linguists have observed that languages do not appear to count past two (Berwick 1982, 1985; Heinz 2007, 2009). For example, no language requires sentences with at least n ≥ 3 main constituents to have the nth one be a verb phrase (unlike verb-second languages like German). This is a logically possible language pattern. Here is another one: if an even number of adjectives modify a noun, then they follow the noun in noun phrases, but if an odd number of adjectives modify a noun, they precede the noun in noun phrases. These are both recursive patterns; in fact, they are regular. According to Chater and Vitànyi (2007), if the linguistic input a child received contained sufficiently many examples of noun phrases which obeyed the even–odd adjective rule above, the child would learn it. It is an empirical hypothesis, but I think children would fail to learn this rule no matter how many examples they were given.
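To see that the hypothetical even–odd adjective rule really is regular, it can be encoded as a regular expression over token strings (A = adjective, N = noun); the encoding below is my own illustration, not from the chapter.

```python
import re

# Hypothetical even–odd adjective rule: an odd number of adjectives (A)
# must precede the noun (N); an even number (including zero) must follow it.
# Both cases are captured by a single regular expression, so the pattern
# is regular, despite being unattested in natural language.
EVEN_ODD_NP = re.compile(r"^(?:(?:AA)*AN|N(?:AA)*)$")

def obeys_rule(np):
    return bool(EVEN_ODD_NP.match(np))

assert obeys_rule("AN")        # one adjective (odd) precedes the noun
assert obeys_rule("AAAN")      # three adjectives precede
assert obeys_rule("N")         # zero adjectives (even) follow
assert obeys_rule("NAA")       # two adjectives follow
assert not obeys_rule("AAN")   # two adjectives preceding: ill-formed
assert not obeys_rule("NA")    # one adjective following: ill-formed
```

A finite-state machine for this pattern needs only a handful of states, which underscores the point: formal simplicity alone does not predict which patterns children can acquire.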
Chater and Vitànyi can claim that there is a simpler pattern consistent with the data (e.g. adjectives can optionally precede or follow nouns) that children settle on because their lives and childhoods are too short for there to be enough data to move from the simpler generalization to the correct one. This also leads to an interesting, unfortunately untestable, prediction: if humans only had longer lives and childhoods, we could learn bizarre patterns like the even–odd adjective rule. In other words, they might choose to explain the absence of an even–odd adjective rule in natural languages as just a byproduct of short lives and childhoods, whereas I would attribute it to linguistic principles which exclude it from the collection of hypotheses children entertain. But there is a way Chater and Vitànyi can address the issue: How much data does "the ideal language learner" require to converge to the unattested pattern? The harder learning frameworks—identification in the limit from positive data and PAC—bring more insight into the problem of learning and the nature of learnable classes of patterns. First, these definitions of learning make clear that the central problem in learning is generalizing beyond one's experience. This is because under these definitions, the ability to generalize to infinite patterns entails the inability to learn certain finite patterns (Gold's first point above). I think humans behave like this.

Consider the birds in Table 27.3. If I tell you birds (a,b) are "warbler-barblers" and ask which other birds (c,d,e,f,g) are warbler-barblers, you are likely to decide that birds (c,f,g) could be warbler-barblers but birds (d,e) definitely not. You would be very surprised to learn that in fact birds (a,b) are the only warbler-barblers of all time ever. Humans never even consider the possibility that there could be exactly two "warbler-barblers." This insight is expressed well by Gleitman (1990: 12):

The trouble is that an observer who notices everything can learn nothing for there is no end of categories known and constructible to describe a situation. (Gleitman 1990: 12; emphasis in original)

Table 27.3 Birds (a,b) are "warbler-barblers." Which birds (c–g) do you think are "warbler-barblers"?

Chater and Vitànyi (2007) can say that grammars describing finite languages are more complex than regular or context-free grammars, and they are right, provided the finite language is big enough. Again, the question is what kind of experience "the ideal language learner" needs in order to learn a finite language with exactly n sentences, and whether this is human-like. This question should be asked of all proposed language learning models. It is interesting to contrast "the ideal language learner" with Yoshinaka's (2008, 2011) learners, which generalize to context-free patterns (aⁿbⁿ) and context-sensitive patterns (aⁿbⁿcⁿ) from at most a few examples (and so those learners cannot learn, under any circumstances, the finite language that contains only those few examples).

Second, classes which are learnable within these difficult frameworks have the potential to yield new kinds of insight about which properties of natural languages make them learnable. As discussed in section 27.4.2, there are many positive results establishing interesting subclasses of major regions of the Chomsky Hierarchy which are identifiable in the limit from positive data and/or PAC-learnable, and which describe natural language patterns. The learners for those classes succeed because of the structure inherent to the class—a structure which can reflect deep, universal properties of natural language. Under weaker definitions of learning, where the recursive class of patterns is learnable, such insights are less likely to be forthcoming.

Clark and Lappin (2011) have anticipated one way such insight could be forthcoming. I have mentioned that many of the learners which can learn the recursive class of languages in particular learning frameworks require more time and resources in the worst case than is considered reasonable. In chapter 7, Clark and Lappin (2011) provide excellent discussion of how to interpret the computational complexity of learning algorithms themselves. As they point out, such infeasibility results provide "a starting point for research on a set of learning problems, rather than marking its conclusion" (2011: 148). This is because infeasibility results always consider the worst case. It may be that only some of the languages in a class require enormous time resources, while others can be learned within reasonable time limits. Clark and Lappin explain that "to achieve interesting theoretical insight into the distinction between these cases, we need a more principled way of separating the hard problems from the easy ones" (2011: 148), which will "distinguish the learnable grammars from the intractable ones" (2011: 149). They go on to suggest that one possibility is to "construct algorithms for subsets of existing representation classes, such as context-free grammars" (2011: 149). In other words, in the learning frameworks where the entire recursive class is learnable, one way to proceed would be to find those subclasses which are feasibly learnable (recall Figure 27.9).

As for Gold's second point, there has been some empirical study of whether children use negative evidence in language acquisition (Brown and Hanlon 1970; Marcus 1993). Also, learning frameworks which permit queries (Angluin 1988a, 1990), especially correction queries (Becerra-Bonache et al. 2006; Tîrnauca 2008), can be thought of as allowing learners access to implicit negative evidence. As for the third question at the start of this section, I do not know of any research that has addressed it. It is a hypothesis that the universe is computable (and therefore that all data presentations would be as well). It is not clear to me how this hypothesis could ever be tested.
Nonetheless, it should be clear that the commonly cited statistical learning results showing that probabilistic context-free languages are learnable (Horning 1969), and indeed that the recursive stochastic languages are learnable (Angluin 1988b; Chater and Vitànyi 2007), are pursuing Gold's third suggestion. It also ought to be clear that the positive results showing that recursive patterns can be learned from positive, complete, and computable data presentations are "in principle" results: as far as is known, these classes cannot be learned feasibly. Of course, as Clark and Lappin (2011) suggest, it may be possible that such techniques can feasibly learn interesting subclasses of major regions of the Chomsky Hierarchy which are relevant to natural language. If shown, this would be an interesting complement to the research efforts pursuing Gold's first suggestion, and could also reveal universal properties of natural language that contribute to their learnability.

27.5.5 Summary

There are many ways to define learning, and how best to define learning in order to study the acquisition of natural language remains an active, and unfortunately sometimes contentious, area of research. Nonetheless, I hope the discussion in this section has made clear that feasible learning can only occur when target classes of patterns are restricted and structured appropriately. I have emphasized that the central problem of learning is generalization. Also, I hope to have made clear that debates pitting statistical learning

against symbolic learning have been largely misplaced. The real issue is which data presentations learners should be required to succeed on. Stochastic learning paradigms provide some benefit in learning power, but only when the class of presentations is limited to computable ones. Whether such paradigms are a step towards realism is debatable and not a fact to be taken for granted.

27.6  Artificial Language Learning Experiments

Some key questions raised in the last section can in principle be addressed by artificial language learning experiments. These experiments probe the generalizations people make on the basis of brief, finite exposure to artificial languages (Gómez and Gerken 2000; Petersson et al. 2004; Folia et al. 2010). The performance of human subjects can then be compared to the performance of computational learning models on these experiments (Dell et al. 2000; Chambers et al. 2002; Onishi et al. 2003; Saffran and Thiessen 2003; Goldrick 2004; Wilson 2006; Frank et al. 2007; Finley and Badecker 2009; Seidl et al. 2009; Finley 2011; Jäger and Rogers 2012; Koo and Callahan 2012; Lai 2015). But the relationship can go beyond comparison and evaluation to design. Well-defined learnable classes which contain natural language patterns can serve as the bases for experiments. As mentioned, there are non-trivial, interesting classes of languages which are PAC-learnable, which are identifiable in the limit from positive data, and which contain natural language patterns. The proofs are constructive, and a common technique is to identify exactly the finite experience the proposed learners need in order to generalize correctly to each language in a given class. This critical finite experience is called the characteristic sample. The characteristic sample essentially becomes the training stimuli for the experiments. Other sentences in the language that are not part of the characteristic sample become test items. Finally, more than one learner can be compared by finding test items in the symmetric difference of the patterns that different learners return from the experimental stimuli. These points are also articulated by Rogers and Hauser (2010), and I encourage readers to study their paper. Let me provide a simple example to illustrate.
Consider the mildly context-sensitive pattern aⁿbⁿcⁿ, which can be learned in principle by both Chater and Vitànyi's (2007) ideal language learner and Yoshinaka's (2011) learner. However, each model requires a different amount of data to converge to this target. Which is more human-like? What about a mildly context-sensitive pattern outside the class of patterns learnable by Yoshinaka's learner? Such a pattern can be learned by Chater and Vitànyi's ideal language learner from some data presentation. Can humans replicate this feat? This is just the tip of the iceberg, and many such experiments are currently being conducted in linguistic subfields including phonology, morphology, and syntax.
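The experimental logic built around characteristic samples can be sketched with a toy identifiable class (my own illustration, not a learner from the chapter): a strictly 2-local (bigram) learner whose characteristic sample for a language is any finite set of words exhibiting all of its licit bigrams. The characteristic sample serves as the training stimuli; unseen well-formed and ill-formed words serve as test items.

```python
def bigrams(word):
    """All adjacent symbol pairs in a word, with "#" marking word edges."""
    padded = "#" + word + "#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def learn(sample):
    """The learner's grammar is simply the set of attested bigrams."""
    grammar = set()
    for word in sample:
        grammar |= bigrams(word)
    return grammar

def accepts(grammar, word):
    return bigrams(word) <= grammar

# A characteristic sample for a toy language over {a, b} in which
# "b" may never be doubled: every licit bigram is exhibited.
training = ["ab", "ba", "aa", "a", "b"]
g = learn(training)

# Test items outside the training data:
assert accepts(g, "abab")      # generalizes to unseen well-formed words
assert accepts(g, "aaba")
assert not accepts(g, "abba")  # "bb" was never attested
```

This learner provably identifies the strictly 2-local languages in the limit from positive data; a human experiment in the spirit of the text would train subjects on `training` and test them on items like `"abab"` and `"abba"` to compare their generalizations with the model's.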


27.7 Conclusion

In this chapter I have tried to explain what computational learning theories are, and the lessons language scientists can draw from them. I believe there is a bright future for research which honestly integrates the insights of computational learning theories with the insights and methodologies of developmental psycholinguistics.

Acknowledgments

I thank Lorenzo Carlucci, John Case, Alexander Clark, and Katya Pertsova for helpful discussion and an anonymous reviewer for valuable feedback.

Chapter 28

Statistical Learning, Inductive Bias, and Bayesian Inference in Language Acquisition

Lisa Pearl and Sharon Goldwater

Language acquisition is a problem of induction: the child learner is faced with a set of specific linguistic examples and must infer some abstract linguistic knowledge that allows the child to generalize beyond the observed data, that is, to both understand and generate new examples. Many different generalizations are logically possible given any particular set of input data, yet different children within a linguistic community end up with the same adult grammars. This fact suggests that children are biased towards making certain kinds of generalizations rather than others. The nature and extent of children's inductive bias for language is highly controversial, with some researchers assuming that it is detailed and domain-specific (e.g. Chomsky 1973, 1981b; Baker 1978; Huang 1982a; Fodor 1983; Bickerton 1984; Lasnik and Saito 1984; Gleitman and Newport 1995) and others claiming that domain-general constraints on memory and processing are sufficient to explain the consistent acquisition of language (e.g. Elman et al. 1996; Sampson 2005). In this chapter, we discuss the contribution of an emerging theoretical framework called Bayesian learning that can be used to investigate the inductive bias needed for language acquisition.1

In the Bayesian view of learning, inductive bias consists of a combination of hard and soft constraints. Hard constraints make certain grammars2 impossible for any human to acquire; in the language of Bayesian modeling, these impossible grammars are outside the learner's hypothesis space. Grammars inside the hypothesis space are learnable given the right input data, but they may not all be equally easy to learn. Soft constraints, implemented in the form of a probability distribution over the hypothesis space, mean that the learner will be biased towards certain of these grammars more than others. A "difficult" (low-probability) grammar can be learned, but will require more evidence (input data favoring this grammar) in order to be learned. In the absence of such evidence, the child will instead acquire a high-probability grammar that is also compatible with the input.

Under this view of learning, the central question of language acquisition is to determine what the hard and soft constraints are. A key assumption is that learners have access to domain-general statistical learning mechanisms that closely approximate the rules of probability theory. Given a particular set of input data, these probabilistic learning mechanisms then allow learners to converge on a grammar that is both compatible with the data and has high probability in the hypothesis space. In this sense, the grammar is the optimal choice, given the data. This notion of optimization arises from the ties between Bayesian modeling and the tradition of rational analysis in cognitive science (Chater and Oaksford 1999), which focuses on the adaptation of the organism to its environment, with the resulting implication that cognitive processes are in some sense optimized to the task. Probability theory plays a central role in Bayesian modeling precisely because it is a mathematical tool for optimizing behavior under uncertainty. We discuss all of these ideas and their implications further in section 28.2.

1  Bayesian models themselves are not a new idea, and have long been used in mathematics and computer science for both statistical analysis and machine learning (Duda et al. 2000; Gelman et al. 2003; C. Bishop 2006), but their use in cognitive modeling, particularly in the area of language acquisition, is much newer.

2  We use the term "grammar" very broadly to mean any kind of abstract linguistic knowledge used for generalization.
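The interaction of hard and soft constraints can be sketched with a two-grammar toy example (the grammar names and numbers below are invented for illustration, not a model from the chapter): the hypothesis space itself is the hard constraint, the prior over it is the soft constraint, and Bayes' rule determines how much data a low-prior grammar needs before it overtakes the favored one.

```python
# Hard constraint: only these two grammars are in the hypothesis space.
# Soft constraint: G_easy has a much higher prior than G_hard.
priors = {"G_easy": 0.9, "G_hard": 0.1}

# Probability each grammar assigns to the single datum type observed:
likelihood = {"G_easy": 0.4, "G_hard": 0.8}

def posterior(n_data):
    """Posterior over grammars after n_data i.i.d. observations,
    via Bayes' rule: P(G | data) ∝ P(G) * P(data | G)."""
    unnorm = {g: priors[g] * likelihood[g] ** n_data for g in priors}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# After one datum the high-prior grammar still dominates...
assert posterior(1)["G_easy"] > 0.5
# ...but the data favor G_hard, so with enough evidence it takes over.
assert posterior(5)["G_hard"] > 0.5
```

With these numbers the likelihood ratio doubles in favor of G_hard with each datum, so four observations suffice to overcome the 9-to-1 prior; a "difficult" grammar is learnable, just more slowly.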
Although many of these ideas may be new to the reader, some aspects of Bayesian modeling may be familiar from other approaches to learning. For example, many nativist linguists (particularly those in the Chomskyan tradition) also have a fundamental research goal of explicitly defining the hypothesis space of possible grammars (as part of Universal Grammar). However, unlike most models of learning based on Chomskyan theories, Bayesian models of learning are inherently probabilistic, both in defining a probability distribution over the hypothesis space and in the way the learner is assumed to incorporate information from linguistic data. These qualities allow Bayesian models to be more robust to noisy data and also to avoid some of the classic learnability problems faced by non-statistical learners, such as the subset problem, which is a specific instance of the no-negative-evidence problem (see Tenenbaum and Griffiths, 2001, for a review).3 Unlike deterministic learners, probabilistic learners can accumulate indirect negative evidence against a (probabilistic) grammar if a structure that is licensed by the grammar occurs in the input significantly less often than expected under the grammar.

3 We note that the ability to solve the subset problem does not negate the poverty of the stimulus argument de facto, as discussed later in the introduction.
666    Lisa Pearl and Sharon Goldwater

Although some other recent proposals incorporate probabilistic learning (e.g. Yang 2002; Legate and Yang 2007; Pearl 2011), they do not include the idea of optimization discussed earlier. In addition, many learning models in the Chomskyan tradition assume not only that the hypothesis space itself is defined using domain-specific concepts, but that the learning algorithm makes reference to these concepts, so that it too is domain-specific (see Chapter 29 by Sakas in this volume for a more detailed discussion). Bayesian models, in contrast, assume that the learner’s use of statistical information is entirely domain-general, and domain-specificity (if any) is restricted to the nature of the hypothesis space. Of course, the Bayesian approach is not the only statistically-grounded theoretical framework for studying language acquisition—the connectionist approach is similarly committed to domain-general statistical learning mechanisms (e.g. Rumelhart and McClelland 1986a; Elman et al. 1996; Prince and Smolensky 2004; Smolensky and Legendre 2006). Like Bayesian models, connectionist models incorporate a notion of optimization (minimizing prediction error); nevertheless, the two approaches differ in important ways. For example, a defining feature of connectionism is the use of distributed representations. Although Bayesian models could in principle be developed using distributed representations, these are given no special status, and in fact symbolic representations (e.g. rules and categories) are typically used because they make it easier to understand and define the space of hypotheses. For this reason, Bayesian models may be more attractive to linguists who are used to symbolic representations.

Another major difference between the two approaches is that Bayesian models are declarative, defining the learner’s constraints and associated hypothesis space explicitly using mathematical equations, whereas connectionist models are procedural, imposing constraints only implicitly through the choice of network architectures and learning algorithms. We elaborate on this distinction in section 28.2.1, noting here only that the use of explicitly defined constraints can make it easier to understand the assumptions built into the learner and how they relate to linguistic theory or domain-general cognitive principles (i.e. whether the constraints are domain-specific or not). As implied by the previous paragraphs, there is nothing inherent in the Bayesian approach that either favors or disfavors domain-specific constraints, and Bayesian researchers hold different views about their necessity.4 Moreover, although Bayesian learners have certain advantages over non-statistical learners, we are not claiming that

4 Some might think this is a difference from connectionism, which is often associated with an anti-nativist viewpoint (Rumelhart and McClelland 1986a; Elman et al. 1996). However, the defining characteristics of the connectionist approach—e.g. distributed representations, parallel processing, and statistical learning mechanisms (see, for example, Connectionist Approaches to Language, by Paul Smolensky)—are also agnostic regarding domain-specificity. One well-known connectionist proposal that incorporates strong domain-specific constraints is Harmonic Grammar (Legendre et al. 1990b; Smolensky et al. 1992; Smolensky and Legendre 2006).
these advantages are sufficient to overcome the problem of the poverty of the stimulus (PoS) on their own, nor (we think) would other Bayesians (e.g. see Regier and Gahl (2004), Pearl and Lidz (2009), and Pearl and Mis (2015), who also discuss the necessity of additional constraints). The PoS problem is much broader than any particular learnability problem such as the subset problem—it is a claim that the data children encounter are compatible with multiple generalizations. Even if Bayesian learning can help solve the subset problem, multiple generalizations may still be possible given the positive and indirect negative evidence available. The question, in our view, is not whether there is a PoS problem (there clearly is), but rather what kinds of constraints are needed in order to overcome it. The traditional argument from the PoS claims that the necessary constraints come from innate, domain-specific knowledge (Chomsky 1981b). While the use of Bayesian learning does not automatically negate the need for domain-specific constraints, the ability to obtain information through indirect negative evidence and other properties of statistical learning may mean that a Bayesian learner is able to acquire the correct generalizations with less domain-specific prior knowledge than the PoS argument normally assumes. Whether less means none or simply less detailed is an open question that can be evaluated empirically using Bayesian modeling. For example, one way to argue in favor of a less constrained hypothesis space is to show that a Bayesian statistical learner operating within that hypothesis space is capable of acquiring the linguistic generalization of interest, i.e. that additional constraints are not needed. We can then consider whether the constraints being used are domain-general or domain-specific, and whether they are necessarily innate or could be derived from previous linguistic experience.
Researchers have applied this approach to problems such as the acquisition of English anaphoric one (Regier and Gahl 2004; Foraker et al. 2009; Pearl and Lidz 2009; Pearl and Mis 2011, 2015), the structure-dependence of syntactic rules (Perfors et al. 2011), and the type of syntactic rules that account for recursion (Perfors et al. 2010). Examples like those outlined in the preceding paragraphs show how Bayesian modeling can be used to argue that certain linguistic generalizations are learnable in principle. However, if the Bayesian framework is to be taken seriously as a way of modeling actual language acquisition, it is also important to show that children’s behavior is consistent both with the assumptions of the framework and with the predictions of specific models. In the remainder of this chapter, we aim to do just that. We begin with the most basic assumption of the framework, namely that learning is based on statistical properties of the input data. We first review some of the wide-ranging behavioral evidence suggesting that children are indeed able to extract useful generalizations from statistical information, and can do so in a range of situations and using different types of statistics (section 28.1). We then formalize the details of the Bayesian approach, expanding upon the key features already mentioned, and discuss some additional pros and cons of this approach (section 28.2). Finally, we present several case studies to illustrate the ideas we have introduced and to show how Bayesian models can be applied to problems of interest in language acquisition (section 28.3).


28.1  Experimental Studies of Statistical Learning Abilities

Although statistical properties of language were widely studied by the structuralist linguists of the 1950s (e.g. Harris 1954), research in this area declined sharply with the rise of generative linguistics in the following decade, and only began to reemerge in the 1990s as an important topic in language acquisition. Domain-general processes of statistical learning were long recognized as part of the acquisition process even under generative theories (Chomsky 1955; Hayes and Clark 1970; Wolff 1977; Pinker 1984; Goodsitt et al. 1993; among others), but these processes by themselves were believed to be incapable of accounting for the acquisition of complex linguistic phenomena (e.g. syntactic or phonological structure, the syntax–semantics interface) without an accompanying structured hypothesis space for those linguistic phenomena. This does not mean that researchers did not investigate the nature of the learning procedure by which the child uses the input data to disambiguate between different hypotheses and attain the correct grammar—for example, see Wexler and Culicover (1980), Dresher and Kaye (1990), Gibson and Wexler (1994), and Niyogi and Berwick (1996). However, the learning procedure was often only interesting as a tool to support the validity of a particular hypothesis space, such as the parameters hypothesis space of Chomsky (1981b): a learning procedure, such as statistical learning, could demonstrate that it was possible for the child to converge on the correct hypothesis in the specified hypothesis space, given the available input data. Under this view of acquisition, the truly interesting question was about defining the child’s hypothesis space appropriately—and so domain-general statistical learning was largely ignored as a research topic. Saffran et al.
(1996a) was an important study in this respect since it considered the nature of children’s statistical learning abilities to be a question worth pursuing. Though this study was aimed at the process of word segmentation (identifying words in a fluent stream of speech) rather than more abstract knowledge acquisition at the phonological, syntactic, or semantic level, it successfully demonstrated that very young children have “powerful mechanisms for the computation of statistical properties of language input” (Saffran et al. 1996a). In particular, it showed that 8-month-old infants were able to track statistical cues between syllables, and so segment novel words out from a stream of artificial language speech where the statistical information was the only cue to where word boundaries were. Saffran et al. hypothesized, and Aslin et al. (1998) later confirmed, that the cue the infants were using is what they called “transitional probability.” The transitional probability between syllables X and Y (e.g. “pre,” “tty”) is the probability that Y will occur following X, computed as the frequency of XY (“pretty”) divided by the frequency of X (“pre”).5 Pelucchi et al. (2009a) later showed that infants can track transitional probability in realistic child-directed speech, as well as the artificial language stimuli Saffran et al. and Aslin et al. used.

5 This statistic is more standardly known in probability theory as the conditional probability of Y given X.

With respect to word segmentation in natural language, Saffran et al. (1996a) believed transitional probability would be a reliable cue to word boundaries, since the transitional probability of syllables spanning a word boundary would be low while the transitional probability of syllables within a word would be high. For example, in the sequence “pretty baby,” the transitional probabilities between (1) “pre” and “tty” and (2) “ba” and “by” would be higher than the transitional probability between “tty” and “ba.” Because of this property, they assumed that infants’ ability to track transitional probability would be very useful for word segmentation in real languages (as opposed to the artificial language stimuli used in their study). Interestingly, later studies discovered that transitional probability is perhaps a less useful cue to segmentation in English child-directed speech than originally assumed (Brent 1999a; Yang 2004; Gambell and Yang 2006). The precise way in which infants might use transitional probability information (if at all) for realistic language data therefore remains an open question. Notably, however, the broader claim of Saffran et al. (1996a) was not tied to transitional probability, but instead was that some aspects of acquisition may be “best characterized as resulting from innately biased statistical learning mechanisms rather than innate knowledge” that explicitly constrains the hypothesis space. Tracking syllable transitional probability is clearly one kind of statistical learning mechanism, but it need not be the only one. This led to a revitalized interest in characterizing the statistical learning abilities of children, and what types of acquisition problems could be solved by these abilities.
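The transitional-probability statistic described above (frequency of XY divided by frequency of X) can be sketched over a toy syllable stream; the miniature corpus below is invented for illustration, not drawn from the original stimuli.

```python
# Transitional probability TP(X -> Y) = count(XY) / count(X),
# computed over a stream of syllables.
from collections import Counter

def transitional_probs(syllables):
    """Return TP(x -> y) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

# "pretty baby pretty doggy pretty baby" as a flat syllable stream
stream = ["pre", "tty", "ba", "by", "pre", "tty", "do", "ggy",
          "pre", "tty", "ba", "by"]
tps = transitional_probs(stream)
print(tps[("pre", "tty")])  # within-word pair: high (1.0 here)
print(tps[("tty", "ba")])   # pair spanning a word boundary: lower
```

As in the “pretty baby” example, the within-word pair (“pre,” “tty”) has a higher transitional probability than the boundary-spanning pair (“tty,” “ba”), which is the dip a segmenting learner could exploit.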
Subsequent research has investigated a number of questions raised by these initial studies, particularly the following:

1. What kinds of statistical patterns are human language learners sensitive to?
2. To what extent are these statistical learning abilities specific to the domain of language, or even to humans?
3. What kinds of knowledge can be learned from the statistical information available?

The first question addresses the kinds of biases that are present in the human language learning mechanism, while the second question is important for understanding whether our linguistic abilities fall out from other cognitive abilities, or are better viewed as a cognitively distinct mechanism. The third question explores what can be gained if humans can capitalize on the distributional information available in the data. Many studies have attempted to ascertain the statistical patterns humans are sensitive to. Thiessen and Saffran (2003) discovered that 7-month-olds prefer syllable transitional probability cues over language-specific stress cues when segmenting words, while 9-month-olds show the reverse preference. Graf Estes et al. (2007) found that word-like units that are segmented using transitional probability are viewed by 17-month-olds as better candidates for labels of objects, highlighting the potential utility of transitional probability both for word segmentation and subsequent word–meaning mappings.

Moving beyond the realm of word segmentation, Gómez and Gerken (1999) discovered that 1-year-olds could learn both specific information about word ordering and more abstract information about grammatical categories in an artificial language, based on the statistical cues in the input. Thompson and Newport (2007) discovered that adults can use transitional probability between grammatical categories to identify word sequences that are in the same phrase, a precursor to more complex syntactic knowledge. It is worth pointing out that although most of the experiments described in this section have focused on transitional probability as the statistic of interest, researchers have begun to examine a wider range of statistical cues. These include other simple statistics involving relationships of adjacent units to one another, such as backward transitional probability (Perruchet and Desaulty 2008; Pelucchi et al. 2009b) and mutual information (Swingley 2005b). Another line of work focuses on non-adjacent dependencies, and when these are noticed and used for learning. Newport and Aslin (2004) showed that learners were sensitive to non-adjacent statistical dependencies between consonants and between vowels, using either of these to successfully segment an artificial speech stream (though see Bonatti et al. 2005 and Mehler et al. 2006, who only found a preference for statistical dependencies between consonants rather than for both consonants and vowels). Additionally, learners were unsuccessful when the non-adjacent dependencies were between entire syllables, suggesting a bias in either perceptual or learning abilities. Work by Gómez (2002) has shown that learners are able to identify non-adjacent dependencies between words, but only when there is sufficient variation in the intervening word. This idea is similar to the concept of frequent frames introduced by Mintz (2002).
A frequent frame is an ordered pair of words that frequently co-occur with one word position intervening. For example, the___one is a frame that could occur with big, other, pretty, etc. Mintz suggests that frequent frames could be used by human learners to categorize words because they tend to surround a particular syntactic category (e.g. the___one tends to frame adjectives). Mintz (2002, 2006) demonstrated that both adults and infants are able to categorize novel words based on the frames in which those novel words appear. In addition, recent experimental studies in learning mappings between words and meanings (Yu and Smith 2007; Xu and Tenenbaum 2007; Smith and Yu 2008) suggest that humans are capable of extracting more sophisticated types of statistics from their input. Specifically, the experimental evidence suggests that humans can combine statistical information across multiple situations (though see Medina et al. 2011, for some evidence that learners do not prefer to combine information across situations), and that the statistics they use cannot always be characterized as just transitional probabilities or frequent frames. Yu and Smith (2007) and Smith and Yu (2008) examined the human ability to track probabilities of word–meaning associations across multiple trials where any specific word within a given trial was ambiguous as to its meaning. Importantly, only if human learners were able to combine information across trials could a word–meaning mapping be determined. Both adults (Yu and Smith 2007) and 12- and 14-month-old infants (Smith and Yu 2008) were able to combine probabilistic information across trials. So, both adults and infants can learn the appropriate word–meaning mappings, given data that are uninformative within a trial but informative when combined across trials. Xu and Tenenbaum (2007) investigated how humans learn the appropriate set of referents for basic (cat), subordinate (tabby), and superordinate (animal) words, a task that has traditionally been considered a major challenge for early word learning (e.g. Markman 1989; Waxman 1990) because these words overlap in the referents they apply to (a tabby is a cat, which is an animal)—an example of the subset problem. Previously, it was assumed that children had an innate bias to prefer the “basic level” in order to explain children’s behavior (Markman 1989). One sophisticated statistical inference that can help with this problem is related to what Xu and Tenenbaum call a suspicious coincidence, and is tied to how well the observed data accord with a learner’s prior expectations about word–meaning mappings. For example, suppose we have a novel word blick, and we encounter three examples of blicks, each of which is a cat. The learner at this point might (implicitly) have two hypotheses (blick = animal, blick = cat), and expectations associated with these two hypotheses. Specifically, if blick = animal, other kinds of animals besides cats should be labeled blicks sometimes because the set of blicks is larger than just the set of cats. In the language of our introduction, the fact that three blicks were labeled and all of them were cats provides indirect negative evidence against the hypothesis that blick means animal. Or, in Xu and Tenenbaum’s terminology, it is a suspicious coincidence to see three cats if blick really means animal. Instead, it is more likely that blick is a “basic” label that is more specific, in this case cat.
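The suspicious-coincidence reasoning above can be given a minimal numerical sketch. The set sizes below are invented for illustration (they are not Xu and Tenenbaum’s actual stimulus counts): if each labeled example is sampled from the extension of the hypothesized meaning, a smaller hypothesis assigns each consistent example a higher probability, and the advantage compounds with every example.

```python
# Sketch of the "size principle": each example is assumed to be sampled
# uniformly from the hypothesis's extension, so a narrower hypothesis
# explains consistent data better, and increasingly so with more examples.

def likelihood(n_examples, hypothesis_size):
    # Probability of n consistent examples under uniform sampling
    # from a set of the given (hypothetical) size.
    return (1.0 / hypothesis_size) ** n_examples

n_cats_seen = 3
size_cat, size_animal = 10, 100   # hypothetical extension sizes

lik_cat = likelihood(n_cats_seen, size_cat)        # (1/10)^3
lik_animal = likelihood(n_cats_seen, size_animal)  # (1/100)^3

# With equal priors, posterior odds equal the likelihood ratio:
print(lik_cat / lik_animal)
```

With these numbers the odds favor blick = cat by three orders of magnitude after just three cat examples; seeing three cats really would be a suspicious coincidence if blick meant animal.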
Xu and Tenenbaum (2007) discovered that both adults and children between the ages of 3 and 5 can use suspicious coincidences like this to infer the appropriate meaning of a novel word like blick. This suggests that humans are indeed able to perform this sophisticated statistical inference. Turning to the question of domain-specificity for human statistical learning abilities, Saffran et al. (1999) showed that both infants and adults can segment non-linguistic auditory sequences (musical tones) based on the same kind of transitional probability cues that were used in the original syllable-based studies. Similar results have been obtained in the visual domain using both temporally ordered sequences of stimuli (Kirkham et al. 2002) and spatially organized visual “scenes” (Fiser and Aslin 2002a). Conway and Christiansen (2005) adapted the grammar from Gómez and Gerken’s (1999) experiments to explore learning in different modalities: auditory, visual, and tactile. They showed that adults could learn grammatical generalizations in all three modalities, although there was a quantitative benefit to the auditory modality, as well as some qualitative differences in learning. These results (particularly those in the tactile modality, which is not used in natural languages) support the idea that the kinds of statistical learning seen in the earlier artificial language studies are highly domain-general, showing robustness across modalities and presentation formats. Another way of investigating whether particular learning abilities could in principle be specific to language is by comparing learning across species. If non-human animals are able to learn the same kinds of generalizations as humans, then whatever cognitive mechanism is responsible must not be a linguistic one. To this end, Hauser et al. (2001) exposed cotton-top tamarins to the same kind of artificial speech stimuli used in the original Saffran et al. (1996a) segmentation experiments, and found that the monkeys were able to perform the task as well as infants. Saffran et al. (2008) later found that tamarins could also learn some simple grammatical structures based on statistical information, but were unable to learn patterns as complex as those learned by infants. This suggests that infants’ abilities to extract information from statistical patterns are more powerful than those of other animals. Additional evidence is provided by the experiments of Toro and Trobalon (2005), who showed that rats were able to segment a speech stream based on syllable co-occurrence frequency (similar to the mutual information explored in Swingley 2005b), but not transitional probability alone. The rats also showed no evidence of learning generalizations from non-adjacent dependencies such as those in the Gómez (2002) experiments, or abstract rules as in Marcus et al. (1999). The main lesson from the experimental evidence reviewed in this section is that children do seem capable of using statistical information in their language input, from tracking simple statistical cues like transitional probability to making sophisticated inferences that combine ambiguous information from multiple data sources. To learn more about the abilities and biases of human learners, researchers continue to investigate the statistical information humans are sensitive to, and what kinds of generalizations are learned from this information. In addition, experiments using other modalities, domains, and species can help to shed light on the question of whether these abilities are domain-specific or domain-general.
This kind of experimental research is undoubtedly important for our understanding of the role of statistical learning in language acquisition. However, the third question of what knowledge can be learned from the statistical information available can be addressed more easily, or in a complementary fashion, through other research methodologies such as Bayesian modeling, which we turn to in the next section.

28.2  An Introduction to the Bayesian Modeling Framework

As noted in the introduction, Bayesian modeling offers a concrete way to examine what knowledge is required for acquisition, without committing a priori to a particular view about the nature of that knowledge. It also addresses the question of whether human language learners can be viewed as being optimal learners in a sense that will become clearer once we formalize the approach. We expand on both of these points in this section, starting with a conceptual introduction to the Bayesian approach that explains the kinds of questions it can answer and how these differ from the questions addressed by other approaches. Next, we describe the formal implementation of a Bayesian model, in particular how it operates over an explicitly defined hypothesis space. We then highlight some attractive features of Bayesian models with respect to the kind of hypothesis spaces they can operate over, and conclude with a brief discussion of some of the algorithms commonly used in Bayesian modeling.

28.2.1 Bayesian Modeling as a Computational-level Approach

Most models of language acquisition are procedural: they hypothesize specific procedures or algorithms that can be applied to the input and/or grammar in order to produce linguistically meaningful generalizations. For example, learners might segment words by identifying syllable sequences with high frequency and mutual information (Swingley 2005b), create a grammatical category by grouping together words that share a frequent frame (Mintz 2003; Wang and Mintz 2008; Chemla et al. 2009), use back-propagation to change the set of weights in a neural network (Elman 1990, 1993), or demote the ranking of a constraint if it causes an error in parsing the input (Boersma and Hayes 2001; Tesar and Smolensky 2000; Prince and Tesar 2004). These kinds of models provide what Marr (1982) calls algorithmic-level explanations, focusing on the question of how learners generalize from their input. In contrast, the Bayesian approach investigates the problem of language acquisition at Marr’s (1982) computational level of analysis, seeking answers to the questions of what computational problem is being solved and why the learner ends up with a particular solution. This kind of investigation calls for a declarative (rather than procedural) model of the learner. That is, in designing a Bayesian model, the researcher considers what the nature of the learning task is (i.e. what does the learner need to achieve), what sources of information are available, and what the inductive biases of the learner are (i.e. what kinds of generalizations/grammars are easy, difficult, or impossible to learn). It is then possible to ask what will be learned, given particular assumptions about these aspects of the problem and also assuming that the learner behaves optimally under those assumptions.
This kind of approach is often referred to as an ideal observer (or ideal learner) analysis, since it explores the solutions that would be found by an idealized optimal learner capable of extracting the necessary statistical information from the input. The idea of optimality leads naturally to the use of probability theory for defining Bayesian models, because probability theory is a tool for determining optimal behavior under uncertainty. Some readers may not be comfortable with the idea of humans as optimal statistical learners, especially since well-known early studies in other areas of cognition suggested just the opposite (Cascells et al. 1978). However, the rational analysis view of cognition (Anderson 1990; Chater and Oaksford 1999) has countered by arguing that human behavior is adapted to our natural environment and the tasks we must achieve there—thus, “optimal” behavior must be interpreted within that context, rather than within the context of a laboratory experiment. Behavioral and modeling work has supported the idea of humans as optimal learners in areas such as numerical cognition (Tenenbaum 1996), causal induction (Griffiths and Tenenbaum 2005), and categorization (Kemp et al. 2007). More recently, evidence has begun to accumulate in language acquisition as well (Xu and Tenenbaum 2007; Feldman et al. 2009a). Whether or not humans behave optimally in all situations, the kind of ideal learner analysis provided by a Bayesian model is still useful for answering two kinds of questions. First, the question of learnability: what is possible to learn from the available input, given particular assumptions about the learner’s inductive biases? Second, once we have identified the optimal solution to the problem as defined by the model, we can ask whether human behavioral data are consistent with the model’s predictions. If so, then we have helped to explain why humans behave in this way—it is the optimal response to the data they are exposed to. If not, then we can begin to investigate how and why humans might differ from the optimal behavior (Goldwater et al. 2009; Frank et al. 2010). Although these are worthwhile questions to investigate, some researchers still find the Bayesian approach unsatisfying because of its focus on computational-level explanations. In particular, Bayesian models often do not address how the learner might perform the computations required to achieve the optimal solution to the learning problem, even if such a solution were achieved. Rather, they simply state that if human behavior accords with the predictions of the model, then humans must be performing some computation (possibly a very heuristic one) that allows them to identify the same optimal solution that the model did. We discuss this issue further, including some responses to it, in section 28.2.4.

28.2.2 Formalizing the Bayesian Approach

Bayesian models assume the learner comes to the task with some space of hypotheses H, each of which represents a possible explanation of the process that generated the input data. The hypothesis space could be discrete (e.g. a finite or infinite set of symbolic grammars) or continuous (e.g. a set of real-valued parameters representing the tongue positions necessary to produce a particular set of vowels). Given the observed data d, the learner’s goal is to determine the probability of each possible hypothesis h, that is, to estimate P(h|d), the posterior distribution over hypotheses. A correct estimate of the posterior distribution will allow the learner to behave optimally in the future, that is, to have the best chance of interpreting and/or generating new data in accordance with the true hypothesis (the one that actually generated the observed data). Rather than estimating P(h|d) directly, we first apply Bayes’ Theorem, derived from the axioms of probability theory, to reformulate it as in (1):

(1) Bayes’ Theorem

    P(h|d) = P(d|h) P(h) / P(d)
Statistical Learning, Inductive Bias, and Bayesian Inference    675

where P(d|h), the likelihood, expresses how well the hypothesis explains the data, and P(h), the prior, expresses how plausible the hypothesis is regardless of any data. P(d), the evidence, is a constant normalizing factor that ensures that P(h|d) is a proper probability distribution, summing to 1 over all values of h. Often we only care about the relative probabilities of different hypotheses, in which case we can ignore the denominator and simply write Bayes' Theorem as a proportionality, as in (2):

(2) P(h|d) ∝ P(d|h) P(h)

Defining a Bayesian model usually involves three steps:

1. Defining the hypothesis space: Which hypotheses does the learner consider?
2. Defining the prior distribution over hypotheses: Which hypotheses is the learner biased towards or against?
3. Defining the likelihood function: How are the observed data generated under a given hypothesis?

A simple example, adapted from Griffiths and Yuille (2006), should help to clarify these ideas. Suppose you are given three coins, and told that two of them are fair, and one produces heads with probability 0.9. You choose one of the coins at random and must determine whether it is fair or not, that is, whether θ (the probability of heads) is 0.5 or 0.9. Thus, the hypothesis space contains two hypotheses: h0 (θ = 0.5) and h1 (θ = 0.9), with P(h0) = 2/3 and P(h1) = 1/3. Data are obtained by flipping the coin, with the probability of a particular sequence d of flips containing s heads and t tails being dependent on θ, as P(d|θ) = θ^s (1−θ)^t. For example, if θ = 0.9, then the probability of the sequence HHTTHTHHHT is 0.0000531. If θ = 0.5, then the same sequence has probability 0.000977. To determine which hypothesis is more plausible given that particular sequence, we can compute the posterior odds ratio as in (3):

(3) Posterior Odds Ratio

    P(h1|d) / P(h0|d) = [P(d|h1) P(h1) / P(d)] / [P(d|h0) P(h0) / P(d)]
                      = P(d|h1) P(h1) / P(d|h0) P(h0)
                      = (0.0000531)(1/3) / (0.000977)(2/3)
                      ≈ 1/37

This tells us that the odds in favor of h0 are roughly 37:1. Note that the P(d) (evidence) term cancels, so we do not need to compute it. This very simple example illustrates how to compare the plausibility of two different hypotheses, but in general the same principles can be applied to much larger and more complex hypothesis spaces (including countably infinite spaces), such as might arise in language acquisition. With minor modifications, we can also use similar methods

676    Lisa Pearl and Sharon Goldwater

to compare hypotheses in a continuous (uncountably infinite) space (see Griffiths and Yuille 2006 for a more explicit description of the modifications required). Such a space might occur in a syntax-learning scenario if we suppose that the hypotheses under consideration consist of probabilistic context-free grammars (PCFGs), with different grammars varying both in the rules they contain and in the probabilities assigned to the rules.6 The input data in this situation could be a corpus of sentences in the language, with P(d|h) determined by the rules for computing string probabilities under a PCFG (Chater and Manning 2006). P(h) could incorporate various assumptions about which grammars the learner might be biased towards—for example, grammars with fewer rules, or grammars that incorporate linguistically universal principles. See section 28.3 below for more detailed examples of how these ideas can be applied to language acquisition.
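
The two-coin computation in (3) can be reproduced in a few lines. Only the likelihood formula and the priors come from the text; everything else follows directly from Bayes' Theorem:

```python
# The two-coin example worked through in (3). The likelihood
# P(d | theta) = theta**s * (1 - theta)**t is from the text.

def likelihood(theta, heads, tails):
    """Probability of one specific flip sequence with these counts."""
    return theta ** heads * (1 - theta) ** tails

priors = {"h0": 2 / 3, "h1": 1 / 3}   # two fair coins, one biased
thetas = {"h0": 0.5, "h1": 0.9}

heads, tails = 6, 4  # the sequence HHTTHTHHHT

# P(h | d) ∝ P(d | h) P(h); the evidence P(d) cancels in the odds ratio.
unnorm = {h: likelihood(thetas[h], heads, tails) * priors[h] for h in priors}
odds_h0 = unnorm["h0"] / unnorm["h1"]
print(f"odds in favor of h0: about {odds_h0:.0f} to 1")  # about 37 to 1
```

Because only the ratio is needed, the normalizing constant P(d) never has to be computed, just as noted in the text.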

28.2.3 Bayesian Hypothesis Spaces

As mentioned above, a Bayesian learner can operate over a variety of hypothesis spaces (discrete, continuous, countably infinite, uncountably infinite, etc.) without changing the underlying principles of the approach. Another useful property of Bayesian models is that the hypothesis space can be highly structured, supporting multiple levels of linguistic representation simultaneously. For example, the word segmentation model of Goldwater et al. (2006, 2009) contains two levels of representation—words and phonemes—though only one of these (words) is unobserved in the input and must be learned. However, Bayesian models can in principle learn multiple levels of latent structure simultaneously, and doing so can even improve their performance. For example, M. Johnson (2008) showed that learning both syllable structure and words from unsegmented phonemic input improved word segmentation in a Bayesian model similar to that of Goldwater et al. (2009). Feldman et al. (2009a) compared two Bayesian models of phonetic category acquisition to demonstrate that simultaneously learning phonetic categories and the lexical items containing those categories led to more successful categorization than learning phonetic categories alone. Dillon et al. (2011) also compared two Bayesian models of phonetic category learning: one that first learned phonetic categories and would later identify allophones and phonological rules based on those phonetic categories, and one that learned all the information (phonetic categories, allophones, and rules) at once. Again, the joint learner was more successful.

By allowing us to build such joint models and compare them to staged learning models, the Bayesian approach is helpful for understanding the process of bootstrapping—using preliminary or uncertain information in one part of the grammar to help constrain learning in another part of the grammar, and vice versa.

6  Since probabilities are represented using real numbers, the hypothesis space is continuous; if the learner is assumed to acquire a non-​probabilistic grammar, then the hypothesis space consists of a discrete set of grammars.

In addition to including multiple levels of structure, the predefined hypothesis space of a Bayesian learner can be instantiated very abstractly, which should appeal to generative linguists who believe abstract linguistic parameters determine much of the constrained variation observed in the world's languages (Chomsky 1981b). Kemp et al. (2007) and Kemp and Tenenbaum (2008) discuss overhypotheses in Bayesian modeling, where overhypotheses refer to strong inductive constraints on possible hypotheses in the hypothesis space (Goodman 1955). This idea is intuitively similar to the classic notion of a linguistic parameter as an abstract (structural) property that constrains the hypothesis space of the learner.

To see how, consider first a very simple example illustrating the idea of an overhypothesis, taken from Goodman (1955) and presented in Kemp et al. (2007). Suppose a learner is presented with several bags of marbles, where marbles can be either black or white. During training, the learner is allowed to examine all of the marbles in each bag, and finds that each bag contains either all black or all white marbles. During testing, the learner draws only a single marble from a bag and must predict the color distribution in the bag. Possible hypotheses are that the bag contains all black marbles, all white marbles, 70 percent black and 30 percent white, or any other combination. Possible overhypotheses are that all bags contain a uniform color distribution, all bags contain the same distribution, all bags contain a mixture of colors, etc. By observing (during training) several different bags that all have uniform color distributions, the learner learns to assign high probability to the overhypothesis of uniform color distribution.
This overhypothesis in turn constrains the hypotheses for individual new bags observed: high probability is given to "all black" and "all white" before ever observing a marble from the bag, while low probability is given to hypotheses like "70 percent black and 30 percent white." This example demonstrates how information can be used indirectly (i.e. at a very abstract level) to make predictions: observing all-black bags and all-white bags licenses the prediction that a bag with mixed black and white marbles has a low probability of occurring.

Translating this to a linguistic example, suppose the marble bags are individual sentences. Some sentences contain verbs and objects, and whenever there is an object, suppose it appears after the verb (e.g. see the penguin, rather than the penguin see). Other sentences contain modal verbs and nonfinite main verbs, and whenever both occur, the modal precedes the main verb (e.g. could see, rather than see could). A learner could be aware of the shared structure of these sentences—specifically that these observable forms can be characterized as the head of a phrase appearing before its complements (specifically [VP V NP] and [IP Aux VP])—and could encode this "head-first" knowledge at the level that describes sentences in general, akin to a hierarchical Bayesian learner's overhypothesis. This learner would then infer that sentences in the language generally have head-first structure. If this learner then saw a sentence with a preposition surrounded by NPs (e.g. penguins on icebergs are cute), it would infer that the preposition should precede its object ([NP penguins [PP on icebergs]] and not [NP [PP penguins on] icebergs]), even if it had never before seen a preposition used with an object. In this way, the notion of a head-directionality parameter can be encoded in a Bayesian learner at the level of an overhypothesis. In particular, inferences (e.g.
about prepositional phrase internal structure) can be made on the basis of examples that bear on the general

property being learned (e.g. head directionality), even if those examples are not examples of the exact inference to be made (e.g. verb phrase examples).
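
The marble example above can be sketched as a comparison between two overhypotheses. The equal prior, bag size, number of training bags, and the specific pair of overhypotheses compared here are illustrative assumptions, not values from the text:

```python
# Comparing two overhypotheses for the marble example: UNIFORM (every bag
# is all one color) and IID (each marble's color is an independent fair
# coin flip). Priors and bag sizes are illustrative assumptions.

def bag_likelihood(bag, overhypothesis):
    """P(this exact marble sequence | overhypothesis)."""
    if overhypothesis == "uniform":
        # The bag is all-black or all-white with prob 1/2 each; a
        # mixed-color sequence is impossible under this overhypothesis.
        return 0.5 if len(set(bag)) == 1 else 0.0
    else:  # "iid": each marble is black or white independently, prob 1/2
        return 0.5 ** len(bag)

# Training: four bags of ten marbles, each homogeneous in color.
training = [["B"] * 10, ["W"] * 10, ["B"] * 10, ["W"] * 10]

posterior = {"uniform": 0.5, "iid": 0.5}  # equal prior over overhypotheses
for bag in training:
    for h in posterior:
        posterior[h] *= bag_likelihood(bag, h)
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

print(round(posterior["uniform"], 6))  # 1.0: the uniform overhypothesis wins
```

With the uniform overhypothesis dominant, a single black marble drawn from a new bag predicts an all-black bag, exactly the test behavior described above. The linguistic analogue replaces bags with sentences and color uniformity with head direction.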

28.2.4 Algorithms

It is worth reiterating that, unlike neural networks and other algorithmic-level models such as those of Mintz (2003), Swingley (2005b), and Wang and Mintz (2008), Bayesian models are intended to provide a declarative description of what is being learned, not necessarily how the learning is implemented. Bayesian models predict a particular posterior distribution over hypotheses given a set of data, and can also be used to make predictions about future data based on the posterior distribution. If human subjects' performance in a task is consistent with the predictions of the model, then we can consider the model successful in explaining what has been learned and which sources of information were used in learning. However, we do not necessarily assume that the particular algorithm used by the model to identify the posterior distribution is the same as the algorithm used by the humans. We only assume that the human mind implements some type of algorithm (as mentioned previously, perhaps a very heuristic one) that is able to approximately identify the posterior distribution over hypotheses.

In practice, most Bayesian models of language acquisition have used Markov chain Monte Carlo algorithms such as Gibbs sampling to obtain samples from the posterior distribution (Geman and Geman 1984; Gilks et al. 1996; see also Resnik and Hardisty 2009 for an accessible tutorial). These are batch algorithms, which operate over the entire data set simultaneously. This is clearly an unrealistic assumption about human learners, who must process each data point as it is encountered, and presumably do not revisit or reanalyze the data at a later time (or at most, are able to do so only to a very limited degree). If humans are indeed behaving as predicted by Bayesian models, they must be using a very different algorithm to identify the posterior distribution over hypotheses—an algorithm about which most Bayesian models have nothing to say.
Researchers who are particularly concerned with the mental mechanisms of learning often find the Bayesian approach unsatisfactory precisely because, in its most basic form, it does not address the question of mechanisms (see e.g. McClelland et al. 2010 for such a critique, and Griffiths et al. 2010 for a reply). In response to this kind of critique, a recent line of work has begun to address the question of how learners might implement Bayesian predictions in a more cognitively plausible way. These kinds of models are sometimes called rational process models: models of rational learners that are concerned with the process by which Bayesian inference is approximated. For example, Shi et al. (2010) discuss how exemplar models may provide a possible mechanism for implementing Bayesian inference, since these models allow an approximation process called importance sampling. Other examples include the work of Bonawitz et al. (2011), who discuss how a simple sequential algorithm can be used to approximate Bayesian inference in a basic causal learning task, and that of Pearl et al. (2011), who (as will be described in section 28.3.2) investigated various online algorithms for Bayesian

models of word segmentation. See also McClelland (1998) for a discussion of how neural network architectures can be used to approximate optimal Bayesian inference (again emphasizing that the connectionist and Bayesian frameworks are not so much in opposition as they are addressing different aspects of the learning problem, with one focusing on the description of the task and the other focusing on the implementation).
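
To make the importance-sampling idea concrete, here is a toy approximation of the posterior over θ for the coin data of section 28.2.2. The uniform prior over θ and the number of exemplars are illustrative assumptions:

```python
import random

# Importance sampling as a stand-in for exact inference (cf. Shi et al.
# 2010): draw exemplar hypotheses from the prior and weight each by its
# likelihood, rather than computing the posterior analytically.

random.seed(0)

def likelihood(theta, heads, tails):
    return theta ** heads * (1 - theta) ** tails

heads, tails = 6, 4  # the coin data from section 28.2.2

# Exemplars drawn from a uniform prior on [0, 1], weighted by likelihood.
samples = [random.random() for _ in range(10_000)]
weights = [likelihood(t, heads, tails) for t in samples]

# The weighted average approximates the posterior mean of theta.
post_mean = sum(t * w for t, w in zip(samples, weights)) / sum(weights)
print(round(post_mean, 2))  # should be close to the Beta(7, 5) mean, 7/12 ≈ 0.58
```

The point of the sketch is that a collection of stored exemplars plus simple weighting can approximate the posterior without any explicit application of Bayes' Theorem, which is what makes such process models cognitively attractive.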

28.3  Specific Example Studies

This section surveys a few representative studies in different areas of language acquisition in order to illustrate how Bayesian modeling can be applied within each domain. For each study, we review the problem faced by the learner, describe the hypothesis space assumed by the model and how Bayesian inference operates within it, and discuss the results with reference to relevant behavioral data.

28.3.1 Phonetics and Phonology

Feldman et al. (2009b), Feldman (2011), and Feldman et al. (2013) address the question of phonetic category acquisition, specifically the acquisition of vowel categories. This is a difficult problem because of the variation in acoustic properties between different tokens of the same vowel, even when spoken by the same speaker. Although the mean formant values of different vowel categories are different, the distributions of values overlap considerably; for example, a particular token of /e/ may sound exactly like a token of /ε/, even if spoken by the same individual. Figure 28.1 illustrates this variation in men's vowel sounds.

[Figure 28.1 here: a plot of men's vowel tokens by first formant (Hz) against second formant (Hz), showing heavily overlapping category distributions.]

Figure 28.1  Example distribution of men's vowel sounds. Many vowel sounds have overlapping distributions, such as /e/ and /ε/. (Reproduced from Feldman et al. 2009b.)

Experimental studies suggest that infants are able to learn separate phonetic categories for speech sounds that occur with a clear bimodal distribution (Maye et al. 2002; Maye and Weiss 2003), but the extent of overlap between phonetic categories in real speech suggests that some categories might be difficult to distinguish in this way. Instead, Feldman and colleagues hypothesize that learners must make use of an additional source of information beyond the acoustic properties of individual sounds; specifically, they take into account the words those sounds occur in (an idea also advocated by Swingley 2009a). Of course, young infants who are still learning the phonology of their language have very little lexical knowledge. Indeed, Feldman et al. (2009b) review experimental studies suggesting that phonetic categorization and word segmentation (a precursor to word-meaning mapping) occur in parallel, between the ages of 6 and 12 months. So, rather than assuming either that phonetic categories are acquired first and then used to learn words, or that words are acquired first and then used to disambiguate phonetic categories, Feldman et al. propose a joint model in which phonetic categories and word forms are learned simultaneously. They compare this model to a simpler baseline model in which phonetic categories alone are learned. We describe each of these models briefly before reviewing the results.

Feldman et al.'s (2009b) baseline model is a distributional model of categorization: it assumes that phonetic categories can be identified based on the distribution of sounds in the data. In particular, it assumes that the tokens in each phonetic category have a Gaussian (normal) distribution, and the goal of the learner is to identify how many categories there are, and which sounds belong to which categories. Since the number of categories is unknown, Feldman et al.
use a Dirichlet process prior (Ferguson 1973), a distribution over categories that does not require the number of categories to be known in advance. The Dirichlet process favors categorizations that contain a smaller number of categories, unless the distributional evidence suggests otherwise. In other words, if there is good reason to assume that a set of sounds are produced from two different categories (e.g. because they have a strongly bimodal distribution, leading to a low likelihood if collapsed into a single Gaussian category), then the model will split the sounds into two categories; otherwise it will assign them to a single category.

Feldman et al.'s second model is a lexical-distributional model, which assumes that the input consists of acoustically variable word forms rather than just sequences of phonetic tokens (i.e. that the child recognizes that the phonetic tokens are part of larger units). The learner now has two goals: to find phonetic categories (as in the distributional learner) and also to recognize acoustically distinct word forms as variants of the same lexical item, grouping together tokens that contain the same sequence of phones. Note that these two tasks are interdependent. On the one hand, the categorization of phonetic tokens affects which word tokens are considered to be the same lexical item. On the other hand, if two word tokens are assigned to the same lexical item, then the acoustic tokens comprising them should belong to the same phonetic categories. The hypothesis space for this model thus consists of pairs of categorizations (of acoustic tokens into phonetic categories, and word forms into lexical items). Since the lexical

learning task is also viewed as categorization, it is modeled using another Dirichlet process, which prefers lexicons containing fewer items when possible.

Using a small hand-constructed data set, Feldman et al. (2009b) show that the lexical-distributional model makes a counterintuitive prediction about minimal pairs (i.e. words that differ by a single phoneme). Specifically, if a pair of sounds (say, B and C) only occur within minimal pairs (say, lexical items AB, AC, DB, DC), then they are likely to be categorized as a single phoneme if they are acoustically similar, since this would reduce the size of the lexicon, replacing four words with two (AX, DX). On the other hand, if B and C occur in different contexts (say, AB and DC only), then they are more likely to be categorized as separate phonemes. This is because the lexical-distributional learner can use phonemes A and D to recognize that AB and DC are different words, and then use this information to recognize that the distributions of B and C are actually slightly different.

This prediction is interesting for two reasons. First, it means that the lack of minimal pairs in early vocabularies (e.g. see Dietrich et al. 2007) may actually be helpful. Secondly, recent experiments by Thiessen (2007) seem to bear out the model's prediction in a word learning task with 15-month-olds: infants are better at discriminating similar-sounding object labels (e.g. daw versus taw) after being familiarized with non-minimal pairs containing the same sounds (dawbow, tawgoo).

In a second simulation, Feldman et al. (2009b) compared the performance of their distributional model, lexical-distributional model, and a second distributional model (Vallabha et al. 2007) on a larger corpus containing 5000 word tokens from a hypothetical set of lexical items containing only vowels (e.g.
“aei”—​vowel-​only words were necessary because the model can only learn vowel categories). Both of the distributional models identified too few phonetic categories, collapsing highly overlapping categories into one category. In contrast, the lexical-​distributional learner was much more successful in distinguishing between very similar categories. Although these results are preliminary and still need to be extended to more realistic lexicons, they provide intriguing evidence that simultaneously learning linguistic generalizations at multiple levels (phonetic categories and word forms) can actually make the learning problem easier than learning in sequence. Dillon et al. (2011) have also recently studied the acquisition of phonemes and phonological rules from acoustic data, using a Bayesian model. Like Feldman et al. (2009b, 2013), they recognize that word forms are comprised of phonetic categories. However, they also note that a phoneme is a more abstract representation that may relate multiple phonetic categories across word forms (known as allophones of the phoneme). For example, in Spanish, there is a single phoneme /​b/​that is realized in distinct ways, depending on the surrounding linguistic context: between two vowels, it is pronounced as the fricative /​β/​, while in all other contexts, it is pronounced as the stop /​b/​. These two pronunciations are distinct phonetic categories that appear in different lexical items. The Lexical-​Distributional Model of Feldman et al. would likely recognize these as separate phonetic categories precisely because they appear in different linguistic contexts. However, it would not recognize them as being allophones of the phoneme /​b/​, related by a phonological rule that is conditioned on the surrounding phonemes;

instead, learning that mapping from two phonetic categories to a single phoneme would occur in a subsequent stage of learning. While this is a reasonable model of acquisition, it nonetheless implies a learning sequence where learning phonetic categories must happen before learning phonemes and phonological rules. Dillon et al. (2011) explore whether relaxing this assumption could lead to better learning.

In particular, Dillon et al. (2011) investigate the vowel system of Inuktitut, which has three phonemes with two allophones each (for a total of six phonetic categories). This kind of vowel system is not uncommon in the world's languages (e.g. it is shared by Quechua and many dialects of Arabic), and so represents a realistic learning problem. They compare a learner that attempts to first identify the phonetic categories from the acoustic data (and would only later hypothesize phonemes and phonological rules) to a learner that learns phonetic categories, phonemes, and phonological rules simultaneously. Their findings are similar to Feldman et al.'s more general finding: learning multiple levels of representation simultaneously can be a better strategy than trying to learn them in sequence.

In particular, Dillon et al. (2011) found that the learner that only identifies phonetic categories will converge on phonetic categories that make it much harder to formulate the correct phonological rule (and so define the correct phonemes). This problem occurs because the learner disregards the linguistic context when identifying its categories, and uses only the acoustic information. In contrast, if the learner is trying to identify context-sensitive phonological rules at the same time that it is identifying phonetic categories, then it views the linguistic context as informative.
This learner identifies phonetic categories that are conducive to formulating phonological rules based on linguistic context; it can then find the correct phonemes (and allophones) for the language.

An interesting acquisition trajectory prediction that comes from Dillon et al.'s (2011) single-stage model is that children should have some knowledge of phonemes even while they are learning the phonetic categories of their language, as opposed to passing through a preliminary stage where they have solid knowledge of phonetic categories but little knowledge of phonological rules and phonemes. To our knowledge, the infant perceptual experimental literature does not currently distinguish between these possibilities, which suggests an area of future research.
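
The distributional intuition underlying these models (a strongly bimodal set of tokens is far more probable under two Gaussian categories than under one collapsed category) can be illustrated with simulated formant values. The means, standard deviations, and mixture weights below are invented for illustration, not taken from Feldman et al.'s data:

```python
import math
import random

# Bimodal first-formant tokens are compared under two hypotheses:
# one collapsed Gaussian category vs. a 50/50 mixture of two categories.
# All numeric settings here are illustrative assumptions.

random.seed(1)

def log_gauss(x, mu, sd):
    """Log density of a Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

# Simulated first-formant tokens from two vowel categories.
tokens = ([random.gauss(400, 40) for _ in range(100)] +
          [random.gauss(700, 40) for _ in range(100)])

# Hypothesis A: one collapsed category, fit by the mean and sd of all tokens.
mu = sum(tokens) / len(tokens)
sd = math.sqrt(sum((x - mu) ** 2 for x in tokens) / len(tokens))
ll_one = sum(log_gauss(x, mu, sd) for x in tokens)

# Hypothesis B: two categories (the generating ones) with a 50/50 mixture.
def log_mix(x):
    p = 0.5 * math.exp(log_gauss(x, 400, 40)) + 0.5 * math.exp(log_gauss(x, 700, 40))
    return math.log(p)

ll_two = sum(log_mix(x) for x in tokens)
print(ll_two > ll_one)  # True: splitting the bimodal data is strongly favored
```

The Dirichlet process prior in the actual models adds a countervailing pressure toward fewer categories, so a split is only made when the likelihood advantage, as here, is large enough to overcome it.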

28.3.2 Word Segmentation

There have been a number of recent papers on Bayesian modeling of word segmentation. These are all based on the models presented in Goldwater (2006) and Goldwater et al. (2009), which make the simplifying assumption (shared by most other computational models of word segmentation) that the input to the learner consists of a sequence of phonemes, with each word represented consistently using the same sequence of phonemes each time it occurs. Between-utterance pauses are represented as spaces (known word boundaries) in the input data, but other word boundaries are not represented. So, the input corresponding to the two utterances “see the kitty? look at the

kitty!,” transcribed using the phonemic representation used by Goldwater et al., would be siD6kIti lUk&tD6kIti (or, represented orthographically for readability, seethekitty lookatthekitty). The hypothesis space considered by the learner consists of all possible segmentations of the data (e.g. seethekitty lookatthekitty, s e e t h e k i t t y l o o k a t t h e k i t t y, seet he k itty loo k att he k itty, see the kitty look at the kitty, etc.). In this model, P(d|h) is 1 for all of these segmentations because they are all completely consistent with the unsegmented data (in the sense that concatenating the words together produces the input data).7 Consequently, the segmentation preferred by the model is the one with the highest prior probability. The prior is defined, as in the Feldman et al. (2009b) models, using a Dirichlet process, which assigns higher probability to segmentations that contain relatively few word types, each of which occurs frequently and contains only a few phonemes. In other words, the model prefers segmentations that produce smaller lexicons with shorter words.

Goldwater et al.'s (2009) computational studies were purely theoretical, with the aim of examining what kinds of segmentations would be preferred by a learner making the assumptions just outlined, as well as one of two additional assumptions: either that words are statistically independent units (a unigram model), or that words are units that predict each other (implemented in this case using a bigram model). While it is clear that the second of these assumptions holds in natural language, the first assumption is simpler (because the learner only needs to track individual words, rather than dependencies between words). So, if infants' ability to track word-to-word dependencies is limited, then it is worth knowing whether the simpler model might allow them to achieve successful word segmentation anyway.
Goldwater et al. found that the optimal segmentation for their unigram model (in fact for any reasonable unigram model) is one that severely undersegments the input data—the word boundaries it finds tend to be very accurate, but it does not find as many boundaries as actually exist. Thus, it produces “chunks” that contain more than one word. The bigram model is nearly as precise when postulating boundaries, but identifies far more boundaries overall, leading to a more accurate segmentation.

This study is a good example of an ideal observer analysis, demonstrating the behavior of optimal statistical learners given the available input and certain assumptions about the capabilities of the learners (i.e. whether the learner knows to track word-to-word dependencies or not). Although there is some evidence that children do undersegment when learning words (Peters 1983), it is not clear whether they do so to the same degree as the unigram model, whether their segmentations are more similar to the bigram model, or neither. Thus this study by itself does not tell us whether human behavior is actually consistent with either of the proposed ideal learners, or in what situations, or how more limited (non-ideal) learners might differ from the ideal.

7  In fact, the full hypothesis space for the model consists of all possible sequences of potential words, including those that are inconsistent with the observed data, such as have some pizza and gix blotter po nzm. However, since these sequences are inconsistent with the data, P(d|h) = 0, so these hypotheses can be disregarded.
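
The unigram learner's preference can be demonstrated by brute force on a tiny corpus: enumerate every segmentation and score it with a simplified CRP-style unigram prior. The base distribution over novel words and the concentration value are simplified stand-ins for the Dirichlet process model, not the published settings; even so, the winning segmentation is the undersegmented chunk, matching the behavior described above:

```python
# Brute-force unigram segmentation: score every segmentation of a tiny
# corpus under a CRP-style prior that makes reused words cheap and charges
# novel words per letter. ALPHA and P_SYMBOL are illustrative assumptions.

ALPHA = 1.0
P_SYMBOL = 0.5 / 26  # per-letter cost of a novel word, incl. a length penalty

def segmentations(s):
    """Yield every way of splitting s into contiguous words."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in segmentations(s[i:]):
            yield [s[:i]] + rest

def score(words):
    """Sequential CRP probability of a word sequence."""
    counts, total, p = {}, 0, 1.0
    for w in words:
        base = P_SYMBOL ** len(w)
        p *= (counts.get(w, 0) + ALPHA * base) / (total + ALPHA)
        counts[w] = counts.get(w, 0) + 1
        total += 1
    return p

corpus = "thekittythekitty"  # 'the kitty the kitty' with boundaries removed
best = max(segmentations(corpus), key=score)
print(best)  # ['thekitty', 'thekitty']: the unigram learner undersegments
```

Because the second occurrence of a known word is nearly free while every novel word pays a per-letter cost, the model prefers one reused chunk over the correct two-word analysis, which would introduce a second novel type.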

Follow-up work by Goldwater and colleagues has begun to address these questions through experimental and computational studies. In the work of Frank et al. (2007, 2010), the authors examine the predictions of Goldwater et al.'s (2009) unigram word segmentation model, as well as that of several other models, and compare these predictions to human performance in several experiments.8 The experiments are modeled on those of Saffran et al. (1996a), and involve segmenting words from an artificial language based on exposure to utterances containing no pauses or other acoustic cues to word boundaries. Frank et al. performed three experiments, manipulating either the number of word tokens in each utterance (1–24 words), the total number of utterances (thus, word tokens) heard in the training phase (48–1200 word tokens), or the number of lexical items in the vocabulary (3–9 lexical items).

In the experiment that manipulated the length of utterances, Frank et al. (2007, 2010) found that humans had more difficulty with the segmentation task as the utterance length increased, with a steep drop-off in performance between one and four words, and a more gradual decrease thereafter. Several of the models captured the general decreasing trend, but the Bayesian model correlated better with the human results than all other models tested. The Bayesian model's results can be interpreted as a competition effect: longer utterances have more possible segmentations, so there is a larger hypothesis space for the model to consider. Although most hypotheses have very low posterior probability, nevertheless as the hypothesis space increases, the total probability mass assigned to all the incorrect hypotheses begins to grow.
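
The Saffran-style setting of these experiments can be sketched with a simple transitional-probability segmenter, the kind of baseline model Frank et al. compared the Bayesian learner against. The three-word toy language and the 0.5 boundary threshold are illustrative assumptions:

```python
import random
from collections import Counter

# A toy transitional-probability (TP) segmenter on a Saffran-style
# artificial language: estimate TP(y | x) over syllable bigrams and posit
# a boundary wherever TP dips below a threshold.

random.seed(0)

# Each "word" is three syllables; the stream is a random concatenation.
words = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
stream = []
for _ in range(100):
    stream.extend(random.choice(words))

unigrams = Counter(stream)
bigrams = Counter(zip(stream, stream[1:]))

def tp(x, y):
    """Transitional probability P(y | x) estimated from the stream."""
    return bigrams[(x, y)] / unigrams[x]

# Within-word TPs are exactly 1.0 in this language, so boundaries are only
# ever posited at genuine word edges (positions divisible by 3).
boundaries = [i + 1 for i, (x, y) in enumerate(zip(stream, stream[1:]))
              if tp(x, y) < 0.5]
print(all(b % 3 == 0 for b in boundaries))  # True: no boundary inside a word
```

As noted below, once the TPs are estimated from a few utterances they barely change with further exposure, which is exactly why this model fails to capture the gradual improvement humans show with more data.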
In the experiment that manipulated the amount of exposure, subjects' performance improved as exposure increased, but again there was a non-linear effect, with greater improvement initially followed by a more gradual improvement later on. Again, the Bayesian model captured this effect better than the other models. The Bayesian model incorporates a notion of statistical evidence (more data lead to more certainty in conclusions), while many of the other models do not. For example, Frank et al. (2007, 2010) tested a transitional probability model and found that its performance changes very little over time because it only requires a few utterances to correctly estimate the transitional probabilities between syllables, after which the transitional probabilities do not change with more data.

Taken together, these two experiments show that the Bayesian model incorporates notions of competition and accumulating evidence in ways that predict human segmentation behavior more effectively than other models, at least with respect to the effects of utterance length and exposure time. This suggests that in some ways humans do behave like an optimal statistical learner. However, the third experiment, which manipulated vocabulary size, showed that in other ways, humans are not like an ideal learner (or for that matter, like other proposed statistical learners). In this experiment, subjects found languages with larger vocabularies more difficult to segment than those with smaller

8  The unigram model was used because, in these experiments, words really are almost statistically independent, so the bigram model would have provided little or no benefit.

vocabularies. Although this finding was not surprising—intuitively, larger vocabularies impose greater memory demands—all of the models predicted exactly the opposite result. The models have perfect memory, so storing a larger vocabulary poses no difficulty. At the same time, a larger vocabulary makes the sequences of syllables that are true words more statistically distinct from the sequences that are not words. For example, with a three-word vocabulary (words A, B, C), an incorrect segmentation where the hypothesized words are all the possible two-word combinations (AB, AC, BA, BC, CA, CB) scores not much differently from the correct segmentation under the Bayesian model—one hypothesis has three words in the vocabulary, whereas the other has six. In contrast, if there are nine words in the vocabulary, then the analogous incorrect segmentation would require 72 vocabulary items, a much bigger difference from nine. Similarly, in a transitional probability model, transitions across words in a three-word language have relatively high probability, whereas transitions across words in a nine-word language have much lower probability, making them more distinct from within-word transitions. Thus, although humans performed most similarly to the Bayesian ideal learner model in the first two experiments, the third experiment provides an example where human performance differs from the statistically optimal solution assuming perfect memory.

The preceding discussion suggests that in order to successfully model human behavior in some language acquisition tasks, it is necessary to account for human memory limitations. Frank et al. (2007, 2010) present several possible modifications to Goldwater et al.'s (2009) Bayesian model that incorporate such limitations through algorithmic means, and find that these are able to correctly model the data from all three experiments.
Similar kinds of modifications were also explored by Pearl et al. (2011) in the context of word segmentation from naturalistic corpus data. Like Frank et al., Pearl et al. wanted to examine cognitively plausible algorithms that could be used to implement an approximate version of Goldwater et al.'s Bayesian model. To simulate limited cognitive resources, all the algorithms explored in Pearl et al. (2011) processed utterances one at a time, rather than in a batch as the ideal learner of Goldwater et al. (2009) did. Two algorithms used variants of a method called dynamic programming, which allows a learner to efficiently calculate the probability of all possible segmentations of a given utterance. A third algorithm additionally attempted to simulate the human memory decay process, and so focused processing resources on data encountered more recently. This algorithm was a modified form of the Gibbs sampling procedure used for ideal learners, called decayed Markov Chain Monte Carlo (DMCMC) (Marthi et al. 2002). Notably, the DMCMC algorithm can be modified so it does significantly less processing than the ideal learner's Gibbs sampling procedure (for the simulations in Pearl et al., the DMCMC algorithm did 89 percent less processing than the ideal learner's algorithm). Simulations using these algorithms showed that in most cases, constrained learners were nearly as successful at segmentation as the ideal learner, despite their processing and memory limitations. These results suggest that children may not require an infeasible amount of processing power to identify words using an approximation of Bayesian learning.

686    Lisa Pearl and Sharon Goldwater

On the other hand, Pearl et al. found that constrained learners did not always benefit from the bigram assumption that was helpful to the ideal learner, perhaps because those constrained learners lacked sufficient processing resources to effectively exploit that information. Interestingly, Pearl et al. also found that some of their constrained learners actually outperformed the ideal learner when the learners used a unigram assumption. This is a somewhat counterintuitive finding, since we might naturally assume that having more memory and more processing power (as the ideal learner does) is always better. However, these results are compatible with the "Less is More" hypothesis (Newport 1990), which suggests that fewer cognitive resources may actually be beneficial for language acquisition. This turned out to be true for the unigram learners Pearl et al. examined. In particular, a property of all unigram learners is that they will undersegment frequent short words that often appear in sequence (like it's and a), preferring to make them a single word (itsa) in order to explain why they appear together so frequently. Because the ideal learner has a perfect memory, it can see all the data at once and realize that this sequence of phonemes often occurs. In addition, because the ideal learner also has more processing resources, it has more opportunity to "fix" a mistaken hypothesis that these are two separate lexical items instead of one. In contrast, the constrained learner lacks both the ability to see all the data at once and the processing resources to easily fix a "mistake" it made earlier on in learning (where the "mistake" is viewing the phoneme sequence it's a as two lexical items). As such, it does not make the undersegmentation errors the ideal learner does.
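The decay idea behind the DMCMC sampler, spending most re-sampling effort on recently processed material, can be sketched as follows. This is a simplification of the general idea, not Pearl et al.'s implementation; the decay rate and counts here are invented:

```python
# Sketch of decayed sampling: a past position is chosen for re-sampling with
# probability proportional to (distance from the present) ** -decay, so larger
# decay rates concentrate effort on recent material. Invented parameter
# values; not the actual DMCMC code of Pearl et al. (2011).
import random

def positions_to_resample(n_seen, n_samples, decay=2.0, seed=0):
    rng = random.Random(seed)
    distances = list(range(1, n_seen + 1))      # 1 = most recently seen
    weights = [d ** -decay for d in distances]
    picked = rng.choices(distances, weights=weights, k=n_samples)
    return [n_seen - d for d in picked]         # convert to absolute positions

recent = positions_to_resample(1000, 500)
print(sum(p > 900 for p in recent), "of 500 picks fall in the most recent 10%")
```

With a decay rate of 2, nearly all re-sampling effort lands on the most recent positions, which is what lets such a learner do far less total processing than a batch Gibbs sampler while still revising its recent analyses.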
Though the Bayesian modeling studies discussed here are preliminary, and the robustness of the results should be verified on other languages, they provide a tantalizing concrete instance of the "Less is More" idea that has been used to explain children's excellent language acquisition abilities.

28.3.3 Word–Meaning Mapping

There have been two notable recent studies involving Bayesian models for learning word–meaning mappings. In section 28.1, we briefly mentioned some experimental results from one of these, Xu and Tenenbaum (2007), and refer the reader to that paper for a description of the computational aspects of the study. Here we discuss instead the work of Frank et al. (2009), who developed a Bayesian model that incorporates both nonlinguistic context and speaker intentions in learning noun–object mappings.

Their model assumes that the words uttered by a speaker (and therefore observed by the learner) are determined by the process represented schematically in Figure 28.2. Given the set of objects O that are currently present, the speaker chooses some subset I of those objects as intended referents. The speaker also has a lexicon L containing one or more labels for each object. The utterance W contains one referring word for each intended referent, where that word is chosen at random from the labels available in L for that referent. W can also include non-referring words (verbs, determiners, etc.), but the model is set up in such a way that it prefers words to be referential if possible.

[Figure 28.2: a diagram of four linked boxes: "Objects available to refer to (O)," "Lexicon items speaker knows (L)," "Objects speaker intends to refer to (I)," and "Words uttered by speaker (W)."]

Figure 28.2  Generative process for producing words in a specific situation. The words uttered (W) depend on both the lexicon (L) and the intended objects (I). The intended objects (I) depend on what objects are currently present (O).

The model is tested using a corpus derived from videos of parent–child interactions, where each utterance was transcribed and annotated with the small number of objects that were visible during that utterance. Given the words uttered by a speaker (W) in the presence of a set of objects (O), the model simultaneously infers the most probable lexicon for the speaker (L) and which objects in O the speaker intended to refer to (I). Although each (W, O) pair can be highly ambiguous, pooling the data across many observed pairs allows the model to disambiguate the word–meaning mappings, just as humans were able to do in the cross-situational word learning experiments of Yu and Smith (2007) and Smith and Yu (2008). The model far outperformed other statistical learning methods such as conditional probability and mutual information, identifying the most accurate set of lexicon items and speaker-intended objects.

In addition to its overall high accuracy, the Bayesian model reproduced several known word-learning behaviors observed in humans. For example, the model exhibited a mutual exclusivity preference (Markman and Wachtel 1988; Markman 1989; Markman et al. 2003) because having a one-to-one mapping between a lexicon item and an object referent maximized the probability of a speaker using that lexicon item to refer to that object. The Bayesian model can also reproduce a behavior that children show called one-trial learning (Carey 1978b; Markson and Bloom 1997), where it takes only one exposure to a word to learn its meaning. This occurs when the learner's prior knowledge and the currently available referents in the situation make one word–meaning mapping much more likely than others. For example, suppose there are two objects in the current situation, a bird and an unknown object, and the word dax is used.
If the child has prior knowledge of the word bird and what it tends to refer to, then the model will view the lexicon item dax as most likely referring to the unknown object after only this one usage.

A third child behavior this model can capture is the use of words for individuating objects (Xu 2002). Xu (2002) found that when infants hear two different labels, they expect two different objects and are surprised if only one object is present; when only one label is used, they expect only one object to be present. That is, infants have an expectation that words are used referentially. This behavior falls out naturally in the Bayesian model because the model has a role for speaker intentions. Specifically, the model used its assumptions about how words work (they are often used referentially) to make inferences about the states of the world that caused a speaker to produce particular utterances (i.e. one label indicates one object, and two labels indicate two objects). In this way, the model replicated the infant behavior results from Xu (2002).

In a similar fashion, this model can directly incorporate speaker intention to explain behavioral results such as those of Baldwin (1993a). Baldwin found that children could learn the appropriate label for an object even if a large amount of time elapsed between the label and the presentation of the object, as long as the speaker's intention to refer to the object with that label was clear. In the Bayesian model, this information can be directly incorporated at the level of speaker intentions.
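The flavor of this inference can be conveyed with a toy score over candidate lexicons. This is our simplification, not Frank et al.'s actual model; the situations, the uniform referent choice, and the lexicon-size penalty are all invented. The generative assumption is that the speaker picks an intended object from those present and then utters a label chosen uniformly from that object's labels in the lexicon:

```python
# Toy scorer over candidate lexicons (not Frank et al.'s actual model).
# Small lexicons are preferred a priori; referring words must be explained
# by some present object.

def labels(lexicon, obj):
    return [w for w, o in lexicon.items() if o == obj]

def p_word(word, objects, lexicon, eps=0.05):
    # probability of uttering `word` given the objects present
    p = sum((1 / len(objects)) * (1 / len(labels(lexicon, o)))
            for o in objects if word in labels(lexicon, o))
    return p if p > 0 else eps       # non-referring words get small probability

def score(lexicon, situations):
    prior = 0.5 ** len(lexicon)      # invented penalty favoring small lexicons
    like = 1.0
    for words, objects in situations:
        for w in words:
            like *= p_word(w, objects, lexicon)
    return prior * like

situations = [
    (("bird",), ("bird",)),          # earlier exposure: "bird" with a bird present
    (("dax",), ("bird", "novel")),   # one exposure to "dax" with a novel object
]
candidates = [
    {"bird": "bird", "dax": "novel"},   # mutually exclusive labels
    {"bird": "bird", "dax": "bird"},    # two labels for one object
    {"bird": "bird"},                   # dax treated as non-referring
]
best = max(candidates, key=lambda lex: score(lex, situations))
print(best)
```

Given the known word bird, a single exposure to dax suffices to favor the mapping to the novel object: the two-labels-for-bird lexicon dilutes the probability of every bird utterance, so mutual exclusivity and one-trial learning both fall out of the scoring.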

28.3.4 Syntax–Semantics Mapping

The meaning of a word is not always directly connected to a referent in the world, however. Some words are anaphoric—that is, they refer to something previously mentioned. Here is an example of one being used anaphorically in English:

(1)  "Look! A black cat. Oh, look—there's another one!"

In this situation, many adults would expect to see a second black cat. That is, a common interpretation of one is a cat that has the property black, and this utterance would sound somewhat strange (without additional context) if the speaker actually was referring to a gray cat. This interpretation occurs because the adults assume the linguistic antecedent of one is the phrase black cat (i.e. one could be replaced by black cat without the meaning of the utterance changing: Look! A black cat. Oh, look—there's another black cat.). Lidz et al. (2003b) ran a series of experiments with 18-month-old children to test their interpretations of one in utterances like (1), and found that they too shared this intuition. So, Lidz et al. (2003b) concluded that this knowledge about how to interpret one must be in place by 18 months.

The interpretation of one depends on what antecedents one can have. According to common linguistic theory, an anaphor and its antecedent must have the same syntactic category. But what category is that? A common representation of the syntactic structure for black cat is in (2), where N0 refers to a basic noun like cat and N′ is a category that includes both basic nouns like cat and nouns containing modifiers like black cat.

(2)  [N′ black [N′ [N0 cat]]]

Since one can have the string black cat as its antecedent, and black cat is category N′, then one should also be category N′. If one were instead category N0, it could never have black cat as its antecedent—it could only have cat as its antecedent, and we could not get the interpretation many adults do for (1). Note that the bracketing notation in (2) indicates that cat can be labeled as both syntactic category N′ ([N′ [N0 cat]]) and syntactic category N0 ([N0 cat]). This is what allows us to have cat as one's antecedent sometimes, as in I don't want a black cat—I want a gray one. In this utterance, one must refer to cat, rather than black cat, and this is possible because cat can also be category N′, just as one is.

Under this view, adults (and 18-month-olds) have apparently learned that one should be category N′, since they allow black cat to be its antecedent. This includes both syntactic and semantic interpretation knowledge. On the syntactic side, one is category N′ in utterances like (1). On the semantic side, when the potential antecedent of one contains a modifier (such as black), that modifier is relevant for determining the referent of one. This relates to syntax–semantics mapping: because the modifier is relevant for determining the referent, the larger of the two N′ options should be chosen as the intended antecedent (black cat instead of just cat).

How this knowledge is acquired by 18 months has long been debated. The problem is the following. Suppose the child encounters utterance (1) from above in a context where two black cats are present. Suppose this child is not sure which syntactic category one has in this case—N′ or N0. If the child thinks the category is N0, then the only possible antecedent string is cat, and the child should look for the referent to be a cat. Even though this is the wrong syntactic category for one, the observable referent will in fact be a cat. How will the child realize that this is the wrong category, since the observable referent (a cat that also happens to be black) is compatible with this hypothesis? The same problem arises even if there is no modifier in the antecedent: Look! A cat. Oh, look—there's another one. The hypothesis that one is category N0 is compatible with the antecedent cat, which is compatible with the observable referent being a cat, as indeed it is.

These ambiguous data dominate children's input for anaphoric one, and only rarely do unambiguous data appear (approximately 0.25 percent of the relevant data according to a corpus analysis by Lidz et al. 2003b, and never in the corpus analyses conducted by Pearl and Mis 2011, 2015). This is unsurprising once we realize that unambiguous data require a specific coincidence of utterance and situation. For example, suppose there are two cats present, one black and one gray. An unambiguous utterance would be Look! A black cat. Hmmm … there's not another one around, though. In this utterance, one cannot refer to cat, since there is clearly another cat around. Instead, one must refer to black cat, which allows the utterance to make sense—there's not another black cat around (the other cat is gray). Because black cat includes both a modifier and a noun, it must be N′, so this data point is unambiguous for one's syntactic category and semantic interpretation.

Because unambiguous data are so rare in children's input, knowledge of anaphoric one was traditionally considered unlearnable without innate, domain-specific biases on children's hypothesis spaces. In particular, many nativists, such as Baker (1978), Hornstein and Lightfoot (1981), and Crain (1991), proposed that children already know that one is category N′, so the choice between N0 and N′ never arises, eliminating the syntactic component of the learning problem.
The hypothesis that one is category N0 is compatible with the antecedent cat, which is compatible with the observable referent being a cat, as indeed it is. These ambiguous data dominate children’s input for anaphoric one, and only rarely do unambiguous data appear (approximately 0.25 percent according to a corpus analysis by Lidz et al. 2003b and never in the corpus analysis conducted by Pearl and Mis 2011, 2015). This is unsurprising once we realize that unambiguous data require a specific coincidence of utterance and situation. For example, suppose there are two cats present, one black and one gray. An unambiguous utterance would be Look! A black cat. Hmmm … there’s not another one around, though. In this utterance, one cannot refer to cat, since there is clearly another cat around. Instead, one must refer to black cat, which would allow the utterance to make sense—​there’s not another black cat around (the other cat is gray). Because black cat includes both a modifier and a noun, it must be N′, so this data point is unambiguous for one’s syntactic category and semantic interpretation. Because unambiguous data are so rare in children’s input, knowledge of anaphoric one was traditionally considered unlearnable without innate, domain-​specific biases on the hypothesis spaces of children. In particular, many nativists such as Baker (1978), Hornstein and Lightfoot (1981), and Crain (1991) proposed that children already knew that one was category N′, so the choice between N0 and N′ would never occur, eliminating the syntactic component of the learning problem.

Regier and Gahl (2004) discovered that a learner using Bayesian inference can leverage useful information from ambiguous examples that include a modifier, like (1). Specifically, for examples like (1), the learner observes how often the referent of one is a cat that is black. If the referents keep being black cats, this is a suspicious coincidence if one referred to cat rather than to black cat. The learner capitalizes on this suspicious coincidence and soon determines that one takes black cat as its antecedent in these cases. Since the string black cat can only be an N′ string (see (2)), the learner can then infer that one is of category N′ as well. The only specific linguistic knowledge the learner requires is (i) the definition of the hypothesis space (hypothesis 1: one = category N′; hypothesis 2: one = category N0), and (ii) knowing to use these specific informative ambiguous data.

Pearl and Lidz (2009) later explored the consequences of a Bayesian learner that did not know this second piece of information, and instead attempted to learn from all potentially informative ambiguous data involving anaphoric one (such as Look, a cat! Oh, look—another one). Pearl and Lidz found that this "equal opportunity" learner made the wrong choice, inferring that one was category N0 due to the suspicious syntactic coincidences available in the additional ambiguous data. Thus, the second piece of information is vital for success, and Pearl and Lidz speculated that it is linguistic-specific knowledge, since it requires the child to ignore a specific kind of language data (note, however, that it could be derived using a domain-general strategy—see Pearl and Lidz (2009) for more detailed discussion of this point).

Foraker et al. (2009) investigated another strategy for learning the syntactic category of one, this time drawing only on syntactic information and ignoring information about what the intended referent was.
In particular, a learner could notice that one is restricted to the same syntactic arguments (called modifiers) that words of category N′ are restricted to, rather than being able to take both modifiers and another kind of syntactic argument (complements) that words of category N0 can take. That is, one, like N′ words, can take only modifiers as arguments, while N0 words can take both modifiers and complements as arguments. This restriction is a suspicious coincidence if one is really category N0. So, a Bayesian learner can infer that one is category N′. Notably, however, the ability to distinguish between modifiers and complements requires the child to make a complex conceptual distinction (see Foraker et al. 2009 for more discussion of this point) as well as link that conceptual distinction to the syntactic distinction between complements and modifiers, and it is unclear whether 18-month-old children would be able to do this.

Pearl and Mis (2011, 2015) considered expanding the learner's view of the relevant data to include what they call indirect positive evidence. Specifically, they note that one is not the only anaphoric element in English. Other pronouns also have this referential property, such as it, her, and him. Moreover, other pronouns share distributional properties with one: Look at the black cat! I want it/her/him/one. This might cause children to view data involving these other pronouns as informative for learning about one. Notably, the antecedents for these pronouns always include the modifiers—in the utterance under discussion, the antecedent is the black cat, and so the referent will be the black cat in question. If a Bayesian learner is tracking whether the mentioned property (e.g., "black," as indicated by the modifier black) is important for picking out the intended referent, these additional pronoun data will cause that learner to assume that mentioned properties are indeed important for interpreting anaphoric elements. So, when the child encounters an ambiguous example with anaphoric one like Look, a black cat! Oh, look—another one, the child will assume the mentioned property black is important, and pick one's antecedent to be black cat rather than cat. This then leads to the correct interpretation. Moreover, because black cat can only be category N′, this also leads to the correct syntactic category for one in this context.

Interestingly, while the learner gets the correct interpretation in this context, and so matches the 18-month-old behavioral data from Lidz et al. (2003b), the learner actually has the wrong hypothesis about one's category (one = N0) when no modifier is present (Look, a cat! Oh, look—another one.). However, this wrong hypothesis does not lead to the wrong interpretation in such utterances, and so could easily go undetected. (See Pearl and Mis 2015 for discussion of examples where the wrong hypothesis about one has observable consequences.) This suggests that the 18-month-olds from Lidz et al. (2003b) may not have the full range of adult intuitions either.
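The "suspicious coincidence" logic that drives the Regier and Gahl style learner reduces to a small Bayesian computation. In this sketch (our own illustration; the 50 percent chance rate of black cats and the uniform prior are invented numbers, not values from the paper), each observed black referent multiplies the evidence against the N0 hypothesis:

```python
# If one's antecedent is "black cat" (category N'), the referent is black
# every time; if the antecedent is just "cat" (category N0), the referent is
# black only by chance. Repeated black referents therefore favor N'.
def posterior_n_bar(n_black, p_black_by_chance=0.5, prior=0.5):
    like_n_bar = 1.0                             # always black under N'
    like_n0 = p_black_by_chance ** n_black       # black by chance under N0
    num = prior * like_n_bar
    return num / (num + (1 - prior) * like_n0)

for n in (1, 3, 5):
    print(n, round(posterior_n_bar(n), 3))       # prints 0.667, 0.889, 0.97
```

After a single example the learner only mildly favors N′, but a handful of consistently black referents pushes the posterior above 0.95, which is the sense in which the coincidence becomes "suspicious."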

28.3.5 Syntactic Structure

Children must also discover the rules that determine what order words appear in. For example, consider the formation of yes/no questions in English. If we start with a sentence like The cat is purring, the yes/no question equivalent of this sentence is Is the cat purring? But how does a child learn to form this yes/no question? One rule that would capture this behavior is "Move the first auxiliary verb to the front," which would take the auxiliary verb is and move it to the front of the sentence. This rule is a linear rule, since it refers only to the linear order of words ("first auxiliary"). Another rule that would capture this behavior is "Move the main clause auxiliary verb to the front." This is a structure-dependent rule, since it refers to the structure of the sentence ("main clause").

(3)  Example of yes/no question formation

(i)   Sentence: The cat is purring.
(ii)  Linear Rule: Move the first auxiliary verb
      Is the cat t_is purring
(iii) Structure-Dependent Rule: Move the main clause auxiliary verb
      Is [S the cat t_is purring]

Both of these rules account for simple yes/no questions like Is the cat purring?, but only the structure-dependent rule accounts for the behavior of more complex yes/no questions, as in (4).

(4)  Example of complex yes/no question formation

(i)   Sentence: The cat who is in the corner is purring.
(ii)  Linear Rule: Move the first auxiliary verb
      *Is the cat who t_is in the corner is purring
(iii) Structure-Dependent Rule: Move the main clause auxiliary verb
      Is [S the cat [S who is in the corner] t_is purring]

Children as young as 3 years old appear to know that structure-dependent rules are required for complex yes/no question formation in English (Crain and Nakayama 1987), yet unambiguous examples like (4iii) that explicitly demonstrate this structure-dependence are rare in child-directed speech (Legate and Yang 2002; Pullum and Scholz 2002). Since the yes/no question data children usually see are compatible with both linear and structure-dependent rules, it seems surprising that children know the structure-dependent rule for complex yes/no questions at such an early age. A standard explanation is that children innately know that language rules are structure-dependent, so they never consider other kinds of analyses for their input, such as linear rules (e.g. Chomsky 1971).

Perfors et al. (2006, 2011) investigated whether a Bayesian learner that considered both linear and structure-dependent analyses could correctly infer that structure-dependent analyses were preferable, given child-directed speech data. Children must have a structure-dependent analysis of the linguistic data before they can hypothesize structure-dependent rules, so inferring a structure-dependent representation is a foundation for later inferring structure-dependent rules. Perfors et al. proposed that while complex yes/no questions implicating structure-dependent analyses might be rare, other data in the input, taken together, might collectively implicate structure-dependent analyses for the language as a whole. This could indirectly implicate the correct complex yes/no question structure without the need to observe complex yes/no questions in the input.
The hypothesis space of the Bayesian learner included both a linear set of rules (a linear grammar) and a structure-dependent set of rules (a hierarchical grammar) to explain the observable child-directed speech data. That is, given data (D), the learner inferred which grammar (G) satisfied two criteria:

1. the grammar best able to account for the observable data
2. the simplest grammar, where a grammar with fewer and/or shorter rules can be thought of as simpler

The posterior probability P(G|D), which Bayes' Theorem tells us is proportional to P(D|G)*P(G), incorporates both criteria. The likelihood P(D|G) rewards grammars that are best able to account for the observable data, while also rewarding simpler derivations using the available grammar rules. The prior P(G) rewards simpler grammars.
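The interplay of the two criteria can be illustrated numerically. All numbers below are invented for illustration (the real models score full probabilistic grammars over parses): a simpler grammar with a worse fit wins when data are scarce, but the better-fitting grammar wins once enough sentences accumulate, because the likelihood term grows with the data while the prior does not.

```python
# Toy log-posterior comparison: log P(G|D) = log P(D|G) + log P(G), up to a
# constant. The prior charges a fixed cost per rule; the likelihood accrues
# per sentence. All numbers are invented for illustration.
def log_posterior(ll_per_sentence, n_sentences, n_rules, rule_cost=3.0):
    return ll_per_sentence * n_sentences - rule_cost * n_rules

grammars = {
    "linear":       dict(n_rules=10, ll_per_sentence=-12.0),  # simpler, worse fit
    "hierarchical": dict(n_rules=25, ll_per_sentence=-11.5),  # costlier, better fit
}

def winner(n_sentences):
    return max(grammars, key=lambda g: log_posterior(
        grammars[g]["ll_per_sentence"], n_sentences, grammars[g]["n_rules"]))

print(winner(50), winner(500))   # prints: linear hierarchical
```

With these invented numbers the crossover happens at 90 sentences, mirroring the finding that the hierarchical grammar becomes optimal only once the data sets include enough (and complex enough) sentences.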

For data, Perfors et al. (2006, 2011) used the child-directed sentences from the Adam corpus (Brown 1973) of the CHILDES database (MacWhinney 2000), and divided the sentences into six groups based on frequency. The most frequent sentences also tended to be simpler. Perfors et al. found that a hierarchical grammar was optimal for all the data sets that included more complex sentence forms, that is, those that included at least some sentences that occurred less frequently than 100 times. Thus, if the Bayesian learner is exposed to enough complex sentences, it can infer that structure-dependent rules for generating the observed data are better than linear rules, and can apply this knowledge to analyzing and proposing rules for complex yes/no questions, even if no complex yes/no questions have been encountered before. Interestingly, even the earliest data in the Adam corpus show a diversity of linguistic forms, suggesting that young children's data may be varied enough for them to prefer structure-dependent analyses if they are approximating the Bayesian inference procedures used by Perfors et al. An open question is whether children have the memory and processing capabilities to make these approximations.

Perfors and colleagues (Perfors et al. 2010) also used Bayesian learners to investigate how recursion might be instantiated in grammars. Recursion occurs when a phrasal category can be expanded using rules that eventually include another instance of that category, as in (5), where an S can be expanded using an NP (5i) and an NP can be expanded using an S (5ii).

(5)  Recursive rule example

(rule i)   S → NP VP
(rule ii)  NP → N complementizer S

Recursion has been argued to be a fundamental and possibly innate part of the language faculty (Chomsky 1957), as well as one of the only parts of the language faculty specific to humans (Hauser et al. 2002). Perfors et al. (2010) evaluated grammars with and without recursive rules to decide which was optimal for parsing child-directed speech data. Grammars with recursive rules allow infinite embedding (Depth 3+ in (6)), while grammars without recursive rules allow embedding only up to a certain depth, for example two clauses deep (Depth 0, 1, and 2 in (6)).

(6)  Embedding

(a) Subject-embedding
    [Depth 0]  [Subj The cat] is purring.
    [Depth 1]  [Subj The cat that [Subj the girl] petted] is purring.
    [Depth 2]  [Subj The cat that [Subj the girl that [Subj the boy] kissed] petted] is purring.
    [Depth 3+] [Subj The cat that [Subj the girl that [Subj the boy that [Subj …] kissed] petted] is purring.]


(b) Object-embedding
    [Depth 0]  The cat chased [Obj the mouse].
    [Depth 1]  The cat chased [Obj the mouse that scared [Obj the dog]].
    [Depth 2]  The cat chased [Obj the mouse that scared [Obj the dog that barked at [Obj the mailman]]].
    [Depth 3+] The cat chased [Obj the mouse that scared [Obj the dog that barked at [Obj the mailman that [Obj …]]]].

The Bayesian learner had the same preferences as the one in Perfors et al. (2006, 2011): it attempted to identify the grammar that best balanced simplicity and the ability to account for the observed data. Note that grammars with recursive rules predict sentences that will rarely or never occur, such as the sentences with embedding of Depth 3+ in (6), so these grammars will not fit the data as well as grammars with limited embedding. However, a recursive grammar is often simpler than one that needs to encode exactly a specific depth of embedding, so whether recursion is learned will depend on the tradeoff between these two factors with respect to the observed data.

Perfors et al. (2010) could have assumed that the learner considered only two kinds of hypotheses: grammars where rules are recursive whenever possible, or grammars where no rules are recursive. Instead, they also allowed the learner to have separate recursive rule types for subject-NPs (as in (6a)) as opposed to object-NPs (as in (6b)), since embedding is more often observed and more easily comprehended when it is object embedding (compare Depth 2 in (6a) to Depth 2 in (6b)). The Bayesian learner, when given child-directed speech data, inferred that the optimal grammar was one where the subject-NP rules allowed both recursive rules and depth-limited embedding, while the object-NP rules were only recursive. This result is due to the fact that multiple embeddings are much more frequently observed in object-NPs than in subject-NPs.
More broadly, it also suggests that a statistical learner may be able to discover when recursive rules are useful and when they are not. A child would not necessarily need to innately know that recursion is required for representing object-NPs. Instead, if recursive rules are available in their hypothesis space, children would be able to infer from their input that recursion is required for some parts of the grammar.

28.3.6 General Summary of Studies

We have tried to review several studies that highlight the contribution of Bayesian inference to language acquisition, including studies in the domains of phonetics and phonology, word segmentation, word–meaning mapping, syntax–semantics mapping, and syntactic structure. Though Bayesian modeling is only one approach to understanding language acquisition, it provides a way to investigate questions about the utility of statistical information in the data and which acquisition problems statistical learning can deal with effectively. In addition, it can often provide a coherent account of observed human behavior by demonstrating what a learner using Bayesian inference would do with the available data.

28.4 Conclusion

In this chapter, we have discussed ways in which Bayesian modeling can be used to explore questions of interest to the language acquisition community. As Bayesian models assume humans can use statistical information in sophisticated ways, we also provided a historical overview of statistical learning within the field of language acquisition, including experimental studies that demonstrate human statistical learning ability. We then discussed the Bayesian modeling framework, including some of its benefits that may be particularly interesting to both developmental and theoretical linguists. Finally, we reviewed several computational studies that modeled acquisition of knowledge in different domains using Bayesian inference techniques. Statistical learning techniques such as Bayesian inference, when coupled with well-defined problems and hypothesis spaces, can help us understand both the nature of the data available to children and how they are able to acquire complex linguistic generalizations so rapidly.

Chapter 29

Computational Approaches to Parameter Setting in Generative Linguistics

William Gregory Sakas

29.1 Introduction

29.1.1 Principles and Parameters

Since the very beginning of modern generative linguistics, a central tenet of the field has been that any theory of human language grammar must provide a viable account of how a child¹ language learner acquires the grammatical knowledge of an adult. In Aspects of the Theory of Syntax, Chomsky (1965) provided one of the earliest generative accounts of the process. His acquisition model consisted of two components: a rule writing system and a formal evaluation metric. The process of acquisition was envisioned as the child's construction of a grammar by creating rules that were compatible with the linguistic environment they were exposed to and, given a choice of grammars, selecting the grammar most highly valued under the evaluation metric. At the time, most generative theories of syntax were highly transformational. Although both simplicity and economy were proposed as the proper evaluation metrics, transformational rules were often highly complex and unconstrained, making it difficult to form a working model of how a simplicity or economy evaluation metric could be applied.² The principles and parameters framework (Chomsky 1981a, 1986) grew out of the inability of linguists to bring to fruition this early vision of language learning.

¹ Throughout I mean "child" and "children" to be inclusive of infant- and toddler-aged young humans.

Computational Approaches to Parameter Setting    697

The principles and parameters framework does away with both the rule writing system and the evaluation metric. Instead, the principles are the grammar constraints and operations that govern all possible human languages, and parameters are the points of variation between languages. Learning is the process of setting parameters to exactly the parameter values that generate the child's native language.³ Once the parameters are set to their correct values, the child has achieved adult syntactic competence. Both the principles and the parameters are prescribed innately as part of Universal Grammar (UG). As Chomsky puts it:

    We can think of the initial state of the faculty of language as a fixed network connected to a switch box; the network is constituted of the principles of language, while the switches are options to be determined by experience. When switches are set one way, we have Swahili; when they are set another way, we have Japanese. Each possible human language is identified as a particular setting of the switches—a setting of parameters, in technical terminology. If the research program succeeds, we should be able literally to deduce Swahili from one choice of settings, Japanese from another, and so on through the languages that humans acquire. The empirical conditions of language acquisition require that the switches can be set on the basis of the very limited properties of information that is available to the child. (Chomsky 2000b: 8)

A grammar, in current theories of generative syntax, is no longer construed as a set of rules and transformation operations, but rather as a vector of labeled parameters together with parameter values that generate a particular human language. As an example: all sentential clauses in all languages must have a subject by the Extended Projection Principle (EPP). Whether or not the subject is obligatorily explicit is determined by a parameter, the so-called Null Subject parameter, together with its parameter value. English has the Null Subject parameter set to the value –Null Subject, and Spanish has it set to the value +Null Subject. The number of parameters that define the domain of human grammars (the number of switches in Chomsky's metaphorical description) is posited as finite, and the parameters are standardly considered to be binary, that is, there exist two (and only two) values for each parameter. Although this description of the Null Subject parameter lives in Government and Binding Theory, grammars in the Minimalist Program are also parameterized; parameters are feature-based and housed in the lexicon.4

2 There were parallel concerns related to how features and markedness in phonology could be reconciled with an evaluation metric; see discussion in The Logic of Markedness (Battistella 1996).
3 Throughout I mean "native language" to be inclusive of native languages in the case that a child is brought up in a multilingual environment, and likewise for "target grammar."
4 For reasons of narrative and space, discussion here is largely restricted to computational acquisition models of binary syntactic parameters. It is important to note that there are important computational results stemming from research in P&P acquisition that use multi-valued parameters (e.g. Manzini and Wexler 1987; Wexler and Manzini 1987; Briscoe 2000; Villavicencio 2001), and models that address P&P acquisition of phonology rather than of syntax (Dresher and Kaye 1990; Dresher 1999; Pearl 2007).
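The vector-of-parameters view of grammar just described can be rendered concretely. A minimal sketch in Python follows; the four-parameter inventory is hypothetical and chosen only to mirror the exposition (only the Null Subject values for English and Spanish come from the text, the remaining values are purely illustrative):

```python
# A grammar as a vector of labeled binary parameters.
# The parameter inventory below is hypothetical, for illustration only.
PARAMETERS = ("NullSubject", "ObligatoryTopic", "VtoI", "ItoC")

# English: -Null Subject; Spanish: +Null Subject (other values illustrative).
english = {"NullSubject": False, "ObligatoryTopic": False,
           "VtoI": False, "ItoC": False}
spanish = dict(english, NullSubject=True)

def as_vector(grammar):
    """Render a grammar as a string of +/- values, one per parameter."""
    return "".join("+" if grammar[p] else "-" for p in PARAMETERS)

print(as_vector(english))   # "----"
print(as_vector(spanish))   # "+---"
```

On this encoding, "learning" amounts to fixing each position of the vector to the value of the target grammar.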

698   William Gregory Sakas

At first blush it would seem that the principles and parameters (P&P) framework offers a much more manageable account of acquisition than the model Chomsky first proposed in Aspects. Instead of generating a rule from a virtually infinite (i.e. unconstrained) space of rules and applying an evaluation metric to an infinite set of grammars, all a child needs to do is to observe the relevant linguistic characteristic of a single utterance (e.g. a declarative sentence without a subject), and some parameter can be set appropriately (e.g. to +Null Subject). Indeed, though only lightly sketched out when the P&P framework was conceived, it was widely accepted that this "switch-flipping" concept of parameter setting was how language acquisition proceeded. Parameter setting was more or less instantaneous, "automatic," accurate, and deterministic. There appeared to be no need to search a space of grammars, and no need to revise a parameter value once a choice was made. This view has often been referred to as triggering theory, triggering, or the trigger model of acquisition.

Most (though not all) computational models of parameter setting have taken another approach. The acquisition process is implemented as a nondeterministic (trial-and-error) search of the grammar or parameter space (all possible combinations of parameter values, that is, all possible grammars). These approaches are the antithesis of the triggering approach, which is about the deterministic (non-trial-and-error) process of setting parameters based on reliable evidence; triggering has no need to conduct a search of the grammar space—given proper evidence to trigger one parameter value over another, set the parameter to that value once and for all. The computational psycholinguistics community has, by and large, abandoned triggering theory.
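The classical trigger model just described can be caricatured in a few lines. The cues and parameter names below are hypothetical placeholders, not actual proposed triggers; the point is only the deterministic, set-once character of the mechanism:

```python
# Hypothetical unambiguous triggers: each cue deterministically sets
# exactly one parameter to one value (placeholder cues, for illustration).
TRIGGERS = {
    "subjectless declarative": ("NullSubject", True),
    "stranded preposition":    ("PiedPiping", False),
}

def trigger_learn(observations):
    """Switch-flipping learner: set a parameter on the first relevant cue,
    never revise it, never search the grammar space."""
    grammar = {}
    for obs in observations:
        if obs in TRIGGERS:
            param, value = TRIGGERS[obs]
            grammar.setdefault(param, value)   # set once and for all
    return grammar

g = trigger_learn(["subjectless declarative", "stranded preposition",
                   "subjectless declarative"])
print(g)   # {'NullSubject': True, 'PiedPiping': False}
```

The sections that follow explain why this picture breaks down once input sentences are parametrically ambiguous: no such table of reliable cue-to-value mappings may be available.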
This has come about not for lack of interest in triggering, but rather because of the realization that the domain of human languages is vexed with a considerable amount of parametric ambiguity. There are many examples of syntactic phenomena which can be licensed by multiple combinations of parameter values, necessitating that the learner choose from amongst alternatives (Clark 1989, 1992).5 Though the correct parameter values are necessarily among the alternatives, there would presumably be many incorrect alternatives. If true (at least as far as the argument goes), a triggering model would in the best case be stymied into inaction by the ambiguity, and in the worst case would make unrecoverable errors. In short, parametric ambiguity is the driving force behind the genesis of computational approaches to parameter setting and will serve well as a central theme throughout this chapter.

In what follows I present a brief overview of three core concepts relevant to the computational modeling of language acquisition, then a non-exhaustive history of research specifically on computational models of P&P acquisition, and finally draw together some points on whether or not the deterministic triggering theory is viable after all.

5 A concrete example is given in section 29.2.1, "Clark: Parametric Ambiguity."


29.1.2 No Negative Evidence, Learnability vs. Feasibility, and the Subset Principle

The term negative evidence in the study of language acquisition refers to evidence about what is not grammatical in the target language (the language the learner is being exposed to). The evidence can take many forms: overt statements from caregivers about what is not grammatical, repetitions of a child's ungrammatical utterance but with a minor grammatical change, urging by caregivers to get a child to repeat ungrammatical utterances but grammatically, etc.6 Most accounts of acquisition in a generative framework accept results from psycholinguistic research (e.g. Brown and Hanlon 1970; Marcus 1993) indicating that children do not receive negative evidence. All computational modeling endeavors in P&P acquisition make this assumption as well—only sentences from the target language are encountered by the learner, and there is no "extraneous" information about ungrammaticality.

Learnability, often referred to as the "logical study of language acquisition," attempts to answer the question: Under what conditions, in principle, is acquisition possible? There is an important distinction to be made between learnability and feasibility (Kapur 1994; Sakas 2000). Feasibility attempts to answer a different question: Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of computation? Though this second question was raised even before P&P (e.g. Pinker 1979), its importance has increased post-P&P since any P&P domain consisting of a finite number of parameters and values is formally learnable (Osherson et al. 1986; Bertolo 2001). By "formally learnable," I mean that there exist mathematical proofs establishing that there is some learner that will acquire every language in any and all P&P domains.
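The finiteness that underwrites these learnability proofs still leaves an exponentially large search space; the arithmetic behind the sizes discussed next is easy to check:

```python
# Size of a binary P&P grammar space: 2^n grammars for n parameters.
space = {n: 2 ** n for n in (30, 40, 50)}

print(space[30])   # 1073741824 (over a billion)
print(space[40])   # 1099511627776 (over a trillion)
print(space[50])   # 1125899906842624
```

Each added binary parameter doubles the number of candidate grammars, which is why exhaustive search is dismissed below on feasibility grounds.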
The proofs demonstrate (given some assumptions7) that an exhaustive search of any finite space of grammars (P&P or not) will eventually discover the target grammar—the grammar generating the sentences of the language the learner is being exposed to. Since a grammar space delineated by a P&P framework is by definition finite, any P&P domain is learnable. However, in the case of a P&P domain, the domain quickly becomes quite large; a mere 30 parameters yields a space of 2^30 grammars—over a billion (actually exactly a gigabyte of grammars—well over the 258,890,850 possible combinations of 5 numbers on the New York Mega Millions Lottery)—and the space grows exponentially in the number of parameters. Forty parameters would delineate a space of 2^40 grammars—over a trillion (more than the number of neurons in the human brain)—and 50 parameters would

6 Another type of negative evidence is statistical in nature. If a linguistic construction never occurs in the input, at some point, it could be argued, the construction is ungrammatical. This type of negative evidence is referred to as indirect negative evidence (Chomsky 1981a). Indirect negative evidence plays a particularly important role in Bayesian statistical models of acquisition, but does not play a role in the P&P models discussed in this chapter.
7 Most notably the consideration by the learner of grammars that generate subset languages before grammars that generate their superset languages. See discussion immediately below.

manifest a grammar space equal to the number of stars in five thousand Milky Way galaxies (each galaxy containing approximately 200 billion stars). Since children presumably don't embark on an exhaustive search over multiple grammars on every input sentence, learnability results concerning (finite) P&P domains are unattractively weak from the point of view of computational psycholinguistics. Still, both questions are relevant. There are psychologically attractive computational models that are not able to acquire all the target languages in a domain (i.e. which do not meet learnability criteria) and models which do acquire all the target languages8 but require an unwieldy number of input sentences (i.e. which do not meet feasibility criteria).

The Subset Principle, first proposed by Gold (1967) (though not by that name), and later revisited in depth from more linguistic viewpoints (Berwick 1985; Manzini and Wexler 1987; Wexler and Manzini 1987), is a product of formal learnability theory which has significant implications for theories of human language learning. An informal definition of the Subset Principle will suffice here: the learner must never hypothesize a language which is a proper superset of another language that is equally compatible with the available data. The basic idea is that if a language learner were to hypothesize a superset of the target language, he or she would presumably be forever caught in the superset language, unable to retreat to the target language. This is because the only evidence available to the learner is sentences of the (subset) target language (given the assumption that there is no negative evidence), all of which are compatible with the incorrectly hypothesized (superset) language.
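A toy simulation makes the trap vivid. The two mini-languages below are invented stand-ins, and the learner is error-driven in the simplest sense, revising only when an input fails to parse under its current hypothesis:

```python
# Why a superset hypothesis is unrecoverable without negative evidence:
# every target (subset) sentence also parses under the superset grammar,
# so an error-driven learner never gets a reason to revise.
target_language   = {"S V O"}                # invented subset language
superset_language = {"S V O", "S O V"}       # invented proper superset

hypothesis = superset_language               # the learner's unlucky guess
revisions = 0
for sentence in ["S V O"] * 1000:            # input: target sentences only
    if sentence not in hypothesis:           # revise only on parse failure
        hypothesis = target_language
        revisions += 1

print(revisions)   # 0 -- the learner is stuck in the superset forever
```

No amount of additional (positive) input dislodges the superset hypothesis, which is exactly the situation the Subset Principle is designed to preclude.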
In theory, a child who is unfortunate enough to hypothesize a superset of his or her target language would become an adult able to comprehend all utterances of their fellow native speakers; but since their internal grammar licenses the superset language, they would presumably utter ungrammatical sentences that are not licensed by the subset grammar. Since human learners do not exhibit this behavior,9 it has generally been accepted that some form of the Subset Principle must be followed by the child language learner.

However, many issues concerning the Subset Principle are currently unresolved: Do children actually apply the Subset Principle? Perhaps children violate the Subset Principle and are yet able to retreat to the subset target language? Do subset–superset relationships actually exist in the domain of human languages? How could children apply the Subset Principle incrementally (in real time) without global knowledge of what sentences are licensed by each and every grammar in the human language domain? Extensive discussion of these points can be found in Fodor and Sakas (2005), but for our purposes here, we only need to keep an eye on how each of the computational models that are reviewed deals with, ignores, or sets aside issues related to subset–superset relationships between languages.

8 Although learners in all generative theories of syntax acquire a target grammar, the exposition is sometimes clearer when the phrase acquire a target language is used. The reader should take acquire a language to mean acquire a grammar.
9 Though there is the occasional report in the literature of overgeneration and subsequent retreat by children (Déprez and Pierce 1993).


29.2 The Models

What follows is a review of computational models of parameter setting post Chomsky's original conception of the process. The reader wishing to skip to one particular model is advised to first read the following longish section 29.2.1, "Clark: Parametric Ambiguity," as it introduces a number of concepts that have prevailed in the field for over two decades. Discussions following section 29.2.1 will make reference to points made there. This review is certainly not comprehensive, but at least in my opinion the models mark important stages in computational approaches to parameter setting, and taken together they tell a comprehensible story.

29.2.1 Clark: Parametric Ambiguity

One of the earliest computational models of parameter setting was presented by Robin Clark (1989, 1992). Importantly, Clark was the first to point out that any parameter setting model of language acquisition would need to address parametric ambiguity: when syntactic phenomena can be licensed by multiple combinations of parameter values, which would necessitate the learner choosing from amongst alternatives. Clark presents extended discussion of ambiguity involving how case theory, θ-theory, government theory, and lexical properties interact to create a variety of parameter settings—meaning different grammars—that would license a subset of English sentences. But a simple example will suffice here.

A surface sequence of Subject Verb Object may either be generated as a base structure with no movement (as in English) or be derived from a base structure such as Subject Object Verb by movement of the verb to the second position (as occurs in German). One syntactic description of this V2 phenomenon involves three proposed parameters: Obligatory Topic (ObT), V-to-I movement, and I-to-C movement. For each parameter a + parameter value indicates "obligatory." +ObT dictates that some topicalizable lexical element (or constituent) be moved into Spec,CP; +V-to-I dictates movement of a tensed verb (or aux) to I from its base-generated position dominated by V; and +I-to-C dictates movement of a tensed verb (or aux) from I to C. Under this description, German is +ObT, +V-to-I, and +I-to-C. The surface effect is that the topic (in Spec,CP) is the first realized item in the surface string, and the verb, moved from V to I and then from I to C, is necessarily the second item in the string.
The result is that the same string can be assigned at least two structures licensed by UG.10

10 Note that it could be licensed by other combinations of parameter values as well, for example by optional topicalization of S (–ObT), and V in root position (–V-to-I), or V in I position (+V-to-I) without movement into C (–I-to-C). This is also an example of what Clark calls parametric interaction: when parameter values conspire to generate surface phenomena that do not reveal their parametric signature. That is, a learner given a surface string can't tell which parameters (or parameter values) were employed in generating the string.
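The ambiguity can be made concrete with a deliberately crude toy derivation (this is not Clark's analysis; movement is reduced to list surgery on S, V, and O, under the simplifying assumption that the topic is the subject):

```python
def surface_order(base, obt, v_to_i, i_to_c):
    """Toy surface order for a simple declarative clause whose topic is
    the subject. Movement is crudely modeled as fronting."""
    order = list(base)
    if v_to_i and i_to_c:        # verb raises V -> I -> C (clause-second slot)
        order.remove("V")
        order.insert(0, "V")
    if obt:                      # topicalize the subject into Spec,CP
        order.remove("S")
        order.insert(0, "S")
    return order

# English-like: SVO base, no movement.
english = surface_order(["S", "V", "O"], obt=False, v_to_i=False, i_to_c=False)
# German-like: SOV base, +ObT, +V-to-I, +I-to-C.
german = surface_order(["S", "O", "V"], obt=True, v_to_i=True, i_to_c=True)

print(english, german)   # ['S', 'V', 'O'] ['S', 'V', 'O'] -- same surface string
```

Two distinct parameter-value vectors (plus distinct base orders) yield an identical surface string, which is exactly the predicament facing a learner who only observes surface strings.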

What does this mean for the child language learner? Consider a child encountering the sentence Mary sees the dragon. If the child were to (mistakenly) assign the German parameter values, she could parse the sentence and interpret it, but would presumably at some point utter a sentence such as John will the ball hit, which is licensed by the German grammar, but not by (contemporary) English. Since there are very few errors of commission made by children (Snyder 2007), it would seem that the parameter setting mechanism must either be:

1. recognizing the ambiguity and choosing not to adopt either the German or English settings,11 or
2. ignoring the ambiguity and guessing a grammar hypothesis—recall, a vector of parameter values (perhaps monitoring how well each value performs on sentences encountered in the future). At some point during the course of acquisition, the learner stops guessing, acquires the correct parameter values, and uses them for production.

Clark also defines a related difficulty parametric learners are faced with, that of parametric expression. As an example consider another proposed parameter: the pied piping parameter. A + value of this parameter dictates that the complement of a prepositional phrase is moved with its head. A – value allows the complement to move and leave the head behind. French is +pied piping; Colloquial English is –pied piping. (The syntactic details of the parameter are not important for present purposes.) Now, Mary sees the dragon does not contain a prepositional phrase—neither value of the pied piping parameter is expressed. As a result the sentence can be parsed with either value. This is different from parametric ambiguity since the setting of the pied piping parameter is irrelevant to the structure of the sentence. In the case of pied piping, it might seem fairly easy to recognize which parameters a sentence expresses, but that is not true of all parameters.
Take the ObT parameter: if the subject (Mary) has moved into topic position in Mary sees the dragon, then in order to determine if the +ObT value is expressed, the learner would need to know if the movement was due to a principle of core grammar (all grammars allow topicalization) or due to a +ObT setting. If Mary remains in the base-generated subject position (Spec,IP), then the –ObT parameter value is clearly expressed;12 since Mary is at the head of the sentence, nothing can have been moved into topic position (higher than Mary). To determine if the ObT parameter is expressed in the utterance, the learner would need access to the grammar that was used by the speaker to produce the utterance, which she clearly does

11 Though perhaps temporarily adopting one or the other for the purpose of understanding the sentence at hand.
12 Note that this is not so clear if the grammar allows null topics. Then there may have been a topicalizable element (e.g. an adverb) moved into Spec,CP and subsequently deleted. In this case it is unclear if the –ObT parameter value is expressed. See discussion of these sorts of parameter interactions in Sakas and Fodor (2012).

not possess. This is also true for the verb movement parameters discussed. Not only is parametric expression related to ambiguity, it is intertwined with it in non-trivial ways.

Finally, a point unrelated to either parametric expression or ambiguity. Clark notes that learning should be constrained so that the acquisition process proceeds gradually. That is to say, the learner does not hypothesize wildly different sets of parameter values after receiving each input sentence. To encode this notion, Clark defines the Single Value Constraint, under which "each successive hypothesis made by the learner differs from the previous hypothesis by the value of at most one parameter" (Clark 1992: 90). Although the computational approach employed by Clark does not require the Single Value Constraint, it deserves mention as it has been adopted by Gibson and Wexler's TLA model described in section 29.2.2. (Gradualism is discussed at length by Yang (2002); see section 29.2.4.)

Given the difficulty of identifying and resolving the potentially complicated entanglements associated with parametric ambiguity, Clark was the first computational linguist to relinquish triggering and to employ a standard machine learning approach from artificial intelligence in his model of parameter setting: a genetic algorithm. A genetic algorithm (GA) is a search strategy designed to mirror evolution. In its most generic form, the algorithm receives an input, then the GA considers a pool of potential hypotheses (a population) by applying a fitness metric to all the hypotheses in the pool. After the fitness metric is applied, the least fit hypotheses are removed from the pool ("die off"), the attributes (genes) of the most fit are combined (mating), and some attributes are changed at random (mutation).
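One generation of such an algorithm might be sketched as follows. This is a schematic illustration under the generic description just given, not Clark's implementation: grammars are encoded as 0/1 vectors, and the fitness function is supplied by the caller.

```python
import random

def crossover(a, b):
    """Mate two parameter vectors by single-point crossover."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(h):
    """Flip one randomly chosen parameter value."""
    i = random.randrange(len(h))
    return h[:i] + [1 - h[i]] + h[i + 1:]

def ga_step(pool, sentence, fitness):
    """One generation: rank by fitness, cull the least fit half,
    mate the two fittest, then mutate one surviving hypothesis."""
    ranked = sorted(pool, key=lambda h: fitness(h, sentence), reverse=True)
    survivors = ranked[: max(2, len(ranked) // 2)]     # least fit "die off"
    survivors.extend(crossover(survivors[0], survivors[1]))
    i = random.randrange(len(survivors))
    survivors[i] = mutate(survivors[i])                # inject diversity
    return survivors
```

In a model like Clark's, fitness would stand in for the three linguistic criteria described below (fewer parsing violations, subset languages, fewer parameter values used); in a toy run one could substitute, say, the negated Hamming distance to a fixed target vector.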
The process repeats until some termination criterion (or criteria) is met; only the most fit hypotheses survive.13 Genetic algorithms implement a form of heuristic search—a strategy that is expected to identify the optimal hypotheses in the search space (e.g. the space of all possible parametric grammars), but is not logically guaranteed to converge on an optimal solution. Heuristic search is most often employed when the search space is too large and unruly for efficient implementation of an algorithm that would provably converge, for example exhaustive search. However, in the case of modeling human learning, psychological plausibility is also a consideration. Clark speculates (1992: 143) that his GA model could reasonably be embodied in a human language learner, which makes it an attractive alternative to exhaustive search (see the discussion in section 29.1.2), though there has been some critique (see later in this section). Indeed, all the search strategies outlined in this chapter that have been suggested for models of parameter setting are heuristic.

In Clark's parameter setting model the pool of hypotheses are parameterized grammars; each hypothesis is a vector of parameter values (a string of +s and –s),

13 The terminology may become a little confusing due to the use of both standard GA jargon and linguistically appropriate terminology based on context. A mini thesaurus might help:
population = pool = collection of hypotheses
member of the population = hypothesis = grammar = vector of parameter values = string of +'s and –'s
gene = attribute = a single parameter value of a grammar

each value is a "gene." When an input sentence is encountered, the algorithm parses the sentence with all the grammars in the pool. Then a particular fitness metric is applied to all the grammar hypotheses. In its calculations, the fitness metric prefers:

1. grammars that return fewer parsing violations;
2. grammars that generate subset languages;
3. grammars that require fewer parameter values to successfully parse the input.

After a fitness ranking is calculated for all the grammar hypotheses in the population, Clark's GA proceeds as already described. Given:

• a set of hypotheses, Pool = (h1, h2, …, hn), i.e., a set of grammars, where each hi = (p1, p2, …, pm), and each pi is either a +value or a –value
• a set of fitness values, Fit = (f1, f2, …, fn), where each fi is the fitness of hypothesis hi based on (1), (2), and (3) above
• a function MAX which randomly selects two hypotheses, probabilistically favoring the most fit hypotheses over the less fit

A sketch of the algorithm follows.14

Clark's Genetic Algorithm
• Set Pool to n randomly chosen hypotheses
• For each input sentence s:
  o For each hypothesis hi in Pool:
    ■ Parse s with hi
    ■ Calculate each hypothesis' fitness, fi
  o Remove the least fit hypotheses from Pool
  o If Pool contains only one hypothesis: STOP
  o Apply MAX to identify hmax1 and hmax2
  o Mate hmax1 and hmax2 to generate two new hypotheses, hchild1 and hchild2
  o Add hchild1 and hchild2 to Pool
  o Randomly pick a single hypothesis and perform mutation

During the course of learning, the fitness metric together with the mating operation gradually improves the population of hypotheses on linguistic grounds (1), (2), and (3). The mutation operation is designed to introduce diversity into the population. If the algorithm is proceeding towards a sub-optimal solution, a random change of a few parameter values gives the algorithm a chance to "change course"

14 The order of the steps differs slightly from Clark's.

and hypothesize grammars outside of the current population. The single remaining hypothesis at the end of learning is the grammar the GA found during its search of the parameter space that would best be able to generate all of the sentences of the target language.

Clark's work in computational modeling of parameter setting is noteworthy because it breaks a variety of pre-existing molds. One is that his model is not error-driven in the spirit of many pre-parameter-setting acquisition algorithms (Wexler and Culicover 1980), in that the learner may change its hypothesis pool even if one or more of the hypotheses in the pool are able to assign a correct syntactic structure to the input sentence. This offers Clark's model several advantages: it is robust to noise; it does not in principle fall prey to subset–superset relationships between languages in the domain, hence does not explicitly need to implement the Subset Principle;15 and it is able to gradually improve the learner's hypotheses not only in the case that the grammars produce many parsing violations (as a strictly error-driven learner would), but also in the case that the grammar pool is correctly parsing the input. Not being error-driven allows Clark's model to take advantage of "expected events, not [just] to occurrence of unexpected events" (Kapur 1994: 510).

As already mentioned, it is also the earliest work that exposes a significant quandary for the original concept of triggering as a model for syntactic parameters—parameter setting cannot be a transparent automatic enterprise; it must face and navigate a host of complications due to parameter interaction and parametric ambiguity. Clark's solution is to take a radically different approach than classical triggering—parameter settings are acquired through a search of the hypothesis space of grammars. The search is non-deterministic (i.e.
trial-and-error), but is guided (presumably) advantageously by the pool of most fit grammars. The pool is updated from generation to generation, randomly, after each sentence is encountered; parameter values change from one input sentence to the next. Clark abandons the deterministic triggering model.16

Clark's research marks the beginning of a two-way divergence in approaches to computationally modeling syntactic parameter setting:

1. a nondeterministic approach that portrays the learning process as a largely trial-and-error search through the space of possible grammars until a grammar is found that licenses all the target sentences encountered by the learner—a learner which makes ambiguity-induced errors along the way; and
2. a deterministic approach that includes tests to identify and discard ambiguous triggers, so that learning can be based solely on unambiguous information in order to

15 Though there is much discussion of a standard variety of the Subset Principle in Clark (1992), the probabilistic version Clark employs—i.e., the folding of SP into the fitness metric—is distinctly different.
16 Around the same time there was complementary work in the acquisition of generative phonology (Dresher and Kaye 1990) which posited similar viewpoints concerning parameter interaction and ambiguity. However, Clark argued for a radically different parameter setting mechanism than triggering—a non-deterministic, trial-and-error approach—whereas Dresher and Kaye in their learning model tried to maintain much of the determinism associated with the classical concept of triggering.

avoid errors. This second approach abides by what has been dubbed the Parametric Principle (Sakas 2000; Sakas and Fodor 2001)—correctly set the value of every parameter independently of evaluating whole grammars—which echoes the spirit of the classical trigger model. There are advantages and disadvantages to both approaches. I return to the significance of this "fork in the road" in section 29.3 and present some conjectures about what directions computational modeling of P&P acquisition will take at the end of this chapter, but for now let us wind up discussion of Clark's non-deterministic approach.

A nontrivial problem with GA models in general, and which Clark also acknowledges is true of his parameter setting model in particular, is what Clark refers to as the endgame problem. Basically, as the population (pool of most fit hypotheses) approaches an optimal hypothesis, the diversity in the pool decreases due to the fact that the genetic operators are being applied over more and more similar hypotheses. Linguistically, it may be the case that the GA is considering a small pool of grammars that differ "from the target by only one or two parameter settings that are not represented in the population" (Clark 1992: 135). If this situation occurs, the GA's only egress is to rely on random mutation of the genes (parameter settings) in the pool, which is inefficient and becomes even more inefficient if the expression of the incorrect parameter value occurs in the linguistic environment relatively infrequently. For example, young children typically encounter few sentences that contain more than one clause (degree-0 sentences consist of exactly one clause, degree-1 sentences consist of two clauses, etc.).
It is easy to imagine a situation in which the GA has a small population of grammars that are near optimal in their ability to license degree-0 (single clause) sentences, but that all have one or two parameters governing long distance dependencies misset to the wrong parameter value. As an illustration, consider Chinese as the target language, where all the hypotheses in the pool are able to correctly parse degree-0 Chinese sentences but all also have a parameter specifying global anaphoric antecedents set to –Global (Chinese allows anaphors to be globally bound to their antecedents); that is, the +Global gene is not to be seen in the population. The GA could introduce the +Global gene into the pool and give it a reasonable chance to survive only in the fortuitous case that: (i) the mutation operation picks the +/–Global parameter to mutate from –Global to +Global, and (ii) a rare degree-1 sentence which expresses the correct +Global value appears in the input stream before the mutated grammar "dies off," and (iii) the mutated grammar is chosen as one of the two hypotheses to mate during an iteration when such a degree-1 sentence occurs. If any of these three events were not to come to pass, the +Global gene would be doomed to die out of the population, necessitating that all three be realized later during the course of acquisition. Thus, the probability is very low that the GA will converge to a single, correct hypothesis within a reasonable amount of time; it does not meet the feasibility criterion.

Although this inefficiency is a direct product of the search heuristics employed by the GA, it is greatly exacerbated by the well-documented scarcity of parametric expression of certain syntactic phenomena in the input encountered by children (see the discussion in

Computational Approaches to Parameter Setting    707 Chapter 28 by Lisa Pearl and Sharon Goldwater in this volume on the argument from the poverty of the stimulus). Another concern not raised explicitly by Clark but highlighted by others (Nyberg 1992; Yang 2002) is the use of multiple grammars to parallel parse a single input sentence. Clearly human adults do not engage in massive parallel parsing of sentences (although it may be that there is some limited parallel parsing employed (Gibson 1991)). It is not specified in Clark’s articles how many grammar hypotheses should be maintained in the population as learning proceeds but it seems likely, based on other successful applications of GAs in other domains, that it would be considerably more than two or three—​especially at the outset of learning—​which undermines the psychological plausibility of the model (cf. Daelemans 2002). Finally the fitness metric itself has been brought into question (Dresher 1999)  on grounds that knowledge of subset–​superset relations should arise from intensional or I-​language cues, rather than a list provided by UG as Clark argues. And, more interestingly, that the heuristic of preferring grammars that return fewer parameter violations is not a useful strategy in the domain of human languages. Dresher gives an example from generative phonology where if there were a change in just one parameter “every syllable [of the target language] will receive the wrong stress.” Dresher continues: If we then move further from the target by changing other parameter values in the wrong direction, our performance—​in terms of syllables or words correct—​will appear to improve. In general, depending on the situation, small changes can have big effects and big changes can have small effects. (Dresher 1999: 16)

Dresher is referring to a domain's smoothness. A smooth domain is one in which there is a tight correspondence between changes in the grammar space and changes in the languages the grammars generate.17 Dresher maintains that the parameterized domain of metrical stress is unsmooth and hence that Clark's implementation of fitness is untenable for stress. A similar argument can be made for the domain of word-order parameters. For example, consider a P&P grammar, G, which consists of a vector of parameter values including the value Pi(v), and suppose that G licenses language L. If the domain is smooth, then changing Pi(v) to Pi(v′) should produce a grammar that licenses a language very similar to L. This is probably more true of some parameters than others. Take G to be the parameter settings for English and Pi(v) to be –pied-piping (English allows prepositional heads to be "stranded" away from their objects; see the start of this section). If –pied-piping is changed to +pied-piping with all other parameters remaining the same, we would have basically a schoolmarm-approved version of English in which sentences are not allowed to end with a preposition, and nothing else in the English language is affected. However, if we were to change a single parameter value that would flip English

17. The term smoothness as used here is a description of the language/grammar domain. It is sometimes also used as a description of the learning process meaning "gradualness," that is, when the learner considers successive grammar hypotheses that differ minimally from one another.

into a head-final language from a head-initial language, the new language would share very few sentences with English. We might arrive "back" at English by changing the verb movement parameters to place the verbs in more English-like positions in the generated sentences. However, in doing so we would be changing more parameters away from their correct values, moving the grammar further from the correct parameter values of English while at the same time moving the hypothesized language closer to English. This is analogous to the case Dresher described for metrical stress above. To the best of my knowledge there are no formal definitions of smoothness, but a smooth domain is generally taken to be one in which all grammar variation behaves like the pied-piping parameter and not like the headedness parameter. Dresher's point is that Clark's fitness metric would work only if the domain of human languages were smooth, an assumption that has been strongly argued against by Chomsky and others. For example, Chomsky writes:

There is no simple relation between the value selected for a parameter and the consequence of this choice as it works its way through the intricate system of universal grammar. It may turn out that the change of a few parameters, or even of one, yields a language that seems to be quite different in character from the original. [Such as the headedness parameter described above.] (Chomsky 1988: 63)

We return to the relationship between domain smoothness and nondeterministic learning in section 29.2.4. In summary, Clark's groundbreaking work unveiled severe limitations of classical triggering theory. Under the umbrella assumption that human language is rife with ambiguity and that grammars contain interactions among syntactic parameters that are difficult to tease apart, Clark set the stage for the computational modeling of P&P acquisition for over two decades. Many have adopted Clark's observation that linguists' view of automatic triggering is too weak to cope with ambiguity: acquisition requires a computationally motivated strategy to search over the space of grammars.

29.2.2 Gibson and Wexler: Triggering as Search

In 1994, in a seminal article, Edward Gibson and Kenneth Wexler presented the Triggering Learning Algorithm (TLA). The algorithm is appealingly simple. The learner tests its current grammar hypothesis on an input sentence. If the hypothesis licenses the input, the learner does nothing. If the hypothesis does not license the input, then the learner randomly flips a single parameter. If the new hypothesis licenses the input, then the new hypothesis is adopted; otherwise the learner retains the original hypothesis. As Gibson and Wexler note, the learner is constrained in three ways. First, following Wexler and Culicover (1980), the learner is error-driven; it will only change its hypothesis if the current hypothesis generates an error when analyzing an input. In an

error-driven learning system, convergence falls out naturally from the learning algorithm; once the target has been attained, the algorithm canonically will never change its (correct) hypothesis (assuming there is no noise in the input); there is no need for a stopping criterion or an oracle that knows the target hypothesis has been achieved. Second, following Clark (1992), the learner employs the Single Value Constraint (SVC) (see discussion in section 29.2.1). Finally, the learner is constrained to only adopt a new hypothesis in the case that the new hypothesis can license the current input sentence when the learner's current hypothesis cannot, that is, when the new hypothesis is "better" than the current one. Gibson and Wexler (1994) name this the Greediness Constraint.

Given:
• n, the number of parameters
• Gcurr = (p1, p2, …, pn), the current grammar (i.e. a vector of parameter values), where each pi is either a +value or a –value
• Gnew, like Gcurr, but only hypothesized temporarily, for a single sentence

A sketch of the algorithm follows.

Gibson and Wexler's TLA
• Randomly set all the pi in Gcurr
• For each input sentence s:
  o If Gcurr can parse s, then do nothing (Error-driven), otherwise:
    ■ Set all the pi in Gnew equal to the pi in Gcurr
    ■ Pick a random j, where 1 ≤ j ≤ n
    ■ Toggle the value of pj in Gnew, from a + to a –, or from a – to a + (SVC)
    ■ Parse s with Gnew
    ■ If Gnew can parse s, then Gcurr becomes Gnew; otherwise do nothing (Greediness)

There are several attractive aspects of Gibson and Wexler's work. Greediness, the SVC, and being error-driven together ensure conservatism. Like Clark's GA model, learning is gradual; the constraints prohibit the learner from jumping to radically different grammars on every input. More importantly, Gibson and Wexler restrict the computational resources required by the learner.
The TLA requires no memory, and at most the application of two grammars to a single input. These are significant improvements on the GA model, where multiple grammars are tested simultaneously on the input and the GA's pool of most-fit hypotheses implements a form of memory. Another noteworthy aspect is the language domain that Gibson and Wexler construct to test the TLA. Gibson and Wexler were the first to construct a space based on linguistically authentic parameters. They incorporated three parameters into their domain, generating 2^3 = 8 abstract but linguistically motivated languages, and they use the domain to show under what circumstances it is learnable by the TLA. This methodology is still actively pursued today, though its scale has increased.
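The TLA loop sketched above can be rendered in a few lines. The following is a minimal illustrative sketch (my own, not Gibson and Wexler's code): a grammar is a tuple of Boolean parameter values, and `parse` is a can-parse oracle supplied by the caller.

```python
import random

def tla(parse, n, sentences):
    """Triggering Learning Algorithm (after Gibson and Wexler 1994).

    parse(grammar, s) -> bool is a can-parse oracle supplied by the
    caller; a grammar is a tuple of n Boolean parameter values.
    """
    # Start from a random grammar hypothesis.
    g_curr = tuple(random.choice([True, False]) for _ in range(n))
    for s in sentences:
        if parse(g_curr, s):                 # error-driven: keep Gcurr
            continue
        j = random.randrange(n)              # SVC: flip exactly one parameter
        g_new = g_curr[:j] + (not g_curr[j],) + g_curr[j + 1:]
        if parse(g_new, s):                  # Greediness: adopt only if the
            g_curr = g_new                   # new grammar licenses s
    return g_curr
```

Note that the learner consults only the yes/no outcome of `parse`; nothing about the parse itself is retained, which is exactly the limitation that motivates the decoding-based learners discussed later in the chapter.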

Finally, Gibson and Wexler (1994) were the first to make the distinction between global and local triggers. A global trigger for parameter Pi is a sentence that requires Pi be set to a specific value v, denoted Pi(v): if Pi is set to another value v′, then the sentence cannot be parsed by the grammar, regardless of any of the other parameter values. A global trigger must also be a sentence of all languages generated by grammars with Pi(v). Like a global trigger, a local trigger for parameter Pi is a sentence that requires Pi be set to a specific value v, but only when the other parameters are set to specific values. If the other parameters are set differently, then Pi(v) would work just as well as Pi(v′); that is, the sentence would not be a trigger for Pi(v). Local triggers exist only in some of the languages that require Pi(v), namely those that have the other parameters set to the right values. A useful extension of global and local triggers is presented in section 29.2.5 on Sakas and Fodor. Interestingly, Gibson and Wexler's domain contains only local triggers (and no subset–superset relationships among the languages, so the TLA does not need to implement the Subset Principle). Given that the TLA model embodies some psychologically and linguistically desirable features, and that finite parameter spaces are presupposed to be easy to learn, a somewhat surprising result is that the TLA fails to acquire some,18 though not all, of the eight target grammars within this very simple parametric framework. This is because the learner can get "trapped" forever in an incorrect hypothesis, a local maximum on the way to the target grammar, due to a lack of local triggers generated by the target grammar. Basically, if the learner hypothesizes one of these local-maximum grammars, there are no sentences that will allow the TLA to make a move towards the target grammar.
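In a fully enumerable toy domain, the global/local distinction can be checked mechanically. The sketch below (my own illustration, not from Gibson and Wexler) treats a grammar as a tuple of Boolean parameters and relies on a caller-supplied `parses` oracle: a sentence is a global trigger for Pi(v) if flipping Pi breaks the parse under every setting of the other parameters, and a local trigger if it does so only under some settings.

```python
from itertools import product

def trigger_status(parses, i, v, s, n):
    """Classify sentence s as a global/local trigger for parameter i = v.

    parses(grammar, s) -> bool; a grammar is a tuple of n Booleans.
    """
    contexts = list(product([True, False], repeat=n - 1))

    def grammar(ctx, val):            # splice value val in at position i
        return ctx[:i] + (val,) + ctx[i:]

    # Contexts in which s is parsable with Pi = v but not with Pi = v'.
    discriminating = [c for c in contexts
                      if parses(grammar(c, v), s)
                      and not parses(grammar(c, not v), s)]
    if len(discriminating) == len(contexts):
        return "global trigger"
    if discriminating:
        return "local trigger"
    return "not a trigger"
```

For instance, a sentence parsable exactly when P0 is + is a global trigger for P0(+), while a sentence that additionally needs P1 set to + discriminates only in some contexts and so is merely a local trigger.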
Though Gibson and Wexler explore a number of solutions to this problem, including defaults and parameter ordering or maturation (cf. Bertolo 1995b), later research (Berwick and Niyogi 1996; Niyogi and Berwick 1996), which elegantly formulates the learning process as a Markov chain, establishes that there are inevitable situations in which the TLA is doomed to fall prey to local maxima in Gibson and Wexler's three-parameter domain; that is, it fails to meet the learnability criterion.19 This is supported by two simulation studies on two larger domains in which the TLA failed to acquire the target grammar a large percentage of the time (Kohl 1999; Fodor and Sakas 2004). Even if the domain of human languages does indeed supply local triggers for all parameters, results from probabilistic analysis of the TLA (Sakas 2000; Sakas and Fodor 2001) suggest that, given a uniform distribution of ambiguity, as the domain is scaled up to incorporate a reasonable number of parameters, the number of sentences required for

18. The number depends on which grammar is the target and which is the starting hypothesis grammar.
19. Two interesting studies, taken together, point to the fact that local maxima are caused by both the formulation of the domain and the learning algorithm applied to the domain. Turkel (1996) demonstrates that a genetic algorithm does not encounter any local maxima when acquiring the grammars of Gibson and Wexler's domain. Frank and Kapur (1996) show that the number of local maxima encountered by the TLA is reduced if three different (but linguistically sensible) parameters are used to generate the languages of Gibson and Wexler's domain.

TLA to converge increases exponentially: over a billion sentences would be required to acquire a target grammar of 30 parameters. This dovetails with results from Niyogi and Berwick (1996) showing that the TLA does worse than a random-step learner which chooses grammars randomly without any of Gibson and Wexler's three constraints, and it is also supported by Fodor and Sakas's (2004) simulation study. Hence, the TLA also fails to meet the feasibility criterion. However, Sakas (2000) also provides results demonstrating that the TLA does succeed as a feasible model of acquisition in the case that the language domain being acquired is smooth (and assuming there are no local maxima in the domain). This is a recurring theme in the current chapter (see the discussion of Clark in section 29.2.1, and of Yang in section 29.2.4). I propose the following hypothesis:

(H1) Nondeterministic computational search heuristics work well in smooth P&P syntax domains.

We return to this in section 29.2.4 on Yang. For now we adopt the premise that the domain of human syntax is not smooth (see section 29.2.1), and that the TLA is ultimately not viable as an account of human language learning, though Gibson and Wexler's work is noteworthy for its methodology as the first computational attempt at modeling syntax acquisition with a minimal amount of computational resources.

29.2.3 Fodor: Unambiguous Triggers

In response to Gibson and Wexler's work on the TLA, Janet Dean Fodor introduced a family of learners, the Structural Triggers Learner (STL) (Fodor 1998a, 1998b; Sakas and Fodor 2001), that does abide by the Parametric Principle and takes a significantly different view of the triggering process. Rather than guessing a grammar hypothesis or hypotheses to adopt, as both Clark's and Gibson and Wexler's models did, Fodor's STL makes use of structural information generated by the parser. The key to how the STL operates is that parameter values are not simply +s or –s but rather bits of tree structure, or treelets. Each treelet contains the (minimal) structural information required to define a point of variation between languages; that is, a treelet is a parameter value. For example, in the CoLAG domain described below (see section 29.2.4), the parameter value that licenses Wh-Movement is a treelet consisting of a Spec,CP with a [+WH] feature. UG demands that a Spec,CP[+WH] dominate a morphologically wh-marked lexical item and be associated with its trace somewhere else in the sentence. If a grammar contains this treelet, then wh-marked items are fronted under Spec,CP[+WH]. In this picture, triggers and parameter values are ingredients of both grammars and trees. As ingredients of grammars, they are all combined into one large grammar (termed a supergrammar) which the parser applies to the input in exactly the same way as any other grammar. Parameter values (as ingredients

of trees) are present in the parse trees output by the parser, so the learning device is able to see which of them contributed to parsing an input sentence and thus knows which to adopt. UG provides a pool of these schematic treelets, one for each parameter value, and each natural language employs some of them. Used as a trigger by the learning mechanism, a treelet is detected in the structure of input sentences. As a parameter value, it is then adopted into the learner's current grammar and is available for licensing new sentences. Thus, a grammar for the STL is a vector of treelets rather than a vector of +s and –s. The STL is error-driven. If the current grammar, Gcurr, cannot license s, new treelets will be utilized to achieve a successful parse.20 Treelets are applied in the same way as any "normal" grammar rule, so no unusual parsing activity is necessary. The STL hypothesizes grammars by adding parameter-value treelets to Gcurr when they contribute to a successful parse. The basic algorithm for all STL variants is:

1. If Gcurr can parse the current input sentence, s, retain the parametric treelets that make up Gcurr.
2. Otherwise, parse the sentence making use of any parametric treelets available in the supergrammar, giving priority to those in Gcurr, and adopt those treelets that contribute to a successful parse. We call this parametric decoding.

Because the STL can decode inputs into their parametric signatures, it stands apart from other acquisition models in that it can detect when an input sentence is parametrically ambiguous. During a parse of s, if more than one treelet could be used by the parser (i.e. a choice point is encountered), then s is parametrically ambiguous.21 Note that the TLA does not have this capacity because it relies only on a can-parse/can't-parse outcome and does not have access to the on-line operations of the parser.
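The two-step loop can be rendered schematically. In the sketch below (my own illustration, not Fodor's implementation), `decode` stands in for the parser's parametric decoding step, returning the set of supergrammar treelets needed to license a sentence; treelets are represented simply as strings.

```python
def stl(decode, sentences):
    """Basic Structural Triggers Learner loop (a schematic sketch).

    decode(s, g_curr) stands in for parametric decoding by the parser:
    it returns the set of parameter-value treelets needed to parse s,
    giving priority to treelets already in g_curr.
    """
    g_curr = set()                  # the grammar: a set of adopted treelets
    for s in sentences:
        needed = decode(s, g_curr)
        if not needed <= g_curr:    # error-driven: Gcurr alone can't parse s
            g_curr |= needed        # adopt the treelets that contributed
    return g_curr
```

The essential difference from the TLA is visible in the update step: the learner adopts exactly the structural ingredients the parse used, rather than guessing a new parameter value and retrying.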
Originally, the ability to detect ambiguity was employed in two variations of the STL: the strong STL and the weak STL. The strong STL executes a full parallel parse of each input sentence and adopts only those treelets (parameter values) that are present in all the generated parse trees. This would seem to make the strong STL an extremely powerful, albeit psychologically implausible, learner.22 However, it is not guaranteed to converge on the target grammar (Gtarg). The strong STL needs some unambiguity to be present in the structures derived from the sentences of the target language. For example, there may not be a single input generated by Gtarg that, when parsed, even in parallel, yields an unambiguous treelet for a particular parameter.

20. In addition to the parameter treelets, UG principles are also available for parsing, as they are in the other models discussed in the chapter.
21. The account given here is idealized. In fact, the picture is complicated by the existence of within-language ambiguity (One morning I shot an elephant in my pajamas) and the fact that a temporary choice point might be disambiguated by the end of an input sentence (see discussion in Fodor 1998b).
22. It is important to note that the strong STL is not a psychologically plausible model. Rather, it is intended to demonstrate the potential power of parametric decoding (Fodor 1998a; Sakas and Fodor 2001).

Unlike the strong STL, the weak STL executes a psychologically plausible left-to-right serial (deterministic) parse.23 One variant of the weak STL, the waiting STL, deals with ambiguous inputs by abiding by the heuristic: Don't learn from sentences that contain a choice point. These sentences are simply discarded for the purposes of learning. This is not to imply that children do not parse ambiguous sentences they hear, but only that they set no parameters if the current evidence is ambiguous. It is important to note that the strong STL and the waiting STL do not perform a search over the space of possible grammars; they decisively set parameters. This is quite unlike either Clark's GA model or Gibson and Wexler's TLA model, which do engage in grammar-space search. Thus the STL model is closer to the original concept of triggering. As with the TLA, these STL variants have been studied from a mathematical perspective (Bertolo et al. 1997a, 1997b; Sakas 2000; Sakas and Fodor 2001). Bertolo et al. offer a formal learnability proof that the STL will fail in principle in domains that exhibit certain patterns of overlap between languages, due to an insufficiency of unambiguous triggers. However, the proof relies in part on the unsupported premise that a deterministic learner cannot employ default parameter values. Probabilistic analysis conducted by Sakas (2000) and Sakas and Fodor (2001) points to the fact that the strong and weak STLs are extremely efficient learners in conducive domains with some unambiguous inputs but may become paralyzed in domains with high degrees of ambiguity.24 These results, among other considerations, spurred a new class of weak STL variants which we informally call the guessing STL family (Fodor 1998b; Fodor and Sakas 2004). The basic idea behind the guessing STL models is that there is some information available even in sentences that are ambiguous, and decoding can exploit that information.
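The waiting STL's heuristic is easy to state procedurally. In this sketch (again my own illustration), `decode` additionally reports whether a choice point was met during the parse, and the learner sets no parameters from ambiguous input:

```python
def waiting_stl(decode, sentences):
    """Waiting STL (a schematic sketch): learn only from unambiguous input.

    decode(s) stands in for the parser's parametric decoding: it returns
    (treelets, ambiguous), where treelets is the set of parameter-value
    treelets used to parse s, and ambiguous is True if a choice point
    was encountered during the parse.
    """
    g_curr = set()
    for s in sentences:
        treelets, ambiguous = decode(s)
        if ambiguous:
            continue              # discard the sentence for learning
        g_curr |= treelets        # decisively adopt unambiguous treelets
    return g_curr
```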
In its simplest form a guessing STL is a waiting STL that does not discard any inputs. When a choice point is encountered during a parse, the guessing STL model adopts a parameter treelet based on any one of a number of heuristics: for example, choose randomly; favor treelets without traces or that abide by other universal parsing strategies; favor treelets that have worked well in the past;25 and so on. Strictly speaking, the guessing STL violates the Parametric Principle and does perform a nondeterministic search of the hypothesis space, constrained by one of its heuristics. However, simulation studies on an abstract domain26 consisting of 13 linguistically plausible word-order parameters


23. With the capability of reanalysis when faced with garden-path sentences (Fodor 1998b).
24. They used a mathematical formulation of the domain in which language membership of an input was determined by a probability; languages were not collections of strings. See Yang (2002) for a similar approach to domain modeling.
25. The first two of these have been implemented (Fodor and Sakas 2004); the third has not. Discussion can be found in Fodor (1998b). Though not identical, Fodor's notion is similar to a later proposal by Yang which has been implemented. See the discussion below in section 29.2.4.
26. The domain was constructed much in the spirit of Gibson and Wexler's domain, though it is much larger (3,000+ languages as opposed to Gibson and Wexler's eight) and covers many more syntactic phenomena (13 parameters as opposed to Gibson and Wexler's three). We return to discussion of this domain in section 29.2.5 on Sakas and Fodor.

(Sakas 2003; Fodor and Sakas 2004) show that the guessing STL variants perform significantly more efficiently than the TLA (when no local maxima were encountered), approximately a 10-fold improvement.27 This is because the guessing STL still makes extensive use of decoding, and certain parameters are easy to set due to unambiguous information. Preliminary results indicate that some parameter values never change after a few initial sentences, even though the guessing STL is "allowed" to change them, while others do change during the acquisition process. For example, once the pied-piping parameter is set, the learner will never have the need to change it back; it will never encounter a sentence (other than noise) that requires the opposite value of the parameter. This is unlike the verb movement parameters and the obligatory topic parameter described in section 29.2.1, in which parameters interact, so the guessing STL is forced to employ one of its heuristics to search the space of these parameters until it settles on the correct values. The end result is that the Parametric Principle is effectively being adhered to, at least for some parameters. Why does this help? Because abiding by the Parametric Principle cuts the search space in half every time a parameter is set. Consider a space of 10 parameters, which specifies 2^10 = 1,024 grammars. If a single parameter is set permanently, the learner need only consider the other 9 parameters, which reduces the search space to 2^9 = 512 grammars. If 4 out of 10 parameters are fixed, the search space is reduced to 2^6 = 64 grammars; the size of the search space shrinks exponentially as more and more parameters are set. The TLA never reduces the search space, since it relies only on can-parse/can't-parse information from the parser; that is, it does not perform decoding.
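The arithmetic is worth making explicit: with n binary parameters, permanently fixing k of them leaves 2^(n−k) candidate grammars.

```python
def remaining_grammars(n, k):
    """Number of candidate grammars once k of n binary parameters are
    permanently set (each unset parameter doubles the space)."""
    return 2 ** (n - k)
```

So for the 10-parameter space discussed in the text, fixing 1 parameter leaves 512 grammars and fixing 4 leaves 64.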
The guessing STL models need only search the space of "problematic" interacting parameters, which makes them far more efficient than the TLA. In summary, Fodor's STL model is a departure from previous computational models that search the space of all possible grammars. The STL uses the parser to decode sentences into bits of structure, treelets, that embody parameter values. The STL is then able to choose from among the treelets that result from decoding and incorporate them into the current grammar hypothesis. The purest of the STL variants (waiting and strong) are very close to the classical triggering model of parameter setting, though not as "automatic," since they require effort on the part of the parser (still, as Fodor argues, parsing is needed for comprehension anyway). However, they clearly suffer from a lack of unambiguous inputs. Results from simulation studies of the guessing STL variants, although these are further from the classic triggering model since they do search a partial space of hypotheses, indicate that the STL is a viable candidate as a model of human language acquisition.

27. The TLA converged on the target grammar 12% of the time. The waiting and strong STL variants acquired the target grammar only 25% and 26% of the time, respectively. The guessing STL variants always converged on the target grammar. Note that none of the learners were faced with subset–superset choices in this study.


29.2.4 Yang: Variational Learning

In a well-received book, Charles Yang (2002) puts forward the argument that any learner must perform well in domains without unambiguous inputs, since the "general existence of unambiguous evidence has been questioned" (Clark 1992; Clark and Roberts 1993). He provides an elegant statistical parameter-setting algorithm which is indeed capable of converging to a target grammar in a domain that contains no unambiguous evidence (Straus 2008). The algorithm maintains a vector of weights; each weight in the vector is associated with a parameter and represents the probability that a particular value for that parameter is hypothesized after encountering an input sentence. If the probability is above 0.5, then the +value is more likely to be hypothesized by the learner; if it is under 0.5, then the –value is more likely to be hypothesized. The weights are "updated" by the learner based on the outcome of a can-parse/can't-parse attempt at parsing the current input sentence, s. Through the process of updating, Yang's weights serve as a form of memory of how well particular parameter values have performed on past inputs. Note that weights are a well-formulated construct akin to Fodor's notion of the activation levels of parametric treelets (Fodor 1998b: 360).

Given:
• n, the number of parameters
• a vector of weights, W = (w1, w2, …, wn), where each wi is stipulated to be between 0 and 1
• the current grammar (i.e. a vector of parameter values), Gcurr = (p1, p2, …, pn), where each pi is either a +value or a –value

A sketch of the algorithm follows.

Yang's Variational Learner
• For each wi in W, set wi to 0.5
• For each input sentence s:
  ■ For each parameter pi in Gcurr:
    a. with probability wi, choose the value of pi to be +;
    b. with probability 1 – wi, choose the value of pi to be –;
  ■ Parse s with Gcurr
  ■ Update the weights in W accordingly.

Yang gives two methods of updating the weights.
The details are not important to the discussion here, but the basic idea is to "reward" Gcurr if the parse of s with Gcurr is successful (or if a number of parses on different inputs are successful), and to "punish" the weights if the parse (or parses) fail. Rewarding or punishing inches the weights in one direction or the other, depending on the current parameter values in Gcurr.

Updating by rewarding: If a parameter in Gcurr is +, then its weight is nudged towards 1.0. If a parameter in Gcurr is –, then its weight is nudged towards 0.0.

Updating by punishing: If a parameter in Gcurr is +, then its weight is nudged towards 0.0. If a parameter in Gcurr is –, then its weight is nudged towards 1.0.

For example, if Gcurr is rewarded (i.e. Gcurr can parse the current input) and parameter pi in Gcurr is set to +, and its weight, wi, is equal to 0.6, then 0.6 is increased by some amount. Rewarding has the effect of making the learner more likely to hypothesize a +value in the future (step (a) in the algorithm). If Gcurr is punished, however, 0.6 is decreased by some amount, and hence a +value is less likely to be hypothesized in the future. The same holds for –values. The amount by which to increase or decrease the weights is determined by the Linear reward-penalty (LR-P) scheme (Bush and Mosteller 1955; see Pearl 2007, who proposes an interesting Bayesian alternative that offers some advantages). The picture to paint is one of coexisting grammars that are in competition with each other in the attempt to match the observed linguistic evidence.

There is much of interest in Yang's algorithm as compared to previous computational models. First, like Clark's GA model, it is not error-driven. Unlike an error-driven learner, which waits until there is a misfit between the current hypothesis and the input data, a Variational Learner, through rewarding and punishing, makes progress both when the data fit the current grammar and when they do not. Second, it is resource-light. All that is needed in terms of memory (in addition to the parameter values themselves) is a number of weights equal to the number of parameters. This is a significant improvement over Clark's GA model, which requires a pool of grammar hypotheses.
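The sampling-and-update loop can be sketched concretely. This is my own simplified rendering, not Yang's exact formulation: `parse` is a can-parse oracle supplied by the caller, and a single linear reward-penalty update with learning rate `gamma` stands in for the LR-P scheme.

```python
import random

def variational_learner(parse, n, sentences, gamma=0.05):
    """Variational Learning (a simplified sketch of Yang-style updating).

    parse(grammar, s) -> bool is an oracle; a grammar is a tuple of n
    Boolean parameter values (True for +, False for -); gamma is the
    learning rate of the linear reward-penalty update.
    """
    w = [0.5] * n                       # one weight per parameter
    for s in sentences:
        # Sample a grammar: pi is + with probability wi.
        g = tuple(random.random() < wi for wi in w)
        reward = parse(g, s)
        for i, plus in enumerate(g):
            # Rewarding a + value or punishing a - value pushes wi towards
            # 1.0; the other two cases push wi towards 0.0 (the LR-P
            # directions described in the text).
            if reward == plus:
                w[i] += gamma * (1 - w[i])
            else:
                w[i] -= gamma * w[i]
    return w
```

On a stream of sentences from a fixed target, the weights drift towards the target's parameter values, even though every individual update is based only on a single can-parse outcome.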
Another improvement over Clark's GA is the relatively simple update mechanism: the weights are updated on single parses, whereas Clark's model requires multiple parses over batches of sentences (though cf. Daelemans 2002). Third, although the resources required of Yang's learner are somewhat greater than those required of Gibson and Wexler's TLA, it is in principle impervious to the pitfalls of local maxima, since there is always a probability greater than 0 that the learner will temporarily jump to a different area of the parameter space, a feat impossible for the TLA to perform. Finally, a recent dissertation (Straus 2008) proves that at least one of Yang's methods of updating the weights (the "Naïve Parameter Learning" model) is guaranteed to converge in almost any P&P domain28—including domains containing only ambiguous

28. The proof requires that all the grammars, other than the target, are less likely than the target grammar to parse the sentences of the target language. This implies that the domain does not contain a language which is a superset of the target. If it did, then the probability of parsing an input sentence would be the same for both the superset grammar and the target grammar. The same would be true

data. Hence, unlike Fodor's STL, Yang's learner need not rely on the existence of unambiguous input. In its own right, Variational Learning makes two significant contributions to the computational modeling of parameter setting. Walter Daelemans, in a review of Yang's dissertation (which serves as the basis for Yang's book), puts it well:29

In conclusion, I think Charles Yang's dissertation is an important milestone in P&P-based theories of language learning, and Variational Learning deserves to be widely studied. For the first time, a general learning method is combined with a UG-based hypothesis space into an acquisition model that seems to have largely the right formal characteristics and that improves upon earlier proposals. Especially interesting is the fact that the approach provides a new tool for the study of both child language development and language change in an objective, corpus-based, and quantitative way. (Daelemans 2002: 142)

To unpack Daelemans' comments: by general learning Daelemans is referring to what is often called domain-general learning—learning methods that can be applied to any domain without resorting to prior knowledge of, or being biased by, specific facts of the domain that is being acquired. This can be contrasted readily with triggering theory and with Clark's approach, each of which requires that a learner have either knowledge (of what serves as a trigger) or bias (favor subset languages) specific to language. Yang's learner requires no specific knowledge or bias (the parse test is presumably not part of the learner, only the ability of the learner to observe the result). The point is that, in principle, Variational Learning is not specific to language learning. Hence, Yang's work marks the first time that domain-general learning is employed in a "UG-based" model of language acquisition.30 This is significant because historically language acquisition research in the generative tradition has taken the stance that the learning mechanism must be language-specific, in part due to the paucity of input that children receive. The second point Daelemans makes, and the more important one, is that Yang's research includes explication of how Variational Learning can explain developmental data—specifically Null Subject data produced by children learning English—and language change data. Language change aside, to date it is the only study of how a

28. (continued) of a language that was weakly equivalent to the target language—exactly the same sentences of the target language, licensed by the target grammar and (at least one) other grammar. Domains with these characteristics are not covered by the results in the dissertation.
29. In other parts of the review, Daelemans is somewhat critical of Yang's quick dismissal of "related approaches in (statistical) machine learning, computational linguistics and genetic algorithms."
30. This might be taken as a little strong.
Clark’s genetic algorithm follows a standard implementation used by many AI researchers working in a wide range of domains. The TLA could also be used to search a non-​linguistic domain (assuming that the domain can be meaningfully parameterized). Daelemans’ point is that Clark’s fitness metric, and Gibson and Wexler’s constraints on learning were both motivated by linguistic considerations, presumably giving the TLA a domain specific language learning bias. Yang’s learner, on the other hand, employs a generic, domain-​neutral heuristic: reward hypotheses that cover the current input datum, punish those that do not.

718   William Gregory Sakas computational model of syntactic parameter setting is shown to perform in accordance with actual data from child speech. Despite the impact of Yang’s work, there are two fronts on which weaknesses can be identified. The first one, ironically, concerns one of the abovementioned strengths of Yang’s work—​the ability of the Variational Learner to make predictions about developmental data. Sugisaki and Snyder (2005) and Snyder (2007) argue that “the acquisition of preposition-​stranding in English poses a serious challenge” to Yang’s learner, because if the + and –​values of the pied piping parameter were in competition there would be observable effects of pied-​piping in children’s utterances. Sugisaki and Snyder present data from a previous study (2003) which targets wh-​questions. They studied English corpora from 10 children in the CHILDES database (MacWhinney 2000) and found that none of the children produced a wh-​question with pied-​piping—​the preposition was consistently stranded. This, they argue, is not consistent with the prediction Variational Learning would make. Until the –​value (as in English) of the pied piping parameter was rewarded sufficiently, the Variational Learner would have a non-​zero chance of hypothesizing the +value. Indeed at the beginning of learning there is an equal chance of choosing either value.31 Unlike the pied-​piping parameter which produces observable effects in children’s utterances, Yang’s Null Subject data are the result of modeling grammar competition where English-​speaking children are making errors of omission, omitting required subjects, so there are no observable effects in the utterances of the children when entertaining an incorrect parameter value. In general, any nondeterministic learning model would predict that there would errors of comission in developmental data—​grammatical errors that are observable in children’s utterances. 
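The reward/punish dynamic at issue here can be made concrete. The sketch below is a toy linear reward-penalty learner for a single binary parameter; the corpus, learning rate, and parameter encoding are invented for illustration and this is my simplification, not Yang's implementation:

```python
import random

def variational_learn(sample_sentence, parses, gamma=0.02, steps=20000, seed=1):
    """Toy variational learner for a single binary parameter.

    sample_sentence: zero-argument function returning a random input
    parses(value, sentence): True if the grammar with parameter value
        ('+' or '-') licenses the sentence
    Returns p, the final probability of hypothesizing the '+' value.
    """
    rng = random.Random(seed)
    p = 0.5                              # both values start equiprobable
    for _ in range(steps):
        sentence = sample_sentence()
        choice = '+' if rng.random() < p else '-'
        succeeded = parses(choice, sentence)
        # linear reward-penalty: move p toward the value that parsed
        # the input, away from the value that failed on it
        if (choice == '+') == succeeded:
            p += gamma * (1 - p)         # '+' rewarded, or '-' punished
        else:
            p -= gamma * p               # '-' rewarded, or '+' punished
    return p

# Toy "English-like" input for the pied-piping parameter: 80% of
# sentences strand the preposition (parsed only by the '-' value);
# the rest are ambiguous (parsed by either value).
random.seed(0)
corpus = ['strand'] * 8 + ['ambiguous'] * 2
p_plus = variational_learn(
    lambda: random.choice(corpus),
    lambda value, s: s == 'ambiguous' or value == '-')
# p_plus ends up near 0: the '-' value has (probabilistically) won
```

Note that until p approaches 0 the learner still selects the + value with non-negligible probability, which is precisely the basis of Sugisaki and Snyder's criticism.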
Snyder (2007) argues extensively that this is not the case and that developmental data are consistent with a deterministic classical triggering model of parameter setting.

Another potential weakness of Variational Learning involves the feasibility of Yang's model in terms of the number of inputs required to converge. Yang states that "about 600,000 sentences were needed for converging on ten interacting parameters." Preliminary results from this author's simulation studies comparing Fodor's STL models, the TLA, and Variational Learning on the CoLAG domain of 13 parameters (see section 29.2.5) concur: the Variational Learner (and the TLA) requires an order of magnitude more input sentences to converge than the guessing STL. This in and of itself does not argue conclusively that Yang's model is not viable as a model of human language acquisition (perhaps the STL models converge too quickly!). However, Yang's result (of 600,000 sentences) was generated by a simulation study of an artificial domain in which a high level of smoothness was imposed—grammars that shared more parameter values with the target grammar (effectively) generated languages32 more similar to the target language than grammars with fewer parameter values in common with the target (see (H1) and the discussion in section 29.2.1). Given a smooth domain, Yang's reward/punish heuristics will accelerate the Variational Learner towards the target at an increasing rate. That is, as learning proceeds, the learner will be more and more likely to hypothesize grammars that are more and more similar to the target grammar, that is, that have more and more parameters in common with the target. Recall that mathematical modeling of the TLA showed a feasible level of performance in a smooth domain, but not in other domains (Sakas 2000). It remains an open question to what extent the Variational Learner can cope with unsmooth parametric domains. Despite these two weaknesses (the latter only a potential weakness), Yang's Variational Learner stands as the state-of-the-art in non-triggering, nondeterministic computational modeling of parameter setting.

31  They also argue that the time course of development with respect to wh-questions and preposition stranding is incorrectly predicted by Yang's learner.
32  Yang used a probabilistic framework to craft the language domain, similar to that of Sakas (2000) and Sakas and Fodor (2001). See footnote 25.

29.2.5  Sakas and Fodor: Trigger Disambiguation

Since Clark's identification of the problems of parametric ambiguity and parameter interaction, computational modeling of parameter setting has mostly employed non-deterministic search strategies over the parameter space. Recent work by Sakas and Fodor (2012) suggests that the problem of parametric ambiguity might be overcome within a deterministic framework. This work makes use of a domain containing 3072 natural-language-like artificial languages used for testing models of syntactic parameter setting: the CUNY-CoLAG domain33 (or CoLAG for short). All languages in the domain share general structural principles, which constitute the Universal Grammar (UG) of CoLAG. The individual languages are generated by grammars defined by 13 binary syntactic parameters which control familiar phenomena such as head direction, null subjects, wh-movement, topicalization, and so forth. In the spirit of previous work (Gibson and Wexler 1994; Bertolo et al. 1997a, 1997b; Kohl 1999), the CoLAG domain has a universal lexicon (e.g. S for subject, O1 for direct object, O2 for indirect object, -wa for topic marker, etc.); sentences (or more accurately sentence patterns) consist of sequences of the non-null lexical items, e.g. S-wa O1 Verb Aux or O1-wh Verb S O2. Grammars are non-recursive, and the domain contains 48,086 sentences (on average 827 per language) and 87,088 fully specified syntactic trees, which means there is substantial syntactic ambiguity (another indication of ambiguity is that every sentence is licensed by 53 grammars on average).34

Using computer database queries, Sakas and Fodor (2012) discovered that almost half of the parameter values lack unambiguous triggers in some or all of the languages which need those parameter values. For three of the 13 parameters there are insufficient unambiguous triggers for both (+ and –) values. However, by making use of some standard default values (e.g. a value that licenses subset languages), and using a non-standard "modest toolkit" of disambiguating strategies—between-parameter defaults and conditioned triggers (described later in this section)—they were able to establish unambiguous triggers for all non-default parameter values in all of the CoLAG languages. Sakas and Fodor's study suggests that, contra previous work, a computational model of parameter setting could proceed deterministically by setting each parameter only after encountering an unambiguous trigger for the required value (ignoring ambiguous input). Since unambiguous triggers are reliable, each parameter would need to be set only once, without the need to engage in trial-and-error exploration of the parameter space (see discussion of the Parametric Principle in section 29.2.3).

Sakas and Fodor (2012) emphasize the distinction between E-triggers and I-triggers (cf. Lightfoot 1999, 2006). An E-trigger is an observable property of a sentence, and an I-trigger is an innate piece of grammatical structure that realizes the observable E-trigger. Consider again the pied-piping parameter, which specifies whether or not prepositional objects must always reside in a prepositional phrase. The E-trigger would be "non-adjacency of P and O3," whereas the I-trigger would be whatever aspects of the grammar license the O3 being moved out of the PP.35 I-triggers are exactly the "treelets" of previous work on STLs (see section 29.2.4) and are available to the child learner as prescribed by UG. Contra E-triggers, I-triggers are not readily observable in the linguistic environment. Though there is significant discussion of the relationship between I-triggers and E-triggers, the results most relevant here revolve around establishing unambiguous E-triggers for every parameter in the CoLAG domain.

33  Spelled out: City University of New York—Computational Language Acquisition Group domain.
34  The trees are constructed in the manner of Generalized Phrase Structure Grammar using SLASH categories to create movement chains.
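The deterministic regime Sakas and Fodor envision, in which a parameter is set once and permanently when an unambiguous trigger is observed and ambiguous input is simply ignored, can be sketched as follows (a toy illustration; the trigger detector and sentence encoding are invented, not CoLAG's):

```python
def deterministic_set(param_names, triggers, input_stream):
    """Set each parameter exactly once, on its first unambiguous trigger.

    triggers: dict mapping (parameter, value) to a detector predicate
        over sentences; a detector should fire only on input that is
        unambiguous evidence for that value.
    """
    settings = {}
    for sentence in input_stream:
        for (param, value), detect in triggers.items():
            if param not in settings and detect(sentence):
                settings[param] = value      # set once; never revised
        if len(settings) == len(param_names):
            break                            # all parameters are set
    return settings

# Toy E-trigger for -PiedPiping: non-adjacency of P and its object O3.
triggers = {
    ('PiedPiping', '-'):
        lambda s: 'P' in s and 'O3' in s and s.index('O3') != s.index('P') + 1,
}
stream = [
    ['S', 'Verb', 'P', 'O3'],    # ambiguous: P and O3 adjacent; ignored
    ['O3', 'S', 'Verb', 'P'],    # stranded P: unambiguous trigger
]
settings = deterministic_set({'PiedPiping'}, triggers, stream)
# settings == {'PiedPiping': '-'}
```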
Sakas and Fodor also extend Gibson and Wexler's distinction between local and global (E-)triggers by incorporating the notions of valid triggers and available triggers.36 A valid trigger is a trigger that can be used (safely) by the learner to set a parameter to a particular value Pi(v). A globally valid trigger for Pi(v) is what I have been referring to as an unambiguous trigger here—a trigger that exists only in languages whose grammars have Pi(v) and does not exist in any grammar having Pi(v′). A locally valid trigger is a trigger that can be used to set Pi(v) given that the settings of other parameters are established. "Availability" specifies the extent to which a trigger exists in the languages of the domain. A globally available trigger exists in all languages whose grammar has Pi(v), whereas a locally available trigger exists in only a subset of languages whose grammars have Pi(v).37 Note that a globally valid trigger may not be globally available—a distinction Gibson and Wexler do not make.

Sakas and Fodor (2012) first searched CoLAG for globally valid E-triggers and found that for five (out of the 13) parameters, every language generated by either the + or – value contained at least one. For another five parameters there existed at least one globally valid trigger for the + value. Thus, for these five parameters, by designating the – value as the default, the + value could be reliably triggered in languages requiring the + value, and a learner would never be falsely led to the + value in languages requiring the – (default) value, since by definition the globally valid triggers for the + value do not exist in languages that require the – value. For the remaining three parameters (ItoC Movement, Q-Inversion, and Optional Topic) there were languages in CoLAG that lacked globally valid triggers for both the + and – values. Sakas and Fodor make the move to "create" unambiguous triggers by employing two disambiguating strategies: between-parameter defaults and conditioned triggers.38 Sakas and Fodor (2012) define these strategies as follows:

A between-parameter default: A sentence property s that is compatible with the marked values of two different parameters Pi and Pj is stipulated as triggering Pi, leaving Pj to be set by other triggers. This is helpful in cases where Pi suffers from a shortage of triggers but Pj does not; in effect, Pj makes a gift of trigger s to Pi. Example: Absence of an overt topic, which is compatible with both +OptTop and +NullTop, is designated as triggering +OptTop only; where appropriate, +NullTop is set by a different trigger (absence of an obligatory sentence constituent).

35  From S&F: "'SLASH O3' as in the node label 'PP[SLASH O3]' which by definition dominates a PP whose internal object (O3) has been extracted," though of course the formulation of I-triggers would differ from one theoretical framework to another.
36  The following presentation of S&F's definitions is slightly simplified. However, the specifics are not relevant to the discussion here.
37  A global trigger for Gibson and Wexler is, by these definitions, a globally available, globally valid trigger. A local trigger for Gibson and Wexler is a locally valid, locally available trigger.

A conditioned trigger: A sentence property s compatible with two (or more) parameter values vi and vj (values of the same parameter or of different ones) becomes unambiguous as a trigger for vi once the values vk of some other parameters are established. (This is a locally valid trigger in terms of the definitions above.) Example: the surface word order AuxVOS is compatible with either −ItoC or +ItoC but becomes an unambiguous trigger for −ItoC once the Headedness in CP parameter has been set to Complementizer-final.
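The logic of a conditioned trigger is easy to state procedurally. The sketch below follows the AuxVOS example above; the parameter names and sentence encoding are invented for illustration:

```python
def conditioned_itoc_trigger(settings, sentence):
    """Toy conditioned trigger: surface order AuxVOS is ambiguous
    between +ItoC and -ItoC on its own, but once Headedness in CP is
    known to be complementizer-final it becomes an unambiguous
    trigger for -ItoC. Returns a (parameter, value) pair or None."""
    if sentence == ('Aux', 'Verb', 'O1', 'S'):
        if settings.get('HeadednessCP') == 'comp-final':
            return ('ItoC', '-')   # condition met: safe to set
    return None                    # otherwise remain agnostic

# Before the conditioning parameter is set, the input is ignored:
before = conditioned_itoc_trigger({}, ('Aux', 'Verb', 'O1', 'S'))
# Once Headedness in CP is established, the same input sets -ItoC:
after = conditioned_itoc_trigger({'HeadednessCP': 'comp-final'},
                                 ('Aux', 'Verb', 'O1', 'S'))
```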

Sakas and Fodor (2012) take both strategies to be innately endowed. They also point out that neither strategy can be employed in an ad hoc fashion; both require that "safety requirements" be met (e.g. do not rely on a default value for vk when creating a conditioned trigger). Whenever needed in CoLAG, the safety requirements for both strategies were met. After applying these disambiguation techniques, the remaining three CoLAG parameters could be set; in the end, all 13 parameters in CoLAG could be reliably set. The key concept is that if unambiguous (globally valid) triggers do not exist in the input, they can be created by strategies that a learner could employ. In fact, this is precisely what standard (within-parameter) defaults do: given input that is ambiguous between two values of a parameter, disambiguate by favoring the default value.

38  Similar strategies have been used in the past for modeling the setting of phonological parameters (Dresher and Kaye 1990; Dresher 1999).

Of course, there is no guarantee that this positive result for the CoLAG domain will extend to the full domain of natural languages. Even if it does, the finding that deterministic parameter setting is feasible does not entail that it is the means by which children acquire their grammars; in fact there are several drawbacks of strictly deterministic learning, as discussed in the next section. Sakas and Fodor's work, however, re-opens the possibility for future computational (and psycholinguistic) investigation to incorporate components of classical triggering theory.
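The validity/availability taxonomy that drives Sakas and Fodor's database queries can be checked mechanically over any small domain. The sketch below is illustrative only; the two-parameter domain is invented, not a fragment of CoLAG:

```python
def occurs_in(language, prop):
    """True if some sentence of the language has the property."""
    return any(prop(s) for s in language)

def classify_trigger(domain, prop, param, value):
    """Classify a sentence property as a trigger for param = value.

    domain: list of (grammar, language) pairs, where a grammar is a
        dict of parameter settings and a language is a set of sentences.
    Returns (globally_valid, globally_available):
      valid: the property never occurs in a language whose grammar
             has the opposite value;
      available: it occurs in every language whose grammar has value.
    """
    with_value = [lang for g, lang in domain if g[param] == value]
    without_value = [lang for g, lang in domain if g[param] != value]
    valid = not any(occurs_in(lang, prop) for lang in without_value)
    available = all(occurs_in(lang, prop) for lang in with_value)
    return valid, available

# A toy two-parameter domain: four grammars, languages as sets of
# atomic "sentences."
domain = [
    ({'A': '+', 'B': '+'}, {'x', 'y'}),
    ({'A': '+', 'B': '-'}, {'x'}),
    ({'A': '-', 'B': '+'}, {'y', 'z'}),
    ({'A': '-', 'B': '-'}, {'z'}),
]
# "contains x" occurs in every +A language and in no -A language,
# so it is a globally valid, globally available trigger for A = '+'.
t1 = classify_trigger(domain, lambda s: s == 'x', 'A', '+')   # (True, True)
# "contains y" occurs in a -A language, and not in every +A language:
t2 = classify_trigger(domain, lambda s: s == 'y', 'A', '+')   # (False, False)
```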

29.3  Back to the Future: A Return to Classical Triggering Theory?

Sakas and Fodor's work suggests that the turn to non-deterministic, heuristic search in most existing computational models of parameter setting—a radical departure from the original conception of triggering—may have been premature. However, even if the domain of human languages contains sufficient unambiguous E-triggers given a workable model of trigger disambiguation, other linguistic, computational, and psycholinguistic issues concerning strictly deterministic parameter setting have been raised in the literature.

• Deterministic learning is not robust in the face of noisy input—input that contains linguistic phenomena not in the target language. A deterministic learner runs the risk of mis-setting a parameter based on noise, with no recourse (by definition) to re-setting the parameter to the correct value (Nyberg 1992; Kapur 1994; Fodor 1998b).
• Determinism also entails immediate shifts from one grammar to another rather than gradual change over time (van Kampen 1997; Yang 2002).
• A deterministic learning mechanism that is "perfect" and only sets parameters correctly precludes an explanation of the development of creole languages based on pidgin language input (and of language change in general) that engages acquisition at its center (Lightfoot 2006 and references there).

Although these are valid concerns, they do not rule out a fundamentally deterministic learning mechanism that possesses nondeterministic components (e.g. a system that requires multiple exposures to an unambiguous E-trigger, or even exposure to multiple E-triggers multiple times, in order to ensure reliable learning in the face of noise). Such a learning system could well model the course of acquisition better than a fully trial-and-error system (see the discussion concerning the fit of Yang's model to developmental data in section 29.2.4).
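One such mixed regime, requiring several exposures to an unambiguous trigger before committing, can be sketched as follows (the threshold, trigger detectors, and sentence encoding are invented for the example):

```python
from collections import Counter

def robust_trigger_learner(input_stream, triggers, threshold=3):
    """Deterministic trigger learner hardened against noise: a
    parameter is set only after its unambiguous trigger has been
    observed `threshold` times, so an isolated noisy input cannot
    mis-set it. Once set, a parameter is never revised."""
    counts = Counter()
    settings = {}
    for sentence in input_stream:
        for (param, value), detect in triggers.items():
            if param in settings or not detect(sentence):
                continue
            counts[param, value] += 1
            if counts[param, value] >= threshold:
                settings[param] = value
    return settings

# Toy triggers for a null-subject parameter: 'no-subj' is a single
# noisy fragment; 'expletive' is genuine evidence for the '-' value.
triggers = {
    ('NullSubject', '+'): lambda s: s == 'no-subj',
    ('NullSubject', '-'): lambda s: s == 'expletive',
}
stream = ['no-subj', 'expletive', 'expletive', 'expletive', 'expletive']
settings = robust_trigger_learner(stream, triggers)
# the lone noisy 'no-subj' never reaches threshold:
# settings == {'NullSubject': '-'}
```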
Another concern that is specific to unambiguous E-triggers is their implementation. Are they an innately endowed list of fully formed sentence patterns? Unlikely. Are they a list of innately endowed "schemas" (e.g. a prepositional object separated from its preposition)? Possibly. Or are they computationally derived from innately endowed I-triggers? This last is the most attractive of the three options. However, it remains an open question whether it is a computationally feasible option (Gibson and Wexler 1994; Sakas and Fodor 2012).

Still, the advantages of a deterministic, triggering learning system are compelling. One would be hard pressed to imagine a more efficient and precise acquisition model than one that uses unambiguous (including disambiguated) E-triggers to set a parameter to its correct value. Equally important, deterministic learning allows for reliable use of default values. Default values are necessary if the marked value of a parameter generates a superset of the default value; they are a straightforward implementation of the Subset Principle in a parametric system. They also minimize the workload of the learning mechanism, since no computations need to be carried out to establish a default value. As a result, defaults are often presupposed in much psycholinguistic research. However, if the learner employs a nondeterministic search strategy, defaults are rendered virtually worthless. This is because a trial-and-error learner is never completely certain that the current hypothesized grammar is correct; that is, the hypothesis might contain incorrect marked values for any number of parameters. The search heuristics would need sufficient evidence for both values of every parameter, which in turn entails less efficient learning than a learning model that makes use of defaults.
Although there have been attempts aimed at reaping the benefits of default values within a nondeterministic learning paradigm (Bertolo 1995a, 1995b), the intricate solutions are unsatisfactory compared to the simplicity of the classic triggering model: a default value can be reliably left unaltered unless unambiguous evidence for the marked value is encountered by the learner. In the end, it appears (at least to me) that the benefits of a deterministic, classical triggering model outweigh the disadvantages, and that many of the components of classical triggering should be, and could be, retained in computational modeling of human language acquisition. But this endeavor is in its infancy. If I were to chart an "ideal" path that the next generation of computational models of syntactic parameter setting would take, I would mark first that modelers need to develop a psychologically plausible computational mapping from I-triggers to E-triggers. This is not a trivial endeavor and has remained an open question for several decades; however, any computational model of triggering will need to implement a mapping between I- and E-triggers if it is to be widely accepted as a model of child language acquisition. Following this, extensive study of parameter interaction and parametric ambiguity would need to be conducted in order to discover whether trigger disambiguation can overcome whatever pockets in the domain of human languages exhibit a dearth of unambiguous E-triggers. To the extent that trigger disambiguation fails to create sufficient unambiguous triggers, the learning model would employ a simple statistical mechanism (e.g. Fodor 1998b; Yang 2002) to navigate what would be expected to be a small area of the parameter space non-deterministically (see discussion of the Guessing-STL models in section 29.2.3). Finally, robustness would need to be built into the model and tested against corpora of actual child-directed speech, complete with all expected mis-speaks, sentence fragments, etc.—and, of course, in as many languages as possible. That is, if I were to map such a trajectory out! This future is clearly "a ways away," but going back to the essential elements of classical triggering theory, adapting them in a computational implementation of parameter setting, and finally demonstrating empirically that the adaptations make for an efficient, precise, and developmentally accurate model is the quickest and most promising route to get there.

Chapter 30

Learning with Violable Constraints

Gaja Jarosz

30.1  Introduction

Learnability is a central problem of linguistics. A complete theory of language must explain how this rich and complex system of knowledge can be learned from the limited available input. Work on learnability seeks to answer this question by developing explicit formal models of the learning process, models that provide concrete hypotheses about how language can be learned. Work on learnability within Optimality Theory (OT; Prince and Smolensky 2004)1 and related frameworks has contributed a great deal to our understanding of language learning. These results contribute not only to our understanding of how language might be learned in principle, but also to our understanding of how language is actually acquired by children. This chapter reviews these important contributions, focusing primarily on major learnability results and challenges, but also reviewing work that relates the predictions of computational models to developmental findings.

This chapter discusses the various facets of the learning problem, the challenges they pose for the learner, and the constraint-based approaches that have been developed to address them. Under the standard view in OT, the set of universal constraints defines, via permutation, the space of possible adult languages (the typology). This formal system defines the hypothesis space for learning—the set of languages that learning models take as the targets of learning. Under this view, the major subproblem of language learning is the learning of grammars (rankings or weightings of constraints) for a given set of constraints. The grammar-learning subproblem itself consists of a number of non-trivial subproblems. In the general case, the grammar must be learned from a set of unstructured overt forms to which the learner is exposed. Tesar and Smolensky (1998, 2000) decompose this larger grammar-learning problem into a number of processes, including the learning of grammars from full structural descriptions. Full structural descriptions are the representations evaluated by constraints in an OT grammar and include underlying/lexical representations as well as hidden structural information, such as prosodic or syntactic structure. Although this narrower grammar-learning subproblem is itself non-trivial, there now exist a number of solutions both for classic OT (section 30.2) and for probabilistic and weighted extensions of OT (section 30.3). Learnability results relating to the broader grammar-learning subproblem, which makes the more realistic assumption that learners do not have access to hidden structure, are discussed in section 30.4. Section 30.5 briefly reviews results on an important aspect of language learning, the learning of restrictive languages. Finally, most learnability work takes into account basic considerations of psychological plausibility, such as the computational resources expected of the learner. Some computational work goes further in attempting to model aspects of the acquisition process itself and relating the behavior of the model to developmental findings—section 30.6 reviews constraint-based modeling work of this sort.

1  The focus of this chapter is on learning, but there is also a lot of work on the important, related computational problems of generation and recognition. Learning and generation are intrinsically linked since most (all?) constraint-based learning models assume a generation mechanism. These problems are conceptually distinct, however, and the focus of the present paper is on learning (for related work on generation and recognition problems, see e.g. Ellison 1994; Tesar 1995; Frank and Satta 1998; Eisner 2000a, 2002; Riggle 2004; Idsardi 2006; Heinz et al. 2009).
Although most work on constraint-​based computational modeling has focused within the domain of phonology, most of the models for constraint-​based learning are entirely general, applying equally well to other linguistic domains. In other words, most of the learnability results depend only on the formal structure of the framework and not at all on the content of the constraints or the representations they evaluate. This contrasts with approaches to learning within parametric frameworks that rely crucially on identifying domain-​specific cues for the setting of individual parameters (Dresher and Kaye 1990; Dresher 1999). Another approach to learning in parametric frameworks relies on the existence of triggers, special kinds of data that can uniquely specify individual parameter settings (Gibson and Wexler 1994; Frank and Kapur 1996; Fodor 1998a). As discussed by Tesar and Smolensky (1998, 2000), guaranteeing the existence of such triggers often requires restrictions on the grammar in order to ensure parameters are independent, a goal that unfortunately often conflicts with the goals of typological explanation. For extensive discussion comparing constraint-​based and parametric approaches to the broader grammar-​learning subproblem, see Tesar (2004b). Although constraint-​based learning models are largely domain-​general, they have been applied extensively to the learning of phonological grammars, especially the learning of phonological alternations. The discussion in this chapter will therefore focus on phonology; the extension of these learning models to syntax and other domains remains a promising direction for further research.


30.2  Learning Rankings

This section reviews proposals within Classic OT for the learning of a ranking of constraints consistent with a set of (fully structured) data. Tesar (1995) and Tesar and Smolensky (1998) developed a family of learning algorithms for the (narrow) grammar-learning subproblem within classic OT almost simultaneously with the introduction of OT. Section 30.2.1 discusses Recursive Constraint Demotion (RCD), an algorithm that learns rankings given a set of winner–loser pairs. Algorithms for extracting and processing these winner–loser pairs during on-line learning are discussed in section 30.2.2.

30.2.1  Recursive Constraint Demotion

Learning a ranking entails finding a ranking of the constraints under which each learning datum is rendered optimal. In order for the datum to be optimal, every constraint that favors some other candidate must be dominated by a constraint that favors the learner's datum. The requirements behind this ranking logic are particularly transparent in Comparative Tableaux (Prince 2002a). Rather than showing the constraint violations of individual candidates, each row of a Comparative Tableau represents a winner–loser pair, a comparison between the constraint violations of the designated winner and another candidate, the loser. The winner–loser pair comparison indicates for each constraint whether it favors the winner (W), the loser (L), or has no preference (e). For example, if the winner is [pat], the constraint NoCoda assigns more violations to the winner [pat] than to the loser [pa], so this winner–loser pair would get an L for NoCoda. The L indicates that NoCoda must be dominated by some constraint with a W for this pair in order for [pat] to be optimal. Stated in terms of Comparative Tableaux, a ranking selects the winner as optimal if, for every winner–loser pair, every loser-preferring constraint is dominated by a winner-preferring constraint. Put simply, each L must be preceded by a W in the same row. Viewed in this way, RCD (Tesar 1995; Tesar and Smolensky 1998) is an efficient way of reordering the constraints such that each loser-preferring constraint is dominated by some winner-preferring constraint. RCD takes full advantage of strict domination by creating the ranking top down—once a candidate is ruled out by high-ranking constraints, it is no longer considered by the algorithm. It works recursively, repeatedly selecting constraints to rank in the hierarchy, removing them from the working set, and repeating the process.
Each time a set of constraints is removed from the working set, it is placed in successively lower strata of the hierarchy. A constraint is available to rank in the hierarchy if it prefers no losers that are still in play in the working set (has no Ls in its column). Once removed and placed at the top of the hierarchy, constraints that are winner-​preferring make new constraints available to rank by eliminating losing candidates from consideration, and the process repeats until no constraints are left to rank.

(1)  Constraints

  a. *+Voi/Obs—No voiced obstruents. One violation for every voiced obstruent.
  b. ID[Voi]—No changes in voicing. One violation for every voicing feature changed from input to output.
  c. Agree[Voi]—Sequences of obstruents must agree in voicing specification. One violation for every pair of adjacent obstruents differing in voicing.
  d. Max—No deletion. One violation for every deleted segment.

(2)  A set of winner–loser pairs before RCD

             winner ~ loser     *+Voi/Obs  ID[Voi]  Agree[Voi]  Max
  /lupz/     [lups] ~ [lup]     e          L        e           W
  /lupz/     [lups] ~ [lupz]    W          L        W           e
  /dan/      [dan]  ~ [tan]     L          W        e           e

The process is illustrated for the winner–​loser pairs in (2), which rely on the constraints in (1). The first pass of RCD examines each of the four constraints, identifying Agree[Voi] and Max as the two constraints that prefer no losers. These two constraints are thus placed at the top of the hierarchy and removed. As shown in (3), these two constraints favor the winners of the first two winner–​loser pairs, with Agree[Voi] ruling out the losing candidate [lupz], and Max ruling out the losing candidate [lup]. The high ranking of Agree and Max has ruled out these candidates so RCD removes them from consideration. (3)  Understanding the result of the first pass of RCD

             winner ~ loser     Agree[Voi]  Max  *+Voi/Obs  ID[Voi]
  /lupz/     [lups] ~ [lup]     e           W    e          L
  /lupz/     [lups] ~ [lupz]    W           e    W          L
  /dan/      [dan]  ~ [tan]     e           e    L          W

Thus, after the first pass of RCD there are just two constraints and one winner–​loser pair left, as shown in (4). Now, RCD is called recursively on this resulting set of winner–​loser pairs and reduced set of constraints and determines that ID[Voi] is the only constraint that prefers no losers, and so it is placed in the next highest stratum and removed. At this point the ranking that has been built is {Agree[Voi], Max} » ID[Voi]. (4)  After the first pass of RCD

             winner ~ loser     *+Voi/Obs  ID[Voi]
  /dan/      [dan]  ~ [tan]     L          W

Since ID[Voi] favors the winner [dan] for the remaining winner–loser pair, the pair is removed, leaving an empty set of winner–loser pairs. On the final pass of RCD, there are no losers left, so all the remaining constraints, just *+Voi/Obs in this case, are placed in the ranking. The hierarchy found by RCD is thus {Agree[Voi], Max} » ID[Voi] » *+Voi/Obs, and any total ranking consistent with it is guaranteed to be consistent with all the winner–loser pairs. In this example, which has four constraints, it takes RCD three passes to find a consistent ranking. In the general case, it will take at most n passes of RCD to find a consistent ranking (assuming there is one) for n constraints (Tesar 1995; Tesar and Smolensky 1998). This is because, if there is a consistent ranking, then each pass of RCD ranks at least one constraint, and the procedure is thus complete within n passes. How long each pass takes also depends on the number of constraints, but each pass cannot consider more than n constraints. The whole procedure therefore cannot exceed n² steps, which compares very favorably with the size of the search space, for example the number of rankings, which is n!.

Another important feature of RCD is its ability to determine when no ranking is consistent with a set of winner–loser pairs. Inconsistency is detected if at any point during learning there remain unranked constraints, but all of them prefer some losers. Inconsistency detection is a crucial component of later proposals dealing with structural ambiguity (Tesar 1997b, 2004b) and the learning of underlying forms (Tesar et al. 2003; Tesar 2006a, 2006b, 2009; Merchant 2008; Merchant and Tesar 2008), which are discussed in section 30.4.

In addition to RCD, Tesar (1995) and Tesar and Smolensky (1998) discuss a related learning algorithm, Constraint Demotion (CD), which is the basis for the well-known Error-Driven Constraint Demotion, discussed in the next section. CD examines one winner–loser pair at a time and demotes all loser-preferring constraints ranked above the highest-ranked winner-preferring constraint to a stratum immediately below that highest-ranked winner-preferring constraint. In other words, it demotes high-ranking loser-preferring constraints so that all Ls end up below a W.
The computational properties of CD are similar to those of RCD: CD is also guaranteed to efficiently find a correct ranking if there is one (see Tesar 1995 for in-depth comparisons). The main advantage of CD is that it can be applied to one winner–loser pair at a time. In contrast to RCD, however, CD cannot detect inconsistency. The next section discusses variants of both of these algorithms, which process one learning datum at a time, relying on error-driven learning.[2]
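The RCD procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Tesar's implementation, and the winner–loser pairs below are hypothetical stand-ins chosen only to reproduce the outcome of the example above ({Agree[Voi], Max} » ID[Voi] » *+Voi/Obs).

```python
def rcd(constraints, pairs):
    """Recursive Constraint Demotion (sketch): return a stratified
    hierarchy consistent with the winner-loser pairs, or None if no
    ranking is consistent. Each pair maps a constraint to 'W' (prefers
    the winner), 'L' (prefers the loser), or omits it (no preference)."""
    unranked, strata = set(constraints), []
    while pairs:
        # Constraints preferring no remaining loser can be ranked now.
        rankable = {c for c in unranked
                    if not any(p.get(c) == 'L' for p in pairs)}
        if not rankable:
            return None  # inconsistency detected
        strata.append(sorted(rankable))
        unranked -= rankable
        # Remove pairs accounted for by a just-ranked winner-preferrer.
        pairs = [p for p in pairs
                 if not any(p.get(c) == 'W' for c in rankable)]
    if unranked:
        strata.append(sorted(unranked))  # final pass: no losers left
    return strata

# Hypothetical winner-loser pairs reproducing the example's outcome:
constraints = ["Agree[Voi]", "Max", "ID[Voi]", "*+Voi/Obs"]
pairs = [{"Agree[Voi]": 'W', "ID[Voi]": 'L'},
         {"Max": 'W', "*+Voi/Obs": 'L'},
         {"ID[Voi]": 'W', "*+Voi/Obs": 'L'}]
hierarchy = rcd(constraints, pairs)
# hierarchy: [['Agree[Voi]', 'Max'], ['ID[Voi]'], ['*+Voi/Obs']]
```

Running the function on these pairs takes exactly three passes, and if at some pass no unranked constraint is free of loser preferences, it returns None, implementing the inconsistency detection discussed above.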

730   Gaja Jarosz

30.2.2 Error-driven Learning

Given a consistent set of winner–loser pairs, RCD is guaranteed to efficiently find a correct ranking; however, the procedure itself does not specify where these winner–loser pairs come from. Given full structural descriptions for the learning data, winners are simply the observed forms in the learning data. Error-driven learning (Wexler and Culicover 1980; Tesar 1995; Tesar and Smolensky 1998) provides a particularly efficient way of selecting informative losers.[2] In error-driven learning, the learner processes each learning datum to determine what candidate is considered optimal under the current ranking. Specifically, the learner uses the underlying form provided in the full structural description of the datum and its current ranking to generate its output for that datum. If the learner's output matches the learning datum, the current ranking already correctly accounts for that learning datum, and the learner does nothing. If, however, the learner's output does not match the learning datum, this is considered an error and triggers re-ranking. The learner creates a winner–loser pair from the learning datum and the learner's own (incorrect) output, and this pair is used to update the ranking hypothesis. In this way, the learner only considers winner–loser pairs that are informative because they indicate that a change to the current ranking is necessary: the (potentially infinite) space of possible candidates is never directly consulted during learning. Learning continues until no more errors are produced.

[2] For alternative ways of constructing losing competitors, see Riggle (2004).

This section describes two error-driven learners for the Classic OT (narrow) grammar-learning subproblem: Error-Driven Constraint Demotion (Tesar 1995; Tesar and Smolensky 1998) and Multi-Recursive Constraint Demotion (Tesar 1997b, 2004b). Other error-driven learners for Classic OT are discussed by Boersma (1998) and Magri (2009).

Error-Driven Constraint Demotion (EDCD) maintains a single grammar hypothesis, processes one learning datum at a time, and, if an error is produced, applies CD to the resulting winner–loser pair. EDCD is an on-line learning algorithm since it processes one datum at a time. EDCD also has efficient data complexity: it needs to process at most ½n(n–1) informative winner–loser pairs before settling on a final ranking (Tesar 1995; Tesar and Smolensky 1998).
Finally, just like CD, EDCD is guaranteed to find a ranking consistent with the set of winner–loser pairs it considers, if there is one[3] (Tesar 1995; Tesar and Smolensky 1998). The appeal of EDCD is its correctness, efficiency, and simplicity; however, because EDCD cannot detect inconsistency, Tesar later proposed an error-driven variant of RCD called Multi-Recursive Constraint Demotion (Tesar 1997b, 2004b).

[3] EDCD is guaranteed to find a ranking consistent with all the winner–loser pairs it encounters; however, as discussed by Tesar (2000) and Merchant and Tesar (2008), EDCD can fail to find all the necessary winner–loser pairs as a result of the treatment of ties. Thankfully, a simple adjustment to EDCD's original treatment of tied constraints fixes this issue. Tesar (2000) proposes to replace the "pooling ties" of the original EDCD with "conflict ties," which generate an error if there is crucial conflict between constraints that are tied. Boersma (2009) proposes an alternative solution, "permuting ties." Finally, Merchant and Tesar (2008) suggest another possibility, relying on Riggle's (2004) Contenders algorithm to generate losers.

Multi-Recursive Constraint Demotion (MRCD) is also on-line since it processes one learning datum at a time. Unlike EDCD, however, MRCD keeps track of all the winner–loser pairs it encounters. Specifically, rather than maintaining a single ranking and using it to process each learning datum, MRCD maintains a list of winner–loser pairs, called a support, that it uses to process each new datum. For each learning datum, MRCD applies RCD to the support in order to construct a hierarchy, uses the hierarchy to generate the learner's output for the datum, and, if an error is produced, adds the resulting winner–loser pair to the support. Just like EDCD, MRCD has efficient data complexity, and never needs to consider more than ½n(n–1) winner–loser pairs before settling on a ranking that produces no more errors (Tesar 1997b, 2004b). Storing the list of winner–loser pairs does increase the amount of computational effort expected of the learner, and it means the learner applies the full RCD algorithm during processing of each datum, but it also enables the learner to detect when a new winner–loser pair is inconsistent with the existing list. The relevance of this feature of MRCD will become clear in the context of learning with hidden structure, discussed in section 30.4.

In sum, nearly simultaneously with the introduction of OT, Tesar (1995) and Tesar and Smolensky (1998) introduced a family of learning algorithms for the narrow grammar-learning subproblem of Classic OT. The CD family includes RCD, which is guaranteed to efficiently find a ranking consistent with a set of winner–loser pairs or to detect inconsistency, if there is no consistent ranking. The on-line, error-driven variant of RCD is called MRCD. The family of algorithms also includes CD, which processes one winner–loser pair at a time, and whose on-line, error-driven variant is called EDCD. These learning algorithms form a foundation for subsequent work on larger subproblems for Classic OT as well as for a number of error-driven learning algorithms for probabilistic extensions of OT. Approaches to the narrow grammar-learning subproblem relying on probabilistic and weighted generalizations of OT are discussed next.
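The error-driven loop with CD's demotion step (EDCD) can be sketched as follows. The constraint names, candidates, and violation profiles are hypothetical, and evaluation here pools tied constraints within a stratum, the simplification discussed in the surrounding text.

```python
def prefs(winner_viols, loser_viols):
    # Classify constraints as winner-preferring ('W') or loser-preferring ('L').
    out = {}
    for c in winner_viols:
        if winner_viols[c] < loser_viols[c]:
            out[c] = 'W'
        elif winner_viols[c] > loser_viols[c]:
            out[c] = 'L'
    return out

def optimum(strata, candidates):
    # OT evaluation over a stratified hierarchy, pooling ties in a stratum.
    return min(candidates, key=lambda cand: tuple(
        sum(candidates[cand].get(c, 0) for c in s) for s in strata))

def cd_update(strata, pair):
    # Demote loser-preferrers to the stratum just below the
    # highest-ranked winner-preferring constraint.
    h = min(i for i, s in enumerate(strata)
            if any(pair.get(c) == 'W' for c in s))
    demoted = [c for s in strata[:h + 1] for c in s if pair.get(c) == 'L']
    new = [[c for c in s if c not in demoted] for s in strata]
    if h + 1 < len(new):
        new[h + 1] = new[h + 1] + demoted
    else:
        new.append(demoted)
    return [s for s in new if s]

def edcd(strata, observed, candidates, max_sweeps=100):
    # Error-driven loop: generate, compare, and demote until error-free.
    for _ in range(max_sweeps):
        out = optimum(strata, candidates)
        if out == observed:
            return strata
        strata = cd_update(strata, prefs(candidates[observed], candidates[out]))
    return strata

# Hypothetical example: observed winner 'pat' under a misranked hierarchy.
candidates = {"pat": {"Max": 0, "NoCoda": 1}, "pa": {"Max": 1, "NoCoda": 0}}
final = edcd([["NoCoda"], ["Max"]], "pat", candidates)
# final: [['Max'], ['NoCoda']]
```

One error suffices here: the learner's output [pa] loses to the observed [pat], so NoCoda (loser-preferring) is demoted below Max (winner-preferring), after which no further errors occur.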

30.3  Beyond Total Ranking

This section discusses two orthogonal ways of generalizing classic OT and the application of these generalizations to the grammar-learning subproblem. One way that the classic OT ranking can be generalized is by assuming that grammars are actually mixtures of multiple total rankings, or probability distributions over rankings. The notion that an individual's grammar may consist of multiple rankings has played a prominent role in the study of variation and optionality (see Coetzee and Pater 2008b and Anttila 2007 for overviews). Under this view, variation arises when an individual's grammar varies stochastically between multiple total rankings, with different rankings selecting different candidates as optimal. While a variety of such probabilistic ranking approaches have been explored in the generative linguistics literature, within the OT learnability literature the most well-known example of probabilistic ranking is Stochastic OT, which comes with an associated learning algorithm, the Gradual Learning Algorithm (Boersma 1997; Boersma and Hayes 2001). Section 30.3.1 presents the Gradual Learning Algorithm for Stochastic OT and its application to learning with full structural descriptions.[4] The other method of generalizing classic OT, discussed in section 30.3.2, involves numerical constraint weighting rather than ranking. The discussion focuses on Harmonic Grammar (Legendre et al. 1990c; Smolensky and Legendre 2006), out of which OT developed, and also discusses probabilistic weighting, a combination of the two extensions to classic OT.

[4] Jarosz (2006a) also develops a theory of learning for a probabilistic extension of OT, but since the main focus of that proposal is on learning without full structural descriptions, it is discussed in section 30.4.

30.3.1 Probabilistic Ranking

In Stochastic OT (Boersma 1997; Boersma and Hayes 2001), constraints are not strictly ranked on an ordinal scale. Rather, each constraint is associated with a mean ranking value along a continuous scale. Formally, each ranking value represents the mean of a normal distribution, and all constraints' distributions are assumed to have equal standard deviations, which are conventionally set to 2. At evaluation time, a selection point is chosen independently from each of the constraints' distributions, and the numerical ordering of these selection points determines the total ordering of constraints, with higher numerical values corresponding to higher relative ranks. In this way, Stochastic OT defines a probability distribution over total orderings of constraints. The farther apart the ranking values of two constraints are, the higher the probability of a particular relative ranking between them. Conversely, when the ranking values for two constraints are close, each relative ranking has a good chance of being selected. This possibility enables Stochastic OT to model free variation: if two active constraints conflict, different rankings will correspond to different outputs being selected as optimal on different evaluations. This is the main typological consequence of Stochastic OT that differs from classic OT: it predicts that final-state grammars can be variable. In sum, Stochastic OT maintains OT's evaluation metric for choosing the optimal output form given a ranking; it differs by allowing a single grammar to vary stochastically among different total rankings.

The Gradual Learning Algorithm for Stochastic OT (GLA; Boersma 1997; Boersma and Hayes 2001) is an on-line, error-driven learner like EDCD and MRCD. It processes one surface form at a time, and learning is triggered when the output generated by the learner does not match the observed output.
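The evaluation-time sampling that defines a Stochastic OT grammar can be sketched as follows; the constraint names and ranking values are hypothetical.

```python
import random

# Hypothetical mean ranking values; the shared standard deviation is the
# conventional 2.
ranking_values = {"Agree[Voi]": 104.0, "ID[Voi]": 90.0}

def sample_ranking(values, sd=2.0):
    # Draw one selection point per constraint from its normal distribution;
    # higher selection points correspond to higher relative ranks.
    points = {c: random.gauss(mu, sd) for c, mu in values.items()}
    return sorted(points, key=points.get, reverse=True)
```

With ranking values this far apart, Agree[Voi] outranks ID[Voi] on virtually every evaluation; moving the two values close together makes both orders frequent, which is how Stochastic OT models free variation.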
As in EDCD and MRCD, the learner uses its current grammatical hypothesis to generate an output form for each learning datum it processes. To generate an output, the GLA samples a total ranking from its grammar by randomly choosing selection points for each of the constraints, and uses the resulting total ranking to generate its output for the learning datum. In the case of a mismatch, the algorithm slightly decreases the ranking values of loser-preferring constraints and slightly increases the ranking values of winner-preferring constraints. All constraints are adjusted by the same amount, called the plasticity. To some extent, the update rule for the GLA resembles the update rule for EDCD, except that the ranking values of all the loser-preferring constraints are pushed down little-by-little over many errors, and the ranking values for winner-preferring constraints are also adjusted. The basic insight is that as learning continues, constraints favoring losers will gradually be pushed lower and lower—and those favoring winners higher and higher—until errors become diminishingly rare. In general, the algorithm is not guaranteed to converge on a correct grammar, or any grammar for that matter, as shown most concretely by Pater (2008). Pater shows that the GLA can fail to converge on a grammar for a simple language generated from a total ranking of constraints. In practice, however, the algorithm usually performs well, assuming it is provided with full structural descriptions of the learning data (see e.g. Boersma 1997; Boersma and Hayes 2001 for simulations).

Probabilistic extensions of OT offer a number of advantages over discrete ranking. The CD family of algorithms discussed in section 30.2 assumes that the learning data are free of noise and errors. If presented with data containing errors or variation, the CD family of algorithms will not find a total ranking—it will either detect inconsistency (MRCD) or endlessly cycle between different rankings (EDCD). In contrast, probabilistic extensions of OT enable resistance to noise and errors due to the learner's sensitivity to frequency. In the GLA, each update to the grammar is small, but grammar updates accumulate over many exposures, and therefore more frequent patterns influence the grammar more over time. This sensitivity to frequency means that the effect on the grammar of a small proportion of errors in the data is overpowered by systematic, high-frequency patterns.

It is possible to view the noisy evaluation of Stochastic OT solely as a component of the learning process, which is removed upon the completion of learning (see e.g. Boersma and Pater 2008: section 7). Under this view, the learning task is taken to be the learning of a total ranking, and probabilistic learners like the GLA have the advantage over CD of being resistant to small amounts of noise. On the other hand, as just discussed, the GLA, in contrast to the CD algorithms, is not provably correct.
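The GLA's update rule can be sketched as follows; the constraint names and violation profiles are hypothetical. On an error, every winner-preferring constraint moves up by the plasticity and every loser-preferring constraint moves down by the same amount.

```python
def gla_update(values, winner_viols, loser_viols, plasticity=0.1):
    # Nudge ranking values after an error: winner-preferrers up,
    # loser-preferrers down, all by the same fixed plasticity.
    new = dict(values)
    for c in values:
        w, l = winner_viols.get(c, 0), loser_viols.get(c, 0)
        if w < l:
            new[c] += plasticity  # winner-preferring constraint
        elif w > l:
            new[c] -= plasticity  # loser-preferring constraint
    return new

# Hypothetical error: the observed winner violates ID[Voi], while the
# learner's (incorrect) output violates Agree[Voi].
values = {"Agree[Voi]": 100.0, "ID[Voi]": 100.0}
updated = gla_update(values, {"Agree[Voi]": 0, "ID[Voi]": 1},
                     {"Agree[Voi]": 1, "ID[Voi]": 0})
# updated: Agree[Voi] rises to 100.1, ID[Voi] falls to 99.9
```

Each individual step is tiny, but over many errors the accumulated updates push loser-preferring constraints far enough down that errors become rare, which is the gradualness described above.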
In general, however, noisy evaluation is usually taken to be more than a noise-resisting component of learning; it is usually taken to imply substantive predictions about typology, distinct from those of Classic OT: namely, that adult languages can exhibit optionality or free variation. Since probabilistic extensions of OT extend the range of possible grammars, they extend the grammar-learning subproblem to the learning of these probabilistic grammars. In other words, the grammar-learning subproblem becomes larger (and harder) because the task of the learner now includes not only the learning of languages generated from total rankings, but also of languages generated from mixtures of total rankings. Probabilistic learners like the GLA have the ability to match frequencies in the learning data, thereby learning languages with free variation. The GLA has also been used to learn stochastic rankings that model aspects of gradient grammaticality, where a structure's grammaticality is modeled in terms of its production frequency under the stochastic grammar (Zuraw 2000; Boersma and Hayes 2001; Hammond 2004; Hayes and Londe 2006). A final advantage of probabilistic grammars, discussed further in section 30.6, concerns the modeling of acquisition. By representing grammars as distributions over rankings, probabilistic learners can model gradual learning curves, whereby accuracy on individual overt forms improves gradually over time.


30.3.2 Weighting and Probabilistic Weighting

There are two main types of weighted constraint grammars, differing in how the numerically weighted constraints are interpreted at evaluation time. Both evaluate competing output structures based on their relative harmony, which is the weighted sum of constraint violations. The weight of each constraint is multiplied by the number of violations it incurs (expressed as a negative integer), and the results are summed over all constraints, as illustrated in Tableau (5).

(5)  Weighted Grammars and Calculation of Harmony

    /lupz/     Max (22)   ID[Voi] (20)   Agree[Voi] (15)   *+Voi/Obs (10)   Harmony
    a. lupz                              –1                –1               –25
    b. lups               –1                                                –20
    c. lup     –1                                                          –22
    d. lu      –2                                                          –44
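The harmony computation in (5), together with the two ways the weights can be interpreted (a single HG optimum versus Maximum Entropy candidate probabilities, discussed below), can be sketched as follows; the weights and violation counts are taken from the tableau.

```python
import math

# Weights and violation counts from Tableau (5).
weights = {"Max": 22, "ID[Voi]": 20, "Agree[Voi]": 15, "*+Voi/Obs": 10}
violations = {
    "lupz": {"Agree[Voi]": 1, "*+Voi/Obs": 1},
    "lups": {"ID[Voi]": 1},
    "lup":  {"Max": 1},
    "lu":   {"Max": 2},
}

def harmony(cand):
    # Weighted sum of (negated) violation counts.
    return -sum(weights[c] * n for c, n in violations[cand].items())

# Harmonic Grammar: the candidate with the highest harmony is optimal.
hg_winner = max(violations, key=harmony)  # candidate (b), harmony -20

# Maximum Entropy: P(candidate) is proportional to exp(harmony), so even
# candidates like (d), which win under no weighting, get nonzero probability.
z = sum(math.exp(harmony(c)) for c in violations)
maxent_probs = {c: math.exp(harmony(c)) / z for c in violations}
```

The harmonies computed here match the rightmost column of (5): –25, –20, –22, and –44 for candidates (a)–(d) respectively.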

In Harmonic Grammar (HG) (Legendre et al. 1990a, 1990b, 1990c; Smolensky and Legendre 2006) and its close relatives, such as Linear OT (Keller 2000), the optimal output is defined as the output with the highest harmony. Thus, HG defines candidate (b) in Tableau (5) as optimal. In a probabilistic extension of HG, called noisy HG (Boersma and Pater 2008), the weights of the constraints are selected from independent normal distributions at evaluation time, just as in Stochastic OT. The difference is that in Stochastic OT these numerical values are interpreted as a strict ranking, whereas in noisy HG they correspond directly to the weights used in evaluation. Thus, noisy HG defines a probability distribution over weightings of constraints in the same way that Stochastic OT defines a probability distribution over rankings. This variation in weights/rankings determines the probability with which different output structures are selected as optimal.

In Maximum Entropy (also called log-linear) grammars, which have also been applied to learning with OT-like constraints (Johnson 2002; Goldwater and Johnson 2003; Jäger 2007), the probability associated with an output candidate is directly related to its harmony. Unlike noisy HG, Maximum Entropy models use a single weighting to define the probability with which different candidates are selected: specifically, the probability of an output is proportional to the exponential of its harmony. This contrasts with noisy HG, according to which only candidates that win under some weighting are assigned probability. As a result, noisy HG assigns zero probability to candidates like (d) above, while a Maximum Entropy model assigns nonzero probability. In sum, while the stochastic component in noisy HG resides in the weightings themselves being noisy, the stochastic component in Maximum Entropy models operates at the level of candidate output structures directly.
Given full structural descriptions of the data, HG, noisy HG, and Maximum Entropy grammars can all be learned by an on-line, error-driven learning algorithm very much like the GLA (Jäger 2007; Boersma and Pater 2008).[5] This algorithm for weighted grammars, called Stochastic Gradient Ascent/Descent (SGA), is error-driven: when there is an error, the weights of loser-preferring constraints are slightly decreased, and the weights of winner-preferring constraints are slightly increased, just as in the GLA. The only difference between the GLA and SGA update rules is that in SGA the amount of change for a weight is proportional to the difference between the number of constraint violations assigned to the winner and the number assigned to the loser, whereas in the GLA all that matters is which form has more/fewer violations (Jäger 2007; Boersma and Pater 2008). This slight difference in the update rule has important formal consequences: the learning algorithms for HG, noisy HG, and Maximum Entropy grammars are provably convergent on the correct (nonvarying) target grammar given inputs paired with fully structured outputs (Fischer 2005; Jäger 2007; Boersma and Pater 2008).

In sum, weighted grammars of various sorts can all be learned by an on-line, error-driven learning algorithm very much like the GLA for Stochastic OT, but, unlike the GLA, Stochastic Gradient Ascent is provably correct. Furthermore, probabilistic weighting (noisy HG and Maximum Entropy) shares the advantages of probabilistic ranking with respect to noise tolerance, learning of variation, and prediction of gradual learning curves. Both HG and Maximum Entropy grammars have been used to model gradient grammaticality. Indeed, the original application of HG was to model the interaction of syntactic and semantic factors in the graded acceptability of intransitive sentences in French (Legendre et al. 1990a, 1990c).
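The SGA update rule's proportionality to violation differences can be sketched as follows; the constraint names and violation counts are hypothetical.

```python
def sga_update(weights, winner_viols, loser_viols, rate=0.1):
    # Perceptron-style update: each weight changes in proportion to the
    # difference in violation counts between loser and winner, so a
    # constraint the loser violates twice moves twice as far as one it
    # violates only once. (In the GLA, by contrast, every adjusted
    # constraint moves by the same fixed plasticity.)
    return {c: w + rate * (loser_viols.get(c, 0) - winner_viols.get(c, 0))
            for c, w in weights.items()}

weights = {"Max": 22.0, "Agree[Voi]": 15.0}
updated = sga_update(weights, {"Max": 0, "Agree[Voi]": 1},
                     {"Max": 2, "Agree[Voi]": 1})
# Max rises by 0.2 (violation difference of 2); Agree[Voi] is unchanged
# (both forms violate it equally).
```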
More recently, the modeling of gradient grammaticality has been explored for Maximum Entropy grammars (Hayes and Wilson 2008) as well as for HG (Keller 2000; Coetzee and Pater 2008a).[6]

Despite these many advantages, weighted constraint systems have the potential disadvantage of fundamentally departing from the typological predictions of Classic OT, for which there is extensive empirical support. Indeed, OT developed out of HG, and Prince and Smolensky (2004) dismissed HG as insufficiently restrictive. Recently, however, there has been a resurgence of interest in weighted grammars. For a fixed set of constraints, weighting is a more powerful system that permits a range of additive constraint interactions, or cumulative effects. Some authors have argued that the move to the weighted constraint grammar model is warranted in order to account for attested cumulative effects (Keller 2000; Keller and Asudeh 2002; Goldwater and Johnson 2003; Jäger and Rosenbach 2006; Coetzee and Pater 2008a).[7] Pater (2009) argues that HG is not as powerful as has been assumed, that many bizarre predictions go away when certain constraints that are problematic on independent grounds are removed, and that HG more elegantly captures certain cumulative effects than alternative approaches. As Pater points out, it would be a mistake to assume that a typological theory using weighted constraints must rely on the same constraints that OT relies on, and therefore the relative restrictiveness of the two theories is not clearly in a subset/superset relationship. However, Bane and Riggle (to appear) show that even using basic constraints like Max and Dep gives rise to novel (and arguably undesirable) typological predictions in HG. In sum, the appropriateness of weighted constraint grammars as a typological theory is an open question and a topic of much ongoing debate (Prince 2002b; Legendre et al. 2006; Tesar 2007; Pater 2009; Bane and Riggle to appear; Potts et al. 2010).

[5] See Soderstrom et al. (2006) for a related learning algorithm for a connectionist implementation of HG, and Johnson (2002) and Goldwater and Johnson (2003) for other learning algorithms for Maximum Entropy grammars. See also Bane et al. (2010) on general properties of learning with HG, including a linear bound on the number of training examples necessary to learn a weighting of constraints.

[6] Actually, Hayes and Wilson's model also learns the constraints themselves. For other work on learning constraints, see Hayes (1999) and Flack (2007b).

[7] See Hayes and Londe (2006), Jäger and Rosenbach (2006), and Jarosz (2010), however, on how Stochastic OT is able to account for certain kinds of additive constraint interactions.

30.3.3 Summary

Classic OT ranking can be extended in various ways by replacing discrete ranking with ranks/weights on a continuous scale and/or by adding noise to the evaluation process. Probabilistic variants allow for tolerance of noise and errors in the learning data and for the learning of variable target languages. Of the three probabilistic variants discussed in this section, only the SGA for noisy HG and the SGA for Maximum Entropy grammars are guaranteed to converge on a grammar consistent with the data; the GLA for Stochastic OT does not always converge on a grammar consistent with the data, even for data generated from total rankings of constraints. On the other hand, Stochastic OT is a more restricted generalization of Classic OT, and therefore its typological consequences are closest to those of Classic OT. Weighted grammars generalize Classic OT in an orthogonal direction, abandoning strict domination in favor of additive constraint interaction, thereby predicting a range of cumulative effects. There exist provably correct learning algorithms for both probabilistic and non-probabilistic weighted grammars. The typological consequences of assuming a grammar relying on weighting rather than ranking are a topic of much ongoing debate.

30.4  Hidden Structure

The presentation so far has focused on on-line, error-driven learning algorithms for (probabilistic) ranking and weighting for the narrow grammar-learning subproblem. As these sections have shown, there are a number of available learning algorithms that solve this important subproblem. However, the success of all of these learning algorithms relies on the assumption that learners are provided with full structural descriptions of the data, including prosodic/syntactic structure as well as underlying representations, which are not available to the human learner.

To understand the challenge posed by hidden structure, recall that the grammar update for all these error-driven learners involves comparing the constraint violations of the learner's output and the learning datum in order to determine how the rankings or weightings of constraints should be adjusted. Hidden structure obscures the constraint violations of the observed learning data. Constraint violations are assigned to fully structured input–output pairs, not to the unstructured overt forms that the human learner is presumably exposed to. Structural ambiguity, created by prosodic or syntactic structure, obscures the violations of any constraints that reference the hidden structure, whether they assign that structure or simply depend on it. For example, if the learning datum is a tri-syllabic word with medial stress, for example [pakǽti], there are (at least) two analyses: [(pakǽ)ti], with an initial iambic foot, and [pa(kǽti)], with a final trochaic foot. Without the footing, violations of constraints like Trochaic and Iambic for the learning datum are unknown. If underlying/lexical representations are unknown, then the violations of faithfulness constraints for the observed output are likewise unknown. Thus, without full structural descriptions, the vector of constraint violations for the observed output required to calculate the update to the grammar is not apparent from the learning datum.

Learning without access to full structural descriptions is a major and rich area of ongoing research. Work in this area usually focuses on one of two major subproblems: structural ambiguity and the learning of the lexicon. Sections 30.4.1–30.4.3 review several learning theories that address the problem posed by structural ambiguity (the broad grammar-learning subproblem), and section 30.4.4 briefly reviews work on the learning of the lexicon.

30.4.1 Robust Interpretive Parsing

Within OT, learning in the face of structural ambiguity has been a topic of ongoing work since at least Tesar (1997a, 1998) and Tesar and Smolensky (1998, 2000). In order to apply error-driven learning in the presence of structural ambiguity, Tesar and Smolensky (1998) proposed Robust Interpretive Parsing (RIP), which provides an educated guess, based on the current constraint ranking, about the structure of the observed output. Specifically, RIP uses the learner's current hierarchy to select the most harmonic candidate among the structural descriptions consistent with the learning datum. That is, RIP uses standard OT evaluation, but rather than choosing between competing pronunciations of the same input, it chooses between competing analyses, or parses, of the same overt form. RIP thus selects an analysis of each learning datum and a corresponding full structural description that enables the grammar updates to be calculated in the usual way. RIP is motivated independently as a mechanism for analyzing/parsing overt structures given an (adult) grammar; however, in the context of learning, RIP does not always work, because the learner's (incorrect) grammar can cause RIP to assign incorrect structure, leading the learner astray.

Tesar and Smolensky (2000) presented simulation results for an RIP version of CD on a large metrical phonology test set with structural ambiguity. They found that RIP/CD learned just 60.5 percent of the languages in the system correctly when starting from an unranked initial hierarchy.[8] RIP was later applied to the GLA (Apoussidou and Boersma 2003; Boersma 2003; Apoussidou 2007), as well as to the SGA for HG (Boersma and Pater 2008). Boersma and Pater report on simulations comparing the performance of RIP variants of CD, GLA, and SGA for HG on the same test set used by Tesar and Smolensky (2000). They found that RIP/SGA and RIP/GLA outperform RIP/CD, with RIP/SGA for noisy HG achieving the highest performance, learning almost 89 percent of the languages in the system on average. These results appear promising for RIP/SGA, which dramatically outperforms RIP/CD and RIP/GLA. However, Jarosz (to appear) showed that a baseline algorithm relying on random search outperforms all of these RIP-based algorithms on the same test set, given the same learning conditions. Indeed, the random baseline algorithm learned all the languages on all learning trials, achieving 100 percent accuracy. Measuring the performance of explicit baseline models is important because it quantifies the difficulty of the learner's task and the efficiency of learning algorithms on that task. As Jarosz (to appear) shows, the difficulty of the learning task is hard to anticipate. These results mean that it is vital to evaluate these and other learning algorithms on much larger, harder problems to ensure that their performance and efficiency will scale better than random search.
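The interpretive-parsing step can be sketched as follows, using the [pakǽti] example from the text; the constraint names and violation profiles are hypothetical simplifications.

```python
# Learner's current hierarchy, highest-ranked first (hypothetical).
ranking = ["Trochaic", "Iambic"]

# Candidate parses consistent with the overt form [pakǽti], with
# assumed violation profiles.
analyses = {
    "(pakǽ)ti": {"Trochaic": 1, "Iambic": 0},  # initial iambic foot
    "pa(kǽti)": {"Trochaic": 0, "Iambic": 1},  # final trochaic foot
}

def rip(ranking, analyses):
    # Select the most harmonic parse: compare violation vectors
    # lexicographically in ranking order. This is standard OT evaluation,
    # applied to competing parses of one overt form rather than to
    # competing outputs of one input.
    return min(analyses,
               key=lambda a: tuple(analyses[a].get(c, 0) for c in ranking))

parse = rip(ranking, analyses)
# With Trochaic on top, RIP guesses the trochaic parse 'pa(kǽti)'.
```

If the learner's current ranking happens to be wrong for the target language (e.g. Iambic on top for a trochaic language), rip picks the wrong parse, illustrating how an incorrect grammar can lead the learner astray.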

[8] Boersma and Pater (2008) show that the accuracy of RIP/CD falls to around 47% on this test set when permuting ties rather than pooling ties are used.

30.4.2 Inconsistency Detection

In another strand of research, Tesar (1997b, 2004b) develops an approach to structural ambiguity that builds on MRCD's ability to detect inconsistency. Recall that the learner's hypothesis maintained by MRCD is actually a list of winner–loser pairs, the support, which is consulted during the processing of each learning datum. In order to deal with structural ambiguity, the Inconsistency Detection Learner (IDL) maintains a set of supports, each corresponding to an internally consistent combination of analyses of the learning data examined so far. For each learning datum, IDL runs RCD on each of the supports to determine if the ranking for that support generates an error for that datum. In order to learn from the error, the algorithm needs to assign a structural description to the learning datum. IDL does so by considering each analysis of the datum in turn. Specifically, for each support that yields an error, IDL constructs a version of that support for each structural analysis of the current learning datum and tests each resulting support for consistency using RCD. Supports that are inconsistent are discarded, while supports that are consistent are retained for further processing. In this way, the learner is guaranteed to find a combination of analyses, and an associated ranking, that is consistent with all the data, assuming the data were generated from a total ranking of constraints.

Unlike MRCD, IDL does not come with a general proof of its efficiency: the efficiency of the algorithm depends on the degree of structural ambiguity present in the data and on the extent to which the learning data are mutually constraining.[9] However, based on simulations with metrical phonology systems like those of Tesar and Smolensky (2000), Tesar (1997b, 2004b) showed that the total number of supports that IDL considers during learning for such systems is relatively small because many supports can be quickly ruled out due to inconsistency.[10] In sum, IDL is guaranteed to find a consistent hypothesis, but it does involve considerable additional complexity beyond that of MRCD, GLA, and SGA, which maintain just one grammatical hypothesis at a time. It is also important to note that the inconsistency detection approach is incompatible with noisy or variable data, since the approach searches for a hypothesis consistent with all the data.

30.4.3 Likelihood Maximization Jarosz (2006a, 2006b) proposes a theory of constraint-​based learning that addresses the full problem of learning both the grammar and the lexicon of underlying forms given unstructured surface forms. Maximum Likelihood Learning of Lexicons and Grammars (MLG; Jarosz 2006a) formalizes learning as an optimization problem within the general framework of likelihood maximization. MLG differs fundamentally from the error-​ driven learning approaches discussed in section 30.4.1 and does not rely on access to the full structural descriptions of the learning data. Although MLG provides a unified solution to hidden structure of both kinds, the present discussion focuses on structural ambiguity, and the learning of the lexicon in MLG is discussed in the next section. MLG assumes that a grammar is defined as a probability distribution over rankings, as in Stochastic OT. Learning in MLG is not error-​driven, however. Instead, for each datum, the learner determines what components (or parameters) of the grammar are capable of generating the datum, and it rewards those components. Simplifying somewhat, for each datum the learner determines which rankings can generate the datum, and it rewards successful rankings. More precisely, the learner rewards a ranking in proportion to its probability given the datum and the current grammar. Crucially, in order to determine whether a component of the grammar is able to generate the datum, the learner need only determine whether the overt portion of the learner’s output matches the overt portion of the learning datum. The full structural description of the learning datum is irrelevant for determining how much a grammar component should be 9 

9 In the general case the algorithm must iterate one by one through all the structural analyses of a datum for which an error is produced. As noted by Eisner (2000b), this set could be very large (as in the case of syntactic parse trees), which raises questions about the tractability of the procedure in the general case.
10 However, given that the mutually constraining characteristics of the system may also benefit the baseline learner, Jarosz (2013) advocates that IDL’s efficiency be explicitly compared to that of the baseline.

740   Gaja Jarosz

rewarded—only the overt form is required. In this way, updates in MLG do not depend on assigning a structural interpretation to each overt form. Since optimization is defined over probabilistic grammars, MLG exhibits sensitivity to frequency and robustness to noise, like the other probabilistic approaches (Jarosz 2006a, 2006b). In addition, the formalization of the learning problem as an optimization problem makes available a wealth of well-understood and mathematically sound techniques for performing the optimization. Jarosz (2006a) illustrates the capacity of the theory to learn structurally ambiguous syllabification using the well-known Expectation-Maximization (EM) algorithm (Dempster et al. 1977), adapted appropriately to the task. EM makes gradual updates to the grammar and is guaranteed to converge on a (local) maximum. The challenge for this approach is the identification of an appropriate representation of the grammar that allows updates to the grammar to be made efficiently. The simulations presented by Jarosz (2006a) made the simplifying assumption that the grammar is represented as a probability distribution over the set of total rankings, making the procedure intractable for larger constraint systems. In recent work, Jarosz develops a variant of the theory, and a corresponding family of learning algorithms, that do not make this simplifying assumption (Jarosz 2009b). The family of algorithms includes an on-line, sampling variant of EM that maintains a single grammar hypothesis at a time, represented in terms of a stochastic extension of partial ordering. Whereas true EM updates rely on calculating expectations over the entire data set in batch, the basic idea behind these sampling variants is that for a given overt form, the learner uses the current grammar to randomly sample a ranking compatible with it, and then rewards that ranking.
Thus, whereas in error-driven learning the goal of processing each datum is identifying losing competitors, in MLG the goal is identifying successful rankings to reward. Jarosz (2009b) shows that this family of algorithms achieves higher accuracy than RIP/SGA on Tesar and Smolensky’s (2000) test set given the same learning conditions, but does not compare their performance to the baseline discussed in Jarosz (to appear).
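The reward scheme can be illustrated under the simplifying assumption mentioned above: the grammar is an explicit probability distribution over total rankings. The sketch below is a toy, not the actual MLG implementation; the `generates` predicate is a stand-in for running the grammar on a datum, and all constraint names and data are illustrative.

```python
from itertools import permutations

def em_update(grammar, datum, generates, rate=0.1):
    """One EM-style update: move probability mass toward rankings whose
    overt output matches the overt datum, in proportion to their
    posterior probability under the current grammar."""
    # Posterior over rankings given the datum (zero for failures).
    posterior = {r: p if generates(r, datum) else 0.0
                 for r, p in grammar.items()}
    total = sum(posterior.values())
    if total == 0:
        return grammar  # no ranking can generate the datum
    posterior = {r: p / total for r, p in posterior.items()}
    # Interpolate the current grammar toward the posterior (the "reward").
    return {r: (1 - rate) * grammar[r] + rate * posterior[r]
            for r in grammar}

# Toy system: one markedness constraint M and two faithfulness
# constraints; only rankings with M on top can generate this
# hypothetical unmarked overt form.
rankings = list(permutations(['M', 'F1', 'F2']))
grammar = {r: 1 / len(rankings) for r in rankings}
gen = lambda ranking, datum: ranking[0] == 'M'
for _ in range(20):
    grammar = em_update(grammar, 'ta', gen)
```

After repeated updates, probability mass accumulates on the successful rankings; the sampling variant described above replaces the full sum over rankings with a single sampled ranking per datum.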

30.4.4 Learning Underlying Forms

The previous sections discussed learning models that address the broader grammar-learning subproblem, an area of much ongoing work. Another area of ongoing research, and one that is less well understood, is the simultaneous learning of grammars and lexicons.11 The three different approaches to the broader grammar-learning subproblem—RIP, IDL, and Likelihood Maximization—are also the three main lines of attack on the problem of learning underlying representations. After considering the challenge posed by this problem, this section briefly reviews these approaches. The simultaneous learning of a grammar and a lexicon is a particularly challenging task due to the interdependence of these two components of linguistic knowledge. The

11 Tesar and Smolensky (1998, 2000) proposed lexicon optimization as a solution to the problem of learning a lexicon given an adult grammar.

models discussed so far have assumed the lexicon is available to the learner. In general, however, the choice of lexicon depends on the grammar, and the choice of grammar depends on the lexicon (Tesar and Smolensky 2000; Albright and Hayes 2002; Tesar et al. 2003; Tesar 2006a). For example, an alternation like [rat] ~ [radə] can be accounted for by a lexical entry /rad/ with final devoicing or a lexical entry /rat/ with intervocalic voicing. How can the learner make any decisions about either component without knowing anything about the other? From the perspective of an error-driven learner, when an error is produced, it is not clear whether the problem should be attributed to the grammar, the lexicon, or both.12 One major approach to the learning of underlying representations builds on MRCD and inconsistency detection (Tesar et al. 2003; Prince and Tesar 2004; Tesar 2004a, 2006a, 2006b, 2008, 2009; Alderete et al. 2005; Merchant 2008; Merchant and Tesar 2008; Akers 2011). This approach relies on the mutually constraining nature of grammatical and lexical information. As the learner builds up some information about the grammar, this grammatical information allows some aspects of the lexicon to be inferred, which in turn allows further inferences about the grammar, with the process iterating between grammar and lexicon learning. Inconsistency detection plays a key role in the inference of grammatical as well as lexical information. An important technique within this approach is Contrast Analysis, which considers pairs of surface forms differing in exactly one morpheme (Tesar 2004a, 2006a, 2006b; Alderete et al. 2005; Merchant 2008; Merchant and Tesar 2008). The learner uses such pairs to test possible underlying features for a morpheme against the amassed grammatical information. When a certain feature value leads to inconsistency, the learner can safely set the feature to the opposing value.
Recent work has improved the efficiency of such inferences by assuming certain restrictions on phonological grammars (Tesar 2008, 2009; Akers 2011). The Likelihood Maximization approach to learning lexicons in MLG (Jarosz 2006a) is the same as its approach to dealing with structural ambiguity: the rewarding of successful lexical representations depends only on matches with the overt data. MLG assumes a probabilistic extension of the standard generative phonology model: the learner selects an underlying representation for a given morphological/semantic input, and then uses that underlying representation and the grammar to generate a surface structure. The lexical representation in MLG is probabilistic, however, which means that each morpheme is associated with a distribution over possible underlying representations, and this distribution changes gradually during learning. During simultaneous grammar and

12 Actually, the challenge is greater than this: the learner must not only (efficiently) find a lexicon and grammar that together generate the learning data; the learner must also identify a restrictive grammar. The problem of learning restrictive languages is discussed in section 30.5. In the context of simultaneously learning a grammar and lexicon, this problem is particularly relevant because systematic restrictions in the target language can be (incorrectly) accounted for by restrictions on the lexicon. Thus, the learner must not only identify a restrictive grammar and lexicon combination; it must also ensure that all systematic restrictions are handled by the grammar. See Jarosz (2006a, 2009), Tesar (2006a), and Jarosz (2009c) for further discussion of this challenging issue.

lexicon learning, the learner rewards both lexical and grammatical representations that are successful in generating the overt data (Jarosz 2006a, 2009, 2011). A third approach to learning underlying representations assumes that the grammar and lexicon interact in parallel, with the lexicon represented in terms of lexical constraints (Apoussidou 2006, 2007). Lexical constraints connect abstract morphemes/meanings with possible underlying representations: there is one constraint for each possible underlying representation of each morpheme, and these constraints interact in parallel with standard grammatical constraints. As a result of this novel representation of the lexicon, lexical learning can be viewed as a constraint ranking problem, just like grammar learning. However, as with structural ambiguity, the learner cannot directly observe the constraint violations of the observed data because the underlying representations associated with overt learning data are hidden. Apoussidou proposes to deal with this by using what is essentially RIP/GLA (see section 30.4.1 above), except that interpretive parsing consists of selecting an underlying form, rather than a structure, consistent with the overt portion of the learning datum. The learner generates its own underlying representation and surface form for the learning datum and compares them to the full structural description of the learning datum provided by robust interpretive parsing. If the underlying representations and overt forms do not match (an error), the GLA update rule can be applied as usual to adjust the ranking values of the constraints, which in this case include both lexical and grammatical constraints.
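The GLA update invoked here can be sketched schematically. In Stochastic OT each constraint, including, on Apoussidou’s proposal, each lexical constraint, carries a real-valued ranking value; evaluation perturbs the values with Gaussian noise and ranks by the noisy values. On an error, constraints that favor the target form are promoted and constraints that favor the learner’s erroneous form are demoted by a small step. The violation profiles below are illustrative placeholders, not an analysis from this chapter.

```python
import random

def sample_ranking(values, noise=2.0, rng=random):
    """Stochastic-OT evaluation: rank constraints by ranking value
    plus Gaussian noise (highest-ranked first)."""
    return sorted(values, key=lambda c: values[c] + rng.gauss(0, noise),
                  reverse=True)

def gla_update(values, target_viols, learner_viols, step=0.1):
    """Schematic GLA update on ranking values after an error:
    a constraint favors whichever candidate violates it less."""
    new = dict(values)
    for c in values:
        if learner_viols[c] > target_viols[c]:
            new[c] += step   # favors the target form: promote
        elif learner_viols[c] < target_viols[c]:
            new[c] -= step   # favors the learner's error: demote
    return new

# Hypothetical error: target [ta], learner produced [tat]. The lexical
# constraint 'Lex:/tat/' is treated exactly like a grammatical one.
values = {'NoCoda': 0.0, 'Max': 0.0, 'Lex:/tat/': 0.0}
target_viols = {'NoCoda': 0, 'Max': 1, 'Lex:/tat/': 1}
learner_viols = {'NoCoda': 1, 'Max': 0, 'Lex:/tat/': 0}
values = gla_update(values, target_viols, learner_viols)
```

After the update, NoCoda has been promoted and Max and the lexical constraint demoted, nudging the grammar (and lexicon) toward the target.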

30.4.5 Discussion

This section has reviewed several developing approaches for dealing with two kinds of hidden structure: structural ambiguity and underlying representation. In many cases, work on one problem has assumed a solution to the other, but in general, grammars and lexicons must both be learned from (potentially) structurally ambiguous overt forms. All three of the main approaches have been applied to this full learning problem (Jarosz 2006a; Apoussidou 2007; Akers 2011), but this is an area where much further work is needed.

30.5  Learning Restrictive Languages

Identifying learning models that are capable of learning combinations of lexicons and grammars in an efficient and psychologically plausible way from unstructured overt forms is a major challenge and the focus of much ongoing work, as discussed in this chapter. This, however, is still not the full problem facing the language learner. In general, there are many grammars that are consistent with a set of learning data, and the learner’s task is to identify the most restrictive grammar among these: the one that is able to generate all the observed forms and as few additional forms as possible. Restrictiveness

entails that the learner’s grammar does not overgenerate by accepting forms that are ungrammatical in the target language. For example, if the learner has encountered no syllables with codas, and yet no alternations that eliminate or avoid codas, the learning data would be consistent with either a high or a low ranking of NoCoda. However, it is generally assumed that the learner should in this case acquire a grammar with high-ranked NoCoda, so that syllables with codas would be rejected as ungrammatical. Within OT, a well-known solution to the restrictiveness problem relies on ranking biases, the best known of which is the Markedness » Faithfulness (M » F) bias (Smolensky 1996a). In general, ranking markedness constraints as high as possible favors grammars that avoid marked configurations by mapping them to unmarked alternatives. The M » F bias does not exhaust the notion of restrictiveness, however; additional ranking biases are needed to supplement it. These include the Specific-Faithfulness » General-Faithfulness bias (Smith 2000; Hayes 2004; Tessier 2007) and the Output-Output Faithfulness » Input-Output Faithfulness bias (McCarthy 1998; Hayes 2004; Tessier 2007). One simple way to incorporate ranking biases in learning models is to assume that the biases determine the initial state of learning (Smolensky 1996a; Boersma and Levelt 2000; Hayes 2004; Jesney and Tessier 2011). In this way the learner begins with a restrictive grammar and modifies it in response to overt data. Implementing an M » F bias is straightforward, since constraints can easily be identified as Markedness or Faithfulness constraints. Implementing the Specific-Faithfulness » General-Faithfulness bias is trickier, however, since the general-to-specific relationships between constraints are language-specific and must themselves be learned (Prince and Tesar 2004; Tessier 2007).
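In Stochastic-OT or HG simulations, an initial-state M » F bias is typically implemented simply by initializing markedness constraints with high ranking values and faithfulness constraints with low ones. A minimal sketch of this idea, with illustrative constraint names and numeric values:

```python
def mf_initial_state(markedness, faithfulness, m_value=100.0, f_value=0.0):
    """Initial ranking values enforcing Markedness >> Faithfulness."""
    values = {c: m_value for c in markedness}
    values.update({c: f_value for c in faithfulness})
    return values

g0 = mf_initial_state(markedness=['NoCoda', 'Onset'],
                      faithfulness=['Max', 'Dep'])
# Every markedness constraint starts well above every faithfulness
# constraint, so the initial grammar maps marked structures to unmarked
# alternatives. A constraint like NoCoda that the data never force
# downward stays high, keeping the learned grammar restrictive.
```

On this implementation, the bias holds only at the initial state; as the next paragraph discusses, that by itself does not guarantee the final grammar respects it.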
Furthermore, Prince and Tesar (2004) and Hayes (2004) show that starting from a biased initial ranking is not sufficient to ensure that learners end up with a grammar respecting the bias. Prince and Tesar (2004) and Hayes (2004) independently develop learning algorithms based on MRCD, called Biased Constraint Demotion (BCD) and Low Faithfulness Constraint Demotion (LFCD), respectively, which incorporate persistent ranking biases that affect ranking decisions throughout the learning process. As Prince and Tesar explain, guaranteeing that markedness constraints end up as high as possible is not trivial, however, because the choice of which faithfulness constraint to rank (when one must be ranked) can affect the learner’s ability to rank markedness constraints highly later on. Subsequent work has argued that implementing ranking biases in the initial stage of learning is more effective for GLA learners than for CD learners (Boersma and Levelt 2003) and more effective for SGA/HG learners than for GLA learners (Jesney and Tessier to appear).13 In sum, ranking biases provide an indirect way to approximate the restrictiveness of a grammar, and their implementations and effectiveness vary across learning models.

13 See Tessier (2009), however, on hypothetical learning situations arising during RIP-type learning that could lead learners into superset traps, which she suggests would be more difficult for memory-less learners like the GLA and SGA to escape. Further investigation of these issues is needed to understand the conditions under which such scenarios could arise and the capacities of various learners to deal with them.

In addition to exploring the effects of different implementations of ranking biases, one area for further work is the concurrent learning of restrictive grammars and lexicons. In general, the restrictiveness of grammar and lexicon combinations cannot be reduced to the relative ranking of particular classes of constraints (Alderete and Tesar 2002; Jarosz 2009). The restrictiveness of a learner’s hypothesis depends crucially on the lexicon, and the choice of lexicon can determine the restrictiveness of the hypothesis as a whole (Alderete and Tesar 2002; McCarthy 2005b; Jarosz 2009). For example (Alderete and Tesar 2002), consider a language that has regular final stress unless the final vowel is epenthetic, in which case stress is penultimate. The learner can explain the penultimate stress forms in one of two ways: either by assuming a lexical stress difference and faithfulness to underlying stress, or by assuming epenthesis and avoidance of stress on epenthetic vowels. The problem is that both solutions involve high ranking of a faithfulness constraint and so cannot be distinguished by ranking biases. Nonetheless, the solution relying on faithfulness to underlying stress overgenerates, predicting penultimate stress for forms with any final vowel, not just an epenthetic one. One general approach to dealing with the restrictiveness of grammar and lexicon combinations is developed within MLG. Specifically, in MLG the restrictiveness of grammars is defined directly in terms of the ability of the grammar to maximize the likelihood of the learning data given a rich base (Jarosz 2006a, 2006b). The rich base is simply the universal space of possible underlying forms, which is independently needed as the source of possible underlying forms for lexical learning. Jarosz (2009c) shows that this formulation is more general than any of the ranking biases and extends to subset/superset relationships that arise due to the interaction of grammars and lexicons.
In general, further work in this area is needed to better understand the scope of the problem and the extent to which existing approaches can cope with the kinds of restrictiveness issues that arise during the simultaneous learning of lexicons and grammars.

30.6  Modeling Acquisition

There is a rich literature on the formal modeling of acquisition with violable constraints, and a good deal of learnability work has aimed to model developmental results computationally. Developmental research has identified a number of general characteristics of child language: characteristics that differentiate child grammars from adult grammars and that computational modeling work has sought to capture. It has been observed that child grammars are generally less marked than adult grammars and that acquisition involves a progression from an unmarked initial state to the target grammar via the gradual addition of successively more marked structures (Jakobson 1941/1968; Stampe 1969). Within constraint-based frameworks, such a progression can be explained by positing an initial ranking of M » F (Gnanadesikan 1995/2004; Smolensky 1996a), the same bias that is independently motivated to favor the learning of restrictive final-state grammars. Assuming an initial ranking of M » F has become standard practice in work

on computational modeling of acquisition. Another characteristic of child grammars is their variable or noisy nature as compared to adult grammars (see e.g. Legendre et al. 2002). Children’s production is notoriously variable throughout the process of acquisition, with pronunciations of a single word or structure varying from one utterance to another.14 One aspect of gradual learning that has been examined in several studies concerns the interacting roles of markedness and frequency and their effects on the order of acquisition. Based on a study of twelve Dutch children’s development of syllable structure, Levelt et al. (2000) and Levelt and van de Vijver (1998/2004) found that the acquisition orders of all twelve children were consistent with a markedness bias. However, they also found an effect of frequency, with higher-frequency syllable types being acquired earlier. They proposed specific roles for universal markedness and language-particular frequency, with markedness determining possible orders cross-linguistically and frequency playing a secondary role, determining acquisition order for structures not differentiated by universal markedness. In subsequent work, Boersma and Levelt (2000) showed that the GLA was able to reproduce these acquisition orders given an initial M » F grammar and the frequencies of various syllable types in Dutch. Recently, Jarosz (2010) extended these results to other languages as well as to other probabilistic constraint-based learners. Specifically, Jarosz found that attested relative orders of acquisition of complex clusters in English, German, French, and Polish were consistent with the kind of markedness and frequency interaction proposed by Levelt and colleagues.
Furthermore, she showed via simulations with the GLA, the SGA for noisy HG, and MLG that all three of these models were able to predict the distinct acquisition orders across languages with distinct frequencies of syllable types. Several additional predictions of these probabilistic learning models are important to note. Due to the incremental nature of learning and the fact that grammars are probability distributions over rankings/weightings, these models predict gradual learning curves, with gradual learning of individual target forms and variation throughout the learning process. As production of a particular target form gradually improves over time, the learner passes through intermediate grammars exhibiting variation for the target form, where the target form is sometimes produced accurately and other times reduced to a less marked structure. The rate of accurate production of the target form improves gradually during learning, and the learner eventually settles on a grammar that consistently generates the target form. Thus, a natural consequence of gradual learning with probabilistic grammars is the ability to predict fine-grained learning curves as well as intermediate grammars characterized by variation. Tessier (2007, 2009) develops an extension of MRCD that models acquisition as gradual re-ranking in a Classic OT grammar. As Tessier explains, RCD alone cannot model gradual re-ranking because it identifies a ranking that is compatible with

14 Another characteristic of child language that has been the focus of recent work is the extent to which child grammars exhibit cumulative effects not found in adult grammars (Albright et al. 2008; Jesney and Tessier 2009; Jarosz 2010).

746   Gaja Jarosz all winner–​loser pairs in a single application.15 In Error-​Selective Learning (ESL) the learner stores errors in a temporary structure called the Error Cache and gradually adds errors to the support in such a way that re-​ranking of the current hierarchy is minimal. This results in a series of classic OT grammars that are intended to model intermediate acquisition stages (as well as initial and final stages). Modeling acquisition in terms of a sequence of rankings predicts discrete stages of acquisition permitting more and more of the target forms to be captured, but it does not allow for the finer-​grained modeling possible with probabilistic grammars, which can also model gradual learning of individual target forms. This is because in ESL, each stage is a classic OT grammar, and any individual target form is acquired in one step when the markedness constraint it violates is demoted. ESL is guided by frequency in the data in an indirect way: re-​ranking is triggered by markedness constraints that are involved in generating sufficiently many errors. Therefore ESL also predicts a general effect of frequency on the order of acquisition, but see Tessier on how the approach differs from the sensitivity to frequency exhibited by probabilistic learners. An important aspect of ESL is that it incorporates persistent ranking biases, building on BCD (Prince and Tesar 2004) and LFCD (Hayes 2004). Each time the ranking is updated (when an error is added to the support), the learner identifies a ranking that is compatible with the support while at the same time enforcing the ranking biases as much as possible. This means that the learner selects errors that cause minimal change and also allow for the intermediate grammar to be restrictive, reflecting the ranking biases. Tessier shows how such restrictive intermediate stages are necessary in order to capture certain intermediate child grammars, called Intermediate Faith (IF) stages. 
In IF stages, children’s grammars permit target marked structures only in privileged positions, such as stressed syllables. Tessier shows that while the GLA cannot pass through such stages, the SGA for HG can pass through such stages purely as a consequence of the differences in constraint interaction (Jesney and Tessier 2009). In sum, probabilistic learners like the GLA, SGA, and MLG have a number of advantages with respect to modeling of acquisition. Starting from an initial M » F grammar, they can automatically derive order of acquisition effects as a consequence of their sensitivity to frequency. They also produce detailed and gradual learning curves and correctly predict intermediate grammars to be highly variable. Acquisition order can also be modeled as a sequence of total rankings by the ESL learner. Via the incorporation of persistent ranking biases, ESL is able to model intermediate stages that the GLA with just an M » F bias cannot. For the most part, modeling of acquisition has focused on modeling aspects of the narrow grammar-​learning subproblem.16 There is much work left on comparing the predictions of the various models and connecting 15 

15 It would be possible for MRCD to model gradual re-ranking if it happened to process the learning data in just the right order. Tessier’s proposal in essence ensures that this happens in general.
16 However, see Jarosz (2011) for simulations with MLG modeling experimental results on the simultaneous learning of the grammar and lexicon, and Tessier (2009) for discussion of modeling acquisition of the broader grammar-learning subproblem.

their predictions to developmental findings across linguistic domains. An important issue for further work is determining how the simultaneous learning of a grammar and lexicon and the presence of structural ambiguity affect the predictions of computational models of acquisition. There is much potential for productive collaboration between researchers in developmental linguistics and those working on computational modeling. Developmental findings can inform the design and evaluation of computational models; conversely, existing computational models can be used to generate concrete predictions of learning theories for further experimental or developmental investigation.

30.7 Conclusion

Computational modeling of learning for constraint-based grammars is an extremely rich area of research. For each learnability subproblem there now exist several alternative computational models with different strengths and weaknesses, whose predictions and differences are being explored in ongoing work. Significant progress on each of the learnability subproblems has already been made, furthering our understanding of how the difficult task of language learning can be solved by language learners. The concrete connections between the predictions of computational models and developmental findings provide further, independent support for existing constraint-based learning models, and further our understanding of how children learn language. Nonetheless, many challenges remain for future work. The capacities of various learning models to deal with the narrow grammar-learning problem are relatively well understood; indeed, several of the existing models have proofs of correctness and/or efficiency. The capacities of learning models to deal with the broader grammar-learning problem and the simultaneous learning of lexicons are less well understood. Much further work will be needed to explore the limits of successful learning, as well as the learning of restrictive grammars, in this context. Another important area for future work is developing richer and more extensive datasets with hidden structure and variation on which the various models can be evaluated. With such a rich array of available approaches, there is much potential for future work to build on the strengths of existing approaches to identify a psychologically plausible model that can learn both restrictive grammars and lexicons from unstructured data.

Part VI

Atypical Populations

Chapter 31

Language Development in Children with Developmental Disorders

Andrea Zukowski

31.1 Introduction

A fact of life that has been observed for centuries is that language sometimes fails to develop properly or fully in children. Sometimes this language difficulty co-occurs with other cognitive deficits, and sometimes other deficits are not obviously present. While there is no shortage of practical reasons for understanding language impairments in children, the reason this fact is of interest to cognitive scientists is that the vast majority of children have no such trouble. Most children succeed in acquiring in their own minds the same mental system of rules that mature members of their language community implicitly follow. The question of how children do this is the central issue in developmental linguistics. Interest in children with neurocognitive disorders stems from the hope that the study of when and how language development sometimes goes wrong will provide insight into this fundamental question.

31.2  Researching Child Language in Special Populations: What Are the Questions?

There are two broad categories of questions about child language in special populations that are relevant to theories of human language acquisition. First, under what conditions

do language impairments occur in children? Second, when language is impaired in children, how does it go wrong? Under what conditions do language impairments occur in children? Questions that fall into this category are of theoretical interest because of their relevance to claims about language modularity, proposals for innate components of language, proposals of relationships between language and cognition, and theories of language development. Can language be impaired in children with no other cognitive deficits? In other words, can language be selectively impaired? Do cognitive deficits necessarily coincide with language impairment, or is selective sparing of language sometimes observed? When language is impaired in children, how does it go wrong? The other broad class of questions concerns how language goes wrong when it does go wrong. Some questions in this category are discussed under the heading “Delayed or Deviant?” When language development goes wrong, does it go wildly wrong? For instance, are the various parts of grammar acquired in an unusual sequence? Do children with language impairments make types of errors that typically developing children never make? Is there any evidence that their linguistic representations are faulty? Or does development seem to follow the expected course, with minimal deviations in the order of acquisition and the types of errors made, and with no apparent problems in their linguistic representations? In addition to the delayed/deviant questions, there are others that fall under this category. When we look across disorders, what pattern do we see? Can language development go wrong in endlessly many ways, or are there “weak spots” that are especially prone to disruption, even across disorders with completely different etiologies? When we look at a single special population across languages, what pattern do we see?
How does the disorder manifest itself differently depending on characteristics of the language being learned? The bulk of this chapter provides a discussion of the questions outlined in the previous paragraphs and the relevant findings from a variety of developmental disorders, including Specific Language Impairment (SLI), Down Syndrome (DS), and Williams Syndrome (WS), as well as autism, Fragile X, and dyslexia. Findings from typically developing (TD) children and from children learning a second language (child L2), are also discussed. The final section offers some discussion of what seems to be missing in either the answers being sought or the questions being asked about language in the literature on developmental disorders.

31.3  Under What Conditions do Language Impairments Occur in Children?

31.3.1 Can Language be Selectively Impaired?

The question of whether language can be selectively impaired has implications for theories that take a stance on the “initial state” of children. If human learners have some

language-dedicated knowledge or language-dedicated learning mechanisms, then it is possible that in a neurodevelopmental disorder, language could be impaired while the rest of cognition is unaffected. If human learners have no such language-dedicated specialization, such a profile would be entirely unexpected. Three disorders are considered in this chapter for their possible match to this profile: Specific Language Impairment, Dyslexia, and Down Syndrome. Specific Language Impairment (SLI) is a behaviorally diagnosed disorder in which there is poor language with no obvious cause. By definition, children with SLI achieve language scores that are poorer than expected based on age and IQ (i.e. scoring in the lowest 10 percent on a standardized test of receptive and/or expressive language), but at the same time they have a nonverbal IQ that is within broadly normal limits (i.e. 85 or higher), and no evidence of hearing loss, physical abnormality of the speech apparatus, or environmental deprivation, nor any history of brain damage (Bishop 2006b). It might seem that by its very definition SLI constitutes evidence for selective impairment of language, but this is not necessarily the case. Some cognitive and perceptual deficits that impact language may not interfere much with performance on IQ tasks. Most researchers would acknowledge that there do exist some children with SLI who exhibit language deficits in the absence of any concurrent nonlinguistic deficits of the kind that have been claimed to underlie SLI language problems (e.g. auditory deficits; McArthur and Bishop 2004, 2005; Rosen et al. 2009). However, the question of whether such cases constitute evidence for selective impairment of language is complicated by several issues. First, although such cases do exist, it is nonetheless very common for children with SLI to have additional deficits.
754   Andrea Zukowski

Auditory processing has been implicated for decades as a problem area among people with SLI, and although the nature of the auditory processing deficit is still not entirely understood, it is now believed that auditory brain maturation may be 3–4 years delayed in the majority of people with SLI, and this impairment seems to affect both verbal and nonverbal stimuli (McArthur and Bishop 2005). Although the high incidence of auditory problems in this population should not by itself detract from apparent cases of a "pure" language impairment, a second finding from the same body of work suggests why caution is still warranted: an auditory processing deficit can be detected in behavioral tests during early or mid-childhood, and may persist into late adolescence, as indexed by electrophysiological measures, even though the problems are no longer detectable with behavioral tests. Furthermore, recent longitudinal work has corroborated that there is a link between delays in prosodic processing in early infancy and later vocabulary development and language ability (Weber 2005; Newman et al. 2006), and between delayed maturation of the auditory pathways and later vocabulary and language ability (Penner et al. 2006). Thus, cases of SLI without concurrent nonlinguistic deficits cannot necessarily be taken at face value, which is why the significance of these cases remains unclear. Nevertheless, evidence that SLI is partially caused by heritable impairments in language-dedicated abilities has recently come not from possible "pure" cases, but from twin studies that include children with multiple deficits. Bishop et al. (2006) have shown that both impaired phonological short-term memory and impaired tense marking are heritable factors implicated in SLI, and that they are only moderately correlated with each other, suggesting relatively independent genetic origins, even in people who have both deficits.

A second possible case of selective impairment is dyslexia, a disorder behaviorally defined as an asymmetry between reading ability and intelligence in children despite adequate reading instruction (Ramus et al. 2003). Research in dyslexia suggests a picture highly similar to that of SLI: it is a heterogeneous disorder in which children may have phonological deficits, auditory deficits, and/or cerebellar deficits, suggesting that each of these may be a "risk factor" for dyslexia. However, research also suggests that among these three deficits, a phonological deficit may be the only one that is sufficient, by itself, to result in a dyslexic profile; that is, many people with dyslexia appear to have a "pure" phonological deficit (Ramus et al. 2003).

A third possible case of selective impairment of language is Down Syndrome (DS), one of the most common developmental disorders involving cognitive deficits. Unlike SLI and dyslexia, which are defined behaviorally, the genetic cause of DS is known (95 percent of cases involve a third copy of chromosome 21). The question regarding DS is obviously not whether language is impaired while the rest of cognition is intact, but whether there is a relative disadvantage for language. The evidence suggests that the answer is yes. This is certainly true when the comparison group is typically developing children (Laws and Bishop 2003; among others). When the comparison group is another group with comparable mental retardation of different etiology, most studies find that language is worse in people with DS.
This is true for most studies comparing DS to Williams Syndrome (Bellugi et al. 1990; Ring and Clahsen 2005; but see Joffe and Varlokosta 2007). Boys with DS also show greater syntactic delays than boys with Fragile X (Price et al. 2008). DS continues to stand out as children get older. Adolescents and young adults with DS still show poorer expressive language than two groups matched on nonverbal mental age: typically developing (TD) children (age 3–6 years) and adolescents and young adults with Fragile X (Finestack and Abbeduto 2010). In one particularly useful study, Kernan and Sabsay (1996) compared adults with DS to adults with mental retardation of unknown etiology, all aged 18–35 years (adults older than 35 were excluded due to the high incidence of Alzheimer's-like brain plaques in older people with DS). A mixed etiology group is thought to provide a good comparison because it is believed to cancel out any cognition–language asymmetries that may exist in other syndromes. Kernan and Sabsay found that the DS group performed worse than the mixed etiology group on almost every measure of morphology and syntax (lexical abilities did not differ). Thus, the evidence suggests that language in people with DS is worse than expected relative to overall mental age; that is, language is selectively impaired.

To summarize, there is reasonably good evidence that language can be selectively impaired. This profile seems to be observed in a subset of people with dyslexia, where a phonological impairment can occur in the absence of other cognitive impairments, and in people with DS, where the language impairment exceeds a background level of general impairment. Further research is necessary to establish whether some people with SLI can accurately be said to show selective impairment of language, but recent work suggests that even if there are no "pure" cases of language impairment in SLI, twin research has the potential to identify which factors are heritable contributors to language problems, and the two factors thus far implicated are language-dedicated abilities (phonological short-term memory and facility with tense marking).

31.3.2 Can Language be Selectively Spared?

Selective sparing of language, just like selective impairment of language, is a pattern that would be of interest to any theory that takes a stance on children's initial state. All other things being equal, theories that posit that children have language-dedicated knowledge or language-dedicated learning mechanisms would predict that a learner could have broad cognitive deficits but still develop a normal grammatical system. What does the evidence suggest? Fowler (1998) reviewed several hundred articles on language in people with mental retardation of a variety of etiologies, and reported that not one of them claimed that language development was entirely unaffected. However, it is impossible to know what conclusion to draw from this, since language development can be measured in so many different ways, and language performance on many measures will be adversely affected by cognitive impairments, regardless of the language abilities of the test takers. In other words, the details really matter. Thus, the next section will examine in more detail a disorder that is often cited as a flagship case of selective sparing of language: Williams Syndrome (WS).

In the earliest papers which brought WS greater attention (Bellugi et al. 1988a, 1988b), language stood out in two ways that suggested a pattern of selective sparing. First, language performance was better in adolescents with WS than in adolescents with DS of comparable intellectual impairment. And second, while the absolute level of language performance seemed excellent (if not perfect), this contrasted sharply with extremely poor spatial construction abilities, as measured by drawings of complex objects such as a bicycle or copying of complex block patterns. Both of these findings are still believed to be accurate, but it turns out they are the wrong comparisons to make in order to determine whether language is selectively spared in WS.
This is because people with DS, as discussed in section 31.3.1, are thought to have language skills that are worse than expected relative to their mental age, and because spatial construction abilities in people with WS fall well below their overall mental age, representing the Achilles' heel of their cognitive skills. What is now known as a result of several decades of research is that when people with WS are compared to typically developing (TD) children of similar mental age, their language performance almost never exceeds that of the TD group, and sometimes falls below this level (see Brock 2007 and Mervis et al. 2003 for reviews).[1]

In retrospect, this result should not be surprising, and it was probably naïve to expect otherwise. Why? There are two logically possible ways in which language could be better in children with WS than in a control group with comparable mental age, but neither pattern is likely to be observed, for good reasons. First, they could have a more advanced grammar, meaning they come to know components of the mature system earlier than expected for their mental age. Or second, they could have a grammar that is mental-age appropriate, but do better than expected for their mental age at deploying what knowledge they possess during speaking and listening. The first outcome is unlikely to be observed for the practical reason that TD children reach a mature state of grammatical knowledge for most aspects of language at a very young age relative to their own mental/chronological age. Thus, by the time it is feasible to test young people with WS with experimentally designed measures, TD controls matched to them on mental age are already at ceiling on most things. The second outcome is unlikely because there is good reason to believe that the skills necessary for successful deployment of some types of grammatical knowledge take time to develop, and are subject to individual variation in cognitive abilities even among neurotypical adults. It is highly unlikely that a group with cognitive impairments will do better at deploying their existing grammatical knowledge than a neurotypical group. For these reasons, the lack of evidence for selectively spared language among people with WS or any other disorder involving mental retardation is far from surprising, and should not be taken as providing convincing evidence against claims that humans are born with language-dedicated knowledge or language-dedicated learning mechanisms.
However, at the same time, WS should stop being invoked as an example of selective sparing of language.

[1] The one exception to this generalization is vocabulary for concrete nouns and verbs, which does appear to be better than expected relative to overall mental age in people with WS.

31.3.3 An Alternative Question for Cognitively Impaired Populations: is Language "Intact"?

Many researchers who study language in WS have tacitly accepted that the study of WS is not likely to bear meaningfully on the "selective sparing" question (for the reasons discussed in section 31.3.2), and have focused instead on other questions whose answers can be ascertained by studying WS, and that bear on fundamental questions about language acquisition. One such question is this: regardless of how people with WS perform in comparison to control groups of various kinds, what evidence is there that they end up with a normal underlying system of grammatical principles and representations? In other words, do they attain an "intact" mental grammar? This question is an important one for researchers from multiple theoretical perspectives. It is important to linguists and developmentalists who adopt a generativist perspective because they believe that the mental grammar systems that children come to know are underdetermined by the input children are exposed to, and thus the fact that children converge on the particular system that they do is evidence that language development has been constrained in a similar way across children, suggesting dedicated language-learning mechanisms. The "intact" question as applied to children with WS, then, is whether the same constraints on language learning guide the development of language in this population, despite moderate levels of cognitive impairment and abnormal brain architecture. The "intact" question is also important to developmentalists who adopt a neuroconstructivist perspective, because they believe that there are no language-dedicated constraints guiding language acquisition, and that although the outcome of normal development is a domain-specific language system, the outcome of development for individuals with WS will be deviant, and will not be characterizable as having some parts of a normal system intact and other parts impaired (Karmiloff-Smith 1998).

There has been a lot of debate over whether language is "intact" or not in people with WS. Part of the disagreement comes from differing understandings of what kinds of things can have the property of being intact or not. While most would probably agree that a bit of grammatical knowledge can be intact or not (one can know a rule or not), some researchers have also applied "non-intact" to levels of performance: "it is clear that even in adulthood, individuals with WS do not have intact sentence comprehension and repetition abilities" (Grant et al. 2002: 414). Disagreement also arises because of differences of opinion about what would count as evidence for intact or non-intact grammar.
For instance, some researchers seem to believe that if people with WS perform worse than expected for their mental age on a language test, they cannot have an intact grammar (Thomas et al. 2010). But of course an intact grammar by itself does not ensure perfect performance, nor even mental-age appropriate performance. On the other hand, the only way one can infer intact knowledge of some component of grammar is by devising a test that cannot be passed without the requisite knowledge. Since there can be disagreement even about whether passing a test reflects underlying knowledge (for a prime example, see Musolino et al. 2010, a reply by Thomas et al. 2010, and a reply to the reply by Musolino and Landau 2010), it can be helpful to use tests that yield attempts at complex sentence production in the absence of models in the immediate context. Most researchers find successful sentence production compelling (though unfortunately not all parts of grammatical knowledge can be tested via children's productions).

One can only investigate whether "language" is intact (in any population) by examining one language component at a time, and the answer may be different for different components of grammar. To date, by adolescence or earlier, people with WS have been shown to have intact knowledge of all of the constraints on reflexive binding in English (Clahsen and Almazan 1998; Ring and Clahsen 2005; McKeown et al. 2008; Zukowski et al. 2008), knowledge of the constraint on plurals inside compounds (Zukowski 2005), knowledge of the form of subject gap relative clauses and object gap relative clauses (Zukowski 2009), knowledge of the form of yes–no questions and affirmative wh-questions (Zukowski 2001, 2004), and knowledge of several parts of the rules for forming English tag questions (Zukowski and Larsen 2004). They show evidence of knowing that when negation occurs in a particular structural relationship (c-command) with respect to "or," there is a predictable change in the possible meanings of "or" (Musolino et al. 2010). Knowledge of c-command was therefore also demonstrated in this latter study. While any collection of findings of intact components of grammar is necessarily incomplete, as the list of apparently intact components grows larger and more diverse, the challenge of how to explain this degree of successful language acquisition without recourse to language-dedicated constraints on learning becomes greater.

31.4 When Children with Developmental Disorders Have Problems with Language, How Exactly Does Language Go Wrong?

In this section, a number of questions are addressed about how language goes wrong in children with developmental disorders that affect language in some way. First, do the language problems observed suggest deviance in either the process or the product of language acquisition compared to typical children? Second, what pattern do we see when we look at language problems across disorders? Does language go wrong in different ways for every different disorder, or are the same areas repeatedly found to be problematic? And finally, what pattern do we see when we look at language problems cross-linguistically, within a single disorder? How does a single disorder manifest itself differently depending on characteristics of the language being learned?

31.4.1 The Question and Relevance of Deviance

The question of whether language is "delayed or deviant" has been asked about virtually every special population at one time or another. It is actually misleading to set these two descriptions in opposition to each other, because when language is impaired in any population, there is virtually always an initial delay, meaning that early milestones (such as the child's first 10 words, first novel word combinations, etc.) are late relative to typical children; this is true for children with WS, DS, SLI, and autism. Thus, the "delayed or deviant" question is really whether language is merely delayed or whether it is both delayed and deviant. Given recurring interest in the question of whether language is deviant in special populations, it is surprising that there has been so little discussion in the literature of what should "count" as deviant language. This is where theory can and should be used to guide predictions and to help in determining the import of differences observed between typical children and special populations.

While it is technically true that performance that is different in any way from that of typical children (matched on some relevant dimension) is deviant performance, some types of differences, if attested, would be of enormous theoretical significance, while other differences might be of little import. Consider a case where a learner consistently makes the same syntactic error in her spontaneous speech, but this type of error is never made by typically developing children at any age. This would be headline news, and rightly so. It would suggest the possibility that the learner has constructed a rule that typically developing children never consider. The scarcity of errors in the speech of young typical children, and the complete absence of many types of logically possible errors, has been taken as evidence that children are highly constrained by human biology in the hypotheses they make about language. Thus, the presence of a "deviant" error of this type in a learner from a special population would suggest that the learner is not being constrained in the usual way in hypothesis formation, and thus is likely to be learning language in a deviant way.

31.4.2 Do Children with Disorders make Deviant Errors?

What evidence is there in the literature for truly deviant errors—errors that occur consistently in the productive speech of children with a developmental disorder, but are unattested in typically developing children learning the same language? There have been repeated claims in the literature that language development in people with WS is deviant. Brock (2007) reviewed the findings that have been cited as evidence for this claim, and concluded that there was "little evidence that the 'end state' of language development is abnormal," and that there was scant support for the most clearly articulated hypothesis of a particular deviant type of development (a "phonology–semantics imbalance," Thomas and Karmiloff-Smith 2003; see also a review by Mervis et al. 2003 for a similar conclusion that language development is largely normal in WS, with a few interesting caveats about abstract vocabulary). This should not be understood to indicate that people with WS do not make errors. They certainly do. But their errors are predominantly confined to those that are also observed in TD children (some of these are reviewed in section 31.4.4, "What Parts of Language 'Break'?").

Apart from WS, there do appear to be a small number of errors made by children with other disorders that seem to be truly deviant. Young children with SLI have been observed to omit obligatory complementizers or relative pronouns in both their spontaneous and elicited production of relative clauses. For instance, instead of saying "There's my baby who wants to go in the train," a child with SLI was observed to say "There's my baby wants to go in train" (Schuele and Tolbert 2001: 258). Schuele and Tolbert used an elicited production task to facilitate the production of subject relative clauses, and tested 20 children with SLI, aged 5–7 years.
The children with SLI omitted obligatory relative pronouns or complementizers in 63 percent of their attempted subject gap relative clauses. Similar omissions have been observed in spontaneous speech as well (Schuele and Nicholls 2000). Omissions of obligatory complementizers in relative clauses have also been observed in the elicited speech of Swedish-speaking preschool-age children with SLI (Håkansson and Hansson 2000). Importantly, in these studies, omissions of obligatory complementizers are never observed in younger TD children. This is true for TD children learning either English or Swedish. For instance, Schuele and Tolbert included 15 TD children aged 3–5 years. In over 100 relative clause productions, none of them ever omitted an obligatory complementizer. It does appear that these omissions eventually decline with age in children with SLI. While the 5-year-olds in Schuele and Tolbert's study showed 91 percent omissions, the rate for the 7-year-olds was 51 percent, and omissions did not occur at all in the elicited productions of older Hebrew-speaking children with SLI, aged 9–14 years (Novogrodsky and Friedmann 2006). However, the complete absence of a comparable stage in typical development suggests that this stage, despite being temporary, may reflect something truly deviant.

A second possible deviant error comes also from SLI, although in this case the findings regarding typical children are less clear. Van der Lely and Battell (2003: 170) have shown that 10- to 17-year-old children with SLI sometimes produce wh-questions that have a "filled gap," as in "Which one did he wear the coat?" (the target response was "Which coat did he wear?") and "Who Mrs. Scarlett saw someone in the lounge?" (the target response was "Who did Mrs. Scarlett see in the lounge?"). The same children with SLI fail to reject sentences like these in a grammaticality judgment test (van der Lely et al. 2011), suggesting that these errors are not merely an artifact of a particular elicitation procedure. Typically developing children learning English do not usually produce filled gaps in their wh-questions at any age. However, there is some reason to believe that this apparent difference does not reflect a deep difference in the grammars of young typical children and older children with SLI.
That is, although TD children do not produce such errors, young children (aged 5–6 years), like older children with SLI, fail to reject such errors in a grammaticality judgment test (van der Lely et al. 2011).

To summarize, there are surprisingly few examples in any developmental disorder of errors that suggest that children are learning language in a fundamentally deviant way. This is not because children with disorders do not make errors—they do indeed (as discussed in section 31.4.4). But the vast majority of production errors that children with developmental disorders make look just like errors made by younger TD children. Given that every time one speaks there is the potential to make mistakes, and there are so many different ways one could get things wrong, it is striking how few convincing cases of truly deviant errors have been observed. The very absence of deviant errors, however, makes those few that do exist all the more intriguing.

31.4.3 Do Children with Disorders Learn Language in a Deviant Order?

Very little is currently understood about why acquisition follows the particular order that it does, in any given language. For this reason, it would be difficult to know what to conclude if any deviation from "normal" order was observed in children with a developmental disorder. Still, it is worth pointing out—because the findings appear to be quite uniform—that the usual order of acquisition is indeed followed in almost all of the research that has examined this question for any of the developmental disorders discussed in this chapter. For example, a longitudinal comparison of SLI children from age 4–8 years and TD children from age 2 or 3 years to 5 or 6 years showed very little deviance in the order of acquisition of 29 linguistic structures (Curtiss et al. 1992). A longitudinal study of children with DS showed that they were following the normal sequence of development of Brown's grammatical morphemes (Fowler et al. 1994), but were "stalling" at a very early point. Similarly, Klein (1995) showed evidence that the order of acquisition of syntactic structures is also normal in children with WS. In Klein's study, the number of children with WS who produced examples of particular types of syntactic phrases and clauses showed a high negative correlation with the "relative complexity" of those phrases, which is based on the order in which these structures develop among unimpaired children.

There is, however, at least one example in the literature suggesting a deviant order of acquisition. This example comes from research on people with DS. The pattern involves difficulty in comprehending sentences containing reflexives but not sentences containing pronouns—a pattern that is the opposite of the pattern observed among TD children. This unusual pattern has been observed in young English-speaking adults with DS (Perovic 2006), in English-speaking teenagers with DS (Ring and Clahsen 2005), and in young Serbo-Croatian-speaking adults with DS (Perovic 2004). Reflexives and pronouns have been examined in other developmental disorders (WS, SLI), and this backwards pattern is not observed in these other groups. Thus this deviant pattern may be unique to children with DS.
In summary, children with developmental disorders do not tend to go wildly wrong either in the hypotheses they consider about language (as reflected in the types of errors they make) or in the order in which different structures are acquired. This finding contradicts what we would expect if children with disorders had to learn language without the benefit of the usual constraints that are hypothesized to guide language development in typical children.

31.4.4 What Parts of Language "Break"? The Relevance of Cross-disorder Comparisons

If the normal constraints or mechanisms guiding language acquisition are not in play, then technically all bets are off as to how language will develop. In fact, one might well expect that every child lacking normal constraints will learn a different grammar. For this reason, similarities and differences across disorders in the locus of grammatical difficulty are of considerable interest.

Some researchers have concluded that language is impaired in a different way for every disorder. In a recent comparison of language disorders in children with mental retardation (DS, WS, and Fragile X Syndrome), McDuffie and Abbeduto (2009: 58) concluded that "the profile of language development and strengths and weaknesses varies dramatically across syndromes, not only in terms of degree of impairment, but also in the profile or character of the impairments." In contrast, other researchers have concluded that there is considerable overlap in difficult areas of language among people with different disorders, and that these tend to also be the same areas that are difficult for TD children learning the same languages. For example, Penke (to appear) says that "while deficit symptoms in spontaneous speech may vary between language disorders, research has revealed a surprising correspondence of such symptoms across different language disorders." Penke lists these cross-disorder correspondences: difficulties with bound inflectional morphology, including omissions and/or substitutions of inflections; omission of function words such as determiners and auxiliaries; shorter sentence length; reduced syntactic complexity, with rare production of wh-questions, passives, or subordinate clauses; and failure to "move" verbs in required contexts in those languages that have obligatory verb movement. These similarities are rather broadly described, which leaves open the possibility that they conceal important differences across disorders. A tighter degree of correspondence in problem areas across etiologically distinct disorders could be telling us something very important. So, what evidence is there for detailed similarities in problem areas across disorders? We will consider three examples here: root infinitives, English tag questions, and relative clauses.

Children with a variety of different disorders sometimes leave their verbs untensed, as do very young TD children from many language backgrounds.
A common context in which children do this is in root or main clauses, which obligatorily require tense; hence this phenomenon is known as "root infinitives." The reason this is a good place to look for tight correspondences between groups is that the actual pattern of root infinitives observed in typically developing children is quite detailed. First, the omission of tense-marking inflections is probabilistic rather than absolute. Second, when tense is marked, children tend to choose the correct inflectional form. Third, tense omissions occur more frequently than omissions of non-tense inflectional morphemes. And fourth, children obey "distributional contingencies" associated with finiteness in their language (Paradis and Crago 2001; Phillips 2004). For example, if legitimate nonfinite clauses in the language look different in predictable ways from finite clauses, children's incorrect nonfinite clauses will mirror the characteristics of legitimate nonfinite clauses.[2] The question is, do children with different types of disorders exhibit the whole suite of "root infinitive" characteristics that TD children do? The existing evidence suggests that wherever the answer is known, the answer is yes. We consider evidence from children with SLI, autism, DS, and WS, as well as TD children learning a second language.

[2] For example, nonfinite clauses, which occur in embedded contexts in many languages, lack overt subjects in Dutch, Russian, and German, and thus children learning these languages—but not children learning languages which lack this property—tend to leave subjects out when they produce root infinitive main clauses (Phillips 2004).

Children with SLI learning English exhibit the same suite of root infinitive properties as TD children learning English (Rice and Wexler 1996). Paradis and Crago (2000) first established the root infinitive profile for French TD children, and then showed that French children with SLI show the same profile, including adherence to distributional contingencies associated with finiteness in French. Kjelgaard and Tager-Flusberg (2001) tested English-speaking children with autism, and found that those who have impaired language have the same root infinitive profile as children with SLI (Tager-Flusberg 2004). Paradis and Crago (2001) and Paradis (2005) have even shown that the typical root infinitive profiles for both English and French TD children are also observed in children learning English or French as a second language—a potentially very important comparison, since children learning a second language are neurotypical and have no cognitive impairments. Evidence is incomplete for children with WS. Although children with WS (mean age 7;7) matched on MLU to both TD children and children with SLI outperform both of these groups in tense marking—and are nearly at ceiling (Rice et al. 1999b; Rice 2003)—they are older than both of these comparison groups by several years, and it is not known whether they exhibit a typical root infinitive profile at a younger age. Evidence is also incomplete for children with DS. Ring and Clahsen (2005) have argued that a group of adolescents with DS (age 12–14 years) do not exhibit an "extended" root infinitive stage, because, in addition to performing poorly with tense morphemes, they also perform very poorly with one non-tense morpheme (the English comparative adjectival inflection -er).
However, these children with DS did omit tense markers probabilistically rather than absolutely, and when they did produce a tensed form, they chose the correct form, suggesting that they exhibit several of the features of a typical root infinitive profile.3 Although the evidence is still incomplete for some disorders, the detailed profile of the root infinitive stage observed in TD children appears to be replicated in children with a variety of etiologically distinct disorders (SLI, autism, and possibly DS), as well as older children learning a second language.

A second area of language that is useful to consider for close correspondences in performance across disorders is English tag questions. Tag questions can be added to statements to add a request for confirmation from the listener, as in Maggie left before Paul, didn't she? or The boys couldn't see the game, could they? Correctly formed tag questions must have a pronoun that matches the subject of the statement, and an auxiliary verb that matches the auxiliary verb of the statement (or an appropriately inflected form of do-support). Correct tag questions must also take the opposite polarity of the statement to which they are attached (negative if the statement is affirmative, affirmative if the statement is negative). Given these three different dependencies, there are many ways to get tag questions wrong, and hence there is plenty of opportunity for unique profiles among different groups or individuals.

3. The one possible deviation from the typical profile was their poor performance with comparative adjectives; however, performance with the comparative morpheme vis-à-vis simultaneous performance with tense morphemes appears not to have been reported for TD children at an age during which they are actively in the root infinitive stage. Thus, it is not clear whether the DS group is in any way unusual with respect to the typical root infinitive profile.

764   Andrea Zukowski

What does the evidence show? Dennis et al. (1982) reported that among TD children, 6-year-olds are at chance (50 percent) in their choice of polarity for tag questions in their elicited productions, 8- and 10-year-olds are barely better (40 percent correct), and even 12-year-olds still only average 25 percent correct. By contrast, performance with pronouns and auxiliaries is much better, and nearly at ceiling level for 8-year-olds. Weckerly et al. (2004) examined tag questions using a similar procedure in TD children, children with SLI, and children with focal brain lesions. They had predicted that the SLI group would have the greatest difficulty with auxiliaries, because auxiliaries are tense-related, but this was not the case. Rather, the same pattern of difficulty was followed by all of the groups: pronouns > auxiliaries > polarity. Zukowski and Larsen (2004) used a similar procedure to examine tag questions in a group of children and adolescents with WS, and the same pattern of difficulty was observed. In all of these cases, the difficulty with polarity persisted even longer in the disordered groups than in TD children.

A third area of language that shows close correspondence across disorders is relative clauses. In every developmental disorder that has been examined, as well as in TD children, relative clauses are harder when the gap is in object position than when the gap is in subject position. This pattern has been reported for adolescents with SLI (Novogrodsky and Friedmann 2006), WS (Zukowski 2009), and Down Syndrome (Stathopoulou 2007), and for children with reading impairment (Bar-Shalom et al. 1993).
An additional similarity is that there are very few grammatical errors observed in any of these groups, with the exception of young children with SLI (the SLI errors are discussed in section 31.4.5). However, despite the rarity of errors of grammatical form, other types of errors are observed in studies of relative clauses among these groups, and these errors are again very similar across disorders. For example, in some experimental studies, TD children make "wrong head errors." These are errors in which the target head of the relative clause is replaced with a noun from inside the modifier clause. For example, in an elicited production task, a child might be asked "Which cow turned red?," and instead of providing the target response "The cow who the girl is pointing to," the child would say "The girl who is pointing to the cow." Such responses invoke the correct modifying event (a girl pointing to a cow), and are correct in every grammatical detail, but they are inappropriate answers to the question that was asked. In elicited production tasks where TD children learning English are observed to make wrong head errors, older children with WS do too, at even higher rates (Zukowski 2009), but they make them in precisely the same contexts as TD children—only in trials targeting object gap relatives, never in trials targeting subject gap relatives—and inter-trial variation in rates of wrong head errors is highly similar in the TD and WS groups, suggesting that the errors are made for the same reason in both groups. Wrong head errors have also been observed in Hebrew-speaking adolescents with SLI and in controls matched for mental age, in elicited production (Novogrodsky and Friedmann 2006; see their table 1, error labeled "Subject relative, incongruent with the question"). Importantly, wrong head errors are not confined to children, nor to production tasks. In comprehension tasks where TD Korean children are observed to make wrong head comprehension errors, adult L2 learners of Korean do too (O'Grady et al. 2000).

These findings about root infinitives, tag questions, and relative clauses suggest that language development is unfolding in a qualitatively typical way across these many disorders. Additional similarities are not difficult to find. Tuller et al. (2007: 371) observed that object pronouns (clitics) are problematic in the acquisition of French by typical children, children with SLI, children learning French as a second language, and deaf children, leading them to conclude that "there are certain aspects of French which are vulnerable when language acquisition is disrupted, whatever the cause for that disruption may be." These close parallels in the areas of language difficulty observed across a variety of special populations, as well as in neurotypical children and adults, present a challenge to theories that predict deviance in either the process or the product of language development in children with developmental disorders. They must be explained by any theory of typical language development and any theory of the nature of a particular developmental disorder.

31.4.5 The Relevance of Cross-linguistic Findings within Individual Disorders

The relevance of cross-linguistic differences within individual disorders is expressed well in this quotation from Paradis et al. (2005: 34): "On the assumption that the underlying cause of SLI must be the same for all affected children regardless of which and how many languages they are exposed to, any theory of the underlying cause of SLI must be compatible with cross-linguistic variations in its surface manifestations." The same can be said for other developmental disorders. Cross-linguistic data, therefore, offer a natural way for researchers to test predictions made by hypotheses about the locus of language problems in a disorder, hypotheses that usually originate from the consideration of a single language. Of the developmental disorders discussed in this chapter, SLI is the only one for which a reasonable number of cross-linguistic findings have now been collected and reported, so this section is confined to a discussion of SLI. SLI findings have been reported for (at least) English, Italian, Spanish, German, Dutch, Hebrew, Swedish, Greek, Russian, Cantonese, and Japanese. A large portion of this work has examined grammatical morphology, due largely to the fact that this is a particularly problematic area for English-speaking children with SLI. Summarizing this work, Crago et al. (2008) concluded that "there are no universal cross-linguistic characteristics of SLI," and Leonard (2009: 308) concluded similarly that "despite the apparently universal nature of this disorder, there are striking differences in how SLI manifests itself across languages." However, despite the cross-linguistic differences, some "characteristic tendencies" have emerged in this work regarding grammatical morphology. For example, in any given language, it is very typical for children with SLI to have particular difficulty with some bound or free grammatical morphemes (which constitute clinical markers of SLI for that language), and less difficulty with others. "Verb-related morphemes" are usually implicated (e.g. tense, agreement, aspect, direct object clitics, voice). In languages with impoverished inflectional verb morphology, omissions of morphemes are much more frequent than substitutions of incorrect forms (Crago et al. 2008). Importantly, the differing profiles of problem areas observed across SLI children from different language backgrounds are also mirrored in TD children learning the same languages (Leonard 1998, 2009; Crago et al. 2008).

Beyond work on grammatical morphology, several areas of syntax have now been examined across multiple languages in SLI, including wh-questions (English, French, Hebrew, Greek, Swedish, and German), relative clauses (English, Hebrew, Swedish), and passives (English, Greek, Russian). All of these structures involve syntactic movement, as well as requiring the assignment of thematic roles to NPs in non-canonical clause positions, and all of them show some degree of impairment in children with SLI. Among children with SLI, a subject–object asymmetry is observed in both wh-questions and relative clauses, with object-extracted structures showing greater difficulty than subject-extracted structures, and within wh-questions, referential questions (e.g. Which x did you see?) show greater difficulty than non-referential questions (e.g. Who did you see?). Both of these patterns are observed in TD children as well. However, similarly to the case with grammatical morphology, there seem to be differences in how the difficulty of these structures is manifested cross-linguistically.
Friedmann and Novogrodsky (2011: 368) sum up the differences in wh-question production this way: "French-speaking children with SLI use in-situ Wh-words instead of Wh fronted questions in spontaneous speech and in elicited production tasks…. English-speaking children produce Wh questions with filled gap, mainly in object questions … and Swedish and German speakers omit the Wh-word" (the references cited for these claims are Penner et al. 1999; Ebbels and van der Lely 2001; van der Lely and Battell 2003; Hamann 2005; Hansson and Nettelbladt 2006; Jakubowicz and Gutierrez 2007; Jakubowicz 2011). English-speaking children with SLI, in addition to producing wh-questions with filled gaps, also judge wh-questions with filled gaps to be acceptable in a grammaticality judgment format (van der Lely et al. 2011). While in the case of grammatical morphology the cross-linguistic SLI patterns seemed to mirror—in exaggerated form—cross-linguistic differences among TD children, the evidence is less conclusive as to whether these different cross-linguistic manifestations of wh-question difficulty in children with SLI also mirror patterns observed in TD children. Some evidence suggests this is correct. For example, TD French 3- and 4-year-olds do produce some wh-in-situ object questions, even though they do so far less frequently than 8-year-old and 11-year-old children with SLI (Jakubowicz 2011). Other evidence suggests that some SLI errors with wh-questions are deviant. For example, TD English children aged 5–7 years do not fill gaps in their elicited object questions, unlike 10- to 17-year-old children with SLI matched on receptive language skills and morphology (van der Lely and Battell 2003).4

What can be concluded about the cause of SLI from what is currently known about cross-linguistic similarities and differences in its manifestation? Leonard (2009) examined the existing cross-linguistic findings on grammatical morphology, with the specific goal of determining how well each of three different types of theories of SLI can account for them: one that emphasizes general problems in processing, one that emphasizes problems in phonological and/or prosodic features of grammatical morphemes, and one that posits problems in underlying grammatical representations. None of the theories could account for all of the similarities and differences observed across languages. This could be taken to suggest that we just do not have the right theory yet, and that some variant of one of the three theories will eventually explain everything. However, each of these theories manages to account for some of the cross-linguistic findings quite well, and this raises a second possibility for why none of the theories has yet explained everything. Bishop and colleagues (2006) have argued that SLI is a disorder caused by multiple, partially independent, overlapping risk factors—risk factors that also occur individually in the TD population, but which have a particularly adverse effect on language learning specifically when they co-occur in the same individual. Twin research investigating this possibility has shown that both phonological short-term memory and morphosyntactic deficits (involving tense) are highly heritable in SLI, but they have different genetic origins (Bishop et al. 2006), and thus each of these two partially independent deficits may account for different aspects of the SLI language profile.
If a multiple risk factor model proves to be the correct one for SLI, no theory that focuses on one dimension of language, perception, or cognition should be able to account for everything (a similar idea has been proposed for autism, with twin research similarly supporting a multiple risk factor model for this disorder; Ronald et al. 2006; Happé et al. 2006).

Another possible implication of the cross-linguistic work is suggested by the finding that many of the areas of great difficulty for children with SLI are an exaggerated reflection (sometimes greatly exaggerated) of the areas of difficulty experienced by typical children learning the same languages. Bishop (2009) has argued that in the debate over whether SLI results from domain-general perceptual problems or domain-specific representational problems, what has been overlooked is that a fundamental problem in SLI is failure to learn aspects of language at a normal rate despite repeated exposure, even when perceptual and linguistic demands are reduced. In other words, language learning itself seems to be impaired. This idea seems quite compatible with the finding that the difficulty in SLI is an exaggerated reflection of difficulty with the particular learning problems that children learning the same language have to face on the basis of qualitatively similar input. Yet researchers have focused very little attention on mechanisms of language learning as a possible locus of deficit in SLI (or in any of the other disorders discussed in this chapter). This can and should change, but it will require detailed models of the process of language learning.

4. Nevertheless, TD children in this age range do sometimes judge as "grammatical/acceptable" object questions with filled gaps (van der Lely et al. 2011), so it is possible that at an underlying level, the grammars of children with SLI with respect to wh-movement are not qualitatively different from those of TD children.

31.5  What's Missing?

It should be clear that in spite of all the work that has already been done toward understanding the conditions under which language impairments occur, and how language goes wrong when it does go wrong, there is a tremendous amount of work that still needs to be done. But hopefully it is also clear that there is a lot to be gained from doing it. The following are some things missing from current work that I believe would move the field forward in constructive ways.

First, in terms of resources, it would be tremendously useful to have publicly available longitudinal speech samples representing each of the disorders discussed in this chapter. This would give researchers a chance to test hypotheses about the locus of a language problem prior to embarking on costly studies with disordered populations that can be difficult to access.

Second, more cross-linguistic work is needed on individual corners of grammar. Crago et al. (2008: 280) complained that "there is still an insufficient number of systematic comparisons of the same morpheme type using the same methodology across languages to enable firm cross-linguistic generalizations." These authors were referring to SLI—the developmental disorder for which the most cross-linguistic work has been conducted. It is true that more cross-linguistic work is needed for SLI. But their complaint applies doubly to other developmental disorders. At present there is very little cross-linguistic data on language in children with WS, DS, and autism. How would such work be helpful? All of the best cross-linguistic SLI work has been motivated by hypotheses about a particular component of grammar or perception being impaired. These hypotheses have led to predictions of different surface manifestations of difficulty—differences that would be inexplicable without recourse to the models of grammar or perception underlying them.
So cross-​linguistic work on other disorders is useful wherever there is a hypothesis stating that a particular component of language that has different surface manifestations across languages is impaired. A second motivation for more cross-​linguistic work on other disorders is based on the main cross-​ linguistic finding from the existing work on SLI: the finding that the problem areas are just an exaggeration of the problem areas experienced by TD children learning the same language. This finding is compatible with Bishop’s proposal that SLI is a disorder affecting not language itself, but language learning. When enough cross-​ linguistic data has accumulated from other disorders, will this pattern be observed for other disorders as well? Is impairment to language learning itself a common pattern among disorders? Is impairment to specific grammatical components also observed in some disorders?

A third gap in the field is that so little attention is being paid to what is, in my mind, the biggest unexplained finding in the study of language in developmental disorders: namely, why do we see so much overlap in areas of syntactic difficulty among vastly different special populations? In some cases, the similarities are exquisite: the groups are similar not just in what is difficult, but also in how speakers respond to the difficulty, in what kinds of errors they make, where they are most prone to making them, where they fail to make them entirely, and what kinds of errors they do not make under any circumstances. This fact is very useful, and it is also very puzzling. It is useful because most researchers in this field work on just one or two disorders. When you develop a theory of the cause of the language problems in one disorder, you should want to know whether the next disorder over exhibits the same problems, because some hypotheses would be untenable if it does. The similarities across disorders are also very puzzling. Why would the same language problems occur in a child with cognitive impairments, in a child with normal intelligence but difficulty learning language, and in an adult with no impairments who is simply learning a second language? The first thing to ascertain about each observed similarity is how deep it really is. A single problem might have multiple qualitatively different causes that would be observable if examined in the right way. For example, Friedmann et al. (2006) examined different possible causes for a common difficulty observed in both SLI and Broca's agrammatic aphasia in comprehending object gap relative clauses. The critical findings came from different predictions about what types of production errors would be expected from one type of underlying syntactic impairment versus another.
We do not know how many of the cross-disorder similarities that have already been observed are like this—superficially similar problems with different underlying impairments. But examples such as these suggest one answer regarding the cause of some cross-disorder similarities: they reflect the fact that some linguistic phenomena are multi-faceted and could be impaired by any one of a number of quite different problems. If, after careful investigation, some cross-disorder similarities are found to be deeply similar, the puzzle would remain, and would call for a different type of explanation. It should be obvious that doing such work requires examining a single linguistic structure using multiple methods (comprehension, production, grammaticality judgment), as well as having a theory of which component parts of grammar are involved in building and interpreting that structure.

There is one final thing that I believe anyone working on language in developmental disorders should be aware of: the emerging "risk factors" models of a variety of cognitive disorders. Recent twin work on SLI, autism, and dyslexia suggests that many developmental disorders may arise as a consequence of multiple partially independent impairments (Bishop 2006a). Indeed, this work has caused researchers who have spent decades investigating single-cause explanations of individual disorders to abandon them as misguided (Bishop 2006a; Happé et al. 2006). In my opinion, it would be wise to keep abreast of these models for two reasons. First, although cognitive, perceptual, or linguistic risk factors were sought out specifically because researchers were looking for the "causes" of heritable disorders that are defined behaviorally, those risk factors that can be shown to matter critically in the heritability of language problems in behaviorally defined disorders may also turn out to play an important role in typical child language learning, as well as in learning by children with genetically defined disorders. Indeed, the levels of "impairment" seen in affected children on these risk factors are not at all "off the scale" of the range seen in the normal population—they are just the bottom end of the normal distribution (Happé et al. 2006). The second reason to keep abreast of these models is that researchers conducting these twin studies need our help in the form of theoretically driven measures of underlying cognitive processes that can be incorporated into genetically informative designs (Bishop 2006a). For that, we will need better models than we have today of how children learn all of the aspects of language that they must learn. Theoretical models like those of Christophe et al. (2008) and computational models like those of Heinz (2007) are a good step in the right direction.

Chapter 32

The Genetics of Spoken Language

Jennifer Ganger

The last two decades have seen the study of the genetic basis of language progress by leaps and bounds, due in part to large twin samples and in part to advances in statistical and molecular genetic techniques. This chapter assesses and explains current findings in this field with a focus on spoken language. The reader will, by the end of the chapter, not only be up to date with current findings but will also have some of the necessary tools to assess and integrate forthcoming literature.

32.1  Why Study the Genetics of Language?

Genetics can inform psycholinguistics on two crucial issues that have been at the forefront of research in recent decades: modularity and mechanism. Genetics may also address the longstanding issue of the innateness of language. The issues of modularity within cognition and of subdivisions within language itself (into grammar, vocabulary, and so on) have preoccupied the field for decades and have been addressed with behavioral (Pinker and Ullman 2002), theoretical (Fodor 1983; Pinker 1999), and neuropsychological studies (Ullman 2004). Genetics provides a new line of evidence. In behavioral or statistical genetics, we can ask not only whether grammar and vocabulary, for instance, have similar heritabilities, but also, regardless of individual heritability estimates, whether their genetic variances are correlated. This genetic correlation tells us whether the variation in the two traits is based on the same genes. Furthermore, as genes that affect language are identified, we can ask whether the same gene affects two components of language, for example, grammar and vocabulary. Either method can give us more information about whether specific linguistic abilities are controlled by the same genes and hence likely to arise from the same developmental and cognitive processes.

Genetics also has the potential to help the field move beyond a view of cognition as merely modular or non-modular and address actual mechanisms of typical and atypical development. Although no other species has language, a gene whose mutation is associated with a language impairment can be traced through pre- and postnatal development in other species to learn its role in the developing brain, and the results can be used to generate predictions about gene function in humans that can be tested.

Finally, the long-time debate on the innateness of language in humans might seem an obvious beneficiary of the search for genes related to human language. Besides the fact that a deeper understanding of mechanism may refine or eliminate questions of innateness (Marcus 2004), a positive finding, whether of heritability or of specific genes, could vindicate decades of research that has indirectly supported the innateness of language (Chomsky 1986; Pinker 1994). This connection is controversial, though, because inquiry into the genetics of language is fundamentally and inextricably based on individual differences in language behavior and the covariation between these individual differences and the quantity or identity of shared genes. To accept that genetics can address questions of innate knowledge, we must assume that the genes that produce individual differences are the very same ones that allow all humans to acquire language. Thus, while we may have to suspend some disbelief and be open to the possibility that individual differences can provide information about the origins of a species-typical trait, there is much to be learned from genetics about the structure of cognition and the mechanisms of its development.

32.2  Genetics of Normal Variation

Traditional behavioral, or statistical, genetics uses twins, adopted children, and other family designs to mathematically tease apart genetic and environmental contributions to variance in normal behavior. I focus here on twin studies because they are more common, particularly among recent and ongoing studies.

32.2.1 Genetics of Normal Variation: Statistical Methods

Twin studies are based on naturally occurring variation and are designed to partition phenotypic, or observed, variance in a trait into genetic and non-genetic sources. Thus, the twin method pinpoints the origins of individual differences. The underlying logic of the classic twin study is that identical or monozygotic (MZ) twins share 100 percent of their genes while fraternal or dizygotic (DZ) twins share, on average, 50 percent of their genes. If both types of twins share their environments to the same extent, then any increased covariation in MZ twins over DZ twins, measured by intra-pair correlations, must be the result of sharing twice as much genetic material. To the extent that the MZ correlation (rMZ) is greater than the DZ correlation (rDZ), individual differences in the trait have an origin in genetic differences, and the trait is said to be heritable (Plomin et al. 2008). The remaining variance is attributed to non-heritable, or environmental, variance. The environmental variance is then typically divided into two sources: shared environment and non-shared or unique environment. Shared environment (c²) stems from environmental differences between different pairs—any environmental factor that makes twins within a pair more similar to one another and different from other pairs. Such variance is assumed to arise from home- and family-related variables, such as the prenatal environment, parenting style, income, nutrition, access to educational resources, and so on. Non-shared environment (e²) comes from within-pair environmental differences—any environmental factor that makes co-twins differ from one another. Non-shared environment is assumed to originate from unique physical and social experiences for each twin outside the home, such as having different friends or teachers, pursuing different activities, contracting different diseases, and other random accidents or events. However, non-shared environment may also come from different qualities of placenta in utero, different perinatal experiences, differential parental treatment if that treatment does not have a genetic origin, or from random error in measurement. These quantities can be estimated as:

• heritability: h² = 2 · (rMZ − rDZ);
• shared environment: c² = rMZ − h²; and
• non-shared environment: e² = 1 − rMZ (Plomin et al. 2008).

For most of the twentieth century these formulae were sufficient because the focus of behavior genetics was to show that traits had non-zero heritability (Turkheimer 2000), and the study of language followed that trend.
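As a quick illustration (mine, not the chapter's), these Falconer-style estimates can be computed directly from a pair of intra-pair twin correlations; the correlation values used below are invented for the example:

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer estimates of variance components from twin intra-pair correlations.

    h2: heritability, c2: shared environment, e2: non-shared environment.
    """
    h2 = 2 * (r_mz - r_dz)   # heritability: twice the MZ-DZ correlation gap
    c2 = r_mz - h2           # shared environment: MZ similarity not due to genes
    e2 = 1 - r_mz            # non-shared environment (plus measurement error)
    return {"h2": h2, "c2": c2, "e2": e2}

# Hypothetical vocabulary-score correlations: MZ twins r = .80, DZ twins r = .55
est = falconer_ace(0.80, 0.55)
print({k: round(v, 2) for k, v in est.items()})  # {'h2': 0.5, 'c2': 0.3, 'e2': 0.2}
```

Note that the three estimates necessarily sum to 1, since the formulae partition the standardized phenotypic variance.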
However, in the last 10–20 years, the trend in behavior genetics has moved from showing that a trait is heritable to using heritability as a tool to address various more complex issues, such as modularity and gene-environment interactions (e.g. Tuvblad et al. 2006). Although gene-environment interactions have not yet been extensively explored in language, divisions within language (e.g. vocabulary and grammar), as well as modularity of language from other cognitive processes, have begun to be explored. Exploration of these matters necessitated more sophisticated techniques, and in the study of normal variation, the simple formulae above have given way to structural equation modeling (SEM) methods (Neale et al. 2006). SEM provides the tools to test alternative models with modest sacrifice of power (e.g. a model with different heritabilities for males and females) and to estimate h², c², and e² in more than one variable at a time, so-called multivariate behavior genetic modeling. It is worth taking some time to consider the basic procedure and interpretation of this technique so the reader is in a position to understand modern published works.

SEM, or model fitting, begins with variance and covariance estimates from MZ and DZ observed twin data. The MZ twin As, MZ twin Bs, DZ twin As, and DZ twin Bs each provide an estimate of population variance for any trait of interest, such as non-word reading or verbal memory. These are calculated as ordinary variances; the fact that the sample consists of twins is irrelevant. Each variance estimate is assumed to be the sum of three independent sources of variance: additive genetic variance (A), shared environmental variance (C), and non-shared environmental variance (E):

σ2 = σ2A + σ2C + σ2E (1)

With one variance estimate for each of the four aforementioned groups (MZ-A, MZ-B, DZ-A, DZ-B), there are four estimates of the population variance, resulting in four equations of the form in Equation 1. Then the unique nature of twins comes into play. Covariances are calculated for MZ and DZ twins separately. The MZ co-twin covariance is assumed to be the sum of additive genetic variance and shared environmental variance. The DZ co-twin covariance is assumed to be the sum of one-half of the additive genetic variance (since DZ twins share half of their genes) and shared environmental variance.

COVmz = σ2A + σ2C (2)

COVdz = ½σ2A + σ2C (3)

Thus, we have a system of six related equations: four iterations of Equation 1, plus Equations 2 and 3 (Purcell 2008). SEM techniques are then used to find possible solutions for each term (σ2A, σ2C, σ2E), plug them into the equations, and compare them to the observed variances and covariances, yielding the estimates with the best overall fit to the observed data. Goodness of fit can be measured by several different statistics: chi-square (χ2), Akaike's Information Criterion (AIC) (Akaike 1974), and root-mean-square error of approximation (RMSEA), all of which should be low, indicating a small deviation of observed from expected values. A non-significant χ2 indicates a potentially good fit, as do RMSEA ≤ .06 (Hu and Bentler 1999) and a small AIC. When the model is based on additive genetic, shared environmental, and non-shared environmental variance, as just demonstrated, it is known as the ACE model. Estimates of h2, c2, and e2 are then derived as the proportion of total variance each term accounts for (e.g., h2 = σ2A/(σ2A + σ2C + σ2E)). The significance of each can be gauged with confidence intervals, but the standard method of deciding on the best model is to drop one term at a time and compare the fit of the resulting model to the original. For example, to decide whether A (additive genetic variance) contributes significantly, one would compare the ACE model to a model with just C and E and ascertain whether the fit deteriorated, as indicated by a significant increase in χ2, AIC, or RMSEA.
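Before turning to full SEM, it is worth noting that Equations 1–3 already determine the variance components. The following method-of-moments sketch solves them directly; it is not the maximum-likelihood machinery of actual SEM packages, and the numbers are illustrative:

```python
def ace_from_moments(var_total, cov_mz, cov_dz):
    """Solve Equations (1)-(3) directly for the standardized ACE components.

    var_total: phenotypic variance; cov_mz, cov_dz: co-twin covariances.
    Returns (h2, c2, e2) as proportions of total variance.
    """
    var_a = 2 * (cov_mz - cov_dz)   # A: since COVmz - COVdz = A/2
    var_c = 2 * cov_dz - cov_mz     # C: remainder of the MZ covariance
    var_e = var_total - cov_mz      # E: variance not shared even by MZ co-twins
    return var_a / var_total, var_c / var_total, var_e / var_total

# Illustrative values: total variance 1.0, COVmz = .60, COVdz = .40
h2, c2, e2 = ace_from_moments(1.0, 0.60, 0.40)
# h2 ≈ .40, c2 ≈ .20, e2 ≈ .40
```

SEM earns its keep over this closed-form solution when comparing constrained models, pooling unequal group sizes, and attaching confidence intervals to each component.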

This procedure extends from looking at variables individually to examining their relationship with one another—so-called multivariate analysis. The basis of multivariate analysis is to use the covariance across twins and across variables. So, in the bivariate case, we are interested in the covariance between twin A's score on variable 1 and twin B's score on variable 2 (for example, the covariance between twin A's verbal intelligence and twin B's nonverbal intelligence). Then, in what is known as the correlated factors model (Neale and Maes 2003; Neale et al. 2006), the same kind of SEM procedure described in the previous paragraphs can be used to estimate, for any two variables, the correlation between their additive genetic variances, shared environmental variances, or non-shared environmental variances. The correlation between the genetic variances of the two variables is known as the genetic correlation (rg), and it is interpreted as the proportion of genes relevant to one variable that is shared by the other. (The shared environment correlation (rc) and the non-shared environment correlation (re) are interpreted analogously.) For instance, a genetic correlation of .30 for verbal and nonverbal intelligence in 2-year-olds indicates that 30 percent of the genes that contribute to individual differences in one measure also contribute to individual differences in the other. A related statistic is bivariate heritability. Just as we can think of the univariate heritability as the proportion of phenotypic variance that is due to genetic variance, we can think of bivariate heritability as the proportion of a phenotypic correlation between two variables that is mediated genetically. For example, if the phenotypic correlation between verbal and nonverbal intelligence is .42 (Price et al. 2000) and the bivariate heritability is .17, then (.42)(.17) = .07 of the observed correlation between the two measures is mediated genetically.
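The arithmetic of that last example can be made explicit (a trivial but illustrative helper; the numbers are the chapter's, the function name is mine):

```python
def genetically_mediated_correlation(r_phenotypic, bivariate_h2):
    """Portion of a phenotypic correlation that is mediated genetically."""
    return r_phenotypic * bivariate_h2

# Verbal-nonverbal phenotypic correlation .42, bivariate heritability .17
mediated = genetically_mediated_correlation(0.42, 0.17)
# mediated ≈ .07: about .07 of the .42 correlation is genetic in origin
```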

32.2.2 Genetics of Normal Variation: Findings

32.2.2.1 Univariate Results

As noted at the start of the chapter, twentieth-century twin studies succeeded in showing that language is heritable. These studies are reviewed in detail in Stromswold (2001), where they are divided into those that address vocabulary, phonology, articulation, and grammar. The vocabulary results were mixed and heavily dependent on age and measure. For studies of infants and toddlers involving parent report, Stromswold (2001) reported small to moderate heritability (h2 = .29) (Reznick et al. 1997; Ganger et al. 1999; Dale et al. 2000), though one study using preferential looking yielded no heritability (Reznick et al. 1997). On the other hand, standardized vocabulary measures in older children produced more substantial heritability: h2 = .53 in five studies of children 3 to 12 years old (Fischer 1973; Foch and Plomin 1980; Mather and Black 1984; Segal 1985; Thompson et al. 1991). Stromswold's (2001) analysis revealed that phonology measures produced somewhat higher heritabilities than vocabulary, including .71 for non-word repetition (Bishop et al. 1999); .68 for phonological awareness (Hohnen and Stevenson 1999); and .78 for
a phonology composite (Stromswold 2001). Articulation showed somewhat less heritability, 26 percent, though estimates vary depending on the measure and on whether SES (socio-economic status) is partialled out (Matheny and Bruggemann 1973; Mather and Black 1984). As for grammar, there were fewer studies to review, but perhaps the most reliable univariate study, Mittler (1969), employed the Illinois Test of Psycholinguistic Abilities and reported a heritability of around .50 in 4-year-olds. Grammar-related measures from standardized tests given to 6- to 13-year-old twins yielded moderate to high heritabilities as well (Stromswold 2001). Thus, before the field began to focus on multivariate techniques and modularity, a small but informative body of work indicated that the subsystems of language were heritable, particularly after 3 years of age. What these studies could NOT show was (1) whether language abilities could be separated genetically from other aspects of cognition, and (2) whether they could be separated from one another genetically.

32.2.2.2 Multivariate Results, Part 1: Verbal versus Nonverbal Abilities

In response to such questions, the Twin Early Development Study (TEDS) was initiated. TEDS sought to recruit every pair of twins born in England and Wales from 1994 to 1996 (Oliver and Plomin 2007), with the rationale that complex, multivariate models require large sample sizes for adequate power. In addition to other areas of cognition and behavior, this project has focused on language, language disorders, and the relationship between language and other domains of cognition. The conclusion from ten years of study has been that genes for cognition are "generalist" (Plomin and Kovas 2005; Kovas and Plomin 2006; Haworth et al. 2009) because there is substantial genetic overlap between domains and between normal variation and disability. I will present these results in some detail, first for broad verbal versus nonverbal ability, and then for finer divisions within language. The first dataset comes from the twins at age 2 years (Price et al. 2000), using the MacArthur Communicative Development Inventory (MCDI: UK Short Form) (Fenson et al. 1994; Fenson et al. 1997), a parent-based vocabulary and grammar checklist, as the index of verbal development, and a novel measure called the PARCA (Saudino et al. 1998) as the measure of nonverbal ability. The PARCA comprises parent-administered tasks of design copying, match-to-sample, block building, and action imitation, as well as parent-report items regarding specific behaviors (for example, "Does your child recognize himself/herself when looking in the mirror?"). Price et al. (2000) found both the CDI and the PARCA to have low heritability (h2 = .24 for the CDI, .23 for the PARCA) and a low bivariate heritability, .17, indicating that only 17 percent of the phenotypic correlation of .42 (.42 × .17 ≈ .07) was mediated genetically.
The genetic correlation (rg) was .30, which indicates that 30 percent of the genes involved in vocabulary and grammar are also involved in the PARCA, while the remainder do not overlap.

The next TEDS report comes from Colledge et al. (2002), who studied the same twins at age 4, using measures appropriate for this age. The verbal measure was a composite of standardized verbal tests including the Renfrew Bus Story (Renfrew 1997a), which requires retelling a story; the Renfrew Action Pictures task (Renfrew 1997b) (AP Grammar), which asks for descriptions of pictures and assigns points based on the use of inflectional morphemes and function words; the British Ability Scale Verbal Comprehension task (BAS) (Elliot et al. 1996), which asks the child to arrange objects according to instructions; a phonological awareness task, requiring matching on the basis of rhyme and initial consonant; McCarthy Word Knowledge (McCarthy 1972), which asks for verbal definitions of words; McCarthy Verbal Fluency, which requires the timed generation of words within specified semantic categories; and McCarthy Opposite Analogies, which requires completing sentences by providing opposites. The nonverbal measure was a composite of all the perceptual-performance and quantitative subtests of the McCarthy Scales of Cognitive Abilities (McCarthy 1972), including block building; puzzle solving; tapping, which requires imitating sequences of mallet strikes on a 4-key xylophone; copying a geometric design; drawing a person; conceptual grouping, which requires choosing and sorting blocks by color, shape, and size, culminating in odd-one-out problems; number questions, culminating in simple story problems; and numerical memory (classic digit span, forward and backward). These measures, in 4-year-olds, had moderate heritabilities (h2 = .39 for the verbal composite, .41 for the nonverbal), substantial bivariate heritability (.40; 95 percent CI: −.26 to .74), and a large genetic correlation (rg = .63).
However, when subtests that loaded heavily on both factors were removed (phonological awareness, draw-a-child, number questions, numerical memory, and conceptual sorting), the genetic correlation dropped to rg = .50 (Colledge et al. 2002). Finally, there are TEDS data from the same twins at 12 years of age, when they participated in a distance study (Haworth et al. 2009). The internet- or phone-adapted tasks included standardized tests of reading comprehension and fluency; general cognitive ability ("g"), measured by the WISC-III general knowledge, vocabulary, and picture completion tests and Raven's Standard and Progressive Matrices (Raven et al. 1998); mathematics, based on British math curriculum standards, including the categories of understanding number, non-numerical processes, and computation and knowledge; and spoken language. Spoken language was assessed in three ways: with triplets of aurally presented sentences from which the participants were asked to choose the two with the same meaning; with the Figurative Language subtest of the Test of Language Competence-Expanded Edition (Wiig et al. 1989), requiring comprehension of non-literal language; and with the Making Inferences subtest of the Test of Language Competence, requiring knowledge of pragmatics. Haworth et al. (2009) reported phenotypic correlations ranging from .54 to .65 and genetic correlations of rg = .82 between language and g; rg = .65 between language and math; and rg = .63 between language and reading. The TEDS group has used the results from Price et al. (2000), Colledge et al. (2002), and Haworth et al. (2009) to argue that genes are "generalist" (Kovas and Plomin 2006;
Haworth et al. 2009). That is, they take the results just described as evidence for the existence of genes that affect all of cognition and a lack of domain-specific genes. This is a curious stance for several reasons. First, the results at age 2 should lead to the opposite conclusion. At this age, the nonverbal task, the PARCA, is actually primarily nonverbal in nature, and the genetic correlation between it and the CDI is small (.30). Clearly, there are genes that affect both verbal and nonverbal ability, but there are also genes—apparently 70 percent of relevant genes—that affect one but not the other. Second, at subsequent ages, it is hard to argue that the verbal and nonverbal tasks are completely non-overlapping in the skills they entail, as the descriptions in this section should make apparent. At age 4, when the genetic correlation between verbal and nonverbal ability putatively takes a giant leap and half the genes overlap, the verbal measures are not confined to mere knowledge of language in the Chomskyan sense, but include a broad swath of social and pragmatic skills in addition to grammatical and lexical knowledge, while the nonverbal measures require some of the same skills in the course of following directions and completing the tasks properly, even with the obviously language-dependent tasks (like numerical memory) removed. At age 12, the highest genetic correlation is between g and language, but g includes two verbal IQ subtests. It is also not surprising that language and math overlap in genes, since math problems are typically verbally presented and encoded. Hence, though solving math problems requires skills in addition to language, language is also required. The high genetic correlation between reading and spoken language is also unsurprising, especially with no articulation measure included. The only truly nonverbal task, the Raven's, is packaged into g and not considered separately.
Thus, much of the genetic overlap could be caused by the nature of the tasks at ages 4 and 12, not the underlying knowledge the tasks were presumed to tap. Finally, the genetic correlations are far from 1.0 at 2, 4, or 12 years of age. That fact implies that although some genes are shared, there must also be genes that are unique to each ability, or at least non-overlapping. Although the findings may provide evidence for genes that affect cognition broadly, they hardly clinch the case for domain-general cognition. Whether or not cognition is modular, we should not be at all surprised to find that the vast majority of genes that are related to brain functioning affect many aspects of that functioning, as Marcus and Rabagliati (2006) also point out. After all, looking across mammalian species, we share more than 99 percent of our DNA with chimpanzees, and yet the differences are more striking (at least to us) than the similarities. Likewise, even if verbal and nonverbal intelligence have many genes in common, they do not have all genes in common, and presumably it is those genetic differences—however small—that are responsible for observed behavioral differences. In fact, the differences are not so small. The highest genetic correlation observed between a verbal and a putatively nonverbal ability is .82 (spoken language and g at age 12), meaning 18 percent of the genes do NOT overlap. Therefore, although the data are intriguing, we cannot use them to conclude that there are not distinct components of cognition.


32.2.2.3 Multivariate Results, Part 2: Dissecting Language

In addition to studies of verbal and nonverbal abilities, the TEDS group has explored the genetic correlation structure of grammar, vocabulary, phonology, and articulation in the same twins up to age 4 years. Starting with toddlers, Dionne et al. (2003) report on vocabulary and grammar at 2 and 3 years of age. Both measures are derived from the CDI (this includes the data from Price et al. 2000). For the vocabulary measure, parents are presented with a long list of words and are asked to indicate which words their children have understood and uttered. For the grammar measure, there is a series of pairs, each contrasting a grammatical simplification common in child speech with a more adultlike form, for example, two shoe versus two shoes. Parents are asked to indicate which form their child would say. Dionne et al. (2003) reported small to moderate univariate heritabilities for each measure, consistent with previous studies (10–20 percent for vocabulary and 39–40 percent for grammar). They also found large phenotypic correlations between vocabulary and grammar at each age (r = .68 at 2 years; r = .52 at 3 years) and across variables and ages (r = .47 for age 2 vocabulary and age 3 grammar; r = .41 for age 2 grammar and age 3 vocabulary), both of which are also consistent with previous work (Bates and Goodman 1997). Their novel contribution was to dissect that large cross-trait phenotypic correlation using behavior genetic modeling. At both 2 and 3 years of age, there were large genetic correlations between vocabulary and grammar: rg = .63 at age 2 and .89 at age 3 for the 1994 birth cohort; rg = .76 at age 2 and .63 at age 3 for the 1995 cohort, meaning that 63–89 percent of the genes contributing to vocabulary are also expected to contribute to grammar at each age (and vice versa).
Cross-trait, cross-lagged analyses were also reported (vocabulary at age 2 with grammar at age 3, and grammar at age 2 with vocabulary at age 3), and these yielded smaller but significant genetic correlations of .30–.50, meaning that 30–50 percent of the same genes are shared by vocabulary and grammar across ages. In another series of papers, most notably Hayiou-Thomas et al. (2006), the TEDS group makes a similar argument for 4-year-olds. In these reports, a subset of the TEDS sample, part of which was selected for low language performance, was tested by trained experimenters rather than parents. The data are a subset of those reported in Colledge et al. (2002) (Renfrew Bus Story; AP Grammar; BAS Verbal Comprehension; McCarthy Word Knowledge, Verbal Fluency, and Verbal Memory; phonological awareness; articulation; non-word repetition). Phenotypic factor analysis indicated that articulation and non-word repetition should form their own factor, while all the remaining language measures (various vocabulary, verbal knowledge, and grammar measures) loaded more strongly on another factor. There were strong genetic correlations between the two factors (rg = .64), and even stronger correlations within the second factor. Some of the highest were: rg = .86 between Bus Story and AP Grammar; rg = .61 between Bus Story and Verbal Memory; rg = .77 between AP Grammar and McCarthy Verbal Fluency; rg = .67 between AP Grammar and McCarthy Word Knowledge; and rg = .45 between AP Grammar and McCarthy Verbal Memory. This analysis was reinforced with a broader model showing a shared genetic factor for
nearly all the language measures in the second factor, with only modest contributions from independent genes for each measure. The authors conclude that vocabulary and grammar stem from a common set of genetic factors and reject a dual-system model within language (Hayiou-Thomas et al. 2006). Much as in the verbal–nonverbal multivariate studies described in the last section, Dionne et al. (2003) and Hayiou-Thomas et al. (2006) take these results as evidence for a lack of fractionation within language, and the results from 2- and 3-year-olds are in fact compelling. To the extent that the CDI truly differentiates vocabulary and grammar, the overlap in genes is impressive and merits explanation, whether it be bootstrapping (as Dionne et al. argue), an underlying lack of differentiation between words and grammar (Tomasello 2003), or the mundane fact that sentences cannot be formed without words. At the same time, we should once again bear in mind that the genetic overlap between the two is well under 1.0, implying that some non-overlapping genes act on vocabulary and grammar independently. The results and conclusions from Hayiou-Thomas et al. (2006) are more troubling. The verbal measures employed are quite broad, but the authors seem to have put a lot of stock in the labels on standardized tests without fully exploring the ancillary skills required to complete the tasks. Nearly all of the tasks rely heavily on following directions, on memory, and on understanding the meaning of words and sentences. This is true of the grammar and vocabulary tests as well as the verbal memory tests. Furthermore, some of the highest genetic correlations come from tests that are nearly identical in task demands (e.g. Bus Story and McCarthy Verbal Memory, which both require retelling a story), yet the authors treat these genetic correlations as informative.
To summarize both sections, the multivariate research from the TEDS group is groundbreaking in size, scope, and technique, but as we move forward, researchers must be more careful about specifying task demands so interpretations are clear.

32.3  Behavior Genetics of Language Disability

I turn now from normal variation to disability. Here, we want to know whether disabilities have genetic or environmental causes, and how disorders relate to normal variation. While twins remain informative in this endeavor, the study of disorders or disabilities requires slightly different techniques because the twins are selected on the basis of disability or extreme score rather than being sampled from the entire distribution.

32.3.1 Behavior Genetics of Language Disability: Statistical Methods

Before 1990 or so, the statistic of choice for selected twin data was the concordance rate, a measure of the probability of a twin being diagnosed with a disorder given that his
or her co-twin has the diagnosis. A concordant pair is one in which both co-twins are affected or diagnosed with the trait of interest. The widely used proband concordance rate is calculated as the number of affected individuals in concordant pairs divided by the total number of affected individuals. This rate is calculated separately for MZ and DZ twins, and the two rates are compared to gauge heritability. In the 1990s, concordance studies began to be augmented with a method described by DeFries and Fulker (1985), now known as DF-extremes analysis. This method is based on a linear equation:

C = B1P + B2R + A

where P is the proband or affected twin's score, C is his/her co-twin's score, R is the genetic relationship between twins (1.0 for MZs, .5 for DZs), and A is a constant of regression. The coefficient B1 provides an estimate of twin similarity independent of zygosity and is not of interest here. Rather, we are interested in B2, which represents the regression of the co-twin's score on genetic relatedness (R) with twin similarity (B1) partialled out. B2 can be interpreted as the genetic contribution to the mean difference between the probands and the population, and it can be used to calculate h2g, the heritability of group differences, that is, the heritability of the difference between the impaired and unimpaired groups. This is estimated by dividing B2 by the difference between the means for the probands and the population (DeFries and Fulker 1985 and Plomin and Kovas 2005 provide fuller descriptions of this technique for the interested reader).
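Both statistics can be sketched in a few lines. This is an illustrative reimplementation, not the published procedure: the function names and toy numbers are mine, and the DF model is fit here by plain ordinary least squares via a hand-rolled normal-equations solver.

```python
def proband_concordance(n_concordant_pairs, n_discordant_pairs):
    """Affected individuals in concordant pairs / all affected individuals.

    Each concordant pair contributes two affected individuals; each
    discordant pair contributes one.
    """
    affected_in_concordant = 2 * n_concordant_pairs
    total_affected = affected_in_concordant + n_discordant_pairs
    return affected_in_concordant / total_affected

# e.g. 40 concordant and 15 discordant MZ pairs -> 80/95, about .84

def df_extremes_fit(probands, cotwins, relatedness):
    """Fit the DeFries-Fulker model C = B1*P + B2*R + A by least squares.

    probands: proband scores P; cotwins: co-twin scores C;
    relatedness: genetic relatedness R (1.0 for MZ, 0.5 for DZ pairs).
    Returns (B1, B2, A).
    """
    rows = [(p, r, 1.0) for p, r in zip(probands, relatedness)]
    # Normal equations (X'X) beta = X'y for the design matrix X = [P, R, 1]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, cotwins)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [a - f * b for a, b in zip(m[k], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def group_heritability(b2, proband_mean, population_mean):
    """h2g: B2 divided by the proband-population mean difference."""
    return b2 / (proband_mean - population_mean)
```

In practice one would use a regression routine from a statistics package rather than solving the normal equations by hand; the point is only that the DF model is an ordinary linear regression of co-twin score on proband score and relatedness.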

32.3.2 Behavior Genetics of Language Disability: Findings

Stromswold (2001) described five twin studies of Specific Language Impairment (SLI) using these methods, with a total of 266 MZ and 161 DZ twin pairs. She reported an average proband concordance rate of 84 percent for MZ twins and 52 percent for DZ twins, striking evidence for the heritability of language disability. Since the 1990s, a major focus of behavior genetics has been not whether language disability is heritable, but rather whether the same genes that cause variation in normal language ability are also responsible for language disability, as opposed to a different set of genes acting independently to cause language disability. This issue has been addressed in depth by TEDS researchers, who have concluded that the same genes are responsible not only for language ability and disability, but for all cognitive disabilities, where disability is defined as the lowest 5–15 percent of scores on the same tests used to explore the entire distribution (Plomin and Kovas 2005; Haworth et al. 2009). In the earliest report, Spinath et al. (2004) used a composite of vocabulary and grammar data from the CDI across ages 2, 3, and 4 years. (This includes data from Dionne et al. 2003 and Price et al. 2000, described in section 32.2.2.3.) The lowest 5 percent of scorers were considered to be language impaired, and these individuals were entered into DF-extremes analysis. The resulting group heritability for language disability, which is probably more properly considered language delay in many cases but is nonetheless
likely to be indicative of a developmental language disability, was 49 percent. Plomin and Kovas (2005) argue that this also means that 49 percent of the same genes that contribute to variation in the normal range on the CDI at ages 2–4 years will also contribute to placing someone in the lowest 5 percent on this test, but they do not provide a proof for this assertion. In Viding et al. (2004), the same data from 4.5-year-olds used in Colledge et al. (2002) and Hayiou-Thomas et al. (2006) are again examined with an eye to the lowest segment of scores. Using DF-extremes analysis, Viding et al. report high group heritabilities for low language ability: h2g = .37 for the lowest 15 percent of language performance; h2g = .48 for the lowest 2 percent; and h2g = .76 for the lowest 1 percent of scores. Thus, genetic variance has a great deal to do with falling in the lowest 1 percent of language performance. Again, Plomin and Kovas (2005) argue that the same genes are at work for language ability and disability at 4.5 years of age. Continuing with 12-year-olds, Haworth et al. (2009) use a 15 percent cutoff of their rich data set (described in section 32.2.2.2) to address questions about the etiology and modularity of cognitive disability. They report large group heritabilities for each disability: h2g = .52 for general cognitive disability; .74 for reading disability; and .60 for spoken language disability. And, just as they found substantial genetic correlations among the different cognitive domains in the normal range, they also found large genetic correlations across disabilities: rg = .57 on average among disabilities, and rg = .80 specifically between language and general disability. Haworth et al.
use these results, alongside those described earlier, to argue for generalist genes that affect all aspects of cognition and cognitive disability, going so far as to argue that genetically based intervention will be similar among all cognitive disabilities. Though the research is groundbreaking, we must once again keep in mind that the measures in Haworth et al. (and Viding et al. 2004) may not have been fully differentiated from one another behaviorally. Furthermore, once again, the genetic correlations they report were less than 1.0, so we must allow the possibility of some distinct genes for different abilities and disabilities. Ideally, language disorders would be fractionated even further, as in the studies of normal variation, to address whether there are distinct etiologies for impairments of grammar, morphology, phonology, lexical access, or articulation, using circumscribed or controlled tests. Bishop et al. (1999) began exploring these questions and reported high h2 for non-word repetition and low h2 for auditory processing, but did not have the tools for multivariate analysis. The question is addressed more directly in Bishop et al. (2006), in which the same set of twins from Viding et al. (2004) was examined with two specific tests—a test of non-word repetition and a test of grammar and morphology, the Test of Grammatical Impairment (Rice and Wexler 2001). Bishop et al. reported significant heritability for both measures but, in a departure from other TEDS work, little genetic correlation in their bivariate analysis, indicating that impairments in phonological versus grammatical functioning have different genetic origins when careful measures with a basis in psycholinguistic theory are employed.


32.4  Molecular Genetics of Language

Underlying all of the genetic variation discussed in the previous section are actual genes. Genes code for proteins or regulate other genes that code for proteins, and proteins in turn build structures and guide their function. While behavior genetics has provided ample evidence that genes contribute to individual differences in language, the true genetic basis of language will only be understood when we identify actual genes that contribute to language and map out their function in development as well as in adult functioning. This search is made difficult not only by the preclusion of animal models, but also by the complexity of language as a phenotype, including its interactions with other domains of cognition. This complexity suggests that multiple genes will be implicated and that at least some of those genes may not be specific to language. While recent twin research has addressed both normal variation and language disorders, molecular genetics follows in the medical tradition of gene discovery and is focused primarily on disorders. In the following subsections, the logic behind molecular genetic methods is first reviewed, and then results from this field are considered. As we shall see, the 2000s saw massive progress in language gene discovery, but the work of understanding the implications and mechanisms of those genes has only just begun.

32.4.1 Molecular Genetics of Language Disability: Methods

The logic behind this research is to compare individuals with and without language disability and find the alleles that are shared by the affected members but not the unaffected members. This method is known as linkage. It identifies stretches of DNA whose variance is correlated, or linked, with occurrence of the disorder. Such a region, known as a quantitative trait locus (QTL), is not necessarily the gene itself, but may contain it or be close to it and therefore may help pinpoint the location of the gene. The location of a QTL, mapped using an extensive map of markers, or recognizable sequences of DNA, is reported as a chromosome number (1 through 22, X, or Y), followed by the arm (p for the short arm, q for the long arm), and the region, where regions are defined based on staining that is thought to be unrelated to function (Griffiths et al. 2008). So, for instance, 3p31 means chromosome 3, short arm, 31st region as revealed by staining. If the identified region corresponds clearly to a gene of known function, the named gene may be reported instead. Although the logic is simple, the statistical methods for a genome-wide search are complex, because there are many regions of DNA to test and because multiple regions may work together to influence a phenotype. The reader is referred to Plomin et al. (2008) and Griffiths et al. (2008) for more on the basics, and to Allen-Brady et al. (2009) as an example of some newer methods.
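The cytogenetic naming convention just described is mechanical enough to capture in code. This hypothetical helper (not part of the chapter, and deliberately ignoring band ranges such as 2p15-16) splits a single locus label into its parts:

```python
import re

def parse_locus(label):
    """Split a cytogenetic location such as '3p31' into its components.

    Chromosome: 1-22, X, or Y; arm: 'p' (short) or 'q' (long);
    region: the stain-defined band number, possibly with a sub-band (e.g. 6p22.2).
    """
    match = re.fullmatch(r"(\d{1,2}|[XY])([pq])(\d+(?:\.\d+)?)", label)
    if match is None:
        raise ValueError(f"not a single cytogenetic locus: {label!r}")
    return {"chromosome": match.group(1),
            "arm": match.group(2),
            "region": match.group(3)}

# parse_locus("3p31") -> chromosome '3', short arm, region '31'
```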


32.4.2 Molecular Genetics of Language Disability: Findings

Excellent recent reviews of this field have been provided by Grigorenko (2009), Newbury and Monaco (2010), Scerri and Schulte-Korne (2010), and Li and Bartlett (2012). The summary in this section strives to include all promising candidate genes and QTLs but is shallow in detail compared to those reviews. The list of candidate genes discussed in this section begins with a series of genes and QTLs associated with increased risk for reading impairment (dyslexia), then continues with those linked to SLI, and finally reviews the candidate gene FOXP2. For each region or gene, there is a brief history followed by recent findings. For several of the candidate genes, these findings include two studies that each represent a concerted effort to fractionate language into sensible components (e.g. articulation, comprehension, non-word repetition). These are Rice et al. (2009), who study an American sample of language-impaired youths, and Newbury et al. (2011), who examine a previously identified British sample of language-impaired individuals diagnosed with dyslexia or specific language impairment.

32.4.2.1 Candidates and Quantitative Trait Loci Derived from Reading-impairment Studies

While the focus of this review is on language and not reading, it is now recognized that many cases of dyslexia originate with a language difficulty, so these genes have the potential to elucidate language processes as well. In fact, several of the dyslexia candidate genes described in the following paragraphs are known to be important in neural migration and, in studies of brain development in rodents, some have produced ectopias reminiscent of those described in dyslexics by Galaburda (Galaburda and Kemper 1979) as well as auditory processing deficits (Threlkeld et al. 2007). Nine dyslexia-related loci have been identified over the last three decades, DYX1–DYX9. All but DYX4, DYX6, and DYX9 have resulted in the discovery of candidate genes and thus will be considered here (Scerri and Schulte-Korne 2010).

Locus: DYX1. Candidate genes: DYX1C1, CYP19A1. The locus DYX1 (chromosome 15) was initially reported as being linked to dyslexia by Smith et al. (1983) and later led to the finding of the candidate genes DYX1C1 (Taipale et al. 2003) and CYP19A1 (Anthoni et al. 2012). DYX1C1 has been replicated in some samples, but not others (Scerri and Schulte-Korne 2010). CYP19A1 represents a very recent finding, and, furthermore, the authors find an association between CYP19A1 and dyslexia or specific reading phenotypes in only some of the populations they studied (Anthoni et al. 2012). In their meticulous follow-up research, Newbury et al. (2011) report an association for DYX1C1 with a real-word forced-choice spelling test, but not with other measures, including reading, other spelling tests, phonological decoding, or non-word repetition. Newbury et al.'s effect was found only in a dyslexic (not SLI) sample, but at the same time, DYX1C1 was associated with overall risk for SLI and not dyslexia. Rice et al. (2009) also tested DYX1C1 for association with specific language tests and found linkage between this region and standardized tests of reading, overall language, non-word repetition, vocabulary, and articulation, but not tense or mean length of utterance. However, targeted analysis of the DYX1C1 gene in Rice et al. (2009) did not result in significant association with any measure.

Locus: DYX2 (6p22). Candidate genes: KIAA0319, DCDC2. Since its discovery by Cardon et al. (1994), the locus DYX2 led to the identification of two more dyslexia-related genes, KIAA0319 and DCDC2. Numerous studies have replicated the association of these genes with risk for reading impairment in populations from North America, Europe, and Australia (Scerri and Schulte-Korne 2010). In their studies of more specific language phenotypes, Newbury et al. (2011) find no associations for DCDC2 with reading-related phenotypes, while Rice et al. (2009) report significant association for DCDC2 with articulation (but not other measures) in their SLI sample. On the other hand, for the gene KIAA0319, both Newbury et al. and Rice et al. find significant associations with reading and language phenotypes. Newbury et al. find an association with reading and receptive language within an SLI sample and with real-word forced-choice orthographic coding in a dyslexic sample, and Rice et al. (2009) report an association with articulation as well as overall language in an SLI sample.

Locus: DYX3 (2p15-16). Candidate genes: MRPL19, C2orf3. Initially reported by Fagerheim et al. (1999), DYX3 led to the discovery of the candidate genes MRPL19 and C2orf3 (Anthoni et al. 2007). Newbury et al. (2011) report a significant association with a point near both of these genes for single-word reading in an SLI sample, while Rice et al. (2009) find no association or linkage.

Locus: DYX5 (3p12-q13). Candidate gene: ROBO1.
DYX5, initially reported to be linked to dyslexia by Nopola-Hemmi et al. (2001), was replicated in other studies and ultimately led to the cloning of ROBO1 (Roundabout, axon-guidance receptor, homolog 1 (Drosophila)) (Stein et al. 2004).

Locus: DYX7 (11p15). Candidate gene: DRD4. The association of DYX7 with dyslexia has been reported in two studies (Scerri and Schulte-Korne 2010). This locus contains the candidate gene DRD4 (Dopamine Receptor D4), known to be associated with attention-deficit hyperactivity disorder (ADHD) (Faraone et al. 2001). However, attempts to replicate the association with reading impairment have been unsuccessful (Scerri and Schulte-Korne 2010).

Locus: DYX8 (1p34-36). Candidate gene: KIAA0319L. Although linkage to DYX8 has been robustly established (Scerri and Schulte-Korne 2010), the candidate gene KIAA0319L (KIAA0319-like) has only been reported in one study (Couto et al. 2008).

32.4.2.2 Candidates and Quantitative Trait Loci Derived from Studies of Specific Language Impairment

Locus: SLI1 (16q24). Candidate genes: CMIP, ATP2C2. Among the first regions to be linked to SLI in a genome-wide linkage study was a QTL dubbed SLI1 (SLI Consortium 2002, 2004). In subsequent multivariate analysis, the phenotypes associated with this site included non-word repetition, single word reading, single word spelling,

reading comprehension score, and to a lesser extent, oral directions (Monaco and SLI Consortium 2007). The linkage was replicated in a different sample by Falcaro et al. (2008). In a targeted study of this region performed with a subset of the SLI Consortium sample selected for extreme scores, Newbury et al. (2009) identified two genes in this locus: CMIP (c-maf-inducing protein) was associated with non-word repetition, and ATP2C2 (a calcium-transporting ATPase) was associated with non-word repetition and also, to a lesser extent, with oral directions, word classes, comprehension, formulating sentences, and vocabulary. With both genes, the effect on non-word repetition came largely from five-syllable words. These findings were confirmed and refined in Newbury et al. (2011).

Locus: SLI2 (19q13). SLI2 was identified by the same groups and with the same sample as SLI1 (SLI Consortium 2002, 2004; Monaco and SLI Consortium 2007; Falcaro et al. 2008). This QTL is also associated with non-word repetition, as well as several measures of expressive and receptive language from the CELF (Clinical Evaluation of Language Fundamentals; Semel et al. 1987). However, as of this writing, no candidate genes in this region have been identified.

Locus: SLI3 (13q21). SLI3 was linked to the presence of language impairment as measured by standardized tests in Bartlett et al. (2002, 2004) in a North American family study. Although no candidate gene has yet been uncovered (Li and Bartlett 2012), Simmons et al. (2010) reported an interaction between this locus and BDNF (Brain-Derived Neurotrophic Factor, a gene known to be important in development and involved in memory functioning (Yamada et al. 2002)).

32.4.2.3 FOXP2

Much excitement was generated by the discovery of a single gene that was linked to a disorder of speech and language in one family, the KE family (Lai et al. 2001; Pinker 2001), a large pedigree with many language-impaired members. The family's core deficit is now recognized to be an orofacial dyspraxia (Hurst et al. 1990; Vargha-Khadem et al. 2005), most detectable during a task of multisyllabic word and non-word repetition, and one that extends to imitation of non-language material—in other words, the planning or execution of complex sequences of oral movements. However, there are also higher-level impairments in syntax and morphology that cannot be easily explained by the dyspraxia, so the disorder clearly affects more abstract elements of language as well (Gopnik and Crago 1991; Fisher 2006). The pattern of transmission in the family indicated autosomal dominance (i.e. the offspring of an affected parent appeared to have a 50 percent chance of developing the disorder, while the offspring of unaffected parents never showed the disorder), and in earlier work, Fisher et al. (1998) had narrowed the search to a region of chromosome 7 (7q31) and named the QTL SPCH1. After finding an unrelated patient with the same disorder and an identifiable chromosomal translocation, the same group was able to focus even more narrowly on the location. This search ultimately led to the discovery of a new gene in this region, one that cosegregated perfectly with the language deficit in KE (Lai et al. 2001). They named

The Genetics of Spoken Language    787 the gene FOXP2: “FOX” by convention for “forkhead box,” a type of gene characterized some time ago in drosophila and known to be important in development. The “P2” has to do with housekeeping among developmental biologists, but note that the all-​ capital spelling indicates this is a human gene; the mouse version is Foxp2 (Kaestner et al. 2000). FOXP2 codes for a protein with multiple helical sections and wing-​like loops. This morphology is notable because the shape allows the molecule to bind to DNA and control whether other genes are transcribed, a process known as regulation. Such control of transcription is crucial both in brain development and in ongoing functioning of the mature brain. The mutation in FOXP2 in the KE family was a change in a single base or nucleotide, hence referred to as a single nucleotide polymorphism (SNP) or a point mutation, that had the result of changing the identity of one amino acid in the resulting protein. This change affects a region of the protein that is not only highly conserved over evolution (Enard et al. 2002), but also occurs in a binding domain of the protein, resulting in a reduction of the protein’s effectiveness in binding its target and regulating transcription (Lai et al. 2003). While several groups tried and failed to link broader language impairment or specific language phenotypes to the FOXP2 gene (Meaburn et al. 2002; Newbury et al. 2002; O’Brien et al. 2003), others have replicated and extended the finding and have made considerable progress in understanding the mechanisms using animal models. One strategy has been to focus investigation on those genes that are regulated by FOXP2, and one that has proved fruitful is CNTNAP2. Vernes et al. (2008) reported that CNTNAP2 variants covary with non-​word repetition in children with SLI, a finding replicated with dyslexics (Peter et  al. 2011)  and reading-​related measures in individuals with SLI (Newbury et  al. 2011). 
These findings are bolstered by the implication of the same region in language impairment in an isolated population in Chile (Villanueva 2011). Furthermore, Whitehouse et al. (2011) report an association between CNTNAP2 and variation in the performance of 2-year-olds on the Infant Monitoring Questionnaire: Common Subscale, which includes proto-imperative actions, following simple commands, and the use of 2- or 3-word combinations (Bricker and Squires 1989). In addition, Abrahams et al. (2007) demonstrated high levels of expression of CNTNAP2 in language-related brain regions in frontal and temporal cortex, and this region has also been implicated in language delay in autism (Alarcon et al. 2008). Thus, while research into the genes regulated by FOXP2 is still in its infancy, the search has already proven fruitful.

In addition to pinpointing additional candidate genes for language, FOXP2 has also provided a window into the process, development, and evolution of human language through animal models. First, FOXP2 is involved in similar or analogous brain areas in every species studied, and seems to be universally expressed in motor circuits (cortico-striatal and olivo-cerebellar) that support motor memory or implicit learning (Lai et al. 2003). Second, intriguingly, the gene is expressed in songbirds in the neural circuits relevant for learning song (Vargha-Khadem et al. 2005), suggesting a homologous

function in a distant species. Third, there are just two differences in this gene between chimpanzees and humans. Both are single base changes that appear to have functional significance, and both occurred in the last 200,000 years, which could coincide with the emergence of language in humans 50,000 to 100,000 years ago (Enard et al. 2002; Fisher 2006; Konopka et al. 2009). These findings are intriguing for a couple of reasons. First, the expression of FOXP2 in implicit learning circuits and the nature of the resulting language deficits in the KE family together lend support to a view of grammar as related to procedural learning and memory (Vargha-Khadem et al. 2005; Ullman and Pierpont 2005). Second, the recency of the mutations in humans suggests a mechanism by which existing mammalian procedural memory circuits could have been co-opted for language in humans.

32.4.3 Summary

I have reviewed ten candidate genes, four of which have received support from multiple sources: DYX1C1 (chromosome 15), DCDC2 (chromosome 6), KIAA0319 (chromosome 6), and FOXP2 (chromosome 7). Several others represent promising newer findings: CYP19A1 (chromosome 15), MRPL19 (chromosome 2), C2orf3 (chromosome 2), KIAA0319L (chromosome 1), CMIP (chromosome 16), and ATP2C2 (chromosome 16). Finally, two additional QTLs, SLI2 on chromosome 19 and SLI3 on chromosome 13, have not led to candidate genes but have received considerable support as being associated with language disability.

For the genetically uninitiated reader, this proliferation of genes may seem overwhelming, but it is not altogether different from the situation for other complex cognitive developmental disorders—autism, ADHD, and schizophrenia, among others, also implicate many genes (Plomin et al. 2008). It probably reflects the influence of multiple genes on a complex phenotype, the fact that there are multiple ways to disrupt normal functioning, and the failure to use homogeneous populations and tasks, a point also noted by Stromswold (2008) and Rice et al. (2009). With the genes that have been discovered, researchers can already begin to make progress in understanding the path from genes to behavior and the evolution of language. Works by Fisher (2006), Abrahams et al. (2007), Vernes et al. (2008), Konopka et al. (2009), and Smith et al. (2010) represent just some of the attempts along these lines.

Returning to the question of the domain-specificity of language, the genes described herein discriminate dyslexics from non-dyslexics or language-impaired from non-language-impaired individuals. Thus, to the extent that dyslexia and SLI are circumscribed disorders affecting nothing but language, the genes that underlie them could also be specific to language.
However, the issue of the specificity of language and reading disorders is itself hotly debated, with many arguing that dyslexia and SLI involve non-linguistic deficits as well. Thus the findings of candidate genes for dyslexia and language impairment are exciting, but they do not by themselves provide definitive evidence of language modularity. Furthermore, despite valiant attempts (Rice et al. 2009; Newbury et al. 2011), just one candidate gene, CMIP, has been consistently narrowed down to a single phenotype, and

The Genetics of Spoken Language    789 that phenotype is non-​word repetition, which in itself requires several linguistic and nonlinguistic skills. Thus, the existing data do not show definite evidence of domain-​ specificity for language overall, nor for the fractionation of knowledge within language.

32.5 Integrating and Evaluating Findings

Though the last 20 years have seen stunning progress, we are still in the early stages of uncovering the genetic underpinnings of language and understanding their meaning. The first wave of twin studies showed that language has non-zero heritability but did not address issues of modularity. The second wave, the TEDS papers, was a tour de force of multivariate analysis but perhaps did not employ careful enough measures to cleanly divide cognition or fractionate language. I hope there will be a third wave that does both—one that employs even more specific tasks using multivariate analyses. Bishop et al. (2006) is a good example of what this third wave might look like, and indeed they find more evidence for modularity than other TEDS papers. In parallel, advances in molecular genetics have also shed light on language, though there is considerable room for studies with more homogeneous samples in terms of language impairment, and more specific tasks in terms of fractionating language.

The behavioral and molecular data do converge on one point: the full range of language abilities relies on many genes and probably shares a large number of them with other aspects of cognition. This should come as no surprise. What remains to be fully explored is whether, if we employ careful enough measures, we can find genetic variance or specific genes that subserve language alone.

Chapter 33

Phonological Disorders

Theoretical and Experimental Findings

Daniel A. Dinnsen, Jessica A. Barlow, and Judith A. Gierut

33.1 Introduction

Research on the sound systems of young children with phonological disorders holds obvious clinical interest, but the interest of disorders for phonological theory and for typical phonological acquisition may be less clear. Some have even gone so far as to suggest that the patterns found in acquisition, and especially in disordered systems, may be tangential to the facts of fully developed languages and to claims about linguistic competence (e.g. Kenstowicz and Kisseberth 1979; Anderson 1981; Hale and Reiss 1998). Presumed performance factors, such as motor immaturity and/or articulatory limitations, are often cited as reasons for putting aside children's developing sound systems. It is striking, however, that many of the phonological restrictions evident in early acquisition (normal or disordered) are the same as those found in fully developed languages, including restrictions on phonetic inventories, syllable structure, and phonotactics generally. Surely, the speakers of a language with severe limits on the structure of their syllables and/or phonetic inventories do not suffer from motor problems. Also, something other than motor limitations would seem to be involved for those many children who, for example, replace /θ/ with [f] while those same children prefer [θ] as the substitute for /s/ (e.g. Velleman 1988; Bernhardt and Stemberger 1998; Dinnsen and Barlow 1998). This point is tacitly acknowledged by those accounts that attempt to recast these facts in terms of lexical inertia, perceptual confusion, and/or subtle acoustic differentiation (e.g. Velleman 1988; Fikkert 2006; Ettlinger 2009). Nevertheless, current theories of phonology offer grammatical accounts of these and other developmental phenomena that have interesting consequences for theory and acquisition. Those accounts adopt the continuity hypothesis (e.g. Pinker 1984), which maintains that the grammatical principles

that govern fully developed and developing languages are the same. As such, children's developing phonologies can offer a valuable testing ground for theoretical claims about acquisition and the inner workings of a sound system.

The purpose of this chapter is to highlight some of the findings about phonological disorders that have derived from and have contributed to recent linguistic theories, and to underscore the unique descriptive and experimental research opportunities that phonological disorders afford in the evaluation of linguistic claims. Toward this end, we take up one of the central issues in research on phonological acquisition and disorders, namely the nature of children's internalized underlying representations. This issue will also serve to illustrate some of the different claims that follow from contemporary approaches to phonological analysis. Because claims about underlying representations are also inextricably intertwined with the processes (or rules) that relate those underlying representations to their phonetic output, we will explore the nature of those processes with key developmental error patterns that involve restrictions on phonetic inventories, restrictions on the distribution of sounds within a word, paradigm effects, conspiracies, and principled restrictions on consonant clusters.

Our focus will be on functional (non-organic) phonological disorders. Children with such disorders are developing typically in all respects, except for evidence of a phonological problem. They score within expected limits on standardized tests of hearing, nonverbal intelligence, oral-motor structure and function, receptive/expressive vocabulary, and receptive/expressive language. The children do, however, tend to score poorly on standardized tests of production accuracy, such as the Goldman-Fristoe Test of Articulation-2 (Goldman and Fristoe 2000).
While, as we will see, the sound systems of children with phonological disorders tend to resemble those of younger children with typical phonological development, the study of disorders has a number of unique research advantages. Because this cohort of children is older, they are able to attend to elicitation tasks that yield the relevant amount and kinds of data (e.g. minimal pairs and morphologically related forms of words) needed by the analyst to generate linguistic accounts; younger children could not achieve the same. Because of the disordered nature of the phonological system, it is ethical to manipulate their phonologies via experimental treatment studies; the same is not appropriate for typically developing or fully developed systems. Moreover, because the course of phonological acquisition is slowed for these children, it is possible to obtain distinct and reliable snapshots of the phonology as it gradually unfolds over time and with treatment. In typical development, the rapid rate of acquisition precludes moment-by-moment sampling, and in fully developed systems, the rate of change is so protracted that it may take decades or more to discover the innovations.

Developmental phonological disorders offer further opportunities to explore individual differences because the primary paradigm used in experimental treatment manipulations is the single-subject design (e.g. McReynolds and Kearns 1983; Gierut 2008b). Unlike population models, single-subject designs require that each child serve as his or her own control, which is demonstrated through stability of performance at baseline relative to change in performance during and following treatment. Maturational issues

are controlled through the staggered instatement of treatment across sets of three or more children, such that time lags reflect opportunities for spontaneous development; when such development is not observed, this confirms that treatment itself was responsible for phonological change. Given the relatively small number of participants, generalizability to the population (and to phonological systems more generally) must be achieved through direct and systematic replications. As replications accumulate—by children exhibiting the same, and then again different, phonological patterns, and in manipulation of the same, and then again different, linguistic structures—paradigmatic constructs emerge. Treatment can thus be implemented as an experiment, wherein linguistic properties are manipulated as the independent variable and changes in a child's phonology are measured as the dependent variable, thereby establishing a causal relationship between treatment and phonological learning. Moreover, because treatment induces learning, it is possible to document changes in the structure of the sound system over time. The empirical evidence that is garnered from experimental and longitudinal examinations of the developing phonology may then be used to validate, challenge, and/or reform linguistic theory.

Many of the research questions that have been asked about disordered and typically developing phonologies are the same, resulting in a large literature that allows the two groups to be compared along specific dimensions. Some of those questions include: What is the composition and structure of children's phonetic/phonemic inventories (e.g. Stoel-Gammon 1985; Dinnsen 1992; Ingram 1992; Gierut et al. 1994; Cataño et al. 2009)? Is there a discernible developmental trajectory in which sounds/phonemic distinctions are acquired (e.g. Smit et al. 1990; Gierut et al. 1996)? Are some contexts within the word favored for the acquisition of particular sounds (e.g.
Ferguson 1978; Stoel-Gammon 1996; Bernhardt and Stemberger 2007; Inkelas and Rose 2007; Dinnsen and Farris-Trimble 2008b)? Do children perceive distinctions in advance of producing those distinctions (e.g. Locke 1980a, 1980b; McGregor and Schwartz 1992; Smolensky 1996b; Pater 2004)? Do children who seemingly merge target distinctions actually make subtle articulatory/acoustic differences (i.e. covert contrasts) that adult listeners fail to detect (e.g. Weismer et al. 1981; Weismer 1984; Forrest and Rockman 1988; Velleman 1988; Forrest et al. 1990; Forrest et al. 1994; Scobbie et al. 1997; Inkelas and Rose 2007)?

Studies that have addressed these questions have revealed much about the nature of developing phonologies generally, and especially about individual differences and cross-linguistic variation. However, as regards differences between normal and disordered phonologies, they say little more than that phonological disorders exhibit many of the same phenomena seen in typical development, implying that such disorders appear to reflect a delayed trajectory of acquisition.

Phonological theories have evolved over the years and have offered different perspectives for analyses that bear on fundamental properties of sound systems and that could serve to identify differences, if any, between disordered, typically developing, and fully developed phonologies. Two very different theoretical approaches have addressed these issues. One broadly defined approach draws on rule-based frameworks, such as Generative Phonology (e.g. Chomsky and Halle 1968; Kenstowicz 1994) and natural

Phonological Disorders   793 phonology (e.g. Donegan and Stampe 1979). Monographs with a focus on phonological disorders that have adopted some version of these rule-​based perspectives include Grunwell (1982), Edwards and Schriberg (1983), Elbert et al. (1984), Elbert and Gierut (1986), Ingram (1989), and Yavaş (1991). The other more recent approach draws on versions of the constraint-​based framework of Optimality Theory (e.g. Prince and Smolensky 2004). Monographs with a focus on disorders from this perspective include Bernhardt and Stemberger (1998) and Dinnsen and Gierut (2008). In what follows, we will see that these grammatical frameworks make very different claims about children’s underlying representations and the processes that relate those representations to their actual phonetic output. These different claims will then serve to highlight the mutual benefits for theory, development, and clinical intervention.

33.2 Rule-based Accounts

33.2.1 Relational Analyses

One of the main contributions of rule-based approaches to phonological acquisition has been the identification and characterization of commonly occurring error patterns in young children's early speech. These substitution patterns have been described as rules or processes and include, among others, Velar Fronting (/k/ > [t]), Deaffrication (/ʧ/ > [t]), Stopping (/f/ > [p]), Consonant Harmony ('duck' > [kʌk]), Final Consonant Omission ('dog' > [dɔ]), Final Devoicing ('bag' > [bæk]), Cluster Reduction (/pl/ > [p], /sp/ > [p]), and Gliding (/r/ > [w]). See Chapter 4 by Goad in this volume for more on children's substitution processes. Children with phonological disorders have also been found to exhibit these same error patterns, although the patterns tend to persist over a longer period of time (e.g. Grunwell 1982; Ingram 1989; Bernhardt and Stoel-Gammon 1996; Bernhardt and Stemberger 1998). Cross-linguistic studies of phonological disorders (e.g. Hua and Dodd 2006; Yavaş 2010 and references therein) reveal much the same picture, although some substitution patterns that might be considered unusual or rare for English have been found to be common for learners of other languages, for example Coronal Backing (/t/ > [k]) and Initial Consonant Omission ('soup' > [up]). Similarly, substitution patterns that are common for learners of English might be considered unusual or rare for other languages, for example Liquid Gliding (/r/ > [w]). Such cross-linguistic differences in error patterns are often attributed to frequency effects, functional load, and/or phonetic implementation differences in those languages (e.g. Pye et al. 1987; Ingram 1989; Cataño et al. 2009).

The general assumption in most of the accounts outlined in the previous paragraph has been that these substitution patterns in both typical and atypical development reflect rules or processes that operate on the children's correctly internalized

underlying representations. While these accounts employ fundamental phonological constructs such as underlying representations and rules, they depart in at least one important respect from conventional analyses of fully developed languages. That is, accounts of developing phonologies have largely been relational in nature, specifying a mapping or correspondence between the target pronunciation and the child's output system. The presumption has been that the child takes something close to the target pronunciation as the underlying representation of words. Associating target pronunciations with children's underlying representations can, however, result in highly abstract claims about the children's underlying representations that may not be supported by standard empirical evidence. Analyses of fully developed languages usually rely on observable evidence of phonemic contrasts and/or morpho-phonological alternations to support substantive claims about underlying representations. The same type of information is also presumed to aid learners in their acquisition of those representations. The analytical departure of developmental accounts from conventional practice places those accounts at the center of a long-running debate regarding the degree of abstractness to be tolerated in children's underlying representations and the rules that relate those representations to their phonetic outputs (e.g. Kiparsky 1968, 1976; Donegan and Stampe 1979).

The highly abstract character of underlying representations is most evident in accounts that posit sounds underlyingly that do not occur phonetically in the child's output. The absence of those sounds at the phonetic level must, in turn, be attributed to questionable context-free rules of absolute neutralization, precluding the possibility of an alternation. This problem can be illustrated by considering the common error pattern that replaces target velars with coronals in all contexts (e.g.
Ingram 1974; Smit 1993b; Bernhardt and Stoel-Gammon 1996). Consider, for example, the data in (1) from a child with a phonological disorder, Child 179 (age 4 years, 7 months) (Dinnsen 2008).

(1) Context-free Velar Fronting, Child 179 (age 4;7)

    a) Velars replaced by alveolar stops (Velar Fronting)
       [toʊm] ‘comb’     [doʊ] ‘girl’       [wɑt] ‘rock’
       [bæd] ‘bag’       [pɔtɪt̚] ‘pocket’   [fwɑdi] ‘froggie’

    b) Alveolar stops produced correctly
       [toʊz] ‘toes’     [doʊ] ‘door’       [baɪt] ‘bite’
       [mʌd] ‘mud’       [budi] ‘bootie’    [mʌdi] ‘muddy’

This error pattern would be described by a context-free phonological rule of Velar Fronting that took as its input target-appropriate velars as underlying representations, converting all of them to corresponding coronal stops at the phonetic level, as illustrated in the derivations in (2).

(2)  Velar Fronting and target-appropriate underlying representations

     Velar Fronting:  [dorsal] → [coronal] / …
     (Velars are replaced by coronals in all contexts.)

     Underlying         Velar Fronting    Phonetic
     /koʊm/ 'comb'      toʊm              [toʊm]
     /hʌɡ/ 'hug'        hʌd               [hʌd]
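The relational mapping in (2) can also be sketched procedurally. The following is a minimal illustration, not part of the original analysis, assuming a toy inventory in which only /k/ and /ɡ/ are dorsal stops:

```python
# Hypothetical sketch of the context-free Velar Fronting rule in (2):
# target (adult) underlying forms are mapped to the child's outputs by
# replacing every dorsal stop with its coronal counterpart.
FRONTING = {"k": "t", "ɡ": "d"}  # [dorsal] -> [coronal], all contexts

def velar_fronting(underlying: str) -> str:
    """Apply context-free Velar Fronting to an underlying form."""
    return "".join(FRONTING.get(seg, seg) for seg in underlying)

assert velar_fronting("koʊm") == "toʊm"  # 'comb'
assert velar_fronting("hʌɡ") == "hʌd"    # 'hug'
```

The mapping is total and context-free, which is precisely why it precludes any surface alternation that could motivate the underlying velars.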

Again, under this account, it is assumed that the child's underlying representations include velar stops. This is a highly abstract analysis, as velar stops never surface phonetically in the child's speech.

The abstractness controversy is no less an issue in accounts of error patterns that are restricted to certain contexts. For example, many children (normal and disordered) have been found to restrict Velar Fronting to word-initial position, producing velars correctly in other contexts (e.g. Bernhardt and Stemberger 1998; Inkelas and Rose 2007; Dinnsen and Farris-Trimble 2008b). This error pattern is exemplified by another child with a phonological disorder, Child 173 (age 5;11), as shown in (3) (Dinnsen and Farris-Trimble 2008b).

(3)  Word-initial Velar Fronting, Child 173 (age 5;11)
     a) Word-initial target /k/ realized as [t]
        [tɔb] 'cob'      [toʊm] 'comb'
     b) Word-initial target /t/ realized as [t]
        [toʊz] 'toes'    [tijʊ] 'tear'
     c) Word-final target /k/ realized as [k]
        [bʊk] 'book'     [bæk] 'back'
     d) Word-final target /t/ realized as [t]
        [but] 'boot'     [it̚] 'eat'

The derivations in (4) illustrate the different behavior of the more restricted Velar Fronting process in particular contexts, assuming underlying velars in all contexts of relevant words.

(4)  Word-initial Velar Fronting and target-appropriate underlying representations

     Initial Velar Fronting:  [dorsal] → [coronal] / #__
     (Velars are replaced by coronals word-initially.)

     Underlying         Initial Velar Fronting    Phonetic
     /koʊm/ 'comb'      toʊm                      [toʊm]
     /bʊk/ 'book'       —                         [bʊk]
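The contrast between (2) and (4) is the structural description of the rule: the substitution itself is the same, but it is now restricted to the word-initial environment. A minimal sketch, again with an assumed toy inventory:

```python
# Hypothetical sketch of the word-initial restriction in (4): the same
# dorsal -> coronal substitution, but only for the first segment (#__).
FRONTING = {"k": "t", "ɡ": "d"}  # [dorsal] -> [coronal]

def initial_velar_fronting(underlying: str) -> str:
    """Apply Velar Fronting only in word-initial position."""
    if underlying and underlying[0] in FRONTING:
        return FRONTING[underlying[0]] + underlying[1:]
    return underlying

assert initial_velar_fronting("koʊm") == "toʊm"  # 'comb'
assert initial_velar_fronting("bʊk") == "bʊk"    # 'book': final velar intact
```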

While the assumption of underlying velars in these cases might seem less abstract, especially given that the child produced velars phonetically in other contexts, the postulation of a velar consonant word-initially can be difficult to motivate, at least in terms of conventional evidence involving a morpho-phonological alternation, as is used in accounts of fully developed languages (cf. final devoicing in German: Tag [tak] 'day,' but Tage [taɡə] 'days'). The problem in English is that the addition of a prefix before a word-initial velar might not induce the desired alternation between a word-initial coronal and an intervocalic velar. This is due in large part to the general linguistic preference to reduce allomorphy, especially in early stages of language acquisition (e.g. Hayes 2004). More specifically, to motivate target-appropriate underlying representations in this case, we would want to see in the child's speech an alternation in target words that begin with a velar (e.g. 'comb' [toʊm] and 're-comb' [rikoʊm]) along with the absence of an alternation in target words that begin with a coronal (e.g. 'type' [taɪp] and 'retype' [ritaɪp]). Such evidence would establish that place of articulation among lingual consonants is indeed contrastive post-vocalically in the child's phonology, and that the contrast is neutralized by a rule word-initially. Put another way, when underlyingly distinct sounds come to appear in a context not affected by the rule (i.e. post-vocalically), those sounds will appear phonetically as postulated underlyingly. Unfortunately, this type of evidence is rarely presented in developmental studies or is otherwise unavailable. In the absence of conventional evidence in support of claims about underlying representations and independent of the production facts, many have looked to children's presumed perceptual abilities for supporting evidence (Smith 1973).
Though children's early speech discrimination abilities are generally assumed to be intact, it must be acknowledged that perceptual deficits have been observed for some children (e.g. Locke 1980a, 1980b; Rvachew and Jamieson 1989; Munson et al. 2005). Additionally, some children may not attend to the very same acoustic distinctions that are important to the adult (e.g. Nittrouer and Studdert-Kennedy 1987). Thus, claims about children's underlying representations based on presumed perceptual abilities must be regarded with caution. That is, a child might comprehend and distinguish between two spoken words, but this only establishes that the child recognizes some difference between them; it says nothing necessarily about the intended (adultlike) target distinction. Some have also attempted to find support for claims about underlying distinctions through instrumental phonetic analyses that compare a child's output productions for target sounds that are presumably merged by these rules of absolute neutralization. For example, while the process of Velar Fronting is generally understood to merge the lingual place distinction between coronals and velars in favor of a coronal, it has been shown that some children maintain a subtle acoustic and/or articulatory distinction between coronals that are derived from Velar Fronting and those that correspond with target coronals (e.g. Forrest et al. 1994; Gibbon 1999; Inkelas and Rose 2007). Such a finding can be taken as compelling evidence of an underlying distinction of some sort, but it forces a reconceptualization of the rule responsible for the error pattern. That is, instead of being a neutralization rule, Velar Fronting would in this instance be behaving as a

non-neutralizing phonetic implementation rule, signaling a covert contrast. The status of neutralization rules versus those that effect sub-phonemic differences has been shown to pose different challenges for both clinical intervention (e.g. Gierut 1986; Gierut et al. 1987) and second-language acquisition (e.g. Flege 1987). Instrumental phonetic analyses of this sort have yielded similar results for a variety of other error patterns, including Gliding, Initial Voicing, Final Devoicing, Stopping, Dentalization, Final Consonant Omission, and Cluster Reduction, among others (e.g. Weismer et al. 1981; Catts and Jensen 1983; Weismer 1984; Velleman 1988; Forrest et al. 1994; Barlow and Keare 2008). At the same time, it should also be noted that many of these same analyses of other children's error patterns have failed to identify phonetic distinctions among merged sounds. In such cases, it is, of course, always possible that distinctions were being made along dimensions that were not investigated by the analyst. The other option is that some children do, in fact, absolutely neutralize certain target distinctions. In either of these two circumstances, the postulation of underlying target distinctions remains abstract, lacking empirical support. In addition to the abstractness problem, the postulation of target-appropriate underlying representations in these relational accounts can, in some instances, result in claims that attribute to a child seemingly contradictory and inconsistent grammatical processes (e.g. Camarata and Gandour 1984; Gierut 1986; Williams and Dinnsen 1987). As an illustration, consider the forms in (5) from a child with a phonological disorder, Child NE (age 4;6). This child replaced some target velars with coronals (Velar Fronting, (5a)). However, the reverse also occurred with some target coronals being replaced by velars (Coronal Backing, (5b)).
In addition, velars and coronals were also sometimes produced correctly ((5c) and (5d)). These anomalies can be reconciled if we consider, independent of the target language, the child's distribution of these consonants relative to the following vowel. That is, coronals and velars occurred in complementary distribution, with coronals occurring before front vowels ((5a) and (5d)), while velars occurred before back vowels ((5b) and (5c)). The realization and distribution of lingual consonants in Child NE's phonology were, thus, entirely predictable and consistent properties of pronunciation. The target values for place of articulation played little or no role in the child's realization of these consonants.

(5)  Contradictory and inconsistent processes, Child NE (age 4;6)
     a) Velars replaced by alveolar stops (Velar Fronting)
        [te] 'cage'      [deʔ] 'gate'
     b) Coronals replaced by velar stops (Coronal Backing)
        [kɑ] 'Tom'       [ɡɑ] 'dog'
     c) Velars produced correctly
        [ko] 'comb'      [ɡoʔ] 'goat'
     d) Alveolars produced correctly
        [dɪʊ] 'deer'     [dɛ] 'dress'

While it might seem surprising that a child would arrive at this distributional generalization, given that coronals and velars contrast in English, the generalization does find some support in the phonotactic probabilities of English. That is, velars are approximately two times more likely than coronals to occur before a back vowel in English, and coronals are almost two times more likely than velars to occur before a front vowel. A similar distributional pattern has also been reported for other children (e.g. Fudge 1969; Braine 1974; Grunwell 1981; Stoel-Gammon 1983; Wolfe and Blocker 1990; Gierut et al. 1993; Davis and MacNeilage 1995; Levelt 1996; Tyler and Langsdale 1996; Fikkert and Levelt 2008), as well as for other fully developed languages (e.g. Clements 1990). Various other error patterns involving the complementary distribution of sounds have also been identified in both typical and atypical phonological development (e.g. Smith 1973; Camarata and Gandour 1984; Gierut 1986; Gierut and Champion 2000). Problems of the sort just described have led some rule-based analysts to move away from strictly relational accounts in favor of analyses that view the children's speech as an independent system, just as any generative phonologist would conduct a synchronic analysis of a previously unstudied language (e.g. Haas 1963; Dinnsen and Maxwell 1981; Weismer et al. 1981; Dinnsen 1984; Leonard and Brown 1984; Gierut 1985, 1986; Gierut et al. 1987; Williams and Dinnsen 1987). As we will see in the next section, independent rule-based analyses result in a rather different picture of disorders, one that includes the additional dimension of incorrectly internalized underlying representations as an explanation for some error patterns.

33.2.2 Independent Analyses

Returning to the case of Child NE in (5) above, we can reconsider the facts from the perspective of an independent (non-relational) analysis. The fact that coronals and velars occurred in complementary distribution would be taken as evidence of an allophonic rule that operated on underlying coronals, converting them to velars before back vowels at the phonetic level, as illustrated in (6). An important aspect of independent, rule-based accounts is that underlying representations incorporate just those properties of pronunciation that are idiosyncratic and/or learned. Child NE's underlying representations would, thus, be assumed to be simpler and more limited than those of the target language, at least in terms of available phonemes. That is, velars would be excluded from the phonemic inventory because they are more marked than coronals. Coronals and velars would be analyzed as allophones of the same coronal phoneme, and the phonetic appearance of any velar consonant would then come about from the exclusive application of the allophonic rule. This has the interesting consequence that velars would in some instances be produced correctly, but for the "wrong" reasons. That is, the target-appropriate realization of a velar before a back vowel (e.g. (5c)) would be derived from an incorrectly internalized underlying representation, namely a coronal, due to the application of an

allophonic rule that is not a rule of the target language. Formulating the rule to yield a velar before back vowels (rather than a coronal before front vowels) is consistent with the general tendency for allophonic rules to produce marked sounds (e.g. Houlihan and Iverson 1979), and velars are acknowledged to be marked relative to coronals. These different approaches to analysis make different claims about the nature of the child's problem. For example, the independent analysis would claim that the child needs to restructure his lexical representations to provide for the occurrence of velar consonant phonemes in all relevant words while also suppressing the allophonic rule. This means that the child must effect a "phonemic split" by reassociating each allophone to two separate phonemes, which has been shown to be an especially difficult problem for both first- and second-language learners to overcome (e.g. Lado 1957; Gierut 1986; Gierut and Champion 1999a). On the other hand, one of the contrasting clinical claims that emerges from a relational analysis of these facts is that the underlying representation of some words, specifically target words with velars occurring before back vowels, would not warrant clinical attention. The relational analysis might, then, underestimate the scope of the problem.

(6)  Allophonic rule and derivations for Child NE

     Coronal Backing:  [coronal] → [dorsal] / __ [V, +back]
     (Coronals are replaced by dorsals before back vowels.)

     Underlying       Coronal Backing    Phonetic
     /to/ 'comb'      ko                 [ko]
     /te/ 'cage'      —                  [te]
     /dɪʊ/ 'deer'     —                  [dɪʊ]
     /tɑ/ 'Tom'       kɑ                 [kɑ]
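The context-sensitive character of the allophonic rule in (6) can be sketched as follows. This is an illustrative reconstruction only; the segment and back-vowel classes are our own toy assumptions:

```python
# Hypothetical sketch of the allophonic rule in (6): underlying coronals
# surface as dorsals before back vowels; elsewhere they remain coronal.
BACKING = {"t": "k", "d": "ɡ"}   # [coronal] -> [dorsal]
BACK_VOWELS = set("ɑoɔuʊ")       # illustrative back-vowel set

def coronal_backing(underlying: str) -> str:
    """Apply Coronal Backing: coronal stop -> dorsal before a back vowel."""
    out = []
    for i, seg in enumerate(underlying):
        nxt = underlying[i + 1] if i + 1 < len(underlying) else ""
        out.append(BACKING[seg] if seg in BACKING and nxt in BACK_VOWELS else seg)
    return "".join(out)

assert coronal_backing("to") == "ko"    # 'comb'
assert coronal_backing("te") == "te"    # 'cage'
assert coronal_backing("tɑ") == "kɑ"    # 'Tom'
assert coronal_backing("dɪʊ") == "dɪʊ"  # 'deer'
```

Note that the rule alone generates every surface velar, so the underlying inventory can do without velar phonemes entirely, exactly as the independent analysis claims.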

The role that correctly and incorrectly internalized underlying representations can play in explaining individual differences is further illustrated by the case of two children with phonological disorders, both of whom exhibited the same common error pattern of Final Consonant Omission (e.g. Weismer et al. 1981; Dinnsen and Elbert 1984). Child A (age 7;2) evidenced a morpho-​phonological alternation between word-​ medial obstruents and word-​final null (e.g. [dɔɡi] ‘doggie’ ~ [dɔ] ‘dog’). Child C (age 3;10) evidenced no alternation, omitting the obstruent both medially and finally in morphologically-​related forms (e.g. [dai] ‘doggie’ ~ [dɔ] ‘dog’). Child A’s alternation would be taken as compelling evidence for the claim that a rule of Final Consonant Deletion operated on target-​appropriate underlying representations. The fact that the omitted obstruent actually occurred in a related form of the word when not in word-​final position provided positive, observable evidence about the underlying representation of such words, which is precisely the evidence that is appealed to in similar such accounts of fully developed systems. The rule’s restriction to word-​final position accounted for the non-​occurrence of the final obstruent in “dog” words. The rule was prevented from applying to the morpheme-​final obstruent in “doggie” words because

the obstruent was not in word-final position. The rule and derivations in (7) illustrate the account of Child A's consonantal alternations, assuming target-appropriate underlying representations.

(7)  Derivational account of Child A's error pattern

     Final-C Deletion:  [-son] → ø / __#
     (Word-final obstruents are deleted.)

     Underlying         Final-C Deletion    Phonetic
     /dɑɡ/ 'dog'        dɑ                  [dɑ]
     /dɑɡ-i/ 'doggie'   —                   [dɑɡi]

On the other hand, the absence of an alternation in the case of Child C fails to support target-appropriate underlying representations and would receive a rather different account. Child C would be claimed to differ from Child A by having internalized a more impoverished (sub)set of the target underlying representations. Child C's underlying representations would be restricted by a morpheme-structure condition that prohibited morpheme-final consonants. For discussion of morpheme-structure conditions in different models of the lexicon, see Spencer (1986), Menn and Matthei (1992), and McCarthy (1998). Inasmuch as this morpheme-structure condition holds at the level of the lexicon, a phonological rule would be superfluous. Child C's phonetic forms would thus be identical to his underlying representations.

(8)  Derivational account of Child C's error pattern

     Morpheme-structure condition: Morpheme-final obstruents are prohibited.

     Underlying      /dɑ/ 'dog'    /dɑ-i/ 'doggie'
     (no rule)       —             —
     Phonetic        [dɑ]          [dɑi]
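The contrast between the two accounts in (7) and (8) can be made concrete in a short sketch. The obstruent set and the lexicon entries below are illustrative assumptions, not the authors' formalism:

```python
# Hypothetical contrast of the accounts in (7) and (8).
OBSTRUENTS = set("ptkbdɡszfv")  # illustrative obstruent set

def final_c_deletion(underlying: str) -> str:
    """Child A: a rule deletes a word-final obstruent ([-son] -> 0 / __#)."""
    if underlying and underlying[-1] in OBSTRUENTS:
        return underlying[:-1]
    return underlying

# Child A: target-appropriate URs plus the deletion rule yield the alternation.
assert final_c_deletion("dɑɡ") == "dɑ"     # 'dog'
assert final_c_deletion("dɑɡi") == "dɑɡi"  # 'doggie': /ɡ/ is not word-final

# Child C: no rule at all; the URs already lack morpheme-final obstruents,
# so phonetic forms are identical to the underlying forms.
child_c_lexicon = {"dog": "dɑ", "doggie": "dɑi"}
assert child_c_lexicon["doggie"] == "dɑi"  # no medial obstruent to recover
```

The diagnostic difference is where the consonant "lives": in Child A's grammar it is underlyingly present and removed by rule; in Child C's lexicon it was never internalized.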

Such characterizations make different claims about the learning task that would confront children. For example, if Child A did indeed have the correct underlying representations, he would have simply needed to suppress the rule of Final Consonant Deletion. Child C, on the other hand, having no phonological rule per se, would have been faced with the task of elaborating the range of contrasts in his underlying representations to bring them into conformity with those of the target language (e.g. lexical restructuring). Presumed differences of this sort in children’s knowledge of underlying representations have been shown to play a role in learning. In a series of experimental treatment studies (e.g. Dinnsen and Elbert 1984; Gierut 1985; Gierut et al. 1987), children were taught sounds for which they had more or less knowledge relative to the target underlying distinctions, as determined from standard techniques of analysis within Generative Phonology (e.g. relying on evidence of alternations, contrast, and distribution). When

treatment was focused on underlying representations that were presumably internalized incorrectly (i.e. error patterns associated with the absolute nonoccurrence of a target sound or with the absence of an alternation), children learned those sounds and evidenced widespread generalization to other untreated sounds and contexts. However, when treatment was focused on underlying representations that were judged to be internalized correctly (i.e. error patterns associated with alternations), learning was more limited, with little to no generalization to untreated aspects of the phonology. This observation, highlighting the ease of acquiring the phonotactics of a language versus the difficulty of changing rules of the grammar, is consistent with other reports (e.g. Hammerly 1982; Flege 1987). Individual differences in children's learning patterns thus represented a novel source of evidence in support of presumed differences within and across children's underlying representations.

Summing to this point, there can be little doubt that the different types of rule-based analyses of phonological disorders have yielded non-trivial claims that have clinical implications for diagnosis, treatment, and the projection of learning. One of the most important findings from relational accounts is that many of the children's error patterns can be expressed as rules that involve a systematic correspondence between the target language and the child's repairs. One limitation of those accounts is that they often entail highly abstract underlying representations, the substance of which generally lacks empirical support, posing a problem for both the analyst and the child learner. Independent (non-relational) analyses of phonological disorders tend to fit better with conventional accounts of fully developed languages, at least in terms of the type of evidence required to support claims about underlying representations.
The general conclusion that emerges from independent accounts is that children’s error patterns can emerge from simpler, more restricted underlying representations and/​ or phonological processes (rules) that may not be consistent with those of the target language. Despite their long history and their many contributions to our understanding of fully developed and developing sound systems, rule-​based accounts are not without their own problems. Some of those problems relate directly to questions about young children’s developing phonologies. For example, why would children have processes in their phonology that are not evident in the language being learned? Other problems relate to questions that are relevant to both fully developed and developing sound systems. For example, why do many of the same processes recur cross-​linguistically? Also, why do different processes within and across languages (or children) often target the same phonological structures with different repairs? Questions of this sort motivated a shift away from rule-​based theories in favor of the constraint-​based approach of Optimality Theory (e.g. Prince and Smolensky 2004). The following section highlights some of the descriptive and experimental findings about phonological disorders that have emerged from and/​or contribute to evaluations of Optimality Theory.


33.3  Constraint-based Accounts

The constraint-based framework of Optimality Theory (e.g. Prince and Smolensky 2004) differs in several important respects from earlier rule-based approaches. First, there are no rules available to effect changes in the mapping from underlying to phonetic representations. Instead, phonological processes, specifically their targets and repairs, are accounted for by a language-specific ranking of universal, violable constraints. Constraints are of two fundamental types, namely markedness constraints and faithfulness constraints. Markedness constraints assign violations to output structures that are typologically marked, while faithfulness constraints assign violations to output candidates that differ from the input representation. Because constraints can conflict with one another, violations of higher-ranked constraints are deemed more serious. The output candidate that best satisfies the constraint hierarchy is selected as optimal. Ranking a markedness constraint over an antagonistic faithfulness constraint will generally have the effect of inducing a change in the mapping of an underlying representation to the corresponding phonetic representation by favoring an unmarked and unfaithful candidate over a more marked but faithful candidate. The second significant difference is that Optimality Theory assumes that input representations are universal. This follows from the principle of "Richness of the Base" (e.g. Smolensky 1996a; Prince and Smolensky 2004) and means that there can be no language-specific (or child-specific) restrictions on underlying representations. This is clearly contrary to the standard position of rule-based accounts, which maintains that underlying representations must be restricted on a language-specific basis.
Consequently, while some children might incorrectly internalize certain lexical items, their constraint hierarchies alone must account for the facts and, at the same time, allow for the possibility that the underlying representations could be as rich as necessary on universal grounds. This means that children’s underlying representations could be as rich as those of the target language, providing for a relational account of the substitution errors.

33.3.1 Exclusion of Sounds from the Phonetic Inventory and Substitution Patterns

Some of the basic elements of Optimality Theory can be illustrated by reconsidering Child 179's context-free error pattern of Velar Fronting (see the data in (1) above). This error pattern involves the exclusion of a target sound from the child's inventory and would, in terms of an independent rule-based analysis, have been achieved by a restriction on that child's underlying representations. However, within Optimality Theory, Richness of the Base precludes such a characterization. Non-occurring sounds would instead be allowed to occur as possible inputs to the child's underlying

representations, and an undominated markedness constraint that militates against velar consonants would result in their exclusion from the phonetic inventory. The relevant markedness and faithfulness constraints for this error pattern are given in (9) along with two alternative tableaux for the same target word. The tableaux show that the ranking of *k over ID[place] converges on the selection of the observed output no matter what might be assumed about the underlying representation of target velars.

(9)  Inventory restrictions and substitution patterns

     Constraints
     (a) Markedness    *k:         Dorsal consonants are banned
     (b) Faithfulness  ID[place]:  Corresponding segments must have identical place features

     /koʊm/ 'comb'   |  *k  |  ID[place]
     a.     koʊm     |  *!  |
     b. ☞   toʊm     |      |  *

     /toʊm/ 'comb'   |  *k  |  ID[place]
     a.     koʊm     |  *!  |
     b. ☞   toʊm     |      |  *
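The evaluation logic behind tableaux like those in (9) can be sketched as a small procedure: candidates are compared on their violation profiles under the ranked constraints, and the best profile wins. The candidate set and violation-counting below are our own illustrative assumptions, not the chapter's formal proposal:

```python
# Minimal OT evaluation sketch for (9): rank *k over ID[place] and pick
# the candidate with the lexicographically smallest violation profile.
def is_dorsal(seg):
    return seg in "kɡ"  # toy dorsal class

def star_k(_inp, out):
    """Markedness *k: one violation per dorsal consonant in the output."""
    return sum(is_dorsal(s) for s in out)

def ident_place(inp, out):
    """Faithfulness ID[place]: violations where corresponding segments
    differ in (toy) place class; assumes equal-length candidates."""
    return sum(is_dorsal(a) != is_dorsal(b) for a, b in zip(inp, out))

def evaluate(inp, candidates, ranking):
    """Return the optimal candidate under the ranked constraint list."""
    return min(candidates, key=lambda c: [con(inp, c) for con in ranking])

# Either assumed input converges on the attested [toʊm]:
assert evaluate("koʊm", ["koʊm", "toʊm"], [star_k, ident_place]) == "toʊm"
assert evaluate("toʊm", ["koʊm", "toʊm"], [star_k, ident_place]) == "toʊm"
```

The two assertions restate the point of the paired tableaux: under *k >> ID[place], the observed output is selected whatever is assumed about the input, which is exactly what Richness of the Base requires.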

The eventual emergence of velar consonants and other new sounds in the child’s phonetic inventory would depend on the availability to the child of positive evidence to motivate the demotion of the markedness constraints below antagonistic faithfulness constraints. (See Chapter 30 by Jarosz in this volume for more on learning algorithms in constraint-​based theories.) A similar explanation could be provided for other phenomena observed in phonological development, such as Stopping or Gliding, which would be attributed to a markedness constraint against fricatives or liquids (*Fricative or *Liquid, respectively) outranking faithfulness to manner features. In both cases, demotion of the markedness constraints below the relevant faithfulness constraints would allow for the missing sound classes to emerge in the child’s sound system. Additionally, when sounds, sound classes, or phonological structures participate in implicational relationships associated with typological universals or harmonic scales, treatment aimed at the most marked of these generally results in the most widespread generalization to other untreated sounds and structures (e.g. Dinnsen et al. 1990; Dinnsen et al. 1992; Tyler and Figurski 1994; Gierut 2008a; cf. Rvachew and Nowak 2001; Rvachew and Bernhardt 2010). For example, an implicational relationship between liquids and fricatives has been observed to hold across many languages

such that the occurrence of liquids in the inventory implies the occurrence of fricatives (Dinnsen et al. 1990; Tyler and Figurski 1994). In optimality theoretic terms, this implicational relationship would be achieved by a fixed ranking of *Liquid over *Fricative. For a child who exhibits both Stopping and Gliding, the suggestion would be to focus treatment on liquids, because such treatment would be expected to cause both markedness constraints *Liquid and *Fricative to be demoted below the antagonistic faithfulness constraint. Fricatives would, thus, be predicted to emerge without directly treating that class of sounds. Treatment on fricatives, on the other hand, would demote only *Fricative, while *Liquid would remain highly ranked. The Gliding error pattern would, thus, persist and require an additional round of treatment aimed at liquids.

33.3.2 Error Patterns Involving Complementary Distribution of Sounds

Optimality Theory provides a straightforward account of children's various error patterns involving the complementary distribution of sounds. To illustrate, recall the data in (5) from Child NE, who produced coronals and velars in complementary distribution, resulting in seemingly contradictory and inconsistent error patterns. An optimality theoretic account of these facts is sketched in (10). Consistent with accounts of allophonic phenomena in fully developed languages, a context-sensitive markedness constraint (*t/[back]) must be ranked over a context-free markedness constraint (*k), which in turn must be ranked over a faithfulness constraint (ID[place]). Moreover, the tableaux in (10) assume target-appropriate underlying representations for the initial consonants of these words (as expected under Richness of the Base), but the constraint hierarchy essentially guarantees that the attested outputs are selected as optimal, no matter what might be assumed about the child's underlying representations.

(10)  Account of complementary distribution

      Constraints
      (a) Markedness    *t/[back]:  Coronal consonants are banned before back vowels
                        *k:         Dorsal consonants are banned
      (b) Faithfulness  ID[place]:  Corresponding segments must have identical place features

      Ranking: *t/[back] >> *k >> ID[place]

      /tɑm/ 'Tom'     |  *t/[back]  |  *k  |  ID[place]
      a.     tɑ       |  *!         |      |
      b. ☞   kɑ       |             |  *   |  *

      /ɡet/ 'gate'    |  *t/[back]  |  *k  |  ID[place]
      a.     ɡeʔ      |             |  *!  |
      b. ☞   deʔ      |             |      |  *

      /koʊm/ 'comb'   |  *t/[back]  |  *k  |  ID[place]
      a. ☞   ko       |             |  *   |
      b.     to       |  *!         |      |  *

      /dir/ 'deer'    |  *t/[back]  |  *k  |  ID[place]
      a. ☞   dɪʊ      |             |      |
      b.     kɪʊ      |             |  *!  |  *

Optimality theoretic accounts of this sort also help to explain why children with such error patterns might find it difficult to effect a phonemic split. That is, the complete suppression of the error pattern requires that both markedness constraints be demoted below the antagonistic faithfulness constraint. For this to happen, the child must recognize two important facts about English. One of those critical facts that Child NE must take note of is that coronals can and do occur before back vowels. Recognition of this fact alone would, however, not completely solve the problem because it would only motivate the demotion of *t/​[back] below ID[place], leaving *k undominated with the consequence that velars would no longer occur in the phonetic inventory. The other fact, then, that the child must take note of is that velars can and do occur (before all types of vowels). Recognition of this fact should motivate the demotion of *k below ID[place], allowing velars to emerge phonetically from target-​appropriate underlying representations that include velars. Optimality theoretic analyses of this sort can be used to inform clinical treatment by identifying the most beneficial treatment stimuli for presentation to the child. For example, Child NE’s treatment might involve presentation of word pairs in which coronals occur before back vowels and velars occur before front vowels (e.g. “toe” versus “key”). Such word pairs would be simultaneously targeting both markedness constraints for demotion.
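The three-constraint hierarchy in (10) can be sketched in the same style as before. The candidate sets, the simplified CV candidate forms, and the toy segment classes below are illustrative assumptions:

```python
# Sketch of the ranking in (10): a context-sensitive markedness constraint
# (*t/[back]) dominates a context-free one (*k), which dominates ID[place].
DORSALS, CORONALS, BACK_V = set("kɡ"), set("td"), set("ɑoɔuʊ")

def star_t_back(_inp, out):
    """*t/[back]: a coronal consonant before a back vowel is banned."""
    return sum(a in CORONALS and b in BACK_V for a, b in zip(out, out[1:]))

def star_k(_inp, out):
    """*k: dorsal consonants are banned."""
    return sum(s in DORSALS for s in out)

def ident_place(inp, out):
    """ID[place]: the initial consonant must match in (toy) place class."""
    return int((inp[0] in DORSALS) != (out[0] in DORSALS))

def evaluate(inp, candidates, ranking):
    """Optimal candidate = lexicographically smallest violation profile."""
    return min(candidates, key=lambda c: [con(inp, c) for con in ranking])

RANKING = [star_t_back, star_k, ident_place]
assert evaluate("tɑ", ["tɑ", "kɑ"], RANKING) == "kɑ"  # 'Tom': backing wins
assert evaluate("ɡe", ["ɡe", "de"], RANKING) == "de"  # 'gate': fronting wins
assert evaluate("ko", ["ko", "to"], RANKING) == "ko"  # 'comb': velar survives
assert evaluate("dɪ", ["dɪ", "kɪ"], RANKING) == "dɪ"  # 'deer': coronal survives
```

A single fixed hierarchy thus derives both "contradictory" substitution patterns and both classes of correct productions, which is the point of the complementary-distribution analysis.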


33.3.3 Individual Differences and Morpho-​phonological Alternations The constraint-​based framework of Optimality Theory would also approach the facts about Final Consonant Omission for Child A  and Child C (introduced in section 33.2.2) from a very different perspective. Consistent with Richness of the Base, Child A and Child C would be claimed to have the same correctly internalized underlying representations and a highly ranked markedness constraint banning coda consonants (NoCoda), which in turn dominates the antagonistic faithfulness constraint prohibiting deletion (Max). The tableau in (11) illustrates the account of “dog” words for both children. (11)  Final Consonant Omission    (a) Markedness NoCoda: Coda consonants are banned    (b) Faithfulness Max: Input segments must have corresponding output segments (no deletion) Ranking: NoCoda >> Max /​dɑɡ/​

NoCoda Max

a.

[dɑɡ]

b.  ☞

[dɑ]

*! *

The difference in the two children’s hierarchies would relate to the ranking of an output-​to-​output correspondence constraint (BA-​Faith), which demands identity between the base word and the affixed form of those words (e.g. Benua 1997). Markedness constraints and output-​to-​output correspondence constraints have been argued to outrank faithfulness constraints in the initial-​state (e.g. Smolensky 1996a; McCarthy 1998; Hayes 2004; Prince and Tesar 2004). Child C reflects that default ranking with the error pattern of the base word being carried over into the derived form of those words. Child A’s alternating outputs suggest that BA-​Faith had been demoted below Max. That demotion would presumably have been based on positive evidence, namely the child’s observation that consonants occur between vowels in relevant words. (12)   Different rankings of the output-​to-​output correspondence constraint    (a) Output-​to-​output correspondence constraint     BA-​ Faith: The base of derived and non-​derived words must be identical

Phonological Disorders   807

     Child C: BA-Faith, NoCoda >> Max

        /dɑɡ-i/  Base: [dɑ]   BA-Faith   NoCoda   Max
     a. ☞  [dɑ-i]                                   *
     b.    [dɑɡ-i]                *!

     Child A: NoCoda >> Max >> BA-Faith

        /dɑɡ-i/  Base: [dɑ]   NoCoda   Max   BA-Faith
     a.    [dɑ-i]                        *!
     b. ☞  [dɑɡ-i]                                *
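On this account, the only difference between the two children is where BA-Faith sits in the hierarchy. Under the simplifying assumption (ours, for illustration) that BA-Faith can be checked by string comparison against the memorized base [dɑ], both rankings can be run through the same strict-domination evaluator:

```python
# Hypothetical sketch: strict-domination evaluation with an output-output
# constraint (BA-Faith); its position in the hierarchy derives the two
# children's different treatments of /dɑɡ-i/ 'doggy'.

def strict_eval(candidates, ranking):
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))

base = "dɑ"                       # both children's base form of 'dog'
cands = ["dɑi", "dɑɡi"]           # affixed candidates for /dɑɡ-i/

no_coda  = lambda c: 0            # neither candidate has a coda ([ɡ] is medial)
max_io   = lambda c: len("dɑɡi") - len(c)
# BA-Faith (crude encoding): output = memorized base + one affix vowel
ba_faith = lambda c: 0 if c.startswith(base) and len(c) == len(base) + 1 else 1

child_c = [ba_faith, no_coda, max_io]   # BA-Faith, NoCoda >> Max
child_a = [no_coda, max_io, ba_faith]   # NoCoda >> Max >> BA-Faith
print(strict_eval(cands, child_c))      # -> dɑi  (error carried into 'doggy')
print(strict_eval(cands, child_a))      # -> dɑɡi (alternation: [ɡ] reappears)
```

The point of the exercise is that nothing about the underlying representations changes between the two runs; only the position of one constraint does.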

Optimality theoretic accounts of this sort also offer an explanation for the individual learning patterns that had previously been attributed to differences in children's knowledge of underlying representations. Let us reconsider in these newer terms the treatment and subsequent learning patterns of Child A and Child C (Dinnsen and Elbert 1984). Both children were taught word-final consonants, compelling the demotion of NoCoda. In the case of Child C, the suppression of Final Consonant Deletion and the sustained dominance of BA-Faith resulted in the occurrence of final consonants, yielding correctly realized base forms of words, which thereby allowed the corresponding word-medial consonants to be realized without directly treating that context. The results of treatment for Child A were more limited because BA-Faith had already been demoted on its own (naturally) prior to treatment.

Summing up to this point, the general conclusion that emerges from these optimality theoretic accounts of phonological disorders is that children's underlying representations do not need to be restricted on a child-specific basis. Also, seemingly anomalous phenomena follow naturally from the same principles and mechanisms relevant to fully developed languages. This has the consequence of shifting the focus more sharply onto the nature of the children's constraints and constraint hierarchies. In what follows, we turn attention to error patterns that involve consonant clusters and "conspiracies," bearing directly on the substance of those hierarchies, with mutual benefits for theory and development.
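Treatment that presents the child with a correct form such as [dɑɡ] is, on this view, evidence that forces NoCoda down the hierarchy. A rough Python sketch in the spirit of error-driven constraint demotion (Tesar and Smolensky's algorithm is the obvious reference point; this simplified version is ours, and it assumes that at least one constraint prefers the observed winner):

```python
# Rough sketch of error-driven constraint demotion: when the grammar picks
# the wrong output, every constraint preferring that wrong form is demoted
# to just below the highest-ranked constraint preferring the right one.

def demote(ranking, winner_viols, loser_viols):
    """ranking: list of constraint names, highest-ranked first.
    winner_viols / loser_viols: dict name -> violation count for the
    observed (correct) form and the grammar's current (wrong) output."""
    winner_prefs = [c for c in ranking if winner_viols[c] < loser_viols[c]]
    pivot = ranking.index(winner_prefs[0])        # highest winner-preferrer
    for c in [c for c in ranking[:pivot] if loser_viols[c] < winner_viols[c]]:
        ranking.remove(c)                         # demote below the pivot
        ranking.insert(ranking.index(winner_prefs[0]) + 1, c)
    return ranking

# Treatment shows [dɑɡ] is the target, but the child's grammar outputs [dɑ]:
ranking = ["NoCoda", "Max"]
winner = {"NoCoda": 1, "Max": 0}   # violations incurred by [dɑɡ]
loser  = {"NoCoda": 0, "Max": 1}   # violations incurred by [dɑ]
print(demote(ranking, winner, loser))   # -> ['Max', 'NoCoda']
```

One demotion step yields the target ranking Max >> NoCoda, mirroring the effect that treatment on final consonants is claimed to have on both children's hierarchies.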

33.3.4 Consonant Clusters

One particular aspect of phonological acquisition that has attracted considerable descriptive, theoretical, and experimental attention is children's acquisition of consonant clusters. It has long been known that clusters pose production problems for children with normal and disordered phonologies, resulting in a variety of error patterns

808    Dinnsen, Barlow, and Gierut (e.g. Grunwell 1982; Ingram 1989; Chin and Dinnsen 1992; Smit 1993b; Bernhardt and Stemberger 1998). One of the most common error patterns for both groups of children is to reduce the cluster to the less sonorous segment of the cluster, generally an obstruent (e.g. Barlow 1997, 2001; Ohala 1999; Pater and Barlow 2003; Gnanadesikan 1995/2004). Despite observed individual differences across children, optimality theoretic accounts reveal comparable constraint hierarchies, which entail an intricate interplay of constraints involving syllable structure, segmental inventories, conspiracies, and harmonic scales, especially with regard to sonority. We highlight a few of the descriptive studies that have focused on clusters in disordered phonologies, with novel findings brought to bear on theoretical issues, and then turn attention to some of the findings that have emerged from experimental treatment studies.

33.3.4.1 Descriptive Studies

It is widely held that true onset clusters are governed by the sonority sequencing principle and minimal distance considerations (e.g. Selkirk 1982; Clements 1990; Kenstowicz 1994). This means that the sequence of segments in an onset cluster tends to rise in sonority, and that a small sonority difference between segments in a cluster implies the occurrence in that language of other clusters with larger sonority differences. For example, assuming the sonority scale in (13), the onset cluster /kl/ would rise in sonority and would have a smaller sonority difference compared to /kw/. Consequently, the occurrence of relatively marked clusters such as /kl/ should imply the occurrence of less marked onsets with an obstruent plus glide, but not vice versa.

(13) Sonority scale
     Obstruents >> Nasals >> Liquids >> Glides
     (low sonority → high sonority)

/s/+obstruent clusters clearly do not comply with these principles and have been argued for many languages to behave differently on other grounds, motivating an alternate structural representation. One proposal associates /s/ to the syllable as an adjunct and the following obstruent alone to the onset (e.g. Davis 1990; Barlow 1997). Other types of /s/ clusters with a following nasal, liquid, or glide do at least rise in sonority, although the sonority difference between /s/ and a nasal is smaller than is otherwise permitted in English. This makes the status of /s/+sonorant clusters, whether adjuncts or true clusters, somewhat ambiguous. At the very least, the small sonority difference associated with /s/+nasal clusters suggests that larger sonority differences should occur, and they obviously do in English. However, the occurrence of /s/+nasal clusters raises the question of why other stop+nasal clusters with larger sonority differences do not occur.
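Under the scale in (13), sonority distance is simple arithmetic over class ranks. A hypothetical sketch (the integer values are arbitrary; only their ordering matters, and the segment-to-class table is an illustrative assumption):

```python
# Sonority distance between the two segments of an onset cluster,
# using arbitrary integer ranks that preserve the order in (13).

SONORITY = {"obstruent": 0, "nasal": 1, "liquid": 2, "glide": 3}
CLASS = {"k": "obstruent", "s": "obstruent", "n": "nasal",
         "l": "liquid", "w": "glide"}

def sonority_distance(cluster):
    c1, c2 = cluster
    return SONORITY[CLASS[c2]] - SONORITY[CLASS[c1]]

print(sonority_distance("kl"))  # -> 2
print(sonority_distance("kw"))  # -> 3  (larger distance: less marked)
print(sonority_distance("sn"))  # -> 1  (small distance: marked in English)
```

A positive distance corresponds to a sonority rise; the smaller the positive value, the more marked the cluster on the minimal-distance view.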
The more interesting developmental and theoretically relevant question is: Do children acquire /​s/​+sonorant clusters in conformity with the predicted markedness of these clusters, especially as might follow in Optimality Theory from a fixed ranking of constraints? Farris-​Trimble and Gierut (2008) addressed this question in a cross-​sectional and longitudinal investigation of 110 children with phonological disorders (age 3;0 to 7;4).

Eighty of the 110 children presented with no /s/ clusters. Of the remaining 30 children, 26 presented with /s/+nasal clusters, 7 presented with /s/+liquid clusters, and 22 presented with /s/+glide clusters. Overall, /s/+nasal clusters were produced with the greatest frequency of occurrence across children, closely followed by /s/+glide clusters. Both were produced with much greater frequency than /s/+liquid clusters. These cross-sectional results are suggestive of a markedness gap in children's inventories of /s/+sonorant clusters, with the most marked /s/+nasal and least marked /s/+glide clusters being acquired without /s/+liquid clusters, which are presumably of intermediate markedness, at least as far as sonority distance is concerned. This, of course, assumes that the /s/+sonorant clusters constitute a homogeneous class. Alternatively, it could be that some /s/+sonorant clusters are represented with an adjunct, while the others would be represented as true clusters. The longitudinal development of each of the 30 children who presented with some /s/ cluster types further showed that more than a third acquired /s/+nasal clusters and /s/+glide clusters, but not /s/+liquid clusters. While some of these children excluded liquid consonants altogether from their phonetic inventories, it is noteworthy that several others did produce singleton liquids, but not /sl/ clusters. Given standard assumptions about fixed rankings of constraints within Optimality Theory (e.g. Prince and Smolensky 2004), gaps of this sort are not expected. Farris-Trimble and Gierut showed, however, that gapped inventories follow quite naturally if faithfulness to the marked is incorporated into the theory, as is customary in Stringency Theory (e.g. de Lacy 2002, 2006). Stringency Theory maintains that a markedness constraint can ban a particular structure, but that same constraint must also ban every related structure that is more marked.
On the other hand, stringently formulated faithfulness constraints can preserve any given structure, but must also preserve every related structure that is more marked. Because stringently formulated constraints are freely permutable, they can be interleaved with one another in the hierarchy to yield gapped inventories of precisely the type documented for /s/ clusters.

(14) Gapped inventory
     (a) Stringent markedness constraint
         *sn-sl: Initial s+nasal and s+liquid clusters are banned
     (b) Stringent faithfulness constraints
         ID[sn]: Corresponding segments in an s+nasal cluster must be identical
         ID[sn-sl]: Corresponding segments in an s+nasal or s+liquid cluster must be identical

        /sneɪk/ 'snake'   ID[sn]   *sn-sl   ID[sn-sl]
     a. ☞  sneɪk                      *
     b.    sleɪk             *!       *          *
     c.    sweɪk             *!                  *

        /slip/ 'sleep'    ID[sn]   *sn-sl   ID[sn-sl]
     a.    slip                       *!
     b.    snip                       *!         *
     c. ☞  swip                                  *

        /swit/ 'sweet'    ID[sn]   *sn-sl   ID[sn-sl]
     a. ☞  swit
     b.    snit                       *!
     c.    slit                       *!
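The stringency account can be made computationally concrete. In the sketch below (an illustrative implementation of ours, not Farris-Trimble and Gierut's), each constraint in (14) is coded as a check on the first two segments, and the interleaved ranking ID[sn] >> *sn-sl >> ID[sn-sl] derives the gapped inventory: /sn/ surfaces faithfully, /sl/ is repaired to [sw], and /sw/ is untouched.

```python
# Stringent constraints evaluated by strict domination: violation tuples
# are ordered ID[sn] >> *sn-sl >> ID[sn-sl], as in tableau (14).

def viols(inp, out):
    cl_in, cl_out = inp[:2], out[:2]
    changed = cl_in != cl_out
    return (
        1 if changed and cl_in == "sn" else 0,          # ID[sn]
        1 if cl_out in ("sn", "sl") else 0,             # *sn-sl
        1 if changed and cl_in in ("sn", "sl") else 0,  # ID[sn-sl]
    )

def winner(inp, candidates):
    return min(candidates, key=lambda out: viols(inp, out))

print(winner("sneɪk", ["sneɪk", "sleɪk", "sweɪk"]))  # -> sneɪk
print(winner("slip",  ["slip", "snip", "swip"]))     # -> swip
print(winner("swit",  ["swit", "snit", "slit"]))     # -> swit
```

The gap falls out because the faithfulness constraint protecting the most marked cluster (/sn/) outranks the markedness constraint, while the more general faithfulness constraint (also protecting /sl/) is ranked below it.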

Related to these findings, it is also noteworthy that Farris(-Trimble) and Gierut (2005) explored the input characteristics of word frequency and neighborhood density of English words with /s/ clusters. They found that the /sl/ cluster occurred in words of greater frequency than those containing the /sn/ cluster. Moreover, the /sl/ cluster occurred in words with the highest density compared to the density of words that contained all the other /s/ clusters. The observed gap in /s/ clusters thus runs counter to standard expectations about the role of the input.

Yavaş (2010) summarizes the findings from another series of descriptive studies devoted to cross-linguistic acquisition of /s/ clusters by children with phonological disorders (including English, Hebrew, Spanish, Dutch, and Croatian). Those findings provide strong support for Pater and Barlow's (2003) factorial typology and the common tendency for /s/+nasal clusters to reduce to the nasal. Perhaps more generally, reduction of /s/+noncontinuant clusters typically involves retention of the second consonant. For comparable findings from cross-linguistic studies of typical development, see Yavaş et al. (2008).

A different issue relating to children's acquisition of onset clusters has been raised by the intriguing claim that the second consonant of the onset cluster depends on the occurrence of that same consonant in a coda (e.g. Baertsch 2002). Baertsch's proposal, which is cast in optimality theoretic terms and based largely on fully developed languages, provides a principled explanation for why onset clusters would depend on the occurrence of coda consonants. This Split-Margin Approach to syllable structure employs a complex set of locally conjoined markedness constraints based on the fixed peak and margin hierarchies of Prince and Smolensky (2004).
In terms of acquisition, this would mean, for example, that a child would need to acquire /​l/​in a coda before an onset cluster such as /​kl/​could be acquired. The plausibility of this claim for acquisition is supported by those developmental studies that have shown that children acquire CVC syllables prior to onset clusters (e.g. Fikkert 1994; Lleó and Prinz 1996; Levelt et al. 2000). However, Bernhardt and Stemberger (1998) have suggested that children with

phonological disorders differ from those with typical development by acquiring onset clusters before acquiring coda consonants. A first systematic investigation of this issue with a focus on children with disorders appeared in a descriptive study by Barlow and Gierut (2008). For the purposes of that study, 16 children with phonological disorders (age 3;0 to 8;6) who produced consonant plus liquid (CLV) clusters were identified. It was then determined whether these children also produced coda liquids (CVL). The findings were mixed at best and revealed the importance of specifying the criteria for judging a structure (e.g. cluster or coda) to be acquired or not (see Sander 1972 for arguments). No matter how the data were analyzed (i.e. combined independent and relational analyses, independent analyses alone, or relational analyses alone), the Split Margin Approach received only partial support. More specifically, using the most conservative (strict) criterion, as few as 35 percent of the grammars showed support for the Split Margin proposal. Employing a more generous analysis criterion, one which defined acquisition as simply a two-time occurrence of the structures in question, it was found that as many as 68 percent of the grammars supported the claim that CLV clusters imply the occurrence of CVL. As Barlow and Gierut (2008) note, the children who failed to comply with the Split Margin hypothesis by producing clusters without a corresponding coda consonant may not represent true counterexamples to the claim because they may have represented their putative onset clusters as complex (branching) segments, similar to affricates (e.g. Menyuk 1972; Barton et al. 1980; Barlow and Dinnsen 1998; Gierut and O'Connor 2002). We suspect that the rigorous and systematic application of criteria similar to those employed in Barlow and Gierut's study would yield comparable results for typical development or even for fully developed languages.
Thus, it may be premature to consider the relationship (or lack thereof) between onset clusters and coda consonants as a defining property of a disordered phonology.

One of the earliest and most serious problems for rule-based theories was the discovery in fully developed languages of phonological conspiracies, that is, phenomena involving several processes that are structurally quite different from one another, but that all function together in a language to achieve the same end (e.g. Kisseberth 1970; Kiparsky 1976). The problem was that rule-based theories provided no straightforward way to express the phonotactic generalization or to relate the rules that participated in the conspiracy. Optimality Theory, on the other hand, does capture such phonotactic generalizations and can generally account for the different repairs by a characteristic hierarchy involving at least one highly ranked markedness constraint, which dominates two or more crucially ranked faithfulness constraints (e.g. Kager 1999a; Baković 2000b; McCarthy 2002). The default ranking of markedness over faithfulness (e.g. Smolensky 1996a) begins to provide some of the essentials of a conspiracy, making young children's developing phonologies an especially fruitful venue for the investigation of conspiracies. While the conspiracy problem was also recognized early on in accounts of young children's developing phonologies (e.g. Smith 1973; Menn 2004), it was not until the advent of Optimality Theory that acquisition researchers were able to return to the problem with a new and insightful solution. Conspiracies do in fact occur in

developing phonologies. For example, a conspiracy identified by Łukaszewicz (2007) occurred in the phonology of a typically developing child (age 4;0–4;4), who was acquiring Polish as her first language. This child employed four different strategies for reducing onset clusters: deletion and coalescence (both found in word-initial clusters), and metathesis and gemination (found in word-medial clusters). Łukaszewicz argued that each of these strategies was a way to satisfy a highly ranked markedness constraint against onset clusters (*ComplexOnset). The interaction between this constraint and other faithfulness constraints determined which strategy the child employed as a repair. Children with phonological disorders have also been found to exhibit conspiracies (e.g. Pater and Barlow 2003; Dinnsen and Farris-Trimble 2008a; Dinnsen 2011). These conspiracies take on added significance because of their clinical implications in the design of treatment and the projection of learning. For example, Pater and Barlow (2003) identified a conspiracy in the sound system of a child with a phonological disorder, Child LP65 (age 3;8). The conspiracy involved different repairs to produce outputs that satisfied a highly ranked markedness constraint banning fricatives (*Fricative). Singleton fricatives were produced as stops (e.g. 'sun' > [tʌn]), and fricatives in clusters were deleted (e.g. 'swim' > [wɪm]). As shown in (15), the Stopping of singleton fricatives was achieved by ranking *Fricative over the faithfulness constraint against deletion (Max), which in turn was ranked over the faithfulness constraint that demanded identity in manner features (ID[continuant]). In fricative+sonorant clusters, LP65 deleted the fricative. Deletion was compelled by another highly ranked markedness constraint that banned clusters (*ComplexOnset). The important point is that it was the fricative that was deleted rather than the sonorant.
This was achieved by ranking *Fricative over the hierarchy of markedness constraints that govern onsets in terms of sonority (i.e. *Glide-Onset). Thus, even a glide was a better onset than a fricative in this case, despite the general cross-linguistic tendency for onset cluster reduction to favor the preservation of the obstruent (e.g. Pater and Barlow 2003). The ranking of ID[continuant] over *Glide-Onset also explained why the fricative in a cluster did not become a stop.

(15) Conspiracy to avoid fricatives
     Constraints:
     (a) Markedness
         *Fricative: Fricatives are banned
         *Complex-Onset: Complex (branching) onsets are banned
         *Glide-Onset: Glides are banned in onsets
     (b) Faithfulness
         Max: Input segments must have corresponding output segments (no deletion)

         ID[cont]: Corresponding input and output segments must have the same value for the feature [continuant]

        /sʌn/ 'sun'    *Fric   *Comp-Onset   Max   ID[cont]   *Glide-Onset
     a.    sʌn           *!
     b. ☞  tʌn                                         *
     c.    ʌn                                  *!

        /swɪm/ 'swim'  *Fric   *Comp-Onset   Max   ID[cont]   *Glide-Onset
     a.    swɪm          *!         *                               *
     b.    twɪm                     *!                  *           *
     c. ☞  wɪm                                  *                   *
     d.    sɪm           *!                     *
     e.    tɪm                                  *       *!
The identification of conspiracies in disordered phonologies and their characterization within Optimality Theory have important clinical implications. Conspiracies pose special opportunities and special challenges for treatment. Most importantly, conspiracies reveal that, while error patterns may differ, some of those error patterns are in fact related and have the same source. Additionally, conspiracies in fully developed languages have been assumed to represent stable states, largely because they result in transparent (surface-true) generalizations (e.g. Kiparsky 1976). Thus, we might expect conspiracies to be resistant to clinical treatment. Similarly, to fully eradicate the different error patterns associated with a conspiracy, the prediction would be that it is necessary to first target for treatment those structures that the highly ranked markedness constraint prohibits (e.g. fricatives, banned by *Fricative, for LP65). It is that constraint that presumably lies at the heart of the various error patterns participating in the conspiracy. Second, because a conspiracy generally involves two or more dominated but crucially ranked faithfulness constraints, it would also be important to demote that highly ranked markedness constraint below the lowest ranked of these faithfulness constraints (e.g. for LP65, below ID[cont], which is ranked below Max). This yields an interesting and clinically valuable prediction: some treatment plans should be more efficacious than others in eradicating the error patterns associated with conspiracies. The most efficacious treatment of a conspiracy would be expected to target those structures banned by the top-ranked markedness constraint in a context where the different repairs could coincide (i.e. where the various faithfulness constraints were violated). This might

mean in the case of LP65 that fricatives should be targeted for treatment in clusters (e.g. "swim" versus "twin") rather than in singletons (e.g. "sun" versus "ton"). The reason is that clusters provide the context for deletion, and the occurrence of fricatives in clusters would entail a manner contrast in contradiction to the stopping error pattern.

Despite the fertile ground that phonological disorders offer for the investigation of conspiracies, to our knowledge, there has been no systematic determination of how conspiracies change or are lost in the course of normal development and/or as a result of clinical treatment. Future treatment studies that target conspiracies by focusing on the demotion of different markedness constraints hold much promise for revealing the life cycle of those conspiracies.

33.3.4.2 Experimental Studies

We shift attention now to some results that have emerged from the unique vantage of clinical treatment experiments that enrolled children with phonological disorders. Recalling the discussion in section 33.3.4.1 regarding true clusters, adjuncts, and affricates, a series of treatment studies employing a single-subject experimental design has illuminated the relationships between these structures. In one study (Gierut 1999), children were assigned to one of two treatment conditions. Two groups of children were taught a more (or less) marked true cluster based on sonority differences, and a third group of children was taught an adjunct cluster. Exposure to true clusters resulted in gradient learning. That is, children taught more marked clusters (i.e. those with smaller sonority differences) evidenced the broadest generalization across the sonority spectrum. True clusters also prompted linear learning: unmarked clusters with the largest sonority difference were most accurate, and this scaled down incrementally, such that clusters with the smallest sonority difference had least accuracy. True clusters also prompted acquisition of affricates. In comparison, children who were taught adjunct clusters had gapped learning patterns as defined by sonority. For example, they might have acquired clusters with large and small sonority differences, but clusters with intermediate sonority differences were not learned. The magnitude of learning true clusters was also suppressed, with minimal levels of accuracy. Further, all children taught adjunct clusters learned these with high degrees of accuracy at posttreatment, but none learned affricates. Similar findings have emerged from treatment studies of children with phonological disorders who were acquiring Spanish as their first language (e.g. Anderson 2002; Barlow 2005). In a follow-up study by Gierut and Champion (1999b), children were taught three-element clusters (e.g.
/str/), which combine an adjunct with a true cluster. The findings from that study replicated and extended the earlier results. That is, while children did not generalize to the three-element clusters, they did achieve high levels of accuracy on adjunct clusters, true clusters, and affricates. In the final leg of this series, Gierut (2008a) examined the relationship between clusters and affricates by teaching some children true clusters and other children affricates. As background, Lleó and Prinz (1996, 1997) traced the emergence of syllable structure in typically developing toddlers learning German or Spanish as a first language, and found that

these children acquired onsets as simple singletons before they learned affricates, with these occurring in advance of clusters. Based upon this, Lleó and Prinz hypothesized that evidence of branching structure was a phonological trigger that motivated the expansion of syllable structure in acquisition. Specifically, they proposed that branching at the level of the segment is prerequisite to branching at the level of the onset. This suggests that the occurrence of onset clusters implies affricates, which in turn imply singletons, but not the reverse. Onset clusters might then be judged to be more marked than affricates.

The results from Gierut (2008a) revealed that teaching clusters (i.e. the marked manipulation) resulted in widespread generalization to other clusters. Following treatment, children produced a variety of cluster types ranging from those with a small sonority difference (e.g. /sm, sn/) to those with a larger sonority difference (e.g. /tw, kw/). Teaching a cluster had the further effect of inducing generalization in children's use of affricates, even though affricates were never taught. Thus, teaching clusters had the predicted effect of inducing generalization to both marked and unmarked structures. However, teaching affricates (i.e. the relatively unmarked manipulation) yielded a different set of findings. Children who were taught affricates evidenced minimal generalization to this class (i.e. less than 7 percent improvement), which is surprising because affricates were directly treated (cf. Rvachew and Bernhardt 2010). Taken together, the collective set of findings from this series of experimental studies on clusters is consistent with and supportive of optimality theoretic claims about phonological structure and phonological complexity, as borne out here in the context of a "disordered" population, under experimental circumstances, and with respect to language learning.

33.4 Conclusion

This chapter has highlighted some of the findings that have emerged from a much larger body of descriptive and experimental research on phonological disorders conducted within current rule-based and constraint-based frameworks. Special attention was given to the issue of children's underlying representations and the processes that map those representations onto the phonetic output. The controversies surrounding this issue have been dealt with in very different ways in the different theoretical frameworks and paint very different pictures of children's phonologies and acquisition. One of the main differences relates to the optimality theoretic principle of Richness of the Base, which precludes language-specific restrictions on underlying representations. The illustrative examples presented here demonstrate that Optimality Theory can account for the facts of phonological disorders without restricting the children's underlying representations. This shifts the focus of inquiry to the children's constraint hierarchies as the explanation for their error patterns. When viewed from this perspective, phonological disorders look very similar to typically developing and fully developed phonologies. That is, disordered systems exhibit many of the same phenomena seen in these other systems, including allophonic and neutralization processes, output-to-output correspondence relations, conspiracies,

and principled restrictions on onset consonant clusters. Phonological disorders also offer the unique opportunity to evaluate theoretical claims through experimental treatment studies that attempt to induce change in the children's constraint hierarchies. Clearly, many issues remain, but research on phonological disorders stands to benefit from and contribute to future developments in phonological theory.

Acknowledgments

We are especially grateful to Michael Dow, Christopher Green, Adam Jacobson, Louise Lam, Michele Morrisette, Joe Pater, Traysa Sprankles, Juliet Stanton, and an anonymous reviewer for their comments and help in the preparation of this manuscript. This work was supported in part by a grant to Indiana University from the National Institutes of Health (DC001694).

References

Abbot-Smith, K. and Behrens, H. (2006). How known constructions influence the acquisition of other constructions: The German passive and future constructions. Cognitive Science, 30: 995–1026.
Abbot-Smith, K., Lieven, E., and Tomasello, M. (2001). What pre-school children do and do not do with ungrammatical word orders. Cognitive Development, 16: 679–92.
Abdulkarim, L. and Roeper, T. (1997). Economy of representation: Ellipsis and NP reconstructions. In R. Shillcock (ed.), Language Acquisition: Knowledge representation and processing. Edinburgh, UK: Human Communication Research Center.
Abels, Klaus (2003). Successive cyclicity, anti-locality, and adposition stranding. Unpublished Ph.D. thesis, University of Connecticut.
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: University of Edinburgh Press.
Abney, S. (1987). The English noun phrase in its sentential aspect. Unpublished Ph.D. thesis, MIT.
Abney, Steven (1996). Statistical methods and linguistics. In J. L. Klavans and P. Resnik (eds), The Balancing Act: Combining symbolic and statistical approaches to language. Cambridge, MA: MIT Press, 1–26.
Abrahams, B. S., Tentler, D., Perederiy, J. V., Oldham, M. C., Coppola, G., and Geschwind, D. H. (2007). Genome-wide analysis of human perisylvian cerebral cortical patterning. Proceedings of the National Academy of Sciences, 104: 17849–54.
Abramson, Arthur S. and Lisker, Leigh (1970). Discriminability along the voicing continuum: Cross-language tests. In Proceedings of the Sixth International Congress of Phonetic Sciences, 569–73.
Ackerman, B. (1981). When is a question not answered? The understanding of young children of utterances violating or conforming to the rules of conversational sequencing. Journal of Experimental Child Psychology, 31: 487–507.
Adam, Galit (2002). From variable to optimal grammar: Evidence from language acquisition and language change. Unpublished Ph.D. thesis, Tel-Aviv University.
Adam, Galit and Bat-El, Outi (2008). The trochaic bias is universal: Evidence from Hebrew. In Anna Gavarró and M. João Freitas (eds), Language Acquisition and Development: Proceedings of GALA 2007. Cambridge Scholars Publishing, 12–24.
Adani, F. (2009). Re-thinking the acquisition of relative clauses in Italian: Towards a grammatically-based account. Journal of Child Language, 22: 1–25.
Adger, D. (2003). Core Syntax: A minimalist approach. Oxford: Oxford University Press.
Adone, D. (2012). The Acquisition of Creole Language: How children surpass their input. New York: Cambridge University Press.
Aguado-Orea, J. J. (2004). The acquisition of morpho-syntax in Spanish: Implications for current theories of development. Unpublished Ph.D. thesis, University of Nottingham.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716–23.
Akers, Crystal (2011). Simultaneous learning of hidden linguistic structures. Unpublished Ph.D. thesis, Rutgers University.
Akhtar, N. (1999). Acquiring basic word order: Evidence for data-driven learning of syntactic structure. Journal of Child Language, 26: 339–56.
Akhtar, N. and Tomasello, M. (1997). Young children's productivity with word order and verb morphology. Developmental Psychology, 33: 952–65.
Aksu-Koç, Ayhan (1988). The Acquisition of Aspect and Modality: The case of past reference in Turkish. Cambridge: Cambridge University Press.
Aksu-Koç, A., Avci, G., Aydin, C., Sefer, N., and Yasa, Y. (2005). The relation between mental verbs and ToM performance: Evidence from Turkish children. Paper presented at the IASCL Convention, Berlin, July.
Alarcon, M., Abrahams, B. S., Stone, J. L., Duvall, J. A., Perederiy, J. V., Bomar, J. M., Sebat, J., Wigler, M., Martin, C. L., Ledbetter, D. H., Nelson, S. F., Cantor, R. M., and Geschwind, D. H. (2008). Linkage, association, and gene-expression analysis identify CNTNAP2 as an autism-susceptibility gene. American Journal of Human Genetics, 82: 150–9.
Albright, Adam (2002). The identification of bases in morphological paradigms. Unpublished Ph.D. thesis, UCLA.
Albright, Adam (2007). How many grammars am I holding up? In Charles B. Chang and Hannah J. Haynie (eds), Proceedings of WCCFL 26. Somerville, MA: Cascadilla Press, 1–20.
Albright, Adam and Hayes, Bruce (2002). Modeling English past tense intuitions with minimal generalization. In Michael Maxwell (ed.), Proceedings of the Sixth Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON). Philadelphia: ACL.
Albright, Adam and Hayes, Bruce (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90: 119–61.
Albright, Adam and Hayes, Bruce (2011). Learning and learnability in phonology. In John A. Goldsmith, Jason Riggle, and Alan C. Yu (eds), The Handbook of Phonological Theory. Malden, MA: Wiley-Blackwell, 661–90.
Albright, Adam, Magri, Giorgio, and Michaels, Jennifer (2008). Modeling doubly marked lags with a split additive model. In Harvey Chan, Heather Jacob, and Enkeleida Kapia (eds), BUCLD 32: Proceedings of the 32nd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 36–47.
Alderete, John and Tesar, Bruce (2002). Learning covert phonological interaction: An analysis of the problem posed by the interaction of stress and epenthesis. Unpublished manuscript, Rutgers University, New Brunswick, NJ.
Alderete, John, Brasoveanu, Adrian, Merchant, Nazarré, Prince, Alan, and Tesar, Bruce (2005). Contrast analysis aids the learning of phonological underlying forms. In John Alderete, Chung-hye Han, and Alexei Kochetov (eds), Proceedings of the 24th West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Proceedings Project, 34–42.
Alegre, Maria A. and Gordon, Peter (1996). Red rats eater exposes recursion in children's word formation. Cognition, 60(1): 65–82.
Aleksandrov, Aleksandr (1883). Detskaja reč'. Russkij Filosofskij Vestnik, X: 86–120.
Alexiadou, Artemis (2001). Functional Structure in Nominals: Nominalization and ergativity. Amsterdam: John Benjamins.
Alishahi, A. and Stevenson, S. (2008). A computational model for early argument structure acquisition. Cognitive Science, 32(5): 789–834.

Alishahi, A. and Stevenson, S. (2010). Learning general properties of semantic roles from usage data: A computational model. Language and Cognitive Processes, 25(1): 50–93.
Allen, G. D. and Hawkins, S. (1978). The development of phonological rhythm. In A. Bell and J. Hooper (eds), Syllables and Segments. New York: Elsevier, 173–85.
Allen, Margaret (1978). Morphological investigations. Unpublished Ph.D. thesis, University of Connecticut.
Allen, S. E. M. (1996). Aspects of Argument Structure Acquisition in Inuktitut. Amsterdam: John Benjamins.
Allen, S. E. M. (2000). A discourse-pragmatic explanation for argument representation in child Inuktitut. Linguistics, 38: 483–521.
Allen, S. E. M. (2008). Interacting pragmatic influences on children's argument realization. In M. Bowerman and P. Brown (eds), Cross-linguistic Perspectives on Argument Structure. New York, NY: Lawrence Erlbaum Associates, 191–211.
Allen, S. and Crago, M. (1996). Early passive acquisition in Inuktitut. Journal of Child Language, 23: 129–55.
Allen-Brady, K., Miller, J., Matsunami, N., Stevens, J., Block, H., Farley, M., Krasny, L., Pingree, C., Lainhart, J., Leppert, M., McMahon, W. M., and Coon, H. (2009). A high-density SNP genome-wide linkage scan in a large autism extended pedigree. Molecular Psychiatry, 14: 590–600.
Altvater-Mackensen, Nicole and Fikkert, Paula (2010). The acquisition of the stop-fricative contrast in perception and production. Lingua, 120: 1898–909.
Amaral, P. (2010). Almost means "less than": Preschoolers' comprehension of scalar adverbs. Poster presented at BUCLD 34. In J. Chandlee, K. Franich, K. Iserman, and L. Keil (eds), Supplemental Online Proceedings.
Amaral, L. and Roeper, T. (2014). Multiple grammars and second language representation. Second Language Research, 30(1): 3–36.
Ambridge, B. and Rowland, C. (2009). Predicting children's errors with negative questions: Testing a schema-combination account. Cognitive Linguistics, 20(2): 225–66.
Ambridge, B., Pine, J. M., Rowland, C. F., and Young, C. R. (2008). The effect of verb semantic class and verb frequency (entrenchment) on children's and adults' graded judgements of argument-structure overgeneralization errors. Cognition, 106(1): 87–129.
Ambridge, B., Pine, J. M., Rowland, C. F., Jones, R. L., and Clark, V. (2009). A semantics-based approach to the "no negative-evidence" problem. Cognitive Science, 33(7): 1301–16.
Ambridge, B., Pine, J. M., and Rowland, C. F. (2011). Children use verb semantics to retreat from overgeneralization errors: A novel verb grammaticality judgment study. Cognitive Linguistics, 22: 303–23.
Andersen, Roger and Shirai, Yasuhiro (1996). The primacy of aspect in first and second language acquisition: The pidgin-creole connection. In W. C. Ritchie and T. K. Bhatia (eds), Handbook of Second Language Acquisition. San Diego: Academic Press, 527–70.
Anderson, J. R. (1990). The Adaptive Character of Thought. Hillsdale, NJ: Erlbaum.
Anderson, Raquel T. (2002). Onset clusters and the sonority sequencing principle in Spanish: A treatment efficacy study. In F. Windsor, M. L. Kelly, and N. Hewitt (eds), Investigations in Clinical Phonetics and Linguistics. Mahwah, NJ: Erlbaum, 213–24.
Anderson, Stephen R. (1981). Why phonology isn't natural. Linguistic Inquiry, 12: 493–539.
Angluin, Dana (1980). Inductive inference of formal languages from positive data. Information and Control, 45: 117–35.
Angluin, Dana (1982). Inference of reversible languages. Journal of the Association for Computing Machinery, 29: 741–65.

Angluin, Dana (1988a). Queries and concept learning. Machine Learning, 2: 319–42.
Angluin, Dana (1988b). Identifying languages from stochastic examples. Technical Report 614, Department of Computer Science, Yale University, New Haven, CT.
Angluin, Dana (1990). Negative results for equivalence queries. Machine Learning, 5: 121–50.
Angluin, Dana and Laird, Philip (1988). Learning from noisy examples. Machine Learning, 2: 343–70.
Anisfeld, M. and Gordon, M. (1968). On the psychophonological structure of English inflectional rules. Journal of Verbal Learning and Verbal Behavior, 7: 973–9.
Anthoni, H., Sucheston, L. E., Lewis, B. A., Tapia-Paez, I., Fan, X., Zucchelli, M., Taipale, M., Stein, C. M., Hokkanen, M.-E., Castren, E., Pennington, B. F., Smith, S. D., Olson, R. K., Tomblin, J. B., Schulte-Korne, G., Nothen, M., Schumacher, J., Muller-Myhsok, B., Hoffman, P., Gilger, J. W., Hynd, G. W., Nopola-Hemmi, J., Leppanen, P. H. T., Lyytinen, H., Schoumans, J., Nordenskjold, M., Spencer, J., Stanic, D., Boon, W. C., Simpson, E., Makela, S., Gustaffson, J.-A., Peyrard-Janvid, M., Iyengar, S., and Kere, J. (2012). The aromatase gene CYP19A1: Several genetic and functional lines of evidence supporting a role in reading, speech, and language. Behavior Genetics, 42: 509–27.
Anthoni, H., Zucchelli, M., Matsson, H., Muller-Myshok, B., Fransson, I., Schumacher, J., Masinen, S., Onkamo, P., Warnke, A., Griesemann, H., Hoffmann, P., Nopola-Hemmi, J., Lyytinen, H., Schulte-Korne, G., Kere, J., Nothen, M. M., and Peyrard-Janvid, M. (2007). A locus on 2p12 containing the co-regulated MRPL19 and C2ORF3 genes is associated to dyslexia. Human Molecular Genetics, 16: 667–77.
Anthony, M. and Biggs, N. (1992). Computational Learning Theory. Cambridge: Cambridge University Press.
Antinucci, F. and Miller, R. (1976). How children talk about what happened. Journal of Child Language, 3(2): 167–89.
Anttila, Arto (2007). Variation and optionality. In Paul de Lacy (ed.), The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 519–36.
Apoussidou, Diana (2006). On-line learning of underlying forms. Unpublished manuscript, University of Amsterdam.
Apoussidou, Diana (2007). The learnability of metrical phonology. Unpublished Ph.D. thesis, University of Amsterdam.
Apoussidou, Diana and Boersma, Paul (2003). The learnability of Latin stress. IFA Proceedings, 25: 101–48.
Applegate, R. B. (1972). Ineseño Chumash grammar. Unpublished Ph.D. thesis, University of California.
Aravind, A., de Villiers, J., Roeper, T., and Yang, C. (in preparation). Earliest Child Complement Forms.
Archangeli, Diana and Pulleyblank, Douglas (1989). Yoruba vowel harmony. Linguistic Inquiry, 20: 173–217.
Archer, Stephanie L. and Curtin, Suzanne (2011). Perceiving onset clusters in infancy. Infant Behavior and Development, 34: 534–40.
Arii, T., Syrett, K., and Goro, T. (2014). Setting the standard in the acquisition of Japanese and English comparatives. In Proceedings of the 50th Annual Meeting of the Chicago Linguistic Society.
Armon-Lotem, S. (1998). Mommy sock in a minimalist eye: On the acquisition of DP in Hebrew. In N. Dittmar and Z. Penner (eds), Issues in the Theory of Language Acquisition: Essays in honor of Jürgen Weissenborn. Bern: Peter Lang, 15–36.
Armstrong, T. (2001). Research report. Amherst: University of Massachusetts.

Aronoff, J. M. (2003). Null subjects in child language: Evidence for a performance account. In G. Garding and M. Tsujimura (eds), Proceedings of the West Coast Conference on Formal Linguistics 22. Somerville, MA: Cascadilla Press, 43–55.
Aronoff, M. (1994). Morphology by Itself. Cambridge, MA: MIT Press.
Arosio, F., Adani, F., and Guasti, M. T. (2009). Grammatical features in the comprehension of Italian relative clauses by children. In A. Gavarrò et al. (eds), Merging Features: Computation, Interpretation and Acquisition. New York: Oxford University Press, 138–58.
Arunachalam, S. and Waxman, S. R. (2010). Meaning from syntax: Evidence from 2-year-olds. Cognition, 110: 442–6.
Arunachalam, S., Syrett, K., and Waxman, S. (submitted). Taking it nice and slow: Adverbs support verb learning in 2-year-olds.
Aschermann, E., Gülzow, I., and Wendt, D. (2004). Differences in the comprehension of passive voice in German- and English-speaking children. Swiss Journal of Psychology, 63: 235–45.
Aske, Jon (1989). Path predicates in English and Spanish: A closer look. In Kira Hall, Michael Meacham, and Richard Shapiro (eds), Proceedings of the 15th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA: Berkeley Linguistics Society, 5–16.
Aslin, R. N. (1993). Segmentation of fluent speech into words: Learning models and the role of maternal input. In B. de Boysson-Bardies et al. (eds), Developmental Neurocognition: Speech and face processing in the first year of life. Dordrecht: Kluwer, 305–15.
Aslin, R. N., Saffran, J. R., and Newport, E. L. (1998). Computation of conditional probability statistics by human infants. Psychological Science, 9: 321–4.
Asplin, K. (2002). Can complement frames help children learn the meaning of abstract verbs? Unpublished Ph.D. thesis, University of Massachusetts.
Atkinson-King, K. (1973). Children's acquisition of phonological stress contrasts. UCLA Working Papers in Phonetics, 25.
Au, T. K. and Glusman, M. (1990). The principle of mutual exclusivity in word learning: To honor or not to honor? Child Development, 61: 1474–90.
Aungst, Lester and Frick, James V. (1964). Auditory discrimination ability and consistency of articulation of /r/. Journal of Speech and Hearing Disorders, 29: 76–85.
Austin, J., Blume, M., Parkinson, D., Núñez del Prado, Z., and Lust, B. C. (1998). Interactions between pragmatic and syntactic knowledge in the first language acquisition of Spanish null and overt pronominals. In J. Lema and E. Treviño (eds), Theoretical Analyses on Romance Languages. Amsterdam: John Benjamins, 35–52.
Avrutin, S. (1994). Psycholinguistic investigations in the theory of reference. Unpublished Ph.D. thesis, MIT.
Avrutin, S. (1999). Development of the Syntax–Discourse Interface. Dordrecht: Kluwer Academic.
Avrutin, S. (2004). Optionality in child and aphasic speech. Lingue e Linguaggio, 1(1): 67–9.
Avrutin, S. (2006). Weak syntax. In Y. Grodzinsky and K. Amunts (eds), Broca's Region. Oxford: Oxford University Press, 49–62.
Avrutin, S. and Coopmans, P. (1999). A syntax–discourse perspective on the acquisition of reflexives in Dutch. Paper presented at GALA, Potsdam.
Avrutin, S. and Cunningham, J. (1997). Children and reflexivity. In Proceedings of the 21st Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 13–23.
Avrutin, S. and Thornton, R. (1994). Distributivity and binding in child grammar. Linguistic Inquiry, 25: 265–71.

Avrutin, S. and Wexler, K. (1992). Development of Principle B in Russian: Coindexation at LF and coreference. Language Acquisition, 4: 259–306.
Baauw, S. (2000). Grammatical features and the acquisition of reference: A comparative study of Dutch and Spanish. Unpublished Ph.D. thesis, Utrecht University.
Baauw, S. and Cuetos, F. (2003). The interpretation of pronouns in Spanish language acquisition and breakdown: Evidence for the "Principle B Delay" as a non-unitary phenomenon. Language Acquisition, 11(4): 219–75.
Baauw, S., Escobar, L., and Philip, W. (1997). A delay of Principle B effect in Spanish-speaking children: The role of lexical feature acquisition. In A. Sorace, C. Heycock, and R. Shillcock (eds), Proceedings of the GALA '97 Conference on Language Acquisition. Edinburgh, Scotland: Human Communication Research Centre.
Baauw, S., Kuipers, M., Ruigendijk, E., and Cuetos, F. (2006). The production of SE and SELF anaphors in Spanish and Dutch children. In V. Torrens and L. Escobar (eds), The Acquisition of Syntax in Romance Languages (Language Acquisition and Language Disorders 41). Amsterdam: John Benjamins, 3–22.
Baauw, S., Zuckerman, S., Ruigendijk, E., and Avrutin, S. (2011). Principle B Delays as a processing problem: Evidence from task effects. In A. Grimm, A. Müller, C. Hamann, and E. Ruigendijk (eds), Production-comprehension Asymmetries in Child Language. Berlin: De Gruyter, 247–72.
Baayen, R. Harald (2008). Corpus linguistics in morphology: Morphological productivity. In A. Luedeling and M. Kyto (eds), Corpus Linguistics: An international handbook. Berlin: Mouton de Gruyter.
Babyonyshev, Maria (1993). Acquisition of the Russian case system. In Colin Phillips (ed.), Papers on Case and Agreement II (MIT Working Papers in Linguistics 19). Cambridge, MA: MIT Working Papers in Linguistics, 1–43.
Babyonyshev, M., Ganger, J., Pesetsky, D., and Wexler, K. (2001). The maturation of grammatical principles: Evidence from Russian unaccusatives. Linguistic Inquiry, 32: 1–44.
Bach, Emmon (1975). Long vowels and stress in Kwakiutl. Texas Linguistic Forum, 2: 9–19.
Bach, Emmon (1986). The algebra of events. Linguistics and Philosophy, 9: 5–16.
Baertsch, Karen S. (2002). An optimality theoretic approach to syllable structure: The split margin hierarchy. Unpublished Ph.D. thesis, Indiana University.
Baker, C. L. (1978). Introduction to Generative-transformational Syntax. Englewood Cliffs, NJ: Prentice-Hall.
Baker, C. L. (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10: 533–81.
Baker, M. (1988). Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.
Baker, M. C. (1997). Thematic roles and syntactic structure. In L. Haegeman (ed.), Elements of Grammar: Handbook in generative syntax. Dordrecht, Netherlands: Kluwer, 73–137.
Baker, M. C. (2008). The macroparameter in a microparameter world. In T. Biberauer (ed.), The Limits of Syntactic Variation. Amsterdam: John Benjamins, 351–73.
Baković, Eric (2000a). Harmony, dominance and control. Unpublished Ph.D. thesis, Rutgers University.
Baković, Eric (2000b). The conspiracy of Turkish vowel harmony. Jorge Hankamer WebFest.
Baldwin, D. (1993a). Early referential understanding: Infants' ability to recognize acts for what they are. Developmental Psychology, 29: 832–43.
Baldwin, D. A. (1993b). Infants' ability to consult the speaker for clues to word reference. Journal of Child Language, 20: 395–418.

Bale, A. (2006). The universal scale and the semantics of comparison. Unpublished Ph.D. thesis, McGill University.
Bane, Max and Riggle, Jason (to appear). The typological consequences of weighted constraints. To appear in Proceedings of the Forty-Fifth Meeting of the Chicago Linguistic Society (2009).
Bane, Max, Riggle, Jason, and Sonderegger, Morgan (2010). The VC dimension of constraint-based grammars. Lingua, 120(5): 1194–208.
Barker, C. (2002). The dynamics of vagueness. Linguistics and Philosophy, 25: 1–36.
Barlow, Jessica (1997). A constraint-based account of syllable onsets: Evidence from developing systems. Unpublished Ph.D. thesis, Indiana University.
Barlow, Jessica A. (2001). A preliminary typology of initial clusters in acquisition. Clinical Linguistics and Phonetics, 15(1/2): 9–13.
Barlow, Jessica A. (2005). Phonological change and the representation of consonant clusters in Spanish: A case study. Clinical Linguistics and Phonetics, 19: 659–79.
Barlow, Jessica A. and Dinnsen, Daniel A. (1998). Asymmetrical cluster development in a disordered system. Language Acquisition, 7: 1–49.
Barlow, Jessica A. and Gierut, Judith A. (2008). A typological evaluation of the split margin approach to syllable structure in phonological acquisition. In Daniel A. Dinnsen and Judith A. Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd, 407–26.
Barlow, Jessica A. and Keare, Amanda (2008). Acquisition of final voicing: An acoustic and theoretical account. In Ashley W. Farris-Trimble and Daniel A. Dinnsen (eds), Phonological Opacity Effects in Optimality Theory. Bloomington, IN: IULC Publications, 81–98.
Barner, D. and Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive Psychology, 60: 40–62.
Barner, D. and Snedeker, J. (2008). Compositionality and statistics in adjective acquisition: 4-year-olds interpret tall and short based on the size distributions of novel noun referents. Child Development, 79: 594–608.
Barner, D., Chow, K., and Yang, S. J. (2009). Finding one's meaning: A test of the relation between quantifiers and integers in language development. Cognitive Psychology, 58: 195–219.
Barner, D., Libenson, A., Cheung, P., and Takasaki, M. (2009). Cross-linguistic relations between quantifiers and numerals in language acquisition: Evidence from Japanese. Journal of Experimental Child Psychology, 103: 421–40.
Barner, D., Brooks, N., and Bale, A. (2011). Accessing the unsaid: The role of scalar alternatives in children's pragmatic inference. Cognition, 118: 87–96.
Baron, I., Herslund, M., and Sørensen, F. (eds) (2001). Dimensions of Possession. Amsterdam/Philadelphia: Benjamins.
Bar-Shalom, E., Crain, S., and Shankweiler, D. (1993). A comparison of comprehension and production abilities of good and poor readers. Applied Psycholinguistics, 14: 197–227.
Barth, H., Kanwisher, N., and Spelke, E. (2003). The construction of large number representations in adults. Cognition, 86(3): 201–21.
Bartlett, C. W., Flax, J. F., Logue, M. W., Vieland, V. J., Bassett, A. S., Tallal, P., and Brzustowicz, L. M. (2002). A major susceptibility locus for specific language impairment is located on 13q21. American Journal of Human Genetics, 71: 45–55.
Bartlett, C. W., Flax, J. F., Logue, M. W., Smith, B. J., Vieland, V. J., Tallal, P., and Brzustowicz, L. M. (2004). Examination of potential overlap in autism and language loci on chromosomes 2, 7, and 13 in two independent samples ascertained for Specific Language Impairment. Human Heredity, 57: 10–20.

Barton, David (1976). The role of perception in the acquisition of phonology. Unpublished Ph.D. thesis, University College, London.
Barton, David, Miller, Ruth, and Macken, Marlys A. (1980). Do children treat clusters as one unit or two? Papers and Reports on Child Language Development, 18: 105–37.
Bartsch, K. and Wellman, H. M. (1995). Children Talk about the Mind. New York: Oxford University Press.
Bartsch, R. and Vennemann, T. (1972a). Semantic Structures. Frankfurt: Athenäum.
Bartsch, R. and Vennemann, T. (1972b). The grammar of relative adjectives and comparison. Linguistische Berichte, 20: 19–32.
Barwise, J. (1979). On branching quantifiers in English. Journal of Philosophical Logic, 8(1): 47–80.
Barwise, J. and Cooper, R. (1981). Generalized quantifiers and natural language. Linguistics and Philosophy, 4(2): 159–219.
Basilico, D. (2003). The topic of small clauses. Linguistic Inquiry, 34: 1–35.
Bat-El, Outi (2008). Morphologically conditioned V–∅ alternation in Hebrew: Distinctions among nouns, adjectives and participles, and verbs. In S. Armon-Lotem, G. Danon, and S. Rothstein (eds), Generative Approaches to Hebrew Linguistics. Amsterdam: John Benjamins, 197–222.
Bates, E. (1976). Language and Context. New York: Academic Press.
Bates, Elizabeth and Elman, Jeffrey (1996). Learning rediscovered. Science, 274: 1849–50.
Bates, E. and Goodman, J. C. (1997). On the inseparability of grammar and the lexicon: Evidence from acquisition, aphasia, and real-time processing. Language and Cognitive Processes, 12: 507–84.
Bates, E. and MacWhinney, B. (1987). Competition, variation, and language learning. In B. MacWhinney (ed.), Mechanisms of Language Acquisition. Hillsdale, NJ: Erlbaum, 157–93.
Bates, E., Marchman, V., Thal, D., Fenson, L., Dale, P., Reznick, J. S., Reilly, J., and Hartung, J. (1994). Developmental and stylistic variation in the composition of early vocabulary. Journal of Child Language, 21: 85–123.
Battistella, E. (1996). The Logic of Markedness. New York: Oxford University Press.
Bauer, Laurie (1978). The Grammar of Nominal Compounding, with Special Reference to Danish, English and French. Odense, Denmark: Odense University Press.
Bauer, Laurie (2001). Morphological Productivity. Cambridge: Cambridge University Press.
Beach, C., Katz, W. F., and Skowronski, A. (1996). Children's processing of prosodic cues for phrasal interpretation. Journal of the Acoustical Society of America, 99: 1148–60.
Beard, Robert (1995). Lexeme-morpheme Base Morphology: A general theory of inflection and word formation. Albany, NY: SUNY Press.
Beard, Robert (1996). Base rule ordering of heads in nominal compounds. Paper presented at the 1996 Linguistic Society of America Meeting, San Diego, California (circulated manuscript, Bucknell University).
Becerra-Bonache, Leonor, Dediu, Adrian Horia, and Tîrnauca, Cristina (2006). Learning DFA from correction and equivalence queries. In ICGI, Lecture Notes in Computer Science 4201. Berlin: Springer, 281–92.
Becerra-Bonache, Leonor, Case, John, Jain, Sanjay, and Stephan, Frank (2010). Iterative learning of simple external contextual languages. Theoretical Computer Science, 411: 2741–56.
Beck, S. (2011). Comparison constructions. In C. Maienborn, K. von Heusinger, and P. Portner (eds), Semantics: An international handbook of natural language meaning, vol. 2. Berlin: Mouton de Gruyter, 1341–90.

Beck, Sigrid and Snyder, William (2001). Complex predicates and goal PPs: Evidence for a semantic parameter. In A. H.-J. Do, L. Dominguez, and A. Johansen (eds), Proceedings of the 25th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Beck, S., Oda, T., and Sugisaki, K. (2004). Parametric variation in the semantics of comparison: Japanese versus English. Journal of East Asian Linguistics, 13: 289–344.
Beck, S., Krasikova, S., Fleischer, D., Gergel, R., Hofstetter, S., Savelsberg, C., Vanderelst, J., and Villalta, E. (2009). Crosslinguistic variation in comparison constructions. Linguistic Variation Yearbook, 9: 1–66.
Becker, Michael (2009). Phonological trends in the lexicon: The role of constraints. Unpublished Ph.D. thesis, University of Massachusetts.
Becker, M. and Hyams, N. (2000). Modal reference in children's root infinitives. In E. Clark (ed.), Proceedings of the Thirtieth Annual Child Language Research Forum. Stanford: CSLI, 113–22.
Becker, Michael and Tessier, Anne-Michelle (2011). Trajectories of faithfulness in child-specific phonology. Phonology, 28: 163–96.
Becker, Michael, Nevins, Andrew, and Ketrez, Nihan (2011). The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language, 87(1): 84–125.
Becker, Misha (2006). There began to be a learnability puzzle. Linguistic Inquiry, 37: 441–56.
Becker, Misha (2007). Animacy, expletives, and the learning of the raising-control distinction. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of GALANA. Somerville, MA: Cascadilla Press, 12–20.
Becker, Misha (2009). The role of animacy and expletives in verb learning. Language Acquisition, 16: 283–96.
Beckman, Mary E. and Edwards, Jan (2000). Lexical frequency effects on young children's imitative productions. In Michael B. Broe and Janet B. Pierrehumbert (eds), Papers in Laboratory Phonology V: Language Acquisition and the Lexicon. Cambridge: Cambridge University Press, 208–18.
Beckman, M. and Pierrehumbert, J. (1986). Intonational structure in English and Japanese. Phonology Yearbook, 3: 255–310.
Bedore, L. and Leonard, L. (2001). Grammatical morphology deficits in Spanish-speaking children with specific language impairment. Journal of Speech, Language, and Hearing Research, 44(4): 905–24.
Beeler, M. S. (1970). Sibilant harmony in Chumash. International Journal of American Linguistics, 36: 14–17.
Behrens, H. (1993a). Early encoding of temporal reference in German. In Proceedings of the 24th Annual Child Language Research Forum, Stanford.
Behrens, Heike (1993b). Temporal reference in German child language: Form and function of early verb use. Unpublished Ph.D. thesis, University of Amsterdam.
Behrens, Leila (2005). Genericity from a cross-linguistic perspective. Linguistics, 43(2): 275–344.
Beilin, H. (1975). Studies in the Cognitive Basis of Language Development. New York: Academic Press.
Beilin, H. and Lust, B. (1975). A study of the development of logical and linguistic connectives: Linguistic data. In H. Beilin (ed.), Studies in the Cognitive Basis of Language Development. New York: Academic Press.

Bel, A. (2001). Teoria lingüística i adquisició del llenguatge [Linguistic theory and language acquisition]. Barcelona: Institut d'Estudis Catalans.
Bel, A. (2003). The syntax of subjects in the acquisition of Spanish and Catalan. International Journal of Latin and Romance Linguistics, 15: 1–26.
Belletti, A. and Rizzi, L. (1988). Psych-verbs and theta-theory. Natural Language and Linguistic Theory, 6: 291–352.
Bellugi, U. (1965). The development of interrogative structures in children's speech. In K. F. Riegel (ed.), The Development of Language Functions. Ann Arbor, MI: University of Michigan, 103–37.
Bellugi, U. (1971). Simplification in children's language. In R. Huxley and E. Ingram (eds), Language Acquisition: Models and methods. New York: Academic Press.
Bellugi, U., Sabo, H., and Vaid, J. (1988a). Spatial deficits in children with Williams Syndrome. In J. Stiles-Davis, M. Kritchevsky, and U. Bellugi (eds), Spatial Cognition: Brain bases and development. Hillsdale, NJ: Lawrence Erlbaum, 273–98.
Bellugi, U., Marks, S., et al. (1988b). Dissociation between language and cognitive functions in Williams Syndrome. In D. Bishop and K. Mogford (eds), Language Development in Exceptional Circumstances. Hillsdale, NJ: Lawrence Erlbaum, 177–89.
Bellugi, U., Bihrle, A., Jernigan, T., Trauner, D., and Doherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams syndrome. American Journal of Medical Genetics, 6: 115–25.
Bencini, G. M. L. and Valian, V. V. (2008). Abstract sentence representations in 3-year-olds: Evidence from language production and comprehension. Journal of Memory and Language, 59: 97–113.
Bennis, Hans J., den Dikken, Marcel, Jordens, Peter, Powers, Susan, and Weissenborn, Jürgen (1995). Picking up particles. In Dawn MacLaughlin and Susan McEwen (eds), Proceedings of the 19th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 70–81.
Benua, Laura (1997). Transderivational identity: Phonological relations between words. Unpublished Ph.D. thesis, University of Massachusetts. [Published in 2000 by Garland, New York.]
Benus, S., Smorodinsky, I., and Gafos, A. (2004). Gestural coordination and the distribution of English "geminates." In S. Arunachalam and T. Scheffler (eds), Proceedings of the 27th Penn Linguistics Colloquium. Philadelphia, PA: Penn Linguistics Club, 33–46.
Beran, M. and Rumbaugh, D. (2001). "Constructive" enumeration by chimpanzees on a computerized task. Animal Cognition, 4(2): 81–9.
Berger, C. V. (1999). De Verwerving van Anaforische Relaties in het Italiaans: Een Onderzoek naar Lexical Feature Learning [The acquisition of anaphoric relations in Italian: An investigation in lexical feature learning]. Unpublished Master's thesis, Utrecht University.
Berko Gleason, Jean (1958). The child's learning of English morphology. Word, 14: 150–77.
Berman, Ruth (1983). Establishing a schema: Children's construals of verb-tense marking. Language Sciences, 5: 61–78.
Berman, R. (1986). The acquisition of Hebrew. In D. I. Slobin (ed.), The Cross-linguistic Study of Language Acquisition. Vol. 1: The data. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berman, R. A. (1987). A developmental route: Learning about the form and use of complex nominals in Hebrew. Linguistics, 25(6): 1057–85.

Berman, R. (1990). On acquiring an (S)VO language: Subjectless sentences in children's Hebrew. Linguistics, 28: 1135–66.
Berman, R. (1993). Marking verb transitivity in Hebrew-speaking children. Journal of Child Language, 20: 641–70.
Berman, Ruth and Sagi, I. (1981). On word-formation and word-innovation in early age. Balshanut Ivrit Xofshit, 18. (In Hebrew.)
Bermúdez-Otero, Ricardo (1999). Constraint interaction in language change: Opacity and globality in phonological change. Unpublished Ph.D. thesis, University of Manchester.
Bermúdez-Otero, Ricardo (2003). The acquisition of phonological opacity. In J. Spenader, A. Eriksson, and Ö. Dahl (eds), Proceedings of the Stockholm Workshop on Variation within Optimality Theory. Stockholm: Department of Linguistics, Stockholm University, 25–36.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., and Christophe, A. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 12: 69–76.
Bernhardt, Barbara and Stemberger, Joseph (1998). Handbook of Phonological Development: From the perspective of constraint-based nonlinear phonology. San Diego: Academic Press.
Bernhardt, Barbara and Stemberger, Joseph P. (2007). Phonological impairment in children and adults. In Paul de Lacy (ed.), The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 575–94.
Bernhardt, Barbara and Stoel-Gammon, Carol (1996). Underspecification and markedness in normal and disordered phonological development. In Carol Johnson and John H. V. Gilbert (eds), Children's Language, vol. 9. Mahwah, NJ: Lawrence Erlbaum, 206–44.
Bernicot, J., Laval, V., and Chaminaud, S. (2007). Nonliteral language forms in children: In what order are they acquired in pragmatics and metapragmatics? Journal of Pragmatics, 39: 2115–32.
Bertolo, S. (1995a). Learnability properties of parametric models for natural language acquisition. Unpublished Ph.D. thesis, Rutgers University.
Bertolo, S. (1995b). Maturation and learnability in parametric systems. Language Acquisition, 4(4): 277–318.
Bertolo, S. (2001). A brief overview of learnability. In S. Bertolo (ed.), Language Acquisition and Learnability. Cambridge: Cambridge University Press, 1–14.
Bertolo, S., Broihier, K., Gibson, E., and Wexler, K. (1997a). Characterizing learnability conditions for cue-based learners in parametric language systems. In Proceedings of the Fifth Meeting of the Mathematics of Language Conference (MOL5), Saarbrücken.
Bertolo, S., Broihier, K., Gibson, E., and Wexler, K. (1997b). Cue-based learners in parametric language systems: Application of general results to a recently proposed learning algorithm based on unambiguous "superparsing." In M. G. Shafto and P. Langley (eds), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (CogSci-1997). Stanford University.
Bertoncini, Josiane and Mehler, Jacques (1981). Syllables as units in infant speech perception. Infant Behavior and Development, 4: 247–60.
Bertoncini, Josiane, Bijeljac-Babic, Ranka, Jusczyk, Peter W., Kennedy, Lori J., and Mehler, Jacques (1988). An investigation of young infants' perceptual representations of speech sounds. Journal of Experimental Psychology, 117: 21–33.
Bertoncini, Josiane, Floccia, Caroline, Nazzi, Thierry, and Mehler, Jacques (1995). Morae and syllables: Rhythmical basis of speech representations in neonates. Language and Speech, 38: 311–29.

828   References Berwick, Robert (1982). Locality principles and the acquisition of syntactic knowledge. Unpublished Ph.D. thesis, MIT. Berwick, R. C. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press. Berwick, R. C. and Niyogi, P. (1996). Learning from triggers. Linguistic Inquiry, 27(4): 605–​22. Berwick, Robert and Weinberg, Amy (1984). The Grammatical Basis of Linguistic Performance. Cambridge, MA: MIT Press. Bever, Thomas G. (1970). The cognitive basis for linguistic structures. In J. R.  Hayes (ed.), Cognition and the Development of Language. New York, NY: Wiley, 279–​362. Bhatt, R. and Pancheva, R. (2004). Late merger of degree clauses. Linguistic Inquiry, 35: 1–​45. Bhatt, R. and Pancheva, R. (2007). Degree quantifiers, position of merger effects with their restrictors, and conservativity. In C. Barker, and P. Jacobson (eds), Direct Compositionality. Oxford: Oxford University Press, 102–​31. Bhatt, R. and Takahashi, S. (2007). Direct Comparisons: Resurrecting the direct analysis of phrasal comparatives. In M. Gibson and T. Friedman (eds), Proceedings of SALT XVII. Ithaca, NY: CLC Publications, Cornell University. Biberauer, T., Holmberg, A., Roberts, I., and Sheehan, M. (eds) (2010). Parametric Variation: Null subjects in minimalist theory. Cambridge: Cambridge University Press. Bickerton, Derek (1984). The Language Bioprogram Hypothesis. Behavioral and Brain Sciences, 7(2): 173–​222. Bierwisch, M. (1967). Some semantic universals of German adjectivals. Foundations of Language, 3: 1–​36. Bierwisch, M. (1989). The semantics of gradation. In M. Bierwisch and E. Lang (eds), Dimensional Adjectives: Grammatical structure and conceptual interpretation [Grammatische und konzeptuelle Aspekte von Dimensionsadjectiven]. Springer series in language and communication. Berlin: Akademie-​Verlag, Springer, 26: 71–​262. Bijeljac-​Babic, Ranka, Bertoncini, Josiane and Mehler, Jacques (1993). 
How do four-day-old infants categorize multisyllabic utterances? Developmental Psychology, 29: 711–21. Binnick, Robert (2012). The Oxford Handbook of Tense and Aspect. Oxford: Oxford University Press. Bishop, C. (2006). Pattern Recognition and Machine Learning. Berlin: Springer. Bishop, D. V. M. (2006a). Developmental cognitive genetics: How psychology can inform genetics and vice versa. The Quarterly Journal of Experimental Psychology, 59(7): 1153–68. Bishop, D. V. M. (2006b). What causes specific language impairment in children? Current Directions in Psychological Science, 15(5): 217–21. Bishop, D. V. M. (2009). Specific language impairment as a language learning disability. Child Language Teaching and Therapy, 25(2): 163–65. Bishop, D. and Bourne, E. (1985). Do young children understand comparatives? British Journal of Developmental Psychology, 3: 123–32. Bishop, D. V. M., Bishop, S. J., Bright, P., James, C., Delaney, T., and Tallal, P. (1999). Different origin of auditory and phonological processing problems in children with language impairment: Evidence from a twin study. Journal of Speech, Language and Hearing Research, 42: 155–68. Bishop, D. V. M., Adams, C. V., and Norbury, C. F. (2006). Distinct genetic influences on grammar and phonological short-term memory deficits: Evidence from 6-year-old twins. Genes, Brain and Behavior, 5(2): 158–69. Bittner, Maria (1994). Case, Scope and Binding. Dordrecht: Kluwer. Blake, Barry J. (2001). Case, 2nd edn. Cambridge: Cambridge University Press.

Blanchard, Daniel, Heinz, Jerry, and Golinkoff, Robert (2010). Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 37: 487–511. Bleile, Ken (1991). Child Phonology: A Book of Exercises for Students. San Diego: Singular Publishing. Blom, E. (2007). Modality, infinitives, and finite bare verbs in Dutch and English child language. Language Acquisition, 14(1): 75–113. Blom, E. and van Geert, P. (2004). Signs of a developing grammar: Subject drop and inflection in early child Dutch. Linguistics, 42: 195–234. Blom, E., Krikhaar, E., and Wijnen, F. (2001). Nonfinite clauses in Dutch and English child language: An experimental approach. Proceedings of the Annual Boston University Conference on Language Development, 25(1): 133–44. Bloom, L. (1970). Language Development: Form and function in emerging grammars. Cambridge, MA: MIT Press. Bloom, L. (1973). One Word at a Time: The use of single word utterances before syntax. The Hague: Mouton. Bloom, L., Hood, L., and Lightbown, P. M. (1974). Imitation in language development: If, when and why. Cognitive Psychology, 6: 380–420. Bloom, L., Lightbown, P., and Hood, L. (1975a). Structure and variation in child language. Monographs of the Society for Research in Child Development, 40. Bloom, L., Miller, P., and Hood, L. (1975b). Variation and reduction as aspects of competence in language development. In A. Pick (ed.), Minnesota Symposia on Child Psychology. Minneapolis: University of Minnesota Press, 9: 3–55. Bloom, Lois, Lifter, Karen, and Hafitz, Jeremy (1980). Semantics of verbs and development of verb inflection in child language. Language, 56: 386–412. Bloom, L., Tackeff, J., and Lahey, M. (1984). Learning to in complement constructions. Journal of Child Language, 11: 391–406. Bloom, L., Rispoli, M., Gartner, B., and Hafitz, J. (1989). Acquisition of complementation. Journal of Child Language, 16: 101–20. Bloom, P. (1990).
Subjectless sentences in child language. Linguistic Inquiry, 21: 491–504. Bloom, P. (2000). How Children Learn the Meaning of Words. Cambridge, MA: MIT Press. Bloom, P. and Wynn, K. (1997). Linguistic cues in the acquisition of number words. Journal of Child Language, 24: 511–33. Bloom, P., Barss, A., Nicol, J., and Conway, L. (1994). Children's knowledge of binding and coreference: Evidence from spontaneous speech. Language, 70(1): 53–71. Blumenfeld, Lev (2003). Russian palatalization in Stratal OT: Morphology and [back]. In W. Browne, J.-Y. Kim, B. H. Partee, and R. Rothstein (eds), Proceedings of the Annual Workshop on Formal Approaches to Slavic Linguistics 11 (Amherst meeting). Ann Arbor: Michigan Slavic Publications, 141–58. Blumer, Anselm, Ehrenfeucht, A., Haussler, David, and Warmuth, Manfred K. (1989). Learnability and the Vapnik–Chervonenkis dimension. JACM, 36: 929–65. Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18: 355–87. Bock, Kathryn and Levelt, Willem (1994). Language production: Grammatical encoding. In Morton A. Gernsbacher (ed.), Handbook of Psycholinguistics. San Diego: Academic Press, 945–83.

Boeckx, C. (2011). Approaching parameters from below. In A. M. Di Sciullo and C. Boeckx (eds), The Biolinguistic Enterprise: New perspectives on the evolution and nature of the human language faculty. Oxford: Oxford University Press, 205–21. Boersma, Paul (1997). How we learn variation, optionality, and probability. IFA Proceedings, 21: 43–58. Boersma, P. (1998). Functional phonology: Formalizing the interactions between articulatory and perceptual drives. Unpublished Ph.D. thesis, University of Amsterdam. Boersma, Paul (2003). Review of Tesar and Smolensky (2000): Learnability in Optimality Theory. Phonology, 20(3): 436–46. Boersma, Paul (2009). Some correct error-driven versions of the Constraint Demotion Algorithm. Linguistic Inquiry, 40(4): 667–86. Boersma, Paul and Hayes, Bruce (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1): 45–86. Boersma, Paul and Levelt, Claartje (2000). Gradual Constraint-Ranking Learning Algorithm predicts acquisition order. In Proceedings of the 30th Child Language Research Forum. Stanford, CA: CSLI, 229–37. Boersma, Paul and Levelt, Claartje (2003). Optimality theory and phonological acquisition. Annual Review of Language Acquisition, 3: 1–50. Boersma, Paul and Pater, Joe (to appear 2016). Convergence properties of a Gradual Learning Algorithm for harmonic grammar. In John McCarthy and Joe Pater (eds), Harmonic Grammar and Harmonic Serialism. London: Equinox Press. Bohnemeyer, Jürgen and Swift, Mary (2004). Event realization and default aspect. Linguistics and Philosophy, 27: 263–96. Bolinger, D. (1967a). Adjectives in English: Attribution and predication. Lingua, 18: 1–34. Bolinger, D. (1967b). Adjective comparison: A semantic scale. Journal of English Linguistics, 1: 2–10. Bolinger, D. (1968). Entailment and the meaning of structures. Glossa, 2: 118–27. Bolinger, D. (1972). Degree Words. The Hague: Mouton. Bonatti, L., Peña, M., Nespor, M., and Mehler, J. (2005).
Linguistic constraints on statistical computations: The role of consonants and vowels in continuous speech processing. Psychological Science, 16: 451–9. Bonawitz, E., Denison, S., Chen, A., Gopnik, A., and Griffiths, T. L. (2011). A simple sequential algorithm for approximating Bayesian inference. In L. Carlson, C. Holscher, and T. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society. Booij, Geert (1995). The Phonology of Dutch. Oxford: Oxford University Press. Booij, Geert (2002). The Morphology of Dutch. Oxford: Oxford University Press. Borer, Hagit (1988). On the parallelism between compounds and constructs. Yearbook of Morphology, 1: 45–66. Borer, Hagit (1994). The projection of arguments. In E. Benedicto and J. Runner (eds), Functional Projections. University of Massachusetts Occasional Papers 17. Amherst: GLSA, 19–47. Borer, H. (1999). Deconstructing the construct. In K. Johnson and I. G. Roberts (eds), Beyond Principles and Parameters. Dordrecht: Kluwer, 43–89. Borer, H. (2003). Exo-skeletal vs. endo-skeletal explanations. In J. Moore and M. Polinsky (eds), The Nature of Explanation in Linguistic Theory. Stanford, CA: CSLI, 31–67. Borer, Hagit (2005). Structuring Sense, Volume II: The Normal Course of Events. Oxford: Oxford University Press.

Borer, Hagit and Wexler, Kenneth (1987). The maturation of syntax. In Thomas Roeper and Edwin Williams (eds), Parameter Setting. Dordrecht: Reidel, 123–72. Borer, Hagit and Wexler, Kenneth (1992). Bi-unique relations and the maturation of grammatical principles. Natural Language and Linguistic Theory, 10: 147–89. Borowsky, Toni (1986). Topics in the lexical phonology of English. Unpublished Ph.D. thesis, University of Massachusetts. Bortfeld, H., Morgan, J. L., Golinkoff, R. M., and Rathbun, K. (2005). Mommy and me: Familiar names help launch babies into speech-stream segmentation. Psychological Science, 16: 298–304. Boser, K., Lust, B., Santelmann, L., and Whitman, J. (1992). The syntax of CP and V-2 in early child German: The strong continuity hypothesis. In K. Broderick (ed.), NELS 23. Amherst: University of Massachusetts, GLSA. Bosse, S. (2011). The syntax and semantics of applicative arguments in German and English. Unpublished Ph.D. thesis, University of Delaware. Boster, C. T. (1991). Children's failure to obey Principle B: Syntactic problem or lexical error? Unpublished manuscript, University of Connecticut, Storrs. Boster, C. (1994). Simulating children's null subjects: An early language generation model. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Boster, C. (1997). Processing and parameter setting in language acquisition: A computational approach. Unpublished Ph.D. thesis, University of Connecticut. Boster, C. T. and Crain, S. (1993). On children's understanding of every and or. Conference Proceedings: Early Cognition and the Transition to Language. Austin, TX: University of Texas at Austin. Bott, L. and Noveck, I. A. (2004). Some utterances are underinformative: The onset and time course of scalar inferences. Journal of Memory and Language, 51: 437–57. Bouton, L. (1970). Antecedent contained pro-forms. In M. A. Campbell, J. Lindholm, A. Davison, W. Fisher, L. Furbee, J.
Lovins, E. Maxwell, J. Reighard, and S. Straight (eds), Papers from the Sixth Regional Meeting of the Chicago Linguistic Society. Chicago:  University of Chicago, Chicago Linguistic Society, 154–​67. Bowerman, M.  F. (1973). Early Syntactic Development:  A  Crosslinguistic Study with Special Reference to Finnish. London: Cambridge University Press. Bowerman, M. (1974). Early development of concepts underlying language. In R. Schiefelbusch and L. Lloyd (eds), Language Perspectives:  Acquisition, retardation, and intervention. Baltimore: University Park Press, 191–​209. Bowerman, M. (1976). Semantic factors in the acquisition of rules for word use and sentence construction. In D. Morehead and A. Morehead (eds), Directions in Normal and Deficient Language Development. Baltimore: University Park Press, 99–​179. Bowerman, M. (1982). Starting to talk worse: Clues to language acquisition from children’s late speech errors. In S. Strauss (ed.), U-​shaped Behavioral Growth. New York: Academic Press, 101–​45. Bowerman, Melissa (1985). What shapes children’s grammars? In D. Slobin (ed.), The Crosslinguistic Study of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum, 1257–​319. Bowerman, M. (1988). The “no negative evidence” problem:  How do children avoid constructing an overly general grammar? In J. Hawkins (ed.), Explaining Language Universals. Oxford: Basil Blackwell, 73–​101. Bowerman, Melissa (1996). Learning how to structure space for language:  A  crosslinguistic perspective. In Paul Bloom, Mary A. Peterson, Lynn Nadel, and Merril F. Garrett (eds), Language and Space. Cambridge, MA: MIT Press, 385–​436.

Bowerman, M. and Croft, W. (2008). The acquisition of the English causative alternation. In M. Bowerman and P. Brown (eds), Crosslinguistic Perspectives on Argument Structure: Implications for learnability. Mahwah, NJ: Erlbaum, 279–306. Boyd, J. K. and Goldberg, A. E. (2011). Learning what not to say: The role of statistical preemption and categorization in "a"-adjective production. Language, 81: 1–29. Braine, Martin D. S. (1974). On what might constitute a learnable phonology. Language, 50: 270–99. Braine, M. D. S. (1976a). Children's first word combinations. Monographs of the Society for Research in Child Development, 41(1) (serial no. 164). Braine, Martin D. S. (1976b). Review of Neil Smith (1973), The Acquisition of Phonology: A case study. Language, 52: 489–98. Braine, M. D. S. and Brooks, P. J. (1995). Verb argument structure and the problem of avoiding an overgeneral grammar. In M. Tomasello and W. E. Merriman (eds), Beyond Names for Things: Young children's acquisition of verbs. Hillsdale, NJ: Lawrence Erlbaum Associates, 353–76. Braine, M. D. S. and Rumain, B. (1981). Development of comprehension of "or": Evidence for a sequence of competencies. Journal of Experimental Child Psychology, 31: 46–70. Brame, Michael (1974). The cycle in phonology: Stress in Palestinian, Maltese, and Spanish. Linguistic Inquiry, 5: 39–60. Brandone, Amanda C. and Gelman, Susan A. (2009). Differences in preschoolers' and adults' use of generics about animals and artifacts: A window onto a conceptual divide. Cognition, 110: 1–22. Brandone, A., Addy, D. A., Pulverman, R., Golinkoff, R., and Hirsh-Pasek, K. (2006). One-for-one and two-for-two: Anticipating parallel structure between events and language. Proceedings of the Annual Boston University Conference on Language Development, 30: 36–47. Brandone, Amanda C., Cimpian, Andrei, Leslie, Sarah-Jane, and Gelman, Susan A. (2012). Do lions have manes?
For children, generics are about kinds rather than quantities. Child Development, 83: 423–33. Breheny, R. (2008). A new look at the semantics and pragmatics of numerically quantified noun phrases. Journal of Semantics, 25(2): 93–140. Breheny, R., Katsos, N., and Williams, J. (2006). Are generalised scalar implicatures generated by default? An on-line investigation into the role of context in generating pragmatic inferences. Cognition, 100: 434–63. Brennan, J. (2009). Pronouns, inflection, and Irish prepositions. NYU Working Papers in Linguistics, Volume 2: Papers in Syntax. New York: New York University. Brent, M. (1999a). An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning, 34: 71–105. Brent, Michael R. (1999b). Speech segmentation and word discovery: A computational perspective. Trends in Cognitive Science, 3: 294–301. Brent, M. R. and Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition, 61: 93–125. Brent, M. R. and Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81: B33–B44. Bresnan, J. W. (1973). Syntax of the comparative clause construction in English. Linguistic Inquiry, 4: 275–343. Bresnan, Joan (1982a). Control and complementation. Linguistic Inquiry, 13: 343–434.

Bresnan, J. (1982b). The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press. Bretherton, I. and Beeghly, M. (1982). Talking about internal states: The acquisition of an explicit theory of mind. Developmental Psychology, 18: 906–21. Bricker, D. and Squires, J. (1989). The effectiveness of parental screening of at-risk infants: The infant monitoring questionnaires. Topics in Early Childhood Special Education, 9: 67–85. Brinton, Laurel (1985). Verb particles in English: Aspect or Aktionsart? Studia Linguistica, 39: 157–68. Briscoe, E. J. (2000). Grammatical acquisition: Inductive bias and coevolution of language and the language acquisition device. Language, 76(2): 245–96. Brock, J. (2007). Language abilities in Williams syndrome: A critical review. Development and Psychopathology, 19: 97–127. Broihier, Kevin, Hyams, Nina, Johnson, Kyle, Pesetsky, David, Poeppel, David, Schaeffer, Jeanette, and Wexler, Ken (1994). The acquisition of Germanic verb particle constructions. Paper presented at the 18th Boston University Conference on Language Development, Boston University. Bromberg, H. S. and Wexler, K. (1995). Null subjects in child wh-questions. MIT Working Papers in Linguistics, 26: 221–47. Bronckart, Jean-Paul and Sinclair, H. (1973). Time, tense and aspect. Cognition, 2: 107–30. Brooks, P. J. and Tomasello, M. (1999). Young children learn to produce passives with nonce verbs. Developmental Psychology, 35: 29–44. Brown, Cynthia and Matthews, John (1997). The role of feature geometry in the development of phonetic contrasts. In S. J. Hannahs and Martha Young-Scholten (eds), Focus on Phonological Acquisition. Amsterdam: Benjamins, 67–112. Brown, P. (1998). Children's first verbs in Tzeltal: Evidence for an early verb category. Linguistics, 36(4): 713–53. Brown, P. (2008). Verb-specificity and argument realization in Tzeltal child language. In M. Bowerman and P.
Brown (eds), Crosslinguistic Perspectives on Argument Structure: Implications for Learnability. Mahwah, NJ: Erlbaum, 167–90. Brown, R. W. (1968). The development of Wh questions in child speech. Journal of Verbal Learning and Verbal Behavior, 7: 279–90. Brown, R. W. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press. Brown, R. and Bellugi, U. (1964). Three processes in the child's acquisition of syntax. Harvard Educational Review, 34(2): 133–51. Brown, R. and Hanlon, C. (1970). Derivational complexity and order of acquisition in child speech. In J. Hayes (ed.), Cognition and the Development of Language. New York: Wiley, 11–53. Bruce, G. (1977). Swedish Word Accents in Sentence Perspective. Lund: Gleerup. Bruce, G. (1987). How floating is focal accent? In K. Gregerson and H. Basbøll (eds), Nordic Prosody IV. Odense: Odense University Press, 41–9. Bruening, B. (2006). Differences between the wh-scope-marking and wh-copy constructions in Passamaquoddy. Linguistic Inquiry, 37(1): 25–49. Brugman, Claudia (1983). The use of body-part terms as locatives in Chalcatongo Mixtec. In Alice Schlichter, Wallace L. Chafe, and Leanne Hinton (eds), Reports from the Survey of California and Other Indian Languages, Report #4. University of California, Berkeley, 235–90.

Brun, Dina, Avrutin, Sergey, and Babyonyshev, Maria (1999). Aspect and its temporal interpretation during the optional infinitive stage in Russian. In A. Greenhill, H. Littlefield, and C. Tano (eds), Proceedings of the 23rd BUCLD. Somerville, MA: Cascadilla Press, 120–31. Bryan, Michelle (2009). An exploration of r-sound mispronunciations during speech development of English- and French-speaking children. Unpublished BA honours thesis, Dept. of Psychology, University of Calgary, Canada. Buckley, Eugene (2003). Children's unnatural phonology. In Proceedings of the Twenty-Ninth Annual Meeting of the Berkeley Linguistics Society: General Session and Parasession on Phonetic Sources of Phonological Patterns: Synchronic and Diachronic Explanations. Berkeley, CA: Berkeley Linguistics Society, 523–34. Budwig, Nancy (1989). The linguistic marking of agentivity and control in child language. Journal of Child Language, 16: 263–84. Budwig, Nancy (1990). The linguistic marking of nonprototypical agency: An exploration into children's use of passives. Linguistics, 28: 1221–52. Budwig, N. (2001). An exploration into children's use of passives. In M. Tomasello and E. Bates (eds), Language Development: The Essential Readings. Malden, MA: Blackwell Publishers, 227–47. Buesa, C. (2006). Root non-agreeing forms in early child Spanish. Paper presented at Generative Approaches to Language Acquisition–North America. Bundy, R. S., Columbo, J., and Singer, J. (1982). Pitch perception in young infants. Developmental Psychology, 18: 10–14. Bunger, A. (2006). How we learn to talk about events: Linguistic and conceptual constraints on verb learning. Unpublished Ph.D. thesis, Northwestern University. Bunger, A., Trueswell, J., and Papafragou, A. (2010). Seeing and saying: The relation between event apprehension and utterance formulation in children. Proceedings of the Annual Boston University Conference on Language Development.
Boston: Boston University Press, 34: 58–​69. Burzio, Luigi (1986). Italian Syntax: A government-binding approach. Dordrecht: Reidel. Burzio, Luigi (2000). Cycles, non-​derived-​environment blocking, and correspondence. In J. Dekkers, F. van der Leeuw, and J. van de Weijer (eds), Optimality Theory: Syntax, Phonology, and Acquisition. Oxford: Oxford University Press. Burzio, Luigi (2004). Sources of paradigm uniformity. In Laura J. Downing, T. A. Hall, and Renate Raffelsiefen (eds), Paradigms in Phonological Theory. Oxford: Oxford University Press. Bush, R. R. and Mosteller, F. (1955). Stochastic Models for Learning. New York: Wiley. Butt, Miriam (2006). Theories of Case. Cambridge: Cambridge University Press. Bybee, Joan (1985). Morphology:  A  study of the relation between meaning and form. Amsterdam: John Benjamins. Bybee, Joan (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10: 425–​55. Bybee, J. (1998). The emergent lexicon. Proceedings of the Chicago Linguistics Society, 34: 421–​35. Bybee, Joan L. (2001). Phonology and Language Use. Cambridge, MA:  Cambridge University Press. Bybee, Joan and Hopper, Paul (eds) (2001). Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. Bybee, Joan and Pardo, Elly (1981). On lexical and morphological conditioning of rules: A nonce-​probe experiment with Spanish verbs. Linguistics, 19: 937–​68. Bybee, Joan and Slobin, Dan (1982). Rules and schemas in the development and use of the English past tense. Language, 58(2): 265–​89.

Byrnes, J. and Duff, M. (1989). Young children's comprehension of modal expressions. Cognitive Development, 4: 369–87. Cabré Sans, Y. and Gavarró, A. (2006). Subject distribution and verb classes in child Catalan. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of the 2nd Conference of GALANA. Somerville, MA: Cascadilla Press, 51–60. Cairns, H., McDaniel, D., Hsu, J. R., and Ra, M. (1994). A longitudinal study of principles of control and pronominal reference in child English. Language, 70: 260–88. Camacho, J. A. (2013). Null Subjects. Cambridge: Cambridge University Press. Camacho, J. and Elías-Ulloa, J. (2010). Null subjects in Shipibo switch-reference systems. In J. Camacho, R. Gutiérrez Bravo, and L. Sánchez (eds), Information Structure in Languages of the Americas. Berlin: Mouton de Gruyter. Camaioni, L. and Longobardi, E. (2001). Noun versus verb emphasis in Italian mother-to-child speech. Journal of Child Language, 28(3): 773–85. Camarata, Stephen M. and Gandour, Jack (1984). On describing idiosyncratic phonologic systems. Journal of Speech and Hearing Disorders, 49: 262–6. Campbell, A. L. and Tomasello, M. (2001). The acquisition of English dative constructions. Applied Psycholinguistics, 22: 253–67. Caprin, C. and Guasti, M. T. (2006). A cross-sectional study on the use of "be" in early Italian. In V. Torrens and L. Escobar (eds), The Acquisition of Syntax in Romance Languages. Amsterdam: John Benjamins. Caramazza, Alfonso (1988). Some aspects of language processing revealed through the analysis of acquired aphasia: The lexical system. Annual Review of Neuroscience, 11: 395–421. Cardinaletti, A. and Starke, M. (1995). The tripartition of pronouns and its acquisition: Principle B problems are ambiguity problems. In J. Beckman (ed.), Proceedings of the Northeastern Linguistic Society 25. Amherst: University of Massachusetts, GLSA. Cardinaletti, A. and Starke, M. (1996). Deficient pronouns: A view from Germanic. In H.
Thráinsson, S. D. Epstein, and S. Peters (eds), Studies in Germanic Syntax II. Dordrecht: Kluwer Academic. Cardon, L. R., Smith, S. D., Fulker, D. W., Kimberling, W. J., Pennington, B. F., and DeFries, J. C. (1994). Quantitative trait locus for reading disability on chromosome 6. Science, 266(5183): 276–9. Carey, S. (1978a). Less may never mean more. In R. N. Campbell and P. T. Smith (eds), Recent Advances in the Psychology of Language: Language development and mother–child interaction. New York, NY: Plenum, 109–31. Carey, S. (1978b). The child as word learner. In J. Bresnan, G. Miller, and M. Halle (eds), Linguistic Theory and Psychological Reality. Cambridge, MA: MIT Press, 264–93. Carey, S. (2009). Origins of Concepts. Oxford: Oxford University Press. Carlson, Gregory N. (1977). Reference to kinds in English. Unpublished Ph.D. thesis, University of Massachusetts. Carlson, Gregory N. and Pelletier, Francis Jeffrey (eds) (1995). The Generic Book. Chicago: Chicago University Press. Carlucci, L. and Case, J. (2013). On the necessity of U-shaped learning. Topics in Cognitive Science, 5: 56–88. Carlucci, L., Case, J., Jain, S., and Stephan, F. (2004). U-shaped learning may be necessary. 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Michigan, July; abstract in Journal of Mathematical Psychology, 49(1): 97, 2005. Carlucci, L., Case, J., Jain, S., and Stephan, F. (2007). Memory-limited U-shaped learning. Information and Computation, 205: 1551–73.

Carpenter, Angela (2005). Acquisition of a natural vs. an unnatural stress system. In A. Burgos, M. R. Clark-Cotton, and S. Ha (eds), Proceedings of the 29th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 134–43. Carrier-Duncan, J. (1985). Linking of thematic roles in derivational word formation. Linguistic Inquiry, 16: 1–34. Carston, R. (1995). Quantity maxims and generalized implicature. Lingua, 96: 213–44. Carston, R. (1998). Informativeness, relevance and scalar implicature. In R. Carston and S. Uchida (eds), Relevance Theory: Applications and implications. Amsterdam: John Benjamins. Carter, R. (1976). Some constraints on possible words. Semantikos, 1: 27–66. Casscells, W., Schoenberger, A., and Grayboys, T. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299(18): 999–1001. Case, J. (1999). The power of vacillation in language learning. SIAM Journal on Computing, 28: 1941–69. Cassidy, K. W. and Kelly, M. H. (1991). Phonological information for grammatical category assignments. Journal of Memory and Language, 30: 348–69. Cataño, Lorena, Barlow, Jessica A., and Moyna, María Irene (2009). A retrospective study of phonetic inventory complexity in acquisition of Spanish: Implications for phonological universals. Clinical Linguistics and Phonetics, 23(6): 446–72. Cattell, R. (1978). On the source of interrogative adverbs. Language, 54: 61–77. Catts, Hugh W. and Jensen, Paul J. (1983). Speech timing of phonologically disordered children: Voicing contrasts of initial and final stop consonants. Journal of Speech and Hearing Research, 26: 501–10. Cauvet, E., Alves Limissuri, R., Millotte, S., Margules, S., and Christophe, A. (2010). What 18-month-old French-learning infants know about nouns and verbs. Poster presented at the XVIIth International Conference on Infant Studies, Baltimore (USA), 11–14 April. Cazden, C. B. (1968).
The acquisition of noun and verb inflections. Child Development, 39: 433–48. Cazden, C. (1970). Children's questions: Their forms, functions and role in education. Young Children, March: 202–20. Chafe, W. (1995). The realis–irrealis distinction in Caddo, the Northern Iroquoian languages, and English. In J. Bybee and S. Fleischman (eds), Modality in Grammar and Discourse. Philadelphia, PA: John Benjamins Publishing Company. Chaitin, Gregory (2004). How real are real numbers? ArXiv: math/0411418v3. Chambers, Craig, Graham, Susan, and Turner, Juanita N. (2008). When hearsay trumps evidence: How generics guide preschoolers' inferences about unfamiliar things. Language and Cognitive Processes, 23: 749–66. Chambers, Jack K. (1973). Canadian raising. The Canadian Journal of Linguistics, 18: 113–35. Chambers, Kyle E., Onishi, Kristine H., and Fisher, Cynthia (2002). Learning phonotactic constraints from brief auditory experience. Cognition, 83: B13–B23. Chambers, Kyle E., Onishi, Kristine H., and Fisher, Cynthia L. (2003). Infants learn phonotactic regularities from brief auditory experience. Cognition, 87: B69–B77. Chambers, Kyle E., Onishi, Kristine H., and Fisher, Cynthia L. (2011). Representations for phonotactic learning in infancy. Language Learning and Development, 7: 287–308. Chaney, Carolyn F. (1978). Production and identification of /j, w, r, l/ in normal and articulation impaired children. Unpublished Ph.D. thesis, Boston University. Chao, Yuen-Ren (1951). The Cantian idiolect: An analysis of the Chinese spoken by a twenty-eight-months-old child. University of California Publications in Semitic Philology, 11: 27–44.

Charest, M. J. and Leonard, L. B. (2004). Predicting tense: Finite verb morphology and subject pronouns in the speech of typically-developing children and children with Specific Language Impairment. Journal of Child Language, 31(1): 231–46. Charles-Luce, Jan and Luce, Paul (1990). Some structural properties of words in young children's lexicons. Journal of Child Language, 17: 205–15. Chater, N. and Manning, C. (2006). Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences, 10: 287–91. Chater, N. and Oaksford, M. (1999). Ten years of the rational analysis of cognition. Trends in Cognitive Sciences, 3(2): 57–65. Chater, Nick and Vitányi, Paul (2007). "Ideal learning" of natural language: Positive results about learning from positive evidence. Journal of Mathematical Psychology, 51: 135–63. Chemla, E., Mintz, T., Bernal, S., and Christophe, A. (2009). Categorizing words using "frequent frames": What cross-linguistic analyses reveal about distributional acquisition strategies. Developmental Science, 12(3): 396–406. Chen, A. (2007). Intonational realization of topic and focus by Dutch-acquiring 4-to-5-year-olds. In Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS), 1553–56. Chen, A. (2011). Tuning information packaging: Intonational realization of topic and focus in child Dutch. Journal of Child Language, 38: 1055–83. Cheung, H. (2006). False belief and language comprehension in Cantonese-speaking children. Journal of Experimental Child Psychology, 9: 79–98. Cheung, H., Hsuan-Chih, C., Creed, C., Ng, L., Ping Wang, S., and Mo, L. (2004). Relative roles of general and complementation language in theory-of-mind development: Evidence from Cantonese and English. Child Development, 75: 1155–70. Chevrot, J., Dugua, C., and Fayol, M. (2009). Liaison acquisition, word segmentation and construction in French: A usage-based account. Journal of Child Language, 36: 557–96.
Chiat, Shulamuth (1981). Context-specificity and generalization in the acquisition of pronominal distinctions. Journal of Child Language, 8: 75–91. Chiat, Shulamuth (1983). Why Mikey's right and my key's wrong: The significance of stress and word boundaries in a child's output system. Cognition, 14: 275–300. Chien, Y.-C. (1992). Theoretical implications of the Principles and Parameters Model for language acquisition in Chinese. In H. C. Chen and O. J. L. Tzeng (eds), Language Processing in Chinese. Amsterdam: Elsevier Science, 313–45. Chien, Y.-C. and Lust, B. (2006). Chinese children's knowledge of the Binding Principles. In P. Li, L. H. Tan, E. Bates, and O. J. L. Tzeng (eds), Handbook of East-Asian Psycholinguistics (Vol. 1: Chinese). Cambridge: Cambridge University Press, 23–38. Chien, Y.-C. and Wexler, K. (1987). A comparison between Chinese-speaking and English-speaking children's acquisition of reflexives and pronouns. Paper presented at the 12th Annual Boston University Conference on Child Language Development, Boston. Chien, Y.-C. and Wexler, K. (1990). Children's knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition, 1: 225–95. Chierchia, Gennaro (1998). Reference to kinds across languages. Natural Language Semantics, 6: 339–405. Chierchia, G. (2004). Scalar implicatures, polarity phenomena and the syntax–pragmatics interface. In A. Belletti (ed.), Structures and Beyond. Oxford: Oxford University Press, 39–103. Chierchia, G., Crain, S., Guasti, M. T., Gualmini, A., and Meroni, L. (2001). The acquisition of disjunction: Evidence for a grammatical view of scalar implicatures. Proceedings of the 25th

Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 157–68.
Chierchia, G., Fox, D., and Spector, B. (2009). Hurford’s Constraint and the theory of scalar implicatures. In P. Egré and G. Magri (eds), Presuppositions and Implicatures: Proceedings of the MIT-Paris Workshop. Cambridge, MA: MIT Working Papers in Linguistics, 47–62.
Chin, Steven B. and Dinnsen, Daniel A. (1992). Consonant clusters in disordered speech: Constraints and correspondence patterns. Journal of Child Language, 19: 259–85.
Cho, S.-W. (2009). Acquisition of Korean reflexive anaphora. In C. Lee (ed.), Handbook of East Asian Psycholinguistics (Vol. 3: Korean). Cambridge: Cambridge University Press.
Choi, S. (1991). Early acquisition of epistemic meanings in Korean: A study of sentence-ending suffixes in spontaneous speech of three children. First Language, 11: 93–119.
Choi, S. (1995). The development of epistemic sentence-ending modal forms and functions in Korean children. In J. Bybee and S. Fleischman (eds), Modality in Grammar and Discourse. Amsterdam: Benjamins, 165–204.
Choi, Y. and Mazuka, R. (2003). Young children’s use of prosody in sentence parsing. Journal of Psycholinguistic Research, 32: 197–217.
Choi, Y. and Trueswell, J. C. (2010). Children’s (in)ability to recover from garden paths in a verb-final language: Evidence for developing control in sentence processing. Journal of Experimental Child Psychology, 106: 41–61.
Cholin, Joana, Levelt, Willem J., and Schiller, Niels O. (2006). Effects of syllable frequency in speech production. Cognition, 99: 205–35.
Chomsky, C. (1969). The Acquisition of Syntax in Children from 5 to 10. Cambridge, MA: MIT Press.
Chomsky, N. (1955). The Logical Structure of Linguistic Theory. Cambridge, MA: MIT Humanities Library (microfilm). Published in 1975 by Plenum.
Chomsky, Noam (1956). Three models for the description of language. IRE Transactions on Information Theory, IT-2: 113–24.
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
Chomsky, N. (1959). A review of B. F. Skinner’s Verbal Behavior. Language, 35: 26–58.
Chomsky, Noam (1964). Current issues in linguistic theory. In Jerry Fodor and Jerrold Katz (eds), The Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs, NJ: Prentice Hall.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1971). Problems of Knowledge and Freedom. London: Fontana.
Chomsky, N. (1973). Conditions on transformations. In S. Anderson and P. Kiparsky (eds), A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston, 237–86.
Chomsky, N. (1975). Reflections on Language. New York: Pantheon.
Chomsky, N. (1977). On wh-movement. In P. Culicover, T. Wasow, and A. Akmajian (eds), Formal Syntax. New York: Academic Press, 71–132.
Chomsky, N. (1980). Rules and Representations. Oxford: Basil Blackwell.
Chomsky, N. (1981a). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1981b). Rules and Representations. New York: Columbia University Press.
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: MIT Press.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Chomsky, N. (1988). Language and Problems of Knowledge: The Managua Lectures. Cambridge, MA: MIT Press.

Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam (1998). Minimalist inquiries: The framework. In Roger Martin, David Michaels, and Juan Uriagereka (eds), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, MA: MIT Press.
Chomsky, N. (2000a). Minimalist inquiries: The framework. In R. Martin, D. Michaels, and J. Uriagereka (eds), Step by Step. Cambridge, MA: MIT Press.
Chomsky, N. (2000b). New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (ed.), Ken Hale: A Life in Language. Cambridge, MA: MIT Press, 1–52.
Chomsky, N. (2004). Beyond explanatory adequacy. In A. Belletti (ed.), Structures and Beyond: The Cartography of Syntactic Structures, vol. 3. Oxford/New York: Oxford University Press, 104–31.
Chomsky, N. (2005). On phases. In R. Freidin, C. P. Otero, and M.-L. Zubizarreta (eds), Foundational Issues in Linguistic Theory. Cambridge, MA: MIT Press.
Chomsky, Noam and Halle, Morris (1968). The Sound Pattern of English. New York: Harper and Row.
Chomsky, N. and Lasnik, H. (1993). The theory of principles and parameters. In J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann (eds), Syntax: Ein internationales Handbuch zeitgenössischer Forschung / An International Handbook of Contemporary Research. Berlin: Walter de Gruyter, 506–69.
Christiansen, Morten H., Allen, Joseph, and Seidenberg, Mark S. (1998). Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes, 13: 221–68.
Christiansen, Morten H., Onnis, Luca, and Hockema, Stephen A. (2009). The secret is in the sound: From unsegmented speech to lexical categories. Developmental Science, 12: 388–95.
Christofidou, A. (1998). Number or case first? Evidence from Modern Greek. In A. Aksu-Koç, E. Erguvanli Taylan, A. Sumru Özsoy, and A. Küntay (eds), Perspectives on Language Acquisition: Selected Papers from the VIIth International Congress for the Study of Child Language, 46–59.
Christophe, A., Dupoux, E., Bertoncini, J., and Mehler, J. (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America, 95: 1570–80.
Christophe, A., Guasti, M. T., Nespor, M., Dupoux, E., and van Ooyen, B. (1997). Reflections on phonological bootstrapping: Its role for lexical and syntactic acquisition. Language and Cognitive Processes, 12: 585–612.
Christophe, A., Mehler, J., and Sebastian-Galles, N. (2001). Perception of prosodic boundary correlates by newborn infants. Infancy, 2: 385–94.
Christophe, A., Peperkamp, S., Pallier, C., Block, E., and Mehler, J. (2004). Phonological phrase boundaries constrain lexical access: I. Adult data. Journal of Memory and Language, 51: 523–47.
Christophe, A., Millotte, S., Bernal, S., and Lidz, J. (2008). Bootstrapping lexical and syntactic acquisition. Language and Speech, 51: 61–75.
Church, Kenneth W. (1987). Phonological parsing and lexical retrieval. Cognition, 25: 53–69.
Cimpian, Andrei and Markman, Ellen M. (2008). Preschool children’s use of cues to generic meaning. Cognition, 107(1): 19–53.

Cimpian, Andrei and Markman, Ellen M. (2009). Information learned from generic language becomes central to children’s biological concepts: Evidence from their open-ended explanations. Cognition, 113(1): 14–25.
Cimpian, Andrei, Brandone, Amanda C., and Gelman, Susan A. (2010a). Generic statements require little evidence for acceptance but have powerful implications. Cognitive Science, 34(8): 1452–82.
Cimpian, Andrei, Gelman, Susan A., and Brandone, Amanda C. (2010b). Theory-based considerations influence the interpretation of generic sentences. Language and Cognitive Processes, 25: 261–76.
Cimpian, Andrei, Meltzer, Trent J., and Markman, Ellen M. (2011). Preschoolers’ use of morphosyntactic cues to identify generic sentences: Indefinite singular noun phrases, tense and aspect. Child Development, 82(5): 1561–78.
Cinque, G. (1988). On si constructions and the theory of arb. Linguistic Inquiry, 19: 521–81.
Cinque, G. (1990). Types of A′ Dependencies. Cambridge, MA: MIT Press.
Cinque, Guglielmo and Rizzi, Luigi (2010). Mapping Spatial PPs. New York: Oxford University Press.
Clahsen, H. and Almazan, M. (1998). Syntax and morphology in Williams syndrome. Cognition, 68: 167–98.
Clahsen, Harald and Penke, Martina (1992). The acquisition of agreement morphology and its syntactic consequences: New evidence on German child language from the Simone corpus. In Jürgen Meisel (ed.), The Acquisition of Verb Placement. Dordrecht: Kluwer Academic Publishers.
Clahsen, H. and Temple, C. (2002). Words and rules in children with Williams Syndrome. In Y. Levy and J. Schaeffer (eds), Language Competence across Populations: Towards a Definition of Specific Language Impairment. Dordrecht: Kluwer, 323–52.
Clahsen, Harald, Rothweiler, Monika, Woest, Andreas, and Marcus, Gary (1992). Regular and irregular inflection in the acquisition of German noun plurals. Cognition, 45: 225–55.
Clahsen, H., Eisenbeiss, S., and Vainikka, A. (1994). The seeds of structure. In T. Hoekstra and B. Schwartz (eds), Language Acquisition Studies in Generative Grammar. Amsterdam: John Benjamins, 85–118.
Clahsen, H., Eisenbeiss, S., and Penke, M. (1995). Lexical learning in early syntactic development. In H. Clahsen (ed.), Generative Perspectives on Language Acquisition. Amsterdam: Benjamins, 129–60.
Clahsen, H., Kursawe, C., and Penke, M. (1996). Introducing CP: Wh-questions and subordinate clauses in German child language. In C. Koster and F. Wijnen (eds), Proceedings of the Groningen Assembly on Language Acquisition. Groningen: Centre for Language and Cognition, 5–22.
Clahsen, H., Aveledo, F., and Roca, I. (2002). The development of regular and irregular verb inflection in Spanish child language. Journal of Child Language, 29: 591–622.
Clancy, P. M. (1985). The acquisition of Japanese. In D. I. Slobin (ed.), The Crosslinguistic Study of Language Acquisition, vol. 1. Hillsdale, NJ: Erlbaum.
Clancy, P. (1993). Preferred argument structure in Korean acquisition. In E. V. Clark (ed.), Proceedings of the 25th Annual Child Language Research Forum. Stanford, CA: Center for the Study of Language and Information, 307–14.
Clancy, P. M. (1996). Referential strategies and the co-construction of argument structure in Korean acquisition. In B. Fox (ed.), Studies in Anaphora. Amsterdam: John Benjamins, 33–68.

Clark, Alexander and Eyraud, Rémi (2007). Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8: 1725–45.
Clark, A. and Karmiloff-Smith, A. (1993). The cognizer’s innards: A psychological and philosophical perspective on the development of thought. Mind and Language, 8: 487–519.
Clark, Alexander and Lappin, Shalom (2011). Linguistic Nativism and the Poverty of the Stimulus. Oxford: Wiley-Blackwell.
Clark, Alexander, Eyraud, Rémi, and Habrard, Amaury (2010). Using contextual representations to efficiently learn context-free languages. Journal of Machine Learning Research, 11: 2707–44.
Clark, E. V. (1972). On the child’s acquisition of antonyms in two semantic fields. Journal of Verbal Learning and Verbal Behavior, 11: 750–8.
Clark, E. V. (1978). Awareness of language: Some evidence from what children say and do. In A. Sinclair, R. J. Jarvella, and W. J. M. Levelt (eds), The Child’s Conception of Language. Berlin: Springer Verlag.
Clark, E. V. (1987). The principle of contrast: A constraint on language acquisition. In B. MacWhinney (ed.), Mechanisms of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates, 1–33.
Clark, E. V. (1988). On the logic of contrast. Journal of Child Language, 15: 317–35.
Clark, E. V. (1990). On the pragmatics of contrast. Journal of Child Language, 17: 417–31.
Clark, Eve (1993). The Lexicon in Acquisition. Cambridge: Cambridge University Press.
Clark, Eve V., Hecht, Barbara Frant, and Mulford, Randa C. (1986). Coining complex compounds in English: Affixes and word order in acquisition. Linguistics, 24(1): 7–30.
Clark, H. (1970). The primitive nature of children’s relational concepts. In J. R. Hayes and R. Brown (eds), Cognition and the Development of Language. New York: John Wiley and Sons, 260–78.
Clark, R. (1970). Concerning the logic of predicate modifiers. Noûs, 4: 311–55.
Clark, R. (1989). On the relationship between the input data and parameter setting. In J. Carter and R. M. Dechaine (eds), Proceedings of the 19th Annual Meeting of the North East Linguistic Society (NELS 19), Amherst, MA.
Clark, R. (1992). The selection of syntactic knowledge. Language Acquisition, 2: 83–149.
Clark, R. and Roberts, I. (1993). A computational model of language learnability and language change. Linguistic Inquiry, 24(2): 299–345.
Clarkson, M. G. and Clifton, R. K. (1985). Infant pitch perception: Evidence for responding to pitch categories and the missing fundamental. Journal of the Acoustical Society of America, 77: 1521–7.
Clements, George N. (1984). Vowel harmony in Akan: A consideration of Stewart’s word structure conditions. Studies in African Linguistics, 15: 321–37.
Clements, George N. (1986). Compensatory lengthening and consonant gemination in Luganda. In Leo Wetzels and Engin Sezer (eds), Studies in Compensatory Lengthening. Dordrecht: Foris, 37–77.
Clements, George N. (1990). The role of the sonority cycle in core syllabification. In John Kingston and Mary E. Beckman (eds), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge: Cambridge University Press, 283–333.
Clements, George and Keyser, Jay (1983). CV Phonology: A Generative Theory of the Syllable. Cambridge, MA: MIT Press.
Clements, G. N. and Sezer, Engin (1982). Vowel and consonant disharmony in Turkish. In H. van der Hulst and N. Smith (eds), The Structure of Phonological Representations, part 2. Dordrecht: Foris, 213–55.

Clumeck, H. (1980). The acquisition of tone. In G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (eds), Child Phonology, Vol. 1: Production. New York: Academic Press, 257–75.
Coady, Jeffry A. and Aslin, Richard N. (2003). Phonological neighbourhoods in the developing lexicon. Journal of Child Language, 30: 441–69.
Coady, Jeffry A. and Aslin, Richard N. (2004). Young children’s sensitivity to probabilistic phonotactics in the developing lexicon. Journal of Experimental Child Psychology, 89: 183–213.
Coady, Jeffry A. and Evans, Julia L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairment (SLI). International Journal of Language and Communication Disorders, 43: 1–40.
Coates, J. (1987). The acquisition of the meanings of modality in children aged eight and twelve. Journal of Child Language, 15: 425–34.
Coetzee, Andries and Pater, Joe (2008a). Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. Natural Language and Linguistic Theory, 26(2): 289–337.
Coetzee, Andries and Pater, Joe (2008b). The place of variation in phonological theory. In John Goldsmith, Jason Riggle, and Alan Yu (eds), The Handbook of Phonological Theory, 2nd edn. Oxford: Blackwell.
Cohen, Ariel (2001). On the generic use of indefinite singulars. Journal of Semantics, 18: 183–209.
Cohen, Ariel (2004). Generics and mental representation. Linguistics and Philosophy, 27: 529–56.
Cole, P. and Hermon, G. (2000). Partial wh-movement: Evidence from Malay. In U. Lutz, G. Mueller, and A. von Stechow (eds), Wh-Scope Marking (Linguistik Aktuell/Linguistics Today 37). Amsterdam/Philadelphia: John Benjamins, 101–30.
Cole, P., Hermon, G., and Yanti (2005). Voice in Malay/Indonesian. Lingua, 118: 1500–53.
Colledge, E., Bishop, D. V. M., Koeppen-Schomerus, G., Price, T. S., Happe, F. G. E., Eley, T. C., Dale, P. S., and Plomin, R. (2002). The structure of language abilities at 4 years: A twin study. Developmental Psychology, 38: 749–57.
Collins, C. (2005). A smuggling approach to the passive in English. Syntax, 8: 81–120.
Compton, A. J. and Streeter, M. (1977). Child phonology: Data collection and preliminary analyses. Papers and Reports on Child Language Development, 13: 99–109.
Comrie, Bernard (1976). Aspect. Cambridge: Cambridge University Press.
Comrie, Bernard (1977). In defense of spontaneous demotion. In P. Cole and J. Sadock (eds), Grammatical Relations, no. 8 in Syntax and Semantics. London: Academic Press.
Condry, K. F. and Spelke, E. S. (2008). The development of language and abstract concepts: The case of natural number. Journal of Experimental Psychology, 137: 22–38.
Conroy, A. and Thornton, R. (2003). Children’s knowledge of Principle C in discourse. Proceedings of the Sixth Tokyo Conference on Psycholinguistics. Tokyo: Hituzi Syobo Publishing Company, 69–94.
Conroy, A. (2008). The role of verification strategies in semantic ambiguity resolution in children and adults. Ph.D. dissertation, University of Maryland.
Conroy, A. and Lidz, J. (2007). Production/comprehension asymmetry in children’s why questions. Paper presented at Generative Approaches to Language Acquisition North America 2, Cambridge: Cascadilla Press.
Conroy, A., Fults, S., Musolino, J., and Lidz, J. (2008). Surface scope as a default: The effect of time in resolving Quantifier Scope Ambiguity. Poster presented at the 21st CUNY Conference on Sentence Processing. Chapel Hill: University of North Carolina, 13 March.

Conroy, A., Takahashi, E., Lidz, J., and Phillips, C. (2009). Equal treatment for all antecedents: How children succeed with Principle B. Linguistic Inquiry, 40: 446–86.
Conway, C. M. and Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile, visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(1): 24–39.
Conwell, E. and Demuth, K. (2007). Early syntactic productivity: Evidence from dative shift. Cognition, 103: 163–79.
Coopmans, P., Krul, M., Planting, E., Vlasveld, I., and van Zoelen, A. (2004). Dissolving a Dutch delay in the acquisition of syntactic and logophoric reflexives. In A. Brugor, L. Micciulla, and C. E. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 108–19.
Coots, J. H. (1976). Children’s knowledge and acquisition of polar spatial adjectives. Unpublished Ph.D. thesis, University of Utah.
Corver, N. (1990). The syntax of left branch extractions. Unpublished Ph.D. thesis, Tilburg University.
Corver, N. (1993). A note on subcomparatives. Linguistic Inquiry, 24: 773–81.
Côté, Marie-Hélène (2000). Consonant cluster phonotactics: A perception-based approach. Unpublished Ph.D. thesis, MIT.
Couto, J. M., Gomez, L., Wigg, K., Cate-Carter, T., Archibald, J., Anderson, B., Tannock, R., Kerr, E. N., Lovett, M. W., Humphries, T., and Barr, C. L. (2008). The KIAA0319-like (KIAA0319L) gene on chromosome 1p34 as a candidate for reading disabilities. Journal of Neurogenetics, 22: 295–313.
Crago, M., Paradis, J., and Menn, L. (2008). Crosslinguistic perspectives on syntax and semantics of language disorders. In M. J. Ball and M. Perkins (eds), The Handbook of Clinical Linguistics. Malden, MA: Blackwell.
Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences, 14: 597–650.
Crain, S. and Fodor, J. D. (1993). Competence and performance in child language. In E. Dromi (ed.), Language and Cognition: A Developmental Perspective. Norwood, NJ: Ablex, 141–71.
Crain, S. and Khlentzos, D. (2008). Is logic innate? Biolinguistics, 2(1): 24–56.
Crain, S. and Khlentzos, D. (2010). The logic instinct. Mind and Language, 25(1): 30–65.
Crain, S. and Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63: 522–43.
Crain, S. and McKee, C. (1985). The acquisition of structural restrictions on anaphora. Proceedings of NELS 15. Amherst, MA: GLSA Publications, 94–110.
Crain, S. and Pietroski, P. (2001). Nature, nurture and Universal Grammar. Linguistics and Philosophy, 24(2): 139–86.
Crain, S. and Thornton, R. (1998). Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. Cambridge, MA: MIT Press.
Crain, Stephen, Thornton, Rosalind, and Murasugi, Keiko (1987). Capturing the evasive passive. Paper presented at the Boston University Conference on Language Development. Reprinted in Language Acquisition, 2009, 16: 123–33.
Crain, S., Ni, W., and Conway, L. (1994). Learning, parsing and modularity. In C. Clifton, L. Frazier, and K. Rayner (eds), Perspectives on Sentence Processing. Hillsdale, NJ: Lawrence Erlbaum.
Crain, S., Thornton, R., Boster, C., Conway, L., Lillo-Martin, D., and Woodams, E. (1996). Quantification without qualification. Language Acquisition, 5(2): 83–153.

Crain, S., Gardner, A., Gualmini, A., and Rabbin, B. (2002). Children’s command of negation. Proceedings of the Third Tokyo Conference on Psycholinguistics. Tokyo: Hituzi Publishing Company, 71–95.
Crain, S., Thornton, R., and Murasugi, K. (2009). Capturing the evasive passive. Language Acquisition, 16: 123–33.
Crawford, Jean (2004). An adversity passive analysis of early Sesotho passives: Reanalyzing a counterexample to maturation. Poster presentation at GALANA 1.
Crawford, J. (2009). Sesotho passives: The long and short of it. In J. Chandlee, M. Franchini, S. Lord, and G. M. Rheiner (eds), Proceedings of the 33rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Crawford, J. (2012). Developmental perspectives on the acquisition of the passive. Unpublished Ph.D. dissertation, University of Connecticut.
Cresswell, M. J. (1976). The semantics of degree. In B. H. Partee (ed.), Montague Grammar. New York: Academic Press, 261–92.
Cristià, Alejandrina and Seidl, Amanda (2008). Is infants’ learning of sound patterns constrained by phonological features? Language Learning and Development, 4: 203–27.
Cristià, Alejandrina, Seidl, Amanda, and Gerken, LouAnn (2011). Learning classes of sounds in infancy. University of Pennsylvania Working Papers in Linguistics, 17: 69–76.
Croft, W. A. (1990). Possible verbs and the structure of events. In S. L. Tsohatzidis (ed.), Meanings and Prototypes: Studies in Linguistic Categorization. London: Routledge, 48–73.
Croft, W. A. (1991). Syntactic Categories and Grammatical Relations. Chicago: University of Chicago Press.
Croft, W. A. (1994). The semantics of subjecthood. In M. Yaguello (ed.), Subjecthood and Subjectivity: The Status of the Subject in Linguistic Theory. Paris: Ophrys, 29–75.
Croft, W. A. (1998). Event structure in argument linking. In M. Butt and W. Geuder (eds), The Projection of Arguments: Lexical and Syntactic Constraints. Stanford: CSLI Publications, 21–63.
Croft, William A. and Cruse, D. A. (2004). Cognitive Linguistics (Cambridge Textbooks in Linguistics). Cambridge: Cambridge University Press.
Crystal, D. (1979). Prosodic development. In P. Fletcher and M. Garman (eds), Language Acquisition: Studies in First Language Development. Cambridge: Cambridge University Press, 174–97.
Culicover, P. W. and Jackendoff, R. (2005). Simpler Syntax. Oxford: Oxford University Press.
Cunillera, T., Toro, J. M., Sebastian-Galles, N., and Rodriguez-Fornells, A. (2006). The effects of stress and statistical cues on continuous speech segmentation: An event-related brain potential study. Brain Research, 1123: 168–78.
Cunillera, T., Gomila, A., and Rodríguez-Fornells, A. (2008). Beneficial effects of word final stress in segmenting a new language: Evidence from ERPs. BMC Neuroscience, 18: 9–23.
Cunillera, T., Càmara, E., Laine, M., and Rodríguez-Fornells, A. (2010). Words as anchors: Known words facilitate statistical learning. Experimental Psychology, 57: 134–41.
Currie-Hall, Kathleen (2006). Finding vowels without phonology? In Montreal-Ottawa-Toronto Phonology Workshop. Available from .
Currie Hall, Kathleen and Smith, E. Allyn (2006). Finding vowels without phonology? Handout from presentation at Montreal-Ottawa-Toronto Phonology Workshop.
Curtin, S. (2009). Twelve-month-olds learn novel word–object pairings differing only in stress pattern. Journal of Child Language, 36: 1157–65.

Curtin, S. (2010). Young infants encode lexical stress in newly encountered words. Journal of Experimental Child Psychology, 105: 376–85.
Curtin, S. (2011). Do newly formed word representations encode non-criterial information? Journal of Child Language, 38: 904–17.
Curtin, S. and Zuraw, K. (2002). Explaining constraint demotion in a developing system. Boston University Conference on Language Development, 26: 118–29.
Curtin, S., Mintz, T. H., and Christiansen, M. H. (2005). Stress changes the representational landscape: Evidence from word segmentation. Cognition, 96: 233–62.
Curtiss, S., Katz, W., and Tallal, P. (1992). Delay versus deviance in the language acquisition of language-impaired children. Journal of Speech, Language and Hearing Research, 35: 373–83.
Cutler, A. and Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2: 133–42.
Cutler, A. and Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14: 113–21.
Cutler, A., Mehler, J., Norris, D. G., and Segui, J. (1986). The syllable’s differing role in the segmentation of French and English. Journal of Memory and Language, 25: 385–400.
Dabrowska, E., Rowland, C., and Theakston, A. (2009). The acquisition of questions with long-distance dependencies. Cognitive Linguistics, 20(3): 571–97.
Daelemans, W. (2002). Review of Charles D. Yang, Knowledge and Learning in Natural Language. Glot International, 6(5): 137–42.
Daelemans, W. S., Gillis, S., and Durieux, G. (1994). The acquisition of stress: A data-oriented approach. Computational Linguistics, 20: 421–51.
Dahan, D. and Brent, M. R. (1999). On the discovery of novel wordlike units from utterances: An artificial-language study with implications for native-language acquisition. Journal of Experimental Psychology: General, 128: 165–85.
Dahl, Östen (1985). Tense and Aspect Systems. Oxford: Blackwell.
Dahl, Östen (ed.) (2000). Tense and Aspect in the Languages of Europe. Berlin: Mouton de Gruyter.
Daland, Robert, Hayes, Bruce, White, James, Garellek, Marc, Davis, Andrea, and Norrmann, Ingrid (2011). Explaining sonority projection effects. Phonology, 28: 197–234.
Dale, P. S., Dionne, G., Eley, T. C., and Plomin, R. (2000). Lexical and grammatical development: A behavioral genetic perspective. Journal of Child Language, 27: 619–42.
Dalston, Rodger M. (1972). A spectrographic analysis of the spectral and temporal acoustic characteristics of English semivowels spoken by three-year-old children and adults. Unpublished Ph.D. thesis, Northwestern University.
Daugherty, K. and Seidenberg, M. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S. D. Lima, R. L. Corrigan, and G. K. Iverson (eds), The Reality of Linguistic Rules. Amsterdam: John Benjamins.
Davidiak, E. and Grinstead, J. (2004). Root nonfinite forms in child Spanish. Paper presented at Generative Approaches to Language Acquisition North America.
Davidson, Lisa and Goldrick, Matthew (2003). Tense, agreement and defaults in child Catalan: An optimality theoretic analysis. In Silvina Montrul and Francisco Ordóñez (eds), Linguistic Theory and Language Development in Hispanic Languages: Papers from the 5th Hispanic Linguistics Symposium and the 4th Conference on the Acquisition of Spanish and Portuguese. Somerville, MA: Cascadilla Press, 193–211.
Davidson, L., Jusczyk, P. W., and Smolensky, P. (2004). The initial and final states: Theoretical implications and experimental explorations of richness of the base. In R. Kager, J. Pater, and

W. Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 321–68.
Davis, Barbara L. and MacNeilage, Peter F. (1990). Acquisition of correct vowel production: A quantitative case study. Journal of Speech and Hearing Research, 33: 16–27.
Davis, Barbara L. and MacNeilage, Peter F. (1995). The articulatory basis of babbling. Journal of Speech and Hearing Research, 38: 1199–211.
Davis, Stuart (1990). Italian onset structure and the distribution of il and lo. Linguistics, 28: 43–55.
Dayal, V. (1994). Scope marking as indirect wh dependency. Natural Language Semantics, 2: 137–70.
Dayal, Veneeta (2004). Number marking and (in)definiteness in kind terms. Linguistics and Philosophy, 27: 393–450.
Dayal, Veneeta (2009). Semantic variation and pleonastic determiners: The case of the plural definite generic. In Nguyen Chi Duy Khuong, Richa, and Samar Sinha (eds), The Fifth Asian GLOW: Conference Proceedings. CIIL (Mysore) and FOSSSIL (New Delhi).
De Crousaz, I. and Shlonsky, U. (2003). The distribution of a subject clitic pronoun in a Franco-Provençal dialect and the licensing of pro. Linguistic Inquiry, 34: 413–42.
De Haan, F. (2009). Irrealis: Fact or fiction? Language Sciences.
de Lacy, Paul (2002). The formal expression of markedness. Unpublished Ph.D. thesis, University of Massachusetts.
de Lacy, Paul (2006). Markedness: Reduction and Preservation in Phonology. Cambridge: Cambridge University Press.
De Lemos, Claudia (1981). Interactional processes in the child’s construction of language. In W. Deutsch (ed.), The Child’s Construction of Language. New York: Academic Press, 57–76.
De Neys, W. and Schaeken, W. (2007). When people are more logical under cognitive load: Dual task impact on scalar implicature. Experimental Psychology, 54: 128–33.
de Villiers, J. (1991). Why questions. In T. Maxfield and B. Plunkett (eds), Papers in the Acquisition of wh: Proceedings of the UMass Roundtable, May 1990. Amherst, MA: University of Massachusetts Occasional Papers.
de Villiers, J. G. (1995). Steps in the mastery of sentence complements. Paper presented at the biennial meeting of the Society for Research in Child Development, Indianapolis, IN.
de Villiers, J. G. (1999). On acquiring the structural representations for false complements. In B. Hollebrandse (ed.), New Perspectives on Language Acquisition. Amherst, MA: University of Massachusetts Occasional Papers.
de Villiers, J. G. (2001a). Extension, intension and other minds. In M. Almgren, A. Barrena, M.-J. Ezeizabarrena, I. Idiazabal, and B. MacWhinney (eds), Research in Child Language Acquisition: Proceedings of the 8th Conference of the International Association for the Study of Child Language. Somerville, MA: Cascadilla Press.
de Villiers, J. (2001b). Language acquisition, point of view and possible worlds. Introduction to symposium. In M. Almgren, A. Barrena, M.-J. Ezeizabarrena, and B. MacWhinney (eds), Research in Child Language Acquisition: Proceedings of the 8th Conference of the International Association for the Study of Child Language. Somerville, MA: Cascadilla Press.
de Villiers, J. G. (2004). Getting complements on your mental state (verbs). In J. Van Kampen and Sergio Baauw (eds), Proceedings of the 2003 GALA Conference. Utrecht: LOT, 13–26.
de Villiers, J. G. (2005). Can language acquisition give children a point of view? In J. Astington and J. Baird (eds), Why Language Matters for Theory of Mind. New York: Oxford University Press.

References   847 de Villiers, J. (2007). The interface of language and theory of mind. Lingua, 117(11): 1858–​78. de Villiers, J. G. and de Villiers, P. A. (1973a). A crosssectional study of the acquisition of grammatical morphemes. Journal of Psycholinguistic Research, 2: 267–​78. de Villiers, J. and de Villiers, P. (1973b). Development of the use of word order in comprehension. Journal of Psycholinguistic Research, 2: 331–​41. de Villiers, J. G. and de Villiers, P.A. (2000). Linguistic determinism and false belief. In P. Mitchell and K. Riggs (eds), Children’s Reasoning and the Mind. Hove: Psychology Press. de Villiers, J. G. and de Villiers, P.A. (2009). Complements enable representation of the contents of false belief: Evolution of a theory. In S. Foster-​Cohen (ed.), Language Acquisition. Basingstoke: Palgrave Macmillan. de Villiers J. G. and Pyers, J. (2002). Complements to cognition: A longitudinal study of the relationship between complex syntax and false-​belief understanding. Cognitive Development, 17: 1037–​60. de Villiers, J. and Roeper, T. (1991). Introduction: The acquisition of wh-​questions. In B. Plunkett and T. Maxfield (eds), University of Massachusetts Occasional Papers in Linguistics. The Acquisition of wh. de Villiers, J. and Roeper, T. (1995). Relative clauses are barriers to Wh-​movement for young children. Journal of Child Language, 22: 389–​404. de Villiers, J., Roeper, T., and Vainikka, A. (1990). The acquisition of long distance rules. In L. Frazier and J. G. de Villiers (eds), Language Processing and Acquisition. Dordrecht: Kluwer. de Villiers, J. G., Curran, L., Philip, W., and DeMunn, H. (1998). Acquisition of the quantificational properties of mental predicates. In Proceedings of the 22nd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. de Villiers, J., Cahillane, J., and Altreuter, E. (2005). Touchy subject: Optimality and coreference. Unpublished manuscript, Smith College. 
de Villiers, J., Cahillane, J., and Altreuter, E. (2006). What can production reveal about Principle B? In K. U. Deen, J. Nomura, B. Schulz, and B. D. Schwartz (eds), Proceedings of the Inaugural Conference on Generative Approaches to Language Acquisition—North America, University of Connecticut Occasional Papers in Linguistics 4(1).
de Villiers, J. G., Roeper, T., Bland-Stewart, L., and Pearson, B. (2008). Answering hard questions: Wh-movement across dialects and disorder. Applied Psycholinguistics, 29: 67–103.
de Villiers, J. G., de Villiers, P. A., and Roeper, T. (2011). Wh-questions: Moving beyond the first phase. Lingua, 121(3): 352–66.
de Villiers, J. G., Harrington, E., Gadilauskas, E., and Roeper, T. (2012). Tense and truth in children’s question answering. In A. Biller, E. Chung, and A. Kimball (eds), Proceedings of the 36th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
de Villiers, P. and de Villiers, J. (1978). Language Acquisition. Cambridge, MA: Harvard University Press.
de Villiers, P. A., Burns, F., and Pearson, B. Z. (2003). The role of language in the theory of mind development of language-impaired children: Complementing theories. In B. Beachley, A. Brown, and F. Conlin (eds), Proceedings of the 27th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
de Villiers, P. A., de Villiers, J. G., Coles-White, D., and Carpenter, L. A. (2009). Acquisition of relevance implicatures in typically-developing children and children with autism. In J. Chandlee, M. Franchini, S. Lord, and G. Rheiner (eds), Proceedings of the 33rd Annual Boston University Conference on Language Development. Boston, MA: Cascadilla Press, 1: 121–32.

DeBoer, Bart and Kuhl, Patricia K. (2003). Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online, 4: 129–34.
Deen, K. U. (2003). The acquisition of Nairobi Swahili: The morphosyntax of inflectional prefixes and subjects. Unpublished Ph.D. thesis, University of California.
Deen, Kamil Ud (2005). The Acquisition of Swahili. Amsterdam: John Benjamins.
Deen, Kamil Ud and Hyams, Nina (2006). The morphosyntax of mood in early grammar with special reference to Swahili. First Language, 26(1): 67–102.
DeFries, J. C. and Fulker, D. W. (1985). Multiple regression analysis of twin data. Behavior Genetics, 15(5): 467–73.
Dehaene, S. (1997). The Number Sense: How the Mind Creates Mathematics. New York: Oxford University Press.
Dehaene, S. (2009). Origins of mathematical intuitions. Annals of the New York Academy of Sciences, 1156(1): 232–59.
Dehé, Nicole, Jackendoff, Ray, McIntyre, Andrew, and Urban, Silke (2002). Verb–Particle Explorations. Berlin/New York: Mouton de Gruyter.
Delattre, P. C. (1966). A comparison of syllable length conditioning among languages. International Journal of Applied Linguistics, 4: 182–98.
Delfitto, D. (2002). On the semantics of pronominal clitics and some of its consequences. Catalan Journal of Linguistics, 1: 41–69.
Delidaki, Sophia (2006). The acquisition of tense and aspect in child Greek. Unpublished Ph.D. thesis, University of Reading.
Dell, G. S., Reed, K. D., Adams, D. R., and Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26: 1355–67.
Demirdache, Hamida and Uribe-Etxebarria, Myriam (2000). The primitives of temporal relations. In R. Martin, D. Michaels, and J. Uriagereka (eds), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, MA: MIT Press, 157–86.
Demirdache, Hamida and Uribe-Etxebarria, Myriam (2007). The syntax of time arguments. Lingua, 117: 330–66.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1): 1–38.
Demuth, Katherine (1989). Maturation and the acquisition of the Sesotho passive. Language, 65: 56–80.
Demuth, K. (1990). Subject, topic and the Sesotho passive. Journal of Child Language, 17: 67–84.
Demuth, K. (1993). Issues in the acquisition of the Sesotho tonal system. Journal of Child Language, 20: 275–301.
Demuth, K. (1994). On the “underspecification” of functional categories in early grammars. In B. Lust, M. Suñer, and J. Whitman (eds), Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates, 119–34.
Demuth, K. (1995a). The acquisition of tonal systems. In J. Archibald (ed.), Phonological Acquisition and Phonological Theory. Hillsdale, NJ: Lawrence Erlbaum, 111–34.
Demuth, Katherine (1995b). Markedness and the development of prosodic structure. In Jill Beckman (ed.), Proceedings of the North East Linguistic Society 25, Volume 2: Papers from the Workshops on Language Acquisition and Language Change. Amherst, MA: GLSA, University of Massachusetts, 13–25.

Demuth, K. (1996). The prosodic structure of early words. In J. L. Morgan and K. Demuth (eds), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Mahwah, NJ: Lawrence Erlbaum, 171–84.
Demuth, K. (1998). Argument structure and the acquisition of Sesotho applicatives. Linguistics, 36: 781–806.
Demuth, Katherine (2001). Prosodic constraints on morphological development. In J. Weissenborn and B. Höhle (eds), Approaches to Bootstrapping: Phonological, Syntactic and Neurophysiological Aspects of Early Language Acquisition. Amsterdam: John Benjamins, 3–21.
Demuth, K. (2008). Exploiting corpora for language acquisition research. In H. Behrens (ed.), Corpora in Language Acquisition Research: History, Methods, Perspectives. Amsterdam: John Benjamins, 199–205.
Demuth, Katherine (2009). The prosody of syllables, words and morphemes. In E. Bavin (ed.), Cambridge Handbook on Child Language. Cambridge: Cambridge University Press, 183–98.
Demuth, Katherine (2011). The acquisition of phonology. In John A. Goldsmith, Jason Riggle, and Alan Yu (eds), The Handbook of Phonological Theory. Malden, MA: Blackwell, 571–95.
Demuth, Katherine and Ellis, David (2009). Revisiting the acquisition of Sesotho noun class prefixes. In J. Guo, E. Lieven, S. Ervin-Tripp, N. Budwig, Ş. Özçalışkan, and K. Nakamura (eds), Crosslinguistic Approaches to the Psychology of Language: Festschrift for Dan Slobin. Hillsdale, NJ: Lawrence Erlbaum, 131–48.
Demuth, Katherine and Fee, E. Jane (1995). Minimal words in early phonological development. Unpublished manuscript.
Demuth, Katherine and McCullough, E. (2009). The prosodic (re)organization of children’s early English articles. Journal of Child Language, 36: 173–200.
Demuth, Katherine and Tremblay, Annie (2008). Prosodically-conditioned variability in children’s production of French determiners. Journal of Child Language, 35: 99–127.
Demuth, K., Machobane, M., Moloi, F., and Odato, C. (2005).
Learning animacy hierarchy effects in Sesotho double object applicatives. Language, 81(2): 421–47.
Demuth, Katherine, Culbertson, Jennifer, and Alter, Jennifer (2006). Word-minimality, epenthesis and coda licensing in the early acquisition of English. Language and Speech, 49: 137–74.
Demuth, Katherine, Moloi, Francina, and Machobane, Malillo (2010). Three-year-olds’ comprehension, production and generalization of Sesotho passives. Cognition, 115: 238–51.
den Besten, H. and Edmondson, J. (1983). The verbal complex in continental West Germanic. In W. Abraham (ed.), On the Formal Syntax of the Westgermania: Papers from the 3rd Groningen Grammar Talks. Amsterdam/Philadelphia: John Benjamins, 155–216.
Dennis, M., Sugar, J., and Whitaker, H. A. (1982). The acquisition of tag questions. Child Development, 53: 1254–7.
Déprez, V. and Pierce, A. (1993). Negation and functional projections in early grammar. Linguistic Inquiry, 24(1): 25–67.
Déprez, V. and Pierce, A. (1994). Crosslinguistic evidence for functional projections in early child grammar. In Teun Hoekstra and Bonnie D. Schwartz (eds), Language Acquisition Studies in Generative Grammar. Amsterdam: John Benjamins, 57–84.
Derwing, B. L. and Baker, W. J. (1986). Assessing morphological development. In P. J. Fletcher and M. Garman (eds), Language Acquisition: Studies in First Language Development, 2nd edn. Cambridge: Cambridge University Press, 326–38.

Di Sciullo, A. M. and Agüero-Bautista, C. (2008). The Delay of Principle B Effect (DPBE) and its absence in some languages. Language and Speech, 51(1–2): 77–100.
Diesendruck, G. (2005). The principles of conventionality and contrast in word learning: An empirical examination. Developmental Psychology, 41: 451–63.
Diesendruck, G. and Markson, L. (2001). Children’s avoidance of lexical overlap: A pragmatic account. Developmental Psychology, 37: 630–41.
Diesing, Molly (1992). Indefinites. Cambridge, MA: MIT Press.
Diessel, H. (2004). The Acquisition of Complex Sentences (Cambridge Studies in Linguistics 105). Cambridge: Cambridge University Press.
Diessel, H. and Tomasello, M. (2001). The acquisition of finite complement clauses in English: A corpus-based analysis. Cognitive Linguistics, 12: 1–45.
Dietrich, C., Swingley, D., and Werker, J. (2007). Native language governs interpretation of salient speech sound differences at 18 months. Proceedings of the National Academy of Sciences, 104(41): 16027–31.
Dikken, Marcel den (1995). Particles: On the Syntax of Verb-particle, Triadic and Causative Constructions. New York: Oxford University Press.
Dillon, B., Dunbar, E., and Idsardi, W. (2011). A single stage approach to learning phonological categories: Insights from Inuktitut. Manuscript, University of Massachusetts, Amherst, and University of Maryland, College Park.
Dillon, Brian, Dunbar, Ewan, and Idsardi, William (2013). A single-stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science, 37: 344–77.
Dinnsen, Daniel A. (1984). Methods and empirical issues in analyzing functional misarticulation. In Mary Elbert, Daniel A. Dinnsen, and Gary Weismer (eds), Phonological Theory and the Misarticulating Child (ASHA Monographs No. 22). Rockville, MD: American Speech-Language-Hearing Association, 5–17.
Dinnsen, Daniel A. (1992). Variation in developing and fully developed phonologies. In Charles A.
Ferguson, Lise Menn, and Carol Stoel-Gammon (eds), Phonological Development: Models, Research, Implications. Timonium, MD: York Press, 191–210.
Dinnsen, Daniel A. (1996). Context effects in the acquisition of fricatives. In Proceedings of the UBC International Conference on Phonological Acquisition, 136–48.
Dinnsen, Daniel A. (2008). Fundamentals of Optimality Theory. In Daniel A. Dinnsen and Judith A. Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd., 3–36.
Dinnsen, Daniel A. (2011). On the unity of children’s phonological error patterns: Distinguishing symptoms from the problem. Clinical Linguistics and Phonetics, 25: 968–74.
Dinnsen, Daniel A. and Barlow, Jessica A. (1998). On the characterization of a chain shift in normal and delayed phonological acquisition. Journal of Child Language, 25: 61–94.
Dinnsen, Daniel A. and Elbert, Mary (1984). On the relationship between phonology and learning. In Mary Elbert, Daniel A. Dinnsen, and Gary Weismer (eds), Phonological Theory and the Misarticulating Child (ASHA Monographs No. 22). Rockville, MD: ASHA, 59–68.
Dinnsen, Daniel A. and Farris-Trimble, Ashley W. (2008a). An opacity-tolerant conspiracy in phonological acquisition. In Ashley W. Farris-Trimble and Daniel A. Dinnsen (eds), Phonological Opacity Effects in Optimality Theory. Bloomington, IN: IULC Publications, 99–118.
Dinnsen, Daniel A. and Farris-Trimble, Ashley W. (2008b). The prominence paradox. In Daniel A. Dinnsen and Judith A. Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd., 277–308.

Dinnsen, Daniel A. and Gierut, Judith A. (2008). Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd.
Dinnsen, Daniel A. and Maxwell, Edith M. (1981). Some phonology problems from functional speech disorders. Innovations in Linguistics Education, 2: 79–98.
Dinnsen, Daniel A., Chin, Steven B., Elbert, Mary, and Powell, Thomas W. (1990). Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. Journal of Speech and Hearing Research, 33: 28–37.
Dinnsen, Daniel A., Chin, Steven B., and Elbert, Mary (1992). On the lawfulness of change in phonetic inventories. Lingua, 86: 207–22.
Dinnsen, Daniel A., O’Connor, Kathleen M., and Gierut, Judith A. (2001). The puzzle-puddle-pickle problem and the Duke-of-York gambit in acquisition. Journal of Linguistics, 37: 503–25.
Dionne, G., Dale, P. S., Boivin, M., and Plomin, R. (2003). Genetic evidence for bidirectional effects of early lexical and grammatical development. Child Development, 74: 394–412.
Dixon, R. M. W. (1994). Ergativity. Cambridge: Cambridge University Press.
Dobrovie-Sorin, Carmen and Laca, Brenda (1998). La généricité entre la référence à l’espèce et la quantification générique. Actes de Langues et Grammaires, III: 165–79.
Donaldson, M. and Balfour, G. (1968). Less is more: A study of language comprehension in children. British Journal of Psychology, 59: 461–71.
Donaldson, M. and Wales, R. J. (1970). On the acquisition of some relational terms. In J. R. Hayes and R. Brown (eds), Cognition and the Development of Language. New York: John Wiley and Sons, 235–68.
Donegan, Patricia J. and Stampe, David (1979). The study of natural phonology. In Daniel A. Dinnsen (ed.), Current Approaches to Phonological Theory. Bloomington, IN: Indiana University Press, 126–73.
Dowty, D. R. (1979). Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and in Montague’s PTQ.
Dordrecht: D. Reidel.
Dowty, D. R. (1989). On the semantic content of the notion “thematic role.” In G. Chierchia, B. H. Partee, and R. Turner (eds), Properties, Types, and Meaning. Dordrecht: Kluwer, 69–130.
Dowty, D. R. (1991). Thematic proto-roles and argument selection. Language, 67: 547–619.
Doya, Kenji, Ishii, Shin, Pouget, Alexandre, and Rao, Rajesh P. N. (2007). The Bayesian Brain: Probabilistic Approaches to Neural Coding. Cambridge, MA: MIT Press.
Drachman, Gaberell (1978). Child language and language change: A conjecture and some refutations. In Jacek Fisiak (ed.), Recent Developments in Historical Phonology. The Hague: Mouton, 123–44.
Dresher, B. E. (1999). Charting the learning path: Cues to parameter setting. Linguistic Inquiry, 30(1): 27–67.
Dresher, B. Elan (2009). The Contrastive Hierarchy in Phonology. Cambridge: Cambridge University Press.
Dresher, B. E. and Kaye, J. D. (1990). A computational learning model for metrical phonology. Cognition, 34: 137–95.
Driva, E. and Terzi, A. (2007). Children’s passives and the theory of grammar. In A. Gavarró and M. J. Freitas (eds), Proceedings of GALA. Newcastle: Cambridge Scholars Publishing.
Drozd, K. (2000). Children’s weak interpretation of universally quantified sentences. In M. Bowerman and S. Levinson (eds), Conceptual Development and Language Acquisition. Cambridge: Cambridge University Press, 340–76.
Duanmu, San (2007). The Phonology of Standard Chinese, 2nd edn. Oxford: Oxford University Press.

Duda, R., Hart, P., and Stork, D. (2000). Pattern Classification. Oxford: Wiley-Interscience.
Duffley, P. (1992). The English Infinitive. London: Longman.
Dunbar, Ewan and Idsardi, William J. (2010). Review of Daniel Silverman (2006), A Critical Introduction to Phonology: Of Sound, Mind, and Body. Phonology, 27: 325–31.
Dyck, Carrie (1995). Constraining the phonology–phonetics interface: With exemplification from Spanish and Italian dialects. Unpublished Ph.D. thesis, University of Toronto.
Ebbels, S. and van der Lely, H. (2001). Metasyntactic therapy using visual coding for children with severe persistent SLI. International Journal of Language and Communication Disorders, 36 (Supplement): 345–50.
Ebeling, K. S. and Gelman, S. A. (1988). Coordination of size standards by young children. Child Development, 59: 888–96.
Ebeling, K. S. and Gelman, S. A. (1994). Children’s use of context in interpreting big and little. Child Development, 65: 1178–92.
Echols, C. and Newport, E. (1992). The role of stress and position in determining first words. Language Acquisition, 2: 189–220.
Echols, C., Crowhurst, M. J., and Childers, J. B. (1997). The perception of rhythmic units in speech by infants and adults. Journal of Memory and Language, 36: 202–25.
Eddington, D. (2000). Spanish stress assignment within the analogical modeling of language. Language, 76: 92–109.
Edwards, Mary Louise (1974). Perception and production in child phonology: The testing of four hypotheses. Journal of Child Language, 1: 205–19.
Edwards, Jan and Beckman, Mary (2008). Some cross-linguistic evidence for modulation of implicational universals by language-specific frequency effects in phonological development. Language Learning and Development, 4: 122–56.
Edwards, Mary Louise and Shriberg, Lawrence D. (1983). Phonology: Applications in Communicative Disorders. San Diego, CA: College-Hill Press.
Edwards, Jan, Beckman, Mary E., and Munson, Benjamin (2004).
The interaction between vocabulary size and phonotactic probability effects on children’s production accuracy and fluency in nonword repetition. Journal of Speech, Language, and Hearing Research, 47: 421–36.
Ehri, L. C. (1976). Comprehension and production of adjectives in seriation. Journal of Child Language, 3: 369–84.
Eilers, R. E. and Oller, D. Kimbrough (1976). The role of speech discrimination in developmental sound substitutions. Journal of Child Language, 3: 319–29.
Eilers, Rebecca E., Wilson, Wesley R., and Moore, John M. (1977). Developmental changes in speech discrimination in infants. Journal of Speech and Hearing Research, 20: 766–80.
Eimas, Peter D. and Miller, Joanne L. (1981). Organization in the perception of segmental and suprasegmental information by infants. Infant Behavior and Development, 4: 395–9.
Eimas, Peter D., Siqueland, Einar R., Jusczyk, Peter W., and Vigorito, James (1971). Speech perception in infants. Science, 171: 303–6.
Eisenbeiss, Sonja (1993). Auxiliaries and the acquisition of the passive. In Eve Clark (ed.), The Proceedings of the 25th Annual Child Language Research Forum. Stanford, CA: Center for the Study of Language and Information, 235–42.
Eisenbeiß, S. (2000). The acquisition of the determiner phrase in German child language. In M.-A. Friedemann and L. Rizzi (eds), The Acquisition of Syntax: Studies in Comparative Developmental Linguistics. London: Longman, 26–63.
Eisenbeiss, S., Bartke, S., and Clahsen, H. (2005/06). Structural and lexical case in child German: Evidence from language-impaired and typically-developing children. Language Acquisition, 13(1): 3–32.

Eisenbeiß, S., Matsuo, A., and Sonnenstuhl, I. (2009). Learning to encode possession. In W. McGregor (ed.), The Expression of Possession. Berlin: de Gruyter, 143–211.
Eisenberg, S. and Cairns, H. (1994). The development of infinitives from three to five. Journal of Child Language, 21: 713–34.
Eisner, Jason (1997). Efficient generation in primitive Optimality Theory. In Proceedings of the 35th Annual Meeting of the ACL and 8th Conference of the EACL. Madrid, 313–20.
Eisner, Jason (2000a). Directional constraint evaluation in Optimality Theory. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000). Saarbrücken, Germany, 257–63.
Eisner, Jason (2000b). Easy and hard constraint ranking in Optimality Theory: Algorithms and complexity. In Jason Eisner, Lauri Karttunen, and Alain Theriault (eds), Proceedings of the 5th Workshop of the ACL Special Interest Group in Computational Phonology. Luxembourg, 22–33.
Eisner, Jason (2002). Comprehension and compilation in Optimality Theory. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, 56–63.
Elbert, Mary and Gierut, Judith A. (1986). Handbook of Clinical Phonology: Approaches to Assessment and Treatment. Austin, TX: Pro-Ed.
Elbert, Mary, Dinnsen, Daniel A., and Weismer, Gary (1984). Phonological Theory and the Misarticulating Child (ASHA Monographs No. 22). Rockville, MD: ASHA.
Elbourne, P. (2003). Are our children speaking Middle English? Paper presented at the CUNY Syntax Supper.
Elbourne, P. (2005). On the acquisition of Principle B. Linguistic Inquiry, 36: 333–65.
Elías-Ulloa, J. (2008). Subject doubling and the mixed null subject system of Capanahua. In Proceedings of the Conference on Indigenous Languages of Latin America-III (CILLA-III). Austin, TX: University of Texas.
Elisha, I. (1997). Functional categories and null subjects in Hebrew and child Hebrew. Unpublished Ph.D. thesis, City University of New York.
Elisha, I.
and Valian, V. (2012). Two-year-olds’ use of syntactic features in learning subjects in Hebrew. Unpublished manuscript, Bar Ilan University and Hunter College.
Elliot, C. D., Smith, P., and McCullough, K. (1996). Verbal comprehension scale. In British Ability Scales, 2nd edn. Windsor: nferNelson.
Ellison, Mark (1994). Phonological derivation in Optimality Theory. In Proceedings of the Fifteenth International Conference on Computational Linguistics, 1007–13.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14: 179–211.
Elman, J. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48: 71–99.
Elman, Jeffrey L., Bates, Elizabeth A., Johnson, Mark H., Karmiloff-Smith, Annette, Parisi, Domenico, and Plunkett, Kim (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press/Bradford Books.
Embick, D. (2004). On the structure of resultative participles. Linguistic Inquiry, 35: 355–92.
Emonds, Joseph E. (1985). A Unified Theory of Syntactic Categories. Dordrecht: Foris.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., Monaco, A. P., and Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418: 869–72.
Engstrand, O., Williams, K., and Strömqvist, S. (1991). Acquisition of the Swedish tonal word accent contrast. In Proceedings of the 12th International Congress of Phonetic Sciences (ICPhS), 324–7.

Eom, Soyoung and Snyder, William (2012). Children’s acquisition of English datives: Competing parametric accounts. In Y. Otsu (ed.), Proceedings of the 2012 Tokyo Conference on Psycholinguistics. Tokyo: Hituzi Shobo, 41–59.
Erreich, A. (1984). Learning how to ask: Patterns of inversion in yes/no and wh-questions. Journal of Child Language, 11: 579–92.
Escobar, L. and Gavarró, A. (1999). The acquisition of Catalan clitics and its implications for complex verb structure. Report de Recerca GGT-99-3, Grup de Gramàtica Teòrica, Universitat Autònoma de Barcelona, Barcelona, Spain.
Ettlinger, Marc (2009). Phonological chain shifts during acquisition: Evidence for lexical optimization. In M. Abdurrahman, A. Schardl, and M. Walkow (eds), Proceedings of the 38th Annual Meeting of the North East Linguistic Society. Amherst, MA: GLSA, 259–69.
Everaert, M. (1986). The Syntax of Reflexivization. Dordrecht: Foris.
Everett, D. (2005). Cultural constraints on grammar and cognition in Pirahã. Current Anthropology, 46(4): 621–46.
Ezeizabarrena, M. J. (1996). Adquisición de la morfología verbal en euskera y castellano por niños bilingües [Acquisition of verbal morphology in Basque and Spanish by bilingual children]. Unpublished Ph.D. thesis, University of Hamburg.
Fabricius, W., Sophian, C., and Wellman, H. (1987). Young children’s sensitivity to logical necessity in their inferential search behavior. Child Development, 58: 409–23.
Fagerheim, T., Raeymaekers, P., Tonnessen, F. E., Pedersen, M., Tranebjaerg, L., and Lubs, H. A. (1999). A new gene (DYX3) for dyslexia is located on chromosome 2. Journal of Medical Genetics, 36: 664–9.
Fais, Laurel, Kajikawa, Sachiyo, Amano, Shigeaki, and Werker, Janet F. (2009). Infant discrimination of a morphologically relevant word-final contrast. Infancy, 14: 488–99.
Falcaro, M., Pickles, A., Newbury, D. F., Addis, L., Banfield, E., Fisher, S. E., Monaco, A. P., Simkin, Z., Conti-Ramsden, G., and The SLI Consortium (2008).
Genetic and phenotypic effects of phonological short-term memory and grammatical morphology in specific language impairment. Genes, Brain, and Behavior, 7: 393–402.
Fanselow, G. and Mahajan, A. (2000). Towards a minimalist theory of wh-expletives, wh-copying, and successive cyclicity. In U. Lutz, G. Müller, and A. von Stechow (eds), Wh-Scope Marking. Amsterdam: Benjamins, 195–230.
Faraone, S. V., Doyle, A. E., Mick, E., and Biederman, J. (2001). Meta-analysis of the association between the 7-repeat allele of the dopamine D(4) receptor gene and attention deficit hyperactivity disorder. American Journal of Psychiatry, 158: 1052–7.
Farris[-Trimble], Ashley W. and Gierut, Judith A. (2005). Statistical regularities of the input as predictive of phonological acquisition. Paper presented at the Symposium on Research in Child Language Disorders, University of Wisconsin, Madison, WI.
Farris-Trimble, Ashley W. and Gierut, Judith A. (2008). Gapped s-cluster inventories and faithfulness to the marked. In Daniel A. Dinnsen and Judith A. Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd., 377–406.
Fee, E. J. (1995). Two strategies in the acquisition of syllable and word structure. Child Language Research Forum, 27: 29–38.
Feeney, A., Scrafton, S., Duckworth, A., and Handley, S. J. (2004). The story of “some”: Everyday pragmatic inference by children and adults. Canadian Journal of Experimental Psychology, 58: 121–32.
Feider, H. (1973). Comparatives in early child language. Glossa, 7: 3–20.
Feigenson, L. and Carey, S. (2003). Tracking individuals via object-files: Evidence from infants’ manual search. Developmental Science, 6: 568–84.

Feigenson, L. and Halberda, J. (2004). Infants chunk object arrays into sets of individuals. Cognition, 91: 173–90.
Feigenson, L. and Halberda, J. (2008). Conceptual knowledge increases infants’ memory capacity. Proceedings of the National Academy of Sciences, 105(29): 9926–30.
Feigenson, L., Dehaene, S., and Spelke, E. S. (2004). Core systems of number. Trends in Cognitive Sciences, 8(7): 307–14.
Feldman, H., Goldin-Meadow, S., and Gleitman, L. R. (1978). Beyond Herodotus: The creation of language by isolated deaf children. In J. Locke (ed.), Action, Gesture and Symbol. New York: Academic Press, 351–414.
Feldman, L. B. and Fowler, C. A. (1987). The inflected noun system in Serbo-Croatian: Lexical representation of morphological structure. Memory and Cognition, 15: 1–12.
Feldman, N. (2011). Interactions between word and speech sound categorization in language acquisition. Unpublished Ph.D. thesis, Brown University.
Feldman, N., Griffiths, T., and Morgan, J. (2009a). The influence of categories on perception: Explaining the perceptual magnet effect as optimal statistical inference. Psychological Review, 116: 752–82.
Feldman, N., Griffiths, T., and Morgan, J. (2009b). Learning phonetic categories by learning a lexicon. In Proceedings of the 31st Annual Conference of the Cognitive Science Society.
Feldman, N., Griffiths, T., Goldwater, S., and Morgan, J. (2013). A role for the developing lexicon in phonetic category acquisition. Psychological Review, 120(4): 751–78.
Felser, C. (2001). Wh-expletives and secondary predication: German partial wh-movement reconsidered. Journal of Germanic Linguistics, 13: 5–38.
Felser, C. (2004). Wh-copying, phases, and successive cyclicity. Lingua, 114: 543–74.
Fennell, Christopher T. (2004). Infant attention to phonetic detail in word forms: Knowledge and familiarity effects. Unpublished Ph.D. thesis, University of British Columbia.
Fennell, Christopher T. (2006).
Infants of 14 months use phonetic detail in novel words embedded in naming phrases. In Proceedings of the 30th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Fennell, Christopher T. and Werker, Janet F. (2003). Early word learners’ ability to access phonetic detail in well-known words. Language and Speech, 46: 245–64.
Fennell, Christopher T. and Werker, Janet F. (2004). Infant attention to phonetic detail: Knowledge and familiarity effects. In Proceedings of the 28th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Fennell, Christopher T., Waxman, Sandra R., and Weisleder, Adriana (2007). With referential cues, infants successfully use phonetic detail in word learning. In Proceedings of the 31st Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 178–89.
Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D., and Pethick, S. J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59.
Fenson, L., Pethick, S., Renda, C., Cox, J. L., Dale, P. S., and Reznick, J. S. (1997). Technical manual and user’s guide for the MacArthur Communicative Developmental Inventories: Short form versions. Unpublished manuscript, San Diego State University.
Ferdinand, Astrid (1996). The acquisition of the subject in French. Unpublished Ph.D. thesis, HIL/Leiden University.
Ferguson, Charles A. (1978). Fricatives in child language acquisition. In Vladimir Honsa and M. J. Hardman-de-Bautista (eds), Papers on Linguistics and Child Language. The Hague: Mouton, 93–115.

Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2): 209–30.
Fernandes, K. J., Marcus, G. F., DiNubila, J. A., and Vouloumanos, A. (2006). From semantics to syntax and back again: Argument structure in the third year of life. Cognition, 100: B10–B20.
Fernau, Henning (2003). Identification of function distinguishable languages. Theoretical Computer Science, 290: 1679–711.
Ferrand, Ludovic, Segui, Juan, and Grainger, Jonathan (1996). Masked priming of word and picture naming: The role of syllabic units. Journal of Memory and Language, 35: 708–23.
Ferreira, F. (1994). Choice of passive voice is affected by verb type and animacy. Journal of Memory and Language, 33: 715–36.
Fiengo, R. and May, R. (1994). Indices and Identity. Cambridge, MA: MIT Press.
Fikkert, Paula (1994). On the acquisition of prosodic structure. Unpublished Ph.D. thesis, HIL/University of Leiden.
Fikkert, Paula (2005). Getting sound structures in mind: Acquisition bridging linguistics and psychology? In Anne Cutler (ed.), Twenty-first Century Psycholinguistics: Four Cornerstones. Mahwah, NJ: Lawrence Erlbaum, 43–56.
Fikkert, Paula (2006). Developing representations and the emergence of phonology: Evidence from perception and production. Presented at the meeting of LabPhon 10, Paris.
Fikkert, Paula (2007). Acquiring phonology. In Paul de Lacy (ed.), The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 537–54.
Fikkert, Paula (2010). Developing representations and the emergence of phonology: Evidence from perception and production. In Cécile Fougeron, Barbara Kühnert, Mariapaola D’Imperio, and Nathalie Vallée (eds), Laboratory Phonology 10. Berlin: De Gruyter, 227–60.
Fikkert, Paula and Freitas, Maria J. (2004). The role of language-specific phonotactics in the acquisition of onset clusters. In Leonie Cornips and Jenny Doetjes (eds), Linguistics in the Netherlands 2004.
Amsterdam: John Benjamins, 58–68. Fikkert, Paula and Levelt, Clara C. (2008). How does place fall into place? The lexicon and emergent constraints in the developing phonological grammar. In Peter Avery, B. Elan Dresher, and Keren Rice (eds), Contrast in Phonology: Theory, Perception, Acquisition. Berlin: Mouton de Gruyter, 231–70. Filip, Hana (1993). Aspect, situation types and nominal reference. Unpublished Ph.D. thesis, University of California. [Published in 1999 as Aspect, Eventuality Types and Noun Phrase Semantics. New York: Garland Publishing.] Filip, Hana (2003). Prefixes and the delimitation of events. Journal of Slavic Linguistics, 11(1): 55–101. Filip, Hana (2012). Aspectual class and Aktionsart. In K. von Heusinger, C. Maienborn, and P. Portner (eds), Semantics: An International Handbook of Natural Language Meaning. Berlin/New York: Mouton de Gruyter. Filip, Hana and Rothstein, Susan (2005). Telicity as a semantic parameter. In J. Lavine, S. Franks, H. Filip, and M. Tasseva-Kurktchieva (eds), Formal Approaches to Slavic Linguistics (FASL 14). Ann Arbor, MI: University of Michigan Slavic Publications, 139–56.

Fillmore, C. J. (1968). The case for case. In E. Bach and R. Harms (eds), Universals in Linguistic Theory. New York: Holt, Rinehart and Winston. Finch-Williams, A. (1981). Biggerest or biggester: A study in children’s acquisition of linguistic features of comparative adjectives and nonlinguistic knowledge of seriation. Unpublished Ph.D. thesis, University of Kansas. Finestack, L. H. and Abbeduto, L. (2010). Expressive language profiles of verbally expressive adolescents and young adults with Down Syndrome or Fragile X Syndrome. Journal of Speech, Language and Hearing Research, 53: 1334–48. Finley, Sara (2011). The privileged status of locality in consonant harmony. Journal of Memory and Language, 65: 74–83. Finley, Sara and Badecker, William (2009). Artificial language learning and feature-based generalization. Journal of Memory and Language, 61: 423–37. Finn, A. S. and Kam, C. L. (2008). The curse of knowledge: First language knowledge impairs adult learners’ use of novel statistics for word segmentation. Cognition, 108: 477–99. Fischer, K. M. (1973). A comparison of the similarities in language skills of identical and fraternal twin pairs. Unpublished Ph.D. thesis, University of Pennsylvania. Fischer, Marcus (2005). A Robbins-Monro type learning algorithm for an entropy maximizing version of Stochastic Optimality Theory. MA thesis, Humboldt University, Berlin. Fiser, J. and Aslin, R. N. (2002a). Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences, 99(24): 15822–6. Fiser, J. and Aslin, R. N. (2002b). Statistical learning of higher-order temporal structure from visual shape-sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28: 458–67. Fisher, C. (1994). Structure and meaning in the verb lexicon: Input for a syntax-aided verb learning procedure. Language and Cognitive Processes, 9: 473–518. Fisher, C. (2000).
From form to meaning: A role for structural analogy in the acquisition of language. In H. W. Reese (ed.), Advances in Child Development and Behavior. New York: Academic Press, 27: 1–53. Fisher, C. (2002). Structural limits on verb mapping: The role of abstract structure in 2.5-year-olds’ interpretations of novel verbs. Developmental Science, 5: 55–64. Fisher, Cynthia, Gleitman, Henry, and Gleitman, Lila R. (1991). On the semantic content of subcategorization frames. Cognitive Psychology, 23: 331–92. Fisher, C., Hall, G., Rakowitz, S., and Gleitman, L. R. (1994). When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua, 92: 333–75. Fisher, S. E. (2006). Tangled webs: Tracing the connections between genes and cognition. Cognition, 101: 270–97. Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P., and Pembrey, M. E. (1998). Localization of a gene implicated in a severe speech and language disorder. Nature Genetics, 18: 168–70. Fitzpatrick, J. M. (2006). Deletion through movement. Natural Language and Linguistic Theory, 24: 399–431. Flack, Kathryn (2007a). Templatic morphology and indexed markedness constraints. Linguistic Inquiry, 38: 749–58. Flack, Kathryn (2007b). The sources of phonological markedness. Unpublished Ph.D. thesis, University of Massachusetts. Flege, James E. (1987). The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15: 47–65.

Flemming, Edward (2001). Scalar and categorical phenomena in a unified model of phonetics and phonology. Phonology, 18: 7–44. Fletcher, J. (1991). Rhythm and final lengthening in French. Journal of Phonetics, 19: 193–212. Fletcher, Paul (1985). A Child’s Learning of English. Oxford: Blackwell. Foch, T. T. and Plomin, R. (1980). Specific cognitive abilities in 5- to 12-year-old twins. Behavior Genetics, 10: 507–20. Fodor, J. A. (1966). How to learn to talk: Some simple ways. In F. Smith and G. Miller (eds), The Genesis of Language. Cambridge, MA: MIT Press, 105–22. Fodor, J. A. and Garrett, M. (1966). Some reflections on competence and performance. In J. Lyons and R. J. Wales (eds), Psycholinguistic Papers. Edinburgh: University of Edinburgh Press. Fodor, J. (1983). Modularity of Mind. Cambridge, MA: MIT Press. Fodor, J. D. (1985). Why learn lexical rules? Paper presented at the Tenth Annual Boston University Conference on Language Development, October 25–27. Written up as “The procedural solution to the projection problem”. Unpublished manuscript, City University of New York. Fodor, J. D. (1992). Designated triggers versus the Subset Principle. Unpublished manuscript, CUNY Graduate Center, New York. Fodor, J. D. (1998a). Unambiguous triggers. Linguistic Inquiry, 29(1): 1–36. Fodor, J. D. (1998b). Parsing to learn. Journal of Psycholinguistic Research, 27(3): 339–74. Fodor, J. D. and Sakas, W. G. (2004). Evaluating models of parameter setting. In A. Brugos, L. Micciulla, and C. E. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development (BUCLD 28). Boston, MA. Fodor, J. D. and Sakas, W. G. (2005). The Subset Principle in syntax: Costs of compliance. Journal of Linguistics, 41(3): 513–69. Folia, Vasiliki, Uddén, Julia, de Vries, Meinou, Forkstam, Christian, and Petersson, Karl Magnus (2010). Artificial language learning in adults and children. Language Learning, 60 (Supplement 2): 188–220.
Folli, R. and Harley, H. (2002). Consuming results in Italian and English: Flavours of v. In P. Kempchinsky and R. Slabakova (eds), Aspect. Dordrecht: Kluwer, 95–120. Foppolo, F. and Panzeri, F. (2013). Do children know when their rooms count as clean? In S. Kan, C. Moore-Cantwell, and R. Staubs (eds), Proceedings of the 40th Meeting of the North East Linguistic Society. Amherst, MA: GLSA, 205–18. Foraker, S., Regier, T., Khetarpal, A., Perfors, A., and Tenenbaum, J. (2009). Indirect evidence and the poverty of the stimulus: The case of anaphoric one. Cognitive Science, 33: 287–300. Forrest, Karen and Rockman, Barbara K. (1988). Acoustic and perceptual analysis of word-initial stop consonants in phonologically disordered children. Journal of Speech and Hearing Research, 31: 449–59. Forrest, Karen, Weismer, Gary, Hodge, Megan, Dinnsen, Daniel A., and Elbert, Mary (1990). Statistical analysis of word-initial /k/ and /t/ produced by normal and phonologically disordered children. Clinical Linguistics and Phonetics, 4: 327–40. Forrest, Karen, Weismer, Gary, Elbert, Mary, and Dinnsen, Daniel A. (1994). Spectral analysis of target-appropriate /t/ and /k/ produced by phonologically disordered and normally articulating children. Clinical Linguistics and Phonetics, 8: 267–81. Fougeron, Cécile (1999). Prosodically conditioned articulatory variation: A review. UCLA Working Papers in Phonetics, 97: 1–73. Fougeron, Cécile and Keating, Patricia A. (1996). Articulatory strengthening in prosodic domain-initial position. UCLA Working Papers in Phonetics, 92: 61–87.

Fowler, A. (1984). Language acquisition in Down’s syndrome children: Production and comprehension. Unpublished Ph.D. thesis, University of Pennsylvania, Philadelphia. Fowler, A. (1990). Language abilities in children with Down syndrome: Evidence for a specific syntactic delay. In D. Cicchetti and M. Beeghly (eds), Down Syndrome: The Developmental Perspective. New York: Cambridge University Press. Fowler, A. (1998). Language in mental retardation: Associations with and dissociations from general cognition. In J. A. Burack, R. M. Hodapp, and E. F. Zigler (eds), Handbook of Mental Retardation and Development. New York: Cambridge University Press. Fowler, A., Gelman, S., and Gleitman, L. R. (1994). The course of language learning in children with Down syndrome. In H. Tager-Flusberg (ed.), Constraints on Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates. Fox, D. (1999). Economy and Semantic Interpretation. Cambridge, MA: MIT Press. Fox, D. (2002). Antecedent-contained deletion and the copy theory of movement. Linguistic Inquiry, 33: 63–96. Fox, D. (2007). Free choice disjunction and the theory of scalar implicatures. In U. Sauerland and P. Stateva (eds), Presupposition and Implicature in Compositional Semantics. New York: Palgrave Macmillan. Fox, Danny and Grodzinsky, Yosef (1998). Children’s passive: A view from the by-phrase. Linguistic Inquiry, 29: 311–32. Franck, J. and Lassotta, R. (2011). Revisiting evidence for lexicalized word order. Unpublished manuscript, University of Geneva. Franck, J., Millotte, S., and Lassotta, R. (2011). Early word order representations: Novel arguments against old contradictions. Language Acquisition, 18(2): 121–35. Frank, M. C., Goldwater, S., Mansinghka, V., Griffiths, T., and Tenenbaum, J. (2007). Modeling human performance on statistical word segmentation tasks. Proceedings of the 29th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 281–6. Frank, M.
C., Goodman, N. D., and Tenenbaum, J. (2009). Using speakers’ referential intentions to model early cross-situational word learning. Psychological Science, 20(5): 578–85. Frank, M. C., Goldwater, S., Griffiths, T., and Tenenbaum, J. (2010). Modeling human performance in statistical word segmentation. Cognition, 117: 107–25. Frank, Robert and Kapur, Shyam (1996). On the use of triggers in parameter setting. Linguistic Inquiry, 27(4): 623–60. Frank, Robert and Satta, Giorgio (1998). Optimality theory and the generative complexity of constraint violability. Computational Linguistics, 24(2): 307–15. Frantz, Donald G. (2009). Blackfoot Grammar, 2nd edn. Toronto: University of Toronto Press. Fraser, Colin, Bellugi, Ursula, and Brown, Roger (1963). Control of grammar in imitation, comprehension, and production. Journal of Verbal Learning and Verbal Behavior, 2: 121–35. French, Margot (1984). Markedness and the acquisition of pied-piping and preposition stranding. McGill Working Papers in Linguistics, 2: 131–44. Freudenthal, D., Pine, J. M., and Gobet, F. (2006). Modeling the development of children’s use of optional infinitives in Dutch and English using MOSAIC. Cognitive Science: A Multidisciplinary Journal, 30(2): 277–310. Freudenthal, D., Pine, J. M., Aguado-Orea, J., and Gobet, F. (2007a). Modeling the developmental patterning of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cognitive Science: A Multidisciplinary Journal, 31(2): 311–41. Freudenthal, D., Pine, J., and Gobet, F. (2007b). Understanding the developmental dynamics of subject omission: The role of processing limitations in learning. Journal of Child Language, 34: 83–110.

Freudenthal, D., Pine, J., and Gobet, F. (2010). Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37(3): 643–69. Friederici, Angela D. (2005). Neurophysiological markers of early language acquisition: From syllables to sentences. Trends in Cognitive Sciences, 9: 481–8. Friederici, Angela D. and Wessels, Jeanine M. (1993). Phonotactic knowledge and its use in infant speech perception. Perception and Psychophysics, 54: 287–95. Friederici, A. D., Friedrich, M., and Christophe, A. (2007). Brain responses in 4-month-old infants are already language specific. Current Biology, 17: 1208–11. Friedmann, N. (2007). Young children and A-chains: The acquisition of Hebrew unaccusatives. Language Acquisition, 14: 377–422. Friedmann, N. and Costa, J. (2011). Acquisition of SV and VS order in Hebrew, European Portuguese, Palestinian Arabic and Spanish. Language Acquisition, 18: 1–38. Friedmann, N. and Novogrodsky, R. (2011). Which questions are most difficult to understand? The comprehension of Wh questions in three subtypes of SLI. Lingua, 121: 367–82. Friedmann, N., Gvion, A., and Novogrodsky, R. (2006). Syntactic movement in agrammatism and S-SLI: Two different impairments. In A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, and I. Ferrari (eds), Language Acquisition and Development. Cambridge: Cambridge Scholars Press. Friedmann, N., Belletti, A., and Rizzi, L. (2009). Relativized relatives: Types of intervention in the acquisition of A-bar dependencies. Lingua, 119: 67–88. Friedrich, Claudia K., Lahiri, Aditi, and Eulitz, Carsten (2008). Neurophysiological evidence for underspecified lexical representations: Asymmetries with word initial variations. Journal of Experimental Psychology: Human Perception and Performance, 34: 1545–59. Friedrich, Manuela and Friederici, Angela D. (2005).
Phonotactic knowledge and lexical–semantic processing in one-year-olds: Brain responses to words and nonsense words in picture contexts. Journal of Cognitive Neuroscience, 17: 1785–802. Friedrich, M. and Friederici, A. D. (2008). Neurophysiological correlates of online word learning in 14-month-old infants. NeuroReport, 19: 1757–61. Fudge, Eric C. (1969). Syllables. Journal of Linguistics, 5: 253–86. Fujimoto, Mari (2008). L1 acquisition of Japanese particles: A corpus-based study. Unpublished Ph.D. thesis, The City University of New York. Fuson, K. C. (1988). Children’s Counting and Concepts of Number. New York: Springer. Gagarina, Natalia (2008). Stanovlenie grammatičeskich kategorij russkogo glagola v detskoj reči [The emergence of the grammatical categories of the Russian verb in first language acquisition]. St Petersburg: Nauka. Gagarina, Natalia, Andjelkovic, Darinka, Hrzica, Gordana, Kiebzak-Mandera, Dorota, Konstanzou, Katerina, Kovacevic, Melita, and Savic, Maja (2010). Comprehension and production of an aspectual distinction in Slavic languages and Modern Greek. Poster presentation at Let the Children Speak: Learning of critical language skills across 25 languages. London: Wellcome Trust Conference Center. Gahl, S. and Garnsey, S. (2004). Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language, 80: 748–75. Galaburda, A. M. and Kemper, T. L. (1979). Cytoarchitectonic abnormalities in developmental dyslexia: A case study. Annals of Neurology, 6: 94–100. Galligan, R. (1987). Intonation with single words: Purposive and grammatical use. Journal of Child Language, 14: 1–21.

Gambell, T. and Yang, C. (2006). Word segmentation: Quick but not dirty. Unpublished manuscript, Yale University. Ganger, J. B., Pinker, S., Chawla, S., and Baker, A. (1999). A twin study of early vocabulary and syntactic development. Unpublished manuscript, Pittsburgh, PA. García, Pedro and Ruiz, José (1996). Learning k-piecewise testable languages from positive data. In Laurent Miclet and Colin de la Higuera (eds), Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Computer Science 1147. Berlin: Springer, 203–10. García, Pedro and Ruiz, José (2004). Learning k-testable and k-piecewise testable languages from positive data. Grammars, 7: 125–40. García, Pedro, Vidal, Enrique, and Oncina, José (1990). Learning locally testable languages in the strict sense. In Proceedings of the Workshop on Algorithmic Learning Theory, 325–38. García del Real, Isabel and Ezeizabarrena, Marie-José (2011). Comprehension of grammatical and lexical aspect in early Spanish and Basque. In S. Ferré, P. Prévost, L. Tuller, and R. Zebib (eds), Selected Proceedings of the Romance Turn IV: Workshop on the Acquisition of Romance Languages. Cambridge: Cambridge Scholars Publishing, 82–103. Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W. H. Freeman. Garnica, Olga K. (1973). The development of phonemic speech perception. In Timothy E. Moore (ed.), Cognitive Development and the Acquisition of Language. New York: Academic Press, 215–22. Gathercole, S. E., Willis, C. S., Baddeley, A. D., and Emslie, H. (1994). The children’s test of nonword repetition: A test of phonological working memory. Memory, 2: 103–27. Gathercole, V. C. (1979). Birdies like birdseed the bester than buns: A study of relational comparatives and their acquisition. Unpublished Ph.D. thesis, University of Kansas. Gathercole, V. C. (1985). More and more and more about more.
Journal of Experimental Child Psychology, 40: 73–104. Gathercole, V. (1989). Contrast: A semantic constraint? Journal of Child Language, 16: 685–702. Gathercole, V. C. (2009). “It was so much fun. It was 20 fun!” Cognitive and linguistic invitations to the development of scalar predicates. In V. C. Mueller Gathercole (ed.), Routes to Language: Studies in honor of Melissa Bowerman. New York: Psychology Press, 319–443. Gauthier, B., Shi, R., and Xu, Y. (2007). Learning phonetic categories by tracking movements. Cognition, 103: 185–205. Gauthier, B., Shi, R., and Xu, Y. (2009). Learning prosodic focus from continuous speech input: A neural network exploration. Language Learning and Development, 5: 94–114. Gavarró, Anna, Pérez-Leroux, Ana T., and Roeper, Thomas (2006). Definite and bare noun contrasts in child Catalan. In Viçens Torrens and Linda Escobar (eds), The Acquisition of Syntax in Romance Languages. Amsterdam: John Benjamins, 51–68. Gavruseva, E. (2000). On the syntax of possessor extraction. Lingua, 110: 743–72. Gavruseva, E. and Thornton, R. (2001). Getting it right: Acquisition of whose-questions in child English. Language Acquisition, 9(3): 229–67. Gawlitzek-Maiwald, I. (2000). “I want a chimney builden”: The acquisition of infinitival constructions in German–English bilingual children. In S. Döpke (ed.), Cross-Linguistic Structures in Simultaneous Bilingualism. Amsterdam: John Benjamins. Gazdar, G. (1979). Pragmatics: Implicature, presupposition, and logical form. New York: Academic Press. Gazdar, Gerald, Klein, Ewan, Pullum, Geoffrey, and Sag, Ivan (1985). Generalized Phrase Structure Grammar. Cambridge, MA: Harvard University Press.

Gelman, A., Carlin, J., Stern, H., and Rubin, D. (2003). Bayesian Data Analysis. Sussex: Chapman and Hall. Gelman, R. and Cordes, S. (2001). Counting in animals and humans. In E. Dupoux (ed.), Language, Brain, and Cognitive Development: Essays in Honor of Jacques Mehler. Cambridge, MA: MIT Press, 279–303. Gelman, R. and Gallistel, C. R. (1978). The Child’s Understanding of Number. Cambridge, MA: Harvard University Press. Gelman, Susan A. (2003). The Essential Child. New York: Oxford University Press. Gelman, Susan A. and Bloom, Paul (2007). Developmental changes in the understanding of generics. Cognition, 105: 166–83. Gelman, Susan A. and Brandone, Amanda C. (2010). Fast-mapping placeholders: Using words to talk about kinds. Language Learning and Development, 6: 223–40. Gelman, S. A. and Ebeling, K. S. (1989). Children’s use of nonegocentric standards in judgments of functional size. Child Development, 60: 920–32. Gelman, Susan A. and Raman, Lakshmi (2003). Preschool children use linguistic form class and pragmatic cues to interpret generics. Child Development, 74: 310–25. Gelman, Susan A. and Raman, Lakshmi (2007). This cat has nine lives? Children’s memory for genericity in language. Developmental Psychology, 43: 1256–68. Gelman, Susan A. and Tardif, Twila Z. (1998). A cross-linguistic comparison of generic noun phrases in English and Mandarin. Cognition, 66: 215–48. Gelman, Susan A., Goetz, Peggy J., Sarnecka, Barbara S., and Flukes, Jonathan (2008). Generic language in parent–child conversations. Language Learning and Development, 4: 1–31. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6: 721–41. Gentner, D. and Namy, L. (2004). The role of comparison in children’s early word learning. In D. G. Hall and S. R. Waxman (eds), Weaving a Lexicon. Cambridge, MA: MIT Press, 533–68.
George, Leland and Kornfilt, Jaklin (1981). Finiteness and boundedness in Turkish. In Frank Heny (ed.), Binding and Filtering. Cambridge, MA: MIT Press, 105–27. Gerken, L. A. (1991). The metrical basis for children’s subjectless sentences. Journal of Memory and Language, 30: 431–51. Gerken, L. A. (1994). A metrical template account of children’s weak syllable omissions. Journal of Child Language, 21: 565–84. Gerken, L. A. (1996). Prosodic structure in young children’s language production. Language, 72: 683–712. Gerken, L. A. (2004). Nine-month-olds extract structural principles required for natural language. Cognition, 93: B89–B96. Gerken, L. A. and Bollt, A. (2008). Three exemplars allow at least some linguistic generalizations: Implications for generalization mechanisms and constraints. Language Learning and Development, 4: 228–48. Gerken, L., Jusczyk, P. W., and Mandel, D. R. (1994). When prosody fails to cue syntactic structure: 9-month-olds’ sensitivity to phonological versus syntactic phrases. Cognition, 51: 237–65. Gertner, Y., Fisher, C., and Eisengart, J. (2006). Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17: 684–91.

Gervain, J., Nespor, M., Mazuka, R., Horie, R., and Mehler, J. (2008). Bootstrapping word order in prelexical infants: A Japanese–Italian cross-linguistic study. Cognitive Psychology, 57(1): 56–74. Geurts, B. (1998). Scalars. In P. Ludewig and B. Geurts (eds), Lexikalische Semantik aus kognitiver Sicht. Tübingen: Narr, 95–117. Geurts, B. (2003). Quantifying kids. Language Acquisition, 11(4): 197–218. Geurts, B. (2010). Quantity Implicatures. Cambridge: Cambridge University Press. Geurts, B., Katsos, N., Cummins, C., Moons, J., and Noordman, L. (2010). Scalar quantifiers: Logic, acquisition, and processing. Language and Cognitive Processes, 25: 130–48. Gibbon, Fiona E. (1999). Undifferentiated lingual gestures in children with articulation/phonological disorders. American Journal of Speech-Language Pathology, 7: 38–48. Gibson, E. A. F. (1991). A computational theory of human linguistic processing: Memory limitations and processing breakdown. Unpublished Ph.D. thesis, Carnegie Mellon University. Gibson, E. and Wexler, K. (1994). Triggers. Linguistic Inquiry, 25: 407–54. Gierut, Judith A. (1985). On the relationship between phonological knowledge and generalization learning in misarticulating children. Unpublished Ph.D. thesis, Indiana University. Gierut, Judith A. (1986). Sound change: A phonemic split in a misarticulating child. Applied Psycholinguistics, 7: 57–68. Gierut, Judith A. (1999). Syllable onsets: Clusters and adjuncts in acquisition. Journal of Speech, Language, and Hearing Research, 42: 708–26. Gierut, Judith A. (2008a). Experimental instantiations of implicational universals in phonological acquisition. In Daniel A. Dinnsen and Judith A. Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd., 355–76. Gierut, Judith A. (2008b). Fundamentals of experimental design and treatment. In Daniel A.
Gierut (eds), Optimality Theory, Phonological Acquisition and Disorders. London: Equinox Publishing Ltd., 93–118. Gierut, Judith A. and Champion, Annette H. (1999a). Interacting error patterns and their resistance to treatment. Clinical Linguistics and Phonetics, 13: 421–31. Gierut, Judith A. and Champion, Annette H. (1999b). Learning and the representation of complex onsets. In Annabel Greenhill, Heather Littlefield, and Cheryl Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 196–203. Gierut, Judith A. and Champion, Annette H. (2000). Ingressive substitutions: Typical or atypical phonological pattern? Clinical Linguistics and Phonetics, 14: 603–17. Gierut, Judith A. and O’Connor, Kathleen M. (2002). Precursors to onset clusters in acquisition. Journal of Child Language, 29: 495–517. Gierut, Judith A., Elbert, Mary, and Dinnsen, Daniel A. (1987). A functional analysis of phonological knowledge and generalization learning in misarticulating children. Journal of Speech and Hearing Research, 30: 462–79. Gierut, Judith A., Cho, Mi-Hui, and Dinnsen, Daniel A. (1993). Geometric accounts of consonant–vowel interactions in developing systems. Clinical Linguistics and Phonetics, 7: 219–36. Gierut, Judith A., Simmerman, Christina L., and Neumann, Heidi J. (1994). Phonemic structures of delayed phonological systems. Journal of Child Language, 21: 291–316. Gierut, Judith A., Morrisette, Michele L., Hughes, Mary, and Rowland, Susan (1996). Phonological treatment efficacy and developmental norms. Language, Speech and Hearing Services in Schools, 27: 215–30.

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (eds) (1996). Markov Chain Monte Carlo in Practice. Suffolk: Chapman and Hall. Gillette, J., Gleitman, L., Gleitman, H., and Lederer, A. (1999). Human simulation of vocabulary learning. Cognition, 73(2): 135–76. Giorgi, Alessandra and Pianesi, F. (1997). Tense and Aspect: From semantics to morphosyntax. Oxford: Oxford University Press. Gitterman, D. and Johnston, J. R. (1983). Talking about comparisons: A study of young children’s comparative adjective usage. Journal of Child Language, 10: 605–21. Givón, T. (1984). Syntax: A functional-typological introduction. Vol. I. Amsterdam: John Benjamins. Givón, T. (1994). Irrealis and the subjunctive. Studies in Language, 18(2): 265–337. Gleitman, Lila (1990). The structural sources of verb meaning. Language Acquisition, 1(1): 3–55. Gleitman, L. and Newport, E. (1995). Language: An invitation to cognitive science. In L. Gleitman and M. Liberman (eds), An Invitation to Cognitive Science, Vol. 1: Language. Cambridge, MA: MIT Press, 1–24. Gleitman, Lila, Cassidy, Kimberly, Nappa, Rebecca, Papafragou, Anna, and Trueswell, John C. (2005). Hard words. Language Learning and Development, 1: 23–64. Gnanadesikan, Amalia (1995/2004). Markedness and faithfulness constraints in child phonology. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 73–108. Goad, Heather (1996). Codas, word minimality, and empty-headed syllables. In Eve Clark (ed.), Proceedings of the 28th Annual Child Language Research Forum. Stanford, CA: Center for the Study of Language and Information, 113–22. Goad, Heather (1997). Consonant harmony in child language: An optimality-theoretic account. In S. J. Hannahs and Martha Young-Scholten (eds), Focus on Phonological Acquisition. Amsterdam: Benjamins, 113–42. Goad, Heather (2001).
Assimilation phenomena and initial constraint ranking in early grammars. In Anna H.-J. Do, Laura Domínguez, and Aimee Johansen (eds), Proceedings of the 25th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 307–18. Goad, Heather (2002). Markedness in right-edge syllabification: Parallels across populations. Canadian Journal of Linguistics, 47: 151–86. Goad, Heather (2006). Are children’s grammars rogue grammars? Glide substitution in branching onsets. Recherches linguistiques de Vincennes, 35: 103–32. Goad, Heather (2011). The representation of sC clusters. In Marc van Oostendorp, Colin Ewen, Elizabeth Hume, and Keren Rice (eds), The Blackwell Companion to Phonology. Oxford: Wiley-Blackwell, 898–923. Goad, Heather and Buckley, Meaghen (2006). Prosodic structure in child French: Evidence for the foot. Catalan Journal of Linguistics, 5: 109–42. Special issue on the Acquisition of Romance Languages. Goad, Heather and Prévost, Adèle-Elise (2010). A test case for markedness: The acquisition of Québec French stress. Unpublished manuscript. Goad, Heather and Rose, Yvan (2004). Input elaboration, head faithfulness and evidence for representation in the acquisition of left-edge clusters in West Germanic. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 109–57.

Goad, Heather and White, Lydia (2004). (Non)native-like ultimate attainment: The influence of L1 prosodic structure on L2 morphology. In A. Brugos, L. Micciulla, and C. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 177–88. Goad, Heather, White, Lydia, and Steele, Jeffrey (2003). Missing inflection in L2 acquisition: Defective syntax or L1-constrained prosodic representations? Canadian Journal of Linguistics, 48: 243–63. Goehl, Henry and Golden, S. (1972). A psycholinguistic account of why children do not detect their own errors. Unpublished manuscript. Göksun, T., Küntay, A., and Naigles, L. (2008). Turkish children use morphosyntactic bootstrapping in interpreting verb meaning. Journal of Child Language, 35: 291–323. Gold, E. M. (1967). Language identification in the limit. Information and Control, 10(5): 447–74. Gold, E. M. (1978). Complexity of automata identification from given data. Information and Control, 37: 302–20. Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press. Goldberg, A. (2006). Constructions at Work: The nature of generalization in language. Oxford: Oxford University Press. Goldberg, A., Casenhiser, D., and Sethuraman, N. (2004). Learning argument structure generalizations. Cognitive Linguistics, 14: 289–316. Goldberg, A., Casenhiser, D., and Sethuraman, N. (2005). The role of prediction in construction learning. Journal of Child Language, 32: 407–26. Goldin-Meadow, S. and Mylander, C. (1998). Spontaneous sign systems created by deaf children in two cultures. Nature, 391: 279–81. Goldin-Meadow, Susan, Gelman, Susan A., and Mylander, Carolyn (2005). Expressing generic concepts with and without a language model. Cognition, 96: 109–26. Goldman, Ronald and Fristoe, Macalyne (2000). Goldman-Fristoe Test of Articulation-2.
Circle Pines, MN: American Guidance Service, Inc. Goldrick, Matthew (2004). Phonological features and phonotactic constraints in speech production. Journal of Memory and Language, 586–603. Goldsmith, J. (1976). Autosegmental phonology. Unpublished Ph.D. thesis, MIT. [Published in 1979. New York: Garland Press.] Goldsmith, John (1993). Harmonic Phonology. In J. Goldsmith (ed.), The Last Phonological Rule. Chicago, IL: University of Chicago Press, 21–60. Goldsmith, J. (ed.) (1995). The Handbook of Phonological Theory. Oxford: Blackwell. Goldwater, S. (2006). Nonparametric Bayesian models of lexical acquisition. Unpublished Ph.D. thesis, Brown University. Goldwater, Sharon and Johnson, Mark (2003). Learning OT constraint rankings using a maximum entropy model. In Jennifer Spenader, Anders Eriksson, and Östen Dahl (eds), Proceedings of the Stockholm Workshop on Variation within Optimality Theory, 111–20. Goldwater, S., Griffiths, T., and Johnson, M. (2006). Interpolating between types and tokens by estimating power law generators. Neural Information Processing Systems, 18. Goldwater, S., Griffiths, T., and Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1): 21–54. Golinkoff, R. M., Hirsh-Pasek, K., Cauley, K. M., and Gordon, L. (1987). The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language, 14: 23–45.

Golinkoff, R., Hirsh-Pasek, K., and Schweisguth, M. A. (2001). A reappraisal of young children’s knowledge of grammatical morphemes. In J. Weissenborn and B. Höhle (eds), Approaches to Bootstrapping: Phonological, lexical, syntactic, and neurophysiological aspects of early language acquisition. Amsterdam: Benjamins, 1: 167–88. Gómez, R. (2002). Variability and detection of invariant structure. Psychological Science, 13: 431–6. Gómez, R. and Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70: 109–35. Gómez, R. L. and Gerken, L. A. (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 4: 178–86. Gómez, Rebecca L. and Maye, Jessica (2005). The developmental trajectory of nonadjacent dependency learning. Infancy, 7(2): 183–206. Gonzalez Gomez, N. and Nazzi, T. (2012). Acquisition of non-adjacent phonological regularities in the first year of life: Evidence from a perceptual equivalent of the labial-coronal effect. Infancy, 17: 498–524. Gonzalez Gomez, N. and Nazzi, T. (2013). Effects of prior phonotactic knowledge on infant word segmentation: The case of non-adjacent dependencies. Journal of Speech, Language, and Hearing Research, 56: 840–9. Goodluck, H. (1981). Children’s grammar of complement subject interpretation. In S. Tavakolian (ed.), Language Acquisition and Linguistic Theory. Cambridge, MA: MIT Press. Goodluck, H. and Behne, D. (1992). Development in control and extraction. In J. Weissenborn, H. Goodluck, and T. Roeper (eds), Theoretical Issues in Language Acquisition. Hillsdale, NJ: Erlbaum. Goodluck, Helen and Stojanovic, Danijela (1997). The structure and acquisition of relative clauses in Serbo-Croatian. Language Acquisition, 5: 285–315. Goodluck, Helen and Tavakolian, Susan L. (1982). Competence and processing in children’s grammar of relative clauses. Cognition, 11: 1–27.
Goodluck, H., Foley, M., and Sedivy, J. (1992). Adjunct islands and acquisition. In H. Goodluck and M. Rochemont (eds), Island Constraints. Dordrecht: Kluwer, 181–94. Goodluck, H., Terzi, A., and Diaz, G. C. (2001). The acquisition of control crosslinguistically: Structural and lexical factors in learning to license PRO. Journal of Child Language, 28: 153–72. Goodman, N. (1955). Fact, Fiction, and Forecast. Cambridge, MA: Harvard University Press. Goodrich, Mary and Snyder, William (2013). Atelic paths and the Compounding Parameter: Evidence from acquisition. In N. Goto, K. Otaki, A. Sato, and K. Takita (eds), Proceedings of GLOW in Asia IX 2012. Mie, Japan: Mie University, 19–30. Goodsitt, J. V., Morgan, J. L., and Kuhl, P. K. (1993). Perceptual strategies in prelingual speech segmentation. Journal of Child Language, 20: 229–52. Gopnik, M. and Crago, M. B. (1991). Familial aggregation of a developmental language disorder. Cognition, 39: 1–50. Gor, V. and Syrett, K. (2015). Picking up after sloppy children: What pronouns reveal about children’s analysis of English comparative constructions. In E. Grillo and K. Jepson (eds), Proceedings of the 39th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 191–203. Gordishevsky, Galina and Schaeffer, Jeannette (2008). The development and interaction of case and number in early Russian. Language Acquisition and Language Disorders, 45: 31–59. Gordon, M. (2002). A factorial typology of quantity-insensitive stress. Natural Language and Linguistic Theory, 20: 491–552.

Gordon, M. (2005). A perceptually-driven account of onset-sensitive stress. Natural Language and Linguistic Theory, 23: 595–653. Gordon, P. (1985). Level-ordering in lexical development. Cognition, 21: 73–93. Gordon, P. (2004). The origin of argument structure in infant event representations. Proceedings of the Annual Boston University Conference on Language Development, 28: 189–98. Gordon, Peter and Chafetz, Jill (1990). Verb-based versus class-based accounts of actionality effects in children’s comprehension of passives. Cognition, 36: 227–54. Gormley, Andrea (2003). The production of consonant harmony in child speech. Unpublished Master’s thesis, University of British Columbia. Goro, T. (2004a). The emergence of Universal Grammar in the emergence of language: The acquisition of Japanese logical connectives and positive polarity. Manuscript, University of Maryland at College Park. Goro, Takuya (2004b). On the distribution of to-infinitives in early child English. In A. Brugos, L. Micciulla, and C. E. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 199–210. Goro, T. (2007). Language specific constraints on scope interpretation in first language acquisition. Unpublished Ph.D. thesis, University of Maryland. Goro, T. and Akiba, S. (2004). The acquisition of disjunction and positive polarity in Japanese. In V. Chand, A. Kelleher, A. J. Rodríguez, and B. Schmeiser (eds), WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Press, 251–64. Goro, T., Minai, U., and Crain, S. (2005). Two disjunctions for the price of only one. BUCLD 29: Proceedings of the 29th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 1: 228–39. Goro, T., Minai, U., and Crain, S. (2006). Bringing out the logic in child language. In L. Bateman and C.
Ussery (eds), Proceedings of the Thirty-Fifth Annual Meeting of the North East Linguistic Society. Amherst, MA: GLSA Publications, 245–56. Gout, A. (2001). Etapes précoces de l’acquisition du lexique. Unpublished Ph.D. thesis, Ecole des Hautes Etudes en Sciences Sociales. Gout, A., Christophe, A., and Morgan, J. L. (2004). Phonological phrase boundaries constrain lexical access II. Infant data. Journal of Memory and Language, 51: 548–67. Goyet, L., Nishibayashi, L.-L., and Nazzi, T. (2013). Early syllabic segmentation of fluent speech by infants acquiring French. PLoS ONE, 8: e79646. Goyet, L., de Schonen, S., and Nazzi, T. (2010). Syllables in word segmentation by French-learning infants: An ERP study. Brain Research, 1332: 75–89. Graf Estes, Katharine (2009). From tracking statistics to word learning: Statistical learning and lexical acquisition. Language and Linguistics Compass, 3: 1379–89. Graf Estes, K., Evans, J., Alibali, M., and Saffran, J. (2007). Can infants map meaning to newly segmented words? Psychological Science, 18(3): 254–60. Graf Estes, Katharine, Edwards, Jan, and Saffran, Jenny R. (2011). Phonotactic constraints on infant word learning. Infancy, 16: 180–97. Graham, Susan A., Kilbreath, Cari S., and Welder, Andrea N. (2004). 13-month-olds rely on shared labels and shape similarity for inductive inferences. Child Development, 75: 409–27. Graham, Susan A., Nayer, Samantha L., and Gelman, Susan A. (2011). Two-year-olds use the generic/nongeneric distinction to guide their inferences about novel kinds. Child Development, 82: 493–507. Grant, J., Valian, V., and Karmiloff-Smith, A. (2002). A study of relative clauses in Williams syndrome. Journal of Child Language, 29: 403–16.

Graves, Michael F. and Koziol, Stephen (1971). Noun plural development in primary grade children. Child Development, 42: 1165–73. Graziano-King, J. (1999). Acquisition of comparative forms in English. Unpublished Ph.D. thesis, City University of New York. Graziano-King, J. and Cairns, H. S. (2005). Acquisition of English comparative adjectives. Journal of Child Language, 32: 345–73. Green, G. (1974). Semantics and Syntactic Regularity. Bloomington: Indiana University Press. Green, Lisa and Roeper, Thomas (2007). The acquisition path for aspect: Remote past and habitual in child African American English. Language Acquisition, 14(3): 269–313. Greenberg, Joseph H. (1963). Some universals of language with special reference to the order of meaningful elements. In Joseph H. Greenberg (ed.), Universals of Language. Cambridge, MA: MIT Press, 73–113. Greenberg, Joseph (1978). Initial and final consonant sequences. In Joseph Greenberg (ed.), Universals of Human Language: Volume 2, Phonology. Stanford, CA: Stanford University Press, 2: 243–79. Grice, H. P. (1975). Logic and conversation. In P. Cole and J. L. Morgan (eds), Syntax and Semantics: Speech acts. New York: Academic Press, 3: 41–58. Grice, H. P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard University Press. Griffiths, A. J. F., Wessler, S. R., Lewontin, R. C., and Carroll, S. B. (2008). An Introduction to Genetic Analysis, 9th edn. New York: W. H. Freeman. Griffiths, T. and Tenenbaum, J. (2005). Structure and strength in causal induction. Cognitive Psychology, 51: 334–84. Griffiths, T. and Yuille, A. (2006). A primer on probabilistic inference. Trends in Cognitive Sciences, 10(7). Supplement to special issue on Probabilistic Models of Cognition. Griffiths, Thomas L., Kemp, Charles, and Tenenbaum, Joshua B. (2008). Bayesian models of cognition. In The Cambridge Handbook of Computational Cognitive Modeling. Cambridge: Cambridge University Press, 1–49.
Griffiths, T., Chater, N., Kemp, C., Perfors, A., and Tenenbaum, J. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14: 357–64. Grigorenko, E. L. (2009). Speaking genes or genes for speaking? Deciphering the genetics of speech and language. Journal of Child Psychology and Psychiatry, 50: 116–25. Grimm, Hannelore (1973). Strukturanalytische Untersuchung der Kindersprache. Stuttgart: Hans Huber. Grimshaw, J. (1981). Form, function, and the language acquisition device. In C. L. Baker and J. J. McCarthy (eds), The Logical Problem of Language Acquisition. Cambridge, MA: MIT Press, 165–82. Grimshaw, J. (1990). Argument Structure. Cambridge, MA: MIT Press. Grimshaw, J. (1991). Extended projection. Unpublished manuscript, Brandeis University. Grimshaw, J. (1993). Minimal projection, heads, and optimality. Unpublished manuscript, Rutgers University. Grimshaw, J. and Rosen, T. (1990). Knowledge and obedience: The developmental status of the Binding Theory. Linguistic Inquiry, 21: 187–222. Grimshaw, J. and Vikner, S. (1993). Obligatory adjuncts and the structure of events. In E. Reuland and W. Abraham (eds), Knowledge and Language, Volume II: Lexical and conceptual structure. Dordrecht: Kluwer, II: 143–55. Grinstead, J. (1994). The emergence of nominative case assignment in child Catalan and Spanish. Unpublished Master’s thesis, UCLA.

Grinstead, J. (1998). Subjects, sentential negation and imperatives in child Spanish and Catalan. Unpublished Ph.D. thesis, UCLA. Grinstead, John (2000). Case, inflection and subject licensing in child Catalan and Spanish. Journal of Child Language, 27: 119–55. Grinstead, J. (2004). Subjects and interface delay in child Spanish and Catalan. Language, 80(1): 40–72. Grinstead, J. and Spinner, P. (2009). The clausal left periphery in child Spanish and German. Probus, 21(1): 51–82. Grinstead, J., Warren, V., Ricci, C., and Sanderson, S. (2009a). Finiteness and subject-auxiliary inversion in child English. In J. Chandlee, M. Franchini, S. Lord, and G.-M. Rheiner (eds), Proceedings of the 33rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 211–22. Grinstead, J., Warren, V., Ricci, C., and Sanderson, S. (2009b). The optional inversion stage in child English. Paper presented at Generative Approaches to Language Acquisition, Lisbon. Grinstead, J., De la Mora, J., Pratt, A., and Flores, B. (2009c). Temporal interface delay and root nonfinite verbs in Spanish-speaking children with specific language impairment: Evidence from the grammaticality choice task. In J. Grinstead (ed.), Hispanic Child Languages: Typical and impaired development. Amsterdam: John Benjamins, 239–64. Grinstead, J., De la Mora, J., Vega-Mendoza, M., and Flores, B. (2009d). An elicited production test of the optional infinitive stage in child Spanish. In J. Crawford, K. Otaki, and M. Takahashi (eds), Generative Approaches to Language Acquisition—North America (GALANA 2008). Somerville, MA: Cascadilla Press, 36–45. Grinstead, J., Vega-Mendoza, M., and Goodall, G. (2010). Subject–verb inversion and verb finiteness are independent in Spanish. Paper presented at the Hispanic Linguistic Symposium. Grodner, D., Klein, N., Carbary, K., and Tanenhaus, M. (2010).
“Some,” and possibly all, scalar inferences are not delayed: Evidence for immediate pragmatic enrichment. Cognition, 116: 42–​55. Grodzinsky, J. and Reinhart, T. (1993). The innateness of binding and coreference. Linguistic Inquiry, 24: 69–​102. Grodzinsky, J., Wexler, K., Chien, Y.-​C., Marakovitz, S., and Solomon, J. (1993). The breakdown of binding relations. Brain and Language, 45: 396–​422. Gropen, J., Pinker, S., Hollander, M., Goldberg, R., and Wilson, R. (1989). The learnability and acquisition of the dative alternation in English. Language, 65: 203–​57. Gruber, J. (1965a). Studies in lexical relations. Unpublished Ph.D. thesis, MIT. [Distributed by MIT Working Papers in Linguistics.] Gruber, J. S. (1965b). Look and see. Language, 43(4): 937–​47. Gruber, Jeffrey (1967). Topicalization in child language. Foundations of Language, 3(1): 37–​65. Grunwell, Pamela (1981). The development of phonology: A descriptive profile. First Language, 2: 161–​91. Grunwell, Pamela (1982). Clinical Phonology. London: Croom Helm. Grüter, T. (2007). Investigating object drop in child French and English: A truth value judgment task. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America. Somerville, MA: Cascadilla Proceedings Project. Gualmini, A. (2003). Some knowledge children don’t lack. Proceedings of the 27th Boston University Conference on Language Development, Somerville, MA: Cascadilla Press.

Gualmini, A. (2008). The rise and fall of Isomorphism. Lingua, 118: 1158–76. Gualmini, A. and Crain, S. (2002). Why no child or adult must learn De Morgan’s laws. Proceedings of the 26th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 243–54. Gualmini, A. and Crain, S. (2004). Operator conditioning. Proceedings of the 28th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 232–43. Gualmini, A. and Crain, S. (2005). The structure of children’s linguistic knowledge. Linguistic Inquiry, 36: 463–74. Gualmini, A. and Schwarz, B. (2007). Negation and downward entailingness: Consequences for learnability theory. To appear in Proceedings of the Workshop on Negation and Polarity, University of Tübingen. Gualmini, A., Crain, S., Meroni, L., Chierchia, G., and Guasti, M. T. (2001). At the semantics/pragmatics interface in child language. Proceedings of Semantics and Linguistic Theory XI. Ithaca, NY: CLC Publications, Department of Linguistics, Cornell University. Gualmini, A., Meroni, L., and Crain, S. (2003a). Children’s asymmetrical responses. Proceedings of the 4th Tokyo Conference on Psycholinguistics. Tokyo: Hitsuji Shobo. Gualmini, A., Meroni, L., and Crain, S. (2003b). An asymmetric universal in child language. In M. Weisgerber (ed.), Proceedings of Sinn und Bedeutung VII. Konstanz: Konstanz Linguistics Working Papers, 136–48. Gualmini, A., Hulsey, S., Hacquard, V., and Fox, D. (2008). The question–answer requirement for scope assignment. Natural Language Semantics, 16: 205–37. Guasti, M. T. (1993). Causative and Perception Verbs: A comparative study. Torino: Rosenberg and Sellier. Guasti, M. T. (1994). Verb syntax in Italian child grammar: Finite and non-finite verbs. Language Acquisition, 3(1): 1–40. Guasti, M. T. (1996a). Acquisition of Italian Interrogatives. In H.
Clahsen (ed.), Generative Perspectives on Language Acquisition: Empirical findings, theoretical considerations, and cross-linguistic comparisons. Amsterdam, Philadelphia: John Benjamins. Guasti, M. T. (1996b). Semantic restrictions in Romance causatives and the incorporation approach. Linguistic Inquiry, 27: 294–313. Guasti, M. T. (2000). An excursion into interrogatives in Early English and Italian. In M.-A. Friedemann and L. Rizzi (eds), The Acquisition of Syntax: Studies in Comparative Developmental Linguistics. London: Longman, 103–28. Guasti, M. T. (2002). Language Acquisition: The growth of grammar. Cambridge, MA: MIT Press. Guasti, M. T. and Chierchia, G. (2000). Backward versus forward anaphora: Reconstruction in child grammar. Language Acquisition, 8: 129–70. Guasti, M. T. and Rizzi, L. (1996). Null Aux and the acquisition of residual V2. In A. Stringfellow, D. Cahana-Amitay, E. Hughes, and A. Zukowski (eds), BUCLD 20: Proceedings of the 20th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Guasti, M. T., Thornton, R., and Wexler, K. (1995). Negation in children’s questions: The case of English. Paper presented at the Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Guasti, M. T., Chierchia, G., Crain, S., Foppolo, F., Gualmini, A., and Meroni, L. (2005). Why children and adults sometimes (but not always) compute implicatures. Language and Cognitive Processes, 20: 667–96.

Guasti, Maria Teresa, Gavarró, Anna, de Lange, Joke, and Caprin, Claudia (2008). Article omission across child languages. Language Acquisition, 15: 89–119. Guasti, M. T., Branchini, C., and Arosio, F. (2012). Interference in the production of Italian subject and object wh-questions. Applied Psycholinguistics, 33: 185–223. Guerriero, A. M. S., Oshima-Takane, Y., and Kuriyama, Y. (2006). The development of referential choice in English and Japanese: A discourse-pragmatic perspective. Journal of Child Language, 33: 823–57. Guilfoyle, E. (1984). The acquisition of tense and the emergence of lexical subjects. McGill Working Papers in Linguistics/Cahiers linguistiques de McGill, 2(1): 20–31. Guilfoyle, E. and Noonan, M. (1988). Functional categories and language acquisition. Paper presented at the Boston University Conference on Language Development. Guilfoyle, Eithne and Noonan, Máire (1992). Functional categories and language acquisition. Canadian Journal of Linguistics, 37(2): 241–72. Guillaume, Thierry, Vihman, Marilyn, and Roberts, Mark (2003). Familiar words capture the attention of 11-month-olds in less than 250 ms. NeuroReport, 14(18): 2307–10. Gupta, P. and Touretzky, D. S. (1994). Connectionist models and linguistic theory: Investigations of stress systems in language. Cognitive Science, 18: 1–50. Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge: Cambridge University Press. Gussmann, Edmund (1980). Studies in Abstract Phonology. Cambridge, MA: MIT Press. Gutierrez Mangado, M. J. (2006). Acquiring long-distance wh-questions in L1 Spanish: A longitudinal investigation. In V. Torrens and L. Escobar (eds), The Acquisition of Syntax in Romance Languages: Language acquisition and language disorders. Amsterdam: John Benjamins, 41: 251–87. Haas, William (1963). Phonological analysis of a case of dyslalia. Journal of Speech and Hearing Disorders, 28: 239–46. Hackl, M. (2000). Comparative quantifiers.
Unpublished Ph.D. thesis, MIT. Hacohen, Aviya (2010). On the acquisition of Hebrew compositional telicity. Unpublished Ph.D. thesis, Ben-Gurion University of the Negev. [Dissertation notice in Language Acquisition, 18(1) (2011): 84–6.] Hacquard, V. (2005). Aspect and actuality entailment: Too and enough constructions. In E. Maier, C. Bary, and J. Huitink (eds), Proceedings of Sinn und Bedeutung, 9: 116–30. Hacquard, V. (2006). Aspects of too and enough constructions. In E. Georgala and J. Howell (eds), Proceedings of Semantics and Linguistic Theory XV. Ithaca, NY: CLC Publications. Haddad, Y. A. (2006). Control in Lebanese Arabic. Working paper, Variation in Control Structures Project. Haegeman, L. (1985). The get passive and Burzio’s generalization. Lingua, 66: 53–77. Haegeman, L. (1995). Root infinitives, tense, and truncated structures in Dutch. Language Acquisition, 4(3): 205–55. Haegeman, L. (1997). Register variation, truncation and subject omission in English and in French. English Language and Linguistics, 1: 233–70. Haegeman, L. (1999). Adult null subjects in non pro-drop languages. In M.-A. Friedemann and L. Rizzi (eds), The Acquisition of Syntax. London: Addison Wesley Longman. Haegeman, L. and Ihsane, T. (2001). Adult null subjects in the non-pro-drop languages: Two diary dialects. Language Acquisition, 9: 329–46. Haiyun, L. and Chunyan, N. (2009). Phase Impenetrability Condition and the acquisition of unaccusatives, object-raising ba-constructions and passives in Mandarin-speaking

children. In J. Crawford (ed.), Proceedings of the 3rd Conference on Generative Approaches to Language Acquisition North America (GALANA 2008). Somerville, MA: Cascadilla Proceedings Project, 148–52. Hakansson, G. and Hansson, K. (2000). Comprehension and production of relative clauses: A comparison between Swedish impaired and unimpaired children. Journal of Child Language, 27: 313–33. Halberda, J. (2006). Is this a dax which I see before me? Use of the logical argument disjunctive syllogism supports word-learning in children and adults. Cognitive Psychology, 53(4): 310–44. Halberda, J. and Feigenson, L. (2008). Developmental change in the acuity of the “number sense”: The approximate number system in 3-, 4-, 5-, and 6-year-olds and adults. Developmental Psychology, 44(5): 1457–65. Halberda, J., Taing, L., and Lidz, J. (2008). The development of “most” comprehension and its potential dependence on counting ability in preschoolers. Language Learning and Development, 4: 99–121. Halberda, J., Ly, R., Wilmer, J. B., Naiman, D. Q., and Germine, L. (2012). Number sense across the lifespan as revealed by a massive Internet-based sample. Proceedings of the National Academy of Sciences, 109(28): 11116–20. Hale, Kenneth (1973). Deep-surface canonical disparities in relation to analysis and change: An Australian example. In Thomas Sebeok (ed.), Current Trends in Linguistics, 11. The Hague: Mouton, 401–58. Hale, K. and Keyser, S. J. (1993). On argument structure and the lexical expression of syntactic relations. In K. Hale and S. J. Keyser (eds), The View from Building 20: Essays in linguistics in honor of Sylvain Bromberger. Cambridge, MA: MIT Press, 53–109. Hale, K. and Keyser, S. J. (2002). Prolegomenon to a Theory of Argument Structure. Cambridge, MA: MIT Press. Hale, Mark and Reiss, Charles (1998). Formal and empirical arguments concerning phonological acquisition. Linguistic Inquiry, 29: 656–83. Hale, Mark and Reiss, Charles (2008).
The Phonological Enterprise. Oxford: Oxford University Press. Halle, Morris (1959). The Sound Pattern of Russian. The Hague: Mouton. Halle, Morris, Vaux, Bert, and Wolfe, Andrew (2000). On feature spreading and the representation of place of articulation. Linguistic Inquiry, 31: 387–444. Halle, M. and Marantz, A. (1993). Distributed morphology and the pieces of inflection. In K. Hale and S. J. Keyser (eds), The View from Building 20: Essays in linguistics in honor of Sylvain Bromberger. Cambridge, MA: MIT Press, 111–76. Hallé, P., de Boysson-Bardies, B., and Vihman, M. M. (1991). Beginnings of prosodic organization: Intonation and duration patterns of disyllables produced by French and Japanese infants. Language and Speech, 34: 299–318. Hallé, P., Durand, C., and de Boysson-Bardies, B. (2008). Do 11-month-old French infants process articles? Language and Speech, 51: 45–66. Halliday, M. A. K. (1975). Learning How to Mean: Explorations in the development of language. London: Edward Arnold. Hamann, C. (2005). The production of Wh-questions by French children with SLI: Movement is difficult. 10th International Congress for the Study of Child Language, Freie Universität Berlin. Hamann, C. and Plunkett, K. (1998). Subjectless sentences in child Danish. Cognition, 69: 35–72.

Hamann, C., Kowalski, O., and Philip, W. (1997). The French Delay of Principle B Effect. In E. Hughes, M. Hughes, and A. Greenhill (eds), Proceedings of the Annual Boston University Conference on Language Development, vol. 21. Somerville, MA: Cascadilla Press. Hamburger, H. (1980). A deletion ahead of its time. Cognition, 8: 389–416. Hamann, C. (1991). Adjectival semantics. In A. von Stechow and D. Wunderlich (eds), Semantics—An International Handbook of Contemporary Research. Berlin: de Gruyter, 657–73. Hammerly, Hector (1982). Contrastive phonology and error analysis. International Review of Applied Linguistics, 20: 17–32. Hammond, Michael (2004). Gradience, phonotactics, and the lexicon in English phonology. International Journal of English Studies, 4: 1–24. Han, C.-H. (1998). The structure and interpretation of imperatives: Mood and force in Universal Grammar. Unpublished Ph.D. thesis, University of Pennsylvania. Han, C. H., Lidz, J., and Musolino, J. (2007). Verb-movement and grammar competition in Korean: Evidence from quantification and negation. Linguistic Inquiry, 38: 1–47. Hanink, Emily and Snyder, William (2014). Particles and compounds in German: Evidence for the Compounding Parameter. Language Acquisition, 21(2): 199–211. Hankamer, J. (1973). Why there are two than’s in English. In C. Corum, T. C. Smith-Stark, and A. Weiser (eds), Proceedings of the 8th Annual Meeting of the Chicago Linguistic Society. Chicago, IL: Chicago Linguistic Society, 179–91. Hankamer, J. and Sag, I. (1976). Deep and surface anaphora. Linguistic Inquiry, 7: 391–428. Hansson, Gunnar Ó. (2001). Theoretical and typological issues in consonant harmony. Unpublished Ph.D. thesis, University of California, Berkeley. Hansson, K. and Nettelbladt, U. (2006). Wh-questions in Swedish children with SLI. Advances in Speech-Language Pathology, 8: 376–83. Happé, F., Ronald, A., et al. (2006). Time to give up on a single explanation for autism.
Nature Neuroscience, 9(10): 1218–20. Harley, H. (1995). Subjects, events, and licensing. Unpublished Ph.D. thesis, MIT. Harley, H. and Noyer, R. (2000). Licensing in the non-lexicalist lexicon. In B. Peeters (ed.), The Lexicon/Encyclopaedia Interface. Amsterdam: Elsevier Press, 349–74. Harnish, R. (1976). Logical form and implicature. In T. G. Bever, J. J. Katz, and D. T. Langendoen (eds), An Integrated Theory of Linguistic Ability. New York: Thomas Y. Crowell, 313–92. Reprinted in S. Davis (ed.) (1991). Pragmatics: A Reader. Oxford: Oxford University Press, 316–64. Harris, F. and Flora, J. (1982). Children’s use of get passives. Journal of Psycholinguistic Research, 11: 297–311. Harris, James (1973). On the order of certain phonological rules in Spanish. In Stephen R. Anderson and Paul Kiparsky (eds), A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston, 59–76. Harris, James (1991). The exponence of gender in Spanish. Linguistic Inquiry, 22(1): 27–62. Harris, John (1997). Licensing inheritance: An integrated theory of neutralisation. Phonology, 14: 315–70. Harris, T. and Wexler, K. (1996). The optional-infinitive stage in child English: Evidence from negation. In H. Clahsen (ed.), Generative Perspectives on Language Acquisition: Empirical findings. Amsterdam: John Benjamins, 1–42. Harris, Zellig (1951). Methods in Structural Linguistics. Chicago: University of Chicago Press. Harris, Z. (1954). Distributional structure. Word, 10: 146–62.

Harrison, Michael A. (1978). Introduction to Formal Language Theory. Boston, MA: Addison-Wesley. Harrison, P. (2000). Acquiring the phonology of lexical tone in infancy. Lingua, 110: 581–616. Hastie, Trevor, Tibshirani, Rob, and Friedman, Jerome (2009). The Elements of Statistical Learning. New York: Springer. Haugen, Einar (1951). Directions in modern linguistics. Language, 27: 211–22. Haugen, Einar (1956a). The syllable in linguistic description. In Morris Halle, Horace Lunt, Hugh MacLean, and Cornelis H. van Schooneveld (eds), For Roman Jakobson: Essays on the occasion of his sixtieth birthday. The Hague: Mouton, 213–21. Haugen, Einar (1956b). Syllabification in Kutenai. International Journal of American Linguistics, 22: 196–201. Hauser, M. D., Newport, E. L., and Aslin, R. N. (2001). Segmentation of the speech stream in a non-human primate: Statistical learning in cotton-top tamarins. Cognition, 78: B53–B64. Hauser, M., Chomsky, N., and Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298: 1569–79. Hawkins, Sarah (1973). Temporal coordination of consonants in the speech of children: Preliminary data. Journal of Phonetics, 1: 181–217. Haworth, C. M. A., Kovas, Y., Harlaar, N., Hayiou-Thomas, M. E., Petrill, S. A., Dale, P. S., and Plomin, R. (2009). Generalist genes and learning disabilities: A multivariate genetic analysis of low performance in reading, mathematics, language, and general cognitive ability in a sample of 8,000 12-year-old twins. Journal of Child Psychology and Psychiatry, 50(10): 1318–25. Hay, Jennifer B. and Baayen, R. Harald (2005). Shifting paradigms: Gradient structure in morphology. Trends in Cognitive Sciences, 9: 342–8. Hayes, B. (1989). Compensatory lengthening in moraic phonology. Linguistic Inquiry, 20: 253–306. Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press. Hayes, Bruce (1999).
Phonetically-​driven phonology: The role of Optimality Theory and inductive grounding. In Michael Darnell, Edith Moravcsik, Frederick J. Newmeyer, Michael Noonan, and Kathleen Wheatley (eds), Functionalism and Formalism in Linguistics, Volume I: General Papers. Amsterdam: Benjamins, I: 243–​85. Hayes, Bruce (2004). Phonological acquisition in Optimality Theory:  The early stages. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 158–​203. Hayes, Bruce and Londe, Zsuzsa Cziraky (2006). Stochastic phonological knowledge: The case of Hungarian vowel harmony. Phonology, 23(1): 59–​104. Hayes, Bruce and Steriade, Donca (2004). The phonetic bases of phonological markedness. In Bruce Hayes, Robert M. Kirchner, and Donca Steriade (eds), Phonetically Based Phonology. Cambridge: Cambridge University Press, 1–​33. Hayes, Bruce and Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39: 379–​440. Hayes, J., and Clark, H. (1970). Experiments in the segmentation of an artificial speech analog. In J. R. Hayes (ed.), Cognition and the Development of Language. New York: Wiley, 221–​34. Hayes, Rachel A., Slater, Alan M. and Brown, Elizabeth (2000). Infants’ ability to categorise on the basis of rhyme. Cognitive Development, 15: 405–​19.

Hayes, Rachel A., Slater, Alan M., and Longmore, Christopher A. (2009). Rhyming abilities in 9-month-olds: The role of the vowel and coda explored. Cognitive Development, 24: 106–12. Hayiou-Thomas, M. E., Kovas, Y., Harlaar, N., Plomin, R., Bishop, D. V. M., and Dale, P. S. (2006). Common aetiology for diverse language skills in 4½-year-old twins. Journal of Child Language, 33: 339–68. Hegarty, M. (1992). Adjunct extraction without traces. In D. Bates (ed.), The Proceedings of the Tenth West Coast Conference on Formal Linguistics. Stanford, CA: CSLI, 209–22. Heim, I. (1982). The semantics of definite and indefinite noun phrases. Unpublished Ph.D. thesis, University of Massachusetts. Heim, I. (1985). Notes on comparatives and related matters. Unpublished manuscript, University of Texas, Austin. Heim, I. (1998). Anaphora and semantic interpretation: A reinterpretation of Reinhart’s approach. In U. Sauerland and O. Percus (eds), The Interpretive Tract, MIT Working Papers in Linguistics 25: 205–46. Heim, I. (2000). Degree operators and scope. In B. Jackson and T. Matthews (eds), Proceedings of Semantics and Linguistic Theory X. Ithaca, NY: CLC Publications, 40–64. Heim, I. (2006). Little. In M. Gibson and J. Howell (eds), Proceedings of Semantics and Linguistic Theory XVI. Ithaca, NY: CLC Publications, 35–58. Heim, I. and Kratzer, A. (1998). Semantics in Generative Grammar. Oxford: Blackwell. Heim, I., Lasnik, H., and May, R. (1991). Reciprocity and plurality. Linguistic Inquiry, 22: 63–101. Heinz, J. N. (2007). The inductive learning of phonotactic patterns. Unpublished Ph.D. thesis, University of California. Heinz, Jeffrey (2008). Left-to-right and right-to-left iterative languages. In Alexander Clark, François Coste, and Laurent Miclet (eds), Grammatical Inference: Algorithms and Applications, 9th International Colloquium, Lecture Notes in Computer Science. Berlin: Springer, 5278: 84–97. Heinz, J. (2009).
On the role of locality in learning stress patterns. Phonology, 26: 303–​51. Heinz, Jeffrey (2010a). Learning long-​distance phonotactics. Linguistic Inquiry, 41: 623–​61. Heinz, Jeffrey (2010b). String extension learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden:  Association for Computational Linguistics, 897–​906. Heinz, Jeffrey, and Idsardi, William (2011). Sentence and word complexity. Science, 333: 295–​7. Heinz, Jeffrey and Idsardi, William (2013). What complexity differences reveal about domains in language. Topics in Cognitive Science, 5(1): 111–​31. Heinz, Jeffrey, Kobele, Gregory M., and Riggle, Jason (2009). Evaluating the complexity of Optimality Theory. Linguistic Inquiry, 40(2): 277–​88. Hendriks, P. and Spenader, J. (2005). When production precedes comprehension: An optimization approach to the acquisition of pronouns. Language Acquisition, 13(4): 319–​48. Hendriks, P., Banga, A., van Rij, J., Cannizzaro, G., and Hoeks, J. (2011). Adults’ on-​line comprehension of object pronouns in discourse. In A. Grimm, A. Müller, C. Hamann, and E. Ruigendijk (eds), Production–​Comprehension Asymmetries in Child Language. Berlin: De Gruyter, 193–​216. Herslund, Michael (1984). Particles, prefixes and preposition stranding. In Finn Sørensen and Lars Heltoft (eds), Topics in Danish Syntax. Copenhagen: Akademisk Forlag, 34–​7 1. Hestvik, A. and Philip, W. (2000). Binding and coreference in Norwegian child language. Language Acquisition, 8: 171–​235. Higginbotham, J. (1991). Either/​or. Proceedings of NELS, 21: 143–​55.

Higginbotham, J. and May, R. (1981). Questions, quantifiers and crossing. The Linguistic Review, 1(1): 41–80.
Higginson, R. P. (1985). Fixing-assimilation in language acquisition. Unpublished Ph.D. thesis, Washington State University.
de la Higuera, Colin (1997). Characteristic sets for polynomial grammatical inference. Machine Learning, 27: 125–38.
de la Higuera, Colin (2005). A bibliographical study of grammatical inference. Pattern Recognition, 38: 1332–48.
de la Higuera, Colin (2010). Grammatical Inference: Learning Automata and Grammars. Cambridge: Cambridge University Press.
Hildebrand, Joyce (1987). The acquisition of preposition stranding. Canadian Journal of Linguistics, 32: 65–85.
Hill, Archibald A. (1958). Introduction to Linguistic Structures: From Sound to Sentence in English. New York: Harcourt, Brace and World.
Hillenbrand, James M. (1983). Perceptual organization of speech sounds by infants. Journal of Speech and Hearing Research, 26: 268–82.
Hillenbrand, James M. (1985). Perception of feature similarities by infants. Journal of Speech and Hearing Research, 28: 317–18.
Hiramatsu, K. (2000). Accessing linguistic competence: Evidence from children's and adults' acceptability judgments. Unpublished Ph.D. thesis, University of Connecticut.
Hirsch, C. and Hartman, J. (2006). Some (wh-) questions concerning passive interactions. In A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, and I. Ferrari (eds), Proceedings of the Conference on Generative Approaches to Language Acquisition. Cambridge: Cambridge Scholars Press.
Hirsch, C. and Wexler, K. (2006). Children's passives and their resulting interpretation. In K. U. Deen, J. Nomura, B. Schulz, and B. D. Schwartz (eds), The Proceedings of the Inaugural Conference on Generative Approaches to Language Acquisition–North America, University of Connecticut Occasional Papers in Linguistics, 4: 125–36.
Hirsch, Christopher and Wexler, Kenneth (2007). The late development of raising: What children seem to think about seem. In William D. Davies and Stanley Dubinsky (eds), New Horizons in the Analysis of Control and Raising. Dordrecht: Springer, 35–70.
Hirsch, C., Hartman, J., and Wexler, K. (2006). Constraints on passive comprehension: Evidence for a theory of sequenced acquisition. Unpublished manuscript, MIT.
Hirsch, C., Orfitelli, R., and Wexler, K. (2007). When seem means think—The role of the experiencer-phrase in children's comprehension of raising. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of the Conference on Generative Approaches to Language Acquisition North America 2. Somerville, MA: Cascadilla Press.
Hirschberg, J. (1985). A theory of scalar implicature. Unpublished Ph.D. thesis, University of Pennsylvania.
Hirsh-Pasek, K. and Golinkoff, R. (1996a). The Origins of Grammar. Cambridge, MA: MIT Press.
Hirsh-Pasek, K. and Golinkoff, R. (1996b). The preferential looking paradigm reveals emerging language comprehension. In D. McDaniel, C. McKee, and H. Cairns (eds), Methods for Assessing Children's Syntax. Cambridge, MA: MIT Press.
Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Cassidy, K. W., Druss, B., and Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition, 26: 269–86.
Hirst, D. J. and Di Cristo, A. (eds) (1998). Intonation Systems: A Survey of Twenty Languages. Cambridge: Cambridge University Press.

Hirst, W. and Weil, J. (1982). Acquisition of the epistemic and deontic meaning of modals. Journal of Child Language, 9: 659–66.
Ho Ka Yan, Agnes (2006). Argument omission in Cantonese preschool children: A discourse-pragmatics perspective. Unpublished undergraduate thesis, Hong Kong University. Retrieved August 31, 2012.
Hochberg, J. G. (1988a). First steps in the acquisition of Spanish stress. Journal of Child Language, 15: 273–92.
Hochberg, J. G. (1988b). Learning Spanish stress: Developmental and theoretical perspectives. Language, 64: 683–706.
Hochmann, J.-R., Endress, A. D., and Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition, 115: 444–57.
Hockema, S. A. (2006). Finding words in speech: An investigation of American English. Language Learning and Development, 2: 119–46.
Hodgson, Miren (2001). The acquisition of Spanish reflexive se in transitive verb-argument structures and singularity. In B. Skarabela et al. (eds), Proceedings of 26th BUCLD. Somerville, MA: Cascadilla Press, 302–13.
Hodgson, Miren (2003). Children's production and comprehension of Spanish grammatical aspect. In R. Gess and E. Rubin (eds), Theoretical and Experimental Approaches to Romance Linguistics. Amsterdam: John Benjamins.
Hodgson, Miren (2010). Locatum structures and the acquisition of telicity. Language Acquisition, 3: 155–82.
Hoekstra, T. and Hyams, N. (1998). Aspects of root infinitives. Lingua, 106(1–4): 81–112.
Hoekstra, T. and Jordens, P. (1994). From adjunct to head. In T. Hoekstra and B. Schwartz (eds), Language Acquisition Studies in Generative Grammar. Amsterdam: John Benjamins, 119–49.
Hoffman, Paul R., Stager, Sheila, and Daniloff, Raymond G. (1983). Perception and production of misarticulated /r/. Journal of Speech and Hearing Disorders, 48: 210–15.
Hofherr, P. C. (2003). Inflected complementizers and the licensing of non-referential pro-drop. In W. E. Griffin (ed.), The Role of Agreement in Natural Language: Proceedings of the Fifth Annual Texas Linguistics Society Conference. Austin, TX, 47–58.
Hohaus, V. and Tiemann, S. (2009). "… this much is how much I'm taller than Joey …": A corpus study in the acquisition of comparison constructions. In Proceedings of AcquisiLyon. Lyon: Université de Lyon, 90–3.
Höhle, B. and Weissenborn, J. (2003). German-learning infants' ability to detect unstressed closed-class elements in continuous speech. Developmental Science, 6: 122–7.
Höhle, B., Weissenborn, J., Kiefer, D., Schulz, A., and Schmitz, M. (2004). Functional elements in infants' speech processing: The role of determiners in the syntactic categorization of lexical elements. Infancy, 5: 341–53.
Höhle, B., Schmitz, M., Santelmann, L. M., and Weissenborn, J. (2006). The recognition of discontinuous verbal dependencies by German 19-month-olds: Evidence for lexical and structural influences on children's early processing capabilities. Language Learning and Development, 2: 277–300.
Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., and Nazzi, T. (2009). The development of language specific prosodic preferences during the first year of life: Evidence from German and French. Infant Behavior and Development, 32: 262–74.
Hohne, E. A. and Jusczyk, P. W. (1994). Two-month-old infants' sensitivity to allophonic differences. Perception and Psychophysics, 56: 613–23.

Hohnen, B. and Stevenson, J. (1999). The structure of genetic influences on general cognitive, language, phonological, and reading abilities. Developmental Psychology, 35: 590–603.
Hollander, Michelle A., Gelman, Susan A., and Star, Jon (2002). Children's interpretation of generic noun phrases. Developmental Psychology, 38: 883–94.
Hollebrandse, B. and Roeper, T. (2014). Recursion: Empirical and formal approaches to recursion in language acquisition. In T. Roeper and M. Speas (eds), Recursion: Complexity in Cognition. New York: Oxford University Press.
Hollebrandse, B., Hobbs, K., de Villiers, J. G., and Roeper, T. (2008). Second order embedding and second order false belief. In Anna Gavarró and M. João Freitas (eds), Language Acquisition and Development: Proceedings of GALA 2007, 270–80.
Hollich, George J., Jusczyk, Peter W., and Luce, Paul A. (2002). Lexical neighborhood effects in 17-month-old word learning. In Barbora Skarabela, Sarah Fish, and Anna H.-J. Do (eds), Proceedings of the 26th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 1: 314–23.
Holmberg, Anders (2000). Am I unscientific? A reply to Lappin, Levine, and Johnson. Natural Language and Linguistic Theory, 18: 837–42.
Holmberg, A. (2010). The null generic subject pronoun in Finnish: A case of incorporation in T. In Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, 200–30.
Holmberg, A. and Nikanne, E. (2002). Expletives, subjects, and topics in Finnish. In P. Svenonius (ed.), Subjects, Expletives, and the EPP. Oxford: Oxford University Press, 71–105.
Holmes, Urban T. (1927). The phonology of an English-speaking child. American Speech, 2: 219–25.
de Hoop, Helen and Krämer, Irene (2005). Children's optimal interpretations of indefinite subjects and objects. Language Acquisition, 13: 103–23.
Hooper [Bybee], Joan (1976). Introduction to Natural Generative Phonology. New York: Academic Press.
Hopcroft, John and Ullman, Jeffrey (1979). Introduction to Automata Theory, Languages, and Computation. Boston, MA: Addison-Wesley.
Hopcroft, John, Motwani, Rajeev, and Ullman, Jeffrey (2001). Introduction to Automata Theory, Languages, and Computation, 2nd edn. Boston, MA: Addison-Wesley.
Hopper, Paul and Thompson, Sandra (1980). Transitivity in grammar and discourse. Language, 56: 251–99.
Horgan, D. (1978). The development of the full passive. Journal of Child Language, 5: 65–80.
Horn, L. (1969). A presuppositional approach to Only and Even. Proceedings of the Chicago Linguistic Society, 5: 98–107.
Horn, L. (1972). On the semantic properties of logical operators in English. Unpublished Ph.D. thesis, University of California.
Horn, L. R. (1984). Toward a new taxonomy for pragmatic inference: Q-based and R-based implicature. In D. Schiffrin (ed.), Georgetown University Round Table on Languages and Linguistics 1984. Washington, DC: Georgetown University Press, 11–42.
Horn, L. R. (1989). A Natural History of Negation. Chicago: University of Chicago Press. Repr. 2001, Stanford, CA: CSLI Publications.
Horn, L. (1992). The said and the unsaid. In C. Barker and D. Dowty (eds), Proceedings of SALT II. Department of Linguistics, Ohio State University, 163–92.
Horn, L. R. (2004). Implicature. In L. R. Horn and G. Ward (eds), The Handbook of Pragmatics. Oxford: Blackwell, 3–28.

Horn, L. R. (2005). The border wars: A neo-Gricean perspective. In K. Turner and K. von Heusinger (eds), Where Semantics Meets Pragmatics. Oxford: Elsevier, 21–48.
Hornby, P. A. and Hass, W. A. (1970). Use of contrastive stress by preschool children. Journal of Speech and Hearing Research, 13: 395–9.
Horning, J. J. (1969). A study of grammatical inference. Unpublished Ph.D. thesis, Stanford University.
Hornstein, Norbert (1999). Movement and control. Linguistic Inquiry, 30: 69–96.
Hornstein, N. and Lightfoot, D. (1981). Explanation in Linguistics: The Logical Problem of Language Acquisition. London: Longman.
Hornstein, Norbert and Weinberg, Amy (1981). Case theory and preposition stranding. Linguistic Inquiry, 12: 55–91.
Hoshi, H. (1994). Passive, causative and the light verbs: A study on theta role assignment. Unpublished Ph.D. thesis, University of Connecticut.
Houlihan, Kathleen and Iverson, Gregory K. (1979). Functionally-constrained phonology. In Daniel A. Dinnsen (ed.), Current Approaches to Phonological Theory. Bloomington, IN: Indiana University Press, 50–73.
Houston, D. M. and Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26: 1570–82.
Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., and Cutler, A. (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin and Review, 7: 504–9.
Houston, Derek M., Jusczyk, Peter W., and Jusczyk, Anne Marie (2003). Memory for bisyllables in 2-month-olds. In Derek M. Houston, Amanda Seidl, George Hollich, Elizabeth K. Johnson, and Ann Marie Jusczyk (eds), Jusczyk Lab Final Report. West Lafayette, IN: Purdue University.
Houston, D. M., Santelmann, L. M., and Jusczyk, P. W. (2004). English-learning infants' segmentation of trisyllabic words from fluent speech. Language and Cognitive Processes, 19: 97–136.
van Hout, Angeliek (1996). Event semantics of verb frame alternations: A case study of Dutch and its acquisition. Unpublished Ph.D. thesis, Tilburg University. [Published 1998, New York: Garland Publishing.]
van Hout, Angeliek (1997). Learning telicity: Acquiring argument structure and the syntax/semantics of direct objects. In E. Hughes et al. (eds), Proceedings of 21st BUCLD. Somerville, MA: Cascadilla Press, 678–88.
van Hout, Angeliek (1998). On the role of direct objects and particles in learning telicity in Dutch and English. In A. Greenhill, M. Hughes, and H. Littlefield (eds), Proceedings of 22nd BUCLD. Somerville, MA: Cascadilla Press, 397–408.
van Hout, Angeliek (2000). Event semantics in the lexicon–syntax interface: Verb frame alternations in Dutch and their acquisition. In C. Tenny and J. Pustejovsky (eds), Events as Grammatical Objects. Stanford, CA: CSLI Publications, 239–82.
van Hout, Angeliek (2005). Imperfect imperfectives: On the acquisition of aspect in Polish. In P. Kempchinsky and R. Slabakova (eds), Aspectual Inquiries. Dordrecht: Springer, 317–44.
van Hout, Angeliek (2007a). Optimal and non-optimal interpretations in the acquisition of Dutch past tenses. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA). Somerville, MA: Cascadilla Proceedings Project, 159–70.

van Hout, Angeliek (2007b). Acquisition of aspectual meanings in a language with and a language without morphological aspect. In H. Caunt-Nulton, S. Kulatilake, and I. Woo (eds), 31st BUCLD Proceedings Supplement.
van Hout, Angeliek (2007c). Acquiring telicity cross-linguistically: On the acquisition of telicity entailments associated with transitivity. In M. Bowerman and P. Brown (eds), Crosslinguistic Perspectives on Argument Structure: Implications for Learnability. Hillsdale: Routledge, 255–78.
van Hout, Angeliek (2008). Acquisition of perfective and imperfective aspect in Dutch, Italian and Polish. Lingua, 118(11): 1740–65.
van Hout, Angeliek and Hollebrandse, Bart (2001). On the acquisition of the aspects in Italian. In J.-Y. Kim and A. Werle (eds), The Proceedings of SULA: The Semantics of Underrepresented Languages in the Americas. University of Massachusetts Occasional Papers 25. Amherst, MA: GLSA, 111–20.
van Hout, A. and Veenstra, A. (2010). Telicity marking in Dutch child language: Event realization or no aspectual coercion? In J. Costa, A. Castro, M. Lobo, and F. Pratas (eds), Language Acquisition and Development: Proceedings of GALA 2009. Cambridge: Cambridge Scholars Publishing, 216–28.
van Hout, A., de Swart, H., and Verkuyl, H. (2005). Introducing perspectives on aspect. In H. Verkuyl, H. de Swart, and A. van Hout (eds), Perspectives on Aspect. Dordrecht: Springer, 1–17.
van Hout, Angeliek, Gagarina, Natalia, Dressler, Wolfgang, and 25 others (2010). Learning to understand aspect across languages. Presented at 35th BUCLD.
Hsu, J. R., Cairns, H., and Fiengo, R. (1985). The development of grammars underlying children's interpretation of complex sentences. Cognition, 20: 25–48.
Hu, L. and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1): 1–55.
Hua, Z. and Dodd, B. (2000). The phonological acquisition of Putonghua (modern standard Chinese). Journal of Child Language, 27: 3–42.
Hua, Zhu and Dodd, Barbara (2006). Phonological Development and Disorders in Children: A Multilingual Perspective. Buffalo, NY: Multilingual Matters.
Huang, C.-T. J. (1982a). Logical relations in Chinese and the theory of grammar. Unpublished Ph.D. thesis, MIT.
Huang, C.-T. J. (1982b). Move wh in a language without wh-movement. The Linguistic Review, 1: 369–416.
Huang, C.-T. J. (1984). On the distribution and reference of empty pronouns. Linguistic Inquiry, 15: 531–74.
Huang, C.-T. J. (1989). Pro-drop in Chinese. In O. Jaeggli and K. J. Safir (eds), The Null Subject Parameter. Dordrecht: Kluwer, 185–214.
Huang, C.-T. J. and Liu, C.-S. L. (2001). Logophoricity, attitudes and ziji at the interface. In P. Cole, G. Hermon, and C.-T. J. Huang (eds), Long-Distance Reflexives (Syntax and Semantics). New York: Academic Press, 33: 141–95.
Huang, C.-T. James, Li, Y.-H. Audrey, and Li, Yafei (2009). The Syntax of Chinese. Cambridge: Cambridge University Press.
Huang, Y. and Snedeker, J. (2009a). Online interpretation of scalar quantifiers: Insight into the semantics–pragmatics interface. Cognitive Psychology, 58: 376–415.
Huang, Y. and Snedeker, J. (2009b). Semantic meaning and pragmatic interpretation in five-year-olds: Evidence from real time spoken language comprehension. Developmental Psychology, 45: 1723–39.

Huang, Y. T., Snedeker, J., and Spelke, E. (2013). What exactly do numbers mean? Language Learning and Development, 9(2): 105–29.
Hughes, M. E. and Allen, S. E. (2014). The incremental effect of discourse-pragmatic sensitivity on referential choice in the acquisition of a first language. Lingua, 155: 43–61.
Humbert, Helga (1997). On the asymmetrical nature of nasal obstruent relations. In Kiyomi Kusumoto (ed.), Proceedings of the North East Linguistic Society 27. Amherst, MA: GLSA, University of Massachusetts, 219–33.
Hunter, T. and Lidz, J. (2013). Conservativity and learnability of determiners. Journal of Semantics, 30: 315–34.
Hurch, Bernard (ed.) (2005). Studies on Reduplication. Berlin and New York: Mouton de Gruyter.
Hurewitz, F., Papafragou, A., Gleitman, L., and Gelman, R. (2006). Asymmetries in the acquisition of numbers and quantifiers. Language Learning and Development, 2: 77–96.
Hurst, J. A., Baraitser, M., Auger, E., Graham, F., and Norell, S. (1990). An extended family with a dominantly inherited speech disorder. Developmental Medicine and Child Neurology, 32: 347–55.
Hutchinson, A. (1994). Algorithmic Learning. Oxford: Clarendon Press.
Huttenlocher, J., Vasilyeva, M., and Shimpi, P. (2004). Syntactic priming in young children. Journal of Memory and Language, 50: 182–95.
Huxley, Renira (1970). The development of the correct use of subject personal pronouns in two children. In G. Flores d'Arcais and W. Levelt (eds), Advances in Psycholinguistics. Amsterdam: North Holland.
Hyams, N. (1986). Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. (1992). The genesis of clausal structure. In J. Meisel (ed.), The Acquisition of Verb Placement: Functional Categories and V2 Phenomena in Language Development. Dordrecht: Kluwer, 371–400.
Hyams, N. (2002). Clausal structure in early Greek: A reply to Varlokosta, Vainikka and Rohrbacher and a reanalysis. The Linguistic Review, 19: 225–69.
Hyams, Nina (2007). Aspectual effects on interpretation in early grammar. Language Acquisition, 14(3): 231–68.
Hyams, N. (2011). Missing subjects in early child language. In J. de Villiers and T. Roeper (eds), Handbook of Generative Approaches to Language Acquisition. New York: Springer, 13–52.
Hyams, Nina (2012). Eventivity effects in early grammar: The case of non-finite verbs. First Language, 32: 239–69.
Hyams, N. (to appear). Missing subjects in early child language.
Hyams, N. and Sigurjónsdóttir, S. (1990). A cross-linguistic comparison of the development of referentially dependent elements. Language Acquisition, 1: 57–93.
Hyams, N. and Snyder, W. (2005). Young children never smuggle: Reflexive clitics and the Universal Freezing Hypothesis. Paper presented at the Boston University Conference on Language Development, 5 November.
Hyams, N. and Wexler, K. (1993). On the grammatical basis of null subjects in child language. Linguistic Inquiry, 24: 421–59.
Hyams, Nina, Schaeffer, Jeanette, and Johnson, Kyle (1993). On the acquisition of verb particle constructions. Unpublished manuscript, UCLA and University of Massachusetts, Amherst.
Hyams, N., Ntelitheos, D., and Manorohanta, C. (2006). The acquisition of the Malagasy voicing system: Implications for the adult grammar. Natural Language and Linguistic Theory, 24(4): 1049–92.

Iatridou, S. (1993). On nominative case assignment and a few related things. MIT Working Papers in Linguistics, 19: 175–96. (First circulated in 1988.)
Idsardi, William (2006). A simple proof that Optimality Theory is computationally intractable. Linguistic Inquiry, 37(2): 271–5.
Ingram, David (1974). Fronting in child phonology. Journal of Child Language, 1: 233–41.
Ingram, David (1978). The role of the syllable in phonological development. In Alan Bell and Joan B. Hooper (eds), Syllables and Segments. Amsterdam: North-Holland, 143–55.
Ingram, David (1988a). Jakobson revisited: Some evidence from the acquisition of Polish. Lingua, 75: 55–82.
Ingram, David (1988b). The acquisition of word-initial [v]. Language and Speech, 31: 77–85.
Ingram, David (1989). Phonological Disability in Children, 2nd edn. London: Cole and Whurr.
Ingram, David (1992). Early phonological acquisition: A cross-linguistic perspective. In Charles A. Ferguson, Lise Menn, and Carol Stoel-Gammon (eds), Phonological Development: Models, Research, Implications. Timonium, MD: York Press, 423–35.
Ingram, D. and Thompson, W. (1996). Early syntactic acquisition in German: Evidence for the modal hypothesis. Language, 72(1): 97–120.
Ingram, D. and Tyack, D. L. (1979). Inversion of subject NP and Aux in children's questions. Journal of Psycholinguistic Research, 8(4): 333–41.
Inhelder, B. and Piaget, J. (1958). The Growth of Logical Thinking from Childhood to Adolescence. New York: Basic Books.
Inhelder, Barbel and Piaget, Jean (1964). The Early Growth of Logic in the Child. London: Routledge & Kegan Paul.
Inkelas, Sharon (1993). Deriving cyclicity. In S. Hargus and E. Kaisse (eds), Studies in Lexical Phonology. San Diego: Academic Press.
Inkelas, Sharon and Rose, Yvan (2007). Positional neutralization: A case study from child language. Language, 83: 707–36.
Inkelas, Sharon, Orgun, Orhan, and Zoll, Cheryl (1997). The implications of lexical exceptions for the nature of the grammar. In I. Roca (ed.), Derivations and Constraints in Phonology. New York: Oxford University Press, 393–418.
Isobe, Miwa and Sugisaki, Koji (2002). The acquisition of pied-piping in French and its theoretical implications. Paper presented at Going Romance 2002, Workshop on Acquisition, Rijksuniversiteit Groningen.
Itkonen, Terho (1977). Huomioita lapsen äänteistön kehityksestä [Observations on the development of a child's sound system]. Virittäjä, 279–308.
Ito, Junko and Mester, Armin (2001). Covert generalizations in Optimality Theory: The role of stratal faithfulness constraints. Studies in Phonetics, Phonology, and Morphology, 7(2): 273–99.
Ito, Junko, Mester, Armin, and Padgett, Jaye (1995). Licensing and underspecification in Optimality Theory. Linguistic Inquiry, 26(4): 571–613.
Izard, V., Sann, C., Spelke, E. S., and Streri, A. (2009). Newborn infants perceive abstract numbers. Proceedings of the National Academy of Sciences, 106(25): 10382–5.
Jackendoff, R. S. (1972). Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press.
Jackendoff, R. S. (1976). Toward an explanatory semantic representation. Linguistic Inquiry, 7: 89–150.
Jackendoff, R. (1977). X-Bar Syntax: A Study of Phrase Structure. Linguistic Inquiry Monograph 2. Cambridge, MA: MIT Press.
Jackendoff, R. S. (1983). Semantics and Cognition. Cambridge, MA: MIT Press.

Jackendoff, R. S. (1987). Consciousness and the Computational Mind. Cambridge, MA: MIT Press.
Jackendoff, R. S. (1990). Semantic Structures. Cambridge, MA: MIT Press.
Jackendoff, R. S. (1992). Mme. Tussaud meets the binding theory. Natural Language and Linguistic Theory, 10: 1–31.
Jackendoff, R. S. (1996). The proper treatment of measuring out, telicity, and perhaps even quantification in English. Natural Language and Linguistic Theory, 14: 305–54.
Jackendoff, Ray (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Jackson, Janice (1998). Linguistic aspect in African-American English speaking children: An investigation of aspectual "be". Unpublished Ph.D. thesis, University of Massachusetts.
Jackson, Janice and Green, Lisa (2005). Habitual aspect in child African American English. In H. Verkuyl, H. de Swart, and A. van Hout (eds), Perspectives on Aspect. Dordrecht: Springer, 232–50.
Jaeger, Jeri (1984). Assessing the psychological status of the Vowel Shift Rule. Journal of Psycholinguistic Research, 13: 13–36.
Jaeger, Jeri (1986). On the acquisition of abstract representations for English vowels. Phonology Yearbook, 3: 71–97.
Jaeggli, Osvaldo (1986). Passive. Linguistic Inquiry, 17: 587–622.
Jäger, Gerhard (2007). Maximum entropy models and stochastic Optimality Theory. In Annie Zaenen, Jane Simpson, Tracy Holloway King, Jane Grimshaw, Joan Maling, and Chris Manning (eds), Architectures, Rules, and Preferences: Variations on Themes by Joan Bresnan. Stanford, CA: CSLI Publications, 467–79.
Jäger, G. and Rogers, J. (2012). Formal language theory: Refining the Chomsky hierarchy. Philosophical Transactions of the Royal Society B, 367: 1956–70.
Jäger, Gerhard and Rosenbach, Anette (2006). The winner takes it all—almost: Cumulativity in grammatical variation. Linguistics, 44: 937–71.
Jain, Sanjay, Osherson, Daniel, Royer, James S., and Sharma, Arun (1999). Systems that Learn: An Introduction to Learning Theory (Learning, Development and Conceptual Change), 2nd edn. Cambridge, MA: MIT Press.
Jakobson, Roman (1941). Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala: Almqvist and Wiksells. Translated by Allan R. Keiler (1968) as Child Language, Aphasia, and Phonological Universals. The Hague: Mouton.
Jakobson, Roman (1941/1968). Child Language, Aphasia and Phonological Universals. The Hague: Mouton. (Original work published 1941.)
Jakubowicz, C. (1984). On markedness and binding principles. In C. Jones and P. Sells (eds), Proceedings of NELS 14. Amherst, MA: GLSA, University of Massachusetts.
Jakubowicz, C. (2011). Measuring derivational complexity: New evidence from typically-developing and SLI learners of L1 French. Lingua, 121: 339–51.
Jakubowicz, C. and Gutierrez, J. (2007). Elicited production and comprehension of root wh-questions in French and Basque. COST Meeting Cross-Linguistically Robust Stages of Children's Linguistic Performance, Berlin.
Jakubowicz, C. and Roulet, L. (2004). Do French-speaking children with SLI present a selective deficit on tense? Proceedings of the Annual Boston University Conference on Language Development, 28(1): 256–66.
Jakubowicz, C. and Strik, N. (2008). Scope-marking strategies in the acquisition of long distance wh-questions in French and Dutch. Language and Speech, 51(1–2): 101–32.
Jared, Debra and Seidenberg, Mark S. (1990). Naming multisyllabic words. Journal of Experimental Psychology: Human Perception and Performance, 16: 92–105.

884   References Jarmulowicz, L. (2006). School-​aged children’s phonological production of derived English words. Journal of Speech, Language, and Hearing Research, 49: 294–​308. Jarosz, Gaja (2006a). Rich lexicons and restrictive grammars—maximum likelihood learning in Optimality Theory. Unpublished Ph.D. thesis, The Johns Hopkins University. Jarosz, Gaja (2006b). Richness of the base and probabilistic unsupervised learning in Optimality Theory. In Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology at HLT-​NAACL. Association for Computational Linguistics. New York City, USA, 50–​9. Jarosz, Gaja (2009a). Effects of lexical frequency and phonotactics on learning of morphophonological alternations. Paper Presented at Learning Meets Acquisition: The Learnability of Linguistic Frameworks From Formal and Cognitive Perspectives. Osnabrueck, Germany. Jarosz, Gaja (2009b). Learning phonology with stochastic partial orders. In Third North East Computational Phonology Meeting. MIT, Cambridge, MA. Jarosz, Gaja (2009c). Restrictiveness and phonological grammar and lexicon learning. In Malcolm Elliot, James Kirby, Osamu Sawada, Eleni Staraki, and Suwon Yoon (eds), Proceedings of the 43rd Annual Meeting of the Chicago Linguistics Society. Chicago Linguistics Society, 125–​34. Jarosz, Gaja (2010). Implicational markedness and frequency in constraint-​ based computational models of phonological learning. Journal of Child Language. Special Issue on Computational Models of Child Language Learning, 37(3): 565–​606. Jarosz, Gaja (2011). The roles of phonotactics and frequency in the learning of alternations. In Nick Danis, Kate Mesh, and Hyunsuk Sung (eds), BUCLD 35: Proceedings of the 35th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 321–​33. Jarosz, Gaja (2013). Naive parameter learning for Optimality Theory—​the hidden structure problem. 
In Seda Kan, Claire Moore-​Cantwell and Robert Staubs (eds), Proceedings of the Fortieth Conference of the North East Linguistics Society. Amherst, MA: GLSA, 2: 1–​14. Jeanne, LaVerne M. (1992). Some phonological rules of Hopi. International Journal of American Linguistics, 48: 245–​70. Jeong, Y. (2004). Children’s question formations from a Minimalist Perspective. Unpublished manuscript, University of Maryland. Jensen, B. and Thornton, R. (2008). Fragments of child grammar. In M. João Freitas and A. Gavarró Algueró (eds), Language Acquisition and Development: Papers from GALA 2007. Cambridge: Cambridge Scholars Publishing. Jeschull, Liane (2007). The pragmatics of telicity and what children make of it. In A. Belikova, L. Meroni, and M. Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA). Somerville, MA: Cascadilla Proceedings Project, 180–​7. Jesney, Karen (2007). Child chain shifts as faithfulness to input prominence. In Alyona Belikova, Luisa Meroni, and Mari Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition—​North America (GALANA 2). Cascadilla. Somerville, MA, 188–​99. Jesney, Karen (2009). Uniformity effects as a consequence of learning with lexical constraints. Poster presented at the KNAW Colloquium on Language Acquisition and Optimality Theory. Amsterdam, the Netherlands. Jesney, Karen and Tessier, Anne-​ Michelle (2009). Gradual learning and faithfulness: Consequences of ranked vs. weighted constraints. In Muhammad Abdurrahman, Anisa

Schardl, and Martin Walkow (eds), Proceedings of the Thirty-Eighth Meeting of the North East Linguistic Society. Amherst, MA: GLSA Publications, 1: 375–88. Jesney, Karen and Tessier, Anne-Michelle (2011). Biases in Harmonic Grammar: The road to restrictive learning. Natural Language and Linguistic Theory, 29: 251–90. Jesney, Karen, Pater, Joe, and Staubs, Robert (2010). Restrictive learning with distributions over underlying representations. Paper presented at the Workshop on Computational Modeling of Sound Pattern Acquisition. Edmonton, AB. Jing, C. (2008). Pragmatic computation in language acquisition: Evidence from disjunction and conjunction in negative context. Ph.D. thesis, University of Maryland at College Park. Jing, C., Crain, S., and Hsu, C.-F. (2005). The interpretation of focus in Chinese: Child vs. adult language. In Y. Otsu (ed.), Proceedings of the Sixth Tokyo Conference on Psycholinguistics. Tokyo: Hitsuji Shobo, 165–90. Joffe, V. and Varlokosta, S. (2007). Patterns of syntactic development in children with Williams syndrome and Down's syndrome: Evidence from passives and wh-questions. Clinical Linguistics and Phonetics, 21(9): 705–27. Johns, Alana (1992). Deriving ergativity. Linguistic Inquiry, 23: 57–87. Johns, Alana, Massam, Diane, and Ndayiragije, Juvenal (eds) (2006). Ergativity: Emerging issues. Dordrecht: Springer. Johnson, E. K. (2008). Infants use prosodically conditioned acoustic-phonetic cues to extract words from speech. Journal of the Acoustical Society of America, 123: EL144–EL148. Johnson, E. K. and Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language, 44: 1–20. Johnson, E. K. and Seidl, A. (2008). Clause segmentation by 6-month-old infants: A cross-linguistic perspective. Infancy, 13: 440–55. Johnson, E. K. and Seidl, A. (2009). At 11 months, prosody still outranks statistics.
Developmental Science, 12: 131–41. Johnson, E. K. and Tyler, M. (2010). Testing the limits of statistical learning for word segmentation. Developmental Science, 13: 339–45. Johnson, K. (1989). Clausal architecture and structural case. Manuscript, University of Wisconsin, Madison. Johnson, Mark (2002). Optimality-theoretic lexical functional grammar. In Suzanne Stevenson and Paula Merlo (eds), The Lexical Basis of Syntactic Processing: Formal, computational and experimental issues. Amsterdam: John Benjamins, 59–73. Johnson, M. (2008). Using adaptor grammars to identify synergies in the unsupervised learning of linguistic structure. In Proceedings of the Association for Computational Linguistics 2008. Johnson, V. E. and de Villiers, J. G. (2009). Syntactic frames in fast mapping verbs: The effects of age, dialect and clinical status. Journal of Speech, Language and Hearing Research, 52: 610–22. Jongstra, Wenckje (2003). Variable and stable clusters: Variation in the realisation of consonant clusters. Canadian Journal of Linguistics, 48: 265–88. Joos, Martin (1942). A phonological dilemma in Canadian English. Language, 18: 141–4. Jordens, P. (1990). The acquisition of verb placement in Dutch and German. Linguistics, 28: 1407–48. Josefsson, Gunlög (1997). On the Principles of Word Formation in Swedish. Lund: Lund University Press. Josefsson, G. (2002). The use and function of nonfinite root clauses in Swedish child language. Language Acquisition, 10(4): 273–320.

Joseph, K. L. and Pine, J. M. (2002). Does error-free use of French negation constitute evidence for very early parameter setting? Journal of Child Language, 29(1): 71–86. Joshi, A. K. (1985). Tree-adjoining grammars: How much context sensitivity is required to provide reasonable structural descriptions? In D. Dowty, L. Karttunen, and A. Zwicky (eds), Natural Language Parsing. Cambridge: Cambridge University Press, 206–50. Joshi, A. K. (2004). Starting with complex primitives pays off: Complicate locally, simplify globally. Cognitive Science, 28: 637–68. Jun, Jongho (1995). Perceptual and articulatory factors in place assimilation: An Optimality Theory approach. Unpublished Ph.D. thesis, UCLA. Jun, Sun-Ah and Fougeron, Cécile (2000). A phonological model of French intonation. In Antonis Botinis (ed.), Intonation: Analysis, Modelling and Technology. Dordrecht: Kluwer, 209–42. Jusczyk, Peter W. (1977). Perception of syllable-final stops by 2-month-old infants. Perception and Psychophysics, 21: 450–54. Jusczyk, Peter (1992). Developing phonological categories from the speech signal. In Charles A. Ferguson, Lise Menn, and Carol Stoel-Gammon (eds), Phonological Development: Models, Research, Implications. Timonium, MD: York Press, 17–64. Jusczyk, Peter W. (1997). The Discovery of Spoken Language. Cambridge, MA: MIT Press/Bradford Books. Jusczyk, P. W. and Aslin, R. N. (1995). Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29: 1–23. Jusczyk, Peter W. and Derrah, Carolyn (1987). Representation of speech sounds by young infants. Developmental Psychology, 23: 648–54. Jusczyk, P. W. and Thompson, E. (1978). Perception of a phonetic contrast in multisyllabic utterances by two-month-old infants. Perception and Psychophysics, 23: 105–9. Jusczyk, P. W., Hirsh-Pasek, K., Kemler Nelson, D. G., Kennedy, L., Woodward, A., and Piwoz, J. (1992).
Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology, 24: 252–93. Jusczyk, P. W., Cutler, A., and Redanz, N. J. (1993a). Infants' preference for the predominant stress patterns of English words. Child Development, 64: 675–87. Jusczyk, P. W., Friederici, A. D., Wessels, J., Svenkerud, V. Y., and Jusczyk, A. M. (1993b). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32: 402–20. Jusczyk, P. W., Luce, P. A., and Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33: 630–45. Jusczyk, Peter W., Jusczyk, Anne Marie, Kennedy, Lori J., Schomberg, Tracy, and Koenig, Nan (1995a). Young infants' retention of information about bisyllabic utterances. Journal of Experimental Psychology: Human Perception and Performance, 21: 822–36. Jusczyk, Peter W., Kennedy, Lori J., and Jusczyk, Anne Marie (1995b). Young infants' retention of information about syllables. Infant Behavior and Development, 18: 27–41. Jusczyk, P. W., Hohne, E., and Mandel, D. (1995c). Picking up regularities in the sound structure of the native language. In W. Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and methodological issues in cross-language speech research. Timonium, MD: York Press, 91–119. Jusczyk, P. W., Hohne, E. A., and Bauman, A. (1999a). Infants' sensitivity to allophonic cues for word segmentation. Perception and Psychophysics, 61: 1465–76.

Jusczyk, P. W., Houston, D. M., and Newsome, M. (1999b). The beginning of word segmentation in English-learning infants. Cognitive Psychology, 39: 159–207. Jusczyk, Peter W., Goodman, Mara B., and Bauman, Angela (1999c). 9-month-olds' attention to sound similarities in syllables. Journal of Memory and Language, 40: 62–82. Jusczyk, Peter W., Smolensky, Paul, and Allocco, Theresa (2002). How English-learning infants respond to markedness and faithfulness constraints. Language Acquisition, 10: 31–73. Jusczyk, Peter W., Smolensky, Paul, Arnold, Karen, and Moreton, Elliott (2003). Acquisition of nasal place assimilation by 4.5-month-old infants. In Derek M. Houston, Amanda Seidl, George Hollich, Elizabeth K. Johnson, and Ann Marie Jusczyk (eds), Jusczyk Lab Final Report. West Lafayette, IN: Purdue University. Kabuto, Y. (2007). The acquisition of the mechanism of unselective binding, LF wh-movement and constraints on movement. Nanzan Linguistics Special Issue. Kadin, G. and Engstrand, O. (2005). Tonal word accents produced by Swedish 18- and 24-month-olds. Proceedings of FONETIK 2005, 67–70. Kaestner, K. H., Knochel, W., and Martinez, D. E. (2000). Unified nomenclature for the winged helix/forkhead transcription factors. Genes and Development, 14: 142–6. Kager, René (1999a). Optimality Theory. Cambridge: Cambridge University Press. Kager, René (1999b). Surface opacity of metrical structure in Optimality Theory. In B. Hermans and M. van Oostendorp (eds), The Derivational Residue in Phonological Optimality Theory. Amsterdam: John Benjamins, 207–45. Kager, R. (2007). Feet and metrical stress. In P. de Lacy (ed.), The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 195–227. Kager, R., Pater, J., and Zonneveld, W. (eds) (2004). Constraints in Phonological Acquisition. Cambridge: Cambridge University Press. Kajikawa, Sachiyo, Fais, Laurel, Mugitani, Ryoko, Werker, Janet F., and Amano, Shigeaki (2006).
Cross-language sensitivity to phonotactic patterns in infants. Journal of the Acoustical Society of America, 120: 2278–84. Kamawar, D. and Olson, D. R. (1999). Children's representational theory of language: The problem of opaque contexts. Cognitive Development, 14: 531–48. Kamp, H. and Partee, B. H. (1995). Prototype theory and compositionality. Cognition, 57: 129–91. Kamp, Hans and Reyle, Uwe (1993). From Discourse to Logic. Dordrecht: Kluwer. Kampen, Jacqueline van (1996). PF/LF convergence in acquisition. In Kiyomi Kusumoto (ed.), Proceedings of NELS 26. Amherst, MA: GLSA, 149–63. Kaper, Willem (1976). Pronominal case errors. Journal of Child Language, 3: 439–41. Kaplan, Ronald and Kay, Martin (1994). Regular models of phonological rule systems. Computational Linguistics, 20: 331–78. Kapur, S. (1994). Some applications of formal learning theory results to natural language acquisition. In B. Lust and G. Hermon (eds), Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Volume 2: Binding, Dependencies, and Learnability. Hillsdale, NJ: Lawrence Erlbaum, 2: 491–508. Karmiloff-Smith, A. (1980). Psychological processes underlying pronominalization and non-pronominalization in children's connected discourse. In J. Kreiman and A. E. Ojeda (eds), Papers from the Parasession on Pronouns and Anaphora. Chicago, IL: Chicago Linguistic Society, 231–49.

Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences, 2(10): 389–98. Karttunen, Lauri (1998). The proper treatment of optimality in computational phonology. In FSMNLP '98: International Workshop on Finite-State Methods in Natural Language Processing. Bilkent University, Ankara, Turkey, 1–12. Karzon, Roanne G. (1985). Discrimination of polysyllabic sequences by one- to four-month-old infants. Journal of Experimental Child Psychology, 39: 326–42. Kasprzik, Anna and Kötzing, Timo (2010). String extension learning using lattices. In Henning Fernau, Adrian-Horia Dediu, and Carlos Martín-Vide (eds), Proceedings of the 4th International Conference on Language and Automata Theory and Applications (LATA 2010). Lecture Notes in Computer Science. Trier: Springer, 6031: 380–91. Katsos, N. and Bishop, D. V. (2011). Pragmatic tolerance: Implications for the acquisition of informativeness and implicature. Cognition, 120: 67–81. Katsos, N. and Smith, N. (2010). Pragmatic tolerance and speaker–comprehender asymmetries. In K. Franich, M. Iserman, and L. Keil (eds), Proceedings of the 34th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 221–32. Katz, W. F., Beach, C. M., Jenouri, K., and Verma, S. (1996). Duration and fundamental frequency correlates of phrase boundaries in productions by children and adults. Journal of the Acoustical Society of America, 99: 3179–91. Katzir, R. (2007). Structurally-defined alternatives. Linguistics and Philosophy, 30: 669–90. Kaufmann, I. and Wunderlich, D. (1998). Cross-linguistic patterns of resultatives. Working papers SFB Theorie des Lexikons, No. 109, University of Düsseldorf. Kawasaki-Fukumori, Haruko (1992). An acoustic basis for universal phonotactic constraints. Language and Speech, 35: 73–86. Kayne, R. (1985). Notes on English agreement.
Unpublished manuscript, Graduate Center, City University of New York. Kayne, Richard (1981). On certain differences between French and English. Linguistic Inquiry, 12: 349–71. Kazanina, N. and Phillips, C. (2001). Coreference in child Russian: Distinguishing syntactic and discourse constraints. In A. H.-J. Do, L. Domínguez, and A. Johansen (eds), Proceedings of the 25th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 413–24. Kazanina, Nina and Phillips, Colin (2007). A developmental perspective on the imperfective paradox. Cognition, 105: 65–102. Kazazis, Kostas (1969). Possible evidence for (near-)underlying forms in the speech of one child. In Proceedings of the Chicago Linguistic Society (CLS), 5: 382–8. Kearns, Michael and Li, Ming (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22. Kearns, Michael and Vazirani, Umesh (1994). An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press. Kedar, Y., Casasola, M., and Lust, B. (2006). Getting there faster: 18- and 24-month-old infants' use of function words to determine reference. Child Development, 77: 325–38. Keenan, E. (1971). Names, quantifiers, and the sloppy identity problem. Papers in Linguistics, 4: 211–32. Keenan, Edward (1985). Passive in the world's languages. In T. Shopen (ed.), Language Typology and Syntactic Description. Cambridge: Cambridge University Press, 1: 243–81. Keenan, E. and Stavi, J. (1986). A semantic characterization of natural language determiners. Linguistics and Philosophy, 9: 253–326.

Kehoe, M. (1997). Stress error patterns in English-speaking children's word productions. Clinical Linguistics and Phonetics, 11: 389–409. Kehoe, M. (1998). Support for metrical stress theory in stress acquisition. Clinical Linguistics and Phonetics, 12: 1–23. Kehoe, Margaret (2000). Truncation without shape constraints: The latter stages of prosodic acquisition. Language Acquisition, 8(1): 23–67. Kehoe, M. (2001). Prosodic patterns in children's multisyllabic word productions. Language, Speech, and Hearing Services in Schools, 32: 284–94. Kehoe, M. and Stoel-Gammon, Carole (1997a). The acquisition of prosodic structure: An investigation of current accounts of children's prosodic development. Language, 73: 113–44. Kehoe, M. and Stoel-Gammon, Carole (1997b). Truncation patterns in English-speaking children's word productions. Journal of Speech, Language, and Hearing Research, 40: 526–41. Kehoe, Margaret, Hilaire-Debove, Geraldine, Demuth, Katherine, and Lleó, Conxita (2008). The structure of branching onsets and rising diphthongs: Evidence from the acquisition of French and Spanish. Language Acquisition, 15: 5–57. Keller, Frank (2000). Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Unpublished Ph.D. thesis, University of Edinburgh. Keller, Frank and Asudeh, Ash (2002). Probabilistic learning algorithms and Optimality Theory. Linguistic Inquiry, 33(2): 225–44. Keller, Kathryn C. and Saporta, Sol (1957). The frequency of consonant clusters in Chontal. International Journal of American Linguistics, 23: 28–35. Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignment. Psychological Review, 99: 349–64. Kelly, M. H. and Bock, J. K. (1988). Stress in time. Journal of Experimental Psychology: Human Perception and Performance, 14: 389–403. Kelly, M. H. and Martin, S. (1994).
Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language. Lingua, 92: 105–40. Kemler-Nelson, D. G., Hirsh-Pasek, K., Jusczyk, P. W., and Cassidy, K. W. (1989). How the prosodic cues in motherese might assist language learning. Journal of Child Language, 16: 55–68. Kemler-Nelson, D. G., Jusczyk, P. W., Mandel, D. R., Myers, J., Turk, A., and Gerken, L. A. (1995). The headturn preference procedure for testing auditory perception. Infant Behavior and Development, 18: 111–16. Kemp, C. and Tenenbaum, J. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31): 10687–92. Kemp, C., Perfors, A., and Tenenbaum, J. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3): 307–21. Kennedy, C. (1997). Antecedent-contained deletion and the syntax of quantification. Linguistic Inquiry, 28: 662–88. Kennedy, C. (1999). Projecting the adjective: The syntax and semantics of gradability and comparison. Unpublished Ph.D. thesis, University of California. [Outstanding Dissertations in Linguistics. New York, NY: Garland Press.] Kennedy, C. (2001). Polar opposition and the ontology of “degrees.” Linguistics and Philosophy, 24: 33–70. Kennedy, C. (2002). Comparative deletion and optimality in syntax. Natural Language and Linguistic Theory, 20: 553–621. Kennedy, C. (2006). Semantics of comparatives. In K. Allan (ed.), Semantics section of K. Brown (ed.), Encyclopedia of Language and Linguistics, 2nd edn. Oxford: Elsevier.

Kennedy, C. (2007). Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy, 30: 1–45. Kennedy, C. (2009). Modes of comparison. In M. Elliott, J. Kirby, O. Sawada, E. Staraki, and S. Yoon (eds), Proceedings of the 43rd Annual Meeting of the Chicago Linguistic Society. Chicago, IL: Chicago Linguistic Society, 141–65. Kennedy, C. (2013). A scalar meaning for scalar readings of number words. Unpublished manuscript, University of Chicago. Kennedy, C. and McNally, L. (2005). Scale structure, degree modification, and the semantics of gradable adjectives. Language, 81: 345–81. Kennedy, C. and Merchant, J. (2000). Attributive comparative deletion. Natural Language and Linguistic Theory, 18: 89–146. Kenstowicz, M. (1989). The null subject parameter in modern Arabic dialects. In O. Jaeggli and K. Safir (eds), The Null Subject Parameter. Dordrecht: Kluwer, 263–75. Kenstowicz, Michael (1994). Phonology in Generative Grammar. Cambridge, MA: Blackwell. Kenstowicz, Michael (1997). Base identity and uniform exponence: Alternatives to cyclicity. In J. Durand and B. Laks (eds), Current Trends in Phonology: Models and Methods. Salford, UK: University of Salford, 363–94. Kenstowicz, Michael and Kisseberth, Charles (1979). Generative Phonology: Description and Theory. San Diego, CA: Academic Press. Kerkhoff, Annemarie (2007). The phonology–morphology interface: Acquisition of alternations. Unpublished Ph.D. thesis, Utrecht University. Kernan, K. and Blount, B. (1966). The acquisition of Spanish grammar by Mexican children. Anthropological Linguistics, 8(9): 1–14. Kernan, K. and Sabsay, S. (1996). Linguistic and cognitive ability of adults with Down syndrome and mental retardation of unknown etiology. Journal of Communication Disorders, 29: 401–22. Kiguchi, H. and Thornton, R. (2004). Binding principles and ACD constructions in child grammars. Syntax, 7: 234–71.
Kilian, A., Yaman, S., von Fersen, L., and Güntürkün, O. (2003). A bottlenose dolphin discriminates visual stimuli differing in numerosity. Learning and Behavior, 31(2): 133–42. Kim, M., Landau, B., and Phillips, C. (1999). Cross-linguistic differences in children's syntax for locative verbs. Proceedings of the Annual Boston University Conference on Language Development, 23: 337–48. Kim, Y.-J. (2000). Subject/object drop in the acquisition of Korean: A cross-linguistic comparison. Journal of East Asian Linguistics, 9: 325–51. King, Robert D. (1973). Rule insertion. Language, 49(3): 551–78. Kiparsky, Paul (1968). How Abstract is Phonology? Bloomington, IN: IULC Publications. Kiparsky, Paul (1976). Abstractness, opacity and global rules. In Andreas Koutsoudas (ed.), The Application and Ordering of Grammatical Rules. The Hague: Mouton, 160–86. Kiparsky, Paul (1982). From cyclic phonology to lexical phonology. In H. van der Hulst and N. Smith (eds), The Structure of Phonological Representations. Dordrecht: Foris, Part 1: 131–75. Kiparsky, Paul (1985). Some consequences of lexical phonology. Phonology Yearbook, 2: 85–138. Kiparsky, Paul (2000). Opacity and cyclicity. The Linguistic Review, 17: 351–65. Kiparsky, Paul and Menn, Lise (1977). On the acquisition of phonology. In J. Macnamara (ed.), Language, Learning and Thought. New York: Academic Press, 47–78.

Kirby, Susannah (2009). Semantic scaffolding in first language acquisition: The acquisition of raising-to-object and object control. Unpublished Ph.D. thesis, University of North Carolina at Chapel Hill. Kirby, Susannah (2011). Move over, control freaks: Syntactic raising as a cognitive default. In Proceedings of BUCLD 35 (Online Proceedings Supplement). Kirby, S. and Becker, M. (2007). Which it is it? The acquisition of referential and expletive it. Journal of Child Language, 34: 571–99. Kirby, Susannah, Davies, William D., and Dubinsky, Stanley (2010). Up to d[eb]ate on raising and control, part 1: Properties and analyses of the constructions. Language and Linguistics Compass, 4. Kirkham, N. Z., Slemmer, J. A., and Johnson, S. P. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83: B35–B42. Kisseberth, Charles (1970). On the functional unity of phonological rules. Linguistic Inquiry, 1(3): 291–306. Kjelgaard, M. M. and Tager-Flusberg, H. (2001). An investigation of language impairment in autism: Implications for genetic subgroups. Language and Cognitive Processes, 16(2/3): 287–308. Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7: 279–312. Klatzky, R. L., Clark, E. V., and Macken, M. (1973). Asymmetries in the acquisition of polar adjectives: Linguistic or conceptual? Journal of Experimental Child Psychology, 16: 32–46. Klee, T. (1985). Role of inversion in children's question development. Journal of Speech and Hearing Research, 28: 225–32. Klein, B. P. (1995). Grammatical abilities of children with Williams syndrome. Department of Psychology, Emory University, Atlanta, GA, 76. Klein, D. (2005). The unsupervised learning of natural language. Unpublished Ph.D. thesis, Stanford University. Klein, E. (1980). A semantics for positive and comparative adjectives. Linguistics and Philosophy, 4: 1–45.
Klein, E. (1982). The interpretation of adjectival comparatives. Journal of Linguistics, 18: 113–36. Klein, H. (1984). Learning to stress: A case study. Journal of Child Language, 11: 375–90. Klein, E. (1991). Comparatives. In A. von Stechow and D. Wunderlich (eds), Semantik: Ein internationales Handbuch der zeitgenössischen Forschung. Berlin: Walter de Gruyter, 673–91. Klein, Robert P. (1971). Acoustic analysis of the acquisition of acceptable “r” in American English. Child Development, 42: 543–50. Klein, Wolfgang (1994). Time in Language. London: Routledge. Klein, Wolfgang (2009). How time is encoded. In W. Klein and P. Li (eds), The Expression of Time. Berlin: Mouton de Gruyter, 39–81. Klibanoff, R. S. and Waxman, S. R. (2000). Basic level object categories support the acquisition of novel adjectives: Evidence from preschool-aged children. Child Development, 71: 649–59. Klima, E. S. and Bellugi, U. (1966). Syntactic regularities in the speech of children. In J. Lyons and R. J. Wales (eds), Psycholinguistics Papers: The proceedings of the 1966 Edinburgh conference. Edinburgh: Edinburgh University Press, 183–208. Kline, Melissa and Demuth, Katherine (2010). Factors facilitating implicit learning: The case of the Sesotho passive. Language Acquisition, 17: 220–34. Knill, David C. and Richards, Whitman (1996). Perception as Bayesian Inference. Cambridge: Cambridge University Press.

Ko, H. (2005). Syntax of why-in-situ: Merge into [Spec,CP] in the overt syntax. Natural Language and Linguistic Theory, 23(4): 867–916. Ko, H. (2006). On the structural height of reason wh-adverbials: Acquisition and consequences. In L. Cheng and N. Corver (eds), Wh-movement: Moving On. Cambridge, MA: MIT Press. Kobele, Gregory (2006). Generating copies: An investigation into structural identity in language and grammar. Unpublished Ph.D. thesis, University of California. Kochanski, G., Grabe, E., Coleman, J., and Rosner, B. (2005). Loudness predicts prominence: Fundamental frequency lends little. Journal of the Acoustical Society of America, 118: 1038–54. Koenig, J. P. (1991). Scalar predicates and negation: Punctual semantics and interval interpretations. CLS, 27: 140–55. Koenig, J. P. (1993). Scalar predicates and negation: Punctual semantics and interval interpretations. Chicago Linguistic Society 27, part 2: The Parasession on Negation, 140–55. Kohl, K. T. (1999). An analysis of finite parameter learning in linguistic spaces. Master's thesis, MIT. Komura, A. (1981). Shoyuu hyoogen no hattatsu [The development of possessive expressions]. In M. Hori and F. C. Peng (eds), Aspects of Language Acquisition: Gengo shuutoku no shosoo. Hiroshima: Bunka Hyoron Publishing Co. Konopka, G., Bomar, J. M., Winden, K., Coppola, G., Jonsson, Z. O., Gao, F., Peng, S., Preuss, T. M., Wohlschlegel, J. A., and Geschwind, D. H. (2009). Human-specific transcriptional regulation of CNS development genes by FOXP2. Nature, 462: 213–18. Konstantzou, Katerina (2014). Acquisition of tense and aspect in Greek child language: A comparative study between typical development and Specific Language Impairment. Unpublished Ph.D. thesis, University of Athens. Konstantzou, Katerina, van Hout, Angeliek, Varlokosta, Spyridoula, and Vlassopoulos, Maria (2013). Perfective–imperfective: Development of aspectual distinctions in Greek specific language impairment. Linguistic Variation, 13(2): 187–216. Koo, H. and L.
Callahan (2012). Tier-adjacency is not a necessary condition for learning phonotactic dependencies. Language and Cognitive Processes, 27: 1425–32. Kooijman, V. (2007). Continuous-speech segmentation at the beginning of language acquisition: Electrophysiological evidence. Unpublished Ph.D. thesis, Radboud University. Kooijman, V., Hagoort, P., and Cutler, A. (2005). Electrophysiological evidence for prelinguistic infants' word recognition in continuous speech. Cognitive Brain Research, 24: 109–16. Kooijman, V., Hagoort, P., and Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14(6): 591–612. Koopman, Hilda (1984). The Syntax of Verbs: From verb movement rules in the Kru languages to Universal Grammar. Dordrecht: Foris. Koopman, Hilda and Sportiche, Dominique (1991). The position of subjects. Lingua, 85: 211–58. Köpcke, K.-M. (1988). Schemas in German plural formation. Lingua, 74: 303–35. Köpcke, K.-M. (1998). The acquisition of plural marking in English and German revisited: Schemata versus rules. Journal of Child Language, 25(2): 293–319. Köppe, Regina (1994). NP-movement and subject raising. In J. Meisel and P. Jordens (eds), Bilingual First Language Acquisition: French and German grammatical development. Amsterdam: Benjamins, 209–34. Kornfeld, Judith R. (1971). What initial clusters tell us about the child's speech code. Quarterly Progress Report, MIT Research Laboratory of Electronics, 101: 218–21. Kornfeld, Judith R. (1976). Implications of studying reduced consonant clusters in normal and abnormal child speech. In Robin N. Campbell and Philip T. Smith (eds), Recent Advances in

the Psychology of Language: Formal and experimental approaches. New York: Plenum Press, 413–23. Kornfeld, Judith R. and Goehl, Henry (1974). A new twist to an old observation: Kids know more than they say. In Anthony Bruck, Robert A. Fox, and Michael W. La Galy (eds), Papers from the Tenth Annual Meeting of the Chicago Linguistic Society, Parasession on Natural Phonology. Chicago, IL: Chicago Linguistic Society, 210–19. Koster, C. (1993). Errors in anaphora acquisition. Unpublished Ph.D. thesis, Utrecht University. Kostic, A. (1995). Information load constraints on processing inflected morphology. In L. B. Feldman (ed.), Morphological Aspects of Language Processing. Hillsdale, NJ: Erlbaum, 317–44. Kovas, Y. and Plomin, R. (2006). Generalist genes: Implications for the cognitive sciences. Trends in Cognitive Sciences, 10(5): 198–203. Krämer, I. (1993). The licensing of subjects in early child language. MIT Working Papers in Linguistics, 19: 197–212. Krämer, I. (2000). Interpreting indefinites. Unpublished Ph.D. thesis, Utrecht University. Krämer, I. (2005). When does many mean a lot? Discourse pragmatics of the weak–strong distinction. In A. Brugos, M. R. Clark-Cotton, and S. Ha (eds), Proceedings of the 29th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 353–64. Kraska-Szlenk, Iwona (1995). The phonology of stress in Polish. Unpublished Ph.D. thesis, University of Illinois at Urbana-Champaign. Kratzer, A. (1996). Severing the external argument from its verb. In J. Rooryck and L. Zaring (eds), Phrase Structure and the Lexicon. Dordrecht: Kluwer, 109–38. Kratzer, Angelika (2010). What grammar might contribute to meaning composition. Slides from talk presented at the Contextualism and Compositionality Workshop, Paris, France, 18 May 2010. Krifka, M. (1998a). At least some determiners aren't determiners. Unpublished manuscript, University of Texas at Austin. Krifka, Manfred (1989b).
Nominal reference, temporal constitution and quantification in event semantics. In R. Bartsch, J. van Benthem, and P. van Emde Boas (eds), Semantics and Contextual Expressions. Dordrecht: Foris Publications, 75–115. Krifka, Manfred (1992). Thematic relations as links between nominal reference and temporal constitution. In I. Sag and A. Szabolcsi (eds), Lexical Matters. Stanford: CSLI Publications, 29–53. Krifka, Manfred, Pelletier, F. Jeffrey, Carlson, Gregory N., ter Meulen, Alice, Chierchia, Gennaro, and Link, Godehard (1995). Genericity: An introduction. In Gregory N. Carlson and Francis J. Pelletier (eds), The Generic Book. Chicago, IL: University of Chicago Press, 1–124. Kroch, A., Santorini, B., and Heycock, C. (1988). Bare infinitives and external arguments. In J. Blevins and J. Carter (eds), Proceedings of NELS 18. Amherst: University of Massachusetts, GLSA, 271–85. Kubo, M. (1990). Japanese passives. Unpublished manuscript, MIT. Kučera, Henry and Francis, Winthrop Nelson (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press. Kučerová, I. (2014). The syntax of null subjects. Syntax, 17(2): 132–67. Kuczaj, S. (1976). -ing, -s and -ed: A study of the acquisition of certain verb inflections. Unpublished Ph.D. thesis, University of Minnesota. Kuczaj, Stan A. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16: 589–600.

Kuczaj, Stan A. (1978). Children's judgments of grammatical and ungrammatical irregular past tense verbs. Child Development, 49: 319–26. Kuczaj, S. A. and Maratsos, M. P. (1975). What children can say before they will. Merrill-Palmer Quarterly, 21: 89–111. Kuczaj, S. and Maratsos, M. (1983). Initial verbs of yes–no questions: A different kind of general grammatical category. Developmental Psychology, 19: 440–44. Kuehn, David P. and Tomblin, J. Bruce (1977). A cineradiographic investigation of children's w/r substitutions. Journal of Speech and Hearing Disorders, 42: 462–73. Kuhl, Patricia K. and Miller, James D. (1975). Speech perception by the chinchilla: Voiced–voiceless distinction in alveolar plosive consonants. Science, 190: 69–72. Kuhl, P. and Miller, J. D. (1982). Discrimination of auditory target dimensions in the presence or absence of variation in a second dimension by infants. Perception and Psychophysics, 31: 279–92. Kuhl, P., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255: 606–8. Kuijpers, C., Coolen, R., Houston, D., and Cutler, A. (1998). Using the head-turning technique to explore cross-linguistic performance differences. Advances in Infancy Research. Stamford, CT: Ablex, 12: 205–20. Kupisch, Tanja (2006). The Acquisition of Determiners in Bilingual German–Italian and German–French Children. München: Lincom Europa. Kurtzman, H. and MacDonald, M. (1993). Resolution of quantifier scope ambiguities. Cognition, 48: 243–79. Kuryłowicz, Jerzy (1949). La nature des procès dits “analogiques” [The nature of so-called “analogical” processes]. Acta Linguistica, 5: 15–37. Kutas, Marta and Hillyard, Steven A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207: 203–8. Labelle, Marie (1990). Predication, wh-movement, and the development of relative clauses. Language Acquisition, 1: 95–119. Labelle, M. (2000).
Les infinitifs racines en langage enfantin. Unpublished manuscript.
Labov, W. and Labov, T. (1978). Learning the syntax of questions. In R. Campbell and P. Smith (eds), Recent Advances in the Psychology of Language. New York: Plenum Press.
Ladd, D. R. (2008). Intonational Phonology, 2nd edn. Cambridge: Cambridge University Press.
Ladefoged, Peter and Maddieson, Ian (1996). The Sounds of the World's Languages. Oxford: Blackwell.
Lado, Robert (1957). Linguistics across Cultures. Ann Arbor: University of Michigan Press.
Lahiri, Aditi and Reetz, Henning (2002). Underspecified recognition. In Carlos Gussenhoven, Natasha Warner, and Ton Rietveld (eds), Labphon 7.
Lai, C., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., and Monaco, A. P. (2001). A forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 413: 519–23.
Lai, C. S. L., Gerrelli, D., Monaco, A. P., Fisher, S. E., and Copp, A. J. (2003). FOXP2 expression during brain development coincides with adult sites of pathology in a severe speech and language disorder. Brain, 126(11): 2455–62.
Lai, Regine (2015). Learnable vs. unlearnable harmony patterns. Linguistic Inquiry, 46(3): 425–51.
Landau, B. and Gleitman, L. R. (1985). Language and Experience: Evidence from the blind child. Cambridge, MA: Harvard University Press.
Landau, I. (2000). Elements of Control: Structure and Meaning in Infinitival Constructions. Dordrecht: Kluwer.

Lange, Steffen, Zeugmann, Thomas, and Zilles, Sandra (2008). Learning indexed families of recursive languages from positive data: A survey. Theoretical Computer Science, 397: 194–232.
Lany, Jill and Saffran, Jenny R. (2010). From statistics to meaning: Infants' acquisition of lexical categories. Psychological Science, 21: 284–91.
Lardiere, Donna (1998). Parameter-resetting in morphology: Evidence from compounding. In Maria-Luise Beck (ed.), Morphology and its Interfaces in Second Language Knowledge. Amsterdam: John Benjamins, 283–305.
Larson, R. (1988a). Scope and comparatives. Linguistics and Philosophy, 11: 1–26.
Larson, R. K. (1988b). On the double object construction. Linguistic Inquiry, 19: 335–91.
Larson, R. K. (1990). Double objects revisited: Reply to Jackendoff. Linguistic Inquiry, 21: 589–632.
Lasky, Robert E., Syrdal-Lasky, Ann, and Klein, Robert E. (1975). VOT discrimination by four to six and a half month old infants from Spanish environments. Journal of Experimental Child Psychology, 20: 215–25.
Lasnik, H. and Saito, M. (1984). On the nature of proper government. Linguistic Inquiry, 15: 235–89.
Lasnik, Howard and Saito, Mamoru (1991). On the subject of infinitives. In Lise M. Dobrin, Lynn Nichols, and Rosa M. Rodriguez (eds), CLS 27: Papers from the 27th Regional Meeting of the Chicago Linguistic Society. Chicago, IL: Chicago Linguistic Society, University of Chicago, 324–43.
Lasser, I. (1997). Finiteness in Adult and Child German. MPI Series in Psycholinguistics.
Lasser, I. (1998). Getting rid of root infinitives. Proceedings of the Annual Boston University Conference on Language Development, 22(2): 465–76.
Lasser, I. (2002). The roots of root infinitives: Remarks on infinitival main clauses in adult and child language. Linguistics, 40(4): 767–96.
Lau, E. (2011). Obligatory agent promotes cue validity: Early acquisition of passive in Cantonese. In N. Danis, K. Mesh, and H.
Sung (eds), Proceedings of the 35th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Law, Paul (1998). A unified analysis of P-stranding in Romance and Germanic. In Pius N. Tamanji and Kiyomi Kusumoto (eds), Proceedings of NELS 28. Amherst: University of Massachusetts, GLSA, 219–34.
Law, Paul (2006). Preposition stranding. In Martin Everaert and Henk van Riemsdijk (eds), The Blackwell Companion to Syntax. Oxford: Blackwell, 631–84.
Laws, G. and Bishop, D. V. M. (2003). A comparison of language abilities in adolescents with Down syndrome and children with Specific Language Impairment. Journal of Speech, Language and Hearing Research, 46: 1324–39.
Layton, T. L. and Stick, S. L. (1979). Comprehension and production of comparatives and superlatives. Journal of Child Language, 6: 511–27.
Le Corre, M. and Carey, S. (2007). One, two, three, four, nothing more: An investigation of the conceptual sources of the verbal counting principles. Cognition, 105(2): 395–438.
Lebeaux, D. (1988). Language acquisition and the form of the grammar. Unpublished Ph.D. thesis, University of Massachusetts.
Leben, W. (1975). Suprasegmental phonology. Unpublished Ph.D. thesis, MIT. [Published 1980, New York: Garland Press.]
Lechner, W. (2001). Reduced and phrasal comparatives. Natural Language and Linguistic Theory, 19: 683–735.
Lechner, W. (2004). Ellipsis in Comparatives. Berlin: Mouton de Gruyter.

Leddon, E. and Lidz, J. (2006). Reconstruction effects in child language. In D. Bamman, T. Magnitskaia, and C. Zaller (eds), BUCLD 30. Somerville, MA: Cascadilla Press, 328–39.
Lee, Duck-Young (1998). Korean Phonology: A Principle-based Approach. München: LINCOM Europa.
Lee, H. S. (1991). Tense, aspect and modality: A discourse-pragmatic analysis of verbal affixes in Korean from a typological perspective. Unpublished Ph.D. thesis, UCLA.
Lee, H. and Wexler, K. (1987). The acquisition of reflexives and pronouns in Korean: From a cross-linguistic perspective. Paper presented at the 12th Annual Boston University Conference on Language Development, Boston.
Lee, J. and Naigles, L. R. (2005). Input to verb learning in Mandarin Chinese: A role for syntactic bootstrapping. Developmental Psychology, 41: 529–40.
Lee, J. and Naigles, L. R. (2008). Mandarin learners use syntactic bootstrapping in verb acquisition. Cognition, 106: 1028–37.
Lee, K. O. and Lee, Y. (2008). An event-structural account of passive acquisition in Korean. Language and Speech, 51: 133–49.
Lee, T. (1999). Finiteness and null arguments in child Cantonese. Tsing Hua Journal of Chinese Studies, 29: 365–94.
Lees, R. B. (1961). The Phonology of Modern Standard Turkish. Bloomington: Indiana University Press.
Legate, J. and Yang, C. (2002). Empirical re-assessment of stimulus poverty arguments. The Linguistic Review, 19: 151–62.
Legate, J. A. and Yang, C. (2007). Morphosyntactic learning and the development of tense. Language Acquisition, 14(3): 315–44.
Legendre, Géraldine, Miyata, Yoshiro, and Smolensky, Paul (1990a). Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: An application. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA: Lawrence Erlbaum, 884–91.
Legendre, Géraldine, Miyata, Yoshiro, and Smolensky, Paul (1990b).
Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society. Cambridge, MA: Lawrence Erlbaum, 388–95.
Legendre, Géraldine, Miyata, Yoshiro, and Smolensky, Paul (1990c). Can connectionism contribute to syntax? Harmonic Grammar, with an application. In M. Ziolkowski, M. Noske, and K. Deaton (eds), Proceedings of the Twenty-Sixth Regional Meeting of the Chicago Linguistic Society. Chicago: Chicago Linguistic Society, 237–52.
Legendre, Géraldine, Hagstrom, Paul, Vainikka, Anne, and Todorova, Marina (2002). Partial constraint ordering in child French syntax. Language Acquisition, 10(3): 189–227.
Legendre, Géraldine, Sorace, Antonella, and Smolensky, Paul (2006). The Optimality Theory–Harmonic Grammar connection. In The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. Cambridge, MA: MIT Press.
Léger, C. (2006). Understanding of factivity under negation: An asymmetry between two types of factive predicates. Paper presented at the Boston University Conference on Language Development 31. Boston, MA: Boston University, November 3–5.
Leonard, L. (1998). Children with Specific Language Impairment. Cambridge, MA: MIT Press.
Leonard, L. (2009). Cross-linguistic studies of child language disorders. In R. G. Schwartz (ed.), Handbook of Child Language Disorders. New York: Psychology Press.

Leonard, Laurence B. and Brown, Barbara L. (1984). Nature and boundaries of phonologic categories: A case study of an unusual phonologic pattern in a language-impaired child. Journal of Speech and Hearing Disorders, 49: 419–28.
Leonard, L. B., Dromi, E., Adam, G., and Zadunaisky-Ehrlich, S. (2000). Tense and finiteness in the speech of children with Specific Language Impairment acquiring Hebrew. International Journal of Language and Communication Disorders, 35(3): 319–35.
Leonini, C. (2002). Assessing the acquisition of Italian as an L1: A longitudinal study of a child of preschool age. Unpublished manuscript, University of Siena.
LeRoux, Cecile (1988). On the interface of morphology and syntax. Stellenbosch Papers in Linguistics 18. South Africa: University of Stellenbosch.
Leslie, Sarah-Jane (2008). Generics: Cognition and acquisition. Philosophical Review, 117(1): 1–47.
Leslie, Sarah-Jane, Khemlani, Sangeet, and Glucksberg, Sam (2011). Do all ducks lay eggs? The generic overgeneralization effect. Journal of Memory and Language, 65(1): 15–31.
Levelt, Clara C. (1989). An essay on child phonology. Unpublished Ph.D. thesis, Leiden University.
Levelt, Clara C. (1996). Consonant-vowel interactions in child language. In Barbara Bernhardt, John Gilbert, and David Ingram (eds), Proceedings of the UBC International Conference on Phonological Acquisition. Somerville, MA: Cascadilla Press, 229–39.
Levelt, Clara C. (2011). Consonant harmony in child language. In Marc van Oostendorp, Colin Ewen, Elizabeth Hume, and Keren Rice (eds), The Blackwell Companion to Phonology. Oxford: Wiley-Blackwell, 1691–1716.
Levelt, Clara C. (2012). Perception mirrors production in 14- and 18-month-olds: The case of coda consonants. Cognition, 123: 174–9.
Levelt, Clara C. and van de Vijver, Ruben (2004). Syllable types in cross-linguistic and developmental grammars. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition.
Cambridge: Cambridge University Press, 204–18. (Original work published 1998.)
Levelt, Willem J. and Wheeldon, Linda (1994). Do speakers have access to a mental syllabary? Cognition, 50: 239–69.
Levelt, Clara C., Schiller, Niels O., and Levelt, Willem J. (1999). A developmental grammar for syllable structure in the production of child language. Brain and Language, 68: 291–9.
Levelt, Clara C., Schiller, Niels O., and Levelt, Willem J. (2000). The acquisition of syllable types. Language Acquisition, 8: 237–64.
Levin, B. (1993). English Verb Classes and Alternations: A preliminary investigation. Chicago: University of Chicago Press.
Levin, B. and Rappaport Hovav, M. (1995). Unaccusativity: At the syntax–lexical semantics interface. Cambridge, MA: MIT Press.
Levin, B. and Rappaport Hovav, M. (2005). Argument Realization. Cambridge: Cambridge University Press.
Levinson, S. (2000). Presumptive Meanings. Cambridge, MA: MIT Press.
Levy, Y. and Vainikka, A. (1999/2000). The development of a mixed null subject system: A crosslinguistic perspective with data on the acquisition of Hebrew. Language Acquisition, 8: 363–84.
Li, Charles and Thompson, Sandra A. (1977). The acquisition of tone in Mandarin-speaking children. Journal of Child Language, 4: 185–99.

Li, Charles N. and Thompson, Sandra A. (1981). Mandarin Chinese: A Functional Reference Grammar. Berkeley: University of California Press.
Li, N. and Bartlett, C. W. (2012). Defining the genetic architecture of human developmental language impairment. Life Sciences, 90: 469–75.
Li, Ping and Bowerman, Melissa (1998). The acquisition of lexical and grammatical aspect in Chinese. First Language, 18: 311–50.
Li, Ping and Shirai, Yasuhiro (2000). The Acquisition of Lexical and Grammatical Aspect. Berlin: Mouton de Gruyter.
Li, X. (2009). Degreeless comparatives. Unpublished Ph.D. thesis, Rutgers, The State University of New Jersey.
Liberman, M. (1975). The intonational system of English. Unpublished Ph.D. thesis, MIT.
Liberman, M. and Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8: 249–336.
Liceras, J., Bel, A., and Perales, S. (2006). "Living with optionality": Root infinitives, bare forms and inflected forms in child null subject languages. In N. Sagarra and A. J. Toribio (eds), Selected Proceedings of the 9th Hispanic Linguistics Symposium. Somerville, MA: Cascadilla Press, 203–16.
Liceras, J. M., Fernández Fuertes, R., and Alba de la Fuente, A. (2012). Overt subjects and copula omission in the Spanish and the English grammar of English–Spanish bilinguals: On the locus and directionality of interlinguistic influence. First Language, 32: 88–115.
Lidz, J. (1997). When is a reflexive not a reflexive? Near reflexivity and Condition B. In K. Kusumoto (ed.), Proceedings of NELS 27. Amherst: GLSA, University of Massachusetts.
Lidz, J. and Conroy, A. (2007). Mechanisms of LF priming: Evidence from Kannada and English. Poster presented at the Boston University Conference on Language Development.
Lidz, J. and Musolino, J. (2002). Children's command of quantification. Cognition, 84: 113–54.
Lidz, J. and Musolino, J. (2006). On the quantificational status of indefinites: The view from child language.
Language Acquisition, 13: 73–102.
Lidz, J., Gleitman, H., and Gleitman, L. (2003a). Understanding how input matters: Verb-learning and the footprint of universal grammar. Cognition, 87: 151–78.
Lidz, J., Waxman, S., and Freedman, J. (2003b). What infants know about syntax but couldn't have learned: Experimental evidence for syntactic structure at 18 months. Cognition, 89: B65–B73.
Lidz, J., Conroy, A., Musolino, J., and Syrett, K. (2008). When revision is difficult and when it isn't. Paper presented at Generative Approaches to Language Acquisition, University of Connecticut.
Lieberman, Philip (1980). On the development of vowel production in young children. In Grace H. Yeni-Komshian, James F. Kavanagh, and Charles A. Ferguson (eds), Child Phonology. Volume 1: Production. New York: Academic Press, 113–42.
Lieven, E. V. M. and Behrens, H. (2012). Dense sampling. In E. Hoff (ed.), Research Methods in Child Language: A Practical Guide. Oxford: Blackwell, 226–39.
Lieven, E. and Tomasello, M. (2009). Children's first language acquisition from a usage-based perspective. In N. Ellis (ed.), Handbook of Cognitive Linguistics and Second Language Acquisition. Abingdon: Routledge.
Lieven, E. V. M., Pine, J. M., and Baldwin, G. (1997). Lexically-based learning and early grammatical development. Journal of Child Language, 24: 187–219.
Lightbown, P. M. (1977). Consistency and variation in the acquisition of French: A study of first and second language development. Unpublished Ph.D. thesis, Columbia University.
Lightfoot, D. (1982). The Language Lottery. Cambridge, MA: MIT Press.
Lightfoot, D. (1993). How to Set Parameters. Cambridge, MA: MIT Press.

Lightfoot, D. (1999). The Development of Language: Acquisition, Change, and Evolution. Malden, MA: Blackwell.
Lightfoot, D. (2006). How New Languages Emerge. Cambridge: Cambridge University Press.
Limbach, M. and Adone, D. (2010). Language acquisition of recursive possessives in English. In K. Franich, K. M. Iserman, and L. L. Keil (eds), Proceedings of the 34th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 281–90.
Limber, J. (1973). The genesis of complex sentences. In T. Moore (ed.), Cognitive Development and the Acquisition of Language. New York: Academic Press, 169–86.
Lin, J. (2009). Chinese comparatives and their implicational parameters. Natural Language Semantics, 17: 1–27.
Lintfert, B. and Möbius, B. (2010). Acquisition of syllabic prominence in German-speaking children. Speech Prosody 2010, 102008: 1–4.
Lisker, Leigh and Abramson, Arthur S. (1970). The voicing dimension: Some experiments in comparative phonetics. In Proceedings of the Sixth International Congress of Phonetic Sciences, 563–67.
Lleó, Conxita (1997). Filler syllables, proto-articles and early prosodic constraints in Spanish and German. In A. Sorace, C. Heycock, and R. Shillcock (eds), Language Acquisition: Knowledge, Representation and Processing. Proceedings of GALA 1997. Edinburgh: HCRC, 251–6.
Lleó, Conxita (2001). The interface of phonology and syntax: The emergence of the article in the early acquisition of Spanish and German. In J. Weissenborn and B. Höhle (eds), Approaches to Bootstrapping: Phonological, syntactic and neurophysiological aspects of early language acquisition. Amsterdam/Philadelphia: John Benjamins, 23–44.
Lleó, Conxita and Demuth, Katherine (1999). Prosodic constraints on the emergence of grammatical morphemes: Crosslinguistic evidence from Germanic and Romance languages. In A. Greenhill, H. Littlefield, and C.
Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 407–18.
Lleó, Conxita and Prinz, Michael (1996). Consonant clusters in child phonology and the directionality of syllable structure assignment. Journal of Child Language, 23: 31–56.
Lleó, Conxita and Prinz, Michael (1997). Syllable structure parameters and the acquisition of affricates. In S. J. Hannahs and Martha Young-Scholten (eds), Focus on Phonological Acquisition. Amsterdam: John Benjamins, 143–64.
Lo Duca, Maria G. (1990). Creatività e regole: Studio sull'acquisizione della morfologia derivativa dell'italiano. Bologna: Il Mulino.
Locke, John L. (1980a). The inference of speech perception in the phonologically disordered child, part I: A rationale, some criteria, the conventional tests. Journal of Speech and Hearing Disorders, 45: 431–44.
Locke, John L. (1980b). The inference of speech perception in the phonologically disordered child, part II: Some clinically novel procedures, their use, some findings. Journal of Speech and Hearing Disorders, 45: 445–68.
Locke, John L. (1993). The Child's Path to Spoken Language. Cambridge, MA: Harvard University Press.
Locke, John L. and Kutz, Kathryn J. (1975). Memory for speech and speech for memory. Journal of Speech and Hearing Research, 18: 176–91.
Lødrup, Helge (1991). The Norwegian pseudopassive in lexical theory. Working Papers in Scandinavian Syntax, 47: 118–29.

Loeb, D. F. and Leonard, L. B. (1988). Specific Language Impairment and parameter theory. Clinical Linguistics and Phonetics, 2(4): 317–27.
Loeb, Diane Frome and Leonard, Laurence B. (1991). Subject case marking and verb morphology in normally developing and specifically language-impaired children. Journal of Speech and Hearing Research, 34(2): 340–6.
Lohndal, T. (2010). Medial-wh phenomena, parallel movement and parameters. Linguistic Analysis, 34: 215–44.
Lohuis-Weber, H. and Zonneveld, W. (1996). Phonological acquisition and Dutch word prosody. Language Acquisition, 5: 245–83.
Lombardi, Linda (1999). Positional faithfulness and voicing assimilation in Optimality Theory. Natural Language and Linguistic Theory, 17: 267–302.
Lombardi, L. and Sarma, J. (1989). Against the bound variable hypothesis of the acquisition of Condition B. Paper presented at the annual meeting of the Linguistic Society of America, Washington, DC.
Lorusso, P., Caprin, C., and Guasti, M. T. (2005). Overt subject distribution in early Italian children. In A. Brugos, M. R. Clark-Cotton, and S. Ha (eds), A Supplement to the Proceedings of the 29th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Low, J. and Perner, J. (2012). Implicit and explicit theory of mind: State of the art. British Journal of Developmental Psychology, 30: 1–13.
Łubowicz, Ania (2002). Derived environment effects in Optimality Theory. Lingua, 112: 243–80.
Luck, S. J. and Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390: 279–81.
Ludlow, P. (1989). Implicit comparison classes. Linguistics and Philosophy, 12: 519–33.
Łukaszewicz, Beata (2006). Extrasyllabicity, transparency and prosodic constituency in the acquisition of Polish. Lingua, 116: 1–30.
Łukaszewicz, Beata (2007). Reduction in syllable onsets in the acquisition of Polish: Deletion, coalescence, metathesis and gemination.
Journal of Child Language, 34: 53–82.
Lukatela, G., Carello, C., and Turvey, M. T. (1987). Lexical representation of regular and irregular inflected nouns. Language and Cognitive Processes, 2: 1–17.
Lukyanenko, C., Conroy, A., and Lidz, J. (2010). Rudiments of Principle C in 30-month-olds. Unpublished manuscript.
Lust, B. C. (1994). Functional projection of CP and phrase structure parameterization: An argument for the Strong Continuity Hypothesis. In B. Lust, J. Whitman, and M. Suñer (eds), Syntactic Theory and First Language Acquisition: Cross-linguistic perspectives. Volume I: Heads, projections, and learnability. Hillsdale, NJ: Lawrence Erlbaum, 85–118.
Lust, B., Eisele, J., and Mazuka, R. (1992). The binding theory module: Evidence from first language acquisition for Principle C. Language, 68: 333–58.
Lyczkowsky, D. A. (1999). Adquiéretelo: On the acquisition of pronominal object clitics in Spanish. BA thesis, Harvard University.
McAllister Byun, Tara (2011). A gestural account of a child-specific neutralisation in strong position. Phonology, 28: 371–412.
McAllister Byun, Tara (2012). Positional velar fronting: An updated articulatory account. Journal of Child Language, 39: 1043–76.
McArthur, G. M. and Bishop, D. V. M. (2004). Which people with specific language impairment have auditory processing deficits? Cognitive Neuropsychology, 21(1): 79–94.

McArthur, G. M. and Bishop, D. V. M. (2005). Speech and non-speech processing in people with specific language impairment: A behavioural and electrophysiological study. Brain and Language, 13(1): 33–62.
McCarthy, D. A. (1972). McCarthy Scales of Children's Abilities. San Antonio, TX: The Psychological Corporation.
McCarthy, John J. (1998). Morpheme structure constraints and paradigm occultation. In M. Catherine Gruber, Derrick Higgins, Kenneth Olson, and Tamra Wysocki (eds), Proceedings of the Chicago Linguistic Society 5, Vol. II: The Panels. Chicago, IL: Chicago Linguistic Society, 123–50.
McCarthy, John J. (2002). A Thematic Guide to Optimality Theory. Cambridge: Cambridge University Press.
McCarthy, John J. (2005a). Optimal paradigms. In Laura Downing, Tracy Alan Hall, and Renate Raffelsiefen (eds), Paradigms in Phonological Theory. Oxford: Oxford University Press, 170–210.
McCarthy, John J. (2005b). Taking a free ride in morphophonemic learning. Catalan Journal of Linguistics, 4: 19–56.
McCarthy, John J. and Prince, Alan (1993). Prosodic Morphology I: Constraint Interaction and Satisfaction. Technical Report #3, Rutgers University Center for Cognitive Science.
McCarthy, John J. and Prince, Alan (1994). The emergence of the unmarked: Optimality in prosodic morphology. In M. González (ed.), Proceedings of the North East Linguistic Society 24. Amherst, MA: GLSA Publications, 333–79.
McCarthy, John J. and Prince, Alan (1995a). Faithfulness and reduplicative identity. In J. Beckman, S. Urbanczyk, and L. Walsh-Dickey (eds), University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory. Amherst, MA: GLSA Publications, 249–384.
McCarthy, John J. and Prince, Alan S. (1995b). Prosodic morphology. In John Goldsmith (ed.), The Handbook of Phonological Theory. Oxford: Blackwell, 318–66.
McClelland, J. (1998). Connectionist models and Bayesian inference. In M. Oaksford and N.
Chater (eds), Rational Models of Cognition. Oxford: Oxford University Press, 21–53.
McClelland, J., Botvinick, M., Noelle, D., Plaut, D., Rogers, T., Seidenberg, M., and Smith, L. (2010). Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14: 348–56.
McCloskey, J. (2000). Quantifier float and wh-movement in an Irish English. Linguistic Inquiry, 31: 57–84.
McCloskey, J. and Hale, K. (1984). On the syntax of person–number inflection in Modern Irish. Natural Language and Linguistic Theory, 1: 487–533.
McConnell-Ginet, S. (1973). Comparative constructions in English: A syntactic and semantic analysis. Unpublished Ph.D. thesis, University of Rochester.
McConnell-Ginet, S. (1979). On the deep (and surface) adjective good. In L. Waugh and F. van Coetsem (eds), Contributions to Grammatical Studies: Semantics and syntax. Cornell Linguistics Contributions II. Leiden: E. J. Brill, 132–50.
McDaniel, D. (1986). Conditions on wh-chains. Unpublished Ph.D. thesis, City University of New York.
McDaniel, D. (1989). Partial and multiple wh-movement. Natural Language and Linguistic Theory, 7: 565–604.
McDaniel, D. and Cairns, H. S. (1996). Eliciting judgments of grammaticality and reference. In D. McDaniel, C. McKee, and H. S. Cairns (eds), Methods for Assessing Children's Syntax. Cambridge, MA: MIT Press, 233–54.

McDaniel, D. and Maxfield, T. (1992). Principle B and contrastive stress. Language Acquisition, 2: 337–58.
McDaniel, D., Cairns, H. S., and Hsu, J. R. (1990). Control principles in the grammars of young children. Language Acquisition, 1: 297–335.
McDaniel, D., Chiu, B., and Maxfield, T. (1995). Parameters for wh-movement types. Natural Language and Linguistic Theory, 13: 709–53.
McDaniel, Dana, McKee, Cecile, and Bernstein, Judy B. (1998). How children's relatives solve a problem for minimalism. Language, 74: 308–34.
MacDonald, Jonathan (2008). Domain of aspectual interpretation. Linguistic Inquiry, 39(1): 128–47.
McDonough, Joyce and Myers, Scott (1991). Consonant harmony and planar segregation in child language. Unpublished manuscript.
McDuffie, A. and Abbeduto, L. (2009). Developmental delay and genetic syndromes: Down syndrome, fragile X syndrome, and Williams syndrome. In R. Schwartz (ed.), Handbook of Child Language Disorders. New York: Psychology Press.
McGregor, Karla K. and Schwartz, Richard G. (1992). Converging evidence for underlying phonological representations in a child who misarticulates. Journal of Speech and Hearing Research, 35: 596–603.
Machida, Nanako, Miyagawa, Shigeru, and Wexler, Kenneth (2004). A-chain maturation reexamined: Why Japanese children perform better on "full" unaccusatives than on passives. In Aniko Csirmaz, Andrea Gualmini, and Andrew Nevins (eds), Plato's Problems: Papers in language acquisition, vol. 48. Cambridge, MA: MITWPL.
MacKay, David (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.
McKee, C. (1992). A comparison of pronouns and anaphors in Italian and English acquisition. Language Acquisition, 2: 21–54.
Macken, Marlys (1978). Permitted complexity in phonological development: One child's acquisition of Spanish consonants. Lingua, 44: 219–53.
Macken, Marlys (1979).
Developmental reorganization of phonology: A hierarchy of basic units of acquisition. Lingua, 49: 11–49.
Macken, Marlys (1980). The child's lexical representation: The "puzzle-puddle-pickle" evidence. Journal of Linguistics, 16: 1–17.
Macken, Marlys and Barton, David (1980a). The acquisition of the voicing contrast in English: A study of voice onset time in word-initial stop consonants. Journal of Child Language, 7: 41–74.
Macken, Marlys and Barton, David (1980b). A longitudinal study of the acquisition of the voicing contrast in American-English word-initial stops, as measured by voice onset time. Journal of Child Language, 7: 41–74.
McKeown, R., Zukowski, A., and Larsen, J. (2008). Adolescents with Williams syndrome know the locality requirement for reflexives. Unpublished manuscript, University of Maryland.
McMurray, Bob and Aslin, Richard N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition, 95: B15–B26.
McMurray, Bob, Aslin, Richard N., and Toscano, Joseph C. (2009). Statistical learning of phonetic categories: Insights from a computational approach. Developmental Science, 12: 369–78.
Macnamara, J. (1982). Names for Things: A Study of Child Language. Cambridge, MA: Bradford Books/MIT Press.

McQueen, J. M., Norris, D., and Cutler, A. (1994). Competition in spoken word recognition: Spotting words in other words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20: 621–38.
McQueen, J. M., Cutler, A., Briscoe, T., and Norris, D. (1995). Models of continuous speech recognition and the contents of the vocabulary. Language and Cognitive Processes, 10: 309–31.
McReynolds, Leija V. and Kearns, Kevin P. (1983). Single-subject Experimental Designs in Communicative Disorders. Baltimore, MD: University Park Press.
McReynolds, Leija V., Kohn, Joan, and Williams, Gail C. (1975). Articulatory-defective children's discrimination of their production errors. Journal of Speech and Hearing Disorders, 40: 327–38.
MacWhinney, Brian (1974). How Hungarian children learn to speak. Unpublished Ph.D. thesis, University of California.
MacWhinney, Brian (1975). Rules, rote, and analogy in morphological formations by Hungarian children. Journal of Child Language, 2: 65–77.
MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society for Research in Child Development, 43(1–2) (serial no. 174).
MacWhinney, B. (1982). Basic syntactic processes. In S. Kuczaj (ed.), Language Acquisition: Vol. 1. Syntax and semantics. Hillsdale, NJ: Lawrence Erlbaum, 73–136.
MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk (Vol. 1), 3rd edn. Mahwah, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. and Bates, E. (1978). Sentential devices for conveying givenness and newness: A cross-cultural developmental study. Journal of Verbal Learning and Verbal Behavior, 17: 539–58.
MacWhinney, Brian and Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40: 121–57.
MacWhinney, B. and Snow, C. (1985). The child language data exchange system. Journal of Child Language, 12: 271–96.
MacWhinney, B. and Snow, C. (1990).
The child language data exchange system: An update. Journal of Child Language, 17: 457–72.
Maddieson, Ian (1984). Patterns of Sounds. Cambridge: Cambridge University Press.
Magri, Giorgio (2009). A theory of individual-level predicates based on blind mandatory implicatures: Constraint promotion for optimality theory. Unpublished Ph.D. thesis, MIT.
Magri, Giorgio (2012). HG has no computational advantages over OT. In J. Choi, E. A. Hogue, J. Punske, D. Tat, J. Schertz, and A. Trueman (eds), WCCFL 29: Proceedings of the 29th West Coast Conference on Formal Linguistics. Somerville, MA, 380–8.
Mairal, Ricardo and Gil, Juana (eds) (2006). Linguistic Universals. Cambridge: Cambridge University Press.
Major, Roy C. (1976). Phonological differentiation of a bilingual child. Unpublished Ph.D. thesis, Ohio State University.
Maling, Joan and Zaenen, Annie (1990). Preposition-stranding and passive. In Joan Maling and Annie Zaenen (eds), Syntax and Semantics 24: Modern Icelandic Syntax. San Diego, CA: Academic Press, 153–64.
Mańczak, Witold (1957–58). Tendances générales des changements analogiques. Lingua, 7: 298–325, 387–420.
Mandel, D. R., Jusczyk, P. W., and Kemler Nelson, D. G. (1994). Does sentential prosody help infants to organize and remember speech information? Cognition, 53: 155–80.

Mandel, D. R., Kemler Nelson, D. G., and Jusczyk, P. W. (1996). Infants remember the order of words in a spoken sentence. Cognitive Development, 11: 181–96.
Manetti, C. (2012). The acquisition of passives in Italian: Evidence from comprehension, production and syntactic priming experiments. Unpublished Ph.D. thesis, University of Siena.
Manning, Christopher and Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Manzini, R. and Wexler, K. (1987). Parameters, binding theory, and learnability. Linguistic Inquiry, 18: 413–44.
Marantz, A. (1984). On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Marantz, A. (1996). "Cat" as a phrasal idiom: Consequences of late insertion in Distributed Morphology. Unpublished manuscript, MIT.
Marantz, A. (1997). No escape from syntax: Don't try morphological analysis in the privacy of your own lexicon. Penn Working Papers in Linguistics, 4: 201–25.
Maratsos, Michael (1974a). Children who get worse at understanding the passive: A replication of Bever. Journal of Psycholinguistic Research, 3: 65–74.
Maratsos, M. (1974b). How preschool children understand missing complement subjects. Child Development, 45: 700–6.
Maratsos, Michael (2000). More overregularizations after all: New data and discussion on Marcus, Pinker, Ullman, Hollander, Rosen and Xu. Journal of Child Language, 27: 183–212.
Maratsos, M. and Abramovitch, R. (1975). How children understand full, truncated and anomalous passives. Journal of Verbal Learning and Verbal Behavior, 14: 145–57.
Maratsos, Michael, Kuczaj, Stan, Fox, D. E. C., and Chalkley, M. (1979). Some empirical studies in the acquisition of transformational relations: Passives, negatives, and the past tense. In W. A. Collins (ed.), Minnesota Symposium on Child Psychology. Hillsdale, NJ: Lawrence Erlbaum.
Maratsos, M., Fox, D., Becker, J., and Chalkley, M. (1985). Semantic restrictions on children's passives. Cognition, 19: 167–91.
Maratsos, M. P., Gudeman, R., Gerard-Ngo, P., and DeHart, G. (1987). A study in novel word learning: The productivity of the causative. In B. MacWhinney (ed.), Mechanisms of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum, 89–113.
Marchman, V., Bates, E., Burkardt, A., and Good, A. (1991). Functional constraints on the acquisition of the passive: Toward a model of the competence to perform. First Language, 11: 65–92.
Marcus, G. (1993). Negative evidence in language acquisition. Cognition, 46(1): 53–85.
Marcus, G. F. (2004). The Birth of the Mind. London: Basic Books.
Marcus, G. F. and Rabagliati, H. (2006). Genes and domain specificity. Trends in Cognitive Sciences, 10(9): 397–8.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., and Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4), Serial No. 228.
Marcus, G. F., Vijayan, S., Bandi Rao, S., and Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283: 77–80.
Margócsy, D. (2000). Pronouns in Hungarian. Unpublished manuscript, Utrecht University College, Utrecht, The Netherlands.
Marinis, T. (2000). The acquisition of the DP in Modern Greek. Unpublished Ph.D. thesis, University of Potsdam.

Marinis, T. (2002). Acquiring the possessive construction in Modern Greek. In I. Lasser (ed.), The Process of Language Acquisition. Frankfurt/Berlin: Peter Lang, 57–80.
Marinis, T. (2003). The Acquisition of the DP in Modern Greek, vol. 31. Amsterdam: John Benjamins.
Markman, E. M. (1989). Categorization and Naming in Children: Problems of induction. Cambridge, MA: MIT Press.
Markman, E. M. and Wachtel, G. F. (1988). Children’s use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20: 121–57.
Markman, E. M., Wasow, J. L., and Hansen, M. B. (2003). Use of the mutual exclusivity assumption by young word learners. Cognitive Psychology, 47: 241–75.
Markson, L. and Bloom, P. (1997). Evidence against a dedicated system for word learning in children. Nature, 385: 813–15.
Marr, D. (1982). Vision. San Francisco: W.H. Freeman.
Marsden, H. (2004). Quantifier scope in non-native Japanese: A comparative study of Chinese, English, and Korean-speaking learners. Unpublished Ph.D. thesis, University of Durham.
Marthi, B., Pasula, H., Russell, S., and Peres, Y. (2002). Decayed MCMC filtering. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI), 319–26.
Martin, A., Peperkamp, S., and Dupoux, E. (2010). Learning phonemes with a pseudo-lexicon. Paper presented at the Workshop on Computational Modelling of Sound Pattern Acquisition, Edmonton, Canada, February 13–14.
Martin, Andrea E., and McElree, Brian (2008). A content-addressable pointer mechanism underlies comprehension of verb-phrase ellipsis. Journal of Memory and Language, 58: 879–906.
Mascaró, Joan (1976). Catalan phonology and the phonological cycle. Unpublished Ph.D. thesis, MIT.
Maslen, R. J. C., Theakston, A. L., Lieven, E. V. M., and Tomasello, M. (2004). A dense corpus study of past tense and plural overregularization in English. Journal of Speech, Language, and Hearing Research, 47: 1319–33.
Matheny, A. P. and Bruggemann, C. E. (1973). Children’s speech: Hereditary components and sex differences. Folia Phoniatrica, 25: 442–9.
Mather, P. L. and Black, K. N. (1984). Hereditary and environmental influences on preschool twins’ language skills. Developmental Psychology, 20: 303–8.
Matsumoto, Y. (1995). The conversational condition on Horn scales. Linguistics and Philosophy, 18: 21–60.
Matsuo, Ayumi (2009). Young children’s understanding of ongoing vs. completion in imperfective and perfective participle. Linguistics, 47(3): 743–57.
Matsuoka, Kazumi (1998). The acquisition of Japanese case particles and the theory of case checking. Unpublished Ph.D. thesis, University of Connecticut.
Matthews, D., Lieven, E., Theakston, A., and Tomasello, M. (2005). The role of frequency in the acquisition of English word order. Cognitive Development, 20: 121–36.
Matthews, D., Lieven, E., Theakston, A., and Tomasello, M. (2007). French children’s use and correction of weird word orders: A constructivist account. Journal of Child Language, 32: 381–409.
Mattock, K. and Burnham, D. (2006). Chinese and English infants’ tone perception: Evidence for perceptual reorganization. Infancy, 10: 241–65.
Mattock, K., Molnar, M., Polka, L., and Burnham, D. (2008). The developmental course of lexical tone perception in the first year of life. Cognition, 106: 1367–81.

Mattys, S. and Jusczyk, P. W. (2001a). Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78: 91–121.
Mattys, S., and Jusczyk, P. W. (2001b). Do infants segment words or recurring contiguous patterns? Journal of Experimental Psychology: Human perception and performance, 27: 644–55.
Mattys, S., Jusczyk, P. W., Luce, P. A., and Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38: 465–94.
Mattys, S. L., White, L., and Melhorn, J. F. (2005). Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General, 134: 477–500.
Matushansky, O. and Wexler, K. (2002). Again on the subject of English null subjects: Discourse and syntax. Proceedings of GALA 2001, 164–71.
May, R. (1977). The grammar of quantification. Unpublished Ph.D. thesis, MIT.
Maye, J. and Weiss, D. (2003). Statistical cues facilitate infants’ discrimination of difficult phonetic contrasts. Proceedings of the 27th Annual Boston University Conference on Language Development.
Maye, J., Werker, J., and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82: B101–B111.
Mazuka, R., Lust, B., Wakayama, T., and Snyder, W. (1986). Distinguishing effects of parameters in early syntax acquisition: A cross-linguistic study of Japanese and English. In Papers and Reports on Child Language Development. Palo Alto, CA: Stanford University.
Mazurkewich, I. and White, L. (1984). The acquisition of the dative alternation: Unlearning overgeneralizations. Cognition, 16: 261–83.
Meaburn, E., Dale, P. S., Craig, I. W., and Plomin, R. (2002). Language-impaired children: No sign of the FOXP2 mutation. Cognitive Neuroscience and Neuropsychology, 13: 1075–77.
Meck, W. H. and Church, R. M. (1983). A mode control model of counting and timing processes. Journal of Experimental Psychology: Animal Behavior Processes, 9(3): 320–34.
Medina, T. N. (2007). Learning which verbs allow object omission: Verb semantic selectivity and the implicit object construction. Unpublished Ph.D. thesis, Johns Hopkins University.
Medina, T., Snedeker, J., Trueswell, J., and Gleitman, L. (2011). How words can and cannot be learned by observation. Proceedings of the National Academy of Sciences, 108: 9014–19.
Mehler, Jacques (1963). Some effects of grammatical transformations on the recall of English sentences. Journal of Verbal Learning and Verbal Behavior, 2: 346–51.
Mehler, Jacques, Dommergues, Jean Y., Frauenfelder, Uli, and Segui, Juan (1981). The syllable’s role in speech segmentation. Journal of Verbal Learning and Verbal Behavior, 20: 298–305.
Mehler, J., Jusczyk, P. W., Lambertz, G., Halsted, N., Bertoncini, J., and Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29: 143–78.
Mehler, J., Dupoux, E., Nazzi, T., and Dehaene-Lambertz, G. (1996). Coping with linguistic diversity: The infant’s viewpoint. In J. L. Morgan and K. Demuth (eds), Signal to Syntax. Mahwah, NJ: Lawrence Erlbaum Associates, 101–116.
Mehler, J., Peña, M., Nespor, M., and Bonatti, L. (2006). The soul of language does not use statistics: Reflections on vowels and consonants. Cortex, 42(6): 846–54.
Meier, C. (2003). The meaning of too, enough and so … that. Natural Language Semantics, 11: 69–107.
Meisel, J. M. (1990). Inflection: Subjects and subject-verb agreement. In J. M. Meisel (ed.), Two First Languages: Early grammatical development in bilingual children. Dordrecht: Foris, 237–300.

Meisel, J. M. (1992). The Acquisition of Verb Placement: Functional categories and V2 phenomena in language acquisition. Norwell, MA: Kluwer.
Menn, Lise (1971). Phonotactic rules in beginning speech. Lingua, 26: 225–51.
Menn, Lise (1978). Phonological units in beginning speech. In Alan Bell and Joan B. Hooper (eds), Syllables and Segments. Amsterdam: North Holland, 315–34.
Menn, Lise (1980). Phonological theory and child phonology. In Grace H. Yeni-Komshian, James F. Kavanagh, and Charles A. Ferguson (eds), Child Phonology: Volume 1: Production. New York: Academic Press, 1: 23–42.
Menn, Lise (1992). Building our own models: Developmental phonology comes of age. In Charles A. Ferguson, Lise Menn, and Carol Stoel-Gammon (eds), Phonological Development: Models, research, implications. Timonium, MD: York Press, 3–15.
Menn, Lise (2004). Saving the baby: Making sure that old data survive new theories. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 54–72.
Menn, Lise and Matthei, Edward H. (1992). The “two-lexicon” model of child phonology: Looking back, looking ahead. In Charles A. Ferguson, Lise Menn, and Carol Stoel-Gammon (eds), Phonological Development: Models, research, implications. Timonium, MD: York Press.
Menyuk, P. (1963). A preliminary evaluation of grammatical capacity in children. Journal of Verbal Learning and Verbal Behavior, 2: 429–39.
Menyuk, Paula (1971). Clusters as single underlying consonants: Evidence from children’s productions. Paper presented at the International Congress of Phonetic Sciences, Montreal, August.
Menyuk, Paula (1972). Clusters as single underlying consonants: Evidence from children’s production. In A. Rigault and R. Charbonneau (eds), Proceedings of the Seventh International Congress of Phonetic Sciences. The Hague: Mouton, 1161–5.
Menyuk, Paula and Anderson, Suzan (1969). Children’s identification and reproduction of /w/, /r/ and /l/. Journal of Speech and Hearing Disorders, 12: 39–52.
Merchant, Jason (2002). Swiping in Germanic. In C. Jan-Wouter Zwart and Werner Abraham (eds), Studies in Comparative Germanic Syntax. Amsterdam: John Benjamins, 295–321.
Merchant, J. (2004). Fragments and ellipsis. Linguistics and Philosophy, 27: 661–738.
Merchant, J. (2009). Phrasal and clausal comparatives in Greek and the abstractness of syntax. Journal of Greek Linguistics, 9: 134–64.
Merchant, Nazarré (2008). Discovering underlying forms: Contrast pairs and ranking. Unpublished Ph.D. thesis, Rutgers University.
Merchant, Nazarré and Tesar, Bruce (2008). Learning underlying forms by searching restricted lexical subspaces. In Proceedings of the Forty-First Conference of the Chicago Linguistic Society. Chicago Linguistics Society, 33–47.
Mersad, K. and Nazzi, T. (2011). Transitional probabilities and phonotactics in a hierarchical model of speech segmentation. Memory and Cognition, 39: 1085–93.
Mersad, K. and Nazzi, T. (2012). When Mommy comes to the rescue of statistics: Infants combine top-down and bottom-up cues to segment speech. Language Learning and Development, 8: 303–15.
Mervis, C., Robinson, B. F., Rowe, M. L., Becerra, A. M., and Klein-Tasman, B. P. (2003). Language abilities of individuals with Williams syndrome. International Review of Research in Mental Retardation, 27: 35–81.
Messenger, K. (2009). Syntactic priming and children’s production and representation of the passive. Unpublished Ph.D. thesis, University of Edinburgh.

Messenger, K., Yuan, S., and Fisher, C. (n.d.). Syntax and selection: Learning combinatorial properties of verbs from listening. Unpublished manuscript, University of Illinois.
Messenger, K., Branigan, H. P., and McLean, J. F. (2011). Evidence for (shared) abstract structure underlying children’s short and full passives. Cognition, 121: 268–74.
Messer, Stanley (1967). Implicit phonology in children. Journal of Verbal Learning and Verbal Behaviour, 6: 609–13.
Michaelis, L. A., and Ruppenhofer, J. (2001). Beyond Alternations: A constructional model of the German applicative pattern. Stanford: CSLI Publications.
Miller, Joanne L. and Eimas, Peter D. (1979). Organization in infant speech perception. Canadian Journal of Psychology, 33: 353–67.
Miller, Karen and Schmitt, Christina (2010). Effects of variable input in the acquisition of plural in two dialects of Spanish. Lingua, 120: 1178–93.
Millotte, S., Frauenfelder, U. H., and Christophe, A. (2007a). Phrasal prosody constrains lexical access. Paper presented at the Architectures and Mechanisms for Language Processing conference.
Millotte, S., Wales, R., and Christophe, A. (2007b). Phrasal prosody disambiguates syntax. Language and Cognitive Processes, 22: 898–909.
Millotte, S., René, A., Wales, R., and Christophe, A. (2008). Phonological phrase boundaries constrain on-line syntactic analysis. Journal of Experimental Psychology: Learning, memory, and cognition, 34: 874–85.
Millotte, S., Morgan, J., Margules, S., Bernal, S., Dutat, M., and Christophe, A. (2010). Phrasal prosody constrains word segmentation in French 16-month-olds. Journal of Portuguese Linguistics, 9: 67–86.
Mills, A. E. (1985). The acquisition of German. In D. Slobin (ed.), The Cross-linguistic Study of Language Acquisition. Hillsdale, NJ: Erlbaum, 141–254.
Mills, D. L., Coffey-Corina, S. A., and Neville, H. J. (1993). Language acquisition and cerebral specialization in 20-month-old infants. Journal of Cognitive Neuroscience, 5: 317–34.
Minai, Utako (2000). The acquisition of Japanese passives. In Proceedings of JK 9.
Mintz, T. (2002). Category induction from distributional cues in an artificial language. Memory and Cognition, 30: 678–86.
Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90: 91–117.
Mintz, T. (2006). Finding the verbs: Distributional cues to categories available to young learners. In K. Hirsh-Pasek and R. M. Golinkoff (eds), Action Meets Word: How children learn verbs. New York: Oxford University Press, 31–63.
Mittler, P. (1969). Genetic aspects of psycholinguistic abilities. Journal of Child Psychology and Psychiatry, 10: 165–76.
Miyagawa, S. (1989). Structure and case marking in Japanese. In Syntax and Semantics, vol. 22. San Diego: Academic Press.
Miyahara, K. (1974). The acquisition of the Japanese particles. Journal of Child Language, 1: 283–6.
Miyamoto, E., Wexler, K., Aikawa, T., and Miyagawa, S. (1999). Case dropping and unaccusatives in Japanese acquisition. In A. Greenhill, H. Littlefield, and C. Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 443–52.
Mohanan, K. P. (1986). The Theory of Lexical Phonology. Dordrecht: Reidel.
Monaco, A. P. and The SLI Consortium (2007). Multivariate linkage analysis of Specific Language Impairment (SLI). Annals of Human Genetics, 71: 660–73.

Monnin, Lorraine M. and Huntington, Dorothy A. (1974). Relationship of articulatory deficits to speech-sounds identification. Journal of Speech and Hearing Research, 17: 352–66.
Montgomery, C. R. and Clarkson, M. G. (1997). Infants’ pitch perception: Masking by low- and high-frequency noises. Journal of the Acoustical Society of America, 102: 3665–72.
Moon, C., Panneton-Cooper, R., and Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant Behavior and Development, 16: 495–500.
Moore, C., Bryant, D., and Furrow, D. (1989). Mental terms and the development of certainty. Child Development, 60: 167–71.
Moore, C., Pure, K., and Furrow, D. (1990). Children’s understanding of the modal expression of speaker certainty and uncertainty and its relation to the development of a representational theory of mind. Child Development, 61: 722–30.
Moore, D. (1999). Comparatives and superlatives: Lexical before functional. In A. Greenhill, H. Littlefield, and C. Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 474–81.
Moreton, E. (2008). Analytic bias and phonological typology. Phonology, 25: 83–127.
Moreton, Elliott and Pater, Joe (2011). Learning artificial phonology: A review. Unpublished manuscript.
Morgan, J. and Demuth, K. (eds) (1996a). Signal to Syntax: Bootstrapping from speech to grammar in early acquisition. Mahwah, NJ: Lawrence Erlbaum.
Morgan, J. L. and Demuth, K. (1996b). Signal to Syntax: An overview. In J. L. Morgan and K. Demuth (eds), Signal to Syntax: Bootstrapping from speech to grammar in early acquisition. Mahwah, NJ: Lawrence Erlbaum Associates, 1–22.
Morgan, J. L. and Saffran, J. R. (1995). Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Development, 66: 911–36.
Morgan, James, and Travis, Lisa (1989). Limits on negative information in language input. Journal of Child Language, 16: 531–52.
Morrisette, Michele L., Dinnsen, Daniel A., and Gierut, Judith A. (2003). Markedness and context effects in the acquisition of place features. Canadian Journal of Linguistics, 48: 329–55.
Morse, P. A. (1972). The discrimination of speech and nonspeech stimuli in early infancy. Journal of Experimental Child Psychology, 14: 477–92.
Moskowitz, Arlene J. (1973). The acquisition of phonology and syntax: A preliminary study. In J. Hintikka et al. (eds), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics. Dordrecht: Reidel, 48–84.
Mostowski, A. (1957). On a generalization of quantifiers. Fundamenta Mathematicae, 44: 12–36.
Muggleton, Stephen (1990). Inductive Acquisition of Expert Knowledge. Boston, MA: Addison-Wesley.
Mugitani, Ryoko, Fais, Laurel, Kajikawa, Sachiyo, Werker, Janet F., and Amano, Shigeaki (2007). Age-related changes in sensitivity to native phonotactics in Japanese infants. Journal of the Acoustical Society of America, 122: 1332–5.
Müller, A., Höhle, B., Schmitz, M., and Weissenborn, J. (2006). Focus-to-stress alignment in 4- to 5-year-old German-learning children. In Proceedings of GALA, 393–407.
Munn, Alan, Zhang, Xiaofei, and Schmitt, Cristina (2009). The acquisition of plurality in a language without plurality. In José Brucart, Anna Gavarró, and Jaume Solà (eds), Merging Features: Computation, interpretation and acquisition. Oxford: Oxford University Press.
Munson, Benjamin, Edwards, Jan, and Beckman, Mary E. (2005). Phonological knowledge in typical and atypical speech-sound development. Topics in Language Disorders, 25: 190–206.

Munson, Benjamin (2001). Phonological pattern frequency and speech production in adults and children. Journal of Speech, Language, and Hearing Research, 44: 778–92.
Munson, Benjamin, Edwards, Jan, and Beckman, Mary E. (2005a). Relationships between nonword repetition accuracy and other measures of linguistic development in children with phonological disorders. Journal of Speech, Language, and Hearing Research, 48: 61–78.
Munson, Benjamin, Kurtz, Beth A., and Windsor, Jennifer (2005b). The influence of vocabulary size, phonotactic probability, and wordlikeness on nonword repetitions of children with and without specific language impairment. Journal of Speech, Language, and Hearing Research, 48: 1033–47.
Munson, Benjamin, Edwards, Jan, Schellinger, Sarah K., Beckman, Mary E., and Meyer, Marie K. (2010). Deconstructing phonetic transcription: Covert contrast, perceptual bias, and an extraterrestrial view of Vox Humana. Clinical Linguistics and Phonetics, 24: 245–60.
Munson, Benjamin, Edwards, Jan, and Beckman, Mary E. (2012). Phonological representations in language acquisition: Climbing the ladder of abstraction. In Abigail C. Cohn, Cécile Fougeron, and Marie K. Huffman (eds), The Oxford Handbook of Laboratory Phonology. Oxford: Oxford University Press, 288–309.
Murasugi, K. and Fuji, C. (2009). Root infinitives in Japanese and the late acquisition of head-movement. In J. Chandlee, M. Franchini, S. Lord, and M. Rheiner (eds), A Supplement to the Proceedings of the 33rd Boston University Conference on Language Development.
Murasugi, Keiko and Watanabe, Eriko (2009). Case errors in child Japanese and the implication for the syntactic theory. In Jean Crawford, Koichi Otaki, and Masahiko Takahashi (eds), Proceedings of the 3rd Conference on Generative Approaches to Language Acquisition North America (GALANA 2008). Somerville, MA: Cascadilla Proceedings Project, 153–64.
Musolino, J. (1998). Universal Grammar and the acquisition of semantic knowledge: An experimental investigation of quantifier-negation interactions in English. Unpublished Ph.D. thesis, University of Maryland.
Musolino, J. (2004). The semantics and acquisition of number words: Integrating linguistic and developmental perspectives. Cognition, 93: 1–41.
Musolino, J. (2009). The logical syntax of number words: Theory, acquisition and processing. Cognition, 111: 24–45.
Musolino, J. and Gualmini, A. (2004). The role of partitivity in child language. Language Acquisition, 12(1): 97–107.
Musolino, J. and Landau, B. (2010). When theories don’t compete: Response to Thomas, Karaminis, and Knowland’s commentary on Musolino, Chunyo, and Landau. Language Learning and Development, 6: 170–8.
Musolino, J. and Lidz, J. (2003). The scope of isomorphism: Turning adults into children. Language Acquisition, 11(4): 277–91.
Musolino, J. and Lidz, J. (2006). Why children aren’t universally successful with quantification. Linguistics, 44(4): 817–52.
Musolino, J., Crain, S., and Thornton, R. (2000). Navigating negative quantificational space. Linguistics, 38(1): 1–32.
Musolino, J., Chunyo, G., and Landau, B. (2010). Uncovering knowledge of core syntactic and semantic principles in individuals with Williams syndrome. Language Learning and Development, 6: 126–61.
Myerson, R. F. (1976). Children’s knowledge of selected aspects of “Sound Pattern of English.” In R. N. Campbell and P. T. Smith (eds), Recent Advances in the Psychology of Language. New York: Plenum Press, 377–420.

Näätänen, R., Gaillard, A. W. K., and Mäntysalo, S. (1978). Early selective-attention effect on evoked potential reinterpreted. Acta Psychologica, 42: 313–29.
Nadig, A., and Sedivy, J. (2002). Evidence of perspective-taking constraints in children’s on-line reference resolution. Psychological Science, 13: 329–36.
Nadig, A., Sedivy, J., Joshi, A., and Bortfeld, H. (2003). The development of discourse constraints on the interpretation of adjectives. In B. Beachley, A. Brown, and F. Conlin (eds), Proceedings of the 27th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 568–79.
Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17: 357–74.
Naigles, Letitia (1996). The use of multiple frames in verb learning via syntactic bootstrapping. Cognition, 58: 221–51.
Naigles, L. and Kako, E. T. (1993). First contact in verb acquisition: Defining a role for syntax. Child Development, 64: 1665–87.
Naigles, L., and Lehrer, N. (2002). Language-general and language-specific influences on children’s acquisition of argument structure: A comparison of French and English. Journal of Child Language, 29: 545–66.
Naigles, L., Gleitman, H., and Gleitman, L. (1993). Syntactic bootstrapping and verb acquisition. In E. Dromi (ed.), Language and Cognition: A developmental perspective. Norwood, NJ: Ablex, 104–40.
Naigles, Letitia, Guerrera, Katelyn, Petroj, Vanessa, Riqueros Morante, José, Lillo-Martin, Diane, and Snyder, William (2013). The Compounding Parameter: New evidence from IPL. Paper presented at the Boston University Conference on Language Development, 1 November 2013.
Namiki, Takayasu (1994). Subheads of compounds. In Shuji Chiba (ed.), Synchronic and Diachronic Approaches to Language: A Festschrift for Toshio Nakao on the occasion of his sixtieth birthday. Tokyo: Liber Press, 269–85.
Namy, L. and Gentner, D. (2002). Making a silk purse out of two sow’s ears: Young children’s use of comparison in category learning. Journal of Experimental Psychology: General, 131: 5–15.
Napoli, D. J. (1982). Initial material deletion in English. Glossa: An international journal of linguistics, 16: 85–111.
Narasimhan, B., Budwig, N., and Murty, L. (2005). Argument realization in Hindi caregiver-child discourse. Journal of Pragmatics, 37: 461–95.
Narayan, Chandan R. (2008). The acoustic-perceptual salience of nasal place contrasts. Journal of Phonetics, 36: 191–217.
Narayan, Chandan R., Werker, Janet F., and Beddor, Patrice S. (2010). The interaction between acoustic salience and language experience in developmental speech perception: Evidence from nasal place discrimination. Developmental Science, 13: 407–20.
Nazzi, T. (2005). Use of phonetic specificity during the acquisition of new words: Differences between consonants and vowels. Cognition, 98: 13–30.
Nazzi, T., and Bertoncini, J. (2003). Before and after the vocabulary spurt: Two modes of word acquisition? Developmental Science, 6: 136–42.
Nazzi, Thierry and Bertoncini, Josiane (2009). Phonetic specificity in early lexical acquisition: New evidence from consonants in coda positions. Language and Speech, 52: 463–80.
Nazzi, T., Bertoncini, J., and Mehler, J. (1998a). Language discrimination by newborns: Towards an understanding of the role of rhythm. Journal of Experimental Psychology: Human perception and performance, 24: 756–66.

Nazzi, T., Floccia, C., and Bertoncini, J. (1998b). Discrimination of pitch contours by neonates. Infant Behavior and Development, 21: 779–84.
Nazzi, T., Jusczyk, P. W., and Johnson, E. K. (2000a). Language discrimination by English learning 5-month-olds: Effects of rhythm and familiarity. Journal of Memory and Language, 43: 1–19.
Nazzi, T., Kemler Nelson, D. G., Jusczyk, P. W., and Jusczyk, A. M. (2000b). Six-month-olds’ detection of clauses in continuous speech: Effects of prosodic well-formedness. Infancy, 1: 123–47.
Nazzi, T., Dilley, L. C., Jusczyk, A. M., Shattuck-Hufnagel, S., and Jusczyk, P. W. (2005). English-learning infants’ segmentation of verbs from fluent speech. Language and Speech, 48: 279–98.
Nazzi, T., Iakimova, I., Bertoncini, J., Frédonie, S., and Alcantara, C. (2006). Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences. Journal of Memory and Language, 54: 283–99.
Nazzi, T., Bertoncini, J., and Bijeljac-Babic, R. (2009). A perceptual equivalent of the labial-coronal effect in the first year of life. Journal of the Acoustical Society of America, 126: 1440–6.
Nazzi, T., Mersad, K., Sundara, M., Iakimova, G., and Polka, L. (2014). Early word segmentation in infants acquiring Parisian French: Task-dependent and dialect-specific aspects. Journal of Child Language, 41: 600–33.
Neale, M. C. and Maes, H. (2003). Methodology for Genetic Studies of Twins and Families. Dordrecht: Kluwer Academic.
Neale, M. C., Boker, S. M., Xie, G., and Maes, H. H. (2006). Mx: Statistical modeling, 7th edn. Richmond, VA: Department of Psychiatry, Virginia Commonwealth University.
Neeleman, Ad (1994). Complex Predicates. Utrecht: Onderzoekinstituut voor Taal en Spraak (OTS).
Neeleman, A. and Szendrői, K. (2007). Radical pro drop and the morphology of pronouns. Linguistic Inquiry, 38: 671–714.
Neeleman, A. and Weerman, F. (1999). Flexible Syntax: A theory of case and arguments, vol. 47, Studies in Natural Language and Linguistic Theory. Dordrecht: Kluwer Academic.
Neimark, E. D. and Slotnick, N. S. (1970). Development of the understanding of logical connectives. Journal of Educational Psychology, 61: 451–60.
Nelson, K. and Benedict, H. (1974). The comprehension of relative, absolute, and contrastive adjectives by young children. Journal of Psycholinguistic Research, 3: 333–42.
Nespor, Marina and Vogel, Irene (1986). Prosodic Phonology. Dordrecht: Foris.
Newbury, D. F. and Monaco, A. P. (2010). Genetic advances in the study of speech and language disorders. Neuron, 68: 309–20.
Newbury, D. F., Bonora, E., Lamb, J. A., Fisher, S. E., Lai, C. S. L., Baird, G., Jannoun, L., Slonims, V., Stott, C. M., Merricks, J. M., Bolton, P. F., Bailey, A. J., Monaco, A. P., and the International Molecular Genetic Study of Autism Consortium (2002). FOXP2 is not a major susceptibility gene for autism or specific language impairment. American Journal of Human Genetics, 70: 1318–27.
Newbury, D. F., Winchester, L., Addis, L., Paracchini, S., Buckingham, L.-L., Clark, A., Cohen, W., Cowie, H., Dworzynski, K., Everitt, A., Goodyer, I. M., Hennessy, E., Kindley, A. D., Miller, L. L., Nasir, J., O’Hare, A., Shaw, D., Simkin, Z., Simonoff, E., Slonims, V., Watson, J., Ragoussis, J., Fisher, S. E., Seckl, J., Helms, P. J., Bolton, P. F., Pickles, A., Conti-Ramsden, G., Baird, G., Bishop, D. V. M., and Monaco, A. (2009). CMIP and ATP2C2 modulate phonological short-term memory in language impairment. The American Journal of Human Genetics, 85: 264–72.

Newbury, D. F., Paracchini, S., Scerri, T. S., Winchester, L., Addis, L., Richardson, A. J., Walter, J., Stein, J. F., Talcott, J. B., and Monaco, A. P. (2011). Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behavior Genetics, 41: 90–104.
Newman, R., Bernstein Ratner, N., Jusczyk, A. M., Jusczyk, P. W., and Dow, K. A. (2006). Infants’ early ability to segment the conversational speech signal predicts later language development: A retrospective analysis. Developmental Psychology, 42: 643–55.
Newmeyer, F. J. (2004). Against a parameter-setting approach to language variation. Linguistic Variation Yearbook, 4: 181–234.
Newport, E. (1990). Maturational constraints on language learning. Cognitive Science, 14: 11–28.
Newport, E., and Aslin, R. (2004). Learning at a distance I. Statistical learning of non-adjacent dependencies. Cognitive Psychology, 48(2): 127–62.
Ng, L., Cheung, H., and Xiao, W. (2010). False belief, complementation language, and contextual bias in preschoolers. International Journal of Behavioral Development, 34(2): 168–79.
Ní Chiosáin, Máire and Padgett, Jaye (1993). On the nature of consonant-vowel interaction. Presented at HILP, Leiden University.
Nichols, Johanna (1992). Linguistic Diversity in Space and Time. Chicago: University of Chicago Press.
Ning, Pan (2005). A government phonology approach to the acquisition of syllable structure. Unpublished Ph.D. thesis, University of Louisiana.
Ninio, A. (1999). Pathbreaking verbs in syntactic development and the question of prototypical transitivity. Journal of Child Language, 26: 619–53.
Ninio, A. (2011). Syntactic Development, Its Input and Output. Oxford: Oxford University Press.
Nishibayashi, L.-L., Goyet, L., and Nazzi, T. (in press). Early speech segmentation in French-learning infants: Monosyllabic words versus embedded syllables. Language & Speech.
Nissenbaum, J., and Schwarz, B. (2011). Parasitic degree phrases. Natural Language Semantics, 19: 1–38.
Nittrouer, Susan and Studdert-Kennedy, Michael (1987). The role of coarticulatory effects in the perception of fricatives by children and adults. Journal of Speech and Hearing Research, 30: 319–29.
Niyogi, Partha (2006). The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.
Niyogi, P. and Berwick, R. C. (1996). A language learning model for finite parameter spaces. Cognition, 61: 161–93.
Nopola-Hemmi, J., Myllyluoma, B., Haltia, T., Taipale, M., Ollikainen, V., Ahonen, T., Voutilainen, A., Kere, J., and Widen, E. (2001). A dominant gene for developmental dyslexia on chromosome 3. Journal of Medical Genetics, 38: 658–64.
Noveck, I. (2001). When children are more logical than adults: Experimental investigations of scalar implicature. Cognition, 78(2): 165–88.
Noveck, I. A., and Posada, A. (2003). Characterizing the time course of an implicature. Brain and Language, 85: 203–10.
Noveck, I. A., and Sperber, D. (2007). The why and how of experimental pragmatics: The case of “scalar inferences.” In N. Burton-Roberts (ed.), Advances in Pragmatics. Basingstoke: Palgrave.
Noveck, I. A., Ho, S., and Sera, M. (1996). Children’s understanding of epistemic modals. Journal of Child Language, 23(3): 621–43.
Noveck, I., Guelminger, R., Georgieff, N., and Labruyere, N. (2007). What autism can tell us about every … not sentences. Journal of Semantics, 24(1): 73–90.

914   References Noveck, I., Chevallier, C., Chevaux, F., Musolino, J., and Bott, L. (2009). Children’s enrichments of conjunctive sentences in context. In P. De Brabanter and M. Kissine (eds), Current Research in Semantics/​Pragmatics. Bingley, UK: Emerald Group, 20: 211–​34. Novogrodsky, R. and Friedmann, N. (2006). The production of relative clauses in syntactic SLI: A window to the nature of the impairment. Advances in Speech-​Language Pathology, 8(4): 364–​75. Nowak, Martin A., Komarova, Natalia L., and Niyogi, Partha (2002). Computational and evolutionary aspects of language. Nature, 417: 611–​17. Nübling, Damaris and Szczepaniak, Renata (2008). On the way from morphology to phonology: German linking elements and the role of the phonological word. Morphology, 18: 1–​25. Nuñez del Prado, Z., Foley, C. Proman, R., and Lust, B. (1994). Subordinate CP and pro-​ drop:  Evidence for degree “n” learnability from an experimental study of Spanish and English acquisition. In M. Gonzalez (ed.), Proceedings of NELS 24, 2: 443–​60. Nyberg, E. H., III (1992). A non-deterministic, success-driven model of parameter setting in language acquisition. Unpublished Ph.D. thesis, Carnegie Mellon University. O’Brien, E. K., Zhang, X., Nishimura, C., Tomblin, B. and Murray, J. C. (2003). Association of specific language impairment (SLI) to the region of 7q31. American Journal of Human Genetics, 72: 1536–​43. O’Brien, K., Grolla, E. and Lillo-​Martin, D. (2006). Long passives are understood by young children. In D. Bamman, T. Magnitskaia, and C. Zaller (eds), Proceedings of the 30th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Oates, Tim, Armstrong, Tom, and Becerra Bonache, Leonor (2006). Inferring grammars for mildly context-​sensitive languages in polynomial-​time. In Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI), 137–​47. Odic, D., Pietroski, P., Hunter, T., Lidz, J., and Halberda, J. (2013). 
Young children’s understanding of “more” and discrimination of number and surface area. Journal of Experimental Psychology: Learning, memory, and cognition, 39(2): 451–​61. Odic, D., Pietroski, P., Lidz, J., and Halberda, J. (2014). The co-​development of language and cognition: Evidence from children’s acquisition of most. Unpublished manuscript, Johns Hopkins University and University of Maryland. Odic, D., Libertus, M. E., Feigenson, L., and Halberda, J. (in press). Developmental Change in the Acuity of Approximate Number and Area Representations. Developmental Psychology. Oehrle, R. T. (1976). The grammatical status of the English dative alternation. Unpublished Ph.D. thesis, MIT. Ogiela, Diane (2007). Development of telicity interpretation: Sensitivity to verb-type and determiner-type. Unpublished Ph.D. thesis, Michigan State University. O’Grady, W. (1997). Syntactic Development. Chicago, IL: University of Chicago Press. O’Grady, W., Yamashita, Y., Lee, M., Choo, M., and Cho S. (2000). Computational factors in the acquisition of relative clauses. International Conference on the Development of the Mind, Keio University, Tokyo. Ohala, Diane K. (1999). The influence of sonority on children’s clusters reductions. Journal of Communication Disorders, 32: 397–​422. Oiry, M. and Demirdache, H. (2006). Evidence from L1 Acquisition for the syntax of Wh-​scope marking in French. In V. Torrens and L. Escobar (eds), The Acquisition of Syntax in Romance Languages: Language acquisition and language disorders. Amsterdam: John Benjamins, 41: 289–​315.

References   915 Okabe, Reiko and Sano, Tetsuya (2002). The acquisition of implicit arguments in Japanese and related matters. Proceedings of the Boston University Conference on Language Development, 26: 485–​99. Okubo, A. (1967). Yooji gengo no hattatsu. Tokyo: Tokyodoo. Olguin, R., and Tomasello, M. (1993). Twenty-​five-​month-​old children do not have a grammatical category of verb. Cognitive Development, 8: 245–​72. Oliver, B. R. and Plomin, R. (2007). Twins’ early development study (TEDS): a multivariate, longitudinal genetic investigation of language, cognition, and behavior problems from childhood through adolescence. Twin Research and Human Genetics, 10(1): 96–​105. Olmsted, David L. (1971). Out of the Mouth of Babes: Earliest stages in language learning. The Hague: Mouton De Gruyter. Olsen, Mari Broman and Weinberg, Amy (1999). Innateness and the acquisition of grammatical aspect via lexical aspect. A. Greenhill, H. Littlefield, and C. Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Oltra-​Massuet, I. and Arregi, K. (2005). Stress-​by-​Structure in Spanish. Linguistic Inquiry, 36(1): 43–​84. Omaki, A. and Lidz, J. (2015). Linking parser development to the acquisition of syntactic knowledge. Language Acquisition, 22(2): 158–​92. Omaki, A., White, I.  D., Goro, T., Lidz, J., and Phillips, C. (2014). No fear of commitment:  Children’s incremental interpretations in English and Japanese wh questions. Language Learning and Development, 10: 206–​33. Onishi, Kristine H., Chambers, Kyle E., and Fisher, Cynthia (2003). Infants learn phonotactic regularities from brief auditory experience. Cognition, 87: B69–​B77. Onnis, Luca and Christiansen, Morten H. (2008). Lexical categories at the edge of the word. Cognitive Science, 32: 184–​221. Onnis, Luca, Monaghan, Padraic, Richmond, Korin, and Chater, Nick (2005). Phonology impacts segmentation in online speech processing. 
Journal of Memory and Language, 53: 225–​37. Orfitelli, R. M. (2012). Argument intervention in the acquisition of A-​movement. Unpublished Ph.D. thesis, UCLA. Orfitelli, R. and Hyams, N. (2012). Children’s grammar of null subjects: Evidence from comprehension. Linguistic Inquiry, 43(4). 563–​90. Orie, Olanike Ola (2003). Two harmony theories and high vowel patterns in Ebira and Yoruba. The Linguistic Review, 20: 1–​35. Orsolini, M., Fanari, R., and Bowles, H. (1998). Acquiring regular and irregular inflection in a language with verb classes. Language and Cognitive Processes, 13: 425–​64. Osherson, D. N., Stob, M., and Weinstein, S. (1986). Systems that Learn: An introduction to learning theory for cognitive and computer scientists. Cambridge, MA: MIT Press. Ota, M. (2003a). The development of lexical pitch accent systems: An autosegmental analysis. Canadian Journal of Linguistics, 48: 357–​83. Ota, M. (2003b). The Development of Prosodic Structure in Early Words: Continuity, diverge and change. Amsterdam: John Benjamins. Ota, M. (2006). Children’s production of word accents in Swedish revisited. Phonetica, 63: 230–​46. Otake, T., Hatano, G., Cutler, A., and Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32: 258–​78.

916   References Otomo, K, and Stoel-​Gammon, C. (1992). The acquisition of unrounded vowels in English. Journal of Speech and Hearing Research, 35: 604–​16. Otsu, Y. Universal Grammar and syntactic development in children. Unpublished Ph.D. thesis, MIT, Cambridge, MA. Otto, F. (1985). Classes of regular and context-​free languages over countably infinite alphabets. Discrete Applied Mathematics, 12: 41–​56. Özçelik, Öner (2009). Exceptions in stress assignment: Feet in input. Paper presented at NELS 40, MIT, November. Ozturk, O. and Papafragou, A. (2014). The acquisition of epistemic modality: From semantic meaning to pragmatic interpretation. Language Learning and Development, online: 1–​24. Padgett, Jaye (2002). Russian voicing assimilation, final devoicing, and the problem of [v] (or, The mouse that squeaked). Unpublished manuscript. Padilla, J. (1990). On the Definition of Binding Domains in Spanish. Dordrecht: Kluwer. Palermo, D. S. (1973). More about less: A study of language comprehension. Journal of Verbal Learning and Verbal Behavior, 12: 211–​21. Palermo, D. S. (1974). Still more about the comprehension of “less.” Experimental Psychology, 10: 827–​9. Pallier, Christophe, Nuria Sebastián-​Gallés, and Angels Colomé (1999). Phonological representations and repetition priming. Proceedings, Sixth European Conference on Speech Communication and Technology (Eurospeech ’99). Budapest, Hungary, September 5–​9, 1999, ISCA Archive, 1907–​10, . Palmer F. (1986). Mood and Modality. Cambridge: Cambridge University Press. Papadimitriou, Christon (1994). Computational Complexity. Boston, MA: Addison Wesley. Papafragou, A. (1997). Modality: A case-​study in semantic underdeterminacy. In University College London Working Papers in Linguistics, 9: 77–​105. Papafragou, A. (1998). The acquisition of modality: Implications for theories of semantic representation. Mind and Language, 13: 370–​99. Papafragou, A. (2006). 
From scalar semantics to implicature:  Children’s interpretation of aspectuals. Journal of Child Language, 33: 721–​57. Papafragou, A. and Musolino, J. (2003). Scalar implicatures: Experiments at the semantics–​ pragmatic interface. Cognition, 86: 253–​82. Papafragou, A. and Ozturk, O. (2007). Children’s acquisition of epistemic modality. Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA), ed. Alyona Belikova et  al. Somerville, MA:  Cascadilla Proceedings Project, 320–​327. Papafragou, A. and Ozturk, O. (in press). On the acquisition of modality. Proceedings from the 30th Annual Penn Linguistics Colloquium. Department of Linguistics, University of Pennsylvania. Papafragou, A. and Schwarz, N. (2006). Most wanted. Language Acquisition (Special Issue: On the Acquisition of Quantification), 13: 207–​51. Papafragou, A. and Tantalou, N. (2004). Children’s computation of implicatures. Language Acquisition, 12: 71–​82. Papafragou, A., Cassidy, K., and Gleitman, L. (2007). When we think about thinking:  The acquisition of belief verbs. Cognition, 105: 125–​65. Pappas, Athina and Gelman, Susan (1998). Generic noun phrases in mother–​child conversations. Journal of Child Language, 25: 19–​33.

References   917 Paradis, C. (1997). Degree Modifiers of Adjectives in Spoken British English. Lund, Sweden: Lund University Press. Paradis, J. (2005). Grammatical morphology in children learning English as a second language:  Implications of similarities with specific language impairment. Language, Speech, and Hearing Services in Schools, 36: 172–​87. Paradis, J. and Crago, M. (2000). Tense and temporality:  A  comparison between children learning a second language and chidlren with SLI. Journal of Speech, Language and Hearing Research, 43: 837–​47. Paradis, J. and Crago, M. (2001). The morphosyntax of specific language impairament in French: An extended optional default account. Language Acquisition, 9(4): 269–​300. Paradis, J. and Navarro, S. (2003). Subject realization and crosslinguistic interference in the bilingual acquisition of Spanish and English: What is the role of the input? Journal of Child Language, 30: 371–​93. Paradis, Carole and Prunet, Jean-​François (eds) (1991). The Special Status of Coronals: Internal and external evidence. San Diego: Academic Press. Paradis, J., M. Crago, and Genesee, F. (2006). Domain-​ general versus domain-​ specific accounts of specific language impairment: Evidence from bilingual children’s acquisition of object pronouns. Language Acquisition, 13(1): 33–​62. Paris, S. G. (1973). Comprehension of language connectives and propositional logical relationships. Journal of Experimental Child Psychology, 16: 278–​91. Parsons, T. (1972). Some problems concerning the logic of grammatical modifiers. In D. Davidson and G. Harman (eds), Semantics of Natural Language. Dordrecht: Reidel, 127–​41. Parsons, Terence (1990). Events in the Semantics of English: A study in subatomic semantics. Cambridge, MA: MIT Press. Partee, Barbara (1999). Nominal and temporal semantic structure: Aspect and quantification. E. Hajičová, T. Hoskovec, O. Leška, P. Sgall, and Z. Skoumalová (eds), Prague Linguistic Circle Papers. 
Amsterdam: John Benjamins, 91–​108. Partee, Barbara, Alice ter Meulen, and Robert Wall (1993). Mathematical Methods in Linguistics. Dordrect, Boston, London: Kluwer Academic Publishers. Patel, R. and Grigos, M. I. (2006). Acoustic characterization of the question–​statement contrast in 4, 7 and 11 year-​old children. Speech Communication, 48: 1308–​18. Pater, J. (1997). Minimal violation and phonological development. Language Acquisition, 6: 201–​53. Pater, Joe (2004). Bridging the gap between receptive and productive development with minimally violable constraints. In René Kager, Joe Pater, and Wim Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 219–​44. Pater, Joe (2008). Gradual learning and convergence. Linguistic Inquiry, 39(2): 334–​45. Pater, Joe (2009a). Morpheme-​specific phonology: Constraint indexation and inconsistency resolution. In Steve Parker (ed.), Phonological Argumentation: Essays on evidence and motivation. London: Equinox, 123–​54. Pater, Joe (2009b). Weighted constraints in generative linguistics. Cognitive Science, 33: 999–​1035. Pater, Joe (2014). Canadian raising with language-​specific weighted constraints. Language, 90(1): 230–​40. Pater, Joe and Barlow, Jessica A. (2003). Constraint conflict in cluster reduction. Journal of Child Language, 30(3): 487–​526.

918   References Pater, Joe and Tessier, Anne-​Michelle (2005). Phonotactics and alternations: Testing the connection with artificial language learning. In Kathryn Flack and Shigeto Kawahara (eds), University of Massachusetts Occasional Papers in Linguistics, 31: 1–​16. Pater, Joe and Werle, Adam (2003). Direction of assimilation in child consonant harmony. Canadian Journal of Linguistics, 48: 385–​408. Pater, Joe, Stager, Christine, and Werker, Janet F. (1998). Additive effects of phonetic distinctions in word learning. In Proceedings: 16th International Congress on Acoustics and 135th Meeting Acoustical Society of America, 2049–​50. Pater, Joe, Stager, Christine, and Werker, Janet (2004). The perceptual acquisition of phonological contrasts. Language, 80: 384–​402. Paterson, Kevin B., Liversedge, Simon P., Rowland, Caroline, and Filik, Ruth (2003). Children’s comprehension of sentences with focus particles. Cognition, 89: 263–​94. Pavlović, Milivoj (1920). Le langage enfantin: Acquisition du Serbe et du Français par un enfant serbe. Paris: E. Champion. Pea, R. (1982). Origins of verbal logic: Spontaneous denials by two-​and three-​year olds. Journal of Child Language, 9: 597–​626. Pea R., Mawby, R., and MacCain, S. (1982). World-​making and world-​revealing: semantics and pragmatics of modal auxiliary verbs during the third year of life. Paper presented at the 7th Annual Boston Conference on Child Language Development. Pearl, L. and Lidz, J. (2009). When domain general learning fails and when it succeeds:  Identifying the contribution of domain specificity. Language Learning and Development, 5(4): 235–​65. Pearl, L. and Mis, B. (2011). How far can indirect evidence take us? Anaphoric One revisited. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Boston, MA: Cognitive Science Society. Pearl, L. and Mis, B. (submitted). What indirect evidence can tell us about Universal Grammar: Anaphoric One revisited. University of California, Irvine. 
. Pearl, L. S. (2007). Necessary bias in natural language learning. Unpublished Ph.D. thesis, University of Maryland. Pearl, L. (2011). When unbiased probabilistic learning is not enough: Acquiring a parametric system of metrical phonology. Language Acquisition, 18: 87–​120. Pearl, L., Goldwater, S., and Steyvers, M. (2011). Online learning mechanisms for Bayesian models of word segmentation, Research on Language and Computation, 8(2): 107–​32. Pearson, H. (2009). How to do comparison in a language without degrees: A semantics for the comparative in Fijian. In M. Prinzhorn, V. Schmitt, and S. Zobel (eds), Proceedings of Sinn und Bedeutung 14. Online proceedings, 356–​72. Pearson, M. (2005). The Malagasy subject/​topic as an A′ element. Natural Language and Linguistic Theory, 23: 381–​457. Pelucchi, B., Hay, J., and Saffran, J. (2009a). Statistical learning in natural language by 8-​month-​ old infants. Child Development, 80(3): 674–​85. Pelucchi, B., Hay, J., and Saffran, J. (2009b). Learning in reverse: Eight-​month-​old infants track backward transitional probabilities. Cognition, 244–​7. Penke, M. (2015). Syntax and language disorders. In T. Kiss and A. Alexiadou (eds), Syntax: An International Handbook, vol. 3. Amsterdam: Mouton de Gruyter. Penner, Z. and Weissenborn, J. (1996). Strong continuity, parameter setting and the trigger hierarchy: On the acquisition of the DP in Bernese Swiss German and high German. In

References   919 H. Clahsen (ed.), Generative Perspectives on Language Acquisition:  Empirical findings, theoretical considerations and crosslinguistic comparisons. Amsterdam:  Benjamins, 161–​200. Penner, Z., Wymann, K., and Schulz, P. (1999). Specific language impairment revisted: parallelism vs. deviance:  a learning theoretical approach. Normal and Impaired Language Acquisition:  Studies in lexical, syntactic, and phonological development, Fachgruppe Sprachwisseenschaft, University of Konstanz. Arbeitspapier No. 105. 2. Penner, Zvi, Schulz, Petra, and Wyman, Karin (2003). Learning the meaning of verbs: What distinguishes language-​ impaired from normally developing children? Linguistics, 41(2): 289–​319. Penner, Z., Krügel, C., Gross, M., and Hesse, V. (2006). Sehr frühe Indikatoren von Spracherwerbsverzögerungen bei gesunden, normal hörenden Kindern. Frühförderung Interdisziplinar, 25: 37–​48. Pensalfini, Rob (1995). Pronoun case errors, both syntactic and morphological. In C. Schütze, K. Broihier, and J. Ganger (eds), Papers on Language Processing and Acquisition. Cambridge, MA: MIT Working Papers in Linguistics 26: 305–​24. Peperkamp, Sharon, Le Calvez, Rozenn, Nadal, Jean-​Pierre, and Dupoux, Emmanuel (2006). The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition, 101: B31–​B41. Pérez-​Leroux, Ana Teresa (1993). Empty categories and the acquisition of wh-​movement. Unpublished Ph.D. thesis, University of Massachusetts. Pérez-​ Leroux, Teresa, Ana, Schmitt, Cristina, and Munn, Alan (2004). The development of inalienable possession in English and Spanish. In Reineke Bok-​Bennema, Bart Hollebrandse, Brigitte Kampers-​ Manhe, and Petra Sleeman, Romance Languages and Linguistic Theory. Amsterdam: John Benjamins, 199–​216. Pérez-​Leroux, A. T., Castilla-​Earls, A., Bejar, S., and Massam, D. (2012). Elmo’s sister’s ball: The problem of acquiring nominal recursion. Language Acquisition, 19: 301–​11. Perez-​Pereira, M. 
(1989). The acquisition of morphemes: Some evidence from Spanish. Journal of Psycholinguistic Research, 18(3): 289–​312. Perfors, A., Tenenbaum, J., and Regier, T. (2006). Poverty of the stimulus? A rational approach. In R. Sun and N. Miyake (eds), Proceedings of the 28th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 663–​8. Perfors, A., Tenenbaum, J. B., Gibson, E., and Regier, T. (2010). How recursive is language? A  Bayesian exploration. In H.  van der Hulst (ed.), Recursion and Human Language. Amsterdam: Mouton DeGruyter, 159–​175. Perfors, A., Tenenbaum, J., Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3): 306–​38. Perlmutter, David M. (1978). Impersonal passives and the unaccusative hypothesis. In Proceedings of the Berkeley Linguistic Society, 4: 157–​189. Perlmutter, David M. and Postal, Paul M. (1984). The 1-​Advancement exclusiveness law. In David Perlmutter and Carol Rosen (eds), Studies in Relational Grammar 2. Chicago: University of Chicago Press, 81–​125. Perner, J., Sprung, M., Zauner, P., and Haider, H. (2003). “Want that” is understood well before “say that,” “think that,” and “false belief ”: A test of de Villiers’ linguistic determinism on German-​speaking children. In Child Development, 74: 179–​88. Perovic, A. (2004). Knowledge of Binding in Down Syndrome: Evidence from English and Serbo-​ Croatian. London: University College, London.

920   References Perovic, A. (2006). Syntactic deficit in Down syndrome: More evidence for the modular organisation of language. Lingua, 116: 1616–​30. Perovic, A. and Wexler, K. (2004). Knowledge of binding, raising and passives in Williams Syndrome. Generative approaches to language acquisition—​North America (GALANA). University of Hawai’i at Manoa, University of Connecticut Occasional Papers in Linguistics. Perret, Cyril, Bonin, Patrick, and Méot, Alan (2006). Syllabic priming effects in picture naming in French: Lost in the sea! Experimental Psychology, 53: 95–​104. Perruchet, P. and Desaulty, S. (2008). A role for backward transitional probabilities in word segmentation? Memory and Cognition, 36(7): 1299–​305. Perruchet, P. and Vinter, A. (1998). PARSER: A model for word segmentation. Journal of Memory and Language, 39: 246–​63. Pesetsky, D. (1987). Wh-​in-​situ: Movement and unselective binding. In E. Reuland and A. Meulen (eds), The Representation of (In)definiteness. Cambidge, MA: MIT Press, 98–​129. Pesetsky, D. (1995). Zero Syntax: Experiencers and cascades. Cambridge, MA: MIT Press. Peter, B., Raskind, W. H., Matsushita, M., Lisowski, M., Vu, T., Berninger, V. W., Wijsman, E. M., and Brkanac, Z. (2011). Replication of CNTNAP2 association with nonword repetition and support for FOXP2 association with timed reading and motor activitiesin a dyslexia family sample. Journal of Neurodevelopmental Disorders, 3: 39–​49. Peters, A. (1983). The Units of Language Acquisition, Monographs in Applied Psycholinguistics. New York: Cambridge University Press. Peters, A. and S. Strömqvist (1996). The role of prosody in the acquisition of morphemes. In J. Morgan and K. Demuth (eds), Signal to Syntax: Bootstrapping from speech to grammar in early acquisition. Mahwah, NJ: Lawrence Erlbaum, 215–​32. Petersson, K. M., Forkstam, C., and Ingvar, M. (2004). Artificial syntactic violations activate brocas region. Cognitive Science, 28: 383–​407. Philip, W. (1995). 
Event quantification in the acquisition of universal quantification. Unpublished Ph.D. thesis, University of Massachusetts. Philip, W. and Coopmans, P. (1996a). The role of lexical feature acquisition in the development of pronominal anaphora. In W. Philip and F. Wijnen (eds), Amsterdam Series on Child Language Development. Amsterdam: Instituut Algemene Taalwetenschap, 5: 68. Philip, W. and Coopmans, P. (1996b). The Double Dutch Delay of Principle B Effect. Proceedings of the 20th Boston University Conference on Language Development. Somerville MA: Cascadilla Press, 576–​87. Phillips, C. (1995). Syntax at age two: Cross-​linguistic differences. In C. Schütze, K. Broihier, and J. Ganger (eds), Papers on Language Processing and Acquisition. Cambridge, MA: MIT Working Papers in Linguistics 26: 225–​82. Phillips, C. (2004). Linguistics and linking problems. In M. Rice and S. Warren (eds), Developmental Language Disorders: From phenotypes to etiologies. Mahwah, NJ: Lawrence Erlbaum Associates. Phinney, M. (1981). Syntactic constraints and the acquisition of embedded complements. Unpublished Ph.D. thesis, University of Massachusetts. Piaget, J. (1928). Judgment and Reasoning in the Child. London: Routledge and Kegan Paul. Piaget, J. (1967). The Language and Thought of the Child. Cleveland: World. Piaget, J. (1968). Judgment and Reasoning in the Child. New Jersey: Littlefield. Piantadosi, S., Tenenbaum, J., and Goodman, N. (2012). Modeling the acquisition of quantifier semantics: a case study in function word learnability. Unpublished manuscript, University of Rochester.

References   921 Pierce, A. (1989). On the emergence of syntax: A crosslinguistic study. Unpublished Ph.D. thesis, MIT. Pierce, A. (1992a). The acquisition of passives in Spanish and the question of A-​chain maturation. Language Acquisition, 2: 55–​81. Pierce, A. (1992b). Language Acquisition and Syntactic Theory: A comparative analysis of French and English child grammars. Dordrecht, NL: Kluwer. Pierrehumbert, J. (1980). The phonetics and phonology of English intonation. Unpublished Ph.D. thesis, MIT. Pierrehumbert, Janet (2001). Exemplar dynamics:  Word frequency, lenition, and contrast. In J. Bybee and P. Hopper (eds), Frequency Effects and the Emergence of Lexical Structure. Amsterdam: John Benjamins, 137–​57. Pierrehumbert, Janet B. (2003). Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech, 46: 115–​54. Pierrehumbert, J. and Beckman, M. (1986). Japanese Tone Structure. Cambridge, MA: MIT Press. Pietroski, P. M. (2005). Events and Semantic Architecture. Oxford: Oxford University Press. Piffer, L., Agrillo, C., and Hyde, D. C. (2012). Small and large number discrimination in guppies. Animal Cognition, 15(2): 215–​221. Pike, K. (1945). The Intonation of American English. Ann Arbor, MI:  University of Michigan Press. Piñango, Maria, Zurif, Edgar, and Jackendoff, Ray (1999). Real-​time processing implications of enriched composition at the syntax–​semantics interface. Journal of Psycholinguistic Research, 28(4): 395–​414. Pine, J. M. and Lieven, E. (1993). Reanalysing rote-​learned phrases: Individual differences in the transition to multi-​word speech. Journal of Child Language, 20: 551–​7 1. Pine, J. M. and Lieven, E. (1997). Slot and frame patterns in the development of the determiner category. Applied Psycholinguistics, 18: 123–​38. Pine, J. M., Lieven, E. V. M., and Rowland, C. F. (1998). Comparing different models of the development of the English verb category. Linguistics, 36: 807–​30. Pine, J. M., Joseph, K. 
L., and Conti-​Ramsden, G. (2004). Do data from children with Specific Language Impairment support the agreement/​tense omission model? Journal of Speech, Language, and Hearing Research, 47(4): 913–​23. Pine, J. M., Rowland, C. F., Lieven, E. V. M., and Theakston, A. L. (2005). Testing the agreement/​tense omission model: Why the data on children’s use of non-​nominative 3psg subjects count against the ATOM. Journal of Child Language, 32(2): 269–​89. Pine, J. M., Conti-​R amsden, G., Joseph, K. L., Lieven, E. V. M., and Serratrice, L. (2008). Tense over time: Testing the agreement/​tense omission model as an account of the pattern of tense-​marking provision in early child English. Journal of Child Language, 35(1): 55–​75. Pinker, S. (1979). Formal models of language learning. Cognition, 7: 217–​83. Pinker, S. (1984). Language Learnability and Language Development. Cambridge, MA: Harvard University Press. Pinker, S. (1989). Learnability and Cognition: The acquisition of argument structure. Cambridge, MA: MIT Press. Pinker, S. (1994). The Language Instinct: How the mind creates language. New York: Morrow. Pinker, Steven (1999). Words and Rules: The ingredients of language. New York: Basic Books. Pinker, S. (2001). Talk of genetics and vice versa. Nature, 413: 465–​66.

922   References Pinker, Steven and Prince, Alan (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28: 73–​193. Pinker, Steven and Prince, Alan (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S. Lima, R. Corrigan, and G. Iverson (eds), The Reality of Linguistic Rules. Amsterdam: John Benjamins, 321–​52. Pinker, S. and Ullman, M. T. (2002). The past and future of the past tense. Trends in Cognitive Sciences, 6(11): 456–​63. Pinker, S., Lebeaux, D., and Frost, L. (1987). Productivity and constraints in the acquisition of the passive. Cognition, 26: 195–​267. Pitt, Leonard (1985). Probabilistic inductive inference. Unpublished Ph.D. thesis, Yale University. Pitt, Leonard (1989). Inductive inference, DFAs and computational complexity. In Proceedings of the International Workshop on Analogical and Inductive Inference, Lecture Notes in Artificial Intelligence (v. 397). Springer-​Verlag, 18–​44. Pizzuto, Elena and Caselli, Maria Cristina (1992). The acquisition of Italian morphology: implications for models of language development. Journal of Child Language, 19: 491–​557. Pizutto, E. and Caselli, C. (1994). The acquisition of Italian verb morphology in a cross-​ linguistic perspective. In Y. Levy (ed.), Other Children, Other Languages. Hillsdale, NJ: Erlbaum, 137–​88. Plomin, R. and Kovas, Y. (2005). Generalist genes and learning disabilities. Psychological Bulletin, 131(4): 592–​617. Plomin, R., DeFries, J. C., McClearn, G. E., and McGuffin, P. (2008). Behavioral Genetics, 5th edn. New York: Worth. Plunkett, B. (1991). Inversion in early wh-​questions. In T. Maxwell and B. Plunkett (eds), Papers in the Acquisition of wh: Proceedings of the UMass Roundtable, May, 1990. Amherst, MA: University of Massachusetts Occasional Papers. Plunkett, K. and Strömqvist, S. (1990). The acquisition of Scandinavian languages. 
Gothenburg Papers in Theoretical Linguistics, 59. Plunkett, Kim and Marchman, Virginia (1991). U-​shaped learning and frequency effects in a multilayered perceptron: Implications for child language acquisition. Cognition, 38: 43–​102. Plunkett, Kim and Marchman, Virginia (1993). From rote learning to system building: The acquisition of morphology in children and connectionist nets. Cognition, 48: 21–​69. Poeppel, D. and Wexler, K. (1993). The full competence hypothesis of clause structure in early German. Language, 69: 1–​33. Polka, L. and Sundara, M. (2012). Word segmentation in monolingual infants acquiring Canadian English and Canadian French: Native language, cross-​dialect, and crosslanguage comparisons. Infancy, 17: 198–​232. Polka, L. and Werker, J. (1994). Developmental changes in perception of nonnative vowel contrasts. Journal of Experimental Psychology: Human perception and performance, 20: 421–​35. Pollard, Carl, and Sag, Ivan (1994). Head-​driven Phrase Structure Grammar. Chicago: University of Chicago Press. Pollock, J.-​Y. (1989). Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry, 20(3): 365–​424. Pons, F. and Bosch, L. (2007). The perception of stress patterns by Spanish and Catalan infants. In P. Prieto, J. Mascaró, and M. J. Solé (eds), Segmental and Prosodic Issues in (Romance) Linguistics. Current issues in linguistic theory. Amsterdam: John Benjamins, 199–​218.

Pons, F. and Bosch, L. (2010). Stress pattern preference in Spanish-learning infants: The role of syllable weight. Infancy, 15: 223–45.
Postal, Paul (1974). On Raising: One rule of English grammar and its theoretical implications. Cambridge, MA: MIT Press.
Potsdam, E. (2011). A direct analysis of Malagasy phrasal comparatives. Paper presented at the 18th Meeting of the Austronesian Formal Linguistics Association, March 4–6, Harvard University.
Potsdam, Eric and Polinsky, Maria (2011). Against covert A-movement in Russian unaccusatives. Linguistic Inquiry, 42: 345–55.
Potts, Christopher, Pater, Joe, Jesney, Karen, Bhatt, Rajesh, and Becker, Michael (2010). Harmonic grammar with linear programming: From linear systems to linguistic typology. Phonology, 27(1): 77–117.
Pouscoulous, N., Noveck, I., Politzer, G., and Bastide, A. (2007). Processing costs and implicature development. Language Acquisition, 14: 347–75.
Powers, Susan M. (1996). The growth of the phrase marker: Evidence from subjects. Unpublished Ph.D. thesis, University of Maryland.
Prasada, Sandeep (2000). Acquiring generic knowledge. Trends in Cognitive Sciences, 4: 66–72.
Prasada, Sandeep and Dillingham, Elaine M. (2006). Principled and statistical connections in common sense conception. Cognition, 99(1): 73–112.
Prather, Elizabeth M., Hedrick, Dona Lee, and Kern, Carolyn A. (1975). Articulation development in children aged two to four years. Journal of Speech and Hearing Disorders, 40: 170–91.
Pratt, Amy and Grinstead, John (2007). Optional infinitives in child Spanish. In Alyona Belikova, Luisa Meroni, and Mari Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA). Somerville, MA: Cascadilla Proceedings Project, 351–62.
Pratt, A. and Grinstead, J. (2008). Receptive measures of the optional infinitive stage in child Spanish. In J. B. de Garavito and E. Valenzuela (eds), Proceedings of the Hispanic Linguistic Symposium. University of Western Ontario, London, Ontario: Cascadilla Press, 120–33.
Price, J. R., Roberts, J. E., Hennon, E. A., Berni, M. C., Anderson, K. L., and Sideris, J. (2008). Syntactic complexity during conversation of boys with fragile X syndrome and Down syndrome. Journal of Speech, Language and Hearing Research, 51: 3–15.
Price, T. S., Eley, T. C., Dale, P. S., Stevenson, J., Saudino, K., and Plomin, R. (2000). Genetic and environmental covariation between verbal and nonverbal cognitive development in infancy. Child Development, 71: 948–59.
Prieto, P. (2006). The relevance of metrical information in early prosodic word acquisition: A comparison of Catalan and Spanish. Language and Speech, 49: 23–58.
Prieto, P., Mascaró, J., and Solé, M. J. (eds) (2007). Segmental and Prosodic Issues in (Romance) Linguistics. Current issues in linguistic theory. Amsterdam: John Benjamins.
Prince, Alan (2002a). Arguing Optimality. In Andries Coetzee, Angela Carpenter, and Paul de Lacy (eds), Papers in Optimality Theory II. Amherst, MA: GLSA.
Prince, Alan (2002b). Anything goes. In Takeru Honma, Masao Okazaki, Toshiyuki Tabata, and Shin-Ichi Tanaka (eds), New Century of Phonology and Phonological Theory. Tokyo: Kaitakusha, 66–90.

Prince, Alan and Smolensky, Paul (2004). Optimality Theory: Constraint interaction in generative grammar. Malden, MA: Blackwell Publishers. (Original work published 1993.)
Prince, Alan and Tesar, Bruce (2004). Learning phonotactic distributions. In René Kager, Joe Pater, and Wim Zonneveld (eds), Fixing Priorities: Constraints in phonological acquisition. Cambridge: Cambridge University Press, 245–91.
Progovac, L. (1994). Negative and Positive Polarity. Cambridge: Cambridge University Press.
Pullum, G. and Scholz, B. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19: 9–50.
Purcell, S. (2008). Appendix: Statistical methods in behavior genetics. In R. Plomin, J. C. DeFries, G. E. McClearn, and P. McGuffin, Behavioral Genetics, 5th edn. New York: Worth, 359–410.
Pustejovsky, J. (1991). The syntax of event structure. Cognition, 41: 47–81.
Pye, C. (1992). The acquisition of K'iche' Maya. In D. I. Slobin (ed.), The Crosslinguistic Study of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum, 3: 221–308.
Pye, Clifton and Poz, Pedro Quixtan (1988). Precocious passives (and antipassives) in Quiche Mayan. Papers and Reports on Child Language Development, 27: 71–80.
Pye, Clifton, Ingram, David, and List, Helen (1987). A comparison of initial consonant acquisition in English and Quiche. In K. E. Nelson and A. Van Kleeck (eds), Children's Language. Hillsdale, NJ: Erlbaum, 175–90.
Pyers, J. (2004). The relationship between language and false-belief understanding: Evidence from learners of an emerging sign language in Nicaragua. Unpublished Ph.D. thesis, University of California.
Pyers, J. and Senghas, A. (2009). Language promotes false-belief understanding: Evidence from learners of a new sign language. Psychological Science, 20(7): 805–12.
Pylkkänen, L. (2008). Introducing Arguments. Cambridge, MA: MIT Press.
Qu, Chen (2011). Development of place contrast system in early L1 grammar. Paper presented at the Western Conference on Linguistics (WECOL), Simon Fraser University, Vancouver, Canada, November.
Quam, C. and Swingley, D. (2010). Phonological knowledge guides two-year-olds' and adults' interpretation of salient pitch contours in word learning. Journal of Memory and Language, 62: 135–50.
Radford, A. (1988). Small children's small clauses. Transactions of the Philological Society, 86: 1–43.
Radford, A. (1990). Syntactic Theory and the Acquisition of English Syntax. Oxford: Basil Blackwell.
Radford, A. (1997). Syntax: A minimalist introduction. Cambridge: Cambridge University Press.
Radford, Andrew (1998). Genitive subjects in child English. Lingua, 106: 113–31.
Radford, A. and Galasso, J. (1998). Children's possessive structures: A case study. Essex Research Reports in Linguistics, 19: 37–45.
Radford, A. and Ploennig-Pacheco, I. (1995). The morphosyntax of subjects and verbs in child Spanish: A case study. Essex Reports in Linguistics, 5: 23–67.
Rakison, D. H. and Oakes, L. M. (2003). Early Category and Concept Development: Making sense of the blooming, buzzing confusion. New York: Oxford University Press.
Ramchand, Gillian (1997). Aspect and Predication: The semantics of argument structure. Oxford: Clarendon Press.
Ramus, F., Rosen, S., Dakin, S. C., Day, B. L., Castellotte, J. M., White, S., and Frith, U. (2003). Theories of developmental dyslexia: Insights from a multiple case study of dyslexic adults. Brain, 126: 841–65.
Randall, J. (2010). Linking: The geometry of argument structure. London: Springer.

Randall, Janet, van Hout, Angeliek, and Weissenborn, Jürgen (1994). Approaching linking. In Proceedings of the Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Raposo, E. and Uriagereka, J. (1995). Two types of small clauses (Toward a syntax of theme/rheme relations). In A. Cardinaletti and M. T. Guasti (eds), Syntax and Semantics 28: Small clauses. San Diego, CA: Academic Press, 179–206.
Rappaport Hovav, M. and Levin, B. (1998). Building verb meanings. In M. Butt and W. Geuder (eds), The Projection of Arguments: Lexical and compositional factors. Stanford, CA: CSLI Publications, 97–134.
Rappaport Hovav, M. and Levin, B. (2001). An event structure account of English resultatives. Language, 77: 766–97.
Rasetti, L. (2000a). Interpretive and formal properties of null subjects in early French. Generative Grammar at Geneva (GG@G), 1: 241–74.
Rasetti, L. (2000b). Null subjects and root infinitives. In M.-A. Friedemann and L. Rizzi (eds), The Acquisition of Syntax. Harlow, UK: Longman, 236–68.
Rasetti, L. (2003). Optional categories in early French syntax: A developmental study of root infinitives and null arguments. Unpublished Ph.D. thesis, University of Geneva, Geneva, Switzerland.
Raven, J. C., Court, J. H., and Raven, J. (1998). Manual for Raven's Progressive Matrices. London: H. K. Lewis.
Redington, M., Chater, N., and Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4): 425–69.
Regier, T. and Gahl, S. (2004). Learning the unlearnable: The role of missing evidence. Cognition, 93: 147–55.
Reichenbach, H. (1947). Elements of Symbolic Logic. New York: Macmillan.
Reinhart, T. (1976). The syntactic domain of anaphora. Unpublished Ph.D. thesis, MIT.
Reinhart, T. (1983). Coreference and bound anaphora: A restatement of the anaphora questions. Linguistics and Philosophy, 6: 47–88.
Reinhart, T. and Reuland, E. (1993). Reflexivity. Linguistic Inquiry, 24: 657–720.
Renfrew, C. E. (1997a). Bus Story Test—A test of narrative speech, 4th edn. Bicester, Oxon: Winslow Press.
Renfrew, C. E. (1997b). Action Picture Test, 4th edn. Bicester, Oxon: Winslow Press.
Resnik, P. and Hardisty, E. (2009). Gibbs sampling for the uninitiated. Unpublished manuscript, Version 0.3, October 2009. Available from .
Reuland, E. (2001). Primitives of Binding. Linguistic Inquiry, 32: 439–92.
Reznick, J. S., Corley, R., and Robinson, J. (1997). A longitudinal twin study of intelligence in the second year. Monographs of the Society for Research in Child Development, 62: 1–154.
Rice, Keren and Avery, Peter (1995). Variability in a deterministic model of language acquisition: A theory of segmental elaboration. In John Archibald (ed.), Phonological Acquisition and Phonological Theory. Hillsdale, NJ: Lawrence Erlbaum, 23–42.
Rice, M. (2003). A unified model of specific and general language delay: Grammatical tense as a clinical marker of unexpected variation. In Y. Levy and J. Schaeffer (eds), Language Competence across Populations. Mahwah, NJ: LEA.
Rice, M. and Wexler, K. (1996). Toward tense as a clinical marker of specific language impairment. Journal of Speech, Language and Hearing Research, 39: 1236–57.
Rice, M. L. and Wexler, K. (2001). Rice/Wexler Test of Early Grammatical Impairment. San Antonio, TX: The Psychological Corporation.

Rice, M. L., Wexler, K., and Hershberger, S. (1998). Tense over time: The longitudinal course of tense acquisition in children with Specific Language Impairment. Journal of Speech, Language, and Hearing Research, 41(6): 1412–31.
Rice, M. L., Wexler, K., and Redmond, S. M. (1999a). Grammaticality judgments of an extended optional infinitive grammar: Evidence from English-speaking children with Specific Language Impairment. Journal of Speech, Language, and Hearing Research, 42(4): 943–61.
Rice, M., Mervis, C., Klein, B. P., and Rice, K. J. (1999b). Children with Williams syndrome do not show an EOI stage. Paper presented at the Boston University Conference on Language Development, Boston.
Rice, M. L., Smith, S. D., and Gayan, J. (2009). Convergent genetic linkage and associations to language, speech, and reading measures in families of probands with Specific Language Impairment. Journal of Neurodevelopmental Disorders, 1: 264–82.
Richtsmeier, Peter T. (2007). An articulatory analysis of gliding. Paper presented at Ultrafest IV, New York, Sept. Retrieved from .
Riemsdijk, Henk van (1978). A Case Study in Syntactic Markedness: The binding nature of prepositional phrases. Dordrecht: Foris.
Rigau, G. (1991). On the functional properties of AGR. Catalan Working Papers in Linguistics, 235–60.
Riggle, Jason (2004). Generation, recognition, and learning in Finite State Optimality Theory. Unpublished Ph.D. thesis, UCLA.
Rij, J. van, van Rijn, H., and Hendriks, P. (2010). Cognitive architectures and language acquisition: A case study in pronoun comprehension. Journal of Child Language, 37(3): 731–66.
Rijk, Rudolf P. G. de (2008). Standard Basque: A progressive grammar. Cambridge, MA: MIT Press.
Ring, M. and Clahsen, H. (2005). Distinct patterns of language impairment in Down's syndrome and Williams syndrome: The case of syntactic chains. Journal of Neurolinguistics, 18: 479–501.
Rips, L. J. (1975). Quantification and semantic memory. Cognitive Psychology, 7: 307–40.
Rispoli, M. (1987). The acquisition of the transitive and intransitive action verb categories in Japanese. First Language, 7: 183–200.
Rispoli, Matthew (1994). Pronoun case overextension and paradigm building. Journal of Child Language, 21: 157–72.
Rispoli, M. (1995). Missing arguments and the acquisition of predicate meanings. In M. Tomasello and W. Merriman (eds), Beyond Names for Things: Young children's acquisition of verbs. Hillsdale, NJ: Erlbaum, 331–52.
Rispoli, M. (1997). The default case for subjects in the Optional Infinitive Stage. Proceedings of the Annual Boston University Conference on Language Development, 21(2): 465–75.
Rispoli, M. (1998a). Me or my: Two different patterns of pronoun case errors. Journal of Speech, Language, and Hearing Research, 41(2): 385–93.
Rispoli, M. (1998b). Patterns of pronoun case error. Journal of Child Language, 25(3): 533–54.
Rispoli, M. (1999). Case and agreement in English language development. Journal of Child Language, 26(2): 357–72.
Rispoli, M. (2000). Towards a more precise model of pronoun case error: A response to Schütze. Journal of Child Language, 27(3): 707–14.
Rispoli, M. (2002). Theory and methods in the study of the development of case and agreement: A response to Schütze. Journal of Child Language, 29(1): 151–9.

Rispoli, M. (2005). When children reach beyond their grasp: Why some children make pronoun case errors and others don't. Journal of Child Language, 32(1): 93–116.
Ritter, E. (1991). Two functional categories in noun phrases: Evidence from Modern Hebrew. In S. Rothstein (ed.), Perspectives on Phrase Structure: Heads and licensing. New York: Academic Press, 37–62.
Rizzi, L. (1982). Issues in Italian Syntax. Dordrecht: Foris.
Rizzi, L. (1986). Null objects in Italian and the theory of pro. Linguistic Inquiry, 17: 501–57.
Rizzi, L. (1990). Relativized Minimality. Cambridge, MA: MIT Press.
Rizzi, L. (1994). Some notes on linguistic theory and language development: The case of root infinitives. Language Acquisition, 3: 371–93.
Rizzi, L. (1997). The fine structure of the left periphery. In L. Haegeman (ed.), Elements of Grammar: Handbook of generative syntax. Dordrecht: Kluwer, 281–331.
Rizzi, L. (2001). On the position of Int(errogative) in the left periphery of the clause. In G. Cinque and G. Salvi (eds), Current Studies in Italian Linguistics Offered to Lorenzo Renzi. Oxford: Elsevier, 287–96.
Rizzi, L. (2005a). On the grammatical basis of language development: A case study. In G. Cinque and R. Kayne (eds), The Oxford Handbook of Comparative Syntax. New York: Oxford University Press, 70–109.
Rizzi, L. (2005b). Grammatically-based target-inconsistencies in child language. Proceedings of Generative Approaches to Language Acquisition, Cambridge, MA.
Roberge, J. (1975). Development of comprehension of logical connectives in symbolic or verbal form. Educational Studies in Mathematics, 6: 207–12.
Roberts, I. (2010). A deletion analysis of null subjects. In T. Biberauer, A. Holmberg, I. Roberts, and M. Sheehan (eds), Parametric Variation: Null subjects in Minimalist Theory. Cambridge: Cambridge University Press, 58–87.
Roberts, I. and Holmberg, A. (2010). Introduction: Parameters in minimalist theory. In T. Biberauer, A. Holmberg, I. Roberts, and M. Sheehan (eds), Parametric Variation: Null subjects in Minimalist Theory. Cambridge: Cambridge University Press, 1–57.
Roeper, T. (1982). The role of universals in the acquisition of gerunds. In E. Wanner and L. Gleitman (eds), Language Acquisition: The state of the art. Cambridge: Cambridge University Press.
Roeper, T. (1987). Implicit arguments and the Head-Complement relation. Linguistic Inquiry, 18: 267–310.
Roeper, T. (2000). Universal bilingualism. Bilingualism: Language and cognition, 2: 169–86.
Roeper, T. (2007). The Prism of Language: How child language illuminates humanism. Cambridge, MA: MIT Press.
Roeper, T. (2011). The acquisition of recursion: How formalism articulates the child's path. Biolinguistics, 5(1–2): 57–86.
Roeper, T. and de Villiers, J. G. (1991). Ordered decisions in the acquisition of wh-questions. In H. Goodluck, J. Weissenborn, and T. Roeper (eds), Theoretical Issues in Language Development. Hillsdale, NJ: Erlbaum.
Roeper, T. and de Villiers, J. G. (1994). Lexical links in the Wh-chain. In B. Lust, G. Hermon, and J. Kornfilt (eds), Syntactic Theory and First Language Acquisition: Cross linguistic perspectives, Volume II: Binding, dependencies and learnability. Hillsdale, NJ: Lawrence Erlbaum.
Roeper, T. and de Villiers, J. G. (2011). The acquisition path for wh-questions. In J. G. de Villiers and T. Roeper (eds), Handbook of Generative Approaches to Language Acquisition. New York: Springer.

Roeper, T. and Rohrbacher, B. (1994). Null subjects in early child English and the theory of economy of projection. University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-94-16.
Roeper, Thomas and Snyder, William (2004). Recursion as an analytic device in acquisition. In Proceedings of GALA 2003 (Generative Approaches to Language Acquisition). Utrecht: LOT Publications, 401–8.
Roeper, T. and Snyder, W. (2005). Language learnability and the forms of recursion. In A. M. Di Sciullo and R. Delmonte (eds), UG and External Systems: Language, brain and computation. Amsterdam: John Benjamins, 155–69.
Roeper, T. and Weissenborn, J. (1990). How to make parameters work: Comments on Valian. In L. Frazier and J. de Villiers (eds), Language Processing and Language Acquisition. Dordrecht: Kluwer, 117–62.
Roeper, T., Rooth, M., Mallis, L., and Akiyama, S. (1985). The problem of empty categories and bound variables in language acquisition. Unpublished manuscript, University of Massachusetts.
Rogers, Hartley (1967). Theory of Recursive Functions and Effective Computability. Columbus, OH: McGraw-Hill Book Company.
Rogers, James and Hauser, Marc (2010). The use of formal languages in artificial language learning: A proposal for distinguishing the differences between human and nonhuman animal learners. In Harry van der Hulst (ed.), Recursion and Human Language. Berlin: De Gruyter Mouton, 213–32.
Ronald, A., Happé, F., Bolton, P., Butcher, L. M., Price, T. S., Wheelwright, S., Baron-Cohen, S., and Plomin, R. (2006). Genetic heterogeneity between the three components of the autism spectrum: A twin study. Journal of the American Academy of Child and Adolescent Psychiatry, 45(6): 691–9.
Rondal, Jean (1985). Adult–child Interaction and the Process of Language Understanding. New York: Praeger.
Rondal, Jean, Bachelet, J. F., and Peree, F. (1985). Analyse du langage et des interactions verbales adulte-enfant [Analysis of language and of adult–child verbal interactions]. Bulletin d'Audiophonologie, 5.
Rooryck, J. and Vanden Wyngaerd, G. (1997). The self as other: A minimalist approach to zich and zichzelf in Dutch. Proceedings of NELS, 28.
Rose, Sharon and Walker, Rachel (2004). A typology of consonant agreement as correspondence. Language, 80: 475–531.
Rose, Yvan (2000). Headedness and prosodic licensing in the L1 acquisition of phonology. Unpublished Ph.D. thesis, McGill University.
Rose, Yvan (2009). Internal and external influences on child language productions. In François Pellegrino, Egidio Marsico, Ioana Chitoran, and Christophe Coupé (eds), Approaches to Phonological Complexity. Berlin: Mouton de Gruyter, 329–51.
Rosen, S., Adlard, A., and van der Lely, H. K. J. (2009). Backward and simultaneous masking in children with grammatical specific language impairment: No simple link between auditory and language abilities. Journal of Speech, Language and Hearing Research, 52(2): 396–411.
Rosenbaum, Peter (1967). The Grammar of English Predicate Complement Constructions. Cambridge, MA: MIT Press.
Ross, Alan S. C. (1937). An example of vowel-harmony in a young child. Modern Language Notes, November: 508–9.
Rothstein, S. (1983). The syntactic forms of predication. Unpublished Ph.D. thesis, MIT.

Rothstein, Susan (2004). Structuring Events: A study in the semantics of lexical aspect. Oxford: Blackwell.
Rotstein, C. and Winter, Y. (2004). Total adjectives vs. partial adjectives: Scale structure and higher-order modifiers. Natural Language Semantics, 12: 259–88.
Rowland, C. F. (2007). Explaining errors in children's questions. Cognition, 104(1): 106–34.
Rowland, C. F. and Pine, J. M. (2000). Subject–auxiliary inversion errors in wh-question acquisition: "what children do know?" Journal of Child Language, 27(1): 157–81.
Rowland, C. F., Pine, J. M., Lieven, E. V. M., and Theakston, A. L. (2005). The incidence of error in young children's wh-questions. Journal of Speech, Language, and Hearing Research, 48(2): 384–404.
Rubach, Jerzy and Booij, Geert E. (1990). Edge of constituent effects in Polish. Natural Language and Linguistic Theory, 8: 427–63.
Ruigendijk, E., Vasić, N., and Avrutin, S. (2006). Reference assignment: Using language breakdown to choose between theoretical approaches. Brain and Language, 96(3): 302–17.
Rumelhart, D. and McClelland, J. (eds) (1986a). Parallel Distributed Processing: Explorations in the microstructure of cognition, vol. 1: Foundations. Cambridge, MA: MIT Press, chapter 18.
Rumelhart, D. E. and McClelland, J. L. (1986b). On learning the past tenses of English verbs. In J. L. McClelland and D. E. Rumelhart (eds), Parallel Distributed Processing. Cambridge, MA: MIT Press, 2: 216–71.
Rumelhart, D. E. and McClelland, J. L. (1987). Learning the past tense of English verbs: Implicit rules or parallel distributed processing? In B. MacWhinney (ed.), Mechanisms of Language Acquisition. Hillsdale, NJ: Erlbaum.
Rusiecki, J. (1985). Adjectives and Comparison in English, vol. 31. Longman Linguistics Library. London: Longman Group Limited.
Russell, J. (1987). "Can we say … ?" Children's understanding of intensionality. Cognition, 2: 289–308.
Rvachew, Susan and Bernhardt, Barbara M. (2010). Clinical implications of dynamic systems theory for phonological development. American Journal of Speech-Language Pathology, 19: 34–50.
Rvachew, Susan and Jamieson, Donald G. (1989). Perception of voiceless fricatives by children with a functional articulation disorder. Journal of Speech and Hearing Disorders, 54: 193–208.
Rvachew, Susan and Nowak, Michele (2001). The effect of target-selection strategy on phonological learning. Journal of Speech, Language, and Hearing Research, 44: 610–23.
Sabel, J. (2000). Partial wh-movement and the typology of wh-questions. In U. Lutz, G. Müller, and A. von Stechow (eds), Wh-Scope Marking. Amsterdam: John Benjamins, 409–46.
Sachs, J. (1983). Talking about the there and then: The emergence of displaced reference in parent–child discourse. In K. E. Nelson (ed.), Children's Language, vol. 3. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sadock, J. (1984). Whither radical pragmatics? In D. Schiffrin (ed.), Meaning, Form and Use in Context: Linguistics applications. Washington: Georgetown University Roundtable, Georgetown University Press, 139–49.
Saffran, Jenny R. and Estes, Katharine Graf (2006). Mapping sound to meaning: Connections between learning about sounds and learning about words. In Robert V. Kail (ed.), Advances in Child Development and Behavior. New York: Elsevier, 1: 1–38.
Saffran, J. R. and Thiessen, E. D. (2003). Pattern induction by infant language learners. Developmental Psychology, 39: 484–94.

Saffran, Jenny R., Newport, Elissa L., and Aslin, Richard N. (1996a). Statistical learning by 8-month-old infants. Science, 274: 1926–8.
Saffran, Jenny R., Newport, Elissa L., and Aslin, Richard N. (1996b). Word segmentation: The role of distributional cues. Journal of Memory and Language, 25: 606–21.
Saffran, J. R., Johnson, E. K., Aslin, R. N., and Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1): 27–52.
Saffran, J. R., Hauser, M., Seibel, R. L., Kapfhamer, J., Tsao, F., and Cushman, F. (2008). Grammatical pattern learning by infants and cotton-top tamarin monkeys. Cognition, 107: 479–500.
Sag, I. (1976). Deletion and logical form. Unpublished Ph.D. thesis, MIT. [Published: Garland, New York (1980).]
Sakas, W. G. (2000). Ambiguity and the computational feasibility of syntax acquisition. Unpublished Ph.D. thesis, City University of New York.
Sakas, W. G. (2003). A word-order database for testing computational models of language acquisition. In E. Hinrichs and D. Roth (eds), Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003). Sapporo, Japan.
Sakas, W. G. and Fodor, J. D. (2001). The structural triggers learner. In S. Bertolo (ed.), Language Acquisition and Learnability. Cambridge: Cambridge University Press, 172–233.
Sakas, W. G. and Fodor, J. D. (2012). Disambiguating syntactic triggers. Language Acquisition, 19(2): 83–143.
Salidis, Joanna and Johnson, Jacqueline S. (1997). The production of minimal words: A longitudinal case study of phonological development. Language Acquisition, 6: 1–36.
Salles, Heloisa Maria Moreira-Lima (1997). Prepositions and the Syntax of Complementation. Unpublished Ph.D. thesis, University of Wales.
Salustri, M. and Hyams, N. (2003). Is there an analogue to the RI stage in the null subject languages? In Proceedings of the Annual Boston University Conference on Language Development, 27(2). Somerville, MA: Cascadilla Press, 692–703.
Salustri, M. and Hyams, N. (2006). Looking for the universal core of the RI stage. In V. Torrens and L. Escobar (eds), The Acquisition of Syntax in Romance Languages. Amsterdam: Benjamins, 159–82.
Salverda, A. P., Dahan, D., and McQueen, J. (2003). The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension. Cognition, 90: 51–89.
Sampson, G. (2005). The "Language Instinct" Debate, rev. edn. London: Continuum.
Sander, Eric K. (1972). When are speech sounds learned? Journal of Speech and Hearing Disorders, 37: 55–63.
Sanders, Nathan (2002). Preserving synchronic parallelism: Diachrony and opacity in Polish. In Proceedings of CLS 37: The Main Session. Chicago, IL: Chicago Linguistic Society, 501–16.
Sano, Tetsuya (2000). Issues on unaccusatives and passives in the acquisition of Japanese. In Proceedings of the Tokyo Conference on Psycholinguistics. Hituzi Syobo, 1: 1–21.
Sano, T. (2004). Scope relations of QP's and scrambling in the acquisition of Japanese. In J. van Kampen and S. Baauw (eds), Proceedings of the GALA 2003 Conference on Language Acquisition, 421–31.
Sano, T. and Hyams, N. (1994). Agreement, finiteness, and the development of null arguments. Proceedings of the North East Linguistics Society (NELS), 24: 543–58.
Sano, Tetsuya, Endo, Mika, and Yamakoshi, Kyoko (2001). Developmental issues in the acquisition of Japanese unaccusatives and passives. In Anna H.-J. Do, Laura Domínguez, and Aimee Johansen (eds), Proceedings of the Boston University Conference on Language Development 25. Somerville, MA: Cascadilla Press, 668–83.
Sanoudaki, E. (2003). Greek "strong" pronouns and the Delay of Principle B Effect. In M. Georgiafentis, E. Haeberli, and S. Varlokosta (eds), Reading Working Papers in Linguistics.
Sansavini, A., Bertoncini, J., and Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology, 33: 3–11.
Santelmann, L. (1995). The acquisition of verb second grammar in child Swedish. Unpublished Ph.D. thesis, Cornell University.
Santelmann, L. (1998). The acquisition of verb movement and spec-head relationships in child Swedish. In D. Adger, S. Pintzuk, B. Plunkett, and G. Tsoulas (eds), Specifiers: Minimalist approaches. Oxford: Oxford University Press.
Santelmann, Lynn and Jusczyk, Peter (1998). Sensitivity to discontinuous dependencies in language learners: Evidence for limitations in processing space. Cognition, 69: 105–34.
Santelmann, L., Berk, S., Austin, J., Someshankar, S., Lambert, K., and Lust, B. (2002). Continuity and development in the acquisition of inversion in yes/no questions. Journal of Child Language, 29(4): 813–40.
Sapir, E. (1944). Grading: A study in semantics. Philosophy of Science, 11: 93–116. Reprinted in D. G. Mandelbaum (ed.) (1949), Selected Writings of Edward Sapir in Language, Culture, and Personality. Berkeley: University of California, 122–49.
Saporta, Sol (1955). Frequency of consonant clusters. Language, 31: 25–30.
Saporta, Sol and Olson, Donald (1958). Classification of intervocalic clusters. Language, 34: 261–6.
Sarma, J. (1991). The acquisition of wh-questions in English. Unpublished Ph.D. thesis, University of Connecticut.
Sarnecka, B. W. and Gelman, S. A. (2004). Six does not just mean a lot: Preschoolers see number words as specific. Cognition, 92: 329–52.
Sato, Y., Sogabe, Y., and Mazuka, R. (2010). Development of hemispheric specialization for lexical pitch-accent in Japanese infants. Journal of Cognitive Neuroscience, 22: 2503–13.
Saudino, K. J., Dale, P. S., Oliver, B., Petrill, S. A., Richardson, V., Rutter, M., Simonoff, E., Stevenson, J., and Plomin, R. (1998). The validity of parent-based assessment of cognitive abilities of 2-year-olds. British Journal of Developmental Psychology, 16: 349–63.
Sauerland, U. (2004). Scalar implicatures in complex sentences. Linguistics and Philosophy, 27: 367–91.
Savage, C., Lieven, E., Theakston, A., and Tomasello, M. (2003). Testing the abstractness of children's linguistic representations: Lexical and structural priming of syntactic constructions in young children. Developmental Science, 6: 557–67.
Savage, C., Lieven, E., Theakston, A., and Tomasello, M. (2006). Structural priming as implicit learning in language acquisition: The persistence of lexical and structural priming in 4-year-olds. Language Learning and Development, 2: 27–49.
Sawada, Naoko and Murasugi, Keiko (2011). A cross-linguistic approach to the "erroneous" genitive subjects: Underspecification of tense in child grammar revisited. In Mihaela Pirvulescu, María Cristina Cuervo, Ana Teresa Pérez-Leroux, Jeffrey Steele, and Nelleke Strik (eds), Selected Proceedings of the 4th Conference on Generative Approaches to Language Acquisition North America (GALANA 2010). Somerville, MA: Cascadilla Proceedings Project, 241–8.
Say, T. and Clahsen, H. (2001). Words, rules and stems in the Italian mental lexicon. In S. Nooteboom, F. Weerman, and F. Wijnen (eds), Storage and Computation in the Language Faculty. Dordrecht: Kluwer.

Scerri, T. S. and Schulte-Körne, G. (2010). Genetics of developmental dyslexia. European Child and Adolescent Psychiatry, 19: 179–97.
Schaeffer, J. (1997). On the acquisition of object placement in Dutch and Italian. In A. Sorace, C. Heycock, and R. Shillcock (eds), Proceedings of GALA '97 Conference on Language Acquisition. Edinburgh: HCRC.
Scharinger, Mathias, Monahan, Philip J., and Idsardi, William J. (2012). Asymmetries in the processing of vowel height. Journal of Speech, Language and Hearing Research, 55: 903–19.
Scherf, K. S. (2005). Infant event representations and the acquisition of verb argument structure. Poster presented at the Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.
Scherf, K. S. and Gordon, P. (1998). Where does argument structure come from? Poster presented at the Biennial Meeting of the Society for Research in Child Development, Albuquerque, NM.
Scherf, K. S. and Gordon, P. (2000). Precursors to argument structure in infants' event representations. Poster presented at the International Conference on Infant Studies, July 2000, Brighton, England.
Schick, B., de Villiers, P. A., de Villiers, J. G., and Hoffmeister, R. (2007). Language and theory of mind: A study of deaf children. Child Development, 78: 376–96.
Schiller, Niels O. (1998). The effect of visually masked syllable primes on the naming latencies of words and pictures. Journal of Memory and Language, 39: 484–507.
Schiller, Niels O. (1999). Masked syllable priming of English nouns. Brain and Language, 68: 300–5.
Schiller, Niels O. (2000). Single word production in English: The role of subsyllabic units during phonological encoding. Journal of Experimental Psychology, 26: 512–28.
Schlenker, P. (2005). Minimize restrictors! Notes on definite descriptions, Condition C and epithets. Manuscript, UCLA.
Schmitt, Cristina (1996). Aspect and the syntax of noun phrases. Unpublished Ph.D. thesis, University of Maryland.
Schmitz, K. and Müller, N. (2008). Strong and clitic pronouns in the monolingual and bilingual acquisition of French and Italian. Bilingualism: Language and cognition, 11: 19–41.
Schmitz, K., Patuto, M., and Müller, N. (2012). The null-subject parameter at the interface between syntax and pragmatics: Evidence from bilingual German-Italian, German-French and Italian-French children. First Language, 32: 205–38.
Schmitz, M., Höhle, B., and Weissenborn, J. (2003). How pause length influences the perception of major syntactic boundaries in 6-month-old German infants. Paper presented at the conference on Generative Approaches to Language Acquisition (GALA), Groningen.
Scholl, B. (2001). Objects and attention: The state of the art. Cognition, 80: 1–46.
Scholnick, E. and Wing, C. (1995). Logic in conversation: Comparative studies of deduction in children and adults. Cognitive Development, 10: 319–45.
Schuele, C. M. and Nicholls, L. M. (2000). Relative clauses: Evidence of continued linguistic vulnerability in children with specific language impairment. Clinical Linguistics and Phonetics, 14(8): 563–85.
Schuele, C. M. and Tolbert, L. (2001). Omissions of obligatory relative markers in children with specific language impairment. Clinical Linguistics and Phonetics, 15(4): 257–74.
Schulz, P. (2003). Factivity: Its nature and acquisition. Tübingen: Niemeyer.

Schulz, Petra and Penner, Zvi (2002). How you can eat the apple and have it too: Evidence from the acquisition of telicity in German. In J. Costa and M. J. Freitas (eds), Proceedings of the GALA 2001 Conference on Language Acquisition.
Schulz, Petra and Wittek, Angelika (2003). Opening doors and sweeping floors: What children with Specific Language Impairment know about telic and atelic verbs. In B. Beachley, A. Brown, and F. Conlin (eds), Proceedings of the 27th BUCLD. Somerville, MA: Cascadilla Press, 727–38.
Schütze, C. T. (1999). Different rates of pronoun case error: Comments on Rispoli (1998). Journal of Child Language, 26(3): 749–55.
Schütze, C. T. R. (1997). INFL in child and adult language: Agreement, case and licensing. Unpublished Ph.D. thesis, MIT.
Schütze, Carson (2001a). On the nature of default case. Syntax, 4(3): 205–38.
Schütze, Carson (2001b). Productive inventory and case/agreement contingencies: A methodological note on Rispoli (1999). Journal of Child Language, 28: 507–15.
Schütze, Carson (2009). What it means (not) to know (number) agreement. In José M. Brucart, Anna Gavarró, and Jaume Solà (eds), Merging Features: Computation, interpretation, and acquisition. Oxford: Oxford University Press, 80–103.
Schütze, C. (2010). The status of nonagreeing don't and theories of root infinitives. Language Acquisition, 17: 235–71.
Schütze, C. T. and Wexler, K. (1996a). Subject case licensing and English root infinitives. In Andy Stringfellow, Dalia Cahana-Amitay, Elizabeth Hughes, and Andrea Zukowski (eds), Proceedings of the Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 20(2): 670–81.
Schütze, C. and Wexler, K. (1996b). What case acquisition data have to say about the components of INFL. Paper presented at the WHCSALT Conference.
Schwartz, R. G. and Goffman, L. (1995). Metrical patterns of words and production accuracy. Journal of Speech and Hearing Research, 38: 876–88.
Schwartz, Richard G. and Leonard, Laurence B. (1982). Do children pick and choose? An examination of phonological selection and avoidance. Journal of Child Language, 9: 319–36.
Schwarzschild, R. (2006). The role of dimensions in the syntax of noun phrases. Syntax, 9: 67–110.
Schwarzschild, R. (2008). The semantics of comparatives and other degree constructions. Language and Linguistics Compass, 2: 308–31.
Schwarzschild, R. and Wilkinson, K. (2002). Quantifiers in comparatives: A semantics of degree based on intervals. Natural Language Semantics, 10: 1–41.
Scobbie, James M. (1998). Interactions between the acquisition of phonetics and phonology. In M. Catherine Gruber, Derrick Higgins, Kenneth S. Olson, and Tamra Wysocki (eds), Proceedings from the Panels of the Chicago Linguistic Society's Thirty-fourth Meeting. Chicago: Chicago Linguistic Society, 34(2): 343–58.
Scobbie, James M., Gibbon, Fiona E., Hardcastle, William J., and Fletcher, Paul J. (1997). Covert contrast and the acquisition of phonetics and phonology. In W. Ziegler and K. Deger (eds), Clinical Phonetics and Linguistics. London: Whurr Publishers, 147–56.
Scobbie, James M., Gibbon, Fiona, Hardcastle, William J., and Fletcher, Paul (2000). Covert contrast as a stage in the acquisition of phonetics and phonology. In Michael B. Broe and Janet B. Pierrehumbert (eds), Papers in Laboratory Phonology V: Language acquisition and the lexicon. Cambridge: Cambridge University Press, 194–207.

Scott, R. M. and Fisher, C. (2009). 2-year-olds use distributional cues to interpret transitivity-alternating verbs. Language and Cognitive Processes, 24: 777–803.
Sebastián-Gallés, Núria and Bosch, Laura (2002). Building phonotactic knowledge in bilinguals: Role of early exposure. Journal of Experimental Psychology: Human Perception and Performance, 28: 974–89.
Segal, N. L. (1985). Monozygotic and dizygotic twins: A comparative analysis of mental ability profiles. Child Development, 56: 1051–8.
Seidl, A., Hollich, G., and Jusczyk, P. (2003). Early comprehension of subject and object wh-questions. Infancy, 4(3): 423–36.
Seidl, A. (2007). Infants' use and weighting of prosodic cues in clause segmentation. Journal of Memory and Language, 57: 24–48.
Seidl, Amanda and Buckley, Eugene (2005). On the learning of arbitrary phonological rules. Language Learning and Development, 1: 289–316.
Seidl, A. and Johnson, E. K. (2006). Infant word segmentation revisited: Edge alignment facilitates target extraction. Developmental Science, 9: 565–73.
Seidl, A. and Johnson, E. K. (2008). Boundary alignment enables 11-month-olds to segment vowel initial words from speech. Journal of Child Language, 35: 1–24.
Seidl, Amanda, Cristià, Alejandrina, Bernard, Amelie, and Onishi, Kristine H. (2009). Allophonic and phonemic contrasts in infants' learning of sound patterns. Language Learning and Development, 5: 191–202.
Selkirk, Elisabeth O. (1982). The syllable. In H. van der Hulst and N. Smith (eds), The Structure of Phonological Representations. Dordrecht: Foris, 2: 337–83.
Selkirk, Elisabeth (1984). On the major class features and syllable theory. In Mark Aronoff and Richard T. Oehrle (eds), Language and Sound Structure. Cambridge, MA: MIT Press, 107–36.
Selkirk, Elisabeth O. (1995). The prosodic structure of function words. In J. L. Morgan and K. Demuth (eds), Signal to Syntax: Bootstrapping from speech to grammar in early acquisition.
Hillsdale, NJ: Erlbaum, 187–214.
Selkirk, Elisabeth O. (2011). The syntax–phonology interface. In J. Goldsmith, J. Riggle, and A. Yu (eds), The Handbook of Phonological Theory, 2nd edn. Malden, MA: Wiley-Blackwell, 435–84.
Semel, E., Wiig, E., and Secord, W. (1987). Clinical Evaluation of Language Fundamentals-Revised. San Antonio, TX: The Psychological Corporation.
Senghas, Ann, Kim, John J., Pinker, Steven, and Collins, Christopher (1991). Plurals-inside-compounds: Morphological constraints and their implications for acquisition. Paper presented at the Sixteenth Annual Boston University Conference on Language Development, Boston, MA, October 1991.
Serratrice, L. (2005). The role of discourse pragmatics in the acquisition of subjects in Italian. Applied Psycholinguistics, 26: 437–62.
Serratrice, L. (2007). Cross-linguistic influence in the interpretation of anaphoric and cataphoric pronouns in English–Italian bilingual children. Bilingualism: Language and Cognition, 10: 225–38.
Serratrice, L. and Sorace, A. (2002). Overt and null subjects in monolingual and bilingual Italian acquisition. In B. Beachley, A. Brown, and F. Conlin (eds), Proceedings of the 27th Boston University Conference on Child Language Development. Somerville, MA: Cascadilla Press, 739–50.

Serratrice, L., Sorace, A., and Paoli, S. (2004). Crosslinguistic influence at the syntax–pragmatics interface: Subjects and objects in English–Italian bilingual and monolingual acquisition. Bilingualism: Language and Cognition, 7: 183–205.
Seuren, P. (1978). The structure and selection of positive and negative gradable adjectives. In D. Farkas, W. M. Jacobsen, and K. W. Todrys (eds), Papers from the Parasession on the Lexicon, 14th Regional Meeting of the Chicago Linguistic Society. Chicago, IL: Chicago Linguistic Society, 336–46.
Seymour, H., Roeper, T., and de Villiers, J. (2005). The DELV-NR (Norm-referenced version): The Diagnostic Evaluation of Language Variation. San Antonio: The Psychological Corporation.
Shaffer, T. M. and Ehri, L. C. (1980). Seriators' and non-seriators' comprehension of comparative adjective forms. Journal of Psycholinguistic Research, 9: 187–204.
Shattuck-Hufnagel, S. and Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25: 193–247.
Shatz, M. (1983). Communication. In P. H. Mussen (series ed.) and J. Flavell and E. Markman (vol. eds), Handbook of Child Psychology, Vol. 3: Cognitive development, 4th edn. New York: Wiley, 3: 841–89.
Shatz, M. and Wilcox, S. (1991). Constraints on the acquisition of English modals. In S. Gelman and J. Byrnes (eds), Perspectives on Language and Thought. New York: Cambridge University Press, 319–53.
Shatz, M., Wellman, H., and Silber, S. (1983). The acquisition of mental verbs: A systematic investigation of first references to mental state. Cognition, 14: 301–21.
Shatz, M., Billman, D., and Yaniv, I. (1986). Early occurrences of English auxiliaries in children's speech. Unpublished manuscript, University of Michigan, Ann Arbor.
Shepherd, S. (1982). From deontic to epistemic: An analysis of modals in the history of English, creoles, and language acquisition. In A.
Ahlqvist (ed.), Papers from the 5th International Conference on Historical Linguistics. Amsterdam: Benjamins, 316–23.
Sherman-Cohen, J. and Lust, B. (1993). Children are in control. Cognition, 46: 1–51.
Shi, L., Griffiths, T., Feldman, N., and Sanborn, A. (2010). Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin and Review, 17(4): 443–64.
Shi, R. and Gauthier, B. (2005). Recognition of function words in 8-month-old French-learning infants. Journal of the Acoustical Society of America, 117: 2426–7.
Shi, R. and Lepage, M. (2008). The effect of functional morphemes on word segmentation in preverbal infants. Developmental Science, 11: 407–13.
Shi, R. and Melançon, A. (2010). Syntactic categorization in French-learning infants. Infancy, 15: 517–33.
Shi, R., Morgan, J. L., and Allopenna, P. (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language, 25: 169–201.
Shi, R., Cutler, A., Werker, J., and Cruickshank, M. (2006a). Frequency and form as determinants of functor sensitivity in English-acquiring infants. Journal of the Acoustical Society of America, 119: EL61–EL66.
Shi, R., Werker, J. F., and Cutler, A. (2006b). Recognition and representation of function words in English-learning infants. Infancy, 10: 187–98.
Shieber, Stuart (1985). Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8: 333–43.

Shimada, Hiroyuki and Sano, Tetsuya (2007). A-chains and unaccusative–unergative distinction in the child grammar: The acquisition of Japanese Te-iru constructions. In Alyona Belikova, Luisa Meroni, and Mari Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA). Somerville, MA: Cascadilla Proceedings Project, 386–93.
Shimoyama, J. (to appear). Clausal comparatives and cross-linguistic variation. In S. Lima, K. Mullin, and B. Smith (eds), 39th Meeting of the North East Linguistic Society. Amherst, MA: GLSA, University of Massachusetts, Amherst.
Shimpi, P., Gamez, P., Huttenlocher, J., and Vasilyeva, M. (2007). Syntactic priming in 3- and 4-year-old children: Evidence for abstract representations of transitive and dative forms. Developmental Psychology, 43: 1334–46.
Shirai, Yasuhiro (1998). The acquisition of tense–aspect marking in Japanese as a second language. Language Learning, 48: 245–79.
Shirai, Yasuhiro and Andersen, Roger W. (1995). The acquisition of tense/aspect morphology: A prototype account. Language, 71: 743–62.
Shirai, Yasuhiro, Slobin, Dan, and Weist, Richard (1998). Introduction to the acquisition of tense–aspect morphology. First Language, 245–53.
Shlonsky, U. (2009). Hebrew as a partial null-subject language. Studia Linguistica, 63: 133–57.
Shukla, M., Nespor, M., and Mehler, J. (2007). An interaction between prosody and statistics in the segmentation of fluent speech. Cognitive Psychology, 54: 1–32.
Shultz, T. R. and Gerken, L. A. (2005). A model of infant learning of word stress. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, 2015–20.
Shulze, C., Grassmann, S., and Tomasello, M. (2010). Relevance inferences in 3-year-olds. Proceedings from the 35th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Shvachkin, N. Kh. (1948/1973).
The development of phonemic speech perception in early childhood. Reprinted (1973) in Charles A. Ferguson and Dan I. Slobin (eds), Studies of Child Language Development. New York: Holt, Rinehart and Winston, 91–127.
Siegel, Dorothy (1979). Topics in English Morphology. New York: Garland Press. [Unpublished Ph.D. thesis, MIT, Cambridge, Massachusetts, 1974.]
Siewierska, A. (1984). The Passive: A Comparative Linguistic Analysis. London: Croom Helm.
Sigurðsson, H. Á. (2011). Conditions on argument drop. Linguistic Inquiry, 42: 267–304.
Sigurðsson, H. Á. and Egerland, V. (2009). Impersonal null-subjects in Icelandic and elsewhere. Studia Linguistica, 63: 158–85.
Sigurjónsdóttir, S. (1992). Binding in Icelandic: Evidence from language acquisition. Unpublished Ph.D. thesis, University of California.
Sigurjónsdóttir, S. (1999). Root infinitives and null subjects in early Icelandic. Proceedings of BUCLD, 23: 630–41.
Sigurjónsdóttir, S. (2005). The different properties of root infinitives and finite verbs in the acquisition of Icelandic. In A. Brugos, M. R. Clark-Cotton, and S. Ha (eds), Proceedings of the 29th Annual Boston Conference on Language Development. Somerville, MA: Cascadilla Press, 2: 540–51.
Sigurjónsdóttir, S. and Coopmans, P. (1996). The acquisition of anaphoric relations in Dutch. In W. Philip and F. Wijnen (eds), Amsterdam Series in Child Language Development. Amsterdam: Instituut Algemene Taalwetenschap, 5: 68.
Siloni, T. (1996). Hebrew noun phrases: Generalized noun raising. In A. Belletti and L. Rizzi (eds), Parameters and Functional Heads: Essays in Comparative Syntax. Oxford: Oxford University Press, 239–67.

Silva-Corvalán, C. (1977). A discourse study of the Spanish spoken by Mexican-Americans in West Los Angeles. Unpublished Ph.D. thesis, University of California.
Silva-Corvalán, C. and Sánchez-Walker, N. (2007). Subjects in early dual language development: A case study of a Spanish–English bilingual child. In K. Potowski and R. Cameron (eds), Spanish in Contact: Policy, social, and linguistic inquiries. Amsterdam: John Benjamins, 3–22.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., and Hirschberg, J. (1992). ToBI: A standard for labeling English prosody. In Proceedings of the 1992 International Conference on Spoken Language Processing, 867–70.
Simmons, T. R., Flax, J. F., Azaro, M. A., Hayter, J. E., Justice, L. M., Petrill, S. A., Bassett, A. S., Tallal, P., Brzustowicz, L. M., and Bartlett, C. W. (2010). Increasing genotype–phenotype model determinism: Application to bivariate reading/language traits and epistatic interaction in language-impaired families. Human Heredity, 70: 232–44.
Sinclair de Zwart, H. (1967). Acquisition du langage et développement de la pensée. Paris: Dunod.
Sinclair de Zwart, H. (1969). Developmental psycholinguistics. In D. Elkind and J. H. Flavell (eds), Studies in Cognitive Development: Essays in honor of Jean Piaget. New York: Oxford University Press, 315–36.
Sipser, Michael (1997). Introduction to the Theory of Computation. Boston, MA: PWS Publishing Company.
Siqueland, Einar R. and DeLucia, Clement A. (1969). Visual reinforcement of nonnutritive sucking in human infants. Science, 165: 1144–6.
Skarabela, B. (2006). The role of social cognition in early syntax: The case of joint attention in argument realization in child Inuktitut. Unpublished Ph.D. thesis, Boston University.
Skordos, D. and Papafragou, A. (under review). Children's derivation of scalar implicatures: Alternatives and relevance.
Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastián-Gallés, N., Alves Limissuri, R., and Peperkamp, S. (2009). Language-specific stress perception by 9-month-old French and Spanish infants. Developmental Science, 12: 914–19.
Skoruppa, K., Cristià, A., Peperkamp, S., and Seidl, A. (2011). English-learning infants' perception of word stress patterns. Journal of the Acoustical Society of America, 130: EL50–EL55.
Skoruppa, K., Mani, N., and Peperkamp, S. (2013). Toddlers' processing of phonological alternations: Early compensation for assimilation in English and French. Child Development, 84(1): 313–30.
Skousen, Royal (1989). Analogical Modeling of Language. Dordrecht: Kluwer Academic Press.
Skousen, Royal, Lonsdale, Deryle, and Parkinson, Dilworth B. (eds) (2002). Analogical Modeling: An Exemplar-based Approach to Language. Amsterdam: John Benjamins.
Slabakova, Roumyana (2001). Telicity in the Second Language. Amsterdam: John Benjamins.
Slabakova, Roumyana (2002). Recent research on the acquisition of aspect: An embarrassment of riches? Second Language Research, 18: 172–88.
Slabakova, R. and Montrul, S. (2007). L2 acquisition at the grammar–discourse interface: Aspectual shifts in L2 Spanish. In J. Liceras, H. Zobl, and H. Goodluck (eds), The Role of Features in Second Language Acquisition. Mahwah, NJ: Lawrence Erlbaum.
Sledd, James H. (1966). Breaking, Umlaut, and the Southern Drawl. Language, 42: 18–41.
The SLI Consortium (2002). A genome-wide scan identifies two novel loci involved in specific language impairment. American Journal of Human Genetics, 70: 384–98.

The SLI Consortium (2004). Highly significant linkage to the SLI1 locus in an expanded sample of individuals affected by specific language impairment in kindergarten children. American Journal of Human Genetics, 74: 1225–38.
Slobin, Dan (1966). Grammatical transformations and sentence comprehension in childhood and adulthood. Journal of Verbal Learning and Verbal Behavior, 5: 219–27.
Slobin, D. I. (1968). Recall of full and truncated passive sentences in connected discourse. Journal of Verbal Learning and Verbal Behavior, 7: 876–81.
Slobin, Dan (1973). Cognitive prerequisites for the development of grammar. In C. A. Ferguson and D. Slobin (eds), Studies of Child Language Development. New York: Holt, Rinehart and Winston, 175–208.
Slobin, Dan (1982). Universal and particular in the acquisition of language. In E. Wanner and L. Gleitman (eds), Language Acquisition: The state of the art. Cambridge: Cambridge University Press, 128–70.
Slobin, Dan (1985). Cross-linguistic evidence for the language-making capacity. In D. Slobin (ed.), The Cross-linguistic Study of Language Acquisition, Vol. 2: Theoretical issues. Hillsdale, NJ: Lawrence Erlbaum, 2: 1157–249.
Slobin, D. I. (1994). Passives and alternatives in children's narratives in English, Spanish, German, and Turkish. In B. Fox and P. Hopper (eds), Voice: Form and function. Amsterdam/Philadelphia: John Benjamins, 341–64.
Slobin, Dan (1997). The universal, the typological and the particular in language acquisition. In D. Slobin (ed.), The Crosslinguistic Study of Language Acquisition, Vol. 5: Expanding the contexts. Mahwah, NJ: Lawrence Erlbaum, 5: 1–39.
Slobin, Dan (2008). The child learns to think for speaking: Puzzles of crosslinguistic diversity in form–meaning mappings. Studies in Language Sciences, 7: 1–13.
Sluijter, A. M. C. (1995). Phonetic Correlates of Stress and Accent. The Hague: Holland Academic Graphics.
Sluijter, A. M. C. and van Heuven, V. J. (1995).
Spectral balance as an acoustic correlate of linguistic stress. Journal of the Acoustical Society of America, 100: 2471–84.
Smit, Ann B. (1993a). Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project: Consonant singletons. Journal of Speech and Hearing Research, 36: 533–47.
Smit, Ann B. (1993b). Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project: Word-initial consonant clusters. Journal of Speech and Hearing Research, 36: 931–47.
Smit, Ann B., Hand, Linda, Freilinger, J. Joseph, Bernthal, John E., and Bird, Ann (1990). The Iowa articulation norms project and its Nebraska replication. Journal of Speech and Hearing Disorders, 55: 779–98.
Smith, Carlota S. (1980a). The acquisition of time talk: Relations between child and adult grammars. Journal of Child Language, 7: 263–78.
Smith, C. L. (1980b). Quantifiers and question answering in young children. Journal of Experimental Child Psychology, 30: 191–205.
Smith, Carlota (1991). The Parameter of Aspect. Dordrecht: Kluwer.
Smith, Jennifer L. (1999). Noun faithfulness and accent in Fukuoka Japanese. In Sonya Bird, Andrew Carnie, Jason D. Haugen, and Peter Norquest (eds), Proceedings of WCCFL 18. Somerville, MA: Cascadilla Press, 519–31.
Smith, Jennifer (2000). Positional faithfulness and learnability in Optimality Theory. In Rebecca Daly and Anastasia Riehl (eds), Proceedings of ESCOL 99. Ithaca, NY: CLC Publications, 203–14.

Smith, Jennifer L. (2011). Category-specific effects. In Marc van Oostendorp, Colin Ewen, Beth Hume, and Keren Rice (eds), The Blackwell Companion to Phonology. Malden, MA: Wiley-Blackwell, 2439–63.
Smith, L. and Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106: 1558–68.
Smith, L. B., Cooney, N., and McCord, C. (1986). What is high? The development of reference points for high and low. Child Development, 57: 583–602.
Smith, L. B., Rattermann, M. J., and Sera, M. (1988). Higher and lower: Comparative and categorical interpretations by children. Cognitive Development, 3: 341–57.
Smith, Neil V. (1973). The Acquisition of Phonology: A case study. Cambridge: Cambridge University Press.
Smith, Neil (2010). Acquiring Phonology: A cross-generational case study. Cambridge: Cambridge University Press.
Smith, S. D., Kimberling, W. J., Pennington, B. F., and Lubs, H. A. (1983). Specific reading disability: Identification of an inherited form through linkage analysis. Science, 219(4590): 1345–7.
Smith, S. D., Grigorenko, E., Willcutt, E., Pennington, B. F., Olson, R. K., and DeFries, J. C. (2010). Etiologies and molecular mechanisms of communication disorders. Journal of Developmental and Behavioral Pediatrics, 31: 555–63.
Smoczyńska, M. (1985). The acquisition of Polish. In Dan I. Slobin (ed.), The Crosslinguistic Study of Language Acquisition I: The data. Hillsdale, NJ: Lawrence Erlbaum, 595–686.
Smolensky, Paul (1996a). The initial state and "Richness of the Base" in Optimality Theory. Rutgers Optimality Archive, ROA-154.
Smolensky, Paul (1996b). On the comprehension/production dilemma in child language. Linguistic Inquiry, 27: 720–31.
Smolensky, Paul and Legendre, Géraldine (2006). The Harmonic Mind: From neural computation to optimality-theoretic grammar. Cambridge, MA: MIT Press.
Smolensky, P., Legendre, G., and Miyata, Y. (1992).
Principles for an integrated connectionist/symbolic theory of higher cognition. Report No. CU-CS-600-92, Computer Science Department, University of Colorado at Boulder.
Snedeker, J. and Gleitman, L. (2004). Why it is hard to label our concepts. In D. G. Hall and S. R. Waxman (eds), Weaving a Lexicon. Cambridge, MA: MIT Press, 257–94.
Snedeker, J. and Trueswell, J. C. (2004). The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology, 49(3): 238–99.
Sneed German, Elisa (2008). Input in the acquisition of genericity. Unpublished Ph.D. thesis, Northwestern University.
Snow, D. and Balog, H. L. (2002). Do children produce the melody before words? A review of developmental intonation research. Lingua, 112: 1025–58.
Snyder, William (1995). Language acquisition and language variation: The role of morphology. Unpublished Ph.D. thesis, MIT.
Snyder, William (1999). Compounds and complex predicates in English, Basque, and Romance. Poster presented at the 1999 Meeting of the International Association for the Study of Child Language, San Sebastian, Spain, 16 July 1999. Text available at .
Snyder, W. (2001). On the nature of syntactic variation: Evidence from complex predicates and complex word-formation. Language, 77: 324–42.

Snyder, William (2005). Motion predicates and the compounding parameter: A new approach. Paper presented in the Linguistics Colloquium Series, University of Maryland, College Park.
Snyder, W. (2007). Child Language: The parametric approach. Oxford: Oxford University Press.
Snyder, William (2008). Children's grammatical conservatism: Implications for linguistic theory. In Tetsuya Sano et al. (eds), An Enterprise in the Cognitive Science of Language: A Festschrift for Yukio Otsu. Tokyo: Hituzi Syobo, 41–51.
Snyder, William (2011). Children's grammatical conservatism: Implications for syntactic theory. In Nick Danis, Kate Mesh, and Hyunsuk Sung (eds), Proceedings of the 35th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 1–20.
Snyder, William (2012). Parameter theory and motion predicates. In Violeta Demonte and Louise McNally (eds), Telicity, Change, and State: A cross-categorial view of event structure. Oxford: Oxford University Press, 279–99.
Snyder, William and Chen, Deborah (1997). The syntax–morphology interface in the acquisition of French and English. In K. Kusumoto (ed.), Proceedings of NELS 27 (North East Linguistic Society). Amherst, MA: GLSA.
Snyder, W. and Hyams, N. (2015). Minimality effects in children's passives. In E. Di Domenico, C. Hamann, and S. Matteini (eds), Structures, Strategies and Beyond. Amsterdam: John Benjamins.
Snyder, W. and Roeper, T. (2004). Learnability and recursion across categories. In Proceedings of the 28th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Snyder, W. and Stromswold, K. (1997). The structure and acquisition of English dative constructions. Linguistic Inquiry, 28: 281–317.
Snyder, W., Hyams, N., and Crisma, P. (1995). Romance auxiliary selection with reflexive clitics: Evidence for early knowledge of unaccusativity. In E.
Clark (ed.), Proceedings of the Twenty-sixth Annual Child Language Research Forum. Stanford, CA: CSLI.
Snyder, William, Felber, Sarah, Kang, Bosook, and Lillo-Martin, Diane (2001). Path phrases and compounds in the acquisition of English. Paper presented at the 26th Boston University Conference on Language Development, Boston University.
Snyder, William, Lillo-Martin, Diane, and Naigles, Letitia (2014). The compounding parameter: New evidence from IPL. Unpublished manuscript, University of Connecticut, Storrs, 1 September 2014.
So, L. K. H. and Dodd, B. J. (1995). The acquisition of phonology by Cantonese-speaking children. Journal of Child Language, 22: 473–95.
Soderstrom, Melanie, Wexler, Ken, and Jusczyk, Peter W. (2002). English-learning toddlers' sensitivity to agreement morphology in receptive grammar. In B. Skarabela, S. Fish, and A. H.-J. Do (eds), Proceedings of the 26th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 643–52.
Soderstrom, M., Seidl, A., Kemler Nelson, D. G., and Jusczyk, P. (2003). The prosodic bootstrapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Language, 49: 249–67.
Soderstrom, M., Kemler Nelson, D. G., and Jusczyk, P. W. (2005). Six-month-olds recognize clauses embedded in different passages of fluent speech. Infant Behavior and Development, 28: 87–94.

Soderstrom, Melanie, Mathis, Don, and Smolensky, Paul (2006). Abstract genomic encoding of Universal Grammar in Optimality Theory. In Paul Smolensky and Géraldine Legendre (eds), The Harmonic Mind: From neural computation to optimality-theoretic grammar. Cambridge, MA: MIT Press, 403–71.
Soderstrom, Melanie, White, Katherine, Conwell, Eric, and Morgan, James (2007). Receptive grammatical knowledge of familiar content words and inflection in 16-month-olds. Infancy, 12(1): 1–29.
Solomonoff, Ray J. (1978). Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, 24: 422–32.
Sommerstein, Alan (1974). On phonotactically motivated rules. Journal of Linguistics, 10: 71–94.
Song, Jae Yung, Sundara, Megha, and Demuth, Katherine (2009). Phonological constraints on children's production of English third person singular -s. Journal of Speech, Language, and Hearing Research, 52: 623–42.
Sorace, A. (1995). Acquiring linking rules and argument structures in a second language: The unaccusative/unergative distinction. In L. Eubank, L. Selinker, and M. S. Sharwood (eds), The Current State of Interlanguage. Amsterdam: Benjamins.
Sowalsky, E., Hacquard, V., and Roeper, T. (2009). PP opacity. In J. Crawford et al. (eds), Proceedings of the 3rd Conference on Generative Approaches to Language Acquisition North America (GALANA 2008). Somerville, MA: Cascadilla Press, 253–61.
Spector, B. (2007). Aspects of the pragmatics of plural morphology: On higher-order implicatures. In U. Sauerland and P. Stateva (eds), Presupposition and Implicature in Compositional Semantics. New York: Palgrave Macmillan, 71–120.
Spenader, J., Smits, E. J., and Hendriks, P. (2009). Coherent discourse solves the Pronoun Interpretation Problem. Journal of Child Language, 36: 23–52.
Spencer, Andrew (1986). Towards a theory of phonological development. Lingua, 68: 3–38.
Spencer, Andrew (1991). Morphological Theory.
Oxford: Blackwell.
Sperber, D. and Wilson, D. (1986). Relevance: Communication and cognition (2nd edn 1995). Cambridge, MA: Harvard University Press.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74(11): 1–29.
Spinath, F. M., Price, T. S., Dale, P. S., and Plomin, R. (2004). The genetic and environmental origins of language disability and ability. Child Development, 75(2): 445–54.
Spoehr, Kathryn T. and Smith, Edward E. (1973). The role of syllables in perceptual processing. Cognitive Psychology, 5: 71–89.
Sportiche, D. (1992). Clitic constructions. Unpublished manuscript, University of California, Los Angeles.
Spring, D. R. and Dale, P. (1977). Discrimination of linguistic stress in early infancy. Journal of Speech and Hearing Research, 20: 224–32.
Stabler, Edward P. (2009). Computational models of language universals. In Christiansen, Collins, and Edelman (eds), Language Universals. Oxford: Oxford University Press, 200–23.
Stager, C. L. and Werker, J. F. (1997). Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388: 381–2.
Stalnaker, R. (1978). Assertion. In P. Cole (ed.), Syntax and Semantics. New York: Academic Press, 9: 315–22.

Stampe, David (1969). The acquisition of phonetic representation. In Robert I. Binnick, Alice Davison, Georgia M. Green, and Jerry L. Morgan (eds), Papers from the Fifth Regional Meeting of the Chicago Linguistic Society. Chicago: Chicago Linguistic Society, 443–54.
Stampe, David (1979). A Dissertation on Natural Phonology. New York: Garland.
Stathopoulou, N. (2007). Producing relative clauses in Greek: Evidence from Down syndrome. Essex Graduate Student Papers in Language and Linguistics, 9: 104–25.
Steedman, M. (1997). Surface Structure and Interpretation. Cambridge, MA: MIT Press.
Stein, C. M., Schick, J. H., Taylor, H. G., Shriberg, L. D., Millard, C., Kundtz-Kluge, A., Russo, K., Minich, N., Hansen, A., Freebairn, L. A., Elston, R. C., Lewis, B. A., and Iyengar, S. K. (2004). Pleiotropic effects of a chromosome 3 locus on speech-sound disorder and reading. American Journal of Human Genetics, 74: 283–97. PMCID: PMC1181926.
Stemberger, Joseph P. (1993). Vowel dominance in overregularization. Journal of Child Language, 20: 503–21.
Stemberger, Joseph and Bernhardt, Barbara (1997). Syntactic development limited by phonological development. Paper presented at the 38th Meeting of the Psychonomic Society, Philadelphia, PA.
Stemberger, Joseph and Stoel-Gammon, Carol (1991). The underspecification of coronals: Evidence from language acquisition and performance errors. In Carole Paradis and Jean-François Prunet (eds), The Special Status of Coronals: Internal and external evidence. San Diego: Academic Press, 181–99.
Stephany, Ursula (1981). Verbal grammar in Modern Greek early child language. In P. S. Dale and D. Ingram (eds), Child Language: An international perspective. Baltimore, MD: University Park Press.
Stephany, U. (1986). Modality. In Paul Fletcher and Michael Garman (eds), Language Acquisition. Cambridge: Cambridge University Press.
Stephany, U. (1995). The acquisition of Greek. In D. I.
Slobin (ed.), The Crosslinguistic Study of Language Acquisition, vol. 4. Hillsdale, NJ: Lawrence Erlbaum. Steriade, Donca (1982). Greek prosodies and the nature of syllabification. Unpublished Ph.D. thesis, MIT. Steriade, Donca (1999). Alternatives to the syllabic interpretation of consonantal phonotactics. In Osamu Fujimura, Brian D. Joseph, and Bohumil Palek (eds), Proceedings of the 1998 Linguistics and Phonetics Conference. Prague: The Karolinum Press, 205–42. Steriade, Donca (2000). Paradigm uniformity and the phonetics–phonology boundary. In M. Broe and J. Pierrehumbert (eds), Papers in Laboratory Phonology 5. Cambridge: Cambridge University Press. Stern, C. and Stern, W. (1907). Die Kindersprache: Eine psychologische und sprachtheoretische Untersuchung [Child language: A psychological and linguistic investigation]. Leipzig: Barth. Stock, Haily, Graham, Susan, and Chambers, Craig (2009). Generic language and speaker confidence guide preschoolers' inferences about novel animate kinds. Developmental Psychology, 45: 884–8. Stoel-Gammon, Carol (1983). Constraints on consonant–vowel sequences in early words. Journal of Child Language, 10: 455–7. Stoel-Gammon, Carol (1985). Phonetic inventories, 15–24 months: A longitudinal study. Journal of Speech and Hearing Research, 28: 505–12. Stoel-Gammon, Carol (1996). On the acquisition of velars in English. In Barbara Bernhardt, John Gilbert, and David Ingram (eds), Proceedings of the UBC International Conference on Phonological Acquisition. Somerville, MA: Cascadilla Press, 201–14.

Stoel-Gammon, Carol (2011). Relationships between lexical and phonological development in young children. Journal of Child Language, 38: 1–34. Stoel-Gammon, Carol and Cooper, Judith A. (1984). Patterns of early lexical and phonological development. Journal of Child Language, 11: 247–71. Stokes, Stephanie F., Wong, Anita M.-Y., Fletcher, Paul, and Leonard, Laurence B. (2006). Nonword repetition and sentence repetition as clinical markers of specific language impairment: The case of Cantonese. Journal of Speech, Language and Hearing Research, 49: 219–36. Stoll, Sabine (1998). Acquisition of Russian aspect. First Language, 18: 351–77. Storkel, Holly L. (2001). Learning new words: Phonotactic probability in language development. Journal of Speech, Language and Hearing Research, 44: 1321–37. Storkel, Holly L. (2009). Developmental differences in the effects of phonological, lexical, and semantic variables on word learning by infants. Journal of Child Language, 36: 291–321. Stowell, Timothy (1981). Origins of phrase structure. Unpublished Ph.D. thesis, MIT. Stowell, Timothy (1982a). Conditions on reanalysis. In Alec Marantz and Tim Stowell (eds), Papers in Syntax, MIT Working Papers in Linguistics 4. Cambridge, MA: MIT Working Papers in Linguistics, 245–69. Stowell, T. (1982b). The tense of infinitives. Linguistic Inquiry, 13: 561–70. Stowell, T. (1991). Small clause restructuring. In R. Freidin (ed.), Principles and Parameters in Comparative Grammar. Cambridge, MA: MIT Press, 182–218. Stowell, T. and Beghelli, F. (1994). The direction of quantifier movement. Paper presented at the GLOW Conference, Vienna. Straight, H. Stephen (1980). Auditory versus articulatory phonological processes and their development in children. In Grace H. Yeni-Komshian, James F. Kavanagh, and Charles A. Ferguson (eds), Child Phonology, Volume 1: Production. New York: Academic Press, 1: 43–67. Strange, Winifred and Broen, Patricia A. (1980).
Perception and production of approximant consonants by 3-year-olds: A first study. In Grace H. Yeni-Komshian, James F. Kavanagh, and Charles A. Ferguson (eds), Child Phonology, 2: Perception. New York: Academic Press, 117–54. Straus, K. (2008). Validations of a probabilistic model of language learning. Unpublished Ph.D. thesis, Northeastern University, Boston, MA. Streeter, Lynn A. (1976). Language perception of 2-month-old infants shows effects of both innate mechanisms and experience. Nature, 259: 39–41. Stromswold, K. (1988a). Linguistic representation of children's wh-questions. Papers and Reports on Child Language (Stanford University), 27: 107–14. Stromswold, K. (1988b). The structure of children's wh-questions. Paper presented at the 13th Annual Boston University Conference on Language Development. Stromswold, K. (1992). Learnability and the acquisition of auxiliaries. Unpublished Ph.D. thesis, MIT. Stromswold, Karin (1996). Analyzing children's spontaneous speech. In Dana McDaniel, Cecile McKee, and Helen Smith Cairns (eds), Methods for Assessing Children's Syntax. Cambridge, MA: MIT Press, 23–53. Stromswold, K. (2001). The heritability of language: A review and meta-analysis of twin, adoption, and linkage studies. Language, 77(4): 647–723. Stromswold, K. (2008). The genetics of speech and language impairments. The New England Journal of Medicine, 359: 2381–3.

Stromswold, K. and Snyder, W. (1995). Acquisition of datives, particles, and related constructions: Evidence for a parametric account. In D. MacLaughlin and S. McEwen (eds), Proceedings of the 19th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Stump, G. (1993). On rules of referral. Language, 69: 449–79. Su, Y. (2003). Chinese children's ziji. Paper presented at the Boston University Conference on Language Development. Sudhalter, V. and Braine, M. (1985). How does comprehension of passives develop? A comparison of actional and experiential verbs. Journal of Child Language, 12: 455–70. Sugisaki, K. (1999). Japanese passives in acquisition. University of Connecticut Working Papers in Linguistics, 10: 145–56. Sugisaki, K. (2012). LF wh-movement and its locality constraints in child Japanese. Language Acquisition, 19: 174–81. Sugisaki, Koji and Isobe, Miwa (2000). Resultatives result from the Compounding Parameter: On the acquisitional correlation between resultatives and N–N compounds in Japanese. In Billerey and Lillehaugen (eds), WCCFL 19 Proceedings. Somerville, MA: Cascadilla Press, 493–506. Sugisaki, K. and Isobe, M. (2001). Quantification without qualification without plausible dissent. In J.-Y. Kim and A. Werle (eds), The Proceedings of SULA 1. Amherst, MA: GLSA Publications. Sugisaki, Koji and Snyder, William (2002). Preposition stranding and the compounding parameter: A developmental perspective. In Barbora Skarabela, Sarah Fish, and Anna H.-J. Do (eds), Proceedings of the 26th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 677–88. Sugisaki, K. and Snyder, W. (2003). Do parameters have default values? Evidence from the acquisition of English and Spanish. In Y. Otsu (ed.), Proceedings of the Fourth Tokyo Conference on Psycholinguistics. Tokyo: Hituzi Syobo. Sugisaki, K. and Snyder, W. (2005).
Evaluating the variational model of language acquisition. In K. U. Deen, J. Nomura, B. Schulz, and B. D. Schwartz (eds), Proceedings of the Inaugural Conference on Generative Approaches to Language Acquisition—North America (GALANA). University of Hawaii. Sugisaki, Koji and Snyder, William (2006). The parameter of preposition stranding: A view from child English. Language Acquisition, 13: 349–61. Suppes, Patrick (1974). The semantics of children's language. American Psychologist, 29: 103–14. Suppes, P. and Feldman, S. (1969). Young children's comprehension of logical connectives. Technical Report No. 150, Stanford University. Suppes, P., Smith, R., and Leveille, M. (1973). The French syntax of a child's noun phrases. Archives de Psychologie, 12(166): 207–69. Suresh, R., Ambrose, N., Roe, C., Pluzhnikov, A., Wittke-Thompson, J. K., Ng, M. C.-Y., et al. (2006). New complexities in the genetics of stuttering. American Journal of Human Genetics, 78: 554–63. Suzman, Susan (1985). Learning the passive in Zulu. Papers and Reports on Child Language Development, 24: 131–7. Svartvik, J. (1966). On Voice in the English Verb. The Hague: Mouton and Co. Svenonius, Peter (1996). The verb–particle alternation in the Scandinavian languages. Unpublished manuscript, University of Tromsø.

de Swart, Henriëtte (1998). Aspect shift and coercion. Natural Language and Linguistic Theory, 16: 347–85. de Swart, Henriëtte (2007). A cross-linguistic discourse analysis of the perfect. Journal of Pragmatics, 39(12): 2273–307. Swift, Mary (2004). Time in Child Inuktitut: A developmental study of an Eskimo-Aleut language. Berlin: Mouton de Gruyter. Swingley, Daniel (2005a). 11-month-olds' knowledge of how familiar words sound. Developmental Science, 8: 432–43. Swingley, D. (2005b). Statistical clustering and the contents of the infant vocabulary. Cognitive Psychology, 50: 86–132. Swingley, D. (2009a). Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364: 3617–32. Swingley, Daniel (2009b). Onsets and codas in 1.5-year-olds' word recognition. Journal of Memory and Language, 60: 252–69. Swingley, Daniel and Aslin, Richard N. (2000). Spoken word recognition and lexical representation in very young children. Cognition, 76: 147–66. Swingley, Daniel and Aslin, Richard N. (2002). Lexical neighborhoods and the word-form representations of 14-month-olds. Psychological Science, 13: 480–4. Swingley, Daniel and Aslin, Richard N. (2007). Lexical competition in young children's word learning. Cognitive Psychology, 54: 99–132. Syrett, K. (2007). Learning about the structure of scales: Adverbial modification and the acquisition of the semantics of gradable adjectives. Unpublished Ph.D. thesis, Northwestern University. Syrett, K. (2010). The representation and processing of measure phrases by four-year-olds. In K. Franich, K. Iserman, and L. Keil (eds), Proceedings of the 34th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 421–32. Syrett, K. (2013). The role of cardinality in the interpretation of measurement expressions. Language Acquisition, 20: 228–40. Syrett, K. and Lidz, J. (2009).
QR in child grammar: Evidence from antecedent-contained deletion. Language Acquisition, 16: 67–81. Syrett, K. and Lidz, J. (2010). 30-month-olds use the distribution and meaning of adverbs to interpret novel adjectives. Language Learning and Development, 6: 258–82. Syrett, K. and Lidz, J. (2011). Competence, performance, and the locality of quantifier raising: Evidence from 4-year-old children. Linguistic Inquiry, 42: 305–37. Syrett, K., Bradley, E., Kennedy, C., and Lidz, J. (2006). Shifting standards: Children's understanding of gradable adjectives. In K. Ud Deen, J. Nomura, B. Schulz, and B. D. Schwartz (eds), Proceedings of the Inaugural Conference on Generative Approaches to Language Acquisition—North America, Honolulu, HI. UConn Occasional Papers in Linguistics, 4: 353–64. Syrett, K., Kennedy, C., and Lidz, J. (2010). Meaning and context in children's understanding of gradable adjectives. Journal of Semantics, 27: 1–35. Syrett, K., Musolino, J., and Gelman, R. (2012). How can syntax support number word acquisition? Language Learning and Development, 8: 146–76. Syrett, K., Arunachalam, S., and Waxman, S. R. (2014). Slowly but surely: Adverbs support verb learning in 2-year-olds. Language Learning and Development, 10: 263–78. Szabolcsi, A. (1983/84). The possessor that ran away from home. The Linguistic Review, 3: 89–102.

Szabolcsi, A. (1994). The noun phrase. In F. Kiefer and K. Kiss (eds), The Syntactic Structure of Hungarian. San Diego, CA: Academic Press, 179–274. Szabolcsi, A. (2002). Hungarian disjunctions and positive polarity. In I. Kenesei and P. Siptar (eds), Approaches to Hungarian 8. Budapest: Akademiai Kiado. Szabolcsi, A. (2004). Positive polarity–negative polarity. Natural Language and Linguistic Theory, 22: 409–52. Szabolcsi, A. and Haddican, B. (2004). Conjunction meets negation: A study in cross-linguistic variation. Journal of Semantics, 21(3): 219–49. Tager-Flusberg, H. (1994). Dissociations in form and function in the acquisition of language by autistic children. In H. Tager-Flusberg (ed.), Constraints on Language Acquisition: Studies of atypical children. Hillsdale, NJ: Erlbaum, 175–94. Tager-Flusberg, H. (2004). Strategies for conducting research on language in autism. Journal of Autism and Developmental Disorders, 34(1): 75–80. Tager-Flusberg, H. and Joseph, R. (2005). How language facilitates the acquisition of false belief in children with autism. In J. W. Astington and J. A. Baird (eds), Why Language Matters for Theory of Mind. New York: Oxford University Press. Taipale, M., Kaminen, N., Nopola-Hemmi, J., Haltia, T., Myllyluoma, B., Lyytinen, H., Muller, K., Kaaranen, M., Lindsberg, P. J., Hannula-Jouppi, K., and Kere, J. (2003). A candidate gene for developmental dyslexia encodes a nuclear tetratricopeptide repeat domain protein dynamically regulated in the brain. Proceedings of the National Academy of Sciences, 100(20): 11553–8. Talmy, L. (1976). Semantic causative types. In M. Shibatani (ed.), Syntax and Semantics Volume 6: The grammar of causative constructions. New York: Academic Press, 6: 43–116. Talmy, Leonard (1985). Lexicalization patterns: Semantic structure in lexical forms. In Timothy Shopen (ed.), Language Typology and Syntactic Description, Volume III: Grammatical categories and the lexicon.
Cambridge: Cambridge University Press, III: 57–149. Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12: 49–100. Tanz, Christine (1974). Cognitive principles underlying children's errors in pronominal case-marking. Journal of Child Language, 1: 271–6. Tanz, C. (1977). Polar exploration: Hot and cold, cool and cold. Journal of Child Language, 4: 477–8. Tardif, T. and Wellman, H. M. (2000). Acquisition of mental state language in Mandarin- and Cantonese-speaking children. Developmental Psychology, 36: 25–43. Tardif, T., So, C., and Kaciroti, N. (2007). Language and false belief: Evidence for general, not specific, effects in Cantonese-speaking preschoolers. Developmental Psychology, 43: 318–40. Taylor, P. A. (2000). Analysis and synthesis of intonation using the Tilt model. Journal of the Acoustical Society of America, 107: 1697–714. Tenenbaum, J. (1996). Learning the structure of similarity. In D. Touretzky, M. Mozer, and M. Hasselmo (eds), Neural Information Processing Systems, 8. Cambridge, MA: MIT Press. Tenenbaum, Joshua (1999). A Bayesian framework for concept learning. Unpublished Ph.D. thesis, MIT. Tenenbaum, J. and Griffiths, T. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24: 629–41. Tenny, Carol (1987). Grammaticalizing aspect and affectedness. Unpublished Ph.D. thesis, MIT. Tenny, Carol (1994). Aspectual Roles and the Syntax–Semantics Interface. Dordrecht: Kluwer.

Terada, M. (1991). Small clause passives. In B. Plunkett (ed.), Psycholinguistics. UMOP 15. Terzi, A. (1992). PRO in finite clauses: A study of the inflectional heads of the Balkan languages. Unpublished Ph.D. thesis, CUNY Graduate Center. Terzi, A. (1997). PRO and null case in finite clauses. The Linguistic Review, 14: 335–60. Terzi, A. and Wexler, K. (2002). A-Chains and S-Homophones in children's grammar: Evidence from Greek passives. In M. Hirotani (ed.), The Proceedings of the 32nd Conference of the North East Linguistic Society (NELS 32). Amherst: GLSA, 519–37. Tesar, Bruce (1995). Computational Optimality Theory. Unpublished Ph.D. thesis, University of Colorado, Boulder, CO. Tesar, Bruce (1997a). An iterative strategy for learning metrical stress in Optimality Theory. In The Proceedings of the 21st Annual Boston University Conference on Language Development. Boston University, MA, 615–26. Tesar, Bruce (1997b). Multi-recursive constraint demotion. Unpublished manuscript, Rutgers University, New Brunswick, NJ. Tesar, B. (1998). An iterative strategy for language learning. Lingua, 104: 131–45. Tesar, B. (2000). Using inconsistency detection to overcome structural ambiguity in language learning. Technical Report RuCCS-TR-58, Rutgers Center for Cognitive Science, Rutgers University. Tesar, Bruce (2004a). Contrast analysis in phonological learning. Unpublished manuscript, Rutgers University, NJ. Tesar, Bruce (2004b). Using inconsistency detection to overcome structural ambiguity. Linguistic Inquiry, 35(2): 219–53. Tesar, Bruce (2006a). Faithful contrastive features in learning. Cognitive Science, 30(5): 863–903. Tesar, Bruce (2006b). Learning from paradigmatic information. In Proceedings of the Thirty-Sixth Conference of the North East Linguistics Society, 619–38. Tesar, Bruce (2007). A comparison of lexicographic and linear numeric optimization using Ldots. Unpublished manuscript, Rutgers University, New Brunswick, NJ.
Tesar, Bruce (2008). Output-driven maps. Unpublished manuscript, Rutgers University, New Brunswick, NJ. Tesar, Bruce (2009). Learning phonological grammars for output-driven maps. In Proceedings of the Thirty-Ninth Conference of the North East Linguistics Society. Tesar, Bruce and Smolensky, Paul (1993). The learnability of Optimality Theory: An algorithm and some basic complexity results. Unpublished manuscript, ROA-2, Rutgers Optimality Archive. Tesar, Bruce and Smolensky, Paul (1998). Learnability in Optimality Theory. Linguistic Inquiry, 29(2): 229–68. Tesar, B. and Smolensky, P. (2000). Learnability in Optimality Theory. Cambridge, MA: MIT Press. Tesar, Bruce, Alderete, John, Horwood, Graham, Merchant, Nazarré, Nishitani, Koichi, and Prince, Alan (2003). Surgery in language learning. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, 477–90. Tessier, Anne-Michelle (2006). Testing for OO-faithfulness in artificial phonological acquisition. In David Bamman, Tatiana Magnitskaia, and Colleen Zaller (eds), Proceedings of the 30th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 619–30. Tessier, Anne-Michelle (2007). Biases and stages in phonological acquisition. Unpublished Ph.D. thesis, University of Massachusetts.

Tessier, Anne-Michelle (2009). Frequency of violation and constraint-based phonological learning. Lingua, 119(1): 6–38. Tessier, Anne-Michelle (2012). Testing for OO-faithfulness in the acquisition of consonant clusters. Language Acquisition, 19: 144–73. Theakston, A. L., Lieven, E. V. M., Pine, J. M., and Rowland, C. F. (2001). The role of performance limitations in the acquisition of verb–argument structure: An alternative account. Journal of Child Language, 28: 127–52. Theakston, A. L., Lieven, E. V. M., Pine, J. M., and Rowland, C. F. (2005). The acquisition of auxiliary syntax: BE and HAVE. Cognitive Linguistics, 16(1): 247–77. Thierry, G., Vihman, M., and Roberts, M. (2003). Familiar words capture the attention of 11-month-olds in less than 250 ms. Neuroreport, 14: 2307–10. Thiessen, E. (2007). The effect of distributional information on children's use of phonemic contrasts. Journal of Memory and Language, 56: 16–34. Thiessen, E. D. and Saffran, J. R. (2003). When cues collide: Use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39: 706–16. Thiessen, E. D., Hill, E. A., and Saffran, J. R. (2005). Infant-directed speech facilitates word segmentation. Infancy, 7: 53–71. Thomas, Enlli Môn and Gathercole, Virginia C. Mueller (2007). Children's productive command of grammatical gender and mutation in Welsh: An alternative to rule-based learning. First Language, 27(3): 251–78. Thomas, M. S. C. and Karmiloff-Smith, A. (2003). Modeling language acquisition in atypical phenotypes. Psychological Review, 110: 647–82. Thomas, M. S. C., Karaminis, T. N., and Knowland, V. C. P. (2010). What is typical language development? Language Learning and Development, 6: 162–9. Thomas, Wolfgang (1997). Languages, Automata, and Logic. Berlin: Springer. Thompson, L. A., Detterman, D. K., and Plomin, R. (1991).
Associations between cognitive abilities and scholastic achievement: Genetic overlap but environmental differences. Psychological Science, 2: 158–65. Thompson, S. and Newport, E. (2007). Statistical learning of syntax: The role of transitional probability. Language Learning and Development, 3: 1–42. Thornton, R. (1990). Adventures in long-distance moving: The acquisition of complex wh-questions. Unpublished Ph.D. thesis, University of Connecticut. Thornton, R. (1995). Referentiality and wh-movement in child English: Juvenile D-Linkuency. Language Acquisition, 4: 139–75. Thornton, R. (2004). Why continuity. In A. Brugos, L. Micciulla, and C. E. Smith (eds), Proceedings of the Boston University Conference on Language Development (BUCLD). Somerville: Cascadilla Press, 28: 620–32. Thornton, R. (2008). Why continuity. Natural Language and Linguistic Theory, 26(1): 107–46. Thornton, R. and Crain, S. (1994). Successful cyclic movement. In T. Hoekstra and B. Schwartz (eds), Language Acquisition Studies in Generative Grammar. Amsterdam and Philadelphia: John Benjamins, 215–53. Thornton, R. and Wexler, K. (1999). Principle B, VP Ellipsis, and Interpretation in Child Grammar. Cambridge, MA: MIT Press. Thothathiri, M. and Snedeker, J. (2008). Syntactic priming during language comprehension in three- and four-year-old children. Journal of Memory and Language, 58: 188–213. Thothathiri, M. and Snedeker, J. (2011). The role of thematic roles in sentence processing: Evidence from structural priming in young children. Unpublished manuscript, Harvard University.

Thráinsson, Höskuldur (2000). Object shift and scrambling. In Mark R. Baltin and Chris Collins (eds), The Handbook of Contemporary Syntactic Theory. Oxford: Blackwell, 148–202. Thrasher, R. H., Jr. (1974). Shouldn't ignore these strings: A study of conversational deletion. Unpublished Ph.D. thesis. Retrieved from ProQuest Dissertations and Theses database. Threlkeld, S. W., McClure, M. M., Bai, J., Wang, Y., LoTurco, J. J., Rosen, G. D., and Fitch, H. (2007). Developmental disruptions and behavioral impairments in rats following in utero RNAi of Dyx1c1. Brain Research Bulletin, 71: 508–14. Tiemann, S., Hohaus, V., and Beck, S. (2010). German and English comparatives are different: Evidence from child language acquisition. In Proceedings of Linguistic Evidence, 254–8. Tincoff, Ruth and Jusczyk, Peter W. (2003). Are word-final sounds perceptually salient for infants? In Derek M. Houston, Amanda Seidl, George Hollich, Elizabeth K. Johnson, and Ann Marie Jusczyk (eds), Jusczyk Lab Final Report. West Lafayette, IN: Purdue University. Tincoff, R., Santelmann, L. M., and Jusczyk, P. W. (2000). Auxiliary verb learning and 18-month-olds' acquisition of morphological relationships. In S. C. Howell, S. A. Fish, and T. Keith-Lucas (eds), Proceedings of the 24th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 726–37. Tîrnauca, Cristina (2008). A note on the relationship between different types of correction queries. In Alexander Clark, François Coste, and Laurent Miclet (eds), Grammatical Inference: Algorithms and Applications, 9th International Colloquium, ICGI 2008, Saint-Malo, France, September 22–24, 2008, Proceedings. Lecture Notes in Computer Science. Berlin: Springer, 5278: 213–23. Tokizaki, Hisao (2013). Deriving the compounding parameter from phonology. Linguistic Analysis, 38: 275–302. Tomasello, M. (1992). First Verbs: A case study of early grammatical development.
Cambridge: Cambridge University Press. Tomasello, M. (1998a). One child's early talk about possession. In R. J. Newman (ed.), The Linguistics of Giving. Philadelphia: John Benjamins, 349–73. Tomasello, Michael (1998b). The return of constructions. Journal of Child Language, 25: 431–42. Tomasello, M. (2000a). Do young children have syntactic competence? Cognition, 74: 209–53. Tomasello, M. (2000b). The item-based nature of children's early syntactic development. Trends in Cognitive Sciences, 4: 156–63. Tomasello, M. (2003). Constructing a Language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press. Tomasello, Michael (2005). Constructing a Language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press. Tomasello, M. and Brooks, P. (1998). Young children's earliest transitive and intransitive constructions. Cognitive Linguistics, 9: 379–95. Tornyova, L. and Valian, V. (2009). The role of cross-linguistic variation in the acquisition of auxiliary inversion in wh-questions. In J. Crawford, K. Otaki, and M. Takahashi (eds), Proceedings of the 3rd Conference on Generative Approaches to Language Acquisition North America (GALANA 2008). Somerville, MA: Cascadilla Press, 282–90. Toro, J. M. and Trobalon, J. B. (2005). Statistical computations over a speech stream in a rodent. Perception and Psychophysics, 67(5): 867–75.

Torrence, Harold and Hyams, Nina (2004). On the role of aspect in determining finiteness and temporal interpretation in early grammar. In J. van Kampen and S. Baauw (eds), Proceedings of GALA 2003. Utrecht: LOT Publications. Torrens, V. (1995). The acquisition of inflection in Spanish and Catalan. In C. Schütze, J. Ganger, and K. Broihier (eds), Papers on Language Processing and Acquisition. Cambridge, MA: MIT Working Papers in Linguistics, 26: 451–72. Townsend, D. (1974). Children's comprehension of comparative forms. Journal of Experimental Child Psychology, 18: 293–303. Townsend, D. (1976). Do children interpret "marked" comparative adjectives as their opposites? Journal of Child Language, 3: 385–96. Travis, Lisa (1994). Event phrase structure and a theory of functional categories. In P. Koskinen (ed.), Proceedings of the 1994 Annual Meeting of the Canadian Linguistic Society, 559–70. Trehub, S. E. and Abramovitch, R. (1978). Less is not more: Further observations on nonlinguistic strategies. Journal of Experimental Child Psychology, 25: 160–7. Trommelen, M. (1983). The Syllable in Dutch: With special reference to diminutive formation. Dordrecht: Foris. Trubetzkoy, Nikolai (1939/1969). Principles of Phonology, trans. C. A. M. Baltaxe. Berkeley, CA: University of California Press. Trueswell, J. C., Sekerina, I., Hill, N. M., and Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73(2): 89–134. Tsai, W.-T. D. (1994). On economizing the theory of A-bar dependencies. Unpublished Ph.D. thesis, MIT. Tsay, J. (2001). Phonetic parameters of tone acquisition in Taiwanese. In M. Nakayama (ed.), Issues in East Asian Language Acquisition. Tokyo: Kuroshio, 205–26. Tse, J. K. P. (1978). Tone acquisition in Cantonese: A longitudinal case study. Journal of Child Language, 5: 191–204. Tuller, L., Blondel, M., and Niederberger, N. (2007).
Growing up bilingual in French and French Sign Language. In D. Ayoun (ed.), French Applied Linguistics. Philadelphia, PA: John Benjamins. Turing, Alan (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42: 230–65. Turk, A. E., Jusczyk, P. W., and Gerken, L. (1995). Do English-learning infants use syllable weight to determine stress? Language and Speech, 38: 143–58. Turkel, W. J. (1996). Acquisition by a genetic algorithm-based model in spaces with local maxima. Linguistic Inquiry, 27(2): 350–5. Turkheimer, E. (2000). Three laws of behavior genetics and what they mean. Current Directions in Psychological Science, 9(5): 160–4. Turner, E. A. and Rommetveit, R. (1967a). The acquisition of sentence voice and reversibility. Child Development, 38: 649–60. Turner, E. A. and Rommetveit, R. (1967b). Experimental manipulation of the production of active and passive voice in children. Language and Speech, 10: 169–80. Tuvblad, C., Grann, M., and Lichtenstein, P. (2006). Heritability for adolescent antisocial behavior differs with socioeconomic status: Gene–environment interaction. Journal of Child Psychology and Psychiatry, 47(7): 734–43.

Tyack, D. and Ingram, D. (1977). Children's production and comprehension of questions. Journal of Child Language, 4: 211–24. Tyler, Ann A. and Figurski, G. Randall (1994). Phonetic inventory changes after treating distinctions along an implicational hierarchy. Clinical Linguistics and Phonetics, 8: 91–108. Tyler, Ann A. and Langsdale, Teru E. (1996). Consonant–vowel interactions in early phonological development. First Language, 16: 159–91. Ullman, M. T. (2004). Contributions of neural memory circuits to language: The declarative/procedural model. Cognition, 92(1–2): 231–70. Ullman, M. T. and Pierpont, E. I. (2005). Specific Language Impairment is not specific to language: The procedural deficit hypothesis. Cortex, 41: 399–433. Uriagereka, J. (1995). Some aspects of the syntax of clitic placement in Western Romance. Linguistic Inquiry, 26: 79–123. Vainikka, Anne (1993). Case in the development of English syntax. Language Acquisition, 3(3): 257–325. Vainikka, A. and Levy, Y. (1999). Empty subjects in Finnish and Hebrew. Natural Language and Linguistic Theory, 17: 613–71. Valian, V. (1986). Syntactic categories in the speech of young children. Developmental Psychology, 22: 562–79. Valian, V. (1990). Null subjects: A problem for parameter setting models of language acquisition. Cognition, 35: 105–22. Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children. Cognition, 40: 21–81. Valian, V. (1993). Parser failure and grammar change. Cognition, 46: 195–202. Valian, V. (1994). Children's postulation of null subjects: Parameter setting and language acquisition. In B. Lust, G. Hermon, and J. Kornfilt (eds), Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Vol. 2: Binding, dependencies, and learnability. Hillsdale, NJ: Erlbaum, 2: 273–86. Valian, V. (2006). Young children's understanding of present and past tense. Language Learning and Development, 2: 251–76. Valian, V.
and Aubry, S. (2005). When opportunity knocks twice: Two-year-olds' repetition of sentence subjects. Journal of Child Language, 32: 617–41. Valian, V. and Casey, L. (2003). Young children's acquisition of questions: The role of structured input. Journal of Child Language, 30(1): 117–43. Valian, V. and Eisenberg, Z. (1996). Syntactic subjects in the speech of Portuguese-speaking children. Journal of Child Language, 23: 103–28. Valian, V., Hoeffner, J., and Aubry, S. (1996). Young children's imitation of sentence subjects: Evidence of processing limitations. Developmental Psychology, 32: 153–64. Valian, V., Lasser, I., and Mandelbaum, D. (1992). Children's early questions. Paper presented at the 17th Annual Boston University Conference on Language Development. Valian, V., Prasada, S., and Scarpa, J. (2006). Direct object predictability: Effects on young children's imitation of sentences. Journal of Child Language, 33(2): 247–69. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27: 1134–42. Vallabha, Gautam K., McClelland, James L., Pons, Ferran, Werker, Janet F., and Amano, Shigeaki (2007). Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences of the United States of America, 104: 13273–8.

van de Weijer, J. (1998). Language input for word discovery. Unpublished Ph.D. thesis, University of Nijmegen. van der Feest, S. and van Hout, A. (2002). Tense comprehension in child Dutch. In B. Skarabela, S. Fish, and A. Do (eds), Proceedings of the 26th BUCLD. Somerville, MA: Cascadilla Press, 734–45. van der Lely, H. and Battell, J. (2003). Wh-movement in children with grammatical SLI: A test of the RDDR hypothesis. Language, 79(1): 153–81. van der Lely, H. K. J., Jones, M., and Marshall, C. R. (2011). Who did Buzz see someone? Grammaticality judgment of wh-questions in typically developing children and children with Grammatical SLI. Lingua, 121: 408–22. Van Gelderen, V. and Van der Meulen, I. (1998). Root infinitives in Russian: Evidence from acquisition. Term paper, Leiden University. van Kampen, J. (1997). First Steps in Wh-movement. Delft: Eburon. Van Oostendorp, Marc (2004a). Crossing morpheme boundaries in Dutch. Lingua, 114(11): 1367–400. Van Oostendorp, Marc (2004b). Aspects of vowel harmony. Lecture notes, University of the Aegean, Rhodes, Greece. Vanderweide, Teresa (2006). Cues, opacity and the puzzle–puddle–pickle problem. In Proceedings of the 2006 Canadian Linguistics Association Annual Conference, 115–30. Vapnik, Vladimir (1995). The Nature of Statistical Learning Theory. New York: Springer. Vapnik, Vladimir (1998). Statistical Learning Theory. New York: Wiley. Vargha-Khadem, F., Gadian, D. G., Copp, A., and Mishkin, M. (2005). FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience, 6: 131–8. Varlokosta, S. (2000). Lack of clitic-pronoun distinctions in the acquisition of Principle B in child Greek. In S. C. Howell, S. A. Fish, and T. Keith-Lucas (eds), Proceedings of the 24th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Varlokosta, S. and Hornstein, N. (1993). Control in modern Greek.
Proceedings of NELS, 23: 507–21. Varlokosta, Spyridoula, Vainikka, Anne, and Rohrbacher, Bernhard (1996). Root infinitives without infinitives. In Andy Stringfellow, Dalia Cahana-Amitay, Elizabeth Hughes, and Andrea Zukowski (eds), BUCLD 20: Proceedings of the 20th Annual Boston University Conference on Language Development, 20. Somerville, MA: Cascadilla Press, 816–27. Varlokosta, S., Vainikka, A., and Rohrbacher, B. (1998). Functional projections, markedness, and “root infinitives” in early child Greek. The Linguistic Review, 15: 187–207. Vasilyeva, M., Huttenlocher, J., and Waterfall, H. (2006). Effects of language intervention on syntactic skill levels of preschoolers. Developmental Psychology, 42: 164–74. Velleman, Shelley L. (1988). The role of linguistic perception in later phonological development. Applied Psycholinguistics, 9: 221–36. Vendler, Z. (1957). Verbs and times. Philosophical Review, 66: 143–60. Vendler, Z. (1968). Adjectives and Nominalizations. Papers on formal linguistics, vol. 5. Berlin: Walter De Gruyter. Vendler, Z. (1972). Res cogitans. Ithaca, NY: Cornell University Press. Vennemann, Theo (1972). Rule inversion. Lingua, 29: 209–42.

Verbuk, A. (2006a). The acquisition of the Russian Or. Paper presented at Western Conference on Linguistics (WECOL 06). California State University, Fresno. Verbuk, A. (2006b). Acquisition of scalar implicatures: When some of the crayons will do the job. Online Proceedings from the 31st Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Verkuyl, Henk (1972). On the Compositional Nature of the Aspects. Dordrecht: D. Reidel Publishing Company. Verkuyl, Henk (1993). A Theory of Aspectuality. Cambridge: Cambridge University Press. Verkuyl, Henk (2005). How (in-)sensitive is tense to aspectual information? In B. Hollebrandse, A. van Hout, and C. Vet (eds), Crosslinguistic Studies on Aspect, Tense and Modality. Amsterdam/New York: Rodopi, 145–69. Verkuyl, Henk, de Swart, Henriëtte, and van Hout, Angeliek (eds) (2005). Perspectives on Aspect. Dordrecht: Springer. Vernes, S. C., Newbury, D. F., Abrahams, B. S., Winchester, L., Nicod, J., Groszer, M., Alarcon, M., Oliver, P. L., Davies, K. E., Geschwind, D. H., Monaco, A. P., and Fisher, S. E. (2008). A functional genetic link between distinct developmental language disorders. The New England Journal of Medicine, 359: 2337–45. Verrips, M. (1996). Potatoes must peel: The acquisition of Dutch passive. Unpublished Ph.D. thesis, University of Amsterdam. Viau, J. (2007). Possession and spatial motion in the acquisition of ditransitives. Unpublished Ph.D. thesis, Northwestern University. Viau, J., and Lidz, J. (2011). Selective learning in the acquisition of Kannada ditransitives. Unpublished manuscript, Johns Hopkins University. Viau, J., Lidz, J., and Musolino, J. (2010). Priming of abstract logical representations in 4-year-olds. Language Acquisition, 17: 26–50. Viding, E., Spinath, F. M., Price, T. S., Bishop, D. V. M., Dale, P. S., and Plomin, R. (2004).
Genetic and environmental influence on language impairment in 4-​year-​old same-​sex and opposite-​sex twins. Journal of Child Psychology and Psychiatry, 45(2): 315–​25. Vihman, Marilyn (1978). Consonant harmony: Its scope and function in child language. In Joseph Greenberg (ed.), Universals of Human Language. Stanford, CA: Stanford University Press, 281–​334. Vihman, Marilyn (1996). Phonological Development: The origins of language in the child. Oxford: Blackwell. Vihman, Marilyn May and Croft, William (2007). Phonological development: Toward a “radical” templatic phonology. Linguistics, 45: 683–​725. Vihman, Marilyn May, Ferguson, Charles A., and Elbert, Mary (1986). Phonological development from babbling to speech:  Common tendencies and individual differences. Applied Psycholinguistics, 7: 3–​40. Vihman, Marilyn, DePaolis, Rory, and Davis, Barbara (1998). Is there a “trochaic bias” in early word learning? Evidence from infant production in English and French. Child Development, 69: 935–​49. Vihman, M. M., Nakai, S., DePaolis, R. A., and Hallé, P. (2004). The role of accentual pattern in early lexical representation. Journal of Memory and Language, 50: 336–​53. Vikner, S. (1995). Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press. Villa-​Garcia, J. (2013). On the role of children’s deterministic learning in the “no-​overt-​subject” stage in the L1 Acquisition of Spanish. In C. Cathcart, I.-​H. Chen, G. Finley, S. Kang, C.

S. Sandy, and E. Stickles (eds), Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society, 375–88. Villanueva, P., Newbury, D. F., Jara, L., De Barbieri, Z., Mirza, G., Palomino, H. M., Fernandez, M. A., Cazier, J.-B., Monaco, A. P., and Palomino, H. (2011). Genome-wide analysis of genetic susceptibility to language impairment in an isolated Chilean population. European Journal of Human Genetics, 19: 687–95. Villavicencio, A. (2001). The acquisition of a unification-based generalised categorial grammar. Unpublished Ph.D. thesis, Cambridge University. Vinnitskaya, Ina and Wexler, Ken (2001). The role of pragmatics in the development of Russian aspect. First Language, 21: 143–86. Vitevitch, Michael S. and Luce, Paul A. (2004). A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments and Computers, 36: 481–7. Vogel, I. and Raimy, E. (2002). The acquisition of compound vs. phrasal stress: The role of prosodic constituents. Journal of Child Language, 29: 225–50. Volpato, F., Tagliaferro, L., Verin, L., and Cardinaletti, A. (2012). The comprehension of Italian (eventive) verbal passives in Italian pre-school aged children. In S. Stavrakaki (ed.), Proceedings of GALA. Newcastle: Cambridge Scholar Press. Volpato, F., Verin, L., and Cardinaletti, A. (2015). The comprehension and production of verbal passives by Italian preschool-age children. Applied Psycholinguistics, First Online: 1–31. von Stechow, A. (1984). Comparing semantic theories of comparison. Journal of Semantics, 3: 1–77. von Stutterheim, Christiane, Carroll, Mary, and Klein, Wolfgang (2009). New perspectives in analyzing aspectual distinctions across languages. In W. Klein and P. Li (eds), The Expression of Time. Berlin: Mouton de Gruyter, 195–216. Vosniadou, S. (1987). Contextual and linguistic factors in children’s comprehension of nonliteral language.
Metaphor and Symbolic Activity, 2: 1–11. Wagner, K. R. (1985). How much do children say in a day? Journal of Child Language, 12: 475–87. Wagner, Laura (1998). The semantics and acquisition of time in language. Unpublished Ph.D. thesis, University of Pennsylvania. Wagner, Laura (2001). Aspectual influences on early tense interpretation. Journal of Child Language, 28: 661–81. Wagner, Laura (2002). Children’s comprehension of completion entailments in the absence of agency cues. Journal of Child Language, 29: 109–25. Wagner, Laura (2006). Aspectual bootstrapping in language acquisition: Transitivity and telicity. Language Learning and Development, 2: 51–77. Wagner, Laura (2009). I’ll never grow up: Continuity in aspectual representations. Linguistics, 47: 1051–74. Wagner, Laura (2010). Inferring meaning from syntactic structures: The case of transitivity and telicity. Language and Cognitive Processes, 25(10): 1354–79. Wagner, Laura (2012). First language acquisition. In R. Binnick (ed.), The Oxford Handbook of Tense and Aspect. Oxford: Oxford University Press, 458–80. Wagner, Laura and Carey, Susan (2003). Individuation of objects and events: A developmental study. Cognition, 90: 163–91. Wales, Julia, and Hollich, George J. (2004). Early word learning: How infants learn words that sound similar. In Kenneth Forbus, Dedre Gentner, and Terry Regier (eds), Proceedings of the

Twenty-Sixth Annual Conference of the Cognitive Science Society, August 4–7, 2004, Chicago, Illinois USA, 1652. Wales, R. and Campbell, R. N. (1970). On the development of comparison and the comparison of development. In G. Flores D’Arcais and W. J. M. Levelt (eds), Advances in Psycholinguistics. Amsterdam: North-Holland. Wallace, J. (1972). Positive, comparative, superlative. The Journal of Philosophy, 69: 773–82. Walley, Amanda C., Smith, Linda B., and Jusczyk, Peter W. (1986). The role of phonemes and syllables in the perceived similarity of speech sounds for children. Memory and Cognition, 14: 220–29. Wang, H. and Mintz, T. (2008). A dynamic learning model for categorizing words using frames. In H. Chan, H. Jacob, and E. Kapia (eds), BUCLD 32 Proceedings. Somerville, MA: Cascadilla Press, 525–36. Wang, Q., Lillo-Martin, D., Best, C., and Levitt, A. (1992). Null subject versus null object: Some evidence from the acquisition of Chinese and English. Language Acquisition, 2: 221–54. Wonnacott, E. and Watson, D. G. (2008). Acoustic emphasis in 4-year-olds. Cognition, 107: 1093–101. Wannemacher, J. T. and Ryan, M. L. (1978). “Less” is not “more”: A study of children’s comprehension of “less” in various task contexts. Child Development, 49: 660–8. Wasow, T. and Roeper, T. (1972). On the subject of gerunds. Foundations of Language, 8: 44–61. Watanabe, A. (1993). Agr-based case theory and its interaction with the A-bar system. Unpublished Ph.D. thesis, MIT. Watson, J. S., Gergely, G., Csanyi, V., Topal, J., Gasci, M., and Sarkozi, Z. (2001). Distinguishing logic from association in the solution of an invisible displacement task by children (Homo sapiens) and dogs (Canis familiaris): using negation of disjunction. Journal of Comparative Psychology, 115(3): 219–26. Waxman, S. R. (1990). Linguistic biases and the establishment of conceptual hierarchies: Evidence from preschool children. Cognitive Development, 5: 123–50. Waxman, S.
R., and Klibanoff, R. S. (2000). The role of comparison in the extension of novel adjectives. Developmental Psychology, 36: 571–81. Weber, C. (2005). Reduced stress pattern discrimination in 5-month-olds as a marker of risk for later language impairment: neurophysiological evidence. Cognitive Brain Research, 25: 180–7. Weber, C., Hahne, A., Friedrich, M., and Friederici, A. D. (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research, 18: 149–61. Webster, O’Connor, Brendan, and Ingram, David (1972). The comprehension and production of the anaphoric pronouns “he, she, him, her” in normal and linguistically deviant children. Papers and Reports in Child Language Disorders, 4: 55–73. Weckerly, J., Wulfeck, B., and Reilly, J. (2004). The development of morphosyntactic ability in atypical populations: The acquisition of tag questions in children with early focal lesions and children with specific-language impairment. Brain and Language, 88(2): 190–201. Weinberg, Amy (1987). Comments on Borer and Wexler. In Thomas Roeper and Edwin Williams (eds), Parameter Setting. Dordrecht: D. Reidel, 173–87. Weismer, Gary (1984). Acoustic analysis strategies for the refinement of phonological analysis. In Mary Elbert, Daniel A. Dinnsen, and Gary Weismer (eds), Phonological Theory and the Misarticulating Child (ASHA monographs no. 22). Rockville, MD: ASHA, 30–52.

Weismer, Gary, Dinnsen, Daniel A., and Elbert, Mary (1981). A study of the voicing distinction associated with omitted, word-final stops. Journal of Speech and Hearing Disorders, 46: 320–8. Weist, Richard (2002). The first language acquisition of tense and aspect: A review. In R. Salaberry and Y. Shirai (eds), The L2 Acquisition of Tense–Aspect Morphology. Amsterdam: John Benjamins, 21–78. Weist, Richard (2003). Review of Ping Li and Yasuhiro Shirai, The acquisition of lexical and grammatical aspect. Journal of Child Language, 30(1): 237–51. Weist, Richard, Wysocka, Jolanta, Witkowska-Stadnik, K., Buczowska, E., and Konieczna, E. (1984). The defective tense hypothesis: On the emergence of tense and aspect in child Polish. Journal of Child Language, 11: 347–74. Weist, Richard, Wysocka, Jolanta, and Lyytinen, Paula (1991). A cross-linguistic perspective on the development of temporal systems. Journal of Child Language, 18: 67–92. Weitzman, Raymond S. (2007). Some issues in infant speech perception: Do the means justify the ends? The Analysis of Verbal Behavior, 23: 17–27. Wells, B., Peppé, S., and Goulandris, N. (2004). Intonation development from five to thirteen. Journal of Child Language, 31: 749–78. Wells, G. (1979). Learning and using the auxiliary verb in English. In V. Lee (ed.), Language Development. London: Croom Helm, 250–70. Wellwood, A., Gagliardi, A., and Lidz, J. (2014). Syntactic and lexical inference in the acquisition of novel superlatives. Article submitted for publication. Werker, Janet F. (1989). Becoming a native listener. American Scientist, 77: 54–9. Werker, Janet F. and Fennell, Christopher E. (2004). From listening to sounds to listening to words: Early steps in word learning. In D. Geoffrey Hall and Sandra R. Waxman (eds), Weaving a Lexicon. Cambridge, MA: MIT Press, 79–110. Werker, Janet F. and Tees, Richard (1983). Developmental changes across childhood in the perception of non-native speech sounds.
Canadian Journal of Psychology, 37: 278–86. Werker, Janet F. and Tees, Richard C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7: 49–63. Werker, Janet F., Cohen, Leslie B., Lloyd, Valerie L., Casasola, Marianella, and Stager, Christine L. (1998). Acquisition of word-object associations by 14-month-old infants. Developmental Psychology, 34: 1289–309. Werker, Janet F., Fennell, Christopher T., Corcoran, Kathleen M., and Stager, Christine L. (2002). Infants’ ability to learn phonetically similar words: Effects of age and vocabulary size. Infancy, 3: 1–30. Westergaard, M. R. (2003). On the acquisition of word order in wh-questions in the Tromsø dialect. Nordlyd, 31(3): [np]. Westergaard, M. (2009). Usage-based vs. rule-based learning: The acquisition of word order in wh-questions in English and Norwegian. Journal of Child Language, 36: 1023–51. Weverink, M. (1989). The subject in relation to inflection in child language. Unpublished MA thesis, University of Utrecht. Wexler, K. (1994). Optional infinitives, head movement and the economy of derivations. In D. Lightfoot and N. Hornstein (eds), Verb Movement. Cambridge: Cambridge University Press, 305–62. Wexler, K. (1998). Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage. Lingua, 106: 23–79.

Wexler, Kenneth (2004). Theory of phasal development: Perfection in child grammar. In Aniko Csirmaz, Andrea Gualmini, and Andrew Nevins (eds), MITWPL 48: Plato’s Problems: Papers on Language Acquisition. Cambridge, MA: MIT Working Papers in Linguistics, 159–209. Wexler, K. (2013). A new theory of null-subjects of finite verbs: A phasal analysis. In M. Becker, J. Grinstead, and J. Rothman (eds), Generative Linguistics and Acquisition: Studies in honor of Nina M. Hyams. Amsterdam: John Benjamins, 325–55. Wexler, K. and Chien, Y. C. (1985). The development of lexical anaphors and pronouns. Papers and Reports on Child Language Development, 24: 138–49. Wexler, K., and Culicover, P. W. (1980). Formal Principles of Language Acquisition. Cambridge, MA: MIT Press. Wexler, K. and Manzini, R. (1987). Parameters and learnability in binding theory. In T. Roeper and E. Williams (eds), Parameter Setting. Dordrecht: Reidel, 41–76. Wexler, K., Schütze, C. T., and Rice, M. (1998). Subject case in children with SLI and unaffected controls: Evidence for the Agr/Tns omission model. Language Acquisition, 7(2–4): 317–44. Wexler, K., Gavarro, A., and Torrens, V. (2004a). Feature checking and object clitic omission in child Catalan and Spanish. In Reineke Bok-Bennema, Bart Hollebrandse, Brigitte Kampers-Manhe, and Petra Sleeman (eds), Romance Languages and Linguistic Theory 2002. Selected Papers from “Going Romance,” Groningen, 28–30 November 2002. Amsterdam: John Benjamins, 253–68. Wexler, K., Schaeffer, J., and Bol, G. (2004b). Verbal syntax and morphology in typically developing Dutch children and children with SLI: How developmental data can play an important role in morphological theory. Syntax, 7(2): 148–98. Wheeler, Max (2005). The Phonology of Catalan. Oxford: Blackwell. Wheeler, S. C. (1972). Attributives and their modifiers. Noûs, 6: 310–34. White, A., Baier, R., and Lidz, J. (2011).
Prediction and subcategorization frame frequency in the developing parser. Paper presented at CUNY Conference on Sentence Processing, Stanford University. White, Katherine S., Peperkamp, Sharon, Kirk, Cecilia, and Morgan, James L. (2009). Rapid acquisition of phonological alternations by infants. Cognition, 107: 238–65. White, Lydia (1982). Grammatical Theory and Language Acquisition. Dordrecht: Foris. Whitehouse, A. J. O., Bishop, D. V. M., Ang, Q. W., Pennell, C. E., and Fisher, S. E. (2011). CNTNAP2 variants affect early language development in the general population. Genes, Brain, and Behavior, 10: 451–6. Whitehurst, G., Ironsmith, M., and Goldfein, M. (1974). Selective imitation of the passive construction through modeling. Journal of Experimental Child Psychology, 17: 288–302. Whitney, William Dwight (1889). Sanskrit Grammar. Cambridge, MA: Harvard University Press. Whorf, Benjamin L. (1940). Linguistics as an exact science. Technology Review, 43: 3–8. Wiehagen, R. (1977). Identification of formal languages. In A. L. de Oliveira (ed.), Mathematical Foundations of Computer Science, Lecture Notes in Computer Science. New York: Springer-Verlag, 53: 571–9. Wiehagen, R., Frievalds, R., and Kinber, E. (1984). On the power of probabilistic strategies in inductive inference. Theoretical Computer Science, 28: 111–33. Wiig, E. H., Secord, W., and Sabers, D. (1989). Test of Language Competence–Expanded Edition. San Antonio, TX: The Psychological Corporation. Wijnen, F. (1997). Temporal reference and eventivity in root infinitives. MIT Occasional Papers in Linguistics, 12: 1–25.

Wijnen, F., Krikhaar, E., and den Os, E. (1994). The (non)realization of unstressed elements in children’s utterances: A rhythmic constraint? Journal of Child Language, 21: 59–83. Wilder, C. (1997). Some properties of ellipsis in coordination. In A. Alexiadou and T. A. Hall (eds), Studies in Universal Grammar and Typological Variation. Amsterdam: John Benjamins, 59–107. Williams, A. (2005). Complex causatives and verbal valence. Unpublished Ph.D. thesis, University of Pennsylvania. Williams, A. Lynn and Dinnsen, Daniel A. (1987). A problem of allophonic variation in a speech disordered child. Innovations in Linguistic Education, 5: 85–90. Williams, E. (1977). Discourse and Logical Form. Linguistic Inquiry, 8: 101–39. Williams, Edwin (1981). On the notions “lexically related” and “head of a word.” Linguistic Inquiry, 12: 255–74. Williams, E. (1983). Against small clauses. Linguistic Inquiry, 14: 287–308. Wilson, Colin (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30: 945–82. Wittek, A. (2008). What adverbs have to do with learning the meaning of verbs. In M. Bowerman and P. Brown (eds), Crosslinguistic Perspectives on Argument Structure: Implications for Learnability. Mahwah, NJ: Erlbaum, 309–30. Wójtowicz, J. (1959). Drobne spostrzeżenia o działaniu analogii w mowie dziecka [Minor observations on the workings of analogy in child speech]. Poradnik Językowy, 8: 349–52. Wold, D. (1995). Antecedent-contained deletion in comparative constructions. Unpublished manuscript, MIT. Wolf, Matthew (2008). Optimal interleaving: Serial phonology–morphology interaction in a constraint-based model. Unpublished Ph.D. thesis, University of Massachusetts. Wolf, Matthew (2009). Mutation and learnability in Optimality Theory. In Anisa Schardl, Martin Walkow, and Muhammad Abdurrahman (eds), Proceedings of the Thirty-Eighth Annual Meeting of the North East Linguistic Society. Amherst, MA: GLSA Publications, 469–82.
Wolf, Matthew (2010). Implications of affix-protecting junctural underapplication. In Jon Scott Stevens (ed.), University of Pennsylvania Working Papers in Linguistics 16.1: Proceedings of the 33rd Annual Penn Linguistics Colloquium. Philadelphia, PA: Penn Linguistics Club. Wolf, Matthew (2015). Lexical insertion occurs in the phonological component. In E. Bonet, M.-R. Lloret, and J. Mascaró (eds), Understanding Allomorphy: Perspectives from Optimality Theory. London: Equinox. Wolfe, Virginia I. and Blocker, Suzanne D. (1990). Consonant–vowel interaction in an unusual phonological system. Journal of Speech and Hearing Disorders, 55: 561–6. Wolff, J. G. (1977). The discovery of segments in natural language. British Journal of Psychology, 68: 97–106. Wong, P., Schwartz, R. G., and Jenkins, J. J. (2005). Perception and production of lexical tones by 3-year-old, Mandarin-speaking children. Journal of Speech, Language, and Hearing Research, 48: 1065–79. Wonnacott, E., Newport, E. L., and Tanenhaus, M. K. (2008). Acquiring and processing verb argument structure: Distributional learning in a miniature language. Cognitive Psychology, 56: 165–209. Woolford, Ellen (2006). Lexical case, inherent case, and argument structure. Linguistic Inquiry, 37(1): 111–30. Wormith, S. J., Pankhurst, D., and Moffitt, A. R. (1975). Frequency discrimination by young infants. Child Development, 46: 272–5.

Wright, Richard (2004). A review of perceptual cues and cue robustness. In Bruce Hayes, Robert Kirchner, and Donca Steriade (eds), Phonetically Based Phonology. Cambridge: Cambridge University Press, 34–57. Wunderlich, D. (1997). Cause and the structure of verbs. Linguistic Inquiry, 28: 27–68. Wunderlich, D. (2000). Predicate composition and argument extension as general options: A study in the interface of semantic and conceptual structure. In B. Stiebels and D. Wunderlich (eds), Lexicon in Focus. Berlin: Akademie Verlag, 249–72. Wurmbrand, S. (2001). Infinitives: Restructuring and clause structure. Berlin: Mouton de Gruyter. Wyngaerd, G. W. (1994). PRO-legomena: Distribution and reference of infinitival subjects. Berlin: Mouton de Gruyter. Wynn, K. (1992). Children’s acquisition of number words and the counting system. Cognitive Psychology, 24: 220–51. Xiang, M. (2003). A phrasal analysis of Chinese comparatives. In J. Cihlar, A. Franklin, D. Kaise, and I. Kimbara (eds), Proceedings of the 39th Meeting of the Chicago Linguistic Society. Chicago, IL: CLS, 1: 739–54. Xiang, M. (2006). Some topics in comparative constructions. Unpublished Ph.D. thesis, Michigan State University. Xu, F. (2002). The role of language in acquiring object concepts in infancy. Cognition, 85: 223–50. Xu, F. and Spelke, E. S. (2000). Large number discrimination in 6-month-old infants. Cognition, 74: B1–B11. Xu, F. and Tenenbaum, J. (2007). Word learning as Bayesian inference. Psychological Review, 114: 245–72. Xu, Y. and Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Communication, 33: 319–37. Yamada, K., Mizuno, M., and Nabeshima, T. (2002). Role for brain-derived neurotrophic factor in learning and memory. Life Sciences, 70: 735–44. Yang, C. (2002). Knowledge and Learning in Natural Language. Oxford: Oxford University Press. Yang, C. (2004). Universal Grammar, statistics, or both?
Trends in Cognitive Sciences, 8(10): 451–6. Yavaş, Mehmet (1991). Phonological Disorders in Children: Theory, research, and practice. London: Routledge. Yavaş, Mehmet (2010). Epilogue. Clinical Linguistics and Phonetics, 24(3): 239–41. Yavaş, Mehmet, Ben-David, Avivit, Gerrits, Ellen, Kristoffersen, Kristian E., and Simonsen, Hanne G. (2008). Sonority and cross-linguistic acquisition of initial s-clusters. Journal of Child Language, 22(6): 421–41. Yip, Moira (1992). Prosodic morphology in four Chinese dialects. Journal of East Asian Linguistics, 1: 1–35. Yip, M. (2002). Tone. Cambridge: Cambridge University Press. Yip, V. and Matthews, S. (2005). Dual input and learnability: Null objects in Cantonese-English bilingual children. In J. Cohen, K. T. McAlister, K. Rolstad, and J. MacSwan (eds), Proceedings of the 4th International Symposium on Bilingualism. Somerville, MA: Cascadilla Press, 2421–31. Yoon, Y. (1996). Total and partial predicates and the weak and strong interpretations. Natural Language Semantics, 4: 217–36.

Yoshida, Katherine A., Fennell, Christopher T., Swingley, Daniel, and Werker, Janet F. (2009). Fourteen-month-old infants learn similar-sounding words. Developmental Science, 12: 412–18. Yoshinaka, Ryo (2008). Identification in the limit of k, l-substitutable context-free languages. In Alexander Clark, François Coste, and Laurent Miclet (eds), Grammatical Inference: Algorithms and Applications, 9th International Colloquium, ICGI 2008, Saint-Malo, France, September 22–24, 2008, Proceedings, Lecture Notes in Computer Science. Berlin: Springer, 5278: 266–79. Yoshinaka, Ryo (2011). Efficient learning of multiple context-free languages with multidimensional substitutability from positive data. Theoretical Computer Science, 412: 1821–31. Young, V. (2002). Knowledge and Learning in Natural Language. New York: Oxford University Press. Yu, C. and Smith, L. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psychological Science, 18(5): 414–20. Yu, K. (2011). The learnability of tones from the speech signal. Unpublished Ph.D. thesis, University of California. Yuan, S. and Fisher, C. (2009). “Really? She blicked the baby?” Two-year-olds learn combinatorial facts about verbs by listening. Psychological Science, 20: 619–26. Yuan, S., Fisher, C. L., Gertner, Y., and Snedeker, J. C. (2007). Participants are more than physical bodies: 21-month-olds assign relational meaning to novel transitive verbs. Paper presented at the 2007 SRCD Biennial Meeting, Boston, MA. Yue-Hashimoto, A. O.-K. (1980). Word play in language acquisition: A Mandarin case. Journal of Chinese Linguistics, 8: 181–204. Zamuner, Tania S. (2003). Input-Based Phonological Acquisition. New York: Routledge. Zamuner, Tania S. (2006). Sensitivity to word-final phonotactics in 9- to 16-month-old infants. Infancy, 10: 77–95. Zamuner, Tania S. (2009a).
Phonological probabilities at the onset of language development: Speech production and word position. Journal of Speech, Language and Hearing Research, 52: 49–60. Zamuner, Tania S. (2009b). The structure and nature of phonological neighbourhoods in children’s early lexicons. Journal of Child Language, 36: 3–21. Zamuner, Tania S., Gerken, LouAnn, and Hammond, Michael (2004). Phonotactic probabilities in young children’s speech production. Journal of Child Language, 31: 515–36. Zamuner, Tania, Kerkhoff, Annemarie, and Fikkert, Paula (2006). Acquisition of voicing neutralization and alternations in Dutch. In D. Bamman et al. (eds), Proceedings of the 30th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 701–12. Zamuner, Tania S., Kerkhoff, Annemarie, and Fikkert, Paula (2012). Phonotactics and morpho-phonology in early child language: Evidence from Dutch. Applied Psycholinguistics, 33(3): 481–99. Zangl, R., and Fernald, A. (2007). Increasing flexibility in children’s online processing of grammatical and nonce determiners in fluent speech. Language Learning and Development, 3: 199–231. Zanuttini, R. (1997). A Comparative Study of Romance Languages. New York: Oxford University Press. Zeugmann, Thomas and Zilles, Sandra (2008). Learning recursive functions: A survey. Theoretical Computer Science, 397: 4–56.

Zhang, Shi (1990). Correlations between the double object construction and preposition stranding. Linguistic Inquiry, 21: 312–16. Zhou, P. and Crain, S. (2011). Children’s knowledge of the quantifier Dou in Mandarin Chinese. Journal of Psycholinguistic Research, 40(3): 155–76. Zipf, George K. (1935). Psychobiology of Language: An introduction to dynamic philology. Cambridge, MA: MIT Press. Zonneveld, W. and Nouveau, D. (2004). Child word stress competence: An experimental approach. In R. Kager, J. Pater, and W. Zonneveld (eds), Constraints in Phonological Acquisition. Cambridge: Cambridge University Press, 369–408. Zucchi, A. (1990). The language of propositions and events: Issues in the syntax and the semantics of nominalization. Unpublished Ph.D. thesis, University of Massachusetts. Zukowski, A. (2001). Grammatical knowledge and language production in Williams syndrome. Paper presented at the Separability of Cognitive Functions: What Can be Learned from Williams Syndrome workshop, University of Massachusetts, August. Zukowski, A. (2004). Investigating knowledge of complex syntax in Williams Syndrome. In M. Rice and S. Warren (eds), Developmental Language Disorders: From phenotypes to etiologies. Mahwah, NJ: Lawrence Erlbaum. Zukowski, A. (2005). Knowledge of constraints on compounding in children and adolescents with Williams Syndrome. Journal of Speech, Language and Hearing Research, 48: 79–92. Zukowski, A. (2009). Elicited production of relative clauses in children with Williams syndrome. Language and Cognitive Processes, 24(1): 1–144. Zukowski, A. and Faigle, J. (2002). Tag questions in normally developing children. Unpublished paper, University of Maryland. Zukowski, A. and Larsen, J. (2004). Tags are learnable, aren’t they? Boston University Conference on Language Development, Boston, MA. Zukowski, A., McKeowen, R., and Larsen, J. (2008). A tough test of the locality requirement for reflexives. In H. Chan, H. Jacob, and E.
Kapia (eds), Proceedings of the 32nd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. Zuraw, Kie (2000). Patterned exceptions in phonology. Unpublished Ph.D. thesis, UCLA. Zwanziger, E. E., Allen, S. E., and Genesee, F. (2005). Crosslinguistic influence in bilingual acquisition: Subject omission in learners of Inuktitut and English. Journal of Child Language, 32(4): 893–909. Zwicky, Arnold (1970). A double regularity in the acquisition of English verb morphology. Papers in Linguistics, 3: 411–18. Zwicky, A. (1971). In a manner of speaking. Linguistic Inquiry, 2(2): 223–33. Zwicky, A. M. (1985). How to describe inflection. Berkeley Linguistics Society, 11: 372–86.

Index

Introductory Note References such as ‘178–​9’ indicate (not necessarily continuous) discussion of a topic across a range of pages. Wherever possible in the case of topics with many references, these have either been divided into sub-​topics or only the most significant discussions of the topic are listed. Because the entire work is about ‘developmental linguistics’, the use of these terms (and certain others which occur constantly throughout the book) as entry points has been restricted. Information will be found under the corresponding detailed topics. 2-​wh questions, 335–​7 Abney, S., 339, 438, 469, 650–​1 absolute gradable adjectives, 485, 487 absolute phonotactics, 28–​31 abstract knowledge, 32, 163, 169, 172, 187, 204–​5, 405–​6 abstract representations, 133, 163, 186–​7, 203–​4, 429, 681 abstract structures, 40, 86, 163, 414, 429, 472, 474 abstraction, 23, 29, 158, 169, 171, 174, 177–​8, 304 early, 163–​5, 177 accent, 82, 84 acute, 82–​3 grave, 82–​3 word, 82 accuracy, 119, 122, 129, 187, 733, 738, 740, 814 production, 791 accusative, 219, 261–​2, 416–​20, 426–​8, 430, 442, 444, 538 double, 219–​21 accusative pronouns, 419, 538 accusative subjects, 286, 430–​1, 433 ACD (antecedent-​contained deletion), 474, 489, 516–​18

ACDH (A-​chain Deficit Hypothesis), 188–​91, 194, 196, 198–​9, 203, 248, 257, 265 to Universal Phase Requirement (UPR), 189–​91 A-​chain Condition, 535–​7 A-​chain Deficit Hypothesis, see ACDH A-​chains, 202–​4, 248–​50, 252–​3, 258–​60, 262–​5, 276, 535–​6, 539 formation, 276, 539, 543 maturation, 248–​9, 255–​6, 262–​3, 276 non-​trivial, 249, 276 acoustic cues, 136, 149, 684 acoustic properties, 679–​80 acoustic saliency, 31, 38, 42 acquisition, 1–​3, 256–​7, 310–​11, 386, 390–​1, 664–​8, 672–​6, 697–​9; see also Introductory Note A-​movement, 231, 257, 260, 278 binding and coreference, 520–​46 comparatives and degree constructions, 463–​97 complements, 279–​309 cross-​linguistic, 386, 546, 810 disjunction, 554–​6 genitive case, 450–​1 Italian, 122, 534

964   Index acquisition (cont.) lexical, 36–​7, 42, 133–​4, 149, 153 long-​distance questions, 326, 332 modality, 370–​4 models, 109, 696, 712, 717, 723 mood, 375–​84 morphology, 87–​153 passive, 185, 248–​9 paths, 279, 295, 308 perfective, 604, 606, 609 perfective-​imperfective, 604–​9 phonetic category, 676, 679 phonological, 8, 11, 41, 153, 790–​1, 793, 807 phonotactic, 30, 32, 34–​5 possessives, 435–​59 P&P, 697–​9, 706, 708 prepositions and particles, 206–​29 questions, 310–​40 recursive possessives, 443, 459 second language, 797 semantics, 461–​629 sound systems, 5–​86 stranding, 213–​14 stress, 69–​78 syntax, 134, 155–​459 telicity, 600–​604 velars, 10 acquisitional evidence, 110, 221, 223, 228 actional passives, 181, 183, 186–​90, 198–​200, 203–​4, 242, 244, 252 actional verbs, 181–​4, 187–​91, 198, 200–​201, 205, 241–​7, 254 active clauses, 273, 276 active sentences, 180–​1, 183–​4, 190, 195, 200–​201, 203, 241–​2, 252 active voice, 179, 182, 243–​4, 246, 253 active/​passive alternations, 172, 183 activity verbs, 226, 238, 596 act-​out tasks, 168–​9, 205, 240, 244, 246, 292–​3, 545 additive constraint interactions, 735–​6 additive genetic variance, 774–​5 ADHD (attention-​deficit hyperactivity disorder), 785, 788 adjectival passives, 188–​92, 200–​201, 236, 250, 252–​3 adjectival resultatives, 105, 107, 109–​10

adjectives, 102, 104, 281, 283, 464–​6, 478, 482–​6, 659 attributive, 93–​5, 97–​8, 103, 108, 290 comparative, 479–​80, 763 gradable, see gradable adjectives novel, 478–​9 and passive participles, 191–​2 adjunct clauses, 283, 295, 299, 325–​6 adjunct extraction, 319 adjunct position, 247, 316, 318–​19, 327, 335, 337 adjunct questions, 319, 400 adjunct status, 180, 196, 199 adjuncts, 179, 204, 319, 323, 808–​9, 814 adnominal possessives, 435–​41, 443, 446, 448, 454 adolescents, 754–​5, 757, 763–​4 adpositions, 206–​10, 212 adult asymmetry, 57–​8, 62 adult data, 23, 103, 335, 348, 443 adult grammars, 43–​7, 50–​1, 54–​5, 57, 63–​4, 66, 407, 744–​5 adult knowledge, 563–​4 adult languages, 54–​5, 285–​6, 292–​3, 332–​3, 356–​7, 359, 375–​9, 425–​8 adult lexicon, 600–​601, 629 adult speech, 195, 249, 262, 359, 386, 578, 597, 623 adult systems, 46, 57–​8, 66, 77, 86, 103, 374, 385 adult transcribers, 58, 61, 85 adult-​directed speech, 134, 140–​1 adultlike performance, 81, 246, 527, 529, 533, 539, 543, 551 adultlike responses, 528, 540, 542, 561 adults, 186–​8, 201–​3, 493–​5, 509–​15, 517–​19, 614–​20, 624, 670–​1 English-​speaking, 12, 507, 515, 561–​2, 761 adverbial modification, 173, 487 adverbs, 352, 487, 572, 585, 702 sentence, 352–​3 adversarial teachers, 656–​7 affectedness, 185 affirmative questions, 314–​15, 322–​4, 758 affixes, 111–​12, 115–​17, 124–​5, 130, 305, 332, 424 morphological, 116, 126 verbal, 207 affricates, 8, 10, 33–​4, 811, 814–​15

Index   965 age, 106–​8, 143–​7, 168–​74, 250–​6, 362–​5, 393–​7, 442–​5, 775–​9 control groups, 363–​5 age groups, 130, 172, 175, 241–​2, 270, 275, 277, 493–​4 age ranges, 2, 107, 167–​9, 171, 173–​4, 176, 214, 363 agent by-​phrases, 199, 237, 241 agent nouns, 98, 104 agent subjects, 238, 242, 244–​5 agent theta-​role, 199 agentive-​transitive verbs, 230–​1 agents, 158–​62, 198–​200, 230–​1, 237–​42, 247, 254–​6, 298, 304 potential, 184, 247–​8 agreeing verbs, 348, 433–​4 agreement, 197–​8, 286, 347–​8, 357–​8, 374–​5, 413–​17, 421–​31, 433–​4 in adult language, 415–​17 and case, 414–​34 in child language, 422–​5 errors, 347, 423 explaining patterns, 425–​34 morphology, 423–​4, 433 relations, 193–​4, 197 specifications, 430–​1 subject, 381, 422, 425, 427, 430–​3 verb, 170, 362, 416, 427 AIC (Akaike’s Information Criterion), 774 AIH (Argument Intervention Hypothesis), 202–​3 Albright, A., 41, 118, 121–​2, 126–​7, 131, 741, 745 algorithms, 636, 641, 678, 685, 703–​4, 727, 731–​3, 738–​40 genetic, 703–​4, 710, 717 Alishahi, A., 178 allomorphs, 116–​17, 120–​1, 129–​30 multiple, 124, 128–​9 plural, 121, 131 allophones, 153, 676, 681–​2, 798–​9 allophonic cues, 135, 151 allophonic rules, 24, 798–​9 alternations, 111–​14, 116, 118–​19, 123–​4, 128–​9, 131, 796, 799–​801 active/​passive, 172, 183 dative, 164, 170, 176 mood, 367–​85

  morpho-phonological, 113, 126, 128–9, 132, 794, 796, 799, 806
  observed, 112, 121–2, 126
  phonological, 114, 117, 726
  phonotactically-motivated, 112–13
  voice, 112, 119, 179–205
  vowel-zero, 119, 127
alternatives
  lexical, 622–4, 626
  scalar, 621–3, 627
Altvater-Mackensen, N., 20–1
alveolar stops, 66, 124, 794, 797
ambient language, 12–13, 29–30, 32–3, 35, 37, 42, 69, 79
ambiguity, 574, 576, 584–5, 701–3, 705, 708, 710, 712–13
  resolution, 512–13
  structural, 729, 737–9, 741–2, 747
ambiguous sentences, 325, 511, 513, 713
A-movement, 196, 198, 202, 208, 230–78, 281
  acquisition, 231, 257, 260, 278
  in adult grammar, 231–6
  covert, 262
  non-passive, theoretical approaches, 256–7
  raising-to-subject, 264–71
  unaccusatives, 257–64
analogical knowledge, 126–7
analogy, morpho-phonological acquisition via, 126–7
anaphors, 272, 521, 523, 688, 706
Angluin, D., 639–40, 643–4, 647–8, 651–2, 654–8, 661
animacy, 168, 171–3, 245, 269–70, 273, 577
animate subjects, 168–9, 268–9
answers, fragment, 312–13
antagonistic faithfulness constraints, 802–6
antecedent-contained deletion, see ACD
antecedents, 234, 316, 489, 516–18, 524, 528, 541–5, 688–91
anticanonical verbs, 244
antonyms, 480–1
aphasia, 8, 527–8
applicative arguments, 171
approximate convergence, 641–3, 650
Archer, S.L., 31
Argument Intervention Hypothesis (AIH), 202–3

966   Index argument questions, 319 argument structure, 157–​78, 230, 248, 275, 304, 411, 415, 590 analysis, 176–​8 developmental findings, 165–​76 mapping/​linking, 162–​3 terms and preliminaries, 157–​60 thematic relations, 160–​1 theories, 163–​5 arguments applicative, 171 experiencer, 270–​1 external, 193, 198–​200, 204, 247, 256, 264, 266, 504–​6 implicit, 237 internal, 179, 192–​3, 195–​8, 259, 261–​2, 266, 503–​4, 532 semantic, 230, 571 theme, 161–​2, 196, 244, 256, 260 articulation, 21, 33, 116, 775–​6, 779, 782, 784–​5, 796–​7 artifacts, experimental, 385, 524–​5, 528, 531, 537, 542 artificial language learning experiments, 178, 634, 662 artificial languages, 35, 75, 139–​41, 149, 662, 670, 684 aspect coercion, 598–​9, 605, 609–​10 cross-​linguistic variation, 587–​94 grammatical, 587–​8, 590–​3, 595, 597, 599, 601, 603–​5, 608–​10 imperfective, 590–​3, 596, 598–​9, 601, 604–​10 learnability, 594–​6 lexical, 356, 587–​90 markers, 570, 594–​7, 599, 605 morphemes, 596, 598 morphology, 587, 599 perfective, 590–​1, 596, 598–​9, 605–​8 spontaneous speech (SS), 596–​600 aspect-​on-​the-​verb bias, 604, 609 aspectual classes, 588, 590, 592, 596, 599 aspectual primitives, 594–​5, 609–​10 aspectual tenses, 593, 606 assimilation, 63, 112–​13, 116, 130, 303 consonant-​to-​consonant, 55, 57

asymmetry, 181–2, 204–5, 321–2, 399, 409–10, 480, 505–6, 608–9
  adult, 57–8, 62
  subject–object, 397, 399, 405–7, 409–12, 766
atelic predicates, 589, 591, 597–600, 603, 605
atelic verbs, 596, 599–600
attention-deficit hyperactivity disorder (ADHD), 785, 788
attributive adjectives, 93–5, 97–8, 103, 108, 290
atypical populations, 2–3, 749–816
auditory deficits, 753–4, 784
autism, 299, 752, 758, 762–3, 767–9, 787–8
autosegmental phonology, 68, 82–3, 85–6
autosegmental representation, 81–6
aux-doubling structures, 322–3, 340
auxiliaries, 189, 191–2, 314–18, 341–2, 345–6, 593, 691–2, 762–4
  selection, 197, 259–60, 264
auxiliary verbs, 311, 314–18, 322, 341, 416, 691–2, 763
  main clause, 691–2
Avrutin, S., 356, 524–5, 527, 530–1, 538, 541–4
babbling, 9, 18
Babyonyshev, M., 193, 249, 256, 259–63, 266, 421
Bachrach, A., 626–7
Bantu languages, 250–1, 253, 255, 381
bare infinitives, 423
bare nouns, 570–1, 578–80
bare plural generics, 579, 583
bare plurals, 569–71, 573, 575, 578–80, 585–6
bare stems, 90, 94, 342–4, 355, 360–3, 424
bare verbs, 316, 377, 431, 433, 599
bare-stem compounds, 89–90, 94–5, 98, 103–4, 108
bare-stem endocentric compounding, 92, 98, 104–10
Barner, D., 484, 493, 500, 502, 508, 622–7
Bartlett, C.W., 784, 786
Barton, D., 14, 20, 22, 44, 811
base word order, 441, 449, 452, 454, 456
Basque, 94, 105, 225, 331
Bates, E., 81, 130, 343, 358, 396, 652–3, 779
Bauer, L., 90–1, 93–4
Bayesian inference, 664, 678–9, 690, 694–5

Index   967 Bayesian learners, 25–​6, 653, 666–​7, 676–​7, 690–​4 Bayesian modeling, 26, 653, 664–​7, 672–​9 examples, 679–​95 BCD (Biased Constraint Demotion), 743, 746 BE omission, 192–​3 Beard, R., 93–​7 Beard’s Generalization, 94–​5, 97, 103 behavior genetics, 773, 780–​3 belief verbs, 174 beliefs, 118, 174, 279, 300, 567 false, 282, 295–​8 Bellugi, U., 310, 314, 319, 322, 349, 441, 754–​5 be-​passives, 181, 184–​5, 204, 246 Berger, C.V., 534, 537 Berman, R., 171, 188, 242, 250–​1, 253, 397, 451–​2, 596 Bernhardt, B., 41, 46, 119, 124, 790, 792–​5, 808, 810 Bertoncini, J., 38–​40, 133 Berwick, R.C., 23, 249, 406, 640, 659, 668, 700, 710–​11 Bhatt, R., 470, 472–​3, 503 Biased Constraint Demotion (BCD), 743, 746 biases, 70–​1, 77, 293–​4, 514–​15, 601, 669–​70, 717, 743–​4 aspect-​on-​the-​verb, 604, 609 inductive, 664–​5, 667, 669, 671, 673–​5, 677, 679, 681 innate, 308, 601, 671 ranking, 124, 743–​4, 746 bigram model, 683–​4 Bijeljac-​Babic, R., 39–​40 bilingual children, 393, 397–​9, 405 binding and coreference, 523–​5 domains, 539, 544–​5, 787 theory, 521–​3 Chomsky, 521, 523 Binomial Test, 218–​22 biological maturation, 1, 242, 265, 278, 357 Bishop, C., 464, 478, 618, 753–​4, 767–​70, 775, 782, 789 bisyllabic words, 65, 137, 140–​1, 143, 145–​7, 149 bivariate heritability, 775–​7 Bloom, P., 291, 294, 315, 401, 411, 441–​2, 506–​7, 526

Boersma, P., 19, 23, 50, 74, 730–​5, 738, 743, 745 Boolean semantics, 554–​5 bootstrapping, 138, 177, 676, 780 semantic, 165, 178 syntactic, 164, 167, 177, 231, 600–​601 Borer, H., 188, 242, 246, 249–​50, 252–​3, 260, 440, 590 Bosch, L., 30, 38, 70, 147 not both interpretation, 561–​3 bound morphemes, 342, 455 boundaries, 13, 15, 134, 139, 683 clause, 80, 136–​7, 273, 552, 561 final, 591, 599, 605–​6 initial, 591 phrase, see phrase boundaries syntactic, 136, 138 word, 32, 34–​5, 136–​7, 139–​40, 142–​3, 147, 668–​9, 682–​4 Bowerman, M., 169–​70, 172–​3, 208, 212, 441–​2, 455, 595–​6 brain, 2, 16–​17, 42, 640, 699, 778 Braine, M.D.S., 19, 165, 169, 181, 240, 549, 614–​15, 618 branching, 92, 811–​12, 815 onsets, 45, 48–​9, 59–​61, 66 Brandone, A., 166, 569 Brent, M., 36, 134, 139, 149, 669 Buckley, E., 27, 33, 40, 44, 46, 52, 54 Budwig, N., 181, 184, 246, 418 Bulgarian, 218, 316 Burzio, L., 117, 183, 257–​8 Bybee, J., 31, 116, 125–​7, 131, 164, 368, 601 by-​phrases, 179–​80, 182–​4, 189, 196, 198–​200, 202, 245–​7, 252–​3 agent, 199, 237, 241 CAH (Canonical Alignment Hypothesis), 256, 277 and passives, 199–​202 candidate genes, 784–​8 canonical semantics, 231, 277 canonical verbs, 244–​5 Cantonese, 79, 132, 194, 282, 298–​9, 397, 765 capacities, 305, 499–​501, 519, 712, 740, 743, 747 processing, see processing capacity cardinality, 492, 499–​500, 502, 506, 658 caregivers, 170, 173, 252–​4, 274, 597, 699

968   Index Carey, S., 464, 481–​2, 499–​501, 506, 508, 600, 622, 687 case in adult language, 415–​17 and agreement, 414–​34 in child language, 417–​22 dative, see dative errors, 348, 418, 421–​2, 428–​9, 432, 434 pronoun, 418 subject, 421, 427 explaining patterns, 425–​34 genitive, 261–​4, 417, 420, 422, 427–​8, 436–​8, 442–​4, 447–​51 Catalan, 30, 193–​4, 357, 360–​2, 424–​5, 530, 533, 537–​9 categories functional, 34, 283, 285–​7, 306, 391, 413, 444, 448 grammatical, 670, 673 lexical, 13, 25, 144, 158 morphological, 115, 126, 592, 606 phonetic, 20, 25, 676, 679–​82 syntactic, 90, 150, 206–​7, 211, 326, 358–​9, 390, 688–​91 categorization, 164, 674, 676, 680–​1 lexical, 284 phonetic, 13, 680 causal relationships, 85, 304, 792 causative constructions, 183, 199 causative overgeneralizations, 172, 175 c-​command, 233, 292, 313, 517–​18, 521, 529, 541–​2, 758 CD (Constraint Demotion), 729–​31, 733, 738, 743 CEE (Clitic Exemption Effect), 530–​2 CG (Construction Grammar), 59–​60, 162 clusters, 59–​60 derived, 59–​61 outputs, 60, 62 strings, 59–​61 Chafetz, J., 180–​1, 186, 243–​4, 253–​4 chains, 188, 233, 543; see also A-​Chain Markov, 678, 685, 710 Chater, N., 653–​5, 659–​62, 665, 673, 676 Chien, Y.-​C., 353, 520, 523–​5, 530–​2, 540–​1, 544–​6, 571 Chierchia, G., 529, 548, 570–​2, 578, 613, 615, 621, 623

child grammars, 43–​7, 50–​1, 57–​8, 66–​7, 190, 193–​4, 331–​4, 407–​9 containing less than adult grammars, 50–​4 containing more than adult grammars, 54–​7 as possible grammars, 45–​50 toolkit, 50–​7 unexpected complexity, 64–​7 child language acquisition, see acquisition child phonology, 7 child productions, 11, 19–​20, 83, 419, 423 child speech, 115, 249, 255, 274, 312, 370–​1, 383, 386 child-​directed speech, 98, 153, 168, 175, 242, 251, 506–​7, 692–​4 CHILDES, 222, 224, 226, 274–​5, 315–​16, 378, 490–​1, 565 Chinese, 85, 221, 388, 391–​2, 396–​9, 544–​5, 571, 576–​7 children, 192, 545–​6 Mandarin, 78–​80, 206–​7, 211, 324, 471, 576, 592–​3, 596–​7 Chomsky, C., 291 Chomsky, N., 17–​18, 232–​6, 264–​5, 280–​1, 344–​5, 521, 667–​8, 696–​7 Chomsky Hierarchy, 636–​9, 646–​9, 658, 660–​1 Chomsky’s Binding Theory, 523 Christofidou, A., 449–​51 chromosomes, 754, 783–​4, 786, 788 Cimpian, A., 568–​9, 579–​82 Clahsen, H., 101, 122, 357, 423, 482–​3, 757, 761, 763 Clark, R., 701–​8 Clark’s GA model, 704–​5, 709, 713, 716 classes languages, 142, 645, 647–​8, 652, 657–​8, 662 recursive, 644, 648, 651, 660–​1 superfinite, 646–​8, 651, 653, 655, 657–​8 Classic OT, 726–​7, 730–​3, 735–​6 classical triggering model, 714, 718, 723 classification, 24, 208, 584, 588, 590 coarse, 642–​3 clausal comparatives, 472–​3, 491 clause boundaries, 80, 136–​7, 273, 552, 561 clause structure, 357, 415, 427 clauses, 192–​3, 280–​1, 285–​90, 295, 305, 330, 466–​7, 522 active, 273, 276

Index   969 comparative, 467, 473 complement, see complement clauses embedded, see embedded clauses finite, 292, 294, 306, 331, 345, 400–​401, 403, 599 infinitival, 219, 281, 290–​1, 293, 330, 335 lower, 266, 281–​2, 291, 293, 297–​300, 302–​3, 337 main, 399, 408–​10, 466, 468–​72, 474, 488–​91, 539, 762 matrix, 266–​7, 272, 276–​7, 315–​16, 325–​6, 334–​5, 338–​9, 518 nonfinite, 292, 301, 345, 355, 599, 762 relative, 214, 315, 337, 514, 552–​3, 757, 759, 764–​6 structure, 341–​66 subordinate, 306, 325, 394, 405, 409–​10, 466–​73, 488, 490 tensed, 280, 282–​3, 285, 294, 301, 307, 394, 518 Clements, G.N., 29, 47, 59, 63, 115, 637, 798, 808 clinical markers, 766 clinical treatment, 805, 813–​14 Clitic Exemption Effect, see CEE clitic pronouns, 530–​1, 533–​4, 537, 539 clitics, 115, 260, 360, 530–​4, 539–​40, 546, 765 object, 259–​60, 537, 539 reflexive, 259–​60, 263–​4 syntactic, 530, 532–​4 closed syllables, 72, 76, 115, 130 cluster reduction, 46–​7, 113, 793, 797 clustering, 23–​4 clusters, 32, 35, 46–​9, 59–​61, 63, 807–​8, 810–​12, 814–​15 CL, 59–​60 consonant, see consonant clusters left-​edge, 46, 59 liquid, 60, 809, 811 nasal, 808–​10 onset, 31, 808, 810–​12, 815–​16 sonorant, 808–​9, 812 true, 808–​9, 814 coarticulation, 135, 147, 149, 151 coda consonants, 34, 806, 810–​11 codas, 28, 34, 40, 52, 58, 72, 743, 810–​11 coercion, 598–​9, 605–​6, 608 aspect, 598–​9, 605, 609–​10

cognition, 477, 479, 673, 752–​4, 771–​2, 776, 778, 789 cognitive deficits, 751, 755 cognitive impairments, 755–​6, 763, 769 cognitive processes, 144, 665, 770, 772–​3 cognitive resources, 1, 405, 413, 426, 508, 617, 685–​6 CoLAG domain, 711, 718–​20, 722 Colledge, E., 777, 779, 782 color, 159, 478, 486, 488, 493, 507, 517–​18, 554 combinatorial speech, 394–​5, 398, 405, 407, 410 commission, errors of, 287, 343, 423, 702, 718 comparative adjectives, 479–​80, 763 comparative clauses, 467, 473 comparative constructions, 465–​7, 477–​8, 480, 485, 487 comparative expressions, 463–​6, 477, 479, 497 comparative markers, 464, 466, 470, 474, 477 comparative morphemes, 466, 469–​72, 476, 482, 490, 763 comparative morphology, 464, 477, 486, 497 comparative syntax, 217, 221 comparatives, 463–​79, 481, 483, 485–​91, 493, 495–​7 acquisition, 478, 497 clausal, 472–​3, 491 cross-​linguistic variability, 470–​2 deconstruction, 466 ellipsis and covert movement, 472–​4, 488–​91 equatives, 465, 475, 496–​7 and language learners, 476–​7 more v -​er, 482–​3 more v less, 480–​2 MP (measure phrases), 130, 235–​7, 466, 470, 490–​6 phrasal, 473, 491 semantic analyses, 467–​70 standards of comparison, 483–​5 complement clauses, 294–​5, 297, 304, 476, 490 finite, 291, 294 complement recursion, 284, 305, 308 complement types, 279, 288, 308 complementary distributions, 25, 34, 585, 797–​8, 804 complementation, 279, 281, 294–​5, 298, 305, 308–​9

970   Index complementizers, 207, 210, 219, 292, 295, 306, 331, 693 obligatory, 759–​60 prepositional, 210, 219–​20, 223, 228 complements, 208–​9, 237, 266, 411, 469, 533, 690, 702 acquisition, 279–​309 embedded, 337 false, 297, 301 and fast mapping, 303–​5 finite, 293–​4, 296, 300, 518 finite tensed clauses, 294–​302 infinitives and control, 273, 287, 290–​4, 300–​301, 303–​4, 306 nonfinite, 291, 293–​4, 518 raising structures, 294 recursion, 284, 305–​8 small clauses, 285–​90 tensed, 282, 287, 297, 299–​300, 303–​5 verbal, 236–​7, 273 wh-​complements, 302–​3 completion, 478, 551, 591, 593, 603, 605, 733 entailment, 591, 603, 606–​7, 609 complex predicates, 105, 107, 109, 289 complex questions, 326, 326–​9, 337 types, 326–​9 yes/​no, 691–​3 complexity, 64–​5, 305–​6, 322, 332, 574, 582–​3, 586, 638–​9 computational, 639, 646, 656, 660 phonological, 411, 815 semantic, 605–​6, 608–​9 unexpected, 64–​6 compositional semantics, 102, 210 compositional telicity, 590, 601–​4, 609 compound phrases, 90 compound word formation, see compounding compounding, 71, 89–​110, 116, 170, 224–​9 creativity, 91–​2, 94, 105 cross-​linguistic variation, 91–​6 review, 89–​96 synthetic -​ER, 96–​104 terminology, 89–​91 compounds, 89, 91, 94–​101, 103–​4, 106, 108–​10, 129, 308 bare-​stem, 89–​90, 94–​5, 98, 103–​4, 108

definition, 89–​90 endocentric, see endocentric compounds lexicalized, 93–​4 nominal, 93, 95 root, 90–​1 synthetic, 95–​9, 103 comprehension, 181–​2, 184, 188, 200–​201, 203–​5, 312, 405–​6, 608–​10 experiments, 326, 333, 335–​6, 408, 458–​9, 578, 585 methodology, 313, 334, 337 tasks, 107, 131, 240, 242, 251–​2, 278, 292, 355 comprehension-​based methods, 165–​6, 171 computable data presentations, 642, 651, 653, 655, 657, 661 computable positive stochastic data, 644, 646, 648, 652 computation, 18, 312–​13, 560, 615, 617, 633, 636, 674 computational approaches to parameter setting, 2, 696–​724 models, 701–​22 computational complexity, 639, 646, 656, 660 computational learning, 633–​4, 645–​62 computational modeling, 41, 698, 700–​701, 705–​6, 716–​20, 722–​3, 725–​6, 747 computational resources, 709, 711, 726 computer science, 23, 633, 649, 664 conceptual domains, 568–​9, 576 conceptual representations, 157, 166, 173, 576 conditional probability, 668, 687 conjunction, 142, 548, 550, 552, 554, 561–​3 conjunctive interpretation, 306, 550–​2, 554–​6, 558–​60 connectionism, 640, 666, 679 Conroy, A., 313, 321, 513–​15, 522, 524, 528–​9 conservatism, 176, 308, 709 grammatical, 215–​16, 280, 343 conservativity, 165, 169, 502, 504 consonant clusters, 30, 40, 130, 791, 807 word-​final, 30, 37 consonant harmony, 44, 54, 57, 66–​7, 793 consonant mutation, 132 consonants, 8–​9, 21–​2, 32, 39–​40, 54, 670, 797, 810–​11 coronal, 33, 804 dorsal, 803–​4

Index   971 final, 33, 142, 455, 807 initial, 34, 777, 804 lingual, 796–​7 velar, 21, 796, 798, 803 voiceless, 34, 113 word-​final, see final consonants conspiracies, 791, 807–​8, 811–​15 Constraint Demotion, see CD constraint hierarchies, 57, 802, 804, 807–​8, 815–​16 constraint interactions, additive, 735–​6 constraint ranking, 3, 43, 45, 54, 73, 727, 737 constraint reranking, 43, 46–​50, 73–​4 constraint violations, 727, 734–​5, 737, 742 constraint-​based approaches, 73, 116, 725, 801–​15 constraint-​based frameworks, 744, 793, 802, 806, 815 constraint-​based grammars, 123, 131, 747 constraint-​based learning, 124, 725–​6, 739, 747 constraint-​based phonology, 123–​5 constraints, 43–​7, 49–​50, 60–​1, 63–​6, 666–​7, 726–​30, 732–​7, 742–​4 lexical, 446, 449, 742 markedness, 45–​8, 51, 56, 74, 743, 746, 802–​6, 809–​14 phonotactic, 28, 36, 147 physiological, 58, 62 positional, 48, 50 positional faithfulness, 49–​50 pragmatic, 521, 524 prosodic, 125, 455 semantic, 202, 242, 245, 446, 449 soft, 664–​5 weighted, 50, 734, 736 construal coreference, 524, 527, 531, 537, 540 first-​order, 307 second-​order, 307–​8 Construction Grammar, see CG constructions, 216–​21, 239–​43, 253–​7, 262–​4, 266–​8, 271–​6, 465–​6, 475–​6 causative, 183, 199 comparative, 465–​7, 477–​8, 480, 485, 487 degree, 463, 465–​7, 469, 471, 473–​7, 479, 481, 495–​7

existential, 494, 585 particle, 106, 108, 211–​12, 223–​9 passive, 187, 239–​42, 247, 251–​6, 276 possessive, 436, 440–​1, 447–​53, 456–​7 raising, 234, 264–​5, 268 raising-​to-​object, 231, 257, 273 raising-​to-​subject, 256–​7, 264–​5, 271 rare, 458–​9 separable-​particle, 105, 107–​8, 216, 223–​4, 227–​8 construct-​state expressions, 90, 94 content words, 135, 149–​50, 172 contentful wh-​phrases, 327–​8, 330–​2, 335 context-​free grammars, 644, 651–​2, 660–​1 probabilistic, 651–​2, 676 context-​free languages, 650, 661 contexts, 17–​19, 442–​3, 485–​7, 508–​9, 513–​15, 536–​7, 616–​20, 794–​6 contrastive, 513, 534, 542 discourse, 173, 383, 463, 513–​16, 529, 580 experimental, 331, 371–​3, 377 modal, 377, 380 morphological, 114, 116, 121 non-​finite, 401, 408 obligatory, 379, 394, 403, 422–​4, 451, 457 syntactic, 158, 168, 174, 304, 508–​9, 537, 570 contextual scales, 623–​4 continuity, 203, 248, 343, 578, 614, 790 continuous speech, 133–​53 interactions and combinations between segmentation cues, 150–​2 segmentation cues at word level, 138–​50 segmentation of prosodic units, 136–​8 contradiction, 591, 613, 761, 814 contrastive contexts, 513, 534, 542 contrastive stress, 534 control, 106, 111, 270, 281–​2, 290–​4, 308–​9, 579–​80, 787 motor, 44, 50 non-​obligatory, 292–​3 obligatory, 293 structures, 269, 281, 291 verbs, 268–​70 words, 144, 146, 149 control groups, 186, 443, 483, 756 adult, 554 age, 363–​5

972   Index controllers, 281, 289, 292 convergence, 43, 50, 72, 74, 164, 640–​4, 647, 652–​3 approximate, 641–​3, 650 exact, 641–​4, 652–​3 converging evidence, 118, 168, 221 conversational implicatures, 475, 602, 613 co-​occurrence, 33, 140–​1, 433, 487, 672 Coopmans, P., 524, 530, 534–​6, 538–​9, 541–​2, 544 coordination, gestural, 62 copula, 192, 281, 345, 349–​50, 361, 373, 422 coreference, 313, 517–​18, 520–​5, 527–​33, 535–​7, 539, 541, 543 and binding, 523–​5 construal, 524, 527, 531, 537, 540 local, 522, 524–​8, 531–​2, 534, 537, 539–​40 pragmatic, 522, 524, 534 Coronal Backing, 793, 797, 799 coronal consonants, 33, 804 coronal stops, 63, 116, 119, 125, 794 coronals, 21, 54–​6, 60, 63, 794–​9, 804–​5 corpora, 122, 168–​9, 215, 218–​22, 352, 381, 383, 433 longitudinal, 105–​7, 215–​16, 218, 220, 222, 224, 227 correct grammar, 512, 653, 668, 733 correct values, 697–​8, 702, 708, 714, 722–​3 correlations, 106, 108, 122–​3, 224, 400–​401, 422, 430, 775 genetic, 771, 775–​80, 782 partial, 106, 108 co-​twins, 773–​4, 781 count nouns, 130, 571 counterexamples, 8, 93–​4, 217, 433, 567, 598 covariance, 774–​5 covert contrasts, 44, 59, 61–​2, 797 covert movement, 262, 472–​4, 476, 488–​9 Crago, M., 194–​5, 253–​5, 762–​3, 765–​6, 768, 786 Crain, S., 184, 245–​50, 324–​5, 334–​7, 528–​9, 549–​51, 555, 559 creativity, 91–​2, 94, 105 cross-​linguistic acquisition, 386, 546, 810 cross-​linguistic data/​evidence, 177, 249, 251, 256, 340, 385, 765, 768 cross-​linguistic generalizations, 98, 162, 341, 423, 768

cross-linguistic variation
  adnominal possession, 436–41, 454
  adpositions and particles, 207
  aspect, 587–94
  clitics and strong pronouns, 530–4
  compounding, 91–6
  developmental disorders, 765–8
  disjunction, 551–6
  lexical and grammatical aspect, 592–4
  prepositions and particles, 206, 209–12, 216, 223–4, 226–8
cues, 134–6, 138–9, 141, 146–7, 150–3, 507–8, 581, 668–9
  allophonic, 135, 151
  distributional, 128, 151, 497
  linguistic, 134, 579, 595
  phonotactic, 35–6, 135, 152
  pragmatic, 570, 578, 585
  prosodic, 81, 135–6
  segmentation, 136, 138, 145, 150–1
  statistical, 668, 670, 672
  transitional probability, 669, 671
culmination, 110, 588, 591, 599, 601, 605–6
cumulative effects, 540, 735–6, 745
Cunningham, J., 541–2, 544
Curtin, S., 31, 70, 74, 140, 143, 151
CV outputs, 50, 52–3
cyclic movement, successive, 310, 326–7, 330, 333, 335–6, 339–40
Daelemans, W., 75–6, 707, 716–17
Dahl, Ö., 588, 592
Danish, 209, 211, 217, 392
data presentations, 640, 642–4, 647–8, 650–2, 654–5, 657–9, 661–2
  computable, 642, 651, 653, 655, 657, 661
dative, 172–3, 208, 219, 290, 304, 422, 440, 447–9
  alternation, 164, 170, 176
  double-object, 164, 172, 174, 176
  prepositional, 164, 171–2, 176
Davidson, L., 74, 361–2, 424
de Swart, H., 590, 593, 598, 605
deaf children, 299, 577, 586, 765
declarative sentences, 350, 698
declaratives, 252, 286, 314, 349–50, 395, 401, 652, 698

Index   973 decoding, 230, 429, 712–​14, 784 default operators, 573, 582–​3 default values, 73, 392, 557–​8, 713, 720–​1, 723 defaultist accounts, 623–​4 deficits, 22, 751–​4, 767 auditory, 753–​4, 784 cognitive, 751, 755 language, 753, 786, 788 defining properties, 285, 553, 648, 811 definite determiners, 571, 579, 603 definite plural generics, 571, 580 definite plurals, 561, 571, 575, 578–​81 degree constructions, 463, 465–​7, 469, 471, 473–​7, 479, 481, 495–​7 Delay of Principle B Effect, see DPBE deletion, 44–​5, 49, 51, 516–​17, 800, 806, 812, 814 antecedent-​contained, see ACD final consonant, 799–​800, 807 Demirdache, H., 303, 331, 590 demotion, 47–​9, 57, 803, 805–​7, 814 Demuth, K., 51, 53–​4, 79–​80, 83, 126, 170–​1, 194, 250–​3 denotation, 110, 161, 470–​1, 487, 493, 504, 582, 589 deontic modality, 367–​8, 370–​5, 378–​80, 384–​5 dependencies, 281, 299, 539, 683, 763 non-​adjacent, 670, 672 referential, 520–​2, 527, 539, 543 derivational morphemes, 99, 101, 104, 360, 369 derivational suffixes, 94–​5, 97, 104 derivations, 232, 234–​6, 264–​5, 332, 612, 614, 794–​5, 799–​800 economical, 214, 332 derived CG clusters, 59–​61 derived word order, 441, 454, 456 determiner phrases, see DPs determiners, 221, 445, 449, 503–​5, 508, 570–​2, 574–​6, 579 definite, 571, 579, 603 nonconservative, 503–​4 personal, 470 determinism, 705, 722 deterministic learning, 665, 722–​3 deterministic triggering model, 698, 705 developmental data, 73, 76–​7, 85, 612, 614, 628, 717–​18, 722

developmental disorders, 751–​70, 788 conditions under which language impairments occur, 752–​8 cross-​linguistic variation, 765–​8 how language goes wrong, 758–​68 questions, 751–​2 work still to be done, 768–​70 developmental linguistics, 1, 68, 747, 751 developmental psycholinguistics, 3, 23, 341–​2, 355, 401, 408, 633–​63, 695 developmental stages, 52–​3, 167, 379, 423, 443, 453, 455, 457–​8 developmental trajectories, 118, 142, 152, 792 deviant errors, 759–​60 devices, 308, 374, 498, 653 morphological, 367, 370, 374, 378 devoicing, final, 118, 123, 129–​30, 741, 793, 796–​7 Di Sciullo, A. M., 532–​4, 539–​40 dialectic model, MacWhinney’s, 120–​2 Diesing, M., 573, 584 Diessel, H., 291, 294–​5 Dillon, B., 24–​5, 676, 681–​2 diminutives, 114–​17, 450 Dionne, G., 779–​81 direct negative evidence, 557 direct objects, 202, 211, 403, 533–​4, 590, 592, 600–​601, 603–​4 directional prepositional phrases, 488, 496, 590, 592, 600 directionality, 390, 481 head, 72, 390–​1, 677–​8, 719 right-​to-​left, 54–​5, 72–​3 Dirichlet process, 680–​1, 683 disabilities, 776, 780–​2 language, 780–​4, 788 disambiguation, triggers, 719–​23 discourse context, 173, 383, 463, 513–​16, 529, 580 discrimination, 14–​15, 20, 37–​8, 79, 499 abilities, 12–​13, 37, 499, 796 disjoint reference, 334, 527 disjunction, 560–​2, 564, 614–​15 acquisition, 554–​6 acquisition of scope, 556–​8 Boolean disjunction in preschool English, 549–​51

974   Index disjunction (cont.) cross-​linguistic variation and universality, 551–​6 exclusive interpretation, 547–​8, 550 interpretation, 506, 548, 550 negated, 550, 555–​6, 558 operator, 550, 554, 561, 614–​15 and semantic capacity, 558–​61 semantics and pragmatics, 547–​9 semantics of, 555, 564 disjunctive statements, 549, 551 disordered phonologies, 792, 807–​8, 811, 813 disorders, 752, 754–​6, 758–​65, 767–​9, 780, 783, 786, 790–​3 developmental, 751–​70, 788 language, 761–​2, 776, 782–​3 neurodevelopmental, 753 dissent, plausible, 528, 542 distributional analysis, 28, 146, 441, 454–​6, 597 distributional cues, 128, 151, 497 distributional evidence, 59, 61, 680 distributional information, 36, 134–​5, 138–​9, 142, 145–​7, 151, 178, 487 distributional properties, 150, 690 ditransitives, 158–​9, 164, 238 diversity, 496, 693, 704, 706 cross-​linguistic, 595, 610 domain-​general learning mechanisms/​ strategies, 177, 358–​9, 665–​6, 668, 690 domain-​general processes, 164, 668 domains, 283–​4, 569, 573–​4, 707, 709–​11, 713, 715–​17, 719–​20 binding, 539, 544–​5, 787 CoLAG, 711, 718–​20, 722 conceptual, 568–​9, 576 linguistic, 2–​3, 429, 586, 638, 726, 747 P&P, 699–​700, 716 smooth, 707–​8, 719 domain-​specificity, 666, 671, 788–​9 Donaldson, M., 464, 478, 480 dorsal consonants, 803–​4 double-​accusative construction, 219–​21, 223 double-​object datives, 164, 172, 174, 176 Down Syndrome, see DS DPBE (Delay of Principle B Effect), 520, 523–​6, 528–​34, 536–​7, 539–​40

as experimental artefact, 528–​9 as incomplete acquisition, 525–​6 as result of processing limitations, 526–​8 DPBE, extra strong, 536–​7, 540 DPs (determiner phrases), 216–​17, 224–​5, 227–​8, 231–​5, 272, 288–​9, 438–​41, 521 construction, 217–​19, 221, 223, 227 Dresher, B.E., 19, 21, 73, 697, 705, 707–​8, 721, 726 DS (Down Syndrome), 752–​5, 758, 761–​4, 768 Dutch, 29–​30, 70–​1, 354–​5, 375–​9, 523–​4, 526–​30, 532–​44, 605–​7 children, 130, 377, 527, 530, 532, 534, 536, 540–​3 dyslexia, 752–​4, 769, 784–​5, 787–​8 EARH (External Argument Requirement Hypothesis), 263–​4, 266 early grammars, 45–​6, 50, 54, 66–​7, 132, 313, 317, 407 Ebeling, K.S., 483–​4 EC, see empty categories ECM sentences, 536–​9 economical derivation, 214, 332 ECP (Empty Category Principle), 319, 325, 331 EDCD (Error-​Driven Constraint Demotion), 729–​33 Eisenbeiss, S., 250, 347, 357, 435, 439–​40, 444–​7, 449 Eisenberg, S., 291–​3, 396, 402–​3, 413 Eisner, J., 638, 725, 739 Elbert, M., 793, 799–​800, 807 Elbourne, P., 524–​6, 528, 531 elicitation tasks, 240, 246, 298, 443, 457, 459, 479, 791 elicited imitation, 213–​14, 349, 351, 402, 405, 411 elicited production, 97, 215–​16, 245–​6, 252, 322–​3, 363–​5, 759–​60, 764 tasks, 213–​14, 245–​6, 250, 252, 401, 759, 764, 766 test, 363, 365 ellipsis, 306, 312–​13, 472–​4, 488–​9, 491, 516–​17 verb phrase, 489, 516 Elman, J., 164, 652–​3, 664, 666, 673 embedded clauses, 264–​5, 267, 276–​7, 325–​8, 334–​5, 399, 409–​10, 544–​5 tensed, 328, 330, 394, 399, 408

Index   975 embedded complements, 337 embedded expletives, 274, 277 embedded infinitives, 381, 385 embedded passives, 275–​7 embedded predicates, 267–​9 embedded questions, 315–​16, 319, 337 wh-​questions, 316 yes/​no, 315 embedded sentences, 536, 539, 544 embedded SpecCP, 332–​3, 335–​6 embedded subjects, 265, 271–​4, 277, 291–​2, 399 empirical evidence, 118, 478, 486, 585, 792, 794 empirical questions, 73, 83, 86, 132, 177 empty categories, 291–​3, 319, 325, 334, 388, 391, 407–​8 Empty Category Principle, see ECP empty subjects, 291, 389, 409 encoding, 16–​17, 20–​3, 25, 157, 159, 166, 606, 611 endocentric compounds, 90–​1, 94, 105, 108–​9, 224–​5 recursive, 92, 109 end-​state grammars, 43, 46–​7, 50, 55, 57, 59 English-​learning children, 15, 31–​2, 107, 109, 126, 128, 134–​5, 143 English-​speaking adults, 12, 507, 515, 561–​2, 761 English-​speaking children, 252–​6, 316–​18, 320–​1, 396–​400, 407–​10, 525–​6, 529–​31, 765–​6 entailments, 160, 497, 505–​6, 589, 591, 606 completion, 591, 603, 606–​7, 609 entrenchment, 165, 175 entropy, maximum, 640, 734–​6 environmental variances non-​shared, 774–​5 shared, 774–​5 environments, 10, 28, 31, 37, 421–​2, 426, 618, 622 linguistic, 2, 12, 20, 121, 655, 696, 706, 720 EP, see elicited production epenthesis, 51, 94, 744 epistemic modality, 367–​8, 371–​4, 378–​80, 384 EPP (Extended Projection Principle), 237, 697 equatives, 465, 475, 496–​7 error patterns, 267, 347, 443, 793–​8, 800–​807, 813–​15 common, 794, 799, 808 error rates, 239, 315, 324, 382–​4

Error-​Driven Constraint Demotion, see EDCD errors, 239–​40, 315–​17, 336–​7, 382–​5, 426–​8, 730–​3, 759–​61, 764 case, see case errors of commission, 287, 343, 423, 702, 718 deviant, 759–​60 grammatical, 216, 718, 764 of omission, 216, 407, 423, 718 overgeneralization, 170, 172, 175 substitution, 381–​2, 802 symmetry, 504 Error-​Selective Learning, see ESL Escobar, L., 530, 533, 537, 539 ESL (Error-​Selective Learning), 746 Estonian, 225, 594 etiologies, 752, 754–​5, 782 E-​triggers, 720, 722–​3 event representations, 166, 174 event structure, 157, 159, 166, 191, 599 event time, 355, 587–​8 event types, 159, 207, 374, 588–​90, 598–​600 eventive interpretation, 185, 190 evidence, 121–​8, 163–​8, 249–​52, 554–​6, 658–​9, 752–​9, 762–​4, 796 acquisitional, 110, 221, 223, 228 developmental, 76, 612, 614 distributional, 59, 61, 680 empirical, 118, 478, 486, 585, 792, 794 experimental, 255, 299, 545, 670, 672 indirect negative, 165, 665, 667, 671, 699 indirect positive, 690 negative, see negative evidence noiseless, 642–​3 noisy, 639, 642–​3 unambiguous, 260, 715, 723 exact convergence, 641–​4, 652–​3 exact interpretations, 30, 510, 625–​7 exact semantics, 595, 625–​6 exactness, 480, 509–​11, 627 exclusive interpretation of disjunction, 547–​8, 550 exemplar models, 75, 678 existential closure, 573, 584 existential constructions, 494, 585 existential operator, 468, 573 experience, 358–​9, 512–​13, 554–​7, 634, 638–​41, 644–​5, 653–​4, 658–​60

experiencer arguments, overt, 270–1 experiencer role, 231, 247, 292 experiencer subjects, 242, 245 experiencers, 160, 184, 198–202, 241, 247–8, 267, 271, 292 experimental artifacts, 385, 524–5, 528, 531, 537, 542 experimental contexts, 331, 371–3, 377 experimental designs, 250, 443, 541, 546, 549, 562, 591, 617 experimental evidence, 255, 299, 545, 670, 672 experimenters, 99–100, 275–6, 329, 336, 480, 483–4, 500, 619–20 experiments, 213–14, 242–7, 275–6, 334–8, 482–3, 615–21, 670–2, 684–5 expletives, 387, 389, 392, 394, 400, 404–5, 407–8, 410 embedded, 274, 277 pronouns, 411–12 subjects, 237, 258, 266, 269–70, 273, 387, 389, 402 expressions, 90, 295–6, 367–70, 378, 494, 622–5, 627–8, 787–8 comparative, 463–6, 477, 479, 497 generic, 565–7, 569–70, 572–9, 583–5 logical, 561, 614, 623 measure, 507–8 morphological, 367, 369, 374, 378 parametric, 702–3, 706 quantificational, 498–9, 557 referential, 232, 575, 584 scalar, 612, 614, 617, 621, 623, 625, 628–9 Extended Projection Principle (EPP), 237, 697 External Argument Requirement Hypothesis, see EARH external arguments, 193, 198–200, 204, 247, 256, 264, 266, 504–6 external θ-role, 237, 257–8, 264 extra strong DPBE, 536–7, 540 extraction, 208–9, 301–2, 318, 326, 330–1, 337–8, 473 site, 310, 326, 337 eye movements, 175, 529, 620 failures, 22, 120–1, 141, 297, 322–3, 554, 615, 620–2 faire-par, see FP

false beliefs, 282, 295–​8 false complements, 297, 301 false-​belief reasoning, 297–​9 falsehood, logical, 618–​19 fast mapping, 303–​5 feature values, 20–​2, 391, 741 felicity, pragmatic, 278, 529, 621 Fikkert, P., 19–​21, 41, 44, 46, 55–​6, 59–​61, 70–​1, 73–​4 final boundaries, 591, 599, 605–​6 final consonants, 30, 33, 142, 455, 807 deletion, 799–​800, 807 omission, 793, 797, 799, 806 final devoicing, 118, 123, 129–​30, 741, 793, 796–​7 final obstruents, 112, 799–​800 final syllables, 65, 145–​6, 149 final vowels, 170, 369, 380–​3, 744 finite clauses, 292, 294, 306, 331, 345, 400–​401, 403, 599 finite complement clauses, 291, 294 finite complements, 293–​4, 296, 300, 518 finite languages, 646–​8, 660 finite tensed clauses, 294–​302 finite verbs, 344–​5, 347, 351–​3, 376, 592, 599 finiteness, 341, 343, 346–​52, 358, 360, 362, 762–​3 verb, 343, 345–​50, 365 first-​order construal, 307 first-​order patterns, 28, 32–​3, 42 fixed rankings, 55, 804, 808–​9 focal stress, 561–​2 focus operator, 559, 562 Fodor, J.D., 163, 249–​50, 700, 702, 706, 710–​15, 718–​23, 726 FP (faire-​par), 183, 185, 199 fragment answers, 312–​13 French, 48–​9, 95–​7, 145–​7, 213–​15, 218–​22, 352–​5, 548–​50, 552–​3 Québec, 48, 66 French-​learning infants, 38, 143, 145–​7, 150, 152 French-​speaking children, 62, 171, 193, 395, 615, 763, 766 frequency, 29–​30, 32, 125–​6, 253–​5, 324, 456, 576, 745–​6 effects, 455, 793

high, 106, 150, 412, 673 input, 170, 187, 253–4, 324, 457, 585 low, 288, 482, 577 relative, 25, 28, 218–19 Freudenthal, D., 343, 358, 360, 365, 401–2, 404, 406, 412 fricatives, 8, 10–11, 33–5, 60, 63, 121, 803–4, 812–14 Friederici, A.D., 27, 29–30, 36, 38, 70, 143–4, 147 Friedmann, N., 193, 202, 259, 760, 764, 766, 769 Friedrich, M., 21, 36, 144 front vowels, 33–4, 61, 797–9, 805 function words, 116, 149–50, 762, 777 functional categories, 34, 283, 285–7, 306, 391, 413, 444, 448 functional heads, 272, 416–17, 439 functional morphemes, 125–6 functional projections, 233, 357, 429, 440, 532 functional structure, 310, 578 functions, 81, 83, 158, 347–8, 570–1, 634–5, 638–40, 783 grammatical, 199, 201, 414, 416 learners as, 634, 639–40 likelihood, 25, 675 primitive recursive, 644, 657, 659 recursive, 644, 657, 659 subject, 179–80, 199–201 GA, see gradable adjectives GA (genetic algorithm) models, 703, 706, 709 Clark’s, 704–5, 709, 713, 716 Gahl, S., 406, 667, 690 Gathercole, S.E., 132, 464, 477, 479, 482, 495, 622 Gavarró, A., 396, 530, 533, 537, 539, 570 Gavruseva, E., 332, 338–9 GC (Governing Category), 525–6, 531 Gelman, S. A., 483–4, 499–500, 502, 565, 569, 575–9, 581, 626 gender, 95, 387, 389, 416, 425, 433, 440, 448–9 generalist genes, 776–7, 782 generalizations, 10–12, 180–1, 423–4, 566–8, 582–3, 664, 671–3, 801 cross-linguistic, 98, 162, 341, 423, 768 morpho-phonological, 112, 131

Namiki’s, 92, 105, 109 typological, 503–​4, 601 Generalized Modification, see GM generative approaches, 358, 427 generative grammar, 342–​3, 349, 357 generative linguistics, 17, 360, 668, 696–​724 generative phonology, 17, 705, 707, 792, 800 generative syntax, 416, 697 generic expressions, 565–​7, 569–​70, 572–​9, 583–​5 acquisition, 573–​82 definitions, 566–​73 learnability, 582–​6 generic interpretations, 571, 573–​4, 578–​81, 583–​6 generic knowledge, 566, 568–​9, 573, 577–​8, 582 generic language, 577–​8, 586 generic operator, 582–​3, 585 generic overgeneralization, 584, 586 generic properties, 569, 574, 578 generic sentences, 567–​9, 572–​4, 582–​3, 585 genericity, 565–​86 generics, 566–​8, 570, 572, 574–​8, 581–​6 bare plural, 579, 583 definite plural, 571, 580 genes, 703–​4, 706, 771–​2, 774–​89 candidate, 784–​8 generalist, 776–​7, 782 genetic algorithms, 703–​4, 710, 717 genetic correlations, 771, 775–​80, 782 genetic variances, 771, 775, 782, 789 additive, 774–​5 genetics, 771–​89 behavior, 773, 780–​3 integrating and evaluating findings, 789 molecular, 783–​9 of normal variation, 772–​80 reasons for studying genetics of language, 771–​2 genitive case, 261–​4, 417, 420, 422, 427–​8, 436–​8, 442–​4, 447–​51 acquisition of, 450–​1 genitive of negation, 259–​60, 262 genitive pronouns, 418, 420 genitive subjects, 419–​20 Gerken, L.A., 75–​6, 126, 137, 402, 411, 455, 662, 670–​1

German, 91–4, 122–7, 142–4, 208–12, 375–9, 436–41, 444–8, 454–6 Germanic languages, 91, 93, 105, 209, 211, 302, 308, 388 German-learning infants, 36, 137, 143–4 German-speaking children, 189, 195, 197, 251, 351 gestural coordination, 62 gestural timing, 58, 61, 66 get-passives, 181, 184–5, 190, 194, 198, 200, 204, 246 Gibbs sampling, 678, 685 Gibson, E., 23, 703, 707–11, 713, 716, 719–20, 723, 726 GLA (Gradual Learning Algorithm), 50, 74, 731–3, 735–6, 738–9, 742–3, 745–6 Gleitman, L.R., 164, 231, 279, 282, 303, 600, 660, 664 glide contrast, 61–2 glide substitution, 59, 61, 66 glides, 10, 59–60, 62, 808, 812 putative, 45, 61 gliding, 793, 797, 803–4 GM (Generalized Modification), 109–10, 641 Gnanadesikan, A., 43, 45–7, 744, 808 Gold, E.M., 640, 642, 644, 646–54, 656–9, 661, 700 Goldin-Meadow, S., 162, 577 Goldrick, M., 361–2, 424, 662 Golinkoff, R.M., 166, 181, 458 Gómez, R., 128, 662, 670–2 Goodluck, H., 215, 271, 282–3, 291–3 Goodman, J.C., 343, 677, 779 Gordon, P., 76–7, 99–103, 129, 166, 180–1, 186, 243–4, 253–4 Governing Category, see GC gradable adjectives (GA), 465, 467–70, 474, 476, 485–7, 489, 703, 705–7 absolute, 485, 487 gradable predicates, 466–7, 472 gradient grammaticality, 733, 735 Gradual Learning Algorithm, see GLA gradual learning curves, 733, 735, 745–6 grammar hypotheses, 702, 704, 707–8, 710–11, 714, 716, 730, 740 grammar-learning subproblem, 726–7, 730–1, 733, 736–7, 740, 746–7

grammars, 97–​100, 634–​40, 664–​5, 692–​4, 696–​720, 731–​4, 739–​47, 779–​80 adult, 43–​7, 50–​1, 54–​5, 57, 63–​4, 66, 407, 744–​5 child, 43–​7, 50–​1, 54–​5, 57–​8, 66–​7, 190, 331–​4, 407–​9 constraint-​based, 123, 131, 747 correct, 512, 653, 668, 733 early, 45–​6, 50, 54, 66–​7, 132, 313, 317, 407 end-​state, 43, 46–​7, 50, 55, 57, 59 generative, 342–​3, 349, 357 Harmonic, 123, 666, 732, 734 intermediate, 43, 47–​8, 64, 745–​6 multiple, 359, 392, 700, 707, 709 phonological, 10, 20, 43, 66, 123, 125, 128, 131 probabilistic, 410, 653, 733, 740, 745–​6 restrictive, 741–​4, 747 rogue, 44–​5, 59, 62–​4 target, 43, 47, 49, 64–​6, 697, 699–​700, 710–​12, 714–​19 universal, see UG weighted constraint, 734, 736 grammatical aspect, 356, 587–​93, 595, 597, 599, 601, 603–​5, 608–​10 grammatical categories, 670, 673 grammatical errors, 216, 718, 764 grammatical forms, 565, 576, 580, 585, 764 grammatical functions, 199, 201, 414, 416 grammatical hierarchies, 199–​200, 202–​3 grammatical knowledge, 1–​2, 215, 248–​9, 277, 414, 419, 513, 756–​7 grammatical markers, 566, 576, 581, 583 grammatical morphemes, 411, 454–​6, 458, 766–​7 grammatical representations, 2, 406, 742, 767 grammatical rules, 126, 516, 525, 692 grammatical structures, 229, 414, 672, 720 grammaticality, 178, 265, 277, 387–​8, 521 gradient, 733, 735 grammaticality judgments, 277, 341, 364, 769 tasks, 176, 323, 363 granularity, 320, 595 grave accent, 82–​3 Graziano-​King, J., 464, 482 Greek, 188–​9, 191, 379, 436–​9, 449, 454–​6, 530, 765–​6 Greek-​speaking children, 293, 380

Greenberg, J.H., 8, 659 Griffiths, T., 26, 640, 665, 674–6, 678, 783 Grimshaw, J., 159, 162, 165, 470, 526, 529–30 Grodzinsky, Y., 181–2, 198–9, 241, 246–7, 249, 523, 526–8 Gropen, J., 170, 173, 176 Gruber, J., 160, 303, 346, 418 Gualmini, A., 490, 506, 512–13, 549, 551, 558–9, 615, 621 Guilfoyle, E., 345, 408, 429 Gupta, P., 75–7 Haegeman, L., 190, 200, 345, 387, 399, 408–9 Halberda, J., 493, 500–502, 556, 622 Hale, K., 18, 44, 46, 127, 160–1, 389, 790 Halle, M., 17–18, 21, 112, 114, 116, 130, 347, 792 Harmonic Grammar, see HG harmonic scales, 803, 808 harmony, 54–6, 66, 113, 115, 734 consonant, 44, 54, 57, 66–7, 793 vowel, 66–7 Hauser, M., 140, 662, 672, 693 Haworth, C.M.A., 776–8, 781–2 head directionality, 72, 390–1, 677–8, 719 head nouns, 95–6, 104, 108 hearers, 371, 406, 527, 531, 548, 611–12, 618, 621 Hebrew, 90, 94–5, 188–9, 389, 440–1, 451–2, 455–6, 765–6 Hebrew-speaking children, 253, 259, 760 height, 463–4, 466–9, 475, 481, 484–5, 490 Heim, I., 467, 469, 473–6, 522, 524, 531–2, 543, 573 Hendriks, P., 527, 529, 608 heritability, 770–3, 775–6, 781–2 bivariate, 775–7 high, 776, 782 low, 776 moderate, 775, 777, 779 non-zero, 773, 789 heuristics, 600–601, 604, 610, 674, 678, 703, 707, 713–14 innate, 609–10 search, 706, 723 transitivity, 600, 604, 609 HG (Harmonic Grammar), 123, 666, 732, 734–6, 738, 746 hidden objects, 372, 501

hidden structure, 726, 731, 736–​42, 747 hierarchies, 11, 13, 480, 727–​8, 730, 807, 809, 812 constraint, 57, 802, 804, 807–​8, 815–​16 grammatical, 199–​200, 202–​3 thematic, 162, 245, 271 high tone, 80, 82–​3 high-​ranking markedness, 46–​7 Hiramatsu, K., 323–​4 Hirsch, C., 182, 187, 198–​201, 264, 267, 271 Hirst, D.J., 86, 370–​3, 616 Hodgson, M., 602–​3, 607, 610 Hoekstra, T., 344–​5, 352, 354–​5, 376, 379, 598 Höhle, B., 70, 128, 143–​4, 150, 458 Hollebrandse, B., 284, 306–​7 Holmes, U.T., 51–​3, 65 Horgan, D., 181, 240–​1, 249–​50 Horning, J.J., 640, 644, 647, 649–​56, 661 Hornstein, N., 213, 234, 292, 689 Huang, C.-​T. J., 194, 206, 325, 388, 545–​6, 620, 624–​5, 664 Hungarian, 74, 121, 338–​9, 530, 537–​8, 553 Huttenlocher, J., 172, 203 Hyams, N., 196, 256, 344–​5, 354–​5, 375–​9, 401–​3, 407–​8, 598–​9 hypothesis space, 504, 665–​9, 673–​7, 679–​80, 683–​4, 689–​90, 692, 694–​5 structured, 634, 668 iambic words, 144, 151 Icelandic, 209, 218, 220, 352–​3, 355, 375, 530, 544–​5 ideal learner, 24, 653–​5, 659–​60, 662, 673, 683–​6 identification, 520, 529, 532, 543, 642–​4, 646–​59, 785, 793 picture, 239, 250, 252 IDL (Inconsistency Detection Learners), 738–​40 ill-​formed sentences, 635, 637 impairments, 753, 755, 762, 768–​70, 782 cognitive, 755–​6, 763, 769 reading, 764, 784–​5 selective, 753–​5 specific language, see SLI imperatives, 254, 286, 378–​9, 383, 393, 395, 401, 403

imperfective aspect, 590–3, 596, 598–9, 601, 604–10 acquisition, 604–9 imperfective forms, 592–3, 607 imperfective paradox, 591, 598 impersonal passives, 238, 252, 388–9 implicational relationships, 220, 228, 803–4 implicatures, 301, 475, 493, 612–15, 617–19, 622, 624–5, 629 pragmatic, 301, 548, 551, 558, 564 scalar, 373, 493, 548, 550, 611–29 and semantics, 625–7 implicit arguments, 237 implicit comparisons, 469, 485–6, 497 implicit learning, 787–8 inanimate objects, 268, 270 inanimate subjects, 169, 268–9, 277 inconsistency, 729–31, 733, 738–9, 741 detection, 73, 729, 738–9, 741 Inconsistency Detection Learners, see IDL indefinite singulars, 570, 581 indefinite subjects, 516, 573, 581, 584 indefinites, 512, 516, 571, 573, 575, 584 indirect negative evidence, 165, 665, 667, 671, 699 indirect objects, 179, 290, 416, 590, 719 indirect positive evidence, 690 individual variation, 319, 444, 457, 756 inductive biases, 664–5, 667, 669, 671, 673–5, 677, 679, 681 infancy, 23–4, 133, 135, 137, 139, 141, 143–5, 147 early, 78, 152, 753 infant-directed speech, 69, 134, 141 infants, 12–15, 29–40, 69–71, 78–80, 133–47, 149–53, 499–501, 668–72 German-learning, 36, 137, 143–4 Japanese, 15 speech perception, 7, 12–13, 15, 17, 27–42 young, 13, 68–9, 85, 136–8, 142, 150, 153, 680 infelicity, pragmatic, 249, 617–19 inferences, 168, 611–14, 620, 622–3, 628, 672, 677, 741 Bayesian, 664, 678–9, 690, 694–5 pragmatic, 611–12, 614–17, 620, 627–9 infinite languages, 647–8 infinitival clauses, 219, 281, 290–1, 293, 330, 335

infinitival complements, 273, 287, 293–​4, 300–​301, 303–​4, 306 infinitives, 280, 282–​3, 285, 287, 290, 301–​3, 376–​80, 423–​4 bare, 423 embedded, 381, 385 morphological, 360 optional infinitive stage, 341, 346, 357, 363, 365, 423 inflection, 128–​9, 349, 388–​9, 400, 408–​9, 415, 595, 598 irregular, 101 verb, 128, 388–​9, 598 inflectional morphemes, 457, 762, 777 information value, 403, 410–​12 inherently passive verbs, 257–​64 initial boundaries, 591 initial consonants, 34, 777, 804 innate biases, 308, 601, 671 innate heuristics, 609–​10 innate knowledge, 29, 163, 177, 311, 557, 584, 669, 772 input frequency, 170, 187, 253–​4, 324, 457, 585 input segments, 806, 812 input sentences, 700, 703–​5, 707–​9, 712, 715–​16, 718 intelligence, nonverbal, 775, 778, 791 intended referents, 686, 690–​1 intensity, 136, 478, 529 intermediate grammars, 43, 47–​8, 64, 745–​6 intermediate SpecCP, 327–​8, 330–​1, 335 internal arguments, 179, 192–​3, 195–​8, 259, 261–​2, 266, 503–​4, 532 internal objects, 171, 195, 720 internalized underlying representations, 791, 798–​9, 806 interpretation, 489, 496, 513, 520, 530, 534, 558–​60, 582–​3 conjunctive, 306, 550–​2, 554–​6, 558–​60 eventive, 185, 190 exact, 30, 510, 625–​7 generic, 571, 573–​4, 578–​81, 583–​6 long-​distance, 327, 336–​7 non-​adult, 559–​60 non-​reflexive, 534, 540 pragmatic, 611, 615, 620, 628

reflexive, 524, 530, 532–8, 540, 543 semantic, 105, 109, 225, 328, 689 interpretive parsing, 73, 160, 624, 737–8, 740, 742 intervocalic voicing, 116, 123–4, 741 intonation, 68, 77–83, 85–6, 143, 476 development, 80–1 intonational phonology, 81–4 intransitive verbs, 168, 172, 175, 190, 238, 257, 416, 600–601 inventories phonological, 7–26, 42 receptive, 16, 18–19, 21, 23–5 inventory development, 7–8, 15, 17, 21 inverse scope, 511, 514–16, 556–7 inversion, 314–15, 318–19, 321, 338, 340, 349–51, 355 subject-auxiliary, 311, 314, 318, 344, 349–50, 355 inversion judgments, 350–1 irrealis, 367–9, 374, 376–8, 381, 384–5 markers, 369–70, 374, 385 irregular inflection, 101 irregular morphology, 117–18 irregular nouns, 99–100 irregular plurals, 99–101 irregular verbs, 117, 119 Isobe, M., 107, 109, 222, 504 isolated words, 133–4, 136, 144, 147, 149 isomorphism, 511–15 Italian, 191–4, 259–60, 320–3, 376–9, 385–9, 395–6, 398–400, 534 acquisition, 122, 534 Italian-speaking children, 189, 192–3, 259–60, 377–9, 396, 400, 531, 534 Jackendoff, R., 109, 158, 160, 162, 164, 234, 245, 543 Jaeggli, O., 232, 237, 247 Jain, S., 640, 655, 657 Jakobson, R., 7–13, 15, 18–19, 41, 43, 45, 51, 744 Jakubowicz, C., 331–3, 363, 520, 530, 536, 766 Japanese, 78–80, 107–9, 416, 422, 452–5, 515–16, 552–7, 562–3 children, 15, 52, 80, 326, 397, 403, 554–7, 562–3 Jensen, B., 312–13, 797

Jesney, K., 50, 64, 115, 124, 743, 745–​6 judgment tasks, 214, 278, 482, 486, 489, 491, 616, 618–​19 truth value, 395, 401, 489, 524 judgments, 277, 350, 483–​6, 490, 548, 550, 617, 621 grammaticality, see grammaticality judgments inversion, 350–​1 prevalence, 567–​8 truth value, 2, 107, 239, 275, 313, 512, 551, 602 Jusczyk, P.W., 30–​2, 35, 37–​9, 69–​70, 133–​7, 141–​3, 147, 151–​2 Kannada, 169, 512, 514 Kaye, J.D., 23, 73, 76, 163, 697, 705, 721, 726 Kayne, R., 183, 213, 219–​21 Kearns, K.P., 639–​40, 644, 648, 791 Kenstowicz, M., 114–​15, 117, 127, 409, 790, 792, 808 Kernan, K., 121, 363, 754 Khmer, 93, 225 Kindersprache, 8, 11–​13, 18 kinship terms, 439, 445–​6, 449 Kiparsky, P., 56, 101–​2, 112, 116, 125, 131, 811, 813 Kisseberth, C., 113, 127, 790, 811 knowledge, 27, 29–​30, 166–​8, 175–​8, 404–​7, 411–​15, 547–​9, 756–​8 abstract, 32, 163, 169, 172, 187, 204–​5, 405–​6 adult, 563–​4 analogical, 126–​7 generic, 566, 568–​9, 573, 577–​8, 582 grammatical, 1–​2, 215, 248–​9, 277, 414, 419, 513, 756–​7 innate, 29, 163, 177, 311, 557, 584, 669, 772 language-​general, 29–​30 linguistic, 1, 203, 311, 316, 324, 332–​3, 557, 664 number, 500–​502 phonological, 36–​7, 42, 121 phonotactic, 28–​33, 35–​6, 42, 121–​2 pragmatic, 312–​13, 546 prior, 1, 3, 36, 121–​2, 667, 687, 717 semantic, 312, 501, 550–​1, 555, 564 syntactic, 177, 312–​13, 412, 670 Korean children, 321, 544, 765 Krämer, I., 345, 494–​5, 512, 608 Kraska-​Szlenk, I., 114–​15

Kratzer, A., 109, 161, 522, 532 Kuczaj, S., 122, 314, 370, 490 Kuhl, P.K., 13, 24, 79, 83 labels, 14, 140, 196, 392, 484–5, 583, 590, 686–8 novel, 483, 556, 622 labials, 10, 15, 21, 55, 60 Laird, N.M., 639–40 Landau, B., 164, 292, 600, 757 landing sites, 265, 271, 310, 320, 340, 517–18 language abilities, 106, 753, 755, 776, 781–2, 789 language acquisition, see acquisition language change, 717, 722 language deficits, 753, 786, 788 language disability, 780–4, 788 language disorders, 761–2, 776, 782–3 language impairments, 342, 752–3, 755, 768, 772, 784, 786–9 language-dedicated knowledge, 753, 755–6 language-general knowledge, 29–30 Lappin, S., 640, 646, 648, 656–8, 660–1 Lardiere, D., 95, 97 Lasnik, H., 219, 234, 271, 273, 281, 345, 664 Lasser, I., 356, 376 learnability, 2, 31, 33, 428–9, 557–8, 649–50, 699, 725–6 criteria, 700, 710 generic expressions, 582–6 perspective, 25, 366, 441, 455, 585 stress, 76–8 theories, 163–5 learners, 120–7, 644–8, 650–8, 673–80, 698–703, 707–12, 730–3, 736–46 as functions, 634, 639–40 learning, see also Introductory Note definitions, 640–5 machine, 23, 664, 703, 717 learning algorithms, 584, 586, 666, 709–10, 727, 731, 734–6, 738 Gradual Learning Algorithm, 50, 74, 731–2 TLA (Triggering Learning Algorithm), 708–14, 716–19 learning criteria, 634, 640–2, 647 learning data, 727, 729–30, 732–3, 736–9, 741–4, 746

learning frameworks, 643, 650, 653, 655–​6, 659–​61 probabilistic, 647, 656 learning models, see models learning problems, 573–​4, 585, 643, 645, 648–​9, 679, 681–​2, 740 left periphery, 286, 310, 318, 320, 339–​40, 388, 409 legal phonotactics, 29–​30, 36, 38, 118 Legate, J., 359, 365, 666, 692 Legendre, G., 123, 424, 431, 666, 732, 734–​6, 745 Leonard, L., 59, 346–​8, 363, 430, 765–​7, 798 Leslie, B., 568, 582–​3 Levelt, W., 19, 21–​2, 38–​9, 41, 54–​6, 743, 745, 798 Levin, B., 110, 157, 159–​62 Levinson, S., 613–​14, 623, 626 lexical acquisition, 36–​7, 42, 133–​4, 149, 153 and phonotactics, 36–​7 lexical alternatives, 622–​4, 626 lexical aspect, 356, 587–​90 lexical categories, 13, 25, 144, 158 lexical categorization, 284 lexical constraints, 446, 449, 742 lexical factors, 293–​4, 545, 590 lexical items, 36, 78, 440–​1, 480–​1, 487–​8, 680–​1, 684, 686 lexical learning, 165, 177, 197, 280, 294, 600, 742, 744 Lexical Phonology, 101, 116, 125 lexical pitch, 80, 82 lexical properties, 81, 590, 701 lexical representations, 17, 23, 41, 111, 161, 741, 799 lexical semantics, 160, 212, 463, 509, 548–​9, 563, 623, 629 lexical subjects, 219, 394–​5, 402–​3, 412 lexical tones, 78–​9, 81–​2 lexical variation, 294, 308, 553 lexicalized compounds, 93–​4 lexicon, 16–​17, 20–​1, 130–​1, 133, 235–​6, 739–​42, 744, 746–​7 adult, 600–​601, 629 LF (logical form), 233–​4, 265, 279, 332, 517–​18, 546, 572, 583–​5 LFCD (Low Faithfulness Constraint Demotion), 743, 746

Lieven, E., 165, 170, 458 light verbs, 170, 265 Lightfoot, D., 163, 554, 689, 720, 722 likelihood, 25, 28, 94, 105, 125, 136, 401, 403 functions, 25, 675 maximization, 739–40 limited processing capacity, 98, 539–40 limited processing resources, 325, 525, 527 linear order, 96, 272, 486, 512, 691 linear rules, 691–3 lingual consonants, 796–7 linguistic cues, 134, 579, 595 linguistic determinism, 595 linguistic domains, 2–3, 429, 586, 638, 726, 747 linguistic environment, 2, 12, 20, 121, 655, 696, 706, 720 linguistic experience, 31, 213, 555, 628, 667 linguistic knowledge, 1, 203, 311, 316, 324, 332–3, 557, 664 linguistic maturation, 248–9, 251 linguistic stimuli, 78, 167, 621 linguistic structures, 27, 97, 166, 329, 496, 585, 761, 792 linguistic theory, 1–2, 310–12, 318–19, 322, 326, 386, 465, 496–7 linkage, 783, 785–6 linking rules, 161, 199 liquid clusters, 60, 809, 811 liquid properties, 45, 61–2 local coreference, 522, 524–8, 531–2, 534, 537, 539–40 local maxima, 710–11, 714, 716 local movement, 328, 333–6 local negation, 553, 556–8, 562 local subjects, 533, 539, 546 locality constraints, 202, 298, 325, 327, 337 locative verbs, 175 logical connectives, 547–64 logical expressions, 561, 614, 623 logical falsehood, 618–19 logical form, see LF logical objects, 190–1, 193–4, 196–8, 244–5 logical scales, 613–14, 623–4 logical subjects, 241, 245, 247 logical words, 549, 551, 554, 564 Lohndal, T., 332–3 Lombardi, L., 57, 113, 123, 524

long passives, see LP long vowels, 52, 72, 142 long-distance interpretation, 327, 336–7 long-distance movement, 318, 321, 328–30, 332–3, 337 long-distance questions, 310–11, 321, 326–40 long-distance reflexives, 546 longitudinal corpora, 105–7, 215–16, 218, 220, 222, 224, 227 long-term memory, 13 Lorusso, P., 193, 259, 396 losers, 727–30, 732, 735 Low Faithfulness Constraint Demotion (LFCD), 743, 746 low tone, 79, 82–3 lower clauses, 266, 281–2, 291, 293, 297–300, 302–3, 337 lower predicates, 268–9 lower verbs, 281, 290–1, 297 LP (long passives), 179–84, 186, 189, 194, 198, 248 Lukaszewicz, B., 118–19, 127, 812 Lust, B.C., 291, 410, 529, 545–6, 549, 561 machine learning, 23, 664, 703, 717 Macken, M., 11, 19–20, 44, 63–4 MacWhinney, B., 118–21, 124, 127, 294, 315, 378, 464, 490–1 MacWhinney’s Dialectic Model, 120–2 main clause auxiliary verbs, 691–2 main clause subjects, 291–2, 536, 539–40, 544 main clauses, 399, 408–10, 466, 468–72, 474, 488–91, 539, 762 main verbs, 224–5, 270, 316, 416, 423, 536, 593, 677 Mainstream Generative Grammar, 356–8 Malagasy, 180, 195–6, 199, 238, 256, 473 Mandarin Chinese, 78–80, 206–7, 211, 324, 471, 576, 592–3, 596–7 Mandarin-speaking children, 325, 580 Manetti, C., 181–2, 189, 201, 204 manner-of-motion verbs, 110, 226 Manzini, R., 512, 525, 544, 697, 700 map experience, 634, 639 mappings, 82–3, 161–2, 172–3, 573–5, 584–5, 598, 669–71, 686–7 fast, 303–5

mappings (cont.) object, 140, 686 semantics, 688–9, 694 Marantz, A., 160–1, 179, 259, 347 Maratsos, M.P., 122, 173, 181, 240–2, 246, 248, 250, 254 Marchman, V., 127, 130, 181, 184 Marcus, G.F., 122, 243, 661, 672, 699, 772, 778 marked values, 721, 723 markedness, 64, 608, 697, 743, 745, 803–4, 806, 811–12 constraints, 45–8, 51, 56, 74, 743, 746, 802–6, 809–14 high-ranking, 46–7 universal, 745 markers, 82, 370–1, 379, 422, 443, 453, 580, 584–6 aspect, 570, 594–7, 599, 605 clinical, 766 comparative, 464, 466, 470, 474, 477 grammatical, 566, 576, 581, 583 mood, 377, 382–3 plural, 93, 101, 103, 121, 125 possessive, 338, 437, 444, 446 scope, 327–8, 331, 335–6 subject, 80, 83 topic, 570, 719 marking, 342–4, 346–7, 355–6, 430, 447–51, 592–6, 598–9, 608–10 Markman, E.M., 568, 579, 582, 622, 671, 687 Markov chains, 678, 685, 710 masculine, 30, 93, 95–6, 115 maternal input, 149 mathematical objects, 635, 638 matrix clauses, 266–7, 272, 276–7, 315–16, 325–6, 334–5, 338–9, 518 matrix objects, 271, 273, 277 matrix sentences, 286, 297, 305 matrix SpecCP, 327–8, 330, 332, 334, 339 matrix subjects, 194, 231, 264–5, 268–9, 271–2, 297–8 matrix verbs, 269, 271, 273–4, 286, 288, 291, 297, 299–301 Mattys, S., 27, 31–2, 35, 38, 135, 147, 151–3 maturation, 178, 202, 246, 250–1, 357, 710 A-chains, 248–9 biological, 1, 242, 265, 278, 357 hypothesis, 248–51

maxima, local, 710–​11, 714, 716 maximum entropy, 640, 734–​6 McCarthy, J., 114–​15, 117, 123–​5, 743–​4, 777, 800, 806, 811 McClelland, J., 127, 640, 666, 678–​9 McDaniel, D., 214–​15, 291, 303, 326–​8, 534 McKee, C., 517, 525–​6, 529–​32, 541, 551, 616 mean length of utterance, see MLU measure expressions, 507–​8 measure phrases, see MP memorization, 99, 120, 122, 136, 149, 626 memory, 23–​4, 26, 499, 501, 709, 715–​16, 786, 788 limitations, 685 long-​term, 13 numerical, 777–​8 short-​term, 754–​5, 767 Menn, L., 8, 17, 19, 54–​6, 131, 800, 811 mental age, 754–​7, 764 mental representation, 118, 133, 298, 531, 543 mental retardation, 752, 754–​6 mental verbs, 241–​2, 246, 282, 303–​4 Merchant, J., 209–​10, 312–​13, 387, 470, 473, 729–​30, 741 Mersad, K., 135, 140–​1, 149, 151–​2 Messenger, K., 184, 187, 201, 203–​5, 498 methodologies, 2, 10, 12, 14, 37–​8, 325–​6, 709, 711 metrical organization in stress development, 73–​6 metrical phonology, 71–​3, 76, 86 metrical stress, 68, 72, 707–​8 metrical structures, 73, 75, 86 metrical theory, 73, 85 and learnability of stress, 76–​8 microparameters, 390–​1 middle voice, 196–​8, 204 minimal constraint reranking, 43, 46–​50 minimal pairs, 11, 13–​15, 21–​2, 197, 456, 681, 791 Minimalism, 265, 286, 390 Minimalist Program, 214, 232, 235, 697 Mintz, T., 670, 673, 678 misalignment, 199–​200, 203 misanalysis, 59–​62, 420 mispronunciations, 14, 38, 44 missing subjects, 388, 408, 413

MLU (mean length of utterance), 106, 122, 254, 314, 363–4, 393–4, 396, 444–5 modal auxiliaries, 371 modal contexts, 377, 380 Modal Reference Effect (MRE), 354, 376 modal verbs, 282, 368, 370–3, 379–80, 468, 677 modality, 367–74, 376–81, 384–5, 671–2 acquisition, 370–4 deontic, 367–8, 370–5, 378–80, 384–5 epistemic, 367–8, 371–4, 378–80, 384 modals, 305, 315–16, 349–50, 367–8, 370–3, 384, 509–10, 615–16 epistemic, 372–3 modeling Bayesian, see Bayesian modeling computational, 41, 698, 700–701, 705–6, 716–20, 722–3, 725–6, 747 models, 76–7, 404–6, 410–12, 673–4, 678–88, 705–7, 722–6, 743–7 bigram, 683–4 GA, see GA models hierarchical, 151–2 modifiers, 92–7, 102–4, 108, 110, 175, 443, 475–6, 688–91 adverbial, 173, 487 nominal, 95, 102 modularity, 752, 771, 773, 776, 782, 788–9 molecular genetics, 783–9 Monaco, A.P., 784, 786 mono-morphemic words, 111, 113, 124, 454 monosyllabic words, 79, 84, 134, 144–5, 147 mood, 170, 292, 367–71, 373, 375, 377–83, 385, 544–5 acquisition, 375–84 alternations, 367–85 markers, 377, 382–3 morphemes, 368–9, 377, 379, 381–3 particles, 371, 379 Moore, J., 372, 464, 490, 616 morphemes, 123–4, 380–2, 393–4, 454–5, 474–5, 478, 482, 741–2 aspect, 596, 598 bound, 342, 455 derivational, 99, 101, 104, 360, 369 functional, 125–6 grammatical, 411, 454–6, 458, 766–7

mood, 368–​9, 377, 379, 381–​3 possessive, 444, 454–​5 tense, 432, 763 morphological affixes, 116, 126 morphological categories, 115, 126, 592, 606 morphological contexts, 114, 116, 121 morphological expression, 367, 369, 374, 378 morphological infinitives, 360 morphological paradigms, 114, 124, 131, 379, 415, 426 morphological structure, 102, 129, 431, 434 morphologically-​sensitive phonological alternations, 114–​17 morphologically-​sensitive phonology, 119, 125 morphology, 367, 369–​70, 384–​5, 388, 393–​4, 425–​6, 782, 786–​7 acquisition, 87–​153 aspect, 587, 599 comparative, 464, 477, 486, 497 tense, 432, 482, 592, 763 verbs, 356, 592, 766 morpho-​phonological acquisition, 68, 111–​32 experimental findings, 128–​31 finding morphological bases and underlying forms, 127–​8 open questions, 131–​2 and prosodic representations, 125–​6 via analogy, 126–​7 via constraint-​based phonology, 123–​5 via rote memorization and rules, 120–​3 morpho-​phonological generalizations, 112, 131 morpho-​phonological learning, 126, 128, 131 morpho-​phonological processes, 112, 130, 132 morpho-​phonology, 111–​12, 121–​3, 125–​6, 128–​32 morphosyntax, 382, 389, 578, 581, 586, 594, 604, 609 motion predicates, 226–​7 motor control, 44, 50 motor-​failure theory, 18 movement covert, 262, 472–​4, 476, 488–​9 eye, 175, 529, 620 local, 328, 333–​6 long-​distance, 318, 321, 328–​30, 332–​3, 337

movement (cont.) partial, see partial movement successive cyclic, 310, 326–7, 330, 333, 335–6, 340 syntactic, 232–3, 312, 319, 519, 766 MP (measure phrases), 130, 235–7, 466, 470, 490–6 MRCD (Multi-Recursive Constraint Demotion), 730–3, 738–9, 743, 745–6 MRE (Modal Reference Effect), 354, 376 multi-morphemic words, 111, 126–7 Multi-Recursive Constraint Demotion, see MRCD multivariate analysis, 775, 782, 785, 789 Munn, A., 576, 579–80 Murasugi, K., 420, 422 Musolino, J., 373, 509–13, 548, 556, 616–19, 621–2, 625, 757–8 mutation, 703–4, 706, 772, 787–8 Myerson, R.F., 130–1 Nadig, A., 486, 628 Naigles, L., 110, 168–9, 173, 231 Namiki’s Generalization, 92, 105, 109 narrow grammar-learning subproblem, 731, 736, 746 nasal clusters, 808–10 nasals, 11, 34–5, 115, 142, 808 native language, 27, 138, 140, 142–3, 147, 150, 152–3, 697 natural language, 140–1, 390, 498–9, 501, 547, 562–3, 637–8, 658–61 acquisition, 141, 656–7 patterns, 637–8, 651, 656, 659–60, 662 quantifiers, 500–501, 504 natural numbers, 500, 502 naturalistic data, 2, 249, 306, 370–3, 385, 443, 457 naturalistic speech, 243, 367, 371–3, 381 naturalness, phonetic, 33–4, 42, 116 negated disjunction, 550, 555–6, 558 negation, 261–4, 322–4, 352–3, 467–8, 511–13, 515, 549–54, 561 local, 553, 556–8, 562 matrix, 274, 552 sentential, 324, 353, 557 negative data, 642, 646–7, 650, 657–8

negative evidence, 164–​5, 526, 639, 642–​4, 651, 658, 661, 699–​700 indirect, 165, 665, 667, 671, 699 negative polarity items (NPIs), 467–​8, 490 negative sentences, 513, 551–​7, 561–​3 neurodevelopmental disorders, 753 neutralization rules, 10, 796–​7 newborns, 1, 3, 16, 40, 68–​9, 78, 136, 142 Newman, R., 133–​4, 753 Newport, E.L., 74, 664, 670, 686 Niyogi, P., 23, 640, 668, 710–​11 noise, 24–​5, 433, 705, 709, 722, 733, 736, 740 noiseless evidence, 642–​3 noisy evidence, 639, 642–​3 nominal compounds, 93, 95 nominal modifiers, 95, 102 nominative, 262, 265, 347–​8, 416–​22, 425, 427–​8, 430, 433 case, 193, 271, 286, 344, 349, 421, 427, 430 subjects, 418, 430–​1, 433 nominative pronouns, 419–​20 nominative-​accusative, 416, 428–​9 non-​actional passives, 180–​1, 186, 204, 242, 244, 246–​7, 252, 255 non-​actional verbs, 181, 186, 243, 247, 252 non-​adjacent dependencies, 670, 672 non-​adult forms, 414, 423, 425–​7, 429, 434 non-​adult interpretations, 559–​60 non-​adult-​like performance, 549, 604 non-​canonical ordering, 171, 256 nonconservative determiners, 503–​4 nonfinite clauses, 292, 301, 345, 355, 599, 762 nonfinite complements, 291, 293–​4, 518 nonfinite contexts, 401, 408 nonfinite forms, 344, 351, 356, 358, 362, 593 nonfinite verbs, 357, 375, 592, 599 root, 341–​56 nonfiniteness, 345–​6, 352, 366 non-​generic sentences, 575 nonlinguistic stimuli, 78 non-​logical scales, 613, 619, 624 non-​nominative pronouns, 348, 418 non-​nominative subjects, 348, 430, 433–​4 non-​nominatives, 348, 418, 430, 434 non-​null subject languages, 387, 394–​5, 397–​9, 401, 403–​5, 407–​9, 412 non-​obligatory control, 292–​3

Index   987 non-​prepositional possessives, 436–​41 non-​reflexive interpretation, 534, 540 nonreversible passives, 239–​41 nonreversible sentences, 239–​40 nonsense words, 75, 126, 128, 150 non-​shared environment, 773–​5 non-​shared environmental variances, 774–​5 nonspecific objects, 262–​3 non-​statistical learners, 665–​6 non-​stochastic languages, 635, 650 non-​syntactic explanations, 404–​6, 410 non-​targets, 138, 313, 331–​2, 338, 340, 607 non-​trivial A-​chains, 249, 276 nonverbal intelligence, 775, 778, 791 non-​word repetition, 775, 779, 782, 784–​7, 789 non-​words, 29, 32, 36, 38, 139–​40, 774 nonzero values, 635–​7, 639 Noonan, M., 345, 408, 429 normal variation, 772–​3, 775–​6, 780, 782–​3 Norwegian, 209, 211, 218, 317, 530, 536–​8 noun phrases, 93–​5, 186–​7, 230–​3, 236–​7, 288–​9, 347, 438–​40, 582–​4 subject, 236, 265, 311, 315, 320–​1 nouns, 90, 93–​5, 97–​100, 288, 415–​16, 445–​6, 448–​51, 659 bare, 570–​1, 578–​80 count, 130, 571 head, 95–​6, 104, 108 irregular, 99–​100 novel, 226–​7, 579 Noveck, I.A., 372–​3, 512, 548, 614–​18, 621, 623–​4, 628–​9 novel adjectives, 478–​9 novel endocentric compounds, 105, 108, 224–​5 novel labels, 483, 556, 622 novel nouns, 226–​7, 579 novel objects, 14, 485, 556, 622 novel verbs, 166–​9, 171, 173–​6, 231, 243, 253, 304, 601 NPIs (negative polarity items), 467–​8, 490 NP-​movement, 242, 246, 251, 254, 260 NPs, see noun phrases nuclear scope, 559, 572–​3, 584 null objects, 532 null subject languages, 359–​60, 387–​8, 390, 392, 394–​9, 401–​2, 404–​5, 407–​9

Romance, 377, 395–​7, 400, 403, 409 null subject parameter, 386, 388, 390–​1, 407, 697 null subjects, 344–​6, 355, 359–​60, 362, 386–​413, 697–​8, 719 occurrence, 344–​5, 355 principles and parameters, 390–​2 syntax of, 386–​7 null topics, 334, 576, 702 number knowledge, 500–​502 number representations, 499, 501, 508 number words, 488, 496, 500, 502, 506, 509–​11, 626–​7, 629 numbers natural, 500, 502 real, 635–​6, 639, 641, 644, 676 numerals, 492, 506, 509, 612, 620, 625–​7 numerical memory, 777–​8 numerical scale, 616, 626 numerosity, 499–​500, 506, 578, 626 object clitics, 259–​60, 537, 539 object control (OC), 275, 277 object mappings, 140, 686 object position, 163, 272–​3, 328, 402, 512, 535, 543, 557 object pronouns, 412, 428, 533–​4, 765 object wh-​questions, 187, 312 object-​experiencer verbs, 200–​202 objects, 13–​15, 262–​3, 397–​9, 478–​81, 483–​7, 501, 603–​4, 686–​8 direct, 202, 211, 403, 533–​4, 590, 592, 600–​601, 603–​4 indirect, 179, 290, 416, 590, 719 internal, 171, 195, 720 mathematical, 635, 638 nonspecific, 262–​3 novel, 14, 485, 556, 622 null, 532 patient, 238, 242–​3, 245, 276 syntactic, 102, 277 theme, 242–​5 obligatory complementizers, 759–​60 obligatory contexts, 379, 394, 403, 422–​4, 451, 457 obligatory control, 293 obligatory piping, 215, 221–​2

988   Index O’Brien, E.K., 184, 194, 204, 247–​9, 252, 787 obstruent voicing, 34, 113, 123 obstruents, 14, 60, 112, 116, 119, 799–​800, 808, 812 final, 112, 799–​800 voiced, 113, 728 OC (object control), 275, 277 Odic, D., 500, 502 Ogiela, D., 602–​4 Olsen, M.B., 597–​8 omission, errors of, 216, 407, 423, 718 omissions, 192, 382, 421–​3, 444, 446, 453–​5, 759–​60, 762 final consonant, 793, 797, 799, 806 one-​participant events, 159, 166–​7 onset clusters, 31, 808, 810–​12, 815–​16 onset selection, 46–​7, 59, 66 onsets, branching, 45, 48–​9, 59–​61, 66 opacity, 159, 297 operators, 388, 473, 476, 558–​9, 572, 582, 591–​2 default, 573, 582–​3 disjunction, 550, 554, 561, 614–​15 existential, 468, 573 focus, 559, 562 generic, 582–​3, 585 tense, 598–​9 Optimality Theory (OT), 43, 45–​6, 50–​1, 725–​7, 731–​3, 735–​7, 801–​2, 804–​11 optimization, 23, 608, 665–​6, 739–​40 optional infinitive stage, 341, 346, 357, 363, 365, 423 optionality, 342, 346, 399, 538, 731, 733 oral stops, 8, 10, 34 order, 8, 10–​11, 93–​4, 227–​8, 449–​51, 478–​80, 745–​6, 760–​1 linear, 96, 272, 486, 512, 691 syllable, 135, 146 Orfitelli, R., 183, 194, 198, 202, 395, 401 organization, metrical, 73–​4, 76 OT, see Optimality Theory output segments, 806, 812–​13 outputs, 20, 49, 51, 62, 124, 730, 732, 734 phonetic, 791, 793–​4, 815 unmarked, 51, 64 overextensions, 418–​19, 433 overgeneralizations, 119, 121–​2, 127, 170, 172–​3, 175–​6, 283, 287 causative, 172, 175

overhypotheses, 677 over-​regularizations, 127–​8 overt quantifiers, 572, 582–​3 overt subject languages, 344–​6, 355 overt subjects, 193, 291, 294, 344–​6, 360, 362–​3, 386–​7, 395–​6 Ozturk, O., 373, 616, 618, 621 P+D suppletion, 208, 221–​3 pairs, 22, 25, 140–​1, 478, 480, 680–​1, 727–​8, 773 aspectual, 592–​3 loser, 727–​31, 738, 746 minimal, 11, 13–​15, 21–​2, 197, 456, 681, 791 question-​answer, 313, 356, 619 paradigm leveling, 127 parameter interaction, 702, 705, 719, 723 parameter setting, 696–​724 parameter space, 698, 705, 716, 719–​20, 723 parameter treelets, 712–​13, 715 parameter values, 558, 697–​8, 701–​4, 707, 710–​12, 714–​16, 718–​19, 721 correct, 698, 702, 708 incorrect, 706, 718 vectors of, 702–​3, 707, 709, 715 parameters, 73, 109, 390–​2, 557–​8, 696–​9, 701–​3, 706–​16, 718–​23 pied-​piping, 702, 708, 714, 718, 720 parameter-​settings, 73, 95, 220, 223, 227, 390–​1 parametric ambiguity, 698, 701–​8, 719, 723 parametric expression, 702–​3, 706 parametric frameworks, 224, 710, 726 Parametric Principle, 706, 711, 713–​14, 720 parental speech, 254, 575–​6 parents, 395–​6, 403, 405, 566, 575, 578, 779, 786 Parisian French, 145–​6 parsers, 139, 711–​12, 714 failure, 392 parsing, 102, 270–​1, 277–​8, 392, 702, 704–​5, 709, 712–​16 decisions, 175 interpretive, 73, 160, 624, 737–​8, 740, 742 violations, 704–​5 Partee, B.H., 469, 588, 637 partial movement, 303, 327–​8, 331, 335 questions, 328, 330, 335 structures, 328–​9, 331–​2, 335, 338–​9

Index   989 participles, 119, 259–​60, 263, 345, 389, 593, 596 passive, 191–​2 particle constructions, 106, 108, 211–​12, 223–​9 particle verbs, 602–​3 particles, 105–​9, 206–​7, 209–​13, 217–​19, 223, 227–​9, 592–​3, 600–​602 mood, 371, 379 prepositional, 206, 217, 223, 225 separable, 105, 108–​9, 224–​5 verb, 590, 592, 600, 609 partitions, 469, 471, 476, 569 partitive frame, 506–​7 passive acquisition, 185, 248–​9 theoretical approaches, 248–​51 passive construction, 187, 239–​42, 247, 251–​6, 276 passive participles, and adjectives, 191–​2 passive primes, 172, 204 passive sentences, 172, 180, 183, 186, 194, 196, 236, 239–​42 passive suffixes, 127, 236–​7, 256 passive verbs, 195, 257 inherently, 257–​64 passive voice, 179–​85, 242, 244, 253 passives, 179–​205, 230–​2, 235–​46, 248–​56, 258–​60, 262, 276, 278–​9 A-​chain Deficit Hypothesis, 188–​9 to Universal Phase Requirement (UPR), 189–​91 actional, 181, 183, 186–​90, 198–​200, 203–​4, 242, 244, 252 adjectival, 188–​92, 200–​201, 236, 250, 252–​3 and AIH (Argument Intervention Hypothesis), 202–​3 and A-​movement, 236–​57 BE omission, 192–​3 and CAH (Canonical Alignment Hypothesis), 199–​202 cross-​linguistic findings on acquisition, 251–​6 embedded, 275–​7 empirical results, 239–​48 future development, 202–​5 impersonal, 238, 252, 388–​9 inherently passive verbs, 257–​64 long, see LP maturation of A-​chains, 248–​9

and middle voice, 196–8, 204 non-actional, 180–1, 186, 204, 242, 244, 246–7, 252, 255 non-maturational approaches, 249–51 nonreversible, 239–41 and post-verbal subjects, 193–4 problems for A-chain Deficit Hypothesis and Universal Phase Requirement (UPR), 191–8 psych, 182–3, 187, 199–200, 204 resultative, 190–1 reversible, 205, 239–41, 252 short, see SP theoretical approaches to acquisition, 248–51 and transmission of thematic roles, 198–9 typology, 238 usage-based account of acquisition, 185–7 verbal, see verbal passives and voice systems, 194–6 passivization, 191, 202, 233, 235, 242, 244–5, 276, 289–90 patient objects, 238, 242–3, 245, 276 patients, 110, 159–60, 163, 167, 172–3, 252, 255, 528 patterns natural language, 637–8, 651, 656, 659–60, 662 phonotactic, 28–38, 40–2, 638 pauses, 80, 134, 136, 158, 170, 600–601, 682, 684 PCFGs (probabilistic context-free grammars), 651–2, 676 Pelucchi, B., 141, 151, 668, 670 Penner, Z., 440, 444, 447–8, 602, 753, 766 perception, 11–13, 23, 25, 44, 61, 70–1, 78, 767–8 phonological inventories, 12–16 speech, 12–13, 18, 26–7, 39, 41–2 verbs, 242, 244–5 perfective, 591–5, 598–9, 605–7, 609–10 acquisition, 604–9 aspect, 590–1, 596, 598–9, 605–8 completion entailment, 606, 609 perfectivity, 355, 598–9 performance, 248, 275, 278, 479, 483–4, 520, 620–2, 628 errors, 98, 323 limitations, 401–2, 405–6, 410–11, 413

990   Index Perfors, A., 285, 653–​4, 667, 692–​4 periphery, left, 286, 310, 318, 320, 339–​40, 388, 409 Perlmutter, D.M., 159, 231, 257 personal determiners, 470 Pesetsky, D., 159, 201–​2, 338 Phase Impenetrability Condition (PIC), 190, 193, 266 phenotypes, 783–​5, 787–​9 phenotypic correlation, 775–​6, 779 Phillips, C., 345, 365, 393, 423–​4, 431, 529, 762 phoneme transition pairs (PTPs), 139 phonemes, 17, 25, 28–​30, 39, 42, 147, 681–​3, 798–​9 single, 681–​2 phonemic inventory, 29, 128, 798 phonemic representations, 21, 76, 683 phonemic split, 799, 805 phonemic status, 33, 35, 42 phonetic categories, 20, 25, 676, 679–​82 phonetic categorization, 13, 680 phonetic inventories, 790–​1, 802–​3, 805, 809 phonetic naturalness, 33–​4, 42, 116 phonetic output, 791, 793–​4, 815 phonetic representations, 17, 23, 343, 802 phonetics, 16, 18–​21, 25, 85, 233, 679, 795, 800 phonological acquisition, 8, 11, 41, 153, 790–​1, 793, 807 phonological alternations, 114, 117, 726 morphologically-​sensitive, 114 phonological complexity, 411, 815 phonological development, 75, 791, 798, 803 phonological disorders, 790–​816 constraint-​based accounts, 802–​15 rule-​based accounts, 793–​802 phonological forms, 72, 101, 120, 150, 153, 291, 432, 582 phonological grammars, 10, 20, 43, 66, 123, 125, 128, 131 phonological inventories, 7–​26, 42 acquisition of novel patterns, 32–​5 future, 23–​6 perception, 12–​16 production, 8–​12 traditional views, 7–​22 phonological knowledge, 36–​7, 42, 121 phonological patterns, 36, 111, 123, 638, 792

phonological phrase boundaries, 136–​8, 152 phonological processes, 43–​68, 80, 114, 116, 801–​2 phonological properties, 36, 118, 144, 455 phonological rules, 116, 120, 122, 124, 676, 681–​2, 794, 800 phonological structures, 81, 83, 86, 668, 801, 803, 815 phonological systems, 35, 44, 54, 56, 72, 83, 86, 791–​2 phonological vowels, 382–​3 phonology, 16–​17, 116, 123–​4, 131–​3, 697, 775–​6, 790–​3, 801 autosegmental, 68, 82–​3, 85–​6 child, 7 constraint-​based, 123–​5 disordered, 792, 807–​8, 811, 813 generative, 17, 705, 707, 792, 800 morphologically-​sensitive, 119, 125 prosodic, 68, 80 phonotactic acquisition, 30, 32, 34–​5 phonotactic constraints, 28, 36, 147 phonotactic knowledge, 28–​33, 35–​6, 42, 121–​2 phonotactic patterns, 28–​38, 40–​2, 638 phonotactic probabilities, 31–​2, 36–​7, 798 phonotactic regularities, 29, 34–​5, 117 phonotactically-​motivated alternations, 112–​13 phonotactics, 28–​38, 41–​2, 111–​13, 123–​4, 142, 151, 382, 790 legal, 29–​30, 36, 38, 118 and lexical acquisition, 36–​7 probabilistic, 31–​2 and prosodic/​word domains, 37–​8 voicing, 38, 112 and word segmentation, 35–​6 phrasal comparatives, 473, 491 phrasal prosody, 135–​8, 141, 151–​2 phrase boundaries, 78, 82, 136, 635 phonological, 136–​8, 152 phrase structure, 316, 340, 719 physiological constraints, 58, 62 Piaget, J., 464, 488, 503, 547, 559 PIC (Phase Impenetrability Condition), 190, 193, 266 picture identification, 239, 250, 252

picture verification tasks, 246, 293, 523, 529–30, 533–4, 536–7, 540–1 picture-selection, 103, 242, 255, 266 pied piping, 332, 338–9, 702 pied-piping parameter, 702, 708, 714, 718, 720 Pierce, A., 193, 196–7, 249, 258–9, 352–3, 375, 395, 700 Pine, J.M., 170, 317–19, 348, 358, 433–4 Pinker, S., 43, 121–2, 163–5, 172–5, 181–2, 243–5, 248–9, 771–2 piping, 209, 212–16, 219, 221–3 obligatory, 215, 221–2 option, 213–14, 216 pitch, 78, 80–3 configurations, 78, 82, 84 contours, 79–81, 84, 86 differences, 78–9 lexical, 80, 82 patterns, 69, 79–81, 83 sensitivity to, 78–9 Pitt, L., 642–3, 646–7, 654 plausible dissent, 528, 542 Plunkett, B., 127, 130, 354, 376, 395 plural allomorphs, 121, 131 plural markers, 93, 101, 103, 121, 125 pluralia tantum, 100–102 plurality, 113, 118, 581 plurals, 121–2, 130, 757 bare, 569–71, 573, 575, 578–80, 585–6 definite, 561, 571, 575, 578–81 irregular, 99–101 regular, 99–101 plurals-within-compounds, 101 Poeppel, D., 286, 351, 375 polarity, 144, 763–4 positive, see PPI Polish, 113–15, 118–19, 127, 593, 596, 606–7, 745, 812 Polka, L., 83, 145–6 PoS, see poverty of stimulus positional constraints, 48, 50 positional faithfulness constraint, 49–50 positive evidence, indirect, 690 Positive Polarity Items, see PPI positive recursive data, 643, 646, 651, 655

positive stochastic data computable, 644, 646, 648, 652 distribution-​free, 643, 646, 654, 657 possession, 90, 118, 435, 440, 454–​6 adnominal, 435–​6, 440, 454 possessive construction, 436, 440–​1, 447–​53, 456–​7 possessive markers, 338, 437, 444, 446 possessive morphemes, 444, 454–​5 possessive pronouns, 440–​2, 444, 446–​9, 457, 522 possessives, 283–​4, 308, 437–​9, 441, 443, 449, 451, 453–​9 acquisition, 435–​59 adnominal, 435–​41, 443, 446, 448, 454 future research, 457–​9 non-​prepositional, 436–​41 prepositional, 436–​41, 446, 452, 456–​7 pronominal, see possessive pronouns recursive, 284, 438, 441, 443–​4, 454–​7, 459 possessor–​possessum order, 436–​7, 440, 442, 446–​7, 449–​53 possessors, 90, 93, 435–​40, 442, 444–​53, 455 in isolation, 442, 451, 453 possessum, 435–​7, 439–​40, 444–​53, 455 possessum–​possessor order, 437, 440, 445–​6, 448–​52 possible grammars child grammars as, 45–​50 space of, 3, 705, 713 posterior probability, 25, 684, 692 post-​verbal position, 179, 196–​7, 237, 259 post-​verbal subjects, 237, 259, 396 and passives, 193–​4 potential agents, 184, 247–​8 Potsdam, E., 262, 473 Pouscoulous, N., 619–​20, 622, 624 poverty of stimulus (PoS), 342, 576, 665, 667, 707 P&P acquisition, 697–​9, 706, 708 P&P domains, 699–​700, 716 PPI (Positive Polarity Items), 553–​5, 557–​8, 561–​3 P-​questions, see prepositional questions pragmatic constraints, 521, 524 pragmatic coreference, 522, 524, 534 pragmatic cues, 570, 578, 585

992   Index pragmatic felicity, 278, 529, 621 pragmatic implicatures, 301, 548, 551, 558, 564 pragmatic infelicity, 249, 617–​19 pragmatic inferences, 611–​12, 614–​17, 620, 627–​9 pragmatic interpretation, 611, 615, 620, 628 pragmatic knowledge, 312–​13, 546 pragmatic rules, 525–​6, 530 pragmatics, 493, 497, 509, 513, 522, 546–​7, 581, 585 Pratt, A., 343, 362, 424 predicate telicity, 590, 600–​604, 609 predicate types, 232, 585–​6 predicates, 159–​61, 242, 435, 463–​4, 467, 589, 598–​9, 602–​3 atelic, 589, 591, 597–​600, 603, 605 complex, 105, 107, 109, 289 embedded, 267–​9 gradable, 466–​7, 472 lower, 268–​9 stage-​level, 573, 585 telic, 589, 591, 593, 597–​600, 602, 604–​6, 608 verbal, 110, 535–​6, 542 predictions, 143–​5, 217–​18, 220–​2, 335–​6, 378, 432–​4, 677–​8, 745–​7 typological, 85, 735–​6 preemption, statistical, 165 prefixes, 97, 113, 211, 590, 592, 601, 604, 609 preposition stranding, 209, 718 prepositional complementizers, 210, 219–​20, 223, 228 prepositional datives, 164, 171–​2, 176 prepositional objects, 720, 723 prepositional particles, 206, 217, 223, 225 prepositional phrases, 90, 159, 175, 677, 702, 720 directional, 488, 496, 590, 592, 600 prepositional possessives, 436–​41, 446, 452, 456–​7 prepositional questions, 209, 215–​16 prepositions, 183, 198–​9, 206–​13, 221–​3, 415–​16, 440–​1, 454–​5, 677 preposition-​stranding, 209, 718 preschool children, 313, 478, 512–​13, 519, 547, 551, 559, 561 prevalence judgments, 567–​8 preverbal position, 108, 179, 194, 197–​8

preverbal subjects, 259, 345 Primacy of Modality hypothesis, 367–​8, 374–​5, 378, 381–​2, 385 primary stress, 69, 71–​2, 74 primes, 130, 186, 203 priming, 40, 171–​2, 187, 203–​5, 478 effects, 172, 186, 203–​4 syntactic, 186–​7, 203–​4 primitive recursive data, 644, 646, 650, 658 primitive recursive functions, 644, 657, 659 primitives, 45–​6, 50, 54, 57, 66, 594, 610 aspectual, 594–​5, 609–​10 Prince, A., 41, 43, 72–​3, 121–​3, 735–​6, 743, 801–​2, 809–​10 Principle A, 25–​6, 75, 392, 521, 540–​1, 545–​6, 569, 700 Principle B, 517–​18, 520–​1, 523–​8, 531–​2, 535–​6, 539–​40, 662, 666; see also DPBE Principle C, 313, 334, 517–​18, 520–​1, 529–​30 principles-​and-​parameters theory, 319, 390 prior knowledge, 1, 3, 36, 121–​2, 667, 687, 717 probabilistic context-​free grammars, see PCFGs probabilistic extensions, 731, 733–​4, 741 probabilistic grammars, 410, 653, 733, 740, 745–​6 probabilistic learners, 665, 733, 746 probabilistic learning, 77, 456, 666 frameworks, 647, 656 probabilistic phonotactics, 31–​2 probabilistic ranking, 731–​2, 735 probabilistic weighting, 732, 734–​5 probability, 25–​6, 642–​4, 646–​7, 650, 654, 674–​6, 715–​16, 734 distributions, 635, 638, 654–​5, 658–​9, 665, 731–​2, 734, 739–​40 posterior, 25, 684, 692 theory, 25, 665, 668, 673–​4 transitional, see TPs processing capacity, limited, 98, 426, 539–​40, 543 processing limitations, 198, 321, 406, 411, 432, 526, 529 processing resources, 277, 325, 525, 527, 546, 599, 608, 686 production, 43–​5, 47, 49, 51, 53, 55, 61–​3, 495 accuracy, 791

Index   993 elicited, see elicited production phonological inventories, 8–​12 productive rules, 244–​5 productivity, 91, 120, 129, 284 projection principle, 233–​4, 273 extended, 237, 697 projections, functional, 233, 357, 429, 440, 532 pronominal possessives, see possessive pronouns pronominal subjects, see subject pronouns pronouns case errors, 418 clitic, 530–​1, 533–​4, 537, 539 expletive, 411–​12 interpretation, 525–​6 interpretation and production, 523–​40 nominative, 419–​20 non-​nominative, 348, 418 object, 412, 428, 533–​4, 765 possessive, 440, 444, 446–​9, 457, 522 reflexive, 521, 523, 535–​6, 540–​1 resumptive, 215, 340 singular, 83, 343, 414, 426 strong, 530–​1, 533–​4, 543 subject, 412, 418, 428 weak, 532, 534 properties, 227, 414–​16, 467–​8, 485–​7, 506–​8, 565–​9, 578–​83, 587–​90 acoustic, 679–​80 defining, 285, 553, 648, 811 fundamental, 376, 386, 792 generic, 569, 574, 578 lexical, 81, 590, 701 liquid, 45, 61–​2 quantificational, 592, 601 referential, 281, 354–​5, 526, 534, 690 semantic, 157, 172, 204, 352, 406, 429, 511, 524 statistical, 126, 667–​8 syntactic, 210–​11, 219, 343–​4, 352, 388, 397, 533–​4, 537 temporal, 158, 587–​8, 592, 610 universal, 660–​1 proportional quantifiers, 502, 567, 625, 627 prosodic boundaries, 135–​8 prosodic constituents, 45, 51, 53, 57, 66, 152 prosodic constraints, 125, 455 prosodic cues, 81, 135–​6

prosodic phenomena, 54, 68–​86 prosodic phonology, 68, 80 prosodic representations, 125–​6 prosodic structure, 126, 136 prosodic units, 133, 150 segmentation, 136–​8 prosodic words, 51, 53, 74, 114, 136 prosody, 85, 141, 143, 152, 401–​2, 412, 455 phrasal, 135–​8, 141, 151–​2 pseudopartitives, 492 pseudopassive, 209 P-​stranding, 218 psych passives, 182–​3, 187, 199–​200, 204 psych verbs, see psychological verbs psycholinguistics, developmental, 3, 23, 341–​2, 355, 401, 408, 633–​63, 695 psychological verbs, 181–​3, 187–​9, 199, 205, 575 PTPs (phoneme transition pairs), 139 puppets, 268–​9, 275–​7, 313, 325, 336, 551, 554, 616–​18 putative glides, 45, 61 puzzle, 18–​19, 63–​4, 346, 348, 466, 605, 769 Pye, C., 11, 74, 255, 793 Pyers, J., 296–​8 QR, see quantifier raising QTL (quantitative trait locus), 783–​6, 788 quantification, 25, 498–​519, 572, 585, 591, 614 quantificational expressions, 498–​9, 557 quantificational properties, 592, 601 quantified sentences, 515–​16 quantified subjects, 511, 524, 528, 531 quantifier raising (QR), 473, 489, 516–​18, 522, 532 quantifiers, 499–​502, 504–​6, 508–​9, 511–​13, 519, 582–​3, 613–​15, 623–​7 exactness, 480, 509–​11, 627 isomorphism and scope of negation, 511–​15 learning, 499–​508 natural language, 500–​501, 504 overt, 572, 582–​3 proportional, 502, 567, 625, 627 scope ambiguities without negation, 515–​16 syntax and semantics, 509–​19 universal, 559, 567 weak, 492, 623, 627 quantitative trait locus, see QTL

quantization, 600, 603–4 Québec French, 48, 66 question words, 310, 316, 318–19, 324–5, 335, 337 question-answer pairs, 313, 356, 619 questions 2-wh, 335–7 acquisition, 310–40 adjunct, 319, 400 affirmative, 314–15, 322–4, 758 argument, 319 complex, 326–9, 337 embedded, 315–16, 319, 337 empirical, 73, 83, 86, 132, 177 matrix, 311–26, 339–40 partial movement, 328, 330, 335 tag, 758, 762–5 raising, 115, 230, 233, 235, 254, 258, 268–70, 276 constructions, 234, 264–5, 268 sentences, 268, 270–1 structures, 198, 200, 268–9, 281, 294 subject, 231, 235, 249, 264, 271, 278 verbs, 231–2, 235, 266, 268, 278 raising-to-object, 278 raising-to-subject, 264–71 Randall, J., 162, 198, 260 ranking, constraint, 3, 43, 45, 54, 73, 727, 737 ranking biases, 124, 743–4, 746 rankings, 52, 123–5, 726–34, 736–40, 745–6, 802–4, 806, 812 fixed, 55, 804, 808–9 learning, 727–31 probabilistic, 731–2, 735 total, 729, 731–3, 736, 739–40, 746 rare constructions, 458–9 RCD (Recursive Constraint Demotion), 727–31, 738, 745 real numbers, 635–6, 639, 641, 644, 676 reanalysis rules, 216–17 receptive inventories, 16, 18–19, 21, 23–5 recognition, 69–70, 146, 298, 725, 805 recursion, 92, 101–2, 279, 283–5, 443, 667, 693–4 complements, 284, 305–8 recursive class, 644, 648, 651, 660–1 Recursive Constraint Demotion, see RCD

recursive data, positive, 643, 646, 651, 655 recursive endocentric compounds, 92, 109 recursive functions, 644, 657, 659 recursive languages, 638, 646 recursive patterns, 637, 655, 659, 661 recursive possessives, 284, 438, 441, 443–​4, 454–​7, 459 acquisition, 443, 459 recursive rules, 693–​4 recursive stochastic languages, 646–​7, 651, 654–​5, 661 reduced clausal analyses, 473, 491 referential dependencies, 520–​2, 527, 539, 543 referential expressions, 232, 575, 584 referential properties, 281, 354–​5, 526, 534, 690 referential subjects, 269, 402, 408, 412, 524, 528–​9 referents, 173, 195, 252, 426, 543, 620, 671, 686–​90 intended, 686, 690–​1 reflexive clitics, 259–​60, 263–​4 reflexive interpretation, 524, 530, 532–​8, 540, 543 reflexive pronouns, 521, 523, 535–​6, 540–​1 reflexive verbs, 197–​8, 536, 543 reflexive-​marked verbs, 539–​40 reflexives, 197–​8, 200, 259–​61, 520–​1, 523, 526, 537–​46, 761 interpretation and production, 540–​6 long-​distance, 546 SELF, 523, 541–​3 weak, 525–​6 reflexivity, 523, 535–​6, 543 regular plurals, 99–​101 regular verbs, 117, 122 regularities, phonotactic, 29, 34–​5, 117 Reinhart, T., 292, 521, 523, 526–​7, 530, 535, 538, 542 Reiss, C., 18, 44, 46, 790 relative clauses, 214, 315, 337, 514, 552–​3, 757, 759, 764–​6 Renfrew Bus Story, 777, 779 repetition, non-​word, 775, 779, 782, 784–​7, 789 representational development, 17–​18 representations, 17, 23, 40–​2, 257–​8, 298, 500–​502, 540, 793–​4 abstract, 133, 163, 186–​7, 203–​4, 429, 681

Index   995 autosegmental, 81–​6 conceptual, 157, 166, 173, 576 event, 166, 174 grammatical, 2, 406, 742, 767 lexical, 17, 23, 41, 111, 161, 741, 799 mental, 118, 133, 298, 531, 543 number, 499, 501, 508 phonemic, 21, 76, 683 phonetic, 17, 23, 343, 802 prosodic, 125–​6 semantic, 233, 328, 463, 469, 476–​7, 486–​7, 502, 612 syntactic, 163–​4, 187, 203, 230, 233, 307, 430, 438 underlying, see underlying representation reranking, constraint, 43, 46–​50, 73–​4 resources, 402, 406, 648, 660, 716, 768 cognitive, 1, 405, 413, 426, 508, 617, 685–​6 computational, 709, 711, 726 restrictions, semantic, 174, 269, 352 restrictive grammars, 741–​4, 747 restrictive languages, 726, 741 learning, 742–​4 restrictiveness, 736, 742–​4 restrictors, 559, 572, 583–​4 resultative analysis, 191–​2 resultative passives, 190–​1 resultatives, 107, 190, 192, 226–​8, 289–​90 adjectival, 105, 107, 109–​10 resumptive pronouns, 215, 340 retardation, mental, 752, 754–​6 reversible passives, 205, 239–​41, 252 rhythm, 135–​6, 142, 151 rhythmic classes, 142–​3 rhythmic information, 143, 150–​1 rhythmic segmentation, 142–​4, 146–​7 rhythmic units, 135, 138, 142–​5, 147, 152 RI phenomenon, 342–​4, 356, 359, 365, 375 Rice, M., 19, 341, 363, 763, 782, 784–​5, 788 right-​to-​left directionality, 54–​5, 72–​3 RIP (Robust Interpretive Parsing), 73, 160, 624, 737–​8, 740, 742 RIs, see root infinitives Robust Interpretive Parsing, see RIP rogue grammars, 44–​5, 59, 62–​4 rogue substitutions, 57–​64

Romance languages, 95–​6, 103, 105, 109–​10, 211–​12, 222–​3, 532–​3, 571; see also individual languages Romance null subject languages, 377, 395–​7, 400, 403, 409 root compounds, 90–​1 root infinitives (RIs), 341–​67, 374–​8, 385, 400–​401, 431, 599, 762–​3, 765 challenges for future research, 365–​6 constructivist approaches, 358–​9 Mainstream Generative Grammar, 356–​8 modality, 375–​7 null subject languages, 360–​5 root nonfinite verbs, 341–​56 Variational Learning Model, 359 root nonfinite verbs, 341–​56 root-​affix boundaries, 114 roots, 91, 111–​13, 115, 125, 127, 343–​4, 361, 409 productive, 228–​9 verbal, 80, 131 Rosen, T., 526, 529–​30, 753 rote learning, 120–​1, 441, 454–​5 Rothstein, S., 347, 588, 590 Rowland, C., 314–​15, 317–​19, 324, 349 rule-​based accounts, 793–​802 rule-​governed knowledge, 29, 32, 42 rules, 120–​2, 232–​5, 481–​2, 526–​8, 676, 691–​4, 696–​8, 799–​802 allophonic, 24, 798–​9 breakdown of, 527, 530, 539 grammatical, 126, 516, 525, 692 linear, 691–​3 linking, 161, 199 neutralization, 10, 796–​7 phonological, 116, 120, 122, 124, 676, 681–​2, 794, 800 pragmatic, 525–​6, 530 reanalysis, 216–​17 recursive, 693–​4 tense, 121, 483 transformational, 311, 696 undergeneralization of, 599, 609 Rumain, B., 549, 614–​15, 618 Rumelhart, D., 127, 640, 666 Russian, 260, 262, 264, 338–​9, 425, 427–​8, 529–​30, 765–​6

996   Index Russian-​speaking children, 259, 262–​3, 421, 425, 529, 606 Saffran, J.R., 33–​7, 40, 135, 139–​41, 143, 151–​2, 668–​9, 671–​2 saliency, acoustic, 31, 38, 42 samples, 218, 351, 362–​3, 393, 396, 453, 784, 786 spontaneous speech, 220, 243, 255, 401 sampling error, 456 Gibbs, 678, 685 Sano, T., 255–​7, 263, 345, 375, 408, 515 Santelmann, L., 128, 315, 317, 349, 352 satellites, 223–​6 Sawada, O., 420, 422 scalar alternatives, 621–​3, 627 scalar expressions, 612, 614, 617, 621, 623, 625, 628–​9 scalar implicature, 373, 493, 548, 550, 611–​29 processing in children, 614–​24 and semantics, 625–​7 theoretical background, 612–​14 scalar items, 621, 625, 628 scalar statements, 618, 621, 623 scalar structure, 485, 488 scalar terms, 509, 613, 615, 618, 625 scalars, 485, 614, 620, 625–​8 weak, 617, 621, 625, 628–​9 scales, 467, 469, 475–​6, 484–​5, 487, 613, 623–​4, 626 contextual, 623–​4 harmonic, 803, 808 logical, 613–​14, 623–​4 non-​logical, 613, 619, 624 numerical, 616, 626 Scandinavian Languages, 209, 222 Scerri, T.S., 784–​5 Schuele, C.M., 759–​60 Schulte-​Korne, G., 784–​5 Schütze, C., 323, 346–​8, 418–​21, 426–​7, 430–​3, 651 scope, 261–​3, 297–​300, 511–​12, 515–​16, 518–​19, 549–​51, 553, 556–​8 inverse, 511, 514–​16, 556–​7 nuclear, 559, 572–​3, 584 surface, 514–​15 scope interpretation, 511, 515–​16, 553, 556–​7

scope markers, 327–​8, 331, 335–​6 scrambling, 321, 428 search heuristics, 706, 723 search space, 181, 205, 703, 714, 729 Sebastian-​Galles, N., 30 second language acquisition, 797 secondary stress, 69, 71, 76 second-​order construal, 307–​8 second-​order patterns, 28, 33 segmentation, 136, 142–​7, 149, 152, 669, 683–​5 abilities, 133–​4, 138, 140, 142–​3, 145–​7, 151–​3 cues interactions and combinations between, 150–​2 at word level, 138–​50 procedures, 134, 138, 142, 149–​50, 152–​3 prosodic units, 136–​8 rhythmic, 142–​4, 146–​7 syllabic, 145–​6 word, 35–​6, 137, 141, 145–​7, 153, 668–​70, 679–​80, 682–​3 segments, 17, 28, 33–​5, 144–​5, 149–​50, 152–​3, 670–​2, 808–​9 output, 806, 812–​13 Seidl, A., 27, 33–​5, 40, 136–​7, 151–​2, 312, 662 selection, 35, 59, 198, 385, 479, 803 auxiliaries, 197, 259–​60, 264 pictures, 103, 242, 255, 266 selective impairment, 753–​5 selective sparing, 752, 755–​6 SELF anaphors, 541, 543 SELF reflexives, 523, 541–​3 Selkirk, E., 29, 54, 125, 808 semantic analyses, 467, 470, 473, 497 semantic arguments, 230, 571 semantic bootstrapping, 165, 178 semantic complexity, 605–​6, 608–​9 semantic constraints, 202, 242, 245, 446, 449 semantic interpretation, 105, 109, 225, 328, 689 semantic knowledge, 312, 501, 550–​1, 555, 564 semantic properties, 157, 172, 204, 352, 406, 429, 511, 524 semantic representations, 233, 328, 463, 469, 476–​7, 486–​7, 502, 612 semantic restrictions, 174, 269, 352

semantics, 162, 164, 166, 240–2, 263–4, 278–9, 302–3, 312–13 acquisition, 461–629 Boolean, 554–5 canonical, 231, 277 compositional, 102, 210 degree, 476, 485, 497 exact, 595, 625–6 lexical, 160, 212, 463, 509, 548–9, 563, 623, 629 mapping, 688–9, 694 syntax–semantics interface, 109, 590, 600–601, 668 temporal, 595, 597 verb, 175, 292 semi-auxiliaries, 287, 291, 370, 374 sensitivity, 15, 29, 31–2, 38–40, 78–80, 136–7, 579–81, 746 to pitch, 78–9 sentence adverbs, 352–3 sentences, 195–201, 266–71, 493–5, 509–15, 526–31, 547–52, 559–68, 704–14 active, 180–1, 183–4, 190, 195, 200–201, 203, 241–2, 252 ambiguous, 325, 511, 513, 713 declarative, 350, 698 ECM, 536–9 embedded, 536, 539, 544 generic, 567–9, 572–4, 582–3, 585 ill-formed, 635, 637 input, 700, 703–5, 707–9, 712, 715–16, 718 intransitive, 735 negative, 513, 551–7, 561–3 non-generic, 575 nonreversible, 239–40 quantified, 515–16 raising, 268, 270–1 subjectless, 393, 395, 401 target, 172, 186–7, 554, 557, 705 test, 524, 527–9, 534, 548–9, 551, 554–7, 559–60, 563 transitive, 537–9, 557, 600, 604, 609 two-clause, 296, 541 sentential negation, 324, 353, 557 separable particles, 105, 108–9, 224–5 separable-particle constructions, 105, 107–8, 216, 223–4, 227–8

Serbo-Croatian, 78, 215, 218, 225 Sesotho, 79–80, 83, 126, 170–1, 194, 196, 250–3, 256 SGA (Stochastic Gradient Ascent/Descent), 735–6, 738–40, 743, 745–6 shared environmental variances, 774–5 Shirai, Y., 595–7, 609 short passives, see SP short-term memory, 754–5, 767 Shvachkin, N.Kh., 13–14, 44 sign language, 225, 298, 392, 577 Sigurjónsdóttir, S., 352–3, 375, 530, 534–6, 544–5 Sigurðsson, H.Á., 388, 390–1 simplicity, 613, 653, 694, 696, 723, 730 simulations, 74–7, 139, 681, 685, 733, 737–40, 745–6 simultaneous learning, 740, 744, 746–7 Sinclair de Zwart, H., 464, 479 Single Value Constraint, see SVC singletons, 46, 59, 62, 119, 814–15 singular pronouns, 83, 343, 414, 426 singular subjects, 355, 360, 422 singulars, indefinite, 570, 581 Skoruppa, K., 70, 130, 132 Slabakova, R., 590, 595, 599, 608, 610 Slavic languages, 216, 218, 225, 592, 608 SLI (specific language impairment), 342, 363–5, 752–5, 758–69, 781, 784–5, 787–8 Slobin, D.I., 125–6, 181, 183, 239–40, 594–5, 601 small clauses, complements, 285–90 Smith, N., 10, 18–20, 44, 54–5, 63 Smolensky, P., 45–6, 73–4, 666, 725–7, 729–32, 737–41, 743–4, 801–2 smooth domains, 707–8, 719 SMT (Strong Minimalist Thesis), 300, 307–8 Snedeker, J., 81, 164, 171–2, 175, 484, 620, 624–5 Sneed German, E., 575–6, 584–6 Soderstrom, M., 128, 137, 735 soft constraints, 664–5 Solomonoff, R.J., 653–4 sonorant clusters, 808–9, 812 sonorants, 14, 112, 812 sonority, 47, 60, 808, 812, 814–15 sequencing principle, 29–30, 808 sound patterns, 18, 27, 31–3, 111, 120, 133, 142

998   Index sound systems, 790–​2, 801, 812 acquisition, 5–​86 sounds, 8, 21–​2, 28, 31, 139, 680–​1, 791–​2, 800–​804 individual, 17, 42, 680 sources, 10–​11, 44, 319, 345–​6, 353, 426–​7, 430–​1, 559 Southern Romance, 362–​3, 365 SP (short passives), 179, 181–​4, 189, 198–​9, 204, 237–​8, 247, 705 Spanish, 207–​12, 215–​16, 223–​6, 357–​63, 532–​4, 537–​40, 548–​50, 552–​3 Spanish-​speaking children, 11, 331, 362–​3, 399–​400, 532–​3, 538, 540, 542–​3 sparing, selective, 752, 755–​6 speaker intentions, 611, 622, 686, 688 SpecCP embedded, 332–​3, 335–​6 intermediate, 327–​8, 330–​1, 335 special populations, 365, 577, 751–​2, 758–​9, 765, 769 questions, 751–​2 specifiers, 190, 266, 316, 340, 409, 439–​40, 448, 532 SpecIntP, 320–​1 SpecTP, 249, 256–​8 speech perception, 12–​13, 18, 26–​7, 39, 41–​2 infant, 12–​13, 15, 17, 27, 29, 37, 39, 41–​2 speech sounds, 13, 16, 18, 21, 23, 28, 680 speech stream, 134, 137–​8, 147, 150, 153, 670, 672 Spelke, E., 499, 506, 627 Spenader, J., 527, 608 Spencer, A., 54, 89, 91, 95, 101, 800 spontaneous speech (SS), 97, 121–​2, 215–​17, 274, 294–​6, 393–​5, 401–​2, 759 samples, 220, 243, 255, 401 tense and aspect, 596–​600 Sportiche, D., 249, 532 SS, see spontaneous speech S-​structures, 233–​4, 265, 517–​18 stacking, 207, 210, 212 Stager, C.L., 14, 20, 22 Stampe, D., 18, 43, 45, 113, 744, 793–​4 standardized tests, 753, 776–​7, 780, 785–​6, 791 statements, underinformative, 373, 617–​19 statistical cues, 668, 670, 672

statistical information, 666–70, 672–3, 694–5 statistical learners, 667, 673, 683–4, 694 statistical learning, 664–9, 671–3, 675, 677, 679, 681, 687, 693–5 statistical preemption, 165 statistical properties, 126, 667–8 statistics, 23, 139, 653–4, 667–8, 670, 774–5, 780 stative verbs, 185, 189–90, 200, 238, 241, 270, 352 Stemberger, J., 54–5, 119, 124–5, 790, 792–3, 795, 808, 810 stems, bare, 90, 94, 342–4, 355, 360–3, 424 Stephany, U., 370, 379–80, 449, 596 Stevenson, S., 178 stimuli, 29–30, 32, 35, 78–9, 107, 140–2, 145–6, 213–14 items, 36, 38, 40 nonlinguistic, 78 STL (Structural Triggers Learners), 711–14, 717–18, 720 Stochastic Gradient Ascent/Descent, see SGA stochastic languages, 635–6, 638, 640, 644, 647, 650 recursive, 646–7, 651, 654–5, 661 Stoel-Gammon, C., 37, 41, 44, 51, 54–5, 58, 792–4, 798 stops, 8, 10, 14, 33–5, 60, 63–4, 808, 812 alveolar, 66, 124, 794, 797 coronal, 63, 116, 119, 125, 794 oral, 8, 10, 34 velar, 11, 795, 797 voiceless, 11, 124 Stowell, T., 213, 216–17, 219, 221, 289, 340, 370, 376 stranding, 209, 212–23, 228–9, 339 acquisition, 213–14 availability, 209, 219, 223 preposition, 209, 718 stress, 48, 68–73, 75–7, 79, 81, 85–6, 117, 744 acquisition, 69–78 and metrical phonology, 71–3 contrastive, 534 development, 69–71 metrical organization in, 73–6 focal, 561–2 learnability and metrical theory, 76–8 patterns, 69–71, 73, 75–7, 108, 114, 151

primary, 69, 71–2, 74 secondary, 69, 71, 76 stress-based languages, 135, 143–5 stressed syllables, 48–9, 54, 58, 69, 75, 77, 746 strings, 13, 150, 635, 637, 644, 690, 701, 703 CG, 59–61 Stromswold, K., 218–19, 227, 259, 263–4, 315, 318–20, 322, 775–6 Strong Minimalist Thesis, see SMT strong pronouns, 530–1, 533–4, 543 structural ambiguity, 729, 737–9, 741–2, 747 structural descriptions, 726, 729–31, 733–4, 736–9, 742 structural positions, 260, 389, 415–16, 428, 439, 584 Structural Triggers Learners, see STL structural variation, 534, 585 structure-dependence, 315, 667, 691–3 structures abstract, 40, 86, 163, 414, 429, 472, 474 clauses, 341–66 control, 269, 281, 291 event, 157, 159, 166, 191, 599 functional, 310, 578 grammatical, 229, 414, 672, 720 hidden, 726, 731, 736–7, 739, 742, 747 linguistic, 27, 97, 166, 329, 496, 585, 761, 792 metrical, 73, 75, 86 morphological, 102, 129, 431, 434 partial movement, 328–9, 331–2, 335, 338–9 phonological, 81, 83, 86, 668, 801, 803, 815 prosodic, 126, 136 raising, 198, 200, 268–9, 281, 294 syllable, 2, 27, 38–42, 676, 745, 808, 810, 814–15 syntactic, 318, 321, 375–6, 411, 414, 456–7, 469–70, 472–3 unergative, 260, 263–4 subject agreement, 381, 422, 425, 427, 430–3 subject case, 427, 430, 433 errors, 421, 427 subject function, 179–80, 199–201 subject gap relative clauses, 757, 759, 764 subject licensing, 347 subject markers, 80, 83 subject noun phrases, 236, 265, 311, 315, 320–1 subject production, 397, 405, 411

subject pronouns, 389, 394, 398, 401–3, 405, 408, 411–12, 418 subject raising, 231, 235, 249, 264, 271, 278 subject-auxiliary inversion, 311, 314, 318, 344, 349–50, 355 subject-experiencer verbs, 200, 204 subjectless sentences, 393, 395, 401 subject–object asymmetry, 397, 399, 405–7, 409–12, 766 subjects, 262–6, 288–92, 319–21, 386–9, 391–414, 416–22, 424–30, 545–6 animate, 168–9, 268–9 covert, 291 data on production by children, 392–404 embedded, 265, 271–4, 277, 291–2, 399 expletive, 237, 258, 266, 269–70, 273, 387, 389, 402 genitive, 419–20 inanimate, 169, 268–9, 277 indefinite, 516, 573, 581, 584 lexical, 219, 394–5, 402–3, 412 local, 533, 539, 546 main clause, 291–2, 536, 539–40, 544 matrix, 194, 231, 264–5, 268–9, 271–2, 297–8 missing, 388, 408, 413 models of acquisition process, 404–13 nominative, 418, 430–1, 433 non-nominative, 348, 430, 433–4 non-null, see non-null subject languages, 387, 395, 398, 404–5, 407 null, see null subjects overt, 193, 291, 294, 344–6, 360, 362–3, 386–7, 395–6 position, 200, 256, 258, 271, 348–9, 418, 571–3, 575–6 preverbal, 259, 345 pronominal, 274, 389, 394, 398, 401–3, 405, 408, 411 quantified, 511, 524, 528, 531 referential, 269, 402, 408, 412, 524, 528–9 singular, 355, 360, 422 surface, 180, 192, 195, 199, 202, 204, 245 theme, 243–5, 271 subjunctive, 280, 282, 293, 369–70, 379–81, 384–5, 544–5 morphology, 381–3 subjunctive-marked verbs, 383–4

subordinate clauses, 306, 325, 394, 405, 409–10, 466–73, 488, 490 subset languages, 699, 704, 717, 720 subset principle, 545, 597, 699–700, 705, 710, 723 subset problem, 516, 665, 667, 671 subset relationships, 620, 642–3 subsets, 217, 220–1, 500–501, 505–6, 545, 700–701, 705, 779 subset/superset relationships, 558, 736, 744 substitution, 10, 19, 44–5, 61, 297, 380, 762, 766 errors, 381–2, 802 glide, 59, 61, 66 patterns, 793, 802–3 velar, 44, 62–4, 66 successive cyclic movement, 310, 326–7, 330, 333, 335–6, 339–40 suffixation, 482–3 suffixes, 96, 115, 237, 380, 437, 439, 455 derivational, 94–5, 97, 104 passive, 127, 236–7, 256 Sundara, M., 145–6 superfinite classes, 646–8, 651, 653, 655, 657–8 supergrammar, 711–12 superlatives, 464–5, 474, 478, 480, 491, 493 superset languages, 699–700 superset relationships, 505–6, 700, 705, 707, 710 supersets, 243, 700, 716, 723 Suppes, P., 313, 315, 342, 352, 418, 430, 549, 561 suppletive forms, 117, 207–8, 221–3 surface scope, 514–15 surface subjects, 180, 192, 195, 199, 202, 204, 245 SVC (Single Value Constraint), 703, 709 Swahili, 96, 368–70, 374, 380–5, 397, 431, 697 adult, 384–5, 431 Swedish, 78, 80, 92, 228, 317, 375–6, 759–60, 765–6 Swingley, D., 14, 22, 37–8, 81, 670, 672–3, 678, 680 swiping, 207, 209–10 syllabic segmentation, 145–6 syllabic units, 135, 145–6, 152 syllable structure, 2, 27, 38–42, 676, 745, 808, 810, 814–15 syllable-based languages, 135, 143, 145–6

syllables, 51–3, 69–70, 72, 74–7, 137–42, 145–6, 149–50, 668–70 closed, 72, 76, 115, 130 final, 65, 145–6, 149 stressed, 48–9, 54, 58, 69, 75, 77, 746 unstressed, 48–9, 85, 126, 404, 412, 455 weight, 76–7 symbolic learning, 634, 649, 662 symmetry, errors, 504 syntactic analyses, 211, 223, 227, 584 syntactic bootstrapping, 164, 167, 177, 231, 600–601 syntactic boundaries, 136, 138 syntactic categories, 90, 150, 206–7, 211, 326, 358–9, 390, 688–91 syntactic clitics, 530, 532–4 syntactic contexts, 158, 168, 174, 304, 508–9, 537, 570 syntactic knowledge, 177, 312–13, 412, 670 syntactic movement, 232–3, 312, 319, 519, 766 syntactic objects, 102, 277 syntactic parameters, 170, 697, 705, 708, 719 syntactic priming, 186–7, 203–4 syntactic properties, 210–11, 219, 343–4, 352, 388, 397, 533–4, 537 syntactic representations, 163–4, 187, 203, 230, 233, 307, 430, 438 syntactic structures, 318, 321, 375–6, 411, 414, 456–7, 469–70, 472–3 syntactic theory, 310, 343, 386 syntactic variation, 209–10, 212, 223, 229 syntax, 159–62, 164–6, 212, 302–4, 465–6, 570–3, 590, 696–7 acquisition, 134, 155–459 comparative, 217, 221 null subjects, 386–7 syntax–semantics interface, 109, 590, 600–601, 668 synthetic compounds, 95–9, 103 synthetic -ER compounding, 96–104 Szabolcsi, A., 338, 552–3, 561 tag questions, 758, 762–5 Tagalog, 15, 180, 570, 593 Tager-Flusberg, H., 299, 406, 763 Takahashi, S., 470, 472–3 target grammars, 43, 47, 49, 64–6, 697, 699–700, 710–12, 714–19

target languages, 2, 104, 452, 641–2, 699–700, 705–7, 716–18, 797–802 target sentences, 172, 186–7, 554, 557, 705 target velars, 58, 794, 797, 803 target words, 71, 134, 137–8, 141, 144, 146, 796, 799 target-appropriate underlying representations, 795–7, 799–800, 804–5 targetlike performance, 523, 536, 540 tasks act-out, 168–9, 205, 240, 244, 246, 292–3, 545 comprehension, 107, 131, 240, 242, 251–2, 278, 292, 355 elicitation, 240, 246, 298, 443, 457, 459, 479, 791 judgment, see judgment tasks picture verification, 246, 293, 523, 529–30, 533–4, 536–7, 540–1 truth value judgment, 107, 395, 489, 492, 524, 526, 528–9, 545 TCP (The Compounding Parameter), 92, 105, 107–10, 224–7 θ-Criterion, 230, 233–7, 265, 271, 273 TD children, 754, 756, 759–64, 766–8 teachers, 208, 277, 569, 773 adversarial, 656–7 TEDS (Twin Early Development Study), 776–7, 779–81, 789 telic predicates, 589, 591, 593, 597–600, 602, 604–6, 608 telic verbs, 596, 598–9, 609 telicity acquisition, 600–604 compositional, 590, 601–4, 609 predicate, 590, 600–604, 609 temporal properties, 158, 587–8, 592, 610 temporal reference, 355, 599 temporal semantics, 595, 597 Tenenbaum, J., 26, 665, 670–1, 673–4, 677, 686 tense, 284–6, 347–8, 354–9, 388–9, 430–3, 592–3, 595–9, 605–6 combinations, 389, 431 forms, 122, 125, 127, 295, 360, 430, 606 morphology, 432, 482, 592, 763 operators, 598–9 rules, 121, 483

spontaneous speech (SS), 596–600 verbs, 117, 122, 362, 389, 433 tensed clauses, 280, 282–3, 285, 307, 394, 518 finite, 294–302 tensed complements, 282, 287, 297, 299–300, 303–5 tensed forms, 300, 306, 403, 763 tensed verbs, 387, 409, 701 Terzi, A., 181, 188, 251, 292 Tesar, B., 23, 73, 673, 725–7, 729–31, 736–41, 743–4, 746 test sentences, 524, 527–9, 534, 548–9, 551, 554–7, 559–60, 563 tests, standardized, 753, 776–7, 780, 785–6, 791 Thai, 79–80, 93, 225 The Compounding Parameter, see TCP Thematic Hierarchy, 162, 245, 271 thematic relations, 157, 159–61, 231, 303 thematic roles, 160–2, 167, 171–2, 179, 198–9, 202, 230–2, 234 transmission, 198–9 thematic vowel, 360–1, 382–3 theme arguments, 161–2, 196, 244, 256, 260 theme objects, 242–5 theme subjects, 243–5, 271 theoretical linguistics, 17, 547, 563, 611, 633 theories of learning, 3, 111, 631–747 theta roles, 198–201, 255, 543 Thiessen, E.D., 14, 22, 33–4, 40, 135, 140, 151–2, 662 Thothathiri, M., 171–2 time, event, 355, 587–8 timelines, 13, 15, 23, 118, 123 timing, gestural, 58, 61, 66 TLA (Triggering Learning Algorithm), 708–14, 716–19 to-infinitives, 287, 304 token frequencies, 31–2 tokens, 23–5, 75, 140, 303, 383–4, 575, 619–20, 679–80 word, 680–1, 684 Tolbert, L., 759–60 Tomasello, M., 160, 163–5, 168–71, 177, 294–5, 441–2, 455–6, 458 tone, 68–9, 71, 73, 75, 77–86, 132, 140, 498 development, 79–80 high, 80, 82–3

tone (cont.) lexical, 78–9, 81–2 musical, 140, 671 topic markers, 570, 719 topicalization, 195, 256, 334, 701–2, 719 Tornyova, L., 316–17, 349 total rankings, 729, 731–3, 736, 739–40, 746 Touretzky, D.S., 75–7 TPs (transitional probabilities), 139–41, 146, 149, 152, 233, 236, 668–72, 684–5 transcribers, adult, 58, 61, 85 transcripts, 129, 310, 313–16, 321, 329, 347, 351, 360 transformational rules, 311, 696 transitional probabilities, see TPs transitional probability, cues, 669, 671 transitions, 14, 16, 39, 61, 77, 358, 362, 685 transitive frames, 166–8, 171, 174, 304 transitive sentences, 537–9, 557, 600, 604, 609 transitive verbs, 105, 170, 172, 243, 261–2, 416, 429, 600–603 transitivity, 159, 169, 242, 600–601, 604 heuristic, 600, 604, 609 treelets, 711–14, 720 parameter, 712–13, 715 trial-and-error, 698, 705, 720, 722–3 triggering, 698, 705, 713, 721–3 classical, 705, 708, 714, 718, 722–4 as search, 708–11 Triggering Learning Algorithm, see TLA triggers disambiguation, 719–23 E-triggers, 720, 722–3 unambiguous, 711–14, 719–21, 723 trisyllabic words, 139–41, 737 trochaic bias, 143–4 trochaic units, 135, 143–5, 151–2 trochaic words, 141, 143–4 θ-role, 232, 234–7, 243, 247, 269, 271, 273–4, 276 external, 237, 257–8, 264 true clusters, 808–9, 814 Trueswell, J., 81, 175, 513–14, 611 truncation, 51, 126, 316, 375, 387, 409 truth conditions, 503, 550–1, 553, 555–6, 558, 560, 567, 583 truth value judgment tasks, 107, 395, 489, 492, 524, 526, 528–9, 545

truth value judgments, 107, 239, 275, 313, 512, 551, 602, 616 truth values, 275, 277, 297–​8, 300, 567 θ-​transmission, 246–​7 Turing machines, 636–​8, 641, 654 Turkish, 54, 95, 115, 298, 371, 596 TVJ, see truth value judgments Twin Early Development Study, see TEDS twins, 755, 767, 772–​7, 779–​83, 814 two-​clause sentences, 296, 541 two-​clause structures, 299, 311, 326 two-​participant events, 159, 167 Tyler, M., 135, 140–​1, 149, 151, 798, 803–​4 typically-​developing children, 295, 346–​7, 363, 752, 754–​6, 759–​60, 791–​2, 814–​15 typological generalizations, 503–​4, 601 typological predictions, 85, 735–​6 typological variation, 3, 570–​1 typology, 11, 48, 77, 111, 725, 733 passives, 238 UCC (Unique Checking Constraint), 343, 356–​7 UG (Universal Grammar), 248–​9, 283, 324, 329, 339–​40, 697, 707–​8, 719–​20 unaccusative verbs, 159, 168, 192–​3, 197–​8, 200–​201, 255, 257–​64, 266 unaccusatives, 158–​9, 168, 192–​3, 197–​8, 201, 231–​2, 235, 278 A-​movement, 257–​64 unambiguous triggers, 711–​14, 719–​21, 723 undergeneralization of rules, 599, 609 underinformative statements, 373, 617–​19 underlying representation (URs), 123–​5, 127, 131, 203–​4, 740–​2, 791, 793–​807, 815 underlying representations (URs) internalized, 791, 798–​9, 806 target-​appropriate, 795–​7, 799–​800, 804–​5 underlying velars, 64, 795–​6 underspecification, 21, 376, 444, 446 understanding, 81, 372–​3, 478, 491, 496, 502, 615 unergative structures, 260, 263–​4 unergative verbs, 159, 168, 193, 258–​9, 262 unergatives, 159, 168, 190, 193, 258–​60, 263 unexpected complexity, 64–​7 ungrammaticality, 323, 327, 440, 451–​2, 456, 529, 699, 743

Uniformity of Theta Assignment Hypothesis, 162, 260 Unique Checking Constraint, see UCC Universal Grammar, see UG Universal Phase Requirement, see UPR Universal Phase Requirement hypothesis, see UPRH universal properties, 660–1 universal quantifiers, 559, 567 universality, 11, 93, 551, 556 universals, 2, 177, 390, 498 unstressed syllables, 48–9, 85, 126, 404, 412, 455 UPR (Universal Phase Requirement), 189–91, 203, 265–6 UPRH (Universal Phase Requirement hypothesis), 190–1, 193–4, 196–8, 201–2 URs, see underlying representation utterances, 260, 273–8, 393–5, 412–14, 445–7, 455, 495–6, 684–91 length, 441, 454–5, 684 multiword, 134, 167, 180 Vainikka, A., 342, 397, 418, 420, 429 Valiant, L.G., 644, 646–7, 658 Vallabha, G.K., 24, 681 values, 73, 390–2, 557–8, 697–9, 702–4, 708–10, 715–16, 718–21 correct, 697–8, 702, 708, 714, 722–3 feature, 20–2, 391, 741 incorrect, 392 information, 403, 410–12 marked, 721, 723 nonzero, 635–7, 639 parameter, see parameter values zero, 637, 639 van der Lely, H., 760, 766–7 van Kampen, J., 331–2, 722 van Oostendorp, M., 115, 126 variances, 12, 772–4, 783 genetic, 771, 775, 782, 789 variants, 14, 470, 475, 657–8, 680, 685, 729, 731 variation, 62, 212, 386, 389–91, 570–2, 574, 733–5, 745 individual, 319, 444, 457, 756 lexical, 294, 308, 553

normal, 772–​3, 775–​6, 780, 782–​3 structural, 534, 585 syntactic, 209–​10, 212, 223, 229 Variational Learning Model, 715–​19 root infinitives, 359 Varlokosta, S., 292, 379, 424, 530, 533, 537, 754 V–​DP–​Particle constructions, 224–​5, 227–​8 velar consonants, 21, 796, 798, 803 velar fronting (VF), 44, 58–​9, 62, 66, 793–​7, 802 velar stops, 11, 795, 797 velar substitution, 44, 62–​4, 66 velars, 10, 55–​6, 58, 63, 794–​9, 804–​5 acquisition, 10 target, 58, 794, 797, 803 underlying, 64, 795–​6 Vendler, Z., 158, 303, 469, 588–​9 verb agreement, 170, 362, 416, 427 verb classes, 173, 175, 187, 268, 270, 290, 379 verb finiteness, 343, 345–​50, 365 verb inflections, 128, 388–​9, 598 verb movement, 316, 320, 346, 357 parameters, 703, 708, 714 verb particles, 590, 592, 600, 609 verb phrase ellipsis (VPE), 489, 516 verb phrases (VP), 190, 192–​4, 200–​202, 236–​8, 257–​8, 408, 515–​18, 531–​3 length effect, 401–​2, 412 verb semantics, 175, 292 verbal affixes, 207 verbal complements, 236–​7, 273 verbal memory, 774, 779–​80 verbal passives, 188–​92, 194, 200–​201, 204–​5, 242, 246, 250, 252–​3 verbal predicates, 110, 535–​6, 542 verbal roots, 80, 131 verb-​particle combinations, 105–​6, 108–​9, 223, 225, 229 acquisition and cross-​linguistic variation, 223–​8 verbs, 157–​62, 164–​88, 225–​32, 241–​8, 262–​70, 279–​83, 285–​305, 598–​604 actional, 181–​4, 187–​91, 198, 200–​201, 205, 241–​7, 254 activity, 226, 238, 596 agentive-​transitive, 230–​1 agreeing, 348, 433–​4

verbs (cont.) anticanonical, 244 atelic, 596, 599–600 auxiliary, see auxiliary verbs bare, 316, 377, 431, 433, 599 belief, 174 canonical, 244–5 control, 268–70 finite, 344–5, 347, 351–3, 376, 592, 599 intransitive, 168, 172, 175, 190, 238, 257, 416, 600–601 irregular, 117, 119 light, 170, 265 locative, 175 lower, 281, 290–1, 297 main, 224–5, 270, 316, 416, 423, 536, 593, 677 manner-of-motion, 110, 226 matrix, 269, 271, 273–4, 286, 288, 291, 297, 299–301 mental, 241–2, 246, 282, 303–4 morphology, 356, 592, 766 non-actional, 181, 186, 243, 247, 252 nonfinite, 343–5, 347, 351–3, 357, 375, 592, 599 novel, 166–9, 171, 173–6, 231, 243, 253, 304, 601 passive, 195, 257 passivized, 191, 244–5, 290 perception, 242, 244–5 psychological, 181–3, 187–9, 199, 205, 575 raising, 231–2, 235, 266, 268, 278 reflexive, 197–8, 536, 543 reflexive-marked, 539–40 regular, 117, 122 root infinitive, see root infinitives root nonfinite, 341–56 subject-experiencer, 200, 204 telic, 596, 598–9, 609 tense, 117, 122, 362, 389, 433 tensed, 387, 409, 701 transitive, 105, 170, 172, 243, 261–2, 416, 429, 600–603 unaccusative, 159, 168, 192–3, 197–8, 200–201, 255, 257–64, 266 unergative, 159, 168, 193, 258–9, 262 VF, see velar fronting Vihman, M.M., 11, 18, 20, 44, 54–6, 70 violable constraints, 73, 725–47, 802 beyond total ranking, 731–6

hidden structure, 736–​42 learning rankings, 727–​31 learning restrictive languages, 742–​4 modeling acquisition, 744–​7 vocabulary, 149–​50, 152, 684–​5, 771, 773, 775–​7, 779–​81, 785–​6 voice, 112, 114, 121, 123–​4, 180, 195–​6, 493, 766 active, 179, 182, 243–​4, 246, 253 alternations, 112, 119, 179–​205 middle, 196–​7, 204 passive, 179–​80, 242, 244, 253 and passives, 194–​6 voiceless consonants, 34, 113 voiceless stops, 11, 124 voicing, 14, 34, 112, 124, 130, 728, 797 alternations, see voice, alternations intervocalic, 116, 123–​4, 741 obstruent, 34, 113, 123 phonotactics, 38, 112 Volpato, F., 182, 189, 204 von Stechow, A., 467–​9, 475–​6 vowel harmony, 66–​7 vowels, 21–​2, 24, 33, 83–​4, 113–​14, 382, 797–​9, 804–​6 final, 170, 369, 380–​3, 744 front, 33–​4, 61, 797–​9, 805 long, 52, 72, 142 low, 8, 21 phonological, 382–​3 thematic, 360–​1, 382–​3 vowel-​zero alternations, 119, 127 VP, see verb phrases V–​Particle–​DP constructions, 216–​19, 221, 223, 227–​8 VPE, see verb phrase ellipsis V-​raising, 191–​2 Wagner, K.R., 351, 592, 595, 597, 600–​601, 604–​5, 607–​10 Wales, R.J., 14, 464, 478, 480, 776 weak pronouns, 532, 534 weak quantifiers, 492, 623, 627 weak reflexives, 525–​6 weak scalars, 617, 621, 625, 628–​9 weighted constraints, 50, 734, 736 weightings, 410, 726, 734–​7 probabilistic, 732, 734–​5

Weinberg, A., 213, 249, 597–8 Weissenborn, J., 143–4, 409, 440, 444, 447–8, 458 Weist, R., 592, 595–8, 603, 607 Werker, J., 13–14, 20, 22, 31, 83, 118 Wessels, J., 27, 29–30, 38, 147 Westergaard, M.R., 317–18, 352 Wexler, K., 181–2, 248–53, 264–8, 270–1, 346–8, 430–1, 523–5, 708–11 wh-complements, 302–3 wh-copying, 327–8, 330–2, 335–6 questions, 327–8, 336 structures, 327–8, 330–2, 336, 340 wh-movement, 283, 296, 298–9, 301–2, 324, 326, 332–5, 337 wh-phrases, 221, 310–11, 316–21, 326–8, 330–2, 334–40 contentful, 327–8, 330–2, 335 D-linked, 331–2, 338 wh-questions, 298–300, 311–12, 314–15, 318–20, 334, 400, 760, 766 affirmative, 314, 758 wh-words, 209, 212, 302–3, 317–21, 324–5, 338–40, 473, 766 why-questions, 307, 318–21 Wiehagen, R., 642–3, 647 Wijnen, F., 74, 354, 371, 376–7 Williams Syndrome (WS), 482–3, 752, 754–9, 761–4, 768 winner–loser pairs, 727–31, 738, 746 winners, 727–32, 735, 746 word accents, 82 word boundaries, 32, 34–5, 136–7, 139–40, 142–3, 147, 668–9, 682–4 word formation, 91, 95–6, 105 compound, 89–110 word learning tasks, 20–2, 507, 681 word order base, 441, 449, 452, 454, 456 derived, 441, 454, 456 possessor–possessum, 436–7, 446, 450–1 possessum–possessor, 437, 448–50

word segmentation, 137, 139, 141, 145–7, 153, 668–70, 679–80, 682–3 and phonotactics, 35–6 word tokens, 680–1, 684 word-initial consonants, see initial consonants words, 21–2, 35–8, 67–73, 75–82, 133–44, 149–52, 669–71, 680–8 bisyllabic, 65, 137, 140–1, 143, 145–7, 149 content, 135, 149–50, 172 control, 144, 146, 149 definition, 89 function, 116, 149–50, 762, 777 iambic, 144, 151 isolated, 133–4, 136, 144, 147, 149 logical, 549, 551, 554, 564 mono-morphemic, 111, 113, 124, 454 monosyllabic, 79, 84, 134, 144–5, 147 nonsense, 75, 126, 128, 150 number, 488, 496, 500, 502, 506, 509–11, 626–7, 629 question, 310, 316, 318–19, 324–5, 335, 337 target, 71, 134, 137–8, 141, 144, 146, 796, 799 trisyllabic, 139–41, 737 trochaic, 141, 143–4 WS, see Williams Syndrome Wynn, K., 500, 502, 506–7, 627 Xu, Y., 86, 499, 670–1, 674, 686–8 Yang, C., 359, 410, 666, 669, 711, 713, 715–19, 722–3 yes/no questions, 296, 299, 307, 314–15, 322, 334, 583, 691 complex, 691–3 Yoshinaka, R., 648, 660, 662 young children, 99–100, 241, 525, 556–8, 565–6, 616–17, 622–3, 759–60 zero values, 637, 639 Zhou, P., 324–5 Zwicky, A., 119, 303, 348

OXFORD HANDBOOKS IN LINGUISTICS THE OXFORD HANDBOOK OF APPLIED LINGUISTICS Second edition Edited by Robert B. Kaplan

THE OXFORD HANDBOOK OF ARABIC LINGUISTICS Edited by Jonathan Owens

THE OXFORD HANDBOOK OF CASE Edited by Andrej Malchukov and Andrew Spencer

THE OXFORD HANDBOOK OF COGNITIVE LINGUISTICS Edited by Dirk Geeraerts and Hubert Cuyckens

THE OXFORD HANDBOOK OF COMPARATIVE SYNTAX Edited by Guglielmo Cinque and Richard S. Kayne

THE OXFORD HANDBOOK OF COMPOSITIONALITY Edited by Markus Werning, Wolfram Hinzen, and Edouard Machery

THE OXFORD HANDBOOK OF COMPOUNDING Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF COMPUTATIONAL LINGUISTICS Edited by Ruslan Mitkov

THE OXFORD HANDBOOK OF CONSTRUCTION GRAMMAR Edited by Thomas Hoffmann and Graeme Trousdale

THE OXFORD HANDBOOK OF CORPUS PHONOLOGY Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

THE OXFORD HANDBOOK OF DERIVATIONAL MORPHOLOGY Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF GRAMMATICALIZATION Edited by Heiko Narrog and Bernd Heine

THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons

THE OXFORD HANDBOOK OF THE HISTORY OF ENGLISH Edited by Terttu Nevalainen and Elizabeth Closs Traugott

THE OXFORD HANDBOOK OF THE HISTORY OF LINGUISTICS Edited by Keith Allan

THE OXFORD HANDBOOK OF INFLECTION Edited by Matthew Baerman

THE OXFORD HANDBOOK OF JAPANESE LINGUISTICS Edited by Shigeru Miyagawa and Mamoru Saito

THE OXFORD HANDBOOK OF LABORATORY PHONOLOGY Edited by Abigail C. Cohn, Cécile Fougeron, and Marie K. Huffman

THE OXFORD HANDBOOK OF LANGUAGE AND LAW Edited by Peter Tiersma and Lawrence M. Solan

THE OXFORD HANDBOOK OF LANGUAGE EVOLUTION Edited by Maggie Tallerman and Kathleen Gibson

THE OXFORD HANDBOOK OF LEXICOGRAPHY Edited by Philip Durkin

THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second Edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF LINGUISTIC FIELDWORK Edited by Nicholas Thieberger

THE OXFORD HANDBOOK OF LINGUISTIC INTERFACES Edited by Gillian Ramchand and Charles Reiss

THE OXFORD HANDBOOK OF LINGUISTIC MINIMALISM Edited by Cedric Boeckx

THE OXFORD HANDBOOK OF LINGUISTIC TYPOLOGY Edited by Jae Jung Song

THE OXFORD HANDBOOK OF NAMES AND NAMING Edited by Carole Hough

THE OXFORD HANDBOOK OF SOCIOLINGUISTICS Edited by Robert Bayley, Richard Cameron, and Ceil Lucas

THE OXFORD HANDBOOK OF TENSE AND ASPECT Edited by Robert I. Binnick

THE OXFORD HANDBOOK OF THE WORD Edited by John R. Taylor

THE OXFORD HANDBOOK OF TRANSLATION STUDIES Edited by Kirsten Malmkjaer and Kevin Windle