
Logical Issues in Language Acquisition

Linguistic Models

The publications in this series tackle crucial problems, both empirical and conceptual, within the context of progressive research programs. In particular Linguistic Models will address the development of formal methods in the study of language with special reference to the interaction of grammatical components.

Series Editors: Teun Hoekstra, Harry van der Hulst

Other books in this series:

1. Michael Moortgat, Harry van der Hulst and Teun Hoekstra (eds), The Scope of Lexical Rules
2. Harry van der Hulst and Norval Smith (eds), The Structure of Phonological Representations. Part I
3. Harry van der Hulst and Norval Smith (eds), The Structure of Phonological Representations. Part II
4. Gerald Gazdar, Ewan Klein and Geoffrey K. Pullum (eds), Order, Concord and Constituency
5. W. de Geest and Y. Putseys (eds), Sentential Complementation
6. Teun Hoekstra, Transitivity. Grammatical Relations in Government-Binding Theory
7. Harry van der Hulst and Norval Smith (eds), Advances in Nonlinear Phonology
8. Harry van der Hulst, Syllable Structure and Stress in Dutch
9. Hans Bennis, Gaps and Dummies
10. Ian G. Roberts, The Representation of Implicit and Dethematized Subjects
11. Harry van der Hulst and Norval Smith (eds), Autosegmental Studies on Pitch Accent
12a. Harry van der Hulst and Norval Smith (eds), Features, Segmental Structures and Harmony Processes (Part I)
12b. Harry van der Hulst and Norval Smith (eds), Features, Segmental Structures and Harmony Processes (Part II)
13. D. Jaspers, W. Klooster, Y. Putseys and P. Seuren (eds), Sentential Complementation and the Lexicon
14. René Kager, A Metrical Theory of Stress and Destressing in English and Dutch

Logical Issues in Language Acquisition
I.M. Roca (ed.)


1990 FORIS PUBLICATIONS Dordrecht - Holland/Providence RI - U.S.A.

Published by:
Foris Publications Holland
P.O. Box 509
3300 AM Dordrecht, The Netherlands

Distributor for the U.S.A. and Canada:
Foris Publications USA, Inc.
P.O. Box 5904
Providence RI 02903
U.S.A.

Distributor for Japan:
Toppan Company, Ltd.
Shufunotomo Bldg.
1-6, Kanda Surugadai
Chiyoda-ku
Tokyo 101, Japan

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG

Logical Issues in Language Acquisition / I.M. Roca (ed.). - Dordrecht [etc.]: Foris. - Ill. (Linguistic Models : 15) With Index, Ref. ISBN 90 6765 506 6 Subject Heading: Language Acquisition

ISBN 90 6765 506 6 © 1990 Foris Publications - Dordrecht No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner. Printed in The Netherlands by ICG Printing, Dordrecht.

Table of Contents

List of Contributors . . . xi

I.M. Roca
Introduction . . . xv

Martin Atkinson
The logical problem of language acquisition: representational and procedural issues . . . 1
1. Background . . . 2
2. Representational Issues . . . 10
3. Procedural Issues . . . 16
Footnotes . . . 26
References . . . 29

Vivian Cook
Observational data and the UG theory of language acquisition . . . 33
1. Evidence in the UG model . . . 33
2. I-language and E-language theories . . . 34
3. Observational data, performance and development . . . 35
4. Representativeness of observational data . . . 37
5. Observational data and adult performance . . . 38
6. Evidence of absence . . . 40
7. Correlations within observational data . . . 42
8. General requirements for observational data in UG research . . . 43
References . . . 45

Michael Hammond
Parameters of Metrical Theory and Learnability . . . 47
1. Metrical Theory . . . 48
2. Learnability . . . 49
3. The Seven-Syllable Hypothesis . . . 52
4. Levels and Options . . . 53
5. Short-Term Memory Constraint . . . 57
Footnotes . . . 60
References . . . 61

Teun Hoekstra
Markedness and growth . . . 63
1. Parameters and Markedness . . . 64
2. Developmental markedness . . . 69
3. Extension and Intension . . . 72
4. The notion of growth: the Unique External Argument Principle . . . 73
5. A-chains . . . 76
5.1. Ergatives . . . 76
5.2. Passives . . . 79
6. Conclusion . . . 82
Footnote . . . 82
References . . . 83

James Hurford
Nativist and Functional Explanations in Language Acquisition . . . 85
1. Preliminaries . . . 85
1.1. Setting and Purpose . . . 85
1.2. Glossogenetic and Phylogenetic mechanisms . . . 87
1.3. Competence/performance, I-Language/E-language . . . 89
1.4. The ambiguity of 'functional' . . . 94
2. Glossogenetic mechanism of functional influence on language form . . . 96
2.1. The Arena of Use . . . 96
2.2. Frequency, statistics and language acquisition . . . 107
2.3. Grammaticalisation, syntacticisation, phonologisation . . . 113
2.4. The role of invention and individual creativity . . . 120
2.5. The problem of identifying major functional forces . . . 124
2.6. Language drift . . . 129
3. Conclusion . . . 130
Footnotes . . . 131
References . . . 132

Rita Manzini
Locality and Parameters again . . . 137
1. Locality . . . 138
2. English Anaphors and Pronouns . . . 142
3. Italian Reciprocal Constructions . . . 148
4. Parameters in Locality Theory . . . 152
References . . . 156

Marina Nespor
On the rhythm parameter in phonology . . . 157
1. Phonetic evidence against two types of timing . . . 159
2. Phonological evidence against two types of rhythm . . . 162
2.1. Nonrhythmic characteristics of "stress-timed" and "syllable-timed" languages . . . 162
2.2. On the existence of intermediate systems . . . 163
2.3. On the development of rhythm . . . 165
3. The Phonology of rhythm: arguments for a unified rhythmic component . . . 166
3.1. The metrical grid in English and Italian . . . 166
3.2. The Rhythm Rule in English and Italian . . . 167
3.3. The definition of stress clash in Italian and English . . . 169
3.4. Stress lapses in English and Italian . . . 171
4. Conclusions . . . 172
Footnotes . . . 173
References . . . 173

Mark Newson
Dependencies in the Lexical Setting of Parameters: a solution to the undergeneralisation problem . . . 177
1. The Lexical Parameterisation Hypothesis and Ensuing Problems . . . 177
2. A solution to the problems . . . 179
3. Undergeneralisations and the Binding Theory . . . 180
3.1. Background issues . . . 180
3.2. Generalisations and the Lexical Dependency . . . 182
4. Support for the Lexical Dependency . . . 187
5. A further predicted generalisation . . . 192
Footnotes . . . 195
References . . . 197

Andrew Radford
The Nature of Children's Initial Grammars of English . . . 199
1. Introduction . . . 199
2. Structure of nominals in early child English . . . 202
3. Structure of clauses in early child English . . . 209
4. The overall organisation of early child grammars . . . 219
5. Summary . . . 228
Footnotes . . . 229
References . . . 231

Anjum Saleemi
Null Subjects, Markedness, and Implicit Negative Evidence . . . 235
1. Some background assumptions . . . 236
2. The Licensing Parameter . . . 237
3. The Learnability Problem . . . 242
4. Positive Identification . . . 242
5. Exact Identification . . . 247
6. Is Implicit Negative Evidence Really Necessary? . . . 248
7. Developmental Implications . . . 249
8. Binding Parameters and Markedness . . . 252
Footnotes . . . 255
References . . . 256

Michael Sharwood Smith
Second Language Learnability . . . 259
1. Introduction . . . 259
1.1. The second language learner as a constructor of mental grammars . . . 259
1.2. L1 and L2 acquisition as special cases of the same process . . . 260
2. Linguistic theory and second language acquisition . . . 262
2.1. L2 learnability . . . 262
2.2. The initial L2 state: logical possibilities . . . 264
2.2.1. The "UG by proxy" view . . . 265
2.2.2. The "back-to-square-one" view . . . 267
2.2.3. The UG-Reorganisation view . . . 267
3. Research strategies . . . 271
4. Conclusion . . . 272
Footnotes . . . 273
References . . . 273

N.V. Smith
Can Pragmatics fix Parameters? . . . 277
1. Introduction . . . 277
2. Exclusions . . . 278
3. Relevance . . . 280
4. Parameters . . . 281
5. Hyams . . . 282
6. Fixing . . . 284
7. Conclusion . . . 287
Footnotes . . . 288
References . . . 288

Author Index
Subject Index

List of Contributors

Martin Atkinson
Department of Language and Linguistics
University of Essex
Colchester, Essex CO4 3SQ
ENGLAND

Vivian Cook
Department of Language and Linguistics
University of Essex
Colchester, Essex CO4 3SQ
ENGLAND

Michael Hammond
Department of Linguistics
University of Arizona
Tucson, AZ 85721
USA
e-mail: [email protected]

Teun Hoekstra
Instituut voor Algemene Taalwetenschap
Rijksuniversiteit
Postbus 9515
2300 RA Leiden
THE NETHERLANDS
e-mail: [email protected]

James Hurford
Department of Linguistics
Adam Ferguson Building
40 George Square
Edinburgh EH8 9LL
SCOTLAND
e-mail: [email protected]

Rita Manzini
Department of Phonetics and Linguistics
University College
Gower Street
London WC1E 6BT
ENGLAND

Marina Nespor
Italiaans Seminarium
Universiteit van Amsterdam
Spuistraat 210
1012 VT Amsterdam
THE NETHERLANDS

Mark Newson
Department of Language and Linguistics
University of Essex
Colchester, Essex CO4 3SQ
ENGLAND

Andrew Radford
Department of Language and Linguistics
University of Essex
Colchester, Essex CO4 3SQ
ENGLAND

Iggy Roca
Department of Language and Linguistics
University of Essex
Colchester, Essex CO4 3SQ
ENGLAND
e-mail: [email protected]

Anjum P. Saleemi
English Department
Allama Iqbal Open University
H-8, Islamabad
PAKISTAN

Michael Sharwood Smith
English Department
Rijksuniversiteit te Utrecht
Trans 10
3512 JK Utrecht
THE NETHERLANDS
e-mail: [email protected]

N.V. Smith
Department of Phonetics and Linguistics
University College
Gower Street
London WC1E 6BT
ENGLAND
e-mail: [email protected]

Introduction

I.M. Roca
University of Essex

This volume grew out of a seminar series on the theme 'The Logical Problem of Language Acquisition' that I organised for the Department of Language and Linguistics of the University of Essex in 1988, and at which most of the papers included here were first presented. The aim of the series was to examine the impact of the issue on various areas of language research, thus offering as broad as possible an overview of what is rapidly becoming the focal point of generative linguistics.

The change in concerns and outlook which has taken place in linguistics over the past quarter century is nicely encapsulated in the contrast between the basic tenet of American descriptive linguistics that 'languages ... differ from each other without limit and in unpredictable ways' (Joos 1957:96) and Chomsky's current position that there is only one language (cf. e.g. Chomsky 1988c: 2). The apparent irreconcilability of these two stands betrays a more fundamental truth that contemporary linguistics, under Chomsky's enduring leadership, has been labouring to unravel and articulate. Specifically, the crucial discovery has been that phenomenon must be kept distinct from noumenon, or, in plainer words, that underlying the obvious diversity of languages there is a unity more essential to language than its surface geographical variety. Chomsky has thus shifted the focus of linguistics from language to man, from manifestation to source. The central question has now become that of accounting for the possession of language, that is, of an object which has the precise characteristics that human languages are known to have.

Pursuing this line of logical investigation, it is reasonable to conclude that if all human languages are cut to the same shape, this shape must be imposed by the very organism in which such languages are contained, that is to say, by man himself. Moreover, given the obvious fact that language develops in man rather than from him, like, say, a physical limb or body hair, the need for interaction between the organism and its environment becomes more acutely obvious. Briefly, what psychological (or, more accurately, biological) attributes must humans possess in order for language learning to take place in early childhood, under the usual conditions of spontaneity, rapidity, satisfactory completion, and so on?

In turn, what traits are necessary in the ambient language itself to make such learning possible in spite of the apparent input variety which so struck linguists of Joos's generation? Here we have in a nutshell what has come to be known as the logical problem of language acquisition.

Chomsky's unashamedly nativist position is of course well-known. Briefly, the surface complexity of language is such that no acquisition could meaningfully take place unless the organism already comes equipped with a sort of mental template designed to anticipate and match the ambient data in some way. Given the reality of cross-linguistic surface variation, however, such matching cannot be simplistically direct. Rather, the idea is that the variation is built into the template in the form of a limited range of values for each of a set of parameters. From this perspective, therefore, the task of the child learner is one of elucidating from the data which of the available values must be assigned to the language to which he is being exposed. To a large degree, the acquisition of this language consists in the setting of such parameters. Further to this, there will be the (of course non-negligible) task of rote learning the idiosyncratic properties of lexical items. Not unexpectedly, these two tasks are in fact interdependent, in ways that are gradually becoming better known.

It is not my intention to review here the short but already hefty history of the topic which inspires the title of this book. For most of the relevant information, the curious reader can refer to such works as Wexler and Culicover (1980), Baker and McCarthy (1981), Hornstein and Lightfoot (1981), Atkinson (1982), Borer (1984), Pinker (1984), Berwick (1985), Chomsky (1986a), Hyams (1986), Roeper and Williams (1987), and Chomsky (1988a, 1988b, 1988c).

Focussing then on the contents of the present collection, a range of interwoven themes are discernible, and we shall now go through them briefly. Granting the reality of Universal Grammar in the form of principles and parameters, one obvious question concerns the chronology of its availability. Specifically, are all such principles and parameters present and accessible from the onset of the acquisition process or do they (or at least some of them) emerge as development unfolds, as in Borer and Wexler's (1987) Maturation Hypothesis? While Atkinson is decidedly sympathetic to the maturational account, an important part of Hoekstra's paper is aimed against Borer and Wexler's key argument for the hypothesis, which is based on the claim that the non-occurrence of verbal passives in the early stages is the result of the unavailability of A-chains at this point of development. Hoekstra's alternative hinges on the characterisation of language acquisition as growth in the system of grammatical knowledge, the central theme of his paper. Importantly, such intensional accruement need not result in extensional expansion, but is also consistent with
contraction of the output language, and in this light Hoekstra reinterprets the Unique External Argument Principle of Borer and Wexler (1988).

The availability issue reemerges in the arena of L2 acquisition. Clearly, here the learner arrives at the process with all the baggage of his learned L1. The question therefore is - does Universal Grammar still play a role, and, if so, exactly what form does it take? In Sharwood Smith's paper a number of possibilities are presented and evaluated. Importantly, the issue is clouded by the existence of several obvious differences between mother tongue and L2 acquisition, in the areas of cognitive development, social context, and target attainment, among others. The richness of such extralinguistic factors creates falsifiability difficulties for simplistic claims based on a naive, if commonly adopted, identity hypothesis. Consequently, Sharwood Smith forcefully points out their methodological undesirability, unless embedded in a research strategy embracing a range of alternatives for the investigation of the developing perceptions of L2 learners.

I have thus far been using the terms 'acquisition' and 'learning' as mutually substitutable alternatives. Behind such apparently harmless stylistic variation lurks however the substantive issue of the nature of language development. In particular, is acquisition reducible to classical learning, or does it possess characteristics all of its own? This is perhaps the question of most general relevance and far-reaching consequences in the whole domain of language. In his paper, Atkinson concludes after careful discussion that, despite explicit claims to the contrary by practitioners (e.g. Chomsky 1988a, Piattelli-Palmarini 1989), it is not possible to divorce language acquisition from learning, given the central role afforded to hypothesis selection and testing in both processes. The book's spine, like that of the principles and parameters approach to language and to language learnability, concerns the nature and identity of the parameters themselves, and Atkinson warns against the dangers of a new descriptivism cloaked in parametric terminology. He presents and discusses several views on the matter, the adoption of which would lead to a tightening of the range of parameters, and thus to a restriction of the hypothesis space available to the child, with the obvious positive consequences for language learnability.

A concrete aspect of the issue of parameter identity concerns the assessment of causal relations between specific language phenomena and the corresponding hypothesised parameter(s), and the papers by Manzini and Nespor shed light on this matter from opposing ends. A locality parameter for binding was proposed in Wexler and Manzini (1987) and Manzini and Wexler (1987) in support of the influential Subset Principle (Berwick 1985, Wexler and Manzini 1987). Pica (1987), however, put forward an alternative account deriving the parametric effects in binding by an appeal to the long vs. short-distance movement of anaphors in LF,
in accordance with their categorisation as phrases or heads. In her paper, Manzini objects to such a binarity-based approach to locality on the grounds that there is an independent need for two separate definitions of locality, as in Manzini (1988, 1989) and Chomsky (1986b), respectively. She backs up her argument with a detailed examination of the behaviour of the Italian reciprocal l'un l'altro with regard to locality, and concludes that the observable parameter-like effects are indeed best accounted for by means of a specific binding parameter.

Confronted with a similar situation in the domain of rhythm, Nespor nonetheless arrives at the opposite conclusion. A cluster of properties led investigators such as Pike (1945) and Abercrombie (1967) into a syllable- vs. stress-timing dichotomy as regards the rhythmic realisation of languages. Nespor notes that a specific rhythm parameter would entail a rigid separation of language types which ought to be observable in acquisition, and reviews evidence of various types (acoustic, perceptual, phonological) which contradicts the predictions of such a parameter. In particular, languages with intermediate effects are attested, and children's development goes through a compulsory 'syllable-timed' phase cross-linguistically. Nespor consequently concludes that, contrary to what may appear plausible at first sight, the observable effects are not the result of the different settings of an independent rhythm parameter, but rather correspond to several autonomous processes which, in turn, produce the impression of a different rhythm.

A commonly held belief is that, in order to facilitate learnability, the values of the parameters are ordered according to a hierarchy of markedness. The use of the term 'markedness' in the literature is not, however, free of ambiguity, and Hoekstra's paper contains a useful review of the several notions available. Tackling directly the issue of parameter marking, Saleemi presents the case for adopting an intensional approach. Focussing on the specific case of Pro-drop, he maintains that the ranking hierarchy between the (multivalued) settings of the parameter can be derived from the set theoretical relations which exist between the corresponding grammars, thus directly confronting the claims put forward by the proponents of the extensionally-oriented Subset Principle. Saleemi further contends that, while learning according to such an internal criterion can proceed exclusively on the basis of positive evidence, the possibility of some inconsistency between the marked parameter values and the corresponding languages may remain. If so, the achievement of what he calls 'exact identification' (i.e. of the ambient language) may have to involve some use of implicit negative evidence.

The challenge presented to learnability by the Lexical Parameterisation Hypothesis (Wexler and Manzini 1987) is taken up in Newson's contribution. In particular, Newson attempts to solve the Undergeneralisation
Problem (Safir 1987) by establishing 'Lexical Dependencies' between the settings of any one parameter for different categories. Thus, for instance, the subset relations following the values of the Governing Category parameter and the Proper Antecedent parameter are inverted for anaphors and pronominals, with the consequence that the corresponding markedness hierarchies also ought to be reversed, a conclusion which conflicts with the distribution of pronominals in the world's languages. According to Newson, this difficulty is readily resolved if we assume that anaphors have dominant status over pronominals as regards the setting of these parameters. Consequently, in the unmarked situation the setting for pronominals will be parasitic on that for anaphors, even if this runs counter to the predictions of the Subset Principle. Note that Newson's lexical dependencies are parameter-internal, and thus leave open the possibility that the settings of different parameters are mutually unaffectable, as has been contended by Wexler and Manzini (1987).

Undoubtedly, one of the most fruitful and debated dichotomies introduced by Chomsky is that between competence and performance. Not unexpectedly such a distinction is also found to permeate the area of language acquisition. In particular, an important issue for learnability theory concerns the trade-off between the (innate) principles of grammar and the effects on learning of factors of performance, especially those which impinge on the input data. Cook's paper examines the pitfalls inherent to the investigation of competence through performance, the common situation in acquisition studies. In particular, he confronts the evidence provided by the grammatical judgement of single sentences, which he regards as paradigmatic of studies of adult competence, with the typical use of observational evidence in child language. Cook alerts us to the possible dangers of misusing such E-language data, and he proposes a range of specific methodological safeguards. He further expresses his uneasiness about reading too much into negative evidence, and suggests that, if child performance data are indeed to be used, they must be compared with data of adult performance, rather than competence.

Smith's contribution centres on the role of Pragmatics, as defined in the context of Relevance Theory (Sperber and Wilson 1986), in the fixing of parameters. The interaction between pragmatics and the acquisition of grammar is paradoxical, in that each appears to presuppose the other: pragmatic interpretation requires the use of grammatical knowledge, while the acquisition of such knowledge must be grounded in the contextual interpretation of the input utterances. Smith cuts the Gordian knot by allowing a distinction in the mode of operation of pragmatic principles between the adult and the developing child. In particular, he contends
that the operation of the child's pragmatic principles may not in fact presuppose a total syntactic analysis.

Relevant to this issue is the detailed evidence presented by Radford concerning the structure of early child grammars (20-24 months). These grammars must be taken to lean heavily on Universal Grammar, given the minimal amount of exposure to the ambient language by this time. Obviously, thus, they constitute a privileged testing ground for claims regarding the availability of grammatical devices to the developing child. Radford's finding is that early grammars are lexical-thematic, that is to say, they contain neither functional categories nor non-thematic constituents. Correspondingly, all structures in these grammars are projections of lexical categories and comprise networks of thematic relations. As a consequence, child grammar will lack the functional properties associated with functional categories, such as case and binding. Interestingly, the lexical-thematic hypothesis can account for the absence of movement chains, referred to above in connection with Hoekstra's contribution.

The topic of the interaction between Universal Grammar and the ambient language is taken up again by Hammond, who draws a distinction along standard lines between a default setting of a parameter, for which no external evidence is required, and a marked setting, which can only be triggered by positive evidence. He goes on to show that in the domain of word stress the marked values cooccur with a maximum of seven syllables. Rather than building the corresponding constraint into UG, Hammond opts for the non-stipulative strategy of relating the observation to Miller's (1967) magical number seven. In particular, he contends that the reason for the 7-syllable limit is derivable from the limitation of the storage capacity of short-term memory to seven units. In this way, the statement of the stress parameters is kept at its maximum level of generality, while still being compatible with the facts.

Hurford explicitly sets out to reconcile nativist and (social or cognitive) functional explanations of language acquisition, all too often incarnated in the guise of two openly warring factions. He makes a general plea for the integration of extra-grammatical factors into the domain of learnability by drawing a distinction between the evolution of the species and the evolution of particular languages, which he claims to be a function of both innate and culturally transmitted factors. The central concept in his theory is the 'Arena of Use', a performance-related abstraction paralleling Chomsky's competence-related Language Acquisition Device. By jointly providing the input data for the next generation of learners, the LAD and the AoU are both instrumental in the acquisition of competence. Importantly, a model of this kind allows for such factors as statistical frequency and distribution, discourse structure, and individual invention and creativity to play a role in language development without needing
to build them directly into the competence. It moreover goes some way towards accounting for such recalcitrant phenomena as the existence of language drift or the survival of the phoneme through adverse theoretical conditions. As follows from the broad range of contributions, the book ought to be readable by, and useful to, linguists with a variety of interests and from a variety of backgrounds: child language researchers, learnability theorists, syntacticians and phonologists with an interest in principles and parameters, functionalists, language phylogenists, second language researchers, and so on. Indeed, it is perhaps not unreasonable to hope that the volume will make some contribution towards the integration of the rich and varied field of language acquisition. During the period leading up to publication, the papers were subject to critical reviews and subsequent extensive revision, and I wish to make public my gratitude to the anonymous referees who so generously contributed their time and expertise. My editing task has been considerably facilitated by the help and encouragement I received from the series editors, Teun Hoekstra and Harry van der Hulst, and from the Essex colleagues who participated in the project, in particular Martin Atkinson, whose idea the collection originally was, and who made funds and facilities available during his period as chairman of the Department of Language and Linguistics. In the interest of symmetry, I have taken the liberty of introducing a modest degree of style harmonisation across the papers, which will hopefully make the reader's task a more pleasurable one. The generic use of he adopted here should obviously not mislead the pragmatically aware reader into believing that children (or adults) are all of one sex. Owing to practicalities and, especially, time pressure, the choice of a number of typographic conventions has however been left to the individual initiative of the contributors. Throughout the two years which have elapsed between conception and delivery, the contributors have at all times borne my periodic bombardments with patience and good humour. I apologise to them for my countless inefficiencies and thank them warmly for their enthusiasm and cooperation. It is of course to the contributors that any merit of this collection must ultimately revert.

REFERENCES

Abercrombie, D. 1967. Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Atkinson, M. 1982. Explanations in the Study of Child Language Development. Cambridge: Cambridge University Press.
Baker, C. and J. J. McCarthy. 1981. The Logical Problem of Language Acquisition. Cambridge, Massachusetts: MIT Press.
Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts: MIT Press.
Borer, H. 1984. Parametric Syntax. Dordrecht: Foris.
Borer, H. and K. Wexler. 1987. The Maturation of Syntax. In Roeper and Williams. 123-172.
Borer, H. and K. Wexler. 1988. The Maturation of Grammatical Principles. Ms. University of California, Irvine.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1988a. Generative Grammar. Studies in English Linguistics and Literature. Kyoto University of Foreign Studies.
Chomsky, N. 1988b. Language and Problems of Knowledge: the Managua Lectures. Cambridge, Massachusetts: MIT Press.
Chomsky, N. 1988c. Some Notes on Economy of Derivation and Representation. Ms. MIT.
Hornstein, N. and D. Lightfoot. 1981. Explanation in Linguistics. London: Longman.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Joos, M. 1957. Readings in Linguistics. Chicago: University of Chicago Press.
Manzini, R. 1988. Constituent Structure and Locality. In A. Cardinaletti, G. Cinque and G. Giusti (eds.), Constituent Structure. Papers from the 1987 GLOW Conference, Annali di Ca' Foscari 27, IV.
Manzini, R. 1989. Locality. Ms. University College, London.
Manzini, R. and K. Wexler. 1987. Parameters, Binding Theory and Learnability. Linguistic Inquiry 18. 413-444.
Miller, G. A. 1967. The Magical Number Seven, plus or minus two: Some Limits on our Capacity to Process Information. In G. A. Miller (ed.) The Psychology of Communication. New York: Basic Books Inc. 14-44.
Piattelli-Palmarini, M. 1989. Evolution, Selection and Cognition: from 'Learning' to Parameter Setting in Biology and in the Study of Language. Cognition 31. 1-44.
Pica, P. 1987. On the Nature of the Reflexivization Cycle. In Proceedings of NELS 17, GLSA, University of Massachusetts.
Pike, K. 1945. The Intonation of American English. Ann Arbor, Michigan: University of Michigan Press.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts: Harvard University Press.
Roeper, T. and E. Williams. 1987. Parameter Setting. Dordrecht: Reidel.
Safir, K. 1987. Comments on Wexler and Manzini. In Roeper and Williams. 77-89.
Sperber, D. and D. Wilson. 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
Wexler, K. and P. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge, Massachusetts: MIT Press.
Wexler, K. and R. Manzini. 1987. Parameters and Learnability in Binding Theory. In Roeper and Williams. 41-76.

The logical problem of language acquisition: representational and procedural issues

Martin Atkinson
University of Essex

I take it that the logical problem of language acquisition has come of age, in the sense that an increasing number of researchers in linguistic theory and the acquisition of language refer their speculations to this problem and offer them as contributing to its solution. This is to be contrasted with the situation a decade ago, when much of the linguistics literature contained only token gestures towards the problem and that devoted to empirical studies of child language largely ignored it.1 This latter was particularly worrying, leading to a plethora of studies which were rich in data of various kinds and designed according to the best standards operative in the field, but which, lacking theoretical foundation, failed to have any lasting significance.

The purpose of this paper is to offer an overview of the field. It will not, however, constitute a review, since, for the most part, I shall presuppose some familiarity with the primary literature which informs much of the discussion. Rather, I shall seek to highlight some of the issues which seem to me to be central and, perhaps more importantly, draw attention to a series of questions where clarification appears to be called for. There is, in my view, a new optimism abroad in language acquisition theory at the moment, based on the belief that recent linguistic theorising is at last providing the appropriate concepts for approaching a genuinely explanatory account of the child's achievement in mastering his first language. This is an optimism which I share and which I hope this paper will convey. Inevitably, however, novel conceptualisations of problems generate their own fundamental questions, and this is a sign of a burgeoning research paradigm.

I have found it convenient in my own thinking to attempt to maintain a distinction between what I refer to as representational and procedural aspects of the problem. Just what is involved in this distinction will become clear as the paper proceeds, but I should state at the outset that in utilising this distinction, I do not wish to maintain that it will necessarily survive as understanding of the issues deepens. For now, it should be treated as an expository convenience. Accordingly, after an introduction in which
a number of background assumptions and issues are presented, the paper consists of two major sections. The first is largely concerned with the nature of parameters and parametric variation and the questions raised are in the context of the idealisation to instantaneous acquisition; the second focuses on mechanisms of development which might be supposed to operate in real time.

1. BACKGROUND

Recognition of the existence of a problem of the type with which this volume is concerned arises in the context of adopting an explicit framework for thinking about language acquisition. Such a framework typically contains at least three sets of assumptions, as in (l) 2 : (1)


a. those concerning the space of hypotheses available to the child;
b. those concerning the data available to the child;
c. those concerning the procedure(s) the child utilises in selecting hypotheses on the basis of exposure to data.

A logical problem exists, modulo such a framework, when it can be argued (informally in most linguistic theorising, but formally in formal learning theory, e.g. Wexler and Culicover 1980; Osherson, Stob and Weinstein 1986) that there is no guarantee that the correct hypothesis will be selected by the assumed procedure(s) on exposure to appropriate data; if we present such a theory of language learning as a serious candidate for explaining some aspect of the child's acquisition and it leads to this conclusion, we can immediately infer that some aspect of it is wrong, since children do in fact acquire their native language at least partially on the basis of exposure to data. Various responses can be contemplated to the conclusion that a logical problem exists, but before looking briefly at these, it will be useful to see how the framework of (1) can be applied in an abstract context (Wexler and Culicover 1980, 43-46). Consider the infinite set of languages in (2), each consisting of an infinite set of sentences using the single 'word' a: (2)

Lj = {a, aa, aaa,

}

L2 =

{aa, aaa,

}

{aaa,

}

L3= etc.



This set of languages is to constitute the hypothesis space of (la), i.e. the learner's task is to be the identification of the language to which he is being exposed from this antecedently given set. Exposure consists of the presentation of sentences from the target language so that, for any sentence in the language, there will be some finite time at which that sentence will have been presented. Crucially, the learner is presented with no information about non-sentences. 3 For example, if the learner is being exposed to L 3 , at no point is he presented with the information in (3): (3)

*aa

With our assumptions about the data available to the learner explicit, it is easy to see how to formulate a procedure which will guarantee successful identification after a finite time. This procedure simply instructs the learner to set i, in his current hypothesis L i; as the length of the shortest sentence to which he has been exposed so far. Since, by our assumptions about the data, this shortest string will be presented after some finite time, at that time the procedure will select the correct language and no subsequent datum will modify this selection. Adding this procedure, then, to the assumptions about the space of hypotheses and those concerning available data will yield a learning theory for the languages of (2) in which no logical problem arises. But now consider a superficially similar problem which leads to a radically different outcome. Suppose that the hypothesis space is defined by the languages in (4), only one of which contains an infinite number of sentences, and that our assumptions about data remain unaltered: (4)

L, = {a} L 2 = {a, aa} L 3 = {a, aa, aaa}

L 0 = {a, aa, aaa

}

Now, it is easy to see that no procedure can be formulated which will guarantee choice of the correct language in a finite time. For suppose that we attempt to produce a procedure which is conservative and sensitive to the length of input sentences: say, set i as the length of the longest sentence in the data so far. This will be fine, so long as the target language is one of the finite languages, but if the target happens to be L 0 , this language will never be guessed, so the required guarantee of correctness is not obtained. Alternatively, the non-conservative strategy of selecting

4

Martin Atkinson

L0 in all circumstances will be successful precisely where the conservative strategy fails; but it will also fail where the latter succeeds. Other strategies that could be envisaged may be successful on occasions, but it should be clear on the basis of the above that no such strategy could guarantee success across the full set of target languages. Accordingly, a logical problem exists for the languages in (4), no matter what procedure we invoke, and it follows that correct identification of languages from this set is either impossible or is going to involve changing assumptions about data. It is instructive to see how this problem can be solved if the assumption that the learner gets no information about non-sentences is changed. So now assume that the learner's data consist of sentences and non-sentences labelled as such and that for any datum (sentence or non-sentence), there will be a finite time at which it will have been presented.4 With this modified assumption, there is no difficulty in specifying a procedure that will guarantee successful identification: set i initially at 0 and successively modify it to the length of the shortest non-sentence presented minus 1. The above examples are some way removed from the arena of natural language acquisition, but, in drawing attention to the fundamental changes in the character of a problem which follow from the assumption that information about the status of non-sentences is available to the learner, they make contact with arguments which are constructed in more directly relevant contexts. The perspective I shall be adopting in what follows is that the child develops a core grammar for a language on the basis of exposure to data of a strictly limited kind, and by far the most important limitation consistently advocated is that the data do not contain non-sentences labelled as such.5 If this is so, we have only to consider the wide range of impressively subtle judgements which appear to be readily available to native-speakers, and which presumably arise because of what native-speakers know about their language, to see how easy it is to construct informal tokens of the logical problem in the natural language learning domain. To take wellknown examples, native speakers of English have little difficulty in agreeing that (5) is considerably better than (6), although perhaps not perfect: (5)

?Who did John wonder whether Mary kissed?

(6)

*Who did John wonder whether kissed Mary?

Similarly, multiple wA-questions exhibit a subject-object asymmetry, with the sentence with the subject in situ being considerably degraded in wellformedness compared to that where the object is not moved6: (7)

Who did what?

The logical problem of language acquisition (8)

5

?What did who do?

Or, to take a less widely discussed example, Baker (1988b) cites data from Chichewa, showing that this language has applicative constructions corresponding to both instrumentals and benefactives: (9)

Mavuto anaumb -ir -a mpeni Mavuto SUBJ PREF PAST mould APPLIC ASP knife mtsuko waterpot 'Mavuto moulded the waterpot with a knife'

(10)

Mavuto anaumb -ir -a mfumu Mavuto SUBJ PREF PAST mould APPLIC ASP chief mtsuko waterpot 'Mavuto moulded the waterpot for the chief

In (9) and (10) the instrumental NP mpeni and the benefactive NP mfumu are 'promoted' to direct object position immediately following the verb, creating a structure in which the verb appears to be followed by two objects. However, these two objects behave rather differently in a number of respects between the instrumental and benefactive cases. To take one such difference, for the instrumental both 'objects' can appear as pronominal object prefixes in front of the verb, as in (11): (11)

a.

b.

Mavuto anauMavuto SUBJ PREF PAST OBJ PREF -a mitsuko ASP waterpots 'Mavuto moulded the waterpots with it' Mavuto anaiMavuto SUBJ PREF PAST OBJ PREF mpeni knife 'Mavuto moulded them with a knife'

umb -ir mould APPLIC

umb -ir -a mould APPLIC ASP

For the benefactive applicative, however, only the benefactive NP can be replaced by the pronominal object prefix7:

6 (12)

Martin a.

Atkinson

Mavuto anawaMavuto SUBJ PREF PAST OBJ PREF -a mtsuko ASP waterpot 'Mavuto moulded the waterpot for them' b. * Mavuto anauMavuto SUBJ PREF PAST OBJ P R E F -a ana ASP children 'Mavuto moulded it for the children'

umb -ir mould APPLIC

umb -ir mould APPLIC

The point now is that to the extent that these judgements are reliable and diagnostic of a uniform, internally represented grammar, we are obliged to seek an account of how they arise. That native-speakers of English are not consistently (or even exceptionally) told that (5) is odd but nothing like so bad as (6), or that (7) is fine but (8) is less good is surely uncontroversial. Furthermore, resorting to analogy has no attractions here, as the relevant English judgements concern degrees of ill-formedness and, by assumption, the child is provided with no information of this nature which could form the basis for an analogical inference. Also in the Chichewa case, if analogy were to be employed, it would presumably yield the conclusion that (12b) is well-formed alongside ( l i b ) , since simple applicatives, lacking object prefixes, do not appear to differentiate between instrumentals and benefactives. In these circumstances, we appear to be driven to the conclusion that the judgements that are made regarding these sentences must arise from an interaction of the data to which children are exposed and knowledge which is brought to the acquisition task, this knowledge amounting, within the framework of (1), to a substantive constraint on the hypotheses considered by the child. Characterisation of this knowledge is precisely the concern of linguists attempting to formulate accounts of Universal Grammar and, as we shall see, is specifically aimed at dealing with representational aspects of the logical problem. Before being swept along with the current of opinion which claims that information about non-sentences, or negative evidence as it is often called, is not available to the learner, it is prudent to note that the above observations claim only that acquirers of English (or Chichewa) do not receive systematic explicit exposure to non-sentences together with information about their status. This does not rule out the possibility of a causally efficacious role for implicit negative evidence, as is noted in Chomsky (1981). One way in which this suggestion could be given some substance would be to equip the child with some mechanism which is sensitive to non-occurring tokens which might be predicted as occurring on the basis

The logical problem of language acquisition

7

of an existing system, such non-occurrences, after exposure to a specified amount of data, leading to modifications in the system. That it is possible to formulate learning principles which have this sensitivity is demonstrated by Oehrle (1985), and the extent to which profound changes in learnability domains follows from assuming the existence of negative evidence (in any form) is examined in detail by Osherson, Stob and Weinstein (1986). While it is important to be aware of the magnitude of the effects of the no negative data assumption, the fact remains that the vast majority of work in this area adopts the assumption (but see Randall 1985; Lasnik 1985; Saleemi 1988 and this volume). Returning now to possible responses to the observation that a logical problem exists, it is clear that all three components in the framework of (1) will embody substantive claims and will be modifiable in principle. The discussion of negative evidence above has exhibited one way in which the information in the data available to the child might be enriched, albeit in an indirect way, and such enrichment could well lead to the evaporation of the perceived problem while leaving whatever assumptions we are operating with under (la) and (lc) intact. An alternative strategy, also aimed at (lb), is to assume that the available data are structured in a particularly appropriate way, constituting the ideal language learning environment favoured by those who see an important causal role in the features of the special register of Motherese (see papers in Snow and Ferguson 1977, most notoriously Brown's introduction to the volume). As things stand, there is little reason to be optimistic that this strategy can be profitably pursued. Most directly, whatever features may be characteristic of the Motherese register, they are surely irrelevant to the type of judgement involved with (5) - (12) above, and this type of example could be multiplied endlessly. In addition, the strategy has been effectively challenged both empirically (Newport, Gleitman and Gleitman 1977; Gleitman, Newport and Gleitman 1984) and conceptually (Wexler and Culicover 1980; Wexler 1982). Manipulations of assumptions under (la) and (lc) are essentially what the next two sections of this paper are concerned with, but a preliminary word is in order here. Regarding (la), the obvious move to contemplate is that of restricting the hypothesis space, and, of course, this can be seen as a fairly constant backdrop in the development of generative grammar. For (lc), the issues are less straightforward. If we are subscribing to a hypothesis selection and testing account of development, as (1) suggests we are, the most powerful procedure, in the sense of that which will guarantee success in identification if any can, will be one that allows the child access to an a priori enumeration of hypotheses. Changes in the current hypothesis will be occasioned by errors on a current datum and the next hypothesis in the enumeration which is consistent with the current

8

Martin Atkinson

datum and all previous data will be selected as an alternative. On the assumption that the correct hypothesis is located in the enumeration, it will eventually be selected and no further data will lead to its rejection. But, as is argued extensively by Wexler and Culicover (1980), such a procedure, requiring memory for all previous data, is empirically quite implausible. However, any change in the procedure away from enumeration will yield a 'weaker' procedure and any problems arising on the enumeration assumption will remain unsolved. It follows, then, that certain changes in the selection procedure will be quite ineffective in the context of a logical problem and will in no way obviate the need for restrictions in the hypothesis space under (la). 8 To close this introductory section, it is perhaps appropriate to consider the three dominant paradigms in the linguistics of the last 50 years from the point of view of the framework in (1). First, taking neo-Bloomfieldian structuralist linguistics, the acquisition model that this approach gives rise to might be schematised as in (13): (13)

a. b. c.

any hypothesis compatible with the application of inductive procedures to primary linguistic data; 'objective' properties of utterances, most notably acoustic and distributional properties; hypothesis formulation and testing.

Since the neo-Bloomfieldians did not formulate a mentalistic acquisition model, what we have in (13) must be approached with caution. Nevertheless, it is very much in the spirit of the Chomskian re-construction of the position advocated by his structuralist predecessors, taking the view that their way of doing linguistics embodied an implicit theory of language acquisition. Notable characteristics of it from the point of view of this paper include the observation that the constraints on hypotheses are procedurally induced and are not the product of any substantive linguistic principles, i.e. there is no place for Universal Grammar in this schématisation. Furthermore, hypotheses are not selected but are actually formulated via the application of inductive procedures and whether this is even intelligible in a mentalistic framework is highly debatable. However, explicitly restricting the data available to the learner, albeit via a spurious notion of objectivity, is an emphasis which has already cropped up and which will recur. Inadequacies in the linguistic accounts produced by the structuralists led to the classic theory of transformational grammar (Chomsky 1965) with the characteristics qua a theory of language acquisition as in (14):

The logical problem of language acquisition (14)

a.

b.

c.

9

any rule system compatible with Universal Grammar, this being construed as a set of constraints on possible rule systems; no clear statement, but with hindsight it appears that it was necessary for the child to have access to the same data as the linguist, including information about non-sentences, etc; hypothesis selection, testing and evaluation via the operation of an evaluation measure.

Compared to the structuralist account, there are major changes here. The hypothesis space is now constrained by linguistic principles and the formulation (or discovery) of hypotheses is replaced by selection from an antecedently specified set of possibilities. However, the framework is bedevilled by a number of problems, including the following: (i) Universal Grammar, as a set of constraints on possible rule systems, makes available a very rich set of descriptive options, many of which are not attested and, indeed, are unlikely to be so; the descriptive poverty of structuralism is replaced by profligacy; (ii) it is quite counterintuitive to assume that the child has access to the same data as the linguist; yet without this assumption, the problem raised under (i) takes on massive proportions, as the child would then be required to pick his way through this forbiddingly complex set of options on the basis of rudimentary data; (iii) the form and operation of the evaluation measure, which was the mechanism enabling the child to select the descriptively adequate grammar over one that was merely observationally adequate, remained poorly understood and undeveloped. It was against this background of descriptive largesse that the current Principles and Parameters model emerged with the characteristics in (15): (15)

a.

b. c.

any core grammar which results from the interaction of a set of universal principles and a set of parameters, the values of which can vary; subject to a criterion of 'epistemological priority'; triggering, parameter setting and maturation.

Of course, the switch from rule systems to principles does not in itself guarantee the restrictiveness of descriptive options, but this must be seen alongside an emphasis on the deductive structure of the theory, which enables a particular principle to have effects, as far as the properties of sentences are concerned, only at the end of a lengthy deduction, perhaps involving complex interactions with other principles and parameters. The intention, anyway, is that the number of principles, and perhaps also parameters, will be fairly small and this is clearly a shift away from a

10

Martin Atkinson

situation in which each construction type in each language merits its own rule. Under (15b), the reference to epistemological priority is a recognition that the development of the system must take place in the context of data which it is plausible to assume the child actually has access to. Thus, alongside the familiar restriction on negative data, this approach prohibits reliance on complex data in the fixing of parameter values (in this connection, see Wexler and Culicover 1980 on degree-2 learnability, Morgan 1986 on degree-1 learnability if the child has access to constituent information, the speculations of Lightfoot (1989) on degree-0 learnability, and Elliott and Wexler 1988 on the emergence of a set of grammatical categories from an epistemologically plausible perspective). Finally, triggering, parameter-setting and maturation under (15c) are intended to have a character which makes them quite distinct from learning, even when the latter is construed mentalistically as in (14c), and the extent to which this can be maintained will be examined in Section 3 below. I now turn to a discussion of issues surrounding (15a). 2. REPRESENTATIONAL ISSUES

The Principles and Parameters framework can, in fact, be pursued in two rather different ways, depending upon whether researchers are interested in the detailed operation of whatever mechanisms are postulated in (15c) or not. If not, questions are raised in the context of pure linguistic research, and acquisitional issues are considered at a level of abstraction defined by an idealisation to instantaneous acquisition. The view is that this idealisation is innocent for the purposes of furthering linguistic understanding, coupled with the recognition that it does not address real-time problems in the acquisition domain. 9 Schematically, this idealisation can be represented as in (16):

Here S0 and Sn designate the initial and final states in the acquisition process. Pj, P 2 , ..., Pn is the set of universal principles, and p b p 2 , ..., pm is the set of parameters. The use of x in connection with the parameters

The logical problem of language acquisition

11

at S0 is intended to indicate that at this stage their values are open, and the ai at Sn represent the values that are determined in the acquisition process. Presumably, S0 will also contain specifications of the ranges of the different parameter values, but I am not concerned with such niceties here.10 (16) contains nothing corresponding to the observed gradualness of the acquisition process and no detailed information about how the transition between S0 and Sn is effected. Consideration of these questions is set aside until Section 3.

Perhaps the most serious problem confronting this way of looking at things is that of the nature of the principles and parameters, i.e. what is needed is a general theory of what principles and parameters are legitimate, and this section is largely concerned with examining a number of perspectives on this problem. I shall have little to say about principles here, although the issues I raise deserve consideration from this perspective too (see Safir 1987). As things stand, there is not a great deal of agreement among researchers on the identity of more than a small number of parameters. There is pro-drop, the unitary status of which is the subject of considerable debate (see, for example, Safir 1985), bounding node for Subjacency, direction of Case and θ-role assignment, governing category, again involving some dispute, the set of proper governors for ECP, and perhaps a few others which have been the subject of systematic discussion. Alongside these, however, there is a large set of proposals in the literature which might be viewed simply as parametric relabellings for aspects of linguistic variation. To take one example, in Lasnik and Saito (1984), in the context of a discussion of the position of wh-phrases in English and several other languages, we meet the suggestion that whether complementiser positions marked as [+wh] must contain a [+wh] element at S-structure is a parameter, and they speculate on whether such a parameter is implicationally related to whether languages have syntactic wh-movement, concluding that it is and that the 'basic' parameter is one expressing the presence or absence of such movement. But these observations do not proceed significantly beyond the data that lead to them, and it is difficult to resist the suggestion that we are being offered nothing more than a translation of an aspect of linguistic variation into a fashionable mode. Now, I do not wish to suggest that the Lasnik and Saito parameter is illegitimate, but the view that the theory of parameters is itself in a position similar to that of the theory of transformational rules in the late 1960s is not easy to put aside. Of course, there is an important difference in that individual transformational rules were seen as having to be learned, whereas parameters and their values are given as part of the solution to the logical problem, but from a methodological perspective, there are uncomfortable similarities; just as it was all too easy to formulate
construction-specific rules within a rule-based framework to take account of constructional idiosyncrasies, so it is straightforward to allude to the existence of some parameter in coming to terms with some aspect of linguistic variation. The risk of a new variant of descriptivism is very real.11

How might we contemplate constraining the theory of possible parameters? It seems to me that there are a number of avenues worth exploring, although none of them presents a clear way to proceed at the moment. One is that such constraints will emerge out of procedural considerations once we drop the instantaneous acquisition idealisation, and I shall come back to this in Section 3.12 At least two types of possibility exist, however, which are not primarily based on procedural considerations and it is appropriate to discuss these here.

The first is difficult to formulate clearly and is broadly methodological. Thus, one might maintain that a legitimate parameter will have a certain amount of 'explanatory depth', a property we might expect to follow from the deductive structure of the theory alluded to at the end of Section 1. So Baker (1988b), discussing why instrumental NPs do not incorporate universally into verbs in languages which allow noun incorporation, speculates that this may be due to parameterisation in Case theory. The alternative of parameterising θ-role assignments so that instruments are assigned θ-roles directly by the verb in some languages, thereby allowing incorporation on Baker's assumptions, but not in others, is dismissed as less attractive, presumably because Baker (1988a) has extensive observations suggesting that Case-assigning properties need to be parameterised across a variety of languages, this parameterisation having far-reaching consequences, whereas θ-role assignment parameterisation would be an innovation not linked to other aspects of the theory. That linguists do operate with some such notion, then, is probably uncontroversial, but like all methodological rules-of-thumb it is difficult to ascribe the sort of content to it which would enable us to reliably categorise proposed parameters as legitimate or not.

The second strategy is to impose some substantive constraint on parameters and this admits two sub-cases: the constraints may directly concern the form of parameters, or the location of parametric variation within the theory. I shall briefly discuss each of these possibilities in turn.

Perhaps the most obvious formal property to consider is that of binarity. Since the phonological speculations of Jakobson and his associates (Jakobson, Fant and Halle 1952; Jakobson 1968), linguists have been attracted by binarity, and, in a recent attempt to link the parameter-setting approach in language learning to selective theories in biology, speaking of parameters, Piattelli-Palmarini says (1989, 3): "... each can be 'set' on only one of a small number of admissible values (for many linguistic parameters there seem to be just two such possible values) ..." It might also be maintained
that binarity sits most comfortably with the switch-setting analogy offered by Chomsky (1988a), an analogy which Piattelli-Palmarini uses extensively (see further below), although nothing in principle rules out the possibility of multiple switch-settings. Unfortunately, attractive as binarity might be conceptually, in the current state of enquiry we are forced to acknowledge the existence of multiple-valued parameters even among those where fairly extensive justification exists. Thus, for example, Wexler and Manzini (1987) and Manzini and Wexler (1987) offer a 5-valued parameter for governing category, Saleemi (this volume) considers a 4-valued parameter in connection with his reanalysis of pro-drop phenomena in terms of the postponement of Case assignment to LF, and Baker (1988a) suggests that verbs (or perhaps languages) admit multiple possibilities for Case assignment, including the option of assigning two structural cases and the option of one structural and one inherent case, alongside the common situation of having only a single structural case. Nor can we maintain the converse position that a defining property of parameters is non-binarity, as there appear to be some, most notably the directionality parameters, which by their very nature are binary. Naturally, there is nothing unintelligible about sets of parameters some of which are binary and others of which are not, particularly if it transpired that the two sets clustered together with respect to other properties, perhaps thereby constituting parametric natural 'kinds' (see below), but this first attempt to impose a substantive constraint on parameters would appear to require major re-evaluations of central parts of the theory if it were to be adopted.13

A related possibility is that of whether parameters come pre-set, resetting being determined by positive evidence (see Hammond, this volume), or whether they are simply unset, requiring positive evidence to be set in one way or another. Pre-set values, if they exist, can then be referred to as unmarked. This possibility, of course, raises procedural questions, and it will arise again in Section 3, but for now, since we are assuming that S0 contains a specification of the range of permissible parameter values for each parameter, it is natural to wonder whether some a priori ordering might not be imposed on this range. Again, unfortunately, what we have on the ground is a mixed bag. Thus, taking the governing category parameter in the work of Wexler and Manzini, the notion of a default, pre-set value makes perfect sense and, indeed, is necessary from the set-theoretic perspective they adopt (see p. 18 below, and Newson, this volume). As is well-known, a default value has also been suggested by Hyams (1986) for pro-drop, this being [+pro-drop] and motivated by some controversial claims about early child speech (see, for example, Aldridge 1988). Already, with these two cases, however, whatever the empirical status of the claims, there is the uncomfortable observation that Wexler and Manzini's account
is grounded in learnability considerations (irrespective of whether markedness is viewed as integral to the initial state of Universal Grammar or a consequence of the operation of a separate learning module) which are not available for Hyams, the point being that ±pro-drop does not yield set-theoretically nested languages because of the non-existence of sentences with expletive subjects in many [+pro-drop] languages.14 Clearly, if we consider other examples such as the directionality parameters (head direction and direction of Case and θ-role assignment) or whether wh-movement is obligatory in syntax, we end up with overlapping or disjoint languages for which no observations have been offered to suggest that one value should be the default setting of the parameter (see Newson, this volume).

Turning now to how we might constrain the location of parameterisation within the theory, there is a variety of ways in which we might spell out the sense in which the framework allows for parametric variation. Some construals appear in (17):

(17) Universal Grammar consists of:
a. a set of parameterised principles;
b. (i) a set of universal principles;
   (ii) a set of parameterised principles.
c. (i) a set of universal principles, stated in terms of a set of primitives;
   (ii) a parameterisation of the primitives.
d. (i) a set of universal principles, stated in terms of a set of universal primitives;
   (ii) a set of universal principles, stated in terms of parameterised primitives;
   (iii) a parameterisation of the appropriate primitives.

To appreciate what is involved in these alternatives, we might consider the particular example of the Binding Theory. According to (17a), and assuming that some parameterisation is justified here, we would say that the Principles of the Binding Theory are themselves parameterised. According to (17c), however, following Wexler and Manzini (1987), the clauses of the Binding Theory are formulated in terms of primitives, one of which is governing category, and it is this notion which is parameterised.15 On this construal, then, the actual form of the Binding Theory is universal. (17b) and (17d) merely represent mixed possibilities. Now, it seems increasingly clear that when authors talk about parameterisation, they are, like Manzini and Wexler, referring to primitives and not principles. Subjacency would be another example, where we could talk about the parameterisation of the Principle or of the notion of bounding node, one
of the primitives which enters into its formulation. This begins to look like a fairly tidy constraint to impose on the location of parameterisation, but, again, there are claimed instances of parameterisation to which it is not clearly applicable. Thus, the directionality parameters do not enter directly into the principles of X-bar theory, Case Theory or θ-theory and we appear to have a situation where parameterisation can occur in the primitives appearing in principles and elsewhere. It is, of course, notable that the directionality parameters have clustered together with respect to the properties of binarity and default values, being binary when other parameters are not and not having default values when other parameters do. This may be symptomatic of an interesting partition in the set of possible parameters.16

Another parametric constraint may also be construed as locational and involves the claim that parametric variation is restricted to occur in the lexicon. The suggestion seems to have been first made by Borer (1984) and it has a clear intuitive appeal. Everyone agrees that the lexical items of a language have to be learned along with their idiosyncratic properties. It is also apparent that those features of languages which make them different from each other have to be somehow acquired and cannot be antecedently specified in the structure of S0. It is natural, therefore, to identify the locus of variation with that aspect of grammar that has to be learned, viz. the lexicon. Support for this localisation of parametric variation has been supplied by Wexler and Manzini (1987), who argue in detail that different values of the governing category parameter cannot be associated with a language once and for all, but have to be linked to specific anaphors and pronouns, since it is possible to find two such items in a single language the syntactic behaviour of which is regulated by different values of the parameter.

A more radical alternative is considered in a tentative way by Chomsky (1988b), basing his discussion on Pollock (1987). Having stated the lexical parameterisation view, he goes on to say (p. 44): "If substantive elements (verbs, nouns, etc.) are drawn from an invariant universal vocabulary, then only functional elements will be parameterised". His subsequent discussion argues for just such a parameterisation of functional elements, proposing that AGR is 'strong' in French but 'weak' in English, these attributes being spelled out in terms of the ability or lack of it to transmit θ-roles, and that [+finite] is 'strong' for both languages, whereas [-finite] is 'weak'. These proposals enable Chomsky, again following Pollock, to produce a comprehensive account of the behaviour of adverbials, quantifiers, negation, etc. in simple clauses in English and French.17

There are at least two reasons for being cautious about these proposals, which clearly represent a significant attempt to localise parametric effects. First, they do not bear at all on the nature of parameters, so the questions
with which I began this section stand unanswered, i.e. we are no nearer an understanding of exactly what forms of lexical parameterisation are legitimate and we have at best partially responded to the dangers of descriptivism. Second, as Safir (1987) observes, restricting variation to the lexicon runs the risk of losing generalisations, if it transpires that all, or even most, lexical items of a particular category behave in a certain way. As we shall see in the next section, Wexler and Manzini themselves confront this sort of problem in connection with Binding Theory phenomena, but their way of dealing with it is not entirely satisfactory. Safir's specific worries again concern the directionality parameters and could be met by extending the notion of lexical parameterisation to zero-level categories. Thus, the claim that verbs in English uniformly assign Case and θ-role to the right would fall under this extended notion of lexical parameterisation. As far as the more radical version of lexical parameterisation, restricting it to functional elements, is concerned, it is perhaps premature to speculate on its plausibility. Suffice it to say that pursuit of it would require the development and justification of an inventory of functional categories and their properties (for an initial view on such an inventory, see Abney 1987) and a re-analysis of the whole range of linguistic variation in terms of these properties. An instance of how progress might be achieved in this regard is Fassi-Fehri's (1988) discussion of Case assignment in Arabic and English. Adopting Abney's (1987) DP analysis, he argues that verbs in English and Arabic uniformly assign accusative case to the right, a necessary consequence of restricting parameterisation to functional elements. However, D and I, both functional elements, differ in the two languages in that they assign genitive and nominative case to the left in English and to the right in Arabic. This proposal enables Fassi-Fehri to construct an interesting account of word-order differences in the two languages.18

This section has surveyed some of the obvious and less obvious ways in which a theory of parameters might be constrained within the instantaneous idealisation. I hope that the need for such constraints is self-evident, but it is not clear which, if any, of the possibilities raised, is appropriate to pursue. Some of these issues will arise in a different context as we now shift away from the instantaneous idealisation and construe the system as developing in real time.

3. PROCEDURAL ISSUES

Dropping the instantaneous idealisation of (16) gives us the schematisation of (18):

(18) S0 → S1 → ... → Sn

Here, again, S0 and Sn designate the initial and final states, but now we recognise a succession of intermediate states. A large number of questions arise in this context, but in this section I shall focus on aspects of just two of these. What is the nature of the developmental process which mediates between the various states in this sequence? And does S0 contain a full inventory of the principles and parameters of Universal Grammar, thereby implying that the same is true of the intermediate states, or do some principles and parameters only become available as the child develops? If this latter possibility is correct, an immediate further question arises: what is responsible for the emergence of those principles and parameters which are not available in the initial state?

Let us initially focus on the first of these questions, assuming for the purposes of this discussion that, indeed, the full set of principles and parameters is present from S0. An obvious way to view the offerings of the instantaneous idealisation is in terms of it providing a restricted set of hypotheses in line with (1a), each hypothesis corresponding to a core grammar; then the learner's task is seen as that of selecting and testing hypotheses on the basis of exposure to data which are subject to the criterion of 'epistemological priority', and the job of the theorist, no longer operating with the idealisation, is to provide a detailed account of exactly how hypotheses are selected, what 'epistemological priority' amounts to, etc. Presumably, there will be a relation of 'content' between selected hypotheses and the data which occasion their selection. For example, we would anticipate that the governing category parameter for a particular anaphor will be set, or re-set, on the basis of exposure to data containing that anaphor, represented as such by the child, in a relevant structural configuration. From the perspective of Fodor (1981), such content-relatedness is diagnostic of paradigmatic cases of learning, yet supporters of the parameter-setting account often give the impression that they are offering something quite distinct from a learning account, and Piattelli-Palmarini (1989) suggests that application of the label 'learning' to the envisaged procedures is quite wrong and should be resisted. With the rejection of learning comes the rejection of hypothesis selection and testing, since this is the only coherent account of learning within a mentalistic framework.

Whether the claim that something conceptually distinct from learning is going on here can be sustained is a question of some importance. Exactly where does the distinctiveness of development in this model reside? First, and most obviously, the restrictedness of the hypothesis space might be seen as contributing to this distinctiveness, but a moment's reflection should persuade us that this is unlikely. In the typical concept 'learning' experiment, there is normally only a finite (and small) number of obvious candidates for stimulus variation, and the subject's task is to
fix a value for each of these. As Fodor (1975, 1981) maintains, the only remotely plausible story that has ever been told about what goes on in such experiments has the subject selecting and testing hypotheses, the hypotheses being related in 'content' to occasioning stimuli, and this is a learning situation. Furthermore, there are cases in the linguistics literature which make it clear that something like this is seen to be going on. So, consider Huang's (1982) discussion of English and Chinese word-order and recall that he takes Greenberg to task for (i) failing to account for why word-order properties cluster in the way they do, and (ii) failing to account for exceptions to his statistical tendencies. Huang's alternative is to formulate a version of the head-direction parameter and, indeed, this comes to terms with (i) in a straightforward way. For (ii), however, Huang has to recognise that the head-direction parameter is not set once and for all for all categories and all bar levels, and he has to contemplate a learner refining hypotheses in the light of additional experience with the language. More generally, any account that admits of a parameter being wrongly set, thereby requiring re-setting (and this applies to some of the best-known proposals in the field, e.g. Hyams 1986, Wexler and Manzini 1987), has to have some mechanism for achieving this re-setting, and, at this level of generality, it is not clear that authors have anything other than hypothesis testing in mind. Qualitatively, we have no difference between this sort of account and standard views on learning, although quantitatively, particularly in comparison to the account offered in classic transformational grammar in (14) above, there may be major differences in terms of the size of the hypothesis space (see Atkinson 1987 for more extended discussion).

If distinctiveness does not lie in a shift away from hypothesis testing per se, perhaps it resides in properties of the mechanism by which hypotheses are selected and tested. The Subset Principle, as developed by Wexler and Manzini (1987), building on earlier suggestions of Berwick (1985), can be viewed in this context. The Subset Principle is designed to directly alleviate the difficulty arising from the no negative data assumption by rendering the learner conservative in a straightforward sense. If parameter values give rise to set-theoretically nested languages, then the Subset Principle obliges the learner to select the least inclusive language compatible with the data received so far and the parameter value yielding this language is deemed to be less marked than those giving rise to more inclusive languages. Modifications 'upwards' will always be possible in the light of further positive data to justify them; modifications 'downwards' will never occur, but, then, if things have gone according to plan, they will never be needed. There are several points to make about the Subset Principle.
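
Before turning to these points, the selection procedure itself can be made concrete. The sketch below is purely illustrative and is not Wexler and Manzini's formal system: it assumes, for expository purposes only, that the language generated under each value of a single parameter can be represented as a finite set of sentence types, and that the learner receives positive data one datum at a time.

```python
# Illustrative sketch only: a toy learner obeying the Subset Principle.
# Assumption (not in the text): each parameter value is paired with a finite
# set standing in for the language it generates.

def subset_learner(value_to_language, data):
    """Yield, after each positive datum, the value generating the least
    inclusive language compatible with all data received so far."""
    seen = set()
    for datum in data:
        seen.add(datum)
        compatible = [v for v, lang in value_to_language.items() if seen <= lang]
        current = min(compatible, key=lambda v: len(value_to_language[v]))
        yield datum, current

# Toy nested languages: L(a) is included in L(b), which is included in L(c),
# so 'a' counts as the least marked value.
languages = {
    'a': {'s1', 's2'},
    'b': {'s1', 's2', 's3'},
    'c': {'s1', 's2', 's3', 's4'},
}

for datum, guess in subset_learner(languages, ['s1', 's3', 's4']):
    print(datum, '->', guess)   # a, then b, then c: revision is only ever 'upwards'
```

On this rendering, the markedness ordering is simply the inclusion ordering over the candidate languages, and conservatism follows from always choosing the minimal compatible set; nothing in the sketch bears on the question, taken up below, of whether such a procedure is anything other than hypothesis testing.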

First, Wexler and Manzini offer it as a principle of a learning module. As such, it is not part of the initial state, S0. As a consequence, markedness orderings do not have to be specified as part of S0 either (see p. 13 above), since they arise naturally from the operation of a learning module which is constrained by the Subset Principle. This is attractive, since the markedness orderings for values of the governing category parameter for anaphors and pronouns are mirror-images; therefore, it would not be possible to specify a single markedness ordering for this parameter as part of the initial state.

Second, there is nothing in the postulation of the Subset Principle to prevent the account including it from being a learning account; Wexler and Manzini's location of the principle in a learning module is a clear indication that this is not a view which they would find objectionable. Again, consider the case of concept learning experiments. Here there is ample evidence to suggest that subjects have available some sort of a priori ordering of hypotheses (for example, conjunctive concepts defined on the parameters of stimulus variation are more accessible to subjects than disjunctive concepts in the same domain), and that this determines their learning strategies. Of course, the situations are different in a number of ways. The ordering of concepts is not determined by set-theoretic inclusion and the learning process is therefore not deterministic, as it is for values of the governing category parameter; in the concept learning experiment the subject is standardly supplied with negative feedback; and whereas the Subset Principle is intended to be instrumental in every language learner's selection of hypotheses, there is no suggestion that every learner in a concept learning experiment operates with exactly the same a priori ordering of hypotheses. But there is a sense in which the notion of learning transcends these differences, and it is with reference to this sense that I submit that there are important similarities between the two situations.

Finally, the extensional character of the Subset Principle is worthy of attention. Wexler and Manzini, being anxious to get markedness orderings out of Universal Grammar for reasons outlined above, have to assume that their learning mechanism is capable of computing extensional relationships between the languages determined by particular parameter settings. Now, there is little value in speculating on the computational resources of the learner's mind, but some (e.g. Safir 1987) have seen this computational assumption as a somewhat implausible aspect of the account, and it is debatable how it should be accommodated to the current orthodox position emphasising the importance of I-language, as opposed to E-language, in linguistic theorising (Chomsky 1986b). An alternative which might be worth considering is to construe the Subset Principle intensionally on definitions of primitives, and Saleemi (1988, this volume) does this for pro-drop, suggesting that his framework is extendible to governing
category, but even if this proves feasible, it does not bear on the main issue under consideration. Overall, it seems that there is no compelling reason to view the Subset Principle as requiring us to move away from a learning account. The learning in question is 'special' in that it is governed by a domain-specific principle for selecting hypotheses, but that selection and testing of hypotheses is going on is surely incontestable. The plausibility of the Subset Principle, a property of the learning module, is seen as deriving from the Subset Condition, a constraint on possible parameters, which might, therefore, be seen as responding to some of the issues raised in Section 2 (cf. fn. 12) from a procedural perspective. The Subset Condition is defined in Wexler and Manzini (1987, 60) as in (19): (19)

For every parameter p and every two values i, j of p, the languages generated under the two values of the parameter are one a subset of the other, that is, L(p(i)) ⊆ L(p(j)) or L(p(j)) ⊆ L(p(i))
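
Read extensionally, (19) is a pairwise nesting requirement and can be checked mechanically once the languages generated under each value are given. The fragment below is again only an illustration under the same toy assumption that languages can stand in as finite sets; it is not drawn from Wexler and Manzini.

```python
from itertools import combinations

# Illustrative only: 'languages' here are finite stand-in sets, not real languages.
def satisfies_subset_condition(value_to_language):
    """True iff, for every pair of values, one generated language includes the other."""
    for (_, l1), (_, l2) in combinations(value_to_language.items(), 2):
        if not (l1 <= l2 or l2 <= l1):
            return False
    return True

# A directionality-style parameter yields overlapping, non-nested languages,
# so a condition of this form is violated:
head_direction = {'initial': {'s1', 's2'}, 'final': {'s2', 's3'}}
print(satisfies_subset_condition(head_direction))   # False
```

A check of this kind makes plain the worry raised immediately below: taken at face value, the condition quantifies over all parameters, and parameters of the directionality type falsify it.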

I am puzzled by the status of this condition, and it is worth attempting to articulate this puzzlement. First, (19) has a universal quantifier in front of it, ranging over the set of parameters. It is, however, readily apparent, if we consider just the parameters which have been mentioned earlier in this paper, that they do not all have values that yield nested sets of languages. The directionality parameters constitute the most obvious instantiation of this claim. Therefore, (19) is false of parameters as they are currently understood. Are we then to read (19) as normative and as rendering illegitimate any parameter which does not conform to it? This would certainly constitute an interesting and strong constraint on the theory of parameters, but since Wexler and Manzini acknowledge the existence of such non-conforming parameters, this appears to be an unlikely interpretation. Are there alternatives? Wexler and Manzini have this to say (1987: 61): "What do we mean when we say that the Subset Condition is necessary? We say that it is necessary in order for the Subset Principle to be always applicable. In other words, if the values that the learning function selects on the basis of data are determined by the Subset Principle and by nothing else, then the values of a parameter must determine languages which form a strict hierarchy of subsets". But, in my view, this only succeeds in converting the Subset Condition from a false claim to a vacuous claim. What this passage suggests is that the universal quantifier at the front of (19) should be restricted to parameters to which the Subset Principle is applicable. But, for the Subset Principle to be applicable, parameter values must yield set-theoretically nested sets of languages. There are possible extensional relations between the languages generated by particular parameter values which might give rise to interesting
constraints on the theory of parameters. For example, we could consider the possibility that if, for a particular parameter, one pair of values gives nested languages, then any pair of values does, i.e. parameters come in two varieties, those which uniformly produce nested languages and those which uniformly do not; there are no mixed parameters. This is obviously true of the governing category parameter, but other multi-valued parameters against which it could be evaluated are not well understood, so I shall not pursue the matter here.19 I would, however, emphasise that I believe that the Subset Condition should be laid to rest.

A matter which has been mentioned above but of which, so far, nothing much has been made is that of the deductive structure of the theory. The idea, roughly, is that setting a parameter in one way or another can have wide consequences throughout the system, and this immediately suggests a way in which the real-time development of the system might have distinctive characteristics when compared to paradigmatic learning models; perhaps it is possible for some aspect of the system to be 'fixed' by virtue of the 'fixing' of some other aspect. Again alluding to concept learning, this is not a situation that obtains there, since the various parameters of stimulus variation are assumed to be independent. The immediate question to raise in this connection is: in the schema 'X leads deductively to Y', what are the legitimate substitutions for X and Y? Standardly, it seems to me, X will be a parameter value and Y will be some property of the grammatical representations of sentences. The deduction may well employ a number of suppressed premises, relating to other parameter values. But, of course, there is another possible substitute for Y, namely the value of a distinct parameter. Thus, we are considering statements of the form: P1(a) leads deductively to P2(a'), where P1 and P2 are distinct parameters and a, a' are specified values of P1, P2 respectively. This deductive relationship is presumably stipulated as a property of the initial state, and to the extent that we can justify such statements, we would appear to have something qualitatively different to learning going on here, at least as far as the value of P2 is concerned. Indeed, it may be appropriate to construe the setting of P2 in this scenario as 'triggering'.20

Again, some caution is necessary. The possibility being contemplated here is likely to lead to a response from the linguist which points us in rather different directions to those we are envisaging. P1(a) will be justified on the basis of certain data D (which may of course be deductively some way distant from P1) and P2(a') on the basis of data D'. Thus, we shall be considering a cluster of phenomena D ∪ D', and the implicational statement amounts to the claim that we never have D without D'.21 In these circumstances, the linguist will typically search for a single parameter responsible (in interaction with other parameters and principles) for both
D and D'. I conclude, therefore, that the linguistic tradition does not readily accept stipulations of deductive relationships between parameter values. The existence of implicational relationships between parameter values comes under pressure from a different consideration in the learnability context. Wexler and Manzini (1987), considering the operation of the Subset Principle in the case where many parameters are to be set, formulate an Independence Principle, the content of which is that the set-theoretic relationships between languages generated by the values of one parameter should not be disturbed by the values of other parameters. If these relationships were not robust in this fashion, there would be no way for the Subset Principle to function in a consistent way. At first glance, it would appear that the Independence Principle rules out exactly the sort of implicational relationships we are considering here. Newson (this volume) argues that this is not necessarily the case, pointing out that implicational relationships between the values of distinct parameters are not guaranteed to change set inclusion relations. What they will do is make certain languages illegitimate, and it will follow that markedness hierarchies for parameter values will not be calculable by the learning module, since some of the languages which constitute the input to the computation will not be available. 22 However, we have already seen above that reservations about the extensional computations of the Wexler and Manzini account have been expressed (Safir 1987), and it therefore seems worthwhile to put these aside and give serious consideration to including implicational statements as part of Universal Grammar. Newson (1988 and this volume) pursues this course for Wexler and Manzini's governing category parameter, arguing that the value of this parameter for a pronominal is initially fixed on the basis of the value for a corresponding anaphor. This enables Newson to produce coherent accounts of two phenomena which are recalcitrant in the Wexler and Manzini framework. Manzini and Wexler (1987) note that it is never the case that the governing category for pronominals in a language properly contains the governing category for anaphors. If this were the case, there would be domains between the pronominal and anaphor governing category boundary in which binding relations would be inexpressible. To rule out this possibility, they formulate the Spanning Hypothesis, as in (20): (20)

Any given grammar contains at least an anaphor and a pronominal that have complementary or overlapping distribution.

Commenting on the status of (20), they say (p.440): "... it seems plausible that [it] expresses a proposition that happens to be true of natural languages as they have actually evolved, but has no psychological necessity, either
as part of the theory of learnability or as part of the theory of grammar." There might be justified suspicion about the basis of this remark; intuitions about those properties which are true of all languages via psychological (ultimately, biological) necessity versus those which merely happen to be true are not guaranteed to travel well, and it is a virtue of Newson's approach that he can account for (20) without relying on this difficult distinction. Briefly, if a pronoun is required to initially take the value of its corresponding anaphor, subsequent positive evidence can either leave this situation unchanged, in which case anaphoric and pronominal governing categories coincide or it can make the pronominal value more marked, in which case the governing category for the anaphor will contain that of the pronoun. These two situations are exactly those stipulated by Manzini and Wexler's (20). The second problem noted by Manzini and Wexler is the fact that pronouns taking unmarked values of the governing category parameter are remarkably infrequent. Furthermore, the majority of pronouns they consider appear to take the maximally marked value of the parameter, and, while there is no necessary positive correlation between unmarkedness and distribution in the world's languages on their account, the existence of a seemingly massive negative correlation has to be seen as worrying. Again, on Newson's proposal, these distributional phenomena follow naturally. The majority of anaphors, as expected, take the minimally marked value of the parameter. If the values for pronouns are initially fixed by reference to the values for anaphors, the majority of pronouns will be acquired with the maximally marked (for pronouns) value. Furthermore, positive evidence will not enable this situation to change, so most pronouns will be 'stuck' with this maximally marked value. The only way for a pronoun to take the maximally unmarked value will be on the basis of being linked to an anaphor which takes the maximally marked (for anaphors) value. This will be an unusual situation, and, even if it should arise, positive data will still provide the pronoun with the opportunity of taking a more marked (for pronouns) value. Of course, Newson's account raises many questions and makes predictions about the distribution of anaphor-pronoun pairs in languages and about the course of development of such pairs (for relevant discussion of the latter, see Wexler and Chien 1985; Solan 1987). For the moment, it is sufficient to have established that the possibility of Universal Grammar containing implicational statements linking parameter values should not be summarily dismissed. To close this paper, I would like to briefly consider the second question with which this section opened, that of whether all the principles and parameters of Universal Grammar are available to the child from the onset of acquisition (see Hoekstra, this volume, for specific discussion of this issue). This claim, made, for example, by Hyams (1986), embodies what
Borer and Wexler (1987) and Wexler (1988), following Pinker (1984), refer to as the Continuity Hypothesis. The best-known arguments against the Continuity Hypothesis are set out in Borer and Wexler (1987). Most obviously, they draw attention to what they refer to as the Triggering Problem in connection with Hyams' (1986) account of the re-setting of the pro-drop parameter for children acquiring a [-pro-drop] language such as English. For Hyams, this resetting is 'triggered' by the presence of expletive subjects, but the question that immediately arises is that of why this triggering does not occur earlier, since the child is exposed to sentences containing expletive subjects from an early age. Borer and Wexler do not offer an alternative theory of prodrop in their paper, but, to illustrate an area where they feel a non-continuity account is insightful, they propose that the child's early 'passives' in English and Hebrew are all adjectival and therefore do not involve movement. Movement involves the representation of A-chains, and, they claim, this aspect of Universal Grammar is not available to the child at the stage at which the earliest 'passives' are produced. These 'passives' it is assumed, are all lexical. This suggestion receives further support from a consideration of causatives in English and Hebrew and also plays a role in accounting for a range of control phenomena in Wexler (1988). It seems to me plausible to consider similar proposals in connection with Radford's (1988) claim that "small children speak small clauses", his explication of this being in terms of children lacking an I-system at the relevant stage, and his extension of this claim to include the C-system and D-system in Radford (1990). It is not my purpose here to submit such proposals to critical scrutiny (for some remarks on the Borer and Wexler proposals, see Hoekstra, this volume; also Weinberg 1987). Rather, I shall take the correctness of some kind of non-continuity hypothesis for granted and briefly consider the question of the developmental mechanisms it requires. We might be tempted to think that a non-continuity hypothesis is consistent with a learning emphasis and that the representation of A-chains, I-constituents, etc. is somehow induced by the child on the basis of exposure to the linguistic environment. But well-known arguments of Fodor (1975, 1980, 1981) militate against this approach. If learning is to be viewed in terms of hypothesis testing, the hypotheses must be available to be tested, and Fodor's conclusion that a 'more expressive' system cannot develop out of a 'less expressive' one by this mechanism follows. An alternative, advocated by Borer and Wexler, is that the relevant representational capacities mature, coming on-line according to some genetically determined schedule. This perspective raises a number of interesting issues which are likely to be the subject of considerable debate in the near future. First, and most obviously, there is clearly nothing unintelligible about
a maturationalist account of linguistic capacities, once one subscribes to the view that significant aspects of linguistic development are genetically determined, a position which is now surely standard throughout the field. Other genetically determined capacities unfold as the child matures, so the onus is on the detractors of maturational accounts to say why linguistic capacities should be different in this respect.

Second, the extent to which a maturational account can provide insightful analyses for a wide variety of data must be pursued. The case for A-chains has a good deal of support (see, however, Hoekstra, this volume), as, I believe, does that for the maturational emergence of the functional categories I, C and D, but note that none of this impinges on our earlier preoccupation with parameters. Indeed, it is not immediately clear whether maturational considerations are going to be relevant to parametric variation. To illustrate, if, say, it is a parametric property of I that it assigns Case to the left or the right, then the development of I, perhaps according to a maturational schedule, will be equivalent to the development of the parameter. It will not, of course, be equivalent to the development of the correct value of the parameter, but, as has been argued above, it seems that some notion of learning will be the relevant developmental mechanism in this respect.

Third, there is a clear sense in which the child language theorist's task has to be reconstrued if maturational accounts prove to be generally insightful for developmental phenomena. A question that has rightly concerned many workers in the field over the years is that of why a specific capacity emerges before another (see Atkinson 1982, for an extended defence of the view that up to 1980, theorists interested in a wide range of acquisitional phenomena had not been very successful in approaching this question). But in a maturational theory, this question probably becomes improper at the level at which linguists and psychologists speculate. For example, Borer and Wexler note that at the time at which they wish to maintain that the child does not control A-chains, he nevertheless controls A'-chains, as witnessed by the productive use of wh-questions. A natural question to ask is: why are A-chains later to develop than A'-chains? And the answer might be: they just are. Of course, the biologist might at some distant point in the future be able to tell us why this is, but at the level of linguistic and psychological theorising, explanation stops here.23

The situation briefly described here is rather reminiscent of that favoured by Fodor (1981) in his discussion of the ontogenesis of concepts. There, Fodor distinguishes between rational causal processes which involve projecting and testing hypotheses formulated in terms of some privileged set of concepts and brute causal triggering which requires an essentially arbitrary relationship between the occasioning stimulus and the resulting concept. Facing up to the observation that concepts appear to be acquired
in a more or less fixed order, and convinced that this is not explicable in terms of later-acquired concepts being defined in terms of earlier-acquired ones, Fodor extends the notion of brute-causal triggering to embrace the possibility that certain concepts, while not defined in terms of others, nevertheless have others as causal antecedents. To the extent that this view of the development of concepts can be maintained, again the psychologist's task should involve looking rather than attempting to analyse concepts in terms of others. Such looking will reveal the layered conceptual structure of the mind, but this structure will ultimately only be rationally explicable in biological terms. The extent to which representational capacities germane to the development of syntax can be defined in terms of more basic capacities is, in my view, a question still on the agenda (cf. fn. 15). To the extent that they can, we may, at least in principle, contemplate producing a 'rational' account of linguistic development. To the extent that they cannot, there would appear to be no alternative to looking. The conclusion suggested by the considerations in this section is that if the acquisition of syntax is to be seen as having characteristics which take it clearly outside the domain of learning, these will result from the correctness of the non-continuity view. For the development of various formal operations and the principles formulated in terms of them, this is a perspective well worth pursuing. For linguistic variation, encoded in distinct parameter settings, however, prospects of this kind do not look inviting. While it makes sense for a parameter to become available as a result of maturational scheduling, there is little to be said for its values entering the system at different times. To date, I feel that there is no compelling evidence to suggest that learning, perhaps in a very attenuated sense, has no role to play in this aspect of development.

FOOTNOTES

1. Arguably, the problem has always had a central role in Chomsky's theorising, particularly in his less technical works, e.g. Chomsky (1975). Isolated examples in the linguistics literature, such as Peters (1972) and Baker (1979), also exist.
2. An additional component of formalised versions of this framework is usually an assumption about what should count as acquisition. The most obvious candidate here is that there should be some finite time at which the learner selects the correct hypothesis and then retains this hypothesis as further data are presented. For discussion of other possibilities, see Wexler and Culicover (1980), Osherson, Stob and Weinstein (1986). I shall suppress reference to this component in my discussion.
3. Readers familiar with the literature will recognise this as an informal characterisation of Gold's (1967) text presentation.
4. This is an informal characterisation of the condition of informant presentation in Gold (1967).

5. The core-periphery distinction is not one on which I shall focus here, although its utilisation in linguistic argument may in itself constitute an interesting area for reflection. Chomsky (1988b: 70), briefly referring to this distinction, says: "... [it] should be regarded as an expository device reflecting a level of understanding that should be superseded as clarification of the nature of linguistic inquiry advances".
6. What is interesting about (5) and (6) is that both of them involve a Subjacency violation (although, see Chomsky (1986a, 50) for the suggestion that this may not be the correct account of extraction from whether-clauses). In addition, (6) includes an ECP violation, since the empty subject position in the embedded clause is not properly governed. (7) and (8) are also distinguished in terms of the ECP, with a violation of this principle occurring in (8). For discussion of the relevant theoretical concepts, see Lasnik and Saito (1984).
7. Baker accounts for these differences in terms of the instrumental NP receiving its θ-role directly from the verb, whereas the benefactive NP is part of a PP at D-structure and receives its θ-role from the preposition. Readers are referred to Baker's discussion for extensive justification of this asymmetric behaviour of different sorts of PP.
8. It is a matter of some contention whether the hypothesis selection and testing framework adopted here is appropriate for the speculations we shall be considering below. Due notice will be given to this issue at the appropriate time. For the purposes of this introduction, I believe that adopting this mode of talk is harmless and quite useful.
9. The clearest argument for the innocence of the idealisation for linguistic theory is the fact that the end state, Sn, is remarkably uniform and does not seem to be at the mercy of vagaries in the order in which data are presented. Given this, questions to do with the order in which parameters are set, for example, will be irrelevant to the primary concern of the linguist.
10. Whether information about markedness for parameter values should be included in S0 is an issue to which I shall return in Section 3.
11. Safir (1987: 77-8) puts the worry thus: "... our assumptions about what counts as a "possible parameter" or a "learnable parameter" remain very weak. ... what is to prevent us from describing any sort of language difference in terms of some ad hoc parameter? In short, how are we to prevent S[tandard] P[arameter] T[heory] from licensing mere description?"
12. I refer here specifically to the Subset Principle and the Subset Condition of Manzini and Wexler (1987) and Wexler and Manzini (1987). Another 'procedural' constraint might be that any legitimate parameter must be capable of being set on the basis of the evidence available to the child. Thus, a parameter requiring negative evidence or highly complex evidence in order to be set would be illegitimate on this basis. See Lightfoot (1989) for how some parameters could be set by degree-0 data.
13. This is not to suggest that such re-evaluation would be impossible, and it might be interesting to consider the possibility of replacing non-binary parameters by several distinct binary parameters which together conspire to yield the effects of the original.
14. More recently, Hyams (1987) has proposed a reanalysis of pro-drop phenomena in terms of morphological uniformity (see Jaeggli and Safir 1988). Briefly, the idea is that languages which have morphologically uniform verbal paradigms allow pro-drop.
Thus, Italian, in which all verbal forms are inflected, and Chinese, in which none are, are pro-drop languages, but English, which admits both inflected and non-inflected forms, is not. With this reanalysis, Hyams is able to maintain that [+morphologically uniform] is the unmarked parameter setting on learnability grounds, since positive data in the form of inflected and uninflected forms will serve to re-set the parameter. If the initial setting were to be [-morphologically uniform], no such re-setting could occur. This simple and attractive idea involves a number of complications concerning licensing and identification, and I shall not discuss it further here.

15. To talk of primitives in this connection is merely to acknowledge that principles refer, in their formulation, to a variety of configurational and non-configurational notions. These could be viewed as 'primitive' modulo the statement of the principle. Whether there is a fundamental primitive basis for the whole account and, if there is, how it relates to the issue of epistemological priority mentioned earlier is an interesting issue which will not be pursued here.
16. The directionality parameters, presumably formulated in terms of the predicates 'right of' and 'left of', would appear to be readily relatable to a primitive epistemological basis. Again, this raises issues beyond the scope of this paper.
17. 'Strong' and 'weak' are, of course, mere mnemonic labels for the distinction. It seems to me reasonable to ask why these properties involve θ-role transmission in the way claimed, i.e. it is not clear that reference to θ-roles is more than another labelling of the distinction. Speculations such as those being considered here take on additional perspectives in the light of Radford's (1988, 1990) claims that children acquiring English pass through a pre-functional stage during which they offer evidence of their control of lexical categories and their projections, but give no indication of having mastered the functional systems based on I, C and D. A number of very interesting questions emerge from a juxtaposition of Chomsky's speculations and Radford's empirical claims, particularly if the latter are generalisable to the acquisition of languages other than English. For example, it would appear to follow that any systematic pre-functional variations in the speech of children, say in word order, must be referred to factors which are not properly viewed as belonging to the language module. It would not be appropriate to pursue the ramifications of this suggestion in this paper.
18. Fassi-Fehri's account also requires the assumption that subjects appear at D-structure in some projection of V (or N) which occurs as complement to I (or D). Sportiche (1988), who also adopts this proposal, suggests that languages are parameterised as to whether this subject obligatorily moves to (Spec, I) at S-structure. This 'unconstrained' parameterisation becomes principled on Fassi-Fehri's account; in languages like English, the subject has to move in this way to get nominative case, assigned by I to its left. The movement of the subject follows from a localised parameterisation and does not, in itself, constitute the parameterisation.
19. Alternatively, the applicability of the Subset Principle might be relatable to binarity, being restricted to multi-valued parameters, there then being two distinct types of development countenanced by the model. One would fall under the Subset Principle and involve significant learning; the other would conform more closely to the switch-setting analogy. It is premature to speculate further in this respect.
20. In the literature, 'triggering' often appears to be identified with 'having consequences beyond those immediately contained in the data' but this is obviously true of paradigmatic learning situations, and it seems that some reference to 'content' and 'arbitrariness' is necessary to distinguish these two notions (see Fodor 1978 for illuminating discussion). The appropriateness of the label 'triggering' in this scenario will depend on whether the data leading to the fixing of P1 bear an opaque relationship to P2.
To take an implausible, but relevant case, it is conceivable that the fixing of the governing category parameter is implicationally dependent on the fixing of the head direction parameter, and if this were so, we would surely be justified in asserting that a phrase with a particular head-complement order triggers a value of the governing category parameter. This is not learning. 21. We could, of course, have D' without D if the implication is not bilateral, but I will set this possibility aside here as it does not bear centrally on the discussion. 22. It is not clear that even this is necessary. Given a strict separation between Universal Grammar and the learning module, it is conceivable that the latter could have access to 'impossible' languages to facilitate its computations, e.g. those obtained by removing just the implicationally induced constraints from Universal Grammar.

23. It is noteworthy that, if this view is largely correct, then the traditional concerns of developmentalists in accounting for how stages develop out of their predecessors evaporate; there is no such development.

REFERENCES

Abney, S. 1987. The English Noun Phrase in its Sentential Aspect. Doctoral dissertation, MIT. Aldridge, M. 1988. The Acquisition of INFL. Research Monographs in Linguistics, UCNW, Bangor 1. (Reprinted by IULC). Atkinson, M. 1982. Explanations in the Study of Child Language Acquisition. Cambridge: Cambridge University Press. Atkinson, M. 1987. Mechanisms for language acquisition: learning, parameter-setting and triggering. First Language 7. 3-30. Baker, C. L. 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10. 533-81. Baker, M. 1988a. Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press. Baker, M. 1988b. Theta theory and the syntax of applicatives in Chichewa. Natural Language and Linguistic Theory 6. 353-89. Berwick, R. C. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Massachusetts: MIT Press. Borer, H. 1984. Parametric Syntax. Dordrecht: Foris. Borer, H. and K. Wexler. 1987. The maturation of syntax. In T. Roeper and E. Williams (eds.). Chomsky, A. N. 1965. Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press. Chomsky, A. N. 1975. Reflections on Language. New York: Pantheon. Chomsky, A. N. 1981. Lectures on Government and Binding. Dordrecht: Foris. Chomsky, A. N. 1986a. Barriers. Cambridge, Massachusetts: MIT Press. Chomsky, A. N. 1986b. Knowledge of Language. New York: Praeger. Chomsky, A. N. 1988a. Generative Grammar. Studies in English Linguistics and Literature. Kyoto University of Foreign Studies. Chomsky, A. N. 1988b. Some notes on economy of derivation and representation. MIT Working Papers in Linguistics 10. 43-74. Elliott, W. N. and K. Wexler. 1988. Principles and computations in the acquisition of grammatical categories. Ms. UC-Irvine. Fassi-Fehri, A. 1988. Generalised IP structure, Case and VS word order. MIT Working Papers in Linguistics 10. 75-112. Fodor, J. A. 1975. The Language of Thought. New York: Thomas Y. Crowell. Fodor, J. A. 1978. Computation and reduction. In C. W. Savage (ed.) Perception and Cognition: Issues in the Foundations of Psychology, Minnesota Studies in the Philosophy of Science 9. Minneapolis: University of Minnesota Press. Fodor, J. A. 1980. Contributions to M. Piattelli-Palmarini (ed.) Language and Learning: The Debate Between Jean Piaget and Noam Chomsky. London: Routledge & Kegan Paul. Fodor, J. A. 1981. The present status of the innateness controversy. In J. A. Fodor Representations. Hassocks: Harvester. Gleitman, L. R., E. Newport, and H. Gleitman. 1984. The current status of the Motherese hypothesis. Journal of Child Language 11. 43-79. Gold, E. M. 1967. Language identification in the limit. Information and Control 10. 447-74.

Huang, J. C.-T. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral dissertation MIT. Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel. Hyams, N. 1987. The setting of the null subject parameter: a reanalysis. Paper presented to Boston University Conference on Child Language Development. Jaeggli, O. and K. Safir. 1988. The null subject parameter and parametric theory. Version of a paper to appear in O. Jaeggli and K. Safir (eds.) The Null Subject Parameter. Dordrecht: Reidel. Jakobson, R. 1968. Child Language, Aphasia and Phonological Universals. The Hague: Mouton. Jakobson, R., G. Fant, and M. Halle. 1952. Preliminaries to Speech Analysis. Cambridge, Massachusetts: MIT Press. Lasnik, H. 1985. On certain substitutes for negative data. Ms. University of Connecticut. Lasnik, H. and M. Saito. 1984. On the nature of proper government. Linguistic Inquiry 15. 235-89. Lightfoot, D. 1989. The child's trigger experience: 'degree-0' learnability. Behavioral and Brain Sciences 12. 321-34. Manzini, R. and K. Wexler. 1987. Parameters, Binding Theory, and learnability. Linguistic Inquiry 18. 413-44. Morgan, J. L. 1986. From Simple Input to Complex Grammar. Cambridge, Massachusetts: MIT Press. Newport, E., L. R. Gleitman, and H. Gleitman. 1977. Mother, I'd rather do it myself: some effects and non-effects of maternal speech style. In C. Snow and C. A. Ferguson (eds.) Talking to Children. Cambridge: Cambridge University Press. Newson, M. 1988. Dependencies in the lexical setting of parameters: a solution to the undergeneralisation problem. Ms. University of Essex. Oehrle, R. 1985. Implicit negative evidence. Ms. University of Arizona. Osherson, D., M. Stob, and S. Weinstein. 1986. Systems that Learn. Cambridge, Massachusetts: MIT Press. Peters, S. 1972. The projection problem: how is a grammar to be selected?. In S. Peters, (ed.) Goals of Linguistic Theory. Englewood Cliffs, N. J.: Prentice Hall. Piattelli-Palmarini, M. 1989. Evolution, selection and cognition: from 'learning' to parameter setting in biology and in the study of language. Cognition 31. 1-44. Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts: Harvard University Press. Pollock, J. Y. 1987. Verb movement, UG and the structure of IP. Ms. Université de Haute Bretagne, Rennes II. Radford, A. 1988. Small children's small clauses. Transactions of the Philological Society 86. 1-43. Radford, A. 1990. Syntactic Theory and the Acquisition of Syntax. Oxford: Blackwell. Randall, J. 1985. Positive evidence from negative. In P. Fletcher and M. Garman (eds.) Child Language Seminar Papers. University of Reading. Roeper, T. and E. Williams (eds.). 1987. Parameter Setting. Dordrecht: Reidel. Safir, K. 1985. Syntactic Chains. Cambridge: Cambridge University Press. Safir, K. 1987. Comments on Wexler and Manzini. In T. Roeper and E. Williams (eds.). Saleemi, A. 1988. Learnability and parameter-fixation: the problem of learning in the ontogeny of grammar. Doctoral Dissertation, University of Essex. Solan, L. 1987. Parameter setting and the development of pronouns and reflexives. In T. Roeper and E. Williams (eds.). Sportiche, D. 1988. A theory of floating quantifiers and its corollaries for constituent structure. Linguistic Inquiry 19. 425-50. Weinberg, A. 1987. Comments on Borer and Wexler. In T. Roeper and E. Williams (eds.).


Wexler, K. 1982. A principle theory for language acquisition. In E. Wanner and L. R. Gleitman (eds.) Language Acquisition: The State of the Art. Cambridge: Cambridge University Press.
Wexler, K. 1988. Aspects of the acquisition of control. Paper presented to Boston University Conference on Language Development.
Wexler, K. and Y. C. Chien. 1985. The development of lexical anaphors and pronouns. Papers and Reports on Child Language Development 24. 138-49.
Wexler, K. and P. W. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge, Massachusetts: MIT Press.
Wexler, K. and R. Manzini. 1987. Parameters and learnability in Binding Theory. In T. Roeper and E. Williams (eds.).

Observational data and the UG theory of language acquisition
Vivian Cook
University of Essex

The theory of Universal Grammar (UG), as proposed by Chomsky (e.g. Chomsky, 1986a, 1988), sees the acquisition of the first language as a process of setting the values of parameters in the system of language principles with which the human mind is endowed, in response to "triggering" evidence, these principles and parameters being couched in terms of the Government/Binding (GB) theory of syntax (Chomsky, 1981a, 1986a, 1986b). The aim of this paper is to examine how observations of actual children's language can be related to this Chomskyan model of UG. It does not therefore concern the use of such observational data within approaches that do not share the UG assumptions.

1. EVIDENCE IN THE UG MODEL

The claims that UG theory makes for language acquisition are largely based on the "poverty of the stimulus" argument; given that the adult knows X, and given that X is not acquirable from the normal language input the child hears, then X must have been already present in the child's mind. This crucial argument uses the comparison of the knowledge of language that the adult possesses with the initial state of the child to establish what could not have been acquired from the types of evidence available and must therefore be innate. Chomskyan UG theory would not be discomfited if other evidence from acquisition were not forthcoming. On the one hand, such research is not of prime importance, given the reliance on the poverty of the stimulus argument. On the other, evidence from language development in the child is related with difficulty to acquisition because of the other factors involved in language performance and development - production and comprehension processes, situation and use, the growth in other mental faculties, and so on - all of which are "non-stationary" (Morgan, 1986) and liable to change as the child grows older. The attraction of the current model is that the aspects of language built in to the mind are precisely the principles of GB theory - the Projection Principle, the Binding Principles, and so on; the aspects that have to be


learnt are the settings for parameters of variation, and the properties of lexical items. Hence built-in principles of syntax can now be postulated in a rigorous form that has testable consequences; it is possible to start looking for evidence of the effects of UG in children's language development. And also the reverse; it is possible to start phrasing research into acquisition in ways that can affect issues of linguistic theory, the prime example being the work of Hyams (1986) and Radford (1986).

2. I-LANGUAGE AND E-LANGUAGE THEORIES

A starting point is the distinction made in Chomsky (1986a) between I-language (internalised language) and E-language (externalised language). I-language is "a system represented in the mind/brain of an individual speaker" (Chomsky, 1986a, p.36); the task of I-language theories is to describe this mental possession; hence I-language syntax is closely related to the mind and to psychology. E-language is a collection of sentences "understood independently of the properties of the mind" (Chomsky, 1986a, p.20); it is the speech people have actually produced, the ways they use language to interact, and the description of the statistical properties of language events; E-language syntax is related to social interaction and to sociology. In I-language theories, acquisition research is seen as explaining how the child acquires knowledge of language; in E-language theories, it is seen partly as describing the regularities in a corpus of the child's sentences, partly as describing how the child develops social interaction. UG theory is concerned with I-language knowledge rather than with the E-language regularities in a set of utterances. GB does not necessarily confine itself to a single form of data: "In principle evidence ... could come from many different sources ... perceptual experiments, the study of acquisition and deficit, or of partially invented languages such as Creoles, or of literary usage or language change, neurology, biochemistry, and so on" (Chomsky, 1986a, pp.36-37); its chief evidence is whether a sentence conforms to the speaker's knowledge of language. In practice GB theory has concentrated almost exclusively on what can be called "single-sentence" evidence; a single sentence, such as Is the man who is here tall? (Chomsky, 1980, p.39) or ¿Está el hombre, que está contento, en la casa? 'Is the man, who is happy, at home?' (Chomsky, 1988, p.42), is sufficient to attest to the native speaker's language knowledge, and hence to invoke the poverty of the stimulus argument. An I-language theory takes Is the man who is here tall? to be a grammatical sentence on the linguist's own say-so without hunting for observations of native speakers using it in actual speech. But single-sentence evidence is difficult to use in acquisition research since native children cannot be


asked directly whether they accept Is the man who is here tall? as their answer would not be meaningful. Children are by and large not capable of attesting unambiguously that a particular sentence is or isn't generated by their grammar. Other than single-sentence evidence or the pure poverty of the stimulus argument, what else can count as evidence of language acquisition in a UG framework? One possibility is to use experimental techniques and statistical procedures from the psycholinguistic tradition. The research of the past decade has employed a wealth of techniques ranging from the elicited imitation tasks used by Lust and her associates (1989) to the comprehension tasks employed by Matthei (1981) and others; indeed the specific case of structure dependency seen in Is the man who is here tall? was investigated by Crain and Nakayama (1983) through a question production task. The validity of such forms of evidence is not the concern here; an account of some of their merits and demerits is seen in Bennett-Kastor (1988). Instead the discussion will be restricted to one type of evidence that has been used within UG theory, namely the use of sentences observed in actual children's speech, which can be called "observational data". If a child is heard to say Slug coming, what status does this sentence have as evidence for UG theory? The main argument here is that there is an inherent paradox in using observational data to support a UG model that needs to be aired, even if it cannot be resolved. Observational data belongs in essence to E-language; the typical E-language study of acquisition looks at statistically prominent features found in a substantial collection of children's speech, say Brown (1973) or Wells (1985). The major problem is how to argue from E-language descriptive data of children's actual speech to their I-language knowledge, a problem first perhaps highlighted in Chomsky (1965) as "a general tendency... to assume that the determination of competence can be derived from description of a corpus by some sort of sufficiently developed data-processing technique". While it is interesting and instructive to use observational data to investigate the UG claims, the chain of qualifications and inferences between such data and language knowledge is long and tortuous.

3. OBSERVATIONAL DATA, PERFORMANCE AND DEVELOPMENT

There are two related dimensions to this within the UG theory - performance and development. Any use of performance data by linguists faces the problem of distinguishing grammatical competence from the effects of production and comprehension processes, short term memory, or other non-competence areas of the mind involved in actual speech production. Single-sentence evidence is immune to all of these factors. In this sense


children's speech presents exactly the same problems for the I-language analyst as the speech of adults. GB oriented linguists base their syntactic analyses on single example sentences rather than on chunks of performance; they too have problems with deriving the knowledge of the native speaker from samples of raw performance. But children's language also ties in with their development on other fronts; the actual sentences they produce reflect their developing channel capacity, that is to say a mixture of cognitive, social, and physical development, from which the effects of language acquisition need to be filtered out. The distortions that performance processes cause in actual speech are doubly difficult to compensate for in language acquisition research because they may be systematically, or nonsystematically, different from those of adults - short term memory may be smaller in capacity or organised in a different way, cognitive schemas may be different, and so on; insofar as these are involved in language performance they affect children differently from adults. "Much of the investigation of early language development is concerned with matters that may not properly belong to the language faculty ... but to other faculties of the mind that interact in an intimate fashion with the language faculty in language use" (Chomsky, 1981b). Cook (1988) distinguishes "acquisition" - the logical problem of how the mind goes from S0 (zero state) to Ss (steady state) - from "development" - the history of the intervening stages, S1, S2, and so on. To argue from observation of children's development to the theory of acquisition means carefully balancing all these possibilities. Linguists are frequently struck by the child's presumed difficulties in dealing with primary linguistic data; their own difficulties in deriving a representation of grammatical knowledge from samples of children's performance are not dissimilar, or indeed worse since children's sentences are more deficient than the fully grammatical sentences spoken by caretakers (Newport, 1976). So the child saying Slug coming may be suffering from particular production difficulties shared by adults or from specific deficits in areas that have not yet developed, say the articulatory loop in working memory (Baddeley, 1986). The apparent syntax of the sentence may be different from the child's competence for all sorts of reasons. Observational data thus raise two problems related to performance; one is the distortion resulting from the systematic or accidental features of psychological processes; the other is the compounding effect of the development of the child's other faculties. For observational data to be used in a UG context, eventually these distortions need to be accommodated within a developmental framework that includes adequate accounts of the other faculties involved in the child's language performance, which, needless to say, does not yet exist. Furthermore, observational data of children's speech are still only evidence for production rather than comprehension,


the two processes being arguably distinct in young children (Cairns, 1984).

4. REPRESENTATIVENESS OF OBSERVATIONAL DATA

Let us now turn to some methodological issues with observational data. Taking the step from single-sentence evidence to E-language performance evidence incurs several obligations. The first is to take a reasonable sample of data. Any linguistics book or article in the I-language tradition assumes that it is meaningful to discuss people's knowledge of John is easy to please or That he won the prize amazed even John or John caused the book to fall to the floor; it is beside the point whether such sentences have ever been uttered, provided that they reflect the knowledge of a native speaker. An actual child's sentence, however, is not concocted by a linguist to illustrate a particular syntactic point and so removed from processing constraints, discourse connections, and so on; nor can it be checked against the speaker's intuitions about his grammatical knowledge. Because of the deficient nature of children's language, an example or two can probably be found of almost any syntactic possibility one cares to name, say Verb Subject order as in Comes a mummy reported in Cook (in progress) or apparent structure-dependency violations as in What does sheep make a noise? (Cook, 1988). Such isolated examples are not sufficient for E-language analysis, nor are arbitrary lists or collections of sentences; data need to be typical of the child and of children in general if they are to be of value, because of the many other causes that may be involved. As E-language data, a single sentence such as Comes a mummy might be a performance slip or a garbled attempt at a nursery rhyme or a genuine reflection of competence. A sample of sentences is required to rule out the accidental or non-accidental but non-linguistic effects of performance. Again for E-language data a reasonable sample of children needs to be included; it prejudges the universality issue if arguments are based on observational data from a few children or from one child. So transcripts of actual children's speech must include a reasonable sample of children and a reasonable sample of speech for each stage of each child. They need to be based on a consistent analysis and on the same sample of children, to eliminate possible regional, social, and individual differences, as discussed in Bennett-Kastor (1988); a good example of such research is seen in Stromswold's study of some 16,000 examples of sentences with wh-words from 12 children (Stromswold, 1988). Information on the relevant makeup of the children should be explicitly stated, once the initial step into observational data has been taken. All of these requirements are beside the point for single-sentence evidence and for the poverty of the stimulus argument. They are however perfectly plausible demands on E-language


evidence. A requirement for combining the I-language approach to knowledge with the use of E-language data is the quantification of aspects of children's sentences; frequency of occurrence needs to be counted, calculations need to be made, and statistical reliability becomes an issue. As Bloom et al (1976, p.34) put it, "if structural features occur often enough and are shared by a large enough number of different multi-word utterances, then it is possible to ascribe the recurrence of such regular features to the productivity of an underlying rule system ...". The key words when handling E-language data are "often enough" and "large enough". A single sentence could be highly revealing of the child's grammar; but, because of the possibility that any single spoken sentence is an isolated freak due to memory problems, rote repetition, discourse constraints, or any of a host of other factors, the UG analyst using E-language data has a duty to put them in the context of a broader picture of children's speech. This is not to say that frequency of occurrence is crucial; it is clearly irrelevant to single-sentence I-language evidence. But E-language evidence needs to be safeguarded by showing that the data reflect some general property of the child's knowledge rather than being one-off instances.
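The bookkeeping behind "often enough" and "large enough" can be made concrete in a small script. The sketch below is not from the paper: the children, counts and thresholds are all invented for illustration; it merely shows the kind of tabulation across utterances and across children that the passage above calls for (expressed here in Python).

# A toy illustration (not from the paper): invented counts and arbitrary thresholds.
# It tabulates how often a structural feature recurs across each child's multi-word
# utterances and across children, as a minimal safeguard against one-off examples.

# per child: (utterances showing the feature, multi-word utterances sampled)
counts = {"child_A": (14, 120), "child_B": (9, 95), "child_C": (2, 110)}

def recurrence(counts, min_rate=0.05, min_children=2):
    rates = {child: k / n for child, (k, n) in counts.items()}
    widespread = [child for child, rate in rates.items() if rate >= min_rate]
    return rates, widespread, len(widespread) >= min_children

rates, widespread, looks_productive = recurrence(counts)
print(rates, widespread, looks_productive)
# here child_A and child_B pass the rate threshold, so the feature would count
# as recurrent rather than a one-off for this (invented) sample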

5. OBSERVATIONAL DATA AND ADULT PERFORMANCE

A major point also concerns what data from children should be compared with - adult competence or adult performance? The significant paper by Radford (1986), for instance directly compares two sets of sentences, one consisting of actual child performance such as That one go round, the other of bracketed adult versions such as Let [that one go round], as if they were the same type of data (pp. 10-11). It is difficult to offer children's E-language data as evidence for their knowledge of language without comparing them with E-language data from adults. A comparison of observational data from children with single-sentence evidence from adult competence begs many questions. Once it is conceded that adult performance needs to be used, a range of phenomena must be taken into account that GB syntax has mostly excluded. Let us take the pro-drop parameter as an illustration. The main criterion for a pro-drop language is the absence of certain subjects in declarative sentences. In her important work with pro-drop Hyams (1986) found that children from three different language backgrounds have null-subject sentences; she regarded this as confirmation of an initial pro-drop setting, later rephrased as [+uniform] morphology (Hyams, 1987). However, adult E-language data for English reveal that subjects are often omitted in actual speech and writing, usually at the beginning of the sentence. Taking a random selection of sources, Can't buy me love and Flew in from Miami Beach come from well-known song


lyrics; the opening pages of the novel The Onion Eaters (Donleavy, 1971) contain Wasn't a second before you came in, Must be ninety now, and Hasn't been known to speak to a soul since anyone can remember; a column writer in The Weekend Guardian with the pseudonym "Dulcie Domum" typically uses sentences such as Drive to health food shop for takeaway, In fact might be too exciting, and Replies that it's in my desk drawer (Domum, 1989) - indeed in this article some 34 out of 68 sentences have at least one null-subject; an anecdote in Preston (1989) concerns a prescriptively oriented teacher denying that she uses gonna: "Ridiculous," she said; "Never did; never will". Adult speakers of English appear to use null-subject sentences, even if they only utilise them in certain registers and situations. So, if the performance dimension of variation between styles of language is taken into account, null-subjects may be expected to appear in children's performance and children may also be expected to have encountered them in some forms of adult speech. But also, given the many ways in which children are different from adults, an argument based on observational data has to explore the alternative developmental explanations that might cause something to be lacking from their speech. One explanation might indeed be a more frequent use by children of some performance process whereby the initial element or elements in the sentence can be omitted - a clipping of the start of the sentence - which creates the illusion of pro-drop among other effects. A counterargument is that the null-subject is not always initial and hence not a product of utterance-initial clipping; however in a sample of children's language discussed in Cook (in progress) only 3 out of 59 null-subject examples had a non-initial null subject. Another explanation might be the "recency" effect whereby children pay attention chiefly to the ends of sentences (Cook, 1973), thus being more likely in SVO languages to omit subjects than objects and hence giving the illusion of null subject sentences. Hyams (1986) presents the counterargument that, at the same time as children produce subjectless sentences, they also produce ones with subjects, so that the lack of overt subjects is not a memory limitation; while this may well be true, the use of null-subject sentences by English-speaking adults is equally not a product of memory limitations. A further explanation might be found in the type of subject that is missing. Children may leave out some first person subjects because they feel they are not needed, and this might be a cognitive universal; Sinclair and Bronckart (1971) suggested that at a certain period children see themselves as the implicit subject of the sentence; Halliday (1985) sees first person subjects as the most prototypical form. According to Hyams (1986, p.69), however, "the referent of the null-subject is not restricted to the child himself". Yet in the same sample of sentences some 39 out of 59 null subjects were apparently first person; a high proportion, though not all, of children's null-subject


sentences could be attributed to this source. To decide between these conflicting explanations would entail crosslinguistic evidence based on large samples of non-first person null-subject sentences. The general point is that sheer observation of forms in children's language is susceptible to several explanations other than language knowledge per se. Other evidence is required to decide between the explanation based on adult E-language data, the explanations based on performance and cognitive development, and the linguistic UG explanation. The correctness of the pro-drop explanation cannot be uniquely shown from E-language observational data without in some way examining the other explanations within a larger framework of children's development and of language use by adults. If E-language performance by children is compared directly with adult E-language data rather than with adult I-language knowledge, an apparent peculiarity of the child's language may be shown to be a fact of performance shared by adults but not taken into account in the standard descriptions of grammatical competence within GB theory. Adult users of a language tend to be regarded as grammatically perfect without seeing that they are subject to the same factors of performance as children: some adults, on some occasions, do produce null-subject sentences in English. In one sense this claims simply that alternative explanations can be found for apparently linguistic phenomena; since the alternative proposals are not precisely formulated, what is wrong with accepting the linguist's account? But in the early stages of child speech this must be faute de mieux; until there is an overall theoretical framework that encompasses the diverse aspects of children's development, it will not be possible to extract language acquisition from language development by means of observational data alone.

6. EVIDENCE OF ABSENCE

The other problem is what counts as observational data. Sherlock Holmes once drew attention to the behaviour of the household dog during a burglary; Watson pointed out that there wasn't any behaviour since the dog had not even barked; Holmes showed the very absence of barking implicated one of the people in the household since otherwise the dog would have barked at a stranger. Absence of something that might be expected to occur may in itself be relevant evidence. Both Hyams (1986, 1987) and Radford (1986,1988) make extensive use of evidence of children's non-performance. The major absence discussed by Hyams (1986) is the subject of the sentence but she also talks about the absence of auxiliaries and of inflections. Similarly there are twelve summary statements in Radford (1986) to show that children's grammars lack Inflection and Complement


Phrases; eight claim that children "lack" or "may lack" Complementisers, infinitival to, Modal Auxiliaries, Tense, Agreement, nominative subjects, overt subjects, and VP; two statements suggest children "have n o " inverted Auxiliaries and preposed wh-phrases; one states that "Child independent clauses may be nonfinite" (i.e. they may lack a finite verb); only one is phrased positively - "Child clauses have particle negation" - which partly implies that their sentences lack auxiliaries. All the arguments except one come down to the absence of elements from children's sentences. Doubtless there are problems of phrasing here; some facts could be equally expressed positively or negatively. But, whatever the phrasing, the point applies to the type of evidence that is used - evidence of positive occurrence or evidence of absence. Arguing from absence is intrinsically problematic. If the possibilities of occurrence are circumscribed so that a precise prediction can be made as to what will be missing, then it seems perfectly acceptable. Providing a list of the objects that are on my desk at the moment is easy; compiling a list of what is not on my desk is difficult and eventually means listing all the absent objects in the universe, unless there are precise expectations of what should be on desks. Evidence of absence is hard to interpret when alternative explanations exist. After all the dog might have been drugged, or chasing a rat, or just fast asleep. When the main evidence for acquisition is what children can't do or the evidence for their knowledge of language is what they don't produce, interpretation becomes difficult if the predictions are not tightly constrained. In a sense the object of Universal Grammar is to specify the constraints on what might be expected in human language, to say what cannot occur. The difficulty in applying evidence of absence to observational data is that at the early stages of children's language many things are absent and the child is deficient in many areas: many explanations in cognitive and performance terms could be therefore advanced for things that are absent. Features that are observed to occur can be used directly as evidence; features that are not found are ambiguous if there are many of them and if they are susceptible to explanations other than in terms of grammatical competence. The child has sometimes been referred to as a "mini-linguist"; let us reverse the metaphor for a moment by talking about the linguist as a "maxi-child". The problem that the child faces is deriving a grammar only from positive evidence; negative evidence in the shape of sentences that don't occur plays a less prominent role. It has been argued, for example by Saleemi (1988), that it should only figure as ancillary support for ideas already vouched for by positive evidence. The linguist attempts to derive the grammar of children from observational data that constitutes evidence for what occurs. The strongest requirement is that the maxi-child should keep to positive evidence alone; in this case what doesn't occur in children's


speech should only support ideas already vouched for by observational data or by other types of evidence, such as comprehension tests. This is not to deny that absence of a precisely defined property is a valid form of observational data; the dog after all didn't bark. Evidence of absence is however a tainted source in the early stages of children's acquisition since virtually anything in adult grammar could be argued to be missing; such indirect evidence needs support from other evidence of one kind or another. One counter-argument is that evidence of absence maintains the standard generative use of what people don't say as a form of evidence; the use of starred sentences is taken for granted as a normal and legitimate form of argumentation and is shorthand for saying these sentences are not found. The constraints of UG on the mind such as structure-dependency are typically shown by demonstrating that people who have never heard an example of a structure-independent sentence nevertheless reject it on sight. But this still constitutes single-sentence I-language evidence: is this sentence generated by the grammar of the speaker? It can be backed up with experimentally gained grammaticality judgements or comprehension tasks for many sentences or many people, though this raises the difficulties of experimental method, as seen in Carroll et al (1981). Shifting to observational data, there are indeed particular sentences that people do not use; one might point to the total lack of structure-independent sentences in some vast corpus. But is this the same as observing that there are aspects of the sentence that children leave out? The prediction is of a different kind; evidence of absence cannot be justified simply by the single-sentence appeal to what people do not say.

7. CORRELATIONS WITHIN OBSERVATIONAL DATA

Finally, as has already been seen, analyses depending on observational data frequently employ intuitive concepts of simultaneity and correlation; because X occurs at the same time as Y it must be the same phenomenon or indeed it must cause Y. Hyams (1987) for instance argues that "the child who allows null-subjects must also be analysing his language as morphologically uniform" based on evidence such as "the acquisition of the present and past tense morphemes coincides with the end of the null-subject stage in English". One problem is the matter of defining simultaneity and, conversely, sequence: what counts as the same time? what counts as a sequence? what does "coincide" actually mean? the same day, the same month, the same stage, or what? Statements that forms coexist or that they occur in a sequence mean little without an explicit framework for measuring time and sequence: developmental research needs a clock.


Furthermore, if data from more than one child are being used, it is necessary to define coexistence in terms of chronological age, mental age, MLU, LARSP, grammatical stages or whatever developmental schedule one prefers: developmental clocks need to be set to the same standard time if comparisons are to be made. Secondly, there is the question of how different aspects of behaviour correlate within a stage. Ingram (1989) finds four meanings for "stage" when considered in terms of a single behaviour and four more when considered in terms of two behaviours;in his terms, much of the UG related research goes beyond the simple "succession" stage to the "co-occurrence" stage where two behaviours occur during the same timespan or the "principle" stage in which a single principle accounts for diverse forms of behaviour. In terms of observational data, given that many forms occur or don't occur at the same stage, how can it be shown which correlate with each other and which don't? All the forms present at the same stage correlate in the sense that they coexist and are part of the same grammar; what are the grounds for believing some are more closely related than others? There may be entirely independent reasons why two things happen at the same time; a paradigm statistical example showing correlation is not causation is the clocks in town all striking twelve simultaneously. Correlation is more problematic when it relies on absence rather than presence of forms. The Hyams (1986) analysis depends on a link between null-subject sentences and lack of expletive subjects; the Hyams (1987) analysis depends on a link between null-subject sentences and lack of inflections. UG can predict grouping of precise features missing from children's language and their absence can be correlated closely; but, if their absence coincides with large numbers of other absent features, the validity of such a correlation becomes hard to test; why should any two pairs of missing bits of the sentence be more related than any two other missing bits? Early children's language is difficult for observational data because it is so deficient: arguing from absence provides too unconfined a set of possibilities for correlation.

8. GENERAL REQUIREMENTS FOR OBSERVATIONAL DATA IN UG RESEARCH

Let us sum up some of the requirements for observational data in a series of statements:

- development is not acquisition. Any data from observation of children must be related to the other processes and faculties involved in speech production, i.e. to performance. Sentences from actual children's speech do not have a unique explanation in terms of linguistic competence, as


single-sentence evidence may have; alternative explanations from nonlinguistic sources have always to be taken into account. At the moment, because of the unknown aspects of the developmental process, the linguistic explanations have to be considered defaults to be kept or modified when a broader framework is available.

- E-language data must be representative. Because of the distortions of actual performance, isolated sentences or small numbers of sentences cannot be trusted. Statements can never be based on isolated sentences, simply because it is uncertain how accidental these may be. Ways of balancing E-language corpora must be employed to ensure representativeness across children in general and across the speech of one child in particular; tests of statistical significance can be employed. Without this the data may illustrate a point but cannot be trusted for firmer conclusions.

- like must be compared with like. The sentences produced by children cannot be compared directly to the I-language single sentences used in linguistic theories but should be compared with the equivalent adult productions. That is to say, performance sentences should not be compared with idealised example sentences, again a familiar point rephrasing Chomsky (1965, 36), "... one can find out about competence only by studying performance, but this study must be carried out in clever and devious ways, if any serious result is to be obtained". Children's performance should be compared with adults' performance rather than with adults' competence as shown in single-sentence evidence.

- evidence of presence is preferable to evidence from absence. In an E-language account the first responsibility is to what actually occurs. Evidence from absence is intrinsically open to many interpretations; it loses some of its strength in early children's speech because of the multiplicity of plausible nonlinguistic reasons for children's deficiencies.

- correlations should chiefly correlate positive data. Correlations between non-occurring data should be treated with caution in early children's observational data, again because of the general deficiencies of the child's speech.

The main conclusion to this paper is that the use of observational data within the UG theory of language acquisition must always be qualified; such data should be treated as showing the interaction of complex performance processes that are themselves developing. The work with observational data by Hyams, Radford, and others has provided a tremendous revitalisation of the UG theory in recent years; greater discussion


should take place on the methodological status of observational data within a basically I-language theory. Observational data are one of the many sources of evidence that are available outside the bounds of the poverty of the stimulus argument. This paper has concentrated on the pitfalls and possibilities of this form of data. Other techniques of investigation have their own pros and cons - grammaticality judgements, elicited imitation, act-out comprehension techniques, and so on - some of which are described in Bennett-Kastor (1988); many of them could complement observational data; for example the pro-drop explanation might be preferred over its alternative because of evidence from experiments with comprehension or elicited imitation of VS or null subject sentences. We should not be disappointed if one source of evidence is not in itself sufficient. As Fodor (1981) pointed out, a scientific theory should not confine itself to a certain set of facts but "any facts about the use of language, and about how it is learnt... could in principle be relevant to the choice between competing theories". Together these alternative forms of evidence complement the basic appeal to the poverty of the stimulus used by the UG theory; in the Feyerabend view of science multiple approaches to the same area should be explored simultaneously (Feyerabend, 1975). Observational data is one useful tool in our kit, provided it is used with appropriate caution and supplemented with other tools when necessary.

REFERENCES

Baddeley, A. D. 1986. Working Memory. Oxford: Clarendon Press.
Bennett-Kastor, T. 1988. Analyzing Children's Language. Oxford: Blackwell.
Bloom, L., P. Lightbown and L. Hood. 1975. Structure and Variation in Child Language. Monographs of the Society for Research in Child Development Serial No 160, Vol 40, No 2.
Brown, R. 1973. A First Language: The Early Stages. London: Allen and Unwin.
Cairns, H. S. 1984. Current issues in research in language comprehension. In R. Naremore (ed.) Recent Advances in Language Sciences. College Hill Press.
Carroll, J. M., T. G. Bever and C. R. Pollack. 1981. The non-uniqueness of linguistic intuitions. Language 57. 368-383.
Chomsky, N. 1965. Formal discussion: the development of grammar in child language. In U. Bellugi and R. Brown (eds.) The Acquisition of Language. Purdue University, Indiana.
Chomsky, N. 1980. On cognitive structures and their development. In M. Piattelli-Palmarini (ed.) Language and Learning: the Debate between Jean Piaget and Noam Chomsky. London: Routledge & Kegan Paul.
Chomsky, N. 1981a. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1981b. Principles and parameters in syntactic theory. In N. Hornstein and D. Lightfoot (eds.) Explanations in Linguistics. London: Longman.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.


Chomsky, N. 1988. Language and Problems of Knowledge: The Managua Lectures. Cambridge, Massachusetts: MIT Press.
Cook, V. J. 1973. The comparison of language development in native children and foreign adults. IRAL XI/1. 13-28.
Cook, V. J. 1988. Chomsky's Universal Grammar: An Introduction. Oxford: Blackwell.
Cook, V. J., in progress. Universal Grammar and the child's acquisition of word order in phrases.
Crain, S. C. and I. Nakayama. 1983. Structure dependence in grammar formation. Language 63. 522-543.
Domum, D. 1989. Plumbing the bidet depths. The Weekend Guardian. 11th April, p.11.
Donleavy, J. P. 1971. The Onion Eaters. London: Eyre and Spottiswoode.
Dulay, H. C. and M. K. Burt. 1973. Should we teach children syntax? Language Learning 23. 245-258.
Feyerabend, P. 1975. Against Method. London: Verso.
Fodor, J. A. 1981. Some notes on what linguistics is about. In N. Block (ed.) Readings in the Philosophy of Psychology. 197-207.
Halliday, M. A. K. 1985. An Introduction to Functional Grammar. London: Edward Arnold.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. 1987. The setting of the null subject parameter: a reanalysis. Paper presented to the Boston University Conference on Child Language Development.
Ingram, D. 1989. First Language Acquisition. Cambridge: Cambridge University Press.
Lust, B., J. Eisele and N. Goss (in prep.). 'The development of pronouns and null arguments in child language', Cornell University.
Matthei, E. 1981. Children's interpretation of sentences containing reciprocals. In Tavakolian (ed.), 58-101.
Morgan, J. L. 1986. From Simple Input to Complex Grammar. Cambridge, Massachusetts: MIT Press.
Newport, E. L. 1976. Motherese: the speech of mothers to young children. In N. Castellan, D. Pisoni, and G. Potts (eds.) Cognitive Theory, vol 2. Hillsdale: Erlbaum.
Preston, D. R. 1989. Sociolinguistics and Second Language Acquisition. Oxford: Blackwell.
Radford, A. 1986. Small children's small clauses. Bangor Research Papers in Linguistics 1. 1-38.
Radford, A. 1988. Small children's small clauses. Transactions of the Philological Society 86. 1-43.
Saleemi, A. 1988. Learnability and Parameter Fixation. Doctoral Dissertation, University of Essex.
Sinclair, H. and J. Bronckart. 1971. SVO: a linguistic universal? Journal of Experimental Child Psychology 14. 329-348.
Stromswold, K. 1988. Linguistic representations of children's wh-questions. Papers and Reports in Child Language 27.
Wells, C. G. 1985. Language Development in the Preschool Years. Cambridge: Cambridge University Press.

Parameters of Metrical Theory and Learnability*
Michael Hammond
University of Arizona

Let us take Universal Grammar (UG) to denote the innate predisposition a speaker has to deduce grammars within a certain range on presentation of appropriate data. The criterion of learnability is the requirement that there is an algorithm by which a learner can deduce any grammar in the set licensed by UG on presentation of some finite and positive set of data (Wexler & Culicover, 1980). It is important to investigate this conception of UG because it is, in principle, possible to construct theories of UG that do not satisfy the criterion of learnability. Such theories of UG are unacceptable, as the goal of the investigation of UG is a theory that can explain how it is that people acquire adult grammars (i.e. learn languages). The criterion of learnability has not often been applied to the domain of phonology. 1 This is unfortunate as, in at least some domains of phonology, the structure of UG is much clearer than in other areas of grammar. The criterion of learnability can be applied successfully only when we are relatively confident about the nature of the theory of UG. Metrical theory is a domain of linguistic theory where the structure of UG is relatively clear. While there is still considerable debate over a number of issues in the theory, its general character is abundantly clear from the consensus that can be gleaned from recent alternative formulations, e.g. Halle (1989), Halle and Vergnaud (1987), Hammond (1984/1988, 1986, 1990b), Hayes (1981, 1987, 1989), Levin (1990) et cetera. In this paper, the criterion of learnability will be applied to metrical theory. Several surprising and valuable results follow from this exercise. First, it is shown that metrical theory does NOT satisfy the criterion of learnability per se. In other words, the grammars licensed by UG are not all reachable by learners. Second, it is argued that a hitherto unexplained fact about the parameters of metrical constituent construction finds explanation only when the criterion of learnability is invoked. Third, it is suggested that the metrical components of natural languages are a function of the interaction of UG and a constraint on short-term memory. The organization of this paper is as follows. First, the structure of metrical theory is briefly reviewed. Second, the hypothesis is presented that metrical systems are all learnable on the basis of words of seven syllables or less.


Third, an argument for the seven-syllable hypothesis is presented from the asymmetric number of options employed in metrical constituency at different levels of the metrical hierarchy. Finally, an explanation for this asymmetry is presented and the consequences are discussed.

1. METRICAL THEORY

As noted above, there are a number of versions of metrical theory being debated in the literature. However, all current versions share a number of properties. First, all versions of metrical theory, past and present, share the claim that stress should be represented hierarchically. In other words, stress patterns should not be represented in terms of a linear feature [stress]; rather, stress is encoded by structural relationships in a treelike representation. In (1), the stress pattern of Apalachicola is indicated using the linear stress feature of Chomsky and Halle (1968) and in terms of a metrical representation.2 The linear feature encodes stress in terms of the numerical values associated with the different vowels. A [1stress] indicates primary stress; [2stress] indicates secondary stress; and [0stress] indicates stresslessness. The metrical representation encodes degree of stress in terms of the height of the different columns dominating the vowels. The metrical representation also indicates grouping relationships with parentheses. The metrical representation claims that stressed syllables are grouped with neighbouring stressless syllables in particular ways. This claim is not made by the linear representation and is amply supported by the metrical literature.3 (1)

                        x             level 2   word tree
       (x       x       x)            level 1   feet         (metrical)
       (x  x)  (x  x)  (x  x)         level 0   syllables
        A  pa   la chi  co  la
        2  0    2  0    1  0          [stress]               (linear)
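The column-height idea can be made concrete in a few lines of code. The sketch below is my own illustration, not part of the chapter: the foot groupings and the choice of main stress on the rightmost foot are read off (1), and the mapping from column heights to the linear feature values is an assumption spelled out in the comments (Python is used purely for exposition).

# A minimal sketch (not from the original text): the bracketed-grid analysis of (1)
# and the derivation of the linear [stress] values from column heights.
syllables = ["A", "pa", "la", "chi", "co", "la"]   # Apalachicola
feet = [(0, 1), (2, 3), (4, 5)]                    # left-headed binary feet (level 0 grouping)
main_foot = 2                                      # rightmost foot projects the word-tree head

def column_heights(n, feet, main_foot):
    heights = [1] * n                     # level 0: every syllable has one mark
    for left, _right in feet:
        heights[left] += 1                # level 1: one mark over each foot head
    heights[feet[main_foot][0]] += 1      # level 2: one mark over the word-tree head
    return heights

def linear_stress(heights):
    # assumption: tallest column = [1stress], other foot heads = [2stress], rest = [0stress]
    top = max(heights)
    return [1 if h == top else 2 if h > 1 else 0 for h in heights]

print(linear_stress(column_heights(len(syllables), feet, main_foot)))
# -> [2, 0, 2, 0, 1, 0], i.e. the 2 0 2 0 1 0 pattern in (1)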

All current versions of metrical theory include analogues to each of the components in (2): (2)

a. constituents,
b. directionality,
c. iterativity,
d. extrametricality,
e. destressing,
f. scansions/levels.


All theories include a set of constituents (2a). The constituents in (1) are binary and left-headed. (The stress occurs on the left side of the disyllabic unit.) However, there are other kinds of constituents as well. For example, there are binary right-headed constituents as well (e.g. in Aklan per Hayes 1981). Theories differ in how many constituent types they allow and in the precise properties of those constituents. All theories include a parameter of directionality (2b). Are constituents assigned from left to right or from right to left? The directionality parameter has an effect in polysyllabic words with an odd number of syllables. All theories include some mechanism to deal with superficial iterativity (2c). Are constituents constructed iteratively, filling the span with stresses, or is only a single constituent built, placing a stress at or near one of the peripheries of the domain?4 All theories include some analogue to the mechanism of extrametricality (2d). This allows a peripheral syllable or higher-level constituent to be excluded from metrification. Metrical theory also includes some subsequent operations that will be included under the rubric of "destressing" here (2e). Such rules manipulate the metrical structure assigned by the parameters discussed above. In this paper, some of the results concerning destressing rules of Hammond (1984/ 1988) will be assumed. First, destressing rules may only remove stresses. Second, stresses may only be removed to resolve stress clashes. Last, the main stress of a domain may not be removed. Finally, all versions of metrical theory include levels and scansions (2f). These are discussed in section 4 below. There are other aspects of metrical theory which are not discussed here, e.g. cyclicity, exceptions, and the relationship between segmental rules and metrical structure. Space limitations preclude an adequate treatment of these. It is expected that including them would not alter the results arrived at here.
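Read as a bundle of choices, the components in (2) amount to a small parameter vector for a stress system. The following sketch is my own illustration, not the chapter's: a minimal record type for such a vector, with field names and value inventories that are assumptions made only for concreteness.

# A sketch (not from the chapter) of the parameter choices in (2) as a simple record.
from dataclasses import dataclass

@dataclass
class MetricalSystem:
    foot_type: str        # (2a) e.g. "trochee" (left-headed binary) or "iamb"
    direction: str        # (2b) "L-to-R" or "R-to-L"
    iterative: bool       # (2c) fill the span with feet, or build only one
    extrametrical: bool   # (2d) exclude a peripheral syllable from footing
    destressing: bool     # (2e) remove clashing, non-main stresses
    levels: int           # (2f) levels of constituency above the foot (cola, word tree)

# one illustrative, English-like setting (word tree above the feet, no cola)
english_like = MetricalSystem("trochee", "R-to-L", True, False, True, 1)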

2. LEARNABILITY

In this section, the criterion of learnability is reviewed. As noted above, the criterion of learnability requires that any grammar licensed by UG be reachable by some algorithm from data the child might be exposed to. If no constraint is imposed on the data learning proceeds from, then the criterion of learnability is vacuous. For example, if we hypothesise that the set of grammars licensed by UG is enumerable, then it need only be assumed that the data include an explicit statement about which grammar is selected. This is obviously an unreasonable picture of the kind of data children are exposed to.


There are two minimal assumptions about the kinds of data that children are exposed to that are accepted here. The first is that learning proceeds on the basis of positive evidence (but cf. Saleemi, this volume). That is, it is normally assumed that children are not systematically corrected for ill-formed utterances (Brown and Hanlon, 1970). (3)

Learning is based on positive evidence.

A second assumption that is often made and that will be adopted here is that learning proceeds on the basis of a presentation of a finite set of data. This is a natural consequence of the assumption that speakers do actually come up with a grammar at some point and that the time up to that point is finite. (4)

Learning is based on finite evidence.

From these two assumptions it is possible to show that certain possible theories of UG satisfy the criterion of learnability and others do not. Wexler and Culicover (1980) present the following schematic example of a theory of UG that does not satisfy the learnability criterion. Imagine a theory of phrase structure that licenses the following (infinite) set of grammars. Assume, for simplicity, that "a" is the only terminal symbol, or word, in all the candidate grammars. The grammar H0 licenses sentences of any length. The grammars denoted by Hi license strings up to i words in length. (5)

i.   H0 = {a, aa, aaa, ...}
ii.  H1 = {a}
iii. H2 = {a, aa}
iv.  H3 = {a, aa, aaa}
v.   Hi = {a, aa, ..., a^i}

Suppose the learner is presented with a finite sample of positive data, say {a, aa, aaa}. These data are consistent with any grammar Hi where i ≥ 3 or i = 0. If, in such a case, the learner selects H0, then there is no context where the learner would ever be prompted to select a grammar Hi where i ≠ 0. On the other hand, if the learner chooses some grammar Hi where i ≠ 0, say H3, then there is no context in which the learner would select H0. Either way, some grammar is unreachable and the theory of UG does not satisfy the criterion of learnability. Wexler and Culicover (1980) deal with this problem by constraining the theory of UG so that it does not have the crucial properties of the example in (5).5 Specifically, they suggest that the theory of transformations,


which is the domain of grammar they are concerned with, is subject to the constraints listed in (6) below. (6)

a. Binary Principle
b. Freezing Principle
c. Raising Principle
d. Principle of No Bottom Context
e. Principle of the Transparency of Untransformable Base Structures

This approach is a good one if it can be shown that constraints on UG like those in (6) are desirable. In this paper, another tack will be taken. The basic idea is that the criterion of learnability is too strong. It requires that all grammars licensed by UG be learnable, which makes the implicit assumption that all grammars licensed by UG can occur. Here, the possibility is suggested that there are grammars licensed by UG that are non-occurring and that their non-occurrence should be accounted for not by restricting UG, but by constructing a learning algorithm that does not allow the learner to reach the non-occurring grammars. Schematically, the two approaches are compared in (7). In (7a), Wexler and Culicover's approach is schematised. There is a theory of UG that licenses three grammars and excludes a fourth and fifth. The learning algorithm, applied to that theory of UG, allows all three grammars licensed to be learned. In (7b), there is a theory of UG that licenses four grammars and excludes a fifth. The learning algorithm, applied to this latter theory of UG, only allows three of the four grammars licensed to be learned. Under (7a), the theory of UG does all the work; under (7b), the theory of learning does some of the work. (7)

a. UG → {G1, G2, G3}          (*G4, *G5, ...)
   LA (UG) = {G1, G2, G3}

b. UG → {G1, G2, G3, G4}      (*G5, ...)
   LA (UG) = {G1, G2, G3}

Where should the work be done? The answer depends on the character of the theories of learning and UG that result. For example, if excluding G4 in the learning algorithm would vastly complicate the learning algorithm, but excluding G4 in UG would only slightly complicate UG, then G4 should be excluded by UG. If, on the other hand, excluding G4 in UG would overcomplicate UG, but excluding it in the learning algorithm would be relatively minor, then G4 should be excluded by the learning algorithm. In the next two sections, a case is presented that would seem to be best accounted for in terms of the approach in (7b).
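The learnability failure posed by (5) can also be seen in a few lines of code. The sketch below is mine, not Wexler and Culicover's: it implements one natural positive-evidence strategy, guessing the smallest grammar consistent with the data seen so far, and shows that such a learner can reach every Hi but never H0; a learner biased the other way, always guessing H0, would never be driven off it by positive data either.

# A minimal sketch (not from the chapter) of the dilemma in (5).
# H0 = all strings of a's; Hi (i >= 1) = strings of at most i a's.

def conservative_guess(sample):
    """Guess the smallest grammar Hi consistent with a positive sample."""
    return max(len(s) for s in sample)     # always some Hi with i >= 1, never H0

sample_from_H0 = ["a", "aaa", "aaaaaa"]    # a finite positive sample drawn from H0
print("H%d" % conservative_guess(sample_from_H0))   # -> H6, not H0

# The opposite bias fares no better: a learner that always answers H0 is
# consistent with any positive sample from any Hi, so it never converges on Hi.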


3. THE SEVEN-SYLLABLE HYPOTHESIS

This paper is an interim report on a larger project developed in Hammond (1990b). Here, only the basic hypothesis is presented. The basic hypothesis is that all occurring stress systems licensed by the theory of metrical phonology can be learned on the basis of words of seven syllables or less. (8)

Seven-syllable hypothesis: All occurring metrical systems can be learned on the basis of words of seven syllables or less.

The word "occurring" here is crucial to the argument for (7b) above. It will be shown that while all occurring metrical systems are learnable on the basis of words of seven syllables or less, the larger set of metrical systems licensed by UG is not. This distinction will form the centrepiece of the argument for (7b). A proof can be constructed if any two existing metrical systems can be distinguished on the basis of words with n syllables (where n < 8). Compare, for example, the following two systems. In Language I, a simplified version of English, trochaic feet insensitive to syllable weight are constructed from right to left. The rightmost foot is elevated to main stress and adjacent stresses are resolved by removing one of the clashing stresses. In Language II, a simplified version of Lenakel (Hammond, 1986, 1990b), one trochee is built from the right and then as many as possible are built from the left. Again, adjacent stresses are resolved by destressing. In both languages, destressing operates in a familiar fashion. The second of two adjacent stresses is removed unless it is the main stress. Otherwise, the first is removed. The patterns produced in words of different lengths are diagrammed with schematic words in (9). Notice how the two patterns only become distinct in examples of at least seven syllables in length. (9)

language I:   a   aa   aaa   aaaa   aaaaa   aaaaaa   aaaaaaa
language II:  a   aa   aaa   aaaa   aaaaa   aaaaaa   aaaaaaa

The comparison in (9) shows that the two systems considered require that


learners be exposed to words of at least seven syllables in length. Hammond (1990b) shows that all occurring systems licensed by metrical theory can be distinguished on the basis of words of no more than seven syllables. Notice that the facts of (9) necessitate a seven-syllable minimum regardless of how the learner traverses the search space. If both of the grammars in (9) occur, and if they truly are indistinct for words of less than seven syllables, then to reach both grammars, a learner must have access to words that would distinguish the analyses. Thus the seven-syllable hypothesis is independent of how learners actually learn. How learners set parameters is irrelevant to the fact that words of at least seven syllables are necessary to have access to the distinctions that are necessary to deduce the correct grammar.
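For concreteness, the divergence can be checked mechanically. The sketch below is my own reconstruction, not Hammond's: Language I and Language II are encoded as described above, with choices the prose leaves open made explicit in the comments (leftover syllables project degenerate feet, main stress falls on the head of the rightmost foot in both systems, and clashes are scanned left to right). Under these assumptions the two systems first yield different patterns at seven syllables.

# A rough sketch (not from the chapter) of the two simplified systems compared in (9).
# Assumptions beyond the prose: degenerate feet on leftover syllables, main stress on
# the rightmost foot head in both languages, clash resolution scanning left to right.

def destress(heads, main, n):
    heads = set(heads)
    for pos in range(1, n):                     # resolve adjacent stresses (clashes)
        if pos in heads and pos + 1 in heads:
            heads.discard(pos if pos + 1 == main else pos + 1)
    return heads

def pattern(heads, main, n):
    return " ".join("1" if i == main else "2" if i in heads else "0"
                    for i in range(1, n + 1))

def language_I(n):                              # trochees built from right to left
    heads = list(range(n - 1, 0, -2)) or [1]
    if n % 2 == 1:
        heads.append(1)                         # degenerate foot on the leftover first syllable
    main = max(heads)
    return pattern(destress(heads, main, n), main, n)

def language_II(n):                             # one trochee from the right, the rest from the left
    main = n - 1 if n > 1 else 1
    heads = {main} | set(range(1, n - 1, 2))    # left-to-right trochees over the remainder
    return pattern(destress(heads, main, n), main, n)

for n in range(1, 8):
    print(n, language_I(n), "|", language_II(n))
# The outputs coincide for n = 1..6 and first differ at n = 7
# (2 0 0 2 0 1 0 versus 2 0 2 0 0 1 0 under these assumptions).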

4. LEVELS AND OPTIONS

As indicated in (2), all versions of metrical theory include some statement about the number of options that occur in the metrical hierarchy. Here, it is shown that the number of options available at different levels of the hierarchy can be explained by an extralinguistic constraint on learning. Metrical theory allows at most three levels of metrical constituency within words: feet, cola, and word tree. (10)

     x                              level 3   word tree    W
  (x            x            x)     level 2   cola         C
  (x     x   ) (x     x   ) (x)     level 1   feet         F
  (x  x)(x  x) (x  x)(x  x) (x)     level 0   syllables    s
   a  a  a  a   a  a  a  a   a

In principle, one might expect to find the same options and parameters available at each level of the hierarchy, but, in fact, that does not occur. At each successively higher level, the number of options available decreases. This fact finds an explanatory solution only when one looks to learnability concerns. In (11), the options for foot construction are diagrammed. 6 (11)

a. all licensed constituents;
b. directionality;
c. iterativity;
d. extrametricality;
e. destressing;
f. multiple scansions.


All the options presented in (2) are included in (11). In (12), the possibilities for colon construction are given. (12)

a. only one constituent: [[F] F];
b. directionality;
c. iterativity;
d. F-extrametricality;
e. no F-destressing;
f. one scansion maximum.

Here, there are fewer possibilities. For example, there seems to be only left-headed binary cola. 7 All languages that exhibit cola exhibit left-headed binary cola, e.g. in Tiberian Hebrew, Passamaquoddy, Hungarian, Odawa, etc. Moreover, no language exhibiting cola exhibits more than one scansion of cola. The fact that these additional options are not available at the colon level has no explanation within any current version of metrical theory. At the word tree level, the options are even more restricted. (13)

a. only two constituents: [[x]...] or [...[x]];
b. no directionality;
c. no iterativity;
d. no C-extrametricality;
e. no C-destressing;
f. one scansion maximum.

Again, this absence of the full power of metrical theory at the word tree level is unexplained in all versions of metrical theory. 8 There are two ways to go about rectifying this lack of explanation in metrical theory. One possibility might be to alter the theory in some radical fashion so as to preclude these options at higher levels of the hierarchy. This approach is problematic in two ways. First, it would result in a rather "numerological" version of metrical theory. The options can only be excluded by brute force and the resulting theory does not have a desirable character. The second problem is that excluding these options would be unexplanatory. That is, altering U G directly would miss an important generalization about the nature of the restrictions outlined in (11), (12), and (13). Specifically, the restrictions on options available at each level are directly related to the fact that words of seven syllables or less are sufficient to distinguish all occurring stress systems. If the same number of options were available at each level, then the seven-syllable hypothesis could not be maintained. As a demonstration of this, let us consider several possible enrichments of the system in (11), (12), and (13).


Consider what would happen to the seven-syllable limit if cola could be constructed bidirectionally (12f). In addition to systems that constructed cola, e.g., from right to left, there would also be systems where one colon was constructed from the right, and then as many as possible from the left. Assuming that those cola were constructed over, e.g., trochaic feet, more than seven syllables would be required to distinguish them, as diagrammed in (14). Such systems would only be distinct in words of at least nine syllables. (14)

[the grids for (14), comparing right-to-left and bidirectional colon construction over trochaic feet, are not recoverable from the source]

To explicitly exclude the possibility of bidirectional cola from UG would be a mistake, as it would not capture the generalization that this exclusion is directly tied to the seven-syllable maximum. As a second example, consider what would happen if the set of constituents in (13a) were expanded to include, say, an iterated trochee. As shown in (15), such a system would only become distinct from a left-headed word tree when words of at least nine syllables are considered. (15)

a. word tree = [[x]...]
b. word tree = [...[x]]
c. word tree = [[x]x] (R->L)
[the grids for (15) are not recoverable from the source]

As a third and final example, consider the possibility of colon-extrametricality (13d). If colon-extrametricality were allowed under word tree construction, it would also entail that more than seven syllables be required to distinguish different systems. This is shown in (16)/(17). First, trochees


are built from left to right. Then cola are built right to left. The two grammars diverge at that point. In the first, the rightmost colon is extrametrical and a right-headed word tree is built. In the other, a left-headed word tree is built. Figure (16) shows how these systems are indistinct with words of eight syllables (or less); (17) shows how they are distinct in words of nine syllables (or more).

(16)/(17)
[the grids comparing the two systems in words of eight and nine syllables are not recoverable from the source]

Thus the seven-syllable hypothesis can explain why fewer options are available at successively higher levels of the metrical hierarchy. Notice that the particular options available at any level do not follow from the seven-syllable restriction. For example, it was argued above that the seven-syllable restriction accounts for why the set of word tree constituents cannot be augmented with an iterated trochee. The seven-syllable restriction does not explain why the word tree constituents are as in (18a), and not as in (18b). In (18a), the actually occurring possibilities are given. In (18b), the left-headed unbounded foot is replaced with an iterated trochee. The number of choices in each system is the same; the particular choices are different. (18)

a. actual word tree constituents
   i. [[x]...]
   ii. [...[x]]
b. possible word tree constituents
   i. [[x]x], iterated
   ii. [...[x]]

The constituents of (18b) can also be distinguished in words of seven syllables. Figure (19) shows how (18bi) and (18bii) are distinguishable in words of seven syllables. The examples in (19) are a worst-case scenario where there are also cola.

(19)
[the grids for (19a) and (19b), showing (18bi) and (18bii) in seven-syllable words, are not recoverable from the source]

Thus an explanation for the specific asymmetries of (11), (12), and (13) in terms of the seven-syllable hypothesis has to be supplemented with something else. That "something else" would appear to be some kind of markedness. Iterated trochees are more marked than [[x]...]. The particular options available at any level are the least marked. The specific details remain to be worked out. To summarise thus far, it has been hypothesized that metrical systems are all distinguishable on the basis of words of seven syllables or less. It has been shown that there is an asymmetric use of the parameters provided by the theory at the different levels of the metrical hierarchy. It has been argued that building this asymmetry directly into the theory would be undesirable, because the resulting theory would be inelegant and would fail to explain the relationship between the restrictions and the seven-syllable hypothesis. The seven-syllable hypothesis predicts that fewer options should be available at higher levels of the metrical hierarchy. Markedness accounts for what specific options are available at those higher levels.

5. SHORT-TERM MEMORY CONSTRAINT

In this section, an explanation for the hierarchical asymmetry discussed in the previous section is proposed. This explanation requires that the criterion of learnability be revised as proposed above. Let us suppose that learning must proceed on words of seven syllables or less. Longer words are learnable, but cannot be used in extracting the metrical system of a language. (20)

Seven-syllable constraint: Learners cannot make use of words longer than seven syllables to extract metrical generalizations.

Possible support for this proposal comes from the psychological literature. Miller (1967) discusses a number of psychological results that seem to converge on the conclusion that human short-term memory is basically limited to retaining seven elements (plus or minus two). The proposal made here is that the seven-unit maximum on short-term memory applies to language learning as well.


The idea is that forms can only be used to learn stress systems if they can be held in short-term memory long enough for the learner to extract the relevant generalizations. Words longer than seven syllables are learnable because short-term memory does not constrain other aspects of acquisition. The hypothesis is given in (21) below. (21)

Limit on short-term memory: Learners cannot make use of forms longer than seven syllables because of a general extralinguistic constraint on the size of short-term memory.
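To see how (20)/(21) would interact with parameter setting, consider the following sketch (Python; entirely schematic, since the parameter names, default values and disambiguation test are invented for the illustration and are not taken from the text). The only substantive ingredient is the length filter: words longer than seven syllables are never consulted, so any parameter that the remaining input fails to disambiguate retains its default value, which is the behaviour this section goes on to predict for learners exposed only to short words.

```python
# Schematic parameter-setting loop constrained by the seven-syllable limit.
# The parameters, defaults and the 'disambiguates' test are hypothetical;
# only the length filter reflects the constraint stated in (20)/(21).

MAX_USABLE_LENGTH = 7   # syllables

DEFAULTS = {            # assumed default (unmarked) values, for illustration
    "scansions": "one",
    "word_tree": "unbounded",
}

def set_parameters(corpus, disambiguates):
    """corpus: iterable of words, each a sequence of syllables.
    disambiguates(word, parameter): value forced by the word, or None."""
    settings = dict(DEFAULTS)
    undecided = set(DEFAULTS)
    for word in corpus:
        if len(word) > MAX_USABLE_LENGTH:
            continue                       # (21): not held in short-term memory
        for param in list(undecided):
            forced = disambiguates(word, param)
            if forced is not None:
                settings[param] = forced   # positive evidence overrides default
                undecided.discard(param)
    return settings                        # undecided parameters keep defaults
```

On this picture a learner whose input rarely contains words of seven syllables would simply never encounter the data that force a second scansion, and so would settle on the default setting.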

The specific claim is that the nonlinguistic effects Miller discusses are mirrored by a constraint on the learning algorithm for metrical systems. This constraint prevents the learner from paying attention to words of more than seven syllables. This proposal solves both of the problems mentioned above. First, UG is not complicated needlessly. The theory of UG allows all options at all three levels, and the restrictions at higher levels are a function of the fact that the number of options available increases the number of syllables necessary to distinguish the resulting systems. The particular options available are a function of markedness as discussed above. This proposal also solves the second problem. The asymmetry is directly tied to the sevensyllable restriction expressed as (20) or (21). For example, the absence of bidirectional cola follows from the constraint on short-term memory and is explained by it. The alternative tactic of complicating UG does not connect the absence of bidirectional cola with the seven-syllable limit at all. Finally, this proposal is more general in that the seven-unit effect is expected to have extralinguistic consequences, just as Miller shows. In order to maintain this explanation, several aspects of the proposal must be fleshed out. First, unlike some of the experiments Miller discusses, it looks like the restriction with respect to language learning refers to precisely seven syllables. It does not allow for variation. This is taken as progress in our understanding of short-term memory. Second, unlike the effects Miller discusses, the restriction on short-term memory as it affects language is specific to the unit syllable. In the psychological literature, the particular unit restricted in short-term memory can vary. This is not the case in metrical phonology. The seven-syllable restriction is specific to syllables, and not some other phonological unit, like cola or word trees. This is arguably a consequence of the fact that, while a variety of factors influence how metrical structure is applied, it is always applied to syllables. For example, while syllable weight in a


language like English affects metrical structure, that structure is still applied to syllables.9 Third, it might be thought that the seven-syllable limit is excessive as there are many languages, e.g. English, where words of seven syllables or more are vanishingly rare. This is not a problem at all, however. The seven-syllable restriction makes the strong prediction that in languages where children are exposed only to relatively short words, learners must opt for default settings of parameters when contradictory data are impossible because of the length of words. Contrast languages like English and Lenakel. In English, there is a single scansion of right-to-left footing. Moreover, children are exposed to relatively short words. Lenakel, on the other hand, exhibits bidirectional footing (at least two scansions from different directions). The demonstration in (14) requires that Lenakel children be exposed, at the appropriate point of acquisition, to words of at least seven syllables. Our approach predicts that learners not exposed to words of sufficient length will have to opt for the default choice between one scansion and two scansions: presumably one scansion (as in English).10 As a second example of this sort, consider the possibility of foot extrametricality.11 Contrast the following systems. All involve building trochees from left to right. The first two build a right-headed word tree. The second system also makes a final degenerate foot extrametrical. The third builds a left-headed word tree. As shown in (22), the first two systems only become distinct in words of three syllables or more. The latter two only become distinct when words of four syllables are considered.

(22)
[the grids for (22), comparing the three systems in words of one to four syllables, are not recoverable from the source]


Again, the system developed here requires that learners exposed to words of insufficient length to distinguish these systems will opt for the default settings for the parameters that separate these systems.12 Finally, the approach taken here makes an extremely interesting prediction about other components of grammar. If the hierarchical asymmetry is truly a function of an extralinguistic constraint on the size of shortterm memory, then we would expect the same constraint to also affect other domains of grammar, e.g. syntax, semantics, etc. To summarise, it has been shown that there is an asymmetry in the use of metrical parameters at different layers of the metrical hierarchy. This asymmetry is most appropriately handled by imposing an extralinguistic constraint on the learning of stress systems. This forces us to revise the criterion of learnability so that only occurring grammars need to be learned. It also forces us to revise our understanding of the character and relevance of short-term memory. These conclusions are based on a comparison of the predictions made by metrical theory and the stress systems of the world. If there are significant flaws in our understanding of either of these, the results would have to be reconsidered. This is not a problem by any means. The proposal made here is easily falsified and thus provides clear directions for further investigation. Last, note that our results with respect to the learnability criterion are independent of how learning actually takes place.13 The seven-syllable hypothesis says nothing about how learning happens. What it says is about what the input to learning must be.

FOOTNOTES

*Thanks for useful discussion to the participants in my Spring 1990 seminar at the University of Arizona, Diana Archangeli, Andy Barss, Robin Clark, Dick Demers, Elan Dresher, Kerry Green, Terry Langendoen, Adrienne Lehrer, John McCarthy, Cecile McKee, Shaun O'Connor, Dick Oehrle, Doug Saddy, Paul Saka, and Sue Steele. Thanks also to the editor and two anonymous reviewers. Some of this material was presented at GLOW (Hammond, 1990a). All errors are my own.
1. See, however, Braine (1974), Dell (1981), Dresher (1981), Dresher and Kaye (1990), and McCarthy (1981).
2. This particular representation is used for typographical convenience. In all respects, the representation employed here is a notational variant of the "lollipop" representation used by Hammond (1984/1988) etc. See Hammond (1987) for discussion.
3. See the references cited in the text.
4. Halle and Vergnaud (1987) accomplish this indirectly with the mechanism of conflation.
5. See Wexler and Culicover (1980) and Gold (1967) for a discussion of what these properties are.


6. Theories differ with respect to the number of constituents allowed. For example, Hayes (1987) has three, Halle and Vergnaud (1987) have five, Hammond (1990b) has nine, and Hayes (1981) has twelve. All of these theories allow all possibilities at the foot level.
7. Beat addition of the sort that promotes the first stress of Apalachicola can be accomplished with a binary left-headed colon. An unbounded colon is not necessary for cases like this.
8. As pointed out to me by Iggy Roca, some of the parameters in (13) are dependent in an interesting sense. For example, from the fact that the only constituents available at this level are unbounded, it follows that there is no directionality, no iterativity, and a one-scansion maximum. While this accounts for some of the restrictions (13b,c,f), it does not account for all of them (13a,d,e). It might be possible to derive the feet in (13a) from the requirement that metrical trees terminate in a single node. This requirement is a stipulation that is otherwise unmotivated.
9. This is shown by the fact that syllables can never be split into separate metrical constituents (Hayes, 1981). There are languages like Southern Paiute where stress is arguably assigned to the mora, rather than the syllable. In such a language, the seven-unit restriction may apply to morae. The Southern Paiute stress system is consistent with either hypothesis.
10. Obviously, it is unethical to test this hypothesis experimentally. The hypothesis can be verified observationally, however, if the language learner's experience can be assessed for word length at the critical stage. Language acquisition research has not reached the point where this information is available.
11. Foot-extrametricality can only apply to degenerate feet, e.g. in English, Aklan, Odawa, etc. (Hammond, 1990b).
12. See Dresher and Kaye (1990) for one proposal regarding default parameters in metrical theory.
13. For some interesting recent proposals, see Barss (1989) and Clark (1990).

REFERENCES

Barss, Andrew. 1989. Against the Subset Principle. Paper presented at WECOL, Phoenix.
Baker, C.L. and John J. McCarthy (eds.). 1981. The Logical Problem of Language Acquisition. Cambridge, Massachusetts: MIT Press.
Braine, M. 1974. On what might constitute learnable phonology. Language 50. 270-299.
Brown, R. and C. Hanlon. 1970. Derivational complexity and the order of acquisition of child speech. In J.R. Hayes (ed.) Cognition and the Development of Language, New York: Wiley.
Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English, New York: Harper & Row.
Clark, Robin. 1990. Some elements of a proof for language learnability. Ms. Université de Genève.
Dell, F. 1981. On the learnability of optional phonological rules. Linguistic Inquiry 12. 31-38.
Dresher, Bezalel Elan. 1981. On the learnability of abstract phonology. In Baker and McCarthy (eds.), 188-210.
Dresher, B. Elan and Jonathan D. Kaye. 1990. A computational learning model for metrical phonology. Cognition 34. 137-195.
Gold, E.M. 1967. Language identification in the limit. Information and Control 10. 447-474.
Halle, Morris. 1989. The exhaustivity condition, idiosyncratic constituent boundaries and other issues in the theory of stress. Ms. MIT.


Halle, Morris and Jean-Roger Vergnaud. 1987. An Essay on Stress. Cambridge, Massachusetts: MIT Press.
Hammond, Michael. 1984/1988. Constraining Metrical Theory: A Modular Theory of Rhythm and Destressing, 1984 UCLA doctoral dissertation, revised version distributed by IULC, 1988, published by Garland, New York.
Hammond, Michael. 1986. The obligatory-branching parameter in metrical theory. Natural Language and Linguistic Theory 4. 185-228.
Hammond, Michael. 1987. Accent, constituency, and lollipops. CLS 23/2. 149-166.
Hammond, Michael. 1990a. Degree-7 learnability. Paper presented at GLOW, Cambridge, England.
Hammond, Michael. 1990b. Metrical Theory and Learnability. Ms. U. of Arizona.
Hayes, Bruce. 1981. A Metrical Theory of Stress Rules, 1980 MIT Doctoral Dissertation, revised version available from IULC and Garland, New York.
Hayes, Bruce. 1987. A revised parametric metrical theory. NELS 17. 274-289.
Hayes, Bruce. 1989. Stress and syllabification in the Yupik languages. Ms. UCLA.
Levin, J. 1990. Alternatives to exhaustivity and conflation in metrical theory. Ms. University of Texas, Austin.
McCarthy, John J. 1981. The role of the evaluation metric in the acquisition of phonology. In Baker and McCarthy (eds.), 218-248.
Miller, George A. 1967. The magical number seven, plus or minus two: some limits on our capacity for processing information. In G. A. Miller (ed.) The Psychology of Communication. New York: Basic Books Inc. 14-44.
Wexler, Kenneth and Peter W. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge, Massachusetts: MIT Press.

Markedness and growth*
Teun Hoekstra
University of Leiden

Within generative-based acquisition studies two distinct implementations of the innateness hypothesis may be distinguished. The traditional conception is now commonly referred to as the continuity hypothesis (cf. Pinker 1984), which holds that all of UG is present at birth, while development from initial state to steady state is determined by factors outside UG. Various aspects can play a role in this development, such as general cognitive growth, or memory growth in particular, in short factors that determine learning. In contrast, the maturational hypothesis (cf. Borer and Wexler 1987) holds that UG itself develops gradually, in the sense that specific principles of UG become available only after a certain period of maturation. In this paper I want to discuss certain questions relating to the conception of language acquisition in terms of parameter setting. An interesting issue is the triggering problem (cf. Borer and Wexler 1987). If values of parameters should meet a condition of learnability defined in terms of easily accessible data to fix the value of a parameter, what delays the child in actually fixing it? Evidently, the possible answers given by the two different hypotheses are distinct: according to the maturational hypothesis (henceforth MH) the parameter P starts asking questions to the data only after it has matured, while the continuity hypothesis (henceforth CH) should provide an answer along different lines. A second question concerns the way in which a particular stage of linguistic development relates to a parameterised property if its value has not already been set. Suppose that the actual output at stage n is consistent with one particular value (a) of the parameter P, can we then say that P has (a) as its initial setting, and that (a) is hence the unmarked value of P? If so, what does unmarked mean, and does it tell us something about the two hypotheses mentioned above? Jakobson (1941) made the claim that crosslinguistic distribution and developmental priority can be captured by the same notion of markedness. This claim harbours a common research programme for traditional general linguistics, taken as the study of linguistic systems, and the generative paradigm as defined by its ultimate explanatory goal of language acqui-


sition. Yet, as I shall clarify below, there are several notions of markedness in the current literature on language acquisition, which need to be kept apart. Before getting into those matters, I shall start with a short description of the parameters model.

1. PARAMETERS AND MARKEDNESS

Although solving the problem of language acquisition is central to the generative research programme, actual work on language acquisition within the framework has long been relatively scarce. The main problem with early attempts to look at acquisition data from the generative perspective was that too many interpretations were possible for the observations that could be made. The situation for the study of adult systems was not fundamentally different. The expressive power of earlier versions of generative grammar was so rich as to hardly impose any restrictions on the way in which a particular grammatical phenomenon could be analysed. By way of illustration, consider the difference between English and French regarding dative constructions. As is well-known, English allows two complementation types for verbs of the give-class, where French, if we disregard the clitic construction, allows only the prepositional variant (*Je donne Jean un livre vs. I give John a book). Although particular proposals were around, the theory as such would allow an account of this difference in at least the following ways:

a. in terms of lexical subcategorisations:

(1) give: {[ NP PP], [ NP NP]}
    donner: [ NP PP]

b. in terms of PS-rules:

(2) English: VP → V (NP) (NP) (PP)* (S')
    French: VP → V (NP) (PP)* (S')

c. in terms of transformational rules:

(3) dative shift: X V, NP to NP Y
    1 2 3 4 5 6 => 1 2 5 3 6
    optional in English, not available in French

Given such freedom of expression allowed by UG for the analysis of adult


systems the child will have a hard time figuring out what the adequate grammar for the language he is being exposed to should look like. Even more difficult for the linguist is the interpretation of production data in early stages of acquisition, as these data can also be analysed in a rich variety of ways. The first task facing generative theory was therefore to drastically reduce the descriptive options made available by UG. Several changes in the theory brought this goal within reach. Specifically, the abandonment of a construction specific approach and/or its replacement by the modular conception, according to which a particular construction can be seen as the result of an interplay of several relatively simple modules. The reduction of the transformational component to the move (a) format, available in both French and English, made it impossible to express the difference between these languages with respect to dative constructions in the manner described in (3). The proposal to reduce the content of PS-rules to the principle that the internal structure of a phrase is to be regarded as a projection of lexical properties makes (2) unavailable. This leaves us with (1) as a means to capture the difference between French and English, but it will be clear that this can only be regarded as a description of the difference, as it raises the question of why French should not have lexical items with the properties of English give. In fact, the modular approach leads us to ask even more general questions: is the fact that French does not have such lexical items related to other properties of French in which it differs from English, and can these sets of properties follow from a single difference at a more abstract level? Could there be a principle P-prep with two values, such that a positive value of the parameter in P-prep yields a grammar of the French type, while a negative value yields an English type system? For the case at hand, one might think of a correlation of such properties as those in (4) (cf. Kayne 1981): (4)

preposition stranding: a. English: Who did you vote for? b. French: *Qui as-tu voté pour?

(5)

particle constructions: a. English: John called Bill up John called up Bill b. French: particle constructions are simply not available

(6)

prepositional complementizer plus lexical subject: a. English: We hoped for something nice to happen b. French: *On espère de quelque chose passer

(7)


believe-type with lexical subject a. English: We believed John to have done it b. French: *Nous croyons Jean avoir fait cela Now that the expressive format of rule systems is replaced by a set of more abstract principles, some parameterised, we can profitably undertake the endeavour which is in a sense complementary to providing explanations for the language acquisition problem, i.e. coming to grips with the variation between languages. Clearly, this variation imposes a bound to the specificity of the principles of UG in the sense that they must at least allow for the attested variation. Hence, whereas the language acquisition problem requires the principles of UG to determine the grammatical knowledge of a particular grammar as closely as possible, where complete determination would constitute the optimal case, the complementary demand on UG with respect to variation sets a limit to this fit. The study of crosslinguistic variation from the perspective of UG suggests a notion of markedness, which in turn becomes relevant to an understanding of language acquisition. Several notions of markedness, or rather considerations leading to the assumption that a certain option is marked, are available in the literature. I shall first present these different notions of markedness. To start with, we can use the dative construction as an example. According to one of the principles of UG, lexical NPs must be Case-marked. The circumstances under which an NP is Case-marked may vary across languages, within certain boundaries of locality, etc. Despite this variation, at least two instances of Case-marking seem to be rather stable: assignment of Nominative Case to the subject of full clauses and assignment of Objective Case to the complement of verbs and prepositions. Let us assume that these conventions are part of UG. The prepositional variant of the dative construction is unmarked from this point of view, as both object NPs are Case-marked in accordance with UG-determined conventions, whereas the "prepositionless" dative construction poses a problem, i.e. it requires a more specific mechanism. This is not to say that the mechanism involved must be language specific or outside the scope of UG, but merely that it is special and would hence require determination in a more specific way. Assuming P-prep as above, the French value may be taken to represent the unmarked case, while the English value is the marked value, which must be fixed on the basis of positive evidence. It should be noted that markedness considerations of this type are not extensional, but intensional, as they pertain to the system generating languages rather than to the number of languages having such a system, or to the set of sentences generated by such a system. The reason to adopt the English value of P-prep as marked does not necessarily mean that English-type languages are necessarily less common than French-type


languages. What this means is that there is no a priori reason to assume that what is marked with respect to language acquisition is also marked in a distributional sense, although distributional factors may provide motivation for specific markedness assumptions. The assumption of markedness may just as well be determined on the basis of acquisition data themselves. A case in point is the so-called pro-drop parameter, which again may be taken to be binary, yielding the Italian-type system with pro-drop, free inversion and long subject extraction on one value, and the French-type system, which lacks these three possibilities, on the other value. I disregard at this moment the serious questions that can be asked about the correctness of the parameter. It has been argued by Hyams (1983) that early stages in the acquisition of English exhibit pro-drop, unlike the target language, which should therefore be taken to indicate that the English value is the more marked one, and thus requires fixing, while an Italian-type system is stable in this sense, i.e. the initial setting of the parameter is never changed in the course of the acquisition process. Let us call this developmental markedness as opposed to distributional markedness. There is no logical need that the system which is said to be marked on the basis of such developmental considerations has a narrower distribution crosslinguistically: developmental and distributional markedness would just not converge. If the developmentally prior option is taken to derive from UG, distributional considerations become irrelevant for the formulation of UG in principle. I return to this question below.


As in the previous case, there is no intrinsic relation to distributional markedness, i.e. most languages might, as a matter of fact, be pretty large with respect to the governing category hierarchy for their anaphoric elements. From the point of view of extensional markedness, such a distributional fact would just be coincidental. There is of course by logical necessity an intrinsic relation between developmental markedness and extensional markedness, i.e. the system that the child starts off with should be minimal from an extensional point of view. If Hyams' (1983) claim concerning the pro-drop parameter is correct, therefore, pro-drop languages should extensionally be smaller than non-pro-drop languages. Although an ultimate assessment is rather complicated (see Saleemi, this volume, for discussion), it would appear at first sight that rather the opposite is true. Pro-drop languages allow both null and overt pronominal subjects, preverbal and postverbal subjects, and long subject extraction, while non-pro-drop languages only have overt pronominal subjects, subjects occur on only one side of the verb, and they do not allow sentences with long subject extraction. Hence, the two types of internal markedness considerations (internal in the sense that they both have to do with language acquisition itself, rather than with matters external to it) may also fail to converge. This also raises the question of which of the two should be given priority, if either of the two types is incorrect in principle. To sum up, I have distinguished several notions of markedness, or rather several types of considerations to determine which value of a parameter P is marked:

1. distributional: α is unmarked relative to β if α is instantiated in a larger number of languages than β
2. developmental: α is unmarked relative to β if α is developmentally prior to β
3. extensional: α is unmarked relative to β if the set generated by P(α) is a subset of the set generated by P(β)
4. intensional: α is unmarked relative to β if the system with P(α) is "smaller" (in a sense to be made precise) than the system with P(β)

Considerations 1 and 2 are observational in the sense that the marked character of β is determined purely in terms of what is actually found in the data. The claim made by Jakobson suggests that there is an observational convergence between these two domains. If this is not the case, 2 should be given priority over 1, if considerations of this type are correct to begin with. Similarly, if 2 does not square with 3 or 4, 2 should be given priority. In the next section I shall argue that developmental markedness considerations should be dismissed, as they are based on a misconceived relation between output and system.


2. DEVELOPMENTAL MARKEDNESS

Let us take a closer look at considerations of type 2. This brings us to the second issue mentioned in the introduction, viz. the relation between UG and the stages of language acquisition. Much discussion within generative grammar on the problem of language acquisition has focussed on the logical rather than the developmental question, which is to say that much of the discussion has taken place under the instantaneity assumption. It is quite understandable that under this assumption considerations external to the actual development, such as distributional markedness considerations, play a prominent role. It is also true that parameters that are currently being considered are parameters of variation rather than of systems as they develop in the child. Considerations of extensional or intensional markedness, on the other hand, do pertain to the developmental problem, but in contrast to developmental considerations, their motivation is independent of the actual process, and depends on the logical question. What is remarkable is that children seem to be slow in some domains, but extremely rapid in others, even though there is no sense in which the one type of domain can be said to be more difficult, under whatever extrinsic notion of complexity, than the other. A typical example of their slowness is the acquisition of the correct past tenses of irregular verbs, for which positive evidence is easily accessible, and quite often negative evidence is provided as well. A more interesting case is the control over disjoint reference of pronouns in a local domain, i.e. John washed him, where children are slow in finding out that him and John must be different persons. In order to account for delayed acquisition while the relevant evidence had been available all the time, Wexler & Borer (1987) have put forth the Maturational Hypothesis. As a specific example they suggest that the notion of A-chain becomes available in a later stage of maturation, which allows an explanation for the absence in earlier phases of a number of properties that all seem to come in rapidly and simultaneously once the A-chain capacity has matured. This specific hypothesis is discussed at length in sections 4 and 5. The Maturational Hypothesis leads us to reconsider both the question of how to characterise early stages, and markedness considerations that are based upon them. The absence of e.g. passive, a construction type that is dependent on A-chain formation, at early stages could no longer be taken as evidence for markedness considerations with respect to passive, i.e. there would be no sense in which passive could be said to be marked just because it is absent at a certain stage. By the same token, the fact that early stages of both English and Italian children show pro-drop would not need to have any bearing on the question about markedness of prodrop. More generally, considerations concerning developmental marked-


ness may be taken to be ill-conceived to begin with, if the Maturational Hypothesis is adopted. The strong past tenses again provide a clear illustration of what is wrong with the reasoning. It is well-known that certain irregular past tenses are acquired very early. No one would accept the conclusion that these forms are "unmarked" with respect to regularly formed past tenses. The notion of markedness only makes sense relative to a system, i.e. not in absolute terms. The irregular past tenses may be considered marked relative to the regular system of past tense formation, but before this system has become active, there is no sense in which such forms are more marked than e.g. a form like horse. The essential point is that whether phenomena displayed at some stage of acquisition are determined by a system similar or identical to the adult pro-drop system cannot be determined in an absolute sense. If that were the case, markedness would be a mere taxonomy of observations. In Hyams' view the grammars of Italian children are continuous with respect to the pro-drop parameter, while English children have to change their initial setting to the more marked value. However, if the setting of the prodrop parameter does not take place until a certain maturational stage is reached, there is no obvious sense in which the grammar of Italian children is continuous, even though its output may be unaffected. To slightly elaborate on this, let us assume that the notion of pro-drop, now restricted to the actual dropping of subject pronouns, has content only relative to the interpretation of INFL-features. For example, let us assume that the positive value of the pro-drop parameter is a function of a pronominal interpretation of INFL-features, whereas a negative value is consistent with an anaphoric status only, i.e. this set of features must be licensed by entering into an agreement relation with an overt NP. Clearly, then, the setting of the parameter value must be delayed until after the acquisition of INFL-features, as the value is determined relative to these features. It will be clear that before these features (or the node INFL itself) are acquired, absence of overt subject pronouns cannot be interpreted as resulting from the positive unmarked setting of the parameter, as there is no sense internal to the system in which the notions relevant to this parameter can play a part. There is a different sense in which the acquisition of Italian is continuous as regards pro-drop, but this is an irrelevant observational continuity under this construal, which cannot be explained in terms of an identical setting of the parameter value. How the observational continuity is to be explained is again a different question. Here I can only make a tentative suggestion. One of the questions that still stands out in the assessment of early child grammar is whether it is correct to assume that children drop subject pronouns, or whether they drop pronouns more generally. It is certainly not true that children do not " d r o p " object pronouns, but it is assumed


that object pro-drop is much less common than subject pro-drop. It seems to me that in order to evaluate this quantitative distinction one has to also take into account the relative distribution of subject and object pronouns in adult speech. From this we know that the frequency of pronominal subjects in transitive clauses is much higher than that of pronominal objects. Looking at it as a dropping process, we must take into account that from a discourse point of view the number of candidates in subject position far exceeds the number of object candidates. We may then assume that from a grammatical point of view there is initially no asymmetry between subject and object drop, contrary to what Hyams concludes. Drawing on Rizzi (1986), Hoekstra & Roberts (1989) make a distinction between content licensing and formal licensing, where content licensing is interpreted as licensing in terms of 0-roles and formal licensing as licensing in terms of "morphological" features. It is argued that the former can be considered a form of D-structure licensing, while the latter is S-structure licensing. Mechanisms of S-structure licensing have to do with the identification of the referent of an argument, e.g. through AGR-coindexation, chain formation, or visibility in terms of phi-features or descriptive features. Thus, the two arguments of a sentence like He kicked the boy are Dstructure licensed in terms of the argument roles assigned by the predicate kick, while the agent argument is S-structure licensed through the phifeatures of the pronoun he and the patient argument is S-structure licensed in terms of the descriptive content (plus quantification) in the NP the boy. I would now like to put forward the hypothesis that early child grammars are characterised by the absence of an S-structure licensing requirement, i.e. D-structure licensing suffices. Adopting a maturational perspective we may interpret this hypothesis in terms of a maturational delay of S-structure licensing. In Hoekstra & Roberts (1989) it is argued that under certain conditions adult systems too allow arguments that are D-structure licensed only, e.g. the null objects in the constructions discussed by Rizzi (1986) and the null external arguments in middle constructions. In those cases, the lack of S-structure licensing is compensated for by an additional form of D-structure licensing (cf. Hoekstra & Roberts 1989 for details). To make our hypothesis more specific, let us assume that S-structure licensing is a function of Case marking. This seems to be quite reasonable if we regard Case assignment as a way of providing visibility to arguments. As we saw, there are two structural Case configurations, complements of Verbs and Prepositions and the Specifier of tensed clauses. While P and V assign Case to their complement under government, Nominative Case is assigned under the mechanism of Head-Spec agreement. Given this formal dissimilarity, we might expect an asymmetric growth of the


relevant assignment conventions, such that objective Case assignment is acquired before Nominative Case assignment. If it is indeed correct that the loss of "subject drop" takes place significantly later than the loss of "object drop", this might be accounted for along these lines. I shall not work out this hypothesis in this paper. I want to conclude this section with a dismissal of developmental markedness considerations. I have argued that output continuity cannot be taken to reflect system continuity. If the value of P is set at stage Sn, and stages Si prior to Sn exhibit phenomena that are characterisable under P(α), this should not be taken as evidence that Si is generated with a system including P(α), and that therefore P(α) is unmarked vis-à-vis P(β), as P is not present in that system at all. The observational continuity might be explained along different lines.

3. EXTENSION AND INTENSION

The notion of extensional markedness was developed by Wexler & Manzini (1987) as part of a learning theory. The theory is rather minimal, but it has a number of interesting theoretical consequences. Basically, what the learning theory says is "assume the value x of P that generates the smallest set, and let yourself be forced to y (x < y) only if some positive data are outside the set generated by P(x)". In order to decide on a particular value, the child has to consider the extension of what his system generates, and compare it to the data he is being exposed to. A slightly different conception would be that the system generating the smaller language is "smaller" and that in order to adapt it so as to generate a larger language if the input demands this, the system has to grow, e.g. by adding some additional mechanism. Let us make this difference somewhat more concrete by providing an example. Take two languages, L1 having only local anaphors, as in English, and L2 having long distance anaphors, as in Japanese. Under the extensional perspective this variation is captured in terms of a different notion of governing category relevant for the two languages. So, Japanese grammar possesses the value GC2, while English has GC1, where extensionally L(GC1) < L(GC2). Therefore, the Japanese child starts out assuming that he is in an L(GC1) language, but being confronted with a long-distance bound anaphor, he adopts GC2. Nothing in the grammar would suggest that GC1 is simpler than GC2: their relative appearance is merely a function of the differences in extensions. Taking an intensional perspective, we might suggest that the grammar of L2 properly includes the grammar of L1 with regard to anaphors, with the addition of some mechanism M, accounting for the long distance binding phenomena, i.e. G(L2) = G(L1) + M. Spe-


cifically, M might be LF-movement of the anaphor, lacking in English, but available in Japanese (cf. Pica 1987, Chomsky 1986). This second conception boils down to saying that languages grow because their grammars grow, where growing is either maturation or learning. While a language may grow as a result of the growth of the system, it might also shrink as a consequence of an addition to the system. If my suggestion concerning the implementation of a Case requirement at a later stage is correct, this would indeed occur. Notice that such shrinkings cannot be accounted for under an extensional approach like the one advocated by Wexler & Manzini (1987). In the next two sections I shall discuss two recent proposals by Borer and Wexler within the framework of UG constrained maturation to see whether the evidence they put forward is consistent with the intensional view of grammatical growth. I shall reject the relevance of the notion of extensional markedness. A second question that I address in this discussion is whether the growth responsible for the acquisitional progress should be considered as resulting from maturation or from learning.
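The difference between the two conceptions can be made concrete with a toy implementation (Python). The sketch below is illustrative only (the value names, extensions and input strings are invented) and implements just the extensional learner quoted above: values of the parameter are ordered by the size of the sets they generate, the learner starts with the smallest, and moves to a larger value only when a positive datum falls outside the current extension. The intensional alternative would instead be modelled by adding a mechanism (such as LF-movement of the anaphor) to a fixed core grammar, rather than by comparing set sizes.

```python
# Toy Wexler & Manzini-style extensional learner for a governing-category
# parameter. The two values and their extensions are invented placeholders:
# GC1 licenses only locally bound anaphors, GC2 additionally licenses
# long-distance binding, so L(GC1) is a proper subset of L(GC2).

EXTENSIONS = {
    "GC1": {"locally bound anaphor"},
    "GC2": {"locally bound anaphor", "long-distance bound anaphor"},
}
VALUES = ["GC1", "GC2"]          # ordered from smallest to largest extension

def subset_learner(data):
    current = 0                  # start with the least marked (smallest) value
    for datum in data:
        # Move to a larger value only when forced by positive evidence.
        while datum not in EXTENSIONS[VALUES[current]] and current + 1 < len(VALUES):
            current += 1
    return VALUES[current]

# An English-type input never forces the shift; a Japanese-type input does.
print(subset_learner(["locally bound anaphor"] * 3))                             # GC1
print(subset_learner(["locally bound anaphor", "long-distance bound anaphor"]))  # GC2
```

Note that a learner of this kind can only ever move to larger extensions; a system that shrinks, as in the Case-requirement suggestion above, falls outside what it can model.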

4. THE NOTION OF GROWTH: THE UNIQUE EXTERNAL ARGUMENT PRINCIPLE

In an interesting paper, Borer and Wexler (1988) make a claim which is inconsistent with the notion of growth that I developed in the previous section. They propose a principle, called the Unique External Argument Principle (UEAP), which disappears in the course of maturation. This would be an instance of ungrowing, i.e. of shrinking of the system. I want to argue that the facts that motivate UEAP can be reinterpreted as resulting from growth of the system, and suggest that this growth does not require a maturational account. The second point is essentially independent of the first, i.e. even if an intensional account can be argued to provide a better alternative to UEAP, the growth might be triggered by learning or maturation. UEAP requires that each predicative element have its own subject. B&W use this principle to explain the following situation. In Italian, participles in the perfect do not normally agree with their nominal object (8a), although there is agreement with clitic objects (8b), as well as with subjects in passive constructions (8c), but not in unergative intransitive constructions (8d): (8)

a. Gianni ha letto i libri
   Gianni has read[-AGR] the books
b. Gianni li ha letti
   Gianni them has read[+AGR]
c. I libri sono stati letti
   The books are been[+AGR] read[+AGR]
d. Gianni e Piero hanno corso
   Gianni and Piero have run[-AGR]

Children divert from this pattern in two respects: they uniformly have agreement between the object and the participle in (8a), and there are no occurrences of the perfect with intransitives of the type (8d). The question is, how to capture the generalisation between non-occurrence of (8d) and overgeneral agreement in (8a). This is where UEAP comes in. With Borer & Wexler (1988) we must make the basic assumption that agreement in early stages results from the same mechanism that is operative in adult grammars, which is to say that it results from a relation with a local subject (cf. Kayne 1986). UEAP requires that every predicate element has its own unique subject. There is no way in which this requirement can be met in (8d), as there are two predicative elements {hanno and corso), but only one subject candidate. UEAP can be met in (8a), however, if i libri is taken as the subject of an adjectival participle letti, which must agree with its subject according to the agreement rule. The overgeneralised agreement in (8a) is lost and (8d) is let in as soon as UEAP disappears from the grammar. This way the generalisation is captured. Let us first turn to the epistemological status of UEAP. The interpretation of UEAP given by B&W (1988) is a maturational one: "UEAP, we propose, represents a maturational stage. While it constrains the early grammar, it is, obviously, not a constraint on the grammar of adults" (1988:22). Notice the implication of this for the hypothesis of UG-constrained maturation. Not only are we to assume that certain portions of UG become available at a certain maturational stage, other portions of UG become unavailable at a certain maturational stage, since UEAP, a principle of UG, is not characteristic of any adult system (by definition), but only of certain stages of language acquisition, disappearing from the organism in a way similar to the loss of the drowning reflex. Notice that UEAP comes very close to a principle of adult-systems, in effect one of the most basic principles of GB-theory, viz. the Projection Principle. If a predicate has a role, it must be assigned to a unique argument. Rather than taking UEAP as an independent principle, B&W suggest looking upon it as a proto-principle that ultimately develops into this principle. The difference between UEAP and the Projection Principle is mainly a matter of scope, UEAP being wider in scope in the sense that a predicate requires a subject independent of the assignment of an argument role to it. The question we have to answer then is how this scope is narrowed down, so as to capture the relevant generalisation, viz. that loss of agreement in (8a) and the emergence of (8d) are simultaneous.


An interesting fact, to which B&W (1988) do not attach any significance, is that Italian children do construct the perfect tense with intransitive verbs, but only if these belong to the essere-selecting class, i.e. the ergative intransitives (cf. Burzio 1981). In itself this is not problematic for UEAP, as an exception is made as regards the application of UEAP precisely for essere 'be'. Hence, a sentence like Maria e caduta 'Mary is fallen ( + AGR)' does not pose a problem for UEAP, because the only predicative element subject to UEAP is caduta, the participle. The notion of exception to UEAP, restricted by B&W to essere, provides an alternative way of capturing the relevant generalisation. Once we state that avere is an exception as well, the generalisation is captured as well: avere in (8a) and (8b) would no longer require a subject of their own. What does it mean to say that a (verbal) element is an exception to UEAP? Formulated in terms of UEAP, an exception would be a verb that shares its subject with another verb. Clearly, for a verb to share its subject with another verb means that it is an "auxiliary". This is not the place to enter into a discussion of how to represent auxiliaries, neither in adult, nor in child grammars. Various options are currently available. The crucial property would appear to be that these verbs do not have a thematic grid. Whatever its implementation, such a notion must be made available by UG. The question then becomes whether this notion is continuously available, or whether it comes in at a certain maturational stage. The fact that essere is taken as an auxiliary in the relevant sense at a very early stage suggests the former answer, but this answer, as always, raises the triggering problem: why is the acquisition of auxiliary avere (as well as other auxiliaries) more delayed than essere'? If a satisfactory answer can be given to this question, a maturational account is not called for. It would seem that the primary data provide unambiguous evidence for the auxiliary status of essere: it is an auxiliary in every sentence in which it occurs, i.e. it always shares its subject with another predicative element (with an adjectival predicate as a copula, with a PP as a locational predicate, and with a participle as a temporal or passive auxiliary). Avere, on the other hand, is like have in English: it is a main verb in the simplest sentences in which it occurs (John has a bicycle), it may take small clausal complements (John has his door open), also with participles as their predicate (John has fugitives hidden). Under the latter structural analysis of (8a), agreement is predicted. In short, most of the input with avere either requires or is consistent with a non-auxiliary interpretation of avere. I further claim that the child will assume that a particular form instantiates the same lexical requirements. Analysing avere as a main verb will not only allow, but in fact impose the relevant small clause interpretation of (8a). At the same time (8d) cannot be formed. Only input of the type (8d) then forces


the child to revise his initial assumption concerning avere, so as to also use it as an auxiliary verb. This explains the developmental delay of the auxiliary avere. Under this account of the developmental delay which is illustrated by the deviations from the pattern in (8), the grammatical development can be formulated in terms of growth. What grows is the system that at first takes avere to be a main verb, selecting its own subject, but then allows an interpretation of avere as an auxiliary. This process of growth does not require a maturational account, as sensible explanations for the delay in identifying the auxiliary status of avere are available. Hence, the only shift is one of adding lexical information, which is compatible with learning.

5. A-CHAINS

The major motivation for Borer and Wexler's maturational approach derives from the growth of passives. Verbal passives do not occur at early stages. Borer and Wexler explain this developmental delay by assuming that A-chains mature. To be sure, certain passives do occur at stages before the alleged maturation of A-chains, but they analyse these as adjectival passives. Unlike verbal passives, adjectival passives do not involve A-chains, i.e. are not created by syntactic movement, but rather formed in the lexicon, as proposed by Wasow (1977). In this section I shall first argue against the claim that A-chain formation is unavailable at the relevant stage. This argument is based on the analysis of ergative verb constructions, which also involve the formation of A-chains. In 5.2. I turn to passives, and argue that the distinction between the two types of passives is insufficiently clear. I then claim that restrictions on early passives do not result from the absence of A-chain formation, but from independent factors.

5.1. Ergatives

Within the class of intransitives a distinction is made between unergatives, which take their single argument as an external one, and ergatives, the single argument of which is projected internally. This internal argument is moved to the subject position for reasons of Case, as ergative verbs do not assign Case to the NP they govern. If NP-movement is unavailable in early stages of the acquisition process, it follows that children cannot make the distinction between ergative and unergative intransitives in the way GB-theory represents this distinction. More specifically, children must represent all their intransitives as unergatives. This in turn implies that generalisations which are formulated

Markedness and growth

77

in terms of the class distinction will either fail to hold or be captured in other terms. A case in point is the selection of perfective auxiliaries, which is sensitive to the (un)ergativity of the verb (cf. Burzio 1981 for Italian, Hoekstra 1984 for Dutch). Dutch children are correct in this respect very early, long before the purported emergence of A-chains. The same appears to be true for Italian children. This implies that they are sensitive to the distinction. If the distinction is not represented in the way it is assumed to be in the adult grammar, the mechanism for auxiliary selection should equally be different. This would raise the question of why children would ever change their system.

In order to motivate the claim that children represent ergative and unergative predicates in the same way, viz. as unergatives, B&W adduce cases of overgeneralisation of lexical causativisation, reported for English children by Bowerman (1982). So, apart from transitives such as John broke the glass, related to the ergative intransitive The glass broke, children are reported to form, alongside unergative intransitives like I sneezed, transitive causatives like Daddy's cigar sneezes me. To explain this, B&W assume that, given the fact that they also have to represent intransitive break as unergative, children are forced to formulate a causativisation rule that is marked, while in adult English causatives are formed by an unmarked rule. The marked rule requires the internalisation of an external argument, while the unmarked one would solely add an external causer argument to a verb that did not yet have an external argument. It is only after the maturation of A-chain formation that the child realises that some of the causative/inchoative patterns are consistent with the unmarked instantiation of the rule. Once this is realised, the child stops overgeneralising, as he drops the assumption that the marked rule is operative in the language he is learning.

There are several problems with this analysis. The most basic of these is that the hypothesis lacks a perspective on the way in which the difference between an ergative and an unergative representation of a particular item is determined. It is unclear, therefore, how, after the A-chain mechanism has come into the child's reach, he finds out which of his intransitive verbs have an erroneous representation, given that an unergative representation is still available after the emergence of A-chain formation. Related to this is the observation that reported cases of overgeneralisation are not random. I shall elaborate on this matter below. First, however, I would like to consider the notion of marked causative rule itself. From a crosslinguistic point of view, the notion of a marked causative rule such as the one employed by B&W seems highly suspect. They notice in passing that the English rule makes use of a zero-affix. Such zero-causative formation of the English type occurs in many languages, but the rule always seems to be restricted to ergative verbs. On the other
hand, causative formation that makes use of overt affixation usually is not restricted in this way. The discussion of the Hebrew hifil by B&W makes clear that this is true for Hebrew. While an exploration of this correspondence is outside the scope of this paper, it can hardly be considered a coincidence. The relevant generalisation comes close to UEAP in a certain sense, in that what it boils down to is that no morphologically simple verb can have two external arguments, while a morphologically complex verb, part of which can be taken to represent the causative meaning, may allow an external argument on top of the external causer argument. This should follow from a principle of UG (cf. Hoekstra, forthcoming, for discussion). If that is correct, the marked rule of causative formation would fall outside the scope of UG, and would therefore be excluded by the program of UG-constrained maturation advocated by B&W.

If the assumption that children can have no ergative representation is given up, the overgeneralisation of the causative rule can be used to make the opposite claim. Rather than saying that the overgeneralisations are the result of an overly general rule, they might be attributed to an overgeneralised use of ergative representation, i.e. the verb sneeze in the above example might have been represented as an ergative verb, thus falling within the reach of the unmarked causative rule, in effect the only rule consistent with UG, if I am correct.

This brings us to the question mentioned above, namely how the choice between an ergative and an unergative representation is determined. It is clear that the notion of external argument used above is semantically determined. Certain argument roles qualify as external, others as internal, still others are perhaps less clear in this respect. We are referring here to the idea that there is a universal basis for the alignment of participant roles and their linguistic representation, known as the Universal Alignment Hypothesis (UAH) (cf. Pesetsky 1987). The principle has been questioned in the current literature, on the basis of variation between languages with respect to the set of ergative verbs. However, such variation does not militate against UAH: the meaning of translation equivalents may be different in a subtle way, e.g. Italian ergative arrossire 'become red' is used to denote the kind of event for which Dutch uses unergative blozen 'blush'. Yet, the fact that they constitute translation equivalents is not sufficient ground for taking them to mean the same thing: arrossire denotes a change of state, whereas blozen denotes a bodily function, behaving in similar fashion to verbs like lachen 'laugh' etc. Hence, whereas certain concepts determine a unique linguistic meaning, with a unique set of argument roles, other concepts less clearly determine the argument roles of the verb, and in those cases variation between languages is expected. This does not affect the status of UAH: it stipulates for a given argument role how it is projected onto a grammatical function. The point here is
that the fact that the participant roles are not always uniquely determined does not mean that the choice of an ergative or unergative representation is always arbitrary. The essential ingredients of this hypothesis also underlie ideas such as Pinker's (1984) semantic bootstrapping hypothesis. According to UAH, agents are uniformly represented as external arguments, while themes are taken as internal arguments. In dealing with concepts that determine the participant roles less clearly, the child has the same hypothesis space as languages have: in the absence of any grammatical indications, one may wonder whether the sole argument of (adult) sneeze is an experiencer or theme, undergoing a process, or whether it should qualify as an agent. The child might have the same difficulty. Precisely under these circumstances, erroneous representations are to be expected. The non-random character of the overgeneralisations which are reported follows from this perspective on the nature of the determination of the external/internal status of participant roles.

To sum up, overgeneralisations of the causative rule in English do not provide sufficient motivation for the claim that all ergative verbs are initially represented as unergatives. Such a claim would undercut the essence of the UAH, in the absence of which the way in which arguments are linked up with grammatical functions would be arbitrary in principle. Moreover, the claim requires that children exploit mechanisms which should be excluded as a matter of principle, such as a marked causative rule, as well as mechanisms for e.g. auxiliary selection and agreement which are quite different from the mechanisms assumed for adult systems. None of this is needed if the claim that A-chain formation is unavailable is given up.

5.2. Passives

If the argument in the previous subsection is correct, the maturational account of the delay in passivisation in terms of A-chain formation cannot be correct. As I mentioned above, passive sentences are not really absent in early child language, but, as B&W note, the range of passives in early grammars appears to be restricted in significant ways. They take this as evidence that the child has the ability to construct adjectival passives at an early age, but not verbal passives. Unlike the latter, adjectival passives do not involve movement. The distinction between adjectival and verbal passives was originally made in Wasow (1977) and supported with further evidence by Williams (1982). However, the theoretical underpinnings of the distinction have not gone unquestioned. The logic in Wasow's approach was to isolate a number of "adjectival" contexts and to show that certain types of passives are
not found in these contexts. To give an example, the prenominal position as in (10) is taken to be an adjectival context, i.e. if a participle occurs there, it must be an adjective. The types of passives in (9) are all impossible in this position. This would be explained if these types of passives are impossible qua adjectival passives, i.e. if they could only be derived by movement.

(9)   a. John was believed to be foolish        (raising passive)
      b. Mary was given a book                  (indirect object passive)
      c. The war was prayed for                 (prepositional passive)
      d. John was elected president             (small clause complement passive)

(10)  a. *A [believed to be foolish] person
      b. *A [given a book] girl
      c. *A [prayed for] war
      d. *An [elected president] candidate

As argued in Hoekstra (1984) these criteria are non-explanatory in the sense that they fail to make the correct crosslinguistic predictions. Consider the examples in (11):

(11)  a. het mij tijdens de lunch gegeven cadeau
         the me during lunch given present
      b. de door iedereen ongeschikt geachte kandidaat
         the by everyone unfit considered candidate

The participle gegeven in (11a) can take an NP complement, something which adjectives cannot. In (11b), the participle takes a SC-complement (cf. (10d)). The reason for the ungrammaticality of the English examples in (10) is the Head Final Filter (Williams 1982) or whatever explains this filter. This filter states that the head of a prenominal modifier must occur at the right periphery of its phrase, both in Dutch and in English. Due to a difference in recursive side, English being head-initial, Dutch head-final as far as VP is concerned, the range of possible prenominal participle constructions in English is much smaller than in Dutch. This does not constitute an argument in favour of an adjectival status, nor for a distinction between two types of passive formation. Similar reasoning holds for the other tests that Wasow proposes (cf. Hoekstra 1984 for detailed comments).

I should perhaps stress that I do not claim that there are no adjectival participles. My point is that the way in which the distinction between adjectival participles and verbal participles is made, as well as the consequences for the analysis of passives based on it, are not sufficiently
motivated. In particular, the adjectival nature of a participle cannot simply be taken as sufficient ground for the claim that no movement is involved.

B&W note that early passives are restricted to action verbs, i.e. passives with non-actional verbs such as see, like etc. are not found. They relate this to their claim that children only exploit adjectival passive formation, noting that participles of such verbs do not easily occur in "adjectival" contexts either (cf. *a liked man). Another property of early passives is that they are usually truncated, i.e. without a by-phrase. They furthermore claim that adjectival passives resist by-phrases also. Interestingly, comparative evidence again suggests that the relation they see between the use of by-phrases and passivisability of non-actional verbs is spurious as well. The range of application of passive in English is quite atypical. Many non-actional verbs that may be passivised in English resist passivisation in Dutch, but the use of the Dutch counterpart of the by-phrase is not at all restricted. The same is true for many other languages. Conversely, there are many languages in which passives are always truncated (cf. Siewierska 1985).

However, even if we were to accept a qualitative difference of the type that B&W argue to be characteristic of adjectival versus verbal passives, we need not have recourse to an account in terms of adjectival passive formation versus movement. Current analyses of the passive in GB-theory claim that there is no elimination of the external argument role, but rather that this role is assigned to the passive affix (cf. Roberts 1985, Hoekstra 1986, Jaeggli 1986). The unmarked assumption might be that the passive affix is eligible only for agents, unmarked in the distributional sense. The rise of the by-phrase may be an independent, perhaps maturationally delayed step, which may prompt a more marked hypothesis with respect to the argument roles for which the passive affix is eligible if the language provides positive evidence for such a setting.

Summing up this part, we have seen that the hypothesis that A-chain formation matures at a stage when particular types of passives come into use meets with several problems. If we adopt some version of UAH, children will postulate ergative representations for ergative verbs right from the start, which thus prompts the use of A-movement. This is consistent with the observation that Dutch and Italian children appear to be sensitive to the distinction between ergative and unergative verbs if we look at auxiliary selection in both languages, and participial agreement in Italian. I have argued that this hypothesis might also derive some support from the observation that Dutch children do not seem to overgeneralise the causative rule in the way English children are reported to do. I argued that these overgeneralisations in English are the result of an erroneous interpretation of verbs as ergatives, which obviates the postulation of a marked rule of causativisation. The late appearance of certain types of
passive constructions might be explained in maturational terms, not with regard to A-chain formation, but perhaps in terms of the mechanism involved in the use of the by-phrase. A scenario of this type is consistent with the conception developed above, according to which expansion of the language generated is a function of an expansion at the level of the system. The development of passives can be seen as starting with the initial assumption that the passive morpheme may only receive the role of agent. This accounts for the absence of a large class of passives available in adult English, but the notion itself can be motivated with reference to other linguistic systems, where the use of the passive construction is similarly restricted. If early passives are the result of such a system, there is no reason to invoke the absence of A-chains to explain the absence of certain types of passives.

6. CONCLUSION

In this paper I have discussed various notions of markedness that are applied to the setting of parameter values in the course of language acquisition. I have dismissed a particular type of markedness considerations, viz. those that I called developmental. I argued that the notion of markedness can only be relevant with reference to a particular system. Early stages of language acquisition may result from systems for which particular parameters are not yet relevant. I then addressed the question of what determines the expansion of early systems. Two hypotheses have received a certain prominence, viz. the continuity hypothesis and the maturational approach. While I am in principle sympathetic to the maturational approach, two specific proposals that I discussed appeared open to alternatives which are more in line with a notion of growth. Growth of a language is a consequence of growth of the system, which suggests a notion of complexity, rather than markedness per se. It is an empirical question whether this notion of complexity has a direct relation to the notion of distributional markedness.

FOOTNOTE

*I would like to thank the following persons for conversations about the subject matter: Harry van der Hulst, Hans Bennis, Jan Voskuil and René Mulder. A special thanks goes to Hagit Borer, for giving comments which may have led to clarifications, although she is bound to disagree on a number of points.

REFERENCES

Bowerman, M. 1982. Evaluating competing linguistic models with language acquisition data. Semantica 3. 1-73.
Borer, H. and K. Wexler. 1987. The maturation of syntax. In T. Roeper and E. Williams (eds) Parameter setting. 123-172. Dordrecht: Reidel.
Borer, H. and K. Wexler. 1988. The maturation of grammatical principles. Ms. UC at Irvine.
Burzio, L. 1981. Intransitive verbs and Italian auxiliaries. Doctoral dissertation, MIT.
Chomsky, N. 1986. Knowledge of language: its nature, origin and use. New York: Praeger.
Hoekstra, T. 1984. Transitivity. Dordrecht: Foris.
Hoekstra, T. 1986. Passives and participles. In F. Beukema and A. Hulk (eds) Linguistics in the Netherlands 1986. 95-104. Dordrecht: Foris.
Hoekstra, T., forthcoming. Theta theory and aspectual classification.
Hoekstra, T. and I. Roberts. 1989. The mapping from lexicon to syntax: null arguments. Paper delivered at the Groningen conference "Knowledge and language".
Hyams, N. 1983. The acquisition of parametrized grammars. Doctoral dissertation, CUNY.
Jaeggli, O. 1986. Passive. Linguistic Inquiry 17. 587-622.
Jakobson, R. 1941. Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala.
Kayne, R. 1981. On certain differences between French and English. Linguistic Inquiry 12. 349-372.
Kayne, R. 1986. Principles of participle agreement. Ms. University of Paris VIII.
Pesetsky, D. 1987. Psych predicates, universal alignment, and lexical decomposition. Ms. UMASS at Amherst.
Pica, P. 1987. On the nature of the reflexivization cycle. NELS 17. 483-500.
Pinker, S. 1984. Language learnability and language learning. Cambridge, Massachusetts: Harvard University Press.
Rizzi, L. 1986. Null objects in Italian and the theory of pro. Linguistic Inquiry 17. 501-557.
Roberts, I. 1985 [1987]. The representation of implicit and dethematized subjects. Dordrecht: Foris.
Siewierska, A. 1985. The passive. London: Croom Helm.
Wasow, T. 1977. Transformations and the lexicon. In T. Wasow, P. Culicover and A. Akmajian (eds) Formal syntax. 327-377. New York: Academic Press.
Wexler, K. and R. Manzini. 1987. Parameters and learnability in binding theory. In T. Roeper and E. Williams (eds) Parameter setting. 41-76. Dordrecht: Reidel.
Williams, E. 1982. Another argument that passive is transformational. Linguistic Inquiry 13. 160-163.

Nativist and Functional Explanations in Language Acquisition

James R. Hurford
University of Edinburgh

1. PRELIMINARIES

1.1. Setting and Purpose

Current theories of language acquisition and of linguistic universals tend to be polarised, adopting strong positions along dimensions such as the following: formal (or nativist) versus functional; internal versus external explanation; acquisition of language versus acquisition of communication skills; specific faculté de langage versus general cognitive capacity. As with many enduring intellectual debates, there is much that is convincing and plausible to be said on each side. Some works are very polemical, apparently conceding little merit in the opposing point of view. Some so-called 'functional' explanations of language universals, which appeal to properties of performance mechanisms, e.g. the human parser, miss the important point that these mechanisms are themselves innate and as much in need of explanation as the properties of the linguistic system. Another class of proposed functional explanations for language universals, which appeals to the grammaticalisation of discourse patterns, fails to locate this mechanism in the life-cycle of individual language-knowers. On the other hand, some nativist explanations imply that they are complete, having finally wrapped up the business of explaining language acquisition, missing the point that the demand for explanations never ceases, and that the 'solution' to any given puzzle immediately becomes the next puzzle.

The appearance of a direct confrontation between nativist and functional styles of, or emphases in, explanations of language acquisition and linguistic universals was greater in the 1970s than it has been recently, as Mallinson (1987) emphasises. Golinkoff and Gordon (1983) give a witty, but fairly accurate, historical account of the pendulum-swings and emphasis-shifts in the debate since the inception of generative grammar. Regrettably, the embattled spirit of the barricades survives in some quarters, as in Newmeyer's (1983) review of Givon (1979), itself a sharp polemic, and in the exchange between Coopmans (1984) and Hawkins (1985).

In an area where polemic is so rife, the truth-seeker can be distracted or misled by a number of false trails which it is as well to be able to
recognize in advance. The following are some types of distraction to be ever vigilant for:

(1) Unannounced theory-laden use of everyday terms, such as 'language', or 'universal' (for instance, using 'language' to mean just the unmarked core grammar, or excluding phonology);
(2) The assumption of a monolithic research enterprise, such that a criticism of any single aspect of it is taken as a blanket attack on the whole; and
(3) Sheer mistaking of an opposing position, taking it to be something other (even the opposite) of what it really is (a distressingly frequent type of mistaking involves elementary failure to distinguish between 'all' and 'some' in an opponent's exposition).

I assume that the readership of this book will not consist wholly, or even largely, of convinced generative linguists, but will include people such as psychologists studying language acquisition, linguists with a more anthropological emphasis, philosophers who ponder issues of language structure and use, sociolinguists, and theorists of historical language change, to all of whose work logical issues in language acquisition are relevant. Since I am concerned with outlining a synthesis of approaches accessible to workers in these different areas, my points will typically be at a quite general level, and I will often resort to quoting relevant work from the various fields. The distinctions I discuss will tend to be broad distinctions between domains of study, rather than the finer distinctions identified by workers within domains. Seekers after very specific proposals about models and mechanisms will not find them here. But, at this general level, I will propose a model for the interaction of language use and language acquisition, in which I believe all students of language, from psycholinguists through 'core' linguists to sociolinguists and historical linguists, will be able to identify a part which is theirs.

A colleague has likened this attempt at synthesis to waving a flag in the no-man's-land between two entrenched armies shooting at each other, with the consequent likelihood of finding oneself full of bullet-holes. But the military metaphor is, one hopes, inappropriate to scholarly work. Synthesizing, integrating work must be attempted. This is not to discourage any individual researcher from trying to mount a strong case that such-and-such an aspect of language should be attributed to the influence of the innately structured LAD (or alternatively to what I shall call the Arena of Use), nor to dissuade any rival researcher from trying to demolish such a case, on theoretical or empirical grounds. Indeed such efforts, locally partisan as they are, are the sine qua non of the growth of knowledge in the field. What I am trying to discourage is a dismissive, globally partisan, academically totalitarian kind of view, that holds that explanations from innateness (or, for the opposing partisan, from use) are simply not worth serious consideration, on either theoretical or empirical grounds.

1.2. Glossogenetic and Phylogenetic mechanisms

The dimension of diachrony, only skimpily treated in previous discussions, provides a coherent background within which function and innateness can be consistently accommodated. Functional explanations of language acquisition can be compatible with nativist explanations, provided one gets the timescale right. The much-debated dichotomy, innate versus functional, is a red herring. The basic dichotomy is, rather, phylogeny versus ontogeny, and also the related nature versus culture. Function is not 'opposed' to any elements in these dyads, but exerts its influence on all.

The issue of the relation between linguistic development and other (cognitive, social, etc.) experience can be set in different timescales, short-term or long-term. Such experiences may be directly involved with linguistic development within the time-span of an individual's acquisition of his language, a period of a few years; or, at the other extreme, the outcomes of experiences of members of the species over an evolutionary timescale lead to the natural selection of individuals innately equipped to acquire systems with particular formal properties. The idea of short-term (ontogenetic or glossogenetic) timescales versus long-term (phylogenetic) timescales in explanations for linguistic facts is important to an overall view of the relation between function and innateness. The term 'glossogenetic' reflects a focus on the development and history of individual particular languages; language-histories are the rough cumulation, over many generations, of the experiences of individual language acquirers. The biological endowments of successive generations of language acquirers in the history of a language do not differ significantly, and so linguistic ontogeny, and its cumulation, language history, or glossogeny, are to be distinguished from linguistic phylogeny, the chronologically vastly longer domain, in which biological change, affecting the innate language faculty, takes place.

After the present section of preliminaries, the second and main section of this paper will be devoted to the short-term, onto- or glossogenetic mechanism of functional influence on language form. A detailed exposition of the phylogenetic mechanism of functional influence on language form is, unfortunately, too long to be included in this collection of papers, and is to be published elsewhere (Hurford, 1991). The phylogenetic mechanism is mentioned briefly by Chomsky and Lasnik (1977:437), but although their note has been echoed by various subsequent authors (e.g. Lightfoot, 1983:32, Newmeyer, 1983:113, Lasnik, 1981:14, Wexler, 1981:40), it has not initiated an appropriate strand of research into functional explanations of language universals at the level of evolution of the species. Despite acceptance of the premise that functional explanations for linguistic universals do operate at the level of evolution of the species,
remarkably little further gets done about it. Contributions from linguists, of whatever theoretical persuasion (e.g. Lightfoot's section "Evolution of Grammars in the Species" (Lightfoot, 1983:165-169) and Givon's chapter "Language and Phylogeny" (Givon, 1979:271-309)) remain sketchy, superficial, and anecdotal. On the other hand, a more promising sign is Pinker and Bloom's (1990) paper, in which they systematically address some of the major skeptical positions (e.g. of Piattelli-Palmarini, 1989, Chomsky, and Gould) concerning natural selection and the evolution of the language faculty. Several other articles (Hurford, 1989, 1991a, 1991b; Newmeyer, forthcoming) make a start on working out proposals about how quite specific properties of the human language faculty could have emerged through natural selection.

To whet the reader's appetite, without, I hope, appearing too enigmatic or provocative at this stage, I give here a short paragraph with a diagram (Figure 1), sketching the phylogenetic mechanism, and a table (Table 1), summarising the major differences between the glossogenetic and the phylogenetic mechanisms.

Deep aspects of the form of language are not likely to be readily identifiable with obvious specific uses, and one cannot suppose that it will be possible to attribute them directly to the recurring short-term needs of successive generations in a community. Here, nativist explanations for aspects of the form of language, appealing to an innate LAD, seem appropriate. But use or function can also be appealed to on the evolutionary timescale, to attempt to explain the structure of the LAD itself. The phylogenetic explanatory scheme I envisage is as follows:

Fig. 1. [Diagram: Biological mutations and factors involved in successful communication in the human environment (the Arena of Use) feed into the Language Acquisition Device.]

Here biological mutations plus functional considerations constitute the explanans, and the LAD itself constitutes the explanandum. The LAD is part of the species' heredity, the result of mutations over a long period.


TWO TYPES OF FUNCTIONAL EXPLANATION

                                GLOSSOGENETIC                           PHYLOGENETIC
                                (Sec. 2 of this paper)                  (Hurford, 1989, 1991b)

Usefulness felt:                In short term (every generation)        In long term (evolutionary timespan)
Transmission:                   Cultural                                Genetic
Knowledge determined by data:   Typically, well determined              Typically, poorly determined
Innovation by:                  Invention, creativity of individuals    Biological mutation
Typical explanandum:            Language-specific                       Universal
Competition in Arena of Use:    Between languages (Ln vs Ln+1)          Between classes of languages
Motivating analogy:             Language as a TOOL                      Language as an ORGAN

Table 1.

Much of the present paper will be an extended commentary on the rubrics in this table, especially those in the 'Glossogenetic' column. Before getting down to the details of the glossogenetic mechanism in Section 2, there are a couple of general preliminary plots to be staked out, in the remainder of this section.

1.3. Competence/performance, I-Language/E-language

Explanations differ according to what is being explained. This is a truism. But much discussion of 'explaining linguistic phenomena' uses that phrase to smother an important distinction, the distinction between grammaticality and acceptability (competence and performance, I-language and E-language). The distinction is central to the Chomskyan enterprise, and has been a frequent target of attack, or source of misgivings. In the literature, for instance, one finds widely-read authors writing:

"The distinction between competence and performance - or grammar and speaker's behavior - is ... untenable, counterproductive, and nonexplanatory". (Givon, 1979:26)

"The borderline between the purely linguistic and the psychological aspects of language ... may not exist at all". (Clark and Haviland, 1974:91)

"There is a whole range of different objections from sociolinguists, sometimes querying the legitimacy of drawing the [competence/performance] distinction at all". (Milroy, 1985:1)

Givon's book is still widely discussed, Herb Clark is an influential psychologist, and Lesley Milroy speaks for a body of sociolinguists for whom the competence/performance distinction itself is still a current issue. In the context of their expressed doubts about competence/performance (alternatively I-language/E-language), and concomitantly grammaticality/acceptability, it is relevant to reassert this distinction. Despite such doubts and attacks, I will maintain here that many clear cases of the distinction exist, while conceding that there are borderline linguistic phenomena whose classification as facts of grammar or facts of use is at present problematic. Some early, perhaps overhasty, conclusions claiming to have explained aspects of grammar in functional terms can now be reinterpreted as explaining phenomena more peripheral to the grammatical system, such as stylistic preference, or acceptability. For instance, this is how Newmeyer (1980:223-226) depicts Kuno's various functional explanations: 'Kuno's approach to discourse-based phenomena has gradually moved from a syntactic one to one in which the generalisations are to be stated outside of formal grammar' (Newmeyer, 1980:224). Such reinterpretation follows shifting (and, one hopes, advancing) theories of the boundary between grammatical phenomena proper and acceptability and style.

For concreteness, I will give some examples, all for Standard English, of how I assume some relevant phenomena line up:

(1)  GRAMMATICAL, BUT OF PROBLEMATIC ACCEPTABILITY
     Colourless green ideas sleep furiously.
     The mouse the cat the dog chased caught ate some cheese.
     The horse raced past the barn fell.

(2)  UNGRAMMATICAL, AND OF PROBLEMATIC ACCEPTABILITY
     *He left is surprising.
     *The man was here is my friend.

(3)  UNGRAMMATICAL, BUT OFTEN ACCEPTABLE
     *He volunteered three students to approach the Chairman.
     *She has disappeared the evidence from her office.

A degree of relative agreement between individuals, and certainty within individuals, about the above examples does not mean that there can't be genuine borderline cases. There may well be slight differences between individuals in their genetically inherited language faculties1, and the input data is certainly very variable from one individual to another, as is the wider social context of language acquisition. And (any individual's instantiation of) the language acquisition device itself may not be structured in such a way as to produce a classification of all possible wordstrings with respect to their grammaticality. This classification of patterns of linguistic facts as grammatical or otherwise does not depend, circularly, on the kind of explanatory mechanism one can postulate for them, but rather primarily in practice (though by no means wholly in principle) on that classical resource of generative grammar, native speaker intuitions of grammaticality (themselves not always easily accessible).

In fact, from a linguist's viewpoint, the sentences (1-3) constitute a heterogeneous bunch, conflating much more interesting distinctions which these very sentences, if aptly exploited, could well emphasise, for instance, grammaticality versus parsability, grammaticality versus first-choice parsing strategies, semantically correct versus conceptually empty sentences etc. But I am stressing here a more basic point. The grammaticality/acceptability distinction, paralleling the competence/performance (I-language/E-language) distinction, is an absolutely crucial foundation upon which the further much more interesting distinctions can be elaborated. Only if it is accepted can one progress to the more interesting distinctions.

In this paper my concern is to investigate the relationships obtaining between the domain of grammar, on the one hand, and nongrammatical, e.g. processing-psychological and social, domains, on the other hand. For my purposes, as it turns out, these other domains can, at a broad general level, be lumped together, so far as their role in potential functional explanations for aspects of linguistic competence is concerned, although obviously a study with a different focus of attention would immediately separate and distinguish them. Sociolinguistics, pragmatics and discourse analysis, and psycholinguistics are disciplines with highly divergent goals and methodological styles. (Thus 'functional explanation' is likely to be interpreted in different ways by sociolinguists and psycholinguists.) Chomsky is entirely right in emphasising that a language (E-language) is an artifact resulting from the interplay of many factors. Where I differ from his judgement is in my belief that this artifact is of great interest, that it is susceptible to systematic study (once its diverse component factors are identified), and that it can in fact affect grammatical competence (I-language).

Given the grammaticality/acceptability distinction, and a classification,
however tentative, of linguistic facts according to this distinction, the search for explanations must provide appropriate explanatory mechanisms for the different kinds of linguistic phenomena. The explanatory task for grammaticality facts can be couched fairly naturally in terms of language acquisition: 'How does a person acquire a particular set of intuitive judgements about wordstrings?' But the explanatory tasks for the various diverse classes of acceptability facts are not naturally couched in terms of language acquisition. Different kinds of questions require different kinds of answers, but this does not mean that, for example, perceptual strategies can ultimately play no part in explaining how a child acquires certain grammaticality judgements. And, conversely, it does not mean that grammatical facts (competence) can play no part in processing. (To the linguist convinced of the grammaticality/acceptability distinction, processing necessarily involves grammatical facts.) But as the mechanisms which give rise to competence obviously differ in their 'end products' from the mechanisms which give rise to acceptability facts (performance), the details of the two kinds of mechanisms themselves must be different.

The reasons for distinguishing competence from performance are very well set out, in partly Saussurean terminology, by Du Bois (1985):

"Saussure (1959:11-23, 191ff) demarcates sharply between what he calls internal linguistics, the study of langue, and external linguistics, which encompasses such significant fields of study as articulatory phonetics, ethnographic linguistics, sociolinguistics, geographical linguistics and the study of utterances (discourse?), all of which deal with positive facts. Classical structuralism thus establishes a gulf between the two spheres, so that structuring forces or organizing principles which operate in the one domain will not affect the other. Though this formulation will be seen to be too one-sided, given its assumption that langue is in principle independent of structuring forces originating outside it, I will suggest that the distinction between internal linguistics and external linguistics nevertheless remains useful and in fact necessary. I will draw on this distinction to show how certain phenomena can be at the same time unmotivated from the generative synchronic point of view and motivated from a genuinely metagrammatical viewpoint which treats grammars as adaptive systems, i.e. both partially autonomous (hence systems) and partially responsive to system-external pressures (hence adaptive). This will be fruitful only if we recognise the existence of competing motivations, and further develop a theoretical framework for describing and analysing their interaction within specified contexts, and ultimately for predicting the resolution of their competition. This (panchronic) approach to metagrammar is part of the developing theory of what has been called the ecology of grammar (Du Bois, 1980:273)." (1985:343-344)

The ecological metaphor is also taken up, independently, in Hurford (1987). While I am in sympathy with Du Bois's approach, and regard it as an admirably clear statement of the system/use dilemma that modern linguistics has forged for itself, I believe Du Bois has not gone as far as he might in considering the ontology of grammar. That is, he still tends, in a Saussurean way, to treat grammatical systems as abstractions, with their own laws and principles, without locating them in the minds of speakers. And he does not locate the mechanism of grammaticisation in the Chomskyan LAD, which, I believe, is where it belongs.

Sociolinguists' difficulties with the competence/performance distinction stem largely, according to Milroy, from the problem of language variation. And several current models of language acquisition respond to the pervasive fact of variation by proposing that the linguistic competence acquired is itself variable. Thus Macken (1987) proposes that acquired grammars are partly 'algebraic' and partly 'stochastic'. And the 'competition model' (Bates and MacWhinney, 1987; MacWhinney, 1987a,b) assumes that:

This echoes early attempts to reconcile sociolinguistic variation with generative grammar's view of competence; cf. Labov's (1969) idea of 'variable rules', its development by Cedergren and Sankoff (1974), and critical discussion by Romaine (1982:247-251). The facts of linguistic variation and gradual linguistic change lead Kroch (1989) to propose another possibility, distinct from both the 'single discrete competence' and the 'probabilistic competence' views. "If we ask ourselves why the various contexts of a linguistic alternation should, as a general rule, be constrained to change in lock step, the only apparent answer consistent with the facts of the matter is that speakers learning a language in the course of a gradual change learn two sets of well-formedness principles for certain grammatical subsystems and that over historic time pressures associated with usage (presumably processing or discourse function based) drive out one of the alternatives". (Kroch, 1989:349)

This echoes a long tradition in linguistics (cf. Fries and Pike, 1949). It is hard, perhaps impossible, to distinguish empirically between a situation where a speaker knows two grammars or subsystems, corresponding, say, to 'New Variety' and 'Old Variety', and a situation where a speaker knows a single grammar or subsystem providing for a number of options, where these options are associated with use-related labels, 'Old' and 'New'. Plural competences would certainly be methodologically more intractable to investigate, presenting a whole new, and more difficult, ballgame for learnability theory, for instance. On the other hand, plural competences do presumably arise in genuine cases of bilingualism, and so the LAD is equipped to cope with internalizing more than one grammar

94

James R. Hurford

at a time. Perhaps plural competences are indeed the rule for the majority of mankind, and the typical generative study of singular monolithic competence is a product of concentrating on standardised languages (a point made by Milroy). The question is forced on us by the pervasive facts of statistical patterning in sociolinguistic variation, even in the usage of single individuals, and language change. And the question is highly relevant to language acquisition studies, as McCawley (1984:435) points out: 'Do children possess only one grammar at a time? Or may they possess multiple grammars, corresponding to either overlapping developmental stages, or multiple styles and registers?' In what follows I will simply assume that statistical facts belong to the domain of performance and pragmatics (e.g. rules of stylistic preference or, more globally, rules of 'code choice'), whereas facts of acquired adult grammatical competence are not to be stated probabilistically. I do not claim to have argued this assumption, or demonstrated that the variation problem must be handled in this way. But one cannot explore all the possibilities in one article, and I shall explore here how the interplay of grammar and use might be envisaged, if one banishes probabilities from the realm of competence. The research challenge then appears as the twin questions: 'How does all-or-nothing competence give rise to phenomena in which statistical distributions are apparent?' and 'How does exposure to variable data result in all-or-nothing competence?' Possibly, these are the wrong research questions to ask, but the only way to find out is by seeing how fruitful theorising along these lines turns out to be. Other researchers may pursue other assumptions in parallel. In a later subsection (2.3), I will discuss the phenomenon of grammaticalisation, in which, over time, a statistical pattern of use (as I assume it to be) gets fixed into a nonstatistical fact of grammar. 1.4. The ambiguity

of'functional'

Opponents of nativist explanations for linguistic universals often contrast the Chomskyan doctrine of an innate Language Acquisition Device with a form of explanation labelled 'functionalist'. Such functionalist explanations point to the use of language as accounting for the properties of linguistic systems. But typically in such accounts, one of two distinct aspects of 'use' is emphasised. Hyman identifies this ambiguity clearly: "Unfortunately, there is disagreement on the meaning of 'functional' as applied in this context. While everyone would agree that explanations in terms of communication and the nature of discourse are functional, it became evident in different presentations at this workshop that explanations in terms of cognition, the nature of the brain, etc., are considered functional by some but not by other linguists. The distinction appears

Nativist and Functional Explanations in Language Acquisition

95

to be that cognitive or psycholinguistic explanations involve formal operations that the human mind can vs. cannot accommodate or 'likes' vs. 'does not like', etc., while pragmatic or sociolinguistic explanations involve (formal?) operations that a human society or individual within a society can vs. cannot accommodate or likes vs. does not like". (Hyman, 1984:67-8)

The same kind of distinction between types of functional explanation is noted, but labelled differently, by Bever (1975): "There have been two major kinds of attempts to explain linguistic structure as the result of speech functions. One I shall call the 'behavioural context' approach, the other the 'interactionist' approach. The 'behavioural context approach' argues that linguistic patterns exist because of general properties of the way language is used and general properties of the mind. The interactionist approach argues that particular mental mechanisms guide and form certain aspects of linguistic structure". (Bever, 1975:585-

6)

And Atkinson (1982) makes approximately the same distinction between alternative reductive explanations for language acquisition, which he labels 'cognitive reductions' and 'social reductions'.

The distinction between cognitive and social reductions (Atkinson's terms), between explanations based on an interactionist approach and those based on a behavioral context approach (Bever's terms) is by no means clear-cut. All humans have cognition and all engage in social relations; but social relations are experienced and managed via cognition (and perception). Social relations not thus mediated by perception and cognition are hard, if not impossible, to conceive. A good illustration of a 'social' principle with substantive 'cognitive' content is the Gricean Maxim of Manner, 'Be perspicuous'. This maxim is generally (by now even conventionally!) held up as an example of the influence of social considerations on language use. But 'Be perspicuous' clearly has psychological content. What is perspicuous to one kind of organism may be opaque to an organism with different cognitive structuring. As Grice's work is widely known, this statement in terms of a Gricean maxim is adequate to make the point of the interpenetration of cognitive and social 'functional' factors. Sperber and Wilson's (1986) Relevance Theory, which claims to have supplanted the Gricean model with a deeper, more general, more explanatory theory of social communication through language, lays great stress on the individual psychological factor of processing effort.2 Speakers' discourse strategies are jointly motivated by what hearers find easy to understand (a cognitive consideration) and by a desire to communicate efficiently (a social consideration). Functional explanations can indeed have the different emphases which Hyman, Bever, and Atkinson all identify, but cognitive and social factors are often intermingled and not easy to separate.

An explanation of some aspect of language structure is functional to the extent that it provides an account relating that aspect of structure to some purpose for which language is used, or to some characteristic of the users or manner of use facilitating achievement of that purpose. The canonical form of a functional explanation is as in (4).

(4)  X has form F because X is used by U and/or for purpose P.

where some clear connection between F (the putatively useful form) and U (the user) and/or P (the purpose) is articulated. The connection between form and user or purpose need not be immediate or direct but may be mediated in some way, provided the plausibility of the connection is not thereby lost.

As a simple concrete example, consider a spade. Parts of its form, e.g. the sharp metal blade, relate directly to the intended purpose, digging into the earth, but other aspects of its form, e.g. its handle and its manageable weight, relate more directly to the given (human) characteristics of the user. Separating out which aspects of spade-design are purpose-motivated and which user-motivated is not easy; likewise it can also be difficult to separate out social (purpose-motivated) functional explanations of language form from psychological (user-motivated) functional explanations.

For the purpose of exploring the relationship between nativist and functional explanations of linguistic phenomena, it will in fact be convenient to continue to deal in terms of a single functional domain, which has both cognitive and social components. This domain, which I will label the 'Arena of Use' and discuss in the next section, is contrasted with the 'internal' domain, the domain of facts of grammar. The Arena of Language Use must figure in any explanation of language form that can reasonably be called a 'functional' explanation.

2. GLOSSOGENETIC MECHANISM OF FUNCTIONAL INFLUENCE ON LANGUAGE FORM

2.1. The Arena of Use

The familiar nativist scheme for explaining the form of grammatical knowledge is shown in Figure 2.

Fig. 2. [Diagram: Primary Linguistic Data feed the Language Acquisition Device, which yields Individual Grammatical Competence.]

In this scheme, the grammatical competence acquired by every individual who learns a language conforms to a pattern determined by innate psychological properties of the acquirer. These innate characteristics are influential enough to impose significant patterning, not obviously discernible in the primary linguistic data, on the acquirer's internalized grammar. Whatever the primary linguistic data (within the range normally experienced by young humans) the competence acquired on exposure to it conforms to the specifications built into the Language Acquisition Device. So, across languages and cultures, adult language-knowers carry what they know in significantly similar forms, studied under the heading of Universal Grammar (UG).

The short-term functional mechanism by which nongrammatical factors can in principle contribute to linguistic phenomena, and ultimately to grammatical competence, can be represented by an extra component added to the Chomskyan diagram (Figure 2), as in Figure 3 below.

Fig. 3. [Diagram: the scheme of Fig. 2 with the Arena of Use added, linking Individual Grammatical Competence back to the Primary Linguistic Data in a cycle.]

What is the Arena of Use? Well, it is non-grammatical, that is, it contains no facts of grammar, although it relates to them. And some of it is non-
psychological, in the sense of being outside the domain of individual mental processes, although it receives input from these, and provides material for them. The Arena of Use does have some psychological ingredients, including those directly involved in linguistic performance. The Arena of Use is where utterances (not sentences) exist. The Arena of Use is a generalisation for theoretical purposes of all the possible non-grammatical aspects, physical, psychological, and social, of human linguistic interactions. Any particular set of temporal, spatial, performance-psychological and social coordinates for a human linguistic encounter is a point in the Arena of Use. So, for example, an address or point in the Arena of Use that I happen just to have visited might be approximately described by the phrase: 'Jim Hurford, sitting in his living room at noon on January 6th, having some cognitive trouble composing an elegant written sentence (strictly an inscription) about the Arena of Use, for an unknown readership, assumed to consist of assorted academic linguists, sociolinguists and psycholinguists.' Another address in the Arena of Use might be 'Mrs Bloggs, at the greengrocer's, asking loudly, since the grocer is a bit deaf, for 2lbs of leeks'.

The Arena of Use is where communication takes place. It embraces human relationships, the ways in which we organise our social lives, the objects that it is important to us to communicate about, the kinds of message it is important for us to transmit and receive. Other creatures, built differently from ourselves, would conduct their communication in, and have it shaped by, a different (though probably partly similar) Arena of Use. So, note, the Arena of Use is itself partly, in fact very largely, a product of our heredity (part of our 'extended phenotype', in Dawkins' (1982) phrase).

The Arena of Use, like UG, has both absolute and statistical properties. A full description of the Arena would specify a definite, obviously infinite, range of possibilities, the coordinates of possible communicative interactions between people using language; and, within this range, the likelihood of the various possibilities being realised would be projected by various principles, in a way analogous to the role played by a theory of markedness within UG. Obviously, we are no nearer to a full description of the Arena of Use than we are to a full description of UG, but central aspects of the nature of the Arena are nevertheless relatively easily accessible for hypothesis and consideration.

The Arena of Use is emphatically not 'everything there is (provided it has no grammatical import)'. And it is certainly not equivalent to, or coextensive with, 'the world' or 'the environment'. I will try to clarify. In the first place, the world is 'out there', existing somehow outside our perceptions. (I assume this, being a Realist and not an Idealist.) In knowing the world, we impose categories on it that are to a great extent our own constructs, though they presumably mesh in some way with the ways things
really are 'out there'. The Arena of Use is not populated by just whatever exists out there, but (in part) by entities that exist-as-some-category. The relevant idea is put thus by Edie (commenting, as it happens, on Husserl):

"For Husserl no 'object' is conceivable except as the correlate of an act of consciousness. An 'object' is thus never a thing-in-the-world, but is rather something apprehended about a thing; objects are things as intended, as meant, as taken by a subject". (Edie, 1976:5, his emphasis, quoted by Fraser, 1989:79)

And Fraser elaborates this eloquently:

"Many of the Objects that we encounter are presented to us as what they are through a filter of our language and culture, rather than being constituted anew by each Subject on the basis of individual experience". (Fraser, 1989:121)

The inclusion in the Arena of Use of abstract objects constituted through language illustrates how it itself (or better, its instantiation in a particular historic language community) is something dynamic and developing. Traugott (1989) discusses a diachronic tendency for meanings of words to develop from concrete denotations of objects and states of affairs to more abstract denotations 'licensed by the function of language' (Traugott, 1989:35). Thus 'everything there is', 'the world', or 'the environment' is quite different from the Life-worlds of individual subjects, speakers of a language; the Life-worlds are in some ways richer, in some ways poorer, than the actual world, although clearly there is a degree of correspondence. The Arena of Use includes the sum of entities (and classes of entities) in the Life-worlds of individual Subjects (speakers) that these subjects can talk about. (This excludes strictly private psychological entities that might be quite real for many individuals, but which they cannot talk about. One reason for not being able to talk about some experience is the lack of appropriate words and/or grammatical constructions, which is why creative writers sometimes resort to novel forms of expression.)

The Arena of Use is not just a union of sets of (classes of) entities. It has structure and texture (much of which remains to be articulated by pragmatic theory). Some (but not all) of its structure is statistical, deriving from the salience (or otherwise) for numbers of speakers of particular classes of entities. Prominent classes of entities in the Arena of Use are those that everyone talks about relatively frequently. Other aspects of the Arena of Use are what Fraser, following Husserl and Heidegger, calls 'points of view' and what Lakoff (1986:49) calls 'motivation'. Humans have purposes, and employ language to manipulate other speakers to help them to achieve those purposes. There are ways in which this is typically done, which gives rise to the taxonomies of Speech Act theory. In fact, any theory of pragmatics contributes to a theory of the Arena of Use, and the categories postulated by pragmatic theorists, such as speaker,
hearer, overhearer, deixis of various types, utterance, situation of utterance, illocution, perlocution, implicature, etc., etc. are all theoretical categories forming part of our (current) picture of the Arena of Use. The Arena of Use is in part the subject matter of pragmatics, and it would clearly be wrong to say that it is 'just everything there is', 'the world', or 'the environment'. If this were so, nothing would distinguish pragmatics from, say, a branch of physics. As for the usefulness of coining the expression 'Arena of Use', my purpose is to focus attention on a vital link in the transmission of language from one generation to the next. Chomsky's similarly ambitious expression 'Language Acquisition Device' has played an enormously important role in focussing theorists' attention on the other important link in the cycle. Clearly, it would have been unimaginative and counterproductive several decades ago to dismiss that expression on the grounds that it simply meant 'child'. It should be clear that the role of the Arena of Use is complementary to that of the LAD, not, of course, in any sense proposed as an alternative to it. And, in fact, just because of this complementarity, studies of UG actually need systematic information about the Arena of Use. Thus Lightfoot (1989a: 326) is forced to resort to a 'hunch' about whether a particular hypothetical social scenario is plausible or 'too exotic', when conducting an argument about whether "The existence of N' might be derived from a property of UG or ... might be triggered by the scenario just sketched". (Grimshaw, 1989:340, complains about the lack of independent evidence backing such hunches.) Obviously it would be too much to expect a theory of the Arena of Use to give a direct answer to this specific question, but, equally obviously, the more systematic a picture of the Arena of Use we can build up, the less we will need to rely on hunches about what the input data available to the child may be. For instance, observation of actual caretaker behaviour is a necessary empirical support to the axiom of 'no negative evidence' central to UG and learnability theory (Lightfoot, 1989a:323-324, Grimshaw and Pinker, 1989:341-342, inter alios but cf. Saleemi, this volume). And a pragmatic theory of why caretakers give little or no negative evidence, if we could get such a theory, would neatly complement the UG and learnability theories. The products of an individual's linguistic competence are filtered by the Arena of Use. In the Chomskyan scheme, the LAD acts partly as a filter. The child in some sense disregards the properties of utterances in the Primary Linguistic Data that do not conform to his innate (unconscious) expectations, the characteristics that cannot be interpreted in terms of the structure already possessed (a function recently emphasised and elaborated on by Lightfoot, 1989a). Similarly, the Arena of Use acts as a filter. Not all the products of an individual's competence serve any
useful purpose, and these are either simply not uttered, or uttered and not taken up by interlocutors. At the level of discourse, the filtering function of the Arena is accepted as uncontroversial. A coherent discourse (monologue or dialogue) is not just any sequence of sentences generated by a generative grammar. The uses to which sentences are put when uttered determine the order in which they may be strung together. With the usual reservations about performance errors, interruptions, etc., sequences which do not serve useful purposes in discourse do not occur in the Primary Linguistic Data to which the child is exposed. At the level of vocabulary, the filtering function of the Arena is also uncontroversial. Words whose usefulness diminishes are uttered less frequently, eventually falling out of use. When they fall out of use, they are no longer present in the PLD and cannot pass into the competences of new language acquirers. What words pass through the cycle in Figure 3, assuming their linguistic properties present no acquisition difficulties, is almost entirely determined by considerations of use. I grant that the relation between vocabulary and use is far from simple, as academic folktales about Eskimo words for snow (cf. Pullum, 1989, Martin, 1986), and Arabic words for camel might lead the gullible to believe. But there is a large body of scholarship, under the various titles of ethnographic semantics, ethnoscience, and cognitive anthropology (cf Brown, 1984, for a recent example), building up a picture of the relation between the structure of a community's vocabulary and its external environment. Clearly the usefulness of words is one part of this picture. One example from Brown is: "The fact that warm hues cluster with white and cool hues with dark contributes to the likelihood that languages will make a "macro-white"/"macro-black" distinction in the initial encoding of basic color categories. A utilitarian factor may also contribute to this development. Basic color categories become important when people develop a need to refer to colors in a general manner. An initial "macro-white"/"macro-black" contrast is highly apt and useful since it permits people to refer to virtually all colors through use of general terms". (Brown, 1984:125)

At the level of a semantico/pragmatic typology of sentences, it also seems plausible that the existence of universal types is perpetuated through the mediation of the Arena of Use, rather than of the LAD. The three-way distinction declarative/interrogative/imperative reflects the three most salient types of speech act used in human interaction. This taxonomy and its grammatical realisation is probably passed on to successive generations via ample exemplification in the Arena of Use, necessitating no extraordinary innate powers of extrapolation from skimpy data by the LAD. A theory of the acquisition of grammatical competence, such as UG, makes
available a range of syntactic forms. Without reference to pragmatics, which provides a classification of the uses to which sentences may be put, there is no account of why three (as opposed to five or nineteen) types of syntactic structure are salient and typically assigned to different uses. What UG cannot account for, without recourse to a pragmatic theory, is this: "There is a wealth of cross-language evidence showing the existence of three or four syntactic structures which code prototypical speech acts in any language:

(a) Declarative
(b) Imperative
(c) Interrogative
    (i) WH-question
    (ii) Yes/No question.

It is hard to find a language in which some "norm" does not exist for (a), (b), (ci) and (cii), i.e. some structural-syntactic means for keeping these four prototypes apart." (Givon, 1986:94)

We can think of UG as providing a theory of the formal/structural resources, or space, available to humans for the expression of useful distinctions. Obviously, a theory of just what distinctions are useful (pragmatic theory, theory of the Arena of Use) is also needed. That is, "One must then strive to discover the underlying socio-psychological parameters which define the multi-dimensional space within which speech-act prototypes cluster". (Givon, 1986:98) Then, interesting discussion can proceed on how specific features of use tend to select specific structural features of sentence form for their expression. Givon's suggestion is that there is an iconic relation between the syntactic forms and their functions, but this clearly needs more fleshing out. Downes (1977) is an interesting paper suggesting why the imperative construction, in particular, occupies the area of syntactic space that it does, e.g. with base form of the verb and suppressed subject. A theory of grammar, such as UG, can make available sentences with null subjects and with base verb forms, but the question arises: Why are these sentences, in particular, typically used to get people to do one's bidding? My intention is not to dispense with the theory of UG. But the allocation of individual aspects of a phenomenon to a theory of grammar-acquisition or a theory of use must be considered on its merits. Perhaps the assignment of 2nd person to the null subject of imperatives, for example, is a blank that UG can afford to leave to a theory of use. This is in fact what Beukema and Coopmans suggest: "... the position is occupied by a case-marked empty element associated with an empty topic, which receives the interpretation of addressee from the discourse". (Beukema and Coopmans, 1989:435)

Besides the declarative/interrogative/imperative pragmatic typology, one could also cite the categories of person and number, which recur in all grammars, as motivated by factors in the Arena of Use. Hawkins puts it concisely: "Innateness is not the only factor to which one can appeal when explaining universals. Certain linguistic properties may have a communicative/functional motivation. If every grammar contains pronouns distinguishing at least three persons and two numbers (cf. Greenberg 1966:96), then an explanation involving the referential distinctions that speakers of all languages regularly need to draw is, a priori, highly plausible". (Hawkins, 1985:583)

The facts of grammatical person are not quite so simple. Foley (1986:66-74) (while subscribing to the same functional explanation as Hawkins for distinctions of grammatical person) mentions languages without 3rd person pronouns, and Mühlhäusler and Harré (1990) claim that even 1st versus 2nd person, as usually understood, is not universal. Nevertheless Hawkins' point stands; it is not surprising that 'the referential distinctions that speakers of all languages regularly need to draw' cannot be described by a simple list, but rather require description in statistical terms of significant tendencies. Hawkins gives a number of further plausible examples, which I will not take the space to repeat. In a more recent, and important, contribution the same author accounts for universal tendencies to grammaticalise certain word orders in terms of certain (innate) parsing principles: "The parser has shaped the grammars of the world's languages, with the result that actual grammaticality distinctions, and not just acceptability intuitions, performance frequencies and psycholinguistic experimental results, are ultimately explained by it. This does not entail, however, that the parser must also be assumed to have influenced innate grammatical knowledge, at the level of the evolution of the species, as in the discussion of Chomsky and Lasnik (1977). Rather, I would argue that human beings are equipped with innate processing mechanisms in addition to innate grammatical knowledge, that the grammars of particular languages are shaped by the former as well as by the latter, and that the cross-linguistic regularities of word order that we have seen in this paper are a particularly striking reflection of such innate mechanisms for processing. The evolution of these word order regularities could have come about through the process of language change (or language acquisition): the most frequent orderings in performance, responding to principles such as EIC [Early Immediate Constituents, a parsing principle], will gradually become fixed by the grammar. One can see the kinds of grammaticalization principles at work here in the interplay between "free" word order and fixed word order within and across languages today. The rules or principles that are fixed by a grammar in response to the parser must then be learned by successive generations of speakers". (Hawkins, 1990:258)

Another case of the influence of phenomena in the Arena of Use on patterns of grammar is discussed in detail by Du Bois (1987). This study attributes the existence of ergative/absolutive grammatical patterning to preferences in discourse structure. The study has the merit of providing substantial statistics on these discourse preferences. The link between such discourse preferences and ergative grammatical patterning is argued for very plausibly. And Du Bois answers the obvious question 'Why are not all languages ergative?' by appealing (again plausibly, I believe) to competing motivations, discourse pressures in several directions. As a final example here of the contribution of the Arena of Use to the form of linguistic phenomena, I cite certain properties of numeral systems, in particular the universal property of being organised on a base number (often 10). There is no evidence that children somehow innately prefer numeral expressions organised in the familiar way using as a baseword the highest-valued available numeral word in the lexicon. Rather, the modern streamlined systems have evolved over long historical periods because of their practical usefulness, and they have to be deliberately inculcated into children. (This argument concerning numerals is pursued in detail in Hurford, 1987, where a computer simulation of the social interactions leading to the emergence of the base-oriented structure of numeral systems is presented.) In summary, the Arena of Use is the domain in which socially useful and cognitively usable expressions are selected to fit the worldly purposes of hearers and speakers. The Arena contributes to the form of languages in a way complementary to the contribution of the Language Acquisition Device. Languages are artifacts resulting from the interplay of many factors. One such factor is the LAD, another is the Arena of Use. The aspects of languages accounted for by these two factors are complementary. As a first approximation, one might guess that the aspects of languages due to the LAD are relatively deep, or abstract, whereas the aspects due to the Arena of Use are relatively superficial, in the sense in which the terms 'deep' and 'superficial' are typically used by generative grammarians. The terms 'deep' and 'superficial' tend to be rhetorically loaded, and imply triviality for superficial aspects of language. One need not accept such a value judgement. The deep characteristics of languages most convincingly attributed to the Language Acquisition Device are those to which the 'poverty of stimulus' argument applies, that is, characteristics which are not likely to be encountered in a sampling of primary linguistic data. Such deep characteristics are thus those which are actually least characteristic of languages, in any normal pretheoretical sense, in the sense of being least obvious. Thus the theoretical style typifying research into the contribution of the Arena of Use is to be expected, in the first place at least,
to be more 'superficial' than research into UG and the LAD. But the intrinsic interest of such a theory is not thereby diminished. A full and helpful discussion of the uses of 'deep' by generative grammarians and others, and of the misunderstandings which have arisen over the term, is to be found in Chapter 8 of Chomsky (1979). Putting aside the use of 'deep' as a possible technical term applied to a level of structure (which I am not talking about here), the term 'deep' can be applied either to theories and analyses or to phenomena and data considered pretheoretically. Those aspects of languages due to the LAD seem, at first pretheoretical blush, to be 'deep', to require theories of notable complexity to account for them. These aspects of a language's structure are subtle; they are not the most obvious facts about it, and, for instance, probably get no attention in courses teaching the language, even at an advanced level. Exactly this point is stated by Chomsky: "We cannot expect that the phenomena that are easily and commonly observed will prove to be of much significance in determining the nature of the operative principles. Quite often, the study of exotic phenomena that are difficult to discover and identify is much more revealing, as is true in the sciences generally. This is particularly likely when our inquiry is guided by the considerations of Plato's problem, which directs our attention precisely to facts that are known on the basis of meager and unspecific evidence, these being the facts that are likely to provide the greatest insight concerning the principles of U G " . (Chomsky, 1986:149)

This subtlety in acquired knowledge after exposure to data in which the subtlety is not obviously present accounts for the rise of complex theories of language acquisition. On the other hand, those aspects of languages due to the Arena of Use (many of which would be located in the periphery of grammars by generativists, like irregular and suppletive morphological forms) seem not to require anything so complex - they are much less underdetermined by data, and thus require no invocation of special deep principles to account for their acquisition. My reservation about not necessarily accepting the value judgements implicit in much current usage of 'deep', stems from the association that has now become established between 'deep' and the language-acquisition problem. In a theory of language cast as a theory of language acquisition, or 'guided by the considerations of Plato's problem', the term 'deep' is applied, naturally, to aspects of language whose acquisition apparently necessitates deep analyses. In this sense, the question of how children acquire irregular morphological forms, for example, is relatively trivial, not deep; the child just observes each such irregularity individually and copies it. (Well, let's say for the sake of argument that the right answer really is as simple as that, which it isn't, clearly.) That's not a deep answer, so the question, apparently, wasn't deep. But seen from another perspective,
the same aspects of language could well necessitate quite deep analyses. If one casts a theory of language as a theory of communication systems 3 operating within human societies (systems transmitted from one generation to the next), then the problem of acquisition is not the only problem one faces. The kind of question one asks is, for instance: Why do these communication systems (languages) have irregular morphological forms? Why do languages have words for certain classes of experience, but not for others? And the answer to these questions may be quite deep, or at least deeper than the answers to the corresponding acquisition questions. (A similar argument is advanced in Ch.1 of Hurford, 1987.) Figure 3, introducing the Arena of Use, is actually a version of a diagram given by H. Andersen (1973). Andersen's diagram looks like this:

Fig. 4.

Andersen is interested in the mechanisms of linguistic change, and makes the basic point that grammars do not beget grammars. Grammars give rise to linguistic data, which are in turn taken and used as the basis for the acquisition of grammars by succeeding generations. Lightfoot (1979) argues on these grounds that there can be no theory of linguistic change expressed as a theory directly relating one grammar to a successor grammar. A theory attempting to predict the rise of new grammars from old grammars purely on grounds internal to the grammars themselves would be attempting to make the spurious direct 'horizontal' link between GRAMMAR n and GRAMMAR n+1 in Figure 4. The zigzag in Figure 4 could be extended indefinitely across the page, representing the continuous cycle through acquired grammars and the data they generate. The LAD belongs on the upward arrows between data and grammars. The Arena of Use belongs on the downward arrows, between grammars and data. In fact Figures 3 and 4 both represent exactly the same diachronic spiral, merely differing in emphasis. Figure 3 is simply Figure 4 rotated and viewed 'from one end'. Pateman (1985), also drawing on this work of Andersen's, expresses very neatly the relationship I have in mind between grammars and social or cultural facts:

"... through time the content of mentally represented grammars, which are not in my view social objects, comes to contain a content which was in origin clearly social or cultural in character". (Pateman, 1985:51)

George Miller also expresses the same thought concisely and persuasively: "Probably no further organic evolution would have been required for Cro-Magnon man to learn a modern language. But social evolution supplements the biological gift of language. The vocabulary of any language is a repository for all those categories and relations that previous generations deemed worthy of terminological recognition, a cultural heritage of common sense passed on from each generation to the next and slowly enriched from accumulated experience". (Miller, 1981:33)

It is worth asking whether the social evolution that Miller writes of affects aspects of languages besides their vocabularies. An argument that it does is presented in Hurford (1987), especially Ch.6. It is clear that much of language structure can be explained by innate characteristics of the LAD; I do not claim that all, or even 'central' (according to some preconceived criterion of centrality) aspects of languages can be explained by factors in the Arena of Use. Bates et al. (1988:235-6) conclude: "we have found consistent evidence for 'intraorganismic' correlations, i.e. nonlinguistic factors in the child that seem to vary consistently with aspects of language development". Such factors belong to the Arena of Use, as defined here, but so far as is yet known, affect only development, and not the end product, the content of adult grammars. On 'extraorganismic' correlations, Bates et al. conclude: "This search for social correlates of language has been largely disappointing". (1988:236). At a global level, one should not be 'disappointed' or otherwise at how scientific results turn out. The question of interest is: 'What aspects of language structure are attributable to the innate LAD, and what aspects to the Arena of Use?' It seems likely that the search for influences of the Arena of Use on acquired grammars will be least 'disappointing' in the marked periphery of grammar, as opposed to the core, as the core/periphery distinction is drawn by UG theorists.

2.2. Frequency, statistics and language acquisition

There is serious disagreement on the role to be played by statistical considerations in the theory of language acquisition. The tradition of learnability studies from Gold (1967), through Wexler and Culicover (1980), to such discussions as Lightfoot (1989a), assumes, but of course does not demonstrate, that statistical frequencies are totally alien to language acquisition. Theorems are derived, within a formal system, from axioms,
whose truth may perhaps be taken for granted by the inventor of the system, but which the system itself can in no way guarantee to be true. The theorems of learnability theory are derived in systems which assume a particular type of definition of 'language', in particular, languages are assumed not to have stochastic properties. But, under a different definition of 'language', different theorems are provable, showing that frequencies in the input data can be relevant to language acquisition. See, for example, Horning (1969), and comments by Macken (1987:391). But, even with a nonstochastic definition of the adult competence acquired, it is still easily conceivable that frequency factors in the input should influence the process of acquisition. Pinker (1987), for example, assumes that adult competence is nonprobabilistic, but proposes a model of acquisition in which exposure to a piece of input data results in the 'strengths' of various elements of the grammar being adjusted, usually being incremented. The point is that in Pinker's proposal one single example of a particular structure in the input data does not automatically create a corresponding all-or-nothing representation in the child's internal grammar; it can take a number of exposures for the score on a given element to accumulate to a total of 1. Presumably, if that number of exposures isn't forthcoming in the input data, that element (rule, feature, whatever) doesn't get into the adult grammar. Learnability theory typically operates with an assumption that the learning device is 'one-memory limited'. This is the assumption that "the child has no memory for the input other than the current sentence-plus-inferred-meaning and whatever information about past inputs is encoded into the grammar at that point". (Pinker, 1984:31)
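
To make the quantitative picture concrete, here is a minimal illustrative sketch, in Python, of an accumulation-to-threshold mechanism of the general kind just described. It is not Pinker's actual model: the increment size, the threshold, and the example inputs are invented purely for the illustration.

from collections import defaultdict

THRESHOLD = 1.0   # an element enters the grammar only at full strength
INCREMENT = 0.25  # hypothetical gain in strength per relevant exposure

def acquire(input_stream):
    strengths = defaultdict(float)  # running strength of each candidate element
    grammar = set()                 # the all-or-nothing adult competence
    for element in input_stream:
        strengths[element] = min(THRESHOLD, strengths[element] + INCREMENT)
        if strengths[element] >= THRESHOLD:
            grammar.add(element)    # repeated evidence, not a single token, fixes it
    return grammar

# Four exposures bring 'aux-inversion' to the threshold; a single slip of the
# tongue never accumulates enough strength, so it stays out of the adult grammar.
print(acquire(["aux-inversion"] * 4 + ["slip-of-the-tongue"]))

On such a scheme the acquired competence itself remains nonprobabilistic; only the route into it is frequency-sensitive.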

But the success of learnability theory does not depend on the assumption that its 'one-memory inputs' correspond to single events in the experience of a child. It is quite plausible that there is some pre-processing front end to the device modelled by learnability theory, such that an accumulation of experiences is required for the activation of each one-memory input. Likewise it is easy to envisage that the setting of parameters in the GB/UG account needs some threshold number (more than one) of exemplars. If there were some theorem purporting to demonstrate that this is alien to language acquisition, one would need to examine carefully the relevant axioms and definitions of terms, to see if they made assumptions corresponding appropriately to data uncovered by real acquisition studies. There are studies revealing relationships between acquired (albeit interim) grammars and statistical properties of the input.

"One consistent and surprising characteristic of early phonological grammars is their close relationship to frequency and distributional characteristics of not the whole language being learned but the specific input. ... (see, for example, Ingram, 1979 on French; Itkonen, 1977 on Finnish; Macken, 1980 on English and Spanish)". (Macken, 1987:385) "... certain acquisition data in conjunction with an interpretation of the relevant evidence and correlations show that there are stochastic aspects to language acquisition, like sensitivity to frequency information". (Macken, 1987:393) "... Gleitman et al. (1984) cite several studies showing that the development of verbal auxiliaries is affected by the statistical distribution of auxiliaries in maternal speech. In particular, mothers who produce a large number of sentence-initial auxiliaries ... tend to have children who make greater progress in the use of sentence-internal auxiliaries ... Because this auxiliary system is a peculiar property of English, it cannot belong to the stock of innate linguistic hypotheses. It follows that auxiliaries have to be picked up by some kind of frequency-sensitive general learning mechanism". (Bates et al., 1988:62)

There are several studies indicating the influence of word-frequency on internalised phonological forms. "Neu (1980) found that adults delete the /d/ in 90 percent of their productions of and, compared to a 32.4 per cent rate of /d/ deletion in other monomorphemic clusters; ... Fidelholtz (1975) has observed less in the way of perceptible vowel reduction for frequent words, and Koopmans-van Beinum and Harder (1982/3) have confirmed this in the laboratory. The frequency-reducibility effect evidently holds even where syllabic and phonemic length are equated (Coker, Umeda and Browman 1973; Wright 1979), and as the effect has little to do with differences in the information content or predictability of high and low frequency words (Thiemann 1982), their different reducibility suggests that frequent (i.e. familiar) words may be stored in reduced form. [Footnote:-] Though it is not my purpose here to deal with the child's role in phonological change, my discussion here ... has an obvious bearing on this subject". (Locke, 1986:248; footnote, 524)

In the framework advanced here, either the rule deleting /d/ is not a rule of phonological competence, but belongs to the Arena of Use, or, if it is a rule of phonological competence, it is an optional rule, with applicability sensitive to factors in the Arena of Use (e.g. speed of speech). Further evidence of a relationship between word frequency and internalised grammars is provided by Moder (1986): "High frequency forms were found to be poorer primes of productive patterns than medium frequency forms. Furthermore, the real verb classes which showed some productivity were those with fewer high frequency forms. Because high frequency forms are often rote-learned [Bybee and Brewer, 1980], they are less likely to be analysed and related morphologically to the other members of their paradigm." (Moder, 1986:180)

Phillips (1984) discusses two distinct kinds of historical lexical phonological change, both clearly correlated, in different ways, with word-frequency:

"Changes affecting the most frequent words first are motivated by physiological factors, acting on surface phonetic forms; changes affecting the least frequent words first are motivated by other, non-physiological factors, acting on underlying forms". (Phillips, 1984:320)

In a generative view of sound change, just as in the view I am advancing here, a sound change cannot 'act on surface phonetic forms', since what differs significantly from one generation to the next is speakers' grammars, and these contain underlying phonological forms and phonological rules, but no direct representation of surface phonetic forms. Phillips does not discuss the micro-implementation of these sound changes at the level of the individual's acquisition of language, but a straightforward interpretation of her results is as follows. Physiological factors (in the Arena of Use) produce phonetically modified forms, whose frequency gives rise, in the language-acquiring generation, to internalised underlying forms closer to the observed phonetic forms. On the other hand, non-physiologically motivated changes arise from what Phillips, following Hooper (1976), calls 'conceptually motivated change', i.e. some kind of reorganisation of the grammar for purposes of maximisation or achievement of some internal property. But these changes, apparently, cannot fly in the face of strong evidence on pronunciation coming from the Arena of Use. Only where such evidence from the Arena is very slight, as with low-frequency words, can the internal grammar reorganisation, for these cases, override the input evidence. Thus, frequency factors from the Arena of Use affect the shape of evolving languages, both positively (pressing for change) and negatively (resisting change). The argument against the relevance of statistical considerations has another strand, which contrasts the subtlety, speed and effortlessness of our grammatical judgments with the poverty of our statistical intuitions, even the most elementary ones (this argument might cite research by Amos Tversky and Daniel Kahneman). There are several points here. Firstly, it is possible to exaggerate the subtlety, speed, and effortlessness of our grammatical judgments. Chomsky points out in many works how our grammatical knowledge needs to be 'teased out' (in the phrase used in Chomsky, 1965). For instance, "Often it is not immediately obvious what our knowledge of language entails in particular cases" (Chomsky, 1986:9), and "... it takes some thought or preparation to see that (13) has the interpretation it does have, and thus to determine the consequences of our knowledge in this case" (ibid: 11). A second point is that the relevant human frequency monitoring abilities are not poor, but quite the contrary, as a seminal publication in the psychological literature shows.

"People of all ages and abilities are extremely sensitive to frequency of occurrence information. ... [In] the domain of cognitive psychology ... we note that the major conclusion of this area of research stands on a firm empirical base: The encoding of frequency information is uninfluenced by most task and individual difference variables. As a result, memory for frequency shows a level of invariance that is highly unusual in memory research. This is probably not so because memory is unique but because memory researchers have paid little attention to implicit, or automatic, information acquisition processes. Here we demonstrated the existence of one such process. We also showed its implications for the acquisition and utilisation of some important aspects of knowledge". (Hasher and Zacks, 1984:1385)

Hasher and Zacks also briefly discuss the relation of their work to that of Tversky and Kahneman; they conclude "... the conflict between our view and that of Tversky and Kahneman is more apparent than real" (p. 1383).

Thus far, my arguments have been that statistical patterns in the input can and do affect the content of the acquired competence, perhaps especially where the language changes from one generation to the next (i.e. where the acquired competence differs from the competence(s) underlying the PLD). There is another, powerful, argument indicating the necessity, for language acquisition to take place at all, of a certain kind of statistical patterning in the input data. This involves what has been called the 'Semantic Bootstrapping Hypothesis', discussed in detail by Pinker (1984), but advanced in various forms by several others. Briefly, the Semantic Bootstrapping Hypothesis states that the child makes use of certain rough correspondences between linguistic categories (e.g. Noun, Verb) and nonlinguistic categories (e.g. discrete physical object, action) in order to arrive at initial hypotheses about the structure of strings he hears. Without assuming such correspondences, Pinker argues, the set of possible hypotheses would be unmanageably large. This seems right. It is common knowledge, of course, that there is no one-to-one correspondence between conceptual categories and linguistic categories - any such correspondence is statistical. Pinker (1984:41) lists 24 grammatical elements that he assumes correspond to nonlinguistic elements. (In Pinker, 1989 the background to the hypothesis is modified somewhat, but not in any way that endangers the main point.) Now, according to the Semantic Bootstrapping Hypothesis, if these correspondences are not present in the experience of the child, grammar acquisition cannot take place. UG theory characterises a class of possible grammars. These grammars, as specified by UG, make no mention of nonlinguistic categories. Of course, for the grammars to be usable, nonlinguistic categories must be associable with elements of a grammar. For instance, the lexical entry for table must, if a speaker is to use the word appropriately, get associated with the nonlinguistic, experiential concept of a table (or tablehood, or whatever).
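
Purely as an illustration of the shape of such a proposal (the particular mapping below is invented for the example and is not Pinker's list of grammatical elements), a seeding mechanism of this kind might be sketched as follows:

# Rough, statistical correspondences between nonlinguistic and linguistic
# categories, used only to seed a first hypothesis about a newly encountered word.
SEEDING_BIAS = {
    "discrete physical object": "Noun",
    "action": "Verb",
    "attribute": "Adjective",
}

def initial_category_guess(perceived_category):
    """Return a first-pass syntactic category, or None if no semantic cue applies."""
    return SEEDING_BIAS.get(perceived_category)

# Hearing 'table' while attending to a discrete object seeds the guess Noun;
# later distributional evidence can override the seed, since the
# correspondence is statistical rather than absolute.
print(initial_category_guess("discrete physical object"))  # -> Noun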

But UG theory makes no claim about how the nonlinguistic categories are related to elements of grammars. A possible grammar, in the UG sense, might be considerably complex, and yet not contain any elements that happened to be associated with concrete physical objects, or actions, for example. And the sentences generated by such a grammar could in fact still be usable, say for abstract discourse, if, miraculously, a speaker had managed to learn it. Such a speaker could, for instance, produce and interpret such sentences as Linguistic entities correspond roughly to nonlinguistic entities, or Revolutionary new ideas are boring. But he could not talk about physical objects or actions. And if the Semantic Bootstrapping Hypothesis is true, his speech could not constitute viable input data for the next generation of learners. Thus a theory which aims to account for the perpetuation of (universals of) language across generations, via the innate LAD, actually requires specific conditions to be met in the Arena of Use. These conditions are not, as it happens, absolute, but are statistical. Of course, I do not claim that statistical properties of input are the only ones relevant to the acquisition of competence. I agree with Lightfoot's position: "It has long been known that not everything a child hears has a noticeable or longterm effect on the emergent mature capacity; some sifting is involved. Some of the sifting must surely be statistical, some is effected through the nature of the endowed properties ..." (Lightfoot, 1989b:364)

Facts of grammar are likely to be distributed along a dimension according to whether their acquisition is sensitive to frequency effects in the input data. Some aspects of grammar may involve very rapid fixing (once the child is 'ready') on the basis of very little triggering experience. Other aspects of grammar may be harder to fix, requiring heavier pressure (in the form of frequency, among other things) from the input experience. This suggested dimension is a graded version of Chomsky's binary core/ periphery distinction. Chomsky seems to acknowledge the greater role of input data for the acquisition of the periphery of grammar: "... we would expect phenomena that belong to the periphery to be supported by specific evidence of sufficient 'density'..." (Chomsky, 1986:147)

Whether or not Chomsky intended frequency considerations to contribute to this 'density', there is no principled reason why they should not. As Pinker writes: "Ultimately no comprehensive and predictive account of language development and
language acquisition can avoid making quantitative commitments altogether. After all, it may turn out to be true that one rule is learned more reliably than another only because of the steepness of the relevant rule strengthening function or the perceptual salience of its input triggers". (Pinker, 1984:357)

Pinker then states a methodological judgement that 'For now there is little choice but to appeal to quantitative parameters sparingly'. I share his apprehension about the possibility of 'injudicious appeals to quantitative parameters in the absence of relevant data', but the solution lies in making the effort to obtain the relevant data, rather than in prejudging the nature (statistical or not) of the theories that are likely to be correct.

2.3. Grammaticalisation, syntacticisation, phonologisation

Previous work has identified a phenomenon of 'grammaticalisation', dealing precisely with historical interactions between the Arena of Use and individual linguistic competences. Some such work is vitiated by a misguided attempt to abolish the competence/performance distinction. Givon (1979:26-31) surveys a number of cases in which, on one view of grammar (a view Givon appears emphatically not to hold) "... one may view a grammatical phenomenon as belonging to the realm of competence in one language and performance-text frequency in another" (26). Givon's examples are: (i) the definiteness of subjects of declarative clauses, obligatory in some languages, but merely preferred in others; (ii) the definiteness of referential objects of negative sentences, obligatory in some languages, but merely preferred in others; and (iii) the lack of an overt agent phrase with passive constructions, obligatory in some languages, but merely the preferred pattern in others. The preferences involved can be very strong, but in the languages where the facts seem not to be a matter of absolute rule, but of preference, one can find isolated examples of the pattern that would be ungrammatical in the other language. In precisely similar vein, though not sharing Givon's conclusions, Hyman (1984) writes that he has been "... intrigued by a puzzling recurrent pattern which can be summarized as in (1)

(1) a. Language A has a [phonological, phrase-structure, transformational] rule R which produces a discrete (often obligatory) property P;
    b. Language B, on the other hand, does not have rule R, but has property P in some (often nondiscrete, often nonobligatory) less structured sense". (Hyman 1984:68)

And Corbett (1983) in an impressively documented study gives many instances where one Slavic language has an absolute rule which is paralleled
by a statistical tendency in some other Slavic language. One such case is:

The agreement hierarchy: attributive - predicate - relative pronoun - personal pronoun

"In absolute terms, if semantic agreement is possible in a given position in the hierarchy, it will also be possible in all positions to the right. In relative terms, if alternative agreement forms are available in two positions, the likelihood of semantic agreement will be as great or greater in the position to the right than in that to the left." (Corbett, 1983:10-11)

Givon offers an alternative view to the one quoted above: "Or one may view the phenomenon in both languages in the context of 'communicative function', as being essentially of the same kind. The obvious inference to be drawn from the presentation is as follows: If indeed the phenomenon is of the same kind in both languages, then the distinction between competence and performance - or grammar and speaker's behaviour - is (at least for these particular cases) untenable, counterproductive, and nonexplanatory." (Givon, 1979:26)

This passage, like other polemical passages in linguistics, is a curious mixture of over- and understatement. It ends like a Beethoven symphony, with repeated heavy chords, slightly varied, but united in their effect 'untenable, counterproductive, and nonexplanatory'. But immediately before is the weakening parenthetical caveat '(at least for these particular cases)', and the whole conclusion is in fact embedded in a conditional, 'If indeed the phenomenon is of the same kind in both languages' [emphasis added, JRH]. So, if the condition is not met and the phenomena are not of the same kind in both languages, the three big guns 'untenable, counterproductive, and nonexplanatory' aimed at the competence/performance distinction don't actually go off. And, even if the condition is met, they may only be aimed at the distinction 'for these particular cases'. Much of Givon's book reflects this kind of rhetorical mixture. The message, if interpreted as urging alternative emphases in linguistic study, is entirely reasonable; more work should be done on the relation between communicative, pragmatic, and discourse phenomena and grammar, and this, thanks to the efforts of people like Givon, is beginning to happen. (Clark and Haviland, 1974 is another work in which a reasonable argument for emphasis on discourse study is in places rhetorically inflated to a claim that "the borderline between the purely linguistic and the psychological aspects of language... may not exist at all", (p.91).) But the relation between grammar on the one hand and discourse phenomena on the other cannot be studied if the two sets of phenomena actually turn out to be the same
thing, as Givon appears in places to believe. There are good reasons to maintain a distinction between facts of grammar and facts of discourse, and Givon manages throughout his book to write convincingly as if the distinction were valid. What is of great interest is the parallelism between the two domains, illustrated by Givon, Hyman, and Corbett, as cited at the beginning of this section. Absolute grammatical rules in one language paralleled closely by statistical discourse preferences in another language may seem something of a puzzle. But the puzzle can be relatively easily resolved. Let me risk giving a nonlinguistic analogy, asking the reader to make the usual mutatis mutandis allowances necessary for all analogies. Some people eat a variety of foods, but, without having made any decision on the matter, happen hardly ever to eat meat; other people are vegetarians by decision, though sometimes they may accidentally eat meat. Some people, as a matter of habit, drink no alcohol; for others, this is not a matter of habit, but of principle. Some people are pacific by nature; others are pacifists on principle. The principled vegetarians, teetotallers and pacifists have made conscious absolute decisions which are parallel to the statistical behavioural tendencies of certain other people. But there is a valid distinction to be made between the two categories. This distinction is not particularly obvious from mere observation of behaviour. But as humans, we have the benefit of (some) self-knowledge, and we know that there is a difference between a principled vegetarian and a person who happens hardly ever to eat meat, and between a principled pacifist and a pacific person. No analogy is entirely apt, however. This one suffers in at least two ways. Firstly, speakers of a language do not normally make conscious decisions, like the pacifist or the vegetarian, about their own rules of grammar; and secondly, the vegetarian/teetotaller/pacifist analogy distinguishes between individuals in the same community, whereas rules of grammar tend to be shared by members of a speech-community. What I hope this analogy demonstrates is that similar overt patterns of behaviour can be attributed to different categories, such as fact of discourse, or fact of grammar. The categories may also be historically related, as I believe discourse and grammar are, but they are not now a single unified phenomenon. I assume, then, that there is factual content to the notion of following an internalised rule. Chomsky (1986), in a lengthy and cogent discussion, disposes quite satisfactorily, in my view, of the Wittgensteinian objections, taken up by Kripke (1982), to attribution of rule-following by other organisms, be they fellow-speakers of one's language, foreign humans, or even other animals. Wittgensteinians (among whom one would include, for instance, Itkonen, 1978) have often objected to the generativists'
interpretation of rules of language as essentially belonging to the psychology of individuals, and a generativist response to this theme of Wittgenstein's is now satisfactorily articulated. (Perhaps the delay in responding arose from the enigmatic style of Wittgenstein's original presentation, and Kripke's reformulation gave the clarity needed for a careful rebuttal.) I uphold the view that rules of language belong to individual psychology, and should not in any collective sense be attributed to communities (e.g. as social norms). Thus far, I agree completely with Chomsky's position on rules and rule-following. But now here is where we part company: "reference to a community seems to add nothing substantive to the discussion" (Chomsky, 1986:242). I maintain, on the contrary, that communities play a role in determining what rules an individual acquires (which is obvious), and, more generally, that general facts about human communal life play a role in determining the kinds of rules that individuals born into any human community acquire. Pateman expresses the idea so well that his words are worth repeating: "... through time the content of mentally represented grammars, which are not in my view social objects, comes to contain a content which was in origin clearly social or cultural in character." (Pateman, 1985:51)

The historical mechanism by which facts of discourse 'become' facts of grammar is often labelled 'grammaticalisation'. To prevent confusion, it should be stressed that the result of this process does not necessarily involve a class of previously ungrammatical strings becoming grammatical. The converse process can also occur. What gets grammaticalised is a pattern, or configuration of facts, not some class of strings which happens to participate in such a pattern. The following are the main interesting possibilities, in terms of two classes of strings, A and B, which are in some sense functionally equivalent (e.g. (partially) synonymous). (5)

A and B are both grammatical, but A is preferred in use.
        ↑↓  Diachronic change in either direction.
A is grammatical, and B is ungrammatical, though B may occur in use

Change in either direction involves a new fact of grammar emerging, which is why such changes are aptly called 'grammaticalisation'. But only change in one direction (upward in (5)) involves previously ungrammatical strings becoming grammatical. Change in either direction would account for the
parallelisms noted by Givon, Hyman, and Corbett. Another possibility is: (6)

A and B, both grammatical, are wholly equivalent in meaning and use.
        ↑↓  Diachronic change in either direction.
A and B, both grammatical, but differ slightly in meaning and use.

As the relation of surface forms to their linguistic meanings is a matter of grammatical competence, this is also a case of the emergence of a new fact of grammar, and aptly called 'grammaticalisation'. How does the mechanism of grammaticalisation work and how does it relate to the question of nativist versus functional explanations? I beg leave to quote myself.4 "In the model proposed, individual language learners respond in a discrete all-or-nothing way to overwhelming frequency facts. Language learners do not merely adapt their own usage to mimic the frequencies of the data they experience. Rather, they 'make a decision' to use only certain types of expression once the frequency of those types of expression goes beyond some threshold. At a certain point there is a last straw which breaks the camel's back and language learners 'click' discretely to a decision about what for them constitutes a fact of grammar. What I have in mind is similar to Bally and Sechehaye's suggestion about Saussure's view of language change. 'It is only when an innovation becomes engraved in the memory through frequent repetition and enters the system that it effects a shift in the equilibrium of values and that language [langue] changes, spontaneously and ipso facto' (Saussure, 1966:143n). Bever and Langendoen (1971:433) make the same point nicely by quoting Hamlet: 'For use can almost change the face of nature'". (Hurford, 1987:282-3, slightly adapted)
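
The 'last straw' idea can be given a concrete, if deliberately toy, form; the ninety per cent threshold below is invented for the illustration and carries no theoretical weight:

def grammaticalise(observations, threshold=0.9):
    """observations: tokens of two functionally equivalent expression types, 'A' and 'B'."""
    share_a = observations.count("A") / len(observations)
    if share_a >= threshold:
        return "only A grammatical"      # the learner 'clicks' to a categorical rule
    if share_a <= 1 - threshold:
        return "only B grammatical"
    preferred = "A" if share_a > 0.5 else "B"
    return "A and B both grammatical; " + preferred + " preferred in use"

# A learner does not merely reproduce a 95% preference for A in its own usage;
# past the threshold it adopts an all-or-nothing fact of grammar.
print(grammaticalise(["A"] * 95 + ["B"] * 5))  # -> only A grammatical

This is, of course, only a restatement of the threshold idea in procedural form, not a theory of where the threshold comes from or of what else the LAD contributes.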

Beyond the kind of vague remarks cited above, no-one has much idea of how grammaticalisation works. Givon's book documents a large number of interesting cases, but his account serves mainly to reinforce the conclusion that grammaticalisation happens, rather than telling us how it happens. And of course the fact that it does happen, that aspects of performance get transmuted into aspects of competence, reinforces, rather than undermines, the competence/performance distinction. But one thing that is clear about grammaticalisation is that the LAD plays a vital part. This emerges from Givon's discussion of Pidgins and Creoles, in which the discrete step from Pidgin to Creole coincides with language acquisition by the first-generation offspring of Pidgin speakers.

"Briefly, it seems that Pidgin languages (or at least the most prevalent type of Plantation Pidgins) exhibit an enormous amount of internal variation and inconsistency both within the output of the same speaker and across the speech community. The variation is massive to the point where one is indeed justified in asserting that the Pidgin has no stable syntax. No consistent "grammatical" word-order can be shown in a Pidgin, and little or no use of grammatical morphology. The rate of delivery is excruciatingly slow and halting, with many pauses. Verbal clauses are small, normally exhibiting a oneto-one ratio of nouns to verbs. While the subject-predicate structure is virtually undeterminable, the topic-comment structure is transparent. Virtually no syntactic subordination can be found, and verbal clauses are loosely concatenated, usually separated by considerable pauses. In other words, the Pidgin speech exhibits almost an extreme case of the pragmatic mode of communication. In contrast, the Creole - apparently a synthesis di novo [sic] by the first generation of native speakers who received the Pidgin as their data input and proceeded to "create the grammar" - is very much like normal languages, in that it possesses a syntactic mode with all the trimmings ... The amount of variation in the Creole speech is much smaller than in the Pidgin, indistinguishable from the normal level found in "normal" language communities. While Creoles exhibit certain uniform and highly universal characteristics which distinguish them, in degree though not in kind, from other normal languages, they certainly possess the entire range of grammatical signals used in the syntax of natural languages, such as fixed word order, grammatical morphology, intonation, embedding, and various constraints". (Givon, 1979:224)

This passage makes the case so eloquently for the existence of an innate Language Acquisition Device playing a large part in determining the shape of normal languages that one would not be surprised to find it verbatim in the introduction to a text on orthodox Chomskyan generative grammar. In my terms, the prototypical Pidgin is a hybrid monstrosity inhabiting the Arena of Use, limping along on the basis of no particular shared core of individual competences. The main unifying features it possesses arise from its particular spatial/temporal/social range in the Arena of Use. When a new generation is born into this range, and finds this mess, each newborn brings his innate linguistic faculty to bear on it and helps create, in interaction with other members of the community, the grammar of the new Creole. The picture just given is, by and large, that of Bickerton's Language Bioprogram Hypothesis (Bickerton, 1981), and is probably correct in broad outline, if no doubt an oversimplification of the actual facts. "Usually, however, the trigger experience of original Creole speakers is shrouded in the mists of history, and written records of early stages of Creole languages are meagre." (Lightfoot, 1988:100) A vast amount of empirical research into the creolisation process needs to be done before interesting details become discernible, but clearly the focal point of the process is the point where the innate LAD meets the products of the Arena of Use. The step from a Pidgin to a Creole is an extreme case of many simultaneous
grammaticalisations across virtually the whole sweep of the (new) language. Creolisation is massive grammaticalisation. But it is also, due to the historical rootlessness of the Pidgin, grammaticalisation with a very free hand. The LAD can impose its default values against weak opposition from the Pidgin PLD. In discussing grammaticalisation, I do not presuppose that the input to the process is necessarily some pattern evident in use. My position is that grammaticalisation is the creation, by the LAD, of new facts of grammar. Where the input is chaotic, the LAD has a very free hand, and the new facts of grammar reflect the LAD's influence almost solely. But where patterns of use exist in the input data, the new facts of grammar may in certain instances reflect those patterns. We can call these latter cases 'grammaticalisations of patterns of use', and the former (dramatic creole) cases 'grammaticalisations from nothing'. Creoles are in some sense more natural than languages with long histories. Languages with long histories become encrusted with features that require non-default setting from the LAD, and even rote-learning. These encrustations are due to innovations in the Arena of Use over many generations. Many of these developments can be said to be functionally motivated. I have already mentioned in passing several historical studies (Bever and Langendoen, 1971, Phillips, 1984) which make at least prima facie cases for the influence, across time, of use on structure. And in section 2.6, I will add to the list of recent historical linguistic studies which point to the role of the Arena of Use in determining, at least in part, the contents of grammatical competence. In these cases, the languages have drifted, due to pressures of use, to become, in some sense, historically more 'mature' than a new creole. It seems reasonable to suppose that sheer statistical frequency of particular patterns in the Arena of Use will play some part in determining what grammatical rules will be formed. This is one way in which a parallelism between discourse patterns and grammatical rules would arise. But of course the LAD is not merely quantitatively, but also qualitatively selective. It is not the case that any, i.e. every, frequent pattern becomes grammaticalised. If this were so, the most common performance errors, hesitation markers and such like would always get grammaticalized, which of course they often don't. (But note that hesitation markers do tend to become fitted into the vowel system of the dialect in question, i.e. to become phonologised. Cf. the various hesitation vowels in RP ([ɜ:]), Scots English ([e:]), and French ([ø:]).) I believe that Lightfoot, in his 1988 paper, somewhat oversimplifies the relation between the qualitative and the quantitative selectivity of the LAD in the following remarks:


"The most obvious point is that not everything that the child hears 'triggers' a device in the emerging grammar. For example, so-called 'performance errors' and slips of the tongue do not entail that the hearer's grammar be amended in such a way as to generate such deviant expressions, presumably because a particular slip of the tongue does not occur frequently enough to have this effect. This suggests that a trigger is something that is robust in a child's experience, occurring frequently. Children are typically exposed to a diverse and heterogeneous linguistic experience, consisting of different styles of speech and dialects, but only those forms which occur frequently for a given child will act as triggers, thus perpetuating themselves and being absorbed into the productive system which is emerging in the child, the grammar." (Lightfoot, 1988:98)

This seems to equate 'potential trigger experience' with 'frequent experience'. Lightfoot has now developed his ideas on the child's trigger experience further (Lightfoot, 1989), but he still holds that some statistical considerations are relevant. While, with Lightfoot, I believe that frequency in the Arena of Use is an important determinant of the grammars that children acquire, there must also be substantial qualitative selectivity in the LAD. Some aspects of competence can be picked up on the basis of very few exemplars, while the LAD stubbornly resists acquiring other aspects for which the positive examples are very frequent. The particular qualitative selectivity of the LAD is what is studied under the heading of grammatical universals, or UG.

2.4. The role of invention and individual creativity

Prototypical short-term functional explanations involve the usefulness of some aspect of a language making itself felt within the time a single individual takes to acquire his linguistic competence (although I shall later mention a version of the same basic mechanism that happens to take somewhat longer). This period may vary from a dozen years, for grammatical constructions, to a whole lifetime, for vocabulary. But, in the prototypical case, a short-term functional explanation involves postulating that each individual acquiring some language recognizes (perhaps unconsciously) the usefulness of some linguistic element (word, construction, etc.) and adds that item to his competence because it is useful. Some universal facts of vocabulary, such as the fact that every human language has at least one word with a designatum in the water/ice/sea/river area, can be illuminated in this way, as can also many language-particular facts, such as those of color, plant, and animal taxonomies worked on in detail by the 'ethnographic semantics' movement (e.g. Brown, 1984). Thus, those aspects of languages for which short-term functional explanations are available are characteristically transmitted culturally. Individuals actually learn these aspects of their language from other members of their community. They are not innate. Such aspects of languages, therefore, are typically well-determined by the observable data of performance, since they need to be sufficiently obvious to new generations to be noticed and adopted.

Obviously, quite a lot is innate in the lexicon too. For instance, no single verb can mean 'eat plenty of bread and...', 'persuade a woman that...', 'read many books but not...'. The constraints on possible lexical meanings are strong and elaborate. My point is that, within such innately determined constraints, the matter of what lexical items a language possesses is influenced by factors of usefulness. Individual inventiveness cannot violate the innately determined boundaries; see Hurford, 1987, Ch. 2, Sec. 5, for a detailed discussion of the relation of individual inventiveness to the capacity for language acquisition.

Aspects of languages transmitted culturally from one generation to the next because of their usefulness have their origins in the inventiveness and creativity (presumably in some sense innate) of the individuals who first coined them and gave them currency. In the field of vocabulary again, it is uncontroversial that new words are invented by individuals, or arise somehow from small groups. Often it is not possible to trace who the first user of a new word was, but nevertheless there must have been a first user. In other parts of languages, such as their phonological, morphological, syntactic, semantic and pragmatic rule components, it is difficult to attribute the origins of particular rules to the creativity of individuals or groups, but even here a kind of attenuated creativity in the use of language, proceeding by small increments over many generations, seems plausible. The approximate story would be of existing rules having their domain of application gradually extended or diminished due to a myriad of small individual choices motivated by considerations of usefulness. Very few rules of syntax are completely general in the sense of having no lexical exceptions. Such sets of lexical exceptions are augmented or lessened continually throughout the history of languages. The specifically functional considerations, that is considerations of usefulness, which motivate such changes in the grammar of a language are of course usually impossible to identify with accuracy, and will remain so until we have much subtler theories and taxonomies of language use (which will help us to define the notion of usefulness itself more precisely). The historical role of invention and creativity that I have in mind is envisaged by Gropen et al. (1989) and described by Mithun (1984):

"Instead, it could be that the historical processes which cause lexical rules to be defined over some subclasses but not others seem to favour the addition or retention of narrow classes of verbs whose meanings exemplify or echo the semantic structure created by the rule most clearly. The full motivation for the dativisability of a narrow class may
come from the psychology of the first speakers creative enough or liberal enough to extend the dative to an item in a new class, since such speakers are unlikely to make such extensions at random. Thereafter speakers may add that narrow class to the list of dativisable classes with varying degrees of attention to the motivation provided by the broad-range rule - by recording that possibility as a brute memorised fact, by grasping its motivation with the aid of a stroke of insight recapitulating that of the original coiners, or by depending on some intermediate degree of appreciation of the rationale to learn its components efficiently, depending on the speaker and the narrow class involved". (Gropen et al., 1989:245)

"But in Mohawk, where NI [ = noun incorporation] of all types is highly productive, speakers frequently report their pleasure at visiting someone from another Mohawk community and hearing new NI's for the first time. They have no trouble understanding the new words, but they recognise that they are not part of their own (vast) lexicon. When they themselves form new combinations, they are conscious of creating 'new words', and much discussion often surrounds such events." (Mithun, 1984:889)

The acts of individual speakers in responding creatively to considerations of usefulness are analogous to micro-events at the level of molecules, and the large movements of languages discernible to historical linguists are analogous to macro-events, such as those described in geophysical terms of plate tectonics (this analogy is Bob Ladd's). Whether or not we call a language in which there has been one micro-change a different language is a question of terminology. Let us adopt, temporarily and for argument's sake, the rigid convention that any one change, however slight, in a language Ln produces a different language Ln+1. This effectively equates 'language' with some abstraction even lower in level than 'idiolect', and so is not a generally useful convention in talking about language⁵. But, adopting this usage, competition in the Arena of Use determines whether Ln or Ln+1 survives. These minimally differing languages may continue to coexist, because neither is significantly more useful than the other, or one may replace the other because it is in some sense more useful. Adopting a different terminological convention, wherein 'languages' are grosser entities, distinguished by masses of detailed differences, it is still competition in the Arena of Use which decides the survival of languages. The 'languages' I have in mind in this paragraph are I-languages. But since they, existing only inside speakers, can never come into contact with each other, the competition between them is actually fought out through the medium of their corresponding E-languages in the Arena of Use. (An approximate analogy would be a tournament acted out by marionette puppets whose behavioural repertoires (kick, punch, etc.) are specified by different programs of their robot operators, though the set of programs available in principle to all robots is the same. When a puppet loses a match, the program in the robot that was running it is eliminated. But remember that no analogy is perfect.)


A schematic representation of the state of affairs postulated in a functional explanation of the short-term type is given in Figure 5, below. Note that the 'languages' mentioned in this diagram are E-languages, since they exist in and through the Arena of Use; that is, they correspond to the competing marionettes of the analogy of the previous paragraph.

[Fig. 5. Short-term mechanism of functionally motivated change (ontogenetic, or glossogenetic, mechanism). The diagram itself is largely lost in the scan; its recoverable labels show a succession of grammars G1, G2, G3, each produced by the LAD, the realised languages L1, L2, L3 to which they give rise in the Arena of Use (AoU), and, below them, unrealised languages compatible with G1, etc. (La, Lb, Lc ...; Li, Lj, Lk ...; Lx, Ly, Lz ...).]

The upper two levels in this diagram indicate the course of actual linguistic history: the actually mentally represented grammars G1, G2, G3, ..., and the actually realised languages L1, L2, L3, ... The bottom level in the diagram represents alternative language histories - what languages might have been realised if the pressures of the Arena of Use had been other than what they actually were. These possible but unrealised languages can be thought of as aborted due to competition in the Arena of Use from a more successful rival language. Competition in the Arena of Use, in the case of this short-term functional mechanism, is therefore between possible languages defined by the same LAD. (Figure 5 is in fact another variant of Andersen's scheme in Figure 4.) The unrealised languages are possible but non-occurring aggregates of real speech events in the language community, alternative courses of history, in effect.

The scheme shown in Figure 5 is obviously idealised in many ways. One aspect of this idealisation worth mentioning is the fact that only one LAD is represented at any transition, whereas in fact language change is mediated by whole populations of LADs (tokens not types), all (1) exposed to different (though partially intersecting) data, (2) possibly themselves subject to some maturational change (see White, 1982:68-70; Borer and Wexler, 1987, 1988), and (3) perhaps even not originally completely uniform. In a real case, some individuals would internalise grammars slightly different from those internalised by others. This difference would be reflected by statistical changes in the Arena of Use, which in turn might prompt a rather larger proportion of language learners in the next generation to acquire grammars with a certain property. In this way, it might take many generations for a whole population to accomplish what with historical hindsight looks like a single discrete change. The term 'ontogenetic mechanism' might well be reserved for a case where a whole nonstatistical language change is achieved in a single generation, rather like the Bickerton/Givon picture of the leap from Pidgins to Creoles. That is, the new (version of the) language grows, fast, in just the time it takes one generation of individuals to acquire/create it. The slower version of the mechanism, which takes more than one generation, could appropriately be called the 'glossogenetic mechanism'. The only difference between the ontogenetic and the glossogenetic mechanism is in the number of generations taken.

2.5. The problem of identifying major functional forces

This picture of functionally motivated language change has its opponents. One of the fiercest and most sustained critiques of this general point of view that I am aware of is in Lass (1980:64-97). Lass's view (in which he is not alone) is summed up in:

"Merely on the evidence provided so far, if my arguments are sound, the proponents of any functional motivation whatever for linguistic change have to do one of two things:
(i) Admit that the concept of function is ad hoc and particularistic and give up; or
(ii) Develop a reasonably rigorous, non-particularistic theory with at least some predictive power; not a theory based merely on post hoc identification plus a modicum of strategies for weaseling out of attempted disconfirmations.

This is the picture as I see it: (i) is of course the easy way out, and (ii) seems to be the minimum required if (i) is not acceptable. I am myself not entirely happy with (i), and it should probably not be taken up - though failing a satisfactory response to (ii) it seems inevitable." (Lass, 1980:79-80)

Lass discusses functional explanation under three subheadings: 'preservation of contrast', 'minimization of allomorphy', and 'avoidance of homophony', and convincingly demolishes claims by various scholars to have explained particular historical linguistic changes in such 'functional' terms. But in fact these attempted explanations are not genuinely functional according to the spirit in which I have argued the term should be taken. It is crucial to note that 'contrast', 'allomorphy', and 'homophony', as Lass uses them, are terms describing a language system, and not language use. In other words, quite clearly, these terms do not describe phenomena in the Arena of Use. Instances of contrast, mean degree of allomorphy, and pervasiveness of homophony can all be ascertained from inspection of a grammar, without ever observing a single speaker in action. This is of course what makes them attractive to many linguists. These are formal properties, in the same way that the simplicity of a grammar, measured in whatever way one chooses, is a formal property. Martinet's 'functional load' is likewise a formal property of language systems, not of language use, which may account for the failure of that concept to blossom as a tool of functional explanation. Obviously, the presence of contrast makes itself felt in the Arena of Use, but then so do most other aspects of grammars.

In fact, an old and important debate in the transition from post-Bloomfieldian structuralist phonology to generative phonology sheds light on the relation between contrast, competence, and functionally motivated language change. The classical, taxonomic, or autonomous phoneme, whose essence was that it was defined in terms of contrast, was the central concept of pregenerative phonology. This was before the emergence of a better understanding of the competence/performance, or I-language/E-language, distinction, that came with the advent of generative linguistics. To the surprise of some, it turned out that generative phonology, conceived as a model of an individual's mentally represented knowledge of the sound pattern of his language, had no place at all for the classical phoneme. The classical phoneme simply did not correspond to any linguistically significant level of representation in competence grammars. The phonemicists who found this puzzling had no arguments against this conclusion, yet puzzlement remained, in some quarters. And, in 1971, a postscript to the debate appeared, an article by Schane (Schane, 1971), which pointed the way to a resolution of the puzzle. But even 1971 was too close to the events for matters to have become completely clear, and Schane's postscript still leaves something rather unsettled; I now offer a post-postscript, taking Schane's ideas, and showing how they can be well accommodated within the picture of the interaction between the LAD and the Arena of Use.

Schane points to attested or ongoing sound changes in a number of languages (French nasalisation, Rumanian palatalisation, Rumanian delabialisation, Nupe palatalisation and labialisation, and Japanese palatalisation). These changes conform to a pattern:

"If, on the surface, a feature is contrastive in some environments but not in others, that feature is lost where there is no contrast". (Schane, 1971:505)

On the basis of these examples, Schane maintains that, for the speakers involved, the (approximately) phonemic level of representation at which these contrasts exist must have had some psychological validity. But he has this problem:

"Transformational phonology rejects the phoneme as a unit of surface contrast, so the theory has no way of identifying contrasts, and therefore no basis for identifying alternations (cf. Schane 1971:514). No point in derivations exists where contrasts are identified". (Hudson, 1980:116)

Faced with the problem of reconciling some kind of psychological validity for the phoneme with the accepted conclusions of generative phonology, Schane argues in detail that representations at the phonemic level can be calculated from generative descriptions. The necessary calculations involve a partition of the rules into two types (morphophonemic and phonetic) and inspection of the derivations involving just rules of the former type. Note that the partition of phonological rules into morphophonemic and phonetic is also not directly represented in a generative grammar (of the type Schane was assuming) and must itself be calculated. So though a phonemic level may be accessible through a generative grammar, it is certainly not retrievable in any simple way. Schane's dilemma was that he, like others, "felt guilty about disinheriting the child [the phoneme]" (520), but since linguistic theories at the time were only competence theories, he had no obvious place to locate the phoneme.

The classical phoneme was never as well-behaved as its structuralist proponents, some of whom wanted to build it into a bottom-up discovery procedure for grammars, would have liked. Languages often use a contrast distinctively in one environment, but ride roughshod over the distinction in productive phonological rules elsewhere. An example is English /s/~/z/, a phonemic contrast 'demonstrated' by the existence of many minimal pairs (sue/zoo, bus/buzz, racer/razor), but neutralized in many environments by some of the most productive phonological rules of English, the voicing assimilation rules involving the plural, 3rd person singular present tense, and possessive morphemes. Naturally, the phonemicists had a story to tell about such problems, but they were typically epicyclic. What could not be saved was the idea that the main thing a speaker knows about the sounds of his language is a set of surface contrasts, which serve everywhere to 'keep words apart' (Hockett's phrase). But of course, by and large, in the rough and tumble of everyday communication, enough words do get kept apart for decoding and successful communication to take place, much of the time. If phonological rules could obliterate all predictable distinctions between words, communication would break down. Some neutralizations are clearly permissible; the typical redundancy of language allows decoding in spite of them. But the situation cannot get out of hand. This suggests that the proper place for something like the 'Phonemic principle' is the Arena of Use. Speakers who allow their phonetic performance to stray too far away from the surface contrasts used as clues in reception by hearers are likely to be misunderstood. To remain as (linguistically) successful members of the speech community, they learn to respect, in a rough and ready way, a degree of surface contrastivity.

I believe that Schane's basic account of the sound changes he discusses does illuminate them. Something puzzling (e.g. denasalisation following hard on the heels of nasalisation) is made to seem less puzzling by drawing attention to the fact that this happened in an environment where no surface contrast was lost. But Schane's principle is only explanatory in this weak sense; it lacks the predictive power that Lass calls for, and falls into Lass's category of 'a theory based merely on post hoc identification'. As Hock (1976) points out:

"Though such changes undeniably occur, [Schane's] general claim is certainly too strong. Note, first of all that the similar loss of u-umlaut before remaining u, referred to as an 'Old Norse' change ..., is actually limited to Old Norwegian (cf. Benediktsson 1963) - Old Icelandic does not participate in it: ... Moreover, among such frequent conditioned changes as palatalization and umlaut, examples of such a 'reversal' of change seem extremely infrequent, suggesting that the phenomenon is quite rare". (Hock, 1976:208)

What is needed, to explain particular sound changes, is a demonstration that particular contrasts are felt so important that actions occur in the Arena of Use tending to prevent loss of such contrasts. Such demonstrations are likely to be very difficult, because they involve delving into the very messy data of the Arena of Use in search of clear indications involving individual words, phonemes, etc. The confrontation with the messy data of the Arena of Use is, however, far less daunting if one heeds the crucial point made by Foley and Van Valin: "It must be emphasised that functional theories are not performance theories. That is, they seek to describe language in terms of the types of speech activities in which language is used as well as the types of constructions which are used in speech activities. They do not attempt to predict the actual tokens of speech events. ... They are theories of systems, not of actual behavior". (Foley and Van Valin, 1984:15)

Unfortunately, this is expressed slightly inaccurately, in my terms. I would rather have said: 'functional theories are theories of performance types, and not of performance tokens'. The point is clear, however, and should be invoked to protect functional theories from disappearing without trace into the ultimate morass of particular events. But the warning may still not be strong enough, because even functional hypotheses in terms of particular construction types and speech activity types are likely to be met by counterexamples. To overcome this and Lass's correct criticism of the 'particularism' of functional explanations, we somehow need to get a good statistical grip on the functional factors that affect language change. It is to be hoped that broad classes of events in the Arena of Use are susceptible to statistical treatment, even though individual events may appear to be more or less random. A theory of functional language change is, for the foreseeable future, only likely to be successful in characterising the statistical distribution of possible end-results of change. In this way it will be predictive in the same sense as, say, cosmology is predictive. A given cosmological theory may predict that background microwave radiation from all directions in the universe varies only within very narrow limits (a statistical statement), but it will make no predictions at all about the particular variations.

In starting to get to theoretical grips with phenomena in the Arena of Use, it will be important to note Bever's guiding words:

"I have attempted to avoid vague reference to properties such as "mental effort" "informativeness" "importance" "focus" "empathy" and so on. I do not mean that these terms are empty in principle: however they are empty at the moment, and consequently can have no clear explanatory force". (Bever, 1975:600-601)

Many well-intentioned attempts to establish foundations for functional theories of language change, as, for instance, in Martinet (1961), fall foul of this problem. But there are positive developments, too. The parsing explanation for word order universals offered by Hawkins (1990) makes precise a notion of economy in parsing that rescues a 'principle of least effort' (Zipf, 1949), in this area at least, from vagueness and vacuity. And I would add a reservation to Bever's warning. Terms and concepts acquire explanatory force by being invoked in plausible explanations of wide ranges of phenomena. We don't know in advance just where on the theoretical/observational continuum notions like 'mental effort' and 'informativeness' will fall. They may turn out to be relatively abstract notions, embedded in a quite highly structured theory. In such a case, their explanatory force would derive from the part they play in the explanatory success of the theory as a whole: it will not be possible to evaluate their contribution in isolation.

Lass's reluctance to take up his constructive second option, 'Develop a reasonably rigorous, non-particularistic theory with at least some predictive power', is curious. We build theories, the best that the domains concerned permit, to gain illumination about the world. As long as we don't, we remain in the dark. Of course, we should also avoid building theories only where the (usually mathematical) light is good, like the proverbial man searching for his keys under the street lamp, rather than where he had dropped them, because the light was better under the lamp. But it is precisely because the light is (at present) dim in the area of functional influences on language change that adequate functional theories have not emerged. Perhaps in some cases there are indeed no functional causes of language change, and the changes merely come about by random drift such as one may expect in any complex culturally transmitted system. But it would be quite unreasonable to assert that in no cases does the factor of usefulness exert a pressure for change. The fact that we are unable to pinpoint specific instances should not be mistaken for an argument that changes caused by factors of usefulness do not exist. We can't see black holes in space, but we have good reasons to believe they exist. Does anyone really doubt that languages are useful systems and that (some) changes in them are brought about by factors of usefulness? The only (!) issue is of the precise nature and extent of the mechanisms involved.

2.6. Language drift

A number of recent studies in diachronic linguistics have proposed evolutionary tendencies in the histories of languages. Bybee (1986), for example, argues for the universal origin of grammatical morphemes in independent lexical items.

"... the types of change that create grammatical morphemes are universal, and the same or similar material is worn down into grammatical material in the same manner in languages time after time ..." (Bybee, 1986:26)

"... grammatical morphemes develop out of lexical morphemes by a gradual process of phonological erosion and fusion, and a parallel process of semantic generalisation". (Bybee, 1986:18)

Mithun (1984) proposes that noun incorporation (NI) develops diachronically along a specific route:

"NI apparently arises as part of a general tendency in language for V's to coalesce with their non-referential objects, as in Hungarian and Turkish. The drift may result in a regular, productive word formation process, in which the NI reflects a reduction of their individual salience within predicates (Stage I). Once such compounding has become well established, its function may be extended in scope to background elements within clauses (Stage II). In certain types of languages, the scope of NI may be extended a third step, and be used as a device for backgrounding old or incidental information within discourse (Stage III). Finally, it may evolve one step further into a classificatory system in which generic NP's are systematically used to narrow the scope of V's with and without external NP's which identify the arguments so implied (Stage IV)". (Mithun, 1984:891)

Mithun goes on to describe other tendencies for change that languages may undergo, in cases where the evolutionary process is arrested at any of these stages. Traugott (1989) discusses 'paths of semantic change' in terms of the following three closely related tendencies:

"Tendency I: Meanings based in the external described situation > meanings based in the internal (evaluative/perceptual/cognitive) described situation.
Tendency II: Meanings based in the external or internal described situation > meanings based in the textual and metalinguistic situation.
Tendency III: Meanings tend to become increasingly based in the speaker's subjective belief state/attitude toward the proposition.
... All three tendencies share one property: the later meanings presuppose a world not only of objects and states of affairs, but of values and of linguistic relations that cannot exist without language. In other words, the later meanings are licensed by the function of language". (Traugott, 1989:34-35)

Naturally, the proposals of Bybee, Mithun, and Traugott are subject to normal academic controversy, but it seems likely that some core of their central ideas will stand the test of time. For my purpose, the crucial core to all these proposals is the proposition that there exist specific identifiable mechanisms affecting the histories of languages continuously over stretches longer than a single generation. If this is true, which seems likely, then there must be some identifiable property of the language acquirer's experience which has the effect of inducing a competence different in some way from that of the previous generation. If such patterning in the input data were not possible, there could be no medium through which such long-term diachronic mechanisms could manifest themselves; the diachronic spiral through LAD and Arena of Use would not exist; languages would be only reinvented with each generation, and they would contain no 'growth marks', in the sense of Hurford (1987).

3. CONCLUSION

Language, in some broad sense, is equally an object of interest to biologists, to students of language acquisition, of grammatical competence, and of discourse and pragmatics, and to historical linguists. Each of these disciplines has its own perspective on the object (e.g. focussing on E-language or I-language), but the perspectives must ultimately be mutually consistent and able to inform each other. The biological linguist is concerned with the innate human properties giving rise to the acquisition of uniformly structured systems across the species. The student of language acquisition is concerned with the interplay between these innate properties of the grammar representation system, other aspects of internal structure (e.g. innate processing mechanisms), and the learner's experience of the physical and social world. Students of discourse and pragmatics focus on, and hope to be able to explain and predict, certain patterning in the social linguistic intercourse which the learner experiences. Such patterning makes some impact on the grammatical competence acquired, resulting in the grammaticalisation of discourse processes, at which point the phenomena engage the attention of the student of competence. Frequency monitoring and individual creativity play a part in this diachronic spiral through grammars and use, by which languages develop, giving rise to the processes studied by the historical linguist. The LAD is born into, and lives in, the Arena of Use. The Arena does not, in the short term, shape the Device, but, in conjunction with it, shapes the learner's acquired competence. The interaction between this competence and the enveloping Arena reconstructs the Arena in readiness for the entry of the next wave of LADs.

FOOTNOTES

1. Pinker and Bloom mention some of the evidence for this: "Bever, Carrithers, Cowart, and Townsend (1989) have extensive experimental data showing that right-handers with a family history of left-handedness show less reliance on syntactic analysis and more reliance on lexical association than do people without such a genetic background. Moreover, beyond the "normal" range there are documented genetically-transmitted syndromes of grammatical deficits. Lenneberg (1967) notes that specific language disability is a dominant partially sex-linked trait with almost complete penetrance (see also Ludlow and Cooper, 1983, for a literature review). More strikingly, Gopnik, 1989, has found a familial selective deficit in the use of morphological features (gender, number, tense, etc.) that acts as if it is controlled by a dominant gene". (Pinker and Bloom, 1990)

2. Sperber and Wilson's theory is, however, still controversial. See the peer review in Behavioral and Brain Sciences 10 (1987), also the exchange in Journal of Semantics 5 (1988), and Levinson (1989).


3. This is how Fodor (1976) casts a theory of language: "The fundamental question that a theory of language seeks to answer is: How is it possible for speakers and hearers to communicate by the production of acoustic wave forms?". (Fodor, 1976:103)

4. In this quotation, I have (with the author's approval) three times replaced an original instance of 'speakers' with 'language learners' and (indicating a shift in my opinion about certain numeral expressions) replaced 'preferred usage' with 'a fact of grammar'.

5. This convention is actually quite standard. Pinker, for example, adopts this usage: "What the Uniqueness principle does is ensure that languages are generally not in proper inclusive relationships. When the child hears an irregular form and consequently drives out its productively generated counterpart, he or she is tacitly assuming that there exists a language that contains the irregular form and lacks the regular form, and a language that contains the regular form and lacks the irregular form, but no language that contains both". (Pinker, 1984:360)

REFERENCES

Andersen, H. 1973. Abductive and Deductive change. Language 49. 765-793.
Atkinson, M. 1982. Explanations in the Study of Child Language Development. Cambridge: Cambridge University Press.
Bates, E., I. Bretherton and L. Snider. 1988. From First Words to Grammar: Individual Differences and Dissociable Mechanisms. Cambridge: Cambridge University Press.
Bates, E. and B. MacWhinney. 1987. Competition, Variation, and Language Learning. In B. MacWhinney (ed.) Mechanisms of Language Acquisition. 157-193. Hillsdale, New Jersey: Erlbaum.
Benediktsson, H. 1963. Some Aspects of Nordic Umlaut and Breaking. Language 39. 409-431.
Beukema, F. and P. Coopmans. 1989. A Government-Binding Perspective on the Imperative in English. Journal of Linguistics 25. 417-436.
Bever, T. G. 1975. Functional Explanations Require Independently Motivated Functional Theories. In R. E. Grossman, L. James San and T. J. Vance (eds.) Papers from the Parasession on Functionalism. 580-609. Chicago: Chicago Linguistic Society.
Bever, T. G., and D. T. Langendoen. 1971. A Dynamic Model of the Evolution of Language. Linguistic Inquiry 2. 433-463.
Bever, T. G., C. Carrithers, W. Cowart and D. J. Townsend. (in press). Tales of two sites: The quasimodularity of language. In A. Galaburda (ed.) Neurology and Language. Cambridge, Massachusetts: MIT Press.
Bickerton, D. 1981. Roots of Language. Ann Arbor, Michigan: Karoma.
Borer, H. and K. Wexler. 1987. The Maturation of Syntax. In T. Roeper and E. Williams (eds.) Parameter Setting. 123-172. Dordrecht: Reidel.
Borer, H. and K. Wexler. 1988. The Maturation of Grammatical Principles. Ms. Department of Cognitive Science, University of California at Irvine.
Brown, C. H. 1984. Language and Living Things: Uniformities in Folk Classification and Naming. New Brunswick: Rutgers University Press.
Bybee, J. L., and M. A. Brewer. 1980. Explanation in Morphophonemics: Changes in Provençal and Spanish Preterite Forms. Lingua 52. 271-312.
Bybee, J. L. 1986. On the Nature of Grammatical Categories. Proceedings of the Second Eastern States Conference on Linguistics. 17-34. Ohio State University.


Cerdegren, H. and D. Sankoff. 1984. Variable Rules: Performance as a Statistical Reflection of Competence. Language 50. 333-355.
Chomsky, A. N. 1965. Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press.
Chomsky, A. N. 1979. Language and Responsibility. Hassocks, Sussex: Harvester Press.
Chomsky, A. N. 1986. Knowledge of Language: its Nature, Origin, and Use. New York: Praeger.
Chomsky, A. N. and H. Lasnik. 1977. Filters and Control. Linguistic Inquiry 8. 425-504.
Clark, H., and S. E. Haviland. 1974. Psychological Processes as Linguistic Explanation. In David Cohen (ed.) Explaining Linguistic Phenomena. 91-124. Washington D. C.: Hemisphere Publishing Corporation.
Coker, C. H., N. Umeda and C. P. Browman. 1973. Automatic Synthesis from Ordinary English Text. IEEE Transactions on Audio and Electroacoustics AU-21. 293-8.
Coopmans, P. 1984. Surface Word-Order Typology and Universal Grammar. Language 60. 5-69.
Corbett, G. 1983. Hierarchies, Targets and Controllers: Agreement Patterns in Slavic. London: Croom Helm.
Dawkins, R. 1982. The Extended Phenotype: the Gene as the Unit of Selection. Oxford: Oxford University Press.
Downes, W. 1977. The Imperative and Pragmatics. Journal of Linguistics 13. 77-97.
Du Bois, J. W. 1980. Beyond Definiteness: The Trace of Identity in Discourse. In W. L. Chafe (ed.) The Pear Stories: Cognitive Cultural and Linguistic Aspects of Narrative Production. 207-274.
Du Bois, J. W. 1985. Competing Motivations. In John Haiman (ed.) Iconicity in Syntax. 343-365. Amsterdam: John Benjamins.
Du Bois, J. W. 1987. The Discourse Basis of Ergativity. Language 63. 805-855.
Edie, J. 1976. Speaking and Meaning: the Phenomenology of Language. Bloomington, Indiana: Indiana University Press.
Fidelholtz, J. L. 1975. Word Frequency and Vowel Reduction in English. Papers from the Eleventh Regional Meeting of the Chicago Linguistic Society. 200-213.
Fodor, J. A. 1976. The Language of Thought. Hassocks, Sussex: Harvester Press.
Foley, W. A. and R. D. Van Valin Jr. 1984. Functional Syntax and Universal Grammar. Cambridge: Cambridge University Press.
Foley, W. A. 1986. The Papuan Languages of New Guinea. Cambridge: Cambridge University Press.
Fries, C. C. and K. L. Pike. 1949. Coexistent Phonemic Systems. Language 25. 29-50.
Givon, T. 1979. On Understanding Grammar. New York: Academic Press.
Givon, T. 1986. Prototypes: Between Plato and Wittgenstein. In C. Craig (ed.) Noun Classes and Categorization. 77-102. Amsterdam: John Benjamins.
Gleitman, L. R., E. Newport and H. Gleitman. 1984. The Current State of the Motherese Hypothesis. Journal of Child Language 2. 43-81.
Gold, E. M. 1967. Language Identification in the Limit. Information and Control 10. 447-474.
Golinkoff, R. M. and L. Gordon. 1983. In the Beginning was the Word: a History of the Study of Language Acquisition. In R. M. Golinkoff (ed.) The Transition from Prelinguistic to Linguistic Communication. 1-25. Hillsdale, New Jersey: Lawrence Erlbaum.
Gopnik, M. 1989. A Featureless Grammar in a Dysphasic Child. Ms. Department of Linguistics, McGill University.
Greenberg, J. H. 1966. Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements. In J. H. Greenberg (ed.) Universals of Language. Cambridge, Massachusetts: MIT Press.


Grimshaw, A. D. 1989. Infinitely Nested Chinese 'Black Boxes': Linguists and the Search for Universal (Innate) Grammar. Behavioral and Brain Sciences 12. 339-340.
Grimshaw, J. and S. Pinker. 1989. Positive and Negative Evidence in Language Acquisition. Behavioral and Brain Sciences 12. 341-342.
Gropen, J., S. Pinker, M. Hollander, R. Goldberg and R. Wilson. 1989. The Learnability and Acquisition of the Dative Alternation in English. Language 65. 203-257.
Hasher, L. and R. T. Zacks. 1984. Automatic Processing of Fundamental Information: the Case of Frequency of Occurrence. American Psychologist 39. 1372-1388.
Hawkins, J. A. 1990. A Parsing Theory of Word Order Universals. Linguistic Inquiry 21. 223-261.
Hock, H. H. 1976. Review article on Raimo Anttila 1972. An Introduction to Historical and Comparative Linguistics. New York: Macmillan. Language 52. 202-220.
Hooper, J. 1976. Word Frequency in Lexical Diffusion and the Source of Morphophonological Change. In W. M. Christie, Jr. (ed.). Current Progress in Historical Linguistics. 95-105. Amsterdam: North Holland.
Horning, J. J. 1969. A Study of Grammatical Inference. Doctoral Dissertation, Stanford University.
Hudson, G. 1980. Automatic Alternations in Non-Transformational Phonology. Language 56. 94-125.
Hurford, J. R. 1987. Language and Number. Oxford: Basil Blackwell.
Hurford, J. R. 1989. Biological Evolution of the Saussurean Sign as a Component of the Language Acquisition Device. Lingua 77. 245-280.
Hurford, J. R. 1991a. The Evolution of the Critical Period for Language Acquisition. Cognition.
Hurford, J. R. 1991b. An Approach to the Phylogeny of the Language Faculty. In J. A. Hawkins and M. Gell-Mann (eds.) The Evolution of Human Languages. Santa Fe Institute Studies in the Sciences of Complexity, Proceedings vol. X. Addison Wesley.
Hyman, L. M. 1984. Form and Substance in Language Universals. In Brian Butterworth, B. Comrie and O. Dahl (eds.) Explanations for Language Universals. 67-85. Berlin: Mouton.
Ingram, D. 1979. Cross-linguistic Evidence on the Extent and Limit of Individual Variation in Phonological Development. Proceedings of the 9th International Congress of Phonetic Sciences. Institute of Phonetics. University of Copenhagen.
Itkonen, E. 1978. Grammatical Theory and Metascience: a critical investigation into the methodological and philosophical foundations of 'autonomous' linguistics. Amsterdam: John Benjamins.
Itkonen, T. 1977. Notes on the Acquisition of Phonology. English summary of: Huomiota lapsen aanteiston kehitykseka. Virittaja. 279-308. (English summary 304-308).
Koopmans-van Beinum, F. J. and J. H. Harder. 1982/3. Word Classification, Word Frequency and Vowel Reduction. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 7. 61-9.
Kripke, S. 1982. Wittgenstein on Rules and Private Language. Cambridge, Massachusetts: Harvard University Press.
Kroch, A. 1989. Language Learning and Language Change. Behavioral and Brain Sciences 12. 348-349.
Labov, W. 1969. Contraction, Deletion and Inherent Variability of the English Copula. Language 45. 716-762.
Lasnik, H. 1981. Learnability, Restrictiveness, and the Evaluation Metric. In C. L. Baker and J. J. McCarthy (eds.) The Logical Problem of Language Acquisition. 1-21. Cambridge, Massachusetts: MIT Press.
Lass, R. G. 1980. On Explaining Language Change. Cambridge: Cambridge University Press.
Lenneberg, E. H. 1967. Biological Foundations of Language. New York: John Wiley and Sons.


Levinson, S. C. 1989. A review of Relevance. Journal of Linguistics 25. 455-472.
Lightfoot, D. W. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press.
Lightfoot, D. W. 1983. The Language Lottery: Toward a Biology of Grammars. Cambridge, Massachusetts: MIT Press.
Lightfoot, D. W. 1988. Creoles, Triggers and Universal Grammar. In C. Duncan-Rose and T. Vennemann (eds.) On Language: Rhetorica, Phonologica, Syntactica: A Festschrift for R. P. Stockwell from his Friends and Colleagues. 97-105. London: Routledge.
Lightfoot, D. W. 1989a. The Child's Trigger Experience: Degree-0 Learnability. Behavioral and Brain Sciences 12. 321-334.
Lightfoot, D. W. 1989b. Matching Parameters to Simple Triggers. Behavioral and Brain Sciences 12. 364-371.
Locke, J. L. 1986. Speech Perception and the Emergent Lexicon: an Ethological Approach. In P. Fletcher and M. Garman (eds.) Language Acquisition: Studies in First Language Development (2nd ed.). 240-250. Cambridge: Cambridge University Press.
Ludlow, C. L. and J. A. Cooper. 1983. Genetic Aspects of Speech and Language Disorders: Current status and future directions. In Ludlow, C. L. and J. A. Cooper (eds.) Genetic Aspects of Speech and Language Disorders. New York: Academic Press.
Macken, M. A. 1980. Aspects of the Acquisition of Stop Consonants. In Yeni-Komshian et al. (eds.) Child Phonology. New York: Academic Press.
Macken, M. A. 1987. Representation, Rules, and Overgeneralization in Phonology. In B. MacWhinney (ed.) Mechanisms of Language Acquisition. 367-397. Hillsdale, New Jersey: Erlbaum.
McCawley, J. 1984. Review of White (1982). Language 60. 431-436.
MacWhinney, B. 1987a. The Competition Model. In B. MacWhinney (ed.) Mechanisms of Language Acquisition. 249-308. Hillsdale, New Jersey: Erlbaum.
MacWhinney, B. 1987b. Applying the Competition Model to Bilingualism. Applied Psycholinguistics 8. 315-327.
Mallinson, G. 1987. Review of B. Butterworth, B. Comrie and O. Dahl (eds.) Explanations for Language Universals. Berlin: Mouton. Australian Journal of Linguistics 7. 144-150.
Martin, L. 1986. 'Eskimo Words for Snow': A Case Study in the Genesis and Decay of an Anthropological Example. American Anthropologist 88.2 (June). 418-423.
Martinet, A. 1961. A Functional View of Language. Oxford: Clarendon Press.
Miller, G. 1981. Language and Speech. San Francisco: Freeman.
Milroy, L. 1985. What a Performance! Some Problems with the Competence-Performance Distinction. Australian Journal of Linguistics 5. 1-17.
Mithun, M. 1984. The Evolution of Noun Incorporation. Language 60. 847-894.
Moder, C. L. 1986. Productivity and Frequency in Morphological Classes. Proceedings for the Second Eastern States Conference on Linguistics. Columbus, Ohio: Ohio State University.
Muhlhausler, P. and R. Harre. 1990. Pronouns and People. Oxford: Basil Blackwell.
Neu, H. 1980. Ranking of Constraints on /t, d/ deletion in American English. In W. Labov (ed.) Locating Language in Time and Space. 37-54. New York: Academic Press.
Newmeyer, F. J. 1980. Linguistic Theory in America: The First Quarter-Century of Transformational Generative Grammar. New York: Academic Press.
Newmeyer, F. J., forthcoming. Functional Explanations in Linguistics and the Origin of Language. Language and Communication.
Pateman, T. 1985. From Nativism to Sociolinguistics: Integrating a Theory of Language Growth with a Theory of Speech Practices. Journal for the Theory of Social Behaviour 15. 38-59.
Phillips, B. 1984. Word Frequency and the Actuation of Sound Change. Language 60. 320-342.


Piattelli-Palmarini, M. 1989. Evolution, Selection, and Cognition: From 'learning' to parameter setting in Biology and the Study of Language. Cognition 31. 1-44.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Massachusetts: Harvard University Press.
Pinker, S. and P. Bloom. 1990. Natural Language and Natural Selection. Behavioral and Brain Sciences 13.
Pullum, G. 1989. The Great Eskimo Vocabulary Hoax. Natural Language and Linguistic Theory 7. 275-281.
Romaine, S. 1982. Socio-Historical Linguistics: its Status and Methodology. Cambridge: Cambridge University Press.
Saussure, F. de 1966. Course in General Linguistics (translated by Wade Baskin). New York: McGraw Hill.
Schane, S. A. 1971. The Phoneme Revisited. Language 47. 503-521.
Sperber, D. and D. Wilson. 1986. Relevance: Communication and Cognition. Oxford: Basil Blackwell.
Traugott, E. C. 1989. On the Rise of Epistemic Meanings in English: an Example of Subjectification in Semantic Change. Language 65. 31-55.
Wexler, K. and P. W. Culicover. 1980. Formal Principles of Language Acquisition. Cambridge, Massachusetts: MIT Press.
Wexler, K. 1981. Some Issues in the Theory of Learnability. In C. L. Baker and J. J. McCarthy (eds.) The Logical Problem of Language Acquisition. 30-52. Cambridge, Massachusetts: MIT Press.
White, L. 1982. Grammatical Theory and Language Acquisition. Dordrecht: Foris.
Wright, C. W. 1979. Duration Differences between Rare and Common Words and their Implications for the Interpretation of Word Frequency Effects. Memory and Cognition 7. 411-419.
Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Cambridge, Massachusetts: Addison Wesley.

Locality and Parameters again
Rita Manzini
University College London

A significant issue in the theory of parameterisation is whether there is a parameter associated with the definition of locality, as proposed in Manzini and Wexler (1987), Wexler and Manzini (1987), or the relevant parameterisation effects are produced by other parameters, not associated with the definition of locality, as proposed in Pica (1987). In this paper I aim to show that the latter solution is inadequate, and hence the former solution remains necessary on descriptive adequacy grounds. The conclusion, if correct, is directly relevant to the psycholinguistic discussion revolving around the Subset Principle. Remember that according to the formulation of the Subset Principle in Wexler and Manzini (1987) and Manzini and Wexler (1987), given a parameter p with values pi and pj, and the two languages Li and Lj generated under pi and pj respectively, value pi is selected by the language learner just in case two conditions are verified: first, Li is compatible with the input data D; second, if Lj is also compatible with the data D, then Li is a subset of Lj. Obviously, under this formulation, the Subset Principle is void unless subset relations hold between the languages generated under the different values of a parameter. Furthermore, if the Subset Principle is to determine the order of learning in all cases, it is necessary that subset relations hold between the languages generated under each two values of each parameter. This latter requirement corresponds to the Subset Condition of Manzini and Wexler (1987), Wexler and Manzini (1987). In order to show that the Subset Principle can be a sufficient condition for learning, it is necessary to show that the Subset Condition holds. This result is not proved in Manzini and Wexler (1987), Wexler and Manzini (1987); in fact, well established parameters such as the null subject parameter, as discussed notably in Hyams (1986), or head ordering parameters, violate the Subset Condition under any of their formulations. However the question arises whether the Subset Condition holds of at least some parameters; if so, the Subset Principle can be at least a necessary condition for learning. This latter question is answered positively in Manzini and Wexler (1987), Wexler and Manzini (1987) on the basis of the parameter associated with the definition of locality for binding. If on the other hand the alternative to this parameter provided in Pica (1987) is correct, then there is no longer any argument for even the weak version of the Subset Condition, and the Subset Principle remains completely unsupported.

In the present study, I do not set out to uncover new evidence in favour of the subset theory of learning. What I will argue however is that at least the original evidence for it stands, in that precisely a locality parameter for binding of the type in Manzini and Wexler (1987), Wexler and Manzini (1987) is necessary, and an approach of the type in Pica (1987) is insufficient.
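
The logic of the Subset Principle as just recalled can also be seen in a small worked sketch. The following code is illustrative only and is not part of the original text: languages are modelled as finite sets of strings, 'compatible with the data D' is read as D being a subset of the language, and the two-valued statement above is generalised to any number of values of a parameter.

# Illustrative only: languages as finite sets of strings; 'compatible with D' read as D <= L.
def select_value(data, languages):
    """languages maps each value of a parameter to the language it generates.
    Return the value licensed by the Subset Principle, if there is one."""
    compatible = [v for v, lang in languages.items() if data <= lang]
    for v in compatible:
        # the selected language must be a subset of every other compatible language
        if all(languages[v] <= languages[w] for w in compatible):
            return v
    return None   # no value selectable: the subset requirement fails for these candidates

# Hypothetical two-valued parameter whose languages stand in a subset relation
languages = {"p1": {"s1", "s2"}, "p2": {"s1", "s2", "s3"}}
print(select_value({"s1"}, languages))        # p1: both values fit the data, the smaller language wins
print(select_value({"s1", "s3"}, languages))  # p2: only the larger language is compatible

As the second call shows, the learner moves to the larger language only when the data force it; this is the conservatism that the Subset Condition is meant to guarantee across all values of a parameter.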

1. LOCALITY

To begin with, I assume that the basic structure of an English sentence is as in (1):

(1)    [tree diagram, partly lost in the scan: a CP dominating IP and VP, with the surface subject in the Spec of IP and a VP-adjoined subject position inside VP; only the node labels CP and V survive]

This is the structure proposed in Chomsky (1986a; b), except that the subject is taken to move to the Spec of IP position, where it can be assigned a Case, from a VP-adjoined position, where it can be assigned a theta-role, as in Sportiche (1988). It is generally assumed that the locality theory for movement is based at least in part on a notion of government, which following Chomsky (1986b) is formulated as in (2) in terms of a notion of barrier:

(2)    β governs α iff there is no γ such that γ is a barrier for α and γ excludes β

In Manzini (1988; 1989) it is argued that the locality theory for movement can be entirely based on the notion of government, if the notion of barrier is in turn formulated as in (3)-(4). (3) defines a g-marker for α as a head which is a sister to α or to a maximal projection that α agrees with:

(3)    β g-marks α iff β is an X° and (i) β is a sister of α; or (ii) β is a sister of γ and γ agrees with α

(4) defines a barrier for α as a maximal projection that dominates α and, if α has a g-marker, a g-marker for α:

(4)    γ is a barrier for α iff γ is a maximal projection, γ dominates α and, if α is g-marked, γ dominates the g-marker for α

It is important to notice that (3)-(4) use all and only the primitives used in the definition of barrier (and minimality barrier) in Chomsky (1986b), to the exclusion notably of the notion of subject. This in turn is the only crucial property of (3)-(4) for the present discussion; hence the conclusions that we will reach are essentially independent of the theory in Manzini (1988; 1989).

Consider then the locality theory for referential dependencies. The locality condition on anaphors in Chomsky (1981), Binding Condition A, states that an anaphor must have an antecedent in its governing category. A governing category for α is defined in turn as a category that dominates α, a governor for α and a subject accessible to α. Let us compare this definition of locality with (4). To begin with, there is no indication that a governing category need ever be a non-maximal projection; thus in this respect governing categories and barriers need never differ. Furthermore, a governing category must dominate a governor for α, while a barrier must dominate a g-marker for α, if α is g-marked. It is easy to check however that in our theory the notion of g-marker reconstructs the notion of governor in Chomsky (1981); thus in this respect the two definitions of locality do not differ either. The only difference between the two remains the notion of subject, which appears in the definition of governing category, but not in the definition of barrier. If so, the definition of governing category γ can be given as in (5), which is the definition of barrier in (4) with the added requirement that γ must dominate a subject accessible to α; in the first instance a subject can be taken to be accessible to α just in case it c-commands α:

(5)    γ is a governing category for α iff γ is a maximal projection, γ dominates α, γ dominates a g-marker for α and γ dominates a subject accessible to α

As for Binding Condition A itself, it can require that an anaphor must have an antecedent not excluded by its governing category, as in (6), thus maximising the similarity between its requirement and a government requirement:

(6)    Given an anaphor α, there is an antecedent β for α such that no governing category for α excludes β

Consider English himself. If it is in object position, himself can refer no further than the immediately superordinate subject, as in (7); if in subject position, himself gives rise to illformedness, as in (8):

(7)    John thinks that Peter likes himself
(8)    * John thinks that heself/himself likes Peter

The facts in (7)-(8) are predictable on the basis of (5)-(6), but also on the basis of a government condition, stating that anaphors must have an antecedent that governs them. Consider first the object position. VP is a barrier for it, hence an anaphor in object position can only have an antecedent that is not excluded by VP, if government is to be satisfied. The VP-adjoined subject position in (1) satisfies this condition, and no position higher than it does. Thus it correctly follows that himself in (7) can only be bound by the embedded subject. Consider now the ultimate subject position, in the Spec of IP, as in (1). CP is a barrier for the Spec of IP, hence an anaphor in the Spec of IP must be bound internally to CP, if government is to be satisfied. However, all available positions in this domain are A'-positions. Thus A-binding must violate government, and the ungrammaticality of (8) is correctly derived. (7)-(8), then, do not necessitate recourse to the notion of subject, and therefore lend no support to the theory in (5)-(6) as opposed to (4). The notion of subject is in fact needed to account for examples of this type in Chomsky (1981; 1986a), but only because one subject position only is postulated, the Spec of IP, a VP-external position; this has been noticed also in Kitagawa (1986) and Sportiche (1988). Because of this, and because under any definition of locality based only on the notion of maximal projection (and governor/g-marker) VP is a locality domain for the direct object, the notion of subject must be referred to in order to allow the locality domain of the direct object to include the immediately superordinate subject as well. With a pronoun such as him substituted for the anaphors in (7)-(8) the grammaticality judgements are of course reversed, as in (9)-(10):

(9) John thinks that Peter likes him

(10) John thinks that he likes Peter

(10) is wellformed with John as the antecedent for him, while (9) is wellformed with John again, but not Peter, as the antecedent. According to Chomsky (1981), this behaviour is again accounted for by a condition formulated in terms of the notion of governing category in (5), Binding Condition B; following the format of Binding Condition A, as in (6), Binding Condition B can be rendered as in (11):

(11) Given a pronoun α, there is no antecedent β for α such that no governing category for α excludes β

As in the case of (5)-(6), the theory in (5) and (11) can account for the data, in this case (9)-(10); but an account is equally possible in terms of a government condition. Consider first a pronoun in object position, as in (9). Its first barrier is VP, which under the theory of phrase structure in (1) contains a subject position. It follows that the pronoun is correctly predicted to be disjoint in reference from the immediately superordinate subject, if the condition on it is that it cannot be governed by its antecedent. Similarly, consider the subject pronoun in (10), ultimately in the Spec of IP position. IP is not a barrier for the subject, if it is g-marked by C; but CP is a barrier for it. Hence the superordinate subject is correctly predicted to be a possible antecedent for the pronoun, since it does not govern it. As far as the object and subject position of a sentence are concerned, or in general sentential positions, it appears then that Binding Conditions A and B can be formulated in terms of the notion of government, as in (12), and do not require reference to the notion of governing category in (5):

(12) A. Given an anaphor α, there is an antecedent β for α such that β governs α
     B. Given a pronoun α, there is no antecedent β for α such that β governs α
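The logic of (12) can be made concrete with a small illustration. The sketch below (in Python; it is purely illustrative and not part of the text, and the toy governs relation simply encodes, for examples (7) and (9), the conclusion reached above that only the embedded subject governs the object position) states the two conditions as existential checks over a set of candidate antecedents.

# (12A): some candidate antecedent governs the anaphor.
def condition_A(anaphor, antecedents, governs):
    return any(governs(b, anaphor) for b in antecedents)

# (12B): no candidate antecedent governs the pronoun.
def condition_B(pronoun, antecedents, governs):
    return not any(governs(b, pronoun) for b in antecedents)

# Toy relation for (7) 'John thinks that Peter likes himself' and
# (9) 'John thinks that Peter likes him': only 'Peter' governs the object.
governs = lambda b, a: (b, a) in {("Peter", "himself"), ("Peter", "him")}

print(condition_A("himself", ["Peter"], governs))  # True: himself may be bound by Peter
print(condition_A("himself", ["John"], governs))   # False: himself may not be bound by John
print(condition_B("him", ["John"], governs))       # True: him may corefer with John
print(condition_B("him", ["Peter"], governs))      # False: him is disjoint from Peter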

There is however a type of data in favour of the Binding Theory in (5)-(6) and (11) that has not been considered so far. These data, involving NP-internal positions, will be considered in the next section, where I will conclude that anaphors are indeed associated with the notion of governing category, though pronouns are associated with the notion of barrier. Thus the notion of governing category cannot be reduced to the notion of barrier, and vice versa.

2. ENGLISH ANAPHORS AND PRONOUNS

Suppose we assume that NP's have a structure of the type in (13), where α is the NP's object, and β the NP's subject:

(13) [NP β [N' N α]]   (schematic; β the NP's subject, α its object)

In the light of recent discussions of the internal structure of NP's, notably English NP's, as in Abney (1987), it is doubtful that (13) and not a more complex structure is to be postulated. I choose (13) simply for convenience; the results obtained for (13) should in turn be extendable to a structure such as (14), where D(et) is also a head:

(14) [DP [D' D [NP [N' N ...]]]]   (schematic; D(et) a head taking NP as its complement)

This is especially true if in (14) the position of the subject is originally in the Spec of NP, and Case reasons impel movement to the Spec of DP. If so, (13) represents not so much an alternative to (14), as a substructure of (14). If the subject in (13) is realised, then an anaphoric α must be bound within NP, in (13) by β. This is seen, with the English reflexive himself, in examples of the type of (15), where NP itself is in object position:

(15) John likes [Peter's pictures of himself]

It is not difficult to see that in this case the correct predictions follow under both our definition of barrier and the definition of governing category. Under the former, NP is a barrier for α because it is a maximal projection that dominates α and the g-marker of α, namely N. Under the latter, NP is a governing category for α for the same reasons and because it also dominates a subject that c-commands α. Thus if an anaphor must have a binder not excluded by its governing category, himself in (15) must be bound by Peter's; the same result follows if an anaphor must have a binder that governs it. Consider now the case in which β in (13) is not realised, as in (16), which exactly reproduces (15) but for the absence of the subject Peter's; crucially, it is not necessary to the wellformedness of (16) that the subject of the NP is interpreted as referentially dependent on John:

(16) John likes [pictures of himself]

Consider then the definitions of barrier and governing category. Under the definition of barrier, the notion of subject, hence its presence or absence in any given structure, is altogether irrelevant. Thus NP is a barrier for α, whether β is present or not. On the basis of the principle that an anaphor must have a binder that governs it, himself in (16) is then incorrectly barred from being referentially dependent on the sentence's subject, John, which is NP-external. The correct predictions, on the other hand, follow under the definition of governing category. If β is missing, NP is not a governing category for α for the simple reason that it does not have a subject. Rather the governing category for α is the first maximal projection that does have a subject, namely VP or IP. The prediction then is that α can be bound by this subject; hence concretely that himself in (16) can be bound by John. In short, our discussion so far indicates that if NP-internal positions are considered, a correct account of anaphoric dependencies can only be given under a definition of locality making reference to the notion of subject. Let us then consider the remaining examples in the himself paradigm. In (17)-(18), himself is again in the object position of an NP, in particular of an NP with a subject in (17) and of a subjectless NP in (18); however the NP itself is in subject position, rather than in object position:

(17) John thought that [Peter's pictures of himself] were on sale

(18) John thought that [pictures of himself] were on sale


The embedded NP in (17) is a governing category for himself, since it is a barrier for it and furthermore it contains a subject accessible to it. In (17) then himself is required to be bound within NP; this correctly predicts that it can be bound by Peter, but not by John. In (18) on the other hand the embedded NP does not contain a subject accessible to himself, nor do the embedded IP and CP. Only the superordinate VP contains a subject accessible to himself. Hence binding of himself by John is correctly predicted to be wellformed. Remember that an accessible subject is simply defined as a c-commanding subject. If c-command is not defined between two positions one of which dominates the other, the subject of the embedded IP in (18) does not c-command himself because it contains it; hence it is not accessible to it. Consider on the other hand an ungrammatical example of the type of (8) again, where himself is in the subject position of a sentence:

(8) *John thinks that [heself/himself likes Peter]

Himself must be accessible to itself if the embedded IP is to be a governing category for it and illformedness is to be correctly predicted. Suppose then we assume that no position can dominate itself. It follows that any position can c-command itself. Hence if an accessible subject is simply a c-commanding subject the correct predictions follow. Examples of the type in (7), where himself is in the object position of a sentence, are also straightforwardly derived:

(7) John thinks that [Peter likes himself]

The embedded VP contains a subject accessible to himself, namely Peter, hence it is correctly predicted that Peter but not John can bind himself. Consider finally anaphors in the subject position of a nominal, as in (19)-(20); each other is exemplified, rather than himself, because of the lack of a genitive form for himself, which we can treat as purely accidental. It is easily checked that each other otherwise has the same distribution as himself.

(19) John and Peter like [each other's pictures]

(20) John and Peter thought that [each other's pictures] were on sale

By our definition each other in (19) is accessible to itself, exactly as in (8). Notice however that in (19) there is an independent reason why the matrix VP but not the embedded NP is a governing category for the
reciprocal. The reason is that in (19) the matrix verb is the g-marker for each other in the Spec position of NP. NP then is not a governing category for each other because it does not dominate its g-marker. Rather, the first category that dominates the g-marker for each other is the matrix VP, and this is its governing category. The correct predictions then follow, in particular that each other can be bound by the matrix subject. As for (20), our theory predicts its ungrammaticality. Notice that each other does not have a g-marker in (20), since NP, being in subject position, is not a sister to a head. NP itself then is a barrier for each other, and since each other is accessible to itself by our definitions so far, NP is its governing category. Since of course no antecedent is available for it within NP, ungrammaticality is predicted to follow. In fact sentences of the type in (20) appear not to be worse than their counterparts in (18). However, leaving this problem aside, we have verified that the fundamental data relating to English anaphors, including himself and each other, are correctly predicted by our theory under a subject-based definition of locality. In doing so, we have also shown that at least in the cases considered so far the notion of accessibility can be reduced to that of c-command. Remember that in Chomsky (1981) accessibility is defined in terms of c-command and of the i-within-i constraint; in particular, β is said to be accessible to α in case it c-commands α and it can be coindexed with it under the i-within-i constraint. If we are correct the second part of this definition can be eliminated altogether. Similarly, in Manzini (1983) γ is said to be a locality domain for α just in case two independent conditions are satisfied, which can be expressed as follows: first, γ dominates a subject that c-commands α, and second, this subject is accessible to α in the sense that it does not violate the i-within-i constraint. In case the second condition is not satisfied, no locality domain for α is defined. If we are correct the whole definition of accessibility must reduce to the first of these two conditions. Nothing that I have said so far touches yet on pronouns. If an anaphor and a pronoun in a language are associated with the same definition of locality, and if locality theory is in fact a biconditional to the effect that an element is anaphoric just in case it is bound within that locality domain, we expect the pronoun and anaphor to have complementary distribution in the language. This is the prediction in Chomsky (1981) for English, and as is well known the prediction fails. Consider first a pronoun in the object and subject position of a sentence, as in (9) and (10) again:

(9) John thinks that [Peter likes him]

(10) John thinks that [he likes Peter]

In sentential positions there is in fact complementary distribution between anaphors and pronouns in English; hence the correct predictions for pronouns can be obtained under the subject-based definition of governing category as for anaphors. In particular, the embedded VP is the governing category for the pronouns in (9)-(10) and disjoint reference between the pronoun and Peter in (9) is correctly predicted. Consider then a pronoun in the subject position of an object NP, as in (21); this is a clear case of non-complementary distribution of pronouns and anaphors in English:

(21) The boys saw [their pictures]

By the subject-based definition of locality, the embedded NP is not a governing category for the pronoun in (21), for the same reasons for which it is not for an anaphor. This of course yields the incorrect prediction that their is disjoint in reference from the boys. Suppose on the other hand the pronoun had no g-marker in (21). Then, the embedded NP would be a barrier and a governing category for it, their being a subject and accessible to itself, and the correct predictions would follow. Thus we should be able to construct a theory under which the g-marker for the Spec of NP is relevant if the Spec of NP is anaphoric, but not if it is pronominal. The simplest way to achieve this result is to assume that resort to a g-marker is optional in all cases. Concretely, suppose the definition of barrier is to be modified as in (22), and the definition of governing category is modified accordingly:

(22) γ is a barrier for α iff γ is a maximal projection, γ dominates α, (if α is g-marked, γ dominates a g-marker for α)

Within (22) reference to the notion of g-marker is optional. Making g-markers optional amounts to saying that the first maximal projection that dominates a position can always count as its barrier; however if the position under consideration has a g-marker its barrier can be extended to include this g-marker. It is easy to see that the locality domain defined by reference to a g-marker is always wider than the locality domain defined without reference to it. If so, in the case of anaphors the availability of both locality domains is equivalent to the availability of the wider one only. For, all referential dependency links allowed under the narrower definition of locality are allowed under the wider one as well, though the reverse does not hold. In the case of pronouns conversely we expect the availability of both to
be equivalent to the availability of the narrower one. Indeed all referential dependency links allowed under the wider definition are also allowed under the narrower, and not the reverse. If the notion of g-marker is taken into account in (21) the matrix VP is the governing category for the pronoun; if the notion of g-marker is not taken into account, the embedded NP is. In the first case, coreference between their and the boys is predicted to be impossible; but in the second case coreference between their and the boys is correctly allowed. Consider then a pronoun in the subject position of an NP which is itself in subject position, as in (23):

(23) The boys thought that [their pictures] were on sale

If we are correct, the anaphoric counterpart to (23) is illformed. If so, the wellformedness of (23) under any interpretation is correctly predicted whether the notion of g-marker is taken into account or not. In the first case the locality domain for the pronoun is the same as for the anaphor, and complementary distribution is predicted. In the second case the locality domain for the pronoun is simply the embedded NP, and no disjoint reference is predicted to arise. Consider finally a pronoun in the object position of an NP, as in (24):

(24) The boys thought that [pictures of them] were on sale

(24) and its anaphoric counterpart both appear to be wellformed; thus this appears to be again a case of non-complementary distribution of pronouns and anaphors. Furthermore, the optionality of the g-marker requirement does not help in these cases, since the g-marker for the pronoun or anaphor, N, is internal to the first maximal projection that dominates them. However, it is crucial to the wellformedness of the anaphoric counterparts to (24) that the subject-based definition of locality is chosen. The reason is that the first subject accessible to the object of the embedded NP is the matrix subject. One way of deriving the non-complementary distribution of pronouns and anaphors in examples of the type in (24) is to have recourse to our notion of barrier for pronouns, because NP is a barrier for the pronoun in (24), and under it no disjoint reference patterns are predicted to arise with them. The predictions for the examples in (21) and (23) remain unchanged, as can be easily checked; as for sentential positions, we have already seen that the subject-based definition of locality and our definition of barrier are always equivalent.


The last example to be considered involves a pronoun in the object position of an NP again, where the NP however is in object position, as in (25):

(25) The boys saw [pictures of them]

Again the optionality of g-markers is irrelevant here, as in (24). If the pronoun is associated with our definition of barrier, as required by (24), its locality domain is the embedded NP, hence no disjoint reference patterns are predicted to arise. In summary, if what precedes is correct, once NP-internal positions are taken into consideration, English anaphors must be associated with the subject-based definition of governing category, pronouns with the definition of locality domain corresponding to our notion of barrier.

3. ITALIAN RECIPROCAL CONSTRUCTIONS

The existence of two separate notions of locality domain corresponding to our definition of barrier and to the definition of governing category appears to be confirmed by the Italian reciprocal l'un l'altro. The Italian reciprocal consists of two elements, l'uno ('the one'/'each') and l'altro ('the other'), which occupy two different types of positions and enter two different types of dependencies. L'altro behaves like a lexical anaphor, surfacing in A-position and entering referential dependencies with other elements in A-position; l'uno behaves like a floating quantifier. Schematically, configurations of the type of (26) appear to be created, where R2 corresponds to the referential dependency between l'altro and its antecedent NP, while R1 expresses the dependency between l'uno and NP:

(26) NP ... l'uno ... l'altro   (R1 links l'uno to NP; R2 links l'altro to NP)

It is important to stress at this point that our purpose is not to give a full account of reciprocal constructions, either universally or for the Italian type. Rather, what I am interested in is whether l'uno l'altro in NP-internal position requires a subject-based definition of locality or rather our definition of barrier. What I will conclude is that l'uno behaves according to our definition of barrier, l'altro according to a subject-based definition of locality.


An issue that can be largely disregarded here is whether there is indeed a dependency corresponding to R1 in (26), or whether there is only a dependency corresponding to R1* in (27), created at LF by l'uno moving to take scope over NP. There is no doubt that (27) must be the LF for something like (26); the question is whether R1 has also an existence of its own, or only R1* does:

(27) l'uno_i (NP ... t_i ... l'altro)   R1*

Two observations are in order before we dismiss the issue. First, if l'uno at LF takes scope immediately over NP, the locality properties of R1* are exactly the same as the locality properties of R1. Thus (26) and (27) are equivalent in this respect. Second, accepting that something like R1* characterises the quantifier part of a reciprocal in English as well, as argued for instance in Heim et al. (1988), and that LF is not parameterised, the only hope of accounting for the discrepancies that we will see exist between Italian and English is at s-structure. Thus (26) and (27) are not equivalent in this respect, and there is perhaps a reason why R1 must be postulated as an s-structure dependency. Given this background, consider an NP in the object position of a sentence. L'uno can float either NP-externally or NP-internally. If l'uno floats NP-externally, the sentence is wellformed, provided NP is otherwise subjectless. Relevant examples are of the type in (28). Notice that in our examples the NP containing (part of) the reciprocal is systematically made into an accusative subject of a small clause, rather than into an object; this is to avoid as much as possible readings with the reciprocal taken as an argument of the verb:

(28) Quei pittori considerano l'uno [NP i ritratti dell'altro] ammirevoli
Those painters consider each the portraits of the other admirable

(28) does not choose among locality domains for l'uno, which I assume is VP-internal. By our definition of barrier its locality domain is VP, the first maximal projection that dominates it. The subject quei pittori ('those painters') is then predicted to be a possible antecedent for it, correctly. The same correct prediction, that the subject of the sentence is a possible antecedent for l'uno, follows however under a subject-based definition of governing category, since in this case the subject itself defines the governing category. Consider now l'altro. Under a subject-based definition of locality, the locality domain for l'altro is defined by the subject of the sentence, and
binding of l'altro from an NP-external position is correctly predicted to be possible. Under our definition of barrier, however, the locality domain for l'altro is NP; thus binding of l'altro by the subject of the sentence, which is NP-external, is incorrectly predicted to be impossible. Examples of the type of (28) seem then to argue in favour of a subject-based definition of locality for l'altro. The prediction is that adding a subject to the NP in (28) produces a sentence where l'altro cannot refer NP-externally, since NP is now the locality domain for l'altro under a subject-based definition as well. As l'uno is still NP-external and can only have an NP-external antecedent, this in turn should produce an ungrammatical sentence. The prediction seems to be correct, as in (29), where the pronominal subject of NP can indifferently be taken to be coreferential with the subject of the sentence, or not; judgements of this type are confirmed in Belletti (1983):

(29) *Quei pittori considerano l'uno [NP i loro ritratti dell'altro] ammirevoli
Those painters consider each their portraits of the other admirable

Consider now the cases, crucial to the determination of the locality domain for l'uno, where this floats NP-internally. These are exemplified in (30)-(31), where (30) differs from (31) in that an overt subject is present in NP:

(30) Quei pittori considerano [NP i loro ritratti l'uno dell'altro] ammirevoli
Those painters consider their portraits each of the other admirable

(31) Quei pittori considerano [NP i ritratti l'uno dell'altro] ammirevoli
Those painters consider the portraits each of the other admirable

(30) represents by far the easier of the two examples, though again it does not distinguish between locality domains for l'uno. The locality domain for l'uno is NP, both under our definition of barrier, since NP is the maximal projection that dominates l'uno, and under a subject-based definition of governing category, since NP has a subject. Hence the only possible antecedent for l'uno is the subject of NP, loro ('their'). This of course is also true for l'altro, which we have just seen to be associated with the subject-based definition of locality. The prediction correctly is that if the subject of NP, which is pronominal, is interpreted as coreferential with the subject of the sentence, so is the reciprocal; but not otherwise.


Consider now (31). Contrary to examples of the type of (28), which are generally judged wellformed, examples of the type of (31) give rise to contradictory judgements. Certainly (31) has a wellformed interpretation under which an empty or implicit subject of NP binds l'uno and l'altro, and this subject in turn may or may not refer to the subject of the sentence. Of course, this interpretation is irrelevant here, reducing essentially to that in (30), with an empty category or an implicit argument substituted for the lexical pronoun. The relevant interpretation is that under which l'uno and l'altro are bound by the subject of the sentence, but not by the subject of the NP; in other words, the admiration is reciprocal, not the portraying. If we accept the judgement in Belletti (1983), then under this interpretation examples of the type in (31) are illformed. This in turn cannot be predicted if the locality domain of l'uno is subject-based. For, under a subject-based definition, if NP has no subject, or no subject distinct from l'uno l'altro, the locality domain for l'uno is clearly the sentence. Hence binding of l'uno by the sentential subject is predicted to be possible, incorrectly. Suppose then we take our notion of barrier as defining the locality domain for l'uno. The barrier for l'uno is of course NP under our definition, since NP is a maximal projection that dominates it. L'uno must then be construed with an antecedent within NP. Since in sentences of the type of (31) its antecedent, the sentential subject, is NP-external, we correctly predict ungrammaticality. Thus, the locality domain for l'uno must be defined by our notion of barrier. Another prediction which follows from our hypothesis that Italian reciprocals are associated with our notion of barrier is that they can never be found in NP's which are in (nominative) subject position unless they are bound NP-internally. In this case, the facts are well established, as in (32):

(32) *Quei pittori pensano che [NP lo stile l'uno dell'altro] sia ammirevole
Those painters think that the style each of the other is admirable

There is of course no prohibition against having English reciprocals, or reflexives, in examples of the type in (32), which follows if they are associated with a subject-based definition of locality. On the other hand, examples of the type of (33) are predicted to be wellformed, if l'altro is associated with a subject-based locality domain:

(33) Quei pittori pensano l'uno che [NP lo stile dell'altro] sia ammirevole
Those painters think each that the style of the other is admirable


The status of (33) is extremely difficult to assess. It appears however that if there is a violation in (33) it does not give rise to uninterpretability judgements as (32) does. Thus we can tentatively take (33) to confirm our account.

4. PARAMETERS IN LOCALITY THEORY

If the conclusions in Manzini (1988; 1989), as summarised in section 1, are correct, chain dependencies are associated with the definition of barrier in (4). On the other hand, in section 2 I have argued that referential dependencies involving English anaphors and pronouns can be associated with our notion of barrier or with the subject-based definition of governing category in (5). In section 3 the existence of two separate locality domains corresponding to our notion of barrier and to the subject-based definition of governing category has further been argued for on the basis of the Italian reciprocal l'uno l'altro. Of course, the existence of more than one definition of locality for referential dependencies leads us to the issue of whether there is a locality parameter and whether the two notions of locality considered so far are values of this parameter. The parameters that we will be concerned with fall essentially into two types. The first type of parameter is what we can refer to as an ad hoc parameter, i.e. a parameter built into the theory of locality for the sole purpose of accounting for locality effects. The second type of parameter is a non-ad hoc one in the sense that though it derives locality effects, it is not associated with the theory of locality itself. The first type of parameter is found in Manzini and Wexler (1987), Wexler and Manzini (1987) and, before that, in Yang (1984). The theory's starting point is the subject-based definition of locality in (5). The idea is that as the definition in (5) refers to the notion of subject, so other definitions of locality can refer to other opacity-creating elements, such as I, finite Tns, referential (i.e. non-subjunctive) Tns, etc. Once the locality theory for chains is taken into consideration, as in section 1, the picture that emerges is that there is a basic definition of locality, as in (4), to which various opacity elements are added for referential dependencies. This is the picture arrived at in Koster (1986). The values of the locality parameter, i.e. the various opacity-creating elements, are not associated with languages but with single lexical items in a language. Thus the Icelandic reciprocal has much the same form and locality properties as English each other, but the Icelandic reflexive sig obeys altogether different locality constraints. This is consistent with the general hypothesis, first put forward in Borer (1984), that parameter learning is part of the learning of the lexicon of a language; a restrictive version
of it surfaces in Manzini and Wexler (1987), Wexler and Manzini (1987) as the Lexical Parameterisation Hypothesis (see Newson, this volume, for discussion). Thus the locality parameter is consistent with the one generally accepted restriction on parameters so far proposed. The various values of the locality parameter can indeed be conceived of as features, associated with lexical items in the way other features are. Furthermore, the values of the locality parameter define languages each of which is a subset, or a superset, of each of the others; they are therefore ordered in a markedness hierarchy by the Subset Principle. Notice that if this is the case the result that locality is not parameterised for traces, though it is for other elements, can be derived directly from the fact that parameters are associated with lexical items. For, if the unmarked value of a parameter is conceived of as the default value, empty categories are necessarily associated with it. Thus the fact that no variation ever involves traces appears to provide evidence that markedness hierarchies are present at least to the extent that the unmarked setting of a parameter is distinguished from the marked setting(s). In turn the definition of markedness on the basis of the Subset Principle is supported to the extent that it in fact predicts the correct value to be the unmarked one. Remember on the other hand that we ultimately intend to formulate the locality parameter so that (4) becomes the basis of a definition of locality to which the notion of subject, as in (5), and possibly other opacity-creating elements, can be added as different settings of a parameter. Under this picture another obvious definition of markedness can be suggested to apply, independent of the Subset Principle. Since all definitions of locality include (4) as one of their subparts, (4) is in fact universal. If so, (4) can be unmarked for the simple reason that it is part of universal grammar; other values are then marked for the simple reason that they can only be obtained by 'manual' alteration of the inbuilt programme, where the alteration takes the restrictive form of addition, and not deletion, of information. Again the association of chains with (4) as the unmarked setting for locality follows. Notice that the choice between the subset-based definition of markedness hierarchies and the one tentatively sketched here is an entirely empirical one, since it is obvious that their predictions must be at variance with one another. For instance, one of the striking properties of the markedness hierarchies defined by the Subset Principle is that they differ for anaphors and pronominals; in fact the markedness hierarchy for pronominals is the mirror image of the markedness hierarchy for anaphors. Under a definition of markedness of the type we are envisaging, (4) represents the unmarked value across any other linguistic categorisations. This, however, and other questions of a psycholinguistic nature, cannot be settled within the limits of this investigation. Rather, the question that
I intend to settle is the purely linguistic one, concerning the adequacy of ad hoc and non-ad hoc models of locality parameters. In short, if the discussion that precedes is correct, there is a parameterised definition of locality of the form in (34), where the requirements corresponding to the definition of barrier must be invariably satisfied and various additional opacity-creating elements form additional optional requirements:

(34) γ is a locality domain for α iff
     0. γ is a maximal projection, γ dominates α, (if α is g-marked, γ dominates the g-marker of α) and
     1. γ dominates a subject accessible to α; or
     2. etc.
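As an informal illustration of how (34) works, the sketch below (in Python; nothing of the sort appears in the original text) walks up a toy constituent tree, always imposing the requirements of clause 0 and switching the subject requirement of clause 1 on or off. Everything in it is stipulated for the purpose of the example rather than drawn from the text: maximal projections are simply nodes whose label ends in "P", c-command is approximated by "the node's mother dominates the other position", and the tree and the choice of the N pictures as g-marker are modelled on (16), John likes [pictures of himself].

class Node:
    def __init__(self, label, children=(), is_subject=False):
        self.label, self.children, self.is_subject = label, list(children), is_subject
        self.parent = None
        for c in self.children:
            c.parent = self

    def dominates(self, other):
        return other is not self and any(c is other or c.dominates(other) for c in self.children)

def all_nodes(root):
    yield root
    for c in root.children:
        yield from all_nodes(c)

def c_commands(x, y):
    # Simplified c-command: x's mother dominates y, and x does not dominate y.
    return x.parent is not None and not x.dominates(y) and x.parent.dominates(y)

def locality_domain(alpha, g_marker=None, subject_based=False):
    """First maximal projection satisfying clause 0 of (34), plus clause 1 if requested."""
    node = alpha.parent
    while node is not None:
        ok = node.label.endswith("P") and node.dominates(alpha)
        if ok and g_marker is not None:
            ok = node.dominates(g_marker)
        if ok and subject_based:
            ok = any(s.is_subject and c_commands(s, alpha) for s in all_nodes(node))
        if ok:
            return node
        node = node.parent
    return None

# Schematic tree for (16): [IP John [VP likes [NP pictures [PP of himself]]]]
himself = Node("himself")
pictures = Node("pictures")   # the N taken as g-marker of himself, as in the text
np = Node("NP", [pictures, Node("PP", [Node("of"), himself])])
ip = Node("IP", [Node("John", is_subject=True), Node("VP", [Node("likes"), np])])

print(locality_domain(himself, g_marker=pictures).label)                      # NP: the bare barrier
print(locality_domain(himself, g_marker=pictures, subject_based=True).label)  # IP: first domain with a subject

The two print statements reproduce the contrast discussed in section 2: under the bare-barrier setting the domain of himself in (16) is NP, while the subject-based setting extends it to the first category containing a subject.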

If the notion of government is defined in terms of locality domain, as in (34), rather than in terms of barrier, it follows that not only the conditions on movement, but also the Binding Theory, as in (12), can be formulated as government conditions, completing the unification of locality theory with respect at least to the notion of locality referred to. Let us then consider what we have referred to as the non-ad hoc approach to the locality parameter, represented notably by Pica (1987). According to Pica (1987), as well as Chomsky (1986a), lexical anaphors move at LF; Binding Theory, or ECP, holds of the anaphor-trace link, rather than of the anaphor and its antecedent. According to Pica (1987), there are essentially two types of anaphors, anaphors like each other or himself which are descriptively associated with a subject-based definition of locality, and anaphors of the type of Icelandic sig, that are descriptively associated with a referential Tns opacity element. The long-distance binding effects with anaphors such as Icelandic sig, whose opacity-creating element in terms of a definition like (34) is an indicative Tns, follow from the fact that they are X°'s and they move at LF from head to head. This leaves us then with anaphors of the type of himself or each other, which are XP's and whose movement is strictly local. The strict locality of the movement of these anaphors follows again from the fact that these anaphors move according to their categorial type, if Pica (1987) is correct. Unfortunately, the theory encounters serious execution problems. Consider in particular XP anaphors. If XP anaphors move according to the XP type, only two possibilities are open: either they move to A-position or they move to A'-position; but both options are highly problematic. If they move to A-position, it is indeed expected that they display strictly local binding effects; but in general there will be no A-position for them to move into. If they move to A'-position, then there appears to be absolutely no reason why they couldn't move successive-cyclically, thus producing long-distance binding effects once more.


I do not want to imply that these problems are insoluble; only that their solution will involve a complication of the grammar. If so, the potential simplicity argument for what I have called non-ad hoc theories would disappear. What is crucial to our argument however is that the parameter derived in Pica (1987) is a two-way parameter, between short-distance and long-distance dependencies. If Manzini and Wexler (1987), Wexler and Manzini (1987) are correct, there are of course many more values to the parameter; but leaving these aside, on the basis solely of the new evidence presented here, we must conclude that the parameter that is needed for observational adequacy is at least a three-way one, the short-distance value of Pica (1987) splitting into a subject-based and a non-subject-based value. Notice that a parameter to this effect can presumably be added to the theory in Pica (1987); but this only illustrates our point further, namely that a locality parameter is required in any case. In fact, a desirable property of the theory in Pica (1987) is that it links the locality domain of anaphors and pronouns with their ability or inability to take antecedents other than subjects. In particular, it appears to be a fact that anaphors whose opacity element is a subject do not necessarily have a subject as their antecedent; on the other hand, long-distance anaphors are subject-oriented. The link between long-distance binding and subject orientation follows if the landing site for a long-distance anaphor, which is an X° and moves head-to-head, is I. This still needs to be stipulated within the theory, but it at least provides a natural basis for linking antecedence and locality. By contrast, the theory in Manzini and Wexler (1987), Wexler and Manzini (1987) cannot derive the link between long-distance binding and subject orientation. Rather, the subject orientation of certain anaphors is treated as a second parameter, the antecedent parameter. Thus it is likely that at least this aspect of the theory in Pica (1987) is correct. This however leaves our present argument unchanged. Once more, our argument is simply that a crucial feature of the theory in Manzini and Wexler (1987), Wexler and Manzini (1987) must be retained, namely parameterised locality domains. Any modification of the original conception of the locality parameter in order to take into account at least the link with the antecedent parameter must in turn raise the question whether the argument in favour of the Subset Principle is preserved. In the meantime, however, our provisional conclusions are supportive of the Subset Principle. If the necessity of a parameter of the type in (34) is demonstrated, the original argument in favour of the Subset Principle stands at least for the time being.


REFERENCES

Abney, S. 1987. The English Noun Phrase in its Sentential Aspect. Doctoral Dissertation, MIT.
Belletti, A. 1983. On the Anaphoric Status of the Reciprocal Construction in Italian. The Linguistic Review 2.
Borer, H. 1984. Parametric Syntax. Dordrecht: Foris.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986a. Knowledge of Language: its Nature, Origin and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Massachusetts: MIT Press.
Heim, I., H. Lasnik and R. May. To appear. Reciprocity and Plurality. Ms. UCLA, University of Connecticut and UC Irvine.
Hyams, N. 1986. Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Kitagawa, Y. 1986. Subjects in Japanese and English. Doctoral Dissertation, University of Massachusetts.
Koster, J. 1986. Domains and Dynasties. Dordrecht: Foris.
Manzini, M. R. 1983. On Control and Control Theory. Linguistic Inquiry 14. 421-446.
Manzini, M. R. 1988. Constituent Structure and Locality. In A. Cardinaletti, G. Cinque and G. Giusti (eds.) Constituent Structure. Papers from the 1987 GLOW Conference, Annali di Ca' Foscari 27, IV.
Manzini, M. R. 1989. Locality. Ms. University College London.
Manzini, M. R. and K. Wexler. 1987. Parameters, Binding Theory and Learnability. Linguistic Inquiry 17. 413-444.
Pica, P. 1987. On the Nature of the Reflexivization Cycle. In Proceedings of NELS 17. GSLA, University of Massachusetts.
Sportiche, D. 1988. A Theory of Floating Quantifiers and its Corollaries for Constituent Structure. Linguistic Inquiry 19. 425-449.
Wexler, K. and M. R. Manzini. 1987. Parameters and Learnability in Binding Theory. In T. Roeper and E. Williams (eds.) Parameters in Linguistic Theory. Dordrecht: Reidel.
Yang, D.-W. 1984. The Extended Binding Theory of Anaphora. Theoretical Linguistic Research 1.

On the rhythm parameter in phonology*

Marina Nespor
University of Amsterdam

It is an observable fact that languages such as Spanish or Italian sound very different from, say, English or Dutch. It is this difference that made Lloyd James (1940) compare the sound of the first type of languages to the sound of a machine-gun and that of the second type of languages to a message in morse code. Lloyd James then went a step further by attributing this difference in sound to different types of rhythm: "machine-gun rhythm" and "morse code rhythm". This dichotomy was taken over by Pike (1945) who renamed the two types of rhythms syllable-timed and stress-timed. That is, Spanish would have a temporal organisation based on the regular recurrence of syllables and English one based on the regular recurrence of stresses.1 Abercrombie (1967) went even further by claiming that the rhythm based on the isochrony of syllables and that based on the isochrony of interstress intervals are the only two rhythms available for the languages of the world. The isochrony of syllables and that of interstress intervals are, moreover, mutually exclusive. They would be compatible only in an ideal language in which there were only one syllable type and in which stressed syllables were maximally alternating with unstressed ones. In fact, the existence of such a language has never been attested, so that, within this conception of rhythm, a language belongs either to one category or to the other. According to this view, having one type of rhythm rather than the other has many consequences for the phonology of a language. That is, a stress-timed language is characterised by a set of properties not present in syllable-timed languages, and vice versa. For example, if interstress intervals are to be isochronous in English, the syllables contained in an interval must be reduced in certain cases and stretched in other cases, to achieve the desired result. These processes would not have any reason to exist in Spanish where rhythm is supposedly not based on the regular recurrence of stress. The question that will be addressed in this paper is whether there are indeed two types of rhythm, that is, whether the machine-gun vs. morse code distinction is represented by the different settings of a single parameter in the phonology of rhythm. A very important prediction made by the postulation of such a parameter is that no language exists that shares some of the phonological characteristics
typical of stress-timed languages and some typical of syllable-timed languages. That is, within a theory in which one of the two mutually exclusive types of rhythms is at the origin of a series of phonological processes, it is impossible to have a system whose sound is neither that of a machine-gun, nor that of a morse code, but rather intermediate between the two. This view of rhythm has, in addition, implications for learnability: if there is a parameter for rhythm, it should be possible to find evidence that this parameter is set in one of two ways on the basis of primary linguistic data. That is, a child would select one of the two types of rhythms depending on the language he is exposed to and would subsequently develop the set of phonological rules that belong to that particular system. In this paper, following a suggestion by Dasher and Bolinger (1982), I will take a view that is the opposite of the one discussed so far. The basic idea is that the machine-gun vs. morse code distinction which characterises the rhythm of Spanish on the one hand and that of English on the other hand is not the result of two different settings of one single parameter. I will argue that the phonology of rhythm does not contain a parameter that accounts for the machine-gun vs. morse code distinction. This distinction is the result of a series of nonrhythmic phonological processes rather than the cause of these processes. If a specific set of phonological processes coexist in a certain system, the language in question gives the machine-gun impression; if another set of processes coexist, the morse code effect arises. The problem we are confronted with is, first of all, how to empirically distinguish the two alternative proposals. It is important to observe that if there are independent reasons for the clustering together, in a given phonological system, of the specific phonological processes that characterise stress rhythm or of those that characterise syllable rhythm, then the task of phonologically determining which of the two theories has greater empirical adequacy would be a very hard one. If, however, no such independent reasons exist, the nonparametric approach (henceforth "Theory 2" (T2)) makes a prediction whose verification represents a falsification of the parametric approach ("Theory 1" (T1)). That is, the machine-gun and the morse code types of languages would be extreme cases at the two ends of a continuum in which there are languages that share some nonrhythmic phonological properties with both and whose rhythm, consequently, is perceived as neither stress-timed nor syllable-timed. If this prediction of T2 is empirically confirmed, then the question arises as to whether it is appropriate to speak of two different types of rhythmic organisation. If rhythmic organisation follows general principles in all languages then we would expect to have similar processes in the phonology of rhythm of both so-called syllable-timed and stress-timed types of
languages. Alternatively, we would expect them to have two quite distinct rhythmic subcomponents. Additional empirical tests that would help us choose between T1 and T2 come from the area of language acquisition: if T1 is on the right track, the machine-gun vs. morse code differences should be audible in the speech of children from the first stages of the acquisition of language. If, on the other hand, T2 is on the right track, the difference between the two should be clear only after all the phonological processes that give rise to the machine-gun vs. morse code effect have been developed. In this paper, I will argue that T2 is preferable for reasons of empirical adequacy.2 To this end, I will first present the results of phonetic experiments carried out by a number of phoneticians, which indicate that the dichotomy stress-timing vs. syllable-timing is neither based on any measurable physical reality, nor confirmed by data on perception (cf. section 1). I will then present phonological evidence that the relation of causality between a certain type of "rhythm" and the existence of certain nonrhythmic phonological processes is as predicted by T2 (cf. section 2). In section 3, I will argue in favour of a unified rhythmic subcomponent of phonology for Italian and English, two typical examples of syllable-timing and stress-timing, respectively, for the supporters of T1. The conclusion will then be drawn that the rhythmic organisational principles of the two groups of languages are the same and that the element that regularly recurs to give the impression of "order [...] in movement" (cf. Plato, The Laws, book II: 93) is stress. No justification is thus left for the classification of languages into stress-timed and syllable-timed (cf. also den Os, 1988).

1. PHONETIC EVIDENCE AGAINST TWO TYPES OF TIMING

The dichotomy stress-timed and syllable-timed has largely been taken for granted since Pike (1945), although already from the early sixties many studies devoted to the issue have called into question the physical basis of this dichotomy. Shen and Peterson (1962), O'Connor (1965) and Lea (1974), for example, have shown, with different types of experiments, that in English, interstress intervals increase in duration in a manner that is directly proportional to the number of syllables they contain. Bolinger (1965), besides showing that the isochrony of interstress intervals in English is not a physical reality, finds that the length of the intervals is influenced not only by the number of syllables they contain, but, among other factors, also by the structure of the syllables and the position of the interval within the utterance. More recently, Roach (1982) carried out some experiments to test two claims made by Abercrombie (1967:98): first, that there is variation in
syllable length in a stress-timed language as opposed to a syllable-timed language, second that in the latter type of languages, stress pulses are unevenly spaced. The languages that form the empirical basis of the experiments are precisely those mentioned by Abercrombie: French, Telugu and Yoruba as examples of syllable-timing, and English, Russian and Arabic as examples of stress-timing. The first claim made by Abercrombie is not supported by the results, since deviations in syllable duration are very similar in all six languages. As far as the second claim is concerned, the results even contradict it, in that deviations in interstress intervals are higher in English than in the other languages. Borzone de Manrique and Signorini (1983) investigated a "syllable-timed" language, (Argentinian) Spanish, and showed a) that syllable duration is not constant but varies depending on various factors and b) that interstress intervals tend to cluster around an average duration. Their conclusion is that Spanish has a tendency to stress alternation. In an article that reports the results of experiments carried out both on "stress-timed" and on "syllable-timed" languages (Dauer, 1983) it is shown that the duration of interstress intervals is not significantly different in "stress-timed" English, on the one hand, and "syllable-timed" Spanish, Italian and Greek, on the other hand. Dauer thus suggests that the timing of stresses reflects universal properties of language organisation (cf. also Allen, 1975). den Os (1988) contains a comparative study of rhythm in Dutch and Italian. She measures interstress intervals in the two languages and shows that, given intervals with the same number of syllables and syllables with the same number of phonemes, there is no difference in duration in the two languages. That is, if the phonetic content of the two languages is kept similar, then their rhythm is similar. This amounts to saying that it is the phonetic material of a string rather than a particular timing strategy that gives rise to the perception of a different temporal structure in Dutch and Italian. From all these studies3 two important conclusions may be drawn. First, the isochrony of interstress intervals and syllables does not exist in the physical reality of "stress-timed" and "syllable-timed" languages, respectively. Second, the two groups of languages do not show significant differences in their temporal organisation. These conclusions indicate that there is no acoustic support for the rhythmic nature of the dichotomy stress-timed and syllable-timed. Lehiste (1973, 1977), following a suggestion by Classe (1939), proposes that isochrony, though not detectable in the physical message, could characterise the way in which language is perceived. Specifically, the intuition speakers have about the isochrony of interstress intervals in English may be based on a perceptual illusion. That is, the tendency of listeners
to hear such intervals as more isochronous than they really are (cf. also Donovan and Darwin, 1979, Darwin and Donovan, 1980) might suggest the presence of an underlying rhythm that imposes itself on the phonetic material. In other words, the (more or less) regular recurrence of stresses would be part of the rhythmic competence of native speakers of English. A similar conclusion is reached by Cutler (1980a) on the basis of syllable omission errors. These speech errors tend to produce sequences whose interstress intervals are more regular than they are in the original target sentence (cf. also Cutler, 1980b). These results are very interesting for the present discussion in that T1 and T2 make different predictions about perception as well. According to T1, first, the tendency exhibited by native speakers of English to regularise interstress intervals should be extraneous to native speakers of "syllable-timed" languages since stress would supposedly not play any role in their rhythmic organisation; second, the native speakers of syllable-timed languages should have the tendency to perceive syllables as more isochronous than they really are. As far as the latter prediction of T1 is concerned, there are, to my knowledge, no perception experiments on syllable-timed languages that would parallel those just mentioned for English. It is, however, interesting to notice that most claims about Spanish, French, Italian or Yoruba having syllables of similar length are made by native speakers of English, not by native speakers of syllable-timed languages, the ones that supposedly should most feel this type of regularity. Concerning the first prediction of T1, important results have been reached by Scott, Isard and de Boysson-Bardies (1985), who found that native speakers of "syllable-timed" French and of "stress-timed" English behave in the same way: they both hear the intervals in between stressed syllables as more regular than they actually are. While the similar behaviour of French and English listeners in the perception of linguistic rhythm contradicts the first prediction of T1 mentioned above, it is just what T2 would predict: since language is temporally organised according to universal principles, these should have similar effects in the perception of all languages. These results thus indicate that there is no perceptual support for different underlying rhythmic systems for stress-timed and syllable-timed languages.
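The quantities at stake in the production studies reviewed at the beginning of this section are simple to state. As a purely illustrative sketch (the durations below are invented, not taken from any of the studies cited), one can ask of a set of measured interstress intervals how strongly their duration grows with the number of syllables they contain, and how variable they are around their mean; strict stress-timing would require a slope near zero milliseconds per syllable and a small coefficient of variation, whereas the production studies reviewed above find neither.

# Toy isochrony check over invented measurements.
# Each pair is (number of syllables in the interstress interval, duration in ms).
intervals = [(1, 210), (2, 380), (3, 560), (2, 400), (4, 720), (1, 230), (3, 585)]

ns = [n for n, _ in intervals]
ds = [d for _, d in intervals]
mean_n = sum(ns) / len(ns)
mean_d = sum(ds) / len(ds)

# Least-squares slope of duration against syllable count:
# close to 0 ms per syllable under strict interstress isochrony.
slope = sum((n - mean_n) * (d - mean_d) for n, d in intervals) / \
        sum((n - mean_n) ** 2 for n in ns)

# Coefficient of variation of interval durations:
# small under strict isochrony, large if duration tracks syllable count.
cv = (sum((d - mean_d) ** 2 for d in ds) / len(ds)) ** 0.5 / mean_d

print(f"slope: {slope:.0f} ms per syllable; CV of interval durations: {cv:.2f}")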


2. PHONOLOGICAL EVIDENCE AGAINST TWO TYPES OF RHYTHM

2.1. Nonrhythmic characteristics of "stress-timed" and "syllable-timed" languages

If "stress-timed" and "syllable-timed" languages do not differ in their underlying rhythmic organisation, a different explanation of the machinegun vs. morse code effect is called for. That is, if the isochrony of interstress intervals and of syllables does not exist in the physical reality of the different languages, the question to be asked is which other physical characteristics of the two types of languages are responsible for the fact that they are perceived as either "stress-timed" or "syllable-timed" (cf. Lehiste, 1977). Dauer (1983), an important contribution to T2, indicates three factors that would contribute to give the illusion of different temporal organisations: syllabic structure (cf. also Bolinger, 1962), vowel reduction and the various physical correlates of stress. As far as the syllable is concerned, in stress-timed languages there is, according to Dauer, a greater variation in syllable types and thus in their length than there is in syllable-timed languages. In English, for example, the most common syllables consist of a minimum of one and a maximum of seven segments that result in 16 syllable types 4 . In Dutch, another language classified as stress-timed, the same amount of segments per syllable yields 19 most common syllable types. In Spanish, on the other hand, the most common syllables contain from 1 to 5 segments that result in 9 syllable types, and in both Italian and Greek there are up to 5 segments in a syllable and a total of 8 most common types of syllable. A reduced variation in syllable complexity is partly responsible for the fact that Spanish, Italian and Greek give the impression of having more or less isochronous syllables in comparison to languages with a much larger variation in syllable complexity. Dauer observes that, in addition, more than half of the Spanish and French syllables are of the CV type. Similar results are reached by Bortolini (1976) for Italian: over 60% of the syllables are CV. The fact that the large majority of syllables in these three languages are open contrasts with the situation in English and Dutch, where there is a greater distribution of occurrence of the different types of syllables and where open syllables are by no means the majority. These statistical observations provide a second clue as to why Spanish, Italian or French give the impression of having syllables of similar length when compared with English or Dutch. In addition, it is observed by Dauer that "stress-timed" languages have a strong tendency, opposed to only a slight tendency in "syllable-timed" languages, for heavy syllables to be stressed and for light syllables to be stressless. Since duration is one of the physical correlates of stress, syllable

On the rhythm parameter in phonology

163

weight and stress reinforce each other in some languages much more than in others. The second phonological factor that, according to Dauer, characterises "stress-timed" English, Swedish and Russian as opposed to Spanish, Italian or Greek, is the reduction of stressless vowels. A phenomenon that is instead widespread in "syllable-timed" languages is the deletion of one of two adjacent vowels. The important difference between the two processes for the present discussion is that while a syllable whose vowel undergoes reduction retains its syllabicity, a syllable whose vowel undergoes deletion disappears. Very short syllables thus arise in "stress-timed" languages but not in "syllable-timed" languages. Thus, the lack of vowel reduction in Spanish, Italian and Greek also contributes to the impression of syllable isochrony in these languages. The presence of it in English or Dutch, instead, is partly responsible for the impression that the stressed syllables recur at regular intervals. That is, the fact that stressless syllables are reduced and thus shortened, together with the fact that they are shorter than stressed syllables to begin with, makes them so much less prominent than the syllables that carry stress, and the impression is created that a sequence of stressless syllables occupies a more or less constant amount of time, independently of how many syllables it contains.5 Finally, stress has a greater lengthening effect in English than it has in Spanish (cf. Dauer, 1983). This is one more characteristic that makes the difference in duration between stressed and stressless syllables much greater in the former language than in the latter, thus reinforcing the illusion of regular recurrence of stresses and syllables, respectively. Now that the nonrhythmic processes have been identified that are present in the languages most often used as examples of either stress-timing or syllable-timing, it must be demonstrated that the causality relation between rhythmic and nonrhythmic phonology supports T2. I turn to this task in the next section.

2.2. On the existence of intermediate systems

As was mentioned in the introduction, the prediction made by T2 is that either there are independent reasons why different timing-related phonological processes coexist in a given system, or else languages would exist that are intermediate between "stress-timed" and "syllable-timed" languages as far as the perception of their temporal structure is concerned. In the first case, the division of languages into two groups would be justified independently of whether a certain type of rhythm triggers the application of phonological rules or certain types of rules produce a certain rhythmic effect. I am not aware, however, of any reason why vowel reduction, a rich syllable structure and certain phonetic correlates of stress should coexist in one phonological system. And, in fact, there are systems
Catalan has such a phonological system: it has 12 most common syllable types, consisting of a minimum of 1 and a maximum of 6 segments. It is thus, in this respect, intermediate between Italian, Greek and Spanish on the one hand, with 8 to 9 syllable types and a maximum of 5 segments per syllable, and Dutch and English on the other, with 16 to 19 syllable types and a maximum of 8 segments per syllable. Catalan has, in addition, a rule that centralises unstressed vowels, thus reducing their syllables (cf. Mascaro, 1976, Wheeler, 1979), a rule typical, as we have seen, of stress-timed languages. Its phonology also contains a rule that deletes one of two adjacent vowels under certain conditions (cf. Mascaro, 1989), a rule typical of "syllable-timed" languages, according to Dauer (1983). As far as stress is concerned, there is no strong tendency for it to fall on heavy or long syllables. In this respect, Catalan is thus more similar to Italian or Spanish than it is to English or Dutch. Not surprisingly, Catalan is neither machine-gun like nor morse code like.

A second language that has been described as neither stress- nor syllable-timed is Portuguese. Brazilian Portuguese, in particular, has a repertoire of syllable types similar to that of "syllable-timed" languages, with a minimum of one and a maximum of 5 segments per syllable, but shows several simplifications of syllable structure when the syllable is not stressed (cf. Major, 1985): for example, unstressed vowels are often raised and thus shortened, and diphthongs are reduced to monophthongs when unstressed. It is because of these characteristics, and because of a tendency to regularly alternate strong and weak syllables (cf. Maia, 1981), that Portuguese has been said to be a language whose rhythm is changing from syllable-timed to stress-timed (cf. Major, 1981).

Polish also appears to be an intermediate case: it has a very complex syllable structure as well as alternating rhythmic stress (cf. Rubach and Booij, 1985), but no rule of vowel reduction at normal rates of speech. Vowels are reduced in fast speech; this is, however, a phonetic process that is not typical of "stress-timed" languages only, but takes place in "syllable-timed" languages as well (cf. den Os, 1988). Again, it is not surprising that Polish is considered stress-timed by some linguists (e.g. Rubach and Booij, 1985) and syllable-timed by others (Hayes and Puppel, 1985).⁶

The existence of languages whose temporal structure is in between that of "stress-rhythm" and that of "syllable-rhythm" cannot be accounted for within T1. It is, instead, exactly what we expect, given T2.

2.3. On the development of rhythm

We have seen, in section 1, that the classification of languages into those with a stress-based type of rhythm and those with a syllable-based rhythm does not correspond to any physical reality. In section 2.1., we have seen that there are several nonrhythmic phenomena typical of languages classified as stress-timed and others typical of languages classified as syllable-timed that may very well be at the origin of the perception of different rhythms in the two types of languages. In order to show that the phonological processes are indeed the cause, and not the effect, of the machine-gun and morse code effects, we have pointed to the existence of languages that have some phonological processes in common with "stress-timed" languages and some with "syllable-timed" languages, and whose rhythm is perceived neither as stress-timed nor as syllable-timed. In this section we will examine another piece of evidence in favour of T2, based on the acquisition of phonology.

If two types of rhythmic organisation were available for the languages of the world, we would expect the different rhythms to be acquired quite early in the phonological development of a child. That is, the rhythm parameter should be set before the acquisition of the phonological rules triggered by one specific type of rhythm. One such rule, for "stress-timed" languages, would be vowel reduction, the intensity of which would have to be directly proportional to the number of syllables contained in an interstress interval. If, on the other hand, rhythm is not parametric, but rather a universal organising element in language, we would expect that in the first stages of language acquisition, when the phonological system of a language is not yet completely developed, the surface temporal patterns of speech would be more similar for speakers of "stress-timed" and of "syllable-timed" languages than they are at later stages. That is, before the development of the phonological rules and structures that give the illusion of different temporal organisations in different languages, such an illusion should not exist.

One of the characteristics of the first stages of language acquisition, for example, is a very uniform syllable structure. At this stage, one would thus expect the difference in the number of occurring syllable types in the speech of, say, English and Italian children not to be as large as it is in the speech of adult speakers, and thus also the temporal structures of their speech to be more similar than at later stages. It has, in fact, been observed in Allen and Hawkins (1975), an experimental study on the development of rhythm in native speakers of English, that children's first utterances contain only heavy syllables, in the sense that all vowels are fully articulated. The lack of syllable reduction makes the rhythm at this stage of language acquisition sound syllable-timed (cf. Allen and Hawkins, 1975).

It is the acquisition of the reduction processes that contributes to the development of adult rhythm. Once more, we are confronted with data that are accounted for within T2, but that are not explainable within T1.

From the observations presented in sections 1 and 2, the conclusion must be drawn that T2 is superior to T1 for both phonetic and phonological reasons. That is, no motivation has been found in favour of different temporal organisations in language; rather, the evidence speaks against them.

3. THE PHONOLOGY OF RHYTHM: ARGUMENTS FOR A UNIFIED RHYTHMIC COMPONENT

3.1. The metrical grid in English and Italian

Given the conclusion of the previous sections, we expect the rhythmic subcomponent of so-called stress-timed languages not to differ from that of syllable-timed languages. The present section, as well as the following three, is devoted to arguments in favour of a nonparametric rhythmic subcomponent of phonology. The languages on which the discussion will be based are English and Italian.

It has been suggested in Selkirk (1984) that the difference between stress-timed and syllable-timed languages is incorporated at the basic level of the metrical grid, the representation of rhythm. Specifically, while in the grids of both English and Italian each syllable in the linguistic material corresponds to an x at the first grid level, at the second, or basic, level a distinction is made between the two languages: in English, only those syllables that have some degree of stress are assigned an x, while in Italian every syllable is assigned an x, independently of whether it is stressed or stressless (cf. (1)a and (1)b, respectively; Selkirk's examples).

(1)  a.      x              x
        x    x    x  x      x
        the  man  a  ger's  here

     b. x   x   x   x
        x   x   x   x
        il  po  po  lo

In this way, the observation that Italian syllables are more or less isochronous is incorporated in the representation of rhythm. Since, however, the length of a syllable depends crucially on the number of segments it contains, both in Italian and in English, and since the number of segments per syllable can vary in Italian, though less than in English, the representation proposed by Selkirk for Italian is not a reflection of physical reality. The results of an experiment described in den Os (1988) indicate, in addition, that representing the timing of "stress-timed" and "syllable-timed" languages in different ways does not reflect the way in which the two types of languages are perceived either.

"syllable-timed" languages in different ways does not reflect the way in which the two types of languages are perceived either. The two languages on which den Os's experiment is based are Italian and Dutch. A Dutch and an Italian text, similar in syllable composition were recorded and then delexicalised by means of low-pass filtering (cf. den Os, 1988: 40). These utterances formed the material for one experiment. The two texts were then devoided of their melody. The utterances without intonation formed the material for a parallel experiment. Both intonated and monotone versions of the two texts were presented to native speakers of Dutch that had to determine whether what they were hearing was originally Dutch or Italian. While the subjects very often correctly identified the two languages when confronted with the intonated texts, they were absolutely unable to do so with the monotone versions. Since the rhythmic patterns of the two languages was not modified in any way, the results of this experiment show that Italian and Dutch do not differ as to their rhythmic organisation. It is the segmental material that fills the syllables that gives the illusion of different rhythms. This experiment convincingly shows that the two different metrical grids for Italian and English proposed by Selkirk are not a reflection of the perception of rhythm. The conclusion must thus be drawn, I believe, that if the metrical grid is to represent rhythm, it should not make any distinction between stress-timed and syllable-timed languages. Therefore, the second grid level will only contain one x for every prominent syllable in both types of languages (cf. also Roca, 1986). The first two levels of the grid of il popolo are thus as in (2).

(2)      x
     x   x   x   x
     il  po  po  lo
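
Purely by way of illustration, the construction of the first two grid levels under this unified view can be sketched procedurally: one x per syllable at the first level, and an x only above prominent syllables at the second, for "stress-timed" and "syllable-timed" languages alike. The following Python fragment is a minimal sketch under that assumption; the function name, data format and example are hypothetical and form no part of the original proposal.

# Illustrative sketch only: first two grid levels under the unified view,
# with one x per syllable at level 1 and an x only above prominent
# syllables at level 2. Function name and data format are hypothetical.

def two_level_grid(syllables):
    """syllables: list of (syllable_text, is_prominent) pairs."""
    widths = [len(text) for text, _ in syllables]
    level2 = "  ".join(("x" if prominent else " ").ljust(w)
                       for (_, prominent), w in zip(syllables, widths))
    level1 = "  ".join("x".ljust(w) for _, w in zip(syllables, widths))
    text_line = "  ".join(text.ljust(w) for (text, _), w in zip(syllables, widths))
    return "\n".join([level2.rstrip(), level1.rstrip(), text_line])

# Italian "il popolo", with prominence on the stressed syllable only, as in (2);
# English "the manager's here" would be built in exactly the same way.
print(two_level_grid([("il", False), ("po", True), ("po", False), ("lo", False)]))

Run as given, the fragment prints a grid of the same shape as (2); only the prominence flags distinguish one language's input from another's, which is precisely the point of the unified representation.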

I will now turn to some observations about certain rules of rhythm in English and Italian, as well as about the structures that constitute arhythmic configurations in the two languages.

3.2. The Rhythm Rule in English and Italian

The phonology of English includes a rhythmic process whose effect is that of eliminating arhythmic configurations consisting of word primary stresses on adjacent syllables, the so-called stress clash (cf. Liberman and Prince, 1977). The phenomenon is usually accounted for by a rule that moves the leftmost of the two stresses to the next syllable with some prominence.

The application of the rule is illustrated in (3), where "´" marks word primary stress.⁷

(3)  a.  thirtéen    vs.  thírteen mén
     b.  Tennessée   vs.  Ténnessee áir

The same process is present in the phonology of Italian, as Nespor and Vogel (1979) have shown (cf. (4)). (4)

a. b.

ventitré 'twentythree' si presenterà '(it) will be presented'

vs. véntitre gradi 'twentythree degrees' vs. si présentera bène '(it) will be well presented'

It has, in addition, been shown that the domain within which the rules apply is identical in the two languages (cf. Selkirk, 1978, Nespor and Vogel, 1982, 1986). As shown in (5) and (6) for Italian and English, respectively, this domain coincides with the phonological phrase [