Learning: Rule Extraction and Representation 9783110803488, 9783110161335




Learning: Rule Extraction and Representation

Editors
Angela D. Friederici
Randolf Menzel

Walter de Gruyter · Berlin · New York 1999

Editors

Professor Dr. phil. Angela D. Friederici
Max Planck Institute of Cognitive Neuroscience
Stephanstr. 1a
D-04103 Leipzig
Germany

Professor Dr. rer. nat. Randolf Menzel
Institute of Neurobiology
Free University Berlin (FU)
Königin-Luise-Str. 28-30
D-14195 Berlin
Germany

With 47 figures and tables. This project has been sponsored by the Berlin Brandenburg Academy of Science and the Max Planck Institute of Cognitive Neuropsychology, Leipzig. © Printed on acid free paper which falls within the guidelines of the ANSI to ensure permanence and durability. Library of Congress

Cataloging-in-Publication-Data

Learning : Rule extraction and representation / editors, Angela D. Friederici, Randolf Menzel. Includes bibliographical references. ISBN 3-11-016133-8 (cloth : alk. paper) 1. Learning. 2. Neuropsychology. 3. Cognitive neuroscience. 4. Evoked potentials (Electrophysiology) I. Friederici, Angela D. II. Menzel, Randolf, 1940— QP408.L44 1999 612.8'.2—dc21 98-51761 CIP

Die Deutsche Bibliothek —

Cataloging-in-Publication-Data

Learning : Rule extraction and representation / ed. Angela D. Friederici ; Randolf Menzel. — Berlin; New York : de Gruyter, 1999 ISBN 3-11-016133-8

© Copyright 1999 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany. Typesetting and Printing: Arthur Collignon GmbH, Berlin. Binding: Lüderitz & Bauer, Berlin. Cover Design: Hansbernd Lindemann, Berlin. Cover Illustration: Ruth Tesmar, Berlin.

Contents

Introduction (Angela D. Friederici and Randolf Menzel) ... XI

I. Rule Acquisition and Representation of Structured and Unstructured Information

1. Words as Programs of Mental Computation (Manfred Bierwisch)
1.1 Language as a Species Specific Capacity — Conceptually Necessary Assumptions ... 3
1.2 Structure and Interpretation of Linguistic Representations ... 7
1.3 Lexical Items and the Combinatorial Principles of I-Language ... 14
1.4 Further Computational Aspects of Linguistic Expressions ... 24
1.5 Summary and Perspectives ... 32

2. Discovering Grammar: Prosodic and Morpho-Syntactic Aspects of Rule Formation in First Language Acquisition (Barbara Höhle and Jürgen Weissenborn)
2.1 Introduction ... 37
2.2 The Prosodic Bootstrapping Account ... 41
2.2.1 Prosodic Information as a Cue to Syntax ... 41
2.2.2 Children's Sensitivity to Prosodic Information ... 43
2.2.3 Prosodic Information in Children's Speech Processing ... 44
2.2.4 Limits of Prosodic Bootstrapping ... 46
2.3 Distributional Learning in the Acquisition of Morpho-Syntax ... 48
2.3.1 Non-Prosodic Distributional Information as a Cue to Syntax ... 48
2.3.2 Children's Sensitivity to Non-Prosodic Information ... 49
2.3.3 Closed-class elements as early anchorpoints for segmentation and classification ... 52
2.3.4 Sensitivity to Closed-Class Elements in Preverbal Children ... 54
2.3.5 Sensitivity to Syntactic Restrictions Associated with Closed-Class Elements ... 57
2.4 Conclusions ... 61
References ... 63

3. Rule-Application During Language Comprehension in the Adult and the Child (Anja Hahne and Angela D. Friederici)
3.1 Introduction ... 71
3.2 ERPs as a Method for Examining Language Comprehension Processes ... 72
3.3 ERPs and Semantic Processing ... 72
3.4 ERPs and Syntactic Processing ... 74
3.4.1 P600 ... 74
3.4.2 Left Anterior Negativities ... 75
3.4.2.1 Influence of proportion ... 75
3.4.2.2 Influence of semantics ... 77
3.4.2.3 Influence of task ... 78
3.4.2.4 Influence of syntactic preferences ... 80
3.4.2.5 Influence of prosody ... 81
3.4.3 A Neurocognitive Model of Language Comprehension ... 84
3.5 ERPs in Language Development ... 85
References ... 86

4. Learning, Representation and Retrieval of Rule-Related Knowledge in the Song System of Birds (Dietmar Todt, Henrike Hultsch and Roger Mundry)
4.1 Introduction ... 89
4.2 Song Learning: Acquisition of Rule-Related Knowledge ... 91
4.2.1 Methodological Aspects ... 92
4.2.2 Discontinuous versus Incremental Processes ... 93
4.2.3 Preordained Knowledge: Acquisition of Song Structure ... 96
4.2.4 Hierarchical Organization: Implementation of Levels ... 100
4.2.5 Extraction of Cues Encoded in a Learning Design ... 101
4.3 Retrieval of Rule-Related Knowledge: Evidence from Song Performance ... 103
4.3.1 Rules reflected by trajectories of song development ... 104
4.3.2 Repertoire Modification: Open versus Closed Processes ... 105
4.3.3 Experimental Examination of Song Retrieval in Adult Birds ... 106
4.4 Conclusions: Processing of Rule-Related Knowledge in a Songbird ... 108
4.4.1 Song Acquisition and Memory Mechanisms ... 108
4.4.2 Hierarchical Representation Format and Retrieval Rules ... 109
4.4.3 Comparative Aspects ... 110
References ... 111

5. Representation and Learning of Structure in Perceptuo-Motor Event-Sequences (Jascha Rüsseler and Frank Rösler)
5.1 Introduction ... 117
5.2 The SRT-Learning Task ... 118
5.2.1 Awareness of Stimulus-Structure in the SRT-Task ... 118
5.3 Neural Representation of Sequence Knowledge ... 121
5.3.1 Sequence Learning in Subjects with Explicit Memory Deficits ... 121
5.3.2 Sequence Learning in Patients with Striatal Dysfunction ... 122
5.3.3 Neuroimaging Studies of Sequence Learning ... 123
5.4 Theoretical Accounts of Implicit Sequence Learning ... 125
5.4.1 Attentional versus Non-Attentional Learning Mechanisms ... 125
5.4.2 Role of Perceptual and Motor Processes in Serial Learning ... 126
5.4.2.1 ERP-correlates of stimulus evaluation processes ... 128
5.4.2.2 ERP-correlates of response preparation ... 129
5.4.2.3 ERP-studies of sequence learning ... 130
5.4.3 Connectionist Models of Sequence Learning ... 133
5.5 Conclusions ... 133
References ... 134

6. Imposing Structure on an Unstructured Environment: Ontogenetic Changes in the Ability to Form Rules of Behavior Under Conditions of Low Environmental Predictability (Peter A. Frensch, Ulman Lindenberger and Jutta Kray)
6.1 Introduction ... 139
6.2 The Concept of Cognitive Control ... 140
6.2.1 The Psychological Reality of Cognitive Control ... 141
6.2.1.1 Psychological support ... 142
6.2.1.2 Neuropsychological support ... 143
6.3 Age Differences in Cognitive Control ... 145
6.3.1 Frontal-Lobe Development ... 145
6.3.2 Coordination and Separation ... 145
6.3.3 Inhibition and Monitoring ... 147
6.3.4 Summary ... 147
6.4 Age-Related Changes in the Ability to Form Rules of Behavior Under Conditions of Low Environmental Predictability ... 148
6.4.1 The Continuous Monitoring Task ... 148
6.4.2 The Measurement of Monitoring Accuracy ... 149
6.4.3 Design of Experiment ... 150
6.4.4 Characteristics of Sample ... 151
6.4.5 Main Findings ... 152
6.5 Are Age Differences in Fluid Intelligence Predictive of Age Differences in the Ability to Generate Rules of Behavior under Conditions of Low Environmental Structure? ... 154
6.6 Summary and Conclusions ... 156
References ... 158

II. Perception and Representation of Visual-Spatial and Temporal Information

7. Motion Perception and Motion Imagery: New Evidence of Constructive Brain Processes from Functional Magnetic Resonance Imaging Studies (Rainer Goebel, Lars Muckli and Wolf Singer)
7.1 Introduction ... 165
7.1.1 Two Main Visual Processing Pathways ... 165
7.1.2 Functional Magnetic Resonance Imaging ... 166
7.1.3 Methodological Details of the fMRI Measurements ... 167
7.2 fMRI Experiments ... 169
7.2.1 Apparent Motion ... 170
7.2.2 Motion Imagery ... 173
7.2.3 Perception of Transparent Motion ... 178
7.3 Conclusions ... 182
References ... 183

8. Recognition Memory of Objects and Spatial Locations: Figural and Verbal Representations (Axel Mecklinger and Volker Bosch)
8.1 Introduction ... 187
8.1.1 Figural and Verbal Representational Codes ... 188
8.1.2 The ERP Approach ... 189
8.2 Recognition of Familiar Objects and Spatial Locations Based on Pictures and Words ... 190
8.2.1 Methods ... 191
8.2.1.1 Subjects ... 191
8.2.1.2 Stimuli ... 191
8.2.1.3 Procedure ... 192
8.2.1.4 EEG recording and statistical analyses ... 192
8.2.2 Results ... 193
8.2.2.1 Behavioral data ... 193
8.2.2.2 ERP indices of object recognition ... 194
8.2.2.3 ERP indices of spatial recognition ... 197
8.2.3 Discussion ... 197
8.2.3.1 Object recognition based on pictures and words ... 198
8.2.3.2 Spatial recognition based on pictures and words ... 198
8.3 Recognition of Unfamiliar Objects and Spatial Locations ... 199
8.3.1 Methods ... 200
8.3.1.1 Subjects ... 200
8.3.1.2 Stimuli ... 200
8.3.1.3 Procedure ... 201
8.3.1.4 EEG recording and statistical analyses ... 202
8.3.2 Results ... 202
8.3.2.1 Behavioral data ... 202
8.3.2.2 Object-based versus spatially-based judgments ... 202
8.3.2.3 Visualizers and verbalizers ... 204
8.3.3 Discussion ... 206
8.4 General Discussion ... 207
8.4.1 Figural and Verbal Representational Codes for Objects ... 207
8.4.2 Figural and Verbal Representational Codes for Spatial Locations ... 209
8.5 Conclusions ... 210
References ... 210

9. Memory for Time: Separating Temporal from Spatial Information Processing (Ricarda Schubotz and Angela D. Friederici)
9.1 Introduction ... 215
9.1.1 Sense of Time ... 215
9.1.2 Working Memory ... 216
9.1.2.1 Multi-component model of WM ... 216
9.1.2.2 Dual task paradigms and double dissociation ... 217
9.1.3 Time and Space ... 219
9.1.3.1 Dependence ... 219
9.1.3.2 Autonomy ... 220
9.2 No, Single, or Double Dissociation of Temporal and Spatial Information Processing? Behavioral Experiments ... 222
9.2.1 Experimental Realization ... 222
9.2.1.1 Dual Tasks ... 222
9.2.1.2 Hypotheses ... 223
9.2.2 Experiment 1 ... 224
9.2.2.1 Method ... 224
9.2.2.2 Results ... 225
9.2.3 Experiment 2 ... 226
9.2.3.1 Method ... 226
9.2.3.2 Results ... 226
9.2.4 Discussion of the Behavioral Data: Single Dissociation ... 227
9.3 Methodological Shift: ERPs ... 228
9.3.1 Event-Related Potentials (ERP) ... 229
9.3.2 Neurophysiological Hypotheses ... 230
9.3.3 Experiment 3: ERP-Study ... 232
9.3.3.1 Method ... 232
9.3.3.2 Results ... 233
9.3.3.3 Discussion: Implications for the neurophysiological hypotheses ... 236
9.4 Conclusions ... 237
References ... 237

10. Spatial Representations in Small-Brain Insect Navigators: Ant Algorithms (Rüdiger Wehner)
10.1 Introduction ... 241
10.2 Analysis: Vectors, Routes, and Maps ... 242
10.2.1 Skylight Geometry — Compass Mechanisms ... 242
10.2.2 Trigonometric Algorithms — Path Integrating Schemes ... 247
10.2.3 Cognitive Maps — Landmark Snapshots and Site-Based Vectors ... 247
10.3 Discussion: Navigating Successfully ... 252
10.4 Conclusions ... 255
References ... 256

11. Elementary and Configural Forms of Memory in an Insect, the Honeybee (Randolf Menzel, Martin Giurfa, Bertram Gerber and Frank Hellstern)
11.1 Introduction ... 259
11.1.1 Behavior and Biology of the Honeybee ... 259
11.2 Elementary and Configural Forms of Learning in Classical Conditioning ... 260
11.2.1 Classical Conditioning of the Proboscis-Extension Reflex (PER) ... 260
11.2.2 The Elementary-Configural Distinction ... 262
11.2.3 Cognitive Aspects of Elementary Forms of Conditioning? ... 262
11.2.3.1 Backward inhibitory learning ... 262
11.2.3.2 Blocking ... 263
11.2.3.3 Second order conditioning ... 264
11.2.4 Configural Forms of Conditioning ... 265
11.2.4.1 Biconditional discrimination ... 265
11.2.4.2 Negative and positive patterning ... 266
11.3 Learning in the Natural Context ... 266
11.3.1 Context Dependent Learning and Retrieval ... 266
11.3.2 Serial Order in a Temporal-Spatial Domain ... 268
11.3.3 The Representation of Space in Navigation ... 269
11.3.4 Visual Discrimination Learning in Honeybees: Generalization, Categorization and Concept Formation ... 271
11.4 Conclusion ... 276
References ... 277

List of contributors ... 283

Index ... 287

Introduction

Angela D. Friederici and Randolf Menzel

The behavioral sciences have witnessed a merging of two historically separate concepts about the source of the information residing in the brain: experience-dependent and innate information. The melting pot is a newly emerging scientific discipline called cognitive neuroscience. The two disciplines contributing to it, psychology and neurobiology, originated in the 18th-century study of the philosophy of mind, epistemology, branched out on separate paths over the last two centuries, and now join forces with their respective methods of experimentation and conceptualization. The breakthrough occurred in the 1960s, when cognitive psychology emerged from the dominance of behaviorism and ethology by combining rigorous analysis of behavior with the original ideas of Gestalt psychology. Perception, it was argued, does not depend only on the biological apparatus but also on the constructive action of the mind (brain). The key term became "internal representation", and from this point on it was legitimate to ask how and where in the brain mental processes are performed. Neurobiology began to provide the tools for addressing these questions when, in the 1970s, methods were developed to record single-neuron activity in awake animals and when data from non-invasive recording and imaging techniques became available to relate focused brain activity to internal processing. The merging of cognitive psychology and neurobiology helped to readdress the question of where the information in the brain comes from: the environment or the genome. Each discipline had developed ways to overcome the dichotomy. In neurobiology, the Spanish neuroanatomist Ramón y Cajal (1911) had advanced the view that growth processes of neurons involved in development and under the guidance of the genome might endure into adulthood and, by virtue of experience-dependent control, might contribute to learning and memory. This view has received strong support in recent times, as developmental neurobiologists and neurophysiologists interested in the cellular and molecular bases of learning and memory have discovered more and more common mechanisms (2). In psychology, Hebb (1949) had proposed the concept of the nervous system as a complex neuronal network in which higher mental processes are carried out through the formation of cell assemblies, i.e., organized clusters of highly interconnected neurons. Fodor (1983) introduced the idea that an input system, such as those responsible for visual or language processing, is modular with a fixed neuronal structure, whereas the so-called central system responsible for beliefs etc. is not. Modularity at the level of systems and interconnectivity at the neuronal level mark the recent advances in neuropsychology and neurophysiology.


Thus, cognitive neuroscience today poses the question about the source of information in the brain differently. Given that both the genome (= memory of the species) and experience (= memory of the individual) contribute to memory, what are the rules of their interaction? Given the richness of the tasks to be solved, these rules are certainly manifold and guide neural processing at many levels simultaneously. For example, the information necessary for the solution of a complex problem, e.g., navigation of an insect guided by, and despite, the ever-changing pattern of polarized light of the blue sky, does not even lie in the neural machinery at all, but rather in the morphology of the eye perceiving this pattern (see chapter 10). A rather simple algorithm is then able to transform the signal from the arrangement of photoreceptors into a motor command controlling the body-length axis and thus the direction of movement. Selective attention combined with sensitive periods of learning during development is another mechanism for reducing the complexity of a task and guiding experience-dependent processes. Examples range from perceptual processes like binocular vision to birdsong learning and language learning. A third strategy of the nervous system appears to be to solve problems at the implicit level and to transfer them to the explicit level only under certain conditions. An aspect of this is the discovery that even simple forms of associative learning bear components which are usually referred to by cognitive terms (like attention, expectation, planning). Such a concept is not only relevant for the distinction between intentional and incidental forms of learning in humans (see chapter 5) but also for the interpretation of learning in rather simple organisms like the honeybee (see chapter 11). The debate between Chomsky (1957) and Skinner (1957) about language learning is a particularly enlightening example of the nature-nurture dichotomy which has clouded the understanding of language acquisition in the past. Today it is undisputed that the fast process of language learning in young infants is under the guidance of developmental processes which appear to map the linguistic input onto a parametrized linguistic representation rather than learn it in a Skinnerian way (7). It thus appears that rule extraction and representation by individual learning is a process under tight control of the species' phylogenetic memory, its genetically controlled developmental programs. However, little is understood so far about the generalities behind these processes. The strategy applied here to gain insight into these generalities is a comparative one. Young and old, healthy and impaired animals and humans, as well as different behavioral domains, are investigated. The methods are wide-ranging: behavioral observations under natural or seminatural conditions, non-invasive recordings from the brain, and invasive single-neuron recordings; the mental processes investigated include spatial and temporal information processing, verbal and non-verbal sequential processing, and implicit and explicit forms of learning. The present enterprise is not to compare different species directly. Rather, the attempt was to evaluate specific domains in which the different species under investigation display highly developed behaviors and to analyze how these behaviors are implemented.

The focus of investigation was laid on two essential behavioral domains: first, the ability to process auditory and visual sequential information during perception and production, and second, the ability to represent object and spatial information and to navigate through space. In each of these domains humans as well as animals were investigated. In the first domain, humans certainly demonstrate a very specific ability, namely to learn and process language. Although it is assumed that this behavior is genetically determined, there is as yet no direct evidence to support this view. The first three chapters, however, provide theoretical and empirical evidence in favor of biological bases of language behavior. Bierwisch discusses language as a species specific capacity, highlighting the combinatorial property of human language. Höhle and Weissenborn demonstrate that prelinguistic infants are already sensitive to those linguistic elements in the auditory input which signal the syntactic structure of a sentence. Hahne and Friederici show that the brain systems responsible for the processing of syntactic and semantic information in the child are the same as those supporting language processes during adulthood. These results, together with recent findings indicating that specific language impairments are caused by genetic defects (8), strongly suggest that the major underpinnings of the human language ability are laid down in the human genome. An interesting cross-species comparison within the domain of auditory sequential processing is provided by Todt, Hultsch and Mundry's investigation of the song system of birds. Similar to language, birdsong has a clear hierarchical organization. Unlike language, however, birdsong is not a recursive system allowing an indefinite number of phrases; rather, a more or less fixed number of phrases is used to communicate meaning. Experiments in which the auditory input of the bird is controlled systematically show very nicely the degree to which learning follows a predetermined pattern and the degree to which it is modified by particular input from different birds. Two additional studies investigated the human ability to learn and process structural information in the visual domain. Rüsseler and Rösler demonstrate that complex sequential rules can be learned implicitly. Frensch, Lindenberger and Kray show that humans are even able to deal with unstructured environments in a highly systematic way. This suggests that the human processing system is akin to animal perception in constantly searching for regularities in the input. The second behavioral domain focused on in the present book is the formation and use of memory for objects and space, processes necessary for navigation. A prerequisite for navigation towards a particular goal is the recognition and representation of objects, the perception and representation of space, as well as the representation of a sequence of spatial snapshots. Goebel, Muckli and Singer identify particular rules of visual perception in humans and their neuronal substrate. They stress the constructive nature of visual perception present at the neuronal level. Mecklinger and Bosch investigate the human ability to memorize and recognize object and spatial information. They find that both types of information can be represented in an image-based or a verbal code, each supported by different brain systems, in addition to the established dissociation between an object-processing system in the ventral pathway and a spatial processing system in the dorsal pathway. Schubotz and Friederici provide evidence for a special system to process temporal information.
Although these types of information are processed by anatomically separate systems, navigation through space requires their on-line integration. We are only beginning to understand how the primate brain implements these integration processes. Insects with small brains, such as honeybees and ants, also navigate perfectly well in space. Thus it appears that similar behavior can be implemented quite differently in different species. Menzel, Giurfa, Gerber and Hellstern provide evidence that honeybees are able to learn elementary associations and configurations; this ability is taken to be the basis of their behavioral flexibility in dealing with the environment. Wehner poses the question of whether animals like ants and bees can be considered intelligent, given their excellent navigational behavior. In showing that the ants' behavior is not primarily brain-based, but that important components of successful navigation reside in the structure and function of their eyes, he raises the epistemological problem of whether higher-level algorithmic descriptions of behavior necessarily lead to an understanding of the lower-level mechanisms actually mediating the behavior. The reader may want to keep this question in mind when navigating through the book. The book will certainly not provide a general answer to it; however, the present contributions, together with recent advances in neuroscience, could pave the way towards specific answers concerning the relation between cognition and the brain.

References
1. Cajal, R. y (1911). Histologie du système nerveux de l'homme et des vertébrés. Madrid: Tipografica artistica.
2. Carew, T. J., Menzel, R., and Shatz, C. J. (eds.) (1998). Mechanistic relationship between development and learning. Dahlem Workshop Reports. Chichester, New York: J. Wiley and Sons, 313 pp.
3. Hebb, D. O. (1949). The Organization of Behavior. Chichester, New York: J. Wiley and Sons.
4. Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
5. Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
6. Skinner, B. F. (1957). Verbal Behavior. New York: Appleton-Century-Crofts.
7. Kuhl, P. K. (1998). The development of speech and language. In: T. J. Carew, R. Menzel, and C. J. Shatz (eds.), Mechanistic relationship between development and learning. Dahlem Workshop Reports. Chichester, New York: J. Wiley and Sons, pp. 29-52.
8. Korenberg, J. R. (1998). Williams syndrome: Genes, evolution and imprinting. Journal of Cognitive Neuroscience, Supplement, p. 11.

I. Rule Acquisition and Representation of Structured and Unstructured Information

1. Words as Programs of Mental Computation

Manfred Bierwisch

1.1 Language as a Species Specific Capacity — Conceptually Necessary Assumptions

There is little doubt that the capacity to acquire and use a natural language is a species specific property of human beings. What needs to be determined is the character and specificity of the biological basis for this capacity and the way it is implemented in the human organism, especially in its cerebral structures. The aim of the present chapter is to characterize some of the conditions that any account of the biological basis of the language capacity must meet, and more specifically, what kinds of rules and representations must be assumed in order to come to grips with well-known properties of linguistic behavior. This is not a new enterprise, to be sure, and the considerations based on the largely analytic methodology I will rely on might be considered a prerequisite rather than the result of experimental investigations. They are by no means trivial, though, and are related to empirical observations throughout. As a first conceptual necessity, we observe that the linguistic capacity relates two domains of behavioral organization: the production and perception of physical signals, and the conceptual and intentional organization of experience.1 Let us abbreviate the first domain as A-P (for Articulation and Perception of signals), and the second domain as C-I (for Conceptual and Intentional organization of experience). As a first approximation, the language capacity can thus be schematized as follows:

(1)

A-P ↔ C-I

A few comments about these domains and the correspondence in question seem to be indicated. To begin with, an important point about the nature of A - P is the fact, that it coordinates the motor control patterns underlying the production of physical signals with the perceptual identification of the relevant properties of these signals. Although coordination of this sort is certainly not unique — it is in fact a necessary condition for all kinds of animal communication, from bees and birds to mammals —, it is however by no means a trivial aspect of behavioral organization. With respect to natural language, this organization furthermore seems to be of a fairly abstract nature: while the preferred or normal modality for A—Ρ is clearly given by articulation of the vocal tract 1 For a more systematic exposition of these considerations see e. g., reference (1) and (2), and references given there.


and corresponding auditory perception, recent research in sign language suggests that visual perception related to articulation by the hand is an essentially equivalent alternative.2 As to C-I, we might observe that intentional interaction with the environment as well as conceptual categorization involving different perceptual modalities again have a wide range of phylogenetic precursors and relations, although the complexity and specificity exhibit remarkable differences. Second, the complexity of this domain has phylogenetically developed characteristics within the human species and increases, moreover, remarkably under the conditions created by the language capacity, an aspect we will return to. For the time being, we note as a (virtually necessary) assumption that language coordinates two mental domains that developed independently of and prior to their recruitment by the language capacity. This capacity involves the following three conditions (a minimal computational sketch follows below):

1. Lexical Foundation: The correspondence between A-P and C-I is fundamentally dependent on a system L of pairs <α, σ> as its basic repertoire, where each pair connects an articulatory pattern α in A-P with a semantic structure σ in C-I; σ determines, roughly speaking, the meaning assigned to α, which in turn specifies the articulatory and perceptual properties of the lexical item.

2. Discrete Representation: The essentially different organization of A-P and C-I requires the correspondence between the two domains to rely on sufficiently abstract, discrete aspects of the representations to be mapped onto each other.3 As a consequence, natural language is crucially structured in terms of discrete elements or features.

3. Recursive Combination: The property distinguishing natural language from all other systems of representation and communication is the combinatorial capacity, by means of which a potentially infinite range of expressions can be constructed on the basis of the members of L. Recursive combination is the basis of the discrete infinity of language, which is a necessary presupposition for what is sometimes called the effability of language (see e.g., reference (5), 18ff.), viz. the possibility to express any propositional content in every natural language, assuming suitable adaptation of the lexical repertoire L.

2 See e.g., reference (3) for an overview and reference (4) for an instructive discussion. It might be noted that the integration of perception and production of structured signals is a conceptually necessary condition throughout systems of communication, while the adumbrated abstract nature of A-P, which makes it neutral with respect to the employed modality, is an empirical assumption that may be wrong and, if true, could be factually otherwise. And the apparent predisposition for phonetic categorization on which the sound structure of language seems to be based is, indeed, a nontrivial argument to this effect.

3 The essentially different organization of A-P and C-I has various aspects, one of which is dimensionality: whereas the production and perception of signals is crucially related to time-dependent linearity, the representations in C-I must reflect rather different possibilities of conceptual and perceptual dependencies. I will not be concerned here with the question whether the coordination of articulation and perception in A-P already requires discreteness in some respects, as suggested not only by the categorial perception of humans, but also by certain aspects of e.g., bird songs.
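To make the division of labor among these three conditions concrete, the following minimal sketch (in Python; it is not part of Bierwisch's text, and all names and the toy lexicon are illustrative) treats the lexicon L as a finite list of <α, σ> pairs and shows how a single recursive combination step already yields an unbounded set of expressions. Category checking is deliberately omitted here; it is taken up with the category system C below.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    alpha: str   # articulatory pattern, i.e. the A-P side (an orthographic stand-in here)
    sigma: str   # semantic structure, i.e. the C-I side (a bracketed stand-in here)

# 1. Lexical Foundation: a fixed, idiosyncratic repertoire L of <alpha, sigma> pairs.
L = [Item("john", "JOHN"), Item("paris", "PARIS"), Item("leave", "LEAVE")]

# 2. Discrete Representation: both sides are strings over discrete symbols.
# 3. Recursive Combination: the output of combine can feed combine again, so the finite
#    repertoire L generates a potentially infinite set of expressions.
def combine(e1: Item, e2: Item) -> Item:
    return Item(alpha=e1.alpha + " " + e2.alpha, sigma="[" + e1.sigma + " " + e2.sigma + "]")

vp = combine(L[2], L[1])        # sigma: [LEAVE PARIS]
s = combine(vp, L[0])           # sigma: [[LEAVE PARIS] JOHN]
print(s.alpha, "|", s.sigma)    # -> leave paris john | [[LEAVE PARIS] JOHN]
```

The ordering of the articulatory side produced by this naive concatenation is deliberately left unconstrained; linearization is discussed in section 1.3.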


It looks as if the emergence of recursive combination, an operation that is also the essential condition for natural numbers and arithmetic, is a crucial step in the phylogenesis of man. 4 Under this assumption, the imposition of recursive combination on the lexical repertoire, which would not be possible without the availability of discrete representation, is the decisive jump creating natural language. In any case, the schema in Figure (1) can be made somewhat more specific, indicating that knowledge of language, a system of mental organization called I-language 5 is based on discrete structures imposed on or recruited from the domains of articulation/perception and conceptualization/intention. The relevant representations projected by I-language on A—Ρ and C - I are called Phonetic Form (or PF) and Semantic Form (or SF), 6 respectively. We thus arrive at the schema displayed in Figure (2), where PF and SF must be assumed to be embedded in their respective domains, indicating the interface aspects by means of which these domains provide the linguistic representation α and σ of the pairs mentioned earlier: (2)

(2) [Schema relating the signal and PF, embedded in A-P, to SF, embedded in C-I; the figure and the subsequent text up to example (6), of which only fragments such as "... PARIS 1 ]<0/1> JOHN 1 ]0 ]0" survive, are not recoverable in this copy.]

The categorization introduced in (5) and illustrated in (6) (b) is based on the following assumptions: (7)

(a) 0 (for states of affairs) and 1 (for individuals) are basic categories of C; (b) if x and y are any categories, then <x/y> is a functor category of C.

12 I want to emphasize that the present considerations concern the formal nature of representations of thought, rather than its content, even though these aspects are not independent of each other: The type structure just mentioned reflects aspects of the common sense ontology, i. e., the general categories on which C—I is based, but in the present context it must be construed as a purely formal rather than a substantive characterization of the representational system. With regard to substantive considerations, one might in fact think of Frege's Begriffsschrift (ref. 21) and particularly of Wittgenstein's Tractatus Logico—Philosophicus (ref. 22) as proposals to develop a comparable system from a different but related perspective. It is, in other words, no coincidence that the representational system in question will exhibit properties of standard logic in several respects, as we will notice shortly. 13 The specific properties o f f g, h will be defined in (7) below.


(8)

If A is a unit of category <x/y> and B is a unit of category y, then [A B] is a complex unit of category x.

Applying these definitions to the illustration in (6), we observe the functor FUTURE combining with the state of affairs [[LEAVE PARIS] JOHN], which in turn is made up from the two-place predicate LEAVE that combines with the individual PARIS to create the complex one-place predicate [LEAVE PARIS] that finally applies to the individual JOHN. The labeled bracketing of (6) (b) can just as well be represented by a tree structure as shown in (9), where basic elements of SF are assigned to terminal nodes and category labels to non-terminal nodes of the tree:

(9)

[Tree for (6) (b): terminal nodes FUTURE <0/0>, LEAVE <<0/1>/1>, PARIS 1, JOHN 1; non-terminal nodes <0/1>, 0, 0; equivalently, as labeled bracketing, [ FUTURE [ [ LEAVE PARIS ]<0/1> JOHN ]0 ]0.]

The format used in (9) brings out the 'skeleton' of SF mentioned earlier more clearly as the tree structure by means of which the terminal elements are organized. The structure defined and illustrated in (5)-(9) is to be elaborated in certain respects that we need not go into at the moment.14 Three comments are to be made, however, about the properties of SF as developed so far.

14 Three points should at least be mentioned in this respect: (i) The combinatorial structure determined by C as defined in (8) accounts for so—called functional application and could be generalized to include what is called functional composition in the following way: (8')

If A is of category <x/y> and B is of category <y/z>, then [A B] is of category <x/z>.

As a matter of fact, the definition of functional application in (8) is just the special case of (8'), where the ζ in < y / z > is empty. For extensive discussion of these matters see various contributions in Oehrle et al. (ref. 23). I want to point out, though, that the generalization indicated in (8') is inherent in C as defined in (7) and does not extend the formal properties of S F . (ii) The basic elements JOHN, PARIS, LEAVE, FUTURE used in (6) (b) and (9) are abbreviations that are eventually to be replaced by complex structures made up f r o m more basic components of SF in the way just described. (Thus even the operator FUTURE has to replaced by a more complex structure of roughly the following (still simplified) sort: [UI BEFORE P], where UI is an indexical variable representing the event of uttering the expression containing it and ρ will be substituted by the argument to which FUTURE would apply.) I will return to these problems below, (iii) In order to turn the representational structure defined in this way into a proper semantic system, it would have to be supplied with a compositional system of truth- and satisfaction-conditions of the sort developed in reference (24), reference (25), and much subsequent work in model theoretic semantics. I will not go into those issues, as I am interested in conceptual structures rather than formal semantics.
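Before turning to these comments, the formal core of (7)-(9) can be made concrete with a small sketch (again Python, and again only an illustration under naming conventions of my own, not Bierwisch's formalism): categories are either basic (0, 1) or functor categories <x/y>, and the only mode of combination is the functional application of (8). Running it reproduces the categorization of (6) (b)/(9).

```python
class Cat:
    """A category of C in the sense of (7): either basic or a functor <result/argument>."""

class Basic(Cat):
    def __init__(self, name):
        self.name = name                      # "0" (states of affairs) or "1" (individuals)
    def __eq__(self, other):
        return isinstance(other, Basic) and self.name == other.name
    def __repr__(self):
        return self.name

class Functor(Cat):
    def __init__(self, result, arg):          # <result/arg>, cf. (7)(b)
        self.result, self.arg = result, arg
    def __eq__(self, other):
        return (isinstance(other, Functor)
                and self.result == other.result and self.arg == other.arg)
    def __repr__(self):
        return f"<{self.result}/{self.arg}>"

ZERO, ONE = Basic("0"), Basic("1")

def apply(functor_cat, arg_cat):
    """(8): a unit of category <x/y> combines with a unit of category y into a unit of category x."""
    if isinstance(functor_cat, Functor) and functor_cat.arg == arg_cat:
        return functor_cat.result
    raise TypeError("not a licit functor-argument combination")

# Categories of the basic elements in (6)(b)/(9); only the categories are modeled here.
FUTURE = Functor(ZERO, ZERO)                 # <0/0>
LEAVE = Functor(Functor(ZERO, ONE), ONE)     # <<0/1>/1>
PARIS = JOHN = ONE                           # 1

leave_paris = apply(LEAVE, PARIS)            # <0/1>
clause = apply(leave_paris, JOHN)            # 0
print(apply(FUTURE, clause))                 # -> 0, the category of the whole expression
```

Functional composition in the sense of (8') could be added as a second clause of apply without changing the category system itself.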


First, the representations that SF recruits from or imposes on C—I are abstract in much the sense in which PF is assumed to be abstract with respect to A - P . To get an idea of the abstractness in question, consider a simple sentence like (10): (10)

You will find Picasso in the next room.

Under the primary interpretation of Picasso as referring to a real person, (10) is a statement about the present location of this person. The equally normal interpretation of Picasso would refer however to (some of) his paintings exhibited in a particular room. In other words, the computation of the particular configuration in C—I associated with a unit in SF will rely on sometimes rather complex conditions in the knowledge represented in C—I. This phenomenon has far reaching ramifications, which under one perspective make representations of SF abstract, but under the inverse perspective provide a remarkable plasticity to the interpretation of linguistic expressions. Problems related to this aspect of mental computation are discussed in reference (26), (27), and work cited there. As a crucial consequence of this property, which is found in all natural languages and must in fact be considered as a necessary condition of the language capacity as such, it is only a systematic selection from the representations provided by C—I that enter into the combinatorial structures specified in SF — just as only a selection of properties from A—Ρ enter into the representations of PF. Second, it is important to observe that — just as in PF — the combinatorial organization of SF-representations is in principle the same within and between lexical items. This point is less obvious than the essentially identical concatenation of segments of PF within and across words. A few examples might illustrate the point: (11)

(a) Schimmel

(b) white horse

(12)

(a) loswerden

(b) get rid of

(13)

(a) visualize

(b) sichtbar machen

(14)

(a) seek

(b) try to find

(15)

(a) erzürnen

(b) zornig machen

The conceptual structures expressed by one lexical item in the (a)-cases are represented by a combination of words in the (b)-cases, and there are in fact reasons, which I need not spell out here in detail, according to which the (a)- and (b)-cases have largely the same SF—representation. 15 A related point can be made by means of so-called analytic or periphrastic tenses like will come, has come as opposed to simple tenses like comes, came. 15 As a matter of fact, the possibility to introduce new words by means of explicit definitions like (i) exploits essentially this property of SF: (i)

A spheroid is a body that is almost spherical.


Thus in (16), FUTURE and LEAVE correspond to two separate words, whereas the combination of PAST and LEAVE in (17) is represented by one word:

(16)

(a) John will leave Paris

(b) [ FUTURE [[ LEAVE PARIS ] JOHN ]]

(17)

(a) John left Paris

(b)

[ PAST [[ LEAVE PARIS ] JOHN ]]16

It is an important consequence of these considerations that SF by itself does not formally distinguish words, phrases, and sentences, but provides only categorized hierarchies of basic semantic components. These components do not generally coincide with lexical items. This becomes obvious, if the elements LEAVE, JOHN, PARIS, used for expository reasons, are replaced by somewhat more appropriate structures as indicated e. g., in (18): (18)

[[ LEAVE X ] Y ] abbreviates a complex structure [BECOME [NOT [[LOC [AT X]] Y]]], where BECOME of category <0/0> represents the coming about of a particular state17; NOT of category <0/0> negates a state of affairs; LOC of category <<0/1>/1> is a two-place predicate relating an individual to a place;18 AT of category <1/1> is a functor specifying the contiguous environment of an individual X, both X and its environment being of category 1.
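For concreteness (the substitution is mine, but it follows directly from (18)), replacing the abbreviation LEAVE in (16) (b) by the structure in (18) yields:

[FUTURE [BECOME [NOT [[LOC [AT PARIS]] JOHN]]]]

in which every combination is again licensed by the categorization: AT applies to PARIS (category 1) to give a place of category 1, LOC to that place to give <0/1>, the result to JOHN to give 0, and NOT, BECOME and, on the assumption that FUTURE is likewise of category <0/0>, FUTURE each map 0 to 0.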

In a similar vein, the element JOHN of category 1 is short for a more detailed specification, as indicated in (19), where /john/ represents an address the content of which is spelled out at PF, with CALLED of category <<0/1>/1> representing the characteristic condition of proper names, viz. to assign a name to a specific individual:

JOHN abbreviates [SPECIFIC z [PERSON z : [CALLED /john/] z]]

where Specific is an operator of category < < l / 0 > / l > , which picks out a particular individual and assigns it the property specified in ρ of [Specific ζ [ρ]]. The third and last comment concerns the fact, that the structure of SF cannot be based on linear ordering in the way in which PF is based on a linear skeleton. 16 Actually, the S F of left as a separable combination would have to make use of functional composition in the sense defined in (8') in fn 14, such that the SF associated with left is a complex functor [PAST LEAVE ] of category < < 0 / l > / l > — just like LEAVE. I will omit these technicalities here. 17 More specifically, the state of affairs [BECOME P] is interpreted by a situation consisting in the change from NOT Ρ to Ρ for arbitrary states of affairs p. For a more detailed characterization of this and some other basic elements see reference (28). 18 χ, γ, ζ , Ρ, υ etc. are variables in the usual sense. I will have to say more about them in section 3.


There are at least three reasons for this claim. First, there is no temporal relationship between the conceptual components of expressions like John loves Mary or

white horse, or arbitrary other examples: John does not precede Mary, and the relation of loving is not a temporal transition between them in any reasonable sense. Different orderings characteristic for German clause structure - as in Hans liebt Maria

vs. Liebt

Hans Maria?

-

are due to grammatical conditions, not to

linear ordering in conceptual structure. Hence the sequential ordering of the words cannot correspond to a linear ordering in SF. Secondly, the same observation holds for the components within lexical items, as can be seen from examples like those in (11)—(15). More generally, while the sound—structure of words determines a linear ordering, their conceptual structure does not, even if the components constituting a word exhibit linear ordering, as in nieder—brennen vs. burn down. And third, the structure of C—I, from which SF is recruited, does not exhibit a temporal structure, in contrast to A—P, supporting PF. 1 9 There is a crucial structural consequence following from this conceptually necessary assumption: As linear ordering does not have a systematic place in SF, structure can only be determined in terms of hierarchical organization, indicated by bracketing or tree structure. Moreover, branching structure (or bracketing, for that matter) is restricted to binary branching, since the role of two or more arguments combining with one operator can only be distinguished in terms of hierarchical grouping 2 0 , a condition that has been followed in all examples given so far. I want to establish this conclusion in a more principled way: 19 It should perhaps be pointed out that there is, of course, a temporal aspect with respect to thinking in the sense of grasping a thought, drawing a conclusion, integrating perceptions, etc., which is in fact the target of a great deal of psycholinguistic research, clarifying reaction times, temporal patterns of activation, etc. But this does not concern the structure of thoughts or meanings of linguistic expressions. To clarify this point, consider cases like (i) and (ii): There is no different temporal order in the thought expressed by the two sentences, corresponding to the syntactically determined position of the verb kommt: (i) (ii)

Wir fangen an, weil er wahrscheinlich nicht kommt (We will start, because he is not likely to come) Wir fangen an, denn er kommt wahrscheinlich nicht (We will start, as he is not likely to come)

The correspondence between the linearity of PF and the structure of SF will be taken up below. 20 Thus, if the two arguments of a two-place predicate are to be distinguished, as in e. g., χ Before y, this cannot be done in terms of ordering, but only by means of grouping, where (i) - (iii) would be equivalent notations differing only in irrelevant notational options: (i)

[ [ Before υ ] χ ] (ii)

[ χ [ Before υ ] ] (iii)

[ χ [ γ Before ]

As a matter of fact, SF—trees like the one given in (9) could be considered as a kind of mobile, where left-to-right-ordering is induced merely by accidental writing conventions. For this reason, SF can be represented by means of the polish notation, mentioned in fn 7 and followed in the examples given so far. This convention does not hold for the surface ordering of words.


(20)

SF is based on hierarchical structures that allow for only binary branching. 21

The notion expressed in (20) is formally embodied in condition (7) (b), allowing only functor categories of the form <x/y> for arbitrary categories x and y. More generally, then, the structure of SF, including its recursive character, is completely determined by the category system C (and vice versa).

1.3

Lexical Items and the Combinatorial Principles of I-Language

So far, I have characterized representations in PF and SF, and I have given some hints with respect to their interpretation in terms of the domains A - Ρ and C—I, respectively. The essential condition on which I-Language is based is a computable correspondence overcoming the discrepancy between the non-linear, hierarchical structure of SF and the strictly sequential ordering of PF. As a part of this task, I-Language has to provide basic units for the computation of the correspondence to start with. These units constitute the lexical items of I-Language, each item combining a sequence α of PF with a configuration σ of SF. It should be noted in this connection that a fixed set of lexical items must be assumed not merely as a contingent fact about I-Language, but as a conceptual necessity, because the content of α cannot be derived from that of σ, and vice versa. 22 In other words, the idiosyncratic nature of lexical items is not an incident, but a rather a systematic aspect of natural language. The simplest or most economical way to combine lexical items in order to compute the correspondence between PF and SF for complex expressions would now be based on two principles: (21)

(a) Strict Functional Combination: Two expressions E1 and E2 can be combined iff E2 is a possible argument of the functor E1 according to the categorization of their SF.
(b) Uniform Linearization: The operator E1 uniformly precedes (or follows) its argument E2 in PF.
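As a toy illustration (mine, not the chapter's; the lexicon and the English PF strings are mere stand-ins), the two principles can be read as a single combination procedure: check the functor-argument categorization of SF, and concatenate PF with the functor uniformly first. Its output for the material of (16) (a) comes out as "will leave Paris John", which already hints at the empirical problem discussed next: uniform functor-first linearization does not by itself deliver the attested surface order John will leave Paris.

```python
# Minimal sketch of (21): Strict Functional Combination plus Uniform Linearization
# (functor uniformly precedes its argument in PF). A category is either a basic one
# ("0", "1") or a pair (result, argument) standing for <result/argument>.
LEX = {
    "leave": ("leave", (("0", "1"), "1")),   # PF "leave", category <<0/1>/1>
    "Paris": ("Paris", "1"),
    "John":  ("John", "1"),
    "will":  ("will", ("0", "0")),           # stand-in for FUTURE, category <0/0>
}

def combine(e1, e2):
    """(21a): combine only if e2's category is the argument category of the functor e1.
       (21b): the functor's PF uniformly precedes the argument's PF."""
    (pf1, cat1), (pf2, cat2) = e1, e2
    if isinstance(cat1, tuple) and cat1[1] == cat2:
        return (pf1 + " " + pf2, cat1[0])
    raise TypeError(f"{pf1} cannot take {pf2} as argument")

vp = combine(LEX["leave"], LEX["Paris"])   # ("leave Paris", <0/1>)
s  = combine(vp, LEX["John"])              # ("leave Paris John", "0")
print(combine(LEX["will"], s))             # -> ('will leave Paris John', '0')
```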

21 This might appear to be too restrictive, as e. g., functors corresponding to and, or, or the relation of equivalence, similarity, or symmetry are symmetrical, i. e., have two equivalent arguments that must not be given a different hierarchical position. This does not seem to be generally correct, however. Examples like (i) and (ii) indicate, that e. g., and may differ from strictly symmetrical conjunction: (i)

He married and became rich.

(ii)

He became rich and married.

22 This is due to the so-called arbitrariness of conventional signs, allowing configurations in SF to be combined with arbitrary units of PF. Onomatopoetic elements like crack, bang, buzz, etc. do not really escape this arbitrariness of natural language, as their "translations" in other languages easily shows. It might be noted, however, that iconic sign systems like e. g., music or painting are not arbitrary in the same way and hence


According to (21) (a) and (b), the combinatorial properties of I-Language would follow without further stipulation from principles already inherent in SF and PF, respectively, plus the independently necessary existence of lexical items. Even though this option should be taken as some kind of guideline, it is a rather straightforward observation that for empirical reasons both principles cannot be maintained in the most direct way. We already came across violations of the principle of Uniform Linearization by e. g., German clause structure, illustrated in (22): (22)

(a) Wenn er teilnimmt, dann geht sie. (If he participates, then she will quit.) (b) Nimmt er teil, dann geht sie. (If he participates, then she will quit.)

(22) (b) does not only exhibit a different position of the finite verb nimmt than (22) (a), which is semantically equivalent; it even separates the two parts of teilnehmen, which must be registered as a single lexical item, since its SF — corresponding to that of participate, or take part — cannot be derived from that of teil and nehmen. It might furthermore be observed that this difference in surface ordering depends on the overt realization of the conditional operator wenn in (22) (a), which is missing (or implicit) in (22) (b). A comparable, but different phenomenon shows up as a consequence of wh-words as in (23):

(a) Has he seen the picture? (b) Which picture has he seen?

Violations of (21) (a), the principle of Strict Functional Combination, are equally ubiquitous, even though sometimes more intricate and less easily observed. (24) presents two types of violations: (24)

(a) We expected him to arrive early, (b) We expected his early arrival.

(25)

(a) We convinced him to arrive early, (b) *We convinced his early arrival.

The difference between (24) (a) and (b) is not due to different functor-argument-structure in SF, but instantiates two options to realize the argument of expect by either a verbal or a nominal construction, an alternative that convince does not allow for, such that (25) (b) is deviant (indicated by the asterisk). Hence what must be determined over and above the functor-argument-categorization in SF is the distinction between verbal and nominal elements and the conditions they are subject to. 23 This is traditionally done by means of grammatical categories such as Noun, Verb, Adjective, etc. These grammatical categories play in fact a two-fold role, indicated in (26) (a) and (b): do not have a lexical system, as the "content" of a musical theme or painting corresponds directly to its auditory or visual structure. 23 Things are actually more complex, as (24) (a) and (25) (a) are special types of nonfinite verbal complement constructions, the details of which need not concern us here, however.

(26)

(a) his early arrival is marked as Nominal, him to arrive early as Verbal and Infinite. (b) expect is marked for Nominal or Infinitival complements, convince for Infinitival complements only.

(26) (a) exemplifies what is usually called grammatical or syntactic categorization, (26) (b) illustrates so-called subcategorization properties of convince and expect, which are moreover categorized as Verbal in the sense of (26) (a). An even more specific type of subcategorization is illustrated in (27): (27)

(a) Eva saß auf dem Stuhl (b) Eva setzte sich auf den Stuhl

(Eva sat on the chair) (Eva sat on the chair)

While English does not distinguish position and motion in this case, German has two different verbs and furthermore a distinction between locative and directional prepositions, which require their complements to exhibit Dative (marked by dem) and Accusative (marked by den), respectively. (28) indicates the subcategorization information underlying this distinction: (28)

(a) sitzen and sich setzen are marked for Locative and Directional complement, respectively; (b) Locative auf is marked for Dative, Directional auf for Accusative complement.
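A minimal sketch of how such subcategorization information could be checked mechanically follows (the feature encoding and the function are mine and purely illustrative; they are not Bierwisch's formalism): the verb selects a Locative or Directional complement as in (28) (a), and the selected variant of auf governs Dative or Accusative as in (28) (b).

```python
# Toy encoding of the subcategorization statements in (28).
VERB_SUBCAT = {"sitzen": "locative", "sich setzen": "directional"}     # (28)(a)
AUF_CASE = {"locative": "dative", "directional": "accusative"}         # (28)(b)
DET_CASE = {"dem": "dative", "den": "accusative"}

def complement_ok(verb: str, determiner: str) -> bool:
    """The auf-complement is well formed iff it bears the Case governed by the
    (verb-selected) locative or directional variant of auf."""
    required_case = AUF_CASE[VERB_SUBCAT[verb]]
    return DET_CASE[determiner] == required_case

print(complement_ok("sitzen", "dem"))        # True:  Eva saß auf dem Stuhl, cf. (27)(a)
print(complement_ok("sich setzen", "dem"))   # False: ruled out by (28), needs "auf den Stuhl"
```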

It might be added that even the property of wenn as opposed to the invisible conditional operator illustrated in (22) can be captured in crucial respects by means of subcategorization information determining the clause type introduced by wenn. Obviously, subcategorization cannot be derived from either SF- or PF-information of lexical items and must therefore be additionally specified. More generally, facts like those dealt with in (22) to (28) are the reason to add the formal features φ to the lexical entries mentioned in section 1. Before I am going to discuss the effect of these features on the operation Combine, introduced in (3) above, I will characterize the nature of φ in somewhat more detail, noting that the issues involved have been the subject matter of descriptive and theoretical work in linguistics for several decades. To begin with, the grammatical form G F comprising categorization and subcategorization is based on syntactic and various types of morphological features. 24 Syntactic features specify categories like Verb, Noun, Adjective, Determiner, etc., mor24 It should be emphasized that categorization as presently discussed is to be distinguished from the categorization on which the skeleton of SF is based. The confusing terminology is due to distinct traditions of formal analysis in syntax and semantics. With respect to its systematic status, the categorial system underlying SF might just as well be called a semantic type system. The term usually found in the pertinent literature, however, is "categorial grammar". That is why I sticked to the unfortunate terminology, hopefully avoiding confusion by keeping strictly apart categories of SF and of grammatical form GF, respectively.


phological features specify categories like Case, Number, Gender, Person, Tense, Aspect, etc. They may (but need not necessarily) be represented by binary features like [ ±_ Noun ], [ +_ Plural ], much like the features of PF. What they have in common (as opposed to primes of PF and SF) is their strictly formal status, restricted to their role in computing the PF-SF-correspondence without any direct interpretation in either C—I or A—P.25 Syntactic and various types of morphological features differ, however, in the way they participate in the derivation of complex linguistic expressions. One aspect relates to the distinction between categorization and subcategorization. 26 Categorization, roughly speaking, classifies an expression E, while subcategorization classifies the possible or necessary complements Ε combines with. To elucidate this point, a more explicit characterization of the role of variables in SF is indicated. From a formal point of view, variables are basic elements of SF, assigned to categories of C as defined in (7). Substantially, a variable provides a position the value of which is defined by the combinatorial conditions in SF in one of two ways: (a) the value of a variable χ is determined by other elements in SF (which either bind or substitute x), or (b) the value of χ is left for contextual interpretation in C—I, turning it into a parameter of SF. The two possibilities are visible in (29) (a) and (b) respectively: (29)

(29) (a) He was in the room. / He left the room.
     (b) John was in. / John left.

In (29)(a), the object of the preposition in as well as the object of the verb left binds a variable in the semantic representation of these items. The same variable functions as a parameter to be fixed by contextual conditions in (29)(b). The variable to be bound by the subject of was and of left, on the other hand, is specified in both (29)(a) and (b), by He and John respectively.

25 This seems to be in conflict with features like [+Plural], [+Fem], etc., for which there seems to be a straightforward conceptual interpretation in terms of cardinality, sex, etc. Two points are to be made here, though. First, morphological features can, and frequently do, occur without the adumbrated conceptual purport: glasses, trousers, pants are plural even when used for single objects; ships are grammatically feminine, although no sex is involved. Second, and more importantly, morphological features have formal effects that their putative conceptual interpretation alone does not:
(i) These/*This glasses are/*is in need of repair.
(ii) The police is/*are gathering at the station.
The plural of glasses in (i) forces plural agreement, although it might refer to a single object; the singular of police in (ii) requires singular agreement, even though it clearly refers to a plurality of policemen. This means that morphological features may be associated with conditions on preferred SF-components indicating their default interpretation in C-I, which may, however, be suspended.

26 The terminology was introduced in Chomsky (ref. 29) with respect to theoretical assumptions that have been modified considerably in the meantime. For reasons to be discussed shortly, the information in question has more recently been called Thematic Grid or Argument Structure; see, e.g., Grimshaw (ref. 30).

Although the assumption of variables and their different properties is motivated here by means of syntactic constructions, it should be noted that the variables as such are basic elements of SF, which must in fact be assumed for strictly semantic reasons. With this provision, the subcategorization of an expression E can be construed as specifying conditions under which variables in the SF of E are bound by syntactic complements of E. Three types of information are to be distinguished in this respect:

(30) (a) A variable x may or may not be available for binding by a syntactic complement; if it is available, the binding may be obligatory or optional.
     (b) A variable x occupies a specific position with respect to other variables, which determines the syntactic role of the complement binding x.
     (c) The binding of x can be subject to morphological conditions imposed on the complement in question.

The effect of (30) (c) is illustrated by Case requirements in (31), where the object of trauen (trust) must be a Dative, while that of schätzen (like, esteem) must be an Accusative: (31)

(a) Sie traut ihm (Dative) nicht. (She does not trust him.)
(b) Sie schätzt ihn (Accusative) nicht. (She does not like him.)

The types of information indicated in (30) are not independent of each other; moreover, they can in part be derived from other properties of the expression they belong to: the ranking of a variable according to (30)(b) corresponds essentially to its position in the SF-hierarchy; conditions in the sense of (30)(c), as well as the optionality mentioned in (30)(a), are constrained by the categorization of E. I cannot go into the details of these dependencies here; I only want to point out that conditions of this sort seem to be the actual content of the formal features constituting the information φ of categorization and subcategorization. For the sake of illustration, I will follow the representation developed in Bierwisch (ref. 2; 31) and related work, using the formalism of Lambda abstraction to make variables accessible in the sense of (30)(a). The entry for leave, the SF of which was sketched in (18), will now be something like (32):

(32) /leave/  [+V, -N]  (λx)   λy     [Become [Not [[Loc [At x]] y]]]
                        [Acc]  [Nom]

Here /leave/ is the PF, [+V, -N] the categorization Cat, the operator sequence (λx) λy with the Case features [Acc] and [Nom] the Subcategorization, and the bracketed expression the SF; Cat and Subcategorization together constitute the grammatical form GF.
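Purely as an aid to reading, an entry like (32) can be thought of as a small record with four fields: a phonetic form, a categorization, an Argument Structure, and a Semantic Form. The Python sketch below is an illustrative encoding of that idea only; the class and field names, and the use of a plain string for SF, are assumptions of mine and not part of the formalism.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ArgPosition:
        var: str                 # SF variable bound by this position, e.g. "x"
        features: List[str]      # formal features imposed on the complement, e.g. ["Acc"]
        optional: bool = False   # the parentheses around the lambda operator in (32)

    @dataclass
    class LexicalEntry:
        pf: str                  # Phonetic Form (orthographic stand-in here)
        cat: List[str]           # categorization, e.g. ["+V", "-N"]
        theta: List[ArgPosition] # Argument Structure (subcategorization)
        sf: str                  # Semantic Form, kept as a bracketed string for illustration

    # A rendering of entry (32): optional object position, obligatory subject position.
    leave_intrans = LexicalEntry(
        pf="leave",
        cat=["+V", "-N"],
        theta=[ArgPosition("x", ["Acc"], optional=True),
               ArgPosition("y", ["Nom"])],
        sf="[Become [Not [[Loc [At x]] y]]]",
    )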

Subcategorization, or Argument Structure for that matter, associates morpho-syntactic conditions (the Case features identifying object and subject, respectively, in the present example) with variables abstracted from SF, thereby interrelating semantic and morpho-syntactic information. The features for Accusative and Nominative are in fact default conditions, given the categorization [+V, -N] for verbs and the abstraction of the two argument positions λx and λy, and therefore need not actually be specified in the lexical entry leave. The fact that the object of leave can be omitted, as shown in (29), is indicated by the parentheses around λx, turning it into an optional argument position. Now, the most parsimonious assumption about the organization of I-Language would restrict the type of interaction between SF and other information just discussed strictly to this component, allowing for no direct dependencies between SF and Cat or PF.27 Even this restricted assumption would leave us, however, with the problem of motivating the features appearing in GF in general, including the general constraints which supply predictable features under normal or default conditions. Remember that φ-features, and GF in general, had to be assumed in view of the fact that Strict Functional Combination (principle (21)(a)) does not account for certain characteristic properties of natural language: the combination of two expressions E1 and E2 depends in crucial respects on conditions not belonging to SF. As a partial answer, accounting for the formal features of GF, we might observe that GF, and the constraints it obeys, consist of lexical, i.e., strictly local information that can be identified by inductive generalization, provided the general principles governing combinatorial processes are independently given. The latter proviso is, of course, anything but trivial. It requires in fact a complete and explicit characterization of the operation Combine sketched in (3) above. I will briefly indicate how the structure of lexical information illustrated in (32) determines the operations to be assumed for Combine(E1, E2) to form the complex expression E′. The basic operation combines a predicate with an appropriate argument expression, such as leave and the room into leave the room, where the argument meets the morphological condition associated with the relevant argument position and binds the pertinent variable. The formal representation of argument positions by means of lambda operators suggests basing this operation on the general schema of so-called Lambda-conversion, indicated in (33), where the argument a simply replaces the variable x bound by the operator λx, thereby dropping the operator in question, provided x and a are of the same category:

(33) λx [P x] a  =>  [P a]
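Schema (33) is essentially beta-reduction restricted to category-matching pairs. As a rough illustration, the following sketch encodes SF terms as nested lists and substitutes the argument for the bound variable; the list encoding is an assumption of mine, and the category check mentioned in the text is omitted.

    def lambda_convert(operator_var, body, argument):
        """Apply the schema in (33): (lambda x [P x]) a  =>  [P a].

        `body` is an SF term encoded as nested lists of strings; every occurrence
        of `operator_var` is replaced by `argument`. The proviso that x and a be
        of the same category is not checked here.
        """
        if isinstance(body, list):
            return [lambda_convert(operator_var, part, argument) for part in body]
        return argument if body == operator_var else body

    # (lambda x [Room x]) applied to a referential index "X":
    print(lambda_convert("x", ["Room", "x"], "X"))   # ['Room', 'X']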

27 This assumption might be too strong in view of phenomena like Focus, relating conditions on conceptual interpretation to pitch accent in PF, as illustrated by minimal pairs like (i) vs. (ii), with capitals indicating pitch accent.
(i) JOHN hit BILL and then HE kicked HIM.
(ii) John HIT Bill and then he KICKED him.
Even though these are by no means marginal issues, I will not pursue these problems, as they are likely to be governed by independent and specific principles, and not determined by lexical information in any case.


According to this operation, the argument a would bind the variable of the argument position by substitution. The argument expression could furthermore be checked for whatever condition is associated with λx. The basic notion underlying this operation has been widely used to account for the compositional structure of natural language, e.g., in references (24) and (25) and subsequent work. But even though it seems appropriate in principle to capture the head-complement combination in this way, there are a number of reasons that require the basic notion of the lambda calculus to be adopted in a somewhat modified form. The most important problem to be overcome concerns the logical form of arguments such as everybody, something, or certain places in a construction like everybody left something in certain places. An influential solution to this problem was proposed in Montague (ref. 25) and subsequently developed into the theory of generalized quantification. The core observation is the twofold effect that expressions like everybody, the chair, or some visitors contribute to the conceptual interpretation. On the one hand, these expressions identify a referential domain (the range of pertinent people in the case of everybody, the set of relevant chairs and visitors in the chair and some visitors, respectively); on the other hand, they specify the selection to be made from this domain (the total domain in everybody, a fixed choice in the chair, and a restricted subset in some visitors). These two conditions together specify the referential effect of a given argument expression. Formally, this effect can be represented by a referential index obeying the conditions in question, and it is this index that replaces, i.e., binds, the variable providing the argument position. For the sake of illustration, suppose that everybody has a lexical representation as indicated in (34), where the variable y is to be specified by a referential index Y, related by the operator EVERY to the domain specified by HUMAN; ignoring problems of tense, the SF-representation of (35)(a) would thus come out as (35)(b):

(34) /everybody/ [+N, +F] [[Every y] [Human y]]

(35) (a) Everybody left.
     (b) [[[Every Y] [Human Y]] [Become [Not [[Loc [At x]] Y]]]]

For (35) to be well-formed, the optionality of the object position of leave, indicated by the parentheses in (32), must be invoked, thereby leaving the variable x as an unbound parameter to be fixed under contextual interpretation, preferably as the background of orientation. The obligatory subject position, on the other hand, is bound by the variable Y, the referential specificity of which is defined by the operator introduced by everybody. To further illustrate the binding of argument positions under functional application, I will briefly comment on constructions like (36), based on a related but different entry leave: (36)

Everybody left something in the room.

The two versions of leave in (35) and (36) differ more radically than the surface similarity might suggest. First, the object of leave in (36), which is not optional, does not identify the place where the individual(s) referred to by the subject cease to be located. It rather identifies the theme whose location is at issue. Second, the location, which is not changed but rather maintained, is identified by means of the additional complement realized by the prepositional phrase. Finally, the subject of the verb is now the agent rather than the theme of location, where the agent is in fact permissive, not bringing about a state in which the theme would no longer be in its original position. All of this can be represented by the following additional entry for leave:

(37) /leave/  [+V, -N]  λP     λz     λx     [Not [Cause [Become [Not [P z]]] x]]
                        [Loc]  [Acc]  [Nom]

I will not go into the interesting problem of whether and in which way the similarities between (32) and (37) are to be accounted for in a more elaborate theory of lexical structure.28 In order to indicate how the SF of (36) might eventually be derived on the basis of (37) and its syntactic complements, I will assume the following (somewhat simplified) derivation of the prepositional phrase in the room on the basis of the lexical items (38), (39), and (41):

(38) /the/ [+N, +F] λN [[Def x] [N x]]

(39) /room/ [+N, -V] λx [Room x]

(40) /the room/ [+N, +F] [[Def X] [Room X]]

(41) /in/ [-V, -N, +Loc] λu λz [[Loc [Int u]] z]

(42) /in the room/ [-V, -N, +Loc] λz [[[Def X] [Room X]] [[Loc [Int X]] z]] 29

The operator Def in (38) provides a (contextually definite) instance from the domain of rooms, and the functor Int in (41) identifies the internal space of its argument, such that in the room denotes the property of being inside the internal space of some definite room. Applying now the lexical information of leave given in (37) to the expression derived in (42), we get (43), indicating the SF and the Argument Structure of leave in the room:

(43) λz     λx     [[[Def X] [Room X]] [Not [Cause [Become [Not [[Loc [Int X]] z]]] x]]]
     [Acc]  [Nom]

28 See reference (2) for some discussion of these issues.
29 I have omitted certain technical details here. The functional head the combines with the noun room by direct functional application, with subsequent specification of the referential variable X. I will assume without further comment that direct functional application, according to the operation indicated in (33), does in fact apply to non-referential arguments. Capitals are used to identify variables of this sort. I will turn to the syntactic aspect of the combinatorial structure of these expressions shortly.


Binding the variables of the object and the subject by something and everybody, respectively, the following (still simplified) SF of the sentence in (36) will emerge:30

(44) [[[Every Z] [Human Z]] [[[Some Y] [Object Y]] [[[Def X] [Room X]] [Not [[Cause [Become [Not [[Loc [Int X]] Y]]]] Z]]]]]

So far, I have outlined the computation of the SF of complex expressions on the basis of the lexical items entering the combinatorial process. Each step of this computation is determined by the subcategorization or Argument Structure of the head and the categorization of the complement. More specifically, the combination of a head E1 with a complement E2 requires the categorization of E2 to be compatible with the formal features assigned to the first argument position of the head E1, thereby discharging the position in question. Hence the computation of SF for complex expressions is not only controlled by morpho-syntactic features (as illustrated in (22) to (29) above); it also requires a derived syntactic categorization and argument structure to be assigned to complex expressions in order to allow for recursive computation. In other words, the integration of word meanings into semantic representations of complex expressions, illustrated in (42) and (44), must be accompanied by the creation of complex syntactic structures, providing the categorization and argument structure of complex expressions, along with their semantic and, of course, phonetic form.31 The somewhat simplified syntactic categorization assigned to the constituents of (36) with the SF (44) would be something like (45), where syntactic category labels abbreviate sets of syntactic features:32

(45) [[everybody DP] [[something DP] [[left V] [[in P] [[the D] [room N] DP] PP] VP] VP] VP]

While (45) corresponds appropriately to the compositional structure of the SF assumed for (36), it clearly does not indicate its surface order, a point to be taken up below.

30 One of the various problems not taken up in (44) is the wide scope of the definiteness operator in the prepositional object the room, which is an effect of the presuppositional character of definiteness, determined by Def. I cannot go into the details accounting for this aspect here.
31 According to recent neuropsychological investigations, the syntactic and semantic processes accounting for the grammatical and semantic form of complex expressions are not only controlled by different types of information; they are in fact realized in separate cortical areas, exhibiting different time characteristics. See, e.g., reference (32) for evidence and discussion of this point.
32 Following Chomsky (ref. 1), I will assume that the category XP is relationally defined as the 'maximal projection' of the category X with respect to a given structure. Hence, what used to be called "levels of projection" in earlier versions of phrase structure theory does not appear explicitly in the categorization of linguistic expressions, but follows from their position in complex structures, such as (45). In this sense, VP abbreviates the features [+V, -N, ...], where [...] indicates further features the head of the expression might project.

For reasons to which I will turn next, (45) does not indicate the argument structure of linguistic expressions either. In order to be more explicit about this aspect, I will slightly modify the organization of lexical items (and linguistic expressions in general) relied on so far. Remember that the component φ of formal features is made up of two subcomponents, the categorization Cat and the Subcategorization or Argument Structure AS of an expression E. Let me abbreviate the features appearing in Cat by κ and the information of AS by θ. With this proviso, the general structure of linguistic expressions can be characterized as follows:

(46) E = < <μ, κ>, <θ, σ> >, where μ consists of phonetic and formal features.

If E is a basic lexical item, μ consists of phonetic information α only;33 if E is a complex expression, μ comprises the phonetic features of E together with formal features imposed according to the following convention:

(47) Let <μ1, κ1> and <μ2, κ2> belong to E1 and E2, respectively. Then the expression E′ combining E1 and E2 contains < < <μ1, κ1>, <μ2, κ2> >, κ1 >, where E1 is the head of E′.

This convention must be part of the operation Combine mentioned in (3), which will be reformulated in (48). In its present form, (47) simply states that the categorization of the head is that of the resulting complex expression as well. Thus the categorization of in in (45) is projected to in the room, that of left to left in the room, etc.

While the categorization κ is a homogeneous (possibly structured) set of formal features, things are more complex with the Argument Structure θ, which is a hybrid combination of operators abstracting over variables in SF and of formal features assigned to these operators.34 As a matter of fact, θ can be considered an interface between φ and σ, admitting basic elements of both SF and GF. The structure of θ is systematically constrained by conditions I cannot pursue here in detail (but see reference (2) for some discussion). I will only mention the fact that the order or ranking of positions in θ is essentially determined by the structure of SF, from which the variables in θ are abstracted. Due to this ranking, the pertinent semantic relations are mapped onto the syntactic role of the complement binding a given variable in SF. Technically speaking, the first argument position in θ is always assigned to the closest syntactic constituent satisfying the features assigned to the position in question.

33 The structure < <α, κ>, <θ, σ> > of basic lexical items corresponds neatly to the claim made in Levelt (ref. 33), according to which a lexical item E associates a lemma with a lexeme, where <α, κ> is the lexeme, while the lemma consists of <θ, σ>, viz. the Semantic Form of E prefixed by the Argument Structure θ, which makes semantic variables accessible for syntactic specification. In other words, the information of lexeme and lemma, even though accessed differently in language production, comprehension, or recall, and presumably stored separately in memory, must be systematically associated for each lexical item in language acquisition.
34 It should be noted that the θ of an expression E can be empty, which simply means that E cannot be the head of a complex expression. Expressions like the room, but also lexical entries like everybody and something in the above illustration, are of this type.

With these considerations in mind, the operation Combine can now be formulated as follows:

(48) Let E1 and E2 be < <μ1, κ1>, <θ1, σ1> > and < <μ2, κ2>, <θ2, σ2> >, respectively; then Combine(E1, E2) = E′, where
     (a) E′ = < < < <μ1, κ1>, <μ2, κ2> >, κ1 >, <θ′, σ′> >,
     (b) κ2 satisfies the formal features associated with the first position in θ1,
     (c) <θ1, σ1> combines with <θ2, σ2> by functional application, thereby eliminating the first position in θ1 according to convention (33).

Obviously, the result of Combine(E1, E2) as now defined is again of the format < <μ, κ>, <θ, σ> > generally assumed for linguistic expressions in (46). Condition (48)(a) takes care of the categorization effect formulated in (47) and of the corresponding semantic integration by means of functional application.35 (48)(b) imposes the morpho-syntactic conditions encoded in the Argument Structure of the head on the relevant complement, and (48)(c) eliminates this information once it has checked the categorization of the complement. It should be obvious that the actual effect of Combine is completely determined by the information provided by the lexical entries involved. It is mainly in this sense that the title of this chapter suggests that lexical items be considered as programs for mental computation.
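To make the interplay of (46) to (48) concrete, the sketch below treats an expression as a record of phonetic material, categorization, Argument Structure, and SF, and lets a toy combine function check the formal features of the head's first argument position before discharging it by substitution. All encoding choices (strings for SF, simple feature lists, string substitution in place of genuine functional application) are simplifying assumptions of mine; in particular, the treatment of referential arguments and generalized quantification discussed above is not modeled.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ArgPosition:
        var: str             # SF variable bound by this position
        features: List[str]  # formal features the complement's categorization must satisfy

    @dataclass
    class Expr:
        pf: List[str]              # phonetic material (orthographic words as a stand-in)
        cat: List[str]             # categorization; the head's cat is projected, cf. (47)
        theta: List[ArgPosition]   # Argument Structure
        sf: str                    # Semantic Form as a bracketed string

    def combine(head: Expr, comp: Expr) -> Expr:
        """A toy version of (48): check the features of the head's first argument
        position against the complement's categorization (48b), then discharge that
        position by substituting the complement's SF for the bound variable (48c).
        Linear order of the phonetic material is ignored here."""
        if not head.theta:
            raise ValueError("head has no open argument position")
        pos = head.theta[0]
        if not all(f in comp.cat for f in pos.features):
            raise ValueError("complement does not satisfy " + str(pos.features))
        new_sf = head.sf.replace(pos.var, comp.sf)   # crude stand-in for functional application
        return Expr(pf=head.pf + comp.pf, cat=list(head.cat), theta=head.theta[1:], sf=new_sf)

    # in + the room, roughly along the lines of (41), (40) and (42):
    in_ = Expr(["in"], ["-V", "-N", "+Loc"],
               [ArgPosition("u", ["+N"]), ArgPosition("z", [])],
               "[[Loc [Int u]] z]")
    the_room = Expr(["the", "room"], ["+N", "+F"], [], "[[Def X] [Room X]]")
    print(combine(in_, the_room).sf)   # [[Loc [Int [[Def X] [Room X]]]] z]

Running the example composes in with the room, roughly mirroring the step from (41) and (40) to (42), except for the scope of the definiteness operator, which the crude substitution does not capture.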

1.4 Further Computational Aspects of Linguistic Expressions

The operation Combine as formulated in (48) accounts for much of the correspondence between PF and SF, given the information of lexical items. It accounts in particular for the way in which the structure of complex expressions deviates from the principle of Strict Functional Combination mentioned in (21) (a): According to condition (48) (b), formal features of lexical origin control the combination of head and complement in complex expressions. We might furthermore construe condition (48) (c) as a filter that rules out expressions with argument positions that are not appropriately saturated, if we adopt the following convention: (49)

Formal features appearing in θ must eventually be eliminated.
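Under the toy encoding used in the sketch above, the filter reading of (49) amounts to a one-line check; the function name and the encoding are assumptions of mine.

    def is_saturated(expr):
        """Convention (49) read as a filter: an expression is acceptable only if no
        formal features remain in its Argument Structure theta. Positions that carry
        no formal features (purely semantic parameters) do not count against it."""
        return all(not position.features for position in expr.theta)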

35 This must be extended to generalized quantification as informally discussed above. I must refrain here from proposals concerning the technical details of this extension.


It is mainly for this reason that I did not indicate subcategorization features in (45) above.36 Combine would furthermore account for the linear ordering of constituents in a complex expression to the extent to which it follows the principle of Uniform Linearization formulated in (21)(b). We simply need to ignore all but the PF-features contained in μ, retaining just the PF-representation of E′. To put it the other way round, if we spell out the PF-features in the μ of E′ created by (48) in sequential order, we get a representation that obeys Uniform Linearization. We already noted that this principle does not hold in general. Deviations can be of two types: (A) The head of a complex expression might either precede or follow its co-constituent in PF. Thus in to make an attempt the Verb precedes its complement, while in the German equivalent einen Versuch zu machen the Verb follows the complement. The head position does not only differ across languages, however; it can vary even within a given language with respect to different categories. (50) is a simple example which exhibits left- and right-headed constituents (indicated by the subscripts L and R for the respective heads) in German:

(50) [dieL [AnnahmeL [daßL [jeder [[zuL [dieser Versammlung]] [gehen wirdR]R]R]]]]

(the assumption that everybody will go to this meeting)

Roughly speaking, in German Determiners (die, dieser), Nouns (Annahme), and Prepositions (zu) are left heads, while Verbs (gehen wird) and their projections are right heads. Hence the condition controlling the linearization of constituents must be parameterized with respect to languages and syntactic categories. (B) Constituents of a complex expression can appear outside the domain they belong to according to the operation Combine. The phenomenon is illustrated in (51) and (52) for English and German, respectively, with t indicating the position the coindexed element should occupy according to its role under Combine:

(51) (a) One never knows [[which meeting]i [Peter [will [attend ti]]]]
     (b) willi [Peter [ti [attend the last meeting]]]

(52)

(a) Man weiß nie [[welche Versammlung]i [Peter [ti besuchen wird]]]
(b) wirdi [Peter [die letzte Versammlung besuchen ti]]
(c) [die letzte Versammlung]i [wirdk [niemand ti besuchen tk]]

36 It might be noted that (49) together with condition (48) (c) comes close to Chomsky's (ref. 1) notion of checking and eliminating formal features and the role of Full Interpretation precluding uninterpretable features from the interface levels.


The "dislocation" of phrases, i. e., of maximal, complete constituents in the (a)cases obeys different conditions than the dislocation of minimal heads in the (b) cases37. Looking more closely for the way in which the phenomena sketched in (A) and (B) may be reflected in I-Language, we notice first the conceptual necessity to assign linear ordering to the hierarchy of constituents defined by Combine. The optimal choice according to which the head is uniformly either the left- or the rightmost unit of its constituent must at least be available for parametric variation for different languages and even different categories within one language. We might consider this property, including the parametrization, as part of the computational content of the syntactic category features V, N, etc. If this were correct, variation according to (A) would not be represented explicitly, but rather follow from the interpretation of syntactic features. The effect of this interpretation would then be visible at PF as the correspondence between syntactic structure and PF. We might furthermore assume that one choice, say left-headedness, is the default option, such that only cases of right-headedness need to be marked in some way.38 One might assume, moreover, that languages differ with respect to their basic or default option. 39 The simplest assumption would finally require the headedness of a category to be preserved for all its projections, an assumption that is only apparently at variance with cases like (52), as these require a different account for independent reasons, to which I will turn below. (52)

(a) [[the president's] [voteL [against the election]]R]
(b) [theL [[voteL [of the president]]L [against the election]]]
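The category-dependent head placement described under (A) can be pictured as a recursive linearization with one head-direction parameter per category. The sketch below is only a schematic illustration of that idea, not the Linearize operation referred to later in the text; the tree encoding and the German parameter settings written into it are assumptions of mine, following the informal description above.

    # Head-direction parameters for German as described above:
    # D(eterminer), N(oun), P(reposition) are left heads; V(erb) projections are right heads.
    HEAD_DIRECTION = {"D": "left", "N": "left", "P": "left", "V": "right"}

    def linearize(node):
        """node is either a word (str) or a triple (category, head, complement)."""
        if isinstance(node, str):
            return [node]
        category, head, complement = node
        head_words = linearize(head)
        comp_words = linearize(complement)
        if HEAD_DIRECTION.get(category, "left") == "left":
            return head_words + comp_words
        return comp_words + head_words

    # "zu dieser Versammlung gehen wird" (will go to this meeting), cf. (50):
    pp = ("P", "zu", ("D", "dieser", "Versammlung"))
    vp = ("V", ("V", "wird", "gehen"), pp)
    print(" ".join(linearize(vp)))   # zu dieser Versammlung gehen wird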

While the phenomena in (A) are simply different ways to project the hierarchy created by Combine onto the sequential conditions that hold for PF, things seem to be more complicated with respect to the phenomena in (B). In cases like (51) and (52), the basic syntactic constituency is not preserved, because two different positions of one unit are needed: one appearing in PF, and one that accounts for the functor-argument structure in SF.

37 The properties of "Head-movement" and "Phrase-movement" have been the target of extensive discussion over the past decade or so. See reference (34) and the references there for discussion. I cannot go into the intricate issues of this distinction.
38 A proposal of similar orientation has been made in Kayne (ref. 35). Kayne proposes a Linear Correspondence Axiom (LCA) that maps constituency uniformly and without parameterization onto sequential ordering. Variation of the sort illustrated in (50) must then be treated as a consequence of syntactic constituency (see fn. 43). The difference between Kayne's approach and the one proposed here is that the LCA is based on asymmetric c-command, a purely configurational property with respect to syntactic structure, while the notion of head used here is based on the projection of syntactic categorization.
39 This is in fact the gist of ideas developed already in Greenberg (ref. 36), who observes that languages with postpositions tend to have verb-final position, while languages with prepositions prefer the verb to precede its complements. One might add, moreover, the observation that word- and phrase-structure are often based on different default options.

As can be seen in (53), the unit that is assigned to more than one position can even be a proper part of a lexical item, as anfangen (start) must be lexically fixed and cannot be derived from its formal constituents an and fangen:

(53) [fingi [sie [[ein neues Experiment] an ti]]] (Did she begin a new experiment?)

"Dislocations" of this sort are a characteristic property of natural language that the computational system allows, even though this violates the principle of Uniform Linearization. A systematic account of this property was a central concern of Generative Grammar; dislocation was initially captured by means of so-called movement transformations, which were eventually reduced to a single and rather general operation Move, which is subject to characteristic constraints. 40 The interesting point is that the constraints in question can be expressed in terms of properties of lexical items responsible for the relevant aspects of movement. For the sake of illustration, compare the direct question (53) with the embedded question (54a) and the independent clause (54) (b): (54)

(a) Ich weiß nicht [ob sie ein neues Experiment an fing] (I do not know whether she began a new experiment)
(b) [Siek [fingi [tk ein neues Experiment an ti]]] (She began a new experiment)

The moved element in (53) is the finite verb fing, which is the head of anfing and of all its projections. This element does not move in the subordinate clause (54)(a), but it does move in the main clause (54)(b), where in addition the subject sie must be moved into the initial position. These differences can be reduced to properties of a functional element, the complementizer, which determines the syntactic and semantic properties of the clause whose head it is. The complementizer of the embedded clause in (54) is ob, which turns the clause into an indirect question, preventing at the same time the finite Verb from moving. The complementizer of (53) turns the clause into a direct question and requires the Verb to move into the top position. The complementizer of assertive clauses like (54)(b) additionally requires a Topic constituent to move to the initial site of the clause. While the complementizer in (54)(a) is phonetically realized by ob, the complementizers in (53) and (54)(b) do not exhibit segmental phonetic content, although their presence manifests itself in the syntactic and semantic effects just mentioned.41

40 Instead of moving a constituent A from an initial to a final position, which creates a chain (Ai, ti) with the initial position marked by ti and the final position marked as Ai, the operation can be thought of as forming a chain between two positions, or just as a twofold relation connecting A to two positions at the same time. See Gärtner (ref. 37) for a discussion of the latter option and a comparison with movement and chain formation. In the following remarks, I will stick to the movement metaphor of dislocation as a convenient way of talking, even though there might be differences between the different approaches, with theoretical and empirical consequences.

We thus can assume that the three clause types in question differ as follows (with q and a indicating the question- and assertion-complementizer, respectively):

(55) (a) [ob [sie ein neues Experiment an fing]] = (54)(a)
     (b) [[fingi 0q] [sie ein neues Experiment an ti]] = (53)
     (c) [siek [[fingi 0a] [tk ein neues Experiment an ti]]] = (54)(b)
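The three clause types in (55) can be mimicked on the surface by a procedure that places the finite verb according to the complementizer: ob leaves it clause-final, the question complementizer fronts it, and the assertion complementizer additionally fronts a topic. This is only an illustration of the descriptive pattern, not of the feature-driven mechanism in (56) and (57); the flat word-list encoding and all names are assumptions of mine.

    def place_verb(comp_type, words, finite_verb, topic=None):
        """words: the clause in its base (verb-final) order, containing finite_verb.
        comp_type: "ob" (embedded question), "q" (direct question), "a" (assertion)."""
        rest = [w for w in words if w != finite_verb]
        if comp_type == "ob":                  # verb stays in final position, cf. (55a)
            return ["ob"] + words
        if comp_type == "q":                   # finite verb attracted to the initial position, cf. (55b)
            return [finite_verb] + rest
        if comp_type == "a":                   # topic first, finite verb second, cf. (55c)
            rest = [w for w in rest if w != topic]
            return [topic, finite_verb] + rest
        raise ValueError(comp_type)

    base = ["sie", "ein", "neues", "Experiment", "an", "fing"]
    print(" ".join(place_verb("ob", base, "fing")))              # ob sie ein neues Experiment an fing
    print(" ".join(place_verb("q", base, "fing")))               # fing sie ein neues Experiment an
    print(" ".join(place_verb("a", base, "fing", topic="sie")))  # sie fing ein neues Experiment an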

The eventual position of the finite Verb is determined by the different complementizers: ob leaves the Verb in its original position, 0q attracts it to the initial position, and 0a furthermore requires a topicalized constituent in the position on top. The lexical items creating these effects are functional elements that determine the type of the clause of which they are the head. Simplifying some of the technical details, these items can be represented as follows:

(56) (a) /ob/ [+V, +F, +W]   λP                    [QUESTION P]
                             [+Fin]
     (b) /0/  [+V, +F, +Q]   λP      λv            [QUESTION P]
                             [+Fin]  [+Fin]
     (c) /0/  [+V, +F]       λP      λv     λX     [P]
                             [+Fin]  [+Fin] [+Top]

The categorization by [+V, +F] expresses the assumption that complementizers are functional heads applying to verbal complements; the features +W and +Q are meant to identify subordinate and independent question clauses, respectively. The operator QUESTION in SF abbreviates the semantic properties of (direct and indirect) questions, which I need not elaborate here. The crucial point is the particular (strong) status of the boldface features [+Fin] and [+Top]. So far, features assigned to positions in AS define conditions to be matched by constituents saturating the relevant position under Combine; strong features, by contrast, must be matched by the categorization of a constituent within the expression headed by the functional element, and they require the PF- and Cat-features of this constituent to saturate the position in question. Thus [+Fin] identifies the (smallest) constituent categorized by [+Fin] and attaches it to the expression headed by the empty complementizer. As usual, saturation eliminates the argument position in question, but in this case it does not have a semantic effect. Formally, this is reflected by the fact that argument positions with strong features do not bind a variable in SF. They are improper positions, so to speak, having only syntactic effects. That this is empirically correct can be seen in cases like fing ... an, where no identifiable part of SF corresponding to the head fing could possibly be affected. In other words, it is only the lexeme that is relevant for the dislocation triggered by strong features.

41 It should be emphasized that phonetically empty elements like the complementizer in (54)(b) must not be considered arbitrary stipulations, but are "invisible" elements that must be assumed for strong systematic reasons. A rather uncontroversial case in point is the indefinite Determiner for plural and mass terms like trees or wood, for which there is no counterpart to the singular indefinite marker a. Using 0 for phonetically empty constituents, we must assume the following paradigm:
Definite: the tree / the trees / the wood
Indefinite: a tree / 0 trees / 0 wood
I will use 0 also to represent the PF of empty complementizers. This yields several homophonous (empty) elements, differing, however, with respect to their formal and (possibly) semantic features. The analysis of Verb-placement sketched here follows Wilder (ref. 38) in crucial respects, who deals, however, with a much wider range of facts within a somewhat different minimalist framework.

Suppose, then, that strong features are responsible for dislocation of type (B), triggering a general operation Adjoin(E1, E2), which can be defined as follows:

(57) Let E1 with <μ1, κ1> be a head or phrase in E2 = < <μ2, κ2>, <θ2, σ2> >, such that κ1 matches a strong formal feature associated with the first position in θ2; then Adjoin(E1, E2) = E″, where
     (a) E″ = < < < >, κ>, >, and either
     (b) θ2 = λx with x of type 1 and σ″ = σ1 : σ1′, where θ2′ derives from θ2 by binding x by some position λy in θ1, or
     (c) θ2 = λP with P of type 0, and σ″ = σ1, i.e., the effect of


functional application of <θ2, σ2> to σ1, thereby eliminating θ2 according to convention (33). The syntactic effect of Combine′(E1, E2) is exactly that of Combine(E1, E2). The two operations differ in that under complementation an argument position of the head is discharged, while under modification the only argument position of the modifier is discharged, with the consequences illustrated in (61) and (63) for extensional and intensional modification, respectively. As a matter of fact, the different possibilities can be subsumed under one operation Combine with specific semantic consequences, depending on the Argument Structure of head and non-head, respectively. I will refrain from repeating the details.

1.5 Summary and Perspectives

The outline of mental computation for natural language given above is in need of completion in several respects. It allows, however, some general conclusions about the nature of the language capacity.

1. Starting with the basic observation that knowledge of a language, i.e., I-Language, provides a systematic and unlimited correspondence between sound and meaning, or more generally between patterns of signals and conceptual structures, we have to conclude that this correspondence presupposes a computational capacity that goes beyond a mere listing of expressions associating sound and meaning.

2. The computational character of I-Language has two consequences. First, abstract representations, on which combinatorial operations can be based, must be extracted from or projected onto A-P, the domain of articulation and perception of signals, and C-I, the domain of conceptual and intentional representation of experience and behavior. This leads to the abstract representational systems PF and SF of Phonetic and Semantic Form, respectively. Second, within these systems general conditions of structural organization must allow for recursive combinatorial operations. Minimal assumptions require linear concatenation of bundles of articulatory conditions for PF, and a type-based functor-argument hierarchy for SF, where the primes recruit conditions on individuals and states of affairs from C-I.

3. The computational system defining the correspondence between PF and SF must be based on two minimal requirements: first, a fixed list of associations <α, σ> with α from PF and σ from SF, and second, a general principle that provides the systematic and recursive combination of elements from this list. Minimal assumptions would furthermore require the character of this principle to be inherent in the structure of PF and SF per se, viz. linear concatenation of linked articulatory conditions for PF, and functional application with respect to configurations of semantic primes for SF. Hence the minimal requirement would lead to a


uniform PF-linearization of strict functional combination in SF, based on pairs <α, σ> only.

4. As things turn out, natural languages systematically deviate from these minimal conditions of uniform linearization and strict functional combination, relying on additional conditions to be encoded by formal features that are not interpreted in PF or SF. As a consequence, lexical information consists of triples <α, φ, σ>, rather than pairs <α, σ>, where the grammatical properties represented by φ are furthermore divided into the categorization κ and the Argument Structure θ. A categorized Phonetic Form μ = <α, κ> is called the lexeme; the Semantic Form σ prefixed by the Argument Structure θ is called the lemma of a lexical entry.

5. The Argument Structure specifies essentially the combinatorial properties of listed lexical items as well as of complex expressions based on them. In this sense, words control the computational processes leading to arbitrarily complex expressions by means of their Argument Structure θ. For the same reason, θ is the interface between κ and σ; in other words, θ is the place where semantic and morpho-syntactic conditions are interrelated.

6. Conditions controlling the deviation from the conceptually simplest computational system are stored in φ and must be taken up in the computational principles determining complex linguistic expressions. For this reason, the operation Combine(E1, E2) creating a complex expression E′ must make reference to conditions contained in κ and θ, distinguishing, among others, between complementation and modification. This distinction is due to the fact that being the head of an expression and discharging an argument position need not coincide, and can furthermore be subject to idiosyncratic, lexically fixed grammatical conditions. Thus the two parts of Combine, sketched in (48) and (64), account for deviations from strict functional combination; violations of the principle of uniform linearization are taken care of by the operation Adjoin characterized in (57) and by the parameter sensitivity of Linearize given in (58).

Thus, starting from minimal assumptions characterizing a system that systematically associates signals with meaning over an unlimited range, we observed that natural languages systematically deviate from these assumptions by conditions giving rise to formal, morpho-syntactic features, leading in particular to the distinction between conceptually determined semantic representations and computationally induced syntactic structures. To the extent to which these observations are correct, we are led to the interesting question of what determines the deviation of the natural linguistic capacity from the most parsimonious computational system. At least the following possibilities are to be considered: First, the conditions that ultimately crystallize in the features of φ might be properties fixed in the genetic endowment, due either to selectional adaptation or to exaptation without selectional benefit, accompanying other phylogenetic developments. Assuming, secondly, that the possibility of φ-features does not arise from selectional adaptation, one might consider them as emergent properties, either due to as yet unknown computational properties of the overall neurophysiological basis of the language capacity, or (least likely) due to boundary


conditions of language use. The decision between these alternatives should ultimately be an empirical issue, which can crucially profit in any case from a better understanding of the structural properties of the computational system and its output.

References

1. Chomsky, N. (1995). The Minimalist Program (Cambridge, Mass.: MIT Press).
2. Bierwisch, M. (1997). Lexical Information from a Minimalist Point of View. In: The Role of Economy Principles in Linguistic Theory, C. Wilder, H.-M. Gärtner, and M. Bierwisch, eds. (Berlin: Akademie-Verlag), pp. 227-266.
3. Klima, E. and Bellugi, U. (1979). The Signs of Language (Cambridge, Mass.: Harvard University Press).
4. Sacks, O. (1989). Seeing Voices (Berkeley, Los Angeles: University of California Press).
5. Katz, J. J. (1972). Semantic Theory (New York: Harper and Row).
6. Chomsky, N. (1988). Language and Problems of Knowledge (Cambridge, Mass.: MIT Press).
7. Pinker, S. (1994). The Language Instinct (New York: Harper Collins).
8. Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin and Use (New York: Praeger).
9. Chomsky, N. (1981). Lectures on Government and Binding (Dordrecht: Reidel).
10. Bierwisch, M. (1989). The Semantics of Gradation. In: Dimensional Adjectives, M. Bierwisch and E. Lang, eds. (Berlin, New York: Springer-Verlag), pp. 71-261.
11. Jackendoff, R. (1990). Semantic Structures (Cambridge, Mass.: MIT Press).
12. Kamp, H. and Reyle, U. (1993). From Discourse to Logic (Dordrecht: Kluwer Academic Press).
13. Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica, 1, 1-27.

14. Mohanan, K. P. (1986). The Theory of Lexical Phonology (Dordrecht: Reidel).
15. Halle, M. and Vergnaud, J.-R. (1980). Three-dimensional Phonology. Journal of Linguistic Research, 1, 83-105.
16. Halle, M. and Vergnaud, J.-R. (1987). An Essay on Stress (Cambridge, Mass.: MIT Press).
17. Tarski, A. (1956). Logic, Semantics, and Metamathematics (London: Oxford University Press).
18. Marr, D. (1982). Vision (San Francisco: Freeman).
19. Bloom, P., Peterson, M. A., Nadel, L., and Garrett, M. F. (eds.) (1996). Language and Space (Cambridge, Mass.: MIT Press).
20. Miller, G. A. and Johnson-Laird, P. N. (1976). Language and Perception (Cambridge, Mass.: Harvard University Press).
21. Frege, G. (1964). Begriffsschrift und andere Aufsätze (Darmstadt: Wissenschaftliche Buchgesellschaft).
22. Wittgenstein, L. (1922). Tractatus Logico-Philosophicus (London: Routledge and Kegan Paul).
23. Oehrle, R. T., Bach, E., and Wheeler, D. (eds.) (1988). Categorial Grammars and Natural Language Structures (Dordrecht: Reidel).
24. Cresswell, M. J. (1973). Logics and Languages (London: Methuen).
25. Montague, R. (1974). Formal Philosophy (New Haven: Yale University Press).
26. Bierwisch, M. (1983). Semantische und konzeptuelle Repräsentation lexikalischer Einheiten. In: Untersuchungen zur Semantik, R. Ruzicka and W. Motsch, eds. (Berlin: Akademie-Verlag), pp. 61-99.
27. Pustejovsky, J. (1995). The Generative Lexicon (Cambridge, Mass.: MIT Press).
28. Dowty, D. (1979). Word Meaning and Montague Grammar (Dordrecht: Reidel).
29. Chomsky, N. (1965). Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press).
30. Grimshaw, J. (1990). Argument Structure (Cambridge, Mass.: MIT Press).
31. Wunderlich, D. (1997). CAUSE and the Structure of Verbs. Linguistic Inquiry, 28, 27-68.
32. Rösler, F., Pütz, P., Friederici, A. D., and Hahne, A. (1993). Event Related Potentials While Encountering Semantic and Syntactic Constraint Violations. Journal of Cognitive Neuroscience, 5:3, 345-362.
33. Levelt, W. J. M. (1989). Speaking: From Intention to Articulation (Cambridge, Mass.: MIT Press).
34. Chomsky, N. and Lasnik, H. (1993). The Theory of Principles and Parameters. In: Syntax: An International Handbook of Contemporary Research, J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann, eds. (Berlin: Walter de Gruyter), pp. 506-569.
35. Kayne, R. (1994). The Antisymmetry of Syntax (Cambridge, Mass.: MIT Press).
36. Greenberg, J. (1966). Language Universals (The Hague: Mouton).
37. Gärtner, H.-M. (1998). Generalized Transformations and Beyond (Berlin: Akademie-Verlag).
38. Wilder, C. (1995). Derivational Economy and the Analysis of V2. FAS Papers in Linguistics, I, Berlin, 117-156.
39. Higginbotham, J. (1985). On Semantics. Linguistic Inquiry, 16, 547-593.

2. Discovering Grammar: Prosodic and Morpho-Syntactic Aspects of Rule Formation in First Language Acquisition

Barbara Höhle and Jürgen Weissenborn

2.1 Introduction

Of all human activities, linguistic behavior seems to be the one which is most directly characterizable as rule-governed: acquiring a language means building up the body of rules which constitutes the linguistic knowledge of the adult. Two aspects of this knowledge should be distinguished. First, there is linguistic knowledge proper, which concerns the different levels of representation of the language to be learned, i.e. its phonology, morpho-syntax and semantics. Second, the child has to learn how to put this knowledge to use in actual communication. This constitutes the social function of language. One main difference between these two kinds of knowledge is that the former varies only marginally between the members of a given linguistic community whereas the latter may differ greatly among individuals. The clearest manifestation of the relative homogeneity of linguistic knowledge proper among speakers of the same language is the great extent to which they agree on what constitutes a violation of the 'rules' of their mother tongue. This difference in the variability of the two types of knowledge suggests that it is at least partially related to the conditions under which acquisition takes place. Thus the individual aspects of linguistic experience seem to play only a marginal role in the acquisition of grammatical knowledge, whereas they may strongly influence the development of the capacities of language use. There is strong evidence that language with the structural properties displayed in humans is a species-specific faculty. Although animals can coordinate their behavior using 'signals', their communication is limited to specific types of information (see chapter 4). That is, the structure of their communicative systems lacks the properties which constitute the originality of human language, i.e. the possibility to form an infinite number of different messages on the basis of a small set of primitive elements and to engage in cooperative problem solving which lies beyond biologically preprogrammed solutions. Additional evidence for species-specificity comes from the rather unsuccessful attempts to teach primates some form of human language like American Sign Language (1). The assumption that the language faculty is at least partially genetically determined is further supported by a number of observations which show that language development can take place even under heavy disturbances of the external and internal environment of the child. Thus, language learning in blind children is not substantially different from language learning in sighted children, as shown by Landau and Gleitman (2; but see also 3), nor is sign language acquired differently by deaf


children than is oral language by hearing children, as shown by Newport and Meier (4). Furthermore, genetic disorders like the Down Syndrome or the Williams Syndrome affect the child's capacity to acquire language differently from other domains of cognitive development. Whereas the non-linguistic development of these children may be heavily impaired, language development may nevertheless proceed in much the same way as in normal children, albeit more slowly, and although these children may not reach full linguistic competence (5-7). These observations, together with cases of exceptional linguistic abilities like that of the individual Christophe (8), constitute strong evidence for the dissociation between the faculty to acquire a language and other cognitive capacities, i.e. for a modular organization of the human cognitive system. They relativize views which assume that language development as a whole crucially depends on general cognitive development (9; 10). Another source of evidence for a genetic predisposition for language is that there seems to be a 'critical period' for acquisition after which normal language development is no longer possible (1; 11). Corresponding constraints on learning have been found for the development of bird song (12) and the development of vision in cats (13). Evidence for a critical period in human language development comes from children who have undergone brain surgery in the speech area (14) and from feral children like Genie, who had practically no language exposure until age 13 (15). The linguistic abilities of these children, especially in the domain of syntax, never reach normal levels (16). Similarly, how well deaf children growing up with speaking parents who do not know sign language acquire sign language depends on how early they are exposed to it (17). The existence of a critical period may be related to as yet poorly understood maturational processes in the brain (18), such as hemispheric specialization (e.g., the lateralization of the speech areas in the left hemisphere), and to the decrease of sensitivity to information which is crucial for the formation of grammatical rules. These findings indicate strongly that the child's mind is not a 'tabula rasa' onto which language is engraved by behavioristic stimulus-response mechanisms similar for all kinds of learning, but that it is especially equipped in a way that makes the task of language acquisition possible under the different external and internal conditions just mentioned, as argued by Chomsky in his review of Skinner's "Verbal Behavior" (see 19). That is, it seems plausible to assume that the system of human cognitive faculties is at least partially organized in a modular way, i.e. that it is based on information-specific, to a certain degree autonomous, genetically determined subsystems which develop on the basis of innate knowledge (20). Accordingly, one of the main goals of language acquisition research is to determine which perceptual and cognitive capacities the child is equipped with from the beginning. One way to investigate this issue is to find out which potentially linguistically relevant aspects of the linguistic and non-linguistic input the child is sensitive to from early on, and how this sensitivity to specific input properties is put to use by the child in order to discover the relevant invariants in the input, i.e. to extract from the input the language-specific regularities, i.e. rules.
There is accumulating evidence that the child's linguistic behavior conforms in different areas from very early on to the linguistic behavior of the adult. Between age 2;00 and 3;00 the child has acquired the basic rules of the target language.


Interestingly, there are syntactic domains, like word order, in which language development is almost error-free from the beginning, as shown for German among others by Penner and Weissenborn (21; see also 22; 23). This means that the child must have acquired the relevant regularities before their first overt manifestation in language production (24). This picture seems not to be much different for children acquiring more than one language simultaneously, as argued for example by Meisel (25; 26) and by Tracy (27). Together with the observation that there are innumerable errors that one would expect children to make but that in fact never show up, these findings lend support to the assumption that children must be equipped with some kind of a priori knowledge that constrains their hypotheses about how to break up the incoming speech signal and extract the underlying regularities, i.e. the grammar of the mother tongue. A number of approaches to language acquisition have thus incorporated a more or less rich innate component. This component is most clearly worked out in linguistic approaches to language acquisition based on specific grammatical models like Generative Grammar (e.g., 28-30). These share the approach of Chomsky (31) as a common ancestor. In present versions, Chomskian approaches postulate a Universal Grammar that characterizes the language faculty of the child at the initial stage of language acquisition (32). Universal Grammar consists of principles relevant to various subsystems of the language. Some of the principles may be parameterized, i.e. they can take one of a small number of values. Each parameter defines a specific property of the target language. Two examples of parameters are the Null-Subject parameter and the WH-parameter. The former determines whether a given language has the possibility to leave out the subject of a sentence. Thus Italian allows mangio besides io mangio, whereas this is impossible in French or English, where the subject always has to be overt, as in je mange or I eat. The WH-parameter concerns the position of interrogative pronouns in questions. Thus in Chinese the interrogative pronoun occupies the same position in the sentence as the constituent it asks for, e.g., John eats what?, whereas in English it has to be placed sentence-initially, i.e. What does John eat?. The parameters thus limit the number of hypotheses about the grammar of the target language that the child can formulate. The acquisition process thus essentially consists in the child finding out which parameter values hold for the language to be learned. In order for this process to be successful, Roeper and Weissenborn (33) have proposed that for every parameter there must exist unambiguous trigger information in the input which tells the child which parameter value to choose (33). 'Learning' in the nativist framework thus consists in the child's identification, from a set of pregiven candidates, of those parameter values that define the grammar of her language.1

1 Following current usage we employ the feminine form of pronouns referring to the noun "child".

An important question is whether the child's linguistic knowledge changes not only quantitatively but also qualitatively over time. That is, are the categories and principles that underlie the child's linguistic knowledge consistently the same as


those of the adult, or are they different at some point? Under the assumption of strict continuity, at every point in development the child's grammar will be a possible subset of the adult's grammar, using the same formal categories, as argued e.g. by Pinker (28) and Weissenborn (23; see also 21). A weaker continuity hypothesis assumes that, although the child's grammar uses the same formal categories as the adult's, it may include at some stage of development generalizations that lead to structures that are ungrammatical in the language to be learned, but that constitute a valid option in some other language (34). An alternative view allows for discontinuity: the child's linguistic representations may be based on categories, principles and corresponding grammatical rules that are qualitatively different from those of the adult, e.g., relying on different word-order principles or semantic categories (35-37). The origin of the discontinuities is generally explained by assuming particular innate biases of the child to analyze the linguistic input. These biases may be either linguistic or cognitive. Thus, as an example of a linguistic bias, in the parameter-setting model the child may, in the absence of the relevant triggering data, initially fix a given parameter in a different way than it is ultimately fixed in the target language. Data will be necessary to signal that a resetting is needed (38). It has been suggested, in fact, that the values of a given parameter are ordered, in the sense that one of these values constitutes the 'default' value in all languages. For example, Hyams (38) assumes that all children start with the assumption that their language allows null subjects; this holds for children learning Italian and Spanish, which do, and for children learning English, which does not. The child learning English must therefore at some later point of her linguistic development discover the correct option for her language. An essentially cognitive bias is proposed in Slobin (10). Slobin assumes that the child is biased to linguistically encode first those meanings which enter into the representations of 'prototypical situations'. These core semantic notions, together with the application of a basic set of so-called Operating Principles, result in a 'Basic Child Grammar'. That is, Slobin postulates that during the initial stages of the acquisition process the linguistic representations of children learning different languages are reducible to largely the same set of principles. It is important to point out that all approaches which assume that the grammar of the child and the adult may differ in a substantial way must explain how the child eventually succeeds, as she always does, in getting rid of her erroneous linguistic knowledge. The case of children acquiring two or more languages simultaneously constitutes an ideal test situation for this hypothesis. The available evidence seems to indicate that the specific grammatical system of the language to be learned constrains the acquisition process more than the 'Basic Child Grammar' hypothesis predicts: the children treat the languages differently from the beginning. These findings support a continuity view of language acquisition also in the case of bilingual development (25; but see 39). What kind of input information enables the child to build up her linguistic knowledge in such a straightforward way?
Basically, there are three types of information the child can draw on, namely prosodic, conceptual-semantic, and syntactic information. Obviously, these three types of information are not equally available to the child from the start of language acquisition. It is not before the child has identified potentially linguistically relevant units in the input, like morphemes and words, that she can start trying to map semantic-conceptual units onto formal ones, i. e. to associate meaning with form, or to exploit co-occurrence relations between previously identified units, i. e. to apply distributional learning procedures. That is to say, acoustically based information must be the first type of information in the speech signal the child can make use of. We will see that crucial aspects of the structure of the acquisition process may follow from this initial restriction to basically one type of information as input to the child's learning mechanisms. In the following we will argue that from very early on children are engaged in detailed analyses of the speech input that focus on different features across time. Existing data on the change of perceptual abilities during the first year of life support a picture according to which the first step into language seems to be guided by prosodic features that are directly accessible from an acoustic representation of the input. Later on, learning mechanisms acting on non-prosodic distributional characteristics of the input show up and become integrated into more complex analyzing procedures.

2.2 The Prosodic Bootstrapping Account

The idea that children may use prosodic information for a first step into a structural analysis of the speech input has been put forward among others by Gleitman and Wanner (40) and Pinker (28). Within this so-called 'prosodic bootstrapping account' it is assumed that prosodic information constrains the child's syntactic and semantic analysis of the input. Prosodic features like stress, rhythm and intonation might support the child's solution of two basic tasks that are fundamental in acquiring the syntactic regularities of the mother tongue, namely the segmentation of the speech stream into syntactically relevant portions (sentences, clauses, phrases, words) and the assignment of these units to different grammatical classes. This view supposes that there are systematic relations between prosodic and syntactic structure that are accessible to perceptual mechanisms in a rather direct bottom-up fashion and that children are sensitive to this very specific kind of acoustic information from very early on. In the following section, evidence for these prerequisites for a prosodic bootstrapping mechanism will be reviewed.

2.2.1 Prosodic Information as a Cue to Syntax

Prosodic properties of spoken language are multifunctional. One of the main functions of prosody is a pragmatic one. By modulating the intonational contour of an utterance the speaker can express the communicative goal he wants to achieve with his utterance. For example, a question typically involves an increase in fundamental frequency towards the end of the utterance, whereas a decrease is more typical for a declarative. Furthermore, the information structure of a sentence is highlighted by prosodic information. The informationally most prominent part of the utterance, the sentence focus, normally carries the main stress of the sentence. Last but not least, affective attitudes of the speaker also influence the prosodic structure of an utterance.

The prosodic bootstrapping account focuses on the at least partial correlation between prosodic and syntactic boundaries. These boundaries are marked acoustically by a rather restricted set of phonetic features which provide information for the assignment of a syntactic structure to an utterance. In languages like English and German there are three phonetic features which serve as boundary markers. First, pauses in the signal are more likely to occur at major syntactic boundaries than at syntactically irrelevant points (e. g., 41-43). Second, syllables occurring directly before a clause boundary are lengthened as compared to syllables at within-clause positions (e. g., 42; 44-48). Third, typical changes of the fundamental frequency can be observed before clause boundaries, with a clear discontinuity of the contour at the point of the boundary (42; 48-50). Adult listeners have been shown to make use of these different kinds of prosodic information for the syntactic analysis of spoken language (46-48; 51-54).

Another perceptually prominent prosodic feature is stress. Phonetically, stressed syllables have a longer duration, a higher pitch and a higher amplitude than unstressed syllables (e. g., 55). These acoustic characteristics make stressed syllables perceptually more salient than unstressed syllables. Furthermore, in so-called stress-timed languages like English and German, the rhythmic pattern of language is carried by the stressed syllables, which tend to appear at regular temporal intervals within a sentence (56). As Cutler (e. g., 57) has noted, the rhythmic pattern of language might support the segmentation of the continuous speech stream into words. Contrary to written language, where word boundaries are clearly marked by spaces, there is no comparable unique phonetic signal for word boundaries in spoken language. But, for example in English, the majority of content words have a strong syllable as their first syllable (58). In this case, a segmentation strategy of assuming that each strong syllable marks the beginning of a new content word would be successful in about 90% of the cases. There is empirical evidence that adult speakers of English tend to assign a word boundary before strong syllables (59; 60). Furthermore, breaks in the periodic appearance of strong syllables caused by pauses or the lengthening of segments might be an additional prosodic parameter that influences the perception of prosodic or syntactic boundaries (47; 51).

According to proposals by Mazuka (61) and Nespor and colleagues (62), basic configurational parameters like the Branching Direction parameter and the Head Direction parameter may already be set at a prelexical stage of language acquisition on the basis of prosodic information. The Branching Direction parameter determines the side of recursion, e. g., whether an embedded clause follows or precedes the main clause. The Head Direction parameter determines whether complements follow or precede their head. The two parameters are correlated, since in most cases a right-branching language is head-initial, like English, French and Italian, and a left-branching language is head-final, like Japanese and Turkish.
A preverbal setting of these parameters is assumed since word order errors, which would be an indicator of an incorrect setting, do not occur in children's early utterances (see section 2.1). The two proposals disagree with respect to the specific kind of prosodic information the child may use to set these parameters. According to Mazuka (61), the crucial phonetic cue is the strength of the prosodic break between main and subordinate clauses. She found that the clause boundary is prosodically more clearly marked in left-branching constructions than in right-branching constructions. Nespor et al. (62) assume that the rhythmic pattern within one special prosodic unit, the so-called phonological phrase, provides the crucial information. In right-branching languages, the prosodically most prominent element appears at the end of a phonological phrase, resulting in a weak-strong rhythmical pattern within the phonological phrase. Contrary to this, in left-branching languages the prosodically most prominent element appears at the beginning of a phonological phrase, resulting in a strong-weak rhythmical pattern within the phonological phrase.

The prosodic features discussed above might support the child's earliest steps into language structure by providing information about the boundaries of syntactic units and the order of elements within them. However, most of the reported findings of a correlation between prosodic and syntactic boundaries stem from analyses of recorded sentences constructed especially for the purposes of detailed phonetic analyses. There is evidence that prosodic boundaries are phonetically less well marked in spontaneous speech (63). Another consideration is that in several cultures people interacting with very young children use a different language register than when interacting with other adults. One of the main differences between infant-directed and adult-directed speech is a prosodic one. Infant-directed speech has a higher overall fundamental frequency and a wider variation in fundamental frequency than adult-directed speech (64). Furthermore, in infant-directed speech clause boundaries coincide with prosodic boundaries more often than in adult-directed speech and are more frequently marked by clear pauses (64-66). Content words are stressed more than function words, but on the other hand function words are reduced to a lesser degree than in adult-directed speech (67). These specific prosodic features of infant-directed speech, together with the higher correspondence between syntactic and prosodic boundaries, may facilitate prosodic bootstrapping. Several studies have shown that infants prefer to listen to samples of infant-directed speech compared to samples of adult-directed speech (68-70). This finding suggests that the specific features of infant-directed speech may especially match the infant's perceptual abilities (69). But it is unclear whether the prosodic features of infant-directed speech are as strongly related to the acquisition of structural knowledge of language as a first view suggests, since the typical features found for American English do not hold to the same degree for other languages (71; 72).
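To make the rhythm-based parameter-setting idea of Nespor and colleagues described above more concrete, the following Python fragment is a minimal, purely illustrative sketch; the input representation, the function name and the decision rule are our own simplifying assumptions, not part of the cited proposal. It simply checks whether prominence tends to fall at the beginning or at the end of phonological phrases and derives a head-direction value from that tendency.

from collections import Counter

def infer_head_direction(phonological_phrases):
    # Each phrase is a list of prominence marks, e.g. ['w', 'w', 's'] for a
    # phrase whose most prominent element comes last. Predominantly phrase-final
    # prominence is taken to signal a head-initial (right-branching) language,
    # predominantly phrase-initial prominence a head-final (left-branching) one.
    votes = Counter()
    for phrase in phonological_phrases:
        if 's' not in phrase:
            continue
        if phrase.index('s') == 0:
            votes['head-final'] += 1
        elif phrase.index('s') == len(phrase) - 1:
            votes['head-initial'] += 1
    return votes.most_common(1)[0][0] if votes else 'undecided'

# A toy input resembling an English-like (weak-strong) prominence pattern:
print(infer_head_direction([['w', 's'], ['w', 'w', 's'], ['s', 'w'], ['w', 's']]))
# -> 'head-initial'

On actual child-directed speech such a rule would of course have to operate on automatically detected prominence, and the reliability problems discussed in section 2.2.4 apply to it as well.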

2.2.2 Children's Sensitivity to Prosodic Information

Experiments using the high-amplitude-sucking method have shown that even newborns are able to discriminate their mother tongue from another language (73). This ability was also shown when the speech stimuli were low-pass-filtered with a cutoff frequency of 400 Hz. Speech stimuli filtered with this cutoff frequency still contain the prosodic characteristics of fundamental frequency contour, pausing and lengthening, but the phonetic information that is necessary to identify the segmental content is missing (but see 74). Furthermore, two-month-old children can not only differentiate their native language from another one but they prefer to listen to their native language. This preference was manifested with natural speech as well as with low-pass-filtered speech samples (75). The fact that the results are the same for natural and low-pass-filtered speech suggests that it is a sensitivity to the prosodic characteristics of language that enables the child to differentiate between her native and other languages. As Jusczyk (76) suggests, this early sensitivity to language-specific prosodic properties of speech might have its origin in prenatal hearing experience. Since the uterine wall acts similarly to a low-pass filter, the child is already exposed to the prosodic characteristics of her native language well before birth.

Further investigations show that newborns are not only sensitive to global prosodic properties of language but also to very specific features. From birth on children are able to discriminate multisyllabic strings on the basis of different stress patterns (77-79). Furthermore, they are able to discriminate bisyllabic sequences that were spliced out of a single word from bisyllabic sequences that were spliced out of two words (e. g., /mati/ from "mathématiquement" or "panorama typique"; 80). This result indicates that newborns are sensitive to the potential prosodic markings of syntactically relevant boundaries. At the age of seven months, infants prefer to listen to speech samples with pauses inserted at clause boundaries compared to speech samples with pauses inserted at within-clause locations (81; 82). The same effect was found for nine-month-olds with speech samples that were interrupted by pauses either at phrase boundaries or at within-phrase positions (83; 84). It has been pointed out that this preference for 'natural' pause positions is not due to the mismatch between prosodic and syntactic structure but may be caused by the disturbance of the correlation of the different acoustic features that normally interact in marking prosodic boundaries. Since the within-clause or within-phrase pauses were inserted into the speech stimuli after the recording had been made, they were not accompanied by the final lengthening or fundamental frequency changes which characterize clause or phrase boundaries in spontaneous speech. The reported preference for the natural speech samples seems to be the result of a learning process since it is not found in six-month-olds. Learning in this domain has to be assumed since languages show variation concerning the specific phonetic features they use to mark clause-internal prosodic boundaries (85). Early learning of language-specific prosodic properties has also been shown for lexical stress, a feature which is subject to inter-language variation as well. In English, a language with variable stress position but with a majority of content words stressed on the first syllable (58), children at nine months of age prefer to listen to bisyllabic words stressed on the first syllable compared to bisyllabic words stressed on the second syllable (86). The same preference was not found for six-month-olds.

2.2.3 Prosodic Information in Children's Speech Processing

Prosodic information, especially prosodic boundary information, may help the child to segment the speech input into smaller chunks that are easier to process for further analysis. If this segmentation is guided by prosodic boundary markers, it is guaranteed, given the partial correlation between prosodic structure on the one hand and semantic and syntactic structure on the other hand, that in most cases these chunks are linguistically relevant. There are indications that prosodic information enhances sensitivity to different aspects of the speech signal from very early on. As work by Mandel, Jusczyk and Kemler Nelson (87) shows, even children at two months of age are more sensitive to phonetic changes in a prosodically structured speech string compared to a string that does not have any prosodic structure above the word stress level. This result suggests that the prosodic organization of an utterance might enhance memory capacities for phonetic and other linguistic information, which by itself might be a prerequisite for a more detailed analysis of the incoming signal.

Investigations by Morgan and colleagues (88; 89) demonstrate how the sensitivity to prosodic, especially rhythmic, features of the input may relate to the development of word segmentation skills. They presented sequences of nonsense syllables with either a constant or a varying rhythmic pattern to children between six and nine months of age. After several repetitions of the syllable sequences, noise bursts were presented between the syllables. Children exposed to the constant rhythmical patterns showed weaker reactions to the noise bursts than the children presented with varying rhythmical patterns. From this result, they conclude that children form more coherent representational units from syllable sequences occurring in a constant stress pattern. Thus the initial identification of word-like units in the speech input may be supported by a sensitivity to rhythmic patterns and a tendency to segment the input into rhythmic groups. It is still unclear whether we already have to assume a preference for segmenting the input into trochaic (stressed-unstressed) units over iambic (unstressed-stressed) units in these early phases. Morgan and Saffran (89) did not find any differences between the coherence of trochaic and iambic strings. But this equivalence may be the result of learning, given that the same strings had been presented to the children several times. Recent work by Morgan (88) shows that the rhythmic grouping is immediately transferred to novel syllable strings with a trochaic pattern but not with an iambic pattern. Evidence for a preference to segment strings of nonsense syllables into trochaic rhythmical groups for nine-month-olds but not for seven-month-olds was found by Echols, Crowhurst and Childers (90). These results indicate that at least children who grow up in an English-speaking community, where the trochaic pattern is the dominant one, have a tendency to segment their input into trochaic rhythmical groups from very early on.

Direct evidence that the sensitivity to rhythmical patterns in the input relates to word segmentation and word recognition skills stems from work by Jusczyk and colleagues (76; 91). They found that children at seven to eight months of age can already recognize, in a text, bisyllabic words that were first presented to them in isolation. But this ability only appeared for words with a trochaic stress pattern and not for words with an iambic pattern. Iambic words were not recognized in context before the age of ten to eleven months.
The hypothesis that the delay for the recognition of iambic words is based on a missegmentation of the speech string (which would be expected if the children had a bias to segment their input into trochaic units) is supported by further observations from these studies. First, when the children were first exposed to the isolated stressed syllable of either a trochaic or an iambic word, they recognized this stressed syllable in a text passage which contained the iambic word but not in a text passage which contained the trochaic word. As with the results of Morgan (88), this finding suggests that trochaic words are perceived as coherent units from which it is hard to extract a subpart. Contrary to this, iambic words are not represented as units but are segmented before the stressed syllable. This segmentation supports the recognition of the single syllable. Second, when presented first with text passages in which an iambic word was always followed by the same unstressed syllable (e. g., the guitar is ...), the children recognized pseudowords consisting of the stressed and the following unstressed syllable (taris) presented to them later in isolation. These data clearly support the assumption that English children have a preference to segment their input into trochaic units.

This preference seems also to exist for speech production. As Gerken (92-94) has suggested, the omission of unstressed elements in early language production can be explained by the assumption that speech production is influenced by a metrical template consisting of a stressed syllable followed by an unstressed one. According to this assumption, only unstressed syllables that do not fit into this template, namely an unstressed syllable preceding a stressed one or an unstressed syllable following another unstressed one, are missing in children's utterances. Whether the preference for the trochaic bisyllabic rhythmic group reflects a universal tendency during early language acquisition or whether it is already a result of language-specific patterns in the input is still an open question, since comparable investigations in languages whose dominant stress pattern is iambic have not yet been done.

2.2.4 Limits of Prosodic Bootstrapping

Altogether, there is considerable evidence that children are sensitive to prosodic characteristics of language from very early on and that they use this information in processing their speech input. Nevertheless, prosodic information is only of limited value with regard to the extraction of syntactically relevant units out of the speech stream. The reason for this, as mentioned before, is that even though prosodic and syntactic structures are related, they are autonomous subcomponents of grammar and thus coincide only partially, as argued strongly by Jackendoff (95). Examples of systematic matches between prosodic and syntactic structures are sentence boundaries and the boundaries of non-restrictive relative clauses and parentheticals, which are obligatorily marked by prosodic boundary features (cf. 96). But in many cases, boundaries of prosodic units are not isomorphic with boundaries of syntactic units (44). There are optional prosodic boundaries which can be inserted at various positions in an utterance (96). The existence of these optional boundaries poses a special problem for correlating prosodic and syntactic structure. The positioning of optional prosodic boundaries is related to performance factors. It is more likely that a long utterance is broken up into more than one prosodic unit than a short one (44). Thus, for example, the boundary between subject and predicate is prosodically more marked when these constituents are long (e. g., the girl with long blond hair # left her mother's house; # marks a prosodic boundary) than when they are short (e. g., the girl # left the room). Generally speaking, this means that the same syntactic boundary might be marked prosodically in one sentence but not in another.

Furthermore, optional boundaries can freely cut across syntactic constituents. According to Selkirk (97), the positioning of optional boundaries within an utterance is not determined by its syntactic but by its semantic structure. Depending on differences in informational structure, the same sentence may have different prosodic structures. Thus, as Selkirk (97, p. 285) states, "... the relation between syntactic structure and all aspects of intonational structure can be depicted as a one-to-many mapping". For example, the sentence Jane gave the book to Mary (97, p. 293) could be realized with six different intonational phrasings:

(a) # Jane gave the book to Mary #
(b) # Jane # gave the book to Mary #
(c) # Jane gave the book # to Mary #
(d) # Jane gave # the book # to Mary #
(e) # Jane # gave the book # to Mary #
(f) # Jane # gave # the book # to Mary #

Although the boundaries of the prosodic constituents altogether occur at syntactically relevant points, only the intonational phrasing of (b) corresponds to the syntactic constituents of this sentence. Systematic mismatches between prosodic and syntactic structures appear when a syntactic constituent consists only of a function word. If, for example, the subject in the above example were a pronoun (I gave the book to Mary), an intonational break between the subject and the predicate would be very unlikely for two reasons. First, a prosodic break after the subject would lead to a very short prosodic constituent, and as was already mentioned, length influences prosodic grouping. Second, this prosodic constituent would contain nothing but a function word. Function words form their own prosodic constituents only if they are stressed, which only occurs under specific conditions (98). That is, an unstressed function word does not form a prosodic constituent of its own but rather cliticizes to an adjacent content word. This unit cannot be differentiated from a simple lexical word on prosodic grounds. Thus in a sentence like the woman is going to a party the auxiliary can cliticize to the preceding noun: the woman's going to a party. In this case the syntactically relevant boundary between subject and predicate is situated within a prosodic word and is thus prosodically unmarked.

The last example leads to a related problem concerning the usefulness of prosodic information for identifying word boundaries. According to the rhythmic principle mentioned above, in a language like English it is very likely that a stressed syllable indicates the beginning of a new word. But according to an analysis of spoken language by Cutler and Carter (58), around 10% of the lexical words in spoken English do not start with a stressed syllable (e. g., fertility). That is, a purely stress-based segmentation strategy would locate a word boundary between the first syllable fer and the second, stressed syllable ti. This also means that a purely stress-based segmentation strategy could not recognize unstressed function words as syntactically independent units. Since function words constitute a high percentage of the word tokens in spoken language, a stress-based segmentation strategy would often fail.

A further question is whether the phonetic features that mark syntactic boundaries can be recognized by purely bottom-up processing mechanisms, given that the critical phonetic features do not only serve this restricted syntactic purpose. Pauses in continuous speech can occur at different positions in the utterance and are related to non-linguistic factors such as cognitive, affective and social variables (99). Detailed measurements of spontaneous speech reveal that a considerable proportion of the pauses in spoken language do not appear at major syntactic boundaries but within phrases (100-102). Without syntactic knowledge the child has no way to decide whether a pause in the incoming signal coincides with a syntactic boundary or not. The same problem occurs with regard to the usefulness of the length of segments as a boundary signal. Duration is an inherent feature of phonemes. For example, in German the length of a vowel is a phonologically distinctive feature, as there are words which can only be discriminated by vowel length. Thus the pure duration of a vowel does not provide any information about whether this vowel precedes a syntactic boundary or not. Duration only gives information about boundaries if the duration of the same syllable or even the same word is compared across a number of different sentence positions. This means that the child cannot directly use durational information to identify syntactic boundaries but has to rely on a previous distributional analysis. In the next section, we will argue that children are equipped with perceptual mechanisms for distributional analyses of non-prosodic features of the input. This type of learning seems to overrule the reliance on prosodic information in the second half of the first year of life and may contribute to the solution of the problems left open by prosodic bootstrapping.
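To make the stress-based (metrical) segmentation strategy and its failure modes discussed above more concrete, the following Python sketch is a purely illustrative toy of our own (the syllable annotation and the function name are invented): it posits a word boundary before every strong syllable. On a trochaic word like doctor it succeeds, but it glues unstressed function words to the preceding material and missegments iamb-initial words like fertility.

def segment_by_stress(syllables):
    # Posit a word boundary before every strong ('s') syllable.
    # `syllables` is a list of (syllable, strength) pairs, strength being
    # 's' (strong) or 'w' (weak). Returns the hypothesized words.
    words, current = [], []
    for syll, strength in syllables:
        if strength == 's' and current:
            words.append(''.join(current))
            current = []
        current.append(syll)
    if current:
        words.append(''.join(current))
    return words

# "saw the doctor and fertility", roughly syllabified and stress-marked:
utterance = [('saw', 's'), ('the', 'w'), ('doc', 's'), ('tor', 'w'), ('and', 'w'),
             ('fer', 'w'), ('ti', 's'), ('li', 'w'), ('ty', 'w')]
print(segment_by_stress(utterance))
# -> ['sawthe', 'doctorandfer', 'tility']

Both errors in the output, the cliticized the and and as well as the split fertility, correspond exactly to the two problems discussed in the text.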

2.3 Distributional Learning in the Acquisition of Morpho-Syntax

2.3.1 Non-Prosodic Distributional Information as a Cue to Syntax

Distributional analysis as a tool for linguistic research essentially goes back to the work of the American structuralists (e. g., 103). The basic purpose for which this form of analysis was developed was the identification of the relevant units on the different levels of linguistic representation (e. g., phonemes, morphemes, syntactic phrases) and the assignment of these units to grammatical classes. The analysis procedures should consider only formal properties and should not take semantic information into account. The first step of distributional analysis consists in segmenting utterances into linguistically relevant units. A part of an utterance can be considered a relevant segment if it can appear in the same form across different contexts. The criterion for grouping segments together into the same grammatical class is their interchangeability in a given context. The classification of segments is a necessary prerequisite for the formulation of grammatical rules, which should hold for entire classes of elements and not only for single elements.

To some degree, the child that starts to learn her mother tongue is in a situation similar to that of a linguist who investigates a language which is foreign to him. The child has to segment the input, classify the resulting units and determine the language-specific combinatorial regularities. We assume, and there is evidence that this is correct, that during the second half of their first year children start to rely more heavily on non-prosodic distributional properties of the input to solve these tasks. The growing sensitivity of the analysis procedures to morphosyntactic distributional information may demote the importance of prosodic information and consequently help to solve the segmentation problems which result from the mismatches between prosodic and syntactic units. In addition to formal morphosyntactic features, performance features like the frequency of occurrence of a given unit and the frequency of co-occurrence patterns of units also seem to play a role in the distributional analysis.

Before presenting empirical evidence for this type of distributional analysis, we will briefly discuss how distributional information might help the child to arrive at more adequate syntactic structures of the input, especially for those cases where mismatches between prosodic and syntactic units have been reported. For instance, taking into account co-occurrence patterns of syllables can help the child to determine the correct word boundaries for words that are not stressed on the first syllable. That is, if the child recognizes that a string like -tility is always preceded by the string fer-, she may conclude that the whole string forms a higher-level unit, i. e. a word. Furthermore, the occurrence of function words as either cliticized or non-cliticized elements across varying contexts can be used by the child to recognize that a string like woman's consists of two different lexical units; that is, in other contexts the auxiliary appears non-cliticized in its full form: Is the woman going to a party?. The same mechanism can work for the identification of complex syntactic units. A given syntactic phrase can occur in different positions in a sentence: Peter gave the book to Mary vs. Peter gave Mary the book. In both cases, cross-utterance comparisons can thus help to determine the underlying syntactic structure.
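As a purely illustrative sketch of the two distributional operations just described (the toy corpus, in which fertility is deliberately left missegmented, and all names are our own), the following Python fragment groups units that share a context into a common class and merges adjacent units that never occur apart.

from collections import defaultdict

corpus = [
    "the dog sleeps", "the cat sleeps", "a dog runs", "a cat runs",
    "the fer tility of the soil", "a fer tility problem",
]
utterances = [u.split() for u in corpus]

# 1. Interchangeability: collect the (left, right) contexts of each unit.
contexts = defaultdict(set)
for u in utterances:
    padded = ["#"] + u + ["#"]            # '#' marks an utterance boundary
    for i, w in enumerate(padded[1:-1], start=1):
        contexts[w].add((padded[i - 1], padded[i + 1]))

def same_class(w1, w2):
    # Units that occur in at least one shared context are grouped together.
    return bool(contexts[w1] & contexts[w2])

print(same_class("dog", "cat"))      # True: both occur between 'the'/'a' and a verb
print(same_class("dog", "sleeps"))   # False

# 2. Constant co-occurrence: merge adjacent units that never occur apart.
follows, precedes = defaultdict(set), defaultdict(set)
for u in utterances:
    padded = ["#"] + u + ["#"]
    for w1, w2 in zip(padded, padded[1:]):
        follows[w1].add(w2)
        precedes[w2].add(w1)

for w1 in follows:
    for w2 in follows[w1]:
        if w2 != "#" and follows[w1] == {w2} and precedes[w2] == {w1}:
            print("merge:", w1 + w2)      # -> merge: fertility

A real learner would of course need frequency thresholds rather than the all-or-nothing criteria used here, but the sketch shows how purely formal co-occurrence information can both classify dog and cat together and repair the missegmented fer + tility into a single unit.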

2.3.2 Children's Sensitivity to Non-Prosodic Information

There is growing evidence that children are able to perform detailed distributional analyses from very early on and that non-prosodic distributional properties of the input become more and more important during the second half of the first year of life. One domain where early distributional learning has been shown to occur is the segmental level. In this domain, learning the characteristics of the mother tongue must involve two different aspects. First, languages differ concerning their sound inventories. For example, English has the dental fricative /th/, as in the word the, which does not exist in German. On the other hand, German has the velar fricative /ch/, as in the word Dach (roof), which does not belong to the English inventory. At the age of nine months, children seem to have learned which sounds belong to their mother tongue and which do not. That is, they are now able to recognize their mother tongue on the basis of segmental information alone (104). The fact that six-month-olds do not show this preference in the same condition suggests that the differentiation between native and non-native sounds is the result of a learning process which takes place in the second half of the first year of life. The exposure to a language with its specific sound inventory even influences basic perceptual mechanisms. Children are born with abilities for fine-grained phonetic discrimination which by far exceed those of adults. Adults have better discrimination abilities for phonetic contrasts that are relevant in their language, whereas newborns do not show better discrimination performance for native as compared to foreign phonetic contrasts. This picture changes at the end of the first year of life, when children start to show the same pattern as adults, i. e. better discrimination performance for native than for foreign phonetic contrasts (for a review see 76).

The identification of individual segments of the target language is a prerequisite for the acquisition of phonotactic restrictions. Phonotactic restrictions constrain the possibilities of combining phonemes within a syllable. They determine which types of consonant clusters can occur in the onset (phonemes preceding the vowel) or the coda (phonemes following the vowel) of a syllable, as well as the possible combinations of the nucleus (vowel) and the coda in a syllable. For instance, English does not allow the sequence /kn/ to occur in syllable-initial position. In contrast, there are many German words that begin with this sequence. The language specificity of phonotactic restrictions shows that they cannot be reduced to purely articulatory constraints. Children of nine months of age seem to have at least partially acquired the phonotactic rules of their language. They prefer to listen to their mother tongue compared to speech samples of a foreign language which are neither prosodically nor segmentally different from their mother tongue but contain syllables that violate the native phonotactic restrictions (104). This situation is further complicated by the fact that different phonotactic restrictions hold for the onset and the coda of the syllable. Exactly those consonant clusters which are allowed in the onset are not allowed in the coda, and vice versa. For instance, block is an existing English word, but the same consonant cluster that appears in the onset of this word could not appear in the coda of an English monosyllabic word (*dobl). Similarly, the sequence /rk/, which appears in the coda of the monosyllabic English word fork, could not appear in the onset (*rkof). At nine months children seem to have mastered these specific constraints, as shown by Friederici and Wessels (105). This sensitivity to phonotactic restrictions must be the result of learning which takes place in the second half of the first year of life, since it is not found at the age of six months. Knowing about possible consonant clusters within a syllable provides the child with further information about how to segment the speech input, at least at the syllabic level.
If a consonant sequence which violates the phonotactic restrictions appears in the input, it is clear that a syllable boundary must occur within this sequence. For instance, a sequence like goodgirl must have a syllable boundary between /d/ and /g/ because these sounds do not constitute a licit consonant cluster in English. Since word boundaries always coincide with syllable boundaries (but not vice versa), finding syllable boundaries is at least one indicator of word boundaries. Furthermore, positional restrictions on allophonic variations of a phoneme might provide information about word boundaries (106). For instance, in English stop consonants are aspirated more strongly when they occur at word beginnings than when they occur word-internally.

Nine-month-old children are not only able to discriminate sound sequences that are allowed in their language from sound sequences that violate the phonotactic restrictions of their language, but they also seem to have learned about the frequency of occurrence of legal sound combinations. As Jusczyk, Luce and Luce (107) have shown, children at this age prefer phonotactically legal syllables that consist of very frequent phoneme combinations over phonotactically legal syllables that consist of infrequent phoneme combinations. The above-reported finding of sensitivity to the legality of a phoneme combination might also be the result of the sensitivity towards the frequency of co-occurrence patterns on the level of sound sequences. Phonotactically illegal sequences are those that never appear in the input. So both types of findings can be seen as a result of the sensitivity towards the frequency of co-occurrence patterns on the level of single segments. Since this sensitivity could not be shown for six-month-old children, it must be that children learn about these co-occurrence patterns of phonemes between six and nine months. Direct evidence that children are able to learn distributional regularities from only a restricted amount of input comes from the already mentioned work by Morgan and Saffran (89). They found that, contrary to six-month-olds, who rely only on prosodic information, nine-month-olds form coherent bisyllabic units from two syllables only if these syllables were presented several times in a constant metrical pattern and in the same order.

Children's sensitivity to non-prosodic distributional information is clearly demonstrated in a study by Saffran, Aslin and Newport (108). In this study, synthetic syllable strings were presented to the children in which all prosodic features were held constant but the transitional probabilities within bisyllabic subparts of these strings were varied. Some syllables always occurred together in a constant order, which results in a transitional probability of 100%. Other syllables occurred together in only 30% of their occurrences. After only two minutes of listening experience with these syllable sequences, eight-month-old children seemed to have formed representational units out of the bisyllabic strings with the transitional probability of 100% but not out of those with a lower transitional probability. The reported sensitivity to co-occurrence patterns on the segmental as well as the syllabic level may contribute to the identification of word boundaries. Since words have a constant segmental and syllabic composition but appear across varying contexts, computations of transitional probabilities between segments and syllables would result in high transitional probabilities for combinations that belong to one word as compared to combinations that cross word boundaries.
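The kind of computation just described can be illustrated with a small Python sketch. It is our own toy example, not code from the cited studies: the nonsense words merely resemble the material used in such experiments, and the boundary threshold is an arbitrary assumption. Transitional probabilities between adjacent syllables are estimated from the stream itself, and a word boundary is posited wherever the probability dips.

from collections import Counter

def transitional_probabilities(syllables):
    # Estimate P(next syllable | current syllable) from adjacent pairs.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, threshold=0.75):
    # Insert a word boundary wherever the transitional probability dips.
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append(''.join(current))
            current = []
        current.append(b)
    words.append(''.join(current))
    return words

# A continuous stream built from the invented 'words' bidaku, padoti and golabu:
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
print(segment(stream))
# -> ['bidaku', 'padoti', 'golabu', 'bidaku', 'golabu', 'padoti', 'bidaku']

Within-word pairs such as bi-da have a transitional probability of 1.0 in this stream, whereas pairs spanning a word boundary drop to 0.5, which is exactly the kind of statistical dip that the infants in the study described above appear to exploit.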

2.3.3 Closed-Class Elements as Early Anchorpoints for Segmentation and Classification

The findings that children are able to compute the transitional probabilities between different units of speech, like segments and syllables, from their input and that they treat sequences with high transitional probabilities as units lead to the conclusion that children should represent phoneme sequences which occur very frequently in a fixed order across a number of different contexts as units from very early on. These units might be candidates for early lexical representations from which first top-down analyses of the input could start. They can serve as anchorpoints for further analyses of the input, like, for example, the identification and grammatical classification of adjacent words and the determination of order regularities, as argued e. g., by Valian and Coulson (109), by Morgan (88), and by Gerken (110).

The high frequency of occurrence in different contexts holds especially for a subset of the vocabulary, namely the so-called closed-class elements, e. g., articles, pronouns and conjunctions. Frequency counts show that, for example in English, the 50 most frequent words are closed-class elements (111). Linguistically, the closed class is defined by morphological properties. That is, the inventory of this class is fixed. It cannot be changed by productive derivational processes, loan words, etc., unlike that of the open class, e. g., nouns, verbs, adjectives and adverbs. As a result, the number of open-class elements by far exceeds the number of closed-class elements. On the other hand, elements of the closed class occur much more frequently in speech than elements of the open class: they constitute about 40% of the words of an average English text although they represent not more than 1% of the total vocabulary (112).

There are indications that the closed- and the open-class elements do not only have different structural properties but that they also behave differently in language processing. On the production side, it has been shown that the two classes are differently involved in speech errors. An error type called 'exchanges' typically affects open-class elements, which surface in a position deviating from the correct sentence position. Closed-class elements seem not to undergo such exchanges: they are 'stranded' in their original position (e. g., "I left the briefcase in my cigar", intended: the cigar in my briefcase; 113). This indicates that they are processed at different stages during speech production. Differences have also been found on the perceptual side. Thus, the recognition of closed-class elements seems not to be influenced by word frequency to the same degree as has been shown for open-class elements. It is still unclear whether this difference is really an attribute of class membership or whether it is related to the differences in word frequency (e. g., 114; 115).

As already mentioned, the identification of lexical elements in the input can support further syntactic analysis. For example, if an English-learning child is already able to recognize the article the in the input and if she hears a sequence like the fertility, she can predict that a new word begins after the article. According to the metrical segmentation strategy, the next word boundary should occur at the stressed syllable ti. But the syllable fer cannot be a content word on its own since it is only a weak syllable. This might lead the child to attach this syllable to the following string. So, an interaction between morphosyntactic distributional information and prosodic information might lead to the correct assignment of word boundaries in this case.

On the word level, as already mentioned, closed-class elements could contribute to the grammatical classification of lexical elements (88; 110; 116; 117). For instance, articles are typically combined with nouns. From this co-occurrence pattern, children can conclude that all units following an article belong to the same grammatical class. Furthermore, closed-class elements could be used as an indication of boundaries of higher syntactic units or phrases, since they typically occur at the edge of syntactic phrases. For example, in English articles always occupy the initial position of a syntactic phrase. The same holds for prepositions. The appearance of a conjunction can be interpreted as a cue for a clause boundary, since most subordinate clauses are introduced by a conjunction. In addition, given that most closed-class items are syntactic heads, their position has far-reaching consequences for the word order in a given language.

Until now, we have only discussed the elements of the closed class which occur as free morphemes, i. e. independent lexical items. Basically the same properties are displayed by closed-class items which are bound morphemes, i. e. affixes, like inflections: their number is very limited, they have a high frequency of occurrence, and they are subject to stranding in speech errors (e. g., "The park was trucked", intended: The truck was parked; 113). In languages like English or German, inflectional endings typically occupy the edge positions of words, as either suffixes or prefixes, so that they may also be used as an indicator of word boundaries. Interestingly, in German the inventory of affixes is very restricted concerning its phonological form. Syllabic inflectional endings do not vary with respect to the vowel, since they are all schwa-syllables. The consonants that are contained in inflectional affixes are restricted to only six different ones: /s, r, n, m, g, t/. Non-syllabic inflectional endings have another feature that may support their identification and separation from the word stem. The attachment of non-syllabic inflectional endings can lead to word forms that deviate from the general phonotactic patterns, e. g., to a monosyllabic word with a long vowel followed by a consonant cluster, as in er zählt (he counts). In these cases the inflectional ending is considered to be extrasyllabic (e. g., 118). The extrasyllabicity of the inflectional ending, or, to put it differently, the specific form of this pattern, may be a cue for the recognition of the special status of the segments representing inflectional endings. Furthermore, inflectional endings may also support the identification of higher syntactic units. In many languages, as for instance in German and in Italian, the elements of certain syntactic phrases have to agree with respect to morphosyntactic features like number, case, etc. In the ideal case, this leads to an overt marking of all elements of one phrase with the same inflectional ending, indicating that these words belong together syntactically. As Morgan, Meier and Newport (119) have shown, an inflectional marking of syntactic groups of this sort supports the extraction of the rules of an artificial grammar, at least for adult learners. Comparable results for children are still lacking, but the observed sensitivity to distributional patterns supports the assumption that children are attuned to this kind of information as well.
Additionally, inflectional endings may help the child to determine the grammatical class of words when these endings are specific to a single word class. For example, in German all inflectional endings that involve the consonants /t/ and /g/ are verb-specific, whereas all endings involving /s, r, m/ only attach to elements of the noun phrase. The preceding overview of the characteristics of the elements of the closed class renders plausible the hypothesis that an early sensitivity towards these elements in the input should support the acquisition of language-specific morphosyntactic regularities. In the following we will report additional findings that support the assumption that children are sensitive from very early on to closed-class elements and to the syntactic restrictions which are associated with the use of these elements.
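As a purely illustrative sketch of the anchoring idea summarized in this section (the toy closed-class inventory, the syllable annotation and all names are our own simplifying assumptions), the following Python fragment combines the metrical strategy with function-word anchors: a recognized function word is cut out as a word of its own, and a weak syllable stranded directly after it, like fer in the fertility, is kept attached to the following stressed syllable instead of being glued to the preceding word.

FUNCTION_WORDS = {"the", "a", "of", "his"}   # toy closed-class inventory

def segment_with_anchors(syllables):
    # `syllables` is a list of (syllable, strength) pairs, strength 's' or 'w'.
    # A recognized function word becomes a word of its own; elsewhere a strong
    # syllable opens a new word, except that weak syllables stranded directly
    # after a function word stay attached to the following strong syllable.
    words, current, after_anchor = [], [], False
    for syll, strength in syllables:
        if syll in FUNCTION_WORDS:
            if current:
                words.append(''.join(current))
            words.append(syll)
            current, after_anchor = [], True
        elif strength == 's' and current and not after_anchor:
            words.append(''.join(current))
            current = [syll]
        else:
            current.append(syll)
            if strength == 's':
                after_anchor = False
    if current:
        words.append(''.join(current))
    return words

# 'the fertility of his doctor', roughly syllabified and stress-marked:
utterance = [('the', 'w'), ('fer', 'w'), ('ti', 's'), ('li', 'w'), ('ty', 'w'),
             ('of', 'w'), ('his', 'w'), ('doc', 's'), ('tor', 'w')]
print(segment_with_anchors(utterance))
# -> ['the', 'fertility', 'of', 'his', 'doctor']

In contrast to the purely stress-based sketch in section 2.2.4, the function words are now recovered as independent units and fertility is no longer split before its stressed syllable.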

2.3.4 Sensitivity to Closed-Class Elements in Preverbal Children

According to our hypothesis outlined above, function words and inflectional endings have an important function for the acquisition of syntax from very early on. This hypothesis implies that children are sensitive to these elements, and to the information that can be extracted from them, already in the early phases of language acquisition. One crucial counterargument against this view has been the observation that children productively use closed-class elements much later than open-class elements. Most of the first multi-word utterances that children produce around the age of one and a half to two years consist only of open-class elements, while the closed-class elements are generally missing. This observation has led to the conclusion that closed-class elements are acquired later than open-class elements. In addition to semantic and syntactic differences between the two classes, it has been proposed that phonological differences are responsible for this dissociation. In many languages, function words as well as inflectional endings are usually unstressed in spoken language. Furthermore, most of the function words are monosyllabic, have a simple syllabic structure and often even fail to meet the phonological requirements for a minimal word (112). These characteristics result in a lesser degree of perceptual salience for these elements compared to the open-class elements and therefore in a perceptual disadvantage for the closed-class elements. This perceptual disadvantage may in turn slow down the acquisition of these elements (40).

Closer investigations of children who already produce open-class but not closed-class elements suggest that the observations from spontaneous language production are not a reliable indicator of the children's linguistic competence. For children from 16 months on, it has been shown that they process closed-class elements appearing in their speech input (120-122). According to these results, the omission of closed-class elements in the utterances of the children does not reflect the absence of these elements in the linguistic representations that underlie language processing in very young children. Rather, the omission of closed-class items seems to be due to other reasons, like the already mentioned preference for trochaic patterns, as suggested by Gerken (92; 93). Our assumption that closed-class elements guide early segmentation and classification processes during language acquisition presupposes that children are sensitive to these elements even earlier than has been found in previous studies.

To test this hypothesis we conducted an experiment in order to find out whether children younger than 15 months are able to recognize function words presented to them in continuous speech. For this experiment we used the head turn preference paradigm, which has been used in a wide range of studies concerned with the development of linguistic knowledge during the first year of life. With this method, the attention of children towards an acoustic stimulus can be measured. The child is seated on her mother's lap in a three-sided test booth. In front of the child, a green lamp is fixed on the center wall of the booth. On each of the side walls, a red lamp is fixed at a 90° angle to the child's viewing direction. At the same position as the side lamps, two loudspeakers are mounted on the outside of the booth. One experimental trial goes as follows. At the beginning the green lamp on the center wall starts to blink. When the child orients her view to this lamp, one of the red lamps on the side walls, namely on the side where the next acoustic stimulus will be presented, starts to blink. When the child turns her head to look at the now blinking red lamp, the presentation of the acoustic stimulus is started. The presentation of the acoustic stimulus is stopped when the child turns her head away for more than two seconds. Thus, by her head turns the child can manipulate the duration of the presentation of the single speech stimuli. The dependent variable in this kind of experiment is the duration of the head turn to the side where the stimulus is presented while listening to it. The head turn durations are interpreted as a measure of the child's listening preferences.

In most experiments in which this paradigm was used, children showed longer head turn durations while listening to acoustic stimuli that contained features the children were already familiar with. For instance, longer head turn durations were observed when stimuli of the child's mother tongue were presented compared to stimuli of a foreign language (104). The method can also be applied to investigate the child's preferences within her mother tongue. For instance, longer head turn durations were observed for more frequent compared to less frequent structures within one and the same language (86). It should be mentioned that, contrary to this trend, some more recent studies found longer head turn durations while presenting 'new' material (90; 108). It is not yet clear which variables are responsible for the differences in the direction of the observed preferences. But the direction of the preference is only of secondary importance, because in most cases the crucial question is whether children can discriminate speech stimuli which differ with respect to specific features. Thus, independently of the direction of the preference, the method yields insights into the children's actual linguistic knowledge as well as into the children's processing capacities for spoken language. By combining a familiarization and a testing phase within one single experiment, the head turn preference paradigm can also be used to investigate the question of what kinds of patterns or regularities children pick up from speech sequences. Using this method, Jusczyk and Aslin (123) showed that children from seven to eight months of age are able to recognize words in continuous speech that were presented to them in isolation during the learning phase. The critical words were four different monosyllabic nouns.
For each word a short text was constructed in which it appeared once in each sentence. In the familiarization phase, two of the words were presented to the children repeatedly until the child had listened to each word for a predetermined amount of time. During the immediately following testing phase, all four text passages were presented several times to the children. Jusczyk and Aslin found that the children had longer head turn durations while listening to a text that contained the previously presented words than to the texts that did not contain the previously presented words. Since all children heard the same four texts but were exposed to different words during the familiarization phase, this result cannot be caused by a general preference for some of the texts but has to be the consequence of the prior exposure to the critical words. This result suggests that at the age of seven to eight months children have at least some capacity to recognize open-class words in continuous speech.

Using the same method, we examined the ability of children from seven to 15 months to recognize closed-class elements in continuous speech. As target words four different closed-class elements were chosen, namely two determiners, sein (his) and das (the), and two prepositions, bis (up to) and von (from). For each target word a text of six sentences, each containing the critical word, was constructed. Across the sentences the target words appeared in different sentence positions. In the familiarization phase of the experiment, two words out of this set were presented to the children. After the child had listened for at least 30 seconds to each of these words, the four texts were presented four times in different orders. We tested 34 children from seven to 15 months of age. For the texts containing the target words we found average head turn durations of 8.33 sec, while the children showed significantly shorter head turn durations of 7.24 sec on average for the texts without the test items. This difference appeared for 28 of the children tested. To find out whether there was a developmental change with respect to the reactions to the target words, we compared the results of the younger half of our sample with the results of the older half. As Figure 2.1 shows, no such tendency appeared. For both age groups, the listening times for text passages containing the previously presented items were significantly longer than the listening times for the text passages without them.

Our results suggest that children can recognize functional elements in spoken language already in the second half of their first year of life. However, on the basis of our experiment, we cannot decide whether we tapped already existing lexical representations or whether the longer head turn durations for the texts with the previously presented items only reflect the recognition of items for which an internal representation was built up during the familiarization phase of the experiment. Another finding in our study raises the possibility that pre-experimental knowledge about the target words might influence the reactions in the testing phase. Among the children in the younger age group, the largest difference in the head turn duration for a text passage with a familiarized item as compared to the same text without familiarization appeared for the text containing the article das. This word form had the highest word frequency of the target words used in our study. According to our hypothesis, word frequency is one of the parameters that determine when in development a lexical representation for a certain word form is built up.
But since the influence of lexical parameters was not systematically varied in our study, further research is needed to evaluate this possibility systematically. A central result of the present study is that even with children as young as seven to nine months we could not find any evidence for the assumption that closed-class items are harder to perceive than open-class items (for more details of the study, see 124).

Fig. 2.1: Mean head turn duration for familiar and unfamiliar items, shown separately for 7-9-month-olds and 10-15-month-olds.

2.3.5 Sensitivity to Syntactic Restrictions Associated with Closed-Class Elements

If, as we have shown, children can identify closed-class elements already below the age of ten months, they may also detect the syntactic regularities that are associated with the use of closed-class elements from early on, even before they produce these elements regularly. This assumption is supported, for example, by the observation that from the start German children almost always place the finite verb correctly in sentence-final position in embedded clauses introduced by a complementizer, as shown in (1):

(1) Bert sagt, daß Lisa Oma hilft (Bert says that Lisa grandmother helps)

Embedded clauses introduced by the complementizer daß (that) arise in the spontaneous utterances of children around 3;00 to 4;00 years of age (e.g., 125; 126). Since complementizerless embedded clauses like (2),

(2) Bert sagt, Lisa hilft Oma (Bert says Lisa helps grandmother)

in which the finite verb must occupy the second position, also exist in German, the children must have found out about the relationship between the presence of a complementizer, i.e. a closed-class element, and the position of the finite verb before they start to produce sentences like (1) and (2).


We tested this hypothesis with a sentence imitation task with children ranging from 2;06 to 6;00 years of age. That is, the youngest children were below the age at which the complementizer daß is used spontaneously. The imitation task was chosen because it has been shown that the younger the children are, the less often they reproduce ungrammatical stimuli in a parrot-like, i.e. literal, way. Instead, they change them in a way which, we assume, reflects their actual grammatical knowledge (for a discussion of this experimental procedure see, e.g., 127). The material for the imitation task consisted of 32 pairs of sentences as shown in (1) and (2). For each of these sentences an ungrammatical sentence was constructed that differed from the grammatical one only in the position of the verb:

(3) Bert sagt, daß Lisa Oma hilft > *Bert sagt, daß Lisa hilft Oma

(4) Bert sagt, Lisa hilft Oma > *Bert sagt, Lisa Oma hilft

According to our hypothesis we predicted that even the youngest children in our study should be sensitive to the ungrammatical sentences, thereby showing that their grammatical knowledge was more adult-like than could have been assumed on the basis of their productive linguistic behavior. More specifically, we expected that the responses of the children to the ungrammatical sentences should reveal a special sensitivity to the presence or absence of the complementizer, a reaction which would support the assumption about the crucial role closed-class elements play in the construction of children's grammatical knowledge. The results largely supported our hypotheses. Even the youngest children of our study, i.e. the two-and-a-half to four year olds, were sensitive to the grammaticality of the stimulus sentences. This was shown by the finding that grammatical sentences were repeated literally in 59% of the cases, whereas ungrammatical sentences were repeated literally in only 40% of the cases. A further analysis of the non-literal responses to ungrammatical sentences revealed, first, that most of them were in fact corrections (82%), and second, that most of these corrections involved the complementizer (see Fig. 2.2). That is, the ungrammatical sentences without a complementizer were mainly corrected by inserting the complementizer, while the ungrammatical sentences with a complementizer were mainly corrected by deleting the complementizer. These findings support our assumption that children have adult-like grammatical knowledge in this syntactic domain even if they do not yet display it in their spontaneous speech, and furthermore that children are indeed especially sensitive to the complementizer. This sensitivity becomes even more apparent in view of the fact that for both ungrammatical sentence types two ways of correction were available. In each case the correction could either involve the complementizer or a change of the position of the finite verb, as shown in (5):

(5) (i) Ungrammatical test sentence: *Bert sagt, daß Lisa hilft Oma (Bert says that Lisa helps grandmother)

Possible corrections:
a. Complementizer deletion: Bert sagt, Lisa hilft Oma (Bert says Lisa helps grandmother)
b. Movement of the verb into the final position: Bert sagt, daß Lisa Oma hilft (Bert says that Lisa grandmother helps)
(ii) Ungrammatical test sentence: *Bert sagt, Lisa Oma hilft (Bert says Lisa grandmother helps)
Possible corrections:
a. Complementizer insertion: Bert sagt, daß Lisa Oma hilft (Bert says that Lisa grandmother helps)
b. Movement of the final verb into the second position: Bert sagt, Lisa hilft Oma (Bert says Lisa helps grandmother)

Fig. 2.2: Types of corrections for ungrammatical sentences in 2;06-3;00 and 4;00 year old children (correction types: deletion of complementizer, insertion of complementizer, verb movement, adverb deletion, object deletion).

It is worth noting that adults did not show the same correction behavior as the children. Although adults, like children, inserted a complementizer when the ungrammatical sentence did not contain one, in contrast to the children they changed the position of the finite verb when the ungrammatical stimulus sentence already contained the complementizer. In a further experiment we wanted to show that even younger children are sensitive to the specific complementizer-related word-order constraints in German embedded clauses. This experiment is still under way. We used the same method as in our study on the sensitivity to closed-class elements, namely the head-turn preference paradigm. As Santelmann and Jusczyk (128) have shown, this paradigm is also useful for investigating children's sensitivity to grammatical violations, since they found longer listening times to grammatical sentences as compared to ungrammatical sentences for children at the age of 18 months. In our study the same sentence types are used as in the imitation task.


In order to make sure that a differentiation between the grammatical and the ungrammatical sentences was not possible solely on the basis of prosodic information, we ran a pretest with adults. For this pretest the test material was low-pass filtered. The filtered stimuli were presented to the subjects in pairs consisting of an originally grammatical and the corresponding ungrammatical sentence. The task was to decide which one of a pair of stimuli sounded more normal. For the pairs containing a complementizer, the grammatical sentence was chosen 55% of the time; for the pairs without a complementizer, the grammatical sentence was chosen 48.3% of the time. This pattern of decisions did not differ significantly from the 50% chance level. This means that the adult speakers did not detect any grammaticality-related prosodic information that influenced their choice in favor of the prosodic patterns of the grammatical test sentences. Seventeen children with a mean age of about 20 months have been tested so far. The results we present in the following are preliminary in the sense that the number of subjects is still too small to consider our findings final. Up to now we have only found a small but not significant preference in the 20-month-olds for the grammatical sentences. This could mean that children at this age do not yet detect any structural difference between the test items. However, a closer analysis of the data showed that the children had a significant preference for sentences without complementizers over sentences with complementizers (see Fig. 2.3).


Fig. 2.3: Mean head turn duration for grammatical and ungrammatical sentences with and without complementizers.

If these results turn out to be robust, they would show that children of about 20 months of age are sensitive to the presence or absence of the complementizer, i.e. that they have identified it, and that the sentences with and without complementizers must be different for them at some formal level of representation.


That is, assuming further that the children have identified the verb, which should not be controversial given their productive use of verbs around 20 months, they can now engage in the distributional learning process which should lead to the recognition of the complementary distribution of the complementizer and the finite verb in embedded clauses. In this section we have presented evidence from sentence repetition tasks that children between the ages of 2;06 and 4;00 show knowledge about the dependency between the presence of a complementizer and the position of the finite verb in embedded clauses earlier than their production data suggest. Furthermore, our results support the predicted early sensitivity to functional elements, e.g., daß, the initial absence of which from the child's productions consequently cannot be due to the fact that the child does not perceive this element in the input or to a lack of grammatical knowledge (for a more detailed presentation of the preceding study see 24).
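The adult pretest described above relied on low-pass filtering the test sentences so that segmental detail is largely removed while the prosodic pattern (rhythm and intonation) remains audible. A minimal sketch of such a filtering step is given below, assuming the scipy library, a mono WAV recording at a hypothetical path, and an illustrative cutoff of 400 Hz; the chapter does not report the filter parameters that were actually used.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def lowpass_for_prosody(path, cutoff_hz=400.0, order=4):
    """Low-pass filter a mono speech recording, keeping mainly prosodic information."""
    sr, samples = wavfile.read(path)          # sr = sampling rate in Hz
    samples = samples.astype(np.float64)
    # Butterworth low-pass, applied forwards and backwards (zero-phase filtering).
    sos = butter(order, cutoff_hz / (sr / 2.0), btype="low", output="sos")
    return sr, sosfiltfilt(sos, samples)

# Hypothetical usage: filter one test sentence before presenting it to adult raters.
# sr, filtered = lowpass_for_prosody("stimuli/bert_sagt_dass.wav")  # path is illustrative
```

Because only the low-frequency components survive, listeners can base their judgments on rhythm and intonation alone, which is exactly what the pretest was designed to isolate.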

2.4 Conclusions

The preceding discussion of children's language processing capacities during early language development has shown that already the very young child has perceptual and computational (e.g., attention and memory) abilities which allow her to identify language-specific properties of the linguistic input. Apparently these abilities undergo a developmental change during the first year of life. During the first six months we can observe that the child becomes increasingly sensitive to the prosodic properties of the target language. Insofar as these are unambiguously isomorphic with syntactic structures, this could signify that the first syntactic regularities are built up simultaneously with prosodic ones. But, as we have seen, prosodic information alone is obviously not sufficient to bootstrap into grammar, given the existence of mismatches between prosodic and morphosyntactic structures. It has to be supplemented by phonotactic and morphosyntactic information. It is during the second half of the first year of life that this kind of information becomes accessible to the child. The identification of linguistically relevant units like syllables, which now constitute additional input to the distributional learning mechanisms of the child, greatly increases the child's possibilities to extract grammatical regularities from the speech signal, i.e. to discover the rules of her mother tongue. The decreasing importance of prosodic information for the processing of the input becomes especially apparent in our finding that children younger than 10 months are able to identify unstressed closed-class elements in the input. This early capacity of the child has far-reaching learning-theoretical consequences. As we have pointed out, a subset of the closed-class elements, the functional categories, play a central role in linguistic theory since their properties are supposed to determine the characteristics of the grammar of a given language. Thus the earlier the child gets access to the information encoded in functional elements, the sooner language-specific grammatical knowledge can be built up. That this must be the case, even before the child starts to use certain functional elements spontaneously, follows from the almost errorless acquisition in specific syntactic domains.


This assumption was supported by the findings of our study on the sensitivity of children to word order violations in embedded clauses, which showed that, as predicted, the complementizer daß (that) must be visible to the child long before she uses it. It should also be pointed out that this very early sensitivity to functional elements revealed by our studies is to be expected under a parameter-setting approach to language acquisition, in which these elements play a crucial role as potential trigger information. Furthermore, the findings from the study of early language acquisition lend plausibility to the assumptions of linguistic theory about the role of functional elements for the specific form the grammar of a given language can take. The view of language acquisition we have presented so far obviously needs further investigation, especially in the domain of the acquisition of the lexicon, which has not been addressed at all in this paper. It seems reasonable to assume that at least some formal elements must have been identified in the input before a form-meaning mapping can take place. In this sense the acquisition phenomena we have focused on in our discussion constitute a prerequisite to the acquisition of the lexicon. To conclude, we want to briefly point out possible consequences of our current understanding of first language acquisition for our view of developmental language disorders and untutored second language acquisition. If, as we have seen, prosodic and morphosyntactic bootstrapping capacities are crucial for initiating the child's discovery of grammar, it is tempting to assume that the differences between first language acquisition on the one hand and language acquisition in developmental language disorders as well as in second language learning on the other may be related to differences in the bootstrapping capacities of these classes of learners. With respect to developmental language disorders, i.e. Specific Language Impairment (SLI), recent work by Penner (129) and Fikkert, Penner, and Wymann (130) suggests that the linguistic deficits of SLI children may stem from reduced bootstrapping capacities for crucial prosodic and morphosyntactic information in the speech signal. With respect to untutored second language learning, we may want to say that the learning process of the second language learner, contrary to that of the child, is initially 'conceptually' rather than 'formally' driven, whereas in first language acquisition the child initially focuses on formal features of the input to extract regularities, a fact which explains why in children learning different languages we do not observe a stage of shared linguistic representations. If we assume that the level of conceptual representations is basically universal, this would explain why all second language learners end up at some point in development with a largely similar grammar, which Klein and Perdue (131) have called the 'Basic Variety'.

Acknowledgments

We want to thank the parents, the children and the teachers of the various schools which participated in our studies. Special thanks go to Dorothea Kiefer and Katja Kühn for their help in collecting the data, to Damir Cavar, Tom Roeper and Zvi Penner for discussion, and to Susan Powers, Anja Ischebeck and Michaela Schmitz for discussion and various help with the manuscript. The research for this study was partially supported by a grant to Jürgen Weissenborn from the


Berlin-Brandenburg Academy of Science in the framework of the working group "Rule learning and rule knowledge in biological systems" and by the Deutsche Forschungsgemeinschaft (DFG) in the framework of the Innovationskolleg "Formal models of cognitive complexity".

References

1. Pinker, S. (1995). The Language Instinct (New York: Morrow).
2. Landau, B. and Gleitman, L. (1985). Language and Experience: Evidence from the Blind Child (Cambridge, Mass.: MIT Press).
3. Andersen, E., Dunlea, A., and Kekelis, L. (1984). Blind children's language: Resolving some differences. J. Child Lang. 11, 645-664.
4. Newport, E. and Meier, R. (1984). The acquisition of American sign language. In: The Cross-Linguistic Study of Language Acquisition, Vol. 1, D. Slobin, ed. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 881-938.
5. Schaner-Wolles, C. (1994). Intermodular synchronization: On the role of morphology in the normal and impaired acquisition of a verb-second language. In: How Tolerant is Universal Grammar? Essays on Language Learnability and Language Variation, R. Tracy and E. Lattey, eds. (Tübingen: Niemeyer), pp. 205-224.
6. Cromer, R. (1992). A case study of dissociations between language and cognition. In: Constraints on Language Acquisition. Studies of Atypical Children, H. Tager-Flusberg, ed. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 141-153.
7. Rondal, J. (1995). Exceptional Language Development in Down Syndrome (Cambridge: Cambridge University Press).
8. Smith, N. and Tsimpli, I.-M. (1995). The Mind of a Savant. Language Learning and Modularity (Oxford: Blackwell).
9. Karmiloff-Smith, A. (1992). Beyond Modularity: A Developmental Perspective on Cognitive Science (Cambridge, Mass.: MIT Press).
10. Slobin, D. (1986). Crosslinguistic evidence for the language-making capacity. In: The Cross-Linguistic Study of Language Acquisition, Vol. 2, D. Slobin, ed. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 1157-1256.
11. Lenneberg, E. (1967). Biological Foundations of Language (New York: Wiley).
12. Marler, P. (1991). The instinct to learn. In: The Epigenesis of Mind: Essays on Biology and Cognition, S. Carey and R. Gelman, eds. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 37-66.
13. Hubel, D. and Wiesel, T. (1963). Receptive fields of cells in striate cortex in very young, visually inexperienced kittens. Journal of Neurophysiology 26, 994-1002.
14. Goodman, R. and Whitaker, H. (1985). Hemispherectomy: A review with special reference to the linguistic abilities and disabilities of the residual right hemisphere. In: Hemispheric Function and Collaboration in the Child, C. Best, ed. (Orlando: Academic Press), pp. 121-155.
15. Curtiss, S. (1977). Genie: A Psycholinguistic Study of a Modern-day Wild Child (New York: Academic Press).
16. Obler, L. (1985). Language through the life-span. In: The Development of Language, J. Berko Gleason, ed. (Columbus: Merrill), pp. 277-305.
17. Newport, E. (1990). Maturational constraints on language learning. Cognitive Science 14, 11-28.
18. Johnson, M., ed. (1993). Brain Development and Cognition. A Reader (Oxford: Blackwell).


19. Chomsky, Ν. (1959). Review: Verbal behavior, by Skinner, Β. F., Language 35, 25-58. 20. Fodor, J. (1983). The Modularity of Mind (Cambridge, Mass.: MIT Press). 21. Penner, Ζ. and Weissenborn, J. (1996). Strong continuity, parameter setting and the trigger hierarchy. On the acquisition of the DP in Bernese Swiss German and High German. In: Generative Perspectives on Language Acquisition: Empirical Findings, Theoretical Considerations, Crosslinguistic Comparisons, H. Clahsen, ed. (Amsterdam: John Benjamins), pp. 161 — 200.

22. Schönenberger, Μ., Penner, Ζ., and Weissenborn, J. (1997). Object placement and early German grammar. In: Proceedings of the 21st Annual Boston University Conference on Language Development, Vol. 2, E. Hughes, M. Hughes and A. Greenhill, eds. (Somerville, Mass.: Cascadilla Press), pp. 539-549. 23. Weissenborn, J. (1994). Constraining the child's grammar: Local wellformedness in the development of verb movement in German and French. In: Syntactic Theory and Language Acquisition: Crosslinguistic Pespectives, Vol. 1: Phrase Structure, B. Lust, M. Suner, and J. Whitman eds. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 215-247. 24. Weissenborn, J., Höhle, Β., Kiefer, D., and Cavar, D. (1998). Children's sensitivity to word-order violations in German: Evidence for very early parameter-setting. In: Proceedings of the 22nd Annual Boston Conference on Language Development, Vol. 2, A. Greenhill, M. Hughes, H. Littlefield, and H. Walsh, eds. (Somerville, Mass.: Cascadilla Press), pp. 756-767. 25. Meisel, J. (1986). Word order and case marking in early child language. Evidence from simultaneous acquisition of two first languages: French and German. Linguistics 24, 123—183.

26. Meisel, J., ed. (1990). Two First Languages. Early Grammatical Development in Bilingual Children (Dordrecht: Foris). 27. Tracy, R. (1995). Child languages in contact. Bilingual language acquisition (English/German) in early childhood, ms. University of Tübingen. 28. Pinker, S. (1984). Language Learnability and Language Development (Cambridge, Mass.: Harvard University Press). 29. Culicover, P. and Wilkins, W. (1984). Locality in Linguistic Theory (Orlando: Academic Press). 30. Fodor, J. and Crain, S. (1987). Simplicity and generality of rules in language acquisition. In: Mechanisms of Language Acquisition, B. MacWhinney, ed. (Hillsdale, N. J.: Lawrence Erlbaum), pp. 35—63. 31. Chomsky, N. (1965). Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press). 32. Chomsky, N. (1986) Knowledge of Language: Its Nature, Origin, and Use (New York: Praeger). 33. Roeper, T. and Weissenborn, J. (1990). How to make parameters work. In: Language Processing and Language Acquisition, L. Frazier and J. de Villiers, eds. (Dordrecht: Kluwer Academic Publishers), pp. 147-162. 34. Borer, H. and Wexler, K. (1987). The maturation of syntax. In: Parameter Setting, T. Roeper and E. Williams, eds. (Dordecht: Reidel), pp 123-177. 35. Braine, M. (1976). Children's First Word Combinations. In: Monographs of the Society of Research in Child Development 41. 36. Felix, S. (1984). Maturational aspects of Universal Grammar. In: Interlanguage, A. Davis, C. Criper, and A. Howatt, eds. (Edinburgh: Edinburgh University Press), pp. 131-161. 37. MacNamara, J. (1982). Names for Things: A Study of Human Learning (Cambridge, Mass.: MIT Press). 38. Hyams, N. (1986). Language acquisition and the theory of parameters (Dordrecht: Reidel).

39. Hulk, A. and van der Linden, E. (1996). Language mixing in a French-Dutch bilingual child. In: TTiA 55,2. A selection of papers, Eurosla VI, 89-103.
40. Gleitman, L. R. and Wanner, E. (1982). Language acquisition: the state of the state of the art. In: Language Acquisition: The State of the Art, E. Wanner and L. R. Gleitman, eds. (Cambridge: Cambridge University Press), pp. 3-48.
41. Goldman-Eisler, F. (1972). Pauses, clauses, sentences. Language and Speech 15, 103-113.
42. Garro, L. and Parker, F. (1982). Some suprasegmental characteristics of relative clauses in English. Journal of Phonetics 10, 149-161.
43. Grosjean, F., Grosjean, L., and Lane, H. (1979). The patterns of silence: Performance structures in sentence production. Cogn. Psychol. 11, 58-81.
44. Ferreira, F. (1993). Creation of prosody during sentence production. Psychol. Rev. 100, 233-253.
45. Klatt, D. H. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3, 129-140.
46. Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., and Fong, C. (1991). The use of prosody in syntactic disambiguation. J. Acoust. Soc. Am. 90, 2956-2970.
47. Scott, D. (1982). Duration as a cue to the perception of a phrase boundary. J. Acoust. Soc. Am. 71, 996-1007.
48. Warren, P., Grabe, E., and Nolan, F. (1995). Prosody, phonology and parsing in closure ambiguities. Language and Cognitive Processes 10, 457-486.
49. Cooper, W. E. and Sorensen, J. M. (1977). Fundamental frequency contours at syntactic boundaries. J. Acoust. Soc. Am. 62, 683-692.
50. Kutik, E. J., Cooper, W. E., and Boyce, S. (1983). Declination of fundamental frequency in speakers' production of parenthetical and main clauses. J. Acoust. Soc. Am. 73, 1731-1738.
51. Lehiste, I., Olive, J. P., and Streeter, L. A. (1976). Role of duration in disambiguating syntactically ambiguous sentences. J. Acoust. Soc. Am. 60, 1199-1202.
52. Marslen-Wilson, W. D., Tyler, L. K., Warren, P., Grenier, P., and Lee, C. S. (1992). Prosodic effects in minimal attachment. Quart. J. Exp. Psychol. 45, 73-87.
53. Pynte, J. and Prieur, B. (1996). Prosodic breaks and attachment decisions in sentence parsing. Language and Cognitive Processes 11, 165-191.
54. Streeter, L. A. (1978). Acoustic determinants of phrase boundary perception. J. Acoust. Soc. Am. 64, 1582-1592.
55. Spencer, A. (1996). Phonology (Cambridge, Mass.: Blackwell).
56. Lehiste, I. (1973). Rhythmic units and syntactic units in production and perception. J. Acoust. Soc. Am. 54, 1228-1234.
57. Cutler, A. (1994). Segmentation problems, rhythmic solutions. Lingua 92, 81-104.
58. Cutler, A. and Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133-142.
59. Cutler, A. and Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. J. Mem. Lang. 31, 218-236.
60. Cutler, A. and Norris, D. (1988). The role of strong syllables in segmentation for lexical access. J. Exp. Psychol.: Hum. Percep. Perform. 14, 113-121.
61. Mazuka, R. (1996). Can a grammatical parameter be set before the first word? Prosodic contributions to early setting of a grammatical parameter. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 313-330.
62. Nespor, M., Guasti, M. T., and Christophe, A. (1996). Selection word order: The rhythmic activation principle. In: Interfaces in Phonology, U. Kleinhenz, ed. (Berlin: Akademie Verlag), pp. 1-26.

63. Lieberman, P., Katz, W., Jongman, Α., Zimmerman, R., and Miller, M. (1985). Measures of the sentence intonation of read and spontaneous speech in American English. J. Acoust. Soc. Am. 77, 649-657. 64. Fernald, A. and Simon, T. (1984). Expanded intonation contours in mother's speech to newborns. Dev. Psychol. 20, 104-113. 65. Fisher, C. and Tokura, H. (1996). Prosody in speech to infants: Direct and indirect acoustic cues to syntactic structure. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 343 — 363. 66. Ratner, Ν. B. (1986). Durational cues which mark clause boundaries in mother-child speech. Journal of Phonetics 12, 285-289. 67. Ratner, Ν. B. (1984). Patterns of vowel modification in mother-child speech. J. Child Lang. 11, 557-578. 68. Cooper, R. P. and Aslin, R. N. (1990). Preference for infant-directed speech in the first month after birth. Child Dev. 61, 1584-1595. 69. Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8, 181-195. 70. Werker, J. F. and McLeod, P. J. (1989). Infant preference for both male and female infant-directed-talk: A developmental study of attentional and affective responsiveness. Can. J. Psychol. 43, 230-246. 71. Fernald, Α., Taeschner, T., Dunn, J., Papousek, M., De Boysson-Baries, B., and Fukui, I. (1989). A cross-language study of prosodic modifications in mothers' and father's speech to perverbal infants. J. Child Lang. 16, 4 7 7 501. 72. Ochs, E. and SchiefTelin, B., eds. (1979). Developmental Pragmatics (New York).

73. Mehler, J., Jusczyk, P. W., Lambertz, G., Halsted, G., Bertoncini, J., and Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition 29, 143-178. 74. Lieberman, P. (1996). Some biological constraints on the analysis of prosody. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 55—65. 75. Moon, C., Panneton-Cooper, R., and Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant Behavior and Development 16, 495 — 500. 76. Jusczyk, P. W. (1997). The Discovery of Spoken Language (Cambridge, Mass.: MIT Press). 77. Jusczyk, P. W. and Thompson, E. (1978). Perception of a phonetic contrast in multisyllabic utterances by 2month-old-infants. Perception and Psychophysics 23, 105-109. 78. Sansavini, Α., Bertoncini, J., and Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Dev. Psychol. 33, 3-11. 79. Spring, D. R. and Dale, P. S. (1977). Discrimination of linguistic stress in early infancy. Journal of Speech and Hearing Research 20, 224—232. 80. Christophe, Α., Dupoux, Ε., Bertoncini, J., and Mehler, J. (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. J. Acoust. Soc. Am. 95, 1570-1580. 81. Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Wright Cassidy, K., Druss, B„ and Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition 26, 269—286. 82. Gerken, L. Α., Jusczyk, P. W., and Mandel, D. R. (1994). When prosody fails to cue syntactic structure: Ninemonth-olds' sensitivity to phonological versus syntactic phrases. Cognition 51, 237-265. 83. Kemler Nelson, D. G., Hirsh-Pasek, K „ Jusczyk, P. W., and Wright Cassidy, K. (1989). How prosodic cues in

motherese might assist language learning. J. Child Lang. 16, 55-68.
84. Jusczyk, P. W., Hirsh-Pasek, K., Kemler Nelson, D. G., Kennedy, L., Woodward, A., and Piwoz, J. (1992). Perception of acoustic correlates of major phrasal units by young infants. Cogn. Psychol. 24, 252-293.
85. Venditti, J. J., Sun-Ah, J., and Beckman, M. (1996). Prosodic cues to syntactic and other linguistic structures in Japanese, Korean and English. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 287-311.
86. Jusczyk, P. W., Cutler, A., and Redanz, N. (1993). Preference for the predominant stress patterns of English words. Child Dev. 64, 675-687.
87. Mandel, D., Jusczyk, P. W., and Kemler Nelson, D. G. (1994). Does sentential prosody help infants to organize and remember speech information. Cognition 53, 155-180.
88. Morgan, J. L. (1997). A rhythmic bias in preverbal speech segmentation. J. Mem. Lang. 35, 666-688.
89. Morgan, J. L. and Saffran, J. R. (1995). Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Dev. 66, 911-936.
90. Echols, C. H., Crowhurst, M. J., and Childers, J. B. (1997). The perception of rhythmic units in speech by infants and adults. J. Mem. Lang. 36, 202-225.
91. Newsome, M. and Jusczyk, P. W. (1995). Do infants use stress as a cue for segmenting fluent speech? In: Proceedings of the 19th Annual Boston University Conference on Language Development, D. MacLaughlin and S. McEwen, eds. (Somerville, Mass.: Cascadilla Press).
92. Gerken, L. A. (1991). The metrical basis for children's subjectless sentences. J. Mem. Lang. 30, 431-451.
93. Gerken, L. A. (1994). A metrical template account of children's weak syllable omissions from multisyllabic words. J. Child Lang. 21, 565-584.
94. Gerken, L. A. (1994). Young children's representation of prosodic phonology: Evidence from English-speakers' weak syllable productions. J. Mem. Lang. 33, 19-38.
95. Jackendoff, R. (1997). The Architecture of the Language Faculty (Cambridge, Mass.: MIT Press).
96. Marcus, M. and Hindle, D. (1990). Description theory and intonation boundaries. In: Cognitive Models of Speech Processing, G. T. M. Altmann, ed. (Cambridge, Mass.: MIT Press), pp. 483-512.
97. Selkirk, E. O. (1984). Phonology and Syntax: The Relation between Sound and Structure (Cambridge, Mass.: MIT Press).
98. Selkirk, E. O. (1996). The prosodic structure of function words. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 187-213.
99. Fernald, A. and McRoberts, G. (1996). Prosodic bootstrapping: A critical analysis of the argument and the evidence. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 365-388.
100. Boomer, D. S. and Dittmann, A. T. (1962). Hesitation pauses and juncture pauses in speech. Language and Speech 5, 215-220.
101. Hawkins, P. R. (1971). The syntactic location of hesitation pauses. Language and Speech 14, 277-288.
102. Henderson, A., Goldman-Eisler, F., and Skarbek, A. (1966). Sequential temporal patterns in spontaneous speech. Language and Speech 9, 207-216.
103. Harris, Z. S. (1951). Methods in Structural Linguistics (Chicago: University of Chicago Press).
104. Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V. Y., and Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. J. Mem. Lang. 32, 402-420.

105. Friederici, A. D. and Wessels, J. M. I. (1993). Phonotactic knowledge of word boundaries and its use in infant speech perception. Perception and Psychophysics 54, 287-295.
106. Brent, M. R. and Cartwright, T. A. (1995). Distributional regularity and phonotactics are useful for early lexical acquisition. Cognition 61, 93-125.
107. Jusczyk, P. W., Luce, P. A., and Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. J. Mem. Lang. 33, 630-645.
108. Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996). Statistical Learning by 8-Month-Old Infants. Science 274, 1926-1928.
109. Valian, V. and Coulson, A. S. (1988). Anchor points in language learning: The role of marker frequency. J. Mem. Lang. 27, 71-86.
110. Gerken, L. A. (1996). Phonological and distributional information in syntax acquisition. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 411-425.
111. Kucera, H. and Francis, N. (1967). A Computational Analysis of Present Day English (Providence: Brown University Press).
112. Morgan, J. L., Shi, R., and Allopenna, P. (1996). Perceptual bases of rudimentary grammatical categories: Toward a broader conceptualization of bootstrapping. In: Signal to Syntax, J. L. Morgan and K. Demuth, eds. (Mahwah: Lawrence Erlbaum), pp. 263-283.
113. Garrett, M. F. (1980). Levels of processing in sentence production. In: Language Production, Vol. 1. Speech and Talk, B. Butterworth, ed. (London: Academic Press), pp. 177-220.
114. Bradley, D. C., Garrett, M. F., and Zurif, E. B. (1980). Syntactic deficits in Broca's aphasia. In: Biological Studies of Mental Processes, D. Caplan, ed. (Cambridge, Mass.: MIT Press).
115. Gordon, B. and Caramazza, A. (1985). Lexical access and frequency sensitivity: Frequency saturation and open/closed class equivalence. Cognition 21, 95-115.
116. Maratsos, M. (1988). The acquisition of formal word classes. In: Categories and Processes in Language Acquisition, Y. Levy, I. M. Schlesinger and M. D. S. Braine, eds. (Hillsdale, N. J.: Erlbaum), pp. 31-44.
117. Maratsos, M. and Chalkley, M. A. (1983). The internal language of children's syntax: The ontogenesis and representation of syntactic categories. In: Children's Language, Vol. 2, K. E. Nelson, ed. (New York: Gardner Press), pp. 127-214.
118. Wiese, R. (1996). The Phonology of German (Oxford: Clarendon Press).
119. Morgan, J. L., Meier, R. P., and Newport, E. L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cogn. Psychol. 19, 498-550.
120. Gerken, L. A. and McIntosh, B. J. (1993). Interplay of function morphemes and prosody in early language. Dev. Psychol. 29, 448-457.
121. Katz, N., Baker, E., and MacNamara, J. (1974). What's in a name? A study of how children learn common and proper names. Child Dev. 45, 469-473.
122. Shipley, E. F., Smith, C. S., and Gleitman, L. R. (1969). A study in the acquisition of language: Free responses to commands. Language 45, 322-342.
123. Jusczyk, P. W. and Aslin, R. N. (1995). Infants' detection of the sound patterns of words in fluent speech. Cogn. Psychol. 29, 1-23.
124. Höhle, B. and Weissenborn, J. (1998). Sensitivity to closed-class elements in preverbal children. In: Proceedings of the 22nd Annual Boston Conference on Language Development, Vol. 1, A. Greenhill, M. Hughes, H. Littlefield, and H. Walsh, eds. (Somerville, Mass.: Cascadilla Press), pp. 348-359.

125. Rothweiler, M. (1993). Der Erwerb von Nebensätzen im Deutschen (Tübingen: Niemeyer).
126. Müller, N. and Penner, Z. (1996). Early subordination: the acquisition of free morphology in French, German, and Swiss German. Linguistics 34, 133-165.
127. Lust, B., Flynn, S., and Foley, C. (1996). What children know about what they say: Elicited imitation as a research method for assessing children's syntax. In: Methods for Assessing Children's Syntax, D. McDaniel, C. McKee, and H. Smith Cairns, eds. (Cambridge, Mass.: MIT Press), pp. 55-76.
128. Santelmann, L. and Jusczyk, P. W. (1997). What discontinuous dependencies reveal about the size of the learner's processing window. In: Proceedings of the 21st Annual Boston University Conference on Language Development, Vol. 2, E. Hughes, M. Hughes and A. Greenhill, eds. (Somerville, Mass.: Cascadilla Press), pp. 506-514.
129. Penner, Z. (1995). Störungen im Erwerb der Nominalphrase im Schweizerdeutschen (Luzern: Edition SZH).
130. Fikkert, P., Penner, Z., and Wymann, K. (1998). Das Comeback der Prosodie. Neue Wege in der Diagnose und Therapie von phonologischen Störungen. Ms. Universität Konstanz.
131. Klein, W. and Perdue, C. (1997). Basic variety (or: Couldn't natural languages be much easier?). Second Language Research 13, 301-347.

3. Rule-Application During Language Comprehension in the Adult and the Child

Anja Hahne and Angela D. Friederici

3.1 Introduction

The capacity to produce and comprehend language is one of the most remarkable capabilities of humans. However, due to the high processing speed we are not able to directly experience the particular details of how this is actually done. Still, a brief conscious reflection suffices to reveal that the comprehension of sentences must entail a variety of subprocesses carried out in an extremely limited amount of time. On average, auditory speech consists of about three words or about 15 phonemes per second, with the upper limit being much higher (1). Thus, there are about three seconds to process a sentence consisting of ten words. In order to comprehend such an utterance our brain has to isolate words within the continuous speech signal. Contrary to our intuition, spoken language does not provide pauses between single words to signal their boundaries. Furthermore, the acoustic realization of the speech sounds varies greatly as a function of several parameters, such as speech rate, loudness, surrounding phonemes or the speaker's gender. The process of segmenting words within a continuous speech stream is still a topic of debate (cf. 2). Besides segmenting individual words, the meaning of the words has to be activated and the underlying structural relations of the words have to be analyzed. The intonation and stress pattern and the speaker's tone of voice (humor or sarcasm) also have to be examined. Finally, a representation of the whole sentence has to be computed and integrated into the discourse context. Language as a system is highly rule-based. Language rules specify the relationship between individual sounds, meaningful units and the combination of these units at the phrasal level. The knowledge of a set of defined rules enables the speaker to generate meaningful utterances and the listener to comprehend them. This knowledge allows us to produce and comprehend a potentially infinite number of different sentences consisting of a finite number of sounds and words, although we have never heard or read these particular sentences before. This creative use of language appears to be specific to humans. The rapidity and ease with which children, as long as they are not neurologically impaired or deprived of language input, acquire this enormous rule corpus is most fascinating. Children learn their mother tongue without any formal instruction. By the age they enter school they are already competent language users except for some subtleties (3), i.e., by the age of five their knowledge incorporates implicit rules which professional linguists are still trying to formalize.


Jackendoff (4) describes this phenomenon as the "paradox of language development". In view of these facts, a biological predisposition which enables children to deduce the language rules from the language surrounding them is likely to exist.

3.2 ERPs as a Method for Examining Language Comprehension Processes

Due to the tremendous rapidity of the processes of language comprehension, a precise description of these processes, as well as of their acquisition during childhood, can only be ensured by using an on-line measure with a sufficient time resolution. One such measure, on which we will focus in the remainder of the paper, is the event-related brain potential (ERP). It is characterized by a time resolution in the millisecond range and can provide information at each point in time during the comprehension process, i.e., it is a continuous measure. A second advantage of this method is that it is noninvasive, opening up the possibility to apply it routinely to human subjects. ERPs are small voltage changes within the electroencephalogram (EEG) which are time-locked to sensory, motor or cognitive events. The ERP voltage changes are too small in relation to the EEG to be recognized in the ongoing EEG. Therefore, they have to be extracted by signal processing techniques. The commonly used technique is an averaging procedure. The rationale is that the ERP waveform is uninfluenced by repetition of the same stimulus (or stimulus class), while the background EEG is unrelated to it and varies randomly across events. Averaging of the whole signal thus leads to an isolation of the ERP signal (see 5). The resulting ERP waveform can be described as positive and negative peaks which are also called "components". Components are sensitive to experimental manipulations and vary in polarity, amplitude and topography (i.e., in their distribution over the scalp). They are traditionally labeled with a letter that characterizes their polarity and a peak latency value (e.g., N400, meaning a negative peak at 400 ms post stimulus onset). The scalp-recorded ERPs reflect the sum of the simultaneous postsynaptic activity of several thousand neurons.
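The averaging rationale described above can be made concrete with a short sketch. The code below is an illustration only, assuming NumPy, a single-channel EEG signal, and known stimulus onset samples; real ERP pipelines additionally involve filtering, artifact rejection and multi-channel handling.

```python
import numpy as np

def average_erp(eeg, onset_samples, sfreq, tmin=-0.1, tmax=1.0):
    """Average single-channel EEG epochs time-locked to stimulus onsets.

    Because the background EEG varies randomly across events it averages
    toward zero, while the time-locked ERP waveform remains.
    """
    start = int(tmin * sfreq)   # samples before onset (negative for a pre-stimulus window)
    stop = int(tmax * sfreq)    # samples after onset
    epochs = []
    for onset in onset_samples:
        segment = eeg[onset + start : onset + stop]
        baseline = segment[:-start].mean() if start < 0 else 0.0  # pre-stimulus mean
        epochs.append(segment - baseline)                         # baseline correction
    return np.mean(epochs, axis=0)                                # averaged ERP waveform
```

A negative peak in the averaged waveform around 400 ms after the critical word would then be reported as a N400, following the labeling convention described above.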

3.3 ERPs and Semantic Processing

Kutas and Hillyard (6) were the first to apply this method to language processes. They presented sentences in a word-by-word manner on a CRT screen. While most of the sentences were entirely correct, 25% of them ended in a word that was semantically not compatible with the prior sentence context (He spread the warm bread with socks). The ERPs to the semantically anomalous words differed strongly from those to correct sentences. Incorrect sentences were characterized by a large negative deflection in the ERP with a maximum around 400 ms at centro-parietal electrode sites. This component was called N400 (compare Fig. 3.1).


Subsequent studies showed that the amplitude of this component was modulated by the strength of the semantic anomaly, being larger for stronger violations. Interestingly, a physical anomaly of the terminal word, i.e. a word written in a larger letter size, did not elicit a N400 component but a positivity around 560 ms (6). Thus, the observed negative brain response to semantic anomaly seemed to be rather specific and not just a response to any kind of oddity. McCallum, Farmer and Pocock (7) replicated this result in the auditory domain using sentences that were spoken by a male voice and ended either semantically correctly or incorrectly, or in a male or female voice. Kutas and Hillyard (8) combined a semantic and a physical violation and observed a N400 followed by a late positivity, which further supports the idea that the two components are independent. Further experiments showed that a semantic anomaly leads to a N400 not only in sentence-final position but also in the middle of a sentence (9). This component has been frequently replicated in many different languages, including American Sign Language (10).


Fig. 3.1: Example of a N400 component elicited by semantically incongruent sentences as compared to correct sentences for the central-midline electrode site Cz. Negative is plotted up in this and the following figures.

Subsequent studies showed that the paradigm of a semantically incorrect word in a sentence context is neither necessary nor sufficient for the elicitation of the N400 component. Kutas and Hillyard (11) demonstrated that this component can also be observed for words which terminate a sentence correctly but are fairly unexpected, i.e. words having a low cloze probability. The amplitude of the N400 was inversely correlated with the cloze probability. But semantically incorrect sentences are not sufficient to elicit a N400: Besson, Kutas and Van Petten (12) have shown that the component is largely reduced if the same semantically incorrect sentence is presented repeatedly. The N400 component is also sensitive to associative relations between words. Kutas, Lindamood and Hillyard (13) examined sentence-final words which were semantically inappropriate but associatively related to a possible correct ending (The pizza was too hot to eat / drink / cry).


These sentences elicited a N400, but with a markedly reduced amplitude as compared to unrelated words. There are a number of studies trying to identify the nature of the N400, i.e. trying to identify which processing level the N400 is sensitive to. Initial studies presented evidence that the N400 reflects processes of more automatic lexical access (e.g. 14; 15; 16). Later studies, however, which controlled potentially relevant variables, led to the conclusion that the N400 reflects more controlled processes of lexical integration (17-19).
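The cloze probability mentioned above is simply the proportion of speakers who, in an offline completion (norming) task, continue a given sentence frame with a particular word. A minimal sketch of the computation is given below; the sentence frame and the response counts are invented for illustration.

```python
from collections import Counter

def cloze_probability(completions, target):
    """Proportion of norming responses that complete the frame with `target`."""
    counts = Counter(word.lower() for word in completions)
    return counts[target.lower()] / len(completions)

# Hypothetical norming responses for the frame "The pizza was too hot to ...":
responses = ["eat"] * 38 + ["touch"] * 7 + ["drink"] * 3 + ["hold"] * 2

print(cloze_probability(responses, "eat"))    # 0.76 -> high-cloze ending, small N400 expected
print(cloze_probability(responses, "drink"))  # 0.06 -> low-cloze ending, larger N400 expected
```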

3.4 ERPs and Syntactic Processing

Until the beginning of the 1990s, ERP studies on language comprehension focused nearly exclusively on semantic processing. Only recently have some studies examined the structural processing of sentences. Most of these studies describe a late, centro-parietally distributed positivity, also called the P600 component, in correlation with a variety of syntactic violations or anomalies.

3.4.1 P600

Some of the studies examining syntactic processing aspects applied the logic which had been successfully used in the semantic domain, i.e., the comparison of correct and incorrect sentences. Others compared structurally simple versus structurally more complex sentences to explore syntactic processes. One of the first descriptions of a late positive component with regard to syntactic processing, often also called the P600 component, was given by Osterhout and Holcomb (20). They presented so-called garden path sentences, i.e., sentences which contain a temporary syntactic ambiguity, as in The broker persuaded, which can either be completed like (a) the man to sell the stock or like (b) to sell the stock was sent to jail. Thus, two possible structures can be assigned to the sentence beginning The broker persuaded, either a simple active one (a) or a more complex one in which the verb is passivized and attached to a reduced relative clause (b). Osterhout and Holcomb compared sentences like The broker hoped to sell the stock (which are unambiguous) to temporarily ambiguous structures as described above (The broker persuaded to sell the stock). They observed a widely distributed positivity between 500 and 800 ms on the infinitive marker to for visual presentation and replicated this result with auditory presentation (21). A similar positivity was also reported in correlation with violations of the syntactic phrase structure (e.g., The scientist criticized Max's of proof the theorem) or violations of preferred syntactic phrase structures for both the visual and the auditory modality (22-24). Moreover, a late positivity was observed for other types of syntactic violations, e.g. subjacency violations (25; 26), specificity violations (25), and in correlation with agreement violations (27-30).


The fact that the electrophysiological correlate of semantic processing would emerge earlier than the correlate of structural processing seems somewhat paradoxical, as serial models of language comprehension assume structural analyses to precede lexical-semantic analyses (31).

3.4.2 Left Anterior Negativities

However, there is also another type of component which seems to have been neglected in ERP-based psycholinguistic modeling: left anterior negativities. With regard to their latencies, two types of left anterior negativities can be distinguished: (a) left anterior negativities between 300 and 500 ms, the so-called LAN component; (b) early left anterior negativities between 100 and 300 ms, the so-called ELAN component. The LAN component has been observed in correlation with agreement violations (9; 27-29) and for some kinds of subcategorization violations (21; 32). The ELAN component, however, has only been observed in correlation with phrase structure violations. This component was first shown by Neville et al. (25) in the visual domain and Friederici et al. (27) in the auditory domain. Friederici et al. presented German sentences that were either correct or violated the phrase structure rules. A preposition was directly followed by a past participle. As a prepositional phrase indicated by the preposition requires a noun phrase to follow, the past participle leads to a clear word category violation (Der Freund wurde im besucht / The friend was in the visited). For these sentences we observed an early left anterior negativity (ELAN). Interestingly, a similar left anterior negativity has also been observed in correlation with function words, i.e., elements that carry a primarily structural function, such as determiners, conjunctions or auxiliaries (33). These results indicate that there seems to be an earlier phase of syntactic analysis taking place prior to or in parallel with semantic processing. This early phase of initial structural analysis, the so-called first-pass parse as modeled by Frazier (31), seems to be reflected in the early negativity. The late P600 component might rather reflect second-pass parsing processes. In a number of ERP experiments we tried to test this assumption and to specify the nature of the first-pass and second-pass parsing processes. We will discuss these experiments in turn, each focusing on a particular aspect of the nature of these processes. Based on these neurocognitive findings we will present a model describing the process of language comprehension in its different phases. We will close by sketching how this model can account for the development of language acquisition in the child.

3.4.2.1 Influence of proportion

As first-pass parsing processes, in contrast to second-pass parsing processes, are assumed to be fairly automatic in nature, Hahne and Friederici (34) further explored this idea experimentally.
We ran an ERP experiment designed to test the hypothesis of fairly automatic first-pass parsing processes, as reflected in the early negativity, and fairly controlled second-pass parsing processes, as reflected in the late positivity. In this experiment we varied the proportion of correct and incorrect sentences. This proportion manipulation paradigm has frequently been used to examine the relative degree of automaticity of cognitive processes (e.g. 19; 35-37). The rationale is to influence a participant's expectation about the occurrence of a particular stimulus type and thereby to manipulate his or her strategic behavior. If a particular ERP component is influenced by the varying proportion, the underlying cognitive process is assumed to be under the participant's control. By contrast, if the ERP component is not affected by the proportion manipulation, the cognitive process reflected in this component is assumed to be fairly automatic. In our study the proportion of sentences containing phrase structure violations was either 20% (i.e., 80% correct sentences) or 80% (i.e., 20% correct sentences). To the extent that the ELAN component reflects an automatic process which is independent of conscious expectancies and strategies, this early ERP component should be roughly equivalent across the two proportion conditions. By contrast, if the P600 component reflects a controlled process, this component should be larger for a low proportion of incorrect sentences than for a high proportion. Twenty right-handed students from the Free University of Berlin participated in the study. They listened to the stimuli via headphones.


Fig. 3.2: Event-related brain potentials elicited by the terminal word of syntactically correct and incorrect sentences with incorrect sentences being either rare (upper panel) or frequent (lower panel) for a left anterior and a centro-parietal electrode position. Displayed are the waveforms averaged over 16 participants.


During the presentation of the sentences, participants were instructed to fixate a small star in the middle of a CRT screen placed in front of them and to avoid eye blinks during the presence of the fixation star. The fixation marker appeared on the screen 500 ms prior to the beginning of the auditory sentence and remained visible until 3000 ms after sentence offset. Then a response sign appeared on the screen for 2000 ms and the participant was asked to indicate via push buttons whether the sentence was correct or incorrect. The next trial started after an inter-stimulus interval of 1000 ms. The results clearly supported our hypotheses. The ELAN component was observed and equally pronounced for both proportion conditions. The P600 component, however, was only present in the case of a low proportion of incorrect sentences, but not for a high proportion of incorrect sentences (see Fig. 3.2). This pattern of results indicates that the processes underlying the early left anterior negativity are rather independent of the participant's conscious expectancy and strategic behavior, while the processes underlying the late positivity are under the control of the participant. These results are highly compatible with structure-driven serial models of language comprehension which assume two distinct parsing stages (31; 38): a first stage during which the parser automatically assigns the initial syntactic structure on the basis of word category information, and a second stage during which structural reanalysis takes place.

3.4.2.2 Influence of semantics

These models also predict that the first step of analysis is independent of semantic information. Hahne and Friederici (39) explored the time course and possible interactions of semantic and syntactic processes in language comprehension in two experiments. In these experiments we included a condition in which a word category error and a lexical-semantic error were realized on the same item. If the initial structure building process is indeed executed before any lexical-semantic integration takes place, this first-pass parsing process should not be influenced by the semantic properties of the word, i.e. by whether the word can be lexically integrated into the whole sentence or not. Thus, for the combined semantic-syntactic violation condition we expect the ELAN component to be present just as in a simple syntactic phrase structure violation condition. A further critical aspect of this experiment concerns possible semantic integration processes in the case of word category violations. Are words that violate the phrase structure of a sentence semantically integrated into the whole sentence? Theoretically, there are at least two possibilities. Semantic integration processes could be independent of phrase structure information; this assumption would predict additive components for the combined semantic-syntactic violation condition, i.e., an ELAN component followed by a N400 component. Alternatively, these integration processes could be functionally dependent on the assignment of the word to the given phrase structure of the sentence; under this assumption we would predict that there should be no N400 component in case of a combined semantic-syntactic violation.

To test these different possibilities we designed an experiment in which we included four types of sentences. The sentences ended with a target word which was either (a) correct, (b) semantically incorrect, i. e., violating the selectional restriction of the verb, (c) syntactically incorrect, i. e., violating the phrase structure, or (d) semantically and syntactically incorrect, i. e., violating both the selectional restriction and the phrase structure.

(a) Das Baby wurde gefüttert. / The baby was fed.
(b) Das Lineal wurde gefüttert. / The ruler was fed.
(c) Die Gans wurde im gefüttert. / The goose was in the fed.
(d) Die Burg wurde im gefüttert. / The castle was in the fed.

Additionally we also presented filler sentences which contained a complete prepositional phrase (Die Kuh wurde im Stall gefüttert — The cow was in the barn fed). All sentences were spoken by a female speaker and were presented auditorily via headphones. In this experiment participants were asked to judge the overall correctness of the sentences. The procedure was equivalent to the one described in section 3.4.2.1. The ERP-results for conditions (b) and (c) replicated earlier results: we observed a N400 for semantic violations and an ELAN component followed by a P600 for phrase structure violations. The combined violation (d), however, elicited the same result pattern as the pure syntactic condition, i. e., an ELAN followed by a P600. Interestingly, there was no N400 despite the clear selectional restriction error. This indicates that the semantic integration of a word into a sentence is not initiated automatically but may only be initiated after a successful structural integration of the element into the phrase structure built up on-line.
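The logic of the four conditions and the observed component pattern can be restated schematically; the following is simply a paraphrase of the text above in code form under our own labels, not an analysis script from the study.

# Observed ERP components per condition, as described in the text. An 'additive'
# outcome would have meant ELAN + N400 + P600 for the combined violation (d);
# the observed pattern instead matched the purely syntactic condition (c).

observed = {
    "a_correct":              [],
    "b_semantic_violation":   ["N400"],
    "c_syntactic_violation":  ["ELAN", "P600"],
    "d_combined_violation":   ["ELAN", "P600"],   # no N400 despite the selectional restriction error
}

additive_prediction_for_d = sorted(set(observed["b_semantic_violation"] + observed["c_syntactic_violation"]))
print("additive prediction for (d):", additive_prediction_for_d)       # ['ELAN', 'N400', 'P600']
print("observed for (d):           ", observed["d_combined_violation"])  # semantic integration not initiated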

3.4.2.3 Influence of task

In a subsequent experiment we further examined the controlled nature of the semantic integration processes. In this experiment we presented the same sentences again but changed the instruction given to the participants. This time they were asked to judge the "semantic coherence" of the sentences exclusively, thereby ignoring syntactic aspects of the sentence. That is, participants had to classify sentences containing only a phrase structure error as correct. If the semantic integration processes, as reflected in the N400, are under the participants' control, it should be possible to induce them by this instruction. Thus we should observe a N400 also for the combined violation, despite the word category violation.

[Figure 3.3 shows ERP waveforms at a left anterior (F7) and a centro-parietal (Pz) electrode for the syntactic violation and the combined syntactic-semantic violation, separately for the judgment of overall correctness and the judgment of semantic correctness; the marked components are the ELAN, N400, and P600.]

Fig. 3.3: Event-related brain potentials elicited by the terminal word of correct sentences as compared to either syntactically incorrect sentences (upper panel) or syntactically and semantically incorrect sentences (lower panel) for two different instructions for a left anterior and a centro-parietal electrode position.

In addition, this instruction manipulation gives us the opportunity to test the autonomy of the ELAN component again. If this component reflects a rather automatic process, it should not be affected by shifting the participants' attention towards semantic aspects of the sentence, i. e., we expect the early negativity to be present for the pure syntactic violation and for the combined violation condition. As in the previous experiment, 16 subjects were tested. Participants were told to judge only the "semantic coherence" of the sentences, ignoring category violations. In all other aspects this experiment was equivalent to the previous one. With regard to semantic integration processes the data showed that, in contrast to the previous experiment, a N400 component was elicited also in the combined condition (d). Thus, the fact that the N400 was dependent on the instruction clearly suggests that the underlying processes are under the participants' strategic control. With regard to the early structure building processes the ERP-results were also clear-cut: even when focusing on semantic aspects of the sentence, an ELAN component was elicited in the pure syntactic condition (c) as well as in the combined semantic-syntactic condition (d). This illustrates the highly autonomous character of first-pass parsing processes in auditory language comprehension. Interestingly, under the "semantic coherence" instruction there was no P600 for phrase structure violations. This confirms the results of the proportion experiment that the processes reflected in this component are of a fairly controlled nature (see Fig. 3.3).

To summarize, the combined results suggest that first-pass parsing processes as reflected in the ELAN component are highly automatic. They are influenced neither by the proportion of correct and incorrect items in the experimental set, nor by the lexical-semantic information of the word to be investigated, nor by the instruction given to the listener.

3.4.2.4 Influence of syntactic preferences

The ELAN component has been shown to correlate systematically with phrase structure violations, i. e., with obligatory word category errors. Friederici, Hahne and Mecklinger (22) examined whether the early negativity can really be described as a marker of the detection of a syntactic phrase structure error or whether this component is also sensitive to phrase structure preferences. Two different conditions were compared: on the one hand, phrase structure violations similar to those used in the previous experiments (e and f), and on the other hand, sentences with a violation of the preferred syntactic category due to the ambiguity of the word wurde (was / became), as in (g) and (h).

(e) Das Metall wurde veredelt von dem Goldschmied, den man auszeichnete. / The metal was refined by the goldsmith who was honored.
(f) Das Metall wurde zur veredelt von dem Goldschmied, den man auszeichnete. / The metal was for refined by the goldsmith who was honored.
(g) Das Metall wurde zur Veredelung geschmolzen von dem Goldschmied, den man auszeichnete. / The metal was for refinement melted by the goldsmith who was honored.
(h) Das Metall wurde Veredelung von dem Goldschmied, den man auszeichnete. / The metal was/became refinement by the goldsmith who was honored.

The word wurde can either be read as an auxiliary (was) or as a main verb (became). Thus several word categories can follow. Under the auxiliary reading a past participle is a correct continuation; under the main verb reading a noun is a correct continuation. Normative results from a sentence completion study showed that the auxiliary reading is much more frequent than the main verb reading and can therefore be regarded as the preferred reading. We predicted an ELAN component for the phrase structure violation condition (f). However, if the ELAN component reflects the detection of a word category violation rather than a category preference, we would expect this component to be absent in case of a preference violation (h). The sentences were again presented auditorily. Note that the identification of the word category for the critical items is only possible at the word-final suffix (the suffix -t indicating verb status and the suffix -ung indicating noun status). Thus the earliness of the left anterior negativity must be judged relative to this late word category decision point (for details see Friederici, Hahne, and Mecklinger, 22). With 16 students participating, this experiment replicated the ELAN effect for the word category violation condition, but not for the preference violation condition. This suggests that the ELAN component is indeed a reflection of the detection of an incorrect word category and is not sensitive to the detection of a correct but non-preferred category.
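The licensing logic behind these materials can be sketched as a small lookup. The category labels and the function below are invented for illustration under our own simplifying assumptions and are not part of the study's materials.

# Which word categories are licensed after the ambiguous "wurde" and after the
# preposition "zur", and how the four example conditions are classified.

LICENSED_AFTER = {
    "wurde": {"past_participle": "preferred", "noun": "legal_non_preferred"},
    "zur":   {"noun": "preferred"},
}

def classify(preceding_word, category):
    status = LICENSED_AFTER.get(preceding_word, {}).get(category)
    return status if status else "word_category_violation"

print(classify("wurde", "past_participle"))   # (e): preferred continuation, no ELAN expected
print(classify("zur", "past_participle"))     # (f): word_category_violation, ELAN expected
print(classify("zur", "noun"))                # (g): preferred (noun inside the prepositional phrase)
print(classify("wurde", "noun"))              # (h): legal but non-preferred, no ELAN observed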

3.4.2.5 Influence of prosody

In a further series of experiments we examined another potentially relevant variable which has been widely neglected in psycholinguistic research on parsing: prosody. Especially with regard to auditory comprehension, prosodic information might be able to influence parsing processes, possibly even first-pass parsing processes. In two recent experiments, we tried to assess a potential influence of word stress on the structural analysis of sentences. In these experiments we again presented sentences violating the phrase structure, as in (c) (Die Gans wurde im gefüttert / The goose was in the fed), in which the preposition marks the beginning of a prepositional phrase but in which the case-marked preposition is not followed by the structurally required noun, but by a verb.

The previous and the present experiments differ with regard to the acoustic realization of the preposition. In the experiments described above the preposition in the syntactically incorrect sentences was unstressed, as it would be in a correct sentence containing a complete prepositional phrase. Technically, this was achieved as follows: in order to avoid any possible acoustic deviation from a correct sentence including a prepositional phrase prior to the presentation of the critical word in the syntactic violation condition, the speaker initially produced a correct sentence with a noun after the preposition. In a second step this "superfluous" noun was spliced out of the digitized speech signal using a speech editing tool. To ensure that this splicing procedure would not be impaired by coarticulation phenomena, we used nouns for which the phonological transitions from preposition offset to noun onset and from noun offset to participle onset were identical.
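For illustration, a splicing step of this kind could look like the following minimal sketch. The speech editing tool used in the study is not specified, so the numpy-based function, the sample rate, and the cut times below are all hypothetical.

import numpy as np

def splice_out(signal, sr, cut_start_s, cut_end_s):
    """Remove the segment [cut_start_s, cut_end_s) from a mono speech signal.

    This corresponds to excising the 'superfluous' noun so that the case-marked
    preposition is directly followed by the participle while keeping the
    unstressed realization it had in the fluently produced recording.
    """
    start = int(round(cut_start_s * sr))
    end = int(round(cut_end_s * sr))
    return np.concatenate([signal[:start], signal[end:]])

# Hypothetical example: 2.5 s of placeholder audio at 44.1 kHz, noun between 1.2 s and 1.6 s.
sr = 44100
sentence = np.zeros(int(2.5 * sr), dtype=np.float32)
spliced = splice_out(sentence, sr, cut_start_s=1.2, cut_end_s=1.6)
print(len(sentence) / sr, "->", len(spliced) / sr, "seconds")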

By contrast, the stimulus material of the present experiment was realized in a different way. The speaker was not instructed in any specific way but simply read the syntactically incorrect sentences aloud in a way that seemed "natural" to her. This resulted in a stressed preposition: it seemed as if the stress which is usually put on the noun within the prepositional phrase was now placed on the preposition. This difference in word stress on the preposition was indexed by a mean length difference of 59 ms. Word stress on a preposition is highly anomalous in isolated sentences; the only possibility for a preposition to receive stress is its contrastive usage, which was clearly not the case for our isolated sentences.

We carried out an experiment in which we presented sentences of type (a), (b) and (c) with phrase structure violations containing an anomalously stressed preposition. The aim of the study was to investigate whether anomalous stress on an element which is highly important for phrase structure building can influence first-pass parsing processes. Recall that these processes have been shown to be highly autonomous with regard to semantic or attentional aspects. But are they equally autonomous with regard to prosodic aspects? Sixteen subjects participated in the experiment and were asked to judge the correctness of each sentence. In contrast to the previous experiments, sentences with phrase structure violations did not elicit an early negativity when the prosodic pattern was anomalous. We observed a parietally distributed positivity which had its maximum about 300 ms earlier than the P600 in the previous experiments. Thus this positivity cannot be characterized as a P600 but is more likely to be a reflection of the very rare event of a considerably stressed element, which functions as a type of "oddball". The fact that we did not observe an ELAN component for sentences containing anomalously stressed prepositions seems to indicate that first-pass parsing processes are affected by this prosodic anomaly.

Before drawing further conclusions about this interaction of prosodic and syntactic information processing during sentence comprehension, we intended to replicate this result. To eliminate the possibility that differences between the samples of participants tested in the two experiments contributed to the divergent findings, we carried out a further experiment including both types of acoustic realizations of phrase structure violations. This gave us the opportunity to examine possible processing differences between stressed and unstressed prepositions in a within-subjects design. As can be seen in Figure 3.4, we replicated our previous results. Word category errors within a prepositional phrase led to an ELAN component followed by a P600 in case of a "normal" prosodic realization of the preposition, but did not do so in case of an anomalous prosodic realization, i. e., with a preposition carrying contrastive stress outside an appropriate context.

Fig. 3.4: Event-related brain potentials elicited by the terminal word of correct sentences as compared to syntactically incorrect sentences in which the preposition was either unstressed (upper panel) or stressed (lower panel) for a left anterior and a centro-parietal electrode position.

This indicates a close interaction between syntactic and prosodic information processing during auditory sentence comprehension. Anomalous word stress on elements serving important functions during phrase structure building seems to be able to influence or even block further syntactic processing (cf. 40; 41). To test whether the absence of the ELAN component for stressed prepositions is really due to the blocking of further processing as a result of the anomalous stress pattern, Jescheniak, Hahne and Friederici (42) conducted a further experiment. They created a situation in which a stressed preposition was the adequate prosodic pattern of a sentence. As described above, the only possibility for a stressed preposition in a simple declarative sentence is its contrastive usage. Thus, the sentences containing a phrase structure violation following a stressed preposition were preceded by a question which put the focus on the preposition. Given a question like Wurde die Gans VORM Stall gefüttert? / Was the goose being fed IN FRONT OF the barn?, an answer like (c) would contain a stressed preposition in order to establish the contrast (the goose was being fed IN and not IN FRONT OF the barn). Thus, the word stress on the preposition is contextually appropriate and should not block further syntactic processing. The results clearly supported our hypotheses: we observed an early negativity (though with a slightly different topography compared to previous experiments) for sentences with a stressed preposition when preceded by the appropriate question, but we replicated again the absence of this component when these same sentences were presented without the preceding question. In sum, these data demonstrate that prosodic information, and the adequacy of a prosodic pattern in context, affects even first-pass parsing processes in spoken language comprehension.

3.4.3 A Neurocognitive Model of Language Comprehension

In the following, we will present a model which accounts for the results from the ERP research on language comprehension just outlined and which also takes previous behavioral and neuropsychological data into account. Our model assumes that the process of sentence comprehension can be subdivided into three stages, each of which has different properties and is correlated with different ERP components (cf. 43).

During the first phase the parser incrementally assigns the initial syntactic structure on the basis of word category information only. This first-pass parse is reflected in the early left anterior negativity (ELAN). Importantly, this early negativity is only observed in correlation with outright phrase structure violations. As mentioned above, syntactic structures that are infrequent but nevertheless legal do not elicit this early negativity (22; 23). This suggests that during the first-pass parse the parser is guided only by phrase structure rules and does not take the frequency of occurrence of particular structures into account. This process is highly autonomous: it is independent of semantic aspects of the words being processed, and it is independent of strategic behavior on the part of the participant, as neither the proportion of incorrect sentences nor the shift of attention towards the semantic coherence of the sentence has any influence on these processes. However, this early process seems to take prosodic aspects, such as word stress, into account. An anomalous prosodic pattern is able to block initial phrase structure processes.

During the second phase the parser tries to achieve thematic role assignment. These processes are reflected by negativities around 400 ms. During this stage lexically bound information other than word category information is processed: on the one hand lexical-semantic information, reflected in the centro-parietally distributed N400 component, and on the other hand syntactic information such as subcategorization and inflectional morphology, reflected in left anterior negativities around 400 ms. The processing of lexical-semantic information as reflected in the N400 component can be characterized as being under the participant's control.

During the third phase lexical-semantic and syntactic information are mapped onto each other. In case of any syntactic problems detected during the previous analysis, a reanalysis or repair is initiated, which is reflected by the late centro-parietally distributed positivity (P600). This process also appears to be fairly controlled, as it can be influenced by strategic behavior.
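As a compact restatement of this three-phase model, the mapping from phase to process and ERP correlate can be written down as data. The field names and phrasing below are ours and purely illustrative; they are not the authors' formalization.

from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    process: str
    erp_correlates: tuple
    notes: str

MODEL = (
    Phase("Phase 1", "initial phrase structure building from word category information",
          ("ELAN",),
          "highly automatic; blocked only by anomalous prosody"),
    Phase("Phase 2", "thematic role assignment using lexical-semantic and morphosyntactic information",
          ("N400", "left anterior negativity around 400 ms"),
          "N400-related semantic integration is under the participant's control"),
    Phase("Phase 3", "mapping of semantic and syntactic information; reanalysis/repair if needed",
          ("P600",),
          "fairly controlled; sensitive to strategy and violation proportion"),
)

for p in MODEL:
    print(f"{p.name}: {p.process} -> {', '.join(p.erp_correlates)} ({p.notes})")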

3.5 ERPs in Language Development

So far we have analyzed the behavior of the adult language comprehension system. But how does the system acquire these processing mechanisms during language development? Do children comprehend sentences via the same three processing steps as shown for adults? In particular, are they able to process information about phrase structure rules in the same fast and automatic way as we have observed for adults? Our knowledge about the developing brain and its speech processing abilities is fairly restricted. In part, this might be due to the difficulties in testing children's language behavior on-line. However, the method we have described in this paper, namely ERPs recorded during language comprehension, seems to be a promising approach to investigating sentence comprehension processes even during childhood.

In a first approach we tested children using approximately the same paradigm as in the studies just described (44). The stimulus materials were tested in pilot studies in order to guarantee that the children were able to understand the sentences. We included sentences of type (a), (b) and (c). Children were told that they would hear "normal" sentences and "bewitched" sentences, and their task was to find out which sentence was normal and which had been bewitched by a little magician. The experimental procedure was adapted in a number of details to meet the children's needs. For example, to keep their motivation high, we presented a tiny magician on the computer screen in place of the standard fixation star. During the auditory presentation of the sentences children were asked to fixate the small magician on the computer screen in front of them; 3000 ms after the end of the sentence a smiling and a sad face appeared on the screen, and children had to press a button to indicate their decision on the correctness of the sentence. Sixteen children ranging in age from 6.1 to 7.8 years participated in this experiment.

Children at age seven showed ERP effects partly similar to those of the adults. For the semantically incorrect condition we observed a negativity which looked similar to a N400 but was somewhat delayed and more widely distributed than for adults. This seems to suggest that the processes of lexical-semantic integration in children of that age are similar to those of adult listeners but delayed in onset. For sentences containing a phrase structure violation we observed a left anterior negativity starting around 200 ms, peaking around 400 ms and extending beyond 1000 ms, i. e., the negativity had a similar topography to the ELAN but was somewhat delayed and longer lasting compared to adults. This finding indicates that children at the age of six to seven years successfully apply the phrase structure rules to detect the word category violations, although the process itself appears to be of longer duration than in adult listeners. The phrase structure violation sentences also elicited a late positivity which was strongly lateralized to the right hemisphere. It started only around 750 ms and lasted beyond 1000 ms. Although there is a shift in latency with regard to the first-pass parsing processes, it is the late positivity, taken to reflect a controlled process of reanalysis and repair in adults, which seems to show the largest divergence from the adult pattern. It appears that these repair processes develop fairly late.


Thus, although there are clear differences between the children's and the adults' ERP patterns with respect to the timing and the topography of the observed effects, the data suggest that children around the age of seven years process auditory language partly by applying processes similar to those of adults. In particular, the early syntactic processes seem to involve brain systems also used by adult listeners, as indicated by the distribution of the early negativity. The differences in topography of the late processes may be due to differential strategies accompanying processes of lexical-semantic integration and repair in children and in adults.

The combined data provide evidence for a fixed rule-based system underlying language processing. The application of phrase structure rules during language comprehension in the adult system is highly automatic, independent of aspects of meaning and of the instructions given. Moreover, processes supporting the application of phrase structure rules implemented during childhood appear to be based on brain structures also used in adulthood. Further studies including different age groups will explore the precise characteristics of the developmental changes in language comprehension.

Acknowledgments
This research was supported by the Berlin-Brandenburg Academy of Sciences (BBAW). We thank Heike Bothel, Stefan Frisch, Birgit Hain and Malgorzata Mikolajewska for their assistance in data collection, Erdmut Pfeifer for software support and Jörg Jescheniak for helpful comments. Correspondence concerning this article should be addressed to: Anja Hahne, Max Planck Institute of Cognitive Neuroscience, P.O. Box 500 355, 04303 Leipzig, Germany, email: hahne@cns.mpg.de.

References

1. Liberman, A. M., Harris, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psych. Rev. 74, 431-461.
2. Cairns, P., Shillcock, R., Chater, N., and Levy, J. (1997). Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation. Cogn. Psychol. 33, 111-153.
3. McNeill, D. (1970). The acquisition of language: The study of developmental psycholinguistics (New York: Harper and Row).
4. Jackendoff, R. S. (1994). Patterns in the mind. In: Language and human nature (New York: Basic Books), pp. 223-239.
5. Coles, M. G. H. and Rugg, M. D. (1995). Event-related brain potentials: an introduction. In: Electrophysiology of mind: Event-related brain potentials and cognition, M. D. Rugg and M. G. H. Coles, eds. (New York: Oxford University Press), pp. 1-26.
6. Kutas, M. and Hillyard, S. A. (1980a). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science 207, 203-205.
7. McCallum, W. C., Farmer, S. F., and Pocock, P. V. (1984). The effects of physical and semantic incongruities on auditory event-related potentials. Electroenceph. clin. Neurophysiol. 59, 477-488.
8. Kutas, M. and Hillyard, S. A. (1980b). Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol. Psychol. 11, 99-116.
9. Kutas, M. and Hillyard, S. A. (1983). Event-related potentials to grammatical errors and semantic anomalies. Mem. Cogn. 11, 539-550.
10. Kutas, M., Neville, H., and Holcomb, P. J. (1987). A preliminary comparison of the N400 response to semantic anomalies during reading, listening and signing. In: Cerebral psychophysiology: Studies in event-related potentials, EEG Suppl. 39, W. C. McCallum, R. Zappoli, and F. Denoth, eds. (Amsterdam: Elsevier), pp. 325-330.
11. Kutas, M. and Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
12. Besson, M., Kutas, M., and Van Petten, C. (1992). An event-related potential analysis of semantic congruity and repetition effects in sentences. J. Cogn. Neurosci. 4, 132-149.
13. Kutas, M., Lindamood, T. E., and Hillyard, S. (1984). Word expectancy and event-related brain potentials during sentence processing. In: Preparatory states and processes, S. Kornblum and J. Requin, eds. (Hillsdale: Lawrence Erlbaum), pp. 217-237.
14. Besson, M., Fischler, I., Boaz, T., and Raney, G. (1992). Effects of automatic activation on explicit and implicit memory tests. J. Exp. Psychol.: Learn. Mem. Cogn. 18, 89-105.
15. Boddy, J. (1986). Event-related potentials in chronometric analysis of primed word recognition with different stimulus onset asynchronies. Psychophysiology 23, 232-245.
16. Kutas, M. and Hillyard, S. A. (1989). An electrophysiological probe of incidental semantic association. J. Cogn. Neurosci. 1, 38-49.
17. Bentin, S., Kutas, M., and Hillyard, S. (1993). Electrophysiological evidence for task effects on semantic priming in auditory word processing. Psychophysiology 30, 161-169.
18. Brown, C. and Hagoort, P. (1993). The processing nature of the N400: Evidence from masked priming. J. Cogn. Neurosci. 5, 34-44.
19. Chwilla, D. J., Brown, C., and Hagoort, P. (1995). The N400 as a function of the level of processing. Psychophysiology 32, 274-285.
20. Osterhout, L. and Holcomb, P. J. (1992). Event-related potentials and syntactic anomaly. J. Mem. Lang. 31, 785-804.
21. Osterhout, L. and Holcomb, P. J. (1993). Event-related potentials and syntactic anomaly: Evidence of anomaly detection during the perception of continuous speech. Lang. Cogn. Processes 8, 413-437.
22. Friederici, A. D., Hahne, A., and Mecklinger, A. (1996). Temporal structure of syntactic parsing: Early and late event-related brain potential effects elicited by syntactic anomalies. J. Exp. Psychol.: Learn. Mem. Cogn. 22, 1219-1248.
23. Hagoort, P., Brown, C., and Groothusen, J. (1993). The syntactic positive shift as an ERP measure of syntactic processing. Lang. Cogn. Processes 8, 439-483.
24. Münte, T. F., Heinze, H. J., and Mangun, G. R. (1993). Dissociation of brain activity related to syntactic and semantic aspects of language. J. Cogn. Neurosci. 5, 335-344.
25. Neville, H. J., Nicol, J., Barss, A., Forster, K. I., and Garrett, M. F. (1991). Syntactically based sentence processing classes: Evidence from event-related brain potentials. J. Cogn. Neurosci. 3, 151-165.
26. McKinnon, R. and Osterhout, L. (1996). Constraints on movement phenomena in sentence processing: Evidence from event-related brain potentials. Lang. Cogn. Processes 11, 495-523.
27. Friederici, A. D., Pfeifer, E., and Hahne, A. (1993). Event-related brain potentials during natural speech processing: Effects of semantic, morphological and syntactic violations. Cogn. Brain Res. 1, 183-192.
28. Osterhout, L. and Mobley, L. A. (1995). Event-related brain potentials elicited by failure to agree. J. Mem. Lang. 34, 739-773.
29. Gunter, T. C., Stowe, L. A., and Mulder, G. (1997). When syntax meets semantics. Psychophysiology 34, 660-676.
30. Coulson, S., King, J. W., and Kutas, M. (1998). Expect the unexpected: Event-related brain responses to morphosyntactic violations. Lang. Cogn. Processes 13, 21-58.
31. Frazier, L. (1987). Sentence processing: A tutorial review. In: Attention and performance XII, M. Coltheart, ed. (Hillsdale: Erlbaum), pp. 559-586.
32. Rösler, F., Friederici, A. D., Pütz, P., and Hahne, A. (1993). Event-related brain potentials while encountering semantic and syntactic constraint violations. J. Cogn. Neurosci. 5, 345-362.
33. Neville, H. J., Mills, D. L., and Lawson, D. S. (1992). Fractionating language: Different neural subsystems with different sensitive periods. Cerebral Cortex 2, 244-258.
34. Hahne, A. and Friederici, A. D. (1997). Two stages in parsing: Early automatic and late controlled processes. Exp. Brain Res. 117, 47.
35. DeGroot, A. M. B. (1984). Primed lexical decision: Combined effects of the proportion of related prime-target pairs and the stimulus-onset asynchrony of prime and target. Quart. J. Exp. Psychol. 36A, 253-280.
36. Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: roles of inhibitionless spreading activation and limited capacity attention. J. Exp. Psychol. 106, 226-254.
37. Stanovich, K. and West, R. (1983). On priming by a sentence context. J. Exp. Psychol.: General 112, 1-36.
38. Gorrell, P. (1995). Syntax and parsing (Cambridge, England: Cambridge University Press).
39. Hahne, A. and Friederici, A. D. (1998). ERP evidence for autonomous first-pass parsing processes in auditory language comprehension. Poster presented at the 5th Annual Meeting of the Cognitive Neuroscience Society, San Francisco.
40. Steedman, M. J. (1990). Syntax and intonational structure in a combinatory grammar. In: Cognitive models of speech processing: Psycholinguistic and computational perspectives, G. T. M. Altman, ed. (Cambridge, MA: MIT Press), pp. 457-482.
41. Steedman, M. J. (1991). Structure and intonation. Language 67, 260-296.
42. Jescheniak, J. D., Hahne, A., and Friederici, A. D. (1998). Prosodic influences on syntactic parsing: An ERP study. Poster presented at the 5th Annual Meeting of the Cognitive Neuroscience Society, San Francisco.
43. Friederici, A. D. (1995). The time course of syntactic activation during language processing: A model based on neuropsychological and neurophysiological data. Brain Lang. 50, 259-281.
44. Friederici, A. D. and Hahne, A. (in press). Developmental patterns of brain activity for semantic and syntactic processes. In: Approaches to bootstrapping: phonological, syntactic and neurophysiological aspects of early language acquisition, B. Höhle and J. Weissenborn, eds. (Amsterdam-Philadelphia: John Benjamins).

4. Learning, Representation and Retrieval of Rule-Related Knowledge in the Song System of Birds

Henrike Hultsch, Roger Mundry and Dietmar Todt

4.1 Introduction

The song system of birds is a unique paradigm of biocommunication. This can be concluded from several of its properties. First, the singing of birds develops by vocal imitation of individually experienced signals, and such signal acquisition through auditory learning is rare in nonhuman organisms. Until now, it has been documented only for a few families of birds (e. g. oscine birds and parrots; review in: 1) and mammals (e. g. marine mammals and bats; review in 2). Second, the birds' singing can be composed of a large number of different vocal patterns, and their sequencing points to both sophisticated rules of pattern retrieval and a remarkable memory capacity. Finally, singing, though strictly serial in time, shows a clear hierarchical organization. The song system thus makes an excellent model for studies on how birds process and use large amounts of rule-related knowledge.

Biologists distinguish about 9000 species of birds and list about 4000 of them as songbirds (oscines, passeriformes; 3). With a few exceptions, the species-typical singing is a matter of adult males, who often modify it according to season, circadian time, ecological features and, above all, the social context. In situations of courtship or close-range male-male interactions, for instance, singing usually is more complex and versatile than during territorial advertisement (4-15). Various analyses have shown that vocal interactions between neighbors can follow sophisticated rules, including both pattern-specific and time-specific relationships between the exchanged song-types (16-20). From an evolutionary perspective, singing has contributed to a separation of species, and this explains why vocal diversity is a key feature of the song systems of birds. The fact that species differ in their voice and also in phonetic details of their songs has stimulated the special interest of investigators for a long time (review in: 21). Currently, however, the research focus also addresses those properties of the song system which birds have in common. These approaches allow us to describe a number of correspondences among species.

The singing of birds can be taken as a stream of behavior that shows a clear-cut alternation of acoustical patterns and silent intervals (pauses). The most conspicuous pattern is the so-called 'song' (strophe). In the typical case, songs have a duration of about three seconds and are separated by pauses of a similar duration. From the perspective of information processing, song duration seems to be selected for providing optimal 'chunks' of information.


Fig. 4.1: Graphic display of singing in a nightingale. Top: Frequency spectrograms of four song-types (taken from a longer sequence of songs). Bottom: Spectrogram of the fourth song. Numbers below show the ID numbers of element-types (e1, e2, ... e10). Greek letters indicate song sections. In nightingales, Alpha-sections are low in volume, whereas Beta-sections consist of louder element complexes or motifs. Gamma-sections are made up of element repetitions, which results in a rhythmical structure of this song part (trill). Finally, Omega-sections are made up of one unrepeated element. This figure illustrates a structural hierarchy given by elements, song sections, songs, and sequences of songs.

A song is long enough to convey a distinct message and, at the same time, not so long as to constrain a sensory check for signals of a neighbor or delay a potential reply (15). In addition, songs form an intermediate level of a structural hierarchy in which the highest level is given by an episode of singing or a sequence of songs (term: inter-song level). On hierarchically lower levels one can distinguish several structural compounds that compose the songs (term: intra-song levels). In a top-down order these are, for example, song sections or phrases, motifs, syllables, and elements or notes (Fig. 4.1). The numbers of both phonetic song constituents and discerned intra-song levels vary considerably across species. Most analyses, therefore, concentrate on the basic level of song organization, which is given by the so-called 'song elements'. At this level units are compared and, according to parametric cues or values assessed by frequency spectrography, either told apart or lumped. The pool of classified song elements is then taken to categorize the songs and thus assess the repertoire of song-types (21-26).

Bird species differ in the sizes of their song-type repertoires (review in 13): chaffinches (Fringilla coelebs) and great tits (Parus major), for instance, sing three to eight different songs, whereas repertoire size in Eurasian blackbirds (Turdus merula) is approximately 50, and common nightingales (Luscinia megarhynchos) even master more than 150 types of song (18). In spite of such species-typic diversity, the composition of vocal repertoires reveals a basic principle in most songbirds: the sizes of element-type repertoires are larger than the sizes of the song-type repertoires.

Lessons from the structural hierarchy of bird song have stimulated studies on the rules encoded in the sequencing of songs or song elements, respectively, and these rules have been described by a procedural hierarchy (27). The concept of a procedural hierarchy makes it possible to investigate the significance of sequential positions, namely by assuming that a given pattern can be taken as a 'hierarch' for the next unit in the sequence. Such studies showed that the sequencing of songs reflects a remarkably high degree of freedom. That is, in principle, no song-type succession is excluded, and this facilitates pattern-specific responses during vocal interaction. Nevertheless, one can find preferred sequential combinations of particular types of songs, which according to recent studies can be explained as a result of individual learning (see chapter 4.2). On the intra-song level, however, the sequencing of elements is much less flexible than that of songs. With the Eurasian blackbird and the common nightingale as typical species, the procedural hierarchy on the intra-song level can be described by the following rules. First, particular types of elements occur at a particular song position only. Second, songs share initial types of elements, but differ in the subsequent ones. Thus the intra-song branching usually reflects a 'diffluent flow' schema (one-to-many principle). In terms of decisional aspects this means that a bird has a number of alternative options to continue a song after the first element has been produced (4; 9; 28-30).

To summarize, the singing of birds reveals a complex structure, and there are distinct rules that specify its dynamic properties. The level of songs is particularly significant to both individual and interactional aspects. As birdsong is a learned behavior, the question arises how birds acquire and develop their vocal competence and how signals are stored in an individual's memory. There are numerous demonstrations that the 'song' plays an important role here, too. Most direct support has been collected through learning experiments, and much of the remaining part of this chapter will deal with this issue. In particular, we will review evidence on whether rules about the hierarchical organization of singing behaviors play a significant role in birds that acquire and use large repertoires of vocal patterns.
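To make the structural hierarchy and the 'diffluent' branching concrete, here is a small data-structure sketch. The element and song-type labels are invented placeholders under our own assumptions, not taken from actual nightingale repertoires.

from dataclasses import dataclass, field

@dataclass
class Song:
    song_type: str
    sections: dict = field(default_factory=dict)   # e.g. {"alpha": [...], "beta": [...], ...}

# A singing episode (inter-song level) is a sequence of songs; each song is built
# from sections, which are built from elements (intra-song levels).
episode = [
    Song("S1", {"alpha": ["e1"], "beta": ["e2", "e3"], "gamma": ["e4"] * 5, "omega": ["e5"]}),
    Song("S2", {"alpha": ["e1"], "beta": ["e2", "e3"], "gamma": ["e6"] * 4, "omega": ["e7"]}),
]

# Diffluent (one-to-many) flow: two songs may share their initial elements and then branch apart.
shared_prefix = episode[0].sections["alpha"] + episode[0].sections["beta"]
assert shared_prefix == episode[1].sections["alpha"] + episode[1].sections["beta"]
print("Shared initial part:", shared_prefix)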

4.2 Song Learning: Acquisition of Rule-Related Knowledge

Questions of whether and how birds extract and acquire rule-related information about their songs have been studied in a number of species. These include, for instance, zebra finches (Taenopygia guttata; 31), song sparrows (Melospiza melodia; 32), marsh wrens (Cistothorus palustris; 33), canaries (Canarius serinus; 34), starlings (Sturnus vulgaris; 35), and the common nightingale (Luscinia megarhynchos). Most of these approaches focused on the intra-song level, and systematic investigations on the learning of information encoded on the inter-song level are currently available only for the nightingale. This species is renowned for an outstanding vocal virtuosity, and the repertoire of an adult individual may comprise 200 different types of songs which are performed in a versatile singing style.

4.2.1 Methodological Aspects

The selection of an appropriate model species can play an essential role in biological research. Several characteristics make nightingales good candidates for studying song learning in the laboratory as well. First, the early period of auditory song acquisition begins around day 15 post hatching and continues for at least the first three months of life, thus providing an extended time span in which to conduct learning experiments. Second, young nightingales readily accept a human caretaker as their social tutor (36; 37), thus allowing us to standardize variables (e. g. by presenting our master songs through a loudspeaker) and also enabling us to check for factors which affect the acquisition process (e. g. by audiovisual control). Third, nightingales readily develop excellent copies of conspecific songs presented in a laboratory learning setting, which often is a problem in other oscine species. Fourth, auditory and motor learning are temporally separated by several months, which allows us to control for interactions between the two processes.

To compose a learning program, particular songs are selected at random from our catalogue of master song-types and recorded on tape to form a particular string of master songs. In the standard design, each song in a string is a different song-type and, likewise, each of the different strings to which a subject is exposed during the period of tutoring consists of a unique set of song-types. We thus label a particular tutoring situation or regime by the particular string which is played during it. The acquisition success of the tutored males then allows inferences on whether and how a particular exposure variable influenced their singing (38). Additional checks for an impact of variables are done by an analysis of audiovisual recordings, which permit access to, e. g., an individual bird's motility during a given tutoring experiment (39).

Song acquisition depends on exposure variables (Fig. 4.2). In some respects, however, it can be remarkably resilient to changes in such variables. For example, while the acquisition success of nightingales is low (approx. 30%) for songs experienced only five times, birds imitate around 75% of those song-types which they heard 15 times. On the other hand, a more frequent exposure does not significantly improve acquisition success. Also, the number of songs in a string can be considerably increased (e. g. from 20 to 60 song-types) without raising exposure frequencies accordingly. As the birds cope well with such an increase in the number of stimuli to be acquired (40), the results contrast with paradigms from learning theory, which state that exposure frequency must grow proportionally with the number of stimuli to be acquired (review in 41).
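The composition of such learning programs can be illustrated with a short sketch: disjoint sets of song-types are drawn from a catalogue and each set defines one tutoring string (regime). The catalogue size, function names and string sizes below are hypothetical.

import random

def compose_strings(catalogue, n_strings, songs_per_string, seed=0):
    """Draw disjoint sets of song-types from a catalogue; each set is one tutoring string."""
    rng = random.Random(seed)
    shuffled = catalogue[:]
    rng.shuffle(shuffled)
    needed = n_strings * songs_per_string
    if needed > len(shuffled):
        raise ValueError("catalogue too small for disjoint strings")
    return {f"string_{i + 1}": shuffled[i * songs_per_string:(i + 1) * songs_per_string]
            for i in range(n_strings)}

catalogue = [f"type_{k:03d}" for k in range(1, 201)]      # placeholder catalogue of 200 master song-types
regimes = compose_strings(catalogue, n_strings=3, songs_per_string=20)
print({name: songs[:3] for name, songs in regimes.items()})   # first items of each string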


Fig. 4.2: List of variables affecting the acquisition of song-types.

Implications of our findings are relevant to the issue of song acquisition as a special process or template learning (42), and any inquiry into the memory mechanisms has to take into account that specific adaptations are involved in the process. The functional properties of the acquisition mechanisms can be traced in a broad array of experiments that include the perceptual and the motor phase of song development. In the following we will exemplify representative findings from our non-invasive, behavioral approach.

4.2.2 Discontinuous versus Incremental Processes

The fact that a long-lasting memory of complex stimuli ensues from remarkably few exposures has motivated debates on whether the learning of a stimulus proceeds in a discontinuous or an incremental manner. In studies on human serial item learning, the conception of an 'all or none' process was favored, because phenomena of apparently incremental learning could be explained as a cumulative effect of several 'all or none' processes (43). For a number of reasons song acquisition is a particularly appropriate system to further elucidate this issue. Acoustic signals are volatile stimuli; thus the succession of stimuli directly relates to a succession of perceptual episodes, and acquisition should ensue instantaneously upon hearing. Concurrently, and because of this particular property of acoustic stimuli, the experimental quantification of 'amount of exposure' is straightforward. This is different from behavioral studies on visual learning (e. g. imprinting or food storing), where exposure is expressed as the duration of a trial during which subjects are exposed to all features of a complex stimulus simultaneously. In addition, the problem of stimulus novelty or familiarity, which is inherent to experiments on human verbal learning, can be objectively addressed by presenting naive fledgling birds with stimuli which they had not experienced before.

To examine whether stimulus acquisition in the song learning of birds is effected in a discontinuous or incremental manner, we studied the sequencing of imitations which hand-raised nightingales had developed from master song-types to which they had been exposed during earlier tutoring experiments (44).


Different from our standard procedure, where song sequencing in a string of master song-types is kept constant over the whole tutoring period (e. g. over 20 exposures to a given string), song sequencing was here rearranged anew between exposure trials. As we did the sequence modification by rearranging a given set of song-types, the birds experienced any model song-type sufficiently often for pattern imitation to occur (45; 46). In contrast, they experienced a given sequence version much less often, because the succession of song-types was modified across trials. The number of exposures to a sequence version was varied by using two different experimental settings. In one of the learning programs we modified the sequencing of model song-types after every presentation, so that birds experienced any sequence version only once (total variation = TOTVAR). In the other program, sequential variation of a string was reduced (partial variation = PARVAR). Here, birds repeatedly experienced one sequence version at the beginning and another at the end of the tutoring, whereas the trials in the middle were each run with a new sequencing of song-types. Thus, besides a raised exposure frequency to particular sequence versions, these versions were also placed at distinct positions of the tutoring, i. e. the initial and the final trials of that experiment. For control, subjects were additionally exposed to a master string with no sequence modification (no variation = NOVAR).

Analyses conducted to identify the impact of these tutoring regimes showed that all subjects had developed imitations that could be unambiguously assigned to one of the master song-types in the tutored strings. In addition, the acquisition from both sequentially varied master strings was remarkably high. That is, the acquisition was not impaired by the procedure of rearranging items in a master string. This finding served as a prerequisite for checking whether the birds had also acquired information encoded in the succession of song-types, or the master string composition, respectively. Our analyses yielded the following results. In all subjects exposed to the PARVAR program, performances best fitted the first string version played to them (exposures 1 to 5). In contrast, there was no indication that serial information from any of the next five string versions (exposures 6 to 10) had been acquired. Birds produced song transitions which they had heard in the final string version (exposures 11 to 15) slightly more frequently than those heard before, but the prevalence of the first exposures was still significant. Subjects exposed to the TOTVAR experiment showed the same characteristics: although these birds had experienced any string version only once, their performance revealed that they had acquired song-type sequencing from the first string version played to them. Different from the PARVAR experiment, the last string version was not emphasized among the other sequences in its effect on serial learning. A comparison of the amount of serial learning from the three tutoring regimes revealed that the performance improved with more frequent exposures (Fig. 4.3).

In summary, these findings suggest that the acquisition of serial information encoded in a sequence of songs operates in a discontinuous, 'all or none' manner and, at the same time, is enhanced gradually, i. e. through an incremental process. These findings prompt questions on whether a similar operation could be postulated for acquisition processes at a more basic level of song organization, e. g. the structure within songs.
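The three tutoring regimes can be summarized as exposure schedules in a short sketch; the toy string of four song-types, the 15-exposure split and the function name are our own illustrative assumptions, not the original protocol code.

import random

def schedules(song_types, seed=1):
    """Toy exposure schedules for the NOVAR, TOTVAR and PARVAR regimes (15 trials each)."""
    rng = random.Random(seed)
    novar = [list(song_types)] * 15                                                 # fixed order every trial
    totvar = [rng.sample(song_types, len(song_types)) for _ in range(15)]           # new order on every trial
    first = rng.sample(song_types, len(song_types))
    last = rng.sample(song_types, len(song_types))
    parvar = ([first] * 5                                                            # first version repeated
              + [rng.sample(song_types, len(song_types)) for _ in range(5)]          # middle: new order each trial
              + [last] * 5)                                                          # last version repeated
    return {"NOVAR": novar, "TOTVAR": totvar, "PARVAR": parvar}

s = schedules(["A", "B", "C", "D"])
print(s["PARVAR"][0], s["PARVAR"][7], s["PARVAR"][-1])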

Fig. 4.3: Acquisition of serial information in the TOTVAR experiment. Each panel refers to the data of one male. The curves are cumulative frequency plots of the sequential intervals assessed between consecutive imitations. Curves labeled 'Control' give data on learning from the NOVAR string. Curves labeled 'First' or 'Last' refer to intervals concerning the first or last sequence version, respectively. Intervals referring to any of the remaining sequence versions (n = 13) are labeled 'Rest' (see methods for details).

To examine this issue, birds were allowed to experience a set of song-types that, in particular parts, were experimentally modified upon subsequent exposures. In short, we again found a strong tendency to imitate those versions which the birds had experienced during the first encounters ('primer effect', 44). This effect persisted when exposure to later versions was up to four times more frequent than exposure to the first ones.


Viewed from a biological perspective, such acquisition mechanisms appear clearly adaptive. During song acquisition in a natural context, birds would not be exposed to stereotyped sequences or a frequent occurrence of a given song-type. Rather, there is considerable variability in the song delivery of a conspecific tutor in the field. To acquire the various song patterns from such a natural tutor, it seems appropriate to start a memory trace right from the first exposure and to use that trace as a reference which information acquired during further exposures can update and/or ameliorate. Behavioral support for a particular salience of the first exposure(s) for the birds' perceptual attention comes from the experiments of Müller-Bröse and Todt (39). They showed that the motility of birds during the tutoring covaried with the number of repeated exposures to song strings: it was low during the first exposure and increased with more frequent string presentations, but dropped again when a new set of song-types, i. e. a new master string, was presented.

Both the particular significance of the first exposure and the incremental effects of exposure frequency on the serial performance of acquired imitations are relevant to neurobiological concepts of the song acquisition system (rev. 47-49). Neuromorphological correlates of acquisition processes, as first demonstrated by Scheich and coworkers for both imprinting and song learning (rev. in 50), can be outlined as a decline in synaptic connectivity (reduction of spine synapses) along with learning-related neuronal activity. Based on their reasoning for imprinting, specific information from the first stimulation would induce synaptic selection or regression. Concurrently, functional (i. e. active) connections would be consolidated by repeated input (Hebbian type of learning, 51).
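To make the Hebbian idea concrete, here is a generic toy update rule of the kind alluded to: repeated co-activation strengthens a connection while unused connections decay. This is a textbook-style illustration under our own assumptions, not a model of the cited neurobiological work.

import numpy as np

def hebbian_step(weights, pre, post, lr=0.1, decay=0.02):
    """One update: strengthen co-active connections, let inactive ones regress."""
    coactivation = np.outer(post, pre)                          # Hebbian term: post_i * pre_j
    weights = weights + lr * coactivation                       # consolidation by repeated input
    weights = weights - decay * weights * (coactivation == 0)   # regression of inactive synapses
    return weights

rng = np.random.default_rng(0)
w = rng.uniform(0.4, 0.6, size=(3, 4))          # initial dense connectivity
pre = np.array([1, 0, 1, 0])                    # input pattern repeatedly heard
post = np.array([1, 1, 0])
for _ in range(20):                             # repeated exposures
    w = hebbian_step(w, pre, post)
print(np.round(w, 2))                           # co-active connections grow, the rest shrink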

4.2.3 Preordained Knowledge: Acquisition of Song Structure

The accuracy of the imitations that birds develop from model songs heard only a few times may be remarkably high. Such accomplishments raise questions on whether and how inherent, preordained knowledge about the species-typical song structure comes into play during acquisition (52). We approached this kind of question by testing the effects of modifying the structural rules of song composition in a way that violated its species-typic syntactical organization.

The structure of a song is usually characterized by a division into several sections. A typical nightingale song has four of them (cf. Fig. 4.1). The first section (Alpha) holds one to three elements that are low in volume and short in duration. The next section (Beta) is louder and typically consists of note complexes or motifs, which give that song part a tonal or melodic structure. In contrast, the third section (Gamma) is clearly rhythmical (trill section). Most songs also have a final section (Omega) that is filled by one unrepeated element. Other types of syntax features become clear when song sections are compared across different song renditions and flow charts of elements are inspected. Successive songs, for instance, often begin with the same type of Alpha-element. In addition, the transition between Alpha- and Beta-sections, as well as between Beta- and Gamma-sections, usually coincides with a number of 'decision points'.


Here, flow charts of elements show a characteristic sequential branching that results from alternative continuations of song patterns (= diffluent flow schema; 53).

In our experiments on the acquisition of structurally modified master songs, we assessed the following effects:

(1) Modifications to Alpha-sections: We omitted the introductory song elements. The modified songs were acquired at a rate comparable to that of normal songs. However, during song development the birds often invented Alpha-sections that were consistent with their species-typical patterns.

(2) Modifications to Beta-sections: We repeated the normally unrepeated element complexes of the Beta-sections. The modified songs were not acquired by the birds.

(3) Modifications to Gamma-sections: We tripled the length of Gamma-sections by increasing the number of syllable repetitions. These songs were acquired at a rate comparable to that of normal songs. However, the birds reduced the number of syllable repetitions to the species-typical range.

(4) Modifications to Omega-sections: We repeated the normally unrepeated terminal element. These songs were acquired at a rate that was comparable to that of normal songs. During early stages of song ontogeny (plastic singing) the birds imitated the modifications. During adult singing, however, the birds corrected their imitations and produced only a single terminal element.

(5) Modifications to inter-song boundaries: We exposed nightingales to 'super songs' which were generated by experimentally erasing the silent inter-song interval that normally segregates two successive master songs. The birds initially imitated the modified master songs as a sort of super-compound, but tended to split these compounds into two different song-types during their adult song performances.

Taken together, these findings suggest that the birds have access to a 'concept' of a species-typical song. Such concepts have been described as predispositions allowing young birds to selectively acquire appropriate, i. e. species-typical, songs from a noisy acoustic environment (42). As the birds in our studies were also exposed to natural, i. e. unmodified, song patterns, we cannot exclude that such predispositions were shaped through experience as well. In any case, in most of the tested syntactical inconsistencies, prescriptions from preordained or acquired information did not restrict the acquisition during auditory exposure. That is, the majority of modifications were initially accepted by the birds and produced as imitations during the early stages of vocal development.


Only after a certain amount of vocal practice had been accomplished did an 'analysis' based on structure-dependent knowledge come into play to set performance parameters according to species-typic rules. Thus, the findings presented here may be better accounted for by a rule-bound selection process.

The hypothesis of a rule-bound selection process was supported by a recent experiment that examined rules inherent to the procedural hierarchy of information flow within song patterns. It was designed to violate the rule that two songs which differ in their first part should not share the pattern of their second part. To recall, a partial phonetic similarity of song-types is quite common in the individual nightingale, but it is constrained to a sharing of initial parts such as the Alpha- and the Beta-sections. To examine the significance of this rule, we presented young nightingales with pairs of master songs that were different in their first part (Alpha- + Beta-section) but shared the pattern of their second part (Gamma- + Omega-section). Since such pairs do not occur in a male's normal repertoire, we synthesized the test songs experimentally. As a control, birds also heard pairs of master songs which shared the pattern of their first part but were different in their second part, and which thus followed the species-typic rule of intra-song organization (Fig. 4.4a). We observed two outcomes.

Fig. 4.4a: Frequency spectrograms of four master songs to which nightingales were exposed during a learning experiment. Songs given on the left were congruent in their Alpha- and Beta-sections but differed in their subsequent parts, thus complying with the species-typic decision flow (diffluent branching). Songs given on the right were experimentally synthesized according to a confluent branching, which does not occur in the birds' normal song composition. Such song patterns were used to examine whether a violation of species-typic rules would impair song learning.



Fig. 4.4b: Results of the learning experiment introduced in Fig. 4.4a. Data are given for five nightingales (M1-M5). Horizontal columns show proportions of vocalized imitations derived from two master song-types that were related either by a diffluent intra-song branching (left) or by a confluent one (right). Data refer to three stages of song development (early plastic, late plastic, crystallized).

First, in the experiments with the biologically 'adequate' song pairs (control), males developed imitations of both of the tutored master song versions. These imitations were performed at about equal rates throughout song ontogeny, and all of them persisted in the birds' repertoires. Second, in the test design the majority of nightingales also developed imitations of both of the tutored master song versions. During ontogeny, however, there was a marked decline in the performance rate of one of the pair-wise similar imitations. In most cases, then, only one song version 'survived' to the end of song crystallization (Fig. 4.4b). As in the previous experiments, our findings are best explained by a rule-bound selection process that shapes the performance according to the species-typical organization. Interestingly, the present case concerns a syntactical rule that refers to the composition of song repertoires and that poses constraints on possible song variants (allowing for diffluent branching only). Such a mechanism was hitherto unknown for the song development of birds.
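To make this branching rule concrete, the following minimal sketch (in Python; it is our illustration, not part of the original study, and the two-part representation and song labels are hypothetical simplifications) checks a small repertoire, each song-type reduced to a first part (Alpha + Beta) and a second part (Gamma + Omega), for violations of the diffluent-branching constraint: sharing a first part is permitted, whereas sharing a second part while differing in the first is not.

# Illustrative sketch only: song-types are reduced to (first_part, second_part) labels.
# Diffluent branching (shared first part, different second parts) is allowed;
# confluent branching (different first parts, shared second part) is not.
from itertools import combinations

def confluent_pairs(repertoire):
    """Return all pairs of song-types that violate the diffluent-branching rule."""
    violations = []
    for (name_a, (first_a, second_a)), (name_b, (first_b, second_b)) in combinations(repertoire.items(), 2):
        if first_a != first_b and second_a == second_b:
            violations.append((name_a, name_b))
    return violations

# Hypothetical repertoire (labels loosely follow Fig. 4.4a):
repertoire = {
    "C": ("alpha1+beta1", "gamma1+omega1"),   # C and D share their first part: diffluent, allowed
    "D": ("alpha1+beta1", "gamma2+omega2"),
    "K": ("alpha2+beta2", "gamma1+omega1"),   # K shares its second part with C: confluent, flagged
}
print(confluent_pairs(repertoire))            # [('C', 'K')]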

4.2.4 Hierarchical Organization: Implementation of Levels

The large repertoire of distinct song-types in nightingales allows us to study the impact of learning on different levels of song organization. In the following we describe how learning affects levels beyond that of a song and how it can shape, e. g., the performance of long sequences of songs (inter-song level). We have investigated this issue by exposing nightingales to a large number of master songs. During the tutoring, these songs were presented in a distributed manner, i. e. as units of several different master strings, each of which the birds experienced 20 times. When the song repertoires performed by the trained birds were examined for rules of song-type sequencing, we found a surprising effect: the nightingales' singing clearly reflected information about the context of master song presentation. In other words, males produced imitations acquired from the same master string as sequentially associated song-types, segregated from imitations acquired from the other strings. This achievement was described by the term context effect (54), and explained as follows. Nightingales acquire and memorize a given string of master songs as a kind of super-unit which, upon retrieval, they separate from another super-unit, i. e. from songs learned from another master string. These super-units are termed context groups. The size of such groups is determined by the length of a tutored string, i. e. the number of different master songs that a bird had experienced in the same temporal context. Further experimentation allowed us to characterize the time between exposures to different master strings necessary to generate a context effect. If strings were separated by less than five minutes, no clear context effect could be assessed (55).

The formation of context groups turned out to be a basic feature of song learning in our birds. However, detailed inquiries into the song sequencing of context groups revealed another significant effect. Within performance episodes of context groups, we detected smaller subsets of sequentially associated song-types, which we called 'packages' (45). In functional terms, package formation describes the fact that a larger body of serial data (i. e., information from a string of master songs) is segmented into subsets of sequentially associated items. Such associations have a number of characteristics. First, the packages describe song-type associations which are hierarchically inferior to the context groups. Consequently, one can sketch a structural hierarchy that in a top-down order reads as follows: context groups are composed of song-type packages, and packages are composed of songs. Second, each package holds only a limited number of song-types, and frequency distributions of package sizes reveal a prominent peak between three and five types of songs. Third, within a package, sequential associations among song-types are stronger than associations between different packages. Nevertheless, song-type sequencing within a package can deviate from a unidirectional succession (schema: A-B-C-D-E) and show more flexible permutation modes (e. g. A-B-A-E-D-E). Such complex combinatorial relationships among package members indeed substantiate the notion that packages establish a level of hierarchy in the representation of song-types. Finally, different birds exposed to the same master string usually form different packages or show different boundaries between packages. Therefore, packages can be characterized as self-induced associations.

In the course of further inquiries into the mechanisms underlying package formation, packages were hypothesized to be the result of a process that segments the information about a succession of master songs which had been temporally coherent during exposure. This raised the issue of when after exposure, and where in the subsystems mediating between auditory exposure and vocal production of songs, such a segmentation would take place. Theoretically, it could be part of the acquisition, the storage, or the retrieval system (45). To date there is substantial evidence suggesting that package formation results from a process which segments serially presented information during auditory acquisition. One study used an indirect approach and tested nightingales raised in acoustic isolation; the examination of whether song-type packaging would occur in these birds yielded negative evidence (56). In a more experimentally oriented approach (57), nightingales were exposed to strings of master song-types which, instead of normal inter-song intervals (4 s), contained either prolonged (10 s) or reduced (1 s) interval durations. If the size of the developed packages had not been affected by these treatments, this would have favored the hypothesis that package formation is a retrieval-based phenomenon. The results allowed us to reject this hypothesis. Rather, the study provided evidence that package formation is a correlate of auditory learning and is affected by two properties of the acquisition mechanism: a capacity-constrained memory buffer and a time window (Fig. 4.3). Additional support for package formation as an early achievement of song learning came from a third line of evidence: the package effect gets masked when the serial order of performed song-types becomes more stereotyped. This could be induced, for instance, when the exposure frequency to stereotyped master strings during acquisition was high, e. g. 100 times ('serial order effect', 55). In summary, our studies on nightingales have shown that birds acquire information encoded in the serial succession of master song-types presented during their training. Concurrently, the birds develop additional hierarchy levels such as packages and context groups. In contrast to the self-induced packages, both serial song order and context groups clearly reflect the organization of inputs during auditory learning and are consequently distinguished as exposure-induced compounds.
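For readers who prefer a schematic notation, the structural hierarchy described above can be pictured as nested containers; the short sketch below (in Python) is purely illustrative, and the class names, labels and package sizes are our assumptions rather than data from the experiments.

# Minimal sketch of the hierarchical representation: context groups are composed
# of packages, and packages are composed of song-types (typically 3-5 per package).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Package:
    song_types: List[str] = field(default_factory=list)

@dataclass
class ContextGroup:              # all songs acquired from one master string
    packages: List[Package] = field(default_factory=list)

    def all_song_types(self) -> List[str]:
        return [s for p in self.packages for s in p.song_types]

# Hypothetical repertoire with two context groups:
repertoire = [
    ContextGroup([Package(["A1", "A2", "A3", "A4"]), Package(["A5", "A6", "A7"])]),
    ContextGroup([Package(["B1", "B2", "B3"]), Package(["B4", "B5", "B6"])]),
]
print(repertoire[0].all_song_types())   # ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7']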

4.2.5 Extraction of Cues Encoded in a Learning Design

The discovery of the 'context effect' stimulated us to examine in more detail how the birds would treat specific cues that we encoded in a given learning design. In particular, we wanted to know whether nightingales would be able to extract and also memorize special non-auditory stimuli that were paired with auditory stimuli. To test for such an ability we exposed our birds to light flashes that were generated by a stroboscope and presented synchronously with the sound patterns of four different sets of master song-types. As a control, we used other sets of song-types played back without strobe-light stimulation. Analyses of the songs produced by our experimental birds showed that these learning programs had affected both the acquisition process and the performance mode of acquired songs. For instance, under the strobe-light regime males with a low overall learning success acquired more songs than under the control regime. Such differences were less marked in males with a high overall learning success. We explain the increase in the test birds' acquisition as mediated by an increased attention or arousal in individuals who normally would score as poor learners. There were two other effects induced by the strobe-light tutoring: both groups of males preferred to sing strobe imitations over those experienced through hearing alone, and, in addition, birds tended to sequentially cluster imitations developed from either regime, although the respective master songs had not been presented as a coherent sequence during the tutoring. As none of these effects could be accounted for by regime-related differences in the acoustic stimuli, we conclude that the master songs presented in the strobe-light regime were encoded with some kind of 'flagging' which was stored in long-term memory and eventually influenced the performance.

For a long time, bird song has been viewed as developing from non-associative signal acquisition. The findings on both the raised performance frequency and the regime-related clustering of imitations now show that stimulus pairing can be effective in song learning too. Thus, at least nightingales are able to associate the acoustic input with extramodal stimuli, in this case light. Unfortunately, critical tests of associative learning, i. e. the experimental cueing of the behavioral performance, could not be applied here: when presented during the birds' own singing, the light stimuli experienced during auditory acquisition appeared to be aversive, and the birds interrupted their performance (see chapter 4.3.2). Referring to our current model of the hierarchical representation of acquired information, the findings can be accommodated by postulating some kind of 'top-level hierarch' that links song material acquired from different strings according to the tutoring context. From an evolutionary perspective, the fact that this accomplishment became obvious with such simple stimuli as used here documents that the song acquisition system of birds is extremely sensitive or 'predisposed' to the processing of contextual information associated with the acoustic experience. From all that is known about the functional significance of singing behaviors, such an adaptation makes strong sense. Therefore it is now expedient to examine the potential impact of other kinds of cues that are biologically more relevant than the tested ones. Candidates would be, for instance, categories such as 'song-types heard at a particular location', 'song-types heard from a particular individual', or 'song-types specified by a particular quality'. Functionally, an association and cue-related sorting of acquired song material with these categories would allow the bird to individually adapt repertoire delivery to its acoustic and social environment (cf. Fig. 4.5).

[Fig. 4.5 panel labels: top, 'Learning Programs = Succession of Master Strings / Songs'; bottom, 'Singing Programs: Succession of Song-types (s), Packages (p), Context groups (c)'.]

Fig. 4.5: Illustration of the two hierarchical levels (packages, context groups) detected above the level of songs when we analyzed the singing of nightingales (bottom). The birds had been tutored with three master strings that were composed of different master songs (top).

Field data supporting the view that associative learning may play an important role in song acquisition have been obtained from the thrush nightingale (Luscinia luscinia), a twin species of the common nightingale. In areas where the two species live sympatrically, genetic thrush nightingales are found that sing songs of both species. According to phonetic and syntactic characteristics, their different songs could be sorted into three classes: nightingale songs, thrush nightingale songs, and songs with mixed characteristics. This strongly suggests that these males had learned from tutors of both species. Their performance was clearly not random with respect to the sequencing of these pattern classes. Rather, males performed them sequentially clumped in coherent bouts. Thus, the temporal and social segregation of exposure contexts was reflected in the singing program of the adult birds.

4.3 Retrieval of Rule-Related Knowledge: Evidence from Song Performance

Analyses of song performance are a prerequisite in the behavioral approach to both rules of song acquisition and rules of song retrieval. This is why basic findings have already been mentioned in the preceding chapter. In the following, we will exemplify further features of song performance. We begin with a list of rules that become apparent during the ontogenetic development of song in birds which had previously been exposed to specific learning programs.

The ontogeny of singing in nightingales shows a number of traits that are widespread across oscine birds (58; 59). For example, in the typical case, the early phase of auditory learning is segregated from the phase of vocal production by an interval of several weeks. Vocal activity of the young bird then covers a longer time span, often lasting for several months. Early in life, all birds perform temporally coherent arrays of vocalizations that at first are phonetically amorphous and only gradually improve in terms of form and structure. Due to the high variation of these vocal patterns, such behaviors have been compared to the playful activities found in young mammals. Referring to the profile of the developmental progress, Marler and Peters (32) have suggested a tripartite model which distinguishes among (a) subsong, (b) several stages of plastic song and (c) crystallized full song. The study of these stages allowed us to identify a set of ontogenetic trajectories that will be described in the following paragraphs. Again we will focus on song development in nightingales to provide a comparison to the characteristics of its song acquisition. The subsong of birds consists of soft and rambling vocalizations which are rather difficult to analyze. Therefore, the stage of plastic song is the stage of choice for investigators who search for rules of pattern development. In nightingales, which have a particularly extended period of vocal ontogeny, plastic singing starts at an age of about eight months, i. e. in January. Then the first precursors of acquired imitations can be discerned, and over time an increasingly larger number of song-type precursors can be identified. The completion of the repertoire of song-type precursors takes a period of several weeks. During this process, imitations of master songs that the birds heard early in life do not emerge earlier than imitations of others that they had experienced later during the tutoring. In other words, the temporal order of song-type production does not reflect the temporal order of auditory acquisition.

4.3.1 Rules Reflected by Trajectories of Song Development

Ontogenetic trajectories are found on both hierarchy levels, that is, within and between songs. Trajectories expressed at the intra-song level concern the following traits. During early stages of ontogeny, birds often sing incomplete songs, i. e. some song constituents may be missing. In addition, the serial succession of song sections may be inverted, which results, for instance, in the final trill-section of a song being produced ahead of the normally preceding note complex. Thus the intra-song syntax is not initially stereotyped. At the same time, however, the phonetic morphology of patterns is sufficiently elaborated to allow for an easy identification of song-type precursors. In other words, pattern phonetics takes its adult form ahead of the intra-song syntax, which 'crystallizes' only at about ten months of age. Interestingly, the phonetic or syntactic quality of imitations produced relatively late in ontogeny is not inferior to the quality of imitations produced at an earlier age. This suggests a developmental trajectory which does not build on vocal 'experience' with a particular output, but concerns a general progression in motor competence or skill (60).

Besides trajectories concerning the pattern structure of vocalizations, ontogenetic progression also proceeds in the time domain of singing. As both the duration of vocal compounds and their temporal segregation have to be shaped, trajectories in the time domain are highly interrelated and follow complex rules (61). The adult time structure of singing (songs alternating with silent intervals of about the same duration) is the last performance feature to crystallize; its adult form is achieved only at an age of about 11 months (cf. Fig. 4.6). On a higher level of song organization, the following rules could be characterized. Imitations which, in the adult performance, are identified as members of the same package emerge, quite consistently, together in time. In addition, throughout ontogeny these precursors are sequentially associated in the same way as in the adult singing (62). The association of different packages, i. e. the development of context groups, on the other hand, seems 'delayed', and in their fully grown form these can only be assessed close to the time of song crystallization. However, the time structure of singing gives some indication that they are significant levels of performance organization already during ontogeny. During the phase of continuous vocal production, for example, the intervals between imitations acquired from the same master string or context group were significantly shorter than the intervals assessed when the birds switched to imitations of another context group (46). In conclusion, on the inter-song level, the ontogeny of song material indeed reflects properties of the song-type association groups referred to earlier (see 4.2). Here, the trajectories do not only substantiate the view that these groups are memorized and encoded as higher-level units of song organization. In addition, they suggest that retrieval follows hierarchical mechanisms of action selection.

4.3.2 Repertoire Modification: Open versus Closed Processes

Motor development of song is not simply a process through which a bird improves the quality of acquired song material by vocal rehearsal. At least two further maneuvers merit a short consideration. One of them enlarges the repertoire of a bird, whereas the other one has the opposite effect. In nightingales, increasing the repertoire size is much more pronounced than decreasing it (63; 64). Repertoire enlargement is achieved by acquiring additional song-types, by developing new recombinations or, finally, by inventing novel songs. In bird species called 'age-independent learners' (52), further learning can occur during the phase of plastic singing or even later in life. When, at an age of nine months, for example, nightingales are exposed to a master string with two parts, a familiar portion (heard by the birds during the first period of song acquisition) and a novel portion (containing new song-types), one can observe two effects. First, the renewed exposure to the familiar song-types raises the performance frequency of imitations that the birds had acquired earlier. A similar finding has been reported for white-crowned sparrows, pointing to a process coined 'action-based learning' (65). Second, the birds acquire the novel song-types and perform them as an integral part of that context group from which the familiar song-type sequence was taken (66). These effects are in line with observations obtained from birds who, instead of being housed in isolation, are housed together, thus allowing them to vocally interact with each other. Here, both the composition of song repertoires and the performance preferences for shared song-types clearly converge. Thus, additional learning coupled with a shaping of the performance towards convergence can lead to a sharing of at least parts of repertoires among conspecific neighbors (18; 67-69).

A different strategy of repertoire enlargement, one that in contrast enhances the vocal individuality of a songster, is the development of new recombinations or novel inventions of songs. Nightingales, for instance, are able to generate individual-specific song-types by recombining parts of imitated songs in a novel way. Interestingly, such recombinations are limited to material of song-types associated within the same package (46). In addition, nightingales may develop song-types that do not contain any material from the learning experiments, and so are completely new. Both during ontogeny and in adult singing, genuine inventions occur as coherent subsets in the singing, which results in an alternation of performance phases containing either acquired imitations or novel inventions. Males classified as poor learners (low acquisition success relative to the presented master song-types) developed more inventions than males classified as good learners (acquisition success above 70%). Thus one may speculate that, at least in hand-raised nightingales, invented songs reflect a predisposition to develop a vocal repertoire of a certain size.

Upon reaching the final stage of song development (crystallization) birds may reduce their song-type repertoire. This phenomenon is especially marked in species that as adults use only small song-type repertoires (70; 71). In nightingales, repertoire constriction is much less conspicuous; only about five to eight percent of the imitations identified in the course of song development (66) are discarded from the final repertoire. Nevertheless, the ontogenetic 'history' of eventually discarded song-types makes repertoire constriction a quite interesting issue in nightingales, too. During the phase of plastic singing, these song-types are produced with a rather poor copy quality. Such a correlational finding has to be characterized more closely by further analyses.

4.3.3 Experimental Examination of Song Retrieval in Adult Birds

The validation of rules of song retrieval includes examining their consistency or variation when a songster is exposed to sensory stimulation. In the natural environment such cases arise during vocal interactions among conspecific neighbors (72). Both pattern-specific and time-specific relationships between the songs that neighbors exchange reveal that songsters listen and also respond to another bird's songs. The most conspicuous reply in the pattern domain has been described as an 'equivalent reaction' or 'vocal matching' (schema: X → X'). Here, a bird responds to a perceived song by producing a song of the same or a most similar pattern. Time-specific relationships between songs can influence the meaning of pattern-specific relationships. This is especially clear for interactions by vocal matching, which can occur either as a rapid response that temporally overlaps the song of a neighbor, or as a delayed response that begins after the neighbor has finished his vocal pattern. The point is that the message of the signal is encoded in its temporal relationship to the matched signal. That is, vocal matching may serve to address a particular songster, but superimposed on this matching is a message that is encoded in the temporal pattern of signal delivery (20). Inserting a matching song into the silent inter-song intervals may be a form of addressed greeting (73), whereas overlapping matching can constitute a strong agonistic reply. There is evidence that overlapping singing is responded to not only by a given addressee, but also by other birds who obviously evaluate such interactions between their neighbors (74).

Both the temporal and the pattern-specific interactions of birds have been explained by a system of intrinsic components that determine a particular form of reply. Based on the assumption that each vocalization of a song must be preceded by a decision about which song-type is to follow next, a mechanism was postulated which compares all variables affecting the retrieval of a particular set of song-types. Eventually, that song-type is vocalized for which the promoting influence from these components is highest (75-79). Currently, the hypothesis has been specified by the view that decisions take place in a hierarchical top-down process, i. e. first among context groups, then among packages and finally among song-types. Predictions from this dynamical model have been tested especially by exposing songsters to auditory stimuli. As the model treats song-types as 'holistic' pattern compounds, it does not account for decisions related to more elementary song units. Thus, hierarchy levels below song-types will have to be considered in future experimentation.

More recently we supplemented this approach by a study that, instead of acoustic patterns, applied light stimuli. The study served to specifically elucidate rules of song retrieval in our hand-raised nightingales. Stimuli were generated by a flash-light regime and placed at specific points within a bird's singing. We expected the stimuli to induce interruptions of singing, and thereby to answer two questions: (1) Would the birds interrupt their vocalizations within songs, or between songs only? (2) When and how, i. e. with which song-type, would the birds continue their singing? The results were straightforward (80). The majority of songs that had been hit by a light stimulus were interrupted, and such a response was observed with a latency of 0.05 to 2 s. This result showed that the performance of a nightingale song is not executed in a prefixed manner, but in a rather flexible mode. After song interruptions, birds continued their singing, and this was done by repeating the same song-type that had been hit by the light stimulus. The probability of that effect dropped markedly after a time window of 4 s. These findings were in line with a hypothesis suggesting that the mechanisms of decision making and song retrieval operate with a memory buffer which, after retrieval of a song from memory, holds the respective information until a specific 'off-command' (81). It has been assumed that an 'off-command' normally occurs after the production of the concerned song is terminated, and that it can be delayed, for example, because of incomplete pattern delivery, or as a consequence of perceptual inputs such as auditory or visual stimuli. In case of a delay, the decision process which determines the quality (here: the type) of the next song will favor the previous song-type as long as its information continues to be available in the memory buffer. Predictions of this model will be investigated by further experiments.
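The buffer hypothesis can be stated as a small decision rule. The sketch below (in Python) is our illustration only; the 4 s value is taken from the repetition window reported above, and the function names and the free-choice fallback are assumptions.

# Sketch of the memory-buffer / 'off-command' account of song resumption.
# Assumption: the interrupted song-type remains available in the buffer for
# roughly 4 s, the time window reported for repetitions after light flashes.
BUFFER_WINDOW_S = 4.0

def next_song_type(interrupted_type, seconds_since_interruption, choose_freely):
    """Predict the song-type that follows an interruption.

    choose_freely: callable standing in for the normal (hierarchical) selection,
    used once the buffer content has been cleared by the 'off-command'.
    """
    if seconds_since_interruption <= BUFFER_WINDOW_S:
        # No off-command yet: the buffer still holds the interrupted song-type.
        return interrupted_type
    return choose_freely()

# Hypothetical usage:
print(next_song_type("S17", 1.5, lambda: "S03"))   # 'S17' (repetition expected)
print(next_song_type("S17", 6.0, lambda: "S03"))   # 'S03' (buffer cleared)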

4.4 Conclusions: Processing of Rule-Related Knowledge in a Songbird

To recall, the singing behavior of birds reflects a structural hierarchy that, in a bottom-up order, is given by the units which compose a song pattern (intra-song level), by the songs, and finally by sequences which are composed of songs (inter-song level). Each of these levels reveals a differentiated organization and dynamics, and when this is examined one arrives at another kind of hierarchy, termed procedural. It is given, for instance, by the rules of unit succession and, for the versatile songster, these may reflect hierarchical mechanisms of decision making during singing. Bird song is a learned behavior, and this fact allows this signal system to be used for asking questions about the role that individual experience plays in the implementation of these rules. Our findings from learning experiments and studies of song development revealed a close linkage between the two hierarchy domains and confirmed that each of them is biologically relevant. Song-type associations, for example, do not only reflect a hierarchical representation of the memorized information (Fig. 4.5). Their characteristics also allow us to trace the properties of the underlying memory mechanisms.

4.4.1 Song Acquisition and Memory Mechanisms

Song acquisition in birds which, like nightingales, develop large vocal repertoires has been explained as a coordinated operation of three mechanisms: a 'short-term memory', a 'recognition memory' and a battery of 'submemories' (54). Properties of the short-term memory cause a segmentation of serially coherent master strings into different packages of information. There is evidence that this segmentation results from two constraints, a limited capacity and a time-constrained memory span (57). Properties of the recognition memory, on the other hand, identify stimulus patterns as novel or familiar and categorize information of the familiar patterns by song-type, package type, and context group. Acquired song material is then further processed in a battery of submemories, each of which stores information about a given string segment (a package). Parallel data processing in a battery of submemories would explain why long master strings are learned as effectively as short ones, even when heard only 10 to 20 times. However, since each submemory is supposed to hold information from a given string segment only, an additional process has to be postulated that somehow associates those packages that were developed from a given context group. The proposed acquisition system predicts that the first exposure to a master string would play a key role in the acquisition of serial information on song-type sequencing. This prediction was confirmed by experiments in which the serial succession of song-types in a master string was altered upon subsequent exposures during the tutoring (44; see chapter 4.2). Taken together, our studies bear on the distinction between 'experience-expectant' and 'experience-dependent' learning mechanisms (82). While the acquisition of the phonetic and syntactical rules encoded in a single song-type is reminiscent of an experience-expectant process, this would not account for the extraction and learning of information encoded in the rules of song-type sequencing (inter-song level).
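A minimal simulation of this acquisition scheme might look as follows; it is a sketch under stated assumptions (a buffer capacity of about four songs and an arbitrary time window chosen so that the reported effect of inter-song intervals on package size appears), not a parameterization given by the authors.

# Sketch of package formation during acquisition: a capacity-limited, time-windowed
# buffer segments a master string into packages; all packages derived from one
# string would then be associated into a single context group.
BUFFER_CAPACITY = 4      # packages typically hold 3-5 song-types
TIME_WINDOW_S = 30.0     # assumed upper bound on the buffered stretch of input

def segment_into_packages(master_string, song_duration_s=4.0, interval_s=4.0):
    """master_string: list of song-type labels heard in succession."""
    packages, current, elapsed = [], [], 0.0
    for song in master_string:
        if current and (len(current) >= BUFFER_CAPACITY or elapsed > TIME_WINDOW_S):
            packages.append(current)       # buffer full or window exceeded: close the package
            current, elapsed = [], 0.0
        current.append(song)
        elapsed += song_duration_s + interval_s
    if current:
        packages.append(current)
    return packages                        # one context group = all packages of this string

songs = [f"S{i}" for i in range(1, 11)]
print(segment_into_packages(songs))                    # [['S1'..'S4'], ['S5'..'S8'], ['S9', 'S10']]
print(segment_into_packages(songs, interval_s=10.0))   # longer intervals -> smaller packages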

4.4.2 Hierarchical Representation Format and Retrieval Rules

The representation of memorized information about songs appears to be organized in a hierarchical manner (Fig. 4.5). This can be concluded from analyses of song performance that uncovered specific rules of song retrieval. Some of these rules can be identified already during the ontogeny of singing. Thus, trajectories of the three hierarchy levels that, in a top-down order, were described as context groups, packages and song-types develop in a way which allows an early detection of their particular features. To recall one example, the temporal diversity of intervals within and between context groups (shorter duration within than at switches between context groups) presumably points to properties of intrinsic pattern choice or retrieval. The effect could imply that access to the stored representation of song-types is quick or delayed depending on whether a retrieval 'program' from a given context group is already 'on' (i. e. a non-switch) or not yet 'on' (i. e. a switch). Alternatively, the differences in intervals could reflect decision times for retrieving patterns acquired from the same or from different contexts (46). A hierarchically prestructured repertoire is a candidate mechanism that would facilitate retrieval in situations demanding rapid vocal responses, e. g. during interactive counter-singing. Especially in adult, territorial birds, centrally or auditorily mediated decisions on 'what to sing next' would not have to be made among the whole pool of developed song-types. Rather, both decision steps and decision time would be reduced by using a search routine that successively addresses only a particular subset of patterns. The adaptive value of a hierarchically organized representation format of song data is quite evident in birds which, like the nightingale, have to administer large repertoires. During vocal interactions, these versatile songsters may respond to each other by sophisticated rules, i. e. by pattern-specific and time-specific relationships between the mutually exchanged songs. For example, in a reply category termed 'rapid matching', a male has to identify a neighbor's song and at the same time select and retrieve a song of the same type from his own repertoire, all within a latency of only approximately one second (19; 83).
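The claimed reduction in decision steps can be illustrated with a toy search routine (our sketch; the repertoire layout and the scoring functions are hypothetical): instead of comparing a heard song against every song-type in the repertoire, the search first settles on a context group, then on a package, and only then on a song-type, so only one branch of the hierarchy is examined in detail.

# Toy illustration of top-down retrieval over a hierarchically structured repertoire.
# Repertoire contents and scoring functions are hypothetical.
def best(options, score):
    return max(options, key=score)

def retrieve(repertoire, group_score, package_score, song_score):
    group = best(repertoire, group_score)                  # 1st decision: context group
    package = best(repertoire[group], package_score)       # 2nd decision: package
    return best(repertoire[group][package], song_score)    # 3rd decision: song-type

repertoire = {
    "context_A": {"pkg_1": ["A1", "A2", "A3"], "pkg_2": ["A4", "A5"]},
    "context_B": {"pkg_3": ["B1", "B2", "B3", "B4"]},
}

# Hypothetical usage: the current interaction favours context_B and pkg_3,
# and 'B3' best matches the song just heard from a neighbour.
song = retrieve(
    repertoire,
    group_score=lambda g: {"context_A": 0.2, "context_B": 0.9}[g],
    package_score=lambda p: {"pkg_3": 1.0}.get(p, 0.0),
    song_score=lambda s: 1.0 if s == "B3" else 0.0,
)
print(song)   # 'B3'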

4.4.3 Comparative Aspects

Various perspectives on the vocal learning of birds invite a search for parallels in the development of communicative behaviors in general. Such parallels have been postulated for the role of interactional variables (84), and for predispositions or sensitive phases that guide the acquisition process (85). Our findings on the learning mechanisms allow us to add further facets to such a comparative framework. The formation of song-type packages, for instance, shows striking similarities to the chunking of information in human memory research (86; 87). In humans, chunking is conceived of as a cognitive coding strategy that reduces the load on short-term memory upon both acquisition and retrieval of learned items by consolidating them into 'units of sense', e. g. a category. In birds, a chunk equivalent is given by a song (strophe), and organizing acquired songs into hierarchically higher 'packages' allows the bird to administer and use large amounts of heard information. The functional significance of such a strategy can be concluded from the fact that songs are units of sense too, which play a particular role during the vocal interaction between neighbors (15).

There are good reasons to suggest that cognitive accomplishments are involved in the development and use of bird song (see also 84). An interesting case is given, for instance, when the birds' performance is organized according to categorical cues acquired during exposure to stimuli. Such cues may be temporal, spatial or social ones. The exposure-induced song-type associations developed by our nightingales, the context groups, highlight such accomplishments on higher levels of behavioral organization. A further example is the particular performance mode of invented songs, which might reflect a category of 'individual-specific song patterns', as opposed to those shared with a tutor (see chapter 4.3.2). Finally, there are other studies demonstrating that, e. g., warbler species do learn the situations in which to use their songs (88). Currently, however, it remains open on which particular cues such categorization is based and by which mechanisms a categorical representation would be achieved. This leaves us with the challenge to address this issue explicitly in forthcoming research on repertoire birds.

It is clear that any behavioral approach to the characterization of functional properties of control mechanisms warrants validation by inquiries into its neurobiological substrate. The specific anatomy of the birds' song control system, with a whole set of distinct and interconnected nuclei, has received extensive attention from neurobiologists and, during the last decade, exciting progress has been made in specifying the neural circuitry (reviews in 47; 48; 49; 89). Still, however, the unraveling of the neural mechanisms which are causal to song learning and production is only at its beginning. A noninvasive approach, as reviewed in this article, may lead to findings or questions that can guide and help in furthering progress here. Based on known features of neural circuitry, issues addressed by such questions could be: Which sort of neural circuitry would account for the storage, or control the retrieval and performance, of a large song repertoire? How is genetically preordained information about song encoded, and how does it interact with acquired rule-related knowledge? How are the rules of hierarchical organization represented in the central nervous system: structurally or physiologically or both, and if so, where? Given the remarkable advances in both physiological and neural modeling approaches, it seems reasonable to expect that at least some of these questions can be addressed in the near future.

Acknowledgments

We appreciate the skillful help in hand-raising our birds, performing experiments or conducting data analyses provided by many people, namely Petra Ambrugger, Henrik Brumm, Claudia Fichtel, Nicole Geberzahn, Marina Hoffmann, Friederike Schleuß, Gisela Schwartz-Mittelstädt and Alexandra Wistel-Wozniak. We are also grateful to the BBAW (AG RULE) for financial support of our studies.

References

1. Kroodsma, D. E. and Miller, E. H. (eds.) (1996). Ecology and evolution of acoustic communication in birds (Ithaca, London: Cornell University Press).
2. Janik, V. M. and Slater, P. J. B. (1997). Vocal learning in mammals. Adv. Study Behav. 26, 59-99.
3. Howard, R. and Moore, A. (1991). A complete checklist of the birds of the world (London, England: Acad. Press).
4. Todt, D. (1970). Gesangliche Reaktionen der Amsel auf ihren experimentell reproduzierten Eigengesang. Z. vergl. Physiol. 66, 294-317.
5. Thompson, W. L. (1972). Singing behaviour of the Indigo Bunting, Passerina cyanea. Z. Tierpsychol. 31, 39-59.
6. Kroodsma, D. E. (1977). Correlates of song organization among north American wrens. Am. Naturalist 11, 995-1008.
7. Krebs, J. R., Ashcroft, R. M., and Webber, I. (1978). Song repertoires and territory defense. Nature 271, 539-541.

8. Catchpole, C. K. (1983). Variation in the song of the great reed warbler, Acrocephalus arundinaceus, in relation to mate attraction and territorial defense. Anim. Behav. 31, 1217-1225.
9. Hultsch, H. (1980). Beziehungen zwischen Struktur, zeitlicher Variabilität und sozialem Einsatz im Gesang der Nachtigall, Luscinia megarhynchos. PhD thesis, FU Berlin.
10. Hultsch, H. (1993). Ecological versus psychobiological aspects of song learning in birds. Etologia 3, 309-323.
11. Falls, J. B. and D'Agincourt, L. G. (1982). Why do meadowlarks switch song-types? Can. J. Zool. 60, 3400-3408.
12. Kramer, H. G. and Lemon, R. E. (1983). Dynamics of territorial singing between neighbouring Song Sparrows (Melospiza melodia). Behaviour 85, 198-223.
13. Catchpole, C. K. and Slater, P. J. B. (1995). Bird song: biological themes and variations (Cambridge, USA: Cambridge Univ. Press).
14. Beecher, M. D. (1996). Birdsong learning in the laboratory and field. In: Ecology and evolution of communication, D. E. Kroodsma and E. H. Miller, eds. (Ithaca, NY: Cornell Univ. Press), pp. 61-78.
15. Todt, D. and Hultsch, H. (1996). Acquisition and performance of repertoires: Ways of coping with diversity and versatility. In: Ecology and evolution of communication, D. E. Kroodsma and E. H. Miller, eds. (Ithaca, NY: Cornell Univ. Press), pp. 79-96.
16. Todt, D. (1970). Gesang und gesangliche Korrespondenz der Amsel. Naturwiss. 57, 61-66.
17. Todt, D. (1971). Äquivalente und konvalente gesangliche Reaktionen einer extrem regelmäßig singenden Nachtigall (Luscinia megarhynchos B.). Z. vergl. Physiol. 71, 262-285.
18. Hultsch, H. and Todt, D. (1981). Repertoire sharing and song post distance in nightingales. Behav. Ecol. Sociobiol. 8, 182-188.

19. Hultsch, H. and Todt, D. (1982). Temporal performance roles during vocal interactions in nightingales. Behav. Ecol. Sociobiol. 11, 253-260.
20. Hultsch, H. and Todt, D. (1986). Zeichenbildung durch mustergleiches Antworten. Z. f. Semiotik 8, 233-244.
21. Kroodsma, D. E. (1982). Song repertoires: Problems in their definition and use. In: Acoustic Communication in Birds, Vol. 2, D. E. Kroodsma and E. H. Miller, eds. (New York: Academic Press), pp. 125-146.
22. Todt, D. (1968). Zur Steuerung unregelmäßiger Verhaltensabläufe. In: Kybernetik, H. Mittelstaedt, ed. (München, Germany: Oldenbourg), pp. 465-485.
23. Lemon, R. E. and Chatfield, C. (1971). Organization of song in Cardinals. Anim. Behav. 19, 1-17.
24. Shiovitz, K. A. (1975). The process of species-specific song recognition by the indigo bunting (Passerina cyanea). Behaviour 55, 128-179.
25. Bondesen, P. (1979). The hierarchy of bioacoustic units expressed by a phrase formula. Biophon 6, 2-6.

26. Thompson, N. S., LeDoux, K., and Moody, K. (1994). A system for describing bird song units. Bioacoustics 5, 267-279.
27. Todt, D. and Hultsch, H. (1980). Functional aspects of sequences and hierarchy in bird song. Acta XVII Congr. Int. Orn., Berlin, 663-670.
28. Todt, D. (1970). Zur Ordnung im Gesang der Nachtigall (Luscinia megarhynchos). Verh. Dtsch. Zool. Ges. 64, 249-252.
29. Naguib, M., Kolb, H., and Hultsch, H. (1991). Hierarchische Verzweigungsstruktur in den Gesangsstrophen der Vögel. Verh. Dtsch. Zool. Ges. 84, 411.
30. Naguib, M. and Kolb, H. (1992). Vergleich des Strophenaufbaus und der Strophenabfolgen an den Gesängen von Sprosser (Luscinia luscinia) und Blaukehlchen (Luscinia svecica). J. Ornithol. 133, 133-145.
31. Böhner, J. (1990). Early acquisition of song in the zebra finch (Taeniopygia guttata). Anim. Behav. 39, 369-374.
32. Marler, P. and Peters, S. (1982). Structural changes in song ontogeny in the Swamp Sparrow, Melospiza georgiana. Auk 99, 446-458.
33. Kroodsma, D. E. (1979). Vocal dueling among male Marsh Wrens: Evidence for ritualized expressions of dominance/subordinance. Auk 98, 506-515.
34. Nottebohm, F., Nottebohm, M. E., and Crane, L. A. (1986). Developmental and seasonal changes in canary song and their relation to changes in the anatomy of song control nuclei. Behav. Neurol. Biol. 46, 445-471.
35. Chaiken, M., Böhner, J., and Marler, P. (1993). Song acquisition in European starlings, Sturnus vulgaris: A comparison of the songs of live-tutored, tape-tutored and wild-caught males. Anim. Behav. 46, 1079-1090.
36. Todt, D., Hultsch, H., and Heike, D. (1979). Conditions affecting song acquisition in nightingales (Luscinia megarhynchos). Z. Tierpsychol. 51, 23-35.

37. Todt, D. and Böhner, J. (1994). Former experience can modify social selectivity during song learning in the nightingale (Luscinia megarhynchos). Ethology 97, 169-176.
38. Hultsch, H., Lange, R., and Todt, D. (1984). Pattern-type labeled tutoring: A method for studying song-type memories in repertoire birds. Verh. Dtsch. Zool. Ges. 77, 249.
39. Müller-Bröse, M. and Todt, D. (1991). Lokomotorische Aktivität von Nachtigallen (Luscinia megarhynchos) während auditorischer Stimulation mit Artgesang, präsentiert in ihrer lernsensiblen Altersphase. Verh. Dtsch. Zool. Ges. 84, 476-477.
40. Hultsch, H. and Todt, D. (1989). Song acquisition and acquisition constraints in the Nightingale (Luscinia megarhynchos). Naturwissenschaften 76, 83-86.
41. Crowder, R. G. (1976). Principles of learning and memory (Hillsdale, N. J.: Lawrence Erlbaum).
42. Marler, P. (1976). Sensory templates in species-specific behavior. In: Simpler networks and behavior, J. C. Fentress, ed. (Sunderland, MA: Sinauer Associates), pp. 314-329.
43. Roitblat, H. L. (1987). Introduction to comparative cognition (New York, USA: Freeman).
44. Hultsch, H. and Todt, D. (1996). Discontinuous and incremental processes in the song learning of birds: Evidence for a primer effect. J. Comp. Physiol. A 179, 291-299.
45. Hultsch, H. and Todt, D. (1989). Memorization and reproduction of songs in Nightingales (Luscinia megarhynchos): Evidence for package formation. J. Comp. Physiol. A 165, 197-203.
46. Hultsch, H. (1993). Tracing the memory mechanisms in the song acquisition of birds. Neth. J. Zool. 43, 155-171.
47. Konishi, M. (1989). Bird song for neurobiologists. Neuron 3, 541-549.
48. Doupe, A. (1993). A neural circuit specialized for vocal learning. Curr. Op. Neurobiol. 3, 104-111.
49. Nottebohm, F. (1993). The search for neural mechanisms that define the sensitive period for song learning in birds. Neth. J. Zool. 43, 193-234.
50. Scheich, H. and Braun, K. (1988). Synaptic selection and calcium regulation: Common mechanisms of auditory filial imprinting and vocal learning in birds? Verh. Dtsch. Zool. Ges. 81, 77-95.
51. Hebb, D. O. (1949). The organization of behavior (New York: Wiley and Sons).
52. Marler, P. and Peters, S. (1987). A sensitive period for song acquisition in the song sparrow, Melospiza melodia: A case of age limited learning. Ethology 76, 89-100.
53. Thimm, F. (1973). Sequentielle und zeitliche Beziehungen im Reviergesang des Gartenrotschwanzes (Phoenicurus phoenicurus L.). J. comp. Physiol. 84, 311-334.
54. Hultsch, H. and Todt, D. (1989). Context memorization in the learning of birds. Naturwissenschaften 76, 584-586.
55. Hultsch, H. and Todt, D. (1992). The serial order effect in the song acquisition of birds. Anim. Behav. 44, 590-592.
56. Wistel-Wozniak, A. and Hultsch, H. (1992). Song performance in nightingales (Luscinia megarhynchos) which had been raised without exposure to acoustic learning programmes. Verh. Dtsch. Zool. Ges. 85, 246.
57. Hultsch, H. (1992). Time window and unit capacity: Dual constraints on the acquisition of serial information in songbirds. J. Comp. Physiol. A 170, 275-280.
58. Marler, P. (1991). Differences in behavioural development in closely related species: Bird song. In: The development and integration of behaviour, P. Bateson, ed. (Cambridge, USA: Cambridge Univ. Press), pp. 41-70.

59. Todt, D. and Hultsch, H. (1992). Bird song: Variations that follow rules. Behav. and Brain Sciences 15, 190.
60. Hultsch, H. (1991). Song ontogeny in birds: Closed or open developmental programs? In: Synapse, transmission, modulation, N. Elsner and H. Penzlin, eds. (Stuttgart, Germany: Thieme), p. 576.
61. Kopp, M. L. (1996). Ontogenetische Veränderungen in der Zeitstruktur des Gesangs der Nachtigall, Luscinia megarhynchos. PhD thesis, Faculty of Biology, FU Berlin.
62. Hultsch, H. (1989). Ontogeny of song patterns and their performance mode in nightingales. In: Neural Mechanisms of Behaviour, J. Erber, R. Menzel, H. J. Pflüger, and D. Todt, eds. (Stuttgart, Germany: Thieme), p. 113.
63. Freyschmidt, J., Kopp, M. L., and Hultsch, H. (1984). Individuelle Entwicklung von gelernten Gesangsmustern bei Nachtigallen. Verh. Dtsch. Zool. Ges. 77, 244.
64. Wistel-Wozniak, A. and Hultsch, H. (1993). Konstante und altersabhängig veränderte Gesangsmerkmale bei handaufgezogenen Nachtigallen. Verh. Dtsch. Zool. Ges. 86, 281.
65. Marler, P. and Nelson, D. (1993). Action-based learning: A new form of developmental plasticity in bird song. Neth. J. Zool. 43, 91-101.
66. Hultsch, H. (1991). Correlates of repertoire constriction in the song ontogeny of Nightingales (Luscinia megarhynchos). Verh. Dtsch. Zool. Ges. 84, 474.
67. Payne, R. B. (1981). Song learning and social interaction in indigo buntings. Anim. Behav. 29, 688-697.
68. Slater, P. J. B. (1989). Bird song learning: Causes and consequences. Ethol. Ecol. Evol. 1, 19-46.
69. Lemon, R. E., Perrault, S., and Weary, D. M. (1994). Dual strategies of song development in American redstarts, Setophaga ruticilla. Anim. Behav. 47, 317-329.

70. Marler, P. and Peters, S. (1982). Developmental overproduction and selective attrition: New processes in the epigenesis of birdsong. Dev. Psychobiol. 15, 369-378.
71. Nelson, D. A., Marler, P., and Palleroni, A. (1995). A comparative approach to vocal learning: Intraspecific variation in the learning process. Anim. Behav. 50, 83-97.
72. Todt, D. and Hultsch, H. (1994). Biologische Grundlagen des Dialogs. In: Kommunikation und Humanontogenese, K. F. Wessel and F. Naumann, eds. (Bielefeld, Germany: Kleine Verlag), pp. 53-76.
73. Todt, D. (1981). On functions of vocal matching: Effect of counter-replies on song-post choice and singing. Z. Tierpsychol. 57, 73-93.
74. Naguib, M. and Todt, D. (1997). Effects of dyadic vocal interactions on other conspecific receivers in nightingales. Anim. Behav. 54, 1535-1543.
75. Todt, D. (1975). Short term inhibition of vocal outputs occurring in the singing behaviour of blackbirds (Turdus merula). J. Comp. Physiol. 98, 289-306.
76. Todt, D. and Wolffgramm, J. (1975). Überprüfung von Steuerungssystemen zur Strophenwahl der Amsel durch digitale Simulierung. Biol. Cybernetics 17, 109-127.
77. Slater, P. J. B. (1978). A simple model for competition between behaviour patterns. Behaviour 67, 236-257.
78. Slater, P. J. B. (1983). Sequences of songs in Chaffinches. Anim. Behav. 31, 272-281.
79. Whitney, C. L. (1981). Patterns of singing in the varied Thrush: II. A model of control. Z. Tierpsychol. 57, 141-162.
80. Riebel, K. and Todt, D. (1997). Light flash stimulation alters the nightingale's singing style: Implications for song control mechanisms. Behaviour 134, 1-20.
81. Riebel, K. and Hultsch, H. (1992). Effects of interceptive visual stimuli on performance variables in the singing of birds. In: Rhythmogenesis in neurons and networks, N. Elsner and D. W. Richter, eds. (Stuttgart, Germany: Thieme), p. 244.
82. Greenough, W. T., Black, J. E., and Wallace, C. S. (1987). Experience and brain development. Child Development 58, 539-559.
83. Wolffgramm, J. and Todt, D. (1982). Pattern and time specificity in vocal responses of Blackbirds, Turdus merula. Behaviour 81, 264-286.
84. Pepperberg, I. M. (1993). A review of the effects of social interaction on vocal learning in African grey parrots (Psittacus erithacus). Neth. J. Zool. 43, 104-124.
85. Marler, P. and Peters, S. (1981). Birdsong and speech: Evidence for special processing. In: Perspectives on the study of speech, P. Eimas and J. Miller, eds. (Hillsdale: Erlbaum), pp. 75-112.
86. Bower, G. H. (1970). Organizational factors in memory. Cognitive Psychology 1, 18-46.
87. Simon, H. A. (1974). How big is a chunk? Science 183, 482-488.
88. Kroodsma, D. E. (1988). Song-types and their use: Developmental flexibility of the male Blue-winged Warbler. Ethology 79, 235-247.
89. Margoliash, D., Fortune, E. S., Sutter, M. L., Vu, A. C., Wren-Hardin, B. D., and Dave, A. (1994). Distributed representation in the song system of oscines: Evolutionary implications and functional consequences. Brain Behav. Evol. 44, 247-264.

5. Representation and Learning of Structure in Perceptuo-Motor Event-Sequences

Jascha Rüsseler and Frank Rösler

5.1 Introduction

It is a basic feature of human beings that they can recognize, store, and produce regular sequences of events. For example, finding a way in a city requires perception and storage of a sequence of landmarks. Likewise, starting a car, preparing a meal, or doing other manual work requires the initiation and execution of a regular sequence of movements. On the highest level it is the language system which vividly illustrates the capability of the nervous system to exploit sequential regularities. In any case it is not the 'melentes' but always the regular sequence of the 'elements' which transmits information. And in most cases it is also not just the first-order conditional probability of successive elements which represents the sequential dependencies. Rather, in many cases, higher-order sequential dependencies and even more complex logical or grammatical rules determine which element is allowed to follow another in a sequence of events. These examples make clear that the nervous system must be particularly sensitive to regularities which are present in the environment. It recognizes lower- and higher-order sequential dependencies and it is able to abstract more complex rules from the perceptually encountered 'raw material'. These regularities are permanently stored, and in the case of movements or other behavioral acts they can be reproduced intentionally. This basic ability to acquire and produce sequential dependencies is not unique to the human nervous system. Systematic research on animal cognition has shown that other species, e. g. pigeons, rats, cats, dogs and monkeys, exhibit sequential behavior and develop sequential representations as well (1-3).

Although there is hardly any doubt about the fact that sequential dependencies are learned by humans and other species, it is still an open question how this is accomplished. For example, it is not clear to what extent sequential dependencies of event sequences can be learned implicitly, i. e. without awareness. Is implicit learning restricted to the acquisition of simple, low-order conditional dependencies only, or is it effective with more complex, higher-order rules, too? Language acquisition in a natural environment suggests that even the most complex grammatical rules can be learned implicitly, at least during ontogeny. Another open question concerns the neural representation of knowledge acquired in sequence learning situations. Neuropsychology in humans has provided much evidence that declarative learning can be functionally dissociated from procedural learning. The former is tied to an intact temporal lobe system while the latter seems to be linked to an intact cerebellum and basal ganglia system. This distinction between declarative and procedural learning and memory has much in common with the explicit-implicit dichotomy, but it is not completely congruent. Again, it has to be asked which system is particularly sensitive to sequential order, which system performs the one or the other type of rule learning, and how both systems possibly interact during the acquisition and production of ordered event sequences.

Systematic research on these issues needs well-controlled experimental settings. Among others, the so-called serial-reaction-time (SRT) task has been used to study implicit and explicit learning of perceptuo-motor event sequences. In this chapter we will give an overview of this line of research. In particular, we will focus on studies examining how sequence knowledge is neurally represented. After introducing the SRT paradigm and presenting a short summary of the basic findings on explicit and implicit SRT-learning, we will review three lines of evidence concerning the neural basis of sequence learning: work with neurological patients who have circumscribed neuropsychological deficits, brain imaging studies, and studies in which event-related potentials were used to monitor brain processes during sequence learning. Moreover, we will discuss some models and hypotheses which deal with possible mechanisms underlying sequence learning.

5.2 The SRT-Learning Task

Nissen and Bullemer (4) introduced the SRT-task to study the learning of regularities in event sequences by means of performance improvement. In a typical SRT experiment visual stimuli (typically the letter 'X' or an asterisk) are presented at one of four different positions on a computer screen. Subjects are instructed to press a corresponding key for each position as fast and as accurately as possible. Unknown to the subjects, the stimuli appear according to a repeating sequence of positions (e. g. in the sequence known as the Nissen and Bullemer sequence, 4-2-3-1-3-2-4-3-2-1, 1 corresponds to the leftmost, 4 to the rightmost position of the horizontally aligned display; note that after the 10th stimulus the sequence wraps around and is repeated). After some structured training blocks subjects are transferred to a random sequence of stimuli. Numerous studies found a prolongation of reaction time (RT) in the random block compared to the preceding structured block, reflecting learning of the sequential regularities inherent in the stimulus material (e. g. 4-16).
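As an illustration of this design (a sketch only, in Python; block lengths, the response mapping and the analysis are simplified assumptions rather than the procedure of any particular study), the stimulus stream can be generated by cycling through the repeating sequence during structured blocks and drawing positions at random in the transfer block; learning is then indexed by the RT increase from the last structured block to the random block.

# Sketch of SRT-task stimulus generation and a simple learning index.
# Block lengths and the made-up reaction times are illustrative assumptions.
import random
from itertools import cycle, islice

NISSEN_BULLEMER = [4, 2, 3, 1, 3, 2, 4, 3, 2, 1]   # repeating sequence of screen positions

def structured_block(n_trials):
    return list(islice(cycle(NISSEN_BULLEMER), n_trials))

def random_block(n_trials):
    return [random.randint(1, 4) for _ in range(n_trials)]

def learning_index(rt_structured, rt_random):
    """RT prolongation (ms) in the random block relative to the structured block."""
    return sum(rt_random) / len(rt_random) - sum(rt_structured) / len(rt_structured)

positions = structured_block(100) + random_block(100)   # stimulus stream for two blocks
rt_structured = [380.0] * 100                           # placeholder RTs (ms)
rt_random = [435.0] * 100
print(learning_index(rt_structured, rt_random))         # 55.0 -> evidence of sequence learning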

5.2.1 Awareness of Stimulus-Structure in the SRT-Task

Implicit learning and memory refer to the acquisition and retrieval of information without conscious awareness of the underlying stimulus regularities which lead to performance changes. In contrast, explicit learning and memory are accompanied by awareness of the learned information and its influence on behavior (17-22). Nissen and Bullemer (4) first showed that learning of perceptuo-motor sequences in the SRT-task, as reflected in an RT-benefit for sequentially structured versus unstructured blocks, can occur without the development of conscious awareness in amnesic patients. Nissen, Knopman and Schacter (23) found that in subjects given scopolamine prior to the experiment (scopolamine has reversible, amnesia-like effects) performance was impaired in a verbal memory task while sequence learning remained unaffected. The scopolamine subjects failed to exhibit any knowledge of the sequential stimulus structure. The authors concluded that there is a dissociation between neural systems responsible for structured sequence learning and systems responsible for declarative memory. Other investigators found that healthy subjects also learned sequential dependencies without even noticing that the material contained any structural regularity (e. g. 7; 9; 14; 24-29). Recently, some authors questioned the notion that learning in the SRT-task occurs without conscious awareness of the sequential regularities. The main criticism concerns the tests of explicit knowledge which were used in these studies. Several techniques have been developed to assess the subjects' degree of postexperimental sequence knowledge: First, in free-recall tasks subjects are asked to reproduce the previously presented sequence. The percentage of correctly recalled items is taken as an indicator of the amount of explicit knowledge (e. g. 8). Second, in the generate task (4; 28) subjects are confronted with the same stimulus-display as during training but they have to predict the next stimulus rather than to press a key after stimulus presentation. This procedure is problematic because it confounds different aspects of sequence knowledge: Knowledge of the perceptual event sequence (e. g. the locations on the display) and of the motor sequence (the sequence of subjects' keypresses) is tapped simultaneously. More importantly, Perruchet and Amorim (30) criticized the reliability of the generate task. Most of the studies using this procedure provided feedback about the correctness of a given answer (e. g. 15), thus allowing subjects to gain further sequence-related knowledge during the test. As a consequence, only the trials of the first sequence cycle are taken as a measure of explicit knowledge. With such a small number of trials (one sequence cycle) this procedure can hardly be reliable. Therefore, Perruchet and Amorim (30) developed the recognition task wherein subjects are confronted with fragments of the previously seen stimulus sequence (e. g. bigrams, trigrams or quadruples). They have to indicate whether or not these were part of the previously presented stimulus material (see also 31). The percentage of correctly categorized sequence fragments is taken as a measure of explicit sequence knowledge. Perruchet and Amorim (30) presented data showing that sequence learning can be fully explained on the basis of performance in a recognition task and they concluded that learning is explicit (but see 31 for contradictory results). Similar results are obtained in the domain of artificial grammar learning (32; for reviews on artificial grammar learning see 19; 20). Finally, in postexperimental questionnaires subjects are typically asked whether they had noticed any structure in the presented material. Willingham, Greeley and Bardone (31) showed that postexperimental interviews are biased towards reporting the presence of regularities. For example, 24.4% of subjects exposed to random stimuli mentioned the presence of a repeating pattern.


Most authors postulate that implicit learning requires subjects to be unaware of the learned stimulus regularities during the learning process proper (8; 19; 20). This poses another problem for the researcher: He has to rely on backward inference from postexperimental interview data in order to diagnose the awareness-status during the learning phase (22). Pascual-Leone, Grafman and Hallett (33) proposed that explicit knowledge emerges gradually during learning from early implicit knowledge. Recent evidence is at variance with this hypothesis: Perruchet, Bigand and Benoit-Gonin (34) showed that explicit knowledge can develop without simultaneously present (implicit) RT improvements. They trained subjects for two or five stimulus-blocks (10 or 25 replications of a structured event-sequence). Explicit sequence knowledge was assessed in a recognition task (presentation of four-element long sequence fragments) and familiarity estimates (adjustment of bars which indicated the probability that the next stimulus would be presented at a given location). In two of the five experiments, the RT measure showed no evidence for sequence learning, whereas in all five experiments fragments which had been part of the repeated sequence were correctly recognized more often than new fragments. Whenever an RT advantage for structured sequences was present, participants showed reliable explicit knowledge, too, even after only ten presentations of the entire sequence. Given these results it seems unlikely that implicit knowledge is always a prerequisite of explicit knowledge. Unlike research on implicit memory (for reviews, see 35-37) which is motivated by the hypothesis of a functional dissociation between implicit and explicit memory, investigators using the SRT-task have not made much effort to directly compare both forms of learning. Only two studies included explicit conditions: Curran and Keele (7, Exp. 1) compared SRT-performance for incidentally and intentionally instructed subjects with and without a distractor task (counting one of two tones of different pitch). Without distraction, the intentionally learning subjects acquired significantly more knowledge about underlying structural regularities than incidentally learning subjects but this advantage disappeared when both groups were transferred to the distraction condition. It seems that whatever may be responsible for the advantage of intentional learning, it depends on the full availability of attentional resources, but further studies are clearly needed to clarify this issue. (For example, the sequence used by Curran and Keele (7) was shorter than in most experiments; apart from tone counting no other distractor tasks have been used so far and the possible dependency of the advantage of intentional learning on the statistical structure of the sequence has not been studied either). Frensch and Miner (9, Exp. 1) compared incidental and intentional learning for different response-to-stimulus intervals (RSI; the time from a keypress to the onset of the following stimulus). They found implicit learning if the response-to-stimulus interval (RSI) was short (500 ms), but not if it was long (1500 ms). In contrast, intentionally instructed subjects showed learning for both RSIs, but nevertheless learning was inversely related to RSI in this condition, too. The authors conclude that implicit sequence learning depends on short-term memory resources, i. e.
subsequent stimuli have to be coactivated to form associations between adjacent sequence elements (see below).

5.3 Neural Representation of Sequence Knowledge

There are three groups of studies relevant for the question of the neural representation of sequence knowledge: Investigations of sequence learning in subjects with explicit memory deficits (Korsakoff-syndrome, Alzheimer's disease (AD)), studies of patients with striatal dysfunction (Parkinson's disease (PD), Huntington's disease (HD)), and neuroimaging studies.

5.3.1 Sequence Learning in Subjects with Explicit Memory Deficits

Research on sequence learning in patients with explicit memory deficits (Korsakoff-syndrome, AD) is of interest because spared SRT-learning in these subjects suggests that performance is not dependent on brain regions crucial for explicit learning (see 18). Amnesic patients typically show damage to medial temporal lobe regions (38) including the hippocampus, or to the diencephalon, whereas AD-patients suffer from more widespread damage of neural tissue (neurofibrillary tangles and neuritic plaques in limbic, temporal and posterior association cortex, damage to frontal regions; see 39). Nissen and Bullemer (4, Exp. 4) found no difference in RT-improvement for structured compared to random blocks between six amnesics and a healthy control group. This suggests that amnesics do learn sequential regularities in the SRT-task. To date, there are four studies which examined SRT-learning in AD-patients. Knopman and Nissen (42) and Grafman, Weingartner, Newhouse, Thompson, Lalonde, Litvan, Molchan and Sunderland (41) found small but significant learning for a sample of AD-patients. However, Knopman and Nissen (42) additionally showed that nine of their AD-subjects did not learn the sequential structure at all. In a later study, Knopman (40) tested the retention of sequence knowledge in AD-patients one to two weeks after the learning session and found no difference between AD-patients and healthy controls. Both groups had also shown the well-known RT benefit for structured blocks in the first session. In a more sophisticated study, Ferraro, Balota and Connor (43) compared performance of very mildly and mildly demented AD-patients with that of nondemented PD-patients and healthy controls, respectively, and found only the mildly demented AD-patients to be impaired in their amount of sequence learning. Conclusions to be drawn from these results are limited because the above-mentioned studies suffer from several methodological shortcomings. First, overall RT for patients is in general longer than that of healthy controls, thus making it difficult to compare the amount of learning in both groups. The size of the RT-difference between structured and random blocks may depend on the overall RT level (44). Second, the interpretation of group differences is difficult as most of the patients receive medication. It cannot be ruled out that performance differences depend on medication, especially in cases where the drugs are known to affect motor functions (e. g. L-dopa for PD-patients). Third, in all of these studies only the 10-element 'Nissen and Bullemer (4) sequence' was used. Therefore, it is difficult to generalize the results (note that this sequence contains the very salient part ...4-3-2-1 at the end of the ten-trial sequence, see above). Finally, the studies have not explicitly tested the role of attentional, memory or motor processes (for example by using dual tasks, varying the statistical structure of the sequence, or introducing deviant events) for the different groups of patients. To summarize, SRT-studies with Korsakoff- and AD-patients have provided evidence that learning of sequential regularities seems to be independent of brain structures which are needed for explicit learning and memory. However, in patients with more severe damage (like mildly demented AD) performance impairments are observed. The causes which lead to these deficits are not clear yet, but they may be due to attentional or short-term memory insufficiencies.

5.3.2 Sequence Learning in Patients with Striatal Dysfunction

Studies of SRT-learning with PD- or HD-patients are especially interesting because they provide the possibility of testing the more specific proposal that skill learning depends on the integrity of the striatum (e. g. Squire (45)). Furthermore, in PD-patients the impact of defects in motor control functions on perceptuo-motor sequence learning can be studied. Knopman and Nissen (42) and Willingham and Koroshetz (46) found that HD-patients learned the sequential structure in an SRT-task, but to a lesser degree than healthy control subjects. For PD-patients, Ferraro, Balota and Connor (43) found learning impairments for non-demented patients compared to age-matched controls using the Nissen-Bullemer sequence. Pascual-Leone, Grafman, Clark, Stewart, Massaquoi, Lou and Hallett (47) compared sequence learning in PD-patients on and off medication. The state of medication had one effect only, namely that overall RT was slower without medication. Sequence learning for PD-patients was observed, but it was less pronounced than in healthy controls. In a second experiment, Pascual-Leone et al. (47) used sequences of 8, 10 and 12 elements to examine the effect of sequence length on learning. For controls as well as PD-patients, learning was inversely related to sequence length, but PD-patients were impaired with each of the three sequences. In a third experiment, performance of subjects who were explicitly taught the ten-element 'Nissen-Bullemer sequence' was examined. In this explicit learning task, again a difference between PD-patients and healthy controls emerged. This shows that PD-patients are less efficient in utilizing sequential knowledge to improve SRT-performance even if sequential dependencies are explicitly pointed out to them. This finding makes the interpretation of the aforementioned studies somewhat difficult as it cannot be decided whether implicit or explicit learning deficits (or both) are responsible for the well-established SRT-learning impairment in PD-patients. In a more recent study, Jackson, Jackson, Harrison, Henderson and Kennard (48) found no sequence learning at all for 11 PD-patients without medication. In sum, these results suggest that motor functions mediated by the striatum seem to be crucial for procedural learning as induced by the SRT-task.

5.3.3 Neuroimaging Studies of Sequence Learning

Neuroimaging studies can be used with healthy subjects to examine more directly which brain structures are involved in sequence learning in the SRT-task. Grafton, Hazeltine and Ivry (49) compared regional cerebral blood flow (rCBF) in a positron emission tomography (PET) study in single- versus dual-task SRT-conditions. With PET it is possible to detect metabolic effects of longer latency which accompany particular learning states. Subjects started with three random blocks followed by three structured stimulus blocks and had to perform a tone-counting task simultaneously (dual-task condition; six-element sequence, ambiguous structure according to (6)). As none of the participants became aware of the sequence, the authors considered learning in the dual-task phase to be implicit. Finally, three blocks of the same sequence were presented without the distractor task (single-task condition). Seven of twelve subjects became aware of the sequence, thus learning in the single-task condition was considered to be explicit. RT-analysis confirmed that learning took place in both phases of the experiment albeit subjects learned more in the single-task 'explicit' condition. In the 'implicit' dual-task condition, enhanced activity was found in contralateral motor effector areas (incl. motor cortex, SMA, putamen), in the rostral prefrontal cortex and in the parietal cortex (comparison of rCBF in block 1 (random) and during the following blocks of the dual-task phase). During (explicit) single-task performance, activity was enhanced in the right dorsolateral prefrontal cortex, right premotor cortex, right ventral putamen, and bilateral parietal-occipital cortex (comparison of rCBF in block one of single-task performance and during the following single-task blocks). The authors conclude that the major difference between explicit and implicit learning is an enhanced activity in right prefrontal cortex during explicit learning which may be related to episodic memory functions. Moreover, it was concluded that motor learning involves a number of different cerebral areas (49). There are some methodological problems with this study. First, subjects responded with their dominant right hand only, thus making the interpretation of laterality effects difficult. Second, Grafton, Hazeltine and Ivry (49) used an unusually short sequence and did not assess explicit knowledge in an appropriate way. Therefore, it cannot be ruled out that learning in the dual-task phase was explicit, too, at least to a certain degree. Rauch, Savage, Brown, Curran, Alpert, Kendrick, Fischman and Kosslyn (50) conducted a PET-study which used a more complex sequence (twelve elements, hierarchical structure according to (6)) and controlled the amount of explicit knowledge more thoroughly. Subjects started with three random stimulus blocks followed by three structured and three random blocks. A series of explicit memory tests revealed that none of the subjects had developed explicit sequence knowledge up to this point. Next, the experimenter informed the participants about the repeating sequence in the stimulus material. Three structured blocks and an assessment of explicit sequence knowledge concluded the experiment. RT analysis revealed a learning effect for both the implicit and the explicit learning episodes. Again, subjects learned significantly more in the explicit than in the implicit condition. During implicit learning PET-data revealed significant activity in the right ventral premotor cortex, the right ventral caudate/nucleus accumbens, the right thalamus, and bilateral visual association cortices (area 19; implicit minus random condition). During explicit learning activation foci were found bilaterally in the cerebellar vermis, the left fusiform cortex, the left inferior frontal cortex, the right thalamus, the right middle frontal cortex, and the right brain stem (explicit minus random condition). A direct comparison of implicit and explicit learning (implicit minus explicit condition) showed activity in the right ventral premotor cortex. This suggests that the right ventral premotor cortex might be of principal importance for a distinction between explicit and implicit learning. Note that due to a limited axial field of view some areas that may be important for sequence learning could not be studied (SMA, DLPFC (dorsolateral prefrontal cortex)). The authors concluded that implicit sequence learning might be mediated by a distributed system (right ventral premotor cortex, right ventral striatum, right thalamus and bilateral visual association cortex). In contrast, explicit sequence learning may be mediated by a subsystem relevant for motor learning (cerebellum, thalamus, brain stem) and subsystems which may reflect the implementation of conscious strategies (visual imagery, language mediation). Two recent studies explored the relevance of motor processes for explicit and implicit sequence learning in more detail (33; 51). In both studies, a centrally presented digit (1, 2, 3 or 4) served as imperative stimulus. Digits were presented in a repeating sequence of either 12 or 10 elements. Explicit knowledge was assessed after every training block. Pascual-Leone, Grafman and Hallett (33) mapped the motor cortex with transcranial magnetic stimulation (TMS) to study changes in the cortical output maps of the relevant muscles. Cortical output maps of the task-relevant muscles became increasingly larger during implicit learning. When full explicit knowledge of the sequence was achieved the cortical output maps regressed to their baseline topography. The authors concluded that rapid functional plasticity of cortical outputs is of prime importance for the transfer of knowledge from an implicit to an explicit state and that explicit knowledge emerges from earlier implicit knowledge (see above for a discussion of this point). Zhuang, Toro, Grafman, Manganotti, Leocani and Hallett (51) showed that event-related desynchronization (ERD), computed from the human EEG, reaches a maximum level during explicit learning, and declines after full explicit knowledge of the sequence is obtained. ERD is most prominent over motor areas. Localized ERD is interpreted as reflecting an increase in activity of relatively small and independent cell assemblies. Taken together, the results of these two studies suggest that the transition from implicit to explicit knowledge in the SRT-task goes together with a change in cortical motor activation. These changes could imply the generation of a motor plan which represents the whole motor sequence in higher cortical modules. As is evident from the above-mentioned studies, to date no coherent picture of the neural representation of implicit sequence learning has emerged (see also Curran (18)).
However, several brain regions have been identified as being relevant for sequence learning (cerebellum, basal ganglia, DLPFC, SMA, premotor cortex, visual association areas, right frontal cortex). It is not yet clear which of these regions are causally linked to the acquisition and storage of sequence knowledge and which are of secondary importance, in that they merely reflect attentional or other unspecific task effects.

5.4 Theoretical Accounts of Implicit Sequence Learning

It is generally assumed that an associative learning mechanism is responsible for the RT difference between structured and unstructured stimulus blocks (e. g. 5; 6; 9; 52). However, different models have been proposed to explain the exact nature of this association process. These differ in the presumed locus of where the associations are formed, the characterization of learning as primarily motor- or response-learning, and the role of attention.

5.4.1 Attentional versus Non-Attentional Learning Mechanisms

In a model of sequence learning, Cohen, Ivry and Keele (6) proposed the existence of two independent learning mechanisms which differ in their attentional demands. In a series of experiments they explored the influence of a distractor task on implicit learning of sequences which differed in their statistical structure. They used three types of sequences: Unique sequences which consisted of unequivocally paired associations only (e. g. 1-2-3, where 1 is always followed by 2, 2 by 3 and 3 by 1), hybrid sequences containing unique as well as ambiguous associations (e. g. 1-2-3-2-3-1-2, where 1 is always followed by 2, whereas 3 can be followed by either 2 or 1 depending on the preceding stimulus) and hierarchic sequences which comprise higher order dependencies only (e. g. 1-2-3-2-1-3, where 1 can be followed by 2 or 3, 2 by 3 or 1 and 3 by 2 or 1, depending on the predecessor of the current stimulus). Without distraction, all three sequence types were learned by the subjects with larger gains in response speed for unique than for hybrid and hierarchical sequences, respectively. However, with a concurrently performed tone-counting task, only learning of unique and hybrid sequences was observed (one of two tones differing in pitch was presented after each imperative stimulus and the subjects had to report the number of high-pitched tones after each block). Cohen, Ivry and Keele (6) concluded from these results that unique associations are learned by an automatic mechanism which does not require attention, whereas higher order, hierarchical dependencies are learned by a different, "controlled" mechanism which can operate only if enough attentional resources are available (also see Curran and Keele (7)). This model has been challenged from different perspectives. First, it was shown that in contrast to Curran and Keele (7) and Cohen, Ivry and Keele (6), hierarchic sequences can be learned under dual-task conditions, too (10; 11). However, these different outcomes of the dual-task studies could be due to variations in the importance subjects ascribed to the secondary tone-counting task (53). The outcome of dual-task situations depends heavily on the attention allocation policy. If this is not controlled by explicit instructions or pay-off matrices, results are hardly interpretable at all (54).
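The distinction between the three sequence types rests solely on the first-order transition structure of the cyclically repeating sequence. A minimal sketch (Python; the function names are illustrative) makes the classification explicit:

from collections import defaultdict

def transition_table(sequence):
    """First-order transitions of a cyclically repeating sequence:
    for each element, the set of elements that may follow it."""
    followers = defaultdict(set)
    for i, element in enumerate(sequence):
        followers[element].add(sequence[(i + 1) % len(sequence)])
    return dict(followers)

def classify(sequence):
    """'unique' if every element has exactly one possible successor,
    'hierarchic' if every element is ambiguous, 'hybrid' otherwise."""
    table = transition_table(sequence)
    n_unique = sum(1 for successors in table.values() if len(successors) == 1)
    if n_unique == len(table):
        return "unique"
    if n_unique == 0:
        return "hierarchic"
    return "hybrid"

print(classify([1, 2, 3]))              # unique
print(classify([1, 2, 3, 2, 3, 1, 2]))  # hybrid (1 -> 2 is unique, 2 and 3 are ambiguous)
print(classify([1, 2, 3, 2, 1, 3]))     # hierarchic (every element has two possible successors)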


Second, some authors referred to different theoretical constructs to explain dual-task interference. Frensch and Miner (9, Exp. 2) found sequence learning for short RSIs (500 ms) but not for longer ones (1500 ms) in a dual-task situation with hierarchic sequences. In a single-task situation (9, Exp. 1), a 12-element sequence was learned with a short RSI (500 ms) but not with a long RSI (1500 ms). The lack of learning was explained by the fact that additional short-term memory (STM) capacity is needed for the concurrent tone-counting task. Therefore, a smaller number of consecutive elements of the sequence can be held in STM simultaneously and hierarchic associations cannot be formed. Stadler (14) observed that learning of sequences with random RSIs between successive elements (i. e. no additional attentional load, but disruption of sequence organization) was as impaired as learning with fixed RSIs and an additional distractor task (i. e. increased attentional load and disruption of sequence organization). He hypothesized that the disruption of sequence organization could be responsible for attenuated learning effects under distraction. Hypotheses which postulate unitary attentional resources imply that a variety of distractor tasks should affect implicit sequence learning (11; 55). To date, apart from tone-counting, only two distractor tasks have been used. Stadler (14) found that a letter-string recall task which poses additional load on STM impaired serial learning, whereas Heuer and Schmidtke (11) found no learning deficit using spatial and verbal versions of the Brooks-task (recall of a visually or verbally described path through a matrix comprising nine squares, see 56). However, learning was impaired if subjects had to perform concurrently a variation of the tone-counting task: They had to press a foot pedal whenever a higher-pitched tone was presented. Heuer and Schmidtke (11) explain these results in terms of their task-integration hypothesis: The tone-counting and the key-pressing tasks are treated as one entity by the subjects, thus leading to longer and less structured sequences in the dual-task than in the single-task situation (i. e. in the case of an unstructured tone sequence every second stimulus (the imperative stimulus) follows a specified sequence and every other stimulus (the tone) is random). Schmidtke and Heuer (57) presented further evidence for a task-integration process using their go/no-go variation of the tone-counting task. They combined a six-element hybrid visual sequence with a six- or five-element sequence of tones, respectively. The six-element tone-sequence results in a combined sequence of 12 elements while the five-element tone sequence results in a combined sequence which repeats only after 60 elements (see the sketch below). In accordance with the task-integration hypothesis, learning was more impaired if the visual stimulus sequence was combined with a five-tone distraction sequence than with the six-tone distraction sequence. Taken together, research on the mechanisms of dual-task interference has yielded results which do not easily fit into a model which assumes an attentional and an independently operating unattentional learning mechanism.
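The combined sequence lengths reported by Schmidtke and Heuer (57) follow directly from the least common multiple of the two component lengths, as this short sketch illustrates (Python 3.9 or later; the function name is illustrative):

from math import lcm  # available from Python 3.9 onwards

def combined_period(visual_len, tone_len):
    """Number of successive events (imperative stimulus, tone, stimulus, tone, ...)
    after which the combined visual-plus-tone stream repeats."""
    return 2 * lcm(visual_len, tone_len)

print(combined_period(6, 6))  # 12 events: visual and tone sequences stay in phase
print(combined_period(6, 5))  # 60 events: the combined stream repeats much later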

5.4.2 Role of Perceptual and Motor Processes in Serial Learning

Many studies of sequence learning addressed the question to what extent SRT-learning is a consequence of learning stimulus, response, or stimulus-response sequences, respectively.


Willingham, Nissen and Bullemer (15) conducted a study which showed that both perceptual and motor processes may contribute to the acquisition of perceptuo-motor sequences. X-marks appearing at four different locations in four different colors were used as stimuli and participants were instructed to respond to the colors. With this set-up, subjects failed to show an RT-advantage for structured compared to random blocks if the task-relevant colors changed randomly while the stimulus locations formed a predictable sequence, i. e. when the response sequence was random but the perceptual sequence structured. In contrast, for a structured sequence of colored stimuli (response sequence present) the well-known RT-benefit was found, indicating response rather than stimulus learning. However, when subjects were instructed to respond to the location of uncolored stimuli in a subsequent transfer phase no RT-benefit was found, although the locations followed the same regularities as during training, i. e. the response sequence was the same as before. The authors concluded that stimulus structures are learned only if they are relevant for subsequent behavior and if they can be mapped directly onto responses. In a more recent study, Mayr (27) found that spatial sequences were learned independently of response sequences. Subjects had to respond to objects appearing at four different locations on a computer monitor. Objects as well as locations followed repeating sequences in some blocks. By using an eight-element spatial and a nine-element object-sequence, simultaneous learning of two different and independent sequences could be studied. Learning of spatial as well as nonspatial sequences was observed in these experiments. Mayr (27) speculates that the distance between stimuli in the visual field in the Willingham et al. (15) study might have been too small to lead to an RT advantage for structured blocks. Furthermore, he suggests that sequence learning may involve the acquisition of a series of attention shifts to the locations where the next stimulus has to be expected after a considerable amount of learning (see also 58). Alternatively, a sequence of eye-movements could have been learned, thus involving motor learning. To date, no experiments have been conducted to disentangle these two alternative hypotheses. In transfer experiments, subjects are first exposed to sequentially structured material and then transferred to stimuli which are constructed according to a new set of rules. Alternatively, the response mode can be changed. Cohen, Ivry and Keele (6) found that exposure to differently structured material in an SRT-task results in negative transfer whereas shifting subjects to different effector systems (i. e. the use of different fingers during training and test) leads to an almost perfect transfer of the acquired knowledge. In an extension of these studies, Keele, Jennings, Jones, Caulton and Cohen (59) replicated transfer across effectors when the response modality remained the same. However, transfer was less complete if the response modality was switched from key-pressing to verbal answers. The authors conclude that learning may have a response-specific but not an effector-specific component, i. e. sequence learning is represented as a motor program which is not effector-specific. The phenomenon of (incomplete) manual to verbal transfer shows that a purely response-based mechanism of sequence learning is unlikely. Stadler (60) studied sequence learning in a speeded visual search task.
The location of the current target depended on the sequence of locations of previous targets. He also found positive transfer despite changes in the motor characteristics of the task. Howard, Mutter and Howard (12) compared learning in subjects who simply observed structured event sequences with subjects who responded with keypresses. They failed to find any differences in learning. Note that in the 'observation only' group subjects had to press a key for the first ten trials. This could have given subjects a hint about the length of the sequence. Additionally, it cannot be ruled out that they continued with covert responding for the rest of the observation trials. Nevertheless, these results provide support for the view that stimulus-stimulus associations are of prime importance for serial learning. In contrast, the results of several studies which used variants of the SRT-task give support to the idea of motor learning. Nattkemper and Prinz (61) used eight different letters as stimuli and mapped two letters each onto one response finger. In an otherwise repeating event sequence two types of deviant letters replaced standards: Letters which required a response with the same finger as a regular letter (violation of the stimulus sequence, but preservation of the response sequence) and letters requiring a response with a different finger (violation of both the stimulus and the response sequence). If sequential structure is learned and represented perceptually, RT to deviants requiring a same-finger response should increase compared to RT for regular letters, whereas motor learning should result in RT-enhancement only for letters which additionally violate the response sequence. Nattkemper and Prinz (61) found increased RTs only for those deviant letters which violated the response sequence. This indicates that sequential regularities are stored as motor sequences. Similar findings are reported by Hoffmann and Koch (62) who found that changing the stimulus aspects in a sequence learning task while leaving the response aspects unchanged did not affect implicit serial learning. Ziessler (63), on the basis of a somewhat different experimental set-up, claimed that the acquisition of sequence knowledge can be viewed as response-stimulus learning. He used a visual search task and manipulated the number of different responses mapped onto target stimuli. The relation of target identity and the position of the following target was learned better by subjects who responded to each target with one specified response than by subjects who could select between two response alternatives. Ziessler (63) hypothesized that learning of the underlying rules occurs only if the position changes appear to the subjects as effects of their previous responses (response-effect learning). Recent studies give further support to the idea that response-effect learning may play a role in the SRT-task as well (Ziessler, 64).

5.4.2.1 ERP-Correlates of Stimulus Evaluation Processes

An important methodological shortcoming of most of the studies which try to characterize sequence learning as either perceptual or motoric is that both aspects are confounded due to the one-to-one mapping of stimuli onto responses. This restricts possible conclusions if only behavioral response parameters (error rates, RT) are used as dependent variables. Event-related potentials (ERPs) seem to be particularly suited to overcome this methodological restriction because different components of the ERP are selectively sensitive to stimulus evaluation and response preparation processes. Furthermore, ERPs derived from the human EEG reflect immediate brain activity changes which accompany the processing of single stimuli or responses in an event sequence. Irregular deviant stimuli of low probability which are presented in an otherwise regular event-sequence elicit an enhanced negativity with a peak latency of about 200 ms poststimulus (N200-component). If such stimulus changes are task-relevant, the N200 will be followed by an enhanced positivity having an onset latency of about 350 ms (P300-component) (e. g. 65-68). The N200-component seems to reflect stimulus evaluation processes which are sensitive to the probability of the eliciting events (for reviews, see 69; 70). For the visual modality, it has been shown that an enhanced N200 is accompanied by conscious detection of the stimulus deviation (the so-called N2c, see 69). The amplitude of the P300 component is sensitive to the subjective stimulus probability and to the task-relevance of the presented stimuli (for reviews see 71; 72). Despite the fact that N200 and P300 components are often elicited by similar experimental manipulations, their timing (N200 precedes P300) and their sensitivity to experimental variations suggest that both manifest different kinds of stimulus evaluation processes. For example, Gehring, Gratton, Coles and Donchin (67) showed that in a warned choice RT paradigm the N200-component for unpredictable stimuli was enhanced regardless of their location in the visual field, whereas the P300 amplitude was enhanced only if unexpected stimuli appeared at task-relevant locations. In light of this evidence the authors concluded that the N200 component reflects the evaluation of basic attributes of unexpected stimuli (i. e. their physical features), whereas the P300 component reflects the evaluation of more abstract stimulus features (e. g. their task relevance). Thus, it seems that both components are sensitive to deviations of the perceptual input from expectancies but they reflect mechanisms which evaluate functionally distinct aspects of stimulus contingencies.

5.4.2.2 ERP-Correlates of Response Preparation

The lateralized readiness potential (LRP) is regarded as an index of the selection and activation of responses (73). It is derived from the readiness potential (RP), a slow negativity that emerges some time before movement onset and which rises gradually to its maximum just before movement execution (74). The RP preceding voluntary finger and hand movements is larger contralateral to the executing hand. This asymmetry of the RP seems to start after the selection of the responding hand (75). De Jong, Wierda, Mulder and Mulder (76) and Gratton, Coles, Sirevaag, Eriksen and Donchin (77) independently proposed a method to exclude asymmetries which are not related to the movement. This is achieved by first averaging the RP separately for left and right hand movements. Second, the signals of contra- and ipsilateral electrodes are subtracted for left and right hand movements, and the two resulting difference waves are finally averaged. The resulting measure is known as the LRP. One important property of the LRP which follows from its computation is that the LRP-amplitude is related to the correctness of a response. Selection and activation of the correct response results in a negative, selection of the incorrect response in a positive deflection of the LRP. The LRP is functionally related to response preparation (e. g. 67; 78). For example, Gratton, Coles, Sirevaag, Eriksen and Donchin (77) asked subjects to respond with their left or right hand, respectively, as fast as possible to one of two imperative stimuli primed by a warning tone. They found a close relationship between the correctness of the responding hand and the polarity of the prestimulus LRP for fast responses (fast guesses), i. e. for correct reactions the prestimulus LRP was negative whereas for incorrect reactions the LRP-amplitude was positive. Furthermore, it has been shown that LRP-onset latency is closely linked to EMG onset and RT, i. e. LRP onset seems to indicate the initiation of a central motor command (see 73 for an overview).
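A minimal sketch of this double subtraction (Python/NumPy; the electrode labels C3'/C4' and all variable names are illustrative, and the random data merely stand in for trial-averaged waveforms):

import numpy as np

def lrp(c3_left, c4_left, c3_right, c4_right):
    """Lateralized readiness potential from trial-averaged waveforms.

    c3_*/c4_* are 1-D arrays (time points) recorded over the left (C3') and
    right (C4') motor cortex, averaged separately for left-hand and right-hand
    responses. The double subtraction removes asymmetries that are unrelated
    to the responding hand."""
    diff_left_hand = c4_left - c3_left     # contralateral minus ipsilateral, left-hand responses
    diff_right_hand = c3_right - c4_right  # contralateral minus ipsilateral, right-hand responses
    return 0.5 * (diff_left_hand + diff_right_hand)

# Toy usage with random data in place of averaged ERPs (250 time points).
rng = np.random.default_rng(0)
lrp_wave = lrp(*[rng.standard_normal(250) for _ in range(4)])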

5.4.2.3 ERP-Studies of Sequence Learning

To date, there are two experiments in which implicit and explicit sequence learning was studied by means of ERPs. Eimer, Goschke, Schlaghecken and Stürmer (8) recorded ERPs while subjects performed a variant of the SRT-task. Four capital letters (A, B, C, D) were presented in a repeated 10-element sequence on a computer monitor and subjects had to press a corresponding key for each stimulus. Standard letters were occasionally replaced by deviating letters which required a response with the opposite hand. Subjects were categorized as implicit or explicit learners according to their performance in postexperimental free recall and recognition tests. Both subject groups learned the underlying regularities as reflected by the well-known RT-advantage for structured versus random blocks and a difference in RT for standard and deviant letters which evolved in the course of the experiment. In two studies which only differed in the number of interspersed deviant stimuli it could be shown that deviating letters elicited a larger negativity 240-340 ms poststimulus compared to regular letters (N200-effect). This effect was significantly larger in the second experimental half and only present for subjects possessing some explicit sequence knowledge. Furthermore, a slight enhancement of the P300-amplitude for deviant compared to regular stimuli was found for all subjects. The authors concluded that the N200-component may reflect the amount of consciously available knowledge about stimulus regularities. The validity of this conclusion depends on how well verbalizable sequence knowledge was assessed. As Eimer, Goschke, Schlaghecken and Stürmer (8) asked their subjects after the first half of the experiment whether they had noted any structural regularities or not, it might be that participants overtly searched for such regularities in the second half. The LRP to standard stimuli revealed significant activation of the correct response 200 ms after stimulus onset in the first experimental half whereas in the second half a significant LRP-onset was present as soon as 0-100 ms after letter onset. Additionally, in the second experimental half a significant activation of the incorrect response was found for deviants preceding the correct reaction. This LRP-onset effect suggests that sequence knowledge is encoded in the form of motor representations. Rüsseler and Rösler (52) introduced a second type of deviant in a similar experiment to disentangle perceptual and motor processes during sequence learning. Letters were presented in an eight-element repeating sequence (VLKTXSMR). Subjects had to respond to a letter appearing centrally on a computer screen with a lift of the appropriate finger (M or T: left middle finger, V or R: left index finger, X or K: right index finger, L or S: right middle finger). In each replication of an otherwise repeating sequence one of two types of deviating stimuli replaced one of the regular letters: Perceptual deviants changed the perceptual, but preserved the response sequence, whereas motor deviants changed both the perceptual and the response sequence (for example, a perceptual deviant was created by replacing the letter L by S (VSKTXSMR), a motor deviant by replacing L by M, T, V or R, which required a response with the opposite hand). Subjects performed 38 blocks of 96 letters each. In blocks 1-4, 20, and 36 the letters were presented as a random series, in all remaining blocks the regular letter sequence was presented with one randomly located deviant. Upon completion of the experimental blocks participants performed a free recall and a recognition test (rating of 10 bigrams, trigrams and quadruples on a five-point scale) to assess their amount of explicit, verbalizable sequence knowledge. Ten subjects were categorized as implicit learners (no or only little verbalizable sequence knowledge) and the remaining nine subjects as explicit learners (they had reproduced at least five consecutive letters correctly).
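How the two deviant types follow from the finger mapping described above can be made explicit in a small sketch (Python; the encoding of the mapping and the function name are illustrative):

import random

SEQUENCE = list("VLKTXSMR")
# Finger assignment as described in the text: (hand, finger) for each letter.
FINGER_MAP = {
    "M": ("left", "middle"), "T": ("left", "middle"),
    "V": ("left", "index"),  "R": ("left", "index"),
    "X": ("right", "index"), "K": ("right", "index"),
    "L": ("right", "middle"), "S": ("right", "middle"),
}

def make_deviant(position, kind):
    """Replace the letter at `position` within one sequence cycle.

    Perceptual deviants keep the required finger (response sequence intact);
    motor deviants require the opposite hand (response sequence violated)."""
    cycle = SEQUENCE.copy()
    standard = cycle[position]
    hand, finger = FINGER_MAP[standard]
    if kind == "perceptual":
        candidates = [letter for letter, (h, f) in FINGER_MAP.items()
                      if (h, f) == (hand, finger) and letter != standard]
    else:  # motor deviant
        candidates = [letter for letter, (h, _) in FINGER_MAP.items() if h != hand]
    cycle[position] = random.choice(candidates)
    return "".join(cycle)

print(make_deviant(1, "perceptual"))  # always VSKTXSMR (S shares L's finger)
print(make_deviant(1, "motor"))       # e.g. VTKTXSMR (an opposite-hand letter)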

Fig. 5.1: RTs from Rüsseler and Rösler (1997) for explicit and implicit learners, separately for the first and second experimental half and for standard, perceptual-deviant and motor-deviant letters, respectively. Note the shortening of RT for standard letters in the second half for both groups and the difference between standards and perceptual as well as motor deviants in the second experimental half for the explicit group.


Both subject groups learned the sequential regularities as reflected by a continuously decreasing RT for standard letters. Explicit learners showed a reliable increase of RT for perceptual as well as for motor deviants compared to standard letters, whereas implicit learners showed a difference between standards and motor deviants only (see Fig. 5.1). These results suggest that in subjects who possess verbalizable sequence knowledge, two different processes contribute to their performance improvement: The formation of stimulus-stimulus associations (S(t-1) - S(t)) and of response-response associations (R(t-1) - R(t)), respectively. In contrast, implicit learners seem to form response-response associations only. These conclusions are further supported by the ERP-data: Explicit learners showed an enhanced negativity 250-350 ms poststimulus for perceptual and motor deviants (N200-effect) and an enhanced positivity 450-550 ms poststimulus for motor deviants only (P300-effect). In contrast, implicit subjects showed no effects of stimulus deviance in the ERP (see Fig. 5.2). The detection of perceptual deviance which is reflected in the N200-effect for subjects acquiring verbalizable knowledge replicates the finding of Eimer et al. (1996), and strongly suggests that stimulus-stimulus associations are formed.

Fig. 5.2: ERPs at central, parietal and occipital electrodes of the second experimental half from Rüsseler and Rösler (1997) for explicit (left side) and implicit learners, respectively. Note the absence of an effect of stimulus deviance for the implicit group and the enhanced negativity for perceptual as well as motor deviants 250-350 ms poststimulus (N200 latency range) for the explicit group.


LRPs did not differ between the two groups, but nevertheless reflected the acquisition of the sequential structure. LRP-onset latency for standard letters was shortened from the first to the second experimental half (441 versus 367 ms). Furthermore, for motor deviants a slight but reliable activation of the incorrect response (positive LRP) was found for both groups in the second half. This suggests formation of response-response associations for both groups of subjects. To summarize: These studies have yielded evidence that sequence learning can be characterized as learning of stimulus sequences as well as learning of response sequences, whereby the former process may be restricted to the mode of explicit learning. Further research should focus on a more detailed analysis of these processes, their interactions, and their relation to the awareness status of the subjects.

5.4.3 Connectionist Models of Sequence Learning

Two connectionist models have been developed which simulate human performance in the SRT-task (5; 79-81). Both models assume that sequences are learned by means of high-level associations between combinations of the current and previous stimuli and/or responses. It has been shown that a learning mechanism which only encompasses pairwise associations of stimuli is not sufficient, because sequences that do not contain first order but only higher order dependencies can be learned by human subjects (e. g. 28). Therefore, computational networks that model human SRT-task performance have to be able to learn higher order associations. For example, in a sequence like 1-2-3-2-1-3, the network has to learn that 1-2 is followed by 3, whereas 3-2 is followed by 1. In a model proposed by Cleeremans and McClelland (5) this is realized by introducing a Simple Recurrent Network (SRN) consisting of input units, context units, one hidden layer and output units. The hidden layer feeds back onto the context units which thus provide information about the preceding stimuli. This model closely fits data obtained in experiments with human subjects. The results of simulation studies give support to the idea that the underlying learning mechanism is of an associative nature. An inductive mechanism which represents sequence knowledge in a more abstract rule-based format does not seem to be a prerequisite for this type of lawful behavior.
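A toy implementation in the spirit of the SRN just described is sketched below (Python/NumPy); the network size, learning rate, training regime and the example sequence are illustrative assumptions, not the parameters used by Cleeremans and McClelland (5).

import numpy as np

rng = np.random.default_rng(1)

# Elman-style Simple Recurrent Network for next-element prediction.
N_IN, N_HID, N_OUT = 4, 10, 4                 # four stimulus positions, one-hot coded
W_ih = rng.normal(0.0, 0.5, (N_HID, N_IN))    # input -> hidden
W_ch = rng.normal(0.0, 0.5, (N_HID, N_HID))   # context -> hidden
W_ho = rng.normal(0.0, 0.5, (N_OUT, N_HID))   # hidden -> output
LEARNING_RATE = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

sequence = [0, 1, 2, 1, 0, 2]   # a hierarchic sequence (0-indexed positions)
context = np.zeros(N_HID)       # copy of the previous hidden state

for step in range(20000):
    current = sequence[step % len(sequence)]
    target = sequence[(step + 1) % len(sequence)]
    x = np.eye(N_IN)[current]
    t = np.eye(N_OUT)[target]

    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)

    # One backpropagation step; the context is treated as a fixed extra input,
    # i.e. errors are not propagated back through time.
    err_out = (output - t) * output * (1.0 - output)
    err_hid = (W_ho.T @ err_out) * hidden * (1.0 - hidden)
    W_ho -= LEARNING_RATE * np.outer(err_out, hidden)
    W_ih -= LEARNING_RATE * np.outer(err_hid, x)
    W_ch -= LEARNING_RATE * np.outer(err_hid, context)

    context = hidden            # the hidden layer feeds back as the next context state

# With sufficient training the network can predict the successor of an ambiguous
# element (e.g. '2') differently depending on its predecessor, i.e. it picks up
# higher-order structure from purely associative weight updates.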

5.5 Conclusions

Research with the SRT-paradigm has yielded a large body of evidence that human subjects are able to learn the structure of event sequences without concurrent awareness of the acquired rules. Many studies tried to characterize the underlying learning processes in more detail. Experiments employing the dual-task methodology provided evidence that performance in sequence learning tasks depends on a unitary learning mechanism. However, the effectiveness of this mechanism is modulated by attentional allocation strategies.


Studies of implicit serial learning in clinical populations with functional deficits of the brain stress the importance of response-response associations: Amnesic patients show no learning deficit (e. g. 4) whereas patients with deficits in motor control functions (Parkinson's disease) are clearly impaired in serial learning (e. g. 43). Moreover, different lines of evidence support the position that implicit serial learning is by and large a learning of response associations. This follows, among other things, from transfer experiments with healthy subjects (59), from experiments which change stimulus aspects while leaving response aspects of the task unchanged (62), from studies which introduce deviants in an otherwise regular sequence of letters (61), and from experiments employing ERPs (8; 52). In contrast, explicit learning seems to be characterized by the fact that both stimulus-stimulus and response-response associations are acquired (52). This follows, among other things, from the particular sensitivity of stimulus-locked ERP-components to irregularities in the perceptual sequence of events.

Acknowledgments

Preparation of this chapter was supported by the Berlin-Brandenburg Academy of Sciences (BBAW) and by the German Research Foundation (DFG, grant Ro 529/8-1). We thank Hansjerg Goelz, Kerstin Jost, Mustafa Oeczan and Bettina Rolke for their help in the presented experiment.

References

1. Roitblat, H. L. and von Fersen, L. (1992). Comparative cognition: Representations and processes in learning and memory. Annu. Rev. Psychol. 43, 671-710.
2. Terrace, H. S. and McGonigle, B. (1994). Memory and representation of serial order by children, monkeys and pigeons. Curr. Directions Psychol. Sci. 3, 180-185.
3. Roitblat, H. L. (1987). Introduction to animal cognition (New York: Freeman).
4. Nissen, M. J. and Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cogn. Psychol. 19, 1-32.
5. Cleeremans, A. and McClelland, J. L. (1991). Learning the structure of event sequences. J. Exp. Psychol. Gen. 120, 3, 235-253.

6. Cohen, A., Ivry, R. I., and Keele, S. W. (1990). Attention and structure in sequence learning. J. Exp. Psychol. Learn. Mem. Cogn. 16, 1, 17-30.
7. Curran, T. and Keele, S. W. (1993). Attentional and nonattentional forms of sequence learning. J. Exp. Psychol. Learn. Mem. Cogn. 19, 1, 189-202.
8. Eimer, M., Goschke, T., Schlaghecken, F., and Stürmer, B. (1996). Explicit and implicit learning of event sequences: Evidence from event-related brain potentials. J. Exp. Psychol. Learn. Mem. Cogn. 22, 4, 970-987.
9. Frensch, P. A. and Miner, C. S. (1994). Effects of presentation rate and individual differences in short-term memory capacity on an indirect measure of serial learning. Mem. Cogn. 22, 1, 95-110.

10. Frensch, P. A., Büchner, A., and Lin, J. (1994). Implicit learning of unique and ambiguous serial transitions in the presence and absence of a distractor task. J. Exp. Psychol. Learn. Mem. Cogn. 20, 3, 567-584.
11. Heuer, H. and Schmidtke, V. (1996). Secondary-task effects on sequence learning. Psychol. Res. 59, 119-133.
12. Howard, J. H. Jr., Mutter, S. A., and Howard, D. V. (1992). Serial pattern learning by event observation. J. Exp. Psychol. Learn. Mem. Cogn. 18, 5, 1029-1039.
13. Stadler, M. A. (1992). Statistical structure and implicit serial learning. J. Exp. Psychol. Learn. Mem. Cogn. 18, 2, 318-327.
14. Stadler, M. A. (1995). Role of attention in implicit learning. J. Exp. Psychol. Learn. Mem. Cogn. 21, 3, 674-685.
15. Willingham, D. B., Nissen, M. J., and Bullemer, P. (1989). On the development of procedural knowledge. J. Exp. Psychol. Learn. Mem. Cogn. 15, 6, 1047-1060.
16. Willingham, D. B., Greenberg, A. R., and Thomas, R. C. (1997). Response-to-stimulus interval does not affect implicit motor sequence learning, but does affect performance. Mem. Cogn. 25, 4, 534-542.
17. Berry, D. C. (1994). Implicit learning: Twenty-five years on. A tutorial. In: Attention and Performance XV: Conscious and nonconscious information processing, C. Umilta and M. Moscovitch, eds. (Cambridge), pp. 755-781.
18. Curran, T. (1995). On the neural mechanisms of sequence learning. Psyche 2, 2. URL: http://psyche.cs.monash.edu.au/volume2-1/psyche-95-2-12-sequence-1-curran.html
19. Reber, A. S. (1989). Implicit learning and tacit knowledge. J. Exp. Psychol. Gen. 118, 3, 219-235.
20. Reber, A. S. (1992). Implicit learning and tacit knowledge (New York, Oxford: Oxford University Press).
21. Seger, C. A. (1994). Implicit learning. Psychol. Bull. 115, 2, 163-196.

22. Shanks, D. R. and St. John, M. F. (1994). Characteristics of dissociable human learning systems. Behav. Brain Sci. 17, 367-447.
23. Nissen, M. J., Knopman, D. S., and Schacter, D. L. (1987). Neurochemical dissociation of memory systems. Neurology 37, 789-794.
24. Cherry, K. E. and Stadler, M. A. (1995). Implicit learning of a nonverbal sequence in younger and older adults. Psychol. Aging 10, 3, 379-394.
25. Howard, D. V. and Howard, J. H. Jr. (1989). Age differences in learning serial patterns: Direct versus indirect measures. Psychol. Aging 4, 3, 357-364.
26. Howard, D. V. and Howard, J. H. Jr. (1992). Adult age differences in the rate of learning serial patterns: Evidence from direct and indirect tests. Psychol. Aging 7, 2, 232-241.
27. Mayr, U. (1996). Spatial attention and implicit sequence learning: Evidence for independent learning of spatial and nonspatial sequences. J. Exp. Psychol. Learn. Mem. Cogn. 22, 2, 350-364.
28. Reed, J. and Johnson, P. (1994). Assessing implicit learning with indirect tests: Determining what is learned about sequence structure. J. Exp. Psychol. Learn. Mem. Cogn. 20, 3, 585-594.
29. Stadler, M. A. (1993). Implicit serial learning: Questions inspired by Hebb (1961). Mem. Cogn. 21, 6, 819-827.
30. Perruchet, P. and Amorim, M. A. (1992). Conscious knowledge and changes in performance in sequence learning: Evidence against dissociation. J. Exp. Psychol. Learn. Mem. Cogn. 18, 4, 785-800.
31. Willingham, D. B., Greeley, T., and Bardone, A. M. (1993). Dissociation in a serial response time task using a recognition measure: Comment on Perruchet and Amorim (1992). J. Exp. Psychol. Learn. Mem. Cogn. 19, 6, 1424-1430.
32. Perruchet, P. and Pacteau, C. (1990). Synthetic grammar learning: Implicit rule abstraction or explicit fragmentary knowledge? J. Exp. Psychol. Gen. 119, 264-279.
33. Pascual-Leone, A., Grafman, J., and Hallett, M. (1994). Modulation of cortical motor output maps during development of implicit and explicit knowledge. Science 263, 1287-1289.
34. Perruchet, P., Bigand, E., and Benoit-Gonin, F. (1997). The emergence of explicit knowledge during the early phase of learning in sequential reaction time tasks. Psychol. Res. 60, 4-13.
35. Roediger, H. L. and McDermott, K. B. (1993). Implicit memory in normal human subjects. In: Handbook of Neuropsychology, H. Spinnler and F. Boller, eds. (Amsterdam: Elsevier), pp. 63-131.
36. Schacter, D. L. (1993). Implicit memory: History and current status. J. Exp. Psychol. Learn. Mem. Cogn. 13, 3, 501-518.
37. Schacter, D. L., Chiu, C.-Y. P., and Ochsner, K. N. (1993). Implicit memory: A selective review. Annu. Rev. Neurosci. 16, 159-182.
38. Parkin, A. J. and Leng, N. R. C. (1993). Neuropsychology of the amnesic syndrome (Hillsdale, NJ: Lawrence Erlbaum Associates).
39. Arnold, S. E., Hyman, B. T., Flory, J., Damasio, A. R., and Van Hoesen, G. W. (1991). The topographical and neuroanatomic distribution of neurofibrillary tangles and neuritic plaques in the cerebral cortex of patients with Alzheimer's disease. Cereb. Cortex 1, 103-116.
40. Knopman, D. (1991). Long-term retention of implicitly acquired learning in patients with Alzheimer's disease. J. Clin. Exp. Neuropsychol. 13, 880-894.
41. Grafman, J., Weingartner, H., Newhouse, P. A., Thompson, K., Lalonde, F., Litvan, I., Molchan, S., and Sunderland, T. (1990). Implicit learning in patients with Alzheimer's disease. Pharmacopsychiat. 23, 94-101.

42. Knopman, D. and Nissen, Μ. J. (1991). Procedural learning is impaired in Huntington's disease: Evidence from the serial reaction time task. Neuropsychologia 29, 245-254. 43. Ferraro, R. F., Balota, D. Α., and Connor, L. T. (1993). Implicit memory and the formation of new associations in nondemented Parkinson's disease individuals and individuals with senile dementia of the Alzheimer Type: A serial reaction time (SRT) investigation. Brain Cogn. 21, 163-180. 44. Chapman, L. J., Chapman, J. P., Curran, T., and Miller, Μ. B. (1994). Do children and the elderly show heightened semantic priming? How to answer the question. Dev. Rev. 14, 159-185. 45. Squire, L. (1992). Memory and the hippocampus: A synthesis of findings with rats, monkeys and humans. Psychol. Rev. 99, 195-231. 46. Willingham, D. B. and Koroshetz, W. J. (1993). Evidence for dissociable motor skills in Huntington's disease patients. Psychobiology 21, 173-182. 47. Pascual-Leone, Α., Grafman, J., Clark, K., Stewart, M., Massaquoi, S., Lou, J. S., and Hallett, M. (1993). Procedural learning in Parkinson's disease and cerebellar degeneration. Ann. Neurol. 34, 594-602. 48. Jackson, G. M., Jackson, S. R., Harrison, J., Henderson, L., and Kennard, C. (1995). Serial reaction time learning and Parkinson's disease: Evidence for a procedural learning deficit. Neuropsychologia 33, 5, 577 — 593. 49. Grafton, S. T., Hazeltine, E., and Ivry, R. (1995). Functional mapping of sequence learning in normal humans. J. Cogn. Neurosci. 7, 4, 497-510. 50. Rauch, S. L., Savage, C. R., Brown, H. D., Curran, T., Alpert, Ν. M., Kendrick, Α., Fischman, A. J., and Kosslyn, S. M. (1995). A PET investigation of implicit and explicit sequence learning. Human Brain Map. 3, 271—286. 51. Zhuang, P., Toro, C., Grafman, J., Manganotti, P., Leocani, L., and Hallett, M. (1997). Event-related desynch-

Representation and Learning of Structure in Perceptuo-Motor Event-Sequences

52.

53.

54.

55.

56.

57.

58.

59.

60.

61.

62.

ronization (ERD) in the alpha frequency during development of implicit and explicit learning. Electroencephalogr. Clin. Neurophysiol. 102, 3 7 4 381. Rüsseler, J. and Rosier, F. (1997). Implicit and explicit learning of event sequences: Evidence for distinct coding of perceptual and motor representations. Manuscript submitted for publication. Cohen, Α., Wasserman, Α., and Soroker, N. (1997). Learning spatial sequences in unilateral neglect. Psychol. Res. 60, 4 2 - 5 2 . Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychol. Rev. 60, 98-112. Heuer, Η. (1996). Doppeltätigkeiten. In: Enzyklopädie der Psychologie C II 2: Aufmerksamkeit, O. Neumann and A. F. Sanders, eds. (Göttingen: Hogrefe), pp. 163-218. Brooks, L. R. (1967). The suppression of visualization in reading. Q. J. Exp. Psychol. 19, 289-299. Schmidtke, V. and Heuer, Η. (1997). Task integration as a factor in secondary-task effects on sequence learning. Psychol. Res. 60, 5 3 - 7 1 . Posner, Μ. I. and Rothbart, Μ. (1992). Attentional mechanisms and conscious experience. In: The neuropsychology of consciousness, A. D. Milner and M. D. Rugg, eds. (London, England: Academic Press), pp. 91-112. Keele, S. W„ Jennings, P., Jones, P., Caulton, D., and Cohen, A. (1995). On the modularity of sequence representation. J. Motor Behav. 27, 17-30. Stadler, Μ. Α. (1989). On learning complex procedural knowledge. J. Exp. Psychol. Learn. Mem. Cogn. 15, 6, 1061-1069. Nattkemper, D. and Prinz, W. (1997). Stimulus and response anticipation in a serial reaction task. Psychol. Res. 60, 98-112. Hoffmann, J. and Koch, I. (1997). Stimulus-response compatibility and sequential learning in the serial reac-

63.

64.

65.

66.

67.

68.

69.

70.

71.

137

tion time task. Psychol. Res. 60, 87 — 97. Ziessler, M. (1994). The impact of motor responses on serial-pattern learning. Psychol. Res. 57, 3 0 - 4 1 . Ziessler, M. (1997). Die Wirkung von Reaktions-Effekt-Beziehungen beim impliziten Sequenzlernen. Paper presented at the 39. Conference of Experimental Psychology (TeaP), Berlin, March 1997, 2 4 - 2 7 . Courchesne, E., Courchesne, Y., and Hillyard, S . A . (1978). The effect of stimulus deviation on P3 waves to easily recognized stimuli. Neuropsychologia 16, 189-199. Duncan-Johnson, C. and Donchin, E. (1982). The P300 component of the event-related brain potential as an index of information-processing. Biol. Psychol. 14, 1 - 5 2 . Gehring, W. J., Gratton, G., Coles, M. G. H., and Donchin, E. (1992). Probability effects on stimulus evaluation and response processes. J. Exp. Psychol. Hum. Percept. Perform. 18, 1, 198-216. Squires, K. C., Donchin, E., Herning, R. I., and Mc Carthy, G. (1977). On the influence of task relevance and stimulus probability on event-related potential components. Electroencephalogr. Clin. Neurophysiol. 42, 1 — 14. Pritchard, W. S., Shappell, S. Α., and Brandt, Μ. E. (1991). Psychophysiology of N200/N400: A review and classification scheme. Advances in Psychophysiology 4, 43—106. Ritter, W., Ford, J. M., Gaillard, A. K. W„ Harter, Μ. R., Kutas, M., Näätänen, R., Polich, J., Renault, B., and Rohrbaugh, J. (1984). Cognition and event-related potentials. 1. The relationship of negative potentials and cognitive processes. Ann. N Y Acad. Sei. 425, 2 4 - 3 8 . Donchin, E. and Coles, M. G. H. (1988). Is the P300-component a manifestation of context updating? Behav. Brain Sei. 11, 355-372.

138 72. Johnson, R. S. (19B8). The amplitude of the P300 component of the eventrelated potential: Review and synthesis. In: Advances in Psychophysiology: Vol. 3, R H. Ackles, J. R. Jennings, and M. G. H. Coles, eds. (Greenwich, CT: J AI press), pp. 62-138. 73. Coles, M. G. H. (1989). Modern mind brain reading: Psychophysiology, physiology and cognition. Psychophysiology 26, 3, 251—269. 74. Kornhuber, Η. H. and Deecke, L. (1965). Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential and reafferente Potentiale. Pflügers Archiv für die gesamte Physiologie 248, 1 - 1 7 . 75. Kutas, M. and Donchin, E. (1980). Preparation to respond as manifested by movement related brain potentials. Brain Res. 202, 95-115. 76. De Jong, R., Wierda, M., Mulder, G., and Mulder, L. J. M. (1988). Use of partial stimulus information in response processing. J. Exp. Psychol. Hum. Percept. Perform. 14, 682-692.

Jascha Rüsseler and Frank Rosier 77. Gratton, G., Coles, M. G. H., Sirevaag, E. J., Eriksen, C. W., and Donchin, E. (1988). Pre- and poststimulus activation of response channels: A psychophysiological analysis. J. Exp. Psychol. Hum. Percept. Perform. 14, 331-344. 78. Gratton, G., Bosco, C. M., Kramer, A. F., Coles, M. G. H., Wickens, C. D., and Donchin, E. (1990). Eventrelated brain potentials as indices of information extraction and response priming. Electroencephalogr. Clin. Neurophysiol. 75, 419-432. 79. Cleeremans, A. (1994). The representation of structure in sequence prediction tasks. In: Attention and Performance XV: Conscious and nonconscious information processing, C. Umilta and M. Moscovitch, eds. (Cambridge), pp. 783-809. 80. Cleeremans, A. (1997). Sequence learning in a dual-stimulus setting. Psychol. Res. 60, 7 2 - 8 6 . 81. Keele, S. W. and Jennings, P. J. (1992). Attention in the representation of sequence: Experiment and theory. Human Movement Studies 11, 125—138.

6. Imposing Structure on an Unstructured Environment: Ontogenetic Changes in the Ability to Form Rules of Behavior Under Conditions of Low Environmental Predictability

Peter A. Frensch, Ulman Lindenberger and Jutta Kray

6.1 Introduction

Learning is typically defined in terms of an organism's ability to modify its own internal knowledge structure in order to accommodate internal and/or external demands (e. g., 1). Often, the modification of the internal knowledge structure is directed toward increasing its consistency with the structure of the external environment, and indeed, most existing research on human and animal learning is concerned with how organisms come to internally represent systematic structural features of the environment (cf. 1; 2; contributions to this volume). By manipulating the features of the environmental structure, research looking into the link between environment and learning has generated a considerable amount of knowledge about the properties of learning mechanisms that are operative both in humans and in animals under different learning conditions. However, humans, at least, are also capable of dealing with unpredictable, unstructured environments in highly systematic ways; that is, they are capable of modifying their internal knowledge structure, i. e., of learning, even in the absence of environmental systematicity. Put differently, humans are able to form stable rules of behavior even when these rules of behavior are not anchored in the structure of the external environment. The main focus in the present chapter is on the human ability to learn, that is, to form internal rules of behavior, under conditions of low environmental structure. Clearly, learning under these kinds of conditions must be strongly influenced by the properties of the organism's information-processing system within which learning takes place. Consequently, individual differences in the formation of internal rules of behavior must reflect, to a large extent, individual differences in the human information-processing system. Our focus is on potential ontogenetic changes in the human ability to generate internal rules of behavior under conditions of low environmental structure. In the first phase of the research program described in this chapter, we want to demonstrate that there indeed exist age-related differences in the ability to learn, that is, to form internal rules of behavior, when the environment is unpredictable. In a second phase of our research, we attempt to trace age differences in learning under conditions of low environmental structure to age differences in the functioning of the information-processing system.


The chapter is divided into two conceptually distinct parts. In the first part (i. e., sections 6.2-6.4), we examine, both theoretically and empirically, potential age-related changes in the ability to generate internal rules of behavior under conditions of low environmental structure. In the second part (i. e., section 6.5), we consider whether this ability may be related to age differences in the functioning of the human information-processing system. Specifically, we empirically assess the relation between rule-formation ability and a general marker of cognitive functioning, namely performance on fluid intelligence tests (3-7). Sections 6.2 and 6.3 are devoted to a discussion of the existing literature in an attempt to determine whether there is theoretical reason to believe that the ability to form rules of behavior when the environment is unpredictable changes with age. Because a key requirement for the ability to generate internal rules of behavior when these rules are not represented in the environment is the ability to have control over one's own information processing (an ability we henceforth term 'cognitive control'), we first discuss the psychological reality of the cognitive control concept, and show, on the basis of both psychological and neuropsychological evidence, that the control of one's processing is functionally separable from the processes that are controlled. In the second section, we summarize existing research in the areas of cognitive development and cognitive aging that appears to demonstrate that cognitive control ability may diminish with increasing age. These findings imply, albeit weakly, the possibility of ontogenetic changes in the ability to generate internal rules of behavior under conditions of low environmental structure. In the third section, we introduce a new experimental paradigm that can be used to investigate age differences in the ability to generate internal rules under conditions of low environmental predictability, and describe the results of an initial experiment with this paradigm. In Section 6.5, we briefly discuss the nature of age-related decline in general information-processing functioning that has been described in the literature, and demonstrate that age-related changes in information-processing functioning, captured by measures of fluid intelligence, are empirically related to the ability to generate rules of behavior when the environment is unpredictable.

6.2 The Concept of Cognitive Control

A key requirement for the ability to learn, i. e., to form rules for the regulation of behavior when the environment is not structured, is that one is able to exert control over one's cognitive processing. In this section, we present a brief overview of the existing literature on the concept of "cognitive control" with the aim of demonstrating that such an ability does indeed exist in humans and is functionally separable from other types of cognitive processing. The term "cognitive control" is meant to encompass a whole variety of different regulatory processes that are logically required to ensure successful task completion. For the purposes of this chapter, we define "cognitive control" somewhat loosely as cognitive activity that is not directly concerned with the processing of task-relevant information, but is nevertheless necessary to achieve error-free performance on one or multiple tasks that are performed either in isolation or in parallel. The distinction between processes that are concerned with task-relevant information (for which we will use the term elementary processes henceforth) and control, or regulatory, processes that initiate, supervise, stop, and redirect elementary processes can be likened to the difference between basic procedures and control structures in many computer programming environments. Basic procedures store information in registers, read information out of registers, compare different pieces of information, modify information in some specified way, and so on. In contrast, the control structure regulates when basic procedures are to be performed, which procedures are performed in parallel, which ones serially, which ones need to be cascaded, which ones not, which particular piece of information is processed by which procedure, what to do if procedures fail, and so on. In cognitive-processing terminology, elementary processes encode environmental information, store and represent information in memory, retrieve information from memory, transform information, and so on. Control processes, or executive control processes, as they are sometimes called (e. g., 8; 9), are responsible primarily for scheduling and initiating elementary processes, for monitoring the processes while they are executed, and for evaluating the results of executed processes. Control processes may not be under conscious control but may operate automatically and without demanding attentional resources (10; 11).
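To make the programming analogy concrete, the sketch below (our own illustration, not taken from the authors or from any specific model) separates a handful of elementary operations from a controller that merely schedules, monitors, and evaluates them.

```python
# Hypothetical illustration of the analogy between elementary processes
# (basic procedures) and cognitive control (a control structure).

def encode(stimulus):
    """Elementary process: read information into a 'register'."""
    return {"item": stimulus}

def transform(rep):
    """Elementary process: modify information in a specified way."""
    return {"item": rep["item"].upper()}

def compare(a, b):
    """Elementary process: compare two pieces of information."""
    return a["item"] == b["item"]

def controller(stimuli, target):
    """Control structure: schedules, monitors, and redirects the elementary
    processes; it does no stimulus processing of its own."""
    for stimulus in stimuli:
        rep = encode(stimulus)       # initiate an elementary process
        rep = transform(rep)         # decide which process runs next
        if compare(rep, target):     # evaluate the result of the executed process
            return stimulus          # stop: goal reached
    return None                      # redirect: nothing matched

if __name__ == "__main__":
    print(controller(["dog", "cat"], {"item": "CAT"}))  # -> "cat"
```

The point of the sketch is only that the controller can be changed (e.g., to run the operations in a different order, or to stop on failure) without touching the elementary operations themselves, and vice versa.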

6.2.1 The Psychological Reality of Cognitive Control

The basic distinction between cognitive control processes and elementary (i. e., to-be-controlled) processes has been part of virtually every general information-processing theory of the cognitive system since the early 1960s (e. g., 12; 13). Atkinson and Shiffrin (12) seem to have introduced the notion of cognitive control into modern information-processing models of cognition by distinguishing between memory structures, on the one hand, and control processes, on the other hand. The distinction between control processes and memory processes, writes Shallice (14), "was introduced by the analogy of the relation between what a human programmer writes at a remote console and the computer hardware and built-in program that the written program controls" (p. 395). Control processes were defined as processes that "are not permanent features of memory, but are transient phenomena under the control of the subject" (12, p. 106). As a more recent example of the many theories in which the assumption of cognitive control processes figures prominently, consider the model of working memory that was developed by Baddeley and Hitch (15), and has served as a catalyst for much of the recent research on working/short-term memory. Baddeley and Hitch argue that working memory consists of multiple, functionally independent subsystems. Two of the subsystems serve primarily to manipulate speech-based information (phonological loop) and visual information (visuo-spatial sketch pad). These systems are capable of storing information and of performing relatively simple elementary processes on phonological and visual information (e. g., rehearsal). The Central Executive is a controlling attentional system that supervises and coordinates the processing in these two "slave" systems. Given the complexity of the Central Executive and the relative simplicity of the phonological loop and visuo-spatial sketch pad, it is perhaps not surprising that while our understanding of the basic processing in the two slave systems has increased rather sharply over the past twenty years, the same cannot be said for our understanding of the Central Executive. Specifically, it is not known at this time which processes are and are not under the control of the Central Executive, nor is it known how control processes change with practice or age.

6.2.1.1 Psychological support

Empirical studies that speak to the functional independence of control and elementary processes come from two main lines of research: experimental studies employing a dual-task methodology and neuropsychological research. A large number of dual-task studies have addressed the general question of whether one particular cognitive control mechanism, namely the coordination of two tasks that are performed concurrently, is functionally independent of the ability to perform the individual tasks, or, put differently, whether there exists a general time-sharing ability. Early research on this question was characterized by largely factor-analytic approaches in which the main question of interest was whether subject variance in combined (Tasks A and B) task performance could be predicted by variance on the separate tasks, A and B. If this was the case, so the reasoning went, then a time-sharing ability did not exist. If, on the other hand, unique variance on the combined tasks could be demonstrated, then it could be concluded that time-sharing ability was independent of the ability to perform the tasks in isolation. Early factor-analytic studies on time-sharing have yielded generally mixed results on the question of whether a time-sharing ability is independent of the ability to perform the tasks in isolation, and have been severely criticized in recent years on methodological grounds. Ackerman, Schneider, and Wickens (16), for example, have argued that unique variance on dual-task performance may reflect unreliable measurement error, rather than time-sharing ability. The few recent studies that have successfully escaped this criticism (e. g., 17-21) have generally been in support of a time-sharing ability. Yee et al. (21), for instance, asked subjects to simultaneously perform a verbal and a visual task (verbal and auditory in a follow-up experiment) at two times of measurement. At Time 1, both tasks were performed in isolation as well. Yee et al. (21) then used a regression approach to partial out variance due to performance on the individual tasks from the variance on the combined tasks obtained at the two times of measurement. The two resulting residual scores were reliably correlated and, in addition, correlated reliably with the residual scores obtained from a second procedure in which a verbal and an auditory task were performed simultaneously. Yee et al.'s (21) findings indicate that the simultaneous execution of two tasks requires an ability that is independent of the ability to perform the individual tasks in isolation. These results provide indirect support for the functional independence of elementary processes and at least one particular control process, coordination.
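A minimal sketch of this residual logic, with simulated data and hypothetical variable names of our own (the original analyses are not reproduced here): dual-task scores are regressed on the two single-task scores, and the residuals from two measurement occasions are then correlated.

```python
# Hypothetical sketch of the residual approach: partial single-task variance
# out of dual-task performance and correlate the residuals across occasions.
import numpy as np

def residual_dual_task_score(single_a, single_b, dual):
    """Residual of dual-task scores after regressing them on both single-task scores."""
    X = np.column_stack([np.ones_like(single_a), single_a, single_b])
    beta, *_ = np.linalg.lstsq(X, dual, rcond=None)
    return dual - X @ beta

rng = np.random.default_rng(0)
n = 40                                    # hypothetical sample size
single_a, single_b = rng.normal(size=(2, n))
time_sharing = rng.normal(size=n)         # simulated common time-sharing ability
dual_t1 = single_a + single_b + time_sharing + rng.normal(scale=0.5, size=n)
dual_t2 = single_a + single_b + time_sharing + rng.normal(scale=0.5, size=n)

res_t1 = residual_dual_task_score(single_a, single_b, dual_t1)
res_t2 = residual_dual_task_score(single_a, single_b, dual_t2)

# A reliable correlation between the residuals is the pattern taken as evidence
# for a time-sharing ability that is separate from single-task skill.
print(round(np.corrcoef(res_t1, res_t2)[0, 1], 2))
```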


The simultaneous control of two tasks places an additional burden on the cognitive system even when the two tasks are selected such that interference is minimized (e. g., 22). For example, McLeod (23) gave his participants lengthy practice on a pursuit tracking task in which they were required to keep a spot of light in contact with a target at the same time as they were responding by saying "high" or "low" to a high- or low-pitched tone. Detailed monitoring of the tracking performance showed no specific effect of the concurrent tone reaction response, indicating that interference between the tasks was minimal. Nonetheless, simply knowing that the next trial was one in which a tone response might be required caused a small amount of general impairment in tracking skill, even when no response was actually made. Following a later and more detailed study, Shallice, McLeod, and Lewis (24) concluded that even when interference is minimized and participants are highly practiced, a decrement of around 10% in performance occurs as a result of the requirement to perform two tasks simultaneously. Most dual-task studies simply demonstrate that the simultaneous control of two tasks results in overall performance decrements that are not linearly predicted by performance on the two tasks in isolation. A study by Kelso, Southard, and Goodman (25) goes one step further by showing that the need to control two tasks simultaneously may affect the way in which the component tasks are performed. In this study, participants were required to reach out and touch targets of varying sizes at various distances with their right and left hand. As is well known from Fitts' Law (26), the time it takes to strike a target increases as it becomes smaller or is moved further away. However, when participants are asked to touch a small and distant target with the left hand and a large target that is close by with the right hand, then the right hand is slowed down such that the two hands hit the targets at the same time. This "coupling effect" very nicely demonstrates how cognitive control processes may impose themselves on elementary processes.
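For reference, Fitts' Law is commonly written in the following form, where MT is movement time, D is the distance to the target, W is the target width, and a and b are empirically fitted constants (the exact formulation varies somewhat across studies):

```latex
% Fitts' Law (one common formulation): movement time grows with the
% index of difficulty, log2(2D/W).
MT = a + b \log_2\!\left(\frac{2D}{W}\right)
```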

6.2.1.2 Neuropsychological support

Perhaps the most persuasive evidence in support of the functional difference between control and elementary processes comes from neuropsychological patients suffering from the frontal-lobe syndrome. The frontal lobes of the brain have over the years constituted one of the most fascinating and yet frustrating puzzles in neuropsychology (27). Whereas earlier investigators (28; 29) did not observe significant deficits in patients with frontal-lobe damage, more recently obtained evidence points to the frontal lobe's crucial role in planning, organizing, and controlling cognition and action (e. g., 30). Relative to other brain regions, the size of the frontal lobes reaches a maximum in humans, where they comprise about 40% of the total brain volume (31). As one would expect from a system serving regulatory functions, the frontal lobes have a very rich system of connections with lower levels of the brain and with virtually all other parts of the cortex (32). Many frontal-lobe patients show deficits on tasks that pose high demands on cognitive control. Two typical examples are the Wisconsin Card Sorting Test (WCST) and tests of verbal fluency. In the WCST, patients are given a pack of cards where each card contains a pattern that is made up of various shapes that vary in color, size, and surround. Patients are instructed to sort the cards into piles on the basis of a rule. Once the patients are able to follow the rule without making errors, the rule is changed. Normal subjects can sort the cards according to different rules relatively easily and quickly, without making many errors. Patients with frontal-lobe damage, however, tend to learn the first rule, and then appear unable to switch to a new rule, demonstrating a high number of errors that tend to be perseverations based on the first rule. The patients' perseverations are easily explained in terms of a cognitive control deficit (30). That is, patients appear to have lost the capacity to interrupt and change ongoing activity. Once a particular task strategy has been selected and implemented, it continues to run and cannot be modified. Interestingly, many of the frontal-lobe patients who show this pattern also exhibit a deficit that has been termed "utilization behavior" (33). It appears that for these patients, environmental stimuli often take over the functional role of that part of the cognitive control system that initiates action. For example, when a glass is placed on a table in front of these patients, the patients will grasp it. If a bottle of water is then placed next to the glass, they will seize it, fill the glass, and drink the water, regardless of whether it is appropriate to do so or not (34). Such behavior is rarely observed in normal subjects or patients with damage to different parts of the brain, thus supporting the assumption that the frontal lobes are involved in the cognitive control of action. Patients with frontal-lobe damage also show remarkable deficits on verbal fluency tasks where subjects are asked to produce as many words as possible from a given category, such as furniture, or words beginning with a given letter, such as B. Frontal-lobe patients are typically able to produce only three to four items per minute, whereas normal subjects produce at least a dozen (35). Patients also frequently make mistakes and produce words that do not belong to the required category or repeat the same words several times. Again, the patients' difficulties in the verbal fluency tasks can be explained if it is assumed that parts of the frontal lobes are dedicated to cognitive control. The task is difficult presumably because no routine, overlearned program for generating items exists that could be run off without control. Instead, the patients need to set up a retrieval strategy and monitor that the produced items come from the correct category and are not repetitions. The frontal lobes also seem to play an important role in the ability to inhibit goal-irrelevant information (36). The classical test to assess this ability is the Stroop Color-Word Test (37), where the participant is asked to name the color of the ink in which an incongruent color-name is written (e. g., the word red printed in green ink). Performance on the Stroop, and on other tests of inhibition such as the Gottschaldt Hidden Figures Test (38), is differentially impaired in patients with frontal-lobe damage (39). In line with this observation, recent analyses of event-related brain potentials in monkeys and normal humans strongly suggest that activity in the prefrontal cortex is related to response inhibition (36; 40; 41), and that anterior regions of the cortex with high connectivity to the frontal lobes may be involved in a neural process for error detection and compensation (42).


In summary, there exists both experimental and neuropsychological evidence supporting the functional independence, or separability, of elementary processes and at least some control mechanisms. Furthermore, in theoretical models of the cognitive system, or parts thereof, such as Baddeley and Hitch's (15) model of working memory, control and elementary processes are typically separated. Together, these findings allow for the possibility that the developmental trajectories of control and elementary processes are different.

6.3 Age Differences in Cognitive Control

In this section, we examine whether current knowledge about the age gradients for some cognitive control processes is at least compatible with the idea that cognitive control ability varies with age. We first present some information on frontal-lobe development because current evidence and theorizing emphasize that the frontal lobes are implicated in cognitive control processes. We then review the evidence on two types of control processes, separation/coordination and inhibition/monitoring. As argued above, the finding of age differences in cognitive control would be suggestive of ontogenetic changes in the ability to generate internal rules of behavior under conditions of low environmental structure.

6.3.1 Frontal-Lobe Development

Of all regions of the brain, the frontal lobes are the last and the slowest to develop. The area occupied by the frontal lobes increases rapidly during the first two years after birth, then again from about four to seven years of age, and reaches its final size during adolescence. In terms of connectivity, a period of synapse accumulation reaching its peak at two years of age is followed by a period of progressive synapse elimination, or pruning, that is not completed before late adolescence. Moreover, myelination of the frontal cortex occurs relatively late and continues throughout the teenage years (43). With respect to the other end of life, certain areas of the frontal lobes, such as the prefrontal cortex, appear to be among the first to show neuroanatomical and functional signs of aging-induced deterioration (44-47).

6.3.2 Coordination and Separation

Evidence in the fields of both cognitive aging (for a summary, see 48) and child development (49; 50) suggests that the magnitude of age effects increases as a function of task difficulty. In other words, both for children and older adults, differences from younger adults are generally larger with more difficult tasks. As a corollary, age by task-difficulty interactions in response times often disappear when scaled in a proportional metric. For example, 10-year-old children and 65-year-old adults may, on average, respond 1.6 times slower than young adults, which means that absolute differences in latencies are larger under more difficult experimental conditions. Salthouse (51) and others (52; 50) have argued that this empirical regularity is easily explained in terms of age differences in the average duration of processing steps. We concur in this judgment. However, the regularity is equally well explained by the cognitive-control hypothesis if we assume that cognitive control demands, on average, tend to increase with task difficulty. Thus, without further specification, the fact that more difficult tasks produce larger age effects does not discriminate between speed-of-processing and cognitive-control accounts. To allow for differential predictions, one possibility is to distinguish two components of task difficulty: amount of processing (i. e., number of processing steps) and complexity of processing (i. e., cognitive control demand). Two studies in the field of cognitive aging allow for such a distinction (53; 54). Charness and Campbell (53) found that control costs associated with coordinating the components of a complex mental-calculation algorithm increased with age and that age differences in coordination costs were not attenuated by practice. Mayr and Kliegl (54; see also 55) investigated age differences in figural reasoning under task conditions that differed in coordination demands (high versus low). At equal levels of difficulty across task conditions (operationally defined on the basis of young adults' latencies), age differences were more pronounced when coordination demands were high. Thus, the results of both studies suggest that the costs associated with the coordination of information within complex tasks are especially high for older adults. In contrast, studies using a dual-task paradigm to investigate age differences in coordination/separation ability have provided a more mixed picture. When age differences in single-task performance were eliminated by individualized adjustment of difficulty levels, concurrence costs (i. e., the drop in performance when performing the two tasks together) were sometimes equivalent in young and old adults (56) and sometimes greater in older adults (57-59). When no adjustments were made, proportional adult age differences were of about the same magnitude under single- and dual-task conditions (60), which would mean that young and old adults experience concurrence costs of equivalent magnitude. Similar discrepancies in results are found in the child-developmental literature (61; 62). In our opinion, the apparent invariance in concurrence costs across age found in some of the age-comparative dual-task studies should be interpreted with caution. First, separation (i. e., the ability to keep two tasks as separate as possible) may be easier than coordination in the narrow sense (i. e., the ability to interconnect two concurrently performed tasks), especially if the two tasks involve different modalities. Second, standard dual-task paradigms vary the crucial source of difficulty (i. e., cognitive control demands) at only two levels by comparing single- with dual-task performance. Clearly, control difficulty needs to be manipulated across a wider range to obtain more conclusive evidence.
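As a worked illustration of why a constant proportional slowing factor produces larger absolute age differences under more difficult conditions (the latencies below are hypothetical):

```latex
% Hypothetical latencies under a constant slowing factor of 1.6:
\text{easy task: } 1.6 \times 500\ \text{ms} = 800\ \text{ms}, \quad \Delta = 300\ \text{ms};
\qquad
\text{hard task: } 1.6 \times 1000\ \text{ms} = 1600\ \text{ms}, \quad \Delta = 600\ \text{ms}.
```

The ratio stays at 1.6 in both cases, so the age by difficulty interaction disappears in a proportional metric even though the absolute difference doubles.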


6.3.3 Inhibition and Monitoring

Using the Wisconsin Card Sorting Test (WCST), Chelune and Baer (63) observed that the tendency to commit perseveration errors decreased during middle childhood. Similar results were obtained with tests of field independence, such as the Embedded Figures Test (EFT; 64), where participants perceive conflicting cues and have to overcome the tendency to respond to the more salient cue to make a correct response. Children's performance on the EFT improves until it reaches an asymptote in the middle teenage years (65). Finally, inhibition ability as measured by the Stroop Color-Word Test (37) shows gradual performance increments during childhood and adolescence up to early adulthood (66-68). These findings are consistent with the claim that cognitive control processes, such as inhibition ability, promote developmental change in fluid intelligence. To obtain stronger evidence, one would need to know more about the relationship between measures of fluid intelligence and inhibition ability. Unfortunately, child developmentalists have rarely looked at such measures in combination. A notable exception is a study by Case and Globerson (69), who found a rather high correlation between the Children's Embedded Figures Test and Raven's Colored Progressive Matrices in a sample of forty-three 7.6- to 8.6-year-old children (r = .40; corrected for attenuation, r = .53). In the field of cognitive aging, an increasing body of experimental research points to pronounced age differences in memory updating (70-72), memory for source and context information (70; 73-76), memory for processing errors (77), and inhibition ability (78-81). Moreover, inhibition and monitoring abilities as assessed by standard procedures, such as the Wisconsin Card Sorting Test (82; 83), the Embedded Figures Test (84; 85), and the Stroop Color-Word Test (66; 86; 87), show first signs of decline in the late 50s and relatively pronounced decrements thereafter.

6.3.4 Summary

With respect to cognitive control processes presumably supported by the frontal lobes, the available evidence points to a decrease of coordination and separation abilities in old age, to an increase of inhibition and monitoring abilities during childhood, and to a decrease of these abilities in old age. Together, these findings allow for the possibility that the ability to form internal rules of behavior in the absence of environmental structure, as an ability that requires direct cognitive control, might change with age as well. The available evidence also allows for the possibility that age-related changes in this ability might be related to age-related differences in general cognitive functioning. In the next section, we introduce a new experimental paradigm that can be used to empirically assess age differences in the ability to learn under conditions of low environmental predictability, and describe the results of an initial experiment with this paradigm. In sections 6.5 and 6.6, we are concerned with the relation between age differences in learning under low environmental systematicity and age differences in general cognitive functioning, as assessed by measures of fluid intelligence.


6.4 Age-Related Changes in the Ability to Form Rules of Behavior Under Conditions of Low Environmental Predictability

The main purpose of the experimental task described below was to allow for an assessment of age differences in learning in a situation where the environment changes randomly and thus unpredictably. In this section, we first briefly describe the experimental task. Then, we discuss the main results obtained with a first experiment using this task.

6.4.1 The Continuous Monitoring Task

Figure 6.1 shows the basic setup of the Continuous Monitoring Task (CMT) from the viewpoint of a research participant. When performing the CMT, participants were seated in front of a computer screen. On the screen, two half-circles were displayed, one located spatially above the other. Two features of the upper half-circle, its size and its brightness, were controlled by the computer; the corresponding features of the lower half-circle were controlled by the research participant.

Fig. 6.1: The Continuous Monitoring Task (CMT). A computer-controlled stimulus is displayed above a subject-controlled stimulus; the participant adjusts the latter with levers for size and brightness.

The computer-controlled half-circle was modified continuously in real time on the two dimensions, size and brightness. The modifications of each of the two dimensions occurred randomly, discretely, and unpredictably. Furthermore, the changes sometimes occurred in synchrony, that is, changes in size and brightness occurred simultaneously, and sometimes in temporal succession, that is, with a time lag. Whether the changes occurred in synchrony or in succession at any point in time was unpredictable as well. Participants were instructed to continuously adjust either one (i. e., 1D task) or both (i. e., 2D task) of the two dimensions, size and brightness, of the
participant-controlled half-circle to isomorphic changes in the computer-controlled half-circle. In essence, participants' task was a one-dimensional or a two-dimensional tracking task. Adjustment of the participant-controlled half-circle was achieved via two levers, one for size, one for brightness, that could be moved continuously. Both the changes in the computer-controlled stimulus and the responses made by the participants were continuously assessed by the computer and stored for analysis. Two additional characteristics of the CMT are important. First, the computer-controlled stimulus always changed on both dimensions, size and brightness, even when participants' task was a one-dimensional tracking task. This was done to ensure that differences between participants' one-dimensional and two-dimensional performances would not reflect differences in the amount of perceptual information that was available in the computer-controlled stimulus. Second, to make certain that individual differences in perceptual discrimination thresholds would not affect performance comparisons across age, the amount by which size and brightness changed was individually adjusted for both the computer-controlled and the participant-controlled stimuli. The adjustments were based on individual discrimination thresholds that were determined earlier in the same experiment with the same stimulus materials.
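The sketch below generates a computer-controlled stimulus stream with these properties; the number of levels, the lag range, and the frame duration are illustrative assumptions of ours, not the values used in the experiment.

```python
# Hypothetical generator for the computer-controlled stimulus of the CMT:
# discrete, random, unpredictable changes on two dimensions (size, brightness)
# that occur either simultaneously or with a random time lag.
import random

def generate_stimulus_stream(n_frames, n_levels=7, frame_ms=600, max_lag_ms=300):
    """Return (time_ms, dimension, level) change events for the computer-controlled stimulus."""
    events, t = [], 0
    for _ in range(n_frames):
        lag = random.choice([0, random.randint(50, max_lag_ms)])   # synchronous or lagged change
        events.append((t, "size", random.randrange(n_levels)))          # random, unpredictable level
        events.append((t + lag, "brightness", random.randrange(n_levels)))
        t += frame_ms                                                    # the 'film speed' manipulation
    return events

if __name__ == "__main__":
    for event in generate_stimulus_stream(3):
        print(event)
```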

6.4.2 The Measurement of Monitoring Accuracy

Given the continuous nature of the CMT, participants' performance can, in principle, be assessed both at a macro and a micro level. At a macro level, and this will be our focus in the present chapter, participants' performance was assessed in terms of the absolute difference between the size or brightness level of the computer-controlled stimulus and the size or brightness level of the participant-controlled stimulus. More precisely, monitoring accuracy was computed as the time integral over the difference between the computer-controlled and the participant-controlled stimulus, scaled against a random baseline such that individual accuracy could vary between 0% and 100%. A score of 0% reflected chance performance and 100% reflected perfect synchrony with the computer-controlled stimulus at all times. Of course, due to participants' need to respond after the computer-controlled stimulus had changed, perfect synchrony could never be achieved. Even our best participants were not able to surpass the 90% accuracy level. Two independent variables were manipulated in the context of this task. First, as mentioned above, participants were asked to either monitor size and brightness in isolation (1D task) or concurrently (2D task). Second, we manipulated the average time duration for which any particular size or brightness level of the computer-controlled stimulus remained on the screen before it was replaced by the next size or brightness level. To participants, this manipulation "felt" like a speed-of-film manipulation. That is, short durations created the impression of a quickly changing stimulus; long durations created the impression of a slowly changing stimulus.
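A minimal sketch of such an accuracy score, computed from two equally sampled trajectories; the random baseline used for the scaling is our own simplification, since the exact scaling procedure is not spelled out in the text.

```python
# Hypothetical computation of monitoring accuracy: the time integral of the
# absolute difference between the computer-controlled and participant-controlled
# trajectories, rescaled so that 0% corresponds to chance-level tracking and
# 100% to perfect synchrony.
import numpy as np

def monitoring_accuracy(target, response, n_levels=7, n_baselines=1000, seed=0):
    """target, response: equally sampled 1-D arrays of stimulus levels."""
    target = np.asarray(target, float)
    response = np.asarray(response, float)
    observed = np.trapz(np.abs(target - response))          # integrated deviation
    rng = np.random.default_rng(seed)
    chance = np.mean([np.trapz(np.abs(target - rng.uniform(0, n_levels - 1, target.size)))
                      for _ in range(n_baselines)])          # mean deviation of a random tracker
    return 100.0 * max(0.0, 1.0 - observed / chance)         # 0% = chance, 100% = perfect

# Example: a response that lags the target by one sample still scores well above chance.
target = np.repeat(np.arange(7), 10)
print(round(monitoring_accuracy(target, np.roll(target, 1)), 1))
```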


Fig. 6.2: Sample time-accuracy function for an individual participant (monitoring accuracy plotted against film speed in ms per frame).

The manipulation of frame duration allowed us to assess participants' monitoring accuracy across a whole range of fast- and slow-changing stimulus patterns, as shown in Figure 6.2. Therefore, the manipulation allowed us to compute individual time-accuracy functions. Individual data were fitted separately for size and brightness and 1D and 2D conditions to a hyperbolic power function using the CNLR procedure in SPSSX. Figure 6.2 shows an individual participant's level of monitoring accuracy as a function of the time duration between successive changes in the size dimension. As can be seen, performance is strongly affected by the time duration manipulation. At long time durations, corresponding to slow changes in the computer-controlled stimulus, the participant performed quite well, above 80%. At short time durations, corresponding to fast changes in the computer-controlled stimulus, the participant's performance was very poor. The integrals under the individual time-accuracy functions, one of which is shown in Figure 6.2, were the basic units of analysis that will be referred to when the main empirical findings are summarized.
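A sketch of this fit-and-integrate step, assuming one plausible saturating form, accuracy(d) = a·d/(b + d) with d the frame duration; the exact hyperbolic power function and the CNLR settings used in the original analysis are not specified here, so the sketch is illustrative only.

```python
# Hypothetical reconstruction of the time-accuracy analysis: fit a saturating
# function of frame duration to a participant's accuracies and take the area
# under the fitted curve as that participant's summary score.
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import quad

def time_accuracy(d, a, b):
    """Assumed time-accuracy function: accuracy rises toward asymptote a as frame duration d grows."""
    return a * d / (b + d)

frame_ms = np.array([200, 400, 800, 1600, 3200], float)   # hypothetical film speeds
accuracy = np.array([15.0, 40.0, 62.0, 75.0, 82.0])        # hypothetical accuracies in %

params, _ = curve_fit(time_accuracy, frame_ms, accuracy, p0=(90.0, 500.0))
area, _ = quad(time_accuracy, frame_ms.min(), frame_ms.max(), args=tuple(params))
print(params, area)   # the integral is the unit of analysis, as described in the text
```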

6.4.3 Design of Experiment

For the experiment, 79 participants from three different age ranges were recruited: children between eight and nine years of age, young adults between 20 and 25 years of age, and old adults between 65 and 70 years of age. Data collection took place over seven sessions. In the first session, a battery of intellectual ability measures was administered. In the second and third session, perceptual discrimination thresholds for size and brightness were determined for each participant. In sessions four and five, participants were instructed to adjust the two dimensions size and brightness in the CMT in isolation, that is, to perform the 1D tasks. Finally, in sessions six and seven, participants performed the size and brightness tasks concurrently, that is, in a two-dimensional situation.

6.4.4 Characteristics of Sample

Table 6.1 summarizes the basic characteristics of the sample (N = 69; data from 10 participants were discarded for various reasons) separately for children, young adults, and old adults.

Table 6.1: Characteristics of Sample. Columns: Children (8-9 years, n = 22), Young Adults (19-25 years, n = 23), and Old Adults (64-70 years, n = 24), each with M and SD. Rows: Age; Subjective physical and mental health (scored on a Likert-type scale ranging from 1, excellent, to 5, very poor); Knowledge (Vocabulary, Spot-a-Word); Perceptual Speed (Identical Pictures, Digit-Symbol Substitution); Thresholds for Size and Brightness (individually determined in sessions 2 and 3).

Subjective Physical and Mental Health. Age differences in subjective physical health were significant only between children and old adults; for subjective mental health, children differed significantly from young adults.
Knowledge. In the vocabulary test, young and old adults performed better than children. For Spot-a-Word, all three age groups differed significantly from each other.
Perceptual Speed. Young adults performed significantly better than children and old adults in both indicators of perceptual speed, the 'Identical Pictures' and the 'Digit-Symbol Substitution Test'. In the 'Digit-Symbol Test,' older adults performed significantly better than children.
Sensory Thresholds. Children differed significantly from young and old adults in their sensory thresholds for size and brightness (higher thresholds). However, there was no significant difference between young and old adults.

6.4.5 Main Findings

Figure 6.3 shows the mean performance in each age group on the CMT, separately for the one-dimensional and two-dimensional conditions. Depicted are the integrals under the average time-accuracy functions in each age group. In the 1D situation, shown in the left panel of the figure, young adults performed better than old adults, who, in turn, performed better than the children. For the 2D situation, shown in the right panel of Figure 6.3, the same rank-order applied, except that now the performances of children and old adults were no longer different.

Fig. 6.3: One-dimensional (1D) and two-dimensional (2D) performance on the CMT for children (N = 22), young adults (N = 23), and old adults (N = 24).

Figure 6.4 displays the difference between performance on the one-dimensional and the two-dimensional task separately for the three age groups. The 1D minus 2D performance difference is taken here as a global measure of the ability to learn, i. e., to form internal rules of behavior, in a situation where environmental change is unpredictable. We use the difference score rather than "raw" performance on the 2D task as a global measure of this ability because individual differences in perceptual ability, motor speed, and so on are removed in the difference score. It is important to note that the difference score is inversely scaled. That is, the smaller the score, the better the participant is capable of dealing with the unpredictable environment and, thus, the better the participant's ability to learn.

Fig. 6.4: Age differences in the ability to form internal rules of behavior, defined as the difference in performance on the one-dimensional (1D) and two-dimensional (2D) tasks, for children (N = 22), young adults (N = 23), and old adults (N = 24).

What Figure 6.4 conveys is that the overall difference between 2D and 1D performance, interpreted here as a global measure of learning under conditions of low environmental predictability, does not differ for the children and the young adult group. However, both of these age groups exhibit smaller overall scores than the old adult age group. Thus, it appears that children and young adults are better at dealing with the unpredictable environment than are old adults; by inference, children and young adults are able to learn better under conditions of low environmental predictability than old adults. One might argue that the differential learning that is captured by this finding might simply be a reflection of a differential speed deficit. According to this explanation, old adults would show decreased learning because their processing speed is much slower than that of young adults. Children would not show this deficit, so the argument could go, because they do not differ in processing speed from young adults. That this explanation is unlikely to be correct is shown in Figure 6.5.

Fig. 6.5: The performance of children and old adults is expressed as deficits relative to young adults, in standard deviation units of young-adult performance, on the 1D task, 2D minus 1D performance, and fluid-intelligence composites.

Figure 6.5 displays the performance of the children and the old adult group relative to the performance in the young adult group. The bars in Figure 6.5 capture the deviation of the children and old adult groups from the performance of the young adult group on three dependent measures: performance on the one-dimensional CMT, 2D minus 1D performance on the CMT, and performance on measures of fluid intelligence. If we use performance on the one-dimensional monitoring task as a proxy measure for processing speed, we find that children, in fact, perform worse than older adults on processing speed; yet, they show smaller 2D minus 1D scores than older adults. It thus seems that the ability to form internal rules of behavior under conditions of low environmental predictability does indeed change across the life span, as has been anticipated by our discussion of age-related changes in cognitive control in the previous section. However, the changes in learning ability appear to be more pronounced between the ages of 25 and 70 than between the ages of 8 and 25. In the next section, we consider whether learning ability under conditions of low environmental predictability may be related to age-related changes in general cognitive functioning as assessed by measures of fluid intelligence. First, we briefly discuss the age gradient of fluid intelligence. Then, we present findings obtained with the CMT that empirically assess the relation between age differences in learning and age differences in fluid intelligence.

6.5 Are Age Differences in Fluid Intelligence Predictive of Age Differences in the Ability to Generate Rules of Behavior under Conditions of Low Environmental Structure?

Some of the best-documented findings in the literature on cognitive development and aging are age-related differences in fluid intelligence (5; 7; see also 88-90), Type A cognition (28), or the mechanics of cognition (3; 4). Fluid intelligence refers to an individual's capacity to organize information, to ignore irrelevancies, to concentrate, and to maintain and divide attention. Age-related differences in fluid intelligence are found with a wide variety of memory, reasoning, and spatial abilities measures. Perhaps the most widely used marker test of fluid intelligence is the Raven's Advanced Progressive Matrices test (APM; 91-93). This test is known to be a good index of general intelligence. For instance, in radex models of ability organization, the Raven's falls very close to the centroid of general intelligence (94).


Each item in this test consists of a matrix of geometric patterns. Participants are instructed to determine the relations among elements in the rows and the columns, and to select the pattern that best completes the matrix. Performance on the Raven's has been found to increase steeply during middle childhood and adolescence, and to decrease gradually thereafter (see Figure 6.6). The link between age and fluid intelligence has been consistently demonstrated since at least the 1920s (91; 95; 96), and is readily apparent in many results from standardization data in psychometric and neuropsychological test batteries. Given that (a) the samples used for standardization are typically large and representative, (b) the performance measures are of established reliability and span a broad range of cognitive abilities, (c) negative adult age trends have been observed both with cross-sectional and longitudinal sampling schemes, and (d) adult age differences on tasks related to fluid intelligence are especially large at asymptotic limits of performance (97), the general phenomenon of a link between age and fluid intelligence must be considered robust.

Fig. 6.6: Age gradients for the Raven's Advanced Progressive Matrices Test (score as a function of age in years). Each set of interconnected data points refers to a large cross-sectional study reported in Salthouse (1991) or Raven (1989).

Age-related differences on tests of fluid intelligence are not only meaningful because they can be demonstrated reliably. They are also meaningful because performance on these tests has been shown to predict performance outside the laboratory and in non-academic settings (e. g., 98-100). Although the relation between performance on test batteries and performance outside the laboratory is not perfect, it has been shown to be significant and of at least moderate size. In the last, very brief part of this chapter, we summarize empirical findings obtained with the CMT that directly address the question of to what extent, if at all, old-age differences in rule-formation ability are predicted by age differences in fluid intelligence.


In order to address this question, we performed a series of hierarchical regression analyses on the data described in section 6.4 of this chapter, followed by communality analyses. This analysis allows us to determine how much of the age-related difference in the ability to learn under conditions of low environmental structure is due to the unique and shared influence of age-related change in (a) processing speed and (b) fluid intelligence, when processing speed is indexed by performance on the one-dimensional CMT, and fluid intelligence is indexed by performance on the Raven's. Given that we did not obtain an age difference in learning ability for the children and the young adult group, this analysis is, by necessity, limited to the empirically obtained difference between the young and old adult age groups. When partitioning the positive terms of age-group differences in learning ability, we find that 48.9% of the age-related variance was uniquely predicted by age group, 37.6% by age group and fluid intelligence, and 13.5% by age group, fluid intelligence, and processing speed. The unique term for age-group differences in learning ability related to processing speed was negative, indicating that processing speed is more strongly related to learning ability when the other predictors are present in the regression equation than when it is considered as the only predictor. These findings demonstrate that the ability to form rules of behavior in the absence of environmental structure is predicted, to some extent, by age-related differences in fluid intelligence, above and beyond the influence exerted by processing speed. A qualitatively very similar pattern of results is obtained when we use two measures of perceptual speed, rather than performance on the one-dimensional monitoring task, to index processing speed. Although the numbers change, the pattern of results stays the same: young and old adult group differences in learning ability are predicted by age group differences in fluid intelligence. By negative implication, these results suggest that the ability to learn, i. e., to generate rules of behavior, under conditions of low environmental predictability, develops relatively early in life (e. g., before the age of eight). According to a life-span dissociation hypothesis, positive age changes in fluid intelligence during middle childhood and adolescence may be related predominantly to factors other than learning (i. e., rule formation) ability, such as processing speed (101) or knowledge acquisition. In contrast, age-based reductions in fluid intelligence are more general in kind, and may also involve learning ability. This hypothesis is also consistent with evolutionary considerations in the context of life-span theory (102). Currently, further empirical work, including micro-analyses of monitoring performance, testing-the-limits through extensive practice, and more direct assessments of relevant component processes, is underway to test the tenability of this hypothesis.
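A sketch of the variance-partitioning (commonality) logic over hierarchical regressions; the data below are simulated and the variable names are ours, so the resulting values will not reproduce the percentages reported above.

```python
# Hypothetical commonality analysis: decompose the variance in the learning
# score (1D minus 2D) explained by age group, fluid intelligence, and
# processing speed into unique and shared components via all-subsets R^2.
from itertools import combinations
import numpy as np

def r_squared(predictor_list, y):
    X = np.column_stack([np.ones(len(y))] + list(predictor_list))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - (y - X @ beta).var() / y.var()

rng = np.random.default_rng(1)
n = 47                                            # young + old adults (hypothetical)
age_group = np.repeat([0.0, 1.0], [23, 24])
fluid = -0.8 * age_group + rng.normal(scale=0.6, size=n)
speed = -0.6 * age_group + rng.normal(scale=0.6, size=n)
learning = 0.5 * age_group - 0.4 * fluid + rng.normal(scale=0.5, size=n)

predictors = {"age": age_group, "gf": fluid, "speed": speed}
r2 = {subset: r_squared([predictors[p] for p in subset], learning)
      for k in range(1, 4) for subset in combinations(predictors, k)}

# Unique contribution of age group = full-model R^2 minus R^2 without age group;
# shared components are obtained from the remaining subset differences.
unique_age = r2[("age", "gf", "speed")] - r2[("gf", "speed")]
print("full R^2:", round(r2[("age", "gf", "speed")], 3),
      "unique to age group:", round(unique_age, 3))
```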

6.6 Summary and Conclusions

At the beginning of this chapter, we argued that learning is typically defined in terms of an organism's ability to modify its own internal knowledge structure in order to accommodate internal and/or external demands, and that, indeed, most


existing research on human and animal learning is concerned with how organisms come to internally represent systematic structural features of the environment. However, humans at least are also capable of dealing with unpredictable, unstructured environments in highly systematic ways; that is, they are capable of modifying their internal knowledge structure even in the absence of environmental systematicity. Our main focus in the present chapter has been on the human ability to learn, that is, on the ability to form internal rules of behavior under conditions of low environmental structure. More specifically, our focus has been on potential ontogenetic changes in the human ability to generate internal rules of behavior.

In the first part of the chapter, we examined whether there is theoretical reason to suspect that the ability to form rules of behavior when the environment is unpredictable may change with age. We first discussed the psychological reality of the cognitive control concept, and showed, on the basis of both psychological and neuropsychological evidence, that the control of one's processing is functionally separable from the processes that are controlled. Then, we summarized existing research in the areas of cognitive development and cognitive aging which demonstrates that cognitive control ability may diminish with increasing age. These findings imply the possibility of ontogenetic changes in the ability to generate internal rules of behavior under conditions of low environmental structure. We introduced a new experimental paradigm that can be used to investigate age differences in the ability to generate behavioral rules, and described the results of an initial experiment with this paradigm. Based on our empirical findings, we argued that the ability to form internal rules of behavior under conditions of low environmental predictability does indeed change across the life span, as had been anticipated by the discussion of age-related changes in cognitive control. However, the changes in learning ability appeared to be more pronounced between the ages of 25 and 70 than between the ages of 8 and 25. We then discussed the age gradient of general cognitive functioning, as captured by measures of fluid intelligence, and empirically demonstrated that age differences in the ability to generate rules of behavior are predicted by age differences in measures of fluid intelligence.

Taken together, the theoretical analysis and empirical findings presented in this chapter should be viewed as a first step toward trying to understand and disentangle the complex relation between properties of the information-processing system and the human organism's ability to form consistent rules of behavior when the environment is unpredictable. It will be fascinating to work out in detail which rules of behavior are optimal for a given information-processing system, but that step, if indeed it is ever achieved, will need to be the focus of a different chapter ...

Acknowledgments

Peter A. Frensch, Ulman Lindenberger, and Jutta Kray, Max-Planck-Institute for Human Development, Berlin, Germany. We are grateful to Paul B. Baltes for his very valuable insights into the theoretical aspects and his generous support of the


practical aspects of the research reported in this chapter. We thank Annette Rentz-Lühning, Sabine Felber, Markus Metzler, and Dorit Wenke for their assistance with data collection. Special thanks to M. Stroux and B. Wischnewski for programming the experimental task described in this chapter. Correspondence concerning this article should be addressed to Peter A. Frensch, Max-Planck-Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany. Electronic mail may be sent to [email protected].

References

1. Frensch, P. A. (1998). One concept, multiple meanings: On how to define the concept of implicit learning. In: Handbook of implicit learning, M. Stadler and P. Frensch, eds. (Thousand Oaks, CA: Sage), pp. 47-104.
2. Stadler, M. A., and Frensch, P. A. (1998). Handbook of implicit learning (Thousand Oaks, CA: Sage).
3. Baltes, P. B. (1987). Theoretical propositions of life-span developmental psychology: On the dynamics between growth and decline. Dev. Psychol. 23, 611-626.
4. Baltes, P. B. (1993). The aging mind: Potential and limits. Gerontologist 33, 580-594.
5. Cattell, R. B. (1971). Abilities: Their structure, growth, and action (Boston: Houghton Mifflin).
6. Hebb, D. O. (1949). The organization of behavior (New York: Wiley).
7. Horn, J. L. (1982). The theory of fluid and crystallized intelligence in relation to concepts of cognitive psychology and aging in adulthood. In: Aging and cognitive processes, F. I. M. Craik and S. E. Trehub, eds. (New York: Plenum Press), pp. 237-278.
8. Logan, G. D. (1985). Executive control of thought and action. Acta Psychol. 60, 193-210.
9. Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence (New York: Cambridge University Press).
10. Norman, D. A. and Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In: Consciousness and self-regulation, Vol. 4, R. J. Davidson, G. E. Schwartz, and D. Shapiro, eds. (New York: Plenum Press).
11. Reason, J. (1990). Human error (Cambridge, England: Cambridge University Press).
12. Atkinson, R. C. and Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In: The psychology of learning and motivation, Vol. 2, K. W. Spence and J. T. Spence, eds. (New York: Academic Press), pp. 89-105.
13. Newell, A. and Simon, H. A. (1972). Human problem solving (Englewood Cliffs, NJ: Prentice-Hall).
14. Shallice, T. (1994). Multiple levels of control processes. In: Attention and performance, Vol. 15, C. Umiltà and M. Moscovitch, eds. (Cambridge, MA: MIT Press), pp. 395-420.
15. Baddeley, A. D. and Hitch, G. (1974). Working memory. In: The psychology of learning and motivation, Vol. 8, G. Bower, ed. (New York: Academic Press), pp. 47-89.
16. Ackerman, P. L., Schneider, W., and Wickens, C. D. (1984). Deciding the existence of a time-sharing ability: A combined methodological and theoretical approach. Hum. Factors 26, 71-82.
17. Carlson, R. A., Wenger, J. L., and Sullivan, M. A. (1993). Coordinating information from perception and working memory. J. Exp. Psychol.: Hum. Percept. Perform. 19, 531-548.
18. Hagendorf, H. and Sä, B. (July, 1994). Coordination in visual working memory. Poster presented at the International Conference on Working Memory, Cambridge, England.
19. Hunt, E., Pellegrino, J. W., and Yee, P. L. (1989). Individual differences in attention. In: The psychology of learning and motivation: Advances in research and theory, G. Bower, ed. (Orlando, FL: Academic Press), pp. 285-310.
20. Wenger, J. L. and Carlson, R. A. (in press). Learning and the coordination of sequential information. J. Exp. Psychol.: Hum. Percept. Perform.
21. Yee, P. L., Hunt, E., and Pellegrino, J. W. (1991). Coordinating cognitive information: Task effects and individual differences in integrating information from several sources. Cogn. Psychol. 23, 615-680.
22. Heuer, H. (1994). Koordination [Coordination]. In: Enzyklopädie der Psychologie [Encyclopedia of psychology], C II 3: Psychomotorik, H. Heuer and S. Keele, eds. (Göttingen, Germany: Hogrefe), pp. 147-222.
23. McLeod, P. (1977). A dual task response modality effect: Support for multiprocessor models of attention. Quart. J. Exp. Psychol. 29, 651-667.
24. Shallice, T., McLeod, P., and Lewis, K. (1985). Isolating cognitive modules with the dual-task paradigm: Are speech perception and production separate processes? Quart. J. Exp. Psychol. 37a, 507-532.
25. Kelso, J. S., Southard, D. L., and Goodman, D. (1979). On the coordination of two-handed movements. J. Exp. Psychol.: Hum. Percept. Perform. 5, 229-238.
26. Fitts, P. M. (1964). Perceptual-motor skills learning. In: Categories of human learning, A. W. Melton, ed. (New York: Academic Press).
27. Stuss, D. T. and Benson, D. F. (1986). Neuropsychological studies of the frontal lobes. Psychol. Bull. 95, 3-28.
28. Hebb, D. O. (1945). Man's frontal lobe: A critical review. Archives of Neurology and Psychiatry 54, 10-24.
29. Teuber, H. L. (1955). Physiological psychology. Ann. Rev. Psychol. 9, 267-296.
30. Shallice, T. (1988). From neuropsychology to mental structure (Cambridge, England: Cambridge University Press).
31. Hynd, G. W. and Willis, G. (1985). Neurological foundations of intelligence. In: Handbook of intelligence: Theories, measurements, and applications, B. B. Wolman, ed. (New York: Wiley), pp. 119-157.
32. Luria, A. R. (1973). The working brain: An introduction to neuropsychology (New York: Basic Books).
33. Lhermitte, F. (1983). "Utilization behavior" and its relation to lesions of the frontal lobes. Brain 106, 237-255.
34. Baddeley, A. (1986). Working memory (Oxford, UK: Clarendon Press).
35. Baddeley, A. and Wilson, B. (1988). Frontal amnesia and the dysexecutive syndrome. Brain and Cognition 7, 212-230.
36. Gemba, H. and Sasaki, K. (1989). Potential related to no-go hand movement task with color discrimination in human. Neurosci. Lett. 101, 263-268.
37. Stroop, J. R. (1935). Studies of interference in serial verbal reactions. J. Exp. Psychol. 18, 643-662.
38. Teuber, H. L. (1972). Unity and diversity of frontal lobe functions. Acta Neurobiologiae Experimentalis 32, 615-656.
39. Benson, D. F., Stuss, D. T., Naeser, M. A., Weir, W. S., Kaplan, E. F., and Levine, H. (1981). The long-term effects of prefrontal leukotomy. Arch. Neurol. 38, 165-169.
40. Sasaki, K. and Gemba, H. (1986). Electrical activity in the prefrontal cortex specific to no-go reaction of conditioned hand movement with color discrimination in the monkey. Exp. Brain Res. 64, 603-606.
41. Sasaki, K., Gemba, H., and Tsujimoto, T. (1989). Suppression of visually initiated hand movement by stimulation of the prefrontal cortex in the monkey. Brain Res. 495, 100-107.
42. Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E., and Donchin, E. (1994). A neural system for error detection and compensation. Psychol. Sci. 4, 385-390.
43. Reinis, S. and Goldman, J. M. (1980). The development of the brain (Springfield, IL: Thomas).
44. Fuster, J. M. (1989). The prefrontal cortex, 2nd ed. (New York: Raven Press).
45. Haug, H., Knebel, G., Mecke, E., Orum, C., and Sass, N.-L. (1981). The aging of cortical cytoarchitectonics in the light of stereological investigations. In: Eleventh international congress of anatomy: Advances in the morphology of cells and tissues (New York: A. R. Liss), pp. 193-197.
46. Ivy, G. O., Petit, T. L., and Markus, E. J. (1992). A physiological framework for perceptual and cognitive changes in aging. In: The handbook of aging and cognition, F. I. M. Craik and T. A. Salthouse, eds. (Hillsdale, NJ: Lawrence Erlbaum), pp. 273-314.
47. Shaw, T. G., Mortel, K. F., Meyer, J. S., Rogers, R. L., Hardenberg, J., and Cutaia, M. M. (1984). Cerebral blood flow changes in benign aging and cerebrovascular disease. Neurology 34, 855-862.
48. Salthouse, T. A. (1991). Theoretical perspectives on cognitive aging (Hillsdale, NJ: Erlbaum).
49. Hale, S., Fry, A. F., and Jessie, K. A. (1993). Effects of practice on speed of information processing in children and adults: Age sensitivity and age invariance. Dev. Psychol. 29, 880-892.
50. Kail, R. (1991). Developmental change in speed of processing during childhood and adolescence. Psychol. Bull. 109, 490-501.
51. Salthouse, T. A. (1985). A theory of cognitive aging (Amsterdam: North-Holland).
52. Cerella, J. (1990). Aging and information-processing rate. In: Handbook of the psychology of aging, 3rd ed., J. E. Birren and K. W. Schaie, eds. (San Diego, CA: Academic Press), pp. 201-221.
53. Charness, N. and Campbell, J. I. D. (1988). Acquiring skill at mental calculation in adulthood: A task decomposition. J. Exp. Psychol.: General 117, 115-129.
54. Mayr, U. and Kliegl, R. (1993). Sequential and coordinative complexity: Age-based processing limitations in figural transformations. J. Exp. Psychol.: Learn. Mem. Cogn. 19, 1297-1320.
55. Mayr, U., Kliegl, R., and Krampe, R. T. (1996). Sequential and coordinative processing dynamics in figural transformations across the life span. Cognition 59, 61-90.
56. Somberg, B. and Salthouse, T. A. (1982). Divided attention abilities in young and old adults. J. Exp. Psychol.: Hum. Percept. Perform. 8, 651-663.
57. Korteling, J. E. (1991). Effects of skill integration and perceptual competition on age-related differences in dual-task performance. Hum. Factors 33, 35-44.
58. Ponds, R. W. H. M., Brouwer, W. H., and Wolffelaar, P. C. (1988). Age differences in divided attention in a simulated driving task. Journals of Gerontology: Psychological Sciences 43, 151-156.
59. Salthouse, T. A., Rogan, J. D., and Prill, K. A. (1984). Division of attention: Age differences on a visually presented memory task. Memory and Cognition 12, 613-620.
60. McDowd, J. M. and Craik, F. I. M. (1988). Effects of aging and task difficulty on divided attention performance. J. Exp. Psychol.: Hum. Percept. Perform. 14, 267-280.
61. Birch, L. L. (1976). Age trends in children's time-sharing performance. J. Exp. Child Psychol. 22, 331-345.
62. Birch, L. L. (1978). Baseline differences, attention, and age differences in time-sharing performance. J. Exp. Child Psychol. 25, 505-513.
63. Chelune, G. J. and Baer, R. A. (1986). Developmental norms for the Wisconsin Card Sorting Test. Journal of Clinical and Experimental Neuropsychology 8, 219-228.
64. Witkin, H. A., Dyk, R. B., Faterson, G. E., Goodenough, D. R., and Karp, S. A. (1962). Psychological differentiation (New York: Wiley).
65. Witkin, H. A. and Goodenough, D. R. (1981). Cognitive styles: Essence and origins (New York: International Universities Press).
66. Comalli, P. E., Wapner, S., and Werner, H. (1962). Interference effects of Stroop color-word test in childhood, adulthood, and aging. Journal of Genetic Psychology 100, 47-53.
67. Rand, G., Wapner, S., Werner, H., and MacFarland, J. H. (1963). Age differences in performance on the Stroop color-word test. Journal of Personality 31, 534-558.
68. Wise, L. A., Sutton, J. A., and Gibbons, P. D. (1975). Decrement in Stroop interference time with age. Perceptual and Motor Skills 41, 149-150.
69. Case, R. and Globerson, T. (1974). Field independence and central computing space. Child Dev. 45, 772-778.
70. Kliegl, R. and Lindenberger, U. (1993). Modeling intrusion errors and correct recall in episodic memory: Adult age differences in encoding list context. J. Exp. Psychol.: Learn. Mem. Cogn. 19, 617-637.
71. McCormack, P. D. (1982). Coding of spatial information by young and elderly adults. Journal of Gerontology 37, 80-86.
72. McCormack, P. D. (1984). Temporal coding by young and elderly subjects in a list-discrimination setting. Bulletin of the Psychonomic Society 22, 401-402.
73. Mäntylä, T. and Bäckman, L. (1992). Aging and memory for expected and unexpected objects in real-world settings. J. Exp. Psychol.: Learn. Mem. Cogn. 18, 1298-1309.
74. McIntyre, J. S. and Craik, F. I. M. (1987). Age differences in memory for item and source information. Canadian Journal of Psychology 41, 175-192.
75. Schacter, D. L., Kaszniak, A. K., Kihlstrom, J. F., and Valdiserri, M. (1991). The relation between source memory and aging. Psychology and Aging 6, 559-568.
76. Schacter, D. L., Osowiecki, D., Kaszniak, A. W., Kihlstrom, J. F., and Valdiserri, M. (1994). Source memory: Extending the boundaries of age-related deficits. Psychology and Aging 9, 81-89.
77. Rabbitt, P. M. A. (1990). Age, IQ and awareness, and recall of errors. Ergonomics 33, 1291-1305.
78. Connelly, S. L. and Hasher, L. (1993). Aging and the inhibition of spatial location. J. Exp. Psychol.: Hum. Percept. Perform. 19, 1238-1250.
79. Hamm, V. P. and Hasher, L. (1992). Age and the availability of inferences. Psychology and Aging 7, 56-64.
80. Kane, M. J., Hasher, L., Stoltzfus, E. R., Zacks, R. T., and Connelly, S. L. (1994). Inhibitory attentional mechanisms and aging. Psychology and Aging 9, 103-112.
81. Rogers, W. A. and Fisk, A. D. (1991). Age-related differences in the maintenance and modification of automatic processes: Arithmetic Stroop interference. Human Factors 33, 45-56.
82. Davis, H. P., Cohen, A., Gandy, M., Colombo, P., VanDusseldorp, G., Simolke, N., and Romano, J. (1990). Lexical priming deficits as a function of age. Behavioral Neuroscience 104, 288-297.
83. Haaland, K. Y., Vranes, L. F., Goodwin, J. S., and Garry, P. J. (1987). Wisconsin Card Sort Test performance in a healthy elderly population. Journal of Gerontology 42, 345-346.
84. Axelrod, S. and Cohen, L. D. (1961). Senescence and embedded-figure performance in vision and touch. Perceptual and Motor Skills 12, 283-288.
85. Lee, J. A. and Pollack, R. H. (1978). The effect of age on perceptual problem-solving strategies. Experimental Aging Research 4, 37-54.
86. Cohn, N. B., Dustman, R. E., and Bradford, D. C. (1984). Age-related decrements in Stroop color test performance. J. Clin. Psychol. 40, 1244-1250.
87. Eisner, D. A. (1972). Life-span age differences in visual perception. Perceptual and Motor Skills 34, 857-858.
88. Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies (Cambridge, England: Cambridge University Press).
89. Spearman, C. E. (1923). The nature of intelligence and the principles of cognition (London: Macmillan).
90. Spearman, C. E. (1927). The abilities of man (New York: Macmillan).
91. Raven, J. C. (1941). Standardisation of progressive matrices. British Journal of Medical Psychology 19, 137-150.
92. Raven, J. (1989). The Raven Progressive Matrices: A review of national norming studies and ethnic and socioeconomic variation within the United States. Journal of Educational Measurement 26, 1-16.
93. Raven, J. C., Court, J. H., and Raven, J. (1987). A manual for Raven's Progressive Matrices and Vocabulary Tests (London: H. K. Lewis; San Antonio, TX: The Psychological Corporation).
94. Marshalek, B., Lohman, D. F., and Snow, R. E. (1983). The complexity continuum in the radex and hierarchical models of intelligence. Intelligence 7, 107-127.
95. Foster, J. C. and Taylor, G. A. (1920). The applicability of mental tests to persons over 50. J. Appl. Psychol. 4, 39-58.
96. Jones, H. E. and Conrad, H. (1933). The growth and decline of intelligence: A study of a homogeneous group between the ages of ten and sixty. Genetic Psychological Monographs 13, 223-298.
97. Baltes, P. B. and Kliegl, R. (1992). Further testing of limits of cognitive plasticity: Negative age differences in a mnemonic skill are robust. Dev. Psychol. 28, 121-125.
98. Hunter, J. E. and Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychol. Bull. 96, 72-98.
99. O'Toole, B. I. and Stankov, L. (1992). Ultimate validity of psychological tests. Personality and Individual Differences 13, 699-716.
100. Willis, S. L. and Schaie, K. W. (1986). Practical intelligence in later adulthood. In: Practical intelligence: Nature and origins of competence in the everyday world, R. J. Sternberg and R. K. Wagner, eds. (Cambridge, England: Cambridge University Press), pp. 236-268.
101. Kail, R. and Salthouse, T. A. (1994). Processing speed as a mental capacity. Acta Psychologica 86, 199-225.
102. Baltes, P. B., Lindenberger, U., and Staudinger, U. M. (1998). Life-span theory in developmental psychology. In: Handbook of child psychology, Vol. 1: Theoretical models of human development, 5th ed., R. Lerner, ed. (New York: Wiley), pp. 1029-1143.

II. Perception and Representation of Visual-Spatial and Temporal Information

7. Motion Perception and Motion Imagery: New Evidence of Constructive Brain Processes from Functional Magnetic Resonance Imaging Studies

Rainer Goebel, Lars Muckli and Wolf Singer

7.1 Introduction

Since the pioneering work of the gestalt psychologists, several rules of perceptual organization have been formulated which characterize visual perception as a constructive process of the brain. This view emphasizes that a visual stimulus is not passively mapped onto the brain but is instead actively processed based on innate knowledge and acquired experiences with the visual world. The constructive nature of visual processing can be convincingly demonstrated with perceptual illusions, e. g. the perception of illusory contours. There is accumulating evidence that at the neuronal level the operation of gestalt principles is reflected in synchronous oscillatory discharges of organized cell assemblies (1). Here we report several functional Magnetic Resonance Imaging (fMRI) studies which attempt to localize the underlying neuronal substrate of several constructive brain processes at the system level. We have studied the perceptual illusions of apparent motion, apparent motion of objects defined by illusory contours, and the phenomenon of shape-from-motion. Further insights into the constructive nature of vision were obtained by comparing brain activation during imagery of moving objects with activation during imagery of static objects. The influence of slight alterations of the configuration of stimuli on visual perception and the fMRI signal was investigated in detail by means of a transparent motion paradigm.

7.1.1 Two Main Visual Processing Pathways

It has been widely accepted that the mammalian visual system is composed of several parallel channels, each subserving a different aspect of visual function (2). Ungerleider and Mishkin (3) proposed that visual processing in the primate cortex is divided into two main pathways, a ventral stream devoted to the fine analysis of the visual scene including the perception of form and color, and a dorsal stream that codes the spatial characteristics of the visual scene and analyzes motion. The ventral stream, which uses mainly foveal signals, processes information relatively slowly but with high spatial resolution. Activity flows from the primary visual cortex (V1) via the inferotemporal cortex (IT) to perirhinal and prefrontal areas. The dorsal stream, which is concerned with spatial localization of objects and the programming of visually controlled motor acts, operates rapidly but with coarse


spatial resolution (4). In the dorsal stream, activity flows from primary visual cortex via several areas in the parietal and superior temporal lobe to premotor and prefrontal areas. In prefrontal cortex, the two streams project to different areas: the dorsal stream to a dorsolateral area devoted to short-term storage of spatial locations, and the ventral stream to a ventral area specialized for short-term memory of object features. Most experiments that led to the differentiation of the dorsal and ventral pathways were performed on rhesus monkeys (Macaca mulatta). Most neuroscientists agree that the visual system of this monkey approximates very closely the situation in the human brain. With modern brain imaging techniques, the two pathways could also be separated in the normal human brain (5-7).

7.1.2 Functional Magnetic Resonance Imaging

Using functional Magnetic Resonance Imaging (fMRI), investigators were able to localize neural activity in human brains during sensory, motor, and cognitive activity. The technique is based on the fact that there are local blood flow and blood oxygenation changes in response to neural activity which are detectable with the technology of Magnetic Resonance Imaging (MRI) by choosing appropriate imaging parameters. The fMRI signal reflects changes in local oxygenation level since oxygenated hemoglobin has a much smaller magnetic susceptibility than deoxygenated hemoglobin (BOLD effect = Blood Oxygenation Level Dependent effect). fMRI is rapidly becoming the technology of choice for many functional brain activation studies in humans because it is acceptable for repeated use and it has better spatial and temporal resolution than Positron Emission Tomography (PET) imaging, which is based on the same hemodynamic phenomena. The ultimate limitation of spatial resolution of fMRI may be the spatial specificity of the circulatory system and its local changes in response to neural activity. Anatomical and physiological evidence suggests that this circulatory specificity will be less than 1 mm for cerebral cortex. The temporal resolution of fMRI is limited by the vagaries of the hemodynamic responses of the brain's circulatory system as it changes with neural activation. While some of those changes occur very quickly (on the order of 10-100 ms), most hemodynamic changes that are detectable with current fMRI systems appear only after a delay of about 2000 ms and take another few seconds to reach peak. This means that fMRI is unlikely to replace tools such as electroencephalography (EEG) and magnetoencephalography (MEG). These methods measure the electrical and magnetic neuronal signals and have millisecond temporal resolution, but suffer from poor spatial resolution. Current research attempts to combine fMRI/PET and EEG/MEG in order to provide the essential data for the analysis of neuronal processes, their topological distribution and their precise timing.

At the beginning of an fMRI experiment, a fast scan of the subject's brain is recorded consisting of three orthogonal slices. These are used to select the orientation and position of the slices that will be used in the functional imaging as well as for an anatomical reference scan. The 2D or 3D anatomical reference scan


normally consists of images with higher spatial resolution (e. g., 1.0 mm × 1.0 mm × 1.0 mm) than the functional images and is used to relate the functional findings to known anatomical landmarks in the brain. During functional imaging, the set of specified slices (comprising a volume) is measured at regular intervals while the subject performs different experimental tasks. The changes of neural activity during different conditions are reflected in small changes in the functional images, which can be visualized using appropriate data analysis techniques.

7.1.3 Methodological Details of the fMRI Measurements

Fifteen subjects without a previous neurological or psychiatric history participated in one or more experiments. Age ranged from 25 to 33 years and there were six males. Informed consent was obtained from each subject. Visual stimuli were delivered under computer control (Digital DECpc Celebris XL 590) to an LCD display panel (Sharp QA-1000) and a high-luminance overhead projector (Elmo HP-285P) or to a high-luminance LCD projector (EIKI LC-6000). Subjects were in supine position and viewed the screen through an adjustable mirror fixed to the head coil. The image was back-projected onto a frosted screen positioned at the foot end of the scanner. Visual stimuli were generated in real time using the ELSA Winner 2000 Pro/X graphics adapter and the ELSA Powerlib C library. Each experiment consisted of 128 measurements of 10-15 oblique transversal slices. The slices were positioned either parallel to the calcarine fissure or parallel to the AC-PC line. In each experimental condition, a sequence of eight measurements was recorded, lasting 24 s. In all conditions a small fixation cross appeared in the center of the screen. This was the only stimulus during fixation and imagery periods (see below). In some studies an objective motion stimulus was used consisting of 400 white dots moving radially outward on a black background (visual field: 38° wide by 23° high, dot size: 0.06° × 0.06°, dot velocity: 3.6°/s-14.4°/s). This stimulus is known to produce a clear response in the motion-sensitive areas of the dorsal stream (areas MT and MST) without provoking eye movements (8). Additionally, a static stimulus which consisted of 400 stationary dots was used in the first experiments. Eye movements were recorded with a video camera outside the scanner in order to verify whether subjects were able to fixate the cross during apparent motion and motion imagery conditions. Echo-planar images were collected on a 1.5-T scanner (Siemens Magnetom Vision) using the standard head coil and a gradient-echo echo-planar sequence (TR = 3000 ms, TE = 66 ms, flip angle = 90°, FOV = 210 mm × 210 mm, slice thickness = 3 mm, imaging matrix = 64 × 64, voxel size = 3.2 mm × 3.2 mm × 3 mm). In most experiments the Siemens Magnetom gradient overdrive was used, allowing functional EPI scans with high spatial resolution (TR = 3000 ms, TE = 69 ms, flip angle = 90°, FOV = 210 mm × 210 mm, slice thickness = 3 mm, imaging matrix = 128 × 128, voxel size = 1.6 mm × 1.6 mm × 3 mm). Before each set of functional scans, we recorded a T1-weighted series of 2D images with the same orientation, slice thickness, and field of view as the functional scans, or a fast T1-weighted 3D volume (MPRAGE sequence).


Data analysis was performed using custom software (9; 10). Prior to statistical analysis the time series of functional images was aligned for each slice in order to minimize the effects of head movement. For each slice the third recorded functional image was used as a reference image to which all other images of the slice time series were registered. In order to evaluate statistically the differences between experimental conditions, cross-correlation analysis was applied. For the computation of correlation maps, the stimulation protocol served as a reference function reflecting the temporal sequence of experimental and control conditions (experimental condition = 1, control condition = 0). On a pixel-by-pixel basis the signal time course was cross-correlated with the respective reference function (11). Pixels were included into the statistical map if the obtained correlation value was greater than 0.3 given lag values of 1 and 2 (corresponding to a 3-9 s delay after the beginning of a stimulation condition in order to adapt to the hemodynamic response). Cross-correlation maps were superimposed both on the original functional scans as well as onto the T1-weighted anatomical reference scans. Striate and extrastriate cortical areas V1, V2d, V3, and V3A were discriminated based on anatomical location and functional properties (12). The delineation of these areas was validated for each subject using results from separate recordings for generating retinotopic maps (compare 13-14). In order to delineate MT/MST as well as other motion-sensitive areas we defined as regions of interest (ROI) those regions that could be activated with the objective motion stimulus consisting of radially moving dots. In order to assess whether an area is motion-selective we compared the objective motion condition with the static dot stimulus. In experiments 3 and 4 additional ROIs were defined by comparing all imagery conditions with the fixation conditions. The data for statistical comparisons consisted of the mean time course of all voxels of an analyzed area. Based on these data the mean of the raw fMRI signal for each subject and condition in a given experiment was computed. These mean values were analyzed using ANOVA and post-hoc pairwise comparisons, using stimulus condition as a within-group factor. The obtained p-values were corrected for multiple comparisons. Values of percent signal change averaged across subjects were computed on the basis of the difference between the mean values of the fMRI signal in each experimental condition and the mean fMRI signal in the fixation periods for each individual subject. For 3D visualization and measurement of Talairach coordinates, high-resolution T1-weighted 3D data sets (voxel size: 1.0 mm × 1.0 mm × 1.0 mm) were recorded in separate sessions. Statistical maps were transformed into 3D data sets and interpolated to the same resolution as the structural 3D data set. For each subject the structural and functional 3D data sets were transformed into Talairach space (16), which allowed us to compare activated brain regions across different experiments and across different subjects and to determine Talairach coordinates of these regions. Talairach transformation was performed in two steps. The first step consisted of rotating the 3D data set of each subject to be aligned with the stereotaxic axes.
For this step the location of the anterior commissure (AC) and the posterior commissure (PC) as well as two rotation parameters for midsagittal alignment had to be specified manually. In the second step the extreme points of the cerebrum


were specified. These points together with the AC and PC coordinates were then used to scale the 3D data sets into the dimensions of the standard brain of the Talairach and Tournoux atlas using a piecewise affine and continuous transformation for each of the 12 defined subvolumes.
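The two computational steps at the heart of this analysis pipeline can be illustrated with short, self-contained sketches. The first sketch mimics the pixel-by-pixel cross-correlation mapping described above (boxcar reference function, lags of 1-2 volumes, correlation threshold of 0.3); it is only a simplified stand-in for the custom analysis software cited in the text, and the data it operates on are random numbers.

```python
import numpy as np

def correlation_map(data, protocol, lags=(1, 2), threshold=0.3):
    """
    Mark voxels whose time course correlates with a lagged boxcar reference function.

    data     : array of shape (n_volumes, n_voxels), one fMRI time course per voxel
    protocol : reference function of length n_volumes (1 = experimental, 0 = control)
    lags     : shifts of the reference function in volumes; with TR = 3000 ms, lags of
               1-2 volumes crudely accommodate the delay of the hemodynamic response
    """
    n_vol, _ = data.shape
    z_data = (data - data.mean(axis=0)) / data.std(axis=0)
    active = np.zeros(data.shape[1], dtype=bool)
    for lag in lags:
        ref = np.roll(np.asarray(protocol, dtype=float), lag)
        ref[:lag] = ref[lag]          # pad the wrapped-around samples with the first protocol value
        z_ref = (ref - ref.mean()) / ref.std()
        r = z_data.T @ z_ref / n_vol  # Pearson correlation of each voxel with the reference
        active |= r > threshold
    return active

# Illustration with a synthetic block design: eight 24-s blocks of 8 volumes each,
# alternating fixation (0) and stimulation (1), and random noise as "data".
protocol = np.tile(np.r_[np.zeros(8), np.ones(8)], 4)
data = np.random.randn(len(protocol), 64 * 64)
mask = correlation_map(data, protocol)
```

The second sketch illustrates the idea behind the piecewise affine Talairach scaling: along each axis, coordinates are rescaled segment by segment so that the subject's landmarks (cerebral extremes, AC, PC) fall onto the corresponding atlas landmarks. The atlas values in the example are approximate, commonly cited Talairach bounds, used here only for illustration and not taken from the chapter.

```python
def piecewise_scale(v, src_landmarks, dst_landmarks):
    """
    Map a coordinate v through a continuous, piecewise linear transformation that
    sends each source landmark onto the corresponding target landmark.  Applied
    per axis (with the cerebral extremes, AC, and PC as landmarks), this yields the
    piecewise affine scaling of the 12 subvolumes described in the text.
    Landmarks must be given in decreasing order along the axis.
    """
    for (s0, s1), (d0, d1) in zip(zip(src_landmarks, src_landmarks[1:]),
                                  zip(dst_landmarks, dst_landmarks[1:])):
        if s1 <= v <= s0:
            return d0 + (v - s0) * (d1 - d0) / (s1 - s0)
    raise ValueError("coordinate lies outside the specified cerebral extent")

# Example for the anterior-posterior (y) axis of an AC-PC-aligned data set:
# a subject whose cerebrum extends from +75 mm (front) to -98 mm (back) of AC,
# with PC 26 mm behind AC, scaled onto approximate Talairach landmark values
# (anterior pole, AC, PC, posterior pole) -- illustrative numbers only.
subject_y = [75.0, 0.0, -26.0, -98.0]
atlas_y = [70.0, 0.0, -23.0, -102.0]
print(piecewise_scale(30.0, subject_y, atlas_y))   # a point 30 mm anterior of AC
```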

7.2 FMRI Experiments

Perceptual illusions and visual imagery are important paradigms for the experimental study of constructive aspects of vision, viz the generation of visual percepts that goes beyond the information contained in the mere physical composition of the stimuli (17). Certain types of illusions arise when the physical properties of a stimulus are supplemented by perceptual features that rely on the assumptions of the brain about what it expects in the outside world. For example, when stimuli separated in space are turned on and off in alternation at appropriate temporal intervals (18), subjects perceive one stimulus moving between the two stimulus positions (apparent motion) rather than two stationary flickering stimuli. The perception of motion of objects that in fact do not change position is an example of a constructive visual process. Visual imagery, on the other hand, may be performed in the complete absence of visual stimuli. Thus, when a subject is requested to imagine a previously seen visual scene, the task usually consists of the construction of a visual image purely from stored information. In order to study constructive aspects of motion perception we gradually reduced the amount of objective motion perception and increased the amount of internally generated motion representations in a series of experiments using objectively moving stimuli, apparent motion stimuli, stimuli inducing apparent motion of figures defined by illusory contours and motion imagery conditions (19; 20).

7.2.1 Apparent Motion

Previous brain imaging studies using either positron emission tomography (PET) or functional magnetic resonance imaging (fMRI) have shown that human cortical areas MT (V5) and MST respond with increased activity to moving stimuli (8; 21; 22) and in certain conditions where stationary stimuli induce illusory motion (17; 23). It remains to be seen to what extent apparent motion stimuli also activate the motion-selective areas MT and MST. Since it is known that these areas respond to some degree to flickering stimuli (8), appropriate control conditions had to be specified which differ from the apparent motion stimuli only in the relative timing of appearing objects. Additionally, form-motion interactions were investigated in a second experiment by inducing apparent motion of squares whose outlines were defined by illusory contours. This paradigm integrates two perceptual illusions and should provide additional insight into the interaction of brain regions responsible for extracting different stimulus properties.


[Fig. 7.1 appears here: timing diagrams of the apparent motion and flickering control stimuli (rings presented for 256 ms, separated by 32-ms blanks) and plots of percent signal change in MT/MST and V3A for the apparent motion, flickering control, objective motion, and static conditions.]

Fig. 7.1: Main experimental conditions and results of the first apparent motion experiment. The sequence of stimulation conditions consisted of: Fixation, Objective motion, Fixation, Static, Fixation, Apparent motion, Fixation, Flickering control. This sequence was repeated twice. (A) The apparent motion stimulus consisted of two concentric rings which appeared in alternation, separated by a short blank interval to enhance the perception of apparent motion. (B) Statistical comparison of activation levels in V1, V3 and MT/MST averaged over 10 subjects. Mean percent signal change ± s.e.m. is displayed for each condition.

In the first experiment, perception of apparent motion was induced with two concentric rings (diameters: 3° and 10°) that appeared in alternation, separated by a short blank interval of 32 ms (Fig. 7.1 A). We opted for this stimulus, which was perceived as a single shrinking and expanding ring by all subjects (n = 10), because it does not elicit eye movements. As a control stimulus devoid of apparent motion cues, both concentric rings were turned on and off simultaneously. Thus, this flickering control condition differed from the apparent motion condition only in the relative timing between the large and the small ring. Two flickering control conditions were used (flickering control I and II, see Fig. 7.1 A). Area V1 responded with similar activation levels to all stimulation conditions, with the lowest activation in the two ring conditions. In area V3, the objective motion condition, the apparent motion condition and the flickering control evoked similarly large responses, but in contrast to V1 the stationary stimulus was significantly less effective (p

Fig. 10.3: Spatial representations either used (A, B) or discussed as possibly used (C, D) by hymenopteran insects during large-scale navigation. In all four cases, the central-place foragers face the large-scale problem of setting their courses for a distant goal, which they cannot detect directly. H, homing site, or central place, to which the navigator routinely returns; F1, F2, two foraging sites regularly visited by the animal. A Vector representation. Vector information (F1V, F2V) is acquired and used during path integration (dead reckoning). Its acquisition and use does not depend on local landmark information. B Landmark-based route information (F1R, F2R) usually supplements vector information, but can also be used in the absence of vector information. The animal pinpoints H, F1, and F2 by the location of these sites relative to nearby landmarks (site-specific snapshots HS, F1S, F2S).


certainly favored the evolution of such supplementary systems for the simple reason that path integration, in the very essence of the matter, is an egocentric system of navigation and hence inherently prone to cumulative errors. While the path-integration system employs skymark information, some of the most prominent supplementary systems depend on landmark information. The use bees and ants make of such information is so efficient that some researchers have invoked the concept of large-scale cognitive maps built into the cockpits of insect navigators (19; 20). This concept implies that the insect is able to chart the relative positions of familiar locations in a common system of reference, that is, to construct a metric map, or floor plan, of its environment. However appealing this concept might be to the cognitive phenomenologist, it rests on shaky experimental grounds. Several investigators have not been able to replicate the results claimed in favor of the map hypothesis (21-23; for a review see 24), or could interpret the results by referring to more basic mechanisms of landmark orientation (see 25).

One of these basic mechanisms discovered in both bees and ants is site recognition by image matching. To pinpoint a particular goal - be it a nesting site or a feeding site - the insect stores a panoramic view of the skyline surrounding the goal and then guides its return by moving so as to improve the fit between its current view of the skyline and the stored one. This template-matching hypothesis is supported by substantial experimental evidence. Recent work even shows that the stored "panoramic" images are organized retinotopically (26; for Drosophila tested in a tethered-flight paradigm see 27). For such a retinotopic system to be employed reliably the animal must experience a particular skyline consistently from the same vantage point and must consistently adopt the same direction of view, so that its internal (retinotopic) and external (earth-bound) coordinates are kept in register (for bees see 28-30). When first leaving a goal, to which a social bee or social wasp will later return, the insect navigators perform elaborate flight maneuvers, so-called orientation flights, during which they acquire the necessary landmark information (e. g. 31). During these flights, the insect backs away from the goal in a series of arcs that

C If the animal could associate the home vectors (the reversals of F1V and F2V) with the appropriate site-specific landmark memories (F1S and F2S), and if it could later recall these home vectors from a long-term memory store, vector information could be embedded within a particular familiar environment. In this case, insects might be able to use vector addition and subtraction in order to compute a novel route (F1F2V) from two memorized vectors (F1V and F2V). D In a metric map, vectors, sites, and routes are encoded in a general (allocentric) system of reference. In the present account, the references are symbolized by Cartesian coordinates X and Y, but any other large-scale system of reference would do as well. Note: Central-place foragers such as bees and ants have been found to acquire and use representations A and B, but not D. Recent work shows that Cataglyphis ants are able to store and retrieve site-based vectors as shown in C, but these vectors seem to be used by the ants as a means to facilitate route-based navigation rather than to accomplish vector addition and subtraction.


are centered about the goal. As one can deduce from the dynamics of the flights, the insect views the goal at relatively fixed retinal positions. Video recordings of both the orientation flights and the subsequent return flights show that the insects assume the same orientation when first acquiring and later using this landmark information, and that they do so during those periods of their circling flights during which their angular velocities are low (32; 33).

One prerequisite for the template-matching mechanism to work is that the patterns to be matched - the stored one and the current one - are rather similar. This is usually achieved by the mere fact that the path-integration system brings the animal sufficiently close to the goal for the matching mechanism to work effectively. It is only there that the latter mechanism is switched on. For example, an array of (artificial) landmarks specifying the nesting site of a Cataglyphis colony induces searching for the nest only if the ants have reset their path-integration system to zero. Otherwise, the ants strictly follow their vector courses, and the familiar landmarks are ignored (26).

Landmark information is used not only for site recognition, but also for route guidance. Ants can follow a familiar route characterized by either natural or artificial landmarks even if their vector stores have previously been emptied. They might do so by acquiring a series of snapshot-based templates of visual scenes, and might later retrace their routes by trying to match their current retinal images to sequentially retrieved stored images. This hypothesis, however, still requires experimental testing.

Based on the evidence given so far as well as on data not presented here we can conclude that bees and ants are able to memorize vector courses (Fig. 10.3A) and landmark routes (Fig. 10.3B). Are they also able to combine both types of information? This is an important question insofar as, if it could be answered in the affirmative, the insect would have at its disposal some basic mental tools to assemble a "vector map" of its foraging environment (Fig. 10.3C). In this context one should note that all landmark information is acquired while the insect is continually integrating its path by dead reckoning. Hence, the information collected in this way is represented within an egocentric frame of reference. Once the animal has completed its foraging journey and has arrived at the nesting site again, its home vector has been reduced to zero. If such a "zero-vector" ant is displaced by the experimenter from the nest to the very feeding site from which it has just returned, and released there, it does not recall its current home vector and set out in the proper home direction, but gets immediately engaged in a systematic search centered about the point of release (34). Amazingly, the search density profile does not show any directional bias in the direction of the vector course which the ants had followed just a few minutes before. Nevertheless, under certain conditions (e. g. when the ant starts on a new foraging trip to a previously visited feeding site), the goal vector - i. e. the 180° reversal of the home vector - is routinely recalled from longer-term storage. Such higher-order, long-term storage of vector information would open up the possibility that vector information gets linked to memory stores of panoramic views pertinent to particular foraging sites.
If this were the case, information acquired within an egocentric system of coordinates might eventually be transferred to a geocentric (allocentric) frame of reference.
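The egocentric bookkeeping referred to here can be written down in a few lines. The sketch below is only the textbook Cartesian formulation of dead reckoning, not the approximate, angle-based algorithm that Cataglyphis has actually been shown to use (see 16): after every step, the running home vector is updated from the compass heading and the distance just travelled.

```python
import math

def update_home_vector(home, heading_deg, step_length):
    """
    One dead-reckoning update: after a step of `step_length` in compass direction
    `heading_deg`, subtract that step from the vector pointing back to the nest.
    """
    dx = step_length * math.cos(math.radians(heading_deg))
    dy = step_length * math.sin(math.radians(heading_deg))
    return home[0] - dx, home[1] - dy

# An outbound foraging path given as (compass heading in degrees, distance) steps.
# At every moment the running home vector encodes the straight-line course back
# to the nest -- without any reference to landmarks.
home = (0.0, 0.0)
for heading, distance in [(0, 10.0), (60, 8.0), (150, 12.0), (250, 5.0)]:
    home = update_home_vector(home, heading, distance)

homing_distance = math.hypot(*home)
homing_course = math.degrees(math.atan2(home[1], home[0])) % 360.0
```

Because each update adds its own small measurement error, the errors accumulate over the journey, which is the inherent weakness of such an egocentric scheme noted above.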




Fig. 10.4: Path integration as analyzed in Cataglyphis ants: egocentric (ant-based) and geocentric (site-based) vectors. For the sake of simplicity, an ant's foraging path (for an example of a real foraging path see 18) is dissected schematically into 13 steps. After each step the ant updates its egocentric home vector as indicated for steps nos. 4, 10, and 13. In the dead-reckoning scheme adopted by Cataglyphis for updating its ant-based home vector (see algorithm described in 16) angles δ rather than α are used — and measured by means of the skylight compass illustrated in Fig. 10.1. In addition to this egocentric vector information, the ants can obtain geocentric site-based vector information. Such site-based vectors are associated with characteristics of particular sites, e. g. landmark snapshots taken at these sites. As an example, a site-specific vector (see heavily dashed line) is shown for locality no. 7 positioned next to landmark LM. Note: Egocentric vector information is acquired and used by the ant continually as it moves through its environment. Geocentric vector information seems to be acquired at particular locations, where it is linked, for example, to landmark panoramas. It might be represented neuronally in some higher-order memory store (as compared to the continually updated egocentric vector information which might reside in some kind of working memory).

Our recent Cataglyphis work provides some evidence that the latter might actually occur. If ants encounter particular landmarks at particular sites along a frequently traveled route, they are able to acquire site-specific (landmark-based) vector information in addition to their continually updated (ant-based) home-vector information. In subsequent tests they can retrieve this site-specific vector information from a long-term memory store even if their current home vector has not yet


been reset to zero (Fig. 10.4). This can be shown most convincingly by releasing ants in the vicinity of a specific landmark configuration characterizing a particular site with which the ants are familiar. Cataglyphis would then recall the (site-specific) vector information pertinent to the landmark scene in question, and walk for some distance in the proper direction. The argument that in this case vector information is linked to landmark scenes is convincing, because the site-specific (artificial) landmarks were displaced, along with the ants, to novel territory. Due to this experimental paradigm, the influence that any large-scale landmark scenery could have exerted on the ant's directional choices was fully excluded. Bees tested within a much smaller-scale environment exhibit analogous behavioral performances (35). In a two-compartment maze, they associate 45° and 135° gratings of black-and-white stripes presented on vertical walls with flight trajectories to the left and to the right, respectively. Intermediate stripe orientations, to which the bees have never been trained, evoke flight trajectories with intermediate directions. Hence, bees seem to be able to link a continuously variable visual parameter to an equally variable motor parameter (but see 36 for somewhat different results).

Returning to large-scale navigation and to the Cataglyphis results mentioned above, we are now in a position to propose an intriguing hypothesis. If insects were able to store simultaneously site-specific vectors of a number of places within their foraging ranges, they would, in principle, have at their disposal the information necessary to compute novel routes by performing some kind of vector summation or subtraction, that is, to construct what could be called a vector map (Fig. 10.3C). In this kind of map, the vectors are laid down, so to speak, onto the surface of the earth, i. e. transferred from the ant's egocentric frame of reference to a geocentric one. However, the question whether insect navigators can handle site-specific vectors in this geocentric way is a matter of debate (see 23; 37).
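The operation that such a vector map would make possible amounts to nothing more than vector arithmetic on stored, geocentric goal vectors. The sketch below spells this out with invented numbers; as the text stresses, whether insect navigators actually perform such an operation remains a matter of debate.

```python
import numpy as np

# Hypothetical geocentric goal vectors stored at the nest (nest -> feeding site), in metres.
F1_v = np.array([40.0, 10.0])    # nest to feeding site F1
F2_v = np.array([-15.0, 35.0])   # nest to feeding site F2

# A navigator holding both vectors could in principle derive a never-travelled
# route from F1 directly to F2 by subtraction ...
F1F2_v = F2_v - F1_v

# ... and read off the compass course to steer and the distance to cover.
course_deg = np.degrees(np.arctan2(F1F2_v[1], F1F2_v[0])) % 360.0
distance = np.linalg.norm(F1F2_v)
```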

10.3 Discussion: Navigating Successfully

Are ants intelligent? This is the question we raised at the beginning. Can we answer it now, after some of the most amazing navigational abilities of these animals have come to the fore? Of course, as intelligence has always been a controversial topic, and too complex a notion to be captured by a simple definition, any answer will be doomed to elicit disagreement from one scientific camp or another. In any case, as Newell and Simon (38) remark in their famous Turing Award Lecture, "there is no intelligence principle as there is no vital principle that conveys by its very nature the essence of life". In accord with this notion, let us follow McFarland's and Boesser's (39) advice to abandon the quest for general intelligence and focus instead on particular intelligent behaviors. If such behaviors are characterized by the ability to solve hard problems (40), the question boils down to an inquiry about how hard these problems are, and how they are solved. The hardness of the problem has been illustrated, at least partly, in the preceding part of this chapter, and more about the structural complexity of the tasks actually


accomplished by social hymenopterans will be presented below. What, then, are the strategic ways the insect uses in solving its navigational problems?

The cognitivistic approach in tackling such questions is to create a fully elaborated representation of the subject's three-dimensional world and its relevant properties, to outline the computational structure of the all-inclusive problem, and then to use this information to perform any particular task that comes up at any particular time during the navigational process. Following this approach one studies complex behavior at the level of computational procedures, and does so by manipulating symbols with little consideration of the underlying structure of the device - neural, electronic or otherwise - on which the computations are performed. This approach is highlighted best by Newell and Simon's (41) General Problem Solver.

The navigational problems discussed in the present account are truly hard in computational terms. In the three subtitles of the Analysis chapter the expressions given in front of the dash allude to the all-inclusive, general-purpose solutions of the navigational problem under investigation. Provided with intricate knowledge of spherical skylight geometry, the physical laws of straylight optics, and astronomical information about the movement of the Earth, ingenious navigators can reconstruct the E-vector distributions in the daytime sky and use them as a compass scale. They would be able to compute a given compass course at any particular time of day and year and in any particular part of the world, even if only small patches of open sky were available every now and then. Furthermore, they would use trigonometric rules to integrate information about courses steered and distances covered into a current home vector. Finally, they would acquire and use a topographic map - or its mental analogue, a cognitive map - containing all information about the three-dimensional structure of their home-range landscape, providing them with the geographical coordinates of their current positions or future goals, and informing them about any novel course they would like to take.

Ant navigators successfully cope with these exacting spatial problems, but do so in ways adapted to their particular needs. For example, they constrain their foraging excursions to short periods of both the day and the year. Given these temporal constraints, they can afford to rely exclusively on some simple and stereotyped information about skylight cues, and yet navigate exactly. It is only under particular experimental conditions, in which the animals are presented with unnatural sequences of skylight stimuli, that systematic navigational errors occur. Whatever paradigm is adopted by the experimenter, the ants' errors can always be predicted, with considerable accuracy, by using a simple matching algorithm, by which the animal tries to match an internal E-vector template - or "map" - with the current E-vector distribution. The usefulness of this map concept in explaining the experimental data does not imply that Cataglyphis actually consults a map in reading its compass courses. Probing into the neural machinery mediating the insect's behavior reveals some striking results: within a specialized visual subsystem, the insect's "polarization channel", a large set of microanalyzers feeds into a small set of macroanalyzers. Finally, it is in the interaction of the macroanalyzer responses that any particular compass course is encoded.
Evaluating response ratios and their interactive modulations is


quite a different algorithmic procedure from consulting a map, yet both interpretations are compatible with the insect's behavioral responses. If neurophysiological inquiry had not provided us with some details of the underlying neural hardware (which by now has also been mimicked by the electronic hardware of an autonomous robotic agent; 10), the way in which the insect's brain actually solves the compass problem would not spring to mind. Nevertheless, as the decisive aspect of the skylight information exploited by the insect navigator is a spatial one - the pattern of polarized light, or the E-vector distribution - it is pertinent to ask where this spatial aspect is represented in the insect's neural machinery. Due to the overall geometry of the insect's compound eye, and some particular optical specializations within the polarization channel, the microanalyzers form a regular fan-like array (for a review and for references see 7). The regularity and peculiarity of this receptor array certainly contribute to how efficiently this information is picked up and integrated by the underlying macroanalyzers. (In the present context "efficiency" relates to the extent to which the system guarantees high signal contrast of macroanalyzer outputs as well as high robustness against changes in skylight parameters.) Hence, it is in the geometry and connectivity of neuronal arrays that the insect's internal representation of the external skylight patterns is embedded. Similarly, the large-field neurons involved in visual course control of flying insects are tuned to efficiently sense the patterns of self-induced image flow that result from particular types of movement (e. g. pitch or roll) of the animal. Again, these large-field neurons receive their inputs from certain sets of small-field neurons representing, in this case, directionally sensitive movement detectors (42).

What can be said about the flexibility of this skylight-compass system? Even though the structural components of the system might be largely hard-wired, and hence represent "instant knowledge", the network parameters and transition functions must be modifiable, i. e. at least adjustable to the solar ephemeris function that varies locally as well as seasonally. Recent experiments in ants and bees (43; 44) show that these adjustments are made during the individual's early foraging life, and hence represent "knowledge through experience". Furthermore, compass information picked up from the sky through one sensory channel (either through the polarization channel or through the spectral channel) can be transferred from this channel to the other one (8).

As the preceding discussion of the Cataglyphis skylight compass has already shown, insect navigators do not tackle their way-finding problems by referring to first principles, that is, through reasoning based on logic and abstract symbol manipulation. This becomes even more apparent if we now turn our gaze from the sky to the surface of the Earth, i. e. to the problems involved in landmark guidance. This mode of navigational behavior makes tougher demands on any nervous system because, in contrast to skymark information, landmark information is largely unpredictable. Furthermore, it is acquired about many places, so that many patterns of landmarks must be retained and recognized.
The notion that the insect navigator does so by assembling a "cognitive map" of its foraging surroundings has been very attractive indeed (19; 20), but has not yet been supported by experimental evidence (see chapter 10.2).
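The vector part of the problem can be made concrete in the same spirit. Below is a minimal sketch of exact path integration, the running "home vector" obtained by summing, segment by segment, the compass heading steered and the distance covered; reversing the accumulated vector yields the homeward course. The names and the two-leg example are ours, and the sketch shows only the idealized computation; the ants' own integration rule approximates it (16).

import math

def update_path_integrator(nest_to_ant_x, nest_to_ant_y, heading_deg, distance):
    """Add one path segment (compass heading, distance walked) to the running
    vector from the nest to the ant's current position."""
    nest_to_ant_x += distance * math.cos(math.radians(heading_deg))
    nest_to_ant_y += distance * math.sin(math.radians(heading_deg))
    return nest_to_ant_x, nest_to_ant_y

def homing_instruction(nest_to_ant_x, nest_to_ant_y):
    """Compass course and distance that would bring the ant straight home:
    simply the inverse of the accumulated vector."""
    distance_home = math.hypot(nest_to_ant_x, nest_to_ant_y)
    course_home = math.degrees(math.atan2(-nest_to_ant_y, -nest_to_ant_x)) % 360.0
    return course_home, distance_home

# Toy outbound run: 10 m towards 0 deg, then 5 m towards 90 deg.
x = y = 0.0
x, y = update_path_integrator(x, y, heading_deg=0.0, distance=10.0)
x, y = update_path_integrator(x, y, heading_deg=90.0, distance=5.0)
print(homing_instruction(x, y))   # about (206.6 deg, 11.2 m)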

10.4 Conclusions

First and foremost, the insect navigator must obtain any large-scale information from small-scale egocentric perceptions. It cannot take a bird's-eye (let alone a bee's-eye or an ant's-eye) point of view in order to chart the spatial relationships of familiar locations within a common system of reference. Instead, it is bound to experience such spatial relationships sequentially over time, as it moves through its environment. This poses several distinct questions. For example, how does the insect gain the necessary information in the first place? Acquiring retinotopic templates of local landmark sceneries might be one basic mechanism, but it is certainly not the only conceivable one. We do not yet know how this information is gained in walking insects, but flying insects such as bees and wasps perform elaborate "orientation flights". The geometry of these circling flights makes it quite likely that the views later used for image matching are stored whenever the insect assumes a preferred orientation at the end points of successive arcs (32); at these moments, the area immediately surrounding the goal is kept relatively stationary on the retina. Furthermore, the arc-end-point hypothesis is corroborated by the similarities in the flight patterns of departures and subsequent arrivals (45; 33; 46).

The next important question is how the template information acquired locally and sequentially about a number of localities is globally encoded in memory. Finally, templates must be activated (or memories retrieved) appropriately, that is, at the right time and at the right place. Retrieval of memorized visual patterns might occur when the animal has reached a particular point in a sequence, as has been observed in small-scale maze learning (47; 48) or large-scale trap-lining behavior (49). It might also occur by reference to large-scale spatial contexts as provided by distant landmark panoramas (50). Hence, a variety of contextual cues, such as position in a sequence, time of day, or spatial context, might help to structure the knowledge that the insect has acquired of its landmark surroundings. What appears as map-like behavior is most certainly the result of the insect's elaborate and flexible ways of contextually priming its memories (see the sketch at the end of this section).

Social hymenopterans seem to use a variety of intricately interlocked specific-purpose subroutines in order to navigate over long distances between nest and foraging sites. Functional comparisons and neurophysiological analyses make it quite likely that these subroutines have evolved from pre-existing visuomotor control mechanisms which have been recruited and modified for various aspects of long-distance navigation (51; 26). This evolutionary perspective does not leave much room for an overarching computational center that would handle all incoming sensory information and finally decide what action is to be taken, even though a top-down cognitive scientist can always come up with a general-purpose input-output model that does exactly the job that is required. Particular parts of the subroutines involved in navigation are certainly hard-wired (such as, for example, the neural equivalent of the celestial map or the template-matching algorithm), but others depend on information deduced from the unpredictable properties and contingencies of the animal's environment. If we finally tried to comment on the degree of flexibility with which these subroutines
are interlocked, we would enter largely uncharted territory. However, as research in these fields is proceeding apace, we might not have to wait long for some answers to emerge. Should we then return to the leading question asked at the very beginning by following Minsky's (40) advice and regarding intelligence as "a stage magician's trick, a name for whichever process we do not yet understand"?
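The sketch referred to above: a deliberately simple illustration of how a retinotopically stored view might be compared with the currently perceived panorama, and of how a bank of such templates, keyed by contextual cues such as route position, could be searched. The one-dimensional "view" format, the mean-squared mismatch, and all names are our own assumptions, introduced only to make the idea of contextually primed template retrieval concrete; they are not a model of the underlying neural machinery.

import numpy as np

def view_mismatch(current_view, template):
    """Mean squared difference between the currently perceived panorama and a
    stored retinotopic template (both sampled as 1-D arrays around the azimuth)."""
    current_view = np.asarray(current_view, dtype=float)
    template = np.asarray(template, dtype=float)
    return float(np.mean((current_view - template) ** 2))

def retrieve_template(current_view, memory_bank, active_context):
    """Contextual priming: only templates stored under the currently active
    context (e.g. a position in a route sequence, or time of day) are
    considered; among these, the best-matching view is retrieved."""
    candidates = {key: view for (ctx, key), view in memory_bank.items()
                  if ctx == active_context}
    return min(candidates, key=lambda key: view_mismatch(current_view, candidates[key]))

# Hypothetical usage: two route positions, each with one stored view.
memory_bank = {("leg-1", "feeder approach"): [0.2, 0.8, 0.4, 0.1],
               ("leg-2", "nest approach"):   [0.7, 0.1, 0.1, 0.9]}
print(retrieve_template([0.6, 0.2, 0.1, 0.8], memory_bank, active_context="leg-2"))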

Acknowledgements
Special thanks go to my graduate students who have co-operated with me in both Mahares and Zürich. In recent years, and on the topics covered in this chapter, these have been Per Antonsen, Daniel Heusser, Urs Lingg, Barbara Michel, Simona Sassi, and, as a post-doctoral student, Susanne Akesson from Lund University. In addition, I am very grateful to Ursula Menzi and Hanna Michel for their competent assistance in preparing this chapter.

References
1. Marais, E. (1937). The Soul of the White Ant (London: Methuen).
2. Simon, H. (1976). The Science of the Artificial (Cambridge, MA: MIT Press).
3. Wasmann, E. (1909). Die psychischen Fähigkeiten der Ameisen. Mit einem Ausblick auf die vergleichende Tierpsychologie. 2. Aufl. (Stuttgart: E. Schweizerbartsche Verlagsbuchhandlung).
4. Forel, A. (1922). Mensch und Ameise (Wien: Rikola Verlag).
5. Hofstadter, D. R. (1982). Metamagical themas: Can inspiration be mechanized? Scient. Amer. 247(3), 18-31.
6. Coulson, K. L. (1988). Polarization and Intensity of Light in the Atmosphere (Hampton, VA: Deepak Publications).
7. Wehner, R. (1994). The polarization-vision project: Championing organismic biology. In: Neural Basis of Behavioural Adaptation, K. Schildberger and N. Elsner, eds. (Stuttgart, New York: G. Fischer), pp. 103-143.
8. Wehner, R. (1997). The ant's celestial compass system: Spectral and polarization channels. In: Orientation and Communication in Arthropods, M. Lehrer, ed. (Basel, Boston: Birkhäuser Verlag), pp. 145-185.

9. Labhart, T. and Petzold, J. (1993). Processing of polarized light information in the visual system of crickets. In: Sensory Systems of Arthropods, K. Wiese et al., eds. (Basel: Birkhäuser Verlag), pp. 158-169.
10. Lambrinos, D., Maris, M., Kobayashi, H., Labhart, T., Pfeifer, R., and Wehner, R. (1997). An autonomous agent navigating with a polarized light compass. Adapt. Behav. 6, 175-206.
11. Hartmann, G. and Wehner, R. (1995). The ant's path integration system: A neural architecture. Biol. Cybern. 73, 483-497.
12. Schwind, R. (1983). Zonation of the optical environment and zonation in the rhabdom structure within the eye of the backswimmer, Notonecta glauca. Cell Tiss. Res. 232, 53-632.
13. Schwind, R. (1984). The plunge reaction of the backswimmer Notonecta glauca. J. Comp. Physiol. A 155, 319-321.
14. Ronacher, B. and Wehner, R. (1995). Desert ants Cataglyphis fortis use self-induced optic flow to measure distances travelled. J. Comp. Physiol. A 177, 21-27.

15. Srinivasan, M. V., Zhang, S. W., Lehrer, M., and Collett, T. S. (1996). Honeybee navigation en route to the goal: visual flight control and odometry. J. Exp. Biol. 199, 237-244.
16. Müller, M. and Wehner, R. (1988). Path integration in desert ants, Cataglyphis fortis. Proc. Natl. Acad. Sci. USA 85, 5287-5290.
17. Collett, T. S., Baron, J., and Sellen, K. (1996). On the encoding of movement vectors by honeybees. Are distance and direction represented independently? J. Comp. Physiol. A 179, 395-406.
18. Wehner, R. and Wehner, S. (1990). Insect navigation: Use of maps or Ariadne's thread? Ethol. Ecol. Evol. 2, 27-48.
19. Gould, J. L. (1986). The locale map of honey bees: do insects have cognitive maps? Science 232, 861-863.
20. Gallistel, C. R. (1989). Animal cognition: the representation of space, time and number. A. Rev. Psychol. 40, 155-189.
21. Wehner, R. and Menzel, R. (1990). Do insects have cognitive maps? A. Rev. Neurosci. 13, 403-414.
22. Menzel, R., Chittka, L., Eichmüller, S., Geiger, K., Peitsch, D., and Knoll, P. (1990). Dominance of celestial cues over landmarks disproves map-like orientation in honey bees. Z. Naturforsch. 45, 723-726.
23. Wehner, R., Bleuler, S., Nievergelt, C., and Shah, D. (1990). Bees navigate by using vectors and routes rather than maps. Naturwiss. 77, 479-482.
24. Wehner, R. (1992). Arthropods. In: Animal Homing, F. Papi, ed. (London: Chapman & Hall), pp. 45-144.
25. Dyer, F. C. (1996). Spatial memory and navigation by honeybees on the scale of the foraging range. J. Exp. Biol. 199, 147-154.
26. Wehner, R., Michel, B., and Antonsen, P. (1996). Visual navigation in insects: Coupling of egocentric and geocentric information. J. Exp. Biol. 199, 129-140.

27. Dill, M., Wolf, R., and Heisenberg, M. (1993). Visual pattern recognition in Drosophila involves retinotopic matching. Nature 365, 751-753.
28. Wehner, R. and Flatt, I. (1977). Visual fixation in freely flying bees. Z. Naturforsch. 32c, 469-471.
29. Dickinson, J. A. (1994). Bees link local landmarks with celestial compass cues. Naturwiss. 81, 465-467.
30. Collett, T. S. and Baron, J. (1994). Biological compasses and the coordinate frame of landmark memories in honeybees. Nature 368, 137-140.
31. Zeil, J. (1993a). Orientation flights of solitary wasps (Cerceris; Sphecidae, Hymenoptera). I. Description of flight. J. Comp. Physiol. A 172, 189-205.
32. Collett, T. S. and Lehrer, M. (1993). Looking and learning: a spatial pattern in the orientation flight of the wasp Vespula vulgaris. Proc. R. Soc. Lond. B 252, 129-134.
33. Collett, T. S. (1995). Making learning easy: The acquisition of visual information during the orientation flights of social wasps. J. Comp. Physiol. A 177, 737-747.
34. Müller, M. and Wehner, R. (1994). The hidden spiral: Systematic search and path integration in desert ants, Cataglyphis fortis. J. Comp. Physiol. A 175, 525-530.
35. Collett, T. S. and Baron, J. (1995). Learnt sensori-motor mappings in honeybees: Interpolation and its possible relevance to navigation. J. Comp. Physiol. A 176, 287-298.
36. Collett, T. S., Fauria, K., Dale, K., and Baron, J. (1997). Places and patterns - a study of context learning in honeybees. J. Comp. Physiol. A 181, 343-353.
37. Menzel, R., Geiger, K., Chittka, L., Joerges, J., Kunze, J., and Müller, U. (1996). The knowledge base of bee navigation. J. Exp. Biol. 199, 141-146.
38. Newell, A. and Simon, H. A. (1976). Computer Science as empirical inquiry: Symbols and search. Comm. ACM 19, 113-126.
39. McFarland, D. and Boesser, M. (1993). Intelligent Behavior in Animals and Robots (Cambridge, MA: MIT Press).
40. Minsky, M. (1985). The Society of Mind (New York, London: Simon and Schuster).
41. Newell, A. and Simon, H. A. (1972). Human Problem Solving (Englewood Cliffs, NJ: Prentice-Hall).
42. Krapp, H. G. and Hengstenberg, R. (1997). A fast stimulus procedure to determine local receptive field properties of motion-sensitive visual interneurons. Vis. Res. 37, 225-234.
43. Wehner, R. and Müller, M. (1993). How do ants acquire their celestial ephemeris function? Naturwiss. 80, 331-333.
44. Dyer, F. C. and Dickinson, J. A. (1994). Development of sun compensation by honey bees: How partially experienced bees estimate the sun's course. Proc. Natl. Acad. Sci. USA 91, 4471-4474.

45. Zeil, J. (1993b). Orientation flights of solitary wasps (Cerceris; Sphecidae, Hymenoptera). II. Similarities between orientation and return flights and the use of motion parallax. J. Comp. Physiol. A 172, 207-222.
46. Zeil, J., Kelber, A., and Voss, R. (1996). Structure and function of learning flights in bees and wasps. J. Exp. Biol. 199, 245-252.
47. Collett, T. S., Fry, S. N., and Wehner, R. (1993). Sequence learning by honeybees. J. Comp. Physiol. A 172, 693-706.
48. Pastergue-Ruiz, I. and Beugnon, G. (1994). Spatial sequential memory in the ant Cataglyphis cursor. Proc. Congr. Int. Union Study Soc. Ins. 12, 490.
49. Janzen, D. H. (1974). The deflowering of Central America. Nat. Hist. 83, 48-53.
50. Collett, T. S. and Kelber, A. (1988). The retrieval of visuo-spatial memories by honeybees. J. Comp. Physiol. A 163, 145-150.
51. Collett, T. S. (1996). Insect navigation en route to the goal: Multiple strategies of landmark guidance in insect navigation. J. Exp. Biol. 199, 227-235.

11. Elementary and Configural Forms of Memory in an Insect: The Honeybee*
Randolf Menzel, Martin Giurfa, Bertram Gerber and Frank Hellstern

* This paper is dedicated to the memory of Dr. Martin Hammer, who substantially contributed to the concepts presented here.

11.1 Introduction

Insects are small and therefore have small brains. Intuitively, small brains might be thought to provide the animal with less computational power, and thus insects might be expected to be more stereotyped in their behavior than are vertebrates. However, the multitude and flexibility of behaviors produced by the small number of peripheral and central neurons of a particular insect species, the honeybee Apis mellifera, are impressive. Bees are fast and elegant flyers and quickly learn to locate and handle flowers of a variety of forms for the extraction of pollen and nectar (1). They flexibly organize the distribution of tasks (brood care, feeding, foraging) in their society (2; 3) and communicate about both the needs of the colony (nectar, pollen, water, etc.) and the availability of resources (flower patches, nest sites, etc.) with the aid of a ritualized body movement (the waggle dance; see 2).

11.1.1 Behavior and Biology of the Honeybee

Bees perceive and rapidly learn about a multitude of visual features; they see colors in a trichromatic fashion over a spectral range that is extended into the UV and shortened in the red (4), and they see and discriminate patterns (5; 6). Both kinds of visual cues can be learned as signals for food or for the hive entrance (7). Bees use the pattern of polarized light in the sky to navigate according to a time-compensated celestial compass (8). Compass directions between the hive and feeding sites are learned for each new feeding site, and are related to the surrounding landmarks in such a way that compass directions can be inferred from them when the sky is overcast (9; 10). The flight direction relative to the sun compass is converted into an angle of the body-length axis relative to gravity during the waggle dance on the vertical comb. Flight distances are measured visually (11-13) and transposed into an activity code during the communicative waggle dance. The precise localization of the hive entrance or a feeding site is learned relative to its associated landmarks (14), and landmarks en route are learned in a sequential order (15). A sense of the earth's magnetic field can be used to help learn the
relative position of a food source (16) and to perform the waggle dance more accurately (17). The choice between food sources is not random but depends very much on the experience gained at the various food sources (18). The caloric value of the food, together with the effort invested in obtaining it, allows the bees to build specific memories of the food sources and to guide their foraging decisions (19; 20). Odors also play an important role, both in the organization of the community and as learned signals of food sources. Odors are learned particularly quickly and are very well discriminated (21; 22). Furthermore, olfactory learning can also be analyzed under restrained laboratory conditions and has proven to be a useful paradigm for investigating the physiological substrate of learned behavior (23-25).

Thus, honeybees are able to retain information and recall it later on relevant occasions. To that end, new information must be integrated into already acquired "knowledge". Such richness and flexibility of the perceptual and behavioral repertoire can hardly be the product of an inflexible, hard-wired form of information processing. It is, therefore, worth asking how complex memory is, whether it leads to internal representations, and whether some form of operation on internal representations occurs. The existence and quality of internal representations cannot be directly observed but can only be inferred from behavioral studies. As in all studies on cognition, we face the problem that assumptions about the existence of representations and operations on them can often be refuted by seemingly simpler assumptions, for example, by explaining behavioral flexibility in associative terms. Thus, it will be our special concern to focus on paradigms that test predictions that cannot be met by elementary associative assumptions.

To address the problem of cognition in the honeybee as a study case, we will look into elementary forms of associative learning and briefly outline the experimental evidence that allows us to ask whether the establishment of elementary associations might involve cognition-like processes of attention or of expectation and prediction. We will then move on to forms of associative learning that involve relational dependencies; we shall call these configural forms of learning (26). Examples will be drawn from olfactory conditioning, visual learning in a natural setting, and navigation. We suggest that bees may evince the capacity to build both elementary and configural associations, and we shall thus conclude that learning in an insect can be, at least qualitatively, comparable to that of vertebrates.

11.2 Elementary and Configural Forms of Learning in Classical Conditioning

11.2.1 Classical Conditioning of the Proboscis-Extension Reflex (PER)

The antennae of a honeybee are its main chemosensory organs. When these are touched with sucrose solution, the bee reflexively extends its proboscis to reach


Fig. 11.1: (a) The proboscis extension response (PER) paradigm. Honeybees are harnessed in little metal tubes such that they are free to move their antennae and mouthparts. Touching the antennae with a drop of sucrose solution releases the PER in hungry bees. The PER can be conditioned to an olfactory stimulus by forward pairing of the odor with sucrose; the bees will later respond with the PER when the odor is given alone. (b) Acquisition of the PER by bees trained either by multiple forward pairing trials of CS and US (paired) or by unpaired presentations of CS and US (unpaired). During training with forward pairings, the CS preceded the US by 2 sec on each trial; therefore, each forward conditioning trial also includes a test situation, the 2 sec of CS presentation before the onset of the US. An extension of the proboscis during this period is evaluated as a conditioned response and expressed as the probability of PER in a group of bees (ordinate). (c) Differential conditioning with four (squares) or eight (circles) trials (left graph). During differential conditioning, one odor is paired with sucrose (CS+), while the other odor is presented unpaired (CS-) between the CS+ trials. The interval between CS+ and CS- trials was 5 min. During reversal training (30 min after differential conditioning), the former CS- is paired with the sucrose stimulus and thus becomes the CS+ (right graph) (see text).

and suck the sucrose. The experimental situation shown in Figure 11.1 allows free movement of the bee's antennae and mouthparts. Odorants do not release such a reflex in naive animals. If, however, an odorant is presented immediately before the sucrose solution (forward pairing), an association is formed that enables the odorant to release the proboscis-extension response (PER) in a subsequent test. This effect is clearly associative and involves classical, but not operant, conditioning (21). Thus, the odorant can be viewed as the conditioned stimulus (CS) and the sucrose solution as the reinforcing, unconditioned stimulus (US). PER conditioning shows most of the basic characteristics of classical conditioning: acquisition and extinction, differential conditioning and reversal learning, stimulus specificity and generalization, dependence on odorant as well as reinforcement intensity, dependence on the temporal interval between stimulus and reinforcement, and dependence on the temporal interval between learning trials (27), among others. Physiological correlates of such behavioral phenomena can be found at different levels, ranging from molecular biology and biochemistry (25) to single identified neurons (28; 29) and optical imaging of neuronal ensembles using Ca2+- and voltage-sensitive dyes.
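To make the elementary associative account concrete, here is a minimal sketch of the standard Rescorla-Wagner learning rule applied to a differential-conditioning schedule of the kind shown in Figure 11.1c. This is an illustrative toy simulation, not the authors' model of PER conditioning; the learning rate and asymptote are arbitrary.

def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Associative strengths under the Rescorla-Wagner rule: on each trial,
    every stimulus present is updated by
    dV = alpha * (lambda_if_reinforced - summed V of all stimuli present)."""
    V = {}
    for stimuli, reinforced in trials:
        prediction = sum(V.get(s, 0.0) for s in stimuli)
        error = (lam if reinforced else 0.0) - prediction
        for s in stimuli:
            V[s] = V.get(s, 0.0) + alpha * error
    return V

# Differential conditioning: odor A is always reinforced, odor B never,
# presented in alternation (8 trials each).
schedule = [({"A"}, True), ({"B"}, False)] * 8
print(rescorla_wagner(schedule))   # V["A"] approaches 1.0, V["B"] stays at 0.0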

11.2.2 The Elementary-Configural Distinction

Two forms of associations, elementary and configural, may be distinguished (26; 30). Consider, for instance, the following two discrimination problems (for a detailed account see below, section 11.2.4d: Configural forms of conditioning). The first problem involves reinforced presentations of a stimulus A and unreinforced presentations of a stimulus B (A+/B-). This discrimination can be solved by associating each single (elementary) stimulus with its respective outcome. The second problem involves reinforced presentations of two stimulus compounds (AB+ and CD+) and unreinforced presentations of two additional compounds made up from other combinations of the same elements (AC- and BD-). To solve this discrimination, subjects cannot rely on single components but must respond to specific stimulus configurations, i.e. to A only if it occurs in association with B (but not with C), and to D only if it occurs in association with C (but not with B). Thus, a learned configuration is unique and can be discriminated from its components.
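To see why the second problem forces a learner beyond purely elementary associations, the following toy simulation (our own, with arbitrary parameters) trains a simple error-correcting, Rescorla-Wagner-style rule on the four compounds, once with the single elements as the only cues and once with an additional unique cue per compound standing in for a learned configuration. With elements alone, all four compounds end up with similar, intermediate responses; with the added configural cues, the reinforced and unreinforced compounds are pulled apart.

import random

def train(trials, cues_fn, alpha=0.2, lam=1.0, epochs=200, seed=0):
    """Elementary error-correcting learning over whatever cues `cues_fn`
    extracts from a compound; returns the net response to each compound."""
    rng = random.Random(seed)
    V = {}
    for _ in range(epochs):
        rng.shuffle(trials)
        for compound, reinforced in trials:
            cues = cues_fn(compound)
            error = (lam if reinforced else 0.0) - sum(V.get(c, 0.0) for c in cues)
            for c in cues:
                V[c] = V.get(c, 0.0) + alpha * error
    return {compound: sum(V.get(c, 0.0) for c in cues_fn(compound))
            for compound, _ in trials}

def elemental(compound):
    """Cues available to a purely elementary learner: the single stimuli only."""
    return set(compound)

def configural_cues(compound):
    """Adds a unique cue for the compound as a whole (the "configuration")."""
    return set(compound) | {compound}

problem = [("AB", True), ("CD", True), ("AC", False), ("BD", False)]
print(train(list(problem), elemental))        # all four hover around 0.5
print(train(list(problem), configural_cues))  # AB, CD near 1; AC, BD near 0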

11.2.3 Cognitive Aspects of Elementary Forms of Conditioning?

Traditionally, classical conditioning was viewed as an automatic, passive process: whenever a neutral stimulus (A) and a reinforcing stimulus (+) occurred together, they were thought to become associated. In other words, the joint occurrence of A and a reinforcement was considered to be both necessary and sufficient to induce associative learning. Associative processing of the internal representations of stimuli in the brain was thus suggested to be a simple, one-to-one reflection of these stimuli in the external world. However, this is not an adequate conceptualization, as the next paragraph will show.

11.2.3.1 Backward inhibitory learning

Learning psychologists have recognized that a stimulus A that precedes reinforcement (forward pairing, A+) later supports conditioned responses, whereas if the sequence of stimuli is reversed (backward pairing, +A), fewer conditioned responses develop. However, backward pairing does not simply result in no learning; it often results in inhibitory learning (31; 32). In other words, whether an excitatory or an inhibitory association is formed can be regulated by the nervous system according to the predictive relation between the stimuli involved (33). Backward inhibitory conditioning has recently been demonstrated in honeybees, both by retardation-of-acquisition and by summation assays (34) (Fig. 11.2). The extent of inhibition depends non-monotonically on the interval between reinforce-
