Table of contents:
Phonemic and phonetic symbols
1 Introduction
1. Cognitive Linguistics
2. The language system
3. Symbolic units
4. The basic organization of grammar
5. Construction Grammar
6. Morphology
7. Compositionality
8. Morphological or word schemas
9. Cognitive Phonology
10. Summary
2 Articulatory phonetics
1. How speech sounds are produced
2. The consonants
2.1. The place of articulation
2.2. The manner of articulation
2.3. Voicing
3. The vowels
3.1. Vowel articulation
4. Summary
3 Sounds and meaning
1. Sounds and words
2. The phoneme
3. Minimal pairs
4. Contrastive or overlapping distribution
5. Complementary distribution
5.1. Allophones versus procedural and schematic knowledge
5.2. Basic level and prototypes
5.3. Stop allophones
5.4. Vowel allophones
5.5. Phonetic similarity
5.6. Syllabification and allophones
5.7. Voicing and voicelessness in allophones
6. Free variation
7. On the autonomy of phonological units vis-à-vis the symbolic units
8. Vocalic differences between the phonetic and phonemic levels
9. Phonological features
10. Nasalization revisited
11. A brief excursion into writing systems
12. Summary
4 Alternation patterns
1. Morphemes and allomorphs
2. Morphological schemas or word level constructions
3. What is in the lexicon?
4. The velar nasal
5. Phonotactics
6. Morpheme internal ‘changes’
6.1. Consonant alternations
6.2. Velar softening
7. The English Laxing
8. Allomorphs and stress
9. Suppletion
10. Sandhi
11. Linking
12. Dissimilation
13. Summary
5 Word stress
1. What stress is and how it is assigned
2. Morpheme types and word stress
3. Separable and inseparable prefixes
4. Stress pattern and syllable structure
5. Secondary stress before primary stress
6. Morphologically determined stress pattern
6.1. Zero derivation
6.2. Suffixes that carry primary stress
6.3. Suffixes that determine the locus of the primary stress in the stem
7. Schema constellations
8. Entrenchment
9. Compound stress
10. Stress shift
11. Summary
6 Intonation and grammatical constructions
1. Propositions, assertions and intentions
2. Default contours in grammatical constructions
3. Interaction of defaults and non-defaults
4. Frequencies of the defaults
5. Broad and narrow focus revisited
6. Predicate focus and event focus
7. Phonological phrasing
8. Intonational universals and discourse management
9. Summary
7 Concluding remarks
Chapter notes
Person index
Subject index

Cognitive Phonology in Construction Grammar


This is an introduction to both English phonology and phonology in general. While the basic notions will be largely covered, it is assumed that the reader already has a background in linguistics. However, a beginner in phonology can follow the text perfectly well if s/he begins with Chapter 2 and reads Chapter 1 last. I will analyze a variety of phonological phenomena of English using the frameworks of Cognitive Linguistics and Construction Grammar. In this approach, phonology is not a separate or autonomous module in grammar but an integral component of the symbolic units. This entails in particular that, in the minds of the speakers, speech sounds have no existence independent of the representational structures and therefore word formation must be at the heart of the discussion. Experimental evidence tends to indicate that the mental representation of words contains even predictable, allophonic information, and if so, this means that morphemes cannot be stored in terms of a single underlying form, but all non-automatic allomorphy must be in the lexicon. Word formation uses derivational and inflectional schemas that simultaneously unify stems with both combinatory and non-combinatory material to form word-level constructions. Intonation is a formal concomitant of sentential constructions. Given that the basic symbolic units of language form a continuum, going from morphemes to sentences, all aspects of phonology are tightly interwoven with this global symbolic organization. The discussion concentrates on two standard varieties of English, Southern British English and General American. This book started out as a class handout for my phonetics and phonology students in Nice, and over the last couple of years, it evolved into this volume. I thank my students for their patience with the manual. 
I also wish to express my gratitude to Université de Nice and University of Maryland at College Park, for a research grant that made it possible for me to spend the month of July in 2002 at the university library in College Park. Thanks are due to the Centre de Recherche sur les Ecritures de Langue Anglaise (C.R.E.L.A) at Université de Nice for library services and logistics. In particular, I am greatly indebted to the two anonymous reviewers of Mouton de Gruyter, whose comments helped me to be more precise and clear on almost all the issues discussed. Any potential errors or misconceptions, however, are mine alone. I wish to dedicate this book to my children, Markus A. Ahonen and Anna K. Blum, and my grandchildren, Eila and Jack.



Phonemic and phonetic symbols used in the text

Southern British English (SBE) and General American (GA)

Vowels
i       beat bead tease
ɪ       lit lid kiss
e/eɪ    fate fade maze
ɛ       let said best
æ       sat bad jazz
ɑ       father farmer car; GA: pot rod posh
ɒ       SBE: pot rod posh
ɔ       sought sawed horse
ʊ       put pull push
ə/əʊ    SBE: boat road hose
o/oʊ    GA: boat road hose
u       scoot food tune
ɜ       Bert bird further
ə       around forget doctor
ʌ       cut bud fuss
aɪ      tight tide buy
aʊ      out loud house
ɔɪ      voice noise join
ɪə      here near beer
ɛə      there flared cares
ʊə      pure cured tours
(For the last three diphthongs in GA, see the text)

Consonants
p       pear appeal nap
b       boat ribbon robe
t       tickle attain light
d       dinner adore road
ɾ       GA: writer rider letter
k       kite account sick
g       gate soggy pig
f       five muffin loaf
v       veal even love
θ       thumb pathos math
ð       this feather breathe
s       seal basic cats
z       zone fuzzy dogs
h       hat behave ahead
ʃ       show fisherman lush
ʒ       genre measure beige
tʃ      choke catcher fetch
dʒ      joke bulging edge
m       mat woman swim
n       nose sunny run
ŋ       finger tank song
l       leap alive liquid
ɫ       pale salt tall
r       write merry; GA: car horse
j       yet onion million
w       witch afterward whine
ʔ       out [ʔaʊt]; casual speech: button cat

Voicelessness of usually voiced consonants
l̥       play
r̥       pray
j̊       cute
w̥       quiet

Syllabic consonants
l̩       tickle bottle whistle
n̩       fashion button listen
m̩       prism truism spasm

Unreleased stops
p̚       tap stop
t̚       cat cot
k̚       stick rock

Aspirated stops
pʰ      pair appeal
tʰ      toil attain
kʰ      coin accord

Nasalized vowels
ɪ̃       sing
ɛ̃       ten
ũ       loom


When the pronunciations of the examples in SBE and GA are different, both are given, SBE first and GA after, separated by a tilde, for example, {hot} /hɒt/ ~ /hɑt/.

Chapter 1 Introduction

1. Cognitive Linguistics

Cognitive Linguistics starts from the assumption that language is an integral part of the general human cognitive faculties, and that language is organized according to the same principles as those that govern the rest of human cognitive functioning (Fillmore 1976, 1982, 1985; Fillmore, Kay and O'Connor 1988; Kay 1997; Lakoff 1987, 1993; Langacker 1987; Talmy 1988, 2000; Taylor 1989, 2002; Wierzbicka 1988). This is in sharp contrast with the generative tradition of linguistics, largely originating with Chomsky (1957, 1965), in which human beings are assumed to be endowed with an innate, language-specific module in the brain, separate from the other cognitive faculties (Chomsky 1980). This module is proposed to determine the form that a human language can take by providing a biological pre-setting of the limits of a possible language. In language acquisition then, this language faculty "yields knowledge of language when presented with linguistic experience" (Chomsky 1986: xxvi). I do not claim to provide an answer to the question of whether anything in language is innate or not, as this question has no bearing on this work, but what is central here is the conviction that language is on a par with all other human cognitive faculties, by whatever manner they are acquired. Categorization of experience is at the heart of human cognitive processes. We do not comprehend the world as a set of isolated entities, events and properties, but we form generalizations over instantiations of what we come to consider the same experience. These generalizations are structured in the mind in ways that are not always maximally economical and parsimonious. For example, lexical storage is not in terms of only the minimum that is needed for word recognition, but it seems to include redundant material as well.
Also, there is ample evidence that human categorization is prototype-centered, which means that not all members of a category are identical in terms of some specific property. Rather, there seem to be one or a few central members that are representative of the whole category, and in addition, there are often less central members. All members, however, bear some resemblance or another to each other and/or the central member. Since category membership thus is gradient, it is also the case that, among the less central members, there are those that are only 'more or less' in the category. This means that category boundaries are fuzzy, they are not rigid, and consequently, it is not always clear to what extent an entity is in the category. In the 1970s, Rosch Heider and her associates developed experimental paradigms for the study of human categorization, and they were able to experimentally establish the psychological reality of prototypes and the so-called 'family resemblances' which category members share with one another (Heider 1971, 1972; Mervis and Rosch 1981; Rosch 1973a, 1973b, 1977, 1978, 1981; Rosch and Mervis 1975). Language is a system based on complex categorization par excellence. To illustrate what prototype categories mean in linguistics, let us consider a semantic category that has been methodically discussed by Lakoff (1987: 74-84). He considers the concept <MOTHER>, which is, just like most other semantic concepts, the intersection of several complex conceptual structures. The central, prototypical meaning includes <immediate biological ascendance>, but in Western cultures, it also tends to contain social stereotypes such as being a 'housewife' and having a husband. Thus, a 'good' mother is married and stays at home. But the word 'mother' actually has numerous uses: A mother can be, for example, a 'birth mother,' 'natural mother,' 'genetic mother,' 'biological mother,' 'adoptive mother,' 'stepmother,' 'surrogate mother,' 'foster mother,' 'working mother' or 'unwed mother.' These expressions form a radial category in that all the members are related to one another through various kinds of similarities, and they can all be traced to the central meaning cluster in one way or another, but biological ascendance, 'housewifeness' and a husband are not found in every one of them. In other words, not all members have the same properties as the prototypical <MOTHER>. For example, a 'foster mother' bears no genetic links to the child, but she assumes part of the social responsibility of the conventional <MOTHER> and hence 'foster mother' is semantically linked to the central concept.
What characterizes a radial category is the fact that the membership is based on 'family resemblances' among the members: Each member bears some similarity or another to at least one other member, and by going through these similarity links, all members of the category can be traced to the prototypical member(s) (Wittgenstein 1953). To illustrate the fuzziness of category boundaries, let us consider the expression 'real mother,' also discussed by Lakoff. Its meaning can be ambiguous, the intended referent depending only on the speaker. For example, both of the following are semantically well-formed and pragmatically perfectly felicitous:

(1) I was adopted and I do not know who my real mother is.
(2) I consider my adoptive mother to be my real mother.



There are thus not always hard and fast rules that strictly delimit categories to certain types of members only. Experimental evidence for graded categories can be found, for example, in Kempton (1981) and in Labov (1973). Even the central membership may be gradient in that members clearly within the core of the category may be 'more or less central' (Lakoff 1987: 12). Of course not all categories are radial; the members may be linked to each other in other ways as well, as for example, through metaphor and metonymy. As far as phonology is concerned, however, it is the radial and gradient categories that are of major importance, as we will see below. The way humans conceptualize the world is largely based on their own experiences with it, that is, human conceptual categories are embodied; they are not just based on the characteristics of the phenomena themselves, but we categorize the world on the basis of the way we interact with it (M. Johnson 1987; Lakoff 1987; Lakoff and Johnson 1980). This principle of embodiment applies not only to physical experiences like pain and pleasure, but also to psychological phenomena, such as logic and language, including speech sounds (Lakoff 1987: xvi). Nathan (1996) discusses the representation of phonological features and notes that they may involve representations of, for example, the tongue body movements, and these representations of speech sounds may also include auditory information (Fowler 1989). We can thus expect the mental representation of sounds to be based on both perception and production. However, the speech sounds are of course categorized into phonemes first and foremost on the basis of their distinctive role, which solidly incorporates the sound classes into both semantics and morphosyntax.

2. The language system

A language system is an abstract cognitive organization that relates sound and meaning. The principal and only function of language as a system is the expression of meaning. All the linguistic units, i.e., the grammatical constructions, lexical items and phonemes, are geared toward the realization of this one goal; they all converge in meaningful expressions. What this means is that when we encode a set of meanings into a spoken form, we transform concepts into a stream of speech sounds using the linguistic conventions that are stored in the brain, i.e., the grammar. Concepts and linguistic meaning are roughly equated in (3) below, but strictly speaking, they are not the same. Linguistic meaning is part of the language system while concepts, being more general, may include information that is not in a one-to-one correspondence with the linguistic elements. Speech sounds themselves are nothing but small variations in air pressure, a sound wave, and when this wave reaches the ears of the listener, s/he decodes the message or transforms the sound wave impinging upon his/her eardrums into meaningful concepts in his/her brain. We may represent language relating meaning and sounds with the following diagram:

(3) [Diagram: encoding]


The default for a wh-question has a falling nucleus, as in How did it happen in Figure 5, and the range of the fall tends to be from high to low. Table 4 gives the constructional schema for wh-questions. When the default tune is overridden, the new tune obviously must be contextually appropriate, for its function is to signal a change in the conventional conversational use, which itself would not be pragmatically felicitous at that given moment. And the information structure, i.e., what is accented and what is not, must, of course, also match the general discourse perspective. In Figure 6, the contour of Have you ever been to London, a syntactic yes/no-question, has a high falling tune, which is the default for the wh-questions, but the utterance sounds perfectly authentic and the speaker gives the impression of having a genuine interest in the matter.


Figure 6. Yes/no-question with a falling contour

The tag-questions in Figures 7 and 8 show how the speaker's illocutionary force guides the choice of the tune. The first, I wasn't late, was I, is a true question, and this type tends to have a rising contour on the tag, just as in yes/no-questions. The utterance having the form of a tag-question in Figure 8, I was charming, wasn't I, however, is actually not a question at all, but all the speaker does is seek confirmation of her own assertion about her having been charming, and thus a no answer would not be appreciated. The tune on the tag resembles that of statements more than that of questions. In the tag part in 7 and 8, the choice of the intonational contour depends on the speaker's intention, not the sentence type, nor the context. Since these types of sentences are actually called tag-questions, we might want to suggest that the default for them is the rising intonation, because it makes them into questions unambiguously, and the 'confirmation seeking' use would then be a non-default. Table 5 gives the default schema for tag-questions. The syntactic details are given only in very general terms.

Figure 7. A tag-question asking a question

Figure 8. A tag-question seeking confirmation

Table 5. The constructional schema for tag-questions

[ subject_j + [ verb (+ argument 0-n) ] VP ] ...    [ Aux + subject_j pronoun ]
Broad Focus                                          Low rise
Declarative sentence, polarity [α]                   polarity [-α]
Tag Question

The default for the imperative sentence type used as a command has a falling tune, identical to the neutral intonation in Figure 1. This construction can be described as in Table 6.

Table 6. The constructional schema for imperatives

Verb 2ND-PERSON (+ complement) ...
Fall
Imperative

We have now seen the two basic concepts, sentence type and default intonation, and how they go together in grammatical constructions. This amounts to a statement that each sentence type expresses a specific illocutionary force by default. And this being so, the default intonation of each construction must then conform to the speaker's current intention and also to the lexical choices in the actual instantiation of the constructional schema.

4. Frequencies of the defaults

When we operate with the idea of a default contour, the question arises as to how often speakers actually use the defaults. Before trying to answer this question, it is important to realize that mere frequencies do not tell anything about the authenticity of a contour, or of any other linguistic entity either. The only measure of naturalness and authenticity is the pragmatically felicitous use of a linguistic entity in a given context. As for intonation, Hirschberg (1989, 1995, 2000) has been studying speaking styles, read and spontaneous speech in particular, and, among other things, she has been looking into the frequencies of the rising and falling contours in declaratives, yes/no- and wh-questions in GA. The data in Table 7 illustrate her findings in read speech in two databases. We can see that, while the proposed default for yes/no-questions is a rise, the tokens in the RM database had a fall in almost half of the instances, and those in the other database in 20%; the other tunes had a rising nucleus. Wh-questions had a fall most of the time, as the defaults would predict, but there were rises as well.

Table 7. Percentages of intonational falls in read speech (Hirschberg 1989)

                yes/no-question            wh-question
                RM Fall    TIMIT 1 Fall    RM Fall    TIMIT 1 Fall
Read speech





Table 8. Percentages of intonational falls in read and spontaneous speech (Hirschberg 2000)

ATIS Database   declarative fall (N=288)   yes/no-question fall (N=74)   wh-question fall (N=30)
Read            93%                        30%                           84%
Spontaneous     70%                        43%                           62%

In another study Hirschberg compared read and spontaneous speech. Table 8 shows again only the percentages of falls, and just as in Table 7, the remaining utterances had a rise. The results for read speech show a closer match with the conventional view and the proposed defaults, namely, that in declaratives and wh-questions, intonation tends to fall and in yes/no-questions, it rises. In spontaneous speech, however, the tendencies are somewhat less clear-cut. Comparing the numbers in Table 8 with their complements, we see that the spontaneous declaratives had rises in 30% of the cases. In the yes/no-questions, the number of rises was 57%, as the defaults would predict, but 43% did have a fall. In the wh-questions, the default fall was present in 62% of the utterances. In these studies, Hirschberg demonstrates that the tunes are not a matter of static choices. Her results do lend support to the defaults proposed here, in that we can take them to be the more frequent tunes for these three sentence types, but they are indeed just defaults and by no means the only ones. Hirschberg also shows that there are differences between the speaking styles so that in the spontaneous speech, deviations from the defaults are more frequent than in read speech. If we now assume that read speech is more closely related to hyper-speech than spontaneous speech is, then the defaults can be seen as being more frequent in careful styles.
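The comparison of the fall percentages with their complements can be made explicit with a small calculation. The following is only a minimal sketch: the fall percentages are those of Table 8, while the names FALLS, rises and RISES are hypothetical and introduced here for illustration. It assumes, as stated above, that every utterance without a fall had a rise.

```python
# Fall percentages from Table 8 (Hirschberg 2000), by sentence type and style.
FALLS = {
    "declarative": {"read": 93, "spontaneous": 70},
    "yes/no-question": {"read": 30, "spontaneous": 43},
    "wh-question": {"read": 84, "spontaneous": 62},
}


def rises(falls):
    """Return the rise percentages as complements of the fall percentages.

    Assumption: each utterance has either a fall or a rise, so that the
    two percentages add up to 100 for every sentence type and style.
    """
    return {
        sentence_type: {style: 100 - pct for style, pct in by_style.items()}
        for sentence_type, by_style in falls.items()
    }


RISES = rises(FALLS)

for sentence_type, by_style in RISES.items():
    print(sentence_type, by_style)
```

Under this assumption, the spontaneous declaratives come out with 30% rises and the spontaneous yes/no-questions with 57% rises, which are the figures cited in the text.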




5. Broad and narrow focus revisited

Thus far, I have mainly been concerned with the default tunes of grammatical construction types at the sentence level, but establishing the default contour for any given instantiation of a constructional schema is not a simple matter. As we saw above, Ladd notes that in the broad focus, there is a strong tendency for the nucleus to go on the last content word, but this placement is not automatic. Consider the following sentences, modified from Ladd (1980), where the nucleus is on the last semantic argument of the verb. The reading is a broad focus, that is, there is no special emphasis on any one word.

(18) Mary bought a computer_NUCLEUS yesterday.
(19) Did Mary buy a computer_NUCLEUS yesterday?
(20) Who bought a computer_NUCLEUS yesterday?

The sentences (18) and (19) can be said without specific contextual presuppositions, and in that sense, they are both unmarked and conform to the defaults for these sentence types. The presupposition in both of them is Mary, i.e., we do know who she is, and the assertion in (18) is that she bought a computer yesterday, but in (19) there is no assertion, the speaker is asking a question about Mary. Wh-questions, however, always express specific propositional presuppositions such that only the semantic information corresponding to the wh-word is not presupposed, for everything else is taken for granted. In (20), the presupposition expressed is that someone bought a computer yesterday, and we want to know who it was. In the broad focus, the circumstantial yesterday in (18)-(20) is deaccented. The broad focus thus does not invariably place the nucleus on the last content word, but in Ladd's terms again, it falls on the last accentable syllable, and, as noted above, often this syllable is on the last semantic argument of the verb or the construction. To see this, we can contrast the above examples with those below.

(18') Mary bought a computer yesterday_NUCLEUS.
(19') Did Mary buy the computer yesterday_NUCLEUS?
(20') Who bought a computer yesterday_NUCLEUS?

The pragmatic contexts of the primed examples are radically different from those of the corresponding earlier examples. In (18')-(20'), there is a narrow focus on yesterday and what precedes the nucleus is contextually dependent. In (18'), we know that Mary bought a computer and what is asserted is that she did it yesterday, and in (19'), we know that Mary has bought a computer and we want to know whether she did it yesterday. In (20'), we know that, on at least two separate occasions, some person or another has bought a computer, and we now want to know who did it yesterday. This is not to say that the unmarked nucleus cannot be on an adverbial expression, as we saw above, but the 'take-away' point again is that, while there are defaults in terms of intonation, the nucleus placement, even in the defaults, is not automatic. The problem of the default nucleus placement in the broad focus can be illustrated with the example in (21); (21a) is Ladd's (1980: 81) and (21b)-(21d) are mine.

(21) Has John read Slaughterhouse Five?
(a) John doesn't read_NUCLEUS books.
(b) John doesn't read books_NUCLEUS.
(c) #John doesn't_NUCLEUS read books.
(d) #John_NUCLEUS doesn't read books.

Ladd's proposed answer with a broad focus is (21a), with the nucleus on read. This is a contextually appropriate answer to the question, and the speaker is asserting that John doesn't read books in general, therefore he hasn't read Slaughterhouse Five either. That (21a) has the broad focus depends on the fact that Slaughterhouse Five is a book and thus books is, in that sense, already contextually dependent, and the nucleus falls on the preceding accentable item by default. Also (21b) is a felicitous response, but this time, it has a narrow focus on books and the implication is that, while John doesn't read books, he does read something else, such as newspapers. The responses in (21c) and (21d) are both pragmatically infelicitous. In (21c), the narrow focus on the auxiliary presupposes a preceding context where it has been claimed or implied that John does read books, which is not the case here. In (21d), John carries a narrow focus and the implication is that it has been explicitly asserted that the person named John reads books, which too is contextually inappropriate. Ladd's point with this example was to show that the default accent placement is not independent of the context. Consider now the following question with four potential answers again, the dialogue being very similar to the one above.

(22) Do you like sushi?
(a) I don't eat fish_NUCLEUS.
(b) I don't eat_NUCLEUS fish.
(c) #I don't_NUCLEUS eat fish.
(d) #I_NUCLEUS don't eat fish.


Here too, only the first two are appropriate answers to the question, (22c) and (22d) being excluded on the same grounds as (21c) and (21d). The broad focus is found in response (22a), but this time the nucleus is on the direct object of the verb, fish. We can contrast this with (21a), where the nucleus of the broad focus is on the verb. The difference lies in the fact that the question in (22) does not mention fish explicitly, and thus, not being contextually dependent, fish can carry the nucleus in the broad focus - fish in (22a) is inferred from sushi. The speaker now asserts that s/he does not eat fish. In (22b), the speaker is saying that s/he does not eat fish but does, e.g., prepare or buy it, which is also a pragmatically felicitous answer, but eat now carries a narrow focus. I proposed above that an intonational contour with the broad focus is neutral in that it can be used to answer open-ended questions like "what happened," and it thus does not presuppose any specific propositional context. The example in (23a) illustrates this, but the other examples below it demonstrate that the scope of the prominence is actually ambiguous. The assertion or the new information in the three sentences in (23) is in the scope of the prominence, and the existence of the referent of the subject constituent is presupposed. The sentence in (23a) answers an open-ended question, but the others answer very specific questions. In both (23b) and (23c), the questions pertain exclusively to the verb phrase; we may note the potential circumstantial yesterday, which is deaccented in all of them.

(23) Answers with a Broad Focus
(a) What's new? Mary bought a new computer_NUCLEUS (yesterday).
(b) What did Mary do? Mary bought a new computer_NUCLEUS (yesterday).
(c) What did Mary do yesterday? Mary bought a new computer_NUCLEUS (yesterday).

These examples demonstrate that the broad focus can be used to answer questions that concern either the whole proposition or the predicate only. We of course need to know who Mary is, and Mary's existence is thus presupposed. Hence, the entire proposition is not what is asserted, only the predicate is. What is asserted about Mary is contained in the predicate, but the contour itself does not manifest any variability under the different questions in (23). The answer sentence in (23) can be used in response to other specific questions as well, such as those in (24), but now it would have a narrow focus on one constituent or another. In an utterance with a narrow focus, there is one word and one syllable in it that clearly stands out from the rest of the utterance, and as long as it is stressed, this syllable can be practically anywhere in an utterance.

(24) Answers with a Narrow Focus
(a) Who bought a new computer_NUCLEUS (yesterday)? Mary_NUCLEUS bought a new computer (yesterday).
(b) When did Mary buy a computer_NUCLEUS? Mary bought a computer yesterday_NUCLEUS.
(c) Did Mary buy a used_NUCLEUS computer? Mary bought a new_NUCLEUS computer.

The broad focus can thus be used in contexts with no specific propositional presuppositions, where only the existence of the subject is presupposed (23a), but it can also answer precise questions concerning the verb phrase (23b)-(23c). An utterance with a narrow focus, however, always corresponds to clear propositional presuppositions, as (24a)-(24c) illustrate.
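The contrast between (23) and (24) can be made concrete with a toy model. The following Python sketch is only an illustration under stated assumptions, not an analysis proposed in this chapter: the `Word` record, its `content`, `given` and `circumstantial` flags, and the functions `place_nucleus` and `render` are all hypothetical names, and the `[NUCLEUS]` marker is simply the sketch's own notation. It assumes that under broad focus the nucleus falls on the last content word that is neither contextually dependent nor a deaccented circumstantial such as yesterday, and that under narrow focus the context simply picks out the focused word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    content: bool = True          # False for function words such as "a" (assumption)
    given: bool = False           # contextually dependent, e.g. already mentioned
    circumstantial: bool = False  # deaccented circumstantial, e.g. "yesterday"


def place_nucleus(words: List[Word], narrow: Optional[str] = None) -> int:
    """Return the index of the word that carries the nucleus.

    narrow: the word singled out by the context (narrow focus), or None
    for the default broad focus.
    """
    if narrow is not None:
        for i, word in enumerate(words):
            if word.text == narrow:
                return i
        raise ValueError(f"{narrow!r} does not occur in the utterance")
    # Broad focus: scan from the right for the last accentable word.
    for i in range(len(words) - 1, -1, -1):
        word = words[i]
        if word.content and not word.given and not word.circumstantial:
            return i
    raise ValueError("no accentable word found")


def render(words: List[Word], nucleus: int) -> str:
    """Spell out the utterance with the nucleus marked after its word."""
    return " ".join(
        word.text + ("[NUCLEUS]" if i == nucleus else "")
        for i, word in enumerate(words)
    )


sentence = [
    Word("Mary", given=True),
    Word("bought"),
    Word("a", content=False),
    Word("new"),
    Word("computer"),
    Word("yesterday", circumstantial=True),
]

print(render(sentence, place_nucleus(sentence)))                 # broad focus, as in (23)
print(render(sentence, place_nucleus(sentence, narrow="Mary")))  # narrow focus, as in (24a)
```

Marking a word as `given` moves the broad-focus nucleus leftward, which is the effect described for the fish examples in (21a) and (22a). The sketch says nothing about why a speaker chooses one focus over the other; that remains a matter of the discourse context.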

6. Predicate focus and event focus

The locus of the nucleus can be used to express whether the speaker wants to bring the predicate alone to the foreground, or whether s/he actually wants to place the whole proposition into the scope of the focus. This distinction involves the so-called thetic and categorical sentences (Sasse 1987), or sentence focus versus predicate focus (Välimaa-Blum 1988; Lambrecht 1994; 2000). Sentence focus is also called event focus as opposed to predicate focus. A simple example of this difference is given in the following contrast, based on the nucleus placement (Schmerling 1973).

(25a) Johnson died[NUCLEUS]. - Predicate focus
(25b) Johnson[NUCLEUS] died. - Event focus

In (25a), Johnson is present in the discourse model in one way or another already, and what is asserted is the fact that he died. In (25b), the addressee must know who Johnson is, but there need not have been any particular discussion about him at the utterance time, and what is asserted is the whole proposition, i.e., that Johnson died. The former illustrates predicate focus and the latter event focus.




Krifka et al. (1995) illustrate the same with the following contrast, making use of another construction type.

(26a) Pandas | are facing extinction[NUCLEUS]. - Predicate focus
(26b) Pandas[NUCLEUS] were roaming the camp. - Event focus

In (26a), the utterance is explicitly about pandas and the VP predicates something about them, that is, the VP contains the assertion. The utterance in (26b) introduces a whole new discourse event and the nucleus on the subject places the entire proposition into the scope of the focus. The contextual difference between the two is that in (26a), pandas are already present in the discourse model, and the speaker is thus predicating something about them by intonationally marking the subject as the topic. This was the case with (23a) as well. In (26b), pandas are introduced into the present discourse model as new information on a par with the predicate, that is, what is asserted is the whole proposition, and this is signaled by placing the nucleus on the subject. We must note that whether or not an expression can carry the event focus depends on many factors, such as the construction type, word order and the definiteness of the subject (Sasse 1987; Lambrecht 1994; 2000; Välimaa-Blum 1988; 1999b). Thus, while the event vs. predicate focus distinction can be a matter of a paradigmatic contrast in the intonation of a construction, as the example in (25) shows (Lambrecht 2000; Schmerling 1973; Välimaa-Blum 1988; 1999b), the lexical content, especially the subject definiteness must also be taken into account. The different word order variants and other grammatical constructions all form a paradigmatically contrasting system of forms, where intonation is a central component in the expression of the essentially pragmatic notion that the event versus predicate contrast represents (Fillmore, Kay and O'Connor 1988; Lambrecht 2000; Välimaa-Blum 1988; 1993). The fundamental assumption that runs all through this discussion is that the main function of intonation is to express the information structure, and that even the expression of the speaker attitude must respect this structure. Intonation is something that the speaker does for the addressee. 
The variability in the intonational prominence enables the speaker to present his/her ideas in a specific light, relevant to the unfolding discourse and his/her intentions in general. The intonational patterns used are based on tacit conventions, which the addressee, who shares them, actively interprets. This chapter is very much English-specific in that it is by no means true that intonation takes the same shapes in all languages. For example, Finnish has pragmatic speech act particles that express many of the functions that English encodes with a tune. But Finnish also involves complex interactions between word order, definiteness and intonation (Välimaa-Blum 1988, 1989a). French uses essentially syntactic means for focusing and the information structure (Lambrecht 1994). But the basic principle of grammatical constructions with a default intonation can certainly be extended to other languages as well.

7. Phonological phrasing

Phonological phrasing has to do with the alignment of intonation with portions of the text. Just as sentences can be analyzed into units at several levels, such as clauses, phrases and words, utterances can also be analyzed into several levels, each with its own units, such as syllables, feet, phonological words, etc. The highest level is the utterance or the intonational phrase and it is the only level that is of interest to us in this context. It is independent of the syntax in that an intonational phrase may span even two clauses, it may break a sentence into smaller parts or it may consist of a single word. Other names found in the literature for the same prosodic entity are tone group, breath group, intonation unit, sense group and information unit. To understand this unit from the pragmatic point of view, let us try to see why there are so many different appellations. 'Tone group' is based on the fact that each intonation unit must contain one and only one tonic syllable, which is the nuclear accent. 'Breath group' stems from the mistaken idea that the speaker fills up his/her lungs before starting to speak and empties them as s/he comes to the end. This view is erroneous, for a speaker may well produce several breath groups in one breath. 'Intonation unit,' just as intonational phrase, is neutral with respect to structure and function. 'Sense group' looks at intonation from the point of view of function in that, characteristically, there is semantic cohesion within intonational phrases. 'Information unit' represents a pragmatic point of view, where it can be shown that contours can frequently be divided into two parts, as already seen above. One part corresponds to the presupposition and the other to the assertion, i.e., the old and new information, hence information unit.
The semantically and pragmatically felicitous positioning of the nuclear accent is extremely important in the expression of the information structure, because it is largely the nucleus that signals the reading of the utterance as containing a broad or a narrow focus. The narrow focus is always conditioned by specific propositional presuppositions, while the broad focus may also relate to an open-ended context, with no propositional presuppositions. We cannot establish an upper limit to the length of intonational phrases, but we can say that formal speaking styles allow longer groups than the less formal ones. Syntactic and semantic cohesion are generally maintained more within an intonation unit than across two of them. Tone groups may be units of speech planning for the speaker, but fundamentally, they always impose a certain informational reading on the message. And since the grouping of words into intonation units depends virtually only on the speaker, we can never infallibly predict intonational phrasing on the basis of the linguistic form alone; all we can do is try to see what the defaults are. The intonational phrase can be defined as the element of intonational patterning that contains the nuclear accent. The nuclear accent in turn can be defined as the portion of the contour where there is a pitch movement, either simple or complex, and the nucleus is also the last accented syllable in a tone group. I will not propose any formal analyses of the tone units, but, to begin with, the interested reader may want to consult Pierrehumbert (1980), Beckman and Pierrehumbert (1986), Cruttenden (1986) and Hirst and Di Cristo (1998).

8. Intonational universals and discourse management

Intonation is something that is produced and interpreted only in a discourse context. It cannot be predicted from the syntactic structure alone, apart from the default patterns, which, as we have now seen, can be overridden by various factors. The only thing that really matters is the speaker's intention when producing the utterance. We may note, however, that very often, a rising contour is used to signal incompleteness and a falling contour completeness. Vaissière (1995) observes that this tendency is a virtual linguistic universal, found across a wide range of languages, and she proposes that, ultimately, these two patterns are biologically and ethologically motivated. The two functions, the signaling of continuation and completion, belong to an unfolding discourse and its concomitant turn-taking patterns (Välimaa-Blum 1999a), and as such we can understand their universal status, because continuation and completion belong to 'floor management' (Clark and Schaefer 1989), which is something that speakers of all languages have to do in any ongoing conversation. Asking a question is an example of an incomplete discourse and answering it illustrates a potential completion. Yes/no-questions are incomplete in themselves, without the answer, and they use the rising tune. Declaratives are typically used to make assertions, which introduce new information to the discourse, and they tend to have the completion pattern, the falling tune. And these two intonational shapes are the default patterns in yes/no-questions and declaratives.

Rises are also used in non-terminal items on lists, in the so-called 'continuation' contour, but the final item of the list typically has a fall, the rise thus signaling incompleteness and the fall completion, which again perfectly agrees with the general tendency. The intonational contour in Figure 9 illustrates the list pattern with continuation rises and the final fall. We may note in passing that, in this figure, the final consonants corresponding to the fricatives of the plural are strongly devoiced, when in fact, prescriptively speaking, they should be voiced throughout. We can see this clearly if we compare the portions selected for each word in the caption and the actual ends of the corresponding intonational curves. These do not coincide because the pitch contour is shorter in each case. This means that voicing stopped before the end of each fricative, and this kind of final devoicing is typical of spoken English, especially in GA.











Figure 9. The list intonation on Roses, lilies, daisies and geraniums, with rises on the non-terminal items and a fall on the last item

Wh-questions tend to come with a falling tune, unlike yes/no-questions. But wh-questions are different from polar questions in that they contain a large amount of propositional presuppositions (Levinson 1983), and this places them in the middle ground between declaratives and interrogatives. For example, the interrogative sentence "What time did John come home" asks a question about the time of an event whose existence is taken for granted, i.e., it is linguistically presupposed that John did come home. This may explain why the typical wh-pattern is falling: while wh-questions are asking for a specific piece of new information, they also supply a large amount of background information. A wh-question can only be made with the use of a wh-word, but yes/no-questions do not depend on a specific syntactic form, for a declarative sentence can be made into a yes/no-question by just using the universal rising contour. This is possible probably because this rise is associated with the syntactic yes/no-pattern by default, and also because rises signal incompleteness and a declarative with a rise becomes, not an assertion but a query. In sum, the use of the rising and falling tunes to signal continuation and completion, respectively, is not specific to English, but English manifests a strong universal tendency, which is shared by numerous other languages.
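The defaults discussed in this section can be gathered into a small lookup. The Python sketch below is only an illustration; the names `DEFAULT_TUNE`, `list_tunes` and `reading_of_declarative` are hypothetical, and the mapping records tendencies that the context and the speaker's intention can always override.

```python
from typing import List, Tuple

# Default tunes by sentence type, as described in the text (tendencies, not rules).
DEFAULT_TUNE = {
    "declarative": "fall",      # completion
    "yes/no-question": "rise",  # incompleteness
    "wh-question": "fall",      # rich in presuppositions, patterns with declaratives
}


def list_tunes(items: List[str]) -> List[Tuple[str, str]]:
    """Continuation rises on non-terminal list items, a fall on the last one."""
    if not items:
        return []
    return [(item, "rise") for item in items[:-1]] + [(items[-1], "fall")]


def reading_of_declarative(tune: str) -> str:
    """A declarative with a rise is read as a query, otherwise as an assertion."""
    return "query" if tune == "rise" else "assertion"


print(list_tunes(["roses", "lilies", "daisies", "geraniums"]))
print(reading_of_declarative(DEFAULT_TUNE["declarative"]))
print(reading_of_declarative("rise"))
```

The list in the usage lines is the one shown in Figure 9. A fuller model would have to let the discourse context override each entry, which is exactly what a fixed table cannot do.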

9. Summary

Intonation plays a central role in English in the expression of the information structure. The intonational prominence in an utterance can be said to have a deictic function: it guides the listener's attention to a certain portion of the utterance and this is where the information focus is found. To know where to place the information focus, one has a choice between the broad focus and narrow focus. The broad focus is the default, which, in general, places the nuclear accent on the last semantic argument of the verb, while the narrow focus can be found on any stressed syllable in the utterance, depending on the context. When the focus is broad, the contour matches two general classes of contexts. In one, it can answer open-ended questions and recite a continuous narrative text, and in the other, it can be used to respond to questions about some aspect or another in the verb phrase. A narrow focus always corresponds to very specific propositional presuppositions, with the focused constituent zooming in on what is asserted as the new piece of information, the rest of the utterance being contextually dependent, either explicitly or implicitly.

A grammatical construction is an association of linguistic form, meaning and a conventional conversational use, and intonation is one of its formal concomitants. We can associate each construction with a default contour, which can be overridden by the context, the lexical content and the speaker's intention. Thus, apart from the defaults, the intonational pattern of an utterance cannot be predicted from the syntactic form alone. The speaker attitude, such as a high level of interest or the lack of it, incredulity or surprise, etc., may also influence the contour shape and especially the pitch height of an utterance, but I believe that the expression of the speaker attitude is subsidiary to the information structure. Therefore, the pragmatically felicitous placement of the nucleus and an understanding of why it goes where it goes are of foremost importance in the study of intonation, and the information structure is the chief determinant in this respect. The most important fact about intonation, however, is that it can go wrong only if it creates utterances that are semantically and/or pragmatically incompatible with the context.

A major portion of the discussion above concerned the default association of sentence types and intonation. But there are defaults at the level of the instantiations of the construction types as well. What the default contour of any given utterance token is depends not just on the schema it instantiates; the preceding context also interacts intimately with the nucleus placement. This discussion has left out many elements of intonation that would need to be understood if one wants to understand the choice of a contour exhaustively. Focusing or the bringing of certain linguistic elements into special prominence can be accomplished not only intonationally but also syntactically and lexically, and I have not touched upon the last two at all. For example, words like even have a focusing function. Even calls for the nucleus on the head of its complement, as in even a two[NUCLEUS]-year-old can do that (Schmerling 1973).
And as Fillmore, Kay and O'Connor (1988) demonstrate, a grammatical construction may even contain several foci, and these need to be identified in the constructional schema, without which the understanding of the pragmatic function of the grammatical unit would not be feasible.

For further thought

Take any of the above sentence types with its default contour and compare it with a non-default token. What sorts of differences might there be between the two? For example, the yes/no-question did you see it happen with a low rise might feel much more probing than the same sentence with a high falling nucleus. The same applies to the wh-question how did it happen. The pragmatic interpretation of this sentence with a fall on happen is not quite the same as when this constituent has a rise. As noted above, in addition to the major sentence types, languages also have numerous minor sentence types, each with their own, specific sentential formulas. Among them, I want to mention a couple of examples from Sadock and Zwicky (1985: 158). In the first four, (a)-(d), we have something that the authors say are used for the "punctuation of discourse:"

a. How do you do?
b. Pleased to meet you.
c. Good morning.
d. See you later.

Another minor type uses words like how, what, why and about, and while the examples in (e)-(h) contain a wh-word, we must note that they do not have the syntactic form of a wh-question, and they are thus considered to form a separate sentence type. Their typical use is to 'make a suggestion.'

e. How about going to the beach?
f. What about selling the house?
g. Why waste your time on reading such trash?
h. Why not apply?

The exclamations in (i)-(l) instantiate another minor sentence type, and in form, we may note that they resemble declaratives and interrogatives:

i. What a good boy you are!
j. How tacky that is!
k. Boy, does he ever have beautiful legs!
l. Wow, can he knit!

Your task now is to see for each example if there is an intonational pattern that best fits its conventional conversational use. Also, you may want to construct further examples and verify your proposals about the tunes with other people.

Chapter 7 Concluding remarks

The underlying assumption that I have made all through the preceding pages is that the ultimate function of a language system is the expression of meaning. In this kind of system, the sounds themselves have no independent role, but they always and only occur in combination with one another to form symbolic units. In phonology, we study the speech sounds from the point of view of their distinctive roles and the various manifestations they assume in connected speech. Given that speech sounds are associated with morphemes, at all times, we would not expect speakers to have a mental representation of the phonemes and their allophones detached from the meaningful units they occur in. There is actually evidence indicating that the segmentation of words into sounds is perhaps not something that comes to us as a matter of course in the cognitive maturation, and this supports the idea that sounds do not have an autonomous existence in the minds of the speakers. In the long-established view of the generative-transformational phonology, each morpheme has only one basic, underlying form, and the coallomorphs are derived from this by derivational rules that are phonologically conditioned, not sensitive to meaning and/or grammatical category. The rules apply in a sequential fashion, often iteratively and in two independent cycles. In this approach, there are only two levels, which correspond to the morpheme level and the phonetic level adopted here. Both allophonic rules and those that are morphologically conditioned apply in block, without any distinction between them. Since each morpheme only has one form in the lexicon and all the co-allomorphs are obtained from it under the guise of the sounds alone, there is no meaning or syntax in phonology, in spite of the fact that what the rules often actually generate is co-allomorphy. In the spirit of Cognitive Linguistics and Construction Grammar, I have now proposed that in phonology, there is no derivation of any kind. 
Morphemes contain all their non-automatic allomorphy in the lexicon, and the morphemic, phonemic and phonetic levels are maximally identical. The lexicon contains all the lexical morphemes and the word formation schemas. At the phoneme level, these are unified to form morphologically complex words, and at the utterance level, words occur in higher-level constructional frames, such as phrases and sentences. One consequence of the requirement of maximum similarity across the levels is that even lexical representations are fully specified for both predictable and non-predictable phonological features. Phonologically fully predictable morphophonemes also occur in the lexicon, otherwise there is no abstractness. Since the phonemes are entirely specified for all features in the lexicon, they cannot be schematic for their co-allophones. It would actually be undesirable for the phonemes to be schematic, because this would necessarily introduce underspecification in the lexicon, which goes against psycholinguistic evidence. Rather, the distinctive sounds are conceived of in the spirit of exemplar models: the phoneme is a class of phonetically similar sounds that speakers take to be one and the same entity, mainly because all the co-allophones generate identical semantic contrasts. The phoneme classes are prototype-centered and the membership is gradient; some members may even be marginal to the extent that, out of context, their category membership is indeterminate. The prototype effects arising from experimental studies appear to be based on the most frequent exemplars.

The complexity in the SPE-type phonologies is largely found in the rule systems, which function without memory or teleology, where the rules go from the base forms to the co-allomorphs automatically, without 'knowing' where they come from and without 'knowing' what their outcome will be. The only information the rules can use has to do with the sounds themselves. The derivational results of these grammars are constrained by various well-formedness conditions, such as phonotactic limitations, but these, too, are autonomous from meaning and grammar. In the approach presented here, the principles that relate the morphemic, phonemic and phonetic levels deal with a range of meaningful co-allomorphs that belong to specific grammatical categories, not just to anonymous strings of phonemes.
The word schemas identify the morphosyntactic category of the stem, its sound shape and meaning, and they then make their own contribution to the word created in the same three domains. Once the morphemes are put together at the phoneme level to form words, even the morphophonemes are fully specified, and it is also at this level that the words are syllabified and word stress is assigned. The phonotactic constraints operate at the word level to guarantee that only well-formed strings of phonemes are constructed in actual words. The words in the utterance level constructions may undergo phonological processes according to optional sandhi principles, which apply across word and even clause boundaries within the same intonational phrase. The phonemic constituency of a morpheme may thus change at the phonetic level, depending on the utterance context, but the requirement of maximum similarity across levels is generally maintained.



The approaches to phonology that assume one base form and a set of sequential rules that derive the co-allomorphs from it necessarily create underlying forms that are often highly abstract, even corresponding to non-existing sound shapes. If we assume that all the predictable and non-predictable information is present in the various co-allomorphs in the lexicon, we avoid both abstractness and underspecification, neither of which is desirable in general. The grammar that is thus established may appear simple in the absence of complex rule interactions and cycles. Of course, there is complexity in Cognitive Phonology in Construction Grammar as well, but it stems from the co-allomorphy in the lexicon and the schemas that identify their stems in terms of semantics, sound shape and/or grammatical category. The phonotactic principles, which are part of the global well-formedness patterns of words, are needed to account for both non-occurring sound combinations and phonemes that have a restricted combinatorial potential. I believe that negative phonotactic statements outlawing the occurrence of some sound sequence or another have their place in the grammar, for what they describe has to do with what a possible word in the language in question can be. Strictly speaking, while negative statements describe what cannot occur, they are actually the complement of one or more positive statements. The positive statements can be stated in terms of lists of procedures corresponding to actually occurring sound sequences or they may correspond to schematic generalizations, but the positive statements also entail knowledge of what cannot be said. Since the grammar, involving complex symbolic constructions at various levels, is assumed to be a cognitively real entity, not just a set of automatic patterns of usage, a negative statement may fully capture an aspect of this grammar.
Accepting the fact that the boundaries are fuzzy, we can state that the speaker who knows the words of his/her language also knows what is not a possible word. While word formation schemas apply simultaneously, without sequential ordering, the morphemes, just as the phonemes, are not free to combine in just any order in words. In the morphology, there are conditions as regards the ordering of the derivational and inflectional schemas, e.g., the latter always appear after the former. It follows from this that the sound shape of a suffix may depend on the suffix that follows it, just as the sound shape of the stem may depend on the suffixes. The schemas and their stems are thus intimately intertwined; so intimately in fact, that frequently occurring affixes and affix clusters probably correspond to highly entrenched procedural knowledge in the minds of the speakers.

We can consider both phonemes and morphemes to be classes of mental entities with several manifestations in speech. What brings the co-allophones of the same phoneme together is their phonetic similarity and the fact that they are perceived to be the same sound, which in turn is based on their having the same distinctive function. What unites the co-allomorphs of the same morpheme is their prototypically structured meaning and their grammatical category, but unlike the co-allophones, they need not bear any formal similarity to one another. We can see an analogy between propositions and the meanings of morphemes. The same morphological meaning may assume several different sound shapes in actual utterances, and the choice of the relevant co-allomorphs depends on the construction as a whole. Propositions likewise may assume different constructional forms, which all tend to have specialized contextual uses, which, however, are less rigidly context-dependent than those of the co-allomorphs of a morpheme.

In the view of Cognitive Phonology in Construction Grammar adopted here, we never lose track of why speech sounds are produced. Even when we talk to ourselves, what we utter is meaningful. The morphological schemas manipulate sound shapes, grammatical categories and meanings, and the core of phonology resides in this interaction. It follows from this that it is not possible to separate phonology from morphology. The close connection of phonemes with meaning actually forms the basis of embodiment in phonology. If morphology and syntax form a continuum, as is explicitly assumed in Construction Grammar, then it seems that there is not much in language that we can separate into independent modules. Isolating phonology from the expression of meaning leads to a vision of the mind as a mindless machine. And this is hardly true of the human mind.

Chapter notes

1. I am using the word 'morpheme' in a rather traditional way, as pointed out to me by Marc Plénat (personal communication). However, the use of adjectives such as lexical, grammatical, derivational and inflectional and terms such as stem, root, allomorph, sound shape, morph, free, bound, affix, prefix, suffix, infix and formative, should bring out the intended referent in each instance. Also, strictly speaking, as we will see below, a free lexical morpheme corresponds to a class of sound shapes associated with meaning and grammatical information. Furthermore, in the framework adopted here, affixes actually have no meaning in themselves; they get their meaning/grammatical function from the morphological schemas that introduce them into words.

2. Philip Carr (personal communication) notes that the expression 'derived word' gives a rather process-like view, which I explicitly reject in this work. The word 'derived,' however, as opposed to 'inflected,' does capture the traditional distinction between the lexical and grammatical morphemes, and the lexical and grammatical schemas advocated here. I will thus continue using it without any connotations of sequential or diachronic derivation.

3. If the average number of sounds per word is four and N is the number of sounds, the following formula calculates the number of different words (= X) that can be expressed in this language. I am indebted to Gilles A. Blum for the mathematical formulation of this question, as well as the one in the following footnote.

$N (N-1)(N-2)(N-3) = X$

4. This is calculated as follows:

$n(n-1)(n-2)(n-3) + 6n(n-1)(n-2) + 6n(n-1)$

5. I am grateful to Keith Johnson for this reference.

6. Parts of this chapter are based on a talk given at a Journée Agrégation organized by the English department at Université de Paris 3 in March 2001.
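The two expressions in notes 3 and 4 are easy to evaluate mechanically. The Python sketch below is offered only as a convenience and is not part of the original notes: the function names `words_note_3` and `words_note_4` are hypothetical, and the inventory size used in the usage lines is an arbitrary illustrative value, not a claim about any language. The expression in note 3 is the number of ordered sequences of four distinct sounds drawn from N sounds; the expression in note 4 is evaluated exactly as given.

```python
def words_note_3(n: int) -> int:
    """Note 3: N * (N-1) * (N-2) * (N-3), four-sound words with no sound repeated."""
    return n * (n - 1) * (n - 2) * (n - 3)


def words_note_4(n: int) -> int:
    """Note 4, evaluated term by term exactly as the note states it."""
    return (
        n * (n - 1) * (n - 2) * (n - 3)
        + 6 * n * (n - 1) * (n - 2)
        + 6 * n * (n - 1)
    )


inventory_size = 10  # arbitrary illustrative value (assumption)
print(words_note_3(inventory_size))
print(words_note_4(inventory_size))
```

Both functions return 0 or small values for very small inventories, which is a quick way to check the arithmetic by hand before trying a realistic inventory size.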


