The Sounds of Japanese 0521617545, 9780521617543

This introduction to the sounds of Japanese is designed for English-speaking students with no prior knowledge of the lan


English Pages xx,263 [279] Year 2008


Table of contents :
Contents

List of figures x
List of tables xiv
Preface xvii

1 Phonetics 1
1.1 Speech sounds 1
1.2 Airstream mechanisms 3
1.3 Phonation 3
1.4 Nasality 4
1.5 Transcription and segments 5
1.6 Length 6
1.7 Suprasegmentals 7
1.8 Vowels 7
1.9 Obstruents 11
1.10 Sonorants 17
1.11 Secondary and double articulations 19
1.12 Acoustic displays 20
Exercises 24

2 Phonemics 26
2.1 Phonology 26
2.2 Contrast and minimal pairs 26
2.3 Allophones and phonemic symbols 27
2.4 Allophones in complementary distribution 28
2.5 Allophones in free variation 29
2.6 Distinctive features 30
2.7 Redundant features and allophonic rules 32
2.8 Phonotactics 35
2.9 Affricates 37
2.10 Diphthongs 39
2.11 Overlapping and neutralization 42
2.12 Careful pronunciation 46
Exercises 50

3 Vowels 53
3.1 Short vowels 53
3.2 Long vowels 56
3.3 Vowel sequences 61
3.4 Vowel reduction 68
Exercises 70

4 Syllable-initial consonants 74
4.1 Stops 74
4.2 Fricatives 77
4.3 Affricates 82
4.4 Nasals 87
4.5 Liquid 89
4.6 Semivowels 89
Exercises 94

5 Syllable-final consonants 96
5.1 Syllable-final nasals 96
5.2 The mora nasal phoneme 100
5.3 Phonotactics of the mora nasal 104
5.4 Syllable-final obstruents 105
5.5 The mora obstruent phoneme 106
5.6 Phonotactics of the mora obstruent 108
Exercises 112

6 Syllables and moras 115
6.1 Syllables 115
6.2 Moras 117
6.3 Mora timing 121
6.4 Syllables, moras, and accent 123
6.5 Words and music 126
6.6 Extra-long syllables 131
6.7 Vowel-vowel sequences 133
Exercises 138

7 Accent and intonation 142
7.1 Intonation 142
7.2 Pitch accent 143
7.3 Noun and particle accent 154
7.4 Verb accent 162
7.5 Adjective accent 173
7.6 Longer phrases 180
7.7 Compounds 187
7.8 Sentence-final intonation 195
Exercises 198

8 Other topics 206
8.1 Vowel devoicing 206
8.2 Syllable-initial velar nasals 214
8.3 Glottal stops 222
8.4 Alveopalatal obstruents and romanization 225
Exercises 232

Appendices 236
Appendix A Broad phonetic transcription 236
Appendix B Phonemes and allophones 237
Appendix C Hepburn and Kunrei romanization 239
References 245
Index 258


Figures

1.1 Vocal tract 2
1.2 Vocal folds and glottis from above (adapted from Roca and Johnson 1999:15) 3
1.3 Extreme positions during vocal-fold vibration cycle (adapted from Roca and Johnson 1999:16) 4
1.4 Velum and nasality (adapted from Ashby 1995:31) 5
1.5 Stylized vowel-area diagrams 8
1.6 Cardinal vowels 9
1.7 American English vowels 10
1.8 American English diphthongs 11
1.9 Place-of-articulation labels 12
1.10 Side-to-side cross-sections of [s] and [ʃ] behind the alveolar ridge (adapted from Ladefoged and Maddieson 1996:147, 149) 15
1.11 Typical tongue positions for [s], [ɕ], and [ʃ] 15
1.12 Demonstrating aspiration with a piece of paper 16
1.13 Waveform of spy [spaj] 21
1.14 Spectrogram of spy [spaj] 22
1.15 Formant values early and late in [aj] of spy [spaj] 23
1.16 Spectrogram and pitch track of spy pie [spajpʰaj] 24
2.1 Analogy between allophones and roman letters 28
2.2 Place and nasality contrasts in some English consonants 31
2.3 Using highness and lowness to characterize vowel height 32
2.4 Traditional phonetic labels as informal feature specifications 32
2.5 English vowel nasalization rule 33
2.6 English fricative labialization rule 35
2.7 Cluster-splitting consonant transpositions in English speech errors 38
2.8 Monosyllabic tell and why sung on two notes each 41
2.9 Monosyllabic cry sung on three notes 42
2.10 Consonant transposition splitting English syllable-initial /st/ into /s/ and /t/ 43
3.1 Japanese vowels (•) in vowel space delimited by cardinal vowels (o) 54
3.2 Lip protrusion in French /u/ 55
3.3 Lip compression in Japanese /u/ 55
3.4 Waveforms of satoya (top) and satooya (bottom) 59
3.5 Phonemic transcriptions with explicit syllabification 60
3.6 Long vowels occupying two skeletal slots 61
3.7 The syllable rei sung on two notes 64
3.8 Speech errors splitting /ai/ 64
3.9 Speech error splitting /ei/ into /e/ and /i/ 65
3.10 Speech error splitting /eH/ into /e/ and /H/ 65
3.11 Major retailer's rooftop sign 66
3.12 The syllable zo sung on two notes 67
3.13 Average first and second formants of Japanese short vowels produced by speakers reading word lists (•) and prose passages (o) 70
4.1 Aspiration in English tea [tʰi] 75
4.2 Traditional fifty-sound display of hiragana 79
4.3 Hiragana ha-column with Hepburn and Kunrei romanization 80
4.4 Hiragana ta-column with Hepburn and Kunrei romanization 83
4.5 Hiragana za-column with Hepburn and Kunrei romanization 87
5.1 Syllable-final nasal before /c/ (left) and /s/ (right) 98
5.2 English /n/ before /ts/ (left) and /s/ (right) 98
5.3 Syllabification of extra-long nasals 100
5.4 /N/ taking on place and aperture of following /p/ 102
5.5 /N/ taking on default place and aperture before pause 103
5.6 /N/ with inherent place and aperture, replaced before /p/ (left) and retained before pause (right) 103
5.7 Parallel syllabification of extra-long obstruents and nasals 106
5.8 Air pressure building up behind closure for [b] 109
5.9 /Q/ taking on voicing, place, and aperture of following /s/ 111
5.10 /Q/ taking on default place and aperture before pause 111
5.11 /Q/ with inherent place and aperture, replaced before /s/ (left) and retained before pause (right) 111
6.1 Partial sonority scale 116
6.2 Peaks of sonority matching number of syllables 116
6.3 Mismatches between peaks of sonority and number of syllables 116
6.4 Japanese short syllable template 117
6.5 Japanese short/long syllable template 118
6.6 Japanese syllable and mora structure 118
6.7 Onset, nucleus, and coda 119
6.8 English syllable constituents 119
6.9 Splitting at the onset-rhyme boundary 120
6.10 Japanese syllable constituents 121
6.11 5-7-5 meter in a senryu 121
6.12 Stress-timing and feet in English 122
6.13 Pitch track of /yo~bU:ol.Ksa/ 124
6.14 Syllable ending with /Q/ assigned to two notes 127
6.15 Syllable ending with /Q/ assigned to one note 127
6.16 Short syllable assigned to two notes 127
6.17 Two short syllables assigned to one note 128
6.18 Notational revision providing one note for each syllable 128
6.19 Syllable ending with /N/ assigned to two notes 129
6.20 Syllable ending with /N/ assigned to one note 129
6.21 Long-vowel syllables assigned to one and two notes 130
6.22 Diphthong syllable assigned to two notes and one note 130
6.23 Extra-long syllable types 132
6.24 Moras grouped into extra-long syllables 132
7.1 English statement and question intonation 143
7.2 Japanese statement and question intonation 143
7.3 Words differing in pitch accent 144
7.4 Final accent versus no accent 144
7.5 Pitch patterns on topic phrases 145
7.6 Phrasal pitch pattern 145
7.7 Accent on long syllables 146
7.8 Final accent versus no accent on words ending in a long syllable 146
7.9 Alternative pitch patterns on a phrase-initial unaccented long syllable 147
7.10 Traditional high/low representations of pitch patterns 148
7.11 Common representations of accent patterns 148
7.12 Mechanical procedure for specifying pitch-accent patterns 150
7.13 Basic intonation contour for an accent phrase with no accent 151
7.14 Basic intonation contour for an accent phrase with an accent 152
7.15 Intonation contour diagrams compared with actual pitch tracks 152
7.16 Schematic intonation contours for U + A and U + U combinations 181
7.17 Schematic intonation contours for A + A and A + U combinations 182
7.18 Schematic intonation contours for combinations with first-component focus 184
7.19 Schematic intonation contours for combinations with second-component focus 185
7.20 Pitch track of accentually non-unified /nPcibei : kaiuN + kyo~Hgi/ 192
7.21 Branching diagrams of three-element compound nouns 193
7.22 One-word statements 196
7.23 One-word questions 197
7.24 Statement and question with accent on next-to-last syllable 197
7.25 Abrupt rising /ne/ and /yo/ 197
7.26 Abrupt falling /yo/ 198
8.1 Waveform and spectrogram of /kirt:ai / 'expectation' 207
8.2 Waveform and spectrogram of /k a~i/ 'smelly' (left) and /s1!llta~iru/ 'style' (right) 208
8.3 Waveform and spectrogram of /siisoH/ 'thought' (left) and /sll.5oH/ 'obstacle' (right) 208
8.4 Waveform and spectrogram of /ima~su/ 'will stay' 223
8.5 Waveform and spectrogram of /neta/ 'went to bed' 224
8.6 Waveform and spectrogram of /akeru/ 'open' 225

Tables

1.1 American English vowels 9
1.2 American English diphthongs 11
1.3 American English stops and fricatives 13
2.1 Examples of nasalized [ʌ̃] and non-nasalized [ʌ] in English 33
2.2 Vowel contrasts before [k] and [ɹ] in English syllables 44
2.3 Vowel subsystems before /k/ and /r/ in English syllables 45
3.1 Short vowel contrasts 54
3.2 Short/long vowel contrasts 56
3.3 Phonemic analysis with ten unrelated vowels 57
3.4 Minimal pairs and the ten-vowel analysis 58
3.5 Phonemic analysis with double vowels 58
3.6 Phonemic analysis with a lengthening phoneme 59
3.7 Sequences of identical short vowels 61
3.8 Vowel-vowel sequences 62
4.1 Stop contrasts 75
4.2 Some allophones of stop phonemes 77
4.3 Phonetic voiceless fricatives 78
4.4 Distributions of phonetic voiceless fricatives 78
4.5 Distributions of voiceless fricative phonemes 82
4.6 [ts] before vowels other than /u/ 84
4.7 Distributions of /ti, /cl , and /c/ 85
4.8 Distributions of /d/, !JI, and /z/ 88
4.9 Distributions of /m/ and /n/ 88
4.10 Syllable-initial C/y/ clusters 92
5.1 Syllable-final nasals before stops 97
5.2 Syllable-final nasals before fricatives 97
5.3 Syllable-final nasals before semivowels and vowels 99
5.4 Syllable-final nasals before nasals 99
5.5 Phonemic transcriptions of syllable-final nasals 100
5.6 Realizations of word-final /N/ before following words 102
5.7 Common extra-long obstruents 105
5.8 Degrees of length in obstruents and nasals 105
5.9 Phonemic transcriptions of extra-long obstruents 107
5.10 Mora obstruent before /f/ and /h/ 108
5.11 Orthographic mora obstruent before voiced obstruents 109
5.12 Mora obstruent before voiced obstruents 110
6.1 Accent locations in noun + /wa/ phrases 123
6.2 Accent patterns on nouns consisting of two short syllables 123
6.3 Accent patterns on words consisting of two long syllables 123
6.4 Accent on city names 124
6.5 Examples of default accent on syllable containing third mora from end 125
6.6 Accent on diphthongs in city names 136
7.1 Accent marking on words containing long syllables 146
7.2 Possible accent locations for nouns of one to four syllables 154
7.3 Possible accent locations for nouns containing long syllables 154
7.4 Accent on isolated nouns and on noun + /wa/ combinations 155
7.5 Accent on noun + /no/ combinations 156
7.6 Nouns that maintain final accent when combined with /no/ 157
7.7 Quantity nouns that maintain final accent when combined with /no/ 157
7.8 Nouns combined with /naido/ 158
7.9 Nouns combined with polite copula forms 158
7.10 Nouns combined with /gulrai / 159
7.11 Numbers combined with /guirai/ 159
7.12 Nouns combined with /ma'de/ followed by /wa/ and by /deisu/ 160
7.13 Nouns combined with /karal/ followed by /wa/ and by /deisu/ 160
7.14 Unaccented nouns with two particles or with a particle and a copula form 161
7.15 Accented and unaccented dictionary forms of verbs 163
7.16 Dictionary forms of verbs ending with V/u/ 163
7.17 Dictionary forms of verbs ending VV(C)/u/ 165
7.18 Dictionary forms of accented verbs ending /ae/C/u/ or /oe/C/u/ 165
7.19 Nonpast negative forms of verbs 167
7.20 Past affirmative and gerund forms of verbs 168
7.21 Affirmative passive and causative forms of verbs 169
7.22 Potential forms of verbs 170
7.23 Provisional and negative gerund forms of verbs 171
7.24 Polite nonpast affirmative and hortative forms of verbs 172
7.25 Verb stems, humble forms, and honorific forms 173
7.26 Adjective forms 174
7.27 Accented and unaccented dictionary forms of adjectives 175
7.28 Adverbial forms of adjectives 176
7.29 Past affirmative and gerund forms of adjectives 177
7.30 Polite nonpast affirmative forms of adjectives 179
7.31 Combinations of two potential accent phrases 181
7.32 Other combinations of two potential accent phrases 183
7.33 Japanese full names 184
7.34 Compound noun accent when second element is three or four moras 189
7.35 Compound noun accent when second element is longer than four moras 190
7.36 Compound adjective accent 191
7.37 Verb + verb compound verb accent 191
7.38 Noun + verb and adjective + verb compound verb accent 192
8.1 Accent location in prefecture names 211
8.2 Accent shift in prefecture names 212
8.3 Adverbial forms of accented adjectives 212
8.4 Accent locations in feminine given names 213
8.5 Romanizations of alveopalatal consonants 226
8.6 Permissible and impermissible C/o/ and C/yo/ sequences 227
8.7 C/ye/ sequences in a Kunrei-style analysis 228
8.8 Some phonetic CV sequences 230
8.9 Distributions of some consonant phonemes in an analysis with distinctive palatalization 231

Preface

The original motivation for this book was dissatisfaction with An Introduction to Japanese Phonology (Vance 1987) as a textbook for a kind of course that I and many of my colleagues are often asked to teach. Faculty members who are interested in Japanese phonetics and phonology and who have appointments in departments that include East Asian languages are usually expected to provide an introductory course for beginning graduate students and advanced undergraduates. I myself have taught courses that fit this description for the University of Hawai'i at Manoa, for the University of Arizona, and for the Summer MA Program in Japanese Pedagogy at Columbia University. Most of the students who enroll are interested primarily in becoming teachers of the Japanese language, and they typically have little or no background in linguistics. The great majority of them will never take another course that deals with phonetics or phonology. Not surprisingly, this clientele hasn't been well served by An Introduction to Japanese Phonology, which targeted graduate students specializing in linguistics.

When Cambridge University Press approached me about contributing to the Sounds of series, I welcomed the opportunity to develop a manuscript that I'd been working on little by little into something that would fit the series parameters. Compared to An Introduction to Japanese Phonology, The Sounds of Japanese covers much less material, but it covers that material much more thoroughly and in a way that I hope will meet the needs of its intended audience - students, teachers, and aspiring teachers of Japanese as a foreign language. The only background it presupposes is a fairly high level of Japanese language proficiency. The discussion is confined almost entirely to phonetics and what might be called "phonology proper." Morphophonemic alternations are mentioned only in passing.

The Sounds of Japanese focuses almost exclusively on the "standard" dialect of Japanese, that is, the pronunciation of well-educated, upper-middle-class native speakers who grew up in the Tokyo metropolitan area (Shibatani 1990:186). This narrow focus is simply a matter of practicality; providing something close to an adequate description of this single, idealized variety is as much as I could realistically hope to accomplish, and this is the variety that is normally taught to students of Japanese as a foreign language. It's also the variety (more accurately, narrow range of varieties) most extensively studied in linguistic research. I certainly don't endorse the hegemonic notion that this particular variety of Japanese is somehow intrinsically superior to all the other varieties that the dialect diversity of modern Japan still has to offer. I'm also skeptical of the widespread belief that there are no significant pronunciation differences correlating with socio-economic background in the Tokyo metropolis (Akamatsu 2000:211), but I don't know of any relevant research.

Needless to say, there's also a great deal of dialect diversity in the English-speaking world, and when I compare Japanese to English, I need to identify a particular variety as the standard of comparison. There's no variety of English pronunciation in the United States that has quite the same status as received pronunciation (RP) in Great Britain (Bloomfield 1933:49, Bronstein 1960:6, House 1998:210-4), but I'm going to refer to the variety I use in examples as United States newscaster English. This is just a makeshift label for a variety that most English speakers in the United States would be willing to acknowledge as standard. It's close but not identical to the variety that I speak myself, and it's more specific than so-called General American (Baugh and Cable 1978:374-5).

As I mentioned above, this book presupposes a fairly high degree of Japanese language proficiency. In particular, I assume that readers are familiar with Japanese grammar and vocabulary, although I provide enough glossing to make the discussion intelligible even to someone whose knowledge of the Japanese lexicon is quite limited. I also assume that readers are familiar with the present-day Japanese writing system, and I'll use the words hiragana, katakana, kana, and kanji without explanation and, hereafter, without italicization. I'll occasionally use dakuten 濁点, also without italics, to refer to the "voicing diacritic" of kana spelling, that is, the two "dots" that distinguish letters such as ぐ (gu) and く (ku).

It's sometimes convenient to label vocabulary items by etymological source, and I'll use the labels native (corresponding to wago 和語) for elements that predate the massive influx of borrowings from Chinese that began about 400 CE, Sino-Japanese (corresponding to kango 漢語) for elements that date back to that influx, and recent borrowing or recent loanword (corresponding to gairaigo 外来語) for elements that came from other languages in the not-so-distant past, mostly from English. Following Martin (1975:151), I'll use Sino-Japanese binom (corresponding to niji-jukugo 二字熟語) to refer to the prototypical Sino-Japanese vocabulary items that are written with two kanji. When I use romanization, I use the modern, modified version of the Hepburn (Hebon-shiki ヘボン式) system, although I occasionally refer to the Kunrei (Kunrei-shiki 訓令式) system. Appendix C treats the romanization systems in detail.

In connection with choice of topics and depth of coverage, I accept the idea that rhythm and intonation are more important than segmental accuracy in contributing to native listeners' impressions of "accentedness" and "intelligibility" (Anderson-Hsieh, Johnson, and Koehler 1992, Boula de Mareüil and Vieru-Dimulescu 2006). But these prosodic aspects of pronunciation are much more challenging to describe and to teach, and the relevant sections of this book only scratch the surface. It's hard to avoid the uneasy feeling that we teach what we teach because we can, not because it's especially helpful.

Incidentally, whenever I cite a pair of words to illustrate a segmental contrast, my intention is for the two words to have the same pitch-accent pattern, which makes them a true minimal pair (§2.2). In fact, though, the accentuation of lexical items varies enough among "standard" Tokyo speakers that at least some of the cited pairs will not match for many individual native speakers. Fortunately, it turns out that accent makes no difference as far as the analysis of the segmental system is concerned (Akamatsu 2000:51-2).

The list of references that I provide isn't even close to a comprehensive bibliography of relevant sources, but there's at least one citation whenever documentation seems called for. I try to cite one Japanese-language source and one English-language source when I can, but in almost every case there are many other sources that I could've chosen instead. Even so, a reader who wants to learn more about any particular topic will usually be able to get a good start by consulting the items I do cite and their bibliographies.

Much of the final manuscript for The Sounds of Japanese was written in Honolulu in the fall of 2006. I'm grateful to the College of Humanities and the Department of East Asian Studies at the University of Arizona for letting me negotiate a semester on "alternative assignment" - a sabbatical in all but name. I'd also like to thank the Department of East Asian Languages and Literatures at the University of Hawai'i at Manoa for granting me visiting scholar status, and my colleague Dave Ashworth for letting me borrow his office in Moore Hall. I benefited enormously from easy access to the resources of Sinclair Library and especially Hamilton Library, which has made a remarkable comeback from a nearly catastrophic flood in October of 2004.


My comrades Mie Hiramoto, Terry Klafehn, Haruo Kubozono, and Leon Serafim provided help and encouragement along the way. For photos and acoustic displays in the text and for voice recordings on the accompanying CD, thanks are due to Dalila Ayoun, Haruko Cook, Peter Ecke, Samira Farwaneh, Hiromi Uchida Kelley, Alexis Manaster-Ramer, Yumi Matsumoto, Nobuko Ochner, Miki Ogasawara, Naomi Ogasawara, Katsuhiro "Justin" Ota, Mee-Jeong Park, Lolin Cervantes-Kelly, Kiwako Sato, Yasumasa Shigenaga, Karen Vance, Seiji Watanabe, Naoko Witzel, and Yiran Zheng. Nancy Arakawa handled the technical aspects of the recording process in the University of Hawai'i at Manoa's Language Learning Center. All the acoustic displays were produced using version 4.5.01 of the software package Praat, which is distributed free by its creators, Paul Boersma and David Weenink. I'd also like to thank Helen Barton, my editor at Cambridge University Press, for her patience with an author who didn't seem to understand the concept of a deadline. Sarah Green, Elizabeth Davey, and Anna Oxbury all helped to get the manuscript into production.

I used portions of early drafts of this book in classes at Columbia and at Arizona, and I'm indebted to several cohorts of bright, hardworking students at those two schools and at Hawai'i for their acumen and their tolerance. Itsumi Ishikawa-Peck, Mieko Kawai, Yuka Matsugu, Naomi Ogasawara, Yumiko Sato, and Ikuko Yuasa provided particular examples or observations that I've incorporated into the text, and I'm especially grateful to Seiji Watanabe for his keen insight into the phonology of his native dialect.

This book is dedicated to my parents, as a small token of gratitude for their unwavering support over the years. Sadly, my dad died just before I finished the manuscript. During the long process of writing and revising, I found myself wondering more than once whether I'd made a terrible mistake by taking this project on, but looking back on it now, my one and only regret is that my dad won't get to see the final product.

1 Phonetics

1.1 Speech sounds

PHONETICS is the study of SPEECH SOUNDS, and we can divide it into three sub-fields. The study of the movements of the SPEECH ORGANS in SPEECH PRODUCTION is called ARTICULATORY PHONETICS. The study of the physical properties of speech sounds as sounds is called ACOUSTIC PHONETICS. Finally, the study of listener responses to speech sounds in SPEECH PERCEPTION is called AUDITORY PHONETICS.

Most of what you learn about phonetics in this book will be about articulatory phonetics. The main reason is that you can get a lot of information about the movements of your speech organs just by paying careful attention to the sensations those movements cause. Acoustic phonetics and auditory phonetics require machines and mathematics right from the start, but you can understand fairly sophisticated articulatory descriptions without getting into equipment and experiments. On the other hand, this emphasis on articulation doesn't mean that we'll completely ignore the acoustic and auditory domains, and experimental results will figure in the discussion frequently. You'll find a very basic introduction to acoustic displays in §1.12 below.

Since articulatory phonetics involves describing the activities of the speech organs, even a basic introduction like this one has to refer to certain aspects of human anatomy and physiology. Figure 1-1 is a schematic view of the VOCAL TRACT, which you can think of as a kind of branching passageway from the lungs to the lips and nostrils. It's important to keep in mind that the speech organs all have other, more basic functions. For example, although the tongue plays a very important role in speech production, its role in eating is clearly primary. But even though the speech functions of the speech organs are secondary in this sense, these functions are probably more than just incidental. The reason for thinking so is that some features of speech-organ anatomy and physiology suggest adaptation for speech.1 For example, the basic functions of the LARYNX are (1) to seal off the lungs and keep the chest rigid during physical exertion, and (2) to prevent foreign material from getting into the lungs. But the human larynx doesn't seem to be optimally adapted either for protecting the lungs or for maximizing respiratory efficiency. At the same time, it does seem to be adapted for producing speech sounds.2

Figure 1-1 Vocal tract (labels: nasal cavity, oral cavity)

Not all sounds that people produce with their speech organs are speech sounds. For instance, coughing and throat clearing aren't used as speech sounds in any language. Of course, there are also sounds that are speech sounds in some languages but not in others. For instance, CLICKS are used as consonants in some languages of Africa, but they aren't used as speech sounds in English or Japanese.3 It's even possible for the same sound to be used sometimes as a speech sound and sometimes not. A famous illustration of this point involves comparing the sound of blowing out a candle with the sound represented by the wh at the beginning of an English word like where (assuming a variety of English in which where and wear are pronounced differently).4 The two sounds may not be exactly identical, but they're at least very nearly the same, and the point of the example is clear: in terms of function, they're completely different.

1 For the idea that the speech functions of the speech organs are only incidental, see Sapir 1921:8-9. On speech-organ adaptation, see Hockett 1960:127, Aitchison 1998:149-51, Hudson 2000:144-6. For a skeptical view, see Sampson 1997:24-5, 55-60.
2 On the human larynx in speech and survival, see Lieberman 1991:53-7, 1998:47-8, 85-6.
3 For details on clicks, see Ladefoged and Traill 1984, 1994. Textbook treatments include Laver 1994:173-9, Ladefoged and Maddieson 1996:246-80.
4 Sapir 1925:37-9.

1.2 Airstream mechanisms

To produce any sort of audible signal you have to produce sound waves by putting air into motion. An AIRSTREAM MECHANISM is a way of initiating such motion, and every speech sound is produced by one of four airstream mechanisms. The most important by far is the PULMONIC-EGRESSIVE airstream mechanism, in which the muscles of the chest cavity work like a bellows to expel air from the lungs.5 All languages use the pulmonic-egressive airstream for the majority of their speech sounds, and many languages, including both Japanese and English, use only this airstream. In other words, all the consonants and vowels of Japanese and English are produced by various modifications of the pulmonic-egressive airstream on its way through the vocal tract. Incidentally, it isn't difficult to speak while breathing in rather than breathing out, but no language in the world uses this PULMONIC-INGRESSIVE airstream mechanism for ordinary speech.6

5 Textbook treatments of airstream mechanisms include Catford 1988:19-32, Laver 1994:161-83.
6 Ladefoged and Zeitoun 1993.

1.3 Phonation

The VOCAL CORDS or VOCAL FOLDS are two shelf-like structures in the larynx that can be held apart or brought together, and the space between the vocal folds is called the GLOTTIS. When the vocal folds are held apart the glottis is open, and when the vocal folds are held together the glottis is closed. In Figure 1-2 the glottis is open for normal breathing.

Figure 1-2 Vocal folds and glottis from above (adapted from Roca and Johnson 1999:15)

Figure 1-3 Extreme positions during vocal-fold vibration cycle (adapted from Roca and Johnson 1999:16). Panels: minimum glottal opening, maximum glottal opening.

If the vocal folds are held close together and there's sufficient airflow, they'll vibrate, that is, open and close rapidly again and again. This vibration is known as PHONATION or VOICING.7 The two illustrations in Figure 1-3 show the most closed and most open positions that the vocal folds reach in the course of a vibration cycle. Speech sounds accompanied by voicing are VOICED, and those not accompanied by voicing are VOICELESS. For example, the English words zip and sip begin with consonant sounds that are identical except that the former (spelled z) is voiced, whereas the latter (spelled s) is voiceless. Not only can you hear voicing, you can feel it. Place the fingers of one hand on the side of your neck as if feeling for your pulse. If you pronounce a long zzzzz sound, you can feel the vibration of your vocal folds through your skin. If you pronounce a long sssss instead, you won't feel the vibration, since this sound is voiceless.

1.4 Nasality The rear portion of the roof of the mouth, called the SOFT PALATE or VELUM, can be moved, as shown in Figure 1-4. When the velum is open (lowered), as on the right, air flowing out of the lungs can escape through the nose. When the velum is closed (raised), as on the left, the ORAL CAVITY is sealed off from the NASAL CAVITY. Speech sounds produced with the velum open are NASAL, and those produced with the velum closed are ORAL. For example, the English words mat and bat begin with consonant sounds that are nearly identical, but the former (spelled with m) is nasal, whereas the latter (spelled with b) is oral.
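As a rough illustration of the two distinctions introduced so far, the following Python sketch records voicing (§1.3) and nasality (§1.4) for the four consonants in the examples above and reports which property separates two segments. The `Segment` type and the `differences` helper are hypothetical names chosen for this sketch, and the two-property description is a deliberate simplification, not a full phonetic classification.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A consonant described by just two properties (a simplification)."""
    symbol: str
    voiced: bool  # accompanied by vocal-fold vibration?
    nasal: bool   # produced with the velum open (lowered)?


# Values follow the examples in the text: zip/sip and mat/bat.
SEGMENTS = {
    "z": Segment("z", voiced=True, nasal=False),
    "s": Segment("s", voiced=False, nasal=False),
    "m": Segment("m", voiced=True, nasal=True),
    "b": Segment("b", voiced=True, nasal=False),
}


def differences(a: Segment, b: Segment) -> list[str]:
    """Return the names of the properties on which two segments differ."""
    return [name for name in ("voiced", "nasal")
            if getattr(a, name) != getattr(b, name)]


if __name__ == "__main__":
    print(differences(SEGMENTS["z"], SEGMENTS["s"]))
    print(differences(SEGMENTS["m"], SEGMENTS["b"]))
```

Under these assumptions, the initial consonants of zip and sip differ only in voicing, and those of mat and bat differ only in nasality.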

7 Textbook treatments of phonation include Catford 1977:93-116, Laver 1994:184-201.

Figure 1-4 Velum and nasality (adapted from Ashby 1995:31)

1.5 Transcription and segments

A PHONETIC TRANSCRIPTION of a stretch of speech consists of a sequence of PHONETIC SYMBOLS. These symbols have fixed, conventional values that are independent of any particular language. The INTERNATIONAL PHONETIC ASSOCIATION endorses a set of symbols called the INTERNATIONAL PHONETIC ALPHABET, and the acronym IPA can stand either for the association or for its alphabet. 8 Most linguists follow the IPA recommendations fairly closely, but many non-IPA symbols are in common use. 9 It's standard practice to enclose phonetic transcriptions in square brackets. For example, the English word bed can be transcribed [bɛd].

An individual consonant or vowel sound is called a PHONETIC SEGMENT, and each symbol in a typical phonetic transcription stands for a segment. This kind of representation suggests that each segment is a brief but static configuration of the speech organs and that speech production involves changing from one state to another with instantaneous transitions between states. In fact, though, actual speech production is nowhere near so neat. The speech organs very seldom maintain a static configuration for any measurable length of time. 10 Another problem is that when two segments are adjacent, the movements necessary to produce them overlap. For example, the vowel [u] in English soon is pronounced with rounded lips, and the vowel [i] in English seen is pronounced with spread lips. If you pronounce these two words and pay careful attention to your lips, you'll notice that the [s] in soon is pronounced with rounded lips, whereas the [s] in seen is pronounced with spread lips. In other words, the lip position during the [s] in each word anticipates the position that's necessary for the following vowel. This overlapping or smearing of adjacent segments is known as COARTICULATION, and coarticulation effects often extend beyond immediately adjacent segments. 11 One way to think about coarticulation is to regard the speech-organ configuration associated with a given segment as a TARGET CONFIGURATION that isn't necessarily attained in actual speech.

As a result of coarticulation, segments are squashed together into syllables, and it's syllables that seem to be the basic units of production and perception. 12 As we'll see in §6.1, it turns out to be very difficult to give a satisfactory definition of syllables, but they do seem to be intuitively natural units for ordinary speakers. On the other hand, the fact that most people can learn to use an alphabetic writing system without too much difficulty indicates that segment-sized units must have some sort of psychological reality as well. 13 The important point for now is just that segment-by-segment phonetic transcriptions involve a significant degree of abstraction. But such transcriptions are very convenient, and we'll rely on them heavily in this book. Much of the remainder of this chapter is concerned with the articulatory description of segments and with the IPA symbols used to transcribe them.

You should always keep in mind that no transcription, no matter how careful, is really complete. Minute differences are generally ignored, but the analyst's native language, experience with other languages, and phonetic training all influence which details are included and which are left out. 14 A relatively precise transcription that provides a lot of details is called a NARROW TRANSCRIPTION, and a relatively imprecise one is called a BROAD TRANSCRIPTION. 15

8 International Phonetic Association 1999 is the standard guide to the current version of the International Phonetic Alphabet.
9 Pullum and Ladusaw 1986.
10 Abercrombie 1967:38.

1.6 Length

Although segments don't in general correspond to static configurations of the speech organs (§1.5), some segments clearly last longer than others. For most purposes it's sufficient to distinguish two degrees of duration: short and long. In IPA transcription, a colon-like LENGTH MARK [ː] is written after the appropriate symbol to indicate that a particular segment is long. For example, the first vowel in kōto コート 'coat' is longer than the first vowel in koto 琴 'koto': [koːto] versus [koto]. We'll look at Japanese length distinctions very carefully in later chapters (§3.2, §§5.1-2, §§5.4-5).

11 Abercrombie 1967:37, Lieberman 1977:120-1.
12 Farnetani 1997, Hardcastle and Hewlett 1999.
13 Liberman (1996:1-43) argues that a specialized phonetic mode of perception has evolved in humans that allows effortless decoding of acoustic syllables into their component segments. For the idea that alphabetic letters represent intuitively real segment-sized units, see Saussure 1959:38-9, Fromkin and Rodman 1998:498. For objections, see Aronoff 1992, Daniels 1992, Faber 1992, Port 2007. See also Prince 1992, Fujimura 2000.
14 Bloomfield 1933:84-5, Pike 1943:109-10.
15 The terms "narrow" and "broad" go back to Henry Sweet, who used "broad" to mean "phonemic" (see Chapter 2 below). For discussion, see Matthews 2001:32-5, International Phonetic Association 1999:28, Rogers 2000:46-7.

1.7 Suprasegmentals Certain features of speech are ordinarily represented in phonetic transcriptions as superimposed on segments rather than being integral parts of segments. Such features are usually called SUPRASEGMENTAL, and they're represented this way because linguists regard them as properties of units larger than single segments. For example, in the English word mama the first syllable has greater prominence or STRESS than the second, and the degree of stress is usually considered a property of a syllable as a whole rather than a property of any of the individual segments that make up a syllable. The IPA transcription for a stress difference is a vertical stroke before the more prominent syllable: [ˈmɑmə]. In terms of production, a syllable with a higher degree of stress is produced with more energy. 16 Another feature that's ordinarily treated as suprasegmental is PITCH. All voiced sounds involve vocal-fold vibration (§1.3), and listeners perceive differences in the FREQUENCY of this vibration as differences in pitch. Within the limits of a speaker's voice range, the rate of vibration is under voluntary control and is determined mainly by the tension of the vocal folds. 17 When a particular pitch pattern is a property of a syllable, it's called a TONE; when it's a property of a unit such as a phrase, it's called INTONATION. Descriptions of Japanese usually say that it has PITCH ACCENT. Roughly speaking, the idea is that pitch-accent patterns are properties of words, and a word's inherent pitch pattern has to be integrated into the intonation pattern of any phrase that contains that word. We'll look at the details of the Japanese pitch-accent system in Chapter 7.

1.8 Vowels Vowels are produced by positioning the speech organs so that the vocal tract is free of significant obstructions above the glottis. Our focus here is on what's called VOWEL QUALITY. If two people who speak the same language pronounce the same vowel, the two versions will probably be different in several respects. One person's vowel might be longer than the other person's, or it might be louder, or it might be on a higher pitch. Also, every individual has a distinctive voice quality, which is why you can recognize a familiar person's

16 Catford 1977:84-5, 1988:32-5, Laver 1994:511-23.
17 Hardcastle 1976:83-7.

The voiceless bilabial fricative [ɸ] is frequent in Japanese (§4.2); for example, it's the initial consonant in fune 船 'boat'. The voiced bilabial fricative [β] also occurs occasionally in Japanese (§4.2).

As noted in Table 1-3, English has the APICO-ALVEOLAR stops [t] and [d], and a very narrow transcription would require the IPA diacritic [◌̺] to specify this place of articulation precisely: [t̺] and [d̺]. Many other languages, including French and Spanish, have APICO-DENTAL stops instead: [t̪] and [d̪]. As we'll see in §4.1, Japanese has LAMINO-ALVEOLAR stops: [t̻] and [d̻]. In other words, the portion of the tongue that touches the alveolar ridge in Japanese is the blade (LAMINA) rather than the tip (APEX). As in Table 1-3, we usually dispense with the diacritics when this degree of precision is unnecessary.

The English stops [k] and [g] are DORSO-VELAR, which means the closure is between the body (DORSUM) of the tongue and the velum. Many other languages have dorso-velar fricatives. For example, the final consonant (spelled ch) in the German word Buch 'book' is the voiceless dorso-velar fricative [x], and the voiced dorso-velar fricative [ɣ] occurs occasionally in Japanese (§4.2).

There aren't any DORSO-UVULAR segments in English, but many languages use this place of articulation. The UVULA is the small, cone-shaped mass of tissue that hangs down at the very back of the velum, so the closure for a dorso-uvular stop is between the back of the tongue body and the rearmost part of the velum. The voiceless dorso-uvular stop [q] occurs in Arabic, as in qalb [qalb] قلب 'heart'. Japanese doesn't have any dorso-uvular obstruents, but it does have a dorso-uvular nasal (§1.10, §5.1).

The English fricatives [θ] and [ð] are INTERDENTAL, which means that the tip of the tongue protrudes from between the lower and upper teeth. Air flowing through the narrow gap between the upper surface of the tongue and the upper teeth causes the turbulence in these fricatives. Strictly speaking, the IPA symbols [θ] and [ð] represent apico-dental fricatives, produced with the tip of the tongue slightly farther back than in the English segments. The IPA diacritic [◌̟] indicates a slightly advanced place of articulation, so in a very narrow transcription these two English fricatives would be [θ̟] and [ð̟]. 22 As we'll see in §4.2, the voiced apico-dental fricative [ð] occurs occasionally in Japanese.

The place of articulation in English [s] and [z] is lamino-alveolar, but the loud hissing of these fricatives is actually caused mostly by the teeth. The stream of air picks up speed as it passes through the lamino-alveolar constriction and then rushes through the narrow gap between the upper and lower teeth. Another characteristic of [s] and [z] is that the tongue forms a front-to-back channel behind the constriction. In other words, in a side-to-side cross-sectional view, the center of the tongue is lower than the sides, as shown on the left in Figure 1-10. The obstruction in English [ʃ] and [ʒ] is LAMINO-POSTALVEOLAR, that is, slightly farther back on the roof of the mouth than in [s] and [z]. Also, as shown on the right in Figure 1-10, the tongue doesn't form a channel behind the constriction in [ʃ] and [ʒ]. The loud hushing of [ʃ] and [ʒ], like the hissing of [s] and [z], is caused mostly by the teeth. These noisy hissing and hushing fricatives are traditionally called SIBILANTS.

Japanese has sibilant fricatives with a place of articulation that isn't lamino-alveolar and isn't lamino-postalveolar either (§4.2). The IPA symbols for these fricatives are [ɕ] for voiceless and [ʑ] for voiced, and an example of [ɕ] is the first segment in shima 島 'island'. The main difference between [ɕ] and [ʃ] is that the constriction for [ɕ] is longer from front to back, as Figure 1-11 shows. 23 The IPA label for the upper articulator in [ɕ] and [ʑ] is ALVEOLO-PALATAL, but I'll use the slightly less awkward variant ALVEOPALATAL in this book. This label is certainly appropriate for a long constriction that stretches from the alveolar ridge to the hard palate. The lower articulator in [ɕ] and [ʑ] is the blade of the tongue, so in standard lower-upper terminology they're LAMINO-ALVEOPALATAL fricatives.

Japanese also has the voiceless DORSO-PALATAL fricative [ç] (§4.2). For example, it's the first segment in hima 暇 'free time'. The same sound occurs in German, as in ich [ɪç] 'I'.

When the place of articulation for a fricative is inside the mouth, it's hard to feel exactly where it is because the maximum obstruction isn't a complete closure. But there's a simple trick that can make things easier. As an example, get ready to pronounce a normal [s] but then, instead of exhaling, inhale forcefully. You'll feel a distinct coolness on the roof of your mouth where the narrowest constriction for [s] is. 24 Then try the same thing with [ʃ], [ɕ], and [ç] for comparison. This exercise should help you decide how well the descriptions I've given match your pronunciation.

Table 1-3 lists English [h] as a GLOTTAL FRICATIVE. The glottis is open, but the vocal folds are close enough together that the air flowing through produces turbulence. A GLOTTAL STOP [ʔ] is produced by bringing the vocal folds together to interrupt the flow of air from the lungs completely. Glottal stops are actually very common in English, although they don't function as ordinary consonants. For example, the sound in the middle of the interjection oh-oh is a glottal stop. Glottal stops occur in Japanese, too, and we'll look at the problems they raise in §5.6 and §8.3.

Many languages have voiceless stops that are accompanied by ASPIRATION, that is, a short voiceless puff of air following the release of the obstruction. The English words speak, stun, and skull in Table 1-3 have the UNASPIRATED voiceless stops [p], [t], and [k] immediately following the initial [s], but compare the words peak, ton, and cull. In peak, for instance, aspiration intervenes between the release of the [p] and the beginning of voicing for the vowel [i]. In other words, the initial stop in peak is ASPIRATED. If you're a native speaker of English, you can demonstrate the presence of aspiration very vividly with a small piece of paper. Grip one end of the paper between your thumb and forefinger and hold it up so that the loose end is about a centimeter in front of your lips, as in Figure 1-12. Then pronounce the word peak. The loose end of the paper will twitch very noticeably. You won't see the same twitching if you pronounce speak instead. The initial consonants in ton and cull are also aspirated. The IPA symbol for aspiration is a small raised h after the appropriate consonant symbol, so peak begins with [pʰ], ton begins with [tʰ], and cull begins with [kʰ].

22 Ladefoged and Maddieson 1996:143-4, Ladefoged 2007:164.
23 Akamatsu 1997:91-3.
24 Ashby (1995:56) recommends the forceful inhalation trick for approximants (§1.10), but it works just as well for fricatives.

Figure 1-10 Side-to-side cross-sections of [s] and [ʃ] behind the alveolar ridge (adapted from Ladefoged and Maddieson 1996:147,149)

Figure 1-11 Typical tongue positions for [s], [ɕ], and [ʃ]

Figure 1-12 Demonstrating aspiration with a piece of paper

1.10 Sonorants

Consonants other than obstruents are commonly called SONORANTS. Sonorants are usually voiced, and they involve articulations that allow air to flow through the vocal tract without turbulence. One important class of sonorant segments is NASALS. A nasal is produced with the same kind of complete closure in the oral cavity as a stop, but the velum is open so that air can escape freely through the nose (§1.4). Three nasals that occur in English are bilabial [m] (as in sum), apico-alveolar [n] (as in sun), and dorso-velar [ŋ] (spelled ng, as in sung). In the same way as apico-alveolar stops (§1.9), a narrow transcription of an apico-alveolar nasal requires a diacritic to specify the place of articulation precisely: [n̺]. Many other languages, such as French and Spanish, have the apico-dental nasal [n̪], and as we'll see in §4.4, Japanese has the lamino-alveolar nasal [n̻]. Here again, we usually dispense with the diacritics when such precision is unnecessary. Japanese also has the lamino-alveopalatal nasal [ɲ], as in ni 二 [ɲi] 'two' (§4.4), and the long dorso-uvular nasal [ɴː], as in utterance-final hon 本 [hóɴː] 'book' (§5.1).

Another class of sonorant consonants is LATERALS. For example, the initial consonant in the English word lake has an apico-alveolar place of articulation like [t], [d], and [n], but only the center of the tongue is in contact with the upper articulator. This partial seal allows the airstream to flow out freely over the sides of the tongue. The IPA symbol for this kind of voiced lateral sonorant is [l]. Most varieties of Japanese don't have any laterals (§4.5).

A third class of sonorant consonants is RHOTICS (r-sounds). One common rhotic is the voiced apico-alveolar TAP [ɾ], which is produced by throwing the tip of the tongue against the alveolar ridge so that the duration of the contact is extremely short. This sound is the typical pronunciation of the r-sound in many languages, including Japanese (§4.5), but a nearly identical sound, usually called a FLAP, occurs in American English, and I'll transcribe it with the same IPA symbol. 25 Most Americans can pronounce patty and paddy identically as [pʰæɾi] (§2.10). Another common rhotic is the voiced apico-alveolar TRILL [r], which involves holding the tongue loosely with the tip near the alveolar ridge and setting the tongue into vibration so that the tip makes repeated contact with the alveolar ridge. Spanish has this trill in words spelled with an initial r, such as roca [roka] 'rock'.

The American English r-sound is sometimes described as a voiced retroflex approximant. An APPROXIMANT is pronounced with one articulator close to another, but the opening between them is too wide to produce turbulence if the sound is voiced. 26 When there's no voicing, the same articulation ordinarily does produce turbulence. 27 The reason is that in voiced sounds some of the energy of the airstream goes into producing the phonation (§1.3). As a result, other things being equal, the airstream flowing through a narrow opening in the vocal tract has more energy when there's no voicing. Notice that the lateral [l] is an approximant; if you pronounce it without voicing, it should sound like a weak fricative. A RETROFLEX segment involves curling the tip of the tongue upward. Most American speakers produce the English r-sound with one approximant constriction in the lower pharynx, another approximant constriction in the mouth, and lip rounding. For some of these speakers, the oral constriction really is retroflex (APICO-POSTALVEOLAR), but for many others it's dorso-palatal, with no raising of the tongue tip. 28 IPA [ɻ] represents a retroflex approximant, and IPA [ɹ] represents a dental, alveolar, or postalveolar approximant, but [ɹ] is usually pressed into service for the non-retroflex pronunciation. 29 I'll ignore all these complications in the rest of this book and just use [ɹ] in broad transcriptions of English.

The German r-sound is often pronounced as a voiced dorso-uvular approximant. 30 There's no IPA symbol for this sound, so I'll use [ʁ̞], which combines the symbol [ʁ] for a voiced dorso-uvular fricative with the diacritic [◌̞] to indicate a more open constriction. 31 An example is rot [ʁ̞oːt] 'red'. The articulatory movements involved in approximant r-sounds are clearly very different from those involved in the tap and trill sounds described above. In other words, in terms of articulation, the class of rhotics is quite heterogeneous. There don't seem to be any acoustic properties that all rhotics share either. 32 It's common to group rhotics and approximant laterals together as LIQUIDS.

It's not unusual for a nasal or a liquid to be SYLLABIC, that is, to serve as the NUCLEUS of a syllable. A syllabic consonant is transcribed by adding a short vertical stroke beneath the appropriate consonant symbol. For example, in ordinary pronunciation the last syllable of the English word button contains no vowel. Instead, it consists entirely of a syllabic nasal, and we could transcribe the word as [bʌtn̩].

The last class of sonorant consonants we'll consider is SEMIVOWELS, also known as GLIDES. A semivowel is produced very much like a vowel, 33 but it normally has a narrower constriction than the most similar vowel. Semivowels are typically too short to constitute separate syllables, but they can be long. The important thing is that a semivowel is adjacent to a more open vowel segment that constitutes a syllable nucleus. English yet begins with a semivowel produced like the vowel [i] and transcribed as [j]. It's common to refer to [j] as a front unrounded semivowel, since [i] is a high front unrounded vowel (§1.8). It's also common to refer to [j] as a palatal semivowel. We can describe its place of articulation as dorso-palatal, since this is where we find the narrowest constriction in the vocal tract during a high front vowel. English wet begins with a semivowel produced like the vowel [u] and transcribed as [w]. It's common to refer to [w] as a back rounded semivowel, since [u] is a high back rounded vowel (§1.8). It's also common to refer to [w] as a labial-velar semivowel, since there are narrow openings at two locations in the vocal tract. The tongue position for [u] produces a dorso-velar constriction, and the lip position produces a bilabial constriction. I'll say a little more about such double articulations in §1.11 below. The IPA symbol [ɰ] represents a back unrounded semivowel, that is, a semivowel produced like the high back unrounded vowel [ɯ] (§1.8). We'll discuss [ɰ] in Japanese in §4.6. Notice that [j], [ɰ], and [w] are all approximants, since we get turbulence in the absence of voicing. A voiceless [j] is the dorso-palatal fricative [ç], and a voiceless [ɰ] is the dorso-velar fricative [x]. A voiceless [w] is the fricative [ʍ] that occurs at the beginning of words like whet in varieties of English that distinguish whet [ʍɛt] from wet [wɛt].

25 On the distinction between taps and flaps, see Ladefoged 1968:30, Laver 1994:142, Ladefoged and Maddieson 1996:231-2.
26 Ladefoged 1968:25, Catford 1977:251, Laver 1994:270.
27 Catford 1977:118-22, Clark and Yallop 1990:85-6.
28 Laver 1994:300, Ladefoged and Maddieson 1996:234-5.
29 Ladefoged and Maddieson 1996:234, Ladefoged 1999:41.
30 Hall 1993.
31 This is the symbol that Rogers (2000:219) suggests.
32 Lindau 1985:166.
33 Ladefoged and Maddieson 1996:322-3, International Phonetic Association 1999:6.

1.11 Secondary and double articulations A SECONDARY ARTICULATION is an articulation that occurs simultaneously with another articulation (known as the PRIMARY ARTICULATION) but involves a lesser degree of constriction. It's sometimes convenient to describe secondary articulations as the addition of a vowel-like articulation to a consonant articulation. 34 The most familiar secondary articulation is probably PALATALIZATION, which involves superimposing the tongue shape for the vowel [i] on a consonant. As we noted in §1.10, this tongue shape produces a dorso-palatal constriction. To give just one example, coarticulation (§1.5) produces obvious palatalization on the voiced interdental fricative at the beginning of English these. If you compare the exact position of the tongue at the beginning of these with the position at the beginning of thus, you can easily observe how the position in these anticipates the following [i]. The IPA symbol for palatalization is a small raised j after the appropriate consonant symbol, so we could transcribe these very narrowly as [ðʲiz].

In VELARIZATION the tongue shape for a high back vowel (back of the tongue near the velum) is superimposed on a consonant articulation. In PHARYNGEALIZATION, the back of the tongue is retracted toward the rear wall of the PHARYNX (shown in Figure 1-9 in §1.9). 35 An English lateral at the end of a syllable is pharyngealized, as you can easily tell by noting the position of the dorsum at the end of the word sell. 36 For many English speakers, a lateral at the beginning of a syllable isn't pharyngealized, which means that there's no dorsal retraction at the beginning of the word less. The IPA diacritic [◌̴] can represent either velarization or pharyngealization, and the auditory impression of these two secondary articulations is similar. 37 You'll often see the labels "dark l" for pharyngealized [ɫ] (as in sell [sɛɫ]) and "clear l" for unpharyngealized [l] (as in less [lɛs]).

Another common secondary articulation is LABIALIZATION, which is simply lip rounding superimposed on a consonant articulation. The English lamino-postalveolar fricatives [ʃ] and [ʒ] are usually labialized, as you can see just by watching a native speaker pronounce words like dish and beige. The IPA symbol for labialization is a small raised w after the appropriate consonant symbol, so we could transcribe dish and beige very narrowly as [dɪʃʷ] and [beɪʒʷ].

A DOUBLE ARTICULATION refers to two simultaneous articulations both involving the same degree of constriction. 38 The best known examples are the LABIAL-VELAR consonants found in many West African languages. The IPA symbol for a voiceless labial-velar stop is [k͡p], with a ligature connecting the symbols for the two articulations. As I noted in the discussion of semivowels in §1.10, [w] is a labial-velar, but the two simultaneous articulations are both approximants rather than stops.

34 Ladefoged and Maddieson 1996:354-5.

1.12 Acoustic displays As promised above in §1.1, this section provides a rudimentary introduction to acoustic displays of speech. One type of display is a WAVEFORM, which shows AMPLITUDE on the vertical axis plotted against time on the horizontal axis. A listener perceives differences in amplitude as differences in loudness. 39

35 Catford 1988:109-10, Laver 1994:326-30.
36 Sproat and Fujimura 1993:309.
37 Ladefoged and Maddieson 1996:328-50.
38 Catford 1988:109.
39 Ladefoged 1996:80-9, Johnson 1997:50-4.

bilabial labiodental (apico-/lamino-)alveolar (dorso-)palatal (dorso-)velar

10 [kʰæt] is an IPA transcription of cat in United States newscaster English. What does the symbol [ʰ] represent?

11 What IPA symbol is used to transcribe the typical Tokyo pronunciation of the consonant sound in iro 色 'color'? How would a phonetician describe this sound?

12 [ɕiːdʲiː] is a fairly narrow IPA transcription of shīdī シーディー 'CD' in Tokyo Japanese. What does the symbol [ʲ] represent? Give an explanation for why this phonetic property is natural in this word.

13 What acoustic property does the vertical axis correspond to in a waveform display? How about in a spectrogram?

2 Phonemics

2.1 Phonology PHONOLOGY is the study of how speech sounds are organized and put to systematic use in actual languages. What exactly counts as phonology differs depending on the theory, but the notion of CONTRAST is central in any approach. Generally speaking, a phonological description involves an inventory of contrastive units and some way of specifying how those units can combine. In this chapter I'll outline the basics of a traditional approach known as PHONEMICS. The phonetic segments (§1.5) of a language are often called PHONES, and the phonemic approach involves interpreting phones as concrete manifestations of abstract elements called PHONEMES. As we'll see in detail below, we establish the inventory of phonemes by demonstrating contrasts between phonetic segments. The permissible combinations of the phonemes in a language are the PHONOTACTICS of that language. A list of phonemes unaccompanied by phonotactic specifications isn't very informative. The most common way of providing phonotactic information is to describe the syllable shapes that a language allows, and we'll look at a few examples in §2.8.

2.2 Contrast and minimal pairs The standard way of showing that two phonetic segments contrast in a language is to find a MINIMAL PAIR, that is, two words that contain the same segments in the same order except at one point. For example, the English words gas [gæs] and mass [mæs] are a minimal pair. Each word can be transcribed in IPA (§1.5) with three segments, and the only difference is that gas has [g] where mass has [m], so this pair demonstrates that [g] and [m] contrast in English. As a convenient way of recording the contrast we've demonstrated, we can write [g] ~ [m]. The words gas [gæs] and guess [gɛs] are another minimal pair, demonstrating that [æ] ~ [ɛ]. The minimal pair gas [gæs] and gash [gæʃ] demonstrates that [s] ~ [ʃ].

If two phonetic segments contrast with each other, they have to be manifestations of different phonemes. The idea is that the phonemes of a language function to differentiate meanings. In other words, if you substitute one phoneme for another, you ordinarily get something with a different meaning or something with no meaning at all. Minimal pairs illustrate the first of these two outcomes; if you substitute [ɛ] for the [æ] in gas [gæs], you get guess [gɛs], which means something different. As an illustration of the second outcome, consider what happens when you substitute [ɛ] for the [æ] in gash [gæʃ]. You get [gɛʃ], which doesn't mean anything in English.

When three or more words contain the same segments in the same order except at one point, they're called a MINIMAL SET. The English words bard [bɑɹd], card [kʰɑɹd], guard [gɑɹd], hard [hɑɹd], lard [lɑɹd], and yard [jɑɹd] are a minimal set. Any two words in a minimal set are a minimal pair.
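The minimal-pair test described in this section is mechanical enough to sketch in a few lines of Python. The sketch below assumes that each word has already been transcribed as a sequence of segments, one string per segment; the `LEXICON` contents are just the four example words from the text, and the function names are hypothetical choices made for illustration.

```python
from itertools import combinations

# Each word maps to a tuple of phonetic segments (one string per segment).
LEXICON = {
    "gas": ("g", "æ", "s"),
    "mass": ("m", "æ", "s"),
    "guess": ("g", "ɛ", "s"),
    "gash": ("g", "æ", "ʃ"),
}


def differing_positions(a, b):
    """Indexes at which two equally long transcriptions differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(a, b):
    """True if the words have the same segments in the same order
    except at exactly one point."""
    return len(a) == len(b) and len(differing_positions(a, b)) == 1


def contrasts(lexicon):
    """Map each minimal pair of words to the two segments it contrasts."""
    found = {}
    for (w1, s1), (w2, s2) in combinations(lexicon.items(), 2):
        if is_minimal_pair(s1, s2):
            i = differing_positions(s1, s2)[0]
            found[(w1, w2)] = (s1[i], s2[i])
    return found


if __name__ == "__main__":
    for pair, segments in contrasts(LEXICON).items():
        print(pair, segments)
```

With this lexicon the sketch reports the three contrasts discussed above. A pair such as mass and guess is rejected because the two words differ at two points rather than one.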

2.3 Allophones and phonemic symbols So far we don't have any reason to say that phonemes are any more abstract than phonetic segments. All we've done is demonstrate that certain phonetic segments, such as [s] and [ʃ], contrast in English. But now let's consider a different kind of example: one in which different phonetic segments don't contrast. Compare the English words peak [pʰik] and speak [spik]. As we saw in §1.9, peak contains a voiceless aspirated bilabial stop [pʰ], while speak contains a voiceless unaspirated bilabial stop [p]. If you substitute [pʰ] for the normal [p] in speak, you get an odd-sounding pronunciation of speak; you don't get some other English word. The same sort of thing happens if you substitute [p] for the normal [pʰ] in peak, and no matter how hard you try, you won't find any minimal pairs that demonstrate a contrast between [pʰ] and [p] in English. In fact, most native speakers of English are quite surprised to discover that the phonetic segment corresponding to the p in peak is actually different from the phonetic segment corresponding to the p in speak.

A phonemic analysis of English reflects this native-speaker intuition by saying that [pʰ] and [p] are different realizations of a single abstract element, namely, the phoneme /p/. The standard practice is to enclose phonemes in slanted lines. The different realizations of a phoneme are called ALLOPHONES of that phoneme, so [pʰ] and [p] are allophones of /p/. We could actually choose any symbol we like to represent the phoneme. After all, we're dealing with an abstract entity, not with a concrete phonetic segment, so we could just as well represent it as /*/ or as /~/. In most cases, though, the IPA symbol for one of the allophones is a practical choice for the phonemic symbol. An IPA symbol has the obvious virtue of being mnemonic; it isn't hard to remember that /p/ has the allophones [pʰ] and [p] when we represent the phoneme this way. The reason for preferring /p/ to /pʰ/ is just graphic simplicity.

In many cases, though, there are good reasons to choose a non-IPA symbol for a phoneme. Ease of typing is often a relevant consideration, although it isn't as important today as it was before the proliferation of microcomputers. For example, if a language has a phoneme that's realized as a voiced bilabial fricative [β], it's easier just to type it as /B/ than to use a character from some special font. Another factor that often comes into play is orthography. English has a phoneme that's realized as the high front semivowel [j], but the letter that usually corresponds to it in ordinary spelling is y, and most linguists represent it as /y/. A third consideration is tradition; once a non-IPA symbol becomes established, there's usually no compelling reason to reject it. In later chapters of this book we'll adopt non-IPA symbols for several Japanese phonemes.

Figure 2-1 Analogy between allophones and roman letters (the abstract entity /p/ has the concrete realizations [pʰ] and [p]; the abstract first letter of the alphabet has the concrete realizations A and a)

You can look at the allophones of a phoneme as analogous to the capital and small letters of the Roman alphabet. In any given font, the first letter of the alphabet appears in either of two quite different shapes, and both shapes are concrete realizations of this abstract first letter. Figure 2-1 diagrams the analogy. If you substitute capital A for small a in peak, you get an odd-looking version of the same word (peAk); you don't get some other English word.

2.4 Allophones in complementary distribution

A vital part of a phonological description of a language is specifying the circumstances under which each allophone of a phoneme appears. In many cases, the appearance of a particular allophone is predictable from the nearby sounds. As a simple example, compare the phonetic segments corresponding to the n in tense [tʰɛns] and the n in tenth [tʰɛn̟θ]. The oral closure for the [n] in [tʰɛns] is apico-alveolar, but the oral closure for the [n̟] in [tʰɛn̟θ] is distinctly farther forward. In fact, it's typically interdental (§1.9), which is why the transcription uses the IPA diacritic [ ̟] for an advanced place of articulation.

There's no contrast between apico-alveolar [n] and interdental [n̟] in English, and it seems safe to say that native speakers feel them to be the same sound. In other words, [n] and [n̟] are allophones of the same phoneme, which we can represent as /n/. The point of this example is that the [n̟] allophone only appears immediately before an interdental fricative ([θ] or [ð]), while the [n] allophone never appears in that position. In other words, the appearance of the [n̟] allophone is predictable, and the obvious explanation for its interdental place of articulation is coarticulation (§1.5).

Two or more phonetic segments that never appear in identical surroundings are said to be in COMPLEMENTARY DISTRIBUTION, and allophones of the same phoneme are often in this relationship.1 As explained in the preceding paragraph, apico-alveolar [n] and interdental [n̟] are in complementary distribution in English. It's important to understand, though, that complementary distribution alone isn't enough to show that different phonetic segments should be analyzed as allophones of a single phoneme. A famous example will make the problem clear. It's well known that [h] and [ŋ] are in complementary distribution in English: [h] can only appear at the beginning of a syllable, and [ŋ] can only appear after the vowel in a syllable.2 The word hang [hæŋ] is a perfectly normal syllable, but [ŋæh], with [ŋ] at the beginning and [h] at the end, is doubly bizarre. Even so, no one would seriously propose that [h] and [ŋ] are allophones of the same phoneme, presumably because a voiceless glottal fricative and a dorso-velar nasal have so little in common.

You can think of allophones in complementary distribution as like Superman and Clark Kent. You never see Superman wearing a business suit and glasses, and you never see Clark Kent wearing a red-and-blue outfit with a cape. Even so, you wouldn't suspect the two of being the same person if they didn't resemble each other physically. The idea that allophones of the same phoneme should resemble each other is known as the criterion of PHONETIC SIMILARITY. It seems to be exactly what we want in the case of English [h] and [ŋ], but things aren't always so clear. It isn't obvious how similar is similar enough, or even how to measure similarity in the first place.3
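The distributional half of this test is easy to sketch in code. The Python below is only an illustration: it treats an environment as the pair of neighboring segments (with `#` marking a word edge), which is a simplifying assumption, and the word list and function names are invented for the example. The word met isn't in the discussion above; I've added it so that [n] and [m] share an environment.

```python
# Hypothetical data: each word is a tuple of phonetic segments.
WORDS = [
    ("tʰ", "ɛ", "n", "s"),   # tense
    ("tʰ", "ɛ", "n̟", "θ"),   # tenth
    ("n", "ɛ", "t"),         # net
    ("m", "ɛ", "t"),         # met (added for contrast; an assumption)
]


def environments(words, segment):
    """Collect (preceding, following) neighbors of a segment; '#' is a word edge."""
    found = set()
    for word in words:
        padded = ("#",) + tuple(word) + ("#",)
        for i in range(1, len(padded) - 1):
            if padded[i] == segment:
                found.add((padded[i - 1], padded[i + 1]))
    return found


def in_complementary_distribution(words, a, b):
    """True if the two segments never occur in an identical environment."""
    return environments(words, a).isdisjoint(environments(words, b))


print(in_complementary_distribution(WORDS, "n", "n̟"))
print(in_complementary_distribution(WORDS, "n", "m"))
```

A check like this says nothing about phonetic similarity, so it would happily pair [h] with [ŋ]. It also reports complementary distribution for any segment that's missing from the data, which means a tiny word list proves very little.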

1 Lass 1984:18-21.
2 Hawkins 1984:26-7, Lass 1984:19, Kubozono and Honma 2002:27-8.
3 Hawkins 1984:27-30.

2.5 Allophones in free variation

Another common relationship among allophones is so-called FREE VARIATION. Different phonetic segments are in free variation if they occur in exactly the same surroundings but never contrast. For example, an English voiceless stop before a pause can be pronounced either with or without an audible release of the closure. Using the IPA diacritic [ ̚] to indicate lack of release, the phonetic segment corresponding to the p at the end of the sentence Wake up! can be pronounced either as released [p], with the lips separating audibly, or as unreleased [p̚]. Analyzing both [p] and [p̚] as allophones of /p/ reflects the intuition of native speakers of English that these two phonetic segments are the same sound.

The term free variation suggests that the distribution of the relevant allophones is completely random, but truly random distribution probably isn't very common.4 We often find systematic relationships between particular allophones and linguistic, situational, or social factors, and these relationships are typically matters of probability. For example, a certain allophone might be statistically more likely to occur at the beginning of a phrase (a linguistic factor), or in a formal interview (a situational factor), or in the speech of teenagers (a social factor). I'll say a little more about this kind of variation in §2.12.

Occasionally, two different phonemic forms represent the same meaning. The word ration is a well-known example in English. Both [ɹæʃə̃n] and [ɹeɪʃə̃n] are possible pronunciations, but [æ] and [eɪ] can't be allophones of the same phoneme. It's easy to find minimal pairs like rat [ɹæt] versus rate [ɹeɪt], so [æ] ≠ [eɪ]. It's common to say that the alternative pronunciations in a case like this are in free variation, but it's PHONEMIC FREE VARIATION, not allophonic. In other words, a single word can be represented by either of two different phonemic forms; ration can be either /ræʃən/ or /reʃən/. A Japanese example is sabishii ~ samishii 寂しい 'lonely', with [b] in the first pronunciation and [m] in the second. We won't get to Japanese consonants until Chapter 4, but minimal pairs like yubi 'finger' versus yumi 弓 'bow' show that [b] ≠ [m] in Japanese.

Phonemic free variation is sporadic - an idiosyncratic property of particular words. On the other hand, like allophonic free variation, phonemic free variation typically isn't just random. For one thing, each individual speaker usually prefers one pronunciation to the other.

The opposite of phonemic free variation - two different words represented by a single phonemic form - is quite common. English see /si/ and sea /si/ and Japanese tako 蛸 'octopus' and tako 凧 'kite' are familiar examples, and such words are called HOMONYMS.
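Phonemic free variation and homonymy are mirror images: one word with several phonemic forms versus one phonemic form shared by several words. The Python sketch below shows that symmetry using the examples just mentioned. The data layout and the function names are my own assumptions, and the plain string `tako` is a simplified stand-in for a real phonemic form.

```python
from collections import defaultdict

# Hypothetical (word, phonemic form) pairs based on the examples in the text.
ENTRIES = [
    ("ration", "ræʃən"),
    ("ration", "reʃən"),
    ("see", "si"),
    ("sea", "si"),
    ("tako 'octopus'", "tako"),
    ("tako 'kite'", "tako"),
]


def group(pairs):
    """Map each key to the set of values it appears with."""
    table = defaultdict(set)
    for key, value in pairs:
        table[key].add(value)
    return table


def free_variants(entries):
    """Words that have more than one phonemic form."""
    forms = group(entries)
    return {word: sorted(f) for word, f in forms.items() if len(f) > 1}


def homonyms(entries):
    """Phonemic forms that are shared by more than one word."""
    words = group((form, word) for word, form in entries)
    return {form: sorted(w) for form, w in words.items() if len(w) > 1}


print(free_variants(ENTRIES))
print(homonyms(ENTRIES))
```

Both functions use the same grouping step and differ only in which column serves as the key.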

4 Labov 1972:188-9, Kager 1999:404.

2.6 Distinctive features

It's often convenient to refer to a group of segments by specifying the phonetic characteristics they have in common. For example, we talk about high vowels, voiced obstruents, and so on. A group containing all the segments in a language that share some set of phonetic characteristics is called a NATURAL CLASS; the characteristics themselves are usually called phonetic FEATURES.5 Features can be either articulatory or auditory characteristics.6

The minimal set gnash [næʃ], mash [mæʃ], bash [bæʃ], and dash [dæʃ] shows that [n] ≠ [m] ≠ [b] ≠ [d] in English. All four of these contrasting segments are voiced and involve a complete closure in the oral cavity. As Figure 2-2 shows, we can group them into pairs on either of two dimensions. The two rows differ in place of articulation (bilabial versus apico-alveolar), and the two columns differ in nasality (nasal versus oral).

Figure 2-2 Place and nasality contrasts in some English consonants
                  Nasal   Oral
Bilabial          [m]     [b]
Apico-alveolar    [n]     [d]

It makes sense to think of nasal segments as having a feature that oral segments lack, and the conventional way of referring to such differences is to put a plus or a minus in front of a feature label and enclose the combination in square brackets. Applying this convention to the segments in Figure 2-2, [m] and [n] are [+nasal], and [b] and [d] are [-nasal]. Since the presence or absence of nasality in a segment can serve to keep English words apart, nasality is a DISTINCTIVE FEATURE in the English phonological system.

It seems reasonable to look at voicing as another feature that some segments have and other segments lack. Minimal pairs like vast [væst] versus fast [fæst] show that voicing, like nasality, is a distinctive feature in English. Both [v] and [f] are labiodental fricatives; the only difference is that [v] is [+voice] and [f] is [-voice].

In many cases, though, there's no straightforward way to describe a phonetic feature in terms of presence versus absence. Consider vowel height, for instance. As we saw in §1.8, we can locate vowels along a continuous scale from high to low. It's often convenient to talk about all the high vowels in a language as a group, and the notation [+high] serves this purpose, but it doesn't really make much sense to say that highness is a characteristic that a vowel either has or lacks. Even so, there's a long tradition of treating highness and lowness in precisely this way and distinguishing three different heights as in Figure 2-3. Notice that there's no label in the bottom right cell of Figure 2-3. The feature specifications [+high] and [+low] are contradictory because highness and lowness aren't independent dimensions.

5 Clark and Yallop 1990:315-8, Halle 1992:207, Laver 1994:110, Jackendoff 2002:335.
6 Jakobson, Fant, and Halle 1952, Chomsky and Halle 1968:293-329, Clark and Yallop 1990:315-8.

Figure 2-3 Using highness and lowness to characterize vowel height
          [-high]   [+high]
[-low]    Mid       High
[+low]    Low

Figure 2-4 Traditional phonetic labels as informal feature specifications
[m]    [voiced bilabial nasal]
[f]    [voiceless labiodental fricative]
[kʰ]   [voiceless aspirated velar stop]
[i]    [high front unrounded vowel]

Consonant place of articulation (§1.9) involves a much more complicated range of possibilities than vowel height, and it isn't especially helpful to try to position all these possibilities along some one-dimensional scale such as distance from the glottis. On the other hand, we do want to be able to refer to groups of consonants that share a particular place of articulation. It's not unusual to see feature specifications like [+bilabial] or [+alveolar], but these are just standard place-of-articulation labels pressed into service as feature names. Of course, it makes more sense to think of bilabialness as present or absent than it does to think of highness in those terms. This brief discussion of features is just the tip of a huge iceberg. A tremendous amount of research has gone into working out feature systems that factor place of articulation and other phonetic characteristics into specifications on several independent dimensions.7 We won't go into any of the details here because traditional phonetic labels are sufficient for our purposes in this book. Whenever it's convenient to use such labels as informal feature specifications, I'll simply enclose them in square brackets, without pluses or minuses, as in Figure 2-4.
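Treating traditional labels as informal feature specifications makes it easy to pick out a natural class mechanically: it's just the set of segments whose labels include all the ones you ask for. The Python sketch below is an illustration only. The tiny inventory and the name `natural_class` are assumptions, and I've used the label stop for [b] and [d], which Figure 2-2 describes only as oral.

```python
# Hypothetical inventory: segment -> set of informal feature labels.
SEGMENTS = {
    "m": {"voiced", "bilabial", "nasal"},
    "n": {"voiced", "apico-alveolar", "nasal"},
    "b": {"voiced", "bilabial", "stop"},
    "d": {"voiced", "apico-alveolar", "stop"},
    "f": {"voiceless", "labiodental", "fricative"},
    "v": {"voiced", "labiodental", "fricative"},
}


def natural_class(inventory, *labels):
    """All segments in the inventory that carry every one of the given labels."""
    wanted = set(labels)
    return sorted(seg for seg, feats in inventory.items() if wanted <= feats)


print(natural_class(SEGMENTS, "nasal"))
print(natural_class(SEGMENTS, "voiced", "bilabial"))
```

Asking for more labels can only shrink the class, which matches the intuition that a longer description picks out a smaller group of segments.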

7 Clements 1985, McCarthy 1988.

2.7 Redundant features and allophonic rules

Notice that not all the features listed in Figure 2-4 are distinctive in the English phonological system. We saw in §2.3 that aspirated [pʰ] and unaspirated [p] are both allophones of a single English phoneme, and the same is true of [tʰ] and [t] and of [kʰ] and [k]. In general, the presence or absence of aspiration never serves to distinguish one English word from another. A nondistinctive feature like aspiration is sometimes called a REDUNDANT FEATURE, but it's important to understand that redundancy in this sense doesn't mean irrelevance. The voiceless stops in pool [pʰuɫ], tool [tʰuɫ], and cool [kʰuɫ] are aspirated, and those in spool [spuɫ], stool [stuɫ], and school [skuɫ] are unaspirated. A person who uses an unaspirated [p] in pool or an aspirated [tʰ] in stool has a nonnative accent. In other words, the distribution of the aspirated and unaspirated allophones of /p/, /t/, and /k/ is part of the phonology of English.8

Some of the other features listed in Figure 2-4 are distinctive in some circumstances and redundant in other circumstances in English. For instance, voicing is distinctive in obstruents but not in nasals or vowels, and nasality is distinctive in consonants but not in vowels. Let's look at nasality in vowels in a little more detail. Even though English doesn't have any phonemic contrasts between oral and nasal vowels, nasalized vowels (§1.8) are frequent. In general, an English vowel is nasalized if it's followed immediately by a nasal consonant in the same syllable, an obvious instance of coarticulation (§1.5).9 The examples in Table 2-1 illustrate. The allophones [ʌ] and [ʌ̃] are in complementary distribution (§2.4), and the nasality in [ʌ̃] is a redundant feature that's predictable from the ENVIRONMENT, that is, the surroundings.

Table 2-1 Examples of nasalized [ʌ̃] in English
            [ʌ̃]                          [ʌ]
SPELLING    bum      bun      bunt       bus      bud
PHONEMIC    /bʌm/    /bʌn/    /bʌnt/     /bʌs/    /bʌd/
PHONETIC    [bʌ̃m]    [bʌ̃n]    [bʌ̃nt]     [bʌs]    [bʌd]

One way of dealing with predictable features is to say that they're absent from phonemic forms and filled in automatically by rules. Figure 2-5 shows one way of writing a rule for English vowel nasalization. The σ stands for a syllable, the V stands for a vowel, and the C stands for a consonant. The solid lines connecting the σ to the V and the C show that the vowel and the immediately following consonant belong to the same syllable. The solid line connecting the C to the feature [nasal] shows that the C is a nasal consonant. The dashed line connecting the feature [nasal] to the V means that the feature "spreads" to the V or gets "absorbed" by the V. In other words, a vowel in this environment becomes nasalized.

Figure 2-5 English vowel nasalization rule (σ linked to V and C; C linked to [nasal]; dashed line from [nasal] to V)

8 It isn't easy to specify exactly when an English voiceless stop will be aspirated and when it won't, and I won't make an attempt here. See Hammond 1999:221-40.
9 Sloat, Taylor, and Hoard 1978:112-3, Fromkin and Rodman 1998:259-60.

Notice that this automatic vowel nasalization raises a problem for the standard method of citing minimal pairs to demonstrate phonemic contrasts (§2.2). Strictly speaking, the words bad [bæd] and ban [bæ̃n] aren't a minimal pair because they differ in two segments, not just one: [æd] versus [æ̃n]. Of course, we can show that [d] ≠ [n] by using words that begin with these segments, such as debt [dɛt] versus net [nɛt]. But the dorso-velar nasal [ŋ] only occurs after the vowel of a syllable in English (§2.4), which leaves us with no way to demonstrate a contrast between [ŋ] and any non-nasal segment. The minimal set rum [ɹʌ̃m], run [ɹʌ̃n], and rung [ɹʌ̃ŋ] shows that [m] ≠ [n] ≠ [ŋ], but rub [ɹʌb], rough [ɹʌf], rug [ɹʌg], rush [ɹʌʃ], and rut [ɹʌt] all differ from [ɹʌ̃ŋ] in two segments.

Both [ŋ] and [g] involve dorso-velar closure, and both are voiced, so the crucial difference is that [ŋ] is nasal and [g] is oral. Suppose someone were to claim that what's distinctive in rung [ɹʌ̃ŋ] versus rug [ɹʌg] is the presence or absence of nasality in the vowel, and that what's redundant is the nasality in [ŋ]. This claim amounts to saying that [ŋ] and [g] are allophones of a single phoneme, and that [ʌ̃] and [ʌ] are realizations of two different phonemes. This analysis may be logically coherent, but it certainly doesn't match the intuitions of native speakers. English speakers clearly feel that [ŋ] and [g] are different consonants and that [ʌ̃] and [ʌ] are both realizations of the same vowel. By treating vowel nasalization as redundant, as we did above, we can incorporate these intuitions into our phonemic transcriptions and represent rung as /rʌŋ/ and rug as /rʌg/. It's not at all unusual to see pairs like /rʌŋ/ and /rʌg/ described as minimal pairs, but extending the notion of a minimal pair to phonemic forms presupposes decisions about which features are distinctive and which are redundant.

Some predictable features don't depend on the environment. As I mentioned in §1.11, the English lamino-postalveolar fricatives /ʃ/ and /ʒ/ are usually labialized: [ʃʷ] and [ʒʷ]. This rounding appears regardless of the surrounding segments, and it seems reasonable to treat it as a redundant feature that's added automatically to these fricatives. Figure 2-6 shows a simple labialization rule that does the job. The arrow indicates that whatever appears on its left "becomes" whatever appears on its right. Since this labialization rule doesn't

Figure 2-6

2.11 Overlapping and neutralization

... [o̞] represents a vowel quality higher than [ɔ] but lower and less diphthongal than [oʊ]. The transcription [i̞] represents a vowel quality between [i] and [ɪ], and the transcription [u̞] represents a vowel quality between [u] and [ʊ]. If we insist on assigning the four vowels in question to phonemes that we've already established, the phonetic similarity criterion (§2.4) narrows down the possibilities to two reasonable choices for each: /i/ or /ɪ/ for [i̞] (as in beer), /e/ or /ɛ/ for [e̞] (as in bear), /o/ or /ɔ/ for [o̞] (as in bore), and /u/ or /ʊ/ for [u̞] (as in boor). Native speakers of this variety of English don't all have the same intuitions about these four problematic vowels, but at least some speakers find it very difficult to decide which of the two possibilities is the right choice in each case.

This kind of uncertainty is consistent with an analysis that treats the vowels that occur before [k] and the vowels that occur before [ɹ] as distinct subsystems of English phonology. Table 2-3 summarizes this analysis, using /r/ for the phoneme realized as [ɹ] and capital letters for the four vowel phonemes other than /ɑ/ that occur before /r/. As the phonemic symbols in the subsystem on the right imply, this analysis treats /I/ as distinct from both /i/ and /ɪ/, /E/ as distinct from both /e/ and /ɛ/, /O/ as distinct from both /o/ and /ɔ/, and /U/ as distinct from both /u/ and /ʊ/.30

Table 2-3 Vowel subsystems before /k/ and /r/ in English syllables
Before /k/                          Before /r/
[ik]    /ik/    [ɪk]   /ɪk/         [i̞ɹ]   /Ir/
[eɪk]   /ek/    [ɛk]   /ɛk/         [e̞ɹ]   /Er/
[oʊk]   /ok/    [ɔk]   /ɔk/         [o̞ɹ]   /Or/
[uk]    /uk/    [ʊk]   /ʊk/         [u̞ɹ]   /Ur/
[æk]    /æk/    [ɑk]   /ɑk/         [ɑɹ]   /ɑr/
[ʌk]    /ʌk/

As we'll see later in this book, Japanese consonants provide an even more dramatic example of neutralization. There are nineteen contrasting consonants syllable-initially (Chapter 4) but only two contrasting consonants syllable-finally (Chapter 5). Since native speakers of Japanese don't feel that either of the syllable-final consonants is the same as any of the syllable-initial consonants, there's no real doubt about treating the two sets of consonants as separate phonemic subsystems.

Although some native speakers of English find it intuitively reasonable to treat the four high and mid vowels that occur before /r/ as a separate phonemic subsystem, as in Table 2-3, other speakers have intuitions that identify each of these four vowels with one of the vowel phonemes that occurs before /k/. For example, many speakers identify the [e̞] in [e̞ɹ] as a realization of /e/, and for these speakers, /ɛ/ just doesn't occur before a tautosyllabic /r/. The systematic absence of a phoneme in a particular environment or set of environments is called a DEFECTIVE DISTRIBUTION, and a defective distribution is just a kind of phonotactic restriction. There's nothing unusual about defective distributions; in fact, they're very common.

Notice that English /ɛ/ has a defective distribution regardless of whether it occurs before /r/, since a syllable can't end with /ɛ/. As Table 2-3 shows, both [ɛ] and [eɪ] are possible before a syllable-final [k], so wreck [ɹɛk] and rake [ɹeɪk] are phonemically /rɛk/ and /rek/. Syllable-finally, though, only [eɪ] occurs, as in ray [ɹeɪ]; a syllable pronounced [ɹɛ] isn't possible. Not surprisingly, despite the lack of contrast between [eɪ] and [ɛ], native speakers have the unambiguous intuition that syllable-final [eɪ] is /e/, so ray [ɹeɪ] is phonemically /re/, and /rɛ/ is a systematic gap in the English vocabulary.

As another example, recall the claim earlier in this section that /t/ is the intuitively correct phonemic analysis for a voiceless unaspirated [t] immediately following /s/ at the beginning of a syllable in English (as in words like stuff [stʌf]). Notice that this analysis leaves /d/ with a defective distribution: /d/ never occurs immediately following /s/ at the beginning of a syllable. In the phonemic analysis of Japanese that I'll work out in Chapters 3-5, several phonemes have defective distributions.

30 Twaddell 1935.

2.12 Careful pronunciation In the discussion of neutralization in §2.11 just above, I mentioned CA R EF u L PRONUN C IATION several times. Careful pronunciation has a special status in phonological analysis because it seems to provide the foundation for nativespeaker intuitions. 31 Careful pronunciation is mostly a matter of tempo; pronunciation inevitably gets sloppier as it gets faster, although sloppy p ronunciation is certainly possible even at a slow tempo. For the sake of convenience, I'll refer to the fast and/or sloppy pronunciation that's typical of most everyday conversation as RAPID PRONUN C IATION, keeping in mind that more than just speed is involved. Rapid pronunciation often produces segments or sequences that native speakers would judge as unpronounceable or alien or phonotactically deviant in careful pronunciation. 32 For one thing, the influence of coarticulation increases in rapid pronunciation and leads to more ASSlM 1LAT IO N, that is, to phonemes becoming more like neighboring sounds. An example is English /n/ before a labiodental fricative in words like infant /mfant/ or convex /kanveks/. In careful pronunciation, 31

Jakobson and Halle 1962:466-7, Linell 1979:54-6, Lass 1984:294-5.

32

Linell 1979:1 20.

2.12 Careful pronunciation

the phoneme In/ in these words is realized as the apico-alveolar nasal [n] (§1.10), but in rapid pronunciation it's typically realized as the labiodental nasal [llJ]. If we had only the labiodental pronunciation to go on, it wouldn't be easy to decide whether [llJ] is an allophone of Im / or of In/, and we might argue that phonetic similarity (§2.4) favors /m/. But native speakers ofEnglish don't hesitate to say that these words contain In/, presumably because in careful pronunciation they have [n]. The typical native-speaker response to the segment [rl)] in isolation is that it's alien to the English sound system. We'll see in §4.1 that the Japanese phonemes /bl, /di, and /g/ are realized as the voiced stops [b], [d], and [g] in careful pronunciation but often as the voiced fricatives [~], [5], and [y] (§1.9) in rapid pronunciation. In isolation, these fricatives sound alien to native speakers of Japanese in the same way as [rl)] sounds alien to native speakers of English. Rapid pronunciation also favors certain kinds of ELI s 1o N, that is, deletion or disappearance of segments. For example, consonant clusters are often simplified, as in the pronunciation of fifths /fifes / as [frfs]. In careful pronunciation we have [frf8s], so we don't need to say that this word has two alternate phonemic forms as we did in cases of phonemic free variation (§2.5) . As another example, consider English syllables that end in a nasal consonant followed by an obstruent, as in words like bump /bAmpl and sent /sent/. We've already seen that a vowel is nasalized before a tautosyllabic nasal consonant even in careful pronunciation (§2.7), but in rapid pronunciation the closure for the nasal consonant can disappear entirely. When it does, vowel nasalization ends up carrying the contrast between words, as in sent realized as [s£t] contrasting with set realized as [set]. 
33 This contrastiveness doesn't mean that a phonological analysis of English has to recognize a phonemic distinction between nasal and oral vowels. In careful pronunciation such words have ordinary nasal consonants, and native speakers of English often find it hard to pronounce nasal vowels in isolation. Also typical of rapid pronunciation are certain kinds of EPENTH ESI s, that is, insertion of segments. For example, something /sAID8rr]/ is often pronounced with a [p] between the [m] that realizes Im / and the [8] that realizes /8/: [si\rnp8!IJ]. Here again, we don't need two alternate phonemic forms, one with /p/ and one without, since a careful pronunciation of this word has [m8]. While it seems reasonable to treat all these characteristics of rapid pronunciation as systematic deviations from a phonemic system based on careful pronunciation, the notion of careful pronunciation is trickier than it seems at first. One problem is that there's such a thingas ELABORATE D PRO NUNC IATIO N' 33 Shibatani

1990:172, Bybee 2001 :44 - 8.

48

Phonemics

that is, overly or unnaturally careful pronunciation of the sort that people sometimes use when they want to articulate the sounds of a word very distinctly. Elaborated pronunciation involves what are sometimes called SHARPENING PROCESSES, and surprisingly, such processes can actually neutralize phonemic distinctions.34 An example from English is the overly precise pronunciation of train /tren/ as [th-i§n], with a syllabic [-j]. This pronunciation obliterates the distinction between the normally one-syllable word train and the normally two-syllable word terrain. It seems clear that we should treat the characteristics of elaborated pronunciation in the same way as the characteristics of rapid pronunciation - as deviations from the phonemic system. But it's not at all obvious how to distinguish elaborated pronunciation and careful pronunciation from each other in a consistent way. The overlapping of It / and /d/ is a case in point. I said above in §2.11 that careful pronunciation of words like coated and coded have [t] and [d] rather than [r]: [k11 QJ;It;:id] versus [k11 QJ;ld;:id]. But the pronunciations with [t] and [d] strike many native English speakers as unnaturally precise, that is, elaborated rather than careful. It's also important to distinguish the sloppiness of rapid pronunciation from other aspects of casual conversation such as CON TRACTIONS. For example, the English contraction can't presumably originated as a rapid pronunciation of cannot, but for a present-day speaker of English, the difference between these two forms isn't a matter of tempo or precision. The contracted form is perfectly normal even in careful pronunciation, but we might describe can't and other contractions as characteristic of CASUAL SPEEC H . 35 Casual speech is what we expect in informal situations, and can't is a less formal equivalent of cannot in much the same way as sax is a less formal equivalent of saxophone. 
There's no reason to say that sax and can't are phonemically anything other than /sreks/ and /krent/. A related problem involves so-called FUNCTION WORDS (also known as GRAMMAT I CAL WORDS) such as prepositions, conjunctions, articles, pronouns, and auxiliary verbs.36 Many of these words occur very frequently, and they tend to have significantly reduced forms even when they don't merge with a neighboring word (as happens in a contraction like can't). The English pronoun him is a typical example. When this word is unstressed, as it usually is, it's ordinarily pronounced [~m], even in fairly careful connected speech. But when him is stressed for emphasis, it's pronounced [him]. The unstressed form of a word like this is called its WEAK FORM, and the stressed form is

34

36

35 Linell 1979:54-6. Hasegawa 1979. O'G rady, Dobrovolsky, and Aronoff 1989: 89, Crystal 1992:160, Fromkin and Rodman 1998: 67-8, Radford et al. 1999:151.

49

2.12 Careful pronunciation

called its STRONG FO R M . 37 We presumably have to say that the weak form of a function word is phonemically different from the strong form- /;im/ versus /him/ in the case of him. Most nouns, verbs, adjectives, and adverbs are called CONT EN T W O RD S , (also known as LEX I C AL WORD S ) as opposed to function words. A content word can certainly be reduced in rapid pronunciation, and it might have an abbreviated form (like sax for saxophone), but it doesn't have a weak form that turns up in connected speech. Compare the phrase NEW hy mn (with the emphasis on new) to the phrase KNEW him (with the emphasis on knew). In very careful pronunciation, both phrases can be /nu him/, but only him, not hymn, can be /;im/ in less careful pronunciaton. The relationship beteween frequency and phonological reduction is actually more subtle than the simple two-way distinction between grammatical words and content words suggests. Among content words, more frequent items are more likely to undergo reduction in rapid speech and more likely to be reanalyzed so that a reduced form becomes the careful speech form. 38 As an illustration, consider the tendency in English for /a/ immediately followed by a liquid or a nasal to disappear in rapid pronunciation. For example, in the word happening / 'hrepallIJ/ (with a vertical stroke indicating stress on the first syllable; §1.7) fa / is followed by the nasal /n/, and ['hrepnTIJ] is a common rapid pronunciation. If we compare the very frequent word family /'fremali/, the popular name Emily l' cmali/, and the infrequent word simile / 1s1mali/, we see that all three words have three syllables, stress on the initial syllable, and /a/ followed by the liquid /l/, but the susceptibility of the /a/ to elision differs. The pronunciation [sTmli] for simile may well occur at a rapid tempo, but the pronunciation [£mli] for relatively frequent Emily seems much more likely. 
As for high-frequency family, the pronunciation [fæmli] is possible even in careful pronunciation, giving us a kind of phonemic free variation (§2.5) between /ˈfæmli/ and /ˈfæməli/. In fact, many speakers strongly prefer two-syllable /ˈfæmli/ in careful speech and feel that three-syllable /ˈfæməli/ is unnaturally precise. We can understand contractions and weak forms as special cases of reduction that arise and become established because of the very high frequency of the vocabulary items involved. 39

Another complication involved in determining what counts as careful pronunciation is the potential influence of a prestigious dialect. A speaker whose native phonological system differs from the system of a more prestigious variety will typically adopt pronunciations closer to the prestige norm in a formal situation such as a job interview. Not surprisingly, most people see producing examples for a linguist as a formal situation, and it's not

37 Ladefoged 1982:97-9, Rogers 2000:95-6.
38 Bybee 2000, 2001:41-2, Pierrehumbert 2001.
39 Bybee 2001:60-2.


easy to disentangle the effects of formal situations from the effects of careful pronunciation. 40

Another issue that inevitably arises in connection with careful speech is the role of literacy. More specifically, to what extent does the careful pronunciation of a word depend on how that word is spelled in whatever writing system a speaker has learned? Returning once again to the flap [ɾ] that occurs in English, it's not clear that an illiterate speaker would know which instances of [ɾ] can be replaced with [t] and which can be replaced with [d] in careful pronunciation. In the case of a word like coated, we have the related word coat ending in [t] to tell us that the flap in coated is related to /t/, but what about a word like ready? There aren't any related words that would give us a reason to identify the pronunciation [ɹɛɾi] as /ɹɛdi/ rather than /ɹɛti/, and it's not hard to believe that the careful pronunciation [ɹɛdi] might not exist at all if it weren't for the fact that ready is spelled with a d. 41 Linguists have often treated writing as completely irrelevant to phonological analysis or as a complicating factor that might obscure a native speaker's intuitions but doesn't really affect them. 42 It seems to me, though, that knowledge of spelling conventions is bound to influence a literate speaker's phonological system to some degree, and questions involving aspects of the Japanese writing system will come up many times in this book. 43

EXERCISES

1 In United States newscaster English, the words lift and list are a minimal pair: [lɪft] versus [lɪst]; [f] ≠ [s]. Explain whether each pair below is or isn't a minimal pair in this same variety of English. Feel free to compare other varieties of English.

ask/ash       breath/breathe   coast/ghost   face/phase
gnat/mat      knead/need       lap/wrap      myth/mist
pique/pick    task/tax         thick/this

2 Try to think of a minimal pair to show that [ʃ] ≠ [ʒ] in United States newscaster English.

3

Try to think of a minimal pair to show that [s] ≠ [...

... ダイエー (da-i-e-LENGTH) in katakana, which implies that it ends in the long vowel /eH/. None of the examples of /ei/ that we've looked at so far has a morpheme division between the two vowels, but compare words like yakeishi 焼け石 'heated stone' and meisha 目医者 'eye doctor'. It's obvious that yakeishi divides into yake 焼け 'burning' and ishi 石 'stone' and that meisha divides into me 目 'eye' and isha 医者 'doctor'. Since we expect a syllable division to coincide with

Figure 3-12 The syllable zō sung on two notes

a morpheme division, there might be four syllables in yakeishi (ya^ke^i^shi) and three syllables in meisha (me^i^sha), at least in careful pronunciation. It's often claimed that a morpheme division between the two vowels in /ei/ inhibits the pronunciation [eː]. 35 If so, [eː] would be less likely in meisha 'eye doctor' than in a word like meishi 名士 'celebrity', which has a morpheme division between mei 'fame' and shi 'person' and clearly has only two syllables: mei^shi. Many native speakers are skeptical about this purported difference in the likelihood of [eː] depending on whether or not there's a morpheme break between /e/ and /i/. If there really is a difference, it's not very robust.

The second half of the long vowel /oH/ is usually spelled with う (u) in hiragana, although a few words have お (o) instead. For example, Sino-Japanese tō 党 'party' is spelled とう (to-u), but native tō 十 'ten' is spelled とお (to-o). In katakana, of course, the second half of a long vowel is normally spelled with the length mark ー. The important point here is that these three spellings all represent phonemic /oH/, just as the consistent romanization with ō (or oo) suggests. For example, Sino-Japanese kōdo 高度 'altitude' has the hiragana spelling こうど (ko-u-do), and the borrowing kōdo 'cord' has the katakana spelling コード (ko-LENGTH-do), and these two words are homonyms. They both have [oː], even in careful pronunciation. In short, even though these kana spelling distinctions are parallel to the distinctions that reflect the difference between /ei/ and /eH/, there's no parallel difference between /ou/ and /oH/.

Pronunciation in songs points to the same /oH/ analysis. The first line of the children's song "Zō-san" 「ぞうさん」 'Mr. Elephant' assigns Sino-Japanese zō 象 'elephant' to two notes, as Figure 3-12 shows. 36 Although zō is spelled ぞう (zo-u) in hiragana, native speakers of Japanese sing it as [dzo] (§4.3) on the first note and [o] on the second note. This pronunciation makes sense if the syllable zō contains the phonemic long vowel /oH/ rather than the vowel sequence /ou/. Katakana spellings of recent borrowings and foreign proper names with ウ (u) instead of ー (the length mark) do represent /ou/, but these are rare; Souru ソウル 'Seoul' in Table 3-8 is one of very few examples (and many speakers

35 Maeda 1971:172, Kindaichi and Akinaga 2001:25 (front matter), Kubozono and Honma 2002:14.
36 "Zō-san" 「ぞうさん」 'Mr. Elephant': words by Michio Mado, music by Ikuma Dan, 1945 (Nobarasha Henshubu 1985:276).


do have the pronunciation /soHru/ rather than /souru/). 37 In the hiragana spellings of native and Sino-Japanese words, a spelling with う (u) represents /ou/ only when there's a morpheme division immediately before the う. For example, Yamatouta 大和歌 'Japanese poem' contains the native elements Yamato 'Japan' and uta 'poem', and the letters とう (to-u) in the hiragana spelling やまとうた (ya-ma-to-u-ta) represent /ou/. Native verb forms like sou 添う 'accompany' also contain /ou/; in this word, the hiragana spelling is そう (so-u). It's not unreasonable to say that this sou divides into the base so- and the nonpast affirmative ending -u, although verb forms raise problems for analysis into morphemes that we won't go into here (§6.7). 38 As a Sino-Japanese example, consider the obscure word gyou 御宇 'imperial reign', which has the hiragana spelling ぎょう (gi-yo-u). This word divides into gyo 御 'ruling' and u 宇 'world'. We expect a morpheme division to coincide with a syllable division, and it seems clear that these last two examples have two syllables each, at least in careful pronunciation: so^u and gyo^u. There's no morpheme division in Souru ソウル 'Seoul', of course, and it's not so easy to decide how many syllables it has, but it wouldn't be unreasonable to suppose that it has three: So^u^ru. We'll return to this problem of adjacent short vowels and syllable divisions in §6.7 when we look at the interaction between syllable structure and pitch accent.

As I mentioned above, /ei/ is probably sometimes realized as [eː] in rapid pronunciation even when there's a morpheme division between the /e/ and the /i/. In parallel fashion, /ou/ is probably sometimes realized as [oː] in rapid pronunciation, but the distinction between /ou/ and /oH/ is very clear in careful pronunciation. Compare the /ou/ in the two-syllable words sou 添う 'accompany' and gyou 御宇 'imperial reign' with the /oH/ in the one-syllable words sō そう 'thus' and gyō 行 'line'. Notice that the distinction between /ou/ and /oH/ is reflected in romanization even when it isn't reflected in kana spelling. Native sou and sō both have the hiragana spelling そう (so-u), and Sino-Japanese gyou and gyō both have the hiragana spelling ぎょう (gi-yo-u).
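
The spelling rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name to_phonemic, the letter-by-letter romanized input, and the explicit morpheme lists are all hypothetical, and the sketch covers only hiragana-style o-plus-u spellings and the katakana length mark. It does not handle the とお (to-o) spelling or katakana ウ spellings like ソウル, which the text treats separately.

```python
def to_phonemic(morphemes):
    """Join letter-by-letter romanized kana into a rough phonemic string.

    morphemes: a list of morphemes, each a list of romanized kana letters,
    for example [["ya", "ma", "to"], ["u", "ta"]] (hypothetical input format).
    A "u" directly after an o-final letter inside the same morpheme is read
    as the second half of a long vowel (/H/); across a morpheme division it
    stays /u/. The katakana length mark is written "LENGTH".
    """
    out = []
    for morpheme in morphemes:
        for i, letter in enumerate(morpheme):
            prev = morpheme[i - 1] if i > 0 else None
            if letter == "LENGTH":
                out.append("H")
            elif letter == "u" and prev is not None and prev.endswith("o"):
                out.append("H")
            else:
                out.append(letter)
    return "/" + "".join(out) + "/"


print(to_phonemic([["so", "u"]]))        # no morpheme division: 'thus'
print(to_phonemic([["so"], ["u"]]))      # division before u: 'accompany'
print(to_phonemic([["ya", "ma", "to"], ["u", "ta"]]))
print(to_phonemic([["ko", "LENGTH", "do"]]))
```

The point of the design is that the kana letters alone are not enough: the same letters そう come out as /soH/ or /sou/ depending on the morpheme division supplied with them.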

3.4 Vowel reduction

I mentioned in §1.7 that Japanese has pitch accent, while English has stress accent. In languages with stress accent, vowels in unstressed syllables often have a mid central unrounded quality known as SCHWA. The IPA symbol for a schwa is [ə] (§1.8). For example, except in unnatural, overly precise

37 The entry in Kindaichi and Akinaga 2001 gives ソウル (implying /ou/) first but also gives ソール (implying /oH/). NHK 1998:1015 gives only ソウル in its list of foreign place names. NHK 1998 also contains the entry ソウル (implying /ou/) for the borrowing souru 'soul (music)'.
38 Vance 1987:175-208, 1991, Klafehn 2003.


pronunciation, the first vowel and last vowel in English convicted are both [ə]: [kʰənˈvɪktəd]. Notice that the stress in convicted is on the middle syllable, as indicated by the vertical stroke in this phonetic transcription (§1.7). It often happens in English that a related word will have the stress on a different syllable, and a different vowel will show up instead of schwa, as in the noun convict [ˈkʰɑnvɪkt], with [ɑ] rather than [ə] in the first syllable. In other cases, the same word may or may not have a schwa in an unstressed syllable. For example, relax can be pronounced carefully, but not unnaturally, as [ɹiˈlæks], with [i] in the first syllable, and less carefully as [ɹəˈlæks], with [ə] in the first syllable. In parallel fashion, the first syllable of superb can be [su] or [sə]. This tendency for schwa to appear in unstressed syllables is called VOWEL REDUCTION, and the schwa itself is often called a REDUCED VOWEL. Since a reduced vowel occurs in an unstressed syllable, we also expect it to have a shorter duration, a lower amplitude, and a lower fundamental frequency than a vowel in a typical stressed syllable.

When native speakers of English learn Japanese, they often impose English-like stress accent and vowel reduction on their Japanese pronunciation. We'll look carefully at the Japanese accent system in Chapter 7. The important point here is just that [ə] is alien to Japanese, except perhaps in very rapid or sloppy pronunciation.

It's not that Japanese speakers always pronounce every vowel with a clearly distinct quality. On the contrary, Japanese vowels in connected speech tend to become centralized, which makes them less distinct from each other. Figure 3-13 shows average formants (§1.12) in pronunciations of the five Japanese short vowel phonemes by a small group of male native speakers. 39 Each speaker read a list of isolated words and then a series of prose passages. The graph in Figure 3-13 shows F1 increasing from top to bottom and F2 increasing from right to left. As we saw in §1.12, F1 correlates negatively with vowel height: a high vowel has a relatively low F1, and a low vowel has a relatively high F1. We also saw that F2 correlates positively with vowel frontness: a front vowel has a relatively high F2, and a back vowel has a relatively low F2. Because of these correlations, the arrangement of the two axes in Figure 3-13 gives us a display with the vowels in the same positions relative to each other as on a traditional vowel-area diagram (§1.8) like Figure 3-1 above. It's obvious from Figure 3-13 that vowels produced in the connected speech of prose passages were on average more central than vowels produced in isolated words.

On the other hand, Japanese doesn't show an English-like disparity between reduced vowels in some syllables and FULL VOWELS in others. In English, [ə] occurs in some syllables even in careful pronunciation; in

39 Based on data reported in Keating and Huffman 1984.


Figure 3-13 Average first and second formants of Japanese short vowels produced by speakers reading word lists (•) and prose passages (○). Horizontal axis: F2 in Hz, decreasing from 2200 to 800; vertical axis: F1 in Hz, increasing downward from 300 to 700; plotted vowels: /i/, /e/, /a/, /o/, /u/.

Japanese, the centralization tendency is minimized in careful pronunciation, and the five vowel qualities are clearly distinct.

Linguists often describe the replacement of a full vowel by a reduced vowel as a kind of WEAKENING or LENITION, 40 that is, a change in pronunciation caused by a reduction of effort. Many historical changes can be classified as weakenings, but the term also applies to some of the changes that occur in rapid pronunciation, including the vowel reduction we see in the pronunciation of English relax as [ɹəˈlæks] rather than [ɹiˈlæks]. As I mentioned in §1.8 and §3.1, devoiced vowels occur in Japanese, and it makes sense to think of vowel devoicing as a kind of weakening. 41 In fact, in languages that have vowel reduction, reduced vowels are often susceptible to devoicing. Even in English, it's possible for a schwa surrounded by voiceless consonants to be devoiced in rapid pronunciation. For example, potato can be pronounced rapidly as [pʰə̥ˈtʰeɪɾoʊ], with voiceless [ə̥] in the first syllable. We'll look at the details of Japanese vowel devoicing in §8.1.
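
Returning to the formant comparison in Figure 3-13, one rough way to put a number on "more central" is to measure how far the vowels sit, on average, from the middle of the F1/F2 space. The sketch below does that in Python. The formant values are hypothetical round numbers chosen for illustration, not the measurements behind Figure 3-13, and the prose-passage set is simply the word-list set pulled 30 percent of the way toward the center. Plain distance in Hz is also a simplification, since perceptual frequency scales are often preferred for this kind of comparison.

```python
from math import hypot


def centroid(formants):
    """Mean (F1, F2) over all vowels in the dictionary."""
    n = len(formants)
    f1 = sum(v[0] for v in formants.values()) / n
    f2 = sum(v[1] for v in formants.values()) / n
    return f1, f2


def dispersion(formants):
    """Mean Euclidean distance in Hz of each vowel from the centroid."""
    c1, c2 = centroid(formants)
    dists = [hypot(f1 - c1, f2 - c2) for f1, f2 in formants.values()]
    return sum(dists) / len(dists)


def centralize(formants, factor):
    """Move every vowel toward the centroid; factor 1.0 leaves it unchanged."""
    c1, c2 = centroid(formants)
    return {
        vowel: (c1 + factor * (f1 - c1), c2 + factor * (f2 - c2))
        for vowel, (f1, f2) in formants.items()
    }


# Hypothetical (F1, F2) pairs in Hz for isolated words; illustrative only.
word_list = {
    "i": (300, 2200),
    "e": (450, 1900),
    "a": (700, 1300),
    "o": (450, 900),
    "u": (320, 1300),
}
# Hypothetical connected-speech values: the same vowels, 30% more central.
prose = centralize(word_list, 0.7)

print(round(dispersion(word_list)), round(dispersion(prose)))
```

A smaller dispersion for the prose set than for the word-list set is what the centralization described in the text would look like in these terms.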

40 Lass 1984:177-83, Spencer 1996:60-2.
41 Kondo 1997:295-7.

EXERCISES

1 Give a broad phonetic transcription and a phonemic transcription for each vowel.

bin 瓶 'bottle'     ho 帆 'sail'        ji 字 'letter'     jun 順 'order'
ka 蚊 'mosquito'    kitte 切手 'stamp'   kuchi 口 'mouth'   mon 門 'gate'
ne 根 'root'        ran 蘭 'orchid'     sen 線 'line'      su 巣 'nest'

2 Think of a realistic example of a Tokyo Japanese utterance in which a vowel-length mistake could plausibly cause a misunderstanding. The mistake can be either a short vowel in place of a long vowel or a long vowel in place of a short vowel. Everything else about the target utterance and the mistaken utterance should be identical.

3 The words kii 奇異 'strange' and kī キー 'key' aren't homonyms in careful pronunciation. Explain in detail how they differ in pronunciation.

4 Give a broad phonetic transcription and a phonemic transcription for each word, excluding the initial consonant.

ba ...

... じゅ/ジュ for /ǰu/) and another using ぢ/ヂ for /ǰi/ (and ぢゃ/ヂャ for /ǰa/, ぢょ/ヂョ for /ǰo/, and ぢゅ/ヂュ for /ǰu/). The じ/ジ spelling adds the voicing diacritic to the letter for /ši/ (し/シ), and the ぢ/ヂ spelling adds the diacritic to the letter for /či/ (ち/チ). In this case too, until about 400 years ago, the two spellings represented different pronunciations, and [ʑ] and [ɟʑ] were realizations of different phonemes. 26 For example, before the 1946 spelling reform, kiji 記事 'article' was spelled きじ in hiragana, but kiji 生地 'cloth' was spelled きぢ, and the spelling difference reflected the old pronunciation contrast: [kiʑi] versus [kiɟʑi]. The merger of these two phonemes has left modern Tokyo Japanese with a single phoneme that's always realized as [ɟʑ] in careful pronunciation, as we've seen. The two words kiji 記事 'article' and kiji 生地 'cloth' are now homonyms, both pronounced [kiɟʑi] and both transcribed phonemically as /kiǰi/. And since the spelling reform of 1946 replaced most instances of ぢ/ヂ with じ/ジ, these two words now have the same hiragana spelling: きじ. 27

Kunrei romanization suggests that [ɟʑ] might be an allophone of /z/, and so does the za-gyō ザ行 'za-column' in the traditional kana display (§4.2), that is, the column produced by adding the dakuten voicing diacritic to each of the five letters of the sa-gyō サ行 'sa-column'. As Figure 4-5 shows, the five syllables in this column are all romanized with the letter z in the Kunrei system. These

24 Toyama 1972:198-202.
25 Amanuma 1983:505.
26 Toyama 1972:198-202.
27 Amanuma 1983:505.

Figure 4·

4.4 N
] versus [h]

[z] versus [dz]

[k] versus [gi]

Identify the Tokyo Japanese word represented by each broad phonetic transcription. (k%o:~a]

[J1jW:J~o:]

[11\aremono]

[mitswbac\;i]

(dzeimw~o]

[4>wgw]

3 Transcribe each word phonemically using the system developed so far in this book.

kuni 国 'country'      wairo 賄賂 'bribe'       fūtō 封筒 'envelope'   chesu チェス 'chess'
fairu ファイル 'file'   tsuji 辻 'crossroads'   hiza 膝 'knee'        shōyu 醤油 'soy sauce'

4 Given that [c] is the IPA symbol for a voiceless dorso-palatal stop (§4.3), why do you suppose /c/ was originally proposed as a transcription for the Tokyo Japanese phoneme realized as [ts]? It might be helpful to compare /c/ and /č/ to /s/ and /š/.

5 In §4.3 I elected to treat /tu/ as a foreignism. Do you agree with this decision? Do you think that native speakers of Tokyo Japanese find /tu/ harder to pronounce than /ci/ or /ce/? Give a broad phonetic transcription of the first line of the song "Happy Birthday" as Japanese speakers in Tokyo sing it today. Would the same phonetic transcription be appropriate for speakers born before World War II?

6 Are the words ōza 王座 'throne' and ōja 王者 'king' a minimal pair in Tokyo Japanese? Explain your answer.

7 In one well-known pronunciation dictionary (Kindaichi and Akinaga 2001), a word meaning 'film' is listed as either フィルム or フイルム. Transcribe both of these pronunciations phonemically. Consult with at least one native speaker of Tokyo Japanese about which pronunciation is preferable. Do the same for these other alternative pronunciations from the same dictionary: フォーク/フオーク 'fork', フェルト/フエルト 'felt'. Another well-known pronunciation dictionary (NHK 1998) gives only フィルム, フォーク, and フェルト for these words. What do you suppose might account for this difference between the two dictionaries? Both dictionaries list only ヒューズ for a word meaning 'fuse' and only ヒレ for a word meaning 'fillet'. Why do you suppose フューズ and フィレ don't exist as alternative pronunciations?

8 According to pronunciation dictionaries (NHK 1998, Kindaichi and Akinaga 2001), the island of Guam is called /guamu/ in Tokyo Japanese, but some elderly Tokyo speakers at least sometimes say /gamu/ instead. How do you suppose this alternative pronunciation might have arisen?

5 Syllable-final consonants

5.1 Syllable-final nasals

If you pronounce the word san 三 'three' all by itself, you'll notice that it ends in a long nasal sound that involves contact between the back of the tongue and the back of the soft palate. You'll occasionally see this segment described as dorso-velar [ŋː], but most accounts agree that the contact is even farther back. 1 We'll transcribe it as dorso-uvular [ɴː] (§1.10), although there's some doubt about whether the closure is always complete. 2 In terms of the Japanese writing system, this sound corresponds to hiragana ん or katakana ン in utterance-final position. In other words, when a word occurs right before a pause and its kana spelling ends with ん or ン, the phonetic segment right before the pause is [ɴː]. The vowel before [ɴː] is clearly nasalized, so we can transcribe the whole word as [sãɴː]. 3

Now compare some of the ways the last segment in san 三 is pronounced when it isn't right before a pause. The examples in Table 5-1 show what happens when the immediately following segment is a stop. In every case the long nasal at the end of san 三 and the following stop are homorganic (§2.9), that is, they have the same place of articulation. Remember that an affricate is phonetically a stop plus a fricative (§1.9, §4.3). The nasal is bilabial [mː] before bilabial [p] or [b], lamino-alveolar [nː] before lamino-alveolar [t] or [d], lamino-alveopalatal [ɲː] before lamino-alveopalatal [c] or [ɟ], and dorso-velar [ŋː] before dorso-velar [k] or [g]. If the stop is palatalized [kʲ] or [gʲ] (§4.1), then the nasal is palatalized too, as in [gẽŋʲːkʲi] 元気 'healthy'. The vowel right before the nasal is nasalized in every example in Table 5-1, as the phonetic transcriptions indicate.

1 Hattori 1930:41, Arisaka 1940:83-4, Naito 1961:118.
2 Sakuma 1929:166, Hattori 1930:42, Bloch 1950:134-5, Aoki 1976:204-5, Kawakami 1977:43.
3 Bloch 1950:133-5, Jones 1967:88, Nakano 1969:224.

Table 5-1 Syllable-final nasals before stops

sanpaku   三泊   [sãmːpakɯ]    'three nights'
sanbu     三部   [sãmːbɯ]      'three copies'
santō     三等   [sãnːtoː]     'third class'
santsū    三通   [sãnːtsɯː]    'three letters'
sando     三度   [sãnːdo]      'three degrees'
sanzen    三千   [sãnːdzẽɴː]   'three thousand'
sanchō    三兆   [sãɲːcɕoː]    'three trillion'
sanji     三時   [sãɲːɟʑi]     'three o'clock'
sankai    三回   [sãŋːkai]     'three times'
sangō     三号   [sãŋːgoː]     'number three'

Table 5-2 Syllable-final nasals before fricatives

yonfīto    四フィート   [jõɰ̃ːɸiːto]   'four feet'
yonsai     四歳        [jõɰ̃ːsai]     'four years old'
yonshō     四章        [jõɰ̃ːɕoː]     'four chapters'
yonhyaku   四百        [jõɰ̃ːçakɯ]    'four hundred'
yonhai     四杯        [jõɰ̃ːhai]     'four cupfuls'

A syllable-final nasal before /r/ is apico-alveolar [n̺ː], but we'll dispense with the diacritic and transcribe sanrui 三塁 'third base' as [sãnːɾɯi]. As always, the vowel before the nasal is nasalized. The allophone of /r/ that occurs after a nasal is the same as the one that occurs in utterance-initial position (§4.5); the tongue tip is already lightly in contact with the alveolar ridge for the nasal, and /r/ is produced by rapidly releasing this contact. 4 Here again, we won't bother with a separate phonetic symbol for this allophone of /r/.

When the segment that immediately follows a syllable-final nasal isn't a stop or /r/, the nasal is harder to describe. 5 The examples in Table 5-2 all contain yon 四 'four' right before a fricative. In all these examples, yon 四 ends with a long nasal approximant that I've transcribed as [ɰ̃ː], that is, as a dorso-velar (or high back unrounded) semivowel that's long and nasalized. The exact place of articulation of this approximant undoubtedly varies to some extent depending on the place of articulation of the following fricative, but we can ignore these minor differences in our broad phonetic transcriptions.

The lack of closure in a nasal before a fricative means that a following /s/ isn't very likely to be mistaken for a following /c/. The schematic diagrams in

4 Maeda 1971:135. Akamatsu 1997:113-6 describes this allophone of /r/ as lateral.
5 Akamatsu 1997:58-62.

Figure 5-1 Syllable-final nasal before /c/ (left) and /s/ (right). Left panel: /c/ = [ts], segments [nː] [t] [s]; right panel: /s/ = [s], segments [ɰ̃ː] [s]. Rows in each panel: glottis, velum, tongue/alveolar ridge.

Figure 5-2 English /n/ before /ts/ (left) and /s/ (right). Left panel: /nts/, segments [n] [t] [s]; right panel: /ns/, segments [n] [s]. Rows in each panel: glottis, velum, tongue/alveolar ridge.

Figure 5-1 show activity at three locations in the vocal tract during the production of [nːts] in a word like santsui 三対 'three pairs' and the production of [ɰ̃ːs] in a word like sansui 山水 'landscape'. A single line (--) represents a complete closure, a double line (=) represents a gap between the relevant articulators, and a jagged line (/\/\) represents vibration, that is, voicing. In both cases the end of the glottal vibration (voicing) is roughly simultaneous with the closing of the velum. In [nːts] the release of the lamino-alveolar closure shared by [n] and [t] is later. In [ɰ̃ːs], on the other hand, there's no lamino-alveolar closure in the first place, so of course there's no release.

This clear distinction in Japanese is quite different from what we find when we compare the English clusters /nts/, as in prints /pɹɪnts/, and /ns/, as in prince /pɹɪns/. The crucial detail is that English /n/ is pronounced [n], with apico-alveolar closure, both before /t/ and before /s/. The diagrams in Figure 5-2 illustrate. As the diagram on the right in Figure 5-2 shows, the transition from [n] to [s] involves three simultaneous changes instead of just two, and there's a well-known tendency in cases like this for the release of the oral closure to lag behind the cessation of voicing and the closing of the velum. The result of such a lag is an epenthetic voiceless stop like the one that appears in the

Table 5-3 Syllable-final nasals before semivowels and vowels

san'yādo    三ヤード     [sãɰ̃ːjaːdo]     'three yards'
san'i       三位        [sãɰ̃ːi]         'third rank'
san'en      三円        [sãɰ̃ːẽɴː]       'three yen'
san'anpea   三アンペア   [sãɰ̃ːãmːpea]    'three amperes'
san'oku     三億        [sãɰ̃ːokɯ]       'three hundred million'
san'u       三有        [sãɰ̃ːɯ]         'three (Buddhist) worlds'
sanwari     三割        [sãɰ̃ːɰaɾi]      'three tenths'

Table 5-4 Syllable-final nasals before nasals

sanmai   三枚   [sãmːːai]    'three sheets'
sannen   三年   [sãnːːẽɴː]   'three years'
sannin   三人   [sãɲːːĩɴː]   'three people'

rapid pronunciation of something as [sʌmpθɪŋ] (§2.12). In the case of prince, of course, the epenthetic stop is an apico-alveolar [t] between the [n] and the [s], and when it appears, the distinction between the /ns/ in prince and the /nts/ in prints is blurred. 6

The broad phonetic transcription [ɰ̃ː] is also adequate for a syllable-final nasal before a semivowel or a vowel. The examples in Table 5-3 all begin with san 三 'three'. The exact place of articulation of the long nasal approximant in each of the examples in Table 5-3 varies to some extent depending on the surrounding sounds. 7 In fact, when a syllable-final nasal is surrounded by high front segments, as in hin'i 品位 'dignity' or shin'yō 信用 'trust', it might be more accurate to transcribe it as [j̃ː], although it seems to involve at least a slight shift toward high back tongue position. When a syllable-final nasal has /o/ on both sides, as in yon'oku 四億 'four hundred million', it shows the same weak rounding (§3.1) as /o/, so we might want to transcribe it as [w̃ː]. From here on, though, we'll ignore all these details and just use [ɰ̃ː] in our broad phonetic transcriptions for any syllable-final nasal before a vowel or a semivowel. As in all the other examples we've looked at, a vowel right before a syllable-final nasal is nasalized.

When a syllable-final nasal comes right before a syllable-initial nasal, the result is an extra-long nasal, as the examples in Table 5-4 show. To see that the phonetic word-medial nasals in Table 5-4 really are extra-long, compare the [mːː] in sanmai 三枚 [sãmːːai] 'three sheets' with the [mː] in sanbai 三倍 [sãmːbai]

6 Bloch 1941:280.
7 Hattori 1930:46, Arisaka 1940:83-4.

Table 5-5 Phonemic transcriptions of syllable-final nasals

/saNbu/     三部   [sãmːbɯ]      'three copies'
/saNzeN/    三千   [sãnːdzẽɴː]   'three thousand'
/saNgoH/    三号   [sãŋːgoː]     'number three'
/saNrui/    三塁   [sãnːɾɯi]     'third base'
/saNsai/    三歳   [sãɰ̃ːsai]     'three years old'
/saNeN/     三円   [sãɰ̃ːẽɴː]     'three yen'
/saNwari/   三割   [sãɰ̃ːɰaɾi]    'three tenths'
/saNmai/    三枚   [sãmːːai]     'three sheets'
/saNniN/    三人   [sãɲːːĩɴː]    'three people'

Figure 5-3 Syllabification of extra-long nasals

'three times'. As the phonetic transcriptions in Table 5-4 indicate, there's no articulatory or acoustic shift within an extra-long nasal to mark a transition from one syllable to the next. Even so, we'll analyze extra-long nasals phonologically as sequences of two phonemes: a syllable-final nasal followed by a syllable-initial nasal. A syllable-final nasal always corresponds to ん or ン in kana spelling, and there's no question that this two-phoneme analysis matches the intuition of native speakers. Figure 5-3 illustrates by comparing gunmu 軍務 [gɯ̃mːːɯ] 'military affairs' with gunbu 軍部 [gɯ̃mːbɯ] 'military authorities'. The syllabification treats extra-long [mːː] as [mː^m], that is, parallel to [mː^b] (using carets to mark the syllable boundaries).
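
The distribution described in §5.1 can be summarized as a small lookup, sketched here in Python. This is only a summary of the broad transcriptions used in Tables 5-1 through 5-4, not a phonological analysis: the function name, the segment labels, and the decision to ignore palatalized velars and the finer place differences among the approximants are all simplifying assumptions made for illustration.

```python
LONG = "\u02d0"                          # IPA length mark
APPROXIMANT = "\u0270\u0303" + LONG      # long nasalized dorso-velar approximant
UVULAR = "\u0274" + LONG                 # long dorso-uvular nasal

# Homorganic nasal for each following stop or affricate (hypothetical labels).
NASAL_BEFORE_STOP = {
    "p": "m", "b": "m",
    "t": "n", "d": "n", "ts": "n", "dz": "n",
    "c\u0255": "\u0272", "\u025f\u0291": "\u0272",   # alveopalatal affricates
    "k": "\u014b", "g": "\u014b",
}
SYLLABLE_INITIAL_NASALS = {"m", "n", "\u0272"}


def syllable_final_nasal(following):
    """Broad transcription of a syllable-final nasal, given what follows.

    following: the next phonetic segment as a string, or None for a pause.
    """
    if following is None:
        return UVULAR
    if following in NASAL_BEFORE_STOP:
        return NASAL_BEFORE_STOP[following] + LONG
    if following == "\u027e":            # the flap that realizes /r/
        return "n" + LONG
    if following in SYLLABLE_INITIAL_NASALS:
        # Together with the following nasal this gives an extra-long nasal.
        return following + LONG
    # Fricatives, semivowels, and vowels: no oral closure.
    return APPROXIMANT


for segment in [None, "b", "ts", "k", "s", "a", "m"]:
    print(segment, syllable_final_nasal(segment))
```

Writing the cases out this way makes the asymmetry visible: the stop, flap, and nasal cases copy a place of articulation from what follows, while the pause case and the approximant case do not.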

5.2 The mora nasal phoneme

There's general agreement that all the syllable-final nasals described in §5.1 are allophones of a single phoneme, and we'll transcribe that phoneme as /N/. 8 The examples in Table 5-5 illustrate. As we've seen, the place of articulation and the degree of constriction of a syllable-final nasal are predictable from the neighboring sounds. In other words, even though we're dealing with a wide variety of phonetic segments, they don't contrast; they're all in complementary distribution (§2.4) with each other. At the same time, these syllable-final nasals share enough phonetic similarity (§2.4) to make it plausible to treat them all as realizations of the same abstract entity. Not only are they all nasal, they're all long and all unreleased.

I've transcribed all the syllable-final nasals above with the IPA length symbol [ː], and native speakers have the strong intuition that these long nasal segments are equal in duration to ordinary CV syllables, that is, syllables consisting of a single consonant followed by a single short vowel. 9 We'll see in Chapter 6 that Japanese has short and long syllables, and we'll follow the standard practice of describing the difference in terms of MORAS (also called BEATS; §6.2). A short syllable consists of one mora while a long syllable consists of two moras. For example, the words to /to/ 戸 'door' and tō /toH/ 党 'party' are both one-syllable words, but the short syllable /to/ is one mora and the long syllable /toH/ is two moras, /to/ and /H/. The one-syllable word ton /toN/ トン 'ton' is also two moras, /to/ and /N/. Since the phoneme /N/ constitutes a mora, I'll refer to it as the MORA NASAL.

The lack of release in /N/ just follows from the fact that /N/ doesn't have a complete oral closure unless the immediately following phoneme does. 10 As we saw in §5.1, /N/ is realized as an approximant before a fricative, a semivowel, or a vowel, and since an approximant doesn't have a complete oral closure, it can't have a release.

Because Japanese /N/ is realized as such a wide variety of phonetically different syllable-final nasal segments, it seems like a chameleon to an English speaker. No English phoneme has anywhere near this range of allophones, and as we saw in §2.7, the three nasals [m], [n], and [ŋ] contrast syllable-finally in English. We saw in Chapter 4 that there are nineteen contrasting consonants syllable-initially in Japanese, but there are only two contrasting consonants syllable-finally. One of those two is /N/, and we'll discuss the other syllable-final consonant below in §§5.4-6. As I mentioned in §2.11, native speakers of Japanese don't feel that either of the syllable-final consonants is the same as any of the syllable-initial consonants. In other words, a syllable-final nasal isn't /m/ or /n/ or any of the other consonants we discussed in Chapter 4. At the same time, native speakers of Japanese feel that all the phonetically diverse syllable-final nasals that occur are the same sound. The phoneme /N/ is a straightforward reflection of these intuitions. Kana spelling reflects exactly the same intuitions, since /N/ corresponds consistently to ん in hiragana and to ン in katakana.

We've already seen lots of examples of how /N/ assimilates to an immediately following segment within a word, but the same assimilation (§2.12)

8 Arisaka 1940:82, Naito 1961:117, Jones 1967:88-9, Vance 1987:38.
9 Bloch 1950:149, Kindaichi 1954:155, Han 1962a:70.
10 Arisaka 1940:83, Kindaichi 1954:162-4.

Table 5-6 Realizations of word-final /N/ before following words

Gohon deta      五本出た     /gohoN deta/      [nːd]    '5 emerged'
Gohon kawaita   五本乾いた   /gohoN kawaita/   [ŋːk]    '5 dried'
Gohon moeta     五本燃えた   /gohoN moeta/     [mːː]    '5 burned'
Gohon saita     五本咲いた   /gohoN saita/     [ɰ̃ːs]    '5 bloomed'
Gohon ureta     五本売れた   /gohoN ureta/     [ɰ̃ːɯ]    '5 sold'

Figure 5-4 /N/ Taking on place and aperture of following /p/. The diagram shows /N/ [nasal, voiced] and /p/ [oral, voiceless], with the features [bilabial, closure] of /p/ linked to /N/.

occurs even when the immediately following segment is in a different word, as long as no pause intervenes. Table 5-6 illustrates with short sentences, each consisting of the word gohon li:ifs: 'five (long objects)' followed by a verb. The phonetic transcription in each row shows how IN/ and the immediately following phoneme are realized phonetically. It's only when a pause follows that IN/ is realized as uvular [N:], as I noted above in §5.1. We can think of assimilation as one segment absorbing features (§§2.6-7 ) from a neighboring segment. In the case of Japanese /NI, it's tempting to say that it's specified only as a long nasal consonant and that it just absorbs its place of articulation and its degree of oral constriction fro m whatever follows. The degree of constriction, also called APERTURE, is either the narrowing required for the approximant allophone [Li{:] or the complete closure required for the other allophones ([m:], [n:], and so on). Figure 5-4 depicts the assimilation of IN/ to /pl (as in a word like /oNpa/ [6m:pa) ~1Bl 'sound wave' ) by adding a link from the place and aperture features of /p/ to /NI. A dashed line indicates the added link. Of course, this kind of simple feature absorption can't account for the complete closure and the dorso-uvular place of articulation of the [N:] allophone that realizes IN/ before a pause. When a word like /mimaseN/ ~ Iv 'doesn't watch' occurs utterance-finally, there's nothing for the IN/ to assimilate to. One solution to this problem is to say that complete closure is the default aperture and dorso-uvular the default place of articulation for /N/.11 In other words, when there's no following segment to provide a target

11 Shibatani 1990:170.

Figure 5-5 /N/ Taking on default place and aperture before pause

Figure 5-6 /N/ with inherent place and aperture, replaced before /p/ (left) and retained before pause (right)

for assimilation, these features are automatically supplied. Figure 5-5 illustrates, with ‖ representing a pause. An alternative solution is to say that /N/ is inherently specified for a complete dorso-uvular closure rather than being unspecified for place and aperture. On this assumption, assimilation involves replacing these features with those of the immediately following segment when there is one. In Figure 5-6, the link connecting /N/ and its original place and aperture features is severed when /p/ follows, as indicated by the cut line, and a new link connects /N/ to the place and aperture features of /p/. When a pause follows /N/, the original link just remains.

When /N/ occurs right before a fricative, a semivowel, or a vowel, feature absorption (as in Figure 5-4) and inherent specification (as in Figure 5-6) both lead to wrong results. The problem in these cases is that, in general, the place and aperture of the allophone [ɰ̃:] don't match the place and aperture of the following segment. For example, when the phoneme after /N/ is /s/, as in a word like /keNsa/ [keɰ̃:sa] 検査 'inspection', the lamino-alveolar place of articulation of [s] is farther forward than the dorso-velar place of articulation of [ɰ̃:], and the fricative constriction of [s] is narrower than the approximant constriction of [ɰ̃:]. As another example, consider what happens when the phoneme after /N/ is the low vowel /a/, as in a word like /daNacu/ [daɰ̃:atsɯ] 弾圧 'suppression'. If /N/ took on the aperture of the following [a], it would be realized as [ã:] rather than as [ɰ̃:]. In short, there doesn't seem to be any simple model of assimilation that will give us all the right allophones of /N/.
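The allophone choices discussed in this section can be summarized as a small lookup. What follows is only a rough sketch under stated assumptions: the function name, the table names, and the exact grouping of triggering phonemes are hypothetical conveniences, and the sketch covers only the environments illustrated above (homorganic closure before stops and nasals, the approximant before fricatives, semivowels, and vowels, and the uvular default before a pause).

```python
# Sketch of the allophone choices for the mora nasal /N/ described above.
# All names and the grouping of triggers are illustrative assumptions.

UVULAR = "ɴ:"        # realization before a pause
APPROXIMANT = "ɰ̃:"   # realization before fricatives, semivowels, and vowels

# Complete closure at the place of a following stop or nasal (assumed table).
HOMORGANIC = {
    "p": "m:", "b": "m:", "m": "m:",
    "t": "n:", "d": "n:", "n": "n:",
    "k": "ŋ:", "g": "ŋ:",
}

VOWELS = set("aiueo")
FRICATIVES_AND_SEMIVOWELS = {"s", "h", "f", "y", "w"}


def realize_mora_nasal(following):
    """Return a broad transcription of /N/ before `following` (None = pause)."""
    if following is None:
        return UVULAR
    if following in HOMORGANIC:
        return HOMORGANIC[following]
    if following in VOWELS or following in FRICATIVES_AND_SEMIVOWELS:
        return APPROXIMANT
    raise ValueError(f"environment not covered by this sketch: {following!r}")


if __name__ == "__main__":
    # The environments of Table 5-6, plus a pause.
    for nxt in ["d", "k", "m", "s", "u", None]:
        print(nxt, realize_mora_nasal(nxt))
```

The lookup simply lists the outcomes; as the discussion above shows, neither feature absorption nor inherent specification derives the approximant allophone, which is why the sketch stipulates it as a separate case.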


5.3 Phonotactics of the mora nasal

The examples we've already seen show that the phoneme /N/ can be either word-medial or word-final and can follow any short vowel. As we'll see in §6.6, /N/ can also follow a long vowel (as in rōn ローン 'loan'), and there are even examples of /N/ followed by a tautosyllabic consonant (as in Nihonppoi 日本っぽい 'Japanesy'). When /N/ is word-medial, the following phoneme can be any vowel or any syllable-initial consonant.

It's been suggested that /N/ can also occur word-initially and constitute an entire syllable by itself.12 For example, when uma 馬 'horse' is the first word in an utterance, it's easy to imagine pronouncing it as [ʔm̩:a], with an initial glottal stop followed by a long syllabic nasal. The IPA diacritic [ ̩] marks a consonant as syllabic (§1.10). As we'll see in §8.3, a non-distinctive initial glottal stop is virtually automatic whenever the first phoneme in an utterance is a vowel. Should we transcribe the pronunciation [ʔm̩:a] phonemically as /Nma/? I argued in §2.12 that phonemic forms are based on careful pronunciation, and a careful pronunciation of uma, utterance-initial or not, will have the vowel [ɯ] in the first syllable. There's no doubt that the transcription /uma/ reflects native-speaker intuition about this word. The first part of the syllabic nasal in the pronunciation [ʔm̩:a] is a rapid-pronunciation realization of /u/, not an allophone of /N/. Notice, incidentally, that the syllabic nasal in this pronunciation is long but not extra-long; [ʔm̩::a] sounds unnatural. As we saw in Figure 5-3, /Nm/ is normally realized as an extra-long bilabial nasal.

Another possible example of word-initial /N/ is the casual expression of affirmation un うん 'yeah'.
Despite the hiragana spelling, this word is normally pronounced [ʔɴ:] or [ʔm:], suggesting that it consists entirely of the phoneme /N/.13 But expressions of affirmation are traditionally classified as interjections, and the interjections of a language often have pronunciations that fall outside its phonotactic norms.14 Compare the typical pronunciation of English uh-huh as [ʔm̩hm̩], which suggests the phonemic transcription /mhm/. Whatever the right way to handle such items may be, Japanese un shouldn't force us to abandon a phonotactic generalization that holds for the "normal" vocabulary.

There are some apparent examples of word-initial /N/, though, that aren't so easy to dismiss. One is an alternative pronunciation of the morpheme /naN/ 何 'several' in words like /naNǰuHoku/ 何十億 'several billion'. In this case, the pronunciation [ʔɲ̩:ɟʑɯ:okɯ] is possible, and it gives the word a slangy flavor. Fiction writers rendering dialog sometimes spell this pronunciation ン十億, with the katakana symbol ン for /N/ followed by the kanji 十 for /ǰuH/ 'ten' and the kanji 億 for /oku/ 'hundred million'. The fact that writers notice this pronunciation and spell accordingly indicates that we're dealing with casual speech rather than rapid pronunciation. As we saw in §2.12, casual-speech forms are typical of informal situations and remain distinct from their more formal counterparts even in careful pronunciation. We can't ignore casual-speech forms in discussing the distribution of /N/, so we really don't have any choice but to acknowledge that phonetic [ʔɲ̩:ɟʑɯ:okɯ] is phonemically /NǰuHoku/, just as the spelling ン十億 implies.

12 Bloch 1950:133-4, Martin 1952:13, Kindaichi and Akinaga 2001:7-8 (front matter).
13 Sakuma 1929:164.
14 Ferguson 1982:50.

5.4 Syllable-final obstruents

Phonetic extra-long voiceless obstruents are frequent in Japanese, and the examples in Table 5-7 illustrate the most common possibilities. There are only two degrees of obstruent length in Japanese, but the longer of the two is comparable to an extra-long nasal (§5.1). The examples in Table 5-8 illustrate.

Table 5-7 Common extra-long obstruents

ippai    一杯    [ip::ai]     'one cupful'
itten    一点    [it::eɴ:]    'one point'
ittsū    一通    [it::sɯ:]    'one letter'
issai    一歳    [is::ai]     'one year old'
itchō    一兆    [ic::ɕo:]    'one trillion'
isshō    一章    [iɕ::o:]     'one chapter'
ikkai    一回    [ik::ai]     'one time'

Table 5-8 Degrees of length in obstruents and nasals

[k]      [aka]      aka 垢 'dirt'
[n]      [ana]      ana 穴 'hole'
[n:]     [an:da]    anda 安打 'hit'
[k::]    [ak::a]    akka 悪化 'worsening'
[n::]    [an::a]    anna あんな 'that kind'

Like an extra-long nasal, an extra-long obstruent lacks any articulatory or acoustic shift to mark a transition from one syllable to the next, but we'll analyze these two segment types in parallel fashion. In other words, we'll treat extra-long obstruents phonologically as sequences of two phonemes: a syllable-final obstruent followed by a syllable-initial obstruent. A

Figure 5-7 Parallel syllabification of extra-long obstruents and nasals

syllable-final obstruent always corresponds to a reduced-size っ or ッ (tsu) in kana spelling, and here again there's no question that this two-segment analysis matches the intuition of native speakers. Figure 5-7 illustrates by comparing hassō [has::o:] 発送 'sending' with hannō [han::o:] 反応 'reaction'. The syllabification treats extra-long [s::] as [s:As] and extra-long [n::] as [n:An] (§5.1). When the syllable-initial consonant is an affricate (§4.3), as in yottsu [jot::sɯ] 四つ 'four', then of course there's a shift from stop to fricative, but that shift doesn't mark the syllable break. The break comes within the extra-long stop: [jot:Atsɯ].

5.5 The mora obstruent phoneme

There's general agreement that all the syllable-final obstruents described in §5.4 are allophones of a single phoneme, and we'll transcribe that phoneme as /Q/.15 Table 5-9 illustrates with the same examples that appear in Table 5-7 above. As with /N/, the place of articulation and the degree of constriction of /Q/ are predictable from the neighboring sounds. In other words, the wide variety of phonetic realizations of /Q/ don't contrast; they're all in complementary distribution (§2.4) with each other. At the same time, they share enough phonetic similarity (§2.4) to make it plausible to treat them all as realizations of the same abstract entity. Not only are they all obstruents, they're all long and all unreleased. As I noted in §5.2, /N/ shares these last two characteristics.

Given the syllabification I adopted above in §5.4 (as in Figure 5-7), all syllable-final obstruents are phonetically long. This description reflects the strong native-speaker intuition that the portion of an extra-long obstruent corresponding to /Q/, just like the portion of an extra-long nasal corresponding to /N/, has the same duration as an ordinary CV syllable.16 In other words, the phoneme /Q/ constitutes a mora (§5.2), and I'll refer to it as the MORA OBSTRUENT. For example, the word motto /moQto/ もっと 'more' contains the long syllable /moQ/ and the short syllable /to/, and the long syllable is two moras, /mo/ and /Q/.

15 Arisaka 1940:94, Hashimoto 1950:286, Hattori 1958:360-1, Vance 1987:40-2.
16 Kindaichi 1954:155, Han 1962a:71-2, Kawakami 1977:85-7.
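The mora count for a word like /moQto/ can be checked mechanically. The sketch below is a minimal illustration, assuming that a phonemic transcription is written in this chapter's symbols and that exactly the short vowels plus /H/, /N/, and /Q/ each contribute one mora; the function name is hypothetical.

```python
# Minimal sketch: count moras in a phonemic transcription.
# Assumption: each short vowel and each of /H/, /N/, /Q/ is one mora,
# while syllable-initial consonants (and /y/) add nothing to the count.

MORAIC_SYMBOLS = set("aiueoHNQ")


def count_moras(phonemic):
    """Count moras in a transcription such as '/moQto/'."""
    word = phonemic.strip("/")  # drop the surrounding slashes, if any
    return sum(1 for symbol in word if symbol in MORAIC_SYMBOLS)


if __name__ == "__main__":
    for form in ["/moQto/", "/iQpai/", "/gohoN/"]:
        print(form, count_moras(form))
```

For /moQto/ the count is three: /mo/, /Q/, and /to/.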

Table 5-9 Phonemic transcriptions of extra-long obstruents

/iQpai/     一杯    [ip::ai]     'one cupful'
/iQteN/     一点    [it::eɴ:]    'one point'
/iQcuH/     一通    [it::sɯ:]    'one letter'
/iQsai/     一歳    [is::ai]     'one year old'
/iQčoH/     一兆    [ic::ɕo:]    'one trillion'
/iQšoH/     一章    [iɕ::o:]     'one chapter'
/iQkai/     一回    [ik::ai]     'one time'

The lack of release in /Q/ follows from the fact that, as we saw in §5.4, /Q/ ends at a point within a phonetic extra-long obstruent that doesn't correspond to an articulatory or acoustic shift. As a result, /Q/ has a complete oral closure only when the immediately following phoneme is a stop or an affricate that continues the same closure. When the following phoneme is a fricative, /Q/ doesn't have a complete oral closure, so it can't have a release.

In the same way as /N/, /Q/ seems like a chameleon phoneme to an English speaker because it has such a wide variety of phonetically different realizations. I mentioned in §2.11 and §5.2 that Japanese has only a two-way consonant contrast in syllable-final position, and the phonemic analysis in this chapter treats every phonetic syllable-final consonant either as an allophone of /N/ or as an allophone of /Q/, as in hassha /haQša/ 発射 'launch' versus hansha /haNša/ 反射 'reflection'. Native speakers of Japanese feel that a syllable-final obstruent, like a syllable-final nasal, isn't the same as any syllable-initial consonant. In other words, a syllable-final obstruent isn't /t/ or /s/ or any of the other consonants we discussed in Chapter 4. At the same time, native speakers of Japanese feel that all the phonetically diverse syllable-final obstruents that occur are the same sound. The phoneme /Q/ is a straightforward reflection of these intuitions, and so is kana spelling, since /Q/ corresponds consistently to reduced-size っ (tsu) in hiragana and to reduced-size ッ (tsu) in katakana. In short, the only distinctive feature (§2.6) of a syllable-final consonant is whether it's nasal or oral; everything else about a syllable-final consonant's pronunciation is predictable from what comes next.

We ran into some problems in §5.2 when we tried to describe all the allophones of /N/ as absorbing the place of articulation and degree of constriction of the following segment. No parallel problems arise in the examples of /Q/ that we've looked at so far. In every case, /Q/ has exactly the same place and aperture as the following segment. We'll see whether this assimilation pattern holds up after we've looked more carefully at the distribution of /Q/ in §5.6 below.

Table 5-10 Mora obstruent before /f/ and /h/

/waQfuru/     ワッフル      [ɰaɸ::ɯɾɯ]    'waffle'
/maQha/       マッハ        [mah::a]      'Mach'
/keHniQhi/    ケーニッヒ    [ke:ɲiç::i]   'König'

5.6 Phonotactics of the mora obstruent

The examples in Table 5-9 in §5.5 show /Q/ followed by every syllable-initial voiceless obstruent phoneme except /f/ and /h/. The sequences /Qf/ and /Qh/ don't occur in native or Sino-Japanese vocabulary items, but they do occur in mimetic items, recent borrowings, and foreign proper names. Table 5-10 gives a few examples.

All the phonetic extra-long obstruents we've considered so far have been voiceless, but many recent borrowings are spelled in katakana with reduced-size ッ (tsu) followed by a symbol that indicates a syllable-initial voiced obstruent. The examples in Table 5-11 illustrate. It's clear that words like those in Table 5-11 contain /Q/ and are pronounced with phonetic extra-long obstruents. The question is, are the extra-long obstruents really voiced?

Japanese has five syllable-initial phonemes with voiced obstruent realizations: /b/, /d/, /g/, /z/, and /ǰ/. The katakana spellings in Table 5-11 suggest that each of these five appears after /Q/ in one of the examples. In careful pronunciation, /b/, /d/, and /g/ are realized as voiced stops (§4.1), and /ǰ/ is realized as [ɟʑ], which means that it starts as a voiced stop (§4.3). As we saw in §4.3, /z/ has both [z] and [dz] as careful-pronunciation allophones, but following /Q/, /z/ is always [dz]. Consequently, /Q/ is realized with stop closure before all five of these phonemes, but a long voiced stop involves an inherent articulatory contradiction.17 Voicing requires airflow from the lungs through the glottis (§1.3), but since the velum is closed (§1.4), a stop closure higher in the vocal tract means that the air coming through the glottis can't escape. As a result, the air pressure between the glottis and the closure quickly builds up and makes it impossible to keep air flowing from the lungs. The schematic diagram of the vocal tract in Figure 5-8 illustrates for [b], and except for the location of the closure (the place of articulation), the situation is the same for any other voiced stop.

You can easily demonstrate this pressure build-up to yourself by pronouncing as long a [b] as you possibly can. It takes less than a second for the build-up to halt the voicing. For comparison, try pronouncing as long a [v] as you can. Since [v] is a voiced labiodental fricative (§1.9), the air from your lungs escapes through the narrow gap

17 Jaeger 1978:320-1, Hayes and Steriade 2004:6-10.

Table 5-11 Orthographic mora obstruent before voiced obstruents

mobbu    モッブ    'mob'
beddo    ベッド    'bed'
baggu    バッグ    'bag'
guzzu    グッズ    'goods'
bajji    バッジ    'badge'

Figure 5-8 Air pressure building up behind closure for [b]
[Diagram labels: bilabial closure, closed velum, vibrating vocal folds.]

between your lower lip and upper teeth, and you can maintain the voicing as long as you can keep exhaling.

One natural response to the contradiction between voicing and sustained stop closure would be to pronounce words like the ones in Table 5-11 with extra-long stops that are voiceless. For example, baggu バッグ 'bag' would be [bak::ɯ] rather than [bag::ɯ], in spite of the katakana spelling. In fact, many linguists have noted that this kind of voiceless pronunciation is widespread, although there are disagreements about what factors influence the loss or retention of voicing.18 In any case, as long as words like baggu バッグ 'bag' and bakku バック 'background' are phonetically distinct in careful pronunciation, we're dealing with a phonemic contrast, even if the distinction tends to disappear in less careful pronunciation. As a consequence, [bak::ɯ] could be either /baQku/ or /baQgu/, which means that we have overlapping (§2.11) of the phonemes /k/ and /g/. The distinction between /i/ and /H/ following /e/ (§3.3) is similar. Even though both /ei/ and /eH/ are typically pronounced [e:], they're phonetically distinct in careful pronunciation ([ei] /ei/ versus [e:] /eH/), and this phonemic contrast seems to match the intuition of native speakers.

There's no real doubt that /Q/ followed by a voiced obstruent and /Q/ followed by an otherwise identical voiceless obstruent are phonetically distinct in careful pronunciation. Of course, the phonetic difference isn't necessarily just a simple voiced/voiceless contrast. Speakers typically pronounce /Q/ plus

18 Arisaka 1940:94, Kawakami 1977:90, Koo and Homma 1989, Nishimura 2006, Kawahara 2006:538-9.

Table 5-12 Mora obstruent before voiced obstruents

/moQbu/    モッブ    'mob'
/beQdo/    ベッド    'bed'
/baQgu/    バッグ    'bag'
/guQzu/    グッズ    'goods'
/baQǰi/    バッジ    'badge'

a voiced obstruent with an extra-long stop that's voiced only at the beginning and not all the way through, which isn't surprising given the inherent difficulty of long voiced stops.19 Using the IPA symbol [ ̚] for non-release (§2.5), baggu バッグ 'bag' is often pronounced [bag̚k:ɯ] instead of [bag::ɯ], but it's still distinct from bakku バック [bak::ɯ] 'background'. In spite of these troublesome phonetic details, the phonemic transcriptions in Table 5-12 seem to reflect native-speaker intuition.

In all the examples we've considered so far, /Q/ immediately precedes an obstruent, and it's tempting to infer that this is the only environment that allows /Q/. But there's a kind of emphatic pronunciation that native speakers feel involves /Q/ preceding a sonorant, with /Q/ realized as a long glottal stop: [ʔ:]. Typical examples are the adjectives /hayai/ 速い 'fast' and /samui/ 寒い 'cold' pronounced emphatically as [haʔ:jai] and [saʔ:mɯi]. The usual hiragana spellings for these pronunciations are はっやい and さっむい, with reduced-size っ (tsu) implying the phonemic forms /haQyai/ and /saQmui/. We also find /Q/ realized as a glottal stop utterance-finally in exclamations like Dame! [dameʔ] 'No!'20 Here too, the typical hiragana spelling だめっ, with reduced-size っ (tsu), implies a phonemic form containing /Q/: /dameQ/. I'll have more to say about treating [ʔ:] as an allophone of /Q/ in the course of a discussion of glottal stops in §8.3. At this point, the only phonotactic restrictions on /Q/ that seem to hold without exception are that /Q/ can't occur immediately before a vowel or at the beginning of a syllable.

I suggested above in §5.5 that it might be possible to describe the assimilation of /Q/ to a following segment as simple absorption of place and aperture. But we've seen that an obstruent immediately following /Q/ can be either voiced or voiceless and that /Q/ takes on this feature as well. Figure 5-9 depicts the assimilation of /Q/ to /s/ (as in a word like /čiQso/ [cɕis::o] 窒素 'nitrogen') by adding a link from the voicing, place, and aperture features of /s/ to /Q/. As before, a dashed line indicates the added link. Most of the examples we've looked at in this chapter don't cause any trouble for the feature-absorption

19 Kawakami 1977:90, Homma 1981:274-6, Kawahara 2006:540-4, 559-60.
20 A breve (˘) over a word-final vowel is occasionally used in romanization to indicate a glottal stop that appears immediately before a pause, as in the headword ă (hiragana spelling あっ) 'oh!' in Masuda 1974.

Figure 5-9 /Q/ Taking on voicing, place, and aperture of following /s/

Figure 5-10 /Q/ Taking on default place and aperture before pause

Figure 5-11 /Q/ with inherent place and aperture, replaced before /s/ (left) and retained before pause (right)

account in Figure 5-9, but utterance-final glottal stop does. It's similar to the problem we encountered with the utterance-final allophone [ɴ:] of /N/ (§5.2). One solution is to say that complete glottal closure is the default place and manner of articulation for /Q/. A glottal stop has to be voiceless, so we don't need to specify voicelessness as a default. Figure 5-10 illustrates, with ‖ representing a pause. An alternative solution, parallel to the alternative for /N/, is to say that /Q/ is inherently specified for a complete glottal closure (and therefore for voicelessness as an automatic consequence) rather than being unspecified for voicing, place, and aperture. On this assumption, assimilation involves replacing these features with those of the immediately following segment when there is one. In Figure 5-11, the link connecting /Q/ and its original place and aperture features is severed when /s/ follows, as indicated by the cut line, and a new link connects /Q/ to the voicing, place, and aperture features of /s/. When a pause follows /Q/,


the original link just remains. Either of these solutions works fine for /Q/ except when the immediately following segment is a sonorant, as in emphatic forms like /haQyai/ はっやい 'fast!' and /saQmui/ さっむい 'cold!', realized as [haʔ:jai] and [saʔ:mɯi]. We have to say that /Q/ either takes its default form (as in Figure 5-10) or retains its inherent form (as in Figure 5-11) before a sonorant as well as before a pause. In any case, describing /Q/ as basically a glottal stop has some real intuitive appeal, in spite of the fact that most of its phonetic realizations aren't glottal and many of them aren't stops. Native speakers seem to feel that a glottal stop is somehow the "unadulterated" form of /Q/.21
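The behavior of /Q/ just described can be sketched in the same spirit as the description of /N/ in §5.2. This is an illustrative sketch only: the names are hypothetical, the table covers just a few voiceless environments from Table 5-9, and the partly devoiced realizations before voiced obstruents are deliberately left out because the phonetic details are disputed.

```python
# Sketch of the realizations of the mora obstruent /Q/ discussed above.
# Names and coverage are illustrative assumptions, not a full analysis.

GLOTTAL_FINAL = "ʔ"    # utterance-final, as in /dameQ/
GLOTTAL_LONG = "ʔ:"    # before a sonorant in emphatic forms

# /Q/ takes the place and aperture of a following voiceless obstruent.
ABSORBED = {
    "p": "p:",
    "t": "t:",
    "k": "k:",
    "s": "s:",
    "c": "t:",  # /c/ begins with a stop closure, which /Q/ anticipates
}

SONORANTS = {"m", "n", "r", "y", "w"}


def realize_mora_obstruent(following):
    """Return a broad transcription of /Q/ before `following` (None = pause)."""
    if following is None:
        return GLOTTAL_FINAL
    if following in SONORANTS:
        return GLOTTAL_LONG
    if following in ABSORBED:
        return ABSORBED[following]
    raise ValueError(f"environment not covered by this sketch: {following!r}")


if __name__ == "__main__":
    for nxt in ["p", "c", "s", "y", None]:
        print(nxt, realize_mora_obstruent(nxt))
```

On either the default account or the inherent-specification account, the pause and sonorant branches are the ones that need a stipulated glottal value; the remaining branch is plain feature absorption.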

EXERCISES

1 Provide a broad phonetic transcription for each word.

bisshori びっしょり    jiman 自慢    dakkyū 脱臼    konsento コンセント
hannō 反応    nittei 日程    ippai 一杯    zenmetsu 全滅

2 Write each of the following sentences in ordinary Japanese orthography.

[wci;;iuioekijkoroto:1:J'bv, T~ v'o tl*li,AJ~~ Jiv >f;:.iJ~-:>-C v>Q0

t->wU~ti~= Anu~-:>-C~Qh o .::h~~hQ~EIM~~~Q~~-r~

4 Imagine turning on a radio in the middle of an English-language news broadcast and hearing the announcer say Police officers approached the residence/residents. Explain why, in the absence of any context, it would be hard to tell whether the last word in the sentence was residence or residents.

5 For each of the following allophones of /N/ supply an example word in which that allophone occurs. Also provide a broad phonetic transcription of the entire word in each case.

[n:] [31:]

21 Arisaka 1940:94.

6 For each of the following allophones of /Q/ supply an example word in which that allophone occurs. Also provide a broad phonetic transcription of the entire word in each case.

[s:] [k:] [p:] [t:] [c:] [c:] [h:] [ʔ]

7 In Exercise 5 in Chapter 3, I mentioned zūjago ズージャ語, a Japanese jazz musicians' argot (Tateishi 1989, Ito, Kitagawa, and Mester 1996). The rules for the argot aren't easy to describe, but one interesting case is the argot form of /paN/ パン 'bread', which is /NpaH/. The last mora of the ordinary word has been transposed to the beginning, and the vowel has been lengthened. Provide a broad phonetic transcription to show how you think the argot form is pronounced. Is the argot form phonotactically admissible?

8 Katakana spellings implying /Qr/ occasionally appear in foreign proper names. Examples include アッラー 'Allah' and チェッリーニ (for the surname of the 16th-century Italian artist Benvenuto Cellini). Consult with a native speaker of Tokyo Japanese to find out how these names are pronounced, and provide a broad phonetic transcription of each.

9 The form いたっ is common in Tokyo Japanese as a spontaneous response to sudden pain, much like ouch in English. Iwasaki (2006:334) calls forms like いたっ the "clipped" forms of adjectives (compare /itai/ 痛い 'painful'). In recent years, younger Tokyo speakers have started to use clipped adjective forms as colloquial exclamations: はやっ (compare /hayai/ 速い 'fast'), すごっ (compare /sugoi/ 'amazing'), etc. Provide broad phonetic transcriptions and phonemic transcriptions for these clipped forms. In terms of use/meaning, how do these clipped forms compare to the "tough-guy" forms ending in /eH/ (like /sugeH/) that I described in §4.6?

10 The English words line-up and clean-up were borrowed into Japanese baseball terminology as rain'appu ラインアップ and kurīn'appu クリーンアップ. But the word meaning 'line-up' can also be pronounced rainnappu ラインナップ, and kurīnnappu クリーンナップ now seems to be the preferred pronunciation for the word meaning 'clean-up'. This change in pronunciation is a kind of epenthesis (§2.12). On the other hand, the made-in-Japan compound bājon'appu バージョンアップ 'upgrade' doesn't seem to allow the alternative pronunciation bājonnappu バージョンナップ. Transcribe all these words phonemically. Is there a plausible phonetic explanation for why the sequence /N/V might tend to become /Nn/V? Is there any reason that the word meaning 'upgrade' would be unaffected? Can you think of any words that aren't recent borrowings but look as if an epenthetic /n/ might have appeared between /N/ and a vowel at some time in the past?

11 Akamatsu (2000:154-61), working in a theoretical framework that differs in many respects from the one adopted in this book, proposes an analysis of Tokyo Japanese syllable-final consonants that includes elements corresponding to our phonemes /N/ and /Q/ in some circumstances, but neither occurs immediately preceding a nasal or a pause. For example, making reasonable adjustments for the differences in framework, Akamatsu would transcribe kaban 鞄 'briefcase' as /kabaC/ and chinmoku 沈黙 'silence' as /čiCmoku/, where /C/ is distinct both from /N/ and from /Q/. What do you suppose the motivation would be for proposing this /C/? How do you think Akamatsu would transcribe ren'ai 恋愛 'love'? Try to argue against this analysis as persuasively as you can, taking both phonotactics and native-speaker intuition into consideration.

6 Syllables and moras

6.1 Syllables

I mentioned in §1.5 that syllables seem to be basic units of speech production and perception, but there's no satisfactory articulatory definition of a syllable.1 Auditory definitions are also problematic, and they usually rely on the notion of SONORITY, which means intrinsic loudness or audibility.2 We can rank speech sounds on a SONORITY SCALE as in Figure 6-1.3 Given this scale, the number of syllables in a word is supposed to be equal to the number of peaks of sonority. The diagrams in Figure 6-2 illustrate with English /fæntəsi/ fantasy and Japanese /yoHgiša/ 容疑者 'suspect'. The circled numbers above each sonority peak indicate that the English word and the Japanese word have three syllables each, and these results certainly seem right. In some cases, though, the sonority-peak criterion clearly doesn't work. As the diagrams in Figure 6-3 show, the one-syllable English word /step/ step has two peaks of sonority, and the two-syllable Japanese word /čie/ 知恵 'wisdom' has only one peak.

In any case, it seems beyond dispute that syllables are intuitively natural units for ordinary speakers. At the same time, English syllables are much more clearly natural to native speakers of English than Japanese syllables are to native speakers of Japanese. Children growing up in English-speaking communities seem to learn to count syllables as part of growing up, but children growing up in Japanese-speaking communities learn to count moras (§5.2, §5.5). There isn't even a colloquial Japanese word for referring to

1 Ladefoged 1967:11-25, Catford 1977:63-92, Ladefoged 1982:219-24, Laver 1994:113-4, Krakow 1999, Rogers 2000:267-8.
2 Jespersen 1928:452-3, Ladefoged 1982:221-3, Laver 1994:503-5, Kubozono and Honma 2002:9-10.
3 Whitney 1865, Rogers 2000:268-70.

Figure 6-1 Partial sonority scale (from high sonority to low sonority): low vowels, mid vowels, high vowels, semivowels, nasals, voiced fricatives, voiceless fricatives, voiced stops, voiceless stops

Figure 6-2 Peaks of sonority matching number of syllables

Figure 6-3 Mismatches between peaks of sonority and number of syllables

syllables.4 Later in this chapter (§§6.4-5), I'll offer what I think is persuasive evidence that syllables are psychologically real units in Tokyo Japanese, but the fact remains that it usually takes careful explanation to make the notion clear to a native speaker.

If two English speakers are native speakers of the same dialect, they usually agree on how many syllables there are in any particular word, although spelling can sometimes influence intuition. For example, it's not unusual for a person who pronounces the words royal and roil identically to say that royal

4 Linguists writing in Japanese nowadays use onsetsu 音節 to mean 'syllable', but non-linguists typically use this word to refer not to syllables but to moras (McCawley 1968:131, Kubozono and Honma 2002:18).

Figure 6-4 Japanese short syllable template: (C(/y/))V

has two syllables and roil has one. When it comes to the boundaries between English syllables, though, there's a lot of uncertainty, and many linguists accept the idea that some consonants can be AMBISYLLABIC, that is, simultaneously at the end of one syllable and at the beginning of the next.5 In contrast, as long as there's agreement on the number of syllables in a Tokyo Japanese word, there's no doubt about where the syllable boundaries are. The only real challenge in Japanese is deciding whether two adjacent non-identical vowels belong to a single syllable or to two separate syllables, as I mentioned in §2.9 and §3.3. We'll take up this problem below in §6.7.
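The sonority-peak criterion from earlier in this section is easy to try out. The sketch below is a rough illustration under several assumptions: the numeric ranks simply number the classes of Figure 6-1 from low to high, a long vowel counts as a single segment, an affricate is ranked with the voiceless stops, and all of the names are hypothetical.

```python
# Sketch of the sonority-peak criterion, using the classes of Figure 6-1.
# The numeric ranks and the segment-to-class table are assumptions.

SONORITY_RANK = {
    "voiceless stop": 1, "voiced stop": 2,
    "voiceless fricative": 3, "voiced fricative": 4,
    "nasal": 5, "semivowel": 6,
    "high vowel": 7, "mid vowel": 8, "low vowel": 9,
}

SEGMENT_CLASS = {
    "p": "voiceless stop", "t": "voiceless stop",
    "cɕ": "voiceless stop",  # affricate ranked with stops (assumption)
    "g": "voiced stop",
    "f": "voiceless fricative", "s": "voiceless fricative",
    "ɕ": "voiceless fricative",
    "n": "nasal", "j": "semivowel",
    "i": "high vowel",
    "e": "mid vowel", "ɛ": "mid vowel", "ə": "mid vowel", "o:": "mid vowel",
    "a": "low vowel", "æ": "low vowel",
}


def count_sonority_peaks(segments):
    """Count segments that are more sonorous than both of their neighbors."""
    ranks = [SONORITY_RANK[SEGMENT_CLASS[seg]] for seg in segments]
    peaks = 0
    for i, rank in enumerate(ranks):
        left = ranks[i - 1] if i > 0 else 0            # word edge ranks lowest
        right = ranks[i + 1] if i + 1 < len(ranks) else 0
        if rank > left and rank > right:
            peaks += 1
    return peaks


FANTASY = ["f", "æ", "n", "t", "ə", "s", "i"]   # English fantasy
YOOGISHA = ["j", "o:", "g", "i", "ɕ", "a"]      # Japanese /yoHgiša/
STEP = ["s", "t", "ɛ", "p"]                     # English step
CHIE = ["cɕ", "i", "e"]                         # Japanese /čie/

if __name__ == "__main__":
    for name, word in [("fantasy", FANTASY), ("yoHgiša", YOOGISHA),
                       ("step", STEP), ("čie", CHIE)]:
        print(name, count_sonority_peaks(word))
```

Under these assumptions the counts agree with the discussion above: three peaks each for fantasy and /yoHgiša/, but two for one-syllable step and only one for two-syllable /čie/.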

6.2 Moras

As I said in §5.2 and §5.5, Tokyo Japanese has short and long syllables. A short syllable consists of one mora while a long syllable consists of two moras. A mora is intuitively a unit of rhythm or timing, as the alternative term BEAT suggests.6 Native speakers do seem to feel that moras are ISOCHRONOUS, that is, that every mora has the same duration as every other mora at a given tempo, but experimental efforts to demonstrate even approximate isochrony of moras in Japanese haven't been successful.7 I'll say more about this notion of isochrony below in §6.3.

A Japanese short syllable has to contain a short vowel, and syllables that consist entirely of a short vowel are possible. Words like /e/ 絵 'picture' and /u/ 鵜 'cormorant' are examples of such minimal syllables. Other short syllables consist of a single consonant followed by a short vowel, as in /ha/ 歯 'tooth', or of a consonant followed by /y/ followed by a short vowel, as in /kyo/ 虚 'unpreparedness'. A convenient way to summarize the possibilities is to say that every Japanese short syllable fits the template in Figure 6-4. The parentheses in the template mean that the enclosed element is optional, that is, possible but not required. The V isn't enclosed in parentheses because it's required. The nested parentheses in (C(/y/)) mean that /y/ is the C in a syllable like /ya/. Nothing much turns on this, and (C)(/y/) would also work. Even though every Japanese short syllable fits the template in Figure 6-4, not every sequence of phonemes that fits the template is a possible short syllable.

5 Blevins 1995:232, Rogers 2000:92-3.
6 English beat is a translation of Japanese haku 拍.
7 Warner and Arai 2001.
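The short syllable template (C(/y/))V translates directly into a regular expression. The following is a minimal sketch, assuming a consonant inventory written in this book's phonemic symbols; the inventory string and the function name are illustrative assumptions rather than a definitive list.

```python
import re

# Sketch of the short syllable template (C(/y/))V as a regular expression.
# The consonant inventory below is an assumed, illustrative list.
CONSONANTS = "pbtdkgcčǰsšzfhmnrwy"

# Optional consonant, optionally followed by /y/, then one required short vowel.
SHORT_SYLLABLE = re.compile(rf"(?:[{CONSONANTS}]y?)?[aiueo]")


def fits_short_template(syllable):
    """Return True if the phoneme string fits the template (C(/y/))V."""
    return SHORT_SYLLABLE.fullmatch(syllable) is not None


if __name__ == "__main__":
    for candidate in ["e", "ha", "kyo", "ya", "hu", "koH"]:
        print(candidate, fits_short_template(candidate))
```

Like the template itself, the pattern overgenerates: a sequence such as /hu/ fits it even though it's phonotactically prohibited, so a fuller model would need a separate filter for the excluded sequences.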

Figure 6-5 Japanese short/long syllable template: (C(/y/))V({V, /H/, /N/, /Q/})

Figure 6-6 Japanese syllable and mora structure (σ = syllable, μ = mora)
/koHryo/:    σ[μ(ko) μ(H)] σ[μ(ryo)]
/ikeN/:      σ[μ(i)] σ[μ(ke) μ(N)]
/zeQtaiči/:  σ[μ(ze) μ(Q)] σ[μ(ta) μ(i)] σ[μ(či)]

As we saw in Chapter 4, some sequences are phonotactically prohibited (/hu/, /cy/V, and so on).

A Japanese long syllable begins like a short syllable and ends with another vowel (as in /kai/ 会 'meeting'), the vowel-length phoneme /H/ (§3.2; as in /hoH/ 方 'way'), the mora nasal /N/ (§5.2; as in /seN/ 線 'line'), or the mora obstruent /Q/ (§5.4; as in the first syllable /buQ/ of /buQka/ 物価 'prices'). We can expand the template in Figure 6-4 as in Figure 6-5. The braces in the template mean that any one of the enclosed elements (V, /H/, /N/, or /Q/) can occur as the second mora of a long syllable. The fact that the braces are enclosed in parentheses means that a second mora is possible but not required, which is just another way of saying that a syllable can be either long or short.

Assuming this analysis of short syllables as one mora and long syllables as two moras, Figure 6-6 shows a common way of representing Japanese syllable and mora structure, using the words /koHryo/ 考慮 'consideration', /ikeN/ 意見 'opinion', and /zeQtaici/ 絶対値 'absolute value' as examples. Each σ stands for a syllable, each µ stands for a mora, and the lines show which moras each syllable includes and which segments each mora includes. The short syllables in Figure 6-6 illustrate all three possible types: V (the /i/ in /ikeN/), CV (the /ci/ in /zeQtaici/), and C/y/V (the /ryo/ in /koHryo/). The long syllables in Figure 6-6 illustrate all four possible types of second mora: V (the first /i/ in /zeQtaici/), /H/ (in /koHryo/), /N/ (in /ikeN/), and /Q/ (in /zeQtaici/).

The nucleus (§1.10) of a syllable (also called the PEAK) is the portion with the highest sonority. It's almost always a vowel, although syllabic consonants

Figure 6-7 Onset, nucleus, and coda (O = onset, N = nucleus, C = coda):
English /flæks/: O /fl/, N /æ/, C /ks/
Japanese /pyoN/: O /py/, N /o/, C /N/

Figure 6-8 English syllable constituents (σ = syllable, O = onset, R = rhyme, N = nucleus, C = coda):
/ʃrɪmp/: O /ʃr/, R = N /ɪ/ + C /mp/
/sno/: O /sn/, R = N /o/
/ækt/: R = N /æ/ + C /kt/

(§1.10) are possible in many languages, and word-initial /N/ is syllabic in Japanese examples like /NjuHoku/ 'several billion' (§5.3). Notice, incidentally, that the templates in Figures 6-4 and 6-5 don't provide for this marginal possibility. Every syllable has to have a nucleus, and some have only a nucleus, as in English /ɔ/ awe and Japanese /i/ 胃 'stomach'. Any consonants before the nucleus in a syllable are the ONSET, and any consonants after the nucleus are the CODA. Figure 6-7 illustrates with English /flæks/ flax and Japanese /pyoN/ ぴょん 'hop', using the abbreviations O for onset, N for nucleus, and C for coda. A syllable with no coda is called an OPEN SYLLABLE, and a syllable with a coda is called a CLOSED SYLLABLE. When a syllable ends with a diphthong or a long vowel, we can describe it as having a COMPLEX NUCLEUS, so English /tɔɪ/ toy and Japanese /tai/ 鯛 'sea bream' are open syllables.

In English and many other languages, it seems clear that in a closed syllable the nucleus and the coda together form a CONSTITUENT, that is, a part that can function as a unit. This constituent is usually called the RHYME (often spelled RIME).8 In an open syllable, the rhyme is just the nucleus. The name comes from the fact that one-syllable words rhyme with each other if they're identical after the onset, that is, if they have the same nucleus and (in the case of closed syllables) the same coda but different onsets. For example, English /ænd/ and with no onset, /hænd/ hand with the onset /h/, /blænd/ bland with the onset /bl/, and /strænd/ strand with the onset /str/ all rhyme with each other. English-speaking children seem to learn the concept of rhyming effortlessly at a very young age, and of course rhyming plays a role in much of English poetry. Figure 6-8 shows the standard way of representing the internal structure of

8 Fudge 1987, Blevins 1995:212-6, Rogers 2000:88-9, Kubozono and Honma 2002:43-5.

Figure 6-9 Splitting at the onset-rhyme boundary:
Bob Sloan → Slob Bone
breakfast + lunch → brunch

English syllables, using /ʃrɪmp/ shrimp, /sno/ snow, and /ækt/ act as examples. Notice that the nucleus and the coda form a unit in the closed syllables (shrimp and act).

We looked at some English speech errors in §2.9, and it's clear that the onset-rhyme boundary isn't the only possible place where splits can occur, but it seems to be the favorite place.9 An attested example is throat cutting pronounced mistakenly as coat thrutting, with the onset /k/ of the first syllable of cutting and the onset /θr/ of throat having changed places.10 The onset-rhyme boundary also seems to be the favorite place for splitting in deliberate transpositions for the sake of humor and in words coined by BLENDING, that is, by combining the beginning of one word with the end of another. For example, it's not hard to imagine teasing a person named Bob Sloan by calling him Slob Bone, and the word brunch is a blend of breakfast and lunch. Both these examples involve splitting at the onset-rhyme boundary, as Figure 6-9 shows.

In Japanese, on the other hand, rhyming doesn't play a role in poetry, and the nucleus and coda in a closed syllable don't seem to form a unit. Instead, it's moras (as in Figure 6-6) that are the constituents of syllables.11 When a Japanese syllable has a complex nucleus, the nucleus is divided across two moras, just as the nucleus and coda in a closed syllable are divided. Figure 6-10 illustrates with /koN/ 紺 'navy blue' and /kyoH/ 今日 'today'. In Japanese speech errors, the favorite place for splitting long syllables is the boundary between moras.12 For example, in one attested anticipation error, a speaker who meant to say /paHseNto/ パーセント 'percent' pronounced /paNseNto/ instead, replacing /H/ (the second mora of the first syllable) with /N/, that is, anticipating the second mora of the second syllable.13 Another attested error is the blend /doNde/, produced by a speaker who presumably

9 Kubozono 1989:266.

10 Fromkin 1971:32.

11 Kubozono 1989:266, Terao 2002:96, Kubozono and Honma 2002:58-9.

12 Kubozono 1989:254.

13 Kubozono 1989:255.

Figure 6-10 Japanese syllable constituents (µ = mora):
/koN/: µ(ko) µ(N)
/kyoH/: µ(kyo) µ(H)

Figure 6-11 5-7-5 meter in a senryū:
/ha.N^ni.N^o/ 'In the detectives' room
/o^mo^ca^no^yo.H^na/ Treating the offender
/ke.i^ji^si^cu/ Like a play-thing.'

intended either /doHsite/ どうして 'why' or /naNde/ 何で 'how come'.14 The blend involves splitting at the boundary between the two moras of the first syllable in both words and combining the /do/ of /doHsite/ with the /Nde/ of /naNde/.

6.3 Mora timing

The psychological reality of moras in Japanese is beyond dispute. Native speakers of Japanese learn to count moras as small children, and moras are the metrical units of traditional Japanese poetry. For example, just like a haiku, a kind of humorous poem called a senryū 川柳 normally consists of a five-mora first line, a seven-mora second line, and a five-mora third line, for a total of seventeen moras. The poem in Figure 6-11 illustrates, using carets to mark boundaries between syllables and dots to mark boundaries between moras that are in the same long syllable.15 The portions of particular interest in this poem are the long syllables /haN/ and /niN/ in the first line, /yoH/ in the second line, and /kei/ in the third line. The poem would have the pattern 3-6-4 if syllables were the units of meter, but it has the prescribed 5-7-5 pattern if we count moras.

It's often suggested that every language can be categorized as belonging to one of a small number of rhythmic types. According to this typology, a language has SYLLABLE-TIMED rhythm if the overall tendency is for syllables to

14 Kubozono 1989:274.

15 Blyth (1949:116) cites the poem in Figure 6-11 and provides the English translation.

Figure 6-12 Stress-timing and feet in English: This is the | road to | town.

be isochronous, and French is commonly cited as an example of a syllable-timed language. In contrast, in a language with STRESS-TIMED rhythm, the overall tendency is for the intervals between stressed syllables (§1.7) to be isochronous, and this unit of isochrony is traditionally called a FOOT.16 English is categorized as a stress-timed language, and as the example in Figure 6-12 shows, the number of syllables in each foot varies. The acute accents mark stressed syllables, and the vertical lines mark foot boundaries. As for Japanese, given that it has two-mora long syllables and one-mora short syllables, and given also that the moras are (even very approximately) isochronous, it can't be a syllable-timed language. It can't be stress-timed either. As I mentioned in §1.7, Japanese accent involves pitch rather than stress, and in any case, there's no tendency for accented syllables (§6.4, §7.2) to occur at regular intervals. Instead, Japanese is usually said to have MORA-TIMED rhythm.17

As I mentioned above in §6.2, though, experimental efforts haven't succeeded in demonstrating the phonetic reality of Japanese moras as isochronous timing units. Of course, no one seriously maintains that every mora has precisely the same duration as every other mora. Instead, the idea is that we should see compensation effects in Japanese that make average mora durations more nearly equal than we'd expect from the inherent durations of the segments involved.18 I won't go into the details, but the evidence for compensation is weak, and the claims of different studies contradict each other.19 Experimental studies have also failed to detect anything close to isochrony of the relevant units in languages categorized as syllable-timed or stress-timed.20 Even if these rhythmic categories do reflect real properties of languages, they're no more than tendencies, and some researchers have argued that the purported isochrony of the relevant units in each category is just an illusion caused by the relative proportion of vowels and consonants in utterances.21 Skilled recitation of poetry probably comes closest to the isochronous ideal, certainly closer than spontaneous speech, with all its hesitations and dysfluencies.
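The syllable and mora counts given above for the senryū in Figure 6-11 can be checked mechanically. The Python sketch below is only an illustration. It assumes an ASCII rendering of the transcription in which ^ marks a syllable boundary and a dot marks a boundary between two moras of the same long syllable, and the names POEM, count_syllables, and count_moras are hypothetical.

```python
# Senryū lines, one string per line of the poem.
# Assumed notation: "^" = syllable boundary, "." = mora boundary
# inside a long syllable.
POEM = [
    "ha.N^ni.N^o",
    "o^mo^ca^no^yo.H^na",
    "ke.i^ji^si^cu",
]


def count_syllables(line):
    """Count the syllables in one transcribed line."""
    return len(line.split("^"))


def count_moras(line):
    """Count the moras: each syllable contributes one mora per dot-separated part."""
    return sum(len(syllable.split(".")) for syllable in line.split("^"))


if __name__ == "__main__":
    print([count_syllables(line) for line in POEM])
    print([count_moras(line) for line in POEM])
```

Counting syllables gives the pattern 3-6-4, while counting moras gives the prescribed 5-7-5.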

16 Pike 1945:35, Abercrombie 1967:96-8, Catford 1977:85-92.

17 Ladefoged 1982:226, Rogers 2000:270-1, Kubozono and Honma 2002:20-1.

18 Han 1962a, Homma 1981, Vance 1987:70-2, Sato 1990.

19 Beckman 1982, Hoequist 1983, Warner and Arai 2001.

20 Dauer 1983, Nespor 1990:160-1.

21 Ramus, Nespor, and Mehler 1999.

Table 6-1 Accent locations in noun+/wa/ phrases

/ma↓^ku^ra^wa/    'pillow TOP'
/ta^ma↓^go^wa/    'egg TOP'
/a^ta^ma↓^wa/     'head TOP'

Table 6-2 Accent patterns on nouns consisting of two short syllables

/ha↓^si^wa/    'chopsticks TOP'
/ha^si↓^wa/    'bridge TOP'
/ha^si^wa/     'edge TOP'

Table 6-3 Accent patterns on words consisting of two long syllables

/se↓.N^ni.N^wa/    'thousand people TOP'
/se.N^ni↓.N^wa/    'hermit TOP'
/se.N^ni.N^wa/     'seniority TOP'

6.4 Syllables, moras, and accent

I mentioned just above in §6.3 that Japanese has a pitch accent system (§1.7), and as we'll see in §7.2, the essential part of the accent pattern on an accented Japanese word is the location of a pitch fall, that is, a change from a high pitch to a low pitch. We won't go into the details here, but the examples in Table 6-1 illustrate, with a downward-pointing arrow marking the location of the pitch fall in each case. Each example is a short phrase consisting of a noun followed by the topic marker /wa/ は, and the syllable just before the arrow is called the ACCENTED SYLLABLE.

As we'll see in §7.3, a noun consisting of n short syllables (and therefore n moras) can have any of n + 1 possible accent patterns: the pitch fall can come after any syllable, or there may be no fall at all. The examples in Table 6-2 illustrate the three possibilities for two-syllable nouns of this type. In a noun consisting of n long syllables (and therefore 2n moras), there are still only n + 1 possible accent patterns, not 2n + 1. The examples in Table 6-3 illustrate the three possibilities for nouns with two long syllables (and four moras). The downward arrow appears between the two moras of an accented long syllable because, according to traditional descriptions, the first mora of such a syllable is high pitched and the second mora is low pitched. In fact, the

Table 6-4 Accent on city names

/ku^re↓^si/          呉市      'Kure City'
/a^ki^ta↓^si/        秋田市    'Akita City'
/ku^ma^mo^to↓^si/    熊本市    'Kumamoto City'
/se.N^da↓.i^si/      仙台市    'Sendai City'
/ni.Q^ko↓.H^si/      日光市    'Nikko City'
/mu^ro^ra↓.N^si/     室蘭市    'Muroran City'

Figure 6-13 Pitch track of /yo^bi^co↓.H^sa/ (horizontal axis: Time (ms), 0 to 800)

fundamental frequency (§1.12) typically falls smoothly from the beginning to the end of a long accented syllable, as the pitch track (§1.12) in Figure 6-13 shows.22 The word /yo^bi^co↓.H^sa/ 予備調査 'pilot survey' is accented on the long syllable /coH/, but the pitch track in Figure 6-13 doesn't show an abrupt transition from a high-pitched first mora to a low-pitched second mora in /coH/. The important point here is that when two moras are in the same long syllable, there's no possible contrast between a pitch fall after the first mora and a pitch fall after the second mora. In other words, regardless of whether a syllable is long or short, it provides only one potential accent site.23

Many accentual regularities are easy to describe once we understand that syllables rather than moras are the accent-bearing units of Tokyo Japanese. An example is the accentuation of city names ending with the morpheme /si/ 市 'city'. As Table 6-4 shows, a word of this form is accented on the syllable immediately before /si/, regardless of whether that syllable is short or long. When the syllable immediately before /si/ is long, the downward arrow appears in the traditional position between the two moras of that syllable in Table 6-4.
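The point that each syllable provides only one potential accent site, so that a noun with n syllables has n + 1 possible accent patterns however many moras it contains, can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not a description of any established tool: it uses an ASCII notation in which ^ separates syllables and a dot separates the two moras of a long syllable, it writes the arrow after the first mora of the accented syllable (the traditional convention), and the function names are hypothetical.

```python
ARROW = "↓"


def mark_accent(word, index):
    """Mark the pitch fall on syllable number `index` (0-based).

    Passing index=None returns the unaccented pattern.
    The arrow is written after the first mora of the accented syllable.
    """
    syllables = word.split("^")
    if index is not None:
        moras = syllables[index].split(".")
        moras[0] += ARROW
        syllables[index] = ".".join(moras)
    return "^".join(syllables)


def accent_patterns(word):
    """List every possible pattern: one per syllable, plus the unaccented one."""
    n = len(word.split("^"))
    return [mark_accent(word, i) for i in range(n)] + [mark_accent(word, None)]


if __name__ == "__main__":
    for noun in ("ha^si", "se.N^ni.N"):
        print(noun, accent_patterns(noun))
```

Both the two-mora noun and the four-mora noun yield three patterns, matching the three rows in each of Tables 6-2 and 6-3.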

22 Figure 6-13 is based on a token produced by an adult female native of Tokyo. A small number of obvious pitch-tracking errors (none in the /oH/ portion) have been erased from the display for the sake of clarity.

23 McCawley 1968:59, Kubozono and Honma 2002:37-8.

Table 6-5 Examples of default accent on syllable containing third mora from end

⑤④③②①        /ma^mi^[mu↓]^me^mo/        a column of the kana syllabary
③②①            /[pa↓]^ja^ma/               'pajamas'
⑤④③②①        /pa.i^[ro↓.Q]^to/           'pilot'
⑤④③②①        /ho.H^[mu↓]^ra.N/           'home run'
⑥⑤④③②①      /e^re^[be↓.H]^ta.H/         'elevator'
⑦⑥⑤④③②①    /ko.N^pu^[re↓.Q]^ku^su/     'complex'

On the other hand, there are accentual regularities that involve moras as well as syllables. The most important is that the third mora from the end of a sequence seems to specify a kind of default location for accent; strings of meaningless syllables and recently borrowed words (the vast majority of which are nouns) tend to be accented on the syllable that contains the third mora from the end.24 Notice that this description of default accent refers both to moras and to syllables. The examples in Table 6-5 illustrate, with the moras numbered from the end and the accented syllable bracketed in each case. The accented syllable is the third syllable from the end in the first two examples and in the last example in Table 6-5, but it's the second syllable from the end in the other three examples. As I've already mentioned, traditional descriptions mark the pitch fall in an accented long syllable between the two moras, so when the third mora from the end happens to be the second mora of a long syllable, the arrow in the transcription appears between the third and fourth moras from the end. In all the examples in Table 6-5, the accented syllable doesn't correspond to the accented (that is, stressed) syllable in English. The last example is based on the English noun complex and denotes a psychological complex.

Incidentally, when a recent loanword is too short to have a third mora from the end, the default location for accent is as far from the end as possible, which means the initial syllable. In many cases, the initial syllable is the only syllable. The two-mora words /pu↓^ro/ プロ 'pro' (two short syllables) and /pe↓.N/ ペン 'pen' (one long syllable) are typical examples. I should also point out that many

24 McCawley 1968:133-4, Kubozono 1989:250, Kubozono and Honma 2002:36-8.


recent loanwords are exceptions to the default pattern. Most of these have an accented syllable that corresponds to the accented syllable in the source language, as in /gu^re↓.H/ グレー 'gray' (from English gray) and /po^ri^e↓^su^te^ru/ ポリエステル 'polyester' (from English polyester).25 In many cases, of course, the default pattern and the source language accent produce the same result, as in /bi↓^de^o/ ビデオ (from English video) and /ko.N^ku^ri↓.H^to/ コンクリート (from English concrete). On the other hand, there are certain syllable-structure patterns that seem to favor an accent location other than the default.26 For example, four-mora noun loanwords that consist of two short syllables followed by one long syllable tend to be accented on the third syllable from the end (the syllable preceding the default location), as in /do↓^ra^go.N/ ドラゴン 'dragon' (from English dragon).
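The default accent rule described above (accent the syllable that contains the third mora from the end, or the initial syllable if the word has fewer than three moras) is easy to state as a procedure. The Python sketch below is a minimal illustration that assumes an ASCII notation with ^ between syllables and a dot between the moras of a long syllable; the function name default_accent is hypothetical. It models only the default, not the exceptions that follow the source-language accent or the special syllable-structure patterns such as the one in 'dragon'.

```python
ARROW = "↓"


def default_accent(word):
    """Place the default accent on a syllabified, unaccented word."""
    syllables = [s.split(".") for s in word.split("^")]
    # For every mora, record the index of the syllable that contains it.
    owner = [i for i, moras in enumerate(syllables) for _ in moras]
    # Third mora from the end, or the initial syllable in shorter words.
    target = owner[-3] if len(owner) >= 3 else 0
    # The arrow is written after the first mora of the accented syllable.
    syllables[target][0] += ARROW
    return "^".join(".".join(moras) for moras in syllables)


if __name__ == "__main__":
    for w in ("pa^ja^ma", "pa.i^ro.Q^to", "e^re^be.H^ta.H", "pe.N"):
        print(w, "->", default_accent(w))
```

Applied to /do^ra^go.N/, this procedure picks the second syllable, which is exactly the default location that the attested form /do↓^ra^go.N/ departs from.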

Figure 6-14

Figure 6-15

6.5 Words and music

It's common to cite poetic practice as evidence for the psychological reality of rhythmic units in a language, but as we saw in §6.3, the metrical conventions of traditional Japanese poetry involve counting moras and not syllables. To find a similar kind of evidence for the psychological reality of syllables in Japanese, we need to look at the conventions for setting words to music. The basic principle is that a long syllable sounds equally good assigned to a single note or assigned to two notes, with each mora on a note of its own. I'll use some well-known children's songs as examples.

I'll start by looking at the problem that the mora obstruent /Q/ causes for matching words to melodies. As we saw in §§5.1-3, /Q/ is usually realized phonetically as a voiceless obstruent, and when a voiceless stop immediately follows, what corresponds to /Q/ acoustically is silence. Needless to say, it isn't possible to sing silence, so it's somewhat surprising to English speakers that Japanese songwriters can assign /Q/ to a note of its own. When they do, a singer has to "cheat" by copying the vowel of the mora immediately preceding /Q/. For example, if the word /na.Q^ta/ なった 'became' is assigned to three notes, a singer produces [nɑ]-[ɑtː]-[tɑ], singing /Q/ as [ɑtː]. Also, vowel rearticulation (§3.2) normally occurs between a vowel and its copy, so /naQta/ sung on three notes sounds like [nɑ*ɑtːɑ]. On the other hand, songwriters can also assign an entire long syllable ending with /Q/ to a single note. If the same word /na.Q^ta/ is assigned to two notes, a singer produces [nɑtː]-[tɑ], singing /naQ/ as [nɑtː]. What's important here is that both assignments seem equally felicitous. Figure 6-14 (from "Teruteru-bozu" 「てるてる坊主」 'Sunshine Charm

25 McCawley 1968:134, Giriko 2006:3-4.

26 Kubozono 2006:42-7.

Figure 6-16 Syllable ending with

Figure 6-17 Two short syllables assigned to one note

Figure 6-18 Notational revision providing one note for each syllable

type, that is, examples involving a single note spread over two or more syllables. The reason is that a single note can always be replaced by two shorter notes on the same pitch to preserve the melody and the rhythm. Nonetheless, some songs are published in a form that implies a single note matched with two syllables, often because one verse has a single syllable at the same point where another verse has two syllables. The example in Figure 6-17 (from "Urashima Taro" 「浦島太郎」 [the name of a fairy-tale character]) illustrates.30 The two short syllables /hime/ of the word /o^to^hi^me/ 乙姫 'young princess' are assigned to a single note in the second verse. It may be that the notation in Figure 6-17 reflects some sort of intuition on the part of the songwriter or the publisher, but the second verse would be sung exactly the same way if /hime/ were treated as in Figure 6-18. This notation makes /hime/ just like /oto/ and /sama/ by assigning it to a dotted eighth note and a sixteenth note instead of to one quarter note. When native speakers hum this song, they hum once for /si/ in the first verse, but they hum twice for /hime/ in the second verse. This difference suggests that the revised version of the second verse in Figure 6-18 is a more accurate representation than the published version in Figure 6-17.

Before moving on to consider other types of long syllables, I should acknowledge that there's an undeniable element of subjectivity in describing certain word-melody correspondences as felicitous and others as infelicitous. Even so, native speakers of a language seem to have quite clear intuitions about what kinds of correspondences are ideal. Of course, it isn't realistic to imagine

30 "Urashima Taro" 「浦島太郎」: words and music anonymous, 1911 (Nobarasha Henshubu 1985:44-5).

Figure 6-20

'Found a Little Autumn') shows a long syllable on two notes.32 The /N/ in this line is in the last syllable of the word /o^ni^sa.N/ 鬼さん 'it (in a game of tag)', and this /N/ is assigned to a note of its own. Figure 6-20 (from the same song) shows a long syllable on one note. The /N/ in this line is in the last syllable of the word /da^re^ka^sa.N/ 誰かさん 'someone', and this /N/ is assigned together with the preceding mora /sa/ to a single note. Unlike a syllable ending in /Q/, though, there's no surefire way to tell whether a syllable ending in /N/ is sung on a single note or on two notes with the same pitch. As we saw, when a syllable ending in /Q/ is on two notes, the vowel is copied and vowel rearticulation appears. The syllable /saN/ in Figure 6-20 is sung [sɑŋː], with a velar nasal because of the following /g/. If it were on two notes, it would be sung [sɑ]-[ŋː], not [sɑ]-[ɑŋː] (that is, not [sɑ*ɑŋː]). Maybe the transition from vowel to nasal is more abrupt in the two-note case [sɑ]-[ŋː] than in the one-note case [sɑŋː],

31 Kubozono 1999a.

32 "Chīsai aki mitsuketa" 「小さい秋見つけた」

with the first three moras all in the same syllable.

We can categorize possible extra-long syllables into the four basic types shown in Figure 6-23. In type 1, a diphthong or long vowel is followed by /Q/, as in /ha↓.i.Q^ta/ 入った 'entered' and /to↓.H.Q^ta/ 通った 'passed'. In type 2, which occurs only in recent loanwords, a diphthong or long vowel is followed by /N/, as in /wa↓.i.N/ ワイン 'wine' and /ro↓.H.N/ ローン 'loan'. In type 3, a short vowel is followed by /NQ/, as in /ni^ho.N.Q^po↓.i/ 日本っぽい 'Japanesy'. In type 4, a long vowel is followed by /i/, as in /o.H^ru^do^bo↓.H.i/ オールドボーイ 'old boy'. The diagrams in Figure 6-24 show the syllable structures I'm suggesting for some of these words.

35 Kindaichi 1963:113-4, 1967b:70-1.

36 English special mora is a translation of tokushu-mora 特殊モーラ (more commonly tokushu-haku 特殊拍 'special beat'), and dependent mora is a translation of hijiritsu-mora 非自立モーラ.

37 Hattori 1958:361, Vance 1987:72-3.

Figure 6-23 Extra-long syllable types:
Type 1: (C(/y/))V{V, /H/}/Q/
Type 2: (C(/y/))V{V, /H/}/N/
Type 3: (C(/y/))V/NQ/
Type 4: (C(/y/))V/Hi/

Figure 6-24 Moras grouped into extra-long syllables (σ = syllable, µ = mora):
/ha↓.i.Q^ta/ σ[µ(ha) µ(i) µ(Q)] σ[µ(ta)]
/ro↓.H.N/ σ[µ(ro) µ(H) µ(N)]
/o.H^ru^do^bo↓.H.i/ σ[µ(o) µ(H)] σ[µ(ru)] σ[µ(do)] σ[µ(bo) µ(H) µ(i)]
/ni^ho.N.Q^po↓.i/ σ[µ(ni)] σ[µ(ho) µ(N) µ(Q)] σ[µ(po) µ(i)]

There's room for doubt about some of these examples. How can we be sure that /ha↓.i.Q^ta/ begins with the extra-long syllable /ha.i.Q/ instead of with the two syllables /ha^i.Q/? And how can we tell that /oHrudobo↓Hi/ ends with the extra-long syllable /bo.H.i/ instead of with the two syllables /bo.H^i/? In the case of /to↓.H.Q^ta/, there's no problem; it's clear that the first three moras don't form two syllables. The two-syllable sequence /to.H^Q/ can't be right because the mora obstruent /Q/ never occurs at the beginning of a syllable (§5.6), and the two-syllable sequence /to^o.Q/ can't be right because we don't have the vowel rearticulation that we'd expect between adjacent identical vowels in separate syllables (§3.2). There's no vowel rearticulation in the /boHi/ of /oHrudobo↓Hi/ either, so we know that it contains the long vowel /oH/ and not the vowel sequence /oo/, but there's nothing wrong with /i/ being a syllable on its own. The /haiQ/ of /ha↓.i.Q^ta/ can't be /ha.i^Q/ because /Q/ can't be at the beginning of a syllable, but there's nothing wrong with the two-syllable sequence /ha^i.Q/. I've already mentioned that it's often hard to determine whether two adjacent non-identical vowels belong to the same syllable or to separate syllables (§2.10, §3.3), and we'll look at this problem more carefully just below in §6.7.

At normal conversational tempos, there does seem to be a tendency to reduce extra-long syllables to ordinary long syllables, with V/HQ/ becoming V/Q/, V/HN/ becoming V/N/, and /NQ/ becoming /N/. But in careful pronunciation (§2.12), speakers clearly distinguish /to↓.H.Q^ta/ 通った 'passed' from /to↓.Q^ta/ 取った 'took', /si↓.H.N/ シーン 'scene' from /si↓.N/ 芯 'core', and /ko↓.N^te/ コンテ 'script' from /ko↓.N.Q^te/ 紺って (a noun meaning 'navy blue' followed by the colloquial quotative particle /Qte/).
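The templates in Figures 6-4, 6-5, and 6-23, together with the conversational-tempo reductions just described, can be sketched as regular expressions in Python. This is only an illustration under stated assumptions: the consonant inventory in C is a hypothetical ASCII stand-in, syllables are written without accent arrows or boundary marks, and the names classify and reduce_extra_long are invented for the example. As with the templates themselves, a string can fit a pattern without being a possible syllable.

```python
import re

V = "[aiueo]"
C = "[pbtdkgcjszfhmnrwy]"   # assumed inventory, for illustration only
ONSET = f"(?:{C}y?)?"        # the optional (C(/y/)) part of the templates

TEMPLATES = [
    ("short", f"{ONSET}{V}"),
    ("long", f"{ONSET}{V}(?:{V}|H|N|Q)"),
    ("extra-long type 1", f"{ONSET}{V}(?:{V}|H)Q"),
    ("extra-long type 2", f"{ONSET}{V}(?:{V}|H)N"),
    ("extra-long type 3", f"{ONSET}{V}NQ"),
    ("extra-long type 4", f"{ONSET}{V}Hi"),
]


def classify(syllable):
    """Return the name of the template the syllable fits, or None."""
    for name, pattern in TEMPLATES:
        if re.fullmatch(pattern, syllable):
            return name
    return None


def reduce_extra_long(syllable):
    """Apply the reductions V/HQ/ > V/Q/, V/HN/ > V/N/, /NQ/ > /N/."""
    for extra, plain in (("HQ", "Q"), ("HN", "N"), ("NQ", "N")):
        if syllable.endswith(extra):
            return syllable[: -len(extra)] + plain
    return syllable


if __name__ == "__main__":
    for s in ("ryo", "koH", "haiQ", "roHN", "hoNQ", "boHi", "N"):
        print(s, classify(s), reduce_extra_long(s))
```

The last item, a syllabic /N/ on its own, fits none of the patterns, which mirrors the remark in §6.2 that the templates don't provide for that marginal possibility.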

6.7 Vowel-vowel sequences

In this section I'll use V1V2 to stand for a sequence of two non-identical short vowels. Since Tokyo Japanese has five short vowel phonemes, there are twenty possible sequences: each of the five vowels as V1 followed by each of the other four as V2. We saw in §3.3 that all twenty V1V2 sequences occur in modern Tokyo Japanese, although some are quite rare unless the two vowels are in separate morphemes. As I've mentioned more than once already (§2.10, §3.3, §6.6), it's not always easy to decide whether V2 is in the same syllable as V1 or in a separate syllable, but as long as V2 isn't /i/ or /u/, it seems safe to say that there's a syllable division between V1 and V2. To give just a few examples, it's intuitively clear that we have /a^e/ in /ma↓e/ 前 'front', /o^e/ in /ko↓e/ 声 'voice', /u^e/ in /cu↓e/ 杖 'cane', and /i^e/ in /ri↓eki/ 利益 'profit'. The only uncertainty about these examples is whether there's a glide between the two vowels in some cases: /y/ when V1 is /i/ or /e/, /w/ when V1 is /u/ or /o/. The example of /ia/ that I cited in Table 3-8 (§3.3) was the recent loanword /gia/ ギア 'gear', but even standard dictionaries recognize the alternative katakana spelling ギヤ (gi-ya), which implies /giya/.38 Except in very careful pronunciation, it's hard to tell /i/V from /iy/V and /u/V from /uw/V, and only slightly easier to tell /e/V from /ey/V and /o/V from /ow/V.39

As we saw in §3.3, when V1 is /a/ and V2 is /i/, they tend to form a diphthong, even when they're in different morphemes. For example (with plus signs marking the relevant morpheme divisions), /naNta↓i+saN/ 男体山 'Mt. Nantai' and /oHta↓+isaN/ 太田胃散 'Ota stomach powder' are both three-syllable words, with /ai/ forming a diphthong in the second syllable /tai/: /na.N^ta↓.i^sa.N/, /o.H^ta↓.i^sa.N/.

But there are complications. For one thing, if an /ai/ sequence is immediately followed by /N/, treating /ai/ as a diphthong would result in an extra-long syllable ending with /a.i.N/. I suggested above in §6.6 that extra-long syllables of this form are possible, as in /sa↓.i.N/ サイン 'signature', but if the vowels /a/ and /i/ are in different morphemes, there seems to be a preference for treating /iN/ as a separate syllable, as in /ɕa↓+iN/ 社員 'company employee', which seems to be two syllables: /ɕa↓^i.N/. But I haven't made a convincing case that words like /sa↓iN/ and /wa↓iN/ ワイン 'wine', with no morpheme break between /a/ and /i/, are really single extra-long syllables. The intuitively plausible distinction between /a.i.N/ in /sa↓iN/ and /a^i.N/ in /ɕa↓+iN/ could be just an illusion.

Another complication is that intuitions about syllabification seem to be influenced by differences in pitch accent. Intuition seems to favor treating

38 Matsumura 1995 (Daijirin), Shinmura 1998 (Kojien).

39 Martin 1975:734.


the sequence /ai/ as a diphthong when our transcription convention for accent puts the downward-pointing arrow between /a/ and /i/. For example, the transcription of the word /ka↓i+ro/ 回路 'circuit' is compatible both with the two-syllable analysis /ka↓.i^ro/ and with the three-syllable analysis /ka↓^i^ro/. In /ka↓.i^ro/, the initial long syllable /ka.i/ bears the accent, and in /ka↓^i^ro/, the initial short syllable /ka/ bears the accent, but the actual pitch pattern is the same either way: a smooth fall from a high pitch at the beginning of /a/ to a low pitch at the end of /i/ (§6.4). Even so, the two-syllable analysis /ka↓.i^ro/ seems preferable. But when the accent is located somewhere else, the intuition that /ai/ is a diphthong seems weaker. Compare /doruga↓ikoH/ ドル外交 'dollar diplomacy' with /gaiko↓HkaN/ 外交官 'diplomat'. Could it be that /gai/ is a long syllable /ga.i/ in the first word and two short syllables /ga^i/ in the second? I don't know of any satisfying way to resolve the uncertainty in cases like this, but I'll come back to this problem later in this section after we've looked at other V1V2 sequences.

Before moving on, I should take another look at the examples I cited earlier in this section with /e/ as V2: /ma↓e/ 前 'front', /ko↓e/ 声 'voice', /cu↓e/ 杖 'cane', /ri↓eki/ 利益 'profit'. Notice that all of these are transcribed with the downward-pointing arrow between the two vowels in the V1V2 sequence. But V2 is /e/ rather than /i/, and intuition doesn't seem to favor treating any of these cases as diphthongs.

When the V1V2 sequence is /ei/, we almost always seem to have a diphthong, typically realized as [eː] except in very careful speech (§3.3). Even when a morpheme division separates the two vowels, as in /me↓+iɕa/ 目医者 'eye doctor', there often isn't a clear syllable division between /e/ and /i/ (§3.3). On the other hand, a few vocabulary items contain a sequence /ei/ that isn't realized as [eː], except perhaps in very sloppy pronunciation. Two such items are /e↓i/ エイ 'ray (fish)' and /re↓i/ レイ 'lei', which are normally pronounced [ei] and [rei].40 In contrast, /re↓i/ 例 'example' is typically pronounced [reː]. We can account for the difference by analyzing /re↓i/ 'example' as /re↓.i/, with one long syllable, and /e↓i/ 'ray' and /re↓i/ 'lei' as /e↓^i/ and /re↓^i/, with two short syllables each. Notice, though, that all three words have the accent location that I said favors the diphthong intuition in the case of /ai/.

The sequences /oi/ and /ui/ seem to be like /ai/. Words like /ko↓i/ 鯉 'carp' and /ku↓i/ 杭 'post', with the accent location favoring the diphthong intuition, are probably single long syllables: /ko↓.i/ and /ku↓.i/. A morpheme break doesn't seem to make any difference, so /ko↓+i/ 故意 'intention' and /ko↓+i/ 濃い 'dense' (with the suffix /i/ marking the nonpast affirmative; §7.5) are probably both homonyms of /ko↓i/ 'carp', that is, the single long syllable /ko↓.i/. And

40 See the entries in Kindaichi and Akinaga 2001.

/kuꜜ+i/ 句意 'phrase meaning' is probably a homonym of /kuꜜi/ 'post': /kuꜜ.i/. As with /ai/, the diphthong intuition seems weaker when the accent location is somewhere else, as in /koi+noꜜbori/ 鯉幟 'carp streamer' and /kui+uciꜜ+ki/ 杭打ち機 'pile driver'. The word /koꜜiN/ コイン 'coin' might be a single extra-long syllable (/koꜜ.i.N/), but like /saꜜiN/ サイン 'signature' and /waꜜiN/ ワイン 'wine', it also might be a short syllable followed by a long syllable (/koꜜ^i.N/).

When V2 in a V1V2 sequence is /u/ rather than /i/, it seems less likely to form a diphthong with V1. There's little doubt that the vowels in /iu/ and /eu/ are always in separate syllables. These sequences are rare within morphemes, but there are proper names like /puꜜriusu/ プリウス 'Prius' and /zeꜜusu/ ゼウス 'Zeus'. The second of these two examples has the accent location that I said favors the diphthong intuition, but there seems to be a syllable boundary between V1 and V2 in both: /puꜜ^ri^u^su/, /zeꜜ^u^su/. Needless to say, when the two vowels straddle a morpheme division, as in /ko+moriꜜ+uta/ 子守歌 'lullaby', the two-syllable intuition is just as unambiguous.

As we saw in §3.3, the sequence /ou/ is very rare within morphemes. When a speaker pronounces Souru ソウル 'Seoul' as /soꜜuru/ (rather than /soꜜHru/), it's hard to tell whether the two contiguous vowels form a diphthong or not, but I'm going to say that /ou/ is like /iu/ and /eu/, that is, that the two vowels are always in separate syllables. When there's a morpheme division between /o/ and /u/, as in /yamatoꜜ+uta/ 大和歌 'Japanese poem', the two-syllable intuition is stronger. There are also a few verbs with an affirmative nonpast-tense form that ends in /ou/, including /soroꜜu/ 揃う 'gather'. We could say that there's a morpheme division between /o/ and /u/ in such examples, although attempts to analyze Japanese verb forms into morphemes raise difficult problems that are beyond the scope of this book (§3.3).⁴¹ In any case, it seems clear that /o/ and /u/ are in separate syllables in such forms: /so^roꜜ^u/.

In some cases, it's absolutely clear from the accent pattern that the two vowels in a V1V2 sequence are in separate syllables. As we saw in §6.4, a long syllable provides only a single potential site for accent, and an accented long syllable is traditionally described as having a high pitch on the first mora and a low pitch on the second mora. If there's a high pitch on the second vowel in a V1V2 sequence and a low pitch on the following mora, then V2 has to be in a separate syllable from V1. For example, the accent pattern on /tameiꜜki/ ため息 'sigh' makes it clear that /i/ constitutes a syllable by itself: /ta^me^iꜜ^ki/.

The sequence /au/ can occur within a morpheme, as in /aꜜuto/ アウト 'out', and across a morpheme division, as in /funaꜜ+uta/ 舟歌 'boatman's song', both with the accent location that I've been saying favors the diphthong intuition.
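The accent-based argument illustrated with 'sigh' can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function name `forced_syllable_breaks` is hypothetical, transcriptions are plain strings with the five short vowels written a, i, u, e, o, and the character ꜜ stands in for the downward-pointing arrow. The check simply looks for a V1V2 sequence whose V2 is immediately followed by the arrow, which is the configuration described above as forcing a syllable boundary.

```python
VOWELS = set("aiueo")
ACCENT = "ꜜ"  # assumed stand-in for the downward-pointing accent arrow


def forced_syllable_breaks(transcription):
    """Return the (V1, V2) pairs whose V2 is directly followed by the accent arrow.

    An arrow right after V2 means V2 is high and the next mora is low, so V2
    cannot be the second mora of a long syllable that begins with V1.
    """
    # Drop the slashes and morpheme-boundary plus signs; keep segments and arrow.
    segments = transcription.strip("/").replace("+", "")
    breaks = []
    for v1, v2, mark in zip(segments, segments[1:], segments[2:]):
        if v1 in VOWELS and v2 in VOWELS and mark == ACCENT:
            breaks.append((v1, v2))
    return breaks


print(forced_syllable_breaks("/tameiꜜki/"))  # the arrow follows V2
print(forced_syllable_breaks("/kaꜜi+ro/"))   # the arrow falls between V1 and V2
```

A word with the arrow between V1 and V2 yields an empty list, which matches the point that such transcriptions are compatible with either syllabification.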

41 Vance 1987:175-208, 1991, Klafehn 2003.

Table 6-6 Accent on diphthongs in city names

/mu.N^baꜜ.i^ʃi/     ムンバイ市   'Mumbai City'
/ha^noꜜ.i^ʃi/       ハノイ市     'Hanoi City'
/ku^ra^kaꜜ.u^ʃi/    クラカウ市   'Krakau (Cracow) City'
/bi^saꜜ.u^ʃi/       ビサウ市     'Bissau City'
/ma^na^gu^aꜜ^ʃi/    マナグア市   'Managua City'
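The city-name pattern from Table 6-4 (§6.4), accent on the syllable immediately preceding the morpheme for 'city', can be written out as a small Python sketch. Everything here is an illustrative assumption rather than an established tool: the function names are hypothetical, each name stem is supplied already divided into syllables (lists of moras), ꜜ stands in for the accent arrow, ^ marks syllable boundaries, a period marks syllable-internal mora boundaries, and ʃi is the assumed spelling of the 'city' morpheme. Following the transcription convention for accented long syllables, the arrow goes after the first mora of the accented syllable.

```python
ACCENT = "ꜜ"  # assumed stand-in for the downward-pointing accent arrow


def join_syllable(moras, accented=False):
    """Join the moras of one syllable with periods, marking accent after mora 1."""
    head, rest = moras[0], moras[1:]
    text = head + (ACCENT if accented else "")
    for mora in rest:
        text += "." + mora
    return text


def city_name(stem_syllables, suffix="ʃi"):
    """Accent the last syllable of the stem, then add the 'city' morpheme."""
    parts = [join_syllable(s) for s in stem_syllables[:-1]]
    parts.append(join_syllable(stem_syllables[-1], accented=True))
    parts.append(suffix)
    return "/" + "^".join(parts) + "/"


# The outcome depends entirely on the syllabification that is fed in.
print(city_name([["mu", "N"], ["ba", "i"]]))      # /ai/ treated as a diphthong
print(city_name([["ma"], ["na"], ["gu"], ["a"]]))  # /u/ and /a/ in separate syllables
```

The sketch makes the dependence explicit: the same rule yields different accent locations for a name like 'Fukui' depending on whether its last two vowels are entered as one syllable or two.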

Despite the accent, the intuition that /a/ and /u/ are in separate syllables is quite strong when they straddle a morpheme division. The two-syllable intuition also holds for verb forms like /haraꜜu/ 払う 'pay', where there's some doubt about whether there's a morpheme division between /a/ and /u/. It may well be that /a/ and /u/ are in separate syllables even within a morpheme, as in /aꜜuto/ and /haꜜusu/ ハウス 'greenhouse', and the two-syllable intuition seems stronger when the accent location is somewhere else, as in /auto+koꜜHsu/ アウトコース 'outside lane' and /hausu+saꜜibai/ ハウス栽培 'greenhouse cultivation'.

On the other hand, the accentuation of city names indicates that some /au/ sequences must be diphthongs. We saw in Table 6-4 (§6.4) that a city name ending in /ʃi/ 市 'city' is accented on the syllable immediately preceding /ʃi/, and Tokyo speakers treat /au/ just like /ai/ and /oi/ before /ʃi/. The examples in Table 6-6 illustrate. In the last example in Table 6-6, the sequence immediately preceding /ʃi/ is /ua/, which we don't expect to form a diphthong, and the accent makes it clear that /a/ is a syllable on its own. Incidentally, many Tokyo speakers seem to prefer /fu^kuꜜ.i^ʃi/ for 福井市 'Fukui City', and this form is consistent with the suggestion I made above that /ui/ is typically treated as a diphthong. But pronunciation dictionaries give /fu^ku^iꜜ^ʃi/, with /u/ and /i/ in separate syllables.⁴²

In the word /boHru+kaꜜuNto/ ボールカウント '(ball-and-strike) count',


the sequence /kauN/ might be a single extra-long syllable (/ka.u.N/), but it also might be a short syllable followed by a long syllable (/ka^u.N/). This example is like /saꜜiN/ サイン 'signature' in that I don't know of any satisfying way to resolve the uncertainty about syllabification. In contrast, in a case like /saiꜜN+peN/ サインペン 'marker pen', it's clear from the accent pattern that V2 and the immediately following mora nasal form the long syllable /iN/, separate from /sa/: /sa^iꜜ.N^pe.N/. Since the first morpheme in /saiꜜN+peN/ is the same as the only morpheme in /saꜜiN/, it may be tempting to see the syllabification of /sa^iꜜ.N^pe.N/ as an indication that the word /saꜜiN/ must be a short syllable followed by a long syllable (/saꜜ^i.N/) rather than a single extra-long syllable

42 NHK 1998, Kindaichi and Akinaga 2001.

(/saꜜ.i.N/). But this conclusion doesn't follow unless we assume that the syllabification of a V1V2 sequence within a morpheme is an inherent property of that morpheme and can't just be predicted from where the two relevant moras are and what the accent pattern is in a particular word. It would be nice if we could do without this assumption. After all, in most Tokyo Japanese words there aren't any V1V2 sequences or potential extra-long syllables to worry about, and syllabification really is completely predictable. But remember my earlier suggestion about /reꜜ.i/ 例 'example' and /reꜜ^i/ レイ 'lei'. I used an inherent difference in syllabification to explain why /reꜜ.i/ is [reː] but /reꜜ^i/ is [rei] at a moderate conversational tempo. This explanation wouldn't work if the syllabification of /reꜜi/ were predictable.

We've already seen that it's hard to tell whether /saꜜiN/ サイン 'signature' is a single extra-long syllable or a short syllable followed by a long syllable. Interestingly, we don't have the same problem with /supeꜜiN/ スペイン 'Spain'. The /ei/ sequence in this word isn't ordinarily realized as [eː], so /peiN/ isn't a single extra-long syllable; it's a short syllable followed by a long syllable: /su^peꜜ^i.N/. In the case of /supeiꜜN+jiN/ スペイン人 'Spaniard', the accent location makes it obvious that there's a syllable boundary between /e/ and /i/: /su^pe^iꜜ.N^ji.N/.

As I mentioned above in §6.6, in words like /oHrudoboꜜHi/ オールドボーイ 'old boy' and /pureHboꜜHi/ プレーボーイ 'playboy' it's hard to tell whether /boHi/ is a single extra-long syllable (/bo.H.i/) or a long syllable followed by a short syllable (/bo.H^i/). Company names ending with the morpheme /ʃa/ 社 'company' show the same pattern as the city names in Table 6-4, namely, accent on the syllable immediately preceding /ʃa/, as in /se^ki^ju.H^jiꜜ^+ʃa/ 赤十字社 'the Red Cross'. When the syllable right before /ʃa/ is long, we get the expected form, as in /ni^raꜜ.i^+ʃa/ ニライ社 'the Nirai Company'. Now compare /pureiboHiꜜ+ʃa/ プレイボーイ社 'the Playboy Company'. The accent location makes it clear that /i/ is a separate syllable from /boH/, since /i/ bears the accent: /pu^re.i^bo.H^iꜜ^+ʃa/. (The katakana spelling implies that the proper noun Pureibōi プレイボーイ has /ei/ where the common noun purēbōi プレーボーイ has /eH/, but this possible difference in phonemic form is beside the point here.) As I've already mentioned, though, it doesn't necessarily follow that the /i/ in the word /pureHboꜜHi/ is a syllable on its own.

It's tempting to eliminate the uncertainty by claiming that V1V2/N/ sequences always have a syllable break between the two vowels and that V/H/V sequences always have a syllable break between the long vowel and the short vowel. There don't seem to be any examples that unambiguously require us to analyze such sequences as belonging to single extra-long syllables. As we've seen, in a case like /supeꜜiN/


スペイン 'Spain' the stability of the [ei] realization provides independent evidence that /e/ and /i/ are in separate syllables. But in cases like /saꜜiN/ サイン 'signature' and /pureHboꜜHi/, we don't have the same kind of independent evidence, and intuitions are unclear. I mentioned in §6.1 that native speakers of English are usually, but not always, sure about how many syllables there are in a particular word, and there's no reason to expect that native speakers of Japanese will always be sure either. I'll consider the same kind of syllabification problems briefly again in §7.2, §7.4, §7.5, and §8.3.

EXERCISES

1

Transcribe each word phonemically. Mark each word-internal syllable boundary with a caret and each syllable-internal mora boundary with a period.

chokkei 直径
kaatsu 加圧
toronbōn トロンボーン
gimukyōiku 義務教育
myakuhaku 脈拍
henshūbu 編集部
okashii おかしい
jakkan 若干
tatemono 建物

2

The basic rule for hyphenation in English text is that a hyphen has to go between syllables. Why isn't English hyphenation a trivially easy task? How about hyphenation of romanized Japanese?

3

In §5.3 I mentioned forms like /NjuQsai/ ン十歳 'several decades old', and in Exercise 7 in Chapter 5 I mentioned that /paN/ パン 'bread' becomes /NpaH/ in zūjago ズージャ語, a Japanese jazz musicians' argot (Tateishi 1989, Ito, Kitagawa, and Mester 1996). Draw diagrams like those in Figure 6-6 to show the syllables and moras in these forms. Can the syllable template in Figure 6-5 accommodate these forms?

4

Many languages seem to have what's called a WORD MINIMALITY CONSTRAINT, that is, a requirement that content words (nouns, verbs, adjectives, adverbs) have a minimum length. Tokyo Japanese allows one-mora nouns like /ha/ 歯 'tooth', but the normal pronunciations of certain kinds of list-like items suggest that there's a preference for content words to be at least two moras long (Ito 1990, Kubozono and Honma 2002:74). Carefully describe the pronunciation of the three items below and explain how they might be used as evidence for a weak word minimality constraint in Tokyo Japanese.

765-4321 (as a telephone number)
火木土 (an abbreviation meaning 'Tuesdays, Thursdays, and Saturdays')
子丑寅卯辰巳午未申酉戌亥 (the signs of the Chinese zodiac in traditional order)

5

Analyzing Tokyo Japanese long vowels as V1V1 (that is, as two identical vowels in sequence; §3.2) would allow us to simplify the syllable template in Figure 6-5. Explain how.

6

According to research on stuttering by native speakers of Japanese, the typical pattern is to repeat the initial mora of an intended word (Ujihira and Kubozono 1994, Kubozono 1999b:39, Kubozono and Honma 2002:59). For example, when a stutterer has trouble with the word /ko.N^pa/ コンパ 'party', the pronunciation that results is usually described as something like /ko ko ko ko koNpa/, with the initial mora repeated several times. In contrast, the typical stuttering pattern for native speakers of English is usually described as repetition of the initial consonant of an intended word, as in /k k k k kon/ for /kon/ cone. Do you think this difference has any relevance to proposed syllable-structure differences between Japanese and English?

7

Kubozono (1999a:247) reports that there are 588 long syllables in his sample of 100 twentieth-century songs, and that the proportion of long syllables assigned to a single note is 64% (50/78) for those with the second mora /Q/, 49% (101/205) for those with the second mora /N/, 30% (41/137) for those with the second mora /H/, and 6% (10/168) for those with the second mora /i/. How are these four phonemes realized phonetically? Do these realizations suggest any sort of explanation for the observed differences in proportion?

8

The Japanese names of several American states and Canadian provinces are listed below. Transcribe each phonemically and mark the accent location with a downward-pointing arrow. In each case, specify the syllable immediately preceding the morpheme /ʃuH/ 州 'province, state'.

Irinoi-shū イリノイ州
Oregon-shū オレゴン州
Ontario-shū オンタリオ州
Kebekku-shū ケベック州
Jōjia-shū ジョージア州
Teneshī-shū テネシー州
Hawai-shū ハワイ州
Arizona-shū アリゾナ州

9

The Japanese names of many chemical elements have katakana spellings that imply word-final /iumu/, as in natoriumu ナトリウム 'sodium' and karushiumu カルシウム 'calcium'. But these words tend to be pronounced with /yuH/ or /uH/ instead of /iu/ (Kindaichi and Akinaga 2001:28 [front matter]): /natoryuHmu/, /karuʃuHmu/. Taking the following additional examples into account, try to formulate a generalization that predicts when /yuH/ occurs and when /uH/ occurs.

aruminiumu アルミニウム 'aluminum'
itterubiumu イッテルビウム 'ytterbium'
irijiumu イリジウム 'iridium'
sutoronchiumu ストロンチウム 'strontium'
kadomiumu カドミウム 'cadmium'
seshiumu セシウム 'cesium'
heriumu ヘリウム 'helium'

According to Jorden and Noda (1987:323), there is a general tendency for the vowel sequence /iu/ to be replaced by /yuH/ or /uH/ at an ordinary conversational tempo. Find a few relevant examples and test whether this proposed tendency seems to hold true. In addition to whatever examples you manage to find, consider whether denkiunagi 電気鰻 'electric eel' can be pronounced with /yuH/ instead of /iu/. How about Puriusu プリウス 'Prius'?

10

Transcribe each word phonemically. Mark each word-internal syllable boundary with a caret and each syllable-internal mora boundary with a period, and mark the accent location with a downward-pointing arrow.

hiroizumu ヒロイズム
menrui 麺類
kai 会
mukuiru 報いる
kai 下位
oiru オイル
kōn コーン
kurumaisu 車椅子
uguisu 鶯
waingurasu ワイングラス

11

Sino-Japanese morphemes of the form (C)V1V2 are usually taken to be monosyllables (Kubozono and Honma 2002:5-6). In modern Tokyo Japanese, what V1V2 sequences actually occur within Sino-Japanese morphemes? Consult a dictionary that provides the kana spellings that were in use before the 1946 reform, and note the old spellings for /koH/

[…] 'come from Tokyo'.³⁵ On the other hand, we could say that /kara/ itself is unaccented but that an accent appears between /kara/ and a following particle or copula form. If so, we don't need to say anything special about noun + /kara/ + verb combinations.

As it turns out, all the particles that I tentatively transcribed above with no accent behave like /kara/ から 'from'. As the examples in Table 7-14 show, when one of these particles forms an accent phrase with an unaccented noun and a phrase-final particle or copula form, an accent appears between the last two words.³⁶ These examples raise the same problem that I considered in connection with /kara/. One solution is to say that each particle has an accent that disappears in accent phrases of the form noun + particle + verb, as in /koHeN de tabeꜜru/ 公園で食べる 'eat in the park'. The other solution is to say that the particles themselves are unaccented but that an accent appears between a particle and a following particle or copula form.

Before moving on to verbs, I should point out that some noun + particle combinations have an idiosyncratic accent location. For example, the noun /iꜜma/ 今 'now' has initial accent, and we'd expect that accent to be retained when this noun combines with the particle /maꜜde/ まで 'until' (as in Table 7-12), but the combination is pronounced /ima maꜜde/. Also, a noun or noun + particle combination used as a ritual expression can have an accent

34 McCawley 1968:139, Tanaka and Kubozono 1999:85-6.
35 Martin 1975:21.
36 Martin 1975:21, Tanaka and Kubozono 1999:85-6.


Accent and intonation

location that differs from what we'd expect given the accent location of the noun in other circumstances. For example, the noun /tadaꜜima/ 只今 'right now' and the noun + particle combinations /koꜜNnici wa/ 今日は 'the present day TOP' and /koꜜNbaN wa/ 今晩は 'this evening TOP' are accented as shown, but dictionaries list the corresponding ritual expressions as final-accented: /tadaimaꜜ/ ただいま 'I'm home', /koNniciwaꜜ/ こんにちは 'good day', /koNbaNwaꜜ/ こんばんは 'good evening'.³⁷ These ritual expressions don't combine with anything else in their normal uses, and as we saw in §7.2, if an accent phrase ends in a short vowel, there's no difference between final accent and no accent in terms of the pitch pattern. The reason dictionaries list the ritual expressions as final-accented is that they can be followed by a particle in phrases like konnichiwa mo iwanai de こんにちはも言わないで 'without even saying good day', and in these cases the ritual expression and the following particle form a single accent phrase with an accent immediately before the particle, as in /koNniciwaꜜ mo/.

Finally, I should note that there's a great deal of variation among Tokyo speakers as to the accent location of many individual nouns.³⁸ To give just two examples, both /kuꜜma/ and /kumaꜜ/ are possible for 熊 'bear', and both /deꜜNʃa/ and unaccented /deNʃa/ are possible for 電車 'electric train'.³⁹ In many cases, a particular speaker will accept either form as correct but use one or the other consistently.⁴⁰ But in other cases, a single speaker will use both forms, sometimes with a strong preference for one or the other in a particular collocation.
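The point about final accent and no accent can be made concrete with a toy pitch model in Python. This is a deliberately simplified sketch, not the analysis given in §7.2: it assumes the common textbook description in which the first mora of an accent phrase is low unless it bears the accent, every later mora up to the accented one is high, and every mora after the accent is low. The function name and the mora counts passed to it are assumptions for illustration; the greeting is counted as five moras and the particle as one more.

```python
def pitch_pattern(n_moras, accent=None):
    """Return a string of H/L labels, one per mora, for a single accent phrase.

    accent is the 1-based position of the mora right before the fall,
    or None for an unaccented phrase (simplified textbook model).
    """
    labels = []
    for position in range(1, n_moras + 1):
        if accent is not None and position > accent:
            labels.append("L")   # everything after the accent is low
        elif position == 1 and accent != 1:
            labels.append("L")   # initial low unless the first mora is accented
        else:
            labels.append("H")
    return "".join(labels)


# Five-mora greeting in isolation: final-accented vs. unaccented.
print(pitch_pattern(5, accent=5), pitch_pattern(5))
# The same phrase plus a one-mora particle: now the two analyses differ.
print(pitch_pattern(6, accent=5), pitch_pattern(6))
```

Under these assumptions the two analyses give identical labels in isolation and diverge only on the added particle, which is why a following particle is needed to detect the final accent.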

7.4 Verb accent

We saw in §7.3 that an accent phrase consisting entirely of a noun can have an accentual fall after any syllable or nowhere. When it comes to an accent phrase consisting entirely of a verb form, the possibilities are much more limited. We'll look first at the DICTIONARY FORMS of verbs, that is, the affirmative nonpast-tense forms that are used as headwords for dictionary entries. As the examples in Table 7-15 show, no matter how many syllables there are in a dictionary form, there seem to be only two possibilities: accented or unaccented.⁴¹ These examples suggest that if the dictionary form of a verb is accented, the accent is on the second syllable from the end. For the most part, this generalization is correct, and it applies regardless of whether the second-to-last syllable is short

37 Matsumura 1995, NHK 1998, Kindaichi and Akinaga 2001.
38 Akamatsu 2000:270-3.
39 See the entries in NHK 1998 and in Kindaichi and Akinaga 2001.
40 Akamatsu 2000:274.
41 Martin 1952:33, McCawley 1968:142, Akinaga 1998:191-4, Tanaka and Kubozono 1999:80.

Table 7-15 Accented and unaccented dictionary forms of verbs

/naꜜru/ 成る 'become'            /naru/ 鳴る 'sound'
/toꜜHru/ 通る 'pass'             /koHru/ 凍る 'freeze'
/hareꜜru/ 晴れる 'clear up'      /hareru/ 腫れる 'swell'
/gaNbaꜜru/ 頑張る 'persevere'    /kaNjiru/ 感じる 'feel'
/ʃirabeꜜru/ 調べる 'investigate' /kuraberu/ 比べる 'compare'

Table 7-16 Dictionary forms of verbs ending with V/u/

/cigau/ 違う 'differ'     /haraꜜu/ 払う 'pay'
/hirou/ 拾う 'pick up'    /omoꜜu/ 思う 'think'
/sukuu/ 救う 'rescue'     /nuꜜu/ 縫う 'sew'

or long. The examples in the second row of Table 7-15 both have the long vowel /oH/ in the second-to-last syllable. From here on, I'll save a little space by calling a verb with an accented dictionary form an ACCENTED VERB and a verb with an unaccented dictionary form an UNACCENTED VERB.

Many verbs have a dictionary form that ends with a VV sequence. In all the actual words of this type, the first vowel in the VV sequence is /a/, /o/, or /u/, and the second is /u/. Table 7-16 gives some examples. The examples in the bottom row of Table 7-16 are transcribed with a sequence of two identical short vowels at the end, and as we saw in §3.2, this transcription implies that the two vowels are in separate syllables with vowel rearticulation between them in careful pronunciation. In ordinary conversation, verb forms that end this way are often pronounced with a final long vowel rather than with a sequence of two short vowels. To illustrate with examples in the table, the verb meaning 'rescue' can be pronounced [sɯkɯː], and the verb meaning 'sew' can be pronounced [nɯː].⁴² We saw in §3.3 that the vowel sequence /ou/ is very rare unless the two vowels are in separate morphemes, but several verbs have dictionary forms that end in /ou/, including the two in the middle row of Table 7-16. This /ou/ sequence doesn't become [oː] except perhaps in quite sloppy pronunciation. For all verb forms like those in the table, whether or not there's a morpheme division between the next-to-last vowel and the final /u/ is a difficult question that we won't try to resolve here (§3.3, §6.7).⁴³ It's also hard to tell for sure whether the final /au/ and /ou/ sequences in such forms are in two separate syllables or in the same syllable, forming diphthongs

42 Hasegawa 1979:127, Kindaichi and Akinaga 2001:24 (front matter).
43 Vance 1987:175-208, 1991, Klafehn 2003.

(§1.8). The generalization I suggested above is that the dictionary form of an accented verb has an accent on the second syllable from the end, but the relevant syllable could just as easily be the syllable containing the second mora from the end. This alternative generalization makes the same prediction in every case we've looked at so far, regardless of whether or not final /au/ and /ou/ sequences are in the same syllable.

On the other hand, combinations of a dictionary form followed by /no/ の are compatible with the claim that a final /u/ in such a form is in a separate syllable from the immediately preceding vowel. Regardless of whether /no/ functions grammatically as an indefinite pronoun or as a nominalizer, it combines with a preceding dictionary form into a single accent phrase.⁴⁴ For example, /ociꜜru no/ 落ちるの can mean 'ones that fall' (with /no/ as a pronoun) or 'falling; that SUBJECT falls' (with /no/ as a nominalizer). When the verb is accented, as /ociꜜru/ is, the combination is accented on the same syllable as the dictionary form on its own, but when the verb is unaccented, the combination is accented on the syllable right before /no/. Compare the unaccented dictionary form /agaru/ 上がる 'rise' and the combination /agaruꜜ no/. Returning to dictionary forms that end /au/ or /ou/, in the case of accented verbs like /kaꜜu/ 飼う 'keep', the combination (/kaꜜu no/ for this verb) doesn't tell us whether the two vowels are in separate syllables. But in the case of unaccented verbs like /kau/ 買う 'buy', the accentual pitch fall in the combination always appears between /u/ and /no/ (as in /kauꜜ no/ for this verb), and this accent location makes it clear that /u/ is a syllable on its own (§6.7).

When the dictionary form of a verb has a VV sequence just before its final syllable, it's usually clear whether or not the two vowels in the sequence are in the same syllable. Table 7-17 lists several of the relevant forms.
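The behavior of a dictionary form combined with the pronoun or nominalizer can be restated as a short Python sketch. The function name is hypothetical, ꜜ is an assumed stand-in for the accent arrow, and the sketch assumes that the verb form ends in a short syllable, so that an accent on its last syllable can be written as an arrow at the very end of the form.

```python
ACCENT = "ꜜ"  # assumed stand-in for the downward-pointing accent arrow


def combine_with_no(verb_form):
    """Combine a dictionary form with /no/ into one accent phrase.

    An accented verb keeps its own accent; an unaccented verb gets an
    accent on its last syllable, that is, right before /no/.
    """
    body = verb_form.strip("/")
    if ACCENT in body:
        return f"/{body} no/"
    # Assumes the final syllable is short, so the arrow goes at the end.
    return f"/{body}{ACCENT} no/"


for form in ["/ociꜜru/", "/agaru/", "/kaꜜu/", "/kau/"]:
    print(form, "->", combine_with_no(form))
```

For the unaccented verb meaning 'buy', the arrow lands after the final /u/, which is the configuration that shows /u/ to be a syllable on its own.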
Since Japanese has five contrasting short vowels (§3.1), twenty-five different VV sequences are possible, but only twelve are listed in Table 7-17. Several others could be added if we use verbs with a morpheme division between the two vowels. For example, the sequence /eo/ appears right before the final syllable in /sueoꜜku/ 据え置く 'leave as is', with an obvious division between /e/ and /o/ (compare /sueru/ 据える 'set' and /oku/ 置く 'place'). There don't seem to be any relevant words with /aa/, /au/, /ou/, or /ua/ as the VV sequence. In /ʃiiꜜru/ 強いる 'force' and in /yosooꜜu/ 装う 'put on', the two vowels in the VV sequence are identical, and as we saw in §3.2 and again just above in connection with Table 7-16, the two vowels in such cases are in separate syllables, and vowel rearticulation normally occurs between them. It's also clear from the location of the accent in these two forms that the second vowel in the VV sequence is a syllable on its own. If there were a long vowel instead of two short vowels, the next-to-last

44 Martin 1975:851-3, Jorden and Noda 1987:243, Kindaichi and Akinaga 2001:75 (appendix).

Table 7-17 Dictionary forms of verbs ending VV(C)/u/

VV     Unaccented verb                 Accented verb
/ii/   -                               /ʃiiꜜru/ 強いる 'force'
/ie/   /kieru/ 消える 'disappear'      /hieꜜru/ 冷える 'get cold'
/io/   -                               /nioꜜu/ 匂う 'smell'
/ai/   -                               /haꜜiru/ 入る 'enter'
/ae/   /kaeru/ 変える 'change'         /haeꜜru/ 生える 'grow'
/ao/   /kaoru/ 香る 'smell sweet'      /naoꜜsu/ 直す 'fix'
/oi/   -                               /oiꜜru/ 老いる 'grow old'
/oe/   /moeru/ 燃える 'burn'           /hoeꜜru/ 吠える 'bark'
/oo/   -                               /yosooꜜu/ 装う 'put on'
/ui/   -                               /kuiꜜru/ 悔いる 'regret'
/ue/   /ueru/ 植える 'plant'           /ueꜜru/ 飢える 'starve'
/uo/   -                               /uruoꜜu/ 潤う 'get moist'

Table 7-18 Dictionary forms of accented verbs ending /ae/C/u/ or /oe/C/u/

/haeꜜru/ 生える 'grow'           /kotaeꜜru/ ~ /kotaꜜeru/ 答える 'answer'
/hoeꜜru/ 吠える 'bark'           /kaNgaeꜜru/ ~ /kaNgaꜜeru/ 考える 'think'
/kaꜜeru/ 帰る 'return home'      /otoroeꜜru/ ~ /otoroꜜeru/ 衰える 'deteriorate'
/kaꜜesu/ 返す 'give back'        /totonoeꜜru/ ~ /totonoꜜeru/ 整える 'prepare'

syllable would be long, and it would bear the accent, as in /toꜜHru/ 通る 'pass' in Table 7-15.

Notice that except for /haꜜiru/ 入る 'enter', all the accented verbs in Table 7-17 have a dictionary form transcribed with the downward arrow symbol immediately before the final syllable. The obvious explanation for /haꜜiru/ is to say that /hai/ is a long syllable (and that /ai/ is a diphthong) in this word.⁴⁵ If so, the accent appears where our general principle predicts: on the second-to-last syllable. The only other noncompound verb with the sequence /ai/ in the same position in its dictionary form is accented /maꜜiru/ 参る 'go; come' (humble), and the same explanation works.

As it turns out, though, other verbs with dictionary forms that end VV(C)/u/ look like clear-cut exceptions to the generalization that the dictionary form of an accented verb always has the accent on its second-to-last syllable. Table 7-18 includes some of the problematic examples, and all of them have either /ae/ or /oe/ as the VV sequence. Those on the right can be accented on either the

45 Martin 1952:33, Tanaka and Kubozono 1999:80.

second or the third syllable from the end.⁴⁶ Individual Tokyo speakers generally recognize both forms as possible in these cases. Looking first at accented verbs with three-syllable dictionary forms, the accent is on the second-to-last syllable in all examples containing the VV sequence /oe/ (as in /hoeꜜru/ 吠える 'bark'), and in most examples containing the VV sequence /ae/ (as in /haeꜜru/ 生える 'grow'). But in /kaꜜeru/ 帰る 'return home' and /kaꜜesu/ 返す 'give back', all Tokyo speakers accent the third syllable from the end. If the dictionary form has four or more syllables, most examples containing the VV sequence /ae/ and a small number of examples containing the VV sequence /oe/ seem to allow accent either on the second syllable from the end (as in /kaNgaeꜜru/ 考える 'think') or on the third syllable from the end (as in /kaNgaꜜeru/).

We could try to maintain the second-to-last-syllable generalization at all costs, but we'd have to say that the forms in Table 7-18 with the downward-pointing arrow immediately following /o/ or /a/ contain a long syllable ending in the diphthong /oe/ or /ae/.⁴⁷ This explanation just doesn't seem right intuitively. As I said in connection with V1V2 sequences in §6.7, unless V2 is /i/ or /u/, it seems safe to say that there's a syllable division between V1 and V2. In any case, even if we were willing to recognize /oe/ and /ae/ as potential diphthongs, we'd still have no explanation for why /haeꜜru/ has three syllables while /kaꜜeru/ has only two: /ha^eꜜ^ru/ versus /ka […]

are exceptions to the generalization, assuming as I did above that /a/ and /e/ are in separate syllables: /kaꜜeQta/, /kaꜜeʃita/, /kaꜜeQte/, /kaꜜeʃite/.

We saw earlier that the dictionary form of a verb and the indefinite pronoun or nominalizer /no/ の combine into a single accent phrase, and also that an accent appears on the last syllable of the verb in such a phrase when the verb is unaccented. Past affirmative forms like those in Table 7-20 behave in exactly the same way when they combine with /no/.⁵³ For example, when the verb form is accented /acuꜜmeta/ 集めた 'collected', the combination is /acuꜜmeta no/, and when the verb form is unaccented /kasaneta/ 重ねた 'stacked', the combination is /kasanetaꜜ no/, with an accent on the last syllable of the verb form.

Verb gerunds often combine into single accent phrases with the particles /wa/ は and /mo/ も, notably in expressions of permission and prohibition.⁵⁴ As far as accent location is concerned, these gerund + particle phrases are just like the phrases that end with /no/. If the verb is accented, the

53 Jorden and Noda 1987:243.
54 Martin 1975:498.

Table 7-21 Affirmative passive and causative forms of verbs

                            Unaccented verb                     Accented verb
NONPAST (DICTIONARY FORM)   /yameru/ 辞める 'quit'              /tabeꜜru/ 食べる 'eat'
PAST                        /yameta/ 辞めた                     /taꜜbeta/ 食べた
NONPAST PASS                /yamerareru/ 辞められる             /taberareꜜru/ 食べられる
PAST PASS                   /yamerareta/ 辞められた             /taberaꜜreta/ 食べられた
NONPAST CAUS                /yamesaseru/ 辞めさせる             /tabesaseꜜru/ 食べさせる
PAST CAUS                   /yamesaseta/ 辞めさせた             /tabesaꜜseta/ 食べさせた
NONPAST PASS CAUS           /yamesaserareru/ 辞めさせられる     /tabesaserareꜜru/ 食べさせられる
PAST PASS CAUS              /yamesaserareta/ 辞めさせられた     /tabesaseraꜜreta/ 食べさせられた

gerund + particle combination is accented on the same syllable as the gerund on its own, but if the verb is unaccented, the gerund + particle combination is accented on the last syllable of the gerund.⁵⁵ For example, the gerund of accented /sakeꜜru/ 避ける 'avoid' is /saꜜkete/, and the gerund of unaccented /makeru/ 負ける 'lose' is /makete/. In the phrases we're concerned with here, we find /saꜜkete wa/ and /saꜜkete mo/, as opposed to /maketeꜜ wa/ and /maketeꜜ mo/.

Like gerunds and past affirmatives, passive and causative forms are unaccented for unaccented verbs and accented for accented verbs.⁵⁶ The examples in Table 7-21 illustrate. Every form in the left column of Table 7-21 is unaccented, and every form in the right column is accented. All the nonpast-tense forms based on /tabeꜜru/ 食べる 'eat', including the dictionary form itself, are accented on the second syllable from the end, and all the past-tense forms are accented on the third syllable from the end, which is the syllable containing the third mora from the end.

Potential forms show the same pattern as passive and causative forms. Table 7-22 provides a few examples of nonpast-tense affirmative potential forms. This potential form is unaccented when the related basic verb is unaccented and accented when the related basic verb is accented. Prescriptive accounts stigmatize some of the potential forms in the table, but I'm not going to discuss this issue here.⁵⁷ It would probably be better to refer to each potential form in the table as the citation form of a potential verb, even though dictionaries don't list such a form as a headword unless it's developed an idiosyncratic

55 Jorden and Noda 1988:245, 273.
56 McCawley 1977:268-9, Kindaichi and Akinaga 2001:90-2 (appendix).
57 Martin 1975:300-1, Jorden and Noda 1990:6-9, Matsuda 1993.

Table 7-22 Potential forms of verbs

Unaccented verbs
Dictionary form            Nonpast affirmative potential
/neru/ 寝る 'sleep'        /nereru/ 寝れる
/noru/ 乗る 'board'        /noreru/ 乗れる
/akeru/ 開ける 'open'      /akereru/ 開けれる
/cukau/ 使う 'use'         /cukaeru/ 使える

Accented verbs
Dictionary form            Nonpast affirmative potential
/miꜜru/ 見る 'look'        /mireꜜru/ 見れる
/noꜜmu/ 飲む 'drink'       /nomeꜜru/ 飲める
/tabeꜜru/ 食べる 'eat'     /tabereꜜru/ 食べれる
/hanaꜜsu/ 話す 'speak'     /hanaseꜜru/ 話せる

meaning. What's relevant is just that each potential verb has all the forms that we'd expect any verb to have. For example, /nome•ru/ fiX60~ 'can drink' has the nonpast negative /nome•nai/ fiX~f;t\,d, the past affirmative /no•meta/ fiX ~t::., the gerund /no•mete/ :tiX~-C, and so on. We could say the same thing about nonpast affirmative passive, causative, and passive causative forms like those in Table 7-21. There are other verb forms that always have an accent, no matter what verb is involved. In some of these forms, the location of the accent depends on whether we're dealing with an accented verb or with an unaccented verb. The provisional, which ends /eba/, and the negative gerund, which ends /nakute/, are two forms that fit this description. The examples in Table 7-23 illustrate. The provisional of an unaccented verb is accented on the second syllable from the end, but the provisional of an accented verb is accented on the third syllable from the end.58 The negative gerund of an unaccented verb is accented on the /na/ of /nakute/, but the negative gerund of an accented verb is accented on the syllable right before /na/ (the same syllable as in the nonpast affirmative). 59 The past-tense negative, which ends /nakaQta/, and the negative adverbial, which ends /naku/, are like the negative gerund (Table 7-23); there's an accent on the syllable before /na/ if the verb is accented, and there's an accent 58 59

Martin 1952:42, McCawley 1968: 150, Akinaga 1998:193, Tanaka and Kubozono 1999:81 -2. Kindaichi and Akinaga 2001:90 (appendix).
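The placement rules stated above for the gerund+particle combination, the provisional, and the negative gerund can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original description: the function names are hypothetical, the caller is assumed to supply the form already divided into syllables, and the mark ' is placed right after the first vowel of the accented syllable, following the transcription convention used in this chapter.

```python
VOWELS = "aeiou"

def mark(syllables, index):
    """Join the syllables, putting the accent mark ' after the first
    vowel of syllables[index] (negative indexes count from the end)."""
    target = index % len(syllables)
    out = []
    for i, syl in enumerate(syllables):
        if i == target:
            pos = next(k for k, ch in enumerate(syl) if ch in VOWELS)
            syl = syl[:pos + 1] + "'" + syl[pos + 1:]
        out.append(syl)
    return "".join(out)

def provisional(syllables, verb_is_accented):
    # Unaccented verb: second syllable from the end.
    # Accented verb: third syllable from the end.
    return mark(syllables, -3 if verb_is_accented else -2)

def negative_gerund(syllables, verb_is_accented):
    # The form ends in na-ku-te, so /na/ is the third syllable from the end.
    # Unaccented verb: accent on /na/. Accented verb: syllable before /na/.
    return mark(syllables, -4 if verb_is_accented else -3)

def gerund_plus_particle(gerund, particle):
    # Accented gerund: the accent stays where it is.
    # Unaccented gerund: accent on the last syllable of the gerund.
    if "'" in gerund:
        return f"{gerund} {particle}"
    return f"{gerund}' {particle}"

print(provisional(["ki", "re", "ba"], verb_is_accented=False))
print(negative_gerund(["ya", "ma", "na", "ku", "te"], verb_is_accented=False))
print(gerund_plus_particle("makete", "wa"))
```

The forms for unaccented verbs can be checked against the examples given in the text. The forms the sketch produces for accented verbs, such as /tabe'reba/ and /tabe'nakute/, follow from the stated rules rather than from a listed example.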

Table 7-23 Provisional and negative gerund forms of verbs

Unaccented verbs
Dictionary form              Provisional              Negative gerund
/kiru/ 着る 'put on'          /kire'ba/ 着れば          /kina'kute/ 着なくて
/yamu/ 止む 'end'             /yame'ba/ 止めば          /yamana'kute/ 止まなくて
/makeru/ 負ける 'lose'        /makere'ba/ 負ければ
/sagasu/ 探す 'look for'      /sagase'ba/ 探せば

[...] and /kuraku/ 暗く (see Table 7-28), the gerund ends with /kute/, as in /akana'kute/ 開かなくて [...] 'win' has the nonpast negative form /kata'nai/, with accent on the next-to-last syllable (see Table 7-19), but the adjective /kitana'i/ 汚い

86 87

Jorden and Noda 1990:218, Kindaichi and Akinaga 2001:73, 75 (appendix). 88 Martin 1975:374, Jorden and Noda 1987:264-8. Jorden and Noda 1990:218.


'dirty' is accented on the last syllable, like the dictionary forms of all accented adjectives (see Table 7-27).89 I mentioned in §7.4 that a verb has a polite nonpast negative ending in /maseN/, but an alternative polite form consists of the nonpast negative followed by /desu/, normally combined into a single accent phrase.90 In the case of accented /sa'ku/ [...]

[...] 'cut' (E1) and /to'ru/ 取る 'take' (E2), marking only the division between the two main elements with a plus sign. In the dictionary form of such a compound, the first element is phonemically identical to the stem (§7.4) of E1 (/ki'ri/ in the case of /ki'ru/), and the second element is phonemically identical to the dictionary form of E2, except for a change in the initial consonant in a few cases like /ike+do'ru/ 生け捕る 'capture alive' (based on /ike'ru/ 生ける 'keep alive' and /to'ru/ 捕る 'catch'). Since the compound as a whole is a verb, it has all the forms that any other verb has; /kiri+to'ru/ has the gerund /kiri+to'Qte/, the polite nonpast affirmative /kiri+torima'su/, and so on (§7.4). Notice that the first element doesn't vary.

In §7.4 I classified verbs as either accented or unaccented, depending on the dictionary form, and as we saw, the accent patterns of several other forms of a verb differ depending on which of the two accent classes it belongs to. The accent class of a compound verb is for all practical purposes completely predictable in modern Tokyo Japanese. When E1 is a verb, there's a very strong tendency, at least for younger speakers, to treat all such compounds as accented verbs, no matter which accent class the two element verbs belong to.123 The [...]

122 Tagashira and Hoff 1986.
123 Akinaga 1998:195-6, Tanaka and Kubozono 1999:83, Kindaichi and Akinaga 2001:54-5 (appendix). The older pattern is for the compound to be unaccented when E1 belongs to the accented class.
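The two patterns for verb+verb compounds described above (the current tendency in the main text and the older pattern in the footnote) amount to a small decision rule, sketched here in Python. The function name and the labels "newer" and "older" are hypothetical. The sketch also assumes that under the older pattern a compound is accented when E1 is unaccented, which the footnote implies but doesn't state outright.

```python
def compound_accent_class(e1_is_accented, pattern="newer"):
    """Return the accent class of a verb+verb compound.

    pattern="newer": every such compound is treated as accented,
                     whatever the classes of E1 and E2.
    pattern="older": the compound is unaccented when E1 is accented
                     (and, by assumption, accented when E1 is unaccented).
    """
    if pattern == "newer":
        return "accented"
    if pattern == "older":
        return "unaccented" if e1_is_accented else "accented"
    raise ValueError(f"unknown pattern: {pattern!r}")

# E1 of /kiri+to'ru/ is accented /ki'ru/.
print(compound_accent_class(e1_is_accented=True))                    # newer pattern
print(compound_accent_class(e1_is_accented=True, pattern="older"))   # older pattern
```

Notice that the accent class of E2 plays no role under either pattern, which is why the function doesn't take it as an argument.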


Table 7-38 Noun+verb and adjective+verb compound verb accent

Compound dictionary form
/ura+gi'ru/ 裏切る 'betray'
/na+no'ru/ 名乗る 'give one's name'
/cika+zu'ku/ 近付く 'approach'
/naga+bi'ku/ 長引く [...]

[Figure: /sime'ru yo/ 閉めるよ '(I'll) close (it)!' with abrupt falling /yo/]

The sentence-final particles /ne/ ね and /yo/ よ can carry a variety of rising intonation contours, and these subtly different contours convey a range of affective meanings, but I'll only consider a very common abrupt rising contour here.137 I'll use the symbol ↗ to represent this contour in phonemic transcriptions, as in Figure 7-25. The basic accent pattern on the word before the particle is preserved in these examples, including the accentual fall in the examples on the right.

When a sentence-final particle carries a falling contour, an accent appears on the last syllable of a basically unaccented word, and the accentual fall merges with the falling intonation. Like rising contours, falling contours allow a range of variation, but the examples in Figure 7-26 illustrate with a very common abrupt fall. The symbol ↘ represents this contour in phonemic transcriptions.

137 Tanaka and Kubozono 1999:119.

EXERCISES

1 Transcribe each word below phonemically. Include mora and syllable boundaries in the appropriate places by putting a caret ^ between syllables and then putting a period (.) between any moras that are in the same syllable. Also mark the accent location in each word with a downward-pointing arrow (') in the appropriate place. For example, the transcription of tenki 天気 'weather' would be /te.'N^ki/.

bōekigaisha 貿易会社
haiburiddosha ハイブリッド車
hiragana 平仮名
keizai 経済
kōnsūpu コーンスープ
Porutogarugo ポルトガル語
rokujihan 六時半
sekiyusutōbu 石油ストーブ
toshokan 図書館
yama 山

2 As we saw in §7.2, if the initial syllable in an accent phrase is long and doesn't bear the accent, it can be pronounced either with a rising pitch (traditionally represented as LH or the equivalent) or with a high level pitch (traditionally represented as HH or the equivalent). The relevant entries in pronunciation dictionaries (NHK 1998, Kindaichi and Akinaga 2001) are consistently marked in a way that implies LH. For example, unaccented /teNcoH/ 転調 'modulation' and /koHkoH/ 高校 'high school' are marked as LHHH, whereas initial-accented /te'NcoH/ 店長 'store manager' and /ko'HkoH/ 孝行 'filial piety' are marked as HLLL. Pronunciation dictionaries treat initial long syllables ending in /Q/ in parallel fashion: unaccented /keQkoH/ 欠航 'flight cancellation' is marked as LHHH, and initial-accented /ke'QkoH/ 結構 'fine' is marked as HLLL. But Haraguchi (1977:34-5) says that LLHH is more accurate for words like /keQkoH/ when they occur at the beginning of an accent phrase. What do you suppose the source of this disagreement might be? Consult with a native speaker of Tokyo Japanese and find out whether that person has any clear intuition about the pitch pattern in accent phrases beginning with a word like /keQkoH/. Is HHHH a possibility in such cases? What sort of experimental data might you look at to investigate this question?
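The dictionary-style markings cited in this exercise (LHHH for an unaccented four-mora word, HLLL for an initial-accented one) follow a simple mora-by-mora convention, sketched below in Python. This is only an illustration of that marking convention, not a claim about actual pronunciation: it deliberately ignores the LLHH and HHHH possibilities that the exercise asks about. The function name and the encoding of accent location as a 1-based mora number (0 for unaccented) are hypothetical.

```python
def pitch_pattern(n_moras, accent):
    """Return a dictionary-style L/H string for a word in isolation.

    n_moras: number of moras in the word
    accent:  1-based number of the accented mora, or 0 if unaccented
    """
    if n_moras < 1 or not 0 <= accent <= n_moras:
        raise ValueError("accent must be 0 or a mora number within the word")
    tones = []
    for i in range(1, n_moras + 1):
        if i == 1:
            # The first mora is low unless it carries the accent.
            tones.append("H" if accent == 1 else "L")
        elif accent == 0 or i <= accent:
            # High up to and including the accented mora
            # (or to the end of an unaccented word).
            tones.append("H")
        else:
            # Low after the accentual fall.
            tones.append("L")
    return "".join(tones)

print(pitch_pattern(4, 0))  # unaccented, like /teNcoH/ 'modulation'
print(pitch_pattern(4, 1))  # initial-accented, like /te'NcoH/ 'store manager'
```

Because the sketch assigns one tone per mora, it treats the second mora of an initial long syllable just like any other second mora, which is exactly the assumption the exercise invites you to question.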

3 Leaving aside Sino-Japanese items, are there any one-mora loanwords? Whether there are or not, what would you expect the default accent location to be on such a word?

4 Transcribe the words below phonemically and note the accent location in each case. Do they have the default location? What do all these words have in common aside from their accent pattern?

Amerika アメリカ
bāberu バーベル
firaria フィラリア
hādoru ハードル
haiena ハイエナ
haiteku ハイテク
infure インフレ
Itaria イタリア
konsome コンソメ
kyarameru キャラメル
marifana マリファナ
masukara マスカラ
poroshatsu ポロシャツ
sutereo ステレオ
sutorobo ストロボ
tēburu テーブル
ukurere ウクレレ

5 Provide a phonemic transcription and mark the accent location for each of the Japanese surnames below.

Andō 安藤
Dan
Gotō 後藤
Hara 原
Hayashi 林
Ike 池
Katō 加藤
Kitaōji 北大路
Nagai
Nakano 中野
Nonaka 野中
Ogasawara 小笠原
Ōkawa 大川
Ōshima 大島
Saitō
Sen 千
Takai 高井
Takano 高野
Tanaka 田中
Wakabayashi 若林
Yagi 八木
Yamamoto 山本
Yanagisawa 柳沢
Yano 矢野
Yoshimoto 吉本

Try to sort the names into two groups on the basis of accent, and describe your two groups as precisely as you can. How are the very short and very long names on the list distributed in your two groups? In which cases does the name seem to be related to a common noun but have a different accent location? Where is a native speaker of English most likely to place the stress on each name? In which cases does the English stress location match the Tokyo Japanese accent location?

6 The illustration below is a pitch track and a synchronized spectrogram of a token of /sono ma'kura wa/ その枕は 'that pillow TOP'.

[Figure: pitch track and synchronized spectrogram; horizontal axis: Time (ms)]

Give a broad phonetic transcription of this phrase, and then use the spectrogram to find "landmarks" and try to match the segments in your transcription with portions of the two acoustic displays. How well does the pitch track correspond to the schematic diagram in Figure 7-6?

7 One way to handle the examples in Table 7-14 is to say that many combinations of an unaccented noun followed by a particle are final-accented rather than unaccented. Consider sentences consisting of an unaccented noun followed by the copula form /da/ だ followed by the sentence-final particle /yo/ よ. Are the pitch patterns on such sentences consistent with the idea that these noun+/da/ combinations are final-accented?

8 The nouns daburu ダブル 'double' and toraburu トラブル 'trouble' are recent loanwords from English. According to the entries in NKD 2000-2, the earliest attestations are 1912 for daburu and 1914 for toraburu. Capitalizing on the fact that both words happen to end with the syllable /ru/, Japanese speakers have also created a verb with the dictionary form daburu ダブる 'overlap' (earliest attestation 1927) and, more recently, a verb with the dictionary form toraburu トラブる 'have trouble' (earliest attestation 1975), which is still rather slangy. Transcribe all four words phonemically and mark accent location in each. Do the nouns both have an accent location that conforms to some general tendency? How about the verbs?

9 Dictionaries list /hai/ as an alternative pronunciation for /hae/ ~ /hae'/ 蝿 'fly', and according to NKD 2000-2, /kairu/ is a historically attested alternative pronunciation for /kaeru/ 蛙 'frog'. How do these alternative pronunciations provide a hint for a plausible historical explanation of why the verb dictionary forms /ka'eru/ 帰る 'return home' and /ka'esu/ 返す 'give back' are accented on the third syllable from the end? You should check NKD 2000-2 to see whether /ka'iru/ and /ka'isu/ are attested as alternative pronunciations for these two verb forms.

10 According to the description in §7.4, the gerund of an unaccented verb is unaccented, but on the basis of combinations such as /kaQte' mo/ (dictionary form /kau/ 買う 'buy') and /makete' wa/ (dictionary form /makeru/ 負ける 'lose'), Martin (1975:476) treats the gerunds of unaccented verbs as accented. Do you agree with this decision? Consider the accent patterns on combinations of a verb gerund and an auxiliary, such as katte oku 買っておく 'buy in advance' and makete shimau 負けてしまう 'end up losing'.

11 According to the description in §7.4, the stem form of an unaccented verb is unaccented (see the examples in Table 7-25), but Martin (1975:392-3) says that the stem form of an unaccented verb is final-accented. The support for Martin's claim comes from combinations of a verb stem with the particle /wa/ or /mo/ in phrases like kashi wa shinai 貸しはしない 'lend SUBJECT doesn't' (compare unaccented /kasu/ 貸す 'lend') and narabe mo shinai 並べもしない 'doesn't arrange either' (compare unaccented /naraberu/ 並べる 'arrange'). Transcribe these phrases phonemically and mark the accent location(s) and the division(s) between accent phrases (if there are any) in each case. Then explain why these phrases can be construed as support for Martin's claim. Is there any way to account for these phrases on the assumption that the stem form of an unaccented verb is unaccented?

12 Consider the imperative forms of verbs, that is, examples like tsumero (dictionary form /cume'ru/ 詰める 'pack'), kimero (dictionary form /kimeru/ 決める 'decide'), aruke (dictionary form /aru'ku/ 歩く

(dictionary form /cukau/ 使う