Tones are the most challenging aspect of learning Chinese pronunciation for adult learners and traditional research most
English Pages xiv+150  Year 2018
Table of contents :
Three puzzles in Mandarin L2 tone acquisition --
Methodology: data collection and analysis --
Coarticulation effects in L2 Chinese tones --
Phonological universals and the acquisition order of Mandarin tones --
Acquisition of the third tone --
Teaching Mandarin Chinese tones.
Second Language Acquisition of Mandarin Chinese Tones
Utrecht Studies in Language and Communication Series Editor Paul van den Hoven Jan ten Thije
The titles published in this series are listed at brill.com/uslc
Second Language Acquisition of Mandarin Chinese Tones: Beyond First-Language Transfer
By Hang Zhang
LEIDEN | BOSTON
Library of Congress Cataloging-in-Publication Data Names: Zhang, Hang, author. Title: Second language acquisition of Mandarin Chinese tones : beyond first-language transfer / by Hang Zhang. Description: Leiden ; Boston : Brill Rodopi, 2018. | Series: Utrecht studies in language and communication ; volume 33 | Includes bibliographical references and index. Identifiers: LCCN 2018003775 (print) | LCCN 2018012366 (ebook) | ISBN 9789004364790 (E-book) | ISBN 9789004305977 (hardback : alk. paper) Subjects: LCSH: Chinese language—Study and teaching—Foreign speakers. | Chinese language—Tone. | Mandarin dialects—Phonology. | Second language acquisition. Classification: LCC PL1065 (ebook) | LCC PL1065 .Z465 2018 (print) | DDC 495.181/3—dc23 LC record available at https://lccn.loc.gov/2018003775
Typeface for the Latin, Greek, and Cyrillic scripts: “Brill”. See and download: brill.com/brill-typeface. issn 0927-7706 isbn 978-90-04-30597-7 (hardback) isbn 978-90-04-36479-0 (e-book) Copyright 2018 by Koninklijke Brill NV, Leiden, The Netherlands. Koninklijke Brill NV incorporates the imprints Brill, Brill Hes & De Graaf, Brill Nijhoff, Brill Rodopi, Brill Sense and Hotei Publishing. All rights reserved. No part of this publication may be reproduced, translated, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission from the publisher. Authorization to photocopy items for internal or personal use is granted by Koninklijke Brill NV provided that the appropriate fees are paid directly to The Copyright Clearance Center, 222 Rosewood Drive, Suite 910, Danvers, MA 01923, USA. Fees are subject to change. This book is printed on acid-free paper and produced in a sustainable manner.
Contents

Preface
Acknowledgements
List of Tables
List of Figures
List of Selected Abbreviations
Mandarin Tones

1 Introduction
  1.1 Phonetics and Phonology of Mandarin Chinese Tones
    1.1.1 Phonetics of Tones
    1.1.2 Phonological Representations of Mandarin Chinese Tones
  1.2 Chinese Tone Variations
    1.2.1 Tone Coarticulation
    1.2.2 Tone Sandhi in Chinese
      1.2.2.1 The Variants of T3
      1.2.2.2 Other Tone Sandhi Processes
  1.3 Intonation in Chinese
  1.4 The Acquisition of Mandarin Chinese Tones
    1.4.1 Musical Ability and the Acquisition of Tones
    1.4.2 The First Language Acquisition of Chinese Tones
    1.4.3 Tone Perception by Adult Learners
    1.4.4 The Second Language Acquisition of Mandarin Tones: A Historical Perspective
  1.5 Organization of this Book

2 Three Puzzles in Mandarin L2 Tone Acquisition
  2.1 Prosodic Structures of English, Japanese, and Korean
    2.1.1 English Prosodic Structure
    2.1.2 Japanese Prosodic Structure
    2.1.3 Korean Prosodic Structure
  2.2 Puzzles Surrounding the L2 Acquisition of Tones
    2.2.1 Puzzle 1: Positional Effects of Contour Tones
    2.2.2 Puzzle 2: Two Issues in L2 Studies on the Acquisition Order of Chinese Tones
    2.2.3 Puzzle 3: T3

3 Methodology: Data Collection and Analysis
  3.1 Test Materials
  3.2 Participants and Recording Procedure
  3.3 Assessment of L2 Tones
    3.3.1 Correctness Judgments
    3.3.2 Pitch Values
  3.4 Data Analysis

4 Coarticulation Effects in L2 Chinese Tones
  4.1 The Nature of Anticipatory Tone Coarticulation
  4.2 Research Questions and Hypotheses
  4.3 Results
    4.3.1 Research Question 1: Accuracy Rates of Contour Tones
    4.3.2 Research Question 2: Maximum F0 Values of T2 and T4
    4.3.3 Research Question 3: Error Types of Contour Tones
  4.4 Discussion
  4.5 Conclusion

5 Phonological Universals and the Acquisition Order of Mandarin Tones
  5.1 Phonological Background
    5.1.1 Basic Concepts in Optimality Theory and the Tonal Markedness Scale
    5.1.2 OT in Language Acquisition Studies: Grammar Restructuring
    5.1.3 OCP Effects in Mandarin Chinese: An OT Account of T3 Sandhi
    5.1.4 Present Study: Hypotheses Regarding the TMS and the OCP in L2 Tones
  5.2 Results
    5.2.1 Results of OCP Test 1: The Change Rate of ITC and NITC
    5.2.2 OCP Test 2: The Acquisition Order of Tone Pairs and Effects of the TMS
    5.2.3 The Acquisition of Tone 3 Sandhi
  5.3 Discussion: An OT Account for the Acquisition of Identical Tone Sequences
    5.3.1 OCP(WholeTone) or OCP(ConstTone)
    5.3.2 Stages in L2 Tone Phonology Development
    5.3.3 An Alternative Account for the Effect of the OCP Interacting with the TMS
  5.4 Conclusion

6 Acquisition of the Third Tone
  6.1 The Allophones and Sandhi Rules of Tone 3
  6.2 The Second Language Acquisition of T3
    6.2.1 The ‘Full-T3 First’ Method
    6.2.2 The Present Study
  6.3 Methodology
    6.3.1 The Main Experiment (Experiment 1)
    6.3.2 The Supplemental Experiment (Experiment 2)
      6.3.2.1 Stimuli
      6.3.2.2 Subjects and Recording Procedures
      6.3.2.3 Analysis
  6.4 Results
    6.4.1 Perception of T3 Variants (Half-T3 and Full-T3)
    6.4.2 Production of Half-T3 and Raised-T3
      6.4.2.1 The Error Patterns of Half-T3 and Raised-T3
      6.4.2.2 Substitutions Used for Half-T3 and Raised-T3
    6.4.3 Production of Utterance-Final T3
  6.5 Discussion
    6.5.1 The Overproduction of Full-T3
    6.5.2 Implications
      6.5.2.1 Theoretical Implications: The Underlying Form of T3
      6.5.2.2 The ‘Half-T3 First’ Method
  6.6 Conclusion

7 Teaching Mandarin Chinese Tones
  7.1 Pedagogical Implications
    7.1.1 Pedagogical Implications of Chapters 4–6
    7.1.2 From the Establishment of the Mental Representation of Tones to Motor Skills
  7.2 Current Prevailing Teaching Materials
    7.2.1 Tone Inventory Descriptions
    7.2.2 Methods of Study
    7.2.3 Exercises Aimed at Tone Training
  7.3 Sample Exercises

References
Index
Preface

This book examines non-native Mandarin Chinese tone productions made by speakers of English, Japanese, and Korean. Its goal is to show that there are factors influencing second-language acquisition that extend beyond transfer of structures from the learner’s first language and beyond characteristics extracted from the learner’s target language.

The first two chapters provide background on the phonetics and phonology of Mandarin Chinese tones, and survey major findings from the past several decades on the first- and second-language acquisition of Chinese tones. The third chapter describes the procedure of one main experiment designed to answer several questions about second-language tone acquisition. The book’s three core chapters, Chapters 4–6, present research investigating the presence of influences that extend beyond learners’ first languages. The book concludes with a discussion in Chapter 7 of pedagogical implications and a review of current teaching materials for Chinese tones. Practical suggestions and ten sets of sample teaching materials are provided to improve Chinese tonal pedagogy.

This book will be of interest to researchers and graduate students in the fields of second-language acquisition and Chinese linguistics. Although primarily a research monograph, the book contains intensive reviews of previous studies on tone acquisition, presents new perspectives on tone pedagogy, and offers practical solutions. Thus it (especially Chapters 1, 2, 6, and 7) will also be of interest to educators and learners of the Chinese language.
Acknowledgements

This book could not have been written without the help of many people. My thanks go first to Professors Jennifer L. Smith and Elliott Moreton, who guided me in my initial exploration of second language Chinese tones and gave me a fresh perspective on how to approach my research. I would also like to thank my colleagues and students at various schools who helped me collect data and analyze multiple experiments. These include, but are not limited to, Chris Wiesen, Jia Lin, and Wendan Li at the University of North Carolina at Chapel Hill; Di Qi at Georgetown University; Liyi Jia at George Washington University; and Hong Shi, Linghong Huang, and Feiyan Wang at Zhejiang University. Special thanks go to Professor Shoko Hamano and Young-Key Kim-Renaud at George Washington University for their continual support and advice. I am also very grateful to Dr. Yi Xu, who granted me the permission to reprint his two figures, Figure 1.1 and Figure 4.2, in this book. I am also thankful for audiences in my research presentations, anonymous reviewers, and the editors from Brill, all of whom have helped me in improving my research and my writing. I am particularly indebted to Emily Moeng, who helped me refine this book throughout the project; each page of this work has benefited from her careful revision.

During the course of writing this book, I had the great fortune of receiving the University Facilitating Fund (GWU) and the CCAS editing service from the Office of Research and Strategic Initiatives. I appreciate Dr. Yongwu Rong’s timely support.

This book is dedicated to my husband Jie Cai and daughter Emma Xiaoman Cai. I have relied greatly on Jie and Emma’s love and encouragement throughout my time conducting research and writing, and this book would not have been possible without their constant support. All errors are my own responsibility.
List of Tables

1.1 The four lexical tones of standard Mandarin Chinese in isolation
1.2 The three allophones of T3
1.3 Accuracy rate of Mandarin-speaking children (Clumech, 1980)
4.1 Potential influence of anticipatory coarticulation on T2 and T4 accuracy rates
4.2 Results of a mixed logistic regression for T2 and T4 when all language groups are pooled together
4.3 The top three disyllabic response tones for target T2 (LH) at initial positions
4.4 Statistical analyses of error type comparisons for T2-T1, T2-T4, T4-T1, and T4-T4
5.1 OCP Test 1 results for English speakers
5.2 OCP Test 1 results for Japanese speakers
5.3 OCP Test 1 results for Korean speakers
5.4 Rao-Scott Chi-Square test results for TMS hypothesis
6.1 The three allophones of Tone 3
6.2 The four Chinese lexical tones
6.3 Example trisyllabic phrases used for production task in Experiment 2
6.4 Error rates of target Half-T3 at word-initial and word-final positions
6.5 Substitute tones for target Half-T3
6.6 Detailed substitution patterns for target Half-T3 with positional information
7.1 Textbooks and monographs under review
7.2 Tone training items listed in textbooks or monographs
List of Figures

1.1 Mean F0 contours of four Mandarin tones produced in isolation
2.1 Metrical grid for a bitonal metrical foot with one stressed syllable
4.1 Carryover and anticipatory coarticulation effects
4.2 Anticipatory effects on T2 and T4 of Mandarin Chinese reported by Xu (1997)
4.3 Accuracy rates of word-initial T2 (LH) in L2 disyllabic words
4.4 Accuracy rates of word-initial T4 (HL) in L2 disyllabic words
4.5 Average F0 (Hz) values of T2 offsets for L2 productions judged to be correct
4.6 Average F0 (Hz) values of T4 onsets for L2 productions judged to be correct
4.7 F0 values of T2 offsets and T4 onsets in the supplementary experiment
5.1 OCP Test 1
5.2 The occurrences of tone pairs in L2 production (OCP Test 2)
5.3 Error rates of individual tones in the main experiment
5.4 Accuracy rates of identical tone sequences
5.5 Re-ranking of OCP sub-constraints
6.1 Surface forms of T3 in disyllabic words
6.2 Error rates of tones in the perception task in Experiment 2
6.3 Error types for target Half-T3 (H3) for beginner- and intermediate-level learners in the perception component of Experiment 2
6.4 Overall error rates in Experiment 1
6.5 Error rates of L2 tonal productions in Experiment 2
6.6 Error rates of Half-T3, Raised-T3, and Full-T3 in Experiment 2
6.7 Substitutions within each language group in Experiment 1
6.8 Pitch track of two trisyllabic prosodic words with the same T3 morpheme at the beginning: (a) qǐng nǐ kàn and (b) qǐng tā kàn
6.9 Sentence-final T3 hǎo uttered by English, Japanese, and Korean speakers
6.10 The proportions of Half-T3 (H3) and Full-T3 (F3) at utterance-final positions
6.11 Tone sandhi rules based on different theoretical assumptions
List of Selected Abbreviations

FAITH   Faithfulness Constraints
H       High
Hz      Hertz
ITC     Identical Tone Combination
L       Low
L1      First Language
L2      Second Language
NITC    Non-Identical Tone Combination
OCP     Obligatory Contour Principle
OT      Optimality Theory
SLA     Second Language Acquisition
TBU     Tone-bearing unit
TETU    The Emergence of The Unmarked
TMS     Tonal Markedness Scale
UG      Universal Grammar
Mandarin Tones

T1                     Tone 1 (high level tone)
T2                     Tone 2 (high rising tone)
T3                     Tone 3
FT3 (Full-T3, F3)      Low dipping tone
HT3 (Half-T3, H3)      Low falling/level tone
RT3 (Raised-T3, R3)    Rising tone which results from Pre-T3 Sandhi
T4                     Tone 4 (high falling tone)
Introduction

Learning a new language requires one to learn new rules for pronunciation. A change in the quality of a consonant or vowel can change the lexical meaning of a word. For example, if in English a voiceless consonant is replaced with a voiced one, the meaning of a word can change: fie [fai] becomes vie [vai]; Sue [su:] becomes zoo [zu:]; and thigh [θai] becomes thy [ðai].1 In these cases, the linguistic feature of ‘voicing’ (vibrating the vocal folds) is critical, since this single sound feature differentiates one word from another. We call this type of linguistic feature a contrastive, or phonemic, feature, as it distinguishes among word meanings in a language. Contrastive features are language specific – they may be important in one language, but not in another.

While in all languages vowel and consonant articulations are central to conveying the meanings of words, language learners usually do not pay close attention to pitch changes. This is because, in most cases, people use pitch only to convey emotions and nuances. For example, in English one says ‘Two?’ with a rising pitch to ask a question, or ‘Two!’ with a falling pitch to indicate a demand. In this case, differences in pitch convey different emotions or sentence functions, but they do not change the word’s meaning: ‘two’ is still the number ‘two’ – the change in pitch contour operates mainly above the word level. This is called intonation.

However, in some languages, pitch operates at the word level. For example, in modern standard Chinese or Putonghua (hereafter Mandarin Chinese, Mandarin, or Chinese), the syllable tu has at least four different lexical meanings depending on its pitch: ‘convex,’ ‘picture,’ ‘soil,’ or ‘rabbit.’ Each of these four meanings is conveyed, respectively, with a high-level pitch, a mid-rising pitch, a low pitch, and a high-falling pitch. Just as voicing serves a contrastive function in English, pitch is a contrastive feature in tonal languages like Chinese.
Tones are the most challenging aspect of learning Chinese pronunciation for adult learners. The main topic of this book is the second language (L2) acquisition of the Mandarin Chinese tone system by nontonal language speakers. Unlike traditional research practice, which examines non-native Mandarin tones by contrasting first language (L1) with second language differences, this book aims to provide a fresh perspective on the study of L2 tone production by looking beyond the L1 or L2 for sources of learner errors.

This introductory chapter provides some preliminary background on tone acquisition research. The first section (§1.1) deals with the phonetics and phonology of tones; section 1.2 with tone variations including tone sandhi; and section 1.3 with tone and intonation interaction in Mandarin Chinese. Section 1.4 gives a brief introduction to the first and second language acquisition of Mandarin tones. The chapter concludes with a summary of the book’s organization (§1.5).

1 Symbols in square brackets indicate sounds transcribed in the International Phonetic Alphabet (IPA).

© koninklijke brill nv, leiden, 2018 | doi 10.1163/9789004364790_002

1.1 Phonetics and Phonology of Mandarin Chinese Tones
As noted above, pitch in Mandarin Chinese serves a contrastive function because Chinese is a tonal language. Mandarin syllable structures are relatively simple – the number of segments in a syllable ranges from one segment, V, to a maximum of four, CGVC (or CGVG), where C is a consonant, V is a vowel, and G is a glide. With only 404 grammatical combinations of consonants and vowels in a Mandarin syllable, tone plays a substantial role in differentiating word meanings. Although there are still numerous homophones in Mandarin, the number of possible contrastive syllables increases to over a thousand when syllables are labelled with lexical tones.

While this book focuses on the phonology of L2 tonal production, it is nonetheless important to have some background on the phonetics of tones since the two disciplines, phonetics and phonology, are closely related and often overlap. In general, phonetics focuses on how speech is physically created, realized, and perceived. Phonology focuses on sound patterns or how sounds interact as a system in a particular language. It deals with the systematic organization of sounds within and across languages for encoding linguistic information. Thus phonetics concerns itself with the physical properties of sounds (phones) while phonology concerns itself with the mental representation of speech sounds (phonemes) and sound patterns.

1.1.1 Phonetics of Tones

Sound is produced by vibrations – it is a form of energy that travels through matter as waves. If we pluck a stretched rubber band or a guitar string, we can see the rubber band or guitar string vibrate as we hear the sound caused by those vibrations. Air carries the vibrations to our ears to produce nerve impulses that are conveyed to the brain to be interpreted as sounds. Pitch is related to the frequency of the sound waves produced. High-pitched sounds produce many sound waves per second and so have a high frequency. Low-pitched sounds produce fewer sound waves per second and so have a lower frequency. The frequency of a sound wave is measured in a unit called the hertz (Hz).

Factors such as the length, thickness, and tension of the vibrating materials can all influence vibration rate and therefore pitch. For example, as the thickness of a vibrating material increases, the rate of vibration, and therefore pitch, decreases. Likewise, as the thickness of the vibrating material decreases, the pitch increases. Human vocal cords vary in thickness and length. Women and children usually have thinner vocal cords and will therefore in most cases speak at a higher pitch than men. Stemple et al. (2000) report a mean pitch of 106 Hz (with an average range of 77 Hz to 482 Hz) for male voices, but a mean pitch of 193 Hz (with an average range of 137 Hz to 634 Hz) for female voices.

An individual can also raise or lower his or her pitch. The most direct way to do so is to manipulate the tension of the vocal folds, just as stretching or loosening a plucked rubber band changes its pitch. The more tightly a rubber band is stretched, the faster it vibrates when plucked. As the rate of vibration increases, the pitch increases.

Human sound production is located in the larynx, an organ composed of two rings of cartilage: the cricoid cartilage and the thyroid cartilage. There are also two smaller pieces of cartilage, the arytenoid cartilages, which sit on top of the rear rim of the cricoid cartilage. The vocal folds are two bands of muscle that join the thyroid cartilage and the two arytenoid cartilages. The rotation of the thyroid and cricoid cartilages in relation to each other causes changes in the length, thickness, and stiffness of the vocal folds. For example, when the crico-thyroid muscle contracts, it elongates the vocal folds, increasing their stiffness. As a result, the vibration of the vocal folds increases in frequency and the pitch rises (for more information, see Ohala 1978; Hirose 1997; Yip 2002).
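The dependence of pitch on length, thickness, and tension can be illustrated with the textbook formula for an ideal stretched string, f = (1 / 2L) · sqrt(T / μ). The sketch below is only an analogy in the spirit of the rubber band and guitar string examples: it is not a model of the vocal folds, and the function name and the numerical values are hypothetical choices made for illustration.

```python
import math


def string_frequency(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency (Hz) of an ideal stretched string.

    Uses f = (1 / 2L) * sqrt(T / mu). This is an idealized physics
    formula, used here only as an analogy for pitch control.
    """
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)


# Hypothetical reference string (values chosen only for illustration).
base = string_frequency(0.65, 70.0, 0.001)

# Doubling the tension raises the frequency (by a factor of sqrt(2)).
tighter = string_frequency(0.65, 140.0, 0.001)

# Doubling the mass per unit length (a "thicker" string) lowers it.
thicker = string_frequency(0.65, 70.0, 0.002)

# Lengthening the string, with everything else fixed, also lowers it.
longer = string_frequency(1.30, 70.0, 0.001)

print(f"base: {base:.1f} Hz, tighter: {tighter:.1f} Hz, "
      f"thicker: {thicker:.1f} Hz, longer: {longer:.1f} Hz")
```

In the larynx these factors are not independent: as described above, elongating the vocal folds also increases their stiffness, which is why the net effect of crico-thyroid contraction is a rise in pitch rather than the fall that lengthening alone produces in the ideal-string formula.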
Pitch is quantified by fundamental frequency (F0): the faster the vocal folds vibrate, the higher the pitch and the higher the F0 value, or hertz measurement, will be. ‘Pitch’ indicates the perceived height of sound. This term can refer to both speech and nonspeech sounds. ‘Tone,’ however, is a linguistic term, referring to a phonological category that distinguishes one word from another. Therefore in this book ‘tone’ is a term relevant only to language, and only to languages in which pitch plays a linguistic role (see further in Yip 2002: 5). It is one’s phonological knowledge of the language that tells a speaker of a tonal language which specific pitch is associated with which word meaning.

1.1.2 Phonological Representations of Mandarin Chinese Tones

While tones in most African languages are level, Asian languages have both level and contour tones. This is an important feature of the Mandarin Chinese tone inventory. Mandarin Chinese has four types of lexical tones on full syllables. Traditionally the four tones are labelled as Tone 1 to Tone 4; in this book they will be abbreviated as T1, T2, T3, and T4. Each category has a canonical citation form, i.e. how the tone is produced in isolation. In the large body of previous work on Chinese linguistics, tonal categories are classified by two sets of descriptive terms: register denoting pitch height (high/low), and contour denoting pitch movement (rising, falling, dipping, etc.). The phonological description of each tone’s citation form is displayed with a sample /ma/ word bearing that tone in Table 1.1 with both register and contour information under ‘Register/pitch pattern.’ Each tone’s pitch contour when produced in isolation is displayed in Figure 1.1.

Linguists use pitch to describe tones in a purely relative sense. A particularly convenient method of describing the pitch of Mandarin tones was introduced by Chao (1930). In this system, pitch is plotted on a vertical scale that represents a speaker’s normal vocal range: the scale is divided into five points, with 1 indicating the lowest pitch and 5 the highest. Each tone can thus be described by marking its beginning and end point. For example, in the second column of Table 1.1, the pitch value of Tone 2 is transcribed as [35], meaning that it is a rising tone which begins with a pitch occurring at the mid-zone of a speaker’s pitch range and ends with a pitch occurring at the high end. The citation form of Tone 3 has a concave contour (falling-rising), so it is given an intermediate point with a tone value of [214].

In the pinyin Romanization system of Chinese, tonal marks are placed on vowels, which carry tone better than consonants. This is due to the fact that vowels are sonorants (speech sounds produced with continuous, nonturbulent airflow in the vocal tract) and possess richer harmonic structures than consonants (J. Zhang 2004).

Table 1.1 The four lexical tones of standard Mandarin Chinese in isolation

Tone           Pitch value   Register/pitch pattern   Example   Gloss
Tone 1 (T1)    55            High / Level             mā        ‘mother’
Tone 2 (T2)    35            High / Rising            má        ‘hemp’
Tone 3 (T3)    214           Low / Dipping            mǎ        ‘horse’
Tone 4 (T4)    51            High / Falling           mà        ‘scold’
Figure 1.1 (Xu 1997, 2001): Mean F0 contours of four Mandarin tones on the monosyllable /ma/ produced in isolation, plotted against normalized time (%). In this figure, T1 is indicated by H(igh), T2 by R(ising), T3 by L(ow), and T4 by F(alling). Time is normalized so that all tones are plotted with their average duration proportional to the average duration of T3.
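The relative nature of the five-point scale can be made concrete with a small sketch that converts F0 measurements into tone numerals for a given speaker. Several things here are assumptions rather than part of Chao's description: the division of the speaker's range into five equal bands on a logarithmic scale, the function names, and the two hypothetical speaker ranges and F0 values.

```python
import math


def chao_level(f0_hz, range_min_hz, range_max_hz):
    """Map one F0 value to a level from 1 (lowest) to 5 (highest).

    Assumption: the speaker's range is split into five equal bands on a
    logarithmic frequency scale. Values outside the range are clamped.
    """
    if not (0 < range_min_hz < range_max_hz) or f0_hz <= 0:
        raise ValueError("frequencies must be positive, with min < max")
    span = math.log(range_max_hz) - math.log(range_min_hz)
    position = (math.log(f0_hz) - math.log(range_min_hz)) / span
    position = min(max(position, 0.0), 1.0)
    return min(int(position * 5) + 1, 5)


def chao_numerals(f0_points_hz, range_min_hz, range_max_hz):
    """Transcribe a sequence of F0 points (onset, turning point, offset)."""
    return "".join(
        str(chao_level(f0, range_min_hz, range_max_hz)) for f0 in f0_points_hz
    )


# Hypothetical lower-voiced speaker with a 100-200 Hz speaking range.
print(chao_numerals([195, 195], 100, 200))       # level tone near the top
print(chao_numerals([150, 200], 100, 200))       # rise from mid to high
print(chao_numerals([120, 100, 160], 100, 200))  # fall, then rise
print(chao_numerals([200, 100], 100, 200))       # fall from high to low

# Hypothetical higher-voiced speaker with a 180-360 Hz speaking range:
# different hertz values, same relative transcription.
print(chao_numerals([270, 360], 180, 360))
```

Under these assumptions the two rising contours receive the same numerals even though their absolute frequencies differ, which is the sense in which the scale describes pitch relative to a speaker's own range.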
Following Goldsmith’s (1976) theory of autosegmental phonology in which segments and tones appear on separate ‘tiers,’ we assume that tones are linked with a Tone-Bearing Unit (TBU). The TBU in Mandarin is the syllable. This book follows the phonological model of tone proposed by Bao (1999), which is displayed in (1).

(1) Chinese Tonal Representation I (Bao 1999)
TBU: tone-bearing unit; T: tone root; r: register; c: contour; t: terminal tone segment

All of these features, including register and contour, are borne by a syllable. This model suggests that a contour tone behaves like a single unit since the contour node dominates the terminal tone segments (or component tones). This differs from the structure of intonational contours in nontonal languages, such as English, in which contours are composed of levels (see details in §2.1).

According to Xu (1998) and Yip (2002), underlying tone targets are usually assumed to be best approximated by the end of the syllable which bears the tone. Here ‘underlying form’ refers to the theoretical base form that a sound is thought to have before any phonological rules have been applied. It is thus assumed that T1 has a tone target of high (H), and the contour tones (T2 and T4) have two tone targets, low-high (LH) and high-low (HL) respectively. The citation form of T3 contains three tone targets, represented as [214] in Chao’s system (1930). Due to its multiple variants, further discussion regarding the underlying form and surface forms of T3 is offered in §1.2.2.1.

In general, the tones described above make up the four Mandarin Chinese tones on full syllables. However, there is another tone category, the so-called neutral tone, or qingsheng (light sound). The syllables in this small subset, consisting mainly of affixes as well as noninitial syllables of some disyllabic words, are substantially shorter in duration than syllables bearing other tones. In some cases, these syllables do not have their own tones. For example, the suffix ‑de 的 and the interrogative sentence-final particle ‑ma 吗 have no phonological representation of their own in any context. However, in some cases, the sounds are derived from syllables bearing T1–T4, but are phonetically realized as neutral tones. For example, reduplicated forms may reduplicate without their tone in some instances: mā-ma ‘mother’ and jiě-jie ‘older sister’ bear the neutral tone on their second syllables.

The tonal specification of the neutral tone is controversial (see Cheng 1973: 54–83; Yip 2002: 181–185; B. Yang 2015: 11–13). However, no matter from what tone the neutral tone may have been derived, the surface phonetic forms of the neutral tone are predictable, with the pitch of the neutral tone being determined by the tone of the syllable preceding it. Shih (1987) reports the varying phonetic surface contour of syllables bearing the neutral tone, shown in (2). Note that the ‘T3’ here is the most widely distributed variant of T3, a low level tone (see §1.2.2.1).

(2)
Preceding tone               Syllable with neutral tone
55 (T1, H, high level)       starts high, then falls
35 (T2, LH, high rise)       starts high, then falls, but not as low as after T1
21 (T3, L, low)              starts fairly low, then rises
53 (T4, HL, high fall)       starts fairly low, and falls even lower
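Because the surface form of the neutral tone is predictable from the preceding tone, the pattern in (2) amounts to a simple lookup. The following minimal sketch encodes it that way; the dictionary and function names are hypothetical, and the descriptions are copied from (2) rather than derived from any acoustic model.

```python
# Surface contour of a neutral-tone syllable, keyed by the tone of the
# preceding syllable. Descriptions follow the pattern listed in (2).
NEUTRAL_TONE_CONTOUR = {
    "T1": "starts high, then falls",
    "T2": "starts high, then falls, but not as low as after T1",
    "T3": "starts fairly low, then rises",
    "T4": "starts fairly low, and falls even lower",
}


def neutral_tone_contour(preceding_tone):
    """Return the described contour of a neutral tone after the given tone."""
    try:
        return NEUTRAL_TONE_CONTOUR[preceding_tone]
    except KeyError:
        raise ValueError(f"unknown preceding tone: {preceding_tone!r}") from None


# mā-ma 'mother': the neutral-tone syllable follows T1.
print(neutral_tone_contour("T1"))

# jiě-jie 'older sister': the neutral-tone syllable follows T3.
print(neutral_tone_contour("T3"))
```

A lookup table is enough here because, on this description, nothing about the neutral-tone syllable itself is needed to predict its contour.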
In a recent study, Chen and Xu (2006) examined three neutral tones in context and revealed a consistent pitch target which fell at the lower end of a speaker’s mid-pitch range. Since the neutral tone is not the focus of this book, it will not be discussed in further detail here.

To summarize, both phonetic and phonological knowledge of pitch changes are required in speaking a tonal language. Tones 1, 2, 3, and 4 are phonological categorizations of pitch contours in Chinese which connect pitch types to word meanings. When one intends to say, for example, ‘mother’ (T1), one should have the phonological knowledge that the sound representation of the Chinese word for ‘mother’ is /ma/ with a high-level tone (T1). That is, the phonological knowledge of a Chinese speaker tells him or her that producing this word with, say, T3, will result in miscommunication. When the brain signals the phonetic mechanism to produce the high-level tone T1, the appropriate muscles must configure the vocal folds so that they vibrate at a rate in the high part of the speaker’s pitch range. The actual pitch heights of the tone are relative to the sex and voice of each individual, and even to the mood of the moment. Although a male speaker may produce T1 at a frequency of about 130 Hz, and a female speaker may produce the same tone at a frequency of about 220 Hz, the phonological representation of each tone does not differ between male and female speakers. In either case, T1 is produced somewhere near the high end of each individual Mandarin speaker’s pitch range. No matter how variable the phonetic realization is, the underlying phonological tone targets should remain constant.

1.2 Chinese Tone Variations
While the phonological representation of tone in a tonal language is consistent across speakers and contexts, the phonetic realization or F0 contour of a tone can be affected by many factors, and especially varies in connected speech. For example, consonant voicing, aspiration, vowel height, stress, rate of speech, and various intonational events, such as narrow focus, can all affect surface pitch contours. A brief introduction of the interaction between tone and intonation in Chinese will be given in §1.3. This section focuses on the contextual variations of Chinese tones caused by adjacent tones. As tones in Chinese do not generally function individually but rather occur in connected speech, the actual pronunciation of each lexical tone often changes according to context. Thus it is very common for the phonetic realization of each tone to deviate from the tone’s canonical citation form. Generally speaking, there are two types of tonal variation that are triggered by tone context: tonal coarticulation and tone sandhi (Shen 1992; M. Chen 2000). Coarticulation effects are minor modifications of tone contours at the phonetic level. Tone sandhi, on the other hand, is a categorical change in tone – the change, that is, occurs at the phonological level. Let us take these cases of tone modification in turn. 1.2.1 Tone Coarticulation In most cases of tonal coarticulation, variations in tone occur at the phonetic level, and pitch contours are only slightly revised. The general pitch contour of the tone stays intact. For example, when T4 is followed by another T4, the first T4 does not end on a pitch as low as its citation form (Chao 1948). Because of this, the first T4 is sometimes transcribed as  rather than . However, it is still considered to belong to the T4 category. Coarticulation has often been assumed to be an automatic consequence of speech physiology (Chomsky and Halle 1968; Sweet 1877). 
Of particular interest in studies of this phenomenon are anticipatory coarticulation and carryover coarticulation. Anticipatory coarticulation is when one speech sound is influenced by subsequent sounds. Carryover coarticulation is when one speech sound is influenced by preceding sounds. Both carryover and anticipatory effects are found across many tone languages (Gandour et al. 1992, 1994; Potisuk et al. 1997 for Thai; Shen 1990; Xu 1994, 1997 for Mandarin Chinese; Han and Kim 1974; Brunelle 2003 for Vietnamese; Hyman 1993 for Engenni; Laniran and Gerfen 1997 for Igbo). Previous studies on tone coarticulation have found that, while carryover effects are mostly assimilatory, anticipatory effects tend to be of a dissimilatory nature (Gandour et al. 1994; Xu 1994, 1997; Potisuk et al. 1997; Y. Chen 2012). How this phenomenon affects L2 tone productions will form the subject of Chapter 4.

1.2.2 Tone Sandhi in Chinese

Xu (2001) argues that the realization of tonal pitch patterns in connected speech is either voluntary or, when it arises from articulatory constraints, involuntary. Most tonal coarticulation effects are assumed to be involuntary (Xu 2001). That is, the phonetic implementation of sounds is constrained by the limitations of the articulators. For example, Xu (2001:13) states that carryover effects are “a natural result of realizing a tone under the limitation of maximum speed of pitch change and maximum speed of pitch direction shift.” However, some tone variations originally caused by involuntary contextual factors may be conventionalized into tone sandhi as a language develops. In these cases, the change in pitch contour of the canonical tone shape is systematic and may be phonologicized over time. For example, Pre-T3 Sandhi in Mandarin Chinese may have originated as anticipatory dissimilation (Yip 2002). Once phonologicized, sandhi processes are assumed to be voluntary acts. Several cases of common tone sandhi processes in Mandarin Chinese are introduced below. The first subsection focuses on T3 Sandhi. The second subsection outlines other tone sandhi processes that are restricted to particular words.
1.2.2.1 The Variants of T3

As seen in Table 1.1, T3 is the only tone type in Mandarin with a low register. Because of this, it plays a critical role in the fluctuating pitch level in sentences. In terms of contour, T3 displays the most variation among the four lexical tones, as it is regularly produced in three different forms, depending on the identity of surrounding tones. We refer to the three variant pronunciations (or allophones of T3) as Full-T3, Raised-T3, and Half-T3. The three T3 variants and the environments in which they occur are displayed in Table 1.2.
Table 1.2 The three allophones of T3

Variant      Pitch value   Pitch contour       Environment of occurrence
Full-T3      [214]         Low dipping         In isolation or utterance-final position
Raised-T3    [35]          Mid rising          Preceding T3
Half-T3      [21]          Low falling/level   Preceding T1, T2, T4, neutral tone a

a Whether T3 surfaces as [21] or [35] before a neutral tone also depends on the syntactic structure of the utterance. See Cheng (1973) for further discussion.
The three variants of T3 are in complementary distribution. In most cases, T3 is realized as Half-T3 (pitch value [21]) when followed by any other tone (i.e. T1, T2, T4, or the neutral tone). The pitch contour of Half-T3 is low and mostly level, falling only very slightly.2 When followed by another T3, T3 is produced as Raised-T3, a mid-rising tone (pitch value [35]), which coincides with the pitch contour of another phoneme in Mandarin, T2. This is the well-known T3 Sandhi process. T3 is produced as a low-dipping tone (pitch value [214]) only at utterance-final positions or in isolation.

In studies of theoretical Chinese linguistics, the identity of T3’s underlying form is highly contentious. As mentioned in §1.1.2, ‘underlying form’ refers to the theoretical base form that a sound is thought to have before any phonological rules have been applied. Because a syllable bearing T3 is often produced as [214] when spoken in isolation, this form of T3 is traditionally assumed to be identical to its underlying form, or base form (Chao 1930). The low-dipping T3 is thus called the ‘full’ T3 form, while the low-level T3 is traditionally called ‘half’ T3 (Chao 1948) in most L2 Chinese literature. Under this traditional view, Full-T3 is the base form and Half-T3 is its derivative variant. This traditional ‘Full-T3’ account leads linguists to posit two T3 Sandhi rules to capture the full range of T3 variants. The ‘Pre-T3 Sandhi’ rule states that T3 becomes Raised-T3 before another T3, as mentioned above. The ‘Half-T3 Sandhi’ rule states that the rising part of Full-T3 is omitted (becoming Half-T3) when it occurs before other tones.

2 Phonetic transcription conventions may differ slightly among researchers. For example, a T3 preceding T1, T2, and T4 may be described as , , or .
(3) Tone 3 Sandhi rules under the traditional ‘Full-T3 account’
    a. Pre-T3 Sandhi:   [214] → [35] / ____ [214] or [21]
    b. Half-T3 Sandhi:  [214] → [21] / ____ T (where T ≠ [214] or [21])
However, some linguists have argued that Half-T3 [21], rather than Full-T3 [214], is the base form (Hockett 1947; Mei 1977; Yip 2002; H. Zhang 2014; C-Y. Chen 2005). Indeed, Table 1.2 shows that Half-T3 occurs before every tone other than T3 itself, and is therefore the most widely distributed variant. Furthermore, Duanmu (2000) reports that Half-T3, like Full-T3, often occurs at the end of an utterance in native Chinese. In a survey of the distribution of Full-T3 in native Chinese, Shi and Li (1997) find that only 15% of T3s in utterance-final positions are low-dipping tones (Full-T3) in recordings of CCTV news programs. In Taiwan Mandarin Chinese, Tai (1978) also finds that the citation form is produced more often as [21] than as [214]. C-Y. Chen (2005) argues that Full-T3, the low-dipping form, is a form which results from ‘emphatic stress.’ H. Zhang (2014) proposes that [21] is T3’s base form, while [214] is its intonation form.

Despite this unsettled debate, the traditional Full-T3 account is widely accepted in the field of teaching Chinese as a second language, with most L2 Chinese educators insisting that L2 learners learn the assumed base form Full-T3 before learning the derived Half-T3 form. Chapter 6 will examine the effects of this traditional assumption about T3’s base form on how this tone is taught to, and learned by, L2 students. In contrast to the standard view, I follow Yip (2002) and Duanmu (2000) throughout this book in assuming that T3’s base form is, rather, Half-T3 [21], the low-level tone. In subsequent discussions I will therefore represent the tone target of T3 with the abbreviation L; the other tones will not diverge from the representation I introduced in §1.1.2, where T1 is H, T2 is LH, and T4 is HL. Below is a simplified representation of each tone:

(4) Chinese Tonal Representation II (L: low; H: high).
    (a) T1    (b) T2    (c) T3    (d) T4
         σ         σ         σ         σ
         |        / \        |        / \
         H       L   H       L       H   L
The canonical T3 Sandhi process (i.e., Pre-T3 Sandhi) is annotated as in (5).

(5) Pre-T3 Sandhi
    a. shuǐ        ‘water’
       L           citation form (T3)
    b. shuǐ.guǒ    ‘fruit’
       L.L         citation form (T3-T3)
       LH.L        surface form (Raised-T3-T3)
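The distribution in Table 1.2 can be read as a small decision procedure. The following Python fragment is a minimal sketch of that reading, not an implementation taken from the literature: the function names, the integer coding of tones (1 to 4, with 0 assumed for the neutral tone) and the purely linear treatment of the tone string are all assumptions made for illustration. It ignores the syntactic conditioning before a neutral tone mentioned in the note to Table 1.2, and it does not model prosodic domains in longer T3 sequences.

```python
# Illustrative sketch only; names and tone coding are hypothetical.
NEUTRAL = 0  # assumed code for the neutral tone


def t3_variant(tones, i):
    """Return the Table 1.2 variant name for the T3 at position i."""
    if tones[i] != 3:
        raise ValueError("position i does not carry T3")
    if i == len(tones) - 1:
        return "Full-T3"    # in isolation or utterance-final position
    if tones[i + 1] == 3:
        return "Raised-T3"  # Pre-T3 Sandhi: T3 before another T3
    return "Half-T3"        # before T1, T2, T4 or the neutral tone


def label_tones(tones):
    """Label every syllable; non-T3 syllables keep their lexical tone."""
    labels = []
    for i, tone in enumerate(tones):
        if tone == 3:
            labels.append(t3_variant(tones, i))
        elif tone == NEUTRAL:
            labels.append("neutral")
        else:
            labels.append(f"T{tone}")
    return labels


# shuǐ.guǒ 'fruit' (T3-T3): the first syllable surfaces as Raised-T3.
print(label_tones([3, 3]))
```

Under these assumptions, a T3 followed by T4 is labelled Half-T3, and a lone T3 is labelled Full-T3, in line with the traditional description of the citation form.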
1.2.2.2 Other Tone Sandhi Processes

While T3 Sandhi applies to all T3 words, some tone sandhi processes in Chinese are restricted to a particular set of words or are only optionally applied. In this section I briefly introduce three cases of partial sandhi: 1) the ‘Yi-Bu-Qi-Ba’ Sandhi rule; 2) T2 in trisyllabic phrases; and 3) tones in reduplicated adjectives.

The ‘Yi-Bu-Qi-Ba’ Sandhi Rule

This sandhi pattern involves four frequently used words: yi ‘one,’ bu ‘not,’ qi ‘seven,’ and ba ‘eight.’ The word yī bears a tone of T1 but can be produced with T4, yì, when followed by T1, T2, or T3. Bù bears a tone of T4. Both yì and bù are produced with a rising tone, with a pitch contour identical to that of T2, when they are followed by any syllable bearing T4. Examples of these processes can be seen in (6):

(6) (a) yì tiān   [HL-H]    ‘one day’        (e) bù chī    [HL-H]    ‘not to eat’
    (b) yì nián   [HL-LH]   ‘one year’       (f) bù lái    [HL-LH]   ‘not to come’
    (c) yì diǎn   [HL-L]    ‘one o’clock’    (g) bù xiǎng  [HL-L]    ‘not to think’
    (d) yí jiàn   [LH-HL]   ‘one item’       (h) bú qù     [LH-HL]   ‘not to go’
The words qi ‘seven’ and ba ‘eight’, both underlyingly T1 words, optionally bear a tone of T2 before T4. For example:

(7) (a) qī tiān       [H-H]       ‘seven days’
    (b) qī nián       [H-LH]      ‘seven years’
    (c) qī diǎn       [H-L]       ‘seven o’clock’
    (d) qī/qí jiàn    [H/LH-HL]   ‘seven items’
    (e) bā tiān       [H-H]       ‘eight days’
    (f) bā nián       [H-LH]      ‘eight years’
    (g) bā diǎn       [H-L]       ‘eight o’clock’
    (h) bā/bá jiàn    [H/LH-HL]   ‘eight items’
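The patterns in (6) and (7) amount to a lookup on the word and the tone of the following syllable. The fragment below is a minimal sketch of that lookup, under several assumptions that are not part of the source: the function name `surface_tone`, the toneless spellings used as keys, and the `apply_optional` flag are hypothetical, the following tone is restricted to T1 through T4, and yī is treated as always surfacing with T4 before T1, T2 and T3, as in the examples in (6), although the text describes that change with ‘can’.

```python
# Illustrative sketch only; function and parameter names are hypothetical.
def surface_tone(word, next_tone, apply_optional=False):
    """Surface tone (1-4) of yi, bu, qi or ba before a syllable bearing next_tone."""
    if next_tone not in (1, 2, 3, 4):
        raise ValueError("next_tone must be 1, 2, 3 or 4")
    if word in ("yi", "bu"):
        # T2 before T4 (yí jiàn, bú qù); T4 before T1, T2, T3 (yì tiān, bù chī).
        return 2 if next_tone == 4 else 4
    if word in ("qi", "ba"):
        # Underlyingly T1; the change to T2 before T4 is optional.
        return 2 if (apply_optional and next_tone == 4) else 1
    raise ValueError(f"not a Yi-Bu-Qi-Ba word: {word!r}")


# yí jiàn 'one item' and the optional variant qí jiàn 'seven items'
print(surface_tone("yi", 4), surface_tone("qi", 4, apply_optional=True))
```

Keeping the optional change behind an explicit flag separates the rising variants of yi and bu before T4 from the rising variants of qi and ba, which speakers may or may not produce.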
T2 in Trisyllabic Phrases

In fast conversation, a syllable bearing T2 can be optionally produced as T1 when preceded by T1 or T2 and followed by any tone other than the neutral tone. For example, dōng nán fēng ‘southeast wind’ can be produced as dōng nān fēng; pú táo táng ‘glucose’ can be produced as pú tāo táng.

Tone in Reduplicated Adjectives

Monosyllabic adjectives can be reduplicated in Mandarin Chinese. The tone of the second occurrence of the adjective is optionally produced as T1, no matter what its underlying tone is. (8) shows two examples:

(8) (a) yuǎn → yuǎn yuān (de)
    (b) màn → màn mān (de)
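The two optional processes just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, tones are coded as integers 1 to 4 with 0 assumed for the neutral tone, and both functions return the variant in which the optional change has applied.

```python
# Illustrative sketch only; names and tone coding are hypothetical.
NEUTRAL = 0  # assumed code for the neutral tone


def fast_speech_trisyllable(tones):
    """Optional fast-speech variant of a three-syllable tone sequence."""
    first, middle, last = tones
    # A medial T2 may surface as T1 after T1 or T2, unless a neutral tone follows.
    if middle == 2 and first in (1, 2) and last != NEUTRAL:
        return (first, 1, last)
    return (first, middle, last)


def reduplicated_adjective(tone):
    """Optional tone pattern of a reduplicated monosyllabic adjective."""
    # The second copy may surface as T1 whatever the underlying tone is.
    return (tone, 1)


# dōng nán fēng 'southeast wind' -> dōng nān fēng; màn -> màn mān
print(fast_speech_trisyllable((1, 2, 1)), reduplicated_adjective(4))
```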
To recapitulate, the tone sandhi rules described in this subsection are applied only to particular words, in contrast to the T3 sandhi process which occurs for all words. With the exception of T3 Sandhi and T4 yì and bù becoming T2 when followed by another T4, the cases of tone sandhi introduced above are optional.

1.3 Intonation in Chinese
Tone is of primary significance in the phonology of Mandarin Chinese, but intonation is also integral to indicating the nature of phrases and sentences. Indeed, tone, stress,3 and intonation all interact with one another in Chinese (Shen 1989). Since tone and intonation share the common feature of being carried mainly by fundamental frequency (F0), the respective contributions of tone and intonation to the surface F0 contours are difficult to tease apart. As a result the analysis of surface F0 contours is greatly complicated when various sentence-level events (such as question or statement intonations, showing focus by giving prominence to certain sentence components, and so on) interact with lexical tones in connected speech. Chao (1933: 131) proposes two types of intonational additions in Mandarin Chinese: simultaneous and successive.

3 Stress is a part of Chinese prosody but it is not the focus of this book. Stress patterns in Mandarin Chinese will be briefly mentioned in Chapter 7.

Simultaneous additions are when one
of the following four features is simultaneously added to tones to form the resulting sentence melodies: (a) a general raised pitch, (b) a general lowered pitch, (c) pitch range widened, and (d) pitch range narrowed. These four features may affect either the entire utterance or only part of the intonation. Successive additions are rising and falling endings which are added after tones are mapped. In order to determine whether intonation is imposed on tones simultaneously or successively, Shen (1989) examines the interplay of lexical tones and intonations. She argues that intonation is simultaneously superimposed as a whole onto tones. She further claims that Mandarin Chinese has fewer intonation patterns than those found in nontonal languages such as English, and that the relative height in F0 at the starting point of intonation cues the distinctive features between statement and question. According to Shen (1989), it is the pitch scale rather than the pitch shape that characterizes intonation patterns. Just as in any nontonal language, intonation is superimposed on the segmental and suprasegmental elements as a whole. The addition of intonation takes place concurrently with tone mapping and syllable protrusion. As a result, intonation contributes to the final output in surface pitch contours over sentences. Focal prominence in sentences is usually marked by prosodic prominence through the use of suprasegmental features. Despite certain mechanisms that occur globally, languages vary in how they realize focal prominence (Jun 2005; Ladd 2008). In nontonal languages such as English, information structure in most cases is manifested by pitch contour changes. However, in Mandarin Chinese sentence-level focus is expressed mainly by expanding pitch range, intensity, and duration. A key point is that Chinese does not change the essential shape of a lexical tone contour to express sentence-level focus (Flemming 2008; Xu 1999). 
According to Xu (1999), to mark focal prominence in an utterance, three focus-related pitch changes should be made: expand the pitch range of the focused words, drastically suppress the pitch range of the postfocus words, and keep all other words neutral (Xu 1999: 94–95). In other words, when Chinese speakers emphasize a specific word in a Chinese sentence, they may expand the pitch range, making a high tone higher and a low tone lower, and make the focused syllable longer and louder, but in most cases will still keep the essential shape of the pitch the same, such that native ears can identify the tone. To summarize, intonation is also an integral part of Mandarin Chinese prosody. When sentence-level phonological events take place concurrently with lexical tones, the basic contours of each lexical tone will be roughly preserved.
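The three focus-related pitch changes attributed to Xu (1999) can be pictured as a rescaling of each syllable's pitch targets around a reference level. The Python fragment below is a rough sketch under loud assumptions: the function name, the representation of each syllable as a list of targets on a five-point pitch scale, the reference level of 3, and the scaling factors 1.5 and 0.5 are all invented for illustration and are not values reported in the literature. Because every target is scaled by a positive factor, the direction of each contour is preserved, which mirrors the point that focus does not change the essential shape of a lexical tone.

```python
# Illustrative sketch only; names, scale and factors are hypothetical.
def apply_focus(targets, focus_index, expand=1.5, compress=0.5, mid=3.0):
    """Rescale per-syllable pitch targets around `mid` to mark focus.

    targets: one list of pitch targets per syllable, e.g. [5, 1] for T4.
    """
    result = []
    for i, syllable in enumerate(targets):
        if i == focus_index:
            factor = expand      # focused word: pitch range expanded
        elif i > focus_index:
            factor = compress    # post-focus words: pitch range suppressed
        else:
            factor = 1.0         # pre-focus words: left neutral
        result.append([mid + factor * (pitch - mid) for pitch in syllable])
    return result


# T1, T4, T2 with focus on the second syllable.
print(apply_focus([[5, 5], [5, 1], [3, 5]], focus_index=1))
```

In this toy run the focused falling tone starts higher and ends lower than in the neutral case, while the following rising tone still rises but over a narrower range. Expanded targets may fall outside the five-point scale, which is acceptable for a sketch but would need handling in any serious model.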
1.4 The Acquisition of Mandarin Chinese Tones
The Mandarin Chinese tone system is a dynamic and comprehensive whole that requires learners to develop a mental faculty (tone phonology) that attends to and processes the meaningful distinctions of tones. Acquiring the Chinese tone system therefore requires one both to mimic the acoustic-phonetic features of Chinese tones and to learn the phonology of Chinese sounds. The acoustic-phonetic aspect of learning Chinese naturally leads to the question of whether or not a learner’s musical ability is relevant or transferable to acquiring Chinese. This section first outlines the relationship between musical abilities and tone acquisition, then introduces the current understanding of the order in which L1 speakers acquire Chinese tones. A summary of research on L2 perception of Chinese tones and a historical overview of the research on L2 tone production is then provided in the second half of this section. Further details on various aspects of L2 tone production are presented in Chapter 2.

1.4.1 Musical Ability and the Acquisition of Tones

Both music and speech make use of acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, it might be natural to assume that musical ability substantially affects the acquisition of a tone system. Some have even argued that tone-deaf people are not able to acquire a tone language. Past research on this question bears some discussion. First of all, speech and music do not depend on the same underlying cognitive and neural mechanisms. Thus inability to hear and reproduce relative pitch, and other musical abilities, should not prevent a person from learning a tonal language as his or her native language. A. Chen (2013) investigates possible innate perceptual biases that may shape Mandarin T3 Sandhi and examines the cross-domain perception of pitch from infancy to adulthood.
A positive significant correlation between musical pitch processing and Mandarin lexical tone discrimination is observed among Dutch adult listeners but not among Mandarin adult listeners. This indicates that linguistic ability is separate from one’s musical abilities. In addition, young Dutch infants (four and six months old) showed facility in processing musical pitch but continued to show difficulties in discriminating between Mandarin T2 and T3. This finding further suggests that in early infancy musical and speech processing develop separately.
Behavioral and neuroimaging research shows that the left hemisphere is more adept at phonemic processing (phonemes, syllables, words, etc.) than the right hemisphere, while the right hemisphere is better at melodic and prosodic processing (music, pitch contours, affective prosody, etc.) than the left hemisphere (see the review in Jongman et al. 2006). That is, the left hemisphere is more sensitive to linguistic stimuli than the right hemisphere. For example, neuroimaging studies such as Gandour et al. (2000) find that, for speakers of tonal languages, the left prefrontal cortex is involved in the processing of lexical tone as a linguistic property when the tones presented have linguistic relevance, whereas the right prefrontal cortex is involved in pitch judgment tasks. Wang et al. (2001) confirm this finding in a task in which listeners were asked to identify dichotically presented tone pairs. Results reveal that lexical tones are predominantly processed in the left hemisphere by native Mandarin speakers, and that left hemisphere dominance arose from the intrinsic linguistic significance of F0 modulations. On the other hand, results of PET (positron emission tomography) studies reveal that, for speakers of a nontonal language such as English, the processing of Mandarin tone lies in the homologous right hemisphere frontal regions (similar to the way pitch is processed as a nonlinguistic stimulus). However, further neuroimaging research suggests that cortical involvement in tone processing can change in low-proficiency learners as their proficiency improves. Wang et al. (2003) examine the acquisition of Mandarin Chinese tone contrasts in beginning adult learners by comparing cortical activation during a tone-identification task before and after a two-week training procedure.
Using fMRI (functional magnetic resonance imaging), they found that improvements in performance are associated with increased activity in Wernicke’s area, which is located in the left superior temporal gyrus (Brodmann’s Area 22), and in adjacent regions also located in the left hemisphere. As mentioned above, insufficient musical ability does not prevent a person from learning a tonal language as one’s native language. However, musical experience may facilitate the learning of a new tonal language by adult learners (Wong and Perrachione 2007; Cooper and Wang 2012, 2013 for Cantonese tone-word learning; A. Chen 2013). In a study investigating the possible relationship between musical experience and L2 lexical tone perception, Wong and Perrachione (2007) trained naïve English-speaking subjects to identify English pseudo words superimposed with three pitch patterns resembling three Chinese lexical tones (level T1, rising T2, and falling T4). It was found that learners were capable of learning to use pitch patterns for lexical identification, with the most successful learners tending to have past musical experience.
1.4.2 The First Language Acquisition of Chinese Tones

Although there seems to be almost no work addressing tonal perception in infants acquiring Mandarin Chinese, limited evidence collected from studies on other languages indicates that very young infants, certainly by six months but likely as early as one month, can perceive pitch differences (Harrison 1999, 2000). Studies of first language acquisition reveal that children acquire Mandarin Chinese tones at a very early age, long before they master the inventory of segmental sounds (vowels and consonants) (Chao 1951; Li and Thompson 1977). The pioneering study on infant tonal acquisition was published in the mid-twentieth century (Chao 1951) and was based on only one subject, the author’s granddaughter Canta in the first twenty-eight months of her life. It was not until the 1970s that formal studies on the L1 acquisition of Mandarin Chinese involving multiple subjects were done. Li and Thompson (1977) looked at seventeen Mandarin-speaking children ranging from one-and-a-half years to three years of age. They found that at the earliest one-word stage, high-level tones (T1) were produced first, followed by high-falling tones (T4). Rising (T2) and dipping tones (T3) were produced later, and syllables with such tones were either avoided or changed to T1 or T4. When these last two tones were acquired, they were initially often confused with one another, and this confusion continued into the two- to three-word stage. Clumech (1980) studied two Mandarin-speaking children and confirms this order of acquisition. The accuracy rates are shown in Table 1.3 for each tone. Both Li and Thompson (1977) and Clumech (1980) agree that children have more or less mastered the tones at a stage when segments are still quite far from adult forms. For example, one of Li and Thompson’s later-stage subjects said [yaba day dəyi] for [laba džaydzəli] ‘the horn is here,’ but produced the tones with complete accuracy [21 55 41 41 214].

Table 1.3 Accuracy rate of Mandarin-speaking children (Clumech 1980)

Tone                 Accuracy rate (%)
High level (T1)      97.2
High falling (T4)    95.8
High rising (T2)     61.3
Low dipping (T3)     73.9

More recently, Zhu and Dodd
(2000) confirm the above findings, describing the phonological acquisition of 129 monolingual Mandarin Chinese-speaking children, aged one-and-a-half to four-and-a-half years of age. Children’s errors suggest that Mandarin Chinese-speaking children master four elements of Mandarin syllables in this order: (1) tones; (2) syllable-initial consonants; (3) vowels; and (4) syllable-final consonants. Once a child starts to produce multi-word utterances, the possibility of tonal alternations arises. Thus, the mastery of tone sandhi is also part of the task of tonal learning. Even less is known about this than about the acquisition of lexical contrasts. As Yue-Hashimoto (1980) reports, her subject correctly applied the Mandarin T3 Sandhi rule, which changes a low tone to a high rise before another low tone, starting at two years and three months of age. The tone sandhi phenomena associated with the dipping tone (Tone 3) in Mandarin Chinese are acquired with minor error once propositional utterances begin to be created. Li and Thompson (1977), for their part, report that the three-year-olds in their study did apply the Pre-T3 Sandhi rule, but erratically and with hesitation. Unfortunately, their study did not continue past the age of three, so it is unknown when the rule began to be applied consistently and correctly.

1.4.3 Tone Perception by Adult Learners

Let us now move on to findings of adult learners’ perception of Chinese tones from previous studies. Several studies comparing native Chinese and nonnative listeners’ perception of lexical tones indicate that previous tonal experience plays a positive role in the identification of tone categories (Leather 1987; Stagray and Downs 1993). The studies also show normalization effects in the disambiguation of phonemic contrast (Jongman and Moore 2000).
Bent (2005) finds that English listeners exhibit overall lower and more variable sensitivity to Mandarin tone contrasts than Mandarin listeners, with English listeners mostly attending to global aspects of the stimuli and Mandarin listeners mostly attending to lexical tone targets. Divergent findings make it unclear whether having a tonal language other than Chinese as an L1 facilitates or hinders the L2 perception of Chinese tones. Gandour (1983) finds that English listeners attach more importance to the height and less to the contour dimension of F0 compared to listeners who speak a tonal L1, such as Mandarin, Cantonese, Taiwanese, or Thai. Lee et al. (1996) find that Cantonese listeners perceive Mandarin tones better than English listeners do, suggesting that a tonal L1 may aid in discriminating among L2 tones if the L1 tonal system is more complex than that of the L2. However, when faced with easily-confused tones, such as the Mandarin
rising T2 and low-dipping T3, learners who have tonal experience do not seem to exhibit any advantage over speakers of nontonal languages. For example, Hao (2012) finds that the Cantonese group did not perform significantly better than the English group in perceiving and producing Mandarin tones, with both groups exhibiting significant difficulty in distinguishing T2 and T3. Several studies even reveal that speaking a tonal L1 negatively affects the perception of L2 tones. For example, X. Wang (2006) tested speakers of Hmong (a tonal language), Japanese, and English on how well they could identify Mandarin Chinese tones. The study found that English and Japanese speakers performed equally well, while the Hmong speakers performed significantly worse than the other two groups. The author concludes that if L1 tones do not map exactly onto L2 tones, this lack of alignment may interfere with the acquisition of non-native tones, especially at the initial stage of learning. So (2006) finds that the Cantonese tonal system hinders the learning of Mandarin tones, while the Japanese pitch-accent system facilitates the establishment of a new tonal system. The English stress-accent system neither helps nor hinders tone learning. On the link between the L2 perception and production of tones, it is acknowledged that while perception and production are related to one another, they do not appear to have a straightforward relationship. Although it is usually assumed that perception precedes production (Leather 1990; Flege 1993), B. Yang (2012, 2015) argues that, in the case of English speakers learning Chinese, L2 learners’ production exceeds their perception. She also finds that suprasegmental and segmental categories impact tone perception and production in different ways. Perception is influenced by tone categories and syllable-level categories, while tone production errors are independent of tone categories. Therefore B. 
Yang (2015) suggests that “tones are perceived with relative systematicity, whereas tone production is not.” By examining tone-bearing segments, she also concludes that tones are perceived at the phonological level, but are produced at the phonetic level. This may be because register plays an important role in English-speaking learners’ perception of Chinese tones while contour has a bigger impact on tone productions.

1.4.4 The Second Language Acquisition of Mandarin Tones: A Historical Perspective4

4 For further information, see H. Zhang (2018) about “Current Trends in Research of Chinese Sound Acquisition.”

It is the L2 production, rather than the perception, of Mandarin Chinese tones that forms the principal concern of this book. To lay the groundwork for subsequent chapters, I first give a historical overview of the research on
L2 production of Chinese tones. I provide further details on various aspects of L2 tone production studies in Chapter 2, which will also present three puzzles relating to L2 tone production. L2 Chinese phonological research since its inception has been concerned mainly with the acquisition of Chinese lexical tones. The amount of work done on the L2 acquisition of the Chinese sound system is relatively small compared to that done on its morphosyntax, and most of it is in the form of articles scattered in various academic journals. For example, Casas-Tost and Rovira-Esteva (2015) report that of the 745 articles published between 1966 and 2013 in the Journal of Chinese Language Teachers Association (JCLTA),5 a major journal on second-language Chinese, only 12% are phonology studies. Of these, about 62% are concerned with L2 Chinese tones, and mostly focus on the production of tones, especially Tone 3. L2 Chinese phonetic and phonological research has benefited from the ongoing development of general Second Language Acquisition (SLA) research and phonological theories. It followed the shift in SLA studies away from Contrastive Analysis (Lado 1957), and moved toward using Error Analysis (Corder 1967) to examine non-native sound patterns more closely. Research has also turned toward the recognition of interlanguages (Selinker 1972), or linguistic systems thought to have been developed by L2 learners. The notion of interlanguages was first proposed in the late 1960s and early 1970s by researchers such as Corder (1967), Nemser (1971), Selinker (1972), and Adjemian (1976) (as cited in White 2003). One of the most interesting features of this system is that it preserves some features of the learners’ L1 while creating innovations in speaking or writing the L2. In this book I utilize the theory of interlanguages and use the term ‘interlanguage grammar’ to refer to the grammar of a target language invented by non-native learners. 
5 Chinese as a Second Language (CSL) is the continuation of the Journal of the Chinese Language Teachers Association (JCLTA), USA.

In line with other L2 Chinese studies, the study of L2 Chinese sounds has shown a clear evolution from descriptive to more data-based studies. Recent work features several improvements over the first wave of research produced in the early 1970s: it is based on learners from a wider range of linguistic backgrounds than before, it offers a more diverse set of research topics, and the studies are set in more sophisticated theoretical frameworks. The evolution in the research is also reflected in researchers’ changing views of the sources of tonal errors. Tonal errors made by L2 learners were notorious for being ‘wild’ in the linguistic literature, with a large number of mistakes appearing to be randomly
produced. However, research over the years has shown that many of these mis-productions are in fact somewhat systematic, and many findings support the idea that tonal errors originate from interferences created by the learner’s mother tongue (referred to as ‘L1 transfer’). Most of these studies focus on data which show, in native English-speakers learning Chinese, the transfer of salient English prosodic structures, such as narrow pitch ranges, lexical-level stress patterns, and utterance-level prosody. For example, G-T. Chen (1974) reports that his Chinese informants used an average pitch range that was 258% greater for words and 154% greater for sentences when compared to English speakers. From this, he concluded that one source of error in tone production should be attributed to learners failing to widen their pitch range when speaking Chinese. A large number of studies have found that tone errors made by L2 learners, mostly English-speaking learners, can be traced back to the phonetics of English intonation (White 1981; Broselow et al. 1987; and Q-H. Chen 2000, among others). For example, White (1981) finds that English speakers use intonation to differentiate sentence types and express emotion and attitude when speaking Chinese. Studies concerning positional effects in L2 tones find that T4 is easier to perceive or produce at utterance-final positions, most likely because utterance-final T4 is phonetically similar to the falling English sentence intonation at the end of a phrase or a sentence (Broselow et al. 1987; H. Zhang 2015). Q-H. Chen (2000) reveals that English-speaking learners of Mandarin Chinese regularly produce ‘High-Low’ pitch patterns when learning Chinese tones, which is likely also transferred from English stress patterns and intonation. Although L1 transfer is still considered an important factor affecting L2 Chinese, in the late 1980s researchers began to look closely at the role of the L2 as well. 
Before that, in the early 1980s, Chinese linguists did intensive research on tone feature geometry, such as Yip (1980). This work led to the proposal that lexical tones be described in two primary ways: register and contour. Recall from my discussion in §1.1.2 that register refers to a tone’s pitch height (high, low) while contour refers to pitch movement (rising, falling, dipping, etc.). This proposal was influential in L2 tone studies. For example, Shen (1989) and Miracle (1989) analyze subjects’ tonal errors not as a whole, but rather focus on component errors, viewing mistakes as occurring in the components of shape and/or register. Bent (2005) extends the component approach by putting tone errors into three categories: (1) pitch range errors, which occur when a learner’s pitch range is smaller than that of native Mandarin speakers (Leather 1990); (2) pitch register errors, which occur when tone targets in a learner’s production fall short of both high and low targets (Q-H. Chen 1997; Leather
1990; Miracle 1989; Wang et al. 2003); and (3) tone contour errors, which occur when the wrong contour is produced in the form of incorrect pitch contour direction or when a static tone is substituted for a dynamic one (Wang et al. 2003). Other studies have created complex maps of the differences between L1 and L2 linguistic systems to describe the various causes of production and perception errors made by L2 learners (Leather 1990; Elliot 1991; Hao 2012; and X. Wang 2006, among others). In the past decade, another line of L2 Chinese phonological research has arisen: tonal error research which shifts the focus from L1 transfer and L2 difficulties to other sources of error such as phonological universals and pedagogical issues. This new perspective has given rise to further explanations for some longstanding issues. For example, researchers have noticed that some L2 Chinese tonal error patterns appear neither to be due to features of the target language alone, nor derived directly from the grammar of the learners’ native language patterns. Rather, such error patterns often reveal universally preferred structures (Broselow et al. 1998; Major 2001; H. Zhang 2007, 2010, 2013, etc.), which indicates that L2 learners construct mental grammars that are constrained by general and independently motivated principles. These principles are usually attested in the phonologies of other natural languages of the world. Furthermore, some specific errors, such as those that occur on Tone 3, the most problematic tone in L2 Chinese, are found to be caused by unnecessary computational burdens placed on learners. These burdens are viewed as being rooted in the current mainstream teaching methodology of tones. This book consolidates the newer research that investigates L2 tone grammars from perspectives other than direct L1 and L2 contrasts. 
The three core chapters, Chapters 4–6, present original research that examines how factors other than L1 features affect L2 tone grammars. This book features two innovations. First, it incorporates cross-linguistic studies. Up to now the focus of most studies on L2 Chinese tones has been on English speakers. In contrast, the majority of the studies in this book draw on data collected from learners of Chinese from diverse L1 backgrounds: Japanese and Korean in addition to English. These three languages represent three different types of nontonal languages according to the characteristics of word prosody (Jun 2005): stress-accent languages (such as English), lexical-pitch-accent languages (such as Japanese), and non-stress and non-lexical-pitch-accent languages (such as Korean). Additionally, speakers of these three languages make up the majority of Chinese language learners worldwide (Hu 2008). Only by utilizing a cross-linguistic approach such as this can we successfully examine
the prosody of interlanguage tonal grammars, and ground the analysis theoretically from the perspective of phonological universals. Second, Chapters 4–6 use different working models to present research on various aspects of L2 acquisition of Mandarin tones. Chapter 5 presents quantitative research in which Optimality Theory (OT, Prince and Smolensky 1993) serves to model the unconscious tonal knowledge that underlies L2 tonal productions. OT is one of the most influential frameworks developed in linguistics in the past twenty-five years and has generated a considerable amount of insightful analysis on a variety of issues (McCarthy and Prince 1993, 1995; Kager 1999; Zhang and Yin 2012). This constraint-based framework is especially intriguing for researchers in language acquisition studies because of its proposal on how language is represented in the mind, as well as how language develops over time (Barlow and Gierut 1999; Hancin-Bhatt 2000). Some L2 phonologists believe that OT represents a promising direction for future research. OT provides a more explicit account of the interaction between L1 transfer and developmental effects in L2 phonology (Hancin-Bhatt 1997, 2000, 2008; Eckman 2004) which can further our understanding of L2 learners’ access to Universal Grammar (Chomsky 1965, 1981). Despite the increased interest in adopting OT to explain L2 phenomena, the few L2 phonological studies that have utilized OT have had a narrow focus, discussing only the acquisition of syllable structures (e.g. Hancin-Bhatt and Bhatt 1997; Broselow et al. 1998; Hayes 1999; Hancin-Bhatt 2000; Broselow 2004; Shepherd 2003; Cardoso 2011). The research in Chapter 5 is one of very few works that explore L2 studies of suprasegmentals, in particular, L2 tonal grammar within the OT framework.
1.5 Organization of this Book
Following the present chapter, Chapter 2 provides background to L2 Chinese tone studies, including (1) a description of the prosodic structures of learners’ first languages (English, Japanese, and Korean), and (2) a literature review focusing on three aspects of L2 tone studies and highlighting three puzzles that previous research on L2 acquisition of Chinese tones has not yet resolved. Accordingly, following a chapter that introduces methodology (Chapter 3), the book’s three core chapters (Chapters 4–6) take up these three puzzles in detail. The studies presented in the three core chapters are relatively independent of one another, as they focus on different aspects of L2 tones; however, they are also related in that they draw upon the same set of data obtained from a series of phonological experiments.
Both Chapters 4 and 5 analyze dissimilation phenomena in L2 Chinese tones, but focus on different issues and work within two different frameworks. The goal of Chapter 4 is to present evidence showing that a cross-linguistically common phonetic mechanism, anticipatory dissimilation, causes specific error patterns in the non-native tone production of T2 and T4 by adult learners of Chinese. This cross-linguistic study of L2 Chinese tones finds that anticipatory tone coarticulation affects the intelligibility of the phonological identities of L2 tones. Since error patterns regarding the acquisition order of Mandarin tones cannot be directly derived from L1s, nor learned from L2, the study presented in Chapter 5 tests for evidence of the operation of two independent phonological principles, namely the Tonal Markedness Scale (TMS) and the Obligatory Contour Principle (OCP), in non-native tone production. In order to better explain how these two phonological constraints jointly play a role in shaping L2 tonal grammars, I conduct the research within the OT framework. As this theoretical framework is somewhat new to the research of L2 Chinese tones, some key OT concepts as they apply to L2 acquisition studies, and the OT treatment of tones, are explained at the beginning of the chapter. This chapter seeks to answer the following questions:
1. How do interlanguage phonologies reflect language typologies?
2. What do learners usually do when their first language prosodic grammar cannot accommodate the target language input?
3. How do we model the second language Chinese tones using the OT grammar?
Chapter 5 then models, applying a set of ranked constraints, the unconscious tonal knowledge of L2 learners that underlies L2 tonal productions. A four-stage path of OCP sub-constraint re-ranking is proposed to account for the error patterns found in the phonological experiment. While Chapters 4 and 5 explore how independent phonetic mechanisms or phonological principles constrain L2 Chinese tone productions, Chapter 6 presents research on non-native T3 production. In it I argue that, other than L1, L2, and ‘universal’ phonological and phonetic constraints, there also exist artifacts created from current teaching methodologies. Mastering T3 is one of the most challenging undertakings for adult learners. Chapter 6 examines results from two experiments on the acquisition of three allophones of T3, with the first involving learners from different L1 backgrounds and the other involving learners at various proficiency levels of Chinese. By looking into L2 learners’
performance with each of the three variants of T3, Chapter 6 seeks to deepen insight into the difficulties surrounding T3 by arguing that mainstream teaching methodology is a source of learner error, in contrast to the view that the tone contains an inherent difficulty. The book concludes with a review of current materials and methods in the teaching of Chinese tones (Chapter 7), and offers practical suggestions for improving the teaching and learning of tones. This review includes (1) some ‘classic’ Chinese language textbooks, such as Dr. Yuenren Chao’s Mandarin Primer, and other popular textbooks currently used in the United States, P.R. China, and elsewhere; and (2) reference books that introduce the sound system of Mandarin and advise L2 learners on how to achieve a native-like tone system. By reviewing these teaching materials and pointing out current issues, the chapter highlights pedagogical implications of the research presented in this book and offers sample teaching materials for the teaching and learning of Chinese tones.
Three Puzzles in Mandarin L2 Tone Acquisition
This chapter presents three phenomena in Mandarin L2 tone acquisition that cannot be fully explained by L1 transfer and have not yet been explained by past L2 studies. Before presenting these puzzles, I will first discuss one of the main causes of L1 transfer in the second language acquisition of tones: L1 prosody. The first section in this chapter introduces the prosodic characteristics of the three first languages involved in this book: American English, Tokyo Japanese, and Seoul Korean (hereafter English, Japanese, and Korean). Reviewing the prosodic structures of these three languages will help make clear how some interlanguage tone error patterns cannot be explained solely by L1 transfer. Section 2.2 presents the three puzzles themselves: (1) positional effects of contour tones, (2) the order of acquisition of Mandarin tones, and (3) the paradox of T3.
2.1 Prosodic Structures of English, Japanese, and Korean
This section is not intended to provide a detailed and complete description of the prosodic phonology of each language, but rather will focus on aspects of prosody that relate to Mandarin Chinese tone acquisition. I discuss the intonational structures in each language from a cross-linguistic perspective in order to highlight and compare similarities. Pitch is significant in all languages, but its role differs from one to another. Pitch patterns may be specified either at the lexical level or at the phrasal/ sentential level, or at both the lexical and phrasal/sentential levels, resulting in more or less dense tonal specifications. As discussed in the first chapter, pitch is used mainly to specify lexical meaning in the tone language Mandarin Chinese, although tonal languages also have intonation structures. By contrast, in nontonal languages like English, Japanese, and Korean, pitch features are mainly post-lexical or intonational. The question of whether pitch contours that arise from intonation in nontonal languages should be linguistically represented as contours (e.g. rising and falling) or as a series of even pitches (e.g. high and low) is highly debated in the linguistic literature. Many linguists have treated intonation contours as gestalts (i.e., the contour functions as one unit) (see Bolinger 1951; Jones 1972; Cooper and Sorensen 1981; Hirst and Di Cristo 1998). These researchers claim
© koninklijke brill nv, leiden, 2018 | doi 10.1163/9789004364790_003
that intonational contours should be seen as holistic entities that directly reflect certain functional or structural aspects of speech, such as the distinction between statements and questions. However, it has been argued that the gestalt approach cannot fully account for intonational meaning, and that a series of even pitches is a better representation of intonation (Arvaniti 2011: 774–75). Studies such as Goldsmith (1976) have supported this view by analyzing contour tones in this way; those specifically devoted to English intonational contours include studies by Pike (1948), Liberman (1975), and Pierrehumbert (1980). In all of these analyses, the phonetic pitch contours in English intonation represent the surface realization of an underlying pattern of even pitch, with pitch changes merely acting as necessary transitions from one even pitch to the next. In this book, I assume that contours in nontonal languages including English, Japanese and Korean should be broken down into even tones (or tone features). From the perspective of word prosody, English, Japanese, and Korean each represents a different type of nontonal language: English is an example of a stress language, Japanese a pitch-accent language, and Korean a non-stress non-pitch-accent language. In what follows I will devote some attention to prosodic structures that occur at the above-word levels (mainly in the form of phrase-level prosodies such as focal prominence marking), but it is on word-level prosodic structures relevant to L2 tone acquisition that I will particularly focus.
2.1.1 English Prosodic Structure
According to Nespor and Vogel (1986) and Selkirk (1984), English has the following prosodic constituents: the Intonation Phrase (IP), the Intermediate Phrase (ip, also known as the ‘Phonological Phrase’), the Clitic Group or prosodic word, the foot, the syllable, and the mora.
In typical intonational languages like English and many other European languages, global pitch contour is mainly specified at the post-lexical level through a complex interplay between metrical structure, prosodic phrasing, syntax, and pragmatics. At the word level, accents exist in English and are phonetically manifested by ‘stress,’ a cluster of phonetic properties that includes increased intensity and duration as well as various spectral correlations such as the adjustment of pitch shapes that are not specified by fixed pitch types. Because of this, English is called a ‘stress-accent language.’ Liberman’s notion of metrical phonology suggests that linguistic prominence consists of a binary relation between strong and weak nodes in a branching tree structure (Liberman 1975; Liberman and Prince 1977). There are two kinds of stress patterns in English: weak-strong (iambic) and strong-weak (trochaic). According to Delattre (1965),
most English words (74%) are trochaic. Figure 2.1 illustrates the two kinds of stress patterns (Ladd 2008: 55).
Figure 2.1 Metrical grid for a bitonal metrical foot with one stressed syllable. [Figure not reproduced: the two trees contrast permit (verb), weak-strong, with permit (noun), strong-weak.]
In English, major pitch movements often accompany stressed syllables. That is, syllables that are metrically strong at the word level (known as ‘primary-stress’ syllables) often serve as docking sites to which phrasal-level pitch accent is associated post-lexically. Therefore, the pitch accent of sentence focus in the intonation contour must coincide with the head of a word-level stress foot. Because of the association between syllable stress and sentence-level pitch accent, English is called a ‘head-prominent’ language when it comes to sentence-level prominence marking. Head-prominent languages such as English, German, and Greek mark prominence through the use of pitch accent (i.e., salient pitch movement on the stressed syllable). The focused word receives a nuclear pitch accent while the following word(s) is ‘de-accented’ in what is known as post-focal prosodic subordination. The pitch peak on the stressed syllable of the focused word is usually immediately followed by a phrasal low tone.
2.1.2 Japanese Prosodic Structure
Following Venditti’s (2005) description of Tokyo Japanese prosodic structures, I assume the following prosodic constituents in Japanese: the Intonation Phrase (IP), the Accentual Phrase (AP) which is analogous to the general Phonological Phrase (PhP), the prosodic word, the foot, the syllable, and the mora. Japanese is called a ‘pitch-accent language.’ In pitch-accent languages, such as Japanese, Swedish, and Serbian, pitch features operate on both lexical and phrasal/sentential levels. These languages have features of both tonal and intonational languages in that pitch is to some degree lexically associated. At the lexical level, Japanese words can be accented or unaccented, with accented
words making up about half of the vocabulary. The placement of the accent is to some degree lexically contrastive. For example, the sequence hashi spoken in isolation can be accented in two ways: either háshi (accented on the first syllable, meaning ‘chopsticks’) or hashí (flat or accented on the second syllable, meaning either ‘edge’ or ‘bridge’).1 Unlike tonal languages, Japanese has only a certain number of these minimal pairs that differ in accent placement. Also unlike tonal languages, words in lexical pitch-accent languages have at most one syllable which is lexically specified (Arvaniti 2011) and there is only one fixed tone type associated with accented syllables. In Japanese, this manifests as a sharp drop from a high to a low pitch (Beckman and Pierrehumbert 1986). The Japanese lexical pitch accent is thus annotated as H*L. Both English and Japanese have accent at the lexical level, but their phonetic realizations of lexical accent differ. Japanese lexical pitch accent is mainly achieved through changes in pitch movement (hence ‘pitch-accent language’), while English stress is achieved through increased intensity and duration (hence ‘stress language’) (Beckman 1986). Japanese is a mora-timed language, meaning that the tone-bearing unit (TBU) is the mora. However, it is the syllable that bears the accent (Kubozono 1999). In contrast, English is a stress-timed language and the TBU is usually assumed to be the syllable (Gussenhoven 2004; Jun 2005). At the post-lexical level, the focus structure of a sentence in Japanese requires a particular intermediate phrasing pattern, the Accentual Phrase (AP), in which every focus occurs on the leftmost constituent of the phrase. While in English post-lexical prominence is essentially realized “culminatively by marking the head of a prosodic unit,” focal prominence in Japanese is realized “demarcatively by marking the edge of a prosodic unit” (Jun 2005: 440–41). 
Focal prominence in Japanese is usually expressed by inserting a prosodic boundary before and/or after the focused word (Venditti 2005; Venditti et al. 2008).
1 For more information about Japanese pitch accent and alternative accounts, see Kubozono (2011).
2.1.3 Korean Prosodic Structure
In languages such as Korean, pitch patterns do not distinguish lexical meanings, so they are not tonal languages. At the same time, according to Jun (1993, 2005), neither stress nor pitch accent exists at the word level, making Korean different from other nontonal languages like English and Japanese. Research in Jun (1996, 1998) and Lim (2001) suggests that the fundamental frequency peaks and valleys in Korean intonation are not linked to any specific syllable of a word, but are instead linked to a certain location in an accentual phrase (AP). That is, the pitch modulation of an utterance in Korean is a property of the sentence rather than the word. Jun (1990, 1993, 1998) developed a Korean Prosodic Model based on pitch contours, which was in turn based on Japanese prosodic models developed by Beckman and Pierrehumbert (1986) and Pierrehumbert and Beckman (1988). According to Jun’s model (1993, 1998), the delimiting of prosodic units is primarily determined by pitch contours or tonal patterns. There are at least two prosodic levels marked by intonation in Korean: the Intonational Phrase (IP) and the Accentual Phrase (AP, which is analogous to the general PhP). The typical tonal pattern of the Korean AP is LHLH or HHLH, where the AP-initial tone is determined by the laryngeal feature of the phrase-initial segment. When the segment is either aspirated or tense, the AP begins with an H tone (Halle and Stevens 1971; Jun 1998). Elsewhere, it begins with an L tone.
2.2 Puzzles Surrounding the L2 Acquisition of Tones
In this section I review major findings on the following three topics: the positional effects of L2 tones, the order of acquisition of Mandarin Chinese tones, and the acquisition of T3. It will become apparent from this review that there are three phenomena associated with L2 tones that cannot be explained by L1 transfer effects. These three phenomena will therefore serve as the ‘puzzles’ that I will attempt to resolve in this book.
2.2.1 Puzzle 1: Positional Effects of Contour Tones
Previous studies such as Miracle (1989), Broselow et al. (1987), and Sun (1998) have found evidence suggesting that position within a polysyllabic word affects the perception and production of L2 tones. H. Zhang (2015) proposes two types of positional effects. One concerns ‘absolute’ positions. For example, the same tone may be produced differently depending on whether it is word-initial, word-medial, or word-final. The other positional effect concerns ‘relative’ positions, or how the tones affect each other in neighboring positions (‘intertonal effects’). As we shall see, the intertonal effects of two contour tones, the rising T2 (LH) and the falling T4 (HL), present a particular problem for L2 learners of Chinese. Sun (1998) found that subjects perceive and produce the tones of monosyllabic words and the final syllables of polysyllabic words with greater accuracy than tones of nonfinal syllables. This observation was especially true for T1 and T4. Similar results were found in perception studies with English-speaking
participants who had no prior exposure to Chinese (Broselow et al. 1987). Broselow et al. (1987) found that the falling tone (T4) is significantly easier to perceive when it occurs at the end of an utterance. This positional effect is likely due to the fact that an utterance-final T4 is phonetically similar to the English pitch contour, which features falling intonation at the end of a phrase. This positive L1 transfer of T4 perception seems to hold true for L2 tone productions. In a crosslinguistic study, H. Zhang (2015) analyzed the performance of non-native tones in varying prosodic units. In general, T2 is produced better at word- and phrase-initial positions, but T4 is produced better at word- and phrase-final positions. However, these general trends do not hold true in all environments, and one issue was left without an explanation. Participants in H. Zhang (2015) seem to exhibit two intertonal effects. First, although in general, word-initial T2s are produced significantly better than word-final T2s, word-initial T2s preceding T1 (and T4) are produced poorly. Second, although in general word-initial T4s are produced with a greater rate of error than word-final T4s, word-initial T4s are produced remarkably better when followed by T3 than when followed by other tones. Although these two findings are largely overshadowed by general T2-T4 positional patterns found in previous studies, they are no less interesting. Similar error patterns involving T2 at word-initial positions were also found in other studies such as Y-J. Wang (1995, 1997). Considering that the affected contour tones (T2 and T4) are located at word-initial positions, it is possible that tones at word-final positions affect word-initial contours. I explore this issue with a quantitative study in Chapter 4.
Since both T1 and T4 have high onsets while T2 and T3 have low onsets, I hypothesize that a crosslinguistically common phonetic mechanism, ‘anticipatory coarticulation,’ restricts the realization of contours at word-initial positions in L2 tones (Xu 1997; Y-J. Wang 1997). This tone coarticulation mechanism (see detailed discussion in Xu 1997) has been found in studies of native Mandarin Chinese, Thai, Vietnamese, etc., but the idea that this affects L2 tone phonologies has not yet been explored. Chapter 4 tests the hypothesis that this mechanism, responsible for fine-grained phonetic variation in natural language, is also the source of tone errors in L2 learners.
2.2.2 Puzzle 2: Two Issues in L2 Studies on the Acquisition Order of Chinese Tones
The temporal acquisition order of the four Chinese lexical tones is usually determined by comparing error and/or substitution rates in perception and production studies. There are two issues associated with the acquisition order of tones: the inherent difficulty of individual tones, and the inherent difficulty of bitonal sequences consisting of identical tones.
Studies that focused on the acquisition order of individual Mandarin tones by adult learners have reported different findings (Kiriloff 1969; G-T. Chen 1974; Zhao 1988; Miracle 1989; Shen 1989; Leather 1990; Elliot 1991; Lu 1992; Guo 1993; McGinnis 1996; Q-H. Chen 1997; Sun 1998; H. Zhang 2010). The lack of consensus is evident from a summation given by Sun (1998): the acquisition order could either be in the sequence T1 *FALL, *LEVEL.
(5) Tableau for a partial grammar which does not allow rising tones, with the sample input form /LH/
Input: /LH/
Candidates: LH; ☞ HL; ☞ HH
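As an illustration only, the evaluation behind a tableau such as (5) can be sketched in Python. The sketch assumes the ranking *RISE >> FAITH >> *FALL, *LEVEL, a simplified FAITH that counts any change to the input as one violation, and a pooled count of violations for constraints that are not ranked with respect to each other. The function names `profile` and `evaluate` are hypothetical and are not part of the analysis itself.

```python
# Illustrative sketch of OT evaluation for a grammar that bans rising tones.
# The violation profiles and the pooling of unranked constraints are
# simplifying assumptions made for this example.

def profile(inp, cand):
    """Return hypothetical violation counts for one output candidate."""
    return {
        "*RISE": int(cand == "LH"),           # candidate is a rising contour
        "*FALL": int(cand == "HL"),           # candidate is a falling contour
        "*LEVEL": int(cand in ("HH", "LL")),  # candidate is a level tone
        "FAITH": int(cand != inp),            # candidate differs from the input
    }

def evaluate(inp, candidates, strata):
    """Filter candidates stratum by stratum, highest-ranked stratum first.

    `strata` is a list of lists; constraints inside one inner list are
    treated as unranked, so their violations are summed.
    """
    survivors = list(candidates)
    for stratum in strata:
        scores = {c: sum(profile(inp, c)[k] for k in stratum) for c in survivors}
        best = min(scores.values())
        survivors = [c for c in survivors if scores[c] == best]
    return survivors

# Assumed ranking for the partial grammar: *RISE >> FAITH >> *FALL, *LEVEL
NO_RISE = [["*RISE"], ["FAITH"], ["*FALL", "*LEVEL"]]

winners = evaluate("LH", ["LH", "HL", "HH"], NO_RISE)
print(winners)  # ['HL', 'HH']
```

Under these assumptions the faithful candidate LH is eliminated by *RISE, and HL and HH tie, which matches the two optimal candidates marked in (5).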
These same constraints in different relative rankings can model at least three types of attested tone languages, as shown in (6):
(6) Three types of attested tone languages
c. Type 1 (Panoan languages) (Gordon 2001): Contour tones are not permitted, but level tones are allowed.
… *RISE, *FALL >> FAITH >> *LEVEL …
d. Type 2 (Lama) (J. Zhang 2004): Rising tones are not permitted, but falling and level tones are allowed.
… *RISE >> FAITH >> *FALL, *LEVEL …
e. Type 3 (Chinese): Both contour and level tones are allowed.
… FAITH >> *RISE, *FALL, *LEVEL …
As we can see from the tonal grammars listed in (6), within the OT framework differences in languages arise not from different constraints, but from different rankings of the same set of constraints. Constraint ranking can be used both to account for natural language inventories and to explain phenomena found in language acquisition, as will be shown below.
5.1.2 OT in Language Acquisition Studies: Grammar Restructuring
OT incorporates linguistic universals (represented by markedness constraints) in a formal theory of a speaker’s phonology, arguably explaining a wider range of phenomena in L2 phonological acquisition than rule-based theories (Broselow et al. 1998; Barlow and Gierut 1999; Hancin-Bhatt 2000; Lombardi 2003; Eckman 2004). The TMS is reflected both statically in the tone inventories of native tonal languages and in the dynamic process of language acquisition. For example, it is reported that Mandarin Chinese-speaking children produce level and falling tones earlier than rising tones (Li and Thompson 1977; Zhu and Dodd 2000). For these children, the process of acquiring Mandarin can be described in OT terms as a process of adjusting their ranking of tonal constraints to match that of mature native speakers. The acquisition order of
Phonological Universals and Acquisition Order of Tones
L1 Chinese tones can thus be understood as an adjustment from a Type 2-like grammar as shown in (6), where speakers eventually demote *RISE to converge on an adult’s Type 3 grammar. While both L1 and L2 learners re-rank constraints, this re-ranking process may differ to some degree. In L1 acquisition, all markedness constraints are typically assumed to initially dominate all faithfulness constraints. Then, in the course of L1 acquisition, either only markedness constraints are demoted (Tesar and Smolensky 1998, 2000), or markedness constraints are demoted while faithfulness constraints are promoted (Boersma and Levelt 2000). On the other hand, since L2 learners start their acquisition with the constraint ranking of their L1 (see Young-Scholten 1994; Archibald 1998), L2 learners must eventually arrive at the target ranking through a series of re-rankings. Grammar restructuring during language acquisition is assumed to be error-driven (Tesar and Smolensky 1998). That is, when input forms are inconsistent with a learner’s original constraint ranking, the learner must adjust his or her constraint rankings so as to produce the correct outputs.2 For example, if the input violates a high-ranked markedness constraint, that markedness constraint will be demoted. A few OT studies on the L2 acquisition of syllable structure suggest at least two types of constraint re-rankings: gradual demotion of initially high-ranked markedness constraints for learners expanding the range of their target structure (for example, Thai speakers acquiring English syllable structure; Hancin-Bhatt 1998, 2000, 2008), and gradual demotion of faithfulness constraints for learners restricting it (for example, English speakers acquiring Japanese syllable structure; Hayes 1999). The current study on tone acquisition presents a complex situation where learners must both restrict and expand their range of identical tone sequences. Details are provided in §5.1.4. 
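The error-driven demotion described above can be illustrated with a small Python sketch. It is a simplified reading of constraint demotion, not a full implementation of any published learning algorithm: the violation profiles, the pooled treatment of unranked constraints, and the names `profile`, `evaluate`, and `demote` are all assumptions made for this example. Following the convention in this chapter, the target tone is the input, the target form is the ‘winner’, and the learner’s erroneous production is the ‘loser’.

```python
# Illustrative sketch of one error-driven demotion step, moving a
# Type 2-like grammar (no rising tones) toward a Type 3 grammar.
# All names and violation profiles are hypothetical.

def profile(inp, cand):
    """Return hypothetical violation counts for one output candidate."""
    return {
        "*RISE": int(cand == "LH"),
        "*FALL": int(cand == "HL"),
        "*LEVEL": int(cand in ("HH", "LL")),
        "FAITH": int(cand != inp),
    }

def evaluate(inp, candidates, strata):
    """Filter candidates stratum by stratum; unranked constraints are pooled."""
    survivors = list(candidates)
    for stratum in strata:
        scores = {c: sum(profile(inp, c)[k] for k in stratum) for c in survivors}
        best = min(scores.values())
        survivors = [c for c in survivors if scores[c] == best]
    return survivors

def demote(strata, inp, winner, loser):
    """Demote loser-preferring constraints below the highest-ranked
    constraint that prefers the winner (the target form)."""
    w, l = profile(inp, winner), profile(inp, loser)
    prefers_winner = {c for c in w if w[c] < l[c]}
    prefers_loser = {c for c in w if w[c] > l[c]}
    if not prefers_winner:
        raise ValueError("no constraint prefers the target form")
    # Index of the highest stratum holding a winner-preferring constraint.
    top = min(i for i, s in enumerate(strata) if prefers_winner & set(s))
    new = [list(s) for s in strata]
    if len(new) <= top + 1:
        new.append([])
    for i in range(top + 1):
        for c in list(new[i]):
            if c in prefers_loser:
                new[i].remove(c)
                new[top + 1].append(c)
    return [s for s in new if s]  # drop strata left empty

CANDIDATES = ["LH", "HL", "HH"]
type2 = [["*RISE"], ["FAITH"], ["*FALL", "*LEVEL"]]

before = evaluate("LH", CANDIDATES, type2)                 # learner avoids LH
reranked = demote(type2, "LH", winner="LH", loser="HL")    # *RISE is demoted
after = evaluate("LH", CANDIDATES, reranked)               # LH now surfaces

print(before, reranked, after)
```

In this toy case a single mismatch between the target LH and the produced HL is enough to place *RISE below FAITH, giving a ranking equivalent to the Type 3 grammar in (6). Real learners are assumed to need many such steps, and the L2 situation discussed in this chapter involves both demotion and promotion effects that this sketch does not model.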
5.1.3 OCP Effects in Mandarin Chinese: An OT Account of T3 Sandhi
Dissimilation is a cross-linguistically common phenomenon in which similar or identical sound structures in close proximity are avoided. Dissimilation can be found in static generalizations over a language’s lexicon, such as in Arabic roots where adjacent homorganic consonants are avoided (Alderete et al. 2013). Dissimilation can also be found in phonological processes, such as in Tashlhiyt Berber, where delabialization is triggered by two primary labial consonants in the same derived stem (Alderete and Frisch 2007). Although it has been claimed that dissimilatory processes can occur with any phonological feature
2 In an OT analysis of L2 tonal grammar, the target tones of Mandarin Chinese are taken as the inputs, and the tonal productions produced by L2 learners of Mandarin Chinese are taken as the outputs.
(Suzuki 1998), the most common cases involve the dissimilation of tone, place of articulation, and laryngeal features (Alderete 2003). In the 1970s and 1980s, studies on tone and prosodic phonology advanced a generative approach to dissimilation (Leben 1973; Goldsmith 1976). In autosegmental phonology, dissimilation is understood as an effect of the Obligatory Contour Principle (OCP).
(7) Obligatory Contour Principle (Leben 1973; Goldsmith 1976): Adjacent identical autosegments are prohibited.
The OCP was originally developed to explain tonal dissimilation phenomena observed in Mende and other African tone languages (see Leben 1973; Goldsmith 1976). After its original conception, it was argued that the OCP was a universal constraint, differing across languages only in its ranking with respect to other constraints (McCarthy 1986; Yip 1988). The OCP is mostly inactive (that is, ranked lower than FAITH constraints) in Mandarin Chinese, observable only in T3 Sandhi and T4 Sandhi processes (only applied to yi ‘one’ and bu ‘not’) (see details in §1.2.2). To take the grammar of T3 Sandhi first: in Chinese, all possible combinations of the four basic lexical tones occur in Chinese words, with the exception of T3-T3 (L-L) due to the T3 Sandhi process. When a T3 is followed by another T3, the first becomes a rising tone, which is identical in phonetic shape to T2 (LH). An example of the Tone 3 Sandhi process is given in (8).
(8) Tone 3 Sandhi
a. wǔ ‘five’
   L  citation form (T3)
b. wǔ.diǎn ‘five o’clock’
   L.L  citation form (T3-T3)
   LH.L  surface form (Raised-T3-T3)
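The mapping in (8) can be sketched as a small Python function. This is an illustration of the sandhi pattern for a two-syllable word only; the tone-letter table and the name `surface_pair` are assumptions introduced here, and longer T3 strings, whose behavior depends on prosodic structure, are deliberately left out.

```python
# Illustrative sketch of Tone 3 Sandhi for a disyllabic word.
# Tone letters follow the notation used in this chapter:
# T1 = H, T2 = LH, T3 = L, T4 = HL.
CITATION = {1: "H", 2: "LH", 3: "L", 4: "HL"}

def surface_pair(first_tone, second_tone):
    """Return the surface tone letters for a two-syllable word.

    Only the T3-T3 sequence is changed: the first T3 surfaces as a
    rising tone (LH), identical in shape to T2.
    """
    first = CITATION[first_tone]
    second = CITATION[second_tone]
    if first_tone == 3 and second_tone == 3:
        first = "LH"
    return first, second

print(surface_pair(3, 3))  # ('LH', 'L'), as in wǔ.diǎn 'five o'clock'
print(surface_pair(3, 4))  # ('L', 'HL'), no sandhi
```

The function encodes the pattern as a rule for convenience; the constraint-based account discussed in this section derives the same output from ranking rather than from a rule.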
T3 Sandhi is usually formulated as a rule (T3→T2/___T3) in traditional generative phonology. This process is arguably better described using the OCP in OT, since an OT account specifies the motivation for the range of sandhi processes. Although Yip (2002) sometimes invokes the operation of OCP constraints over constituent tones in her analyses, she suggests that the dissimilatory cases in Chinese identify complete contours as units. That is, the dominant OCP constraints in Chinese are those that operate over whole tones rather than over a
Phonological Universals and Acquisition Order of Tones
tone’s component or constituent parts (for example HL rather than just H or just L). Item (9) presents the OCP constraints used by Yip (2002).
(9) OCP constraints used by Yip (2002)
OCP(WholeTone): Identical whole tones cannot occur on adjacent syllables.
OCP(ConstTone): Adjacent identical constituent tones are prohibited.
OCP(L): Identical low tones cannot occur on adjacent syllables (L-L is not allowed).
To account for the L2 tonal grammar discussed later, I propose to further break down the family of OCP(WholeTone) constraints into a set of OCP subconstraints, shown in (10–11):
(10) OCP(Level), or OCP(L) and OCP(H): Identical whole level tones (Low or High) cannot occur on adjacent syllables. Specifically, H-H and L-L are not allowed.
(11) OCP(Contour), including OCP(LH): Identical whole contour tones (such as rising and falling) cannot occur on adjacent syllables. Specifically, LH-LH and HL-HL are not allowed.
Chinese prohibits low tone sequences T3-T3 (L-L) while allowing other tone pairs T2-T2, T4-T4, and T1-T1 (LH-LH, HL-HL, and H-H). In other words, the well-known tone sandhi process found in Chinese is the result of a high-ranked OCP(L) constraint, which restricts any occurrences of L-L. Ranking OCP(WholeTone) below FAITH allows for all other identical tone sequences in the language. For the sake of simplicity, the analysis in tableaux (12) and (13) is restricted to candidates that do not change the second syllable of the word.3
(12) Tableau showing the prohibition of L-L surface tones (T3 Sandhi)
/L-L/      OCP(L)   FAITH   *RISE   OCP(WholeTone)
☞ LH-L              *       *
  L-L      *!                       *
3 This is only a part of the grammar. Yip (2002) proposes a high-ranked positional faithfulness constraint (FaithPrWdHead) that resists changes made to the second syllable of a word. To rule out candidates such as HL-L and H-L, constraints such as OCP(ConstTone) and FaithNuclearTone should be included in the ranking.
(13) Tableaux for H-H, LH-LH, and HL-HL
/H-H/      OCP(L)   FAITH   *RISE   OCP(WholeTone)
☞ H-H                               *
  L-H               *!

/LH-LH/    OCP(L)   FAITH   *RISE   OCP(WholeTone)
☞ LH-LH                     **      *
  L-LH              *!      *

/HL-HL/    OCP(L)   FAITH   *RISE   OCP(WholeTone)
☞ HL-HL                             *
  L-HL              *!
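The evaluation behind tableaux (12) and (13) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names, the encoding of tone pairs as tuples, and the decision to pool the violation marks of constraints that share a stratum are all assumptions made for the example. Strict domination is modelled by comparing violation profiles lexicographically, with strata ordered OCP(L) >> FAITH >> *RISE, OCP(WholeTone).

```python
# Hypothetical sketch of strict-domination evaluation.
# Tone shapes assumed here: T1 = "H", T2 = "LH", T3 = "L", T4 = "HL".

def ocp_l(inp, out):
    """OCP(L): one mark if both syllables surface as plain L."""
    return int(out == ("L", "L"))

def faith(inp, out):
    """FAITH: one mark per syllable whose tone differs from the input."""
    return sum(i != o for i, o in zip(inp, out))

def no_rise(inp, out):
    """*RISE: one mark per rising (LH) tone in the output."""
    return sum(t == "LH" for t in out)

def ocp_whole(inp, out):
    """OCP(WholeTone): one mark if the two whole tones are identical."""
    return int(out[0] == out[1])

# Strata from highest to lowest ranked. Constraints inside one stratum are
# treated as unranked and their marks are pooled (a simplifying assumption).
RANKING = [[ocp_l], [faith], [no_rise, ocp_whole]]

def profile(inp, out):
    """Violation profile of a candidate, one total per stratum."""
    return tuple(sum(c(inp, out) for c in stratum) for stratum in RANKING)

def winner(inp, candidates):
    """Candidate with the lexicographically smallest violation profile."""
    return min(candidates, key=lambda out: profile(inp, out))

if __name__ == "__main__":
    print(winner(("L", "L"), [("LH", "L"), ("L", "L")]))
    print(winner(("LH", "LH"), [("LH", "LH"), ("L", "LH")]))
```

Because Python compares tuples position by position, a single mark in a higher stratum outweighs any number of marks in lower strata, which is the behaviour strict domination requires.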
Tableau (12) shows how the input of a T3-T3 sequence, taken to be /L-L/ in its underlying form, results in a surface sequence of LH-L. The tableaux in (13) show how the input tone sequences H-H, LH-LH, and HL-HL each produce output sequences that remain faithful to the input, since the high-ranked OCP(L) does not affect the surface forms of these combinations. I therefore represent Chinese tonal grammar with the following constraint ranking (see Yip 2002: 210 for a detailed account):
(14) ... OCP(L) >> FAITH >> *RISE, OCP(WholeTone) …
5.1.4 Present Study: Hypotheses Regarding the TMS and the OCP in L2 Tones
As discussed in §5.1.2, the TMS (i.e., *RISE >> *FALL >> *LEVEL) is not reflected in the inventory of Mandarin Chinese, since Chinese allows all tone types (T1, T2, and T4) to be freely distributed. However, as we can see in L1 tone studies, the TMS is reflected in the acquisition order of tones by Mandarin-speaking children (Li and Thompson 1977; Zhu and Dodd 2000). Most of the previous L2 studies (such as Kiriloff 1969; Miracle 1989; Leather 1990; Elliot 1991; Guo 1993; Q-H. Chen 1997; H. Zhang 2010) were conducted with English-speaking learners only. Here I hypothesize that the TMS is functional in the interlanguage tonal grammars of English, Japanese, and Korean learners alike. Therefore, I expect that T2 will be more difficult for all L2 learners to acquire than T4, and that T4 will be more difficult than T1. I expect the effects of the TMS (*T2 >> *T4 >> *T1) to be manifested in error and substitution rates. That is, if the TMS is functional in L2 grammars, the error rates of T2 will be higher than those of T4, and the error rates of T4 will be higher
than those of T1. I also expect the substitution rates to be negatively related to error rates. Other issues may interfere with the error and substitution rates of Tone 3; these are discussed in detail in Chapter 6. Therefore, testing for evidence of the TMS will focus only on the production of T1, T2, and T4, although the acquisition of T3 Sandhi will be briefly mentioned in §5.2.3. I also hypothesize that the OCP constrains the L2 tones made by these three groups of learners. Identical tone combinations are often considered very difficult for L2 learners to produce (H. Zhang 2010, 2016a; Yang and Yang 2017). As mentioned earlier, H. Zhang (2010) found that English speakers acquiring tones tend to make more errors with identical tone sequences (such as T2-T2) than with nonidentical tone sequences (such as T2-T3). I thus hypothesize that identical tone sequences are especially difficult for adults to produce because of the OCP. I posit a simplified constraint ranking of L1 grammars below, and model the L2 acquisition of tone with a series of constraint re-rankings. As mentioned earlier in Chapter 2, all three L1s studied here allow two adjacent TBUs bearing two consecutive H pitches or two consecutive L pitches. However, it is rare to find two identical contour tones within a single word in any of these L1s, such as the Chinese LH-LH or HL-HL sequences (for further information on Korean intonation, see Jun 1996, 2005; for English intonation, see Pierrehumbert 1980; for Japanese prosody, see Kubozono 2011 and Venditti 2005). Since the learners’ L1s do not have lexical tones (see Chapter 2 for information regarding different intrinsic structures of intonation and lexical tones), this chapter bases their constraint rankings on allowed phrase-level pitch patterns.
Phrasal tone constraints may include Coincide constraints, which ensure that every phrase has phrasal tones (Zoll 2004), as well as Alignment constraints, which require tones to be at specific positions (McCarthy and Prince 1993). The surface form of intonation is determined by markedness constraints on phrasal tones. Therefore, phrasal tone constraints must dominate Faith in intonational languages. In this study, I assume high-ranked phrasal tone constraints (such as Coincide and Alignment), but do not include them in most of the following discussion, as this study focuses on the rankings of constraints related to lexical tone acquisition. In English, Japanese, and Korean, markedness constraints such as *Contour and OCP(Contour) also dominate Faith, since two consecutive identical contour tones (such as HL-HL or LH-LH) within a word are not permitted. However, other markedness constraints, such as OCP (H) and OCP (L) are dominated by Faith. Since English, Japanese, and Korean have similar representations of intonation, we will be representing their L1 grammars with a single (partial) constraint ranking, seen in (15).
(15) Partial ranking representing English, Japanese, and Korean intonation grammars
PhraseTone >> OCP(Contour), *Contour >> Faith >> OCP(H), OCP(L), *Level
Learners must do three things in order to correctly acquire identical tone sequences in Chinese: (1) they must expand their set of allowed identical tone sequences to include LH-LH and HL-HL sequences; (2) they must restrict their range of possible identical tone sequences to exclude L-L, since L-L exists in all of their L1s but is not allowed in Chinese; and (3) they must retain the H-H pitch sequence which already exists in their L1s. To restate this in terms of OT, these learners must (1) demote both the *Contour and OCP(Contour) constraints below Faith, (2) promote the OCP(L) constraint above Faith, and (3) maintain OCP(H) at its original low L1 ranking. As discussed in §5.1.3, the OCP effect is rare (low-ranked) and only occasionally appears in the form of OCP(L) in Mandarin. The general OCP(WholeTone) is masked by faithfulness constraints, as seen in the grammaticality of other identical tone combinations (H-H, HL-HL, LH-LH), and therefore is ‘inactive’ in the target language grammar. On the other hand, the first languages of these learners do not have lexical tones. The Chinese individual lexical tones, tone sequences, and sandhi processes are all difficult for learners to acquire. This study hypothesizes that, like L1 productions, L2 tone productions are also restricted by the OCP. That is, in the course of learning Chinese as a second language, learners will show the effects of OCP(WholeTone) in their interlanguage tones. Therefore, even though the tone pairs H-H, LH-LH, and HL-HL are equally allowed in Chinese, learners may disfavor these identical tone sequences. This may be manifested both in the types of errors they make and in the types of substitutions they use.
In other words, learners may produce identical tone sequences such as T2-T2 and T4-T4 with more errors than nonidentical tone sequences (such as T2-T3, T2-T4), and use nonidentical tone sequences as substitutions when these errors do occur.
5.2 Results
This section reports the results of the experiment described in Chapter 3. For the sake of simplicity, results will focus on evidence for the operation of the OCP in what I refer to as ‘OCP tests,’ and TMS effects will be reported only when relevant. Particular attention will be paid to two types of tone combinations: Identical Tone Combinations (ITC) and Nonidentical Tone Combinations (NITC). Specifically, two OCP tests were conducted (1) to test for evidence
of the OCP, where the results of ITCs were compared with those of NITCs; (2) to determine the movement of OCP(L), where surface productions of Raised-T3-T3 were observed; and (3) to determine the rankings of OCP(H) and OCP(Contour), where the results of various ITCs (specifically T1-T1, T2-T2, and T4-T4) were compared.
5.2.1 Results of OCP Test 1: The Change Rate of ITC and NITC
Simply comparing the error rates of ITCs with those of NITCs may not be sufficient to test for effects of the OCP, since OCP testing requires an analysis of entire tone sequences. For example, when a learner intends to produce a target T2-T2 (LH-LH) sequence, he or she may change the first tone, the second tone, or both tones, and incorrectly produce a NITC. Alternatively, he or she might change both tones and incorrectly produce another ITC, such as T4-T4 (HL-HL) in place of the target T2-T2. In this case, even a high error rate does not necessarily indicate that the speaker has difficulties with identical tone sequences. A simpler explanation would be that the speaker favors T4 over T2. Similar discrepancies can occur in producing target NITCs, where the speaker may produce erroneous identical tone pairs as well as erroneous nonidentical tone pairs. Because of this, this section does not calculate error rates but analyzes the types of substitutions made in L2 speakers’ productions. Although individual differences were found, I chose to look at these learners as a group, in order to posit generalizations regarding second language acquisition. The goal of Test 1 was to determine how often target ITCs were produced erroneously as NITCs, and conversely, how often target NITCs were produced erroneously as ITCs. I refer to the rate of these errors as the ‘change rate.’ To compare change rates, I set X to be the percentage of incorrectly produced target ITC pairs that were substituted by NITCs, and Y to be the percentage of incorrectly produced target NITC pairs that were substituted by ITCs, as illustrated in Figure 5.1.
If X was significantly greater than Y, this result would be consistent with the hypothesis that the OCP has an effect on L2 productions. I used the SurveyFreq Procedure to calculate X and Y for each sub-data-set, and the Rao-Scott Chi-Square test to calculate p-values. X values were significantly higher than Y values both in general as well as within each of the three groups of speakers. Tables 5.1–5.3 below give details of Test 1. The fourth column shows the percentages of NITCs produced as ITCs, or vice versa, out of the total number of productions. Results of statistical tests are given in the last column. The change rate for ITCs was significantly greater than the change rate of NITCs, showing that all participants were better at producing a sequence of nonidentical tones than a sequence of identical ones. Let us now look closer at the subtypes of tonal productions.
Figure 5.1 OCP Test 1.
Table 5.1 OCP Test 1 results for English speakers
Target stimulus type   Actual productions   Percentage (count)   Raw percentages of actual productions   Pr > ChiSq
NITC                   → NITC               92.69% (964)         75.31%                                  < 0.0001
NITC                   → ITC                Y = 7.31% (76)       5.94%
ITC                    → NITC               X = 63.75% (153)     11.95%
ITC                    → ITC                36.25% (87)          6.80%
Table 5.2 OCP Test 1 results for Japanese speakers
Target stimulus type   Actual productions   Percentage (count)   Raw percentages of actual productions   Pr > ChiSq
NITC                   → NITC               93.46% (972)         75.93%                                  < 0.0001
NITC                   → ITC                Y = 6.54% (68)       5.31%
ITC                    → NITC               X = 57.5% (138)      10.78%
ITC                    → ITC                42.5% (102)          7.96%
Table 5.3 OCP Test 1 results for Korean speakers
Target stimulus type   Actual productions   Percentage (count)   Raw percentages of actual productions   Pr > ChiSq
NITC                   → NITC               89.62% (932)         72.81%                                  < 0.0001
NITC                   → ITC                Y = 10.38% (108)     8.43%
ITC                    → NITC               X = 57.08% (137)     10.7%
ITC                    → ITC                42.92% (103)         8.04%
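As a quick arithmetic check, the X and Y percentages in Tables 5.1–5.3 can be recomputed from the reported counts. The sketch below is illustrative only: the dictionary layout and the function name `change_rates` are assumptions introduced here, and the code reproduces the descriptive percentages, not the Rao-Scott Chi-Square test.

```python
# Counts copied from Tables 5.1-5.3: target type -> produced type -> count.
COUNTS = {
    "English":  {"NITC": {"NITC": 964, "ITC": 76},
                 "ITC":  {"NITC": 153, "ITC": 87}},
    "Japanese": {"NITC": {"NITC": 972, "ITC": 68},
                 "ITC":  {"NITC": 138, "ITC": 102}},
    "Korean":   {"NITC": {"NITC": 932, "ITC": 108},
                 "ITC":  {"NITC": 137, "ITC": 103}},
}

def change_rates(table):
    """Return (X, Y) as percentages rounded to two decimals.

    X: share of target ITCs that surfaced as NITCs.
    Y: share of target NITCs that surfaced as ITCs.
    """
    x = 100 * table["ITC"]["NITC"] / sum(table["ITC"].values())
    y = 100 * table["NITC"]["ITC"] / sum(table["NITC"].values())
    return round(x, 2), round(y, 2)

RATES = {group: change_rates(table) for group, table in COUNTS.items()}

if __name__ == "__main__":
    for group, (x, y) in RATES.items():
        print(f"{group}: X = {x}%, Y = {y}%")
```

In every group X is several times larger than Y, which is the asymmetry the test is designed to detect.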
5.2.2 OCP Test 2: The Acquisition Order of Tone Pairs and Effects of the TMS
The goal of Test 2 was to determine whether the proportions of ITCs in L2 productions matched those of the target stimuli in each sub-data-set. There were sixteen possible tone combinations of T1, T2, T3, and T4, all of which were evenly proportioned in the stimuli. The ITC sequences made up 3/16ths of the total number of stimuli (T1-T1, T2-T2, and T4-T4; note that T3-T3 should be produced as Raised-T3-T3). Therefore, if the OCP has no effect on L2 tones, we would expect the ITC productions to also make up about 3/16ths (18.75%) of the total production. The ITC rates were compared with their expected values using a binomial test. All ITCs together made up 12.72% of English speakers’ L2 tonal production, which was significantly lower than 18.75% (p < 0.0001); 13.28% of the Japanese speakers’ productions, also significantly lower than 18.75% (p < 0.0001); and 16.48% of the Korean speakers’ productions (p = 0.0008). Korean speakers exhibited the highest rate of ITCs, but all three groups of speakers produced less than 18.75% of ITCs, indicating that learners disfavored ITCs in general. I also looked into the breakdown of each tone pair and compared the occurrences of T1-T1, T2-T2, and T4-T4 in OCP Test 2. Interestingly, when the ITCs were broken down and analyzed by specific tone type, individual ITC sequence proportions varied widely. All three groups of speakers produced more T1-T1 sequences than T4-T4 sequences, and more T4-T4 sequences than T2-T2 sequences. Figure 5.2 displays the occurrences of each type of ITC. These numbers include all actual productions, both the correct ones and the erroneous ones that learners made when they were intending to produce a different tone combination.
Figure 5.2 The occurrences of tone pairs in L2 production (OCP Test 2).
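The logic of comparing an observed ITC share with the expected 3/16 can be illustrated with an exact binomial lower-tail probability. This is a hedged sketch rather than the procedure actually used in the study: the counts are assumed to be the ITC productions summed from Tables 5.1–5.3 (1,280 productions per group), the helper names are hypothetical, and all productions are treated as independent trials, which ignores repeated observations within participants. The resulting p-values therefore need not match the reported ones.

```python
from math import exp, lgamma, log

N = 1280      # productions per group (sum of the counts in Tables 5.1-5.3)
P0 = 3 / 16   # expected ITC share if the OCP had no effect

# ITC productions per group: (NITC -> ITC) + (ITC -> ITC), from the tables.
ITC_COUNTS = {"English": 76 + 87, "Japanese": 68 + 102, "Korean": 108 + 103}

def log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def lower_tail(k, n, p):
    """One-sided probability P(X <= k) for X ~ Binomial(n, p)."""
    return sum(exp(log_pmf(i, n, p)) for i in range(k + 1))

if __name__ == "__main__":
    for group, k in ITC_COUNTS.items():
        share = 100 * k / N
        print(f"{group}: ITC share {share:.2f}%, "
              f"one-sided p = {lower_tail(k, N, P0):.2g}")
```

Working in log space with `lgamma` avoids overflow in the binomial coefficients for a sample of this size.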
If the OCP had no effect on speakers’ productions, we would expect each of these individual tone pair rates to be 1/16th, or 6.25%, of all productions. A binomial test indicates that H-H productions across all three groups of speakers were significantly greater than the expected 6.25%, whereas the T2-T2 and T4-T4 rates were both significantly lower than the expected 6.25%. It was also found that participants produced many more T4-T4 sequences than T2-T2 sequences. Additionally, accuracy rates of tone pairs (as target tone sequences) showed the same tendency: T1-T1 sequences were produced correctly significantly more often than T4-T4 sequences, and T4-T4 sequences were produced correctly significantly more often than T2-T2 sequences. This finding held true across all three groups of speakers. The higher occurrence of T4-T4 over T2-T2 was very likely motivated by the Tonal Markedness Scale. To confirm the effects of the TMS (*Rise >> *Fall >> *Level), I calculated the error rates of individual T1, T2, and T4 tone productions (see Figure 5.3).
Figure 5.3 Error rates of individual tones in the main experiment.
Table 5.4 Rao-Scott Chi-Square test results for TMS hypothesis (values of Pr > Chisq)
T4 vs. T1 T2 vs. T4
< 0.0001 < 0.0001
0.2780 < 0.0001
0.0004 < 0.0001
The SurveyFreq Procedure was used to calculate error percentages. The three groups demonstrated similar error patterns. A Rao-Scott Chi-Square test accounted for multiple observations within participants. The final results are presented in Table 5.4. All three groups made more errors with T4 than T1. This difference was significant for the English and Japanese speakers, but was only a trend for the Korean speakers. The error rates of T2 were significantly higher than those of T4 across all three groups of speakers. The significantly better performance of T1 over T4 and T4 over T2 was confirmed by analyzing substitution patterns used by all three groups of learners. When learners made tone errors, they used T1 as a substitute for the target tone more often than T4, and T4 as a substitute tone more frequently than T2.
5.2.3 The Acquisition of Tone 3 Sandhi
This section looks at the acquisition of T3-T3 sequences. In native Chinese, the dissimilation of T3-T3 (L-L) is motivated by OCP(L). The underlying L-L sequence surfaces in Chinese as Raised-T3-T3 in a process known as T3 Sandhi. By comparing the accuracy rate of non-native tone productions of Raised-T3-T3 sequences with that of other ITCs, I attempted to compare the acquisition of T3 Sandhi with other tone pairs. Figure 5.4 shows the accuracy rates of ITCs (namely, T1-T1, T2-T2, T4-T4) in relation to the accuracy rates of Raised-T3-T3. The accuracy rate of the Raised-T3-T3 sequence was very high compared to that of other ITCs, suggesting that Tone 3 Sandhi is acquired quickly (a detailed discussion of this high accuracy rate is given in Chapter 6; here I include T3 data only for comparison purposes). Most of the accuracy rates of Raised-T3-T3 sequences were lower than those of T1-T1 sequences (with the exception of Japanese) but higher than those of T4-T4 and T2-T2 sequences.4
4 The accuracy rate of Raised-T3-T3 (LH-L) is higher than that of H-H for the Japanese participants. This may be due to L1 transfer of Japanese intonation. The Accentual Phrase of Japanese typically starts with a rising intonation.
Figure 5.4 Accuracy rates of identical tone sequences.
5.3 Discussion: An OT Account for the Acquisition of Identical Tone Sequences
5.3.1 OCP(WholeTone) or OCP(ConstTone)
The two OCP tests reported in this chapter present evidence supporting the operation of the OCP in L2 productions of tones. To ensure that the observed effects come from the OCP for whole contour tones (as in Yip 2002) rather than from OCP(ConstTone), the OCP for constituent tones, I looked more closely at the error rates of relevant tone sequences. If the observed effects come from OCP(ConstTone), we would expect the error rates of T2-T2 (LH-LH) and T4-T4 (HL-HL) to be very low. In addition, T2-T2 and T4-T4 sequences would often be used as substitutions for other tone sequences, since they consist of alternating H and L component tones. Meanwhile, the error rate of those sequences containing two identical component tones across syllable boundaries should be very high, specifically T1-T4 (H-HL), T2-T1 (LH-H), T2-T4 (LH-HL), T3-T2 (L-LH), T4-T2 (HL-LH), and T4-T3 (HL-L). Out of all sixteen possible tone combinations, T1-T1 (H-H) and T3-T3 (L-L) can be counted both as identical whole tone sequences and as identical component tone sequences, so they were excluded from the analysis. This study found more support for an effect of OCP(WholeTone) than for OCP(ConstTone). The error rates of T2-T2 (LH-LH) were the highest among the sixteen tonal combinations across all three groups of participants. The error rates for T4-T4 (HL-HL) were lower than for T2-T2 (LH-LH) but still
much higher than for T1-T4 (H-HL), T2-T4 (LH-HL), and T4-T3 (HL-L).5 This held true across all three groups of speakers. In addition, some sequences such as T1-T4 (H-HL) and T2-T4 (LH-HL) were often used as substitutions for other sequences. However, it is difficult to exclude from consideration the possibility of some effect of OCP(ConstTone), as the most often-used substitutions include T2-T3 (LH-L) and T3-T4 (L-HL), both of which consist of alternating H and L tones. To summarize, although L2 learners showed some effects of an OCP(ConstTone) constraint, manifested by the error patterns listed above and the anticipatory dissimilation effects explored in Chapter 4, these effects were largely overshadowed by those of an OCP(WholeTone) constraint.
5.3.2 Stages in L2 Tone Phonology Development
The results above show some evidence of an OCP(WholeTone) constraint in effect in learners’ interlanguage grammars, but this evidence seems dependent on the type of tone being analyzed. The high accuracy rate of T1-T1 (H-H) and wide use of T1-T1 as a substitute for other tones, as we found in the OCP tests, is in essence an ‘anti-OCP’ effect. This may be attributed to L1 transfer since H-H is a typical L1 tone sequence in Korean and Japanese and is allowed in English. However, I believe that the preference for H-H is not solely due to L1 transfer because (1) compared to contour tones, high level tones are physically easier to produce and are more widely distributed in natural languages, and (2) phonetic studies show that strings of T1-T1 (H-H) sequences require the least articulatory effort (Shih and Lu 2010). These reasons make it very likely that there is at least some type of universal preference for T1-T1 sequences at play. The asymmetry found in the OCP tests between LH-LH and HL-HL probably cannot be attributed to L2 knowledge, since both structures are equally grammatical in Chinese.
This asymmetry also cannot be attributed to L1 transfer, since both structures are equally unusual in English, Japanese, and Korean. Because of this, I argue that the Tonal Markedness Scale is still accessible to adult L2 learners and interacts with the OCP. The remainder of this section will account for these patterns and propose a four-stage model describing the development of tonal phonology in the interlanguage using OCP sub-constraints.
5 T3-T2 (L-LH) and T4-T2 (HL-LH) have high error rates, likely due to strong positional effects of T2 (LH) as discussed in the ‘general positional effects’ of Chapter 4 in this book.
As discussed in (15), the OCP(H) constraint is low ranked for English, Japanese, and Korean. Because OCP(H) is also dominated by all other
constraints in Chinese phonology, OCP(H) does not require re-ranking. It is not surprising that learners have little difficulty producing T1-T1 (H-H) sequences. Lexical contour tones (LH-LH and HL-HL) are new for these learners, so the general OCP constraint and OCP(Contour) are ranked high in the initial stage of acquisition. OCP(LH) is masked by the more general OCP(Contour) in L1s, since the rising tone is a type of contour tone. OCP(LH) is also masked by OCP(Contour) but ranked low in the Chinese phonology, since all contour pairs are allowed in Chinese. Item (16) shows the rankings of OCP constraints in L1 and L2 phonologies. Note especially the position of OCP(LH), shown in gray. (16) L1 and L2 phonologies L1 (initial state): OCP(Contour), OCP(LH) >> Faith >> OCP(H), OCP(L) L2 (target state): OCP(L) >> Faith >> OCP(Contour), OCP(LH), OCP(H) Item (16) suggests that OCP(Contour) must be demoted in order to approximate the target language ranking. T4-T4 (HL-HL) and T2-T2 (LH-LH) sequences are both new to L2 learners and therefore difficult for them to acquire. However, results reported above suggest that learners acquire T4-T4 sequences earlier than T2-T2 sequences, since (1) T4-T4 sequences had a lower error rate than T2-T2 sequences, and (2) T4-T4 was used as a substitute for other tone sequences more often than T2-T2. Following Boersma and Levelt (2000) and Boersma and Hayes (2001), who advocate a model of acquisition in which the rate of constraint re-ranking is dependent on the frequency with which a constraint is violated in the input data, I assume that the more-frequently violated markedness constraints are more quickly demoted than the less-frequently violated ones. In terms of tone acquisition, every time OCP(LH) is violated, OCP(Contour) will also be violated, since LH is a type of contour tone, but not vice versa. Therefore, the general OCP(Contour) constraint would be demoted before OCP(LH) is. 
This would result in an interlanguage phonology in which, at some point during L2 acquisition, OCP(Contour) is demoted, but OCP(LH) is not. This phenomenon, in which a general constraint is demoted before a specific one, is thought to exist in both L1 and interlanguage grammars (Broselow 2004). This intermediate stage is summarized in (17).
(17) Intermediate Stage: OCP(LH), OCP(L) >> FAITH >> OCP(Contour), OCP(H)
In this intermediate stage of tone acquisition, T1-T1 and T4-T4 sequences are allowed, but T2-T2 is not, leading to more T4-T4 sequences than T2-T2 sequences in learners’ production. Figure 5.5 simulates this specific stage of L2 tone acquisition. Here we see the same group of input tone pairs surfacing with different outputs in the L1, the interlanguage, and the L2. Only T1-T1 remains throughout the acquisition process. In the intermediate stage of the L2 acquisition of Chinese tones, T1-T1 and T4-T4 are in learners’ outputs, but OCP(LH) remains high ranked, which leads to the absence of T2-T2 sequences in L2 tonal productions for a period of time. Example outputs are cited from the most frequent productions found in the current study. Surface identical tone sequences are boldfaced.
Figure 5.5 Re-ranking of OCP sub-constraints.
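The effect of the re-ranking on identical tone sequences can be made concrete with a small sketch. It is a simplified illustration under stated assumptions, not the author's model: each ranking is reduced to strata around Faith using the rankings in (16) and (17), a sequence is taken to surface faithfully whenever no constraint it violates outranks Faith, and the names `STAGES`, `VIOLATES`, and `allowed` are hypothetical.

```python
FAITH = "Faith"

# Rankings from (16) and (17), written as strata from highest to lowest.
STAGES = {
    "L1 (initial)": [["OCP(Contour)", "OCP(LH)"], [FAITH], ["OCP(H)", "OCP(L)"]],
    "Intermediate": [["OCP(LH)", "OCP(L)"], [FAITH], ["OCP(Contour)", "OCP(H)"]],
    "L2 (target)":  [["OCP(L)"], [FAITH], ["OCP(Contour)", "OCP(LH)", "OCP(H)"]],
}

# OCP sub-constraints violated by each identical tone sequence.
VIOLATES = {
    "T1-T1 (H-H)":   {"OCP(H)"},
    "T2-T2 (LH-LH)": {"OCP(Contour)", "OCP(LH)"},
    "T3-T3 (L-L)":   {"OCP(L)"},
    "T4-T4 (HL-HL)": {"OCP(Contour)"},
}

def surfaces_faithfully(sequence, ranking):
    """True if no constraint ranked above Faith is violated by the sequence."""
    for stratum in ranking:
        if FAITH in stratum:
            return True
        if VIOLATES[sequence] & set(stratum):
            return False
    return True

def allowed(stage):
    """Identical sequences that surface unchanged under the given stage."""
    return [s for s in VIOLATES if surfaces_faithfully(s, STAGES[stage])]

if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage}: {', '.join(allowed(stage))}")
```

Under these assumptions only T1-T1 is permitted at every stage, and T4-T4 becomes available one stage before T2-T2, mirroring the asymmetry described in the text.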
In this case, the demotion of a general OCP constraint is affected by the Tonal Markedness Scale. The effects of OCP(LH) are originally masked by the more general OCP(Contour) constraint in both L1 and L2 phonologies, only becoming visible over the course of L2 acquisition. This situation represents what is known as ‘The Emergence of the Unmarked’ (TETU, McCarthy and Prince 1994; Broselow et al. 1998). Very few participants in this study consistently produced all tone pairs correctly, suggesting that most participants still have the rankings found in the developing interlanguage phonology. To summarize, if we take the partial constraint ranking of the L1 intonation phonology as the initial stage of acquisition and the native Chinese constraint ranking as the final stage, the proposed path of OCP constraint re-ranking includes at least two intermediate stages. As shown in Figure 5.4, the high
accuracy rate of Raised-T3 indicates that restricting L-L pitch sequences is not difficult for language learners, so the promotion of OCP(L) occurs no later than the demotion of OCP(Contour), if not earlier. This is labelled Stage n. OCP(Contour) is demoted before OCP(LH), which is labelled Stage n+1. Since learners were able to produce all individual tones, including lexical contour tones, with high accuracy, I believe that *Contour is demoted very early in the re-ranking process, as shown in Stage n.
(18) Summary of the demotion and promotion of the OCP constraints
Stage initial: Partial L1 ranking
OCP(Contour), *Contour >> Faith >> OCP(Level), *Level
Stage n: Early demotion of *Contour, promotion of OCP(L)
OCP(L), OCP(Contour) >> Faith >> OCP(H), *Level, *Contour
Stage n+1: Demotion of OCP(Contour) before OCP(LH)
OCP(L), OCP(LH) >> Faith >> OCP(Contour), OCP(H), *Level
Stage final: Partial L2 ranking
OCP(L) >> Faith >> OCP(Contour), OCP(H), *Contour, *Level
The above is an idealized picture. Actual L2 phonological acquisition is more complicated, as L2 learners exhibit great variation among themselves. When L2 learners re-rank their constraints, this may result in an ‘unstable’ phonology with a seemingly inconsistent array of surface representations until constraint rankings stabilize (see the Gradual Learning Algorithm of Boersma and Hayes (2001), which accounts for variation using probabilistic constraint rankings; see Broselow and Xu (2004) and Cardoso (2007, 2011), among others, for an application of this model to variation in second language learning).
5.3.3 An Alternative Account for the Effect of the OCP Interacting with the TMS
This chapter has explained the interaction of the OCP and the TMS by linking individual tone types to OCP sub-constraints. There is an alternative account that can also explain this interaction, which will be briefly discussed here. As defined in (7), OCP constraints, or the prohibition of adjacent identical autosegments, do not actually reference specific markedness elements. On the other hand, the TMS (*RISE >> *FALL >> *LEVEL) is concerned only with the
complexity of individual tones and makes no claims about tone pairs at all. These are two independent constraints, but they jointly shape the learners’ interlanguage tonal phonology since, for example, we found that two violations of T2 (LH) were more serious than two violations of T4 (HL). In Alderete (1997), OCP effects are understood to be the result of markedness constraints strengthened by the operation of ‘Local Conjunction’ of a constraint with itself, or ‘self-conjunction’ as discussed in Smolensky (1993). Within the context of L2 acquisition of Chinese tone pairs, I propose here that co-occurrence restrictions can be explained by extending the original TMS ranking to the locally-conjoined tonal markedness constraints, as shown in (19).
(19) (*Rise)^2_L >> (*Fall)^2_L >> (*Level)^2_L
By locally conjoining the TMS, the errors and substitutions that participants made in this study can be explained with a single general principle: a sequence of two rising tones is the most difficult to produce (most marked); a sequence of two falling tones is of medium difficulty; and a sequence of two level tones is the least difficult. In this way, the promotion of the constraint OCP(L), which was proposed by Yip (2002), and the self-conjunction of the TMS ‘(*Rise)^2_L >> (*Fall)^2_L >> (*Level)^2_L’ proposed in this study are sufficient to account for the dissimilation phenomena discussed in this chapter on L2 Chinese tones.
5.4 Conclusion
This chapter has tested hypotheses on the role of the TMS and the OCP in shaping L2 tonal phonologies, and has presented a constraint-based analysis of the L2 acquisition of Chinese tone pairs. An experiment conducted with English, Japanese, and Korean native speakers learning Chinese as an L2 showed that L2 learners disfavor identical tone sequences, with the exception of T1-T1 tone sequences, which were produced with a high level of accuracy throughout the acquisition process.
When learning contour pairs unfamiliar to their native language phonology but allowed in the target language, learners were better at producing T4-T4 (HL-HL) sequences than T2-T2 (LH-LH) sequences. This study suggests that OCP(WholeTone) interacts with the TMS and that OCP sub-constraints operate separately from one another. OCP(H) is kept at a low rank in the L1, the interlanguage, and the L2; OCP(L) is promoted to a ranking higher than Faith early on in the acquisition process, and OCP(Contour) is demoted to a ranking lower than Faith later on. The asymmetry of T2-T2 and
T4-T4 in L2 tones motivates my proposal that the general OCP(Contour) constraint is demoted earlier than the specific OCP(LH) constraint is. This OT account incorporates both L1 transfer (in the case of T1-T1) and phonological universals (the case of T2-T2 and T4-T4) into a single model. The interacting effects of the OCP and the TMS found in L2 tonal phonologies can also be understood as a case of local conjunction of the TMS: (*Rise)^2_L >> (*Fall)^2_L >> (*Level)^2_L. The OT model of constraint re-ranking assumed in this chapter provides us with the tools to explicitly model the unconscious knowledge underlying L2 productions. However, certain other phenomena observed in L2 speakers cannot be accounted for in OT, as they involve the pedagogical practice of tones in classrooms. Specifically, T3 seems to be associated with unusual error patterns. In the next chapter, we examine the topic of T3, one of the most problematic tones for L2 learners, to explore how traditional linguistic assumptions on the underlying form of T3 and the pedagogical practices derived from this assumption affect the acquisition of this difficult tone.
Acquisition of the Third Tone¹

Puzzle 3 presented in Chapter 2 concerns the third tone (T3), which is consistently regarded as the most problematic in second language studies of Mandarin tones. Lexical tone acquisition already holds a central position in research on L2 Chinese sounds, and within that research T3 holds pride of place as a recurrent subject of inquiry. Several factors are regarded as contributing to the difficulty of acquiring T3: the acoustic similarity between T3 and other tones (especially T2), its multiple variations (‘allophones’), and the complicated sandhi processes associated with the tone. The current study offers a different explanation. I argue that the difficulties associated with the acquisition of T3 stem partially from what the tone’s underlying form is assumed to be, and from the prevalent teaching method that has developed from that assumption.
This chapter seeks to analyze the issue of T3 acquisition comprehensively by looking at a number of potential factors. Specifically, I will (1) analyze all three allophones of T3 separately so as to determine how well learners apply tone sandhi rules, (2) study both disyllabic and trisyllabic words, (3) include both a perception and a production component to determine the main source of learners’ errors, and (4) analyze participants from multiple first language backgrounds and proficiency levels in order to track the acquisition of T3 as proficiency evolves. I believe that analyzing these various factors will give us a more complete picture of T3 acquisition than previous studies.
This chapter is organized as follows: §6.1 reviews facts about variants of T3 and sandhi rules; §6.2 then summarizes previous L2 studies of T3 and the current study’s research questions. Section 6.3 introduces the designs of two phonological experiments, the results of which are presented in §6.4.
The discussion in §6.5 addresses both the theoretical and pedagogical implications of the research findings. The chapter concludes with final remarks in §6.6.
1 Part of this chapter is based on H. Zhang (2014; 2016b).
© koninklijke brill nv, leiden, 2018 | doi 10.1163/9789004364790_007
6.1 The Allophones and Sandhi Rules of Tone 3
As noted in Chapter 1, among the four Mandarin Chinese lexical tones T3 displays the most variation, having three allophones. That is, T3 is pronounced in three different ways depending on surrounding tones, but speakers mentally categorize them as belonging to a single phoneme, T3. As mentioned in Chapter 1, T3 is produced as Full-T3, a low-dipping tone transcribed as [214], on utterance-final syllables or in isolation. When followed by another T3, the first T3 is produced as a high-rising tone [35], the Raised-T3, in accordance with the T3 Sandhi rule. In all other contexts, T3 surfaces as Half-T3, a low-falling/level tone which is usually transcribed as [21]. Although these allophones have distinctive phonetic forms, they belong to the same phoneme T3 and are marked with the same tone mark in the pinyin system. All T3 allophones and the environments in which they occur are again summarized in Table 6.1 (i.e., Table 1.2).
Although frequently used in the literature and adopted here, the terms ‘Full-T3’ and ‘Half-T3’ are somewhat misleading. The predominant assumption is that Full-T3 is the underlying form while Half-T3 is derived from Full-T3. In reality, it is unclear whether T3 should be treated as having an underlying representation of [21] (Half-T3) or [214] (Full-T3). Traditionally, the low-dipping tone [214] is assumed to be the underlying form because the isolated pronunciation is typically viewed as representing the underlying form. However, there is also reason to believe that Half-T3, rather than Full-T3, is in fact the underlying form. As seen in Table 6.1, Half-T3 is clearly the most widely distributed allophone of T3. It occurs before T1, T2, T4, and the neutral tone, whereas Full-T3 only occurs in isolation or at the end of an utterance. Half-T3 may be even more widespread than this: it occurs in utterance-final positions in Beijing Mandarin (Duanmu 2000; Shi and Li 1997), and is produced this way to represent T3 in Taiwan Mandarin, even in isolation (Tai 1978: 117). We will revisit this matter in further detail in §6.5.2.

Table 6.1 The three allophones of Tone 3
Allophone         Pitch contour        Environment of occurrence
Full-T3 [214]     Low-dipping          In isolation or utterance-final position
Raised-T3 [35]    Mid-rising           Preceding T3
Half-T3 [21]      Low-falling/level    Preceding T1, T2, T4, neutral tone (a)
a Whether T3 surfaces as [21] or [35] before a neutral tone also depends on the syntactic structure of the utterance. See Cheng (1973) for further discussion.

The identity of the posited underlying tone, whether Full-T3 or Half-T3, has consequences for what accompanying sandhi rules to formulate in order to derive the full range of T3 variants. Positing Full-T3 [214] as the underlying form necessitates two T3 Sandhi rules, namely the Pre-T3 Sandhi Rule and Half-T3 Rule, shown in Item (1):

(1) T3 Sandhi rules under the theory that Full-T3 is the base (underlying) form
a. Pre-T3 Sandhi Rule: [214] → [35] / ____ [214] or [21]
b. Half-T3 Rule: [214] → [21] / ____ T (where T ≠ [214] or [21])

In this account, when T3 occurs before another T3, it becomes Raised-T3 (whose pitch contour coincides with the pitch contour of T2, [35]). This is referred to as “intersecting phonemes” (Chao 1980:69). The Half-T3 Rule states that when T3 occurs before any other tone, including neutral tones, it loses its final rise and becomes a low-falling tone [21]. The Pre-T3 Sandhi Rule is sometimes referred to as phonemic sandhi, while the Half-T3 Rule is referred to as phonetic sandhi (Norman 1988).

6.2 The Second Language Acquisition of T3
6.2.1 The ‘Full-T3 First’ Method
Theorizing [214] as T3’s base form carries over into second-language pedagogy in various ways. To begin with, this assumption has designated and standardized the tone mark for T3: the falling-rising shape of the T3 phonetic transcription is used in the most prevalent pinyin system and teaching materials. Table 6.2 shows how each of the four tones is marked on the sample syllable /ma/. T3 is marked with a dipping shape, reflecting the pitch contour of Full-T3. Half-T3 does not have a particular tonal mark.

Table 6.2 The four Chinese lexical tones
Lexical tone    Pinyin    Gloss
Tone 1 (T1)     mā        ‘mother’
Tone 2 (T2)     má        ‘hemp’
Tone 3 (T3)     mǎ        ‘horse’
Tone 4 (T4)     mà        ‘scold’

Following the traditional Full-T3 account, Full-T3 is usually taught to L2 learners first. Learners are subsequently taught to memorize the two T3 rules listed in (1) in order to determine when to use Half-T3 [21] and when to use Raised-T3 [35]. While the Pre-T3 Sandhi Rule is explicitly explained, the Half-T3 Rule is usually skipped or only briefly introduced in classrooms (Tsung 1987; S. Chen 1973). This procedure, referred to here as the ‘Full-T3 First’ method, is currently the dominant teaching method in P.R. China, the United States, and various other countries.
As noted before, T3 is consistently regarded as one of the most problematic tones for L2 learners in both perception and production tasks. Numerous studies have reported that L2 learners exhibit a high error rate for T3, and acquire T3 very late (Lin 1985; Leather 1990; Elliot 1991; Q-H. Chen 1997; Winke 2007; Shi 2007; Tao and Guo 2008; H. Zhang 2014). Despite the plethora of L2 studies on the acquisition of T3, few relate the phonological theories to pedagogical approaches. While several studies such as those by Tsung (1987) and C-Y. Chen (2005) do question traditional teaching methods and advocate Half-T3 as the primary form when teaching Chinese as a second language, they lack quantitative support. Furthermore, the majority of previous studies examine the three variations of T3 as a whole, so it is not clear which variation(s) of T3 cause the greatest difficulty. Many observe similar types of errors with Half-T3. A number of researchers have found that, although the error rate for T3 as a target tone is quite high, Half-T3 [21] is often (mistakenly) produced in place of T1, T2, and T4 (Tao and Guo 2008; H. Zhang 2013; C. Yang 2011).
Tao and Guo (2008) addresses this contradictory phenomenon, saying that “although Tone 3 may be the most difficult for American students to produce, the difficulty did not hinder production of this tone” (Tao and Guo 2008: 35). The present study attempts to reanalyze this issue by subjecting non-native T3 performance to a new experiment and analysis, in which each of T3’s three variants is examined separately. In theory, substituting a sound for its allophone should not lead to a change in meaning, since the substitution does not change the phoneme’s underlying identity. However, in the case of Chinese tones, mispronouncing the T3 allophones will not only cause poor tonal performance, but may also lead to misunderstandings. This occurs for at least two reasons. First, T3 represents a case of “intersecting phonemes” (Chao 1980:69), since Raised-T3 is phonetically identical to another phoneme, T2. Second, the mispronunciation of allophones may cause a sort of chain reaction that influences adjacent tones due to tonal coarticulation effects in connected speech (Xu 1997). As indicated in Chapter 4, this kind of coarticulation effect may decrease the intelligibility of
the phonological identities of L2 tones. Therefore, it is crucial for learners to produce the correct T3 allophone in specific environments in order to acquire native-like spoken Chinese.

6.2.2 The Present Study
This chapter seeks to answer the following research questions:
1. Which variant of T3 is the most difficult for learners to perceive?
2. Which variant of T3 has the highest error rate in production, and which tone do learners produce instead when they make errors?
3. How does the current dominant ‘Full-T3 First’ teaching method interact with T3 acquisition by L2 learners of Chinese?
Particular attention will be paid to L2 learners’ production of Half-T3 in this chapter, since this is the most widely distributed form of T3. In a phonetic study, Zhang and Lai (2010) investigated native Mandarin speakers’ applications of the two T3 Sandhi rules to novel combinations. Interestingly, they found that native Mandarin speakers apply the Half-T3 Rule with greater accuracy than the Pre-T3 Sandhi Rule, indicating a synchronic bias against the phonetically less-motivated pattern. Therefore, I also predict Half-T3 to be produced with greater accuracy than Raised-T3 by learners of Chinese. This prediction is further supported by the Tonal Markedness Scale (TMS), which was discussed in Chapter 5. According to the TMS, complex contour tones are more difficult for learners than simple contours. As the three pitch levels of Full-T3 [214] form a complex contour, the tone is more difficult than simple contour tones, which usually consist of just two pitch levels, as in rising or falling tones as well as Half-T3 (J. Zhang 2004). Assuming that both L1 and L2 acquisition are constrained by Universal Grammar (Major 2001), L2 learners would demonstrate greater accuracy on Half-T3 than Raised-T3. To summarize, based on two considerations (how native Mandarin speakers apply the two T3 Sandhi rules, and the predictions made by the TMS), it is predicted that the error rate of Raised-T3 [35] will be higher than the error rate of Half-T3 [21].

6.3 Methodology
This study consists of two experiments: a main experiment and a supplemental experiment. I call them Experiment 1 and Experiment 2 in this chapter. The main experiment (Experiment 1) was previously described in Chapter 3, and involved three groups of learners with different first languages. This experiment
focused on the production of T3 in disyllabic words. The design of Experiment 1 is again summarized in §6.3.1. The supplemental experiment (Experiment 2) was conducted on three groups of English-speaking learners with different proficiency levels in Chinese. Experiment 2 contained both a perception and a production component. Stimuli consisted of trisyllabic phrases, which served to check the application of T3 Sandhi rules across word boundaries (see §6.3.2). In order to compare the production of T3 with that of other tone types, I included all four lexical tones in both experiments.

6.3.1 The Main Experiment (Experiment 1)
As noted in Chapter 3, sixteen combinations of four lexical tones were equally represented by two words (consisting of different morphemes) in Experiment 1, resulting in thirty-two distinct words. The critical test items containing T3 are displayed in Item (2):

(2) Example test disyllabic words containing T3:
T3-T1: 手机，老家 (shǒu jī ‘cell phone,’ lǎo jiā ‘hometown’)
T3-T2: 打球，小学 (dǎ qiú ‘to play ball,’ xiǎo xué ‘elementary school’)
T3-T3: 语法，水果 (yǔ fǎ ‘grammar,’ shuǐ guǒ ‘fruit’)
T3-T4: 五月，舞会 (wǔ yuè ‘month of May,’ wǔ huì ‘dance party’)
T1-T3: 出口，西北 (chū kǒu ‘exit,’ xī běi ‘northwest’)
T2-T3: 牛奶，白水 (niú nǎi ‘milk,’ bái shuǐ ‘water’)
T4-T3: 跳舞，下雨 (tiào wǔ ‘to dance,’ xià yǔ ‘to rain’)

All words were at the lowest proficiency level. These words were embedded in sentences where both the preceding and following morphemes were the neutral tone particle de. Most (forty-five out of sixty-four) sentences ended with the phrase hěn hǎo ‘很好.’ Nineteen filler sentences ended with the phrase hěn bú cuò ‘很不错.’ These filler sentences were distributed randomly throughout the reading list. Item (3) displays the carrier sentence with the test disyllabic words labelled as XX, and Item (4) lists two examples of test sentences ending with different phrases.

(3) Carrier sentence:
Chinese character: 我 觉得 X X 的 东西 很 好。
Pinyin: Wǒ juéde X X de dōngxi hěn hǎo.
Gloss: I feel X X PARTICLE things very good
‘I feel XX things are very good.’
(4) Two sample test sentences: a. 我觉得开学的东西很好。 ‘I feel that things used for starting the semester are very good.’ b. 我觉得西北的东西很不错。 ‘I feel that things from the northwest are very good.’ During the analysis, tones were judged to be correct or incorrect by native Chinese speakers (see Chapter 3 for details). For Half-T3 and Raised-T3 productions, tones were judged in the following way: when the target tone was followed by a T1, T2, T4 or neutral tone, only Half-T3 (the low-falling tone) was considered correct. Whenever T3 was followed by another T3, any production other than Raised-T3 (the mid-rising tone) was considered incorrect. Note that a word-final syllable followed by the neutral-toned particle de should surface as Half-T3 as well, as displayed in Figure 6.1 below. Therefore, no T3 productions surfacing as Full-T3 were considered correct in these cases. L2 learners’ productions of Full-T3 were observed on the final syllables of sentences containing hěn hǎo. For the sentence-final T3 morpheme hǎo, both Full-T3 and Half-T3 forms were considered acceptable (Duanmu 2000).
Figure 6.1 Surface forms of T3 in disyllabic words.
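As an illustration only, the judging criteria described above can be sketched in Python. The function names and the string labels for tones are hypothetical choices made for this sketch and are not part of the original study; the logic simply restates which T3 variant counted as correct in each environment.

```python
# Illustrative sketch; names and labels are assumptions, not the study's materials.

def acceptable_t3_forms(next_tone):
    """Return the set of T3 variants counted as correct before `next_tone`.

    next_tone: "T1", "T2", "T3", "T4", "T0" (neutral tone), or None when
    the T3 syllable is sentence-final.
    """
    if next_tone is None:
        # Sentence-final T3: both Full-T3 and Half-T3 were accepted.
        return {"Full-T3", "Half-T3"}
    if next_tone == "T3":
        # Pre-T3 Sandhi Rule: only Raised-T3 counts as correct.
        return {"Raised-T3"}
    # Half-T3 Rule: before T1, T2, T4 or a neutral tone, only Half-T3 counts.
    return {"Half-T3"}


def is_correct(produced, next_tone):
    """True if the produced variant is acceptable in this environment."""
    return produced in acceptable_t3_forms(next_tone)
```

For example, under these criteria `is_correct("Full-T3", "T0")` is false for a word-final T3 followed by the particle de, while `is_correct("Full-T3", None)` is true for the sentence-final hǎo.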
The target tone contours of hěn within the sentence-final phrase were also judged in the same way as stated above for word-initial T3, in order to confirm the findings from the disyllabic test words. If participants correctly applied both T3 Sandhi rules, we would expect the hěn in the phrase hěn bú cuò to be a
Half-T3, according to the Half-T3 Rule. On the other hand, the hěn in the phrase hěn hǎo should be a Raised-T3, according to the Pre-T3 Sandhi Rule. Participants were asked to repeat the thirty-two words twice, resulting in sixty-four tokens collected per speaker per trial. For further details about the participants (twenty English speakers, twenty Japanese speakers, and twenty Korean speakers) and the design of Experiment 1, see Chapter 3.

6.3.2 The Supplemental Experiment (Experiment 2)
While Experiment 1 was designed to test all four Chinese tones, Experiment 2 was specifically designed to study T3, and included both a perception and a production component.

6.3.2.1 Stimuli
The goal of the perception experiment was to test if all levels of learners were able to categorize the two basic phonetic variants, Half-T3 and Full-T3, as the phoneme T3. The stimuli for the perception test consisted of eighty-nine test syllables of the sound /ma/: fourteen monosyllabic pseudo-words, twelve disyllabic pseudo-words, and seventeen trisyllabic pseudo-words, with T3 distributed at different positions. Test words were read aloud by a female and a male native Chinese speaker. Each token was repeated once with an interval approximately 2.5 seconds long in between tokens. Participants were given approximately ten seconds in between new test words to transcribe the tone they had heard.
The reading list for the production task consisted of four sets of words. Words were written in the pinyin system to ensure that participants were able to produce the test words without knowledge of word meanings. Word Set 1, used for the pre-test, consisted of twelve monosyllabic pseudo-words bearing four tones with each tone type repeated three times. The purpose of the pretest was to ensure that learners could produce individual tones correctly. The accuracy rates of the monosyllabic tones in the pre-test exceeded 90% for all learners.
Therefore, I assumed that these learners were eligible for the production experiment. Word Set 2 included eight disyllabic pseudo-words combining T3 with other tones. Note that while the bitonal sequences in Experiment 1 are real words, the disyllabic tone sequences in Experiment 2 are pseudo-words made up of the segments /ma/. The use of sonorants in Word Sets 1 and 2 was kept to a maximum to ensure continuous pitch tracking. In order to better understand the acquisition of T3 Sandhi rules, particular attention was placed on trisyllables in Experiment 2. In Chinese, there is a
considerable number of monosyllabic words, which form trisyllabic phrases when combined with disyllabic words, the most common type of word. For example, in [nǚ [lǎo-shī]] (女老师 ‘female teachers’) and [nǚ [xué-sheng]] (女学生 ‘female students’), the morpheme nǚ ‘female’ is not a part of the lexical word lǎo-shī ‘teacher’ or xué-sheng ‘students,’ but is loosely connected to the lexical word, thus forming a trisyllabic prosodic unit. The nǚ (女 ‘female’) in the first phrase nǚ lǎo-shī should surface as a Raised-T3 due to the Pre-T3 Sandhi Rule, since the following syllable lǎo also carries T3. However, the nǚ in the second phrase nǚ xué-sheng should be produced as Half-T3 due to the Half-T3 Rule. It was my goal to determine how well learners could make this distinction.
Word Set 3, used in a production task, consisted of fourteen trisyllabic phrases with T3 at the beginning of the phrase (see Table 6.3). The word class of the monosyllabic morpheme at the beginning of trisyllabic words varied, and covered adjectives, adverbs, pronouns, prepositions, and numbers. The test morphemes ‘女, 很, 也, 请, 我, 给, 五’ are among the most common monosyllabic words used in elementary Chinese textbooks. The phrases in Column A and Column B have the same syntactic structure, but differ in the tones borne by the second syllable. Items from Column A tested how well learners applied the Pre-T3 Sandhi Rule, whereas items from Column B tested the application of the Half-T3 Rule. Column A items and Column B items were presented in alternating order in the reading list to avoid an ‘inertia effect,’ resulting in fourteen phrases. Word Set 4 consisted of a small paragraph with twelve T3 syllables located at utterance-final (either a clause or sentence) positions. This set was designed to determine the proportions of Full-T3 that L2 learners produced when both Half-T3 and Full-T3 were acceptable.

Table 6.3 Example trisyllabic phrases used for production task in Experiment 2
      Column A                                    Column B
1.    a. 女老师 nǚ lǎo shī ‘female teachers’       b. 女学生 nǚ xué shēng ‘female students’
2.    a. 很好吃 hěn hǎo chī ‘very delicious’       b. 很不好 hěn bù hǎo ‘very bad’
3.    a. 也有人 yě yǒu rén ‘also have people’      b. 也是人 yě shì rén ‘also is a person’
4.    a. 请你看 qǐng nǐ kàn ‘ask you to look’      b. 请他看 qǐng tā kàn ‘ask him to look’
5.    a. 我想看 wǒ xiǎng kàn ‘I’d like to look’    b. 我要看 wǒ yào kàn ‘I want to look’
6.    a. 给我看 gěi wǒ kàn ‘show me’               b. 给他看 gěi tā kàn ‘show him’
7.    a. 五本书 wǔ běn shū ‘five books’            b. 五个人 wǔ gè rén ‘five people’

6.3.2.2 Subjects and Recording Procedures
A new group of twenty American English speakers (ten females and ten males) participated in Experiment 2. All learners had been taught with the mainstream ‘Full-T3 First’ method. Participants were recruited from beginner-level (ten students), intermediate-level (seven students), and advanced-level (three students) Chinese classes in two different universities. These universities were not the same as those used in Experiment 1 (although all of them are located on the east coast of the United States). The beginner-level learners had been learning Chinese for one month at the time of data collection and none had studied in a Chinese-speaking country. The intermediate-level learners had all studied Chinese for four semesters at the time of data collection. Two of the intermediate-level students had studied in China for a summer term. The advanced students had four to five years of Chinese-language learning experience including study in China for at least a semester. These advanced-level students’ oral proficiency was at an advanced-high level according to ACTFL (American Council on the Teaching of Foreign Languages) Oral Proficiency Guidelines (2012). No participant spoke or was learning any other tonal language. The recording equipment and lab setting were the same as those used in Experiment 1.

6.3.2.3 Analysis
For the perception component of Experiment 2, each participant listened to eighty-nine test syllables, each of which was repeated once. Thus, there was a total of 3,560 (eighty-nine syllables × two repetitions × twenty participants) test syllables. Participants were asked to write the corresponding tone marks in pinyin on an answer sheet.
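The token counts reported for the perception task can be checked with a few lines of Python. This is only a worked arithmetic sketch: the variable names are hypothetical, and it assumes that every syllable of a pseudo-word counts as one test syllable, which is the reading under which the reported figure of eighty-nine is reproduced.

```python
# Number of /ma/ pseudo-words of each length, as reported for the perception test.
pseudo_words_by_length = {1: 14, 2: 12, 3: 17}  # syllables per word -> number of words

# Assumption: each syllable of a pseudo-word counts as one test syllable.
test_syllables = sum(length * count for length, count in pseudo_words_by_length.items())

repetitions = 2     # each token was heard twice
participants = 20   # learners in Experiment 2

total_responses = test_syllables * repetitions * participants
print(test_syllables, total_responses)  # prints: 89 3560
```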
For the production test, two native Chinese speakers, Z and C (both of whom had received linguistic training and had more than seventeen years of experience teaching Chinese), judged whether participants’ productions were correct or incorrect. To guarantee reliability, both intra- and inter-rater agreement were calculated. Tonal productions were judged and transcribed twice by Z, with a fifteen-day interval in between judgments. The agreement rate between these two judgments was 95.6%. For inter-transcriber reliability, rater C judged one third of the data independently. A comparison test indicated that Z and C had an agreement rate of 90.8%. A third native speaker was consulted in cases of discrepancies so that a final transcription could be reached.
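The text does not state how the agreement rates were computed. A minimal sketch, assuming simple percent agreement over paired judgments of the same tokens, might look as follows; the function name and the example labels are hypothetical.

```python
def agreement_rate(first, second):
    """Percent of tokens on which two sets of judgments coincide.

    Assumes simple percent agreement; `first` and `second` are equal-length
    lists holding one label per token, in the same token order.
    """
    if len(first) != len(second):
        raise ValueError("both judgment lists must cover the same tokens")
    if not first:
        raise ValueError("no judgments to compare")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Made-up labels for four tokens, purely to show the call.
round_one = ["Half-T3", "Raised-T3", "T2", "Full-T3"]
round_two = ["Half-T3", "Raised-T3", "T2", "Half-T3"]
print(agreement_rate(round_one, round_two))  # prints: 75.0
```

The same function would serve for both comparisons described above: Z's two rounds of judgments, and Z against C on the shared third of the data.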
6.4 Results
Since the results from Experiment 1 and Experiment 2 are compatible with one another, this section is organized by topic rather than by experiment. We first examine the results of the perception component from Experiment 2 in §6.4.1, followed by results from the production component. The second section (§6.4.2) reports the error patterns of Half-T3 and Raised-T3 in both experiments. The production of T3 at utterance-final positions from both experiments will be examined in §6.4.3. In the report, ‘error rate’ refers to the number of times the target tone was produced incorrectly, divided by the total number of times the target tone should have been produced. The ‘substitution rate of Tone X’ refers to the number of times Tone X was used as a substitute for some other target tone, divided by the total number of errors for the target tone.

6.4.1 Perception of T3 Variants (Half-T3 and Full-T3)
With regard to the perception component, the purpose of this study is to determine if learners can successfully categorize both Half-T3 and Full-T3 as T3 in a listening task. That is, learners did not have to label the different variants of T3, but simply categorize the low tone (Half-T3) and low-dipping tone (Full-T3) as T3. Learners did not need to discern the Raised-T3 since it has the same phonetic form as T2. Figure 6.2 displays the error rates (in percentages) of individual tones from the perception component. Half-T3, Full-T3, and Raised-T3 are abbreviated as H3, F3, and R3. Results show that the ability to recognize and transcribe T3 variants varies greatly across the three proficiency levels.
Figure 6.2 Error rates of tones in the perception task in Experiment 2.
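The two measures defined at the start of this section, the error rate and the substitution rate, can be sketched in Python as follows. This is a hedged illustration rather than the procedure used in the study: the function names and the (target, produced) pair format are assumptions, and a token is treated as an error whenever its judged label differs from the target label.

```python
def error_rate(tokens, target):
    """Errors on `target` divided by the number of times it should have occurred.

    tokens: list of (target_tone, produced_tone) pairs, one per token.
    """
    produced = [p for t, p in tokens if t == target]
    if not produced:
        raise ValueError("target tone never occurs in the data")
    return sum(p != target for p in produced) / len(produced)


def substitution_rate(tokens, target, substitute):
    """Share of the errors on `target` in which `substitute` was used instead."""
    errors = [p for t, p in tokens if t == target and p != target]
    if not errors:
        return 0.0
    return errors.count(substitute) / len(errors)


# Made-up tokens (H3 = Half-T3, F3 = Full-T3), purely to show the calls.
sample = [("H3", "H3"), ("H3", "F3"), ("H3", "F3"), ("H3", "T1")]
print(error_rate(sample, "H3"))  # prints: 0.75
```

In this toy sample, three of the four target Half-T3 tokens are errors, and Full-T3 accounts for two of those three errors.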
Tone recognition by the advanced learners was almost entirely accurate. Errors made by the beginner and intermediate students showed certain similarities. For both of these lower proficiency levels, Half-T3 was the allophone which had the highest error rates. Half-T3 was found to be the most difficult to recognize when in word-final positions. In contrast, the error rates of Full-T3 were considerably lower than those of the other tones in all proficiency levels. In other words, Full-T3 was much easier for learners to recognize. This also implies that learners failed to pay attention to Half-T3, even though it is the most widely used T3 allophone in actual Chinese utterances. The two pie charts below further break down error types to show for which tones the learners, particularly those at the beginner and intermediate levels, mistook Half-T3.
Figure 6.3 Error types for target Half-T3 (H3) for beginner- and intermediate-level learners in the perception component of Experiment 2.
Among the 402 Half-T3 tokens that beginner students identified incorrectly, 146 (36.3%) were mistaken for T4, and 98 (24.3%) were mistaken for neutral tones. Of the 156 Half-T3 tokens that intermediate students identified incorrectly, 72 (46.15%) were mistaken for T4, 26 (16.6%) were mistaken for Full-T3, and 26 (16.6%) were mistaken for neutral tones. This may have occurred as a result of the phonetic similarities between Half-T3, T4, and the neutral tone. Half-T3 is a low-falling tone, transcribed as [21], while T4 is a high-falling tone with a pitch value of [51]. The similarity between Half-T3 and T4 has not been widely discussed but was documented in Garding et al. (1986). While T4 and Half-T3 possess different tonal register features, both have falling contours. Hearing Half-T3 as T4 indicates that participants with low proficiency levels paid more attention to the contour than to the register. Placing greater attention on contour rather than register is not an uncommon source of error for
L2 tones (see H. Zhang 2010). On the other hand, some learners in this study mistakenly heard Half-T3 as the neutral tone. This may be due to the similarity in the two tones’ temporal features: both Half-T3 and neutral tones are much shorter in duration than other lexical tones (Jongman et al. 2006). There seems to be improvement in the perception of Half-T3 as students progress in proficiency. The error rates of Half-T3 decrease as proficiency increases. A statistical analysis using Fisher’s Exact Test shows that advanced students’ error rate in perceiving Half-T3 was significantly lower than that of intermediate students, and that intermediate students’ error rate for Half-T3 was significantly lower than that of beginner students.

6.4.2 Production of Half-T3 and Raised-T3
Whereas the previous section focused on the perception of Half-T3 and Full-T3, this section focuses on Half-T3 and Raised-T3, the variants that result from the application of the T3 Sandhi rules. This section reports on T3 productions from both experiments. At least three similar error patterns were observed in both experiments: (1) In general, T1 was produced better than T4, and T4 was produced better than T2; (2) Half-T3, as a target tone, had higher error rates than Raised-T3; (3) Half-T3 was frequently used as a substitute for other tones, but the target Half-T3 was most frequently produced incorrectly as Full-T3 by L2 learners. This section first reports the results of error patterns and then gives results of substitution patterns.

6.4.2.1 The Error Patterns of Half-T3 and Raised-T3
Both experiments show similar error rankings across the individual tones, which were compatible with the Tonal Markedness Scale as discussed in Chapter 5. However, contrary to the hypothesis made in §6.2, Half-T3 had a higher error rate than Raised-T3. In Experiment 1, English speakers’ tonal performance was found to be the poorest, while Japanese speakers’ performance was the best.
For all language groups, the error rate for T2 was the highest, with all groups of speakers exhibiting an error rate at or above 55%. T1 and Raised-T3 had the lowest error rates. The general ranking of error rates is displayed in Figure 6.4. No significant difference was found between the error rates of target Half-T3 in word-initial and word-final positions, as can be seen in Table 6.4. As seen in Figure 6.4, Half-T3 was produced with much higher error rates than Raised-T3. To put it another way: L2 learners in this study produced the tone related to the Pre-T3 Sandhi Rule (Raised-T3 [35]) with higher accuracy than the tone related to the Half-T3 Rule. This suggests that learners applied the phonemic Pre-T3 Sandhi Rule better than the phonetic Half-T3 Rule. This
Figure 6.4 Overall error rates in the Experiment 1.
Table 6.4 Error rates of target Half-T3 at word-initial and word-final positions
Error rates p-values
52% 43% p = 0.3618
37% 29% p = 0.3368
33% 39% p = 0.3939
result does not support the hypothesis, and is in fact the reverse of what was predicted in §6.2. Some may ask whether the frequency of sandhi rule application influences error rates, since learners need to apply the Half-T3 Rule more often than the Pre-T3 Sandhi Rule. I do not believe that this is an issue. The higher error rate of Half-T3 compared to Raised-T3 is not only found for disyllabic words, but also in sentence-final phrases. In sentence-final positions, there are many more T3+T3 sequences than T3+Other sequences. The T3 morpheme hěn is used in both of the phrases hěn hǎo and hěn bú cuò. According to the Pre-T3 Sandhi Rule, the hěn in hěn hǎo should be produced as Raised-T3, whereas according to the Half-T3 Rule the hěn in hěn bú cuò should be produced as Half-T3. Each subject must apply the Pre-T3 Sandhi Rule forty-five times, but the Half-T3 Rule only nineteen times. The result is
that the error rate for Half-T3 is still much higher than for Raised-T3. Taking the English speakers as an example, the error rate for target Raised-T3 is 28% whereas the error rate for Half-T3 is as high as 68%. T1 and Full-T3 were the two most often-used substitute tones for the target word-initial T3, which is similar to the substitution patterns found in disyllabic words. This will be discussed in the next section. Both the general error pattern and Half-T3’s higher error rates relative to Raised-T3 are confirmed by the data from Experiment 2. The general error rates of all tones in Experiment 2 are displayed in Figure 6.5 (data from Word Sets 2–4).
Figure 6.5 Error rates of L2 tonal productions in Experiment 2.
The relative error rates of T1, T2, and T4 are also similar across the three proficiency levels: the error rates of T2 are higher than for T4, and the error rates of T4 are higher than for T1. The error rate rankings found in the two production studies are also similar to the error ranking found in the perception task, and are compatible with the TMS. From a developmental perspective, T2 performance is harder to improve than that of T3 and the other tones, particularly from the beginner to the intermediate level. In contrast, the improvement in the performance of T3 and T4 across the three levels is striking. Figure 6.6 below breaks down all T3 error rates into Half-T3, Raised-T3, and Full-T3. The error rates for Half-T3 and Raised-T3 are greater than for Full-T3 and, at the beginner and intermediate levels, Half-T3’s error rate is higher than Raised-T3’s. Across proficiency levels, Half-T3 error rates fall more quickly than those of Raised-T3: the error rate of Half-T3 falls from 64% in the beginner level to 5.5% in the advanced level, while the
Figure 6.6 Error rates of Half-T3, Raised-T3, and Full-T3 in Experiment 2.
error rate of Raised-T3 falls from 56% to 18.7%. As a result, the error rate of Raised-T3 is still greater than that of Half-T3 and Full-T3 at the advanced level. This indicates that the drop in the T3 error rate is mainly attributable to a substantial improvement in the production of Half-T3. As was the case for Experiment 1, there are two seemingly contradictory phenomena in Experiment 2. First, in the beginner and intermediate learners’ data, the error rates of Half-T3 are at least 10% higher than those of Raised-T3. This implies that lower-level L2 learners implemented the Pre-T3 Sandhi Rule better than the Half-T3 Rule, which is the reverse of what is observed in native Chinese speakers (Zhang and Lai 2010). However, advanced learners are very similar to native Chinese speakers. Second, the error rates of Raised-T3 and T2 for learners in Experiment 1 and the beginner- and intermediate-level participants in Experiment 2 are incompatible with the TMS. While L2 learners exhibited a very high error rate for T2 (predicted by the TMS), they had a relatively low error rate for Raised-T3, which has the same pitch contour as T2. In order to explain these observations, we must examine the details of the types of substitutions made for target T3 when tones were produced erroneously.

Substitutions Used for Half-T3 and Raised-T3

In the study of L2 acquisition, substitution has been shown not to be an arbitrary process, but instead the result of a process of avoidance and choice made by L2 learners. Item (5) displays the ranking of the frequencies of individual substitute tones across all speakers in Experiment 1. The numbers in
parentheses following each substitute tone indicate the frequency with which each tone was used as a substitute for some other target tone. For example, Half-T3 was used as a substitute tone 1,032 times throughout the entire dataset. (5) The ranking of substitute tones based on frequency Half-T3 (1,032) > T1 (723) > T4 (459) > Full-T3 (327) > T2 (180) Figure 6.7 details the substitute tones within each learner group in Experiment 1. For each language group, the columns represent the substitute tones. The y-axis indicates the percent of substitutions out of all productions made within each group.
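The ranking in Item (5) can be reproduced from the raw counts. The following is a minimal sketch in Python, not part of the original study. The function name `rank_substitutes` is hypothetical, and the shares it reports rest on the assumption that the five counts in Item (5) cover all substitutions in the dataset.

```python
# Substitute-tone counts copied from Item (5), Experiment 1.
substitute_counts = {
    "Half-T3": 1032,
    "T1": 723,
    "T4": 459,
    "Full-T3": 327,
    "T2": 180,
}


def rank_substitutes(counts):
    """Return (tone, count, share) tuples sorted from most to least frequent.

    The share is computed over the sum of the listed counts, which is
    assumed (not stated in the text) to be the full set of substitutions.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(tone, n, n / total) for tone, n in ranked]


ranking = rank_substitutes(substitute_counts)
for tone, n, share in ranking:
    print(f"{tone:8s} {n:5d} {share:6.1%}")
```

Under that assumption, Half-T3 alone accounts for a little over a third of all substitutions, which is the observation the following discussion builds on.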
Figure 6.7 Substitutions within each language group in Experiment 1.
The rankings of individual substitute tones for each group are the same throughout: Half-T3 > T1 > T4 > Full-T3 > T2. Generally speaking, wherever low error rates (i.e. high accuracy) occur with a target tone, that tone was also used as a substitute tone at a high rate, and vice versa. This is true for T1, T2, and T4. Note that this result fits predictions made by the TMS. Only Half-T3 does not fit this pattern. For Half-T3, very high error rates were found for all groups of learners, as seen in Figure 6.4, and yet Half-T3 was used as a substitute tone more often than any of the other tones, as seen in Figure 6.7. The high substitution rate of Half-T3 indicates that Half-T3 is phonetically easy to produce, especially in connected speech, and should therefore have a low error rate.
Table 6.5 Substitute tones for target Half-T3

Group      Substitute tones (error tones)
English    FT3: 40%   T4: 25%    T1: 20%
Japanese   FT3: 31%   T1: 29%    T2: 19%
Korean     T1: 35%    FT3: 29%   T4: 26%
Table 6.6 Detailed substitution patterns for target Half-T3 with positional information

                      Response tones
Group     Position    T1     T2     T4     Half-T3   Full-T3   Other
English   Initial     11%    15%    5%     48%       20%       1%
Japanese  Initial     11%    14%    3%     63%       8.8%      0.4%
Korean    Initial     11%    4%     5%     68%       12%       0.4%
English   Final       8%     0      16%    57%       18%       1%
Japanese  Final       8%     0.6%   7%     71%       11%       2%
Korean    Final       14%    0.3%   13%    61%       10%       2%
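The percentages in Table 6.6 can be summarized mechanically. The sketch below is an illustration only and makes two assumptions that are not stated in the table itself: the three rows for each position are taken to follow the order English, Japanese, Korean (inferred from the accompanying discussion), and the error rate is taken to be 100 minus the percentage of correct Half-T3 productions. The helper names `error_rate` and `top_substitute` are hypothetical.

```python
TONES = ["T1", "T2", "T4", "Half-T3", "Full-T3", "Other"]

# Percentages copied from Table 6.6. The group labels are an assumption
# (row order English, Japanese, Korean), inferred from the surrounding text.
TABLE_6_6 = {
    ("English", "initial"): [11, 15, 5, 48, 20, 1],
    ("Japanese", "initial"): [11, 14, 3, 63, 8.8, 0.4],
    ("Korean", "initial"): [11, 4, 5, 68, 12, 0.4],
    ("English", "final"): [8, 0, 16, 57, 18, 1],
    ("Japanese", "final"): [8, 0.6, 7, 71, 11, 2],
    ("Korean", "final"): [14, 0.3, 13, 61, 10, 2],
}


def error_rate(row):
    """Percentage of target Half-T3 tokens not produced as Half-T3."""
    return 100 - row[TONES.index("Half-T3")]


def top_substitute(row):
    """Most frequent named response tone other than the correct Half-T3."""
    errors = {
        tone: pct
        for tone, pct in zip(TONES, row)
        if tone not in ("Half-T3", "Other")
    }
    return max(errors, key=errors.get)


for (group, position), row in TABLE_6_6.items():
    print(f"{group:9s} {position:8s} "
          f"error rate {error_rate(row):5.1f}%  "
          f"top substitute {top_substitute(row)}")
```

With these labels, Full-T3 comes out as the most frequent substitute in four of the six rows.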
However, this study finds that Half-T3 had a very high error rate (similar findings in Tao and Guo 2008). To determine why Half-T3 exhibits this contradictory pattern, substitutions used for the target tone Half-T3 were further examined in Experiment 1. For target Half-T3, the three most frequently used substitutions are provided in Table 6.5. The tones listed are those used as a substitute by learners when errors did occur, followed by the percent frequency of that error type (out of the total number of errors for that target tone). As can be seen in Table 6.5, Full-T3 was used as a substitute tone very often in all three groups. For the English and Japanese speakers, Full-T3 was the most frequently used substitute tone for target Half-T3. It was the second most-used substitute tone for the Korean speakers. The percentage of times each tone was produced for a target Half-T3 is shown in Table 6.6. (The column labelled ‘Half-T3’ shows the percentages of correct productions.) The tone used most often in each position is boldfaced to highlight the most frequently used tone for a target Half-T3 at specific positions.
Table 6.6 shows some L1 transfer effects in each group in addition to the previously noted frequent occurrence of Full-T3 as a substitute tone across the three groups. Japanese speakers’ frequent use of T2 at the word-initial position as a substitute for Half-T3, and Korean speakers’ use of word-final T1, may be due to typical Japanese and Korean intonation patterns. In Tokyo Japanese, a typical lexical pitch accent can be annotated as H*L (High-Low) and the initial tone of the lexical word may be overridden by a rising tone if this lexical tone initiates an Accent Phrase (Venditti 2005). The test word in this study is located at the beginning of an Accent Phrase, so this may explain why many Japanese speakers produced either T2 or T1 at word-initial positions. A typical two-syllable Korean Accent Phrase can be annotated as Low-High (Jun 1996). Korean speakers’ frequent use of T1 in word-final positions is very likely due to this intonation pattern. For more details regarding L1 transfer effects, see H. Zhang (2013). In addition to these L1 transfer effects, Full-T3 was often used as a substitute for target Half-T3 among all language groups. The occurrence rate of Full-T3 as a substitute tone for target Half-T3 at both word-initial and word-final positions is the highest among all the errors for English speakers. Full-T3 was also the most often-used substitute tone for Half-T3 at word-final positions for Japanese speakers and at word-initial position for Korean speakers. Table 6.6 thus shows both interactions resulting from L1 transfer and the overuse of Full-T3. Experiment 2 shows similar findings regarding the overproduction of Full-T3. Both disyllabic words (Word Set 2) and trisyllabic phrases (Word Set 3) were used to test for the application of T3 Sandhi rules.
Item (6) shows an example pair of trisyllabic phrases:

(6) Example trisyllabic testing materials
(a) 请你看 qǐng nǐ kàn ‘ask you to look’
(b) 请他看 qǐng tā kàn ‘ask him to look’

The test syllable qǐng in (a) was expected to undergo the Pre-T3 Sandhi Rule and therefore surface as Raised-T3. On the other hand, the test syllable qǐng in (b) was expected to undergo the Half-T3 Rule and surface as Half-T3. As was the case for Experiment 1, participants also mistakenly produced a large proportion of target Half-T3 as Full-T3, leading to a high error rate for Half-T3 and overuse of Full-T3. This pattern was found for both Word Sets 2 and 3 of the task, and was especially noticeable for the beginner and intermediate learners. Beginning learners made 232 errors when they intended to produce Half-T3 at nonfinal positions. Of these, 170 (about 73%) were
mistakenly produced as Full-T3. Intermediate learners made 72 errors for the target Half-T3, 50 of which (about 69%) were produced as Full-T3. Advanced learners only made four errors with Half-T3 targets. Two of these were produced as Full-T3, and the other two were produced as T4. Figure 6.8 displays a typical pitch contour for a beginner’s production of (a) qǐng nǐ kàn and (b) qǐng tā kàn.
Figure 6.8 Pitch track of two trisyllabic prosodic words with the same T3 morpheme at the beginning: (a) qǐng nǐ kàn and (b) qǐng tā kàn.
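The proportions reported above for Experiment 2 follow directly from the error counts. The sketch below simply recomputes them. The counts are those given in the text, while the dictionary layout and the function name `full_t3_share` are hypothetical.

```python
# Errors on target Half-T3 at nonfinal positions in Experiment 2,
# with the number of those errors that surfaced as Full-T3.
half_t3_errors = {
    "beginner": {"total": 232, "as_full_t3": 170},
    "intermediate": {"total": 72, "as_full_t3": 50},
    "advanced": {"total": 4, "as_full_t3": 2},
}


def full_t3_share(level):
    """Fraction of Half-T3 errors at this level realized as Full-T3."""
    counts = half_t3_errors[level]
    return counts["as_full_t3"] / counts["total"]


for level in half_t3_errors:
    print(f"{level:13s} {full_t3_share(level):.0%} of Half-T3 errors were Full-T3")
```

The advanced-level share rests on only four errors, so it is best read as an absence of errors and not as a stable proportion.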
For cases in which Full-T3 was overused, the first syllables qǐng were always produced as a low-dipping tone (Full-T3, circled in Figure 6.8), regardless of whether stimuli were presented in the order (a) (b) or (b) (a). Lower-level learners overproduced Full-T3 more often than higher-level learners. This tendency is also confirmed in the next section.

6.4.3 Production of Utterance-Final T3

This section analyzes the production of T3 at sentence-final positions in both experiments. As discussed earlier, T3 usually surfaces as Half-T3 in nonfinal positions, and can optionally be produced as either Half-T3 or Full-T3 by native Chinese speakers when in isolation and when in utterance-final positions. In surveys of native Mandarin Chinese speakers, Half-T3, like Full-T3, is found to occur frequently in isolation and at the end of an utterance (Duanmu 2000; Shi and Li 1997). For example, in his experiment with native Mandarin Chinese speakers, Duanmu (2000) finds that the majority of utterance-final T3s produced by native Chinese speakers surface as Half-T3 rather than
Full-T3. That is, native Mandarin Chinese speakers use Half-T3 at utterance-final positions very frequently. Since both Full-T3 and Half-T3 are acceptable on utterance-final syllables, I paid particular attention to the proportions of Full-T3 and Half-T3 used in this location by various types of L2 learners in both experiments. In Experiment 1, the productions of the morpheme hǎo located at the end of the sentence-final test phrase hěn hǎo were examined. Results varied among the three groups of speakers. Figure 6.9 displays the actual tonal productions of the sentence-final T3 morpheme hǎo. The dark portions indicate the percent of times Full-T3 was uttered, while light portions indicate the percent for Half-T3.
Figure 6.9 Sentence-final T3 hǎo uttered by English, Japanese, and Korean speakers.
It is clear that the English speakers often produced the sentence-final T3 morpheme as Full-T3. For the Japanese speakers, only 40% of sentence-final T3s were produced as Full-T3, which was the least-frequent use of sentence-final Full-T3 among the three groups. Korean speakers produced Full-T3 in sentence-final positions about 45% of the time. Since both Half-T3 and Full-T3 are acceptable at the ends of utterances, the accuracy rates of the three groups are 98% for English speakers and 100% for Japanese and Korean speakers. Two conclusions can be drawn from the behavior of the sentence-final morpheme hǎo uttered by this diverse group of learners. First, the behavior of Full-T3 at the ends of sentences mirrors the behavior of Full-T3 in disyllabic words as reported in Experiment 1. In disyllabic words, English speakers mostly produced Full-T3 as a substitute for other tones. Japanese and Korean speakers produced Full-T3 much less often than the English speakers. Second, the hierarchy also bears a relation to the general tonal proficiency of these three groups of learners. English speakers had the highest overall error rate, while Japanese speakers had the lowest tonal error rate in Experiment 1. It seems that the higher the general tonal proficiency, the less often Full-T3 was
produced, both within sentences and at sentence-final positions. The English-speaking group produced Full-T3 the most often, either as a substitute tone inside sentences or at word boundaries. This tendency was also confirmed in Experiment 2. Half-T3 and Full-T3 productions in sentence-final positions in Experiment 2 (Word Set 4) were made by learners at different proficiency levels. In the pre-test, almost all monosyllabic T3s were produced as Full-T3. However, when T3 occurred at the end of an utterance, results varied across learners’ proficiency levels. Figure 6.10 displays the results of the twelve test T3 syllables located at utterance-final positions.
Figure 6.10 The proportions of Half-T3 (H3) and Full-T3 (F3) at utterance-final positions.
Figure 6.10 shows that learners at higher proficiency levels used Half-T3 more often, and Full-T3 less often, than beginning learners. The ratios of Half-T3 to Full-T3 across the three levels were 0.3 at the beginner level, 0.82 at the intermediate level, and 1.3 at the advanced level. Within Experiment 2, it was also found that the behavior of Full-T3 at the ends of sentences across the three levels mirrored the behavior of Full-T3 in prosodic words.

6.5 Discussion

6.5.1 The Overproduction of Full-T3

In both experiments, two main observations were made regarding L2 learners’ production of T3. First, Half-T3 error and substitution patterns seem to conflict
with one another. Half-T3 was used as a substitute for other tones more often than any other tone. However, the error rate of target Half-T3 was surprisingly high, and target Half-T3 was frequently replaced with Full-T3 at both word-initial and word-final positions. Second, in general, L2 learners applied the Pre-T3 Sandhi Rule better than the Half-T3 Rule, which is the reverse of what native Mandarin speakers do. I propose that the overuse of Full-T3 in L2 tone phonology may be the cause of the contradictory phenomena identified in L2 tonal productions of T3. Half-T3 is often used as a substitute for other contour tones, suggesting that it is phonetically easy for L2 learners (as predicted by the TMS). It is surprising that learners overproduce Full-T3, which is phonetically difficult to produce, when intending to produce the relatively simple Half-T3 in connected speech. It is difficult to ascribe this overproduction to native language transfer, because there is no precedent for lexical tones in English. Furthermore, a low-dipping tone is not a regular intonation form in English (Pierrehumbert 1980). Universal phonological constraints, such as the Tonal Markedness Scale, cannot explain the overuse of Full-T3, as Full-T3 is an unlikely candidate to be used as a substitute tone over less marked tones. Chinese phonology also cannot explain this overuse of Full-T3, since native Chinese speakers produce more Half-T3 than Full-T3. Based on the results of both the perception and production components, I believe that the overproduction of Full-T3 is caused by the mainstream ‘Full-T3 First’ teaching method, which is derived from the assumption that Full-T3 is the base form of T3. The prevalent treatment of Full-T3 as the normal form in classrooms has led S. Chen (1973) to observe: “As a result, a frequently found mistake is that students use the low dip full third tone in all environments” (S. Chen 1973: 146). S.
Chen’s observation is now supported by the quantitative studies presented in this chapter. As this study has shown, participants are better at perceiving Full-T3 than Half-T3. In particular, the ‘Full-T3 First’ method of teaching requires applying complicated tone sandhi rules, which may add an extra and, as argued below, unnecessary burden for learners of Chinese. Since participants in this study were taught with the mainstream ‘Full-T3 First’ teaching method, it was necessary for them to memorize and apply both T3 Sandhi rules at almost every occurrence of T3, i.e. to change Full-T3 into Raised-T3 when preceding another T3, and to change all other Full-T3s into Half-T3 when preceding T1, T2, T4, or a neutral tone. As can be seen in this study, L2 learners often failed to apply the appropriate T3 Sandhi rule in connected speech, instead reverting to what they believed the ‘standard form’ to be: Full-T3. It seems that the T3 Sandhi rules required for a ‘Full-T3 First’ method of teaching have not been completely internalized by beginning and intermediate learners of Chinese, and when
confronted with T3, learners often simply use the first form they have been taught, Full-T3. However, advanced learners seemed to have overcome the interference of the predominant status of Full-T3. On the other hand, this overuse of Full-T3 did not seem to negatively affect learners’ productions of Raised-T3. If anything, it may have increased its accuracy rate. This may have simply been due to the acoustic similarity between Raised-T3 (i.e. T2) and Full-T3. According to Wen and Yan (2015), native Chinese speakers tend to perceive L2 learners’ Full-T3 as T2 (rising tone). Since both Raised-T3 and Full-T3 end with a rising pitch, it is very likely the case that the overused Full-T3 was misperceived as a rising tone and sometimes judged to be a correct production for target Raised-T3, leading to inflated accuracy rates for Raised-T3. Elliot (1991) had similar findings. Some studies, such as C. Yang (2011), argue that the ‘good’ performance of Raised-T3 shows that L2 learners apply the Pre-T3 Sandhi Rule very well. However, according to the results presented in this study, it is very likely that learners have not completely internalized the T3 Sandhi rules but have simply produced low-dipping tones for T3. The ‘Full-T3 First’ method of teaching is rooted in the theory that Full-T3 is the base form of T3. This issue is especially important for the teaching and learning of Chinese as a second language, since it has practical consequences. For example, if a learner substitutes Full-T3 for Half-T3 on a regular basis, Full-T3 is easily misperceived as T2 or Raised-T3 by untrained ears, which may lead to changed meanings and therefore misunderstandings. For example, the word pairs displayed in Item (7) differ only in the initial tones of either Half-T3 or T2, yet have completely different meanings:

(7) Example minimal pairs differing in Half-T3 and T2 (Raised-T3)
a. lǎo gōng ‘husband’ vs. láo gōng ‘laborer’
b. mǎ qiú ‘polo’ vs. má qiú ‘fried sesame ball’
c. shǐ shàng ‘in history’ vs. shí shàng ‘fashion’

6.5.2 Implications

The results of the present study show that the ‘Full-T3 First’ method may not effectively facilitate the acquisition of T3 since it seems to cause learners to overuse Full-T3 in connected speech. In this section, we look at the issue of the base form of T3 first, then revisit the idea of the ‘Half-T3 First’ teaching method, motivated by an alternative view of the base form.
6.5.2.1 Theoretical Implications: The Underlying Form of T3

In the field of Chinese linguistics, the question of whether T3’s underlying form is Half-T3 or Full-T3 is controversial. The core controversy revolves around whether Half-T3 should be treated as primary or secondary to Full-T3. Many researchers have proposed that the underlying form of the third lexical tone in Mandarin Chinese should in fact be Half-T3 rather than Full-T3 (Hockett 1947; Mei 1977; Todo 1980; Tsung 1987; Yip 1980, 2002; C-Y. Chen 2005, among others). As discussed in Mei (1977:253), the main reason for assuming that Full-T3 is the underlying form is that “in the synchronic analysis of Modern Pekingese, it is customary to consider a tone in isolation as its basic value.”2 As a result, Half-T3 is taken as secondary. This study supports the proposal that Half-T3 is the base form, not only because of its much wider environmental distribution, but also because this account better explains (1) the L1 acquisition of T3, (2) the historical development of T3 and tone sandhi rules, and (3) spoken Mandarin used inside and outside of China. Studies of first language acquisition of tones favor an account that analyzes Half-T3 as the underlying form of T3. Evidence that Half-T3 is acquired earlier by children than Full-T3 is provided in Chao (1951) and Li and Thompson (1977), and is summarized in Tsung (1987). The current study adds to this argument using data from second language acquisition. Half-T3 is also an attractive option in accounting for the reconstruction of sixteenth-century Pekingese tones, an account which conforms to early records of the Mandarin sandhi rules. Two Korean textbooks for learning Chinese and a statement in a late Ming treatise on versification in the Qu style show that only the Pre-T3 Sandhi Rule existed in old Pekingese Mandarin (Mei 1977).
In the ‘General Principles’ describing the tonal values and the tone sandhi rules of fourteenth-century Pekingese in Laoqida yanjie 老乞大諺解 and Putongshi yanjie3 朴通事諺解, only the Pre-T3 Sandhi Rule is mentioned. According to Mei (1977), the shǎng shēng4 in sixteenth-century Mandarin should be reconstructed as a low-level tone and it later split into two forms, a low-rising and a low-level tone, between the sixteenth century and the present. Looking back to Zhang and Lai (2010), the claim that Half-T3 is the underlying form of T3 is compatible with the finding that native Mandarin speakers apply the Half-T3 Rule more consistently than the Pre-T3 Sandhi Rule. If Half-T3, rather than Full-T3, is assumed to be the underlying form, native Mandarin speakers only need to process the Pre-T3 Sandhi Rule, and no extra processing is required to derive Half-T3, giving the appearance that Mandarin speakers apply the nonexistent Half-T3 Rule more consistently than the Pre-T3 Sandhi Rule. The claim that Half-T3 is the underlying form also accounts for the more frequent use of Half-T3 than Full-T3 on utterance-final syllables by native Mandarin speakers, as shown in Duanmu (2000) and the survey reported in Shi and Li (1997). In addition, it is widely acknowledged that in Taiwan Mandarin Half-T3 has primary status and appears in utterance-final positions (Tai 1978). The situation is similar for the Mandarin spoken in Singapore. According to C-Y. Chen (1983), the form of T3 when it appears before a tone other than T3 or before a pause is a low pitch with no rise.

2 Zhang and Lai (2010) argue against taking Half-T3 as the underlying form on the grounds that doing so is “technically workable for Beijing Mandarin, but difficult to defend from a typological perspective” (Zhang and Lai 2010:187).
3 The title of the book is sometimes transcribed as Piaotongshi yanjie or Parktongshi yanjie. Here Putongshi yanjie is adopted from Mei (1977).
4 The four tones of modern Standard Chinese are not the same as the four Classical tones (píng, shǎng, qù, and rù) because of numerous splits and mergers in the Classical categories (see Norman 1998 for more information).

6.5.2.2 The ‘Half-T3 First’ Method

The alternative ‘Half-T3 First’ teaching method introduces Half-T3 to L2 learners first as the base form. Learners then subsequently study the Raised-T3 form derived from Pre-T3 Sandhi. The learning of Full-T3 then becomes an optional task. There is no need to emphasize the Full-T3 form because the ‘bounce’ effect for the final rising pitch is a natural result of releasing the laryngeal muscles, which have contracted for low pitches (i.e. the base tone Half-T3), to a relaxed state (see information about the bounce effect in Hyman 2007; Chen and Xu 2006). While the ‘Full-T3 First’ method is dominant in the field of Chinese as a second language, the merits of ‘Half-T3 First’ should not be ignored.
In addition to facilitating the establishment of Half-T3 as the base tone in a learner’s developing tone phonology, there are two other benefits to adopting this method. First, it avoids the confusion that students often experience between T2 and Full-T3 at the beginning stages of learning Chinese. Second, this method simplifies the sandhi process. Under the ‘Full-T3 First’ method, learners are required to memorize and apply two rules: the Pre-T3 Sandhi Rule and the Half-T3 Rule. According to the results presented in this chapter, poor T3 performance is primarily caused by the high Half-T3 error rate, which results from the extra processing required to apply the Half-T3 Rule. This sandhi rule, which is applied more frequently than the Pre-T3 Sandhi Rule, would cease to exist if we instead assumed that Half-T3 is the base form. Figure 6.11 displays the tone sandhi rules required if Full-T3 is assumed to be the base form, vs. what would apply if Half-T3 is assumed.
Figure 6.11 Tone sandhi rules based on different theoretical assumptions.
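The difference between the two assumptions can be illustrated with a toy derivation. The Python sketch below is a deliberately simplified illustration and not a model proposed in this study: it looks only at the immediately following tone, ignores prosodic structure in longer T3 strings, and does not model the optional utterance-final Full-T3 variant under a Half-T3 base. The function name `derive` and the tone labels are hypothetical.

```python
def derive(tones, base):
    """Derive surface tones for a sequence of lexical tones.

    tones: list such as ["T3", "T1", "T4"] (lexical categories).
    base:  "Full-T3" or "Half-T3", the assumed base form of T3.
    Returns (surface_tones, number_of_sandhi_rule_applications).

    Simplifying assumption: each T3 is adjusted according to the tone
    that immediately follows it, nothing else.
    """
    surface = []
    rules_applied = 0
    for i, tone in enumerate(tones):
        if tone != "T3":
            surface.append(tone)
            continue
        following = tones[i + 1] if i + 1 < len(tones) else None
        if following == "T3":
            # Pre-T3 Sandhi Rule: needed under either assumption.
            surface.append("Raised-T3")
            rules_applied += 1
        elif following is not None and base == "Full-T3":
            # Half-T3 Rule: only needed when Full-T3 is the base form.
            surface.append("Half-T3")
            rules_applied += 1
        else:
            # No rule applies, so the base form surfaces unchanged.
            surface.append(base)
    return surface, rules_applied


for phrase in (["T3", "T3", "T4"], ["T3", "T1", "T4"]):  # qǐng nǐ kàn, qǐng tā kàn
    for base in ("Full-T3", "Half-T3"):
        print(phrase, base, derive(phrase, base))
```

In this sketch both assumptions yield the same nonfinal surface forms for qǐng nǐ kàn and qǐng tā kàn, but the Half-T3 base reaches them with fewer rule applications, which is the simplification at issue here.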
As shown in Figure 6.11, both the Pre-T3 Sandhi Rule and the Half-T3 Rule are required if Full-T3 is considered the base form. However, by adopting Half-T3 as the base form, only the Pre-T3 Sandhi Rule would be required, thereby substantially simplifying the sandhi process. The computational burden on the learner would be lessened, which would potentially lead to better L2 performance on T3 productions. In the 1980s and 1990s, researchers in the field of teaching Chinese as a second language discussed how best to teach Chinese tones. Although less mainstream, the ‘Half-T3 First’ teaching method has been proposed in various studies, especially in the late 1980s (Yue-Hashimoto 1986; Tsung 1987; Zhao 1988; Y-J. Wang 1995). Most discussions of the ‘Half-T3 First’ method advocate teaching Half-T3 before Full-T3 at the beginning level to avoid the confusion caused by Full-T3’s similarity to T2, but still take Full-T3 as the base form. Despite the controversy, most researchers reject taking Half-T3 as the base form and the ‘Half-T3 First’ method has not been adopted by the majority of educators. For example, Sun (1998) believes that the evidence and reasoning used to argue that Half-T3 should be the main allophone of T3 taught to learners is a product of an ‘unnatural bifurcation’ of the two allophones, which is believed to have developed from structuralist analysis. Shi and Li (1997) argue against the ‘Half-T3 First’ teaching method, claiming that it is unwise to teach Half-T3 first because Full-T3, as the tone’s foundation, is the base form for all allophones and sandhi rules. In addition, they claim that it will be difficult for learners to study other forms of T3 if they cannot first pronounce Full-T3, the base form of T3, well.5
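5 The original statement is “如果我们对语言做一些实际考察之后，就会发现主张只教半三声或先教半三声的想法未免有偏颇之处。我们认为：第三声的本调（我们称之为全三声）是变化的基础，不掌握好本调就很难掌握好它的变调。” (Shi and Li 1997:126). In English: “After some practical investigation of the language, one finds that the proposal to teach only the half third tone, or to teach the half third tone first, is somewhat one-sided. In our view, the citation tone of the third tone (which we call the full third tone) is the basis of its variants; without a good command of the citation tone, it is very hard to master its sandhi forms.”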
Empirical studies comparing the ‘Full-T3 First’ and ‘Half-T3 First’ teaching methods are needed to help determine which of the two methods is actually more effective. However, no matter which teaching method is adopted, it is important to impress upon learners the important status of Half-T3, as it is the most frequently seen variant of T3 in connected speech. Diversified methods in teaching T3 and its related tone sandhi rule(s) should be advocated to provide a basis for further pedagogical research. Under the ‘Half-T3 First’ method, Half-T3 may be confused with the falling tone T4 at the beginning stage of learning. Quantitative studies are needed to determine if it is easier for learners to distinguish Half-T3 from T4 under the ‘Half-T3 First’ method than to distinguish Full-T3 from T2 in the ‘Full-T3 First’ method. Further pedagogical suggestions are provided in Chapter 7.

6.6 Conclusion

In this cross-linguistic study of the L2 acquisition of Mandarin Chinese tones, I have examined the non-native tonal productions of three allophones of T3: Full-T3, Half-T3, and Raised-T3. A series of experiments was conducted, with Experiment 1 involving English-, Japanese-, and Korean-speaking learners at the intermediate proficiency level, and Experiment 2 conducted with English-speaking learners at varying proficiency levels. A perception test indicated that learners categorized Full-T3 as T3 with ease, but often mistook Half-T3 for T4 or the neutral tone. In both production experiments, Half-T3 was a frequently used substitute tone when errors were made on other tones, and yet Half-T3 (as a target tone) itself exhibited a very high error rate. This held true for all groups of learners. This study argues that the apparent conflicts between error patterns and substitution patterns can be attributed to the overuse of Full-T3, which in turn is due to the widely used ‘Full-T3 First’ teaching method.
The overproduction of Full-T3 may also cause other conflicting error patterns, such as the higher error rate of T2 relative to Raised-T3, which has the same phonetic contour. I argue that in many cases, learners have not internalized the two T3 rules but simply use Full-T3 when T3 is required. At the utterance-final position, learners with higher proficiency use more Half-T3s but fewer Full-T3s than learners with lower proficiency. It seems that the T3 issues discussed in this chapter also extend beyond the scope of L1 transfer and beyond the intrinsic structures of Mandarin tones. Rather, I believe that the overproduction of Full-T3 is rooted in the disputed view that Full-T3 is T3’s base form. The dominant assumptions made within theoretical linguistics regarding the base or underlying form of T3 have shaped current mainstream teaching practices, and have led to the seemingly contradictory phenomena discussed above. Although the nature of the relationship between linguistic theory and foreign-language teaching has been debated since the late nineteenth century, it is widely acknowledged that an understanding of the linguistic structures of natural languages is of fundamental value for L2 teaching and learning (Lado 1968; Levenson 1979). In the case of the L2 acquisition of the Chinese third tone, the assumption that the underlying or base form of the third tone is a low-dipping tone has had a major effect on pedagogical practices. Currently, there is a disconnect between the way T3 is taught and linguistic research on T3’s nature. Further pedagogical research directly comparing the effects of different pedagogical methods is necessary to determine what the most efficient method for teaching T3 might be, so that the acquisition of this difficult tone may be improved.
Teaching Mandarin Chinese Tones

In addition to presenting research on interlanguage tonal grammars, this book also attempts to shed light on tone pedagogy to better facilitate the acquisition of Mandarin Chinese by adult learners. Previous studies on tone acquisition have discussed various concrete strategies for improving both tone production and perception, such as Bluhme and Burr (1971), Zhao (1988), Y.J. Wang (1995), Zhao and Cheng (1997), Wang et al. (1999), So (2006), Chen and Massaro (2008), and C. Yang (2011). In this chapter, I do not intend to provide detailed classroom strategies, but rather aim to:
1. Discuss the pedagogical implications of the results presented in previous chapters (in §7.1.1) and provide tips on establishing mental representations of tones and training learners’ motor skills (§7.1.2).
2. Summarize the design of the commonly used materials in tone training and make suggestions to textbook writers (§7.2).
3. Provide sample material for use in the classroom (§7.3).

7.1
This section first discusses the pedagogical implications of findings from the preceding chapters, then makes some general suggestions to beginners learning Chinese tones.

7.1.1 Pedagogical Implications of Chapters 4–6

Although every element in Chinese prosody is important when studying the language, extra attention should be placed on aspects that are especially difficult for adult learners. Research presented in this book finds that some L2 tone errors do not derive from interference from learners’ native languages, nor can they be deduced from the tone grammar of Mandarin Chinese. Instead, some error patterns can be attributed to other factors such as phonetic mechanisms (anticipatory dissimilation), universal phonological constraints (the Tonal Markedness Scale (TMS) and the Obligatory Contour Principle (OCP)), and pedagogical practices currently in use. Therefore, in addition to interference from L1 transfer, learners must also overcome difficulties caused by factors such as articulatory limitations on speech production, possibly misleading
© koninklijke brill nv, leiden, 2018 | doi 10.1163/9789004364790_008
orthography, teaching methods, and so on. Here I first identify likely areas of difficulty for L2 learners based on findings outlined in Chapters 4 and 5, then discuss T3 acquisition based on findings from Chapter 6. Mandarin Chinese contour tones are not only more difficult to produce than level tones for first-language learners, but they are especially difficult for L2 learners to acquire. In general, T2 in final positions within prosodic units (for example, word-final) and T4 in initial positions within prosodic units (for example, word-initial) seem to be especially difficult for speakers coming from nontonal language backgrounds (see H. Zhang 2015 for sentence-level positional effects in L2 tones). Additionally, when T2 and T4 are followed by tones with H onsets (T1 and T4), the accuracy of these tones decreases, likely due to anticipatory dissimilation. Therefore, I suggest that intensive training and focus be placed on T2 at word-final positions, T4 at word-initial positions, and also T2 and T4 when they precede T1 or T4 (i.e. T2-T1, T2-T4, T4-T1, and T4-T4 sequences). The research presented in Chapter 5 also finds that, in addition to the tone sequences mentioned above, identical tone combinations are especially difficult for L2 learners. There is a large number of identical tone combinations (such as T2-T2, T4-T4) in the numerous disyllabic words found in modern Chinese. Producing two identical contours in a row is a completely new task for speakers from nontonal language backgrounds. Intensive training in this type of tonal sequence is necessary for learners to learn quickly how to switch between pitch targets in connected speech. Based on findings in Chapter 6, this section highlights several points applicable to T3, one of the most difficult tones for L2 learners to acquire. As discussed in Chapter 6, it is yet to be determined whether teaching Full-T3 first or Half-T3 first most benefits L2 learners.
However, supporters of both of these accounts agree that a major problem in T3 acquisition is Half-T3. This section offers two suggestions for teaching T3. First, in classrooms, special attention should be paid to T3 register features over contour features. Results from Chapter 6 suggest that learners, especially those with low proficiency, are largely distracted by the falling-rising contour of Full-T3. However, theoretical phonology represents tone as a series of tiers, with register dominating contour features (see Yip 2002 for discussions on feature geometry). I believe that a greater focus on tone height rather than on pitch contour will better aid learners in pronouncing T3. Asking learners to produce T3 as low as possible (rather than distracting them with the falling-rising contour) will help them differentiate between T3 and other tones both perceptually and in production, ideally making the tones more contrastive with one another. Second, T3 should be practiced within tone strings rather than in isolation, so that learners can practice producing Half-T3, the most
widely used allophone of T3. In addition, producing tones in context is an effective way to acquire the intonation system. Let us now turn to general suggestions on how to learn Chinese tones and Chinese sounds as a whole.
7.1.2 From the Establishment of the Mental Representation of Tones to Motor Skills
As discussed in Chapter 1, Chinese tones do not function individually. Rather, the Chinese tone system is a dynamic and comprehensive whole, requiring learners to form a mental faculty that attends to and processes meaningful distinctions between different tones. Acquiring the Chinese tonal system thus requires one both to mimic the acoustic-phonetic features of Chinese tones and to learn the phonology of Chinese sounds. Generally speaking, it is not difficult for L2 learners of Chinese to differentiate Chinese tones in a contrastive environment and to imitate tones during instruction. However, learners, especially beginning learners, usually lack the stable mental (phonological) representation of the tone system that allows them to accurately reproduce the same sound patterns later. It is consequently extremely important for learners to establish the phonological representation of each tone and its relation to other tones. An efficient way to learn new words, phrases, and sentences is through listening and speaking. Establishing a stable phonological representation of sounds does not entail reading and memorizing pinyin spellings and tone markings of new vocabulary words in textbooks. However, the common practice in many language curricula and language classes is for Chinese language teachers and learners to rely heavily on pinyin at the initial stage of learning and teaching Chinese sounds. Students usually prepare new vocabulary by reading vocabulary lists out loud, and possibly practice writing out new vocabulary in both pinyin and Chinese characters in preparation for quizzes. 
Many learners even mistakenly assume that the Romanization system is a direct representation of the sounds of the Chinese language. Pinyin was developed in the 1950s based on earlier attempts to romanize Chinese. It was published by the Chinese government in 1958 as the official romanization system for Standard Chinese and used as the primary method of Chinese sound instruction in mainland China. The International Organization for Standardization (ISO) adopted pinyin as the standard romanization for modern Chinese in 1982, and the United Nations followed suit in 1986. It has also been accepted by the governments of Singapore and Taiwan, the United States Library of Congress, the American Library Association, and many international institutions. Pinyin has also become the dominant method for
entering Chinese text into computers. Despite its widespread use, second language researchers have found that the pinyin spelling system at times impedes L2 learners’ progress in learning the Chinese sound system. For example, C-Y. Chen (2005) lists five points of difficulty for L2 learners which can be related directly back to the pinyin system. A single letter in the pinyin system is often used to represent multiple allophones, or even multiple phonemes, all of which are phonetically different from one another. For example, the three variants of T3 have very different pitch contours, but all bear the same tone mark. The pinyin letter ‘a’ is pronounced as [ä] in isolation, as [ɑ] in syllables such as dang, as [ɛ] in syllables such as tian or xuan, and as [a] in dan and sai. The pinyin letter ‘o’ is pronounced as [ǝ] in syllables like shou and you, as [u] in syllables such as song and xiong, as [ʊ] in mao and jiao, but as [o] in the syllables o or guo. A single letter used to represent multiple phones is misleading to L2 learners. Learning new words by listening directly to native speakers or recordings will keep the interference from the pinyin system to a minimum and establish the mental representation of tones in a more efficient way (see more discussions in C-Y. Chen 2005; Wu 2011). Chao (1948: 78) even argues that romanization is of no use for the initial learning of the sound system. When using the pinyin system, teachers should clearly convey to students that any signs, labels, and markings for sounds on paper are inherently limited in describing the linguistic system they are attempting to reflect. Only after first learning how to reproduce sounds from a native speaker can a student begin to profit from the use of any form of transcription (Chao 1948). I believe that the first several weeks is the best time to help learners form the mental representation of Chinese sounds. 
Learning new words and phrases by listening to and imitating the sounds produced by either native speakers or audio recordings of native speakers is more effective than focusing on the pinyin transcription. After sufficient listening to native speakers or audio recordings, a critical step is to connect the mental representations of tones with the articulators to realize the sounds phonetically. This process is known as having ‘active knowledge’ of a new language (Chao 1948). Learners should repeatedly practice listening and producing tone sequences since good pronunciation is a matter of motor skills coupled with ear training (Hockett 1951: xi). The Echo Method (Chao 1948; Chung 2013), in addition to various tone training methods (see Chan 1995; Zhao and Cheng 1997; So 2006; Chen and Massaro 2008; C. Yang 2011), may be helpful in practicing these motor skills. After Chao (1948) proposed an early version of the method, it was revised by Chung (2013) to train English-language learners in Taiwan. Chung (2013) lays out several basic steps: (1) listen to a good model speaking words and phrases followed by short sentences, (2) repeat or
‘echo’ the utterances silently to oneself, and then (3) say the utterance out loud. The first two steps will help develop an internal model that learners can use as a reference for their pronunciation. This method has been shown to fit nicely with current models of working memory in psychology (Contreras 2013). In addition to the use of a variety of concrete training methods, it is strongly encouraged that the training materials themselves also be varied. The following two sections turn to discussing the design of training materials.
7.2 Current Prevailing Teaching Materials
The mastery of Chinese sounds, including the lexical tone system, is the bedrock of oral proficiency. Precise and accurate pronunciation by L2 learners of Chinese has a widespread effect on the subsequent acquisition of features such as vocabulary, grammar, and idiom. Chao (1948) argues that the study of sounds will influence later study of Chinese “not by saving a few hours here and a few days there, but by multiplying the efficiency by integral factors” (Chao 1948: 73). Teaching materials for the first stage of Chinese sound learning are consequently of extreme importance. This section reviews the teaching materials for initial tone study in about twenty popular Chinese-language textbooks or sound-training books published since 1948 in places such as China, the United States, Japan, and South Korea. I include the materials incorporated in the latest editions of comprehensive Chinese language textbooks or monographs, and will focus on beginning proficiency levels only. However, both the quality and the quantity of tone-teaching materials and exercises vary greatly across the books commonly used to teach Chinese as a foreign language. Table 7.1 lists basic information on these beginning-level books, including titles, first authors, publication information (date, name of publisher), and the language used for explanations in the textbook. The table is ordered by the first edition’s year of publication. The majority of the textbooks are for beginning learners of Chinese, though some are for Chinese heritage learners (for example, Chou et al. 1997 and He et al. 2008). Most of the exercises in these books are primarily devoted to learning segments (vowels and consonants) rather than tones. 
For materials on tone training, I will focus my review on the following three components: (1) introduction to the inventory of Chinese tones, including background knowledge of sandhi rules; (2) suggestions for methods of study, including the clarification of learning objectives, the sequences of items for sound training, the frequency and intensity of practice, and criteria to be met for non-native sound evaluation; and (3) exercises aimed at tone training.
Table 7.1 Textbooks and monographs under review

No. | Title | First author | Year of publication, place, and name of publisher | Language

Chinese Language Textbooks
1 | Mandarin Primer | Chao, Yuen-ren | 1948, Harvard University Press, MA, USA | English
2 | College Chinese 大学汉语 | Lin, Shou-ying | 1993, Cheng and Tsui Company, MA, USA | English
3 | Communicating in Chinese | Ning, Cynthia | 1993, Far Eastern Publications, Yale University Press, CT, USA | English
4 | Integrated Chinese 中文听说读写 | Liu, Yue-hua | 1997, 2005, 2009, Cheng and Tsui Company, MA, USA | English
5 | Oh, China! 中国啊，中国！ | Chou, Chih-p’ing | 1997, Princeton University Press, NJ, USA | English
6 | Short-Term Spoken Chinese, Vol.1. 汉语口语速成-入门 | Ma, Jian-fei | 2005, Beijing Language and Culture University Press, P.R. China | Chinese, English
7 | Hanyu Kouyu Sucheng (I) | | 2005, Darakwon, South Korea |
8 | Experiencing Chinese | | 2006, 2012, Higher Education Press, P.R. China |
9 | Chinese Master (Step 1) | Park, Jung-goo | 2007, Darakwon, South Korea |
10 | Me and China | | 2008, McGraw-Hill, NY, USA |
11 | Communication | Liu, Ying | 2010, Hakusuisha (白水社), Japan |
12 | Chinese Link 中文天地 | Wu, Sue-mei | 2009, 2011, Pearson, NJ, USA |
13 | New Practical Chinese Reader (2nd ed.) | | 2010, 2011, Beijing Language and Culture University Press, P.R. China | Chinese and English
14 | The Routledge Course in Modern Mandarin Chinese | Ross, Claudia | 2010, Routledge, NY, USA |
15 | Encounters | Ning, Cynthia | 2012, Yale University Press, CT, USA | English
16 | Ni Wo Ta | Zhang, Phyllis | 2015, Cengage Learning, USA | English

Monographs on Chinese Sound Training
17 | Progressive Exercises in Chinese Pronunciation | Hockett, Charles F. | 1951, The Institute of Far Eastern Languages, Yale University, USA |
18 | Hanyu Yuyin Jiaocheng | Cao, Wen | 2002, Beijing Language and Culture University Press, P.R. China | Chinese
19 | Hanyu Zhengyin Jiaocheng | | 2005, Beijing University Press, P.R. China |
20 | Chinese Pronunciation Practice for Foreigners | | 2009, Higher Education Press, P.R. China |
The presence of the three components listed above sets some of the older books such as Chao (1948) and Hockett (1951) apart from most of the recently published (after 1990) textbooks and monographs. While the majority of recently published textbooks only briefly introduce the Chinese tone inventory, provide a minimum amount of exercises and nearly no guidance on methods of study, the older books offer not only a thorough explanation of the Chinese tone inventory and meticulously designed exercises for both general and specific tone training, but also very detailed instructions on methods of study.
7.2.1 Tone Inventory Descriptions
All textbooks describe four lexical tones in Chinese, as well as the neutral tone. Several types of descriptions of tone inventories are used, including plain descriptions of register and contour features (for example, ‘high level’ for T1), pitch values (for example, ‘55’ for T1), and diagrams of pitch contour for the
tones. Chao (1948) offers a convenient musical key to associate with each tone. However, not all of the recent textbooks provide a full range of information on tones. For example, He et al. (2008) only gives tone names and tone marks and nothing else; Liu (2010) only includes diagrams of pitch contours and nothing else; and Liu (2010, 2011) and Ross et al. (2010) give plain descriptions and diagrams but not pitch values. The descriptions of tone features are not consistent across these books, especially with regard to T3. While contour directions are usually introduced, register information is sometimes omitted. For example, T3 is described as a ‘dipping tone,’ but is not specified as ‘low’ as well in Ning (1993). In terms of pitch values and diagrams, the majority of recently published books do not include pitch values and contour diagrams for Half-T3 (Liu et al. 1997, 2005, 2009; Jiang 2006, 2012). Out of all the books listed in Table 7.1, only Hockett (1951), Cao (2012), and R. Wang (2005) assume Half-T3 to be the default form of the third tone, and Chao (1948) is the only one that describes Half-T3 in detail, even providing it with its own annotation (‘½ 3rd’). It is very important that ‘½ 3rd’ be specifically labelled, so that Half-T3 may receive special attention from both teachers and students who use the book. Among these books, pitch values may also differ. For example, Cao (2002) and R. Wang (2005) describe Half-T3 as  but other books describe Half-T3 as . How T3 behaves when it precedes a neutral tone is also described differently across textbooks. For example, most books explain that T3 should be produced as Half-T3 when followed by a neutral tone (for example, P. Zhang 2015). However, Ross et al. (2010: 28) describes the third tone as pronounced with a falling-rising contour when followed by a neutral tone. 
Since the majority of the books reviewed here assume that the low-dipping T3 (Full-T3) is the base form, both the Pre-T3 and Half-T3 Sandhi Rules are used in the explanations. However, as mentioned above, it is very common to ignore training of Half-T3 in most books. 7.2.2 Methods of Study Learning to speak a foreign language not only requires study, but also frequent and intensive training, since speaking is a productive skill. Practice is especially important for learning tones. Most textbooks nowadays only provide a minimal introduction to the Chinese tone inventory and few exercises, leaving tone training largely to the teachers. It is important to understand that a solid phonetic foundation gives the learner opportunities in the future to apply, synthesize, and develop genuine competencies in comprehending and producing the target language. A concise guideline of study methods included in teaching materials or in the Teachers’ Guide will substantially assist teachers and provide learners with practical and efficient approaches to developing
phonetic knowledge and basic speaking skills. However, except for Chao (1948) and Hockett (1951), who provide a good amount of instruction for studying sounds, and P. Zhang (2015) who instructs students on effective procedures for practicing sounds, the majority of the books listed above lack suggestions for effective ways to study the sound inventory. I recommend four points for textbook writers, drawing from some insightful statements in the three books cited above. These points are suggested to be included, for the student’s benefit, in the textbook or workbook (‘the significance of laying the foundations of phonetic work’ and ‘procedures for practicing sounds’), or for teachers in the Teachers’ Guide (such as ‘methods of assessment’). 1. The Significance of laying the foundations of phonetic work Because of the essential nature of foundational phonetic work and its pervasive effect on subsequent learning, it is necessary for students to be informed about the importance of studying and practicing the sound system early on. For example, Chao (1948: 73, 84) tells students that it is fully worthwhile to devote to foundational phonetic work the first hundred hours of study; the consequent ease and precision with which students will grasp the formation of new words will fully justify the cost in time. He assures students that conscientious work at this stage will greatly ease the subsequent process of mastering grammar, vocabulary, and idiom, while poor work may be crippling. 2. Suggestions for procedures for practicing sounds Although laying the phonetic foundation is strenuous work, it is short compared to the time it takes to learn words and constructions after this initial stage. Therefore, it is important to utilize this period of time in an efficient way, and it is worthwhile for students to devote their fullest attention to phonetic foundation work. 
Suggestions on the procedures for practicing sounds may include recommended durations for each training exercise, recommended sequences for training items, and even concrete training procedures to follow when doing homework assignments. For example, P. Zhang (2015: 6) offers four steps to internalizing Chinese phonetics: Step 1: View and Listen: While looking at the pictures or videos, listen to the sounds and words in each group you hear and concentrate on comprehension. Do not repeat yet! Step 2: Listen and Listen: Close your eyes and ‘feel’ the sounds and tones. Try to hear the differences as well as the similarities between Chinese and your native language. Just listen!
Step 3: Listen and Repeat: With your eyes still closed and your mind relaxed, listen to each sound three times, and then imitate it 2–3 times. Step 4: Repeat and Write: After repeating the sound, look at the screen to see the pinyin spelling of the sound or word, then write it down in the blank space under each word or on a piece of paper as you say the word 2–3 times. 3. Methods of assessment Clarifying learning objectives for each stage and their evaluation criteria can motivate students to reach a specific goal in sound learning. At the beginning stages of learning tones, standards of evaluation may not be as high as they are in later stages. For example, Chao (1948) explains in his textbook that at the beginning, the only necessary and sufficient rule is that any two sounds should be pronounced differently. The reason for insisting that different sounds be heard and pronounced differently is that sounds carry distinctions in word meaning, and hazy distinctions between sounds cannot deliver clear ideas. Students, of course, should also have a general idea what ‘good pronunciation’ is. For example, Hockett (1951: vii) explains that good pronunciation of Chinese need not be an exact counterpart of any one native speaker’s pronunciation. Instead, if learners establish habits well within the range of variation found among native speakers themselves, learners’ pronunciation will count as ‘good.’ As students’ learning progresses, higher standards may be set. Memorizing sounds and tones at an early stage is especially efficient for helping learners internalize the new sound system. For example, Hockett (1951: xiv) suggests that the learner be exposed to monologues or dialogues prepared by native speakers, which the learner would then memorize and produce from memory, similar to the way a violinist rehearses and performs a piece of music. 4. Suggestions for teachers Some teaching materials may also provide suggestions for teachers. 
For example, ‘translation’ teaching methods are not encouraged in the textbook Mandarin Primer. Instead, Chao (1948: 79) explains the significance of exposure, and reminds teachers that even when the student takes up the study of an untranslated text, translation should be used only as an aid to, and test for, the understanding of the text. It should not take too much time out of the classroom. Instructions for teachers are particularly important for laying foundational phonetic work, especially during the first several weeks. Further
instructions for teachers in tone training and sample teaching materials are offered in the next section of this chapter. 7.2.3 Exercises Aimed at Tone Training Except for Chao (1948) and several sound-training books such as Hockett (1951), Cao (2002, 2012), and R. Wang (2005), all of which contain a variety of training items, the amount and types of tone exercises contained in recently published textbooks or workbooks tend to be very limited. Table 7.2 briefly summarizes the most frequently seen tone-training items in these four books and six other recent textbooks published after 2000. The training items in the table include: general practices for single tones and tones in sequences (labelled ‘General exc.’), the neutral tone, Half-T3, Pre-T3 Sandhi, Yi and Bu Sandhi and other sandhi processes, and stress (word-level stress and sentence-level focal prominence marking). Items with question marks in the table indicate that only sample tone items are included, but no formal exercises are presented in the textbook. Note that stress training is included here, since stress is also a part of Chinese prosody and it is closely related to the phonetic realization of lexical tones. Stress training may include word-level stress and/or sentence-level focal prominence. In both Chinese and English, some syllables sound louder and are more prominent than others. In English, for example, PERmit is a noun meaning ‘a card granting permission to do something,’ while perMIT is a verb meaning ‘to allow.’ Similarly, in Chinese we can say xíng li with stress on the first syllable meaning ‘baggage,’ but also can say xíng lǐ with stress on the second syllable meaning ‘to salute.’ I refer to this type of stress as ‘word-level stress.’ At the sentence level, focused syllables and words should be stressed, with an accompanying adjustment in pitch range or other prosodic devices (see H. Zhang 2016c for details of L2 Chinese focal prominence marking). 
For example, in the sentence Wǒ hē jī tāng, bú shì yú tāng ‘I drink CHICKEN soup, not FISH soup,’ the syllables jī and yú are stressed. Although these ‘stress’ cases may occur at the phrase level and sentence level, the stressed items are usually monosyllabic or disyllabic words. While learners should expand the pitch range of the stressed words, the canonical pitch shape of the lexical tones should not be changed. In most textbooks, both listening and reading exercises are included. For listening-based training, students are mostly asked to follow directions such as ‘discriminate tones by listening’ or ‘listen to sounds and write down tones from dictation.’ There are also many speaking-based exercises available, including ‘practice distinguishing between similar single sounds,’ ‘read aloud these bitonal sequences,’ ‘read aloud these tone sequences containing the neutral
Table 7.2 Tone training items listed in textbooks or monographs
Books | General exc. | Neutral tone | Half-T3 (Sandhi) | Pre-T3 Sandhi | Other Sandhi process
Chao (1948); Hockett (1951); Cao (2012); R. Wang (2005); Liu et al. (2009); Liu (2011); P. Zhang (2015); Wu et al. (2011); Ross et al. (2010); He et al. (2008)
Yes (base form) Yes (base form) Yes (base form) No
Yes (yi, bu) Yes (yi, bu) Yes (yi, bu) Yes (yi, bu)? Yes (yi, bu)
Rhythm, Intonation Intonation
No No Yes
tones/T3 Sandhi/Yi or Bu Sandhi processes,’ and ‘recite Chinese poems/tongue twisters/essays,’ etc.
7.3
The last section of this book provides ten sets of sample tone-training exercises for beginning learners, including a note under each exercise for teachers. Each exercise, containing four to eight items, addresses at least one issue or difficulty in tone acquisition. The sample items can be expanded on or revised for speaking or listening tasks, or be customized for specific class needs. For
speaking exercises, learners are encouraged to hum the tones/tone sequences before putting segments and tones together. I suggest that the four lexical tones presented to learners for the first time include only T1 (High Level), T2 (Mid Rising), T3 (Low Level), and T4 (High Falling), but not low-dipping T3.
1. Expand pitch ranges
G-T. Chen (1974) finds that one source of error in the production of tone can be attributed to learners failing to widen their pitch ranges when speaking Chinese. This exercise is mainly used to expand students’ pitch range, especially for those who do not have any tonal language background. T1 and T3 are a pair of level tones with one at the ‘ceiling’ and the other at the ‘floor’ (Tsung 1987).¹ When trying to expand learners’ pitch range, it is “better to err in pitching tones too low than too high” (Chao 1948: 85). Note that T3 here is a low-level tone, for which students are encouraged to use the very bottom of their pitch range. When demonstrating tones to students at the beginning stage, it would be better for teachers not to use low-dipping T3, even at utterance-final position, to avoid confusion.
Hum T1 and T3 in alternation at slow, normal, and high speed.
1. T1-T3 / T1-T3 / T1-T3 / T1-T3
2. T3-T1 / T3-T1 / T3-T1 / T3-T1
Hum T2 and T4 in alternation at slow, normal, and high speed. This exercise is also used to practice fast pitch-direction shift.
3. T2-T4 / T2-T4 / T2-T4 / T2-T4
4. T4-T2 / T4-T2 / T4-T2 / T4-T2
Insert vowels or meaningful segments into the tone patterns above. For example:
5. tīng-xiě / tīng-xiě / tīng-xiě / tīng-xiě
6. lǎo-shī / lǎo-shī / lǎo-shī / lǎo-shī
7. bú-shì / bú-shì / bú-shì / bú-shì
8. bù-xíng / bù-xíng / bù-xíng / bù-xíng
¹ The pitch of the starting point of T4 is even higher than T1.

2. Experience all four Mandarin tones
The four lexical tones presented to learners for the first time include only T1 (High Level), T2 (Mid Rising), T3 (Low Level), and T4 (High Falling). The distributions of the pitch contours of these four basic tones are very neat: a pair of level tones with one at the top and the other at the bottom, and a pair of contour tones with one rising and the other falling. Rather than reciting the four tones in the same order, try to vary the order. Items may be organized into a matrix, so that students can be called upon to listen/produce any of the items randomly assigned by the teacher. The student’s ability to immediately speak and discriminate tones by listening to tones in any order can help establish and consolidate the mental representation of individual tones. The speaking activity can be conducted at slow, normal, and high speeds.
• Hum the tones line by line, in a column, or diagonally
• Listen and distinguish the tones line by line, in a column, or diagonally
• Listen/speak and distinguish the real words line by line, in a column, or diagonally
Sample Matrix:
1. yī yí yǐ yì
2. dǎ dā dá dà
3. fú fù fū fǔ
4. gè gé gě gē

3. Quickly switch pitch targets
Various studies have found that many L2 tone errors result from learners’ failure to quickly switch between pitch targets, especially when handling contour tones in connected speech (C. Yang 2011; Xu 2001). T2-T2 and T4-T4 tone sequences in disyllabic words are especially difficult tone sequences to acquire, as identified in Chapter 5 of this book.
• Hum T2 in succession
1. T2-T2 / T2-T2-T2 / T2-T2-T2-T2 / T2-T2-T2-T2-T2
• Hum T4 in succession
2. T4-T4 / T4-T4-T4 / T4-T4-T4-T4 / T4-T4-T4-T4-T4
• Listen or speak the above tone sequences with real words. For example:
3. lán-qiú / wán lán-qiú / lái wán lán-qiú / néng lái wán lán-qiú
4. diàn-shì / kàn diàn-shì / yào kàn diàn-shì / jiù yào kàn diàn-shì
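The real-word items in this exercise grow leftward from the final word, one syllable at a time. A minimal sketch of how a teacher might generate such expanding drills from any word list is given below; the function name `expanding_drill` is hypothetical and is not part of the exercise as published.

```python
def expanding_drill(words):
    """Return phrases that grow leftward from the final word.

    `words` is the full phrase split into words, e.g.
    ["néng", "lái", "wán", "lán-qiú"].
    """
    # Start with the last word only, then prepend one word at a time.
    return [" ".join(words[i:]) for i in range(len(words) - 1, -1, -1)]


if __name__ == "__main__":
    t2_phrase = ["néng", "lái", "wán", "lán-qiú"]
    print(" / ".join(expanding_drill(t2_phrase)))
```

The same helper can be applied to the T4 item by passing `["jiù", "yào", "kàn", "diàn-shì"]`.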
4. Half-T3 and Raised-T3
Students should always remember that the default form of T3 is a low-level tone, even though the tone marking is a dipping contour. Students should also be instructed that producing two T3 syllables in succession is difficult. When a T3 is followed by another T3, the first T3 is produced as a rising tone, which makes the articulation of this sequence easier. This is the only way native speakers produce two T3 syllables in a row, even though the tone marking on the first T3 does not change. This process is called ‘T3 Sandhi.’ The T3 Sandhi
process should be practiced on bitonal sequences until students become very comfortable with this motor skill, before they are asked to practice T3 Sandhi processes in longer tone sequences.
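For bitonal sequences, the choice between the two T3 variants practiced here depends only on the tone of the following syllable. The sketch below restates that choice for disyllables with full lexical tones (1 to 4); the function names are hypothetical, and longer sequences and neutral-tone contexts are deliberately left out because they involve factors not covered by this simple rule.

```python
def realize_first_t3(next_tone):
    """Surface form of a T3 syllable, given the tone of the next syllable."""
    if next_tone == 3:
        return "Raised-T3"  # T3 Sandhi: rising before another T3
    return "Half-T3"        # low tone before T1, T2, or T4


def annotate_disyllable(tones):
    """Label both syllables of a disyllable, e.g. (3, 1) for lǎo-shī."""
    first, second = tones
    if first not in (1, 2, 3, 4) or second not in (1, 2, 3, 4):
        raise ValueError("this sketch only handles lexical tones 1-4")
    labels = [f"T{first}", f"T{second}"]
    if first == 3:
        labels[0] = realize_first_t3(second)
    return labels


if __name__ == "__main__":
    for word, tones in [("lǎo-shī", (3, 1)), ("lǎo-bǎn", (3, 3))]:
        print(word, annotate_disyllable(tones))
```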
Hum and read aloud Half-T3 and Raised-T3 in common words, for example:
1. lǎo-shī ‘teacher’ / lǎo-tóu ‘old man’ / lǎo-bǎn ‘boss’ / lǎo-shì ‘always’
2. kě-xīn ‘satisfying’ / kě-néng ‘possible’ / kě-yǐ ‘may’ / kě-shì ‘but’
3. shuǐ-bēi ‘water cup’ / shuǐ-píng ‘level’ / shuǐ-guǒ ‘fruit’ / shuǐ-lì ‘waterpower’
4. xiǎo-xīn ‘be careful’ / xiǎo-shí ‘hour’ / xiǎo-jiě ‘Miss’ / xiǎo-qì ‘be mean’
To help students internalize the T3 Sandhi Rule, especially when the sandhi rule occurs across word boundaries, it is better to prepare a series of words in the same word class and ask students to combine different classes of words into phrases. For example, one could write the B words on the board, then choose any words from a stack of flashcards from the A group and ask students to combine the two. Tools such as flashcards and pictures are encouraged in this exercise.
A words | B words
5. xiǎo ‘little’ / lǎo ‘old’ | (surname) Zhāng / Wáng / Lǐ / Dèng
6. xiǎo ‘little’ / lǎo ‘old’ | (animal) yā ‘duck’ / niú ‘cow’ / gǒu ‘dog’ / xiàng ‘elephant’
7. wǒ ‘I’ / nǐ ‘you’ | (verb) chī ‘eat’ / wán ‘play’ / xiě ‘write’ / yòng ‘use’
8. wǔ ‘five’ / měi ‘every’ | (time) tiān ‘day’ / nián ‘year’ / miǎo ‘second’ / yuè ‘month’
When students are able to apply T3 Sandhi without hesitation and correctly produce sequences such as ‘(Half-T3) xiǎo-yā / (Half-T3) xiǎo-niú / (Raised-T3) xiǎo-gǒu / (Half-T3) xiǎo-xiàng,’ they can then move on to trisyllabic phrases. For example:
9. a. nǚ lǎo shī ‘female teachers’ b. nǚ xué shēng ‘female students’
10. a. hěn hǎo chī ‘very delicious’ b. hěn bù hǎo ‘very bad’
11. a. yě yǒu rén ‘also have people’ b. yě shì rén ‘also is a person’
12. a. qǐng nǐ kàn ‘ask you to look’ b. qǐng tā kàn ‘ask him to look’
13. a. wǒ xiǎng kàn ‘I’d like to look’ b. wǒ yào kàn ‘I want to look’
14. a. gěi wǒ kàn ‘show me’ b. gěi tā kàn ‘show him’
15. a. wǔ běn shū ‘five books’ b. wǔ gè rén ‘five people’
When students are able to produce Raised-T3 satisfactorily, they can proceed to longer T3 sequences. For example (cf. Hockett 1951):
16. wǒ mǎi biǎo ‘I buy a watch’ | nǐ mǎi biǎo ‘You buy a watch’
17. wǒ yě mǎi biǎo ‘I also buy a watch’ | nǐ yě mǎi biǎo ‘You also buy a watch’
18. wǒ yě děi mǎi biǎo ‘I also have to buy a watch’ | nǐ yě děi mǎi biǎo ‘You also have to buy a watch’
19. wǒ xiǎng wǒ děi mǎi biǎo ‘I think I have to buy a watch’ | nǐ xiǎng nǐ děi mǎi biǎo ‘You think you have to buy a watch’
20. wǒ xiǎng wǒ děi mǎi shǒu biǎo ‘I think I have to buy a wrist watch’ | nǐ xiǎng nǐ děi mǎi shǒu biǎo ‘You think you have to buy a wrist watch’
5. Yi ‘一’ and Bu ‘不’ Tone Sandhi
Similar to T3 Sandhi, which changes the first T3 to a rising tone, two T4 words, yi ‘one’ and bu ‘not,’ should also be changed to a rising tone when they are followed by another T4. However, students should remember that this T4 Sandhi only applies to yi and bu, and not to any other T4 syllables. The exercise for Yi and Bu Sandhi should always include other items in order to simulate the real language environment where other T4 syllables do not apply this sandhi rule. For example:
1. a. yì tiān ‘one day’     b. bù chī ‘not eat’     c. dà chī ‘to engorge’
2. a. yì nián ‘one year’    b. bù lái ‘not come’    c. dà xué ‘university’
3. a. yì qǐ ‘together’      b. bù hǎo ‘not good’    c. dà xiǎo ‘size’
4. a. yì yàng ‘be same’     b. bù cuò ‘not bad’     c. dà hào ‘large size’
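The contrast that this exercise targets, where yi and bu change before T4 while an ordinary T4 syllable such as dà does not, can be made explicit in a small sketch. The function name `surface_tone`, the toneless spelling of syllables, and the integer tone coding are assumptions introduced only for illustration; the rule itself is simply the one stated above.

```python
# The only two T4 words that undergo this sandhi, per the rule stated above.
SANDHI_WORDS = {"yi", "bu"}


def surface_tone(syllable, tone, next_tone):
    """Return the tone to pronounce on `syllable` given the following tone.

    `syllable` is a toneless spelling; tones are integers 1-4 (assumed coding).
    """
    if syllable in SANDHI_WORDS and tone == 4 and next_tone == 4:
        return 2  # rising tone before another T4
    return tone  # every other syllable keeps its tone


# Row 4 of the exercise: yì yàng, bù cuò, dà hào (all T4 + T4).
for syllable in ("yi", "bu", "da"):
    print(syllable, surface_tone(syllable, 4, 4))
```

Only the first two items in the loop come out with a rising tone, which is the point students are meant to notice when the (c) items are mixed into the drill.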
6. Other optional sandhi processes
Advanced-level students may need the opportunity to practice additional sandhi processes, such as T2 Sandhi in trisyllabic phrases and neutralization in reduplicated adjectives. Items 1–4 are four sample trisyllabic phrases in which the second syllable bearing T2 can optionally be produced as a T1 when preceded by T1 or T2 at fast conversational speed.
1. xī hóng shì ‘tomato’ → xī hōng shì
2. cōng yóu bǐng ‘scallion cake’ → cōng yōu bǐng
3. qín huái hé ‘Qinhuai river’ → qín huāi hé
4. rén mín lù ‘Renmin road’ → rén mīn lù
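The optional T2 Sandhi described for items 1–4 can be written as a check on a three-tone sequence. The sketch below is an assumption-laden illustration (the function name `optional_t2_sandhi` and the integer tone coding are invented for this example) and it returns the fast-speech variant only; since the process is optional, the unchanged sequence remains equally acceptable.

```python
def optional_t2_sandhi(tones):
    """Return the fast-speech variant of a trisyllabic tone sequence.

    `tones` is a sequence of three integers 1-4 (assumed coding). A T2 on the
    second syllable becomes T1 when the first syllable bears T1 or T2.
    """
    if len(tones) != 3:
        raise ValueError("this sketch only handles trisyllabic phrases")
    first, second, third = tones
    if second == 2 and first in (1, 2):
        return (first, 1, third)
    return (first, second, third)


print(optional_t2_sandhi((1, 2, 4)))  # xī hóng shì
print(optional_t2_sandhi((2, 2, 2)))  # qín huái hé
```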
7. Experience neutral tones
Syllables bearing the neutral tone are greatly affected by the syllable immediately preceding them. Neutral-toned syllables are short and light, and are written without tone markings. Research finds that a neutral tone is phonetically realized as a half-low tone after T1, a middle tone after T2, a half-high tone after T3, and a low tone after T4. However, in classrooms it is fine if students are not aware of the pitch values of the neutral tone. According to Chen and Xu (2006), who examined three neutral tones in context, a consistent pitch target for neutral tones is at the lower end of a speaker’s mid-pitch range, the most comfortable pitch level for the human voice. This can be thought of as the vocal folds’ ‘resting’ position, the position to which the vocal folds bounce back after very high- or low-pitched syllables. Therefore, students only need to know that neutral tones are very short and quiet, with a midlevel pitch. Learners can be instructed to simply relax their muscles while producing neutral tones.

Exercises involving the neutral tone may cover those seen most often in common words, such as kinship terms and phrases ending with particles. Since the neutral tone is sometimes confused with the low-level T3 or high-falling T4 (see Chapter 6), an exercise requiring learners to discriminate between these particular tone types may also be helpful. Tone sequences ‘T3+Neutral’ and ‘T4+Neutral’ should be included in listening tasks, such as samples (c) and (d) in (1)–(3). Neutral tones may also be located in the middle of an utterance. The unstressed forms of function words (such as classifiers, prepositions, personal pronouns, etc.) play an important role in speech rhythm (see Triskova 2016 for more information). Items (4) and (5) list some trisyllabic phrases containing neutral tones at different positions.
1. a. mā-ma ‘mom’                     b. yé-ye ‘grandpa’
   c. nǎi-nai ‘grandma’               d. bà-ba ‘dad’
2. a. chī le ‘finish eating’          b. chí le ‘it’s late’
   c. hǎo le ‘it’s done’              d. diào le ‘lost’
3. a. tā de ‘his/her’                 b. hóng de ‘red ones’
   c. wǒ de ‘my’                      d. dà de ‘big ones’
4. a. tīng de dǒng ‘can understand’   b. nán de duō ‘much more difficult’
   c. zǒu bu kāi ‘cannot get away’    d. kàn bu jiàn ‘unable to see’
5. a. chū fā ba ‘let’s go’            b. xìng wáng de ‘one whose surname is Wang’
   c. lái wǎn le ‘have come late’     d. xiàn zài ne? ‘And now?’
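For teachers who want to annotate listening materials, the context-dependent realization reported above (half-low after T1, middle after T2, half-high after T3, low after T4) amounts to a simple lookup keyed on the preceding tone. The sketch below is illustrative only: the names `NEUTRAL_PITCH_AFTER` and `neutral_pitch` and the integer tone coding are assumptions of this example, and the labels are copied from the description in the text rather than from measured values.

```python
# Pitch label of a neutral-tone syllable, keyed by the preceding full tone.
NEUTRAL_PITCH_AFTER = {
    1: "half-low",
    2: "middle",
    3: "half-high",
    4: "low",
}


def neutral_pitch(preceding_tone):
    """Return the descriptive pitch label for a neutral tone after `preceding_tone`."""
    try:
        return NEUTRAL_PITCH_AFTER[preceding_tone]
    except KeyError:
        raise ValueError("preceding_tone must be 1, 2, 3, or 4") from None


# Item 1 of the exercise: mā-ma, yé-ye, nǎi-nai, bà-ba.
for word, tone in [("mā-ma", 1), ("yé-ye", 2), ("nǎi-nai", 3), ("bà-ba", 4)]:
    print(word, neutral_pitch(tone))
```

As the text notes, students themselves do not need these values; a table like this is mainly useful to the person preparing or checking the materials.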
8. Contour tone coarticulation
Research presented in Chapter 4 indicates that T2 followed by T1 or T4 is often misproduced as a low T3. Although the effect is not as strong as it is in T2, T4 exhibits similar problems. The sample exercise materials presented below can be adapted to both listening and speaking tasks. The goal of the T2 exercise is for students to be able to distinguish and correctly produce T2 and T3 when they are immediately followed by tones with high onsets, i.e. T1 and T4. The exercise for T4 follows the same design.
• T2 tone coarticulation: Discriminate the following word pairs
1. a. lǎo-gōng ‘husband’ vs. b. láo-gōng ‘laborer’
2. a. hǎo-chē ‘good car’ vs. b. háo-chē ‘luxury car’
3. a. shǐ-shàng ‘in history’ vs. b. shí-shàng ‘fashion’
4. a. xǐ-qì ‘happy atmosphere’ vs. b. xí-qì ‘bad practice’
• T4 tone coarticulation: Discriminate the following word pairs
5. a. xiào-chē ‘school bus’ vs. b. xiǎo-chē ‘small car’
6. a. xiào-xīn ‘filial’ vs. b. xiǎo-xīn ‘be careful’
7. a. shì-shàng ‘in the world’ vs. b. shǐ-shàng ‘in history’
8. a. jiàn-miàn ‘to meet’ vs. b. jiǎn-miàn ‘alkaline noodle’
9. Word-level Stress
Exercises for word-level stress can be merged with those for neutral tones. Words that are identical in all respects except stress are also recommended for word-level stress exercises. For example:
• Read aloud the following pairs of words which differ only in their stress patterns (stressed syllables are boldfaced)
1. a. xíng-li 行李 ‘baggage’      b. xíng-lǐ 行礼 ‘to salute’
2. a. dì-dao 地道 ‘authentic’     b. dì-dào 地道 ‘tunnel’
3. a. dì-fang 地方 ‘place’        b. dì-fāng 地方 ‘local’
4. a. dà-yi 大意 ‘careless’       b. dà-yì 大意 ‘main idea’
For more advanced students, teachers might use the following word pairs (homophones) with subtle differences in stress placement. Words in column (a) all have stress on the first syllable, while those in (b) have stress on the second syllable (cf. Wang and Feng 2006).
Discriminate the word pairs by reading or listening
5. zhù-shǒu: a. 助手 ‘assistant’ b. 住手 ‘stop; hands off’
6. gōng-yǒu: a. 公有 ‘publicly-owned’ b. 工友 ‘fellow worker’
7. chá-yè: a. 茶叶 ‘tea leaf’ b. 查夜 ‘night patrol’
8. jiāo-dài: a. 交待 ‘order; confess’ b. 胶带 ‘tape’
9. fán-rén: a. 凡人 ‘ordinary person’ b. 烦人 ‘annoying’
10. shǒu-fǎ: a. 手法 ‘technique; skill’ b. 守法 ‘be law-abiding’

10. Sentence-level focus marking
For sentence-level focus marking, explicit instructions should be given to students before each exercise. In Mandarin Chinese, a more prominent syllable is louder, longer in duration, and, most importantly, has a widened pitch range (i.e. the highs are higher and the lows are lower). However, this widened pitch range does not change the essential contour directions of each tone. A T1 syllable should be pronounced higher and longer when stressed compared to one without stress. A T2 syllable should be pronounced longer and have a sharper rise when stressed. A T3 (low-level) syllable should be pronounced longer and lower in pitch (and actually quieter compared to an unstressed T3, although the total energy for the T3 syllable is greater due to its longer duration). Students should also remember the general rule about the distribution of the two T3 forms: the default T3 form (low-level) should be preserved even when the syllable is stressed, so the T3 syllable should not be produced as a low-dipping tone when followed by T1, T2, T4, or the neutral tone. T3 can be pronounced as a dipping tone only when it is in isolation or at the end of an utterance when stressed. T4 should fall more sharply when stressed.

For sentence-level focus marking, exercises can consist of sentences in which different syllables chosen by the teacher are given stress. Twelve sample tonal sequences of three syllables are presented below for the initial stage of tone-intonation training.
The stressed syllables, which bear varying tones, occupy different positions in the sequence, while all the other syllables bear T1. The pitch range of syllables outside the focus should be suppressed. The proper handling of unstressed syllables in connected speech also helps L2 learners to improve their oral performance.
1. Māo zhēn duō. ‘There are lots of CATS.’
2. Yáng zhēn duō. ‘There are lots of SHEEP.’
3. Niǎo zhēn duō. ‘There are lots of BIRDS.’
4. Xiàng zhēn duō. ‘There are lots of ELEPHANTS.’
5. Tāng zhēn suān. ‘The soup is REALLY sour.’
6. Tāng hái suān. ‘The soup is STILL sour.’
7. Tāng yě suān. ‘The soup is ALSO sour.’
8. Tāng tài suān. ‘The soup is TOO sour.’
9. Tāng zhēn suān. ‘The soup is really SOUR.’
10. Tāng zhēn tián. ‘The soup is really SWEET.’
11. Tāng zhēn kǔ. ‘The soup is really BITTER.’
12. Tāng zhēn là. ‘The soup is really PEPPERY.’
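The idea that focus widens the pitch range without changing contour direction can be illustrated numerically. The following is a rough sketch under stated assumptions, not a model of real F0 data: tones are represented as lists of points on a five-level pitch scale, the contour values chosen for T1–T4 are illustrative placeholders, and the function name `widen_pitch_range`, the expansion factor of 1.3, and the range midpoint of 3.0 are all invented for this example.

```python
def widen_pitch_range(contour, factor=1.3, center=3.0):
    """Stretch a pitch contour away from the middle of the range.

    Scaling each point linearly around `center` with factor > 1 makes highs
    higher and lows lower while keeping every rise a rise and every fall a fall.
    """
    if factor <= 0:
        raise ValueError("factor must be positive to preserve contour direction")
    return [center + (point - center) * factor for point in contour]


# Illustrative contours on a 1-5 scale (assumed values, with T3 as low-level).
unstressed = {"T1": [5, 5], "T2": [3, 5], "T3": [2, 1], "T4": [5, 1]}

for tone, contour in unstressed.items():
    print(tone, contour, "->", widen_pitch_range(contour))
```

With these placeholder values the stressed T1 comes out higher, T2 rises over a larger interval, the low-level T3 ends lower, and T4 falls over a larger interval, in line with the instructions given to students. Duration and loudness, which the text also mentions, are not represented here.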
References
Abramson, A.S. (1979). “The Coarticulation of Tones: An Acoustic Study of Thai.” In Studies in Tai and Mon-Khmer Phonetics and Phonology in Honour of Eugénie J.A. Henderson, edited by T.L. Thongkum, P. Kullavanijaya, V. Panupong, and K. Tingsabadh. Bangkok: Indigenous Languages of Thailand Research Project, 127–34. Adjemian, C. (1976). “On the Nature of Interlanguage Systems.” Language Learning. 26 (2): 297–320. Alderete, J. (1997). “Dissimilation as Local Conjunction.” In Proceedings of the North East Linguistics Society 27, edited by K. Kusumoto, 17–32. Amherst, MA: GLSA Publications. Alderete, J. (2003). “Phonological Processes: Dissimilation.” In International Encyclopedia of Linguistics, edited by W. Frawley, 323–24. 2nd ed. Oxford: Oxford University Press. Alderete, J. and S. Frisch. (2007). “Dissimilation in Grammar and the Lexicon.” In The Cambridge Handbook of Phonology, edited by Paul de Lacy, 379–98. Cambridge: Cambridge University Press. Alderete, J., P. Tupper, and S. Frisch. (2013). “Phonological Constraint Induction in a Connectionist Network: Learning OCP-Place Constraints from Data.” Language Sciences. 37: 52–69. Archibald, J. (1998). “Second Language Phonology, Phonetics, and Typology.” Studies in Second Language Acquisition. 20: 189–211. Arvaniti, A. (2011). “The Representation of Intonation.” In The Blackwell Companion to Phonology, edited by M. Oostendorp, C.J. Ewen, E.V. Hume, and K. Rice, 757–80. Oxford: Wiley-Blackwell. Bao, Z.M. (1999). The Structure of Tone. Oxford: Oxford University Press. Barlow, J. and J. Gierut. (1999). “Optimality Theory in Phonological Acquisition.” Journal of Speech, Language and Hearing Research. 42: 1482–98. Beckman, M. (1986). Stress and Non-Stress Accent. Dordrecht: Foris Publications. Beckman, M. and J. Pierrehumbert. (1986). “Intonational Structure in Japanese and English.” Phonology Yearbook. 3: 255–309. Bent, T. (2005). Perception and Production of Non-Native Prosodic Categories. 
PhD diss., Northwestern University. Bluhme, H., and R. Burr. (1971). “An Audio-Visual Display of Pitch for Teaching Chinese Tones.” Linguistics. 22: 51–57. Boersma, P. and B. Hayes. (2001). “Empirical Tests of the Gradual Learning Algorithm.” Linguistic Inquiry. 32: 45–86.
Boersma, P. and C. Levelt. (2000). “Gradual Constraint-Ranking Learning Algorithm Predicts Acquisition Order.” In Proceedings of Child Language Research Forum 30, edited by E.V. Clark, 229–37. Stanford: CSLI Publications. Boersma, P. and D. Weenink. (2011). “Praat: Doing Phonetics by Computer” [Computer Program]. Version 5.2.17. Accessed November 2011. http://www.praat.org. Bolinger, D. (1951). “Intonation: Levels versus Configurations.” Word. 7: 199–210. Broselow, E. (2004). “Unmarked Structures and Emergent Rankings in Second Language Phonology.” International Journal of Bilingualism. 8 (1): 51–65. Broselow, E., S. Chen, and C. Wang. (1998). “The Emergence of the Unmarked in Second Language Phonology.” Studies in Second Language Acquisition. 20: 261–80. Broselow, E., R. Hurtig, and C. Ringen. (1987). “The Perception of Second Language Prosody.” In Interlanguage Phonology, edited by G. Ioup and S. Weinberger, 350–62. Cambridge, MA: Newbury House Publishers. Broselow, E. and Z. Xu. (2004). “Differential Difficulty in the Acquisition of Second Language Phonology.” International Journal of English Studies. 4 (2): 135–63. Brunelle, M. (2003). “Tonal Coarticulation Effects in Northern Vietnamese.” In 15th International Congress of Phonetic Sciences (ICPhS 15), edited by Maria-Josep Solé, Daniel Recasens, and Joaquín Romero, 2673–76. Barcelona: Futurgraphic. Cao, W. (2002). Hanyu yuyin jiaocheng [A Course of Chinese Phonology]. Beijing: Beijing Language and Culture University Press. Cardoso, W. (2007). “The Variable Development of English Word-Final Stops by Brazilian Portuguese Speakers: A Stochastic Optimality Theory Account.” Language Variation and Change. 19: 219–48. Cardoso, W. (2011). “Onset‐Nucleus Sharing and the Acquisition of Second Language Codas: A Stochastic Optimality Theoretic Account.” Studia Linguistica. 65 (2): 198–231. Casas-Tost, H. and S. Rovira-Esteva. (2015). 
“Mapping Chinese Language Pedagogy from 1966 to 2013: A Bibliometric Study of the Journal of Chinese Language Teachers Association.” Journal of the Chinese Language Teachers Association. 50 (2): 31–58. Chan, M.K.M. (1995). “Students’ Tone Production and Audio-Visual Feedback.” Paper presented at the Annual Meeting of the Chinese Language Teachers Association, Anaheim, California, 18–20 November, 1995. Chao, Y.R. (1930). “A System of Tone Letters.” Le Maître Phonétique. 45: 24–27. Chao, Y.R. (1933). “Tone and Intonation in Chinese.” Bulletin of the Institute of History and Philology. 4: 121–34. Chao, Y.R. (1948). Mandarin Primer. Cambridge: Harvard University Press. Chao, Y.R. (1951). “The Cantian Idiolect: An Analysis of the Chinese Spoken by a Twenty-Eight-Months-Old Child.” In Semitic and Oriental Studies, edited by W.J. Fischel, 27–44. Berkeley: University of California Press.
Chao, Y.R. (1968). A Grammar of Spoken Chinese. Berkeley: University of California Press. Chao, Y.R. (1980). Yu Yan Wen Ti. Beijing: Shangwu Press. Chen, A. (2013). Universal Biases in the Perception of Mandarin Tones, from Infancy to Adulthood. LOT Dissertation Series, Netherlands Graduate School of Linguistics, Landelijke. Chen, C-Y. (1983). “A Fifth Tone in the Mandarin Spoken in Singapore.” Journal of Chinese Linguistics. 11 (1): 92–119. Chen, C-Y. (2005). “Proposed Modifications in Teaching Materials for Mandarin Phonology.” Journal of the Chinese Language Teachers Association. 40 (2): 67–78. Chen, G-T. (1974). “Pitch Range of English and Chinese Speakers.” Journal of Chinese Linguistics. 2: 159–71. Chen, M.Y. (2000). Tone Sandhi: Patterns Across Chinese Dialects. Cambridge: Cambridge University Press. Chen, Q-H. (1997). “Toward a Sequential Approach for Tonal Error Analysis.” Journal of the Chinese Language Teachers Association. 32 (1): 21–39. Chen, Q-H. (2000). Analysis of Mandarin Tonal Errors in Connected Speech by English-Speaking American Adult Learners. PhD diss., Brigham Young University. Chen, S. (1973). “The Third Tone and See-Saw Pairs.” Journal of the Chinese Language Teachers Association. 8 (3): 145–49. Chen, T.H. and D.W. Massaro. (2008). “Seeing Pitch: Visual Information for Lexical Tones of Mandarin-Chinese.” Journal of the Acoustical Society of America. 123 (4): 2356–66. Chen, Y. (2012). “Tonal Variation.” In The Oxford Handbook of Laboratory Phonology, edited by A.C. Cohn, C. Fougeron, and M.K. Huffman, 103–114. Oxford: Oxford University Press. Chen, Y. and Y. Xu. (2006). “Production of Weak Elements in Speech: Evidence from F0 Patterns of Neutral Tone in Standard Chinese.” Phonetica. 63 (1): 47–75. Cheng, C.C. (1973). A Synchronic Phonology of Mandarin Chinese. The Hague: Mouton de Gruyter. Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press. Chomsky, N. (1975). Reflections on Language. New York: Pantheon. Chomsky, N. 
(1981). Lectures on Government and Binding. Dordrecht: Foris Publications. Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press. Chomsky, N. (1999). “On the Nature, Use, and Acquisition of Language.” In Handbook of Child Language Acquisition, edited by T. Bhatia and W. Ritchie, 33–54. San Diego: Academic Press. Chomsky, N. and M. Halle (1968). The Sound Pattern of English. New York: Harper and Row.
Chou, C-P., P. Link, and X-D. Wang. (1997). Oh, China! Princeton: Princeton University Press. Chuang, C.K. and S. Hiki (1972). “Acoustical Features and Perceptual Cues of the Four Tones of Standard Colloquial Chinese.” The Journal of the Acoustical Society of America, 52 (1A), 146. Chung, K.S. (2013). “Karen Chung Talks about the Echo Method.” Podcast on Talking Taiwan website. Accessed July 4, 2013. http://www.talkingtaiwan.com/about/. Clumech, H. (1980). “The Acquisition of Tone.” In Child Phonology, edited by G.H. Yenikomshian, J.F. Kavanagh, and C.A. Ferguson, 257–75. Vol. 1. New York: Academic Press. Contreras, D. (2013). The Echo Method and the Teaching of the Four Mandarin Chinese Tones. Senior honors thesis, University of North Carolina at Chapel Hill. Cooper, A. and Y. Wang. (2012). “The Influence of Linguistic and Musical Experience on Cantonese Word Learning.” Journal of the Acoustical Society of America. 131 (6): 4756–69. Cooper, A. and Y. Wang. (2013). “Effects of Tone Training on Cantonese Tone-Word Learning.” Journal of the Acoustical Society of America. 134 (2): 133–39. Cooper, W.E. and J.M. Sorensen. (1981). Fundamental Frequency in Sentence Production. New York: Springer-Verlag. Corder, S.P. (1967). “The Significance of Learners’ Errors.” International Review of Applied Linguistics. 5: 161–70. Delattre, P.C. (1965). Comparing the Phonetic Features of English, French, German, and Spanish. Heidelberg: Julius Groos Verlag. Duanmu, S. (2000). The Phonology of Standard Chinese. Oxford: Oxford University Press. Eckman, F.R. (2004). “From Phonemic Differences to Constraint Rankings: Research on Second Language Phonology.” Studies in Second Language Acquisition. 26: 513–49. Elliot, C.E. (1991). “The Relationship Between the Perception and Production of Mandarin Tones: An Exploratory Study.” University of Hawai‘i Working Papers in ESL. 10 (2): 177–204. Epstein, S., S. Flynn, and G. Martohardjono. (1996). 
“Second Language Acquisition: Theoretical and Experimental Issues in Contemporary Research.” Behavioral and Brain Sciences. 19: 677–758. Flege, J.E. (1993). “Production and Perception of a Novel, Second-Language Phonetic Contrast.” Journal of the Acoustical Society of America, 1589–608. Flemming, E. (2008). “The Role of Pitch Range in Focus Marking.” Slides from a talk given at the Workshop on Information Structure and Prosody, Studiecentrum Soeterbeeck, Ravenstein, Netherlands. Gandour, J. (1983). “Tone Perception in Far Eastern Languages.” Journal of Phonetics. 11: 149–75.
Gandour, J., S. Potisuk, and S. Dechongkit. (1994). “Tonal Coarticulation in Thai.” Journal of Phonetics. 22: 477–92. Gandour, J., S. Potisuk, S. Dechongkit, and S. Ponglorpisit. (1992). “Tonal Coarticulation in Thai Disyllabic Utterances: A Preliminary Study.” Linguistics of the Tibeto-Burman Area. 15: 93–110. Gandour, J., D. Wong, L. Hsieh, B. Weinzapfel, D. Van Lancker, and G. Hutchins. (2000). “A Crosslinguistic PET Study of Tone Perception.” Journal of Cognitive Neuroscience. 12: 207–22. Garding, E., P. Kratochvil, J. Svantesson, J. Zhang. (1986). “Tone 4 and Tone 3 Discrimination in Modern Standard Chinese.” Language and Speech. 29 (3): 281–93. Goldsmith, J.A. (1976). “An Overview of Autosegmental Phonology.” Linguistic Analysis 2: 23–68. Gordon, M. (2001). “A Typology of Contour Tone Restrictions.” Studies in Language. 25: 405–44. Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Research Surveys in Linguistics. Cambridge: Cambridge University Press. Han, M.S. and K. Kim. (1974). “Phonetic Variation of Vietnamese Tones in Disyllabic Utterances.” Journal of Phonetics. 2: 223–32. Hancin-Bhatt, B. (1998). “Optimality and Stages of L2 Syllables.” Paper presented at the Second Language Research Forum, Honolulu, HI. Hancin-Bhatt, B. (2000). “Optimality in Second Language Phonology: Codas in Thai ESL.” Second Language Research. 16 (3): 201–32. Hancin-Bhatt, B. (2008). “Second Language Phonology in Optimality Theory.” In Phonology and Second Language Acquisition, edited by J. Edwards and J. Zampini, 117–46. Amsterdam: John Benjamins. Hancin-Bhatt, B. and R. Bhatt. (1997). “Optimal L2 Syllables: Interaction of Transfer and Developmental Effects.” Studies in Second Language Acquisition. 19: 331–78. Hao, Y-C. (2012). “Second Language Acquisition of Mandarin Chinese Tones by Tonal and Non-Tonal Language Speakers.” Journal of Phonetics. 40: 269–79. Haraguchi, S. (1988). 
“Pitch Accent and Intonation in Japanese,” in Autosegmental Studies on Pitch Accent, edited by H. van der Hulst and N. Smith. 123–150. Dordrecht: Foris Publications. Harrison, P.A. (1999). The Acquisition of Phonology in the First Year of Life. PhD diss, University College London. Harrison, P.A. (2000). “Acquiring the Phonology of Lexical Tone in Infancy.” Lingua. 110: 581–616. Hayes, R. (1999). “Reranking Stages in OT Analysis of the Acquisition of Japanese as a Second Language.” Carolina Working Papers in Linguistics 1. He, Q., Y-N. Wu, and P. Yang. (2008). Me and China. New York: McGraw-Hill.
Hirose, H. (1997). “Investigating the Physiology of Laryngeal Structures.” In The Handbook of Phonetic Sciences, edited by W.J. Hardcastle and J. Laver, 116–36. Oxford: Blackwell. Hirst, D. and A. Di Cristo. (1998). “A Survey of Intonation Systems.” In Intonation Systems: A Survey of Twenty Languages, edited by D. Hirst and A. Di Cristo, 1–44. Cambridge: Cambridge University Press. Hockett, C.F. (1947). “Peiping Phonology.” Journal of the American Oriental Society. 67: 253–67. Hockett, C.F. (1951). Progressive Exercises in Chinese Pronunciation. New Haven: The Institute of Far Eastern Languages, Yale University Press. Hu, Z-L. (2008). “Zuowei waiyu de hanyu jiaoxue” [Teaching Chinese as a Foreign Language]. Foreign Language Education in China. 1 (2): 3–10. Hyman, L.M. (1993). “Register Tones and Tonal Geometry.” In The Phonology of Tone, edited by H. v.d. Hulst and K. Snider, 75–108. New York: Mouton de Gruyter. Hyman, L.M. (2007). “Universals of Tone Rules: 30 Years Later.” In Tones and Tunes: Studies in Word and Sentence Prosody, edited by T. Riad and D. Gussenhoven, 1–34. Berlin: Mouton de Gruyter. Hyman, L.M. and K. VanBik. (2004). “Directional Rule Application and Output Problems in Hakha Lai Tone.” Language and Linguistics. 5 (4): 821–61. Jiang, L-P. (2012). Experiencing Chinese. 2nd ed. Beijing: Higher Education Press. Jones, D. (1972). An Outline of English Phonetics. 9th ed. Cambridge: Heffer and Sons. Jongman, A. and C. Moore. (2000). “The Role of Language Experience in Speaker and Rate Normalization Processes.” In Proceedings of the 6th International Conference on Spoken Language Processing, 62–65. Jongman, A., Y. Wang, C.B. Moore, and J.A. Sereno. (2006). “Perception and Production of Mandarin Chinese Tones.” In Handbook of Chinese Psycholinguistics, edited by P. Li, L.H. Tan, E. Bates, and O. Tzeng, 209–17. Cambridge: Cambridge University Press. Jun, S-A. (1990). 
“The Prosodic Structure of Korean – in Terms of Voicing.” In Proceedings of the Seventh International Conference on Korean Linguistics 7, edited by E-J. Baek, 87–104. Toronto: University of Toronto Press. Jun, S-A. (1993). The Phonetics and Phonology of Korean Prosody. PhD diss., The Ohio State University. Jun, S-A. (1996). The Phonetics and Phonology of Korean Prosody. New York: Garland Publishing. Jun, S-A. (2005). Prosodic Typology. Oxford: Oxford University Press. Kager, R. (1999). Optimality Theory. Cambridge: Cambridge University Press. Kiriloff, C. (1969). “On the Auditory Discrimination of Tones in Mandarin.” Phonetica. 20: 63–67.
Kubozono, H. (1999). “Mora and Syllable.” In The Handbook of Japanese Linguistics, edited by N. Tsujimura, 31–61. Oxford: Blackwell. Kubozono, H. (2011). “Japanese Pitch Accent.” In The Blackwell Companion to Phonology, edited by M. Oostendorp, C.J. Ewen, E.V. Hume, and K. Rice, 2879–907, Oxford: Wiley-Blackwell. Ladd, D.R. (2008). Intonational Phonology. 2nd ed. Cambridge: Cambridge University Press. Lado, R. (1957). Linguistics across Cultures. Ann Arbor: University of Michigan Press. Lado, R. (1968). Linguistics across Cultures – Applied Linguistics for Language Teachers. Ann Arbor: University of Michigan Press. Laniran, Y. and C. Gerfen. (1997). “High Raising, Downstep and Downdrift in Igbo.” Presentation at the 71st Annual Meeting of the Linguistic Society of America, Chicago. 2–5 Jan., 1997. Leather, J. (1987). “F0 Pattern Inference in the Perceptual Acquisition of Second Language Tone.” In Sound Patterns in Second Language Acquisition, edited by A. James and J. Leather, 59–81. Dordrecht: Foris Publications. Leather, J. (1990). “Perceptual and Productive Learning of Chinese Lexical Tone by Dutch and English Speakers.” In New Sounds 90: Proceedings of the Amsterdam Symposium on the Acquisition of Second Language Speech, edited by J. Leather and A. James, 305–41. Amsterdam: University of Amsterdam. Leben, W. (1973). Suprasegmental Phonology. PhD diss., Massachusetts Institute of Technology. Lee, Y.S., D. Vakoch, and L. Wurm. (1996). “Tone Perception in Cantonese and Mandarin: A Cross-Linguistic Comparison.” Journal of Psycholinguistic Research. 25 (5): 527–42. Levenson, E.A. (1979). “Second Language Lexical Acquisition: Issues and Problems.” Interlanguage Studies Bulletin. 4: 147–60. Li, C. and S. Thompson (1977). “The Acquisition of Tone in Mandarin-Speaking Children.” Journal of Child Language. 4: 185–99. Liberman, M. (1975). The Intonational System of English. PhD diss., Massachusetts Institute of Technology. Liberman, M. and A. Prince. (1977). 
“On Stress and Linguistic Rhythm.” Linguistic Inquiry, 8: 249–336. Lim, B-J. (2001). “The Production and Perception of Word-Level Prosody in Korean.” In IULC Working Papers, edited by C. Clopper and K. de Jong. Vol. 1.1. Indiana University. Lin, M. and J. Yan. (1991). “Tonal Coarticulation Patterns in Quadrasyllabic Words and Phrases of Mandarin.” In Proceedings of the XIIth International Congress of Phonetic Sciences. 3: 242–45. Lin, S-Y. (1993). College Chinese. Boston: Cheng and Tsui.
Lin, W.C.J. (1985). “Teaching Mandarin Tones to Adult English Speakers: Analysis of Difficulties and Suggested Remedies.” RELC Journal. 16 (2): 31–47. Liu, X. (2011). New Practical Chinese Reader. 2nd ed. Beijing: Beijing Language and Culture University Press. Liu, Y. (2010). Communication. Tokyo: Hakusuisha (白水社). Liu, Y., T-C. Yao, N-P. Bi, Y-H. Shi, and L-Y. Ge. (2009). Integrated Chinese. 3rd ed. Boston: Cheng and Tsui. Lombardi, L. (2003). “Second Language Data and Constraints on Manner: Explaining Substitutions for the English Interdentals.” Second Language Research. 19 (3): 225–50. Lu, J-M. (1992). “On the Perception and Production of Mandarin Tones by Adult English-Speaking Learners.” Dissertation Abstracts International. 53(7), 2350A. University Microfilms No. 9233909. Ma, J-F., Y. Su, and Y. Zhai. (2005). Hanyu Kouyu Sucheng I [Short-Term Spoken Chinese]. 2nd ed. Beijing: Beijing Language and Culture University Press. Ma, J-F., Y. Su, and Y. Zhai. (2005). Hanyu Kouyu Sucheng I. Seoul: Darakwon. Major, R. (2001). Foreign Accent: The Ontogeny and Phylogeny of Second Language Phonology. Mahwah, NJ: Lawrence Erlbaum. McCarthy, J. (1986). “OCP Effects: Gemination and Antigemination.” Linguistic Inquiry. 17: 207–64. McCarthy, J. (2008). Doing Optimality Theory: Applying Theory to Data. Oxford: Blackwell. McCarthy, J. and A. Prince. (1993). Prosodic Morphology I: Constraint Interaction and Satisfaction. Technical Report. No. 3, Rutgers University Center for Cognitive Science. McCarthy, J. and A. Prince. (1994). “The Emergence of the Unmarked: Optimality in Prosodic Morphology.” Papers from the Annual Meeting of the North East Linguistic Society. 24: 333–79. McCarthy, J. and A. Prince. (1995). “Faithfulness and Reduplicative Identity.” In Papers in Optimality Theory, edited by J. Beckman, L. Dickey, and S. Urbanczyk, 249–384. University of Massachusetts Occasional Papers 18. Amherst: Graduate Linguistic Student Association. McGinnis, S. (1996). 
“Tonal Distinction Errors by Beginning Chinese Language Students: A Comparative Study of American English and Japanese Native Speakers.” In Chinese Pedagogy: An Emerging Field, edited by S. McGinnis, 81–91. Columbus: Foreign Language Publications. Mei, T-L. (1977). “Tones and Tone Sandhi in 16th Century Mandarin.” Journal of Chinese Linguistics. 5: 237–60. Miracle, W.C. (1989). “Tone Production of American Students of Chinese: A Preliminary Acoustic Study.” Journal of the Chinese Language Teachers Association. 24 (3): 49–65.
Moore, C.B. and A. Jongman. (1997). “Speaker Normalization in the Perception of Mandarin Chinese Tones.” Journal of the Acoustical Society of America. 102 (3): 1864–77. Nemser, W. (1971). “Approximative Systems of Foreign Language Learners.” International Review of Applied Linguistics in Language Teaching. 9: 115–23. Nespor, M. and I. Vogel. (1986). Prosodic Phonology. Dordrecht: Foris Publications. Ning, C. (1993). Communicating in Chinese. New Haven: Far Eastern Publications / Yale University. Ning, C. and J.S. Montanaro. (2012). Encounters. New Haven: Yale University Press. Norman, J. (1988). Chinese. Cambridge: Cambridge University Press. Ohala, J.J. (1978). “Production of Tone.” In Tone: A Linguistic Survey, edited by V.A. Fromkin, 3–39. New York: Academic Press. Ohala, J.J. (1981). “The Listener as a Source of Sound Change.” In Papers from the Parasession on Language and Behavior, edited by C.S. Masek, R.A. Hendrick, and M.F. Miller, 178–203. Chicago: Chicago Linguistic Society. Palmer, A. (1969). “Thai Tone Variants and the Language Teacher.” Language Learning. 19: 287–99. Park, J-G. and E-H. Baek. (2007). Chinese Master (Step I). Seoul: Darakwon. Pater, J. (2009). “Weighted Constraints in Generative Linguistics.” Cognitive Science. 33: 999–1035. Pierrehumbert, J.B. (1980). The Phonology and Phonetics of English Intonation. PhD diss., Massachusetts Institute of Technology. Pierrehumbert, J.B. and M. Beckman. (1988). Japanese Tone Structure. Linguistic Inquiry Monograph 15. Cambridge, MA: MIT Press. Pike, K. (1948). Tone Languages. Ann Arbor: University of Michigan Press. Pinker, S. (1984). Language Learnability and Language Development. Cambridge, MA: Harvard University Press. Pinker, S. (1994). The Language Instinct. London: Penguin. Potisuk, S., J. Gandour, and M.P. Harper. (1997). “Contextual Variations in Trisyllabic Sequences of Thai Tones.” Phonetica. 54: 22–42. Prince, A. and P. Smolensky. (1993/2004). 
Optimality Theory: Constraint Interaction in Generative Grammar. Malden, MA & Oxford: Blackwell. [Revision of 1993 technical report, Rutgers University Center for Cognitive Science. Available on Rutgers Optimality Archive, ROS-537] Qiu, X-Y. (2009). Chinese Pronunciation Practice for Foreigners. Beijing: Higher Education Press. R Core Team. (2000). R Language Definition. Vienna: R Foundation for Statistical Computing. Ross, C., B-Z. He, P-C. Chen, and M. Yeh. (2010). The Routledge Course in Modern Mandarin Chinese. New York: Routledge.
Selinker, L. (1972). “Interlanguage.” International Review of Applied Linguistics. 10: 209–31.
Selkirk, E. (1984). Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Shang, X-Y. (2000). “Duiwai hanyu cihui dengji dagang jia yi ji ci shengdiao guilü de diaocha” [A Survey of the Tonal Patterns and Stress Patterns of Entries of Levels A and B in the Outline of Chinese Proficiency (Graded Vocabulary)]. Chinese Teaching in the World. 52: 43–47.
Shen, X. (1989). “Toward a Register Approach in Teaching Mandarin Tones.” Journal of the Chinese Language Teachers Association. 24: 27–47.
Shen, X. (1990). “Tonal Coarticulation in Mandarin.” Journal of Phonetics. 18: 281–95.
Shen, X. (1992). “On Tone Sandhi and Tonal Coarticulation.” Acta Linguistica Hafniensia. 24: 131–52.
Shen, X., M. Lin, and J. Yan. (1993). “F0 Turning Point as an F0 Cue to Tonal Contrast: A Case Study of Mandarin Tones 2 and 3.” Journal of the Acoustical Society of America. 93: 2241–43.
Shepherd, M.A. (2003). Constraint Interactions in Spanish Phonotactics: An Optimality Theory Analysis of Syllable-Level Phenomena in the Spanish Language. MA thesis, California State University, Northridge.
Shi, J. (2007). “On Teaching Tone Three in Mandarin.” Journal of the Chinese Language Teachers Association. 42 (2): 1–10.
Shi, P.W. and M. Li. (1997). “Sansheng wenti yanjiu” [On the Third Tone]. In Language Studies and Teaching Chinese as a Foreign Language, edited by J.M. Zhao, 125–54. Beijing: Beijing Language and Culture University Press.
Shih, C. (1987). “The Phonetics of the Chinese Tonal System.” AT&T Bell Laboratories Technical Memorandum. MH 11225.
Shih, C. and H. Lu. (2010). “Prosody Transfer and Suppression: Stages of Tone Acquisition.” Speech Prosody 2010 Conference Proceedings. Chicago. Accessed March 2016. http://speechprosody2010.illinois.edu/papers/100968.pdf.
Shih, C. and R. Sproat. (1992). “Variations of the Mandarin Rising Tone.” In Proceedings of the IRCS Workshop on Prosody in Natural Speech. 92 (37): 193–200. Philadelphia: The Institute for Research in Cognitive Science, University of Pennsylvania.
Smolensky, P. (1993). “Harmony, Markedness, and Phonological Activity.” Handout to talk presented at the Rutgers Optimality Workshop 1, New Brunswick, NJ.
So, C.K.L. (2006). Effects of L1 Prosodic Background and AV Training on Learning Mandarin Tones by Speakers of Cantonese, Japanese, and English. PhD diss., Simon Fraser University.
Stemple, J.C., L.E. Glaze, and B. Gerdeman-Klaben. (2000). Clinical Voice Pathology, Theory and Management. 3rd ed. San Diego: Singular Publishing Group.
Stagray, J.R. and D. Downs. (1993). “Differential Sensitivity for Frequency among Speakers of a Tone and a Nontone Language.” Journal of Chinese Linguistics. 21 (1): 143–63.
Sun, S. (1998). The Development of a Lexical Tone Phonology in American Adult Learners of Standard Mandarin Chinese. Honolulu: University of Hawai’i Press.
Suzuki, K. (1998). A Typological Investigation of Dissimilation. PhD diss., University of Arizona.
Sweet, H. (1877). Handbook of Phonetics. Oxford: Clarendon Press.
Tai, J.H-Y. (1978). Phonological Changes in Modern Standard Chinese in the People’s Republic of China Since 1949. Office of Research, United States Information Agency, Washington, DC.
Tao, L. and L-J. Guo. (2008). “Learning Chinese Tones: A Developmental Account.” Journal of the Chinese Language Teachers Association. 43 (2): 17–46.
Tesar, B. and P. Smolensky. (1998). “Learnability in Optimality Theory.” Linguistic Inquiry. 29: 229–68.
Tesar, B. and P. Smolensky. (2000). Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Todo, A. (藤堂明保). (1980). Chinese Phonology. Tokyo: Koseikan.
Triskova, H. (2016). “De-stressed Words in Mandarin: Drawing Parallel with English.” In Integrating Chinese Linguistics Research and Language Learning and Teaching, edited by H. Tao, 121–44. Amsterdam: John Benjamins.
Tsung, C. (1987). “Half-Third First: On the Nature of the Third Tone.” Journal of the Chinese Language Teachers Association. 22 (1): 87–101.
Venditti, J. (2005). “The J_ToBI Model of Japanese Intonation.” In Prosodic Typology, edited by S.A. Jun, 172–200. Oxford: Oxford University Press.
Venditti, J., K. Maekawa, and M. Beckman. (2008). “Prominence Marking in the Japanese Intonation System.” In Handbook of Japanese Linguistics, edited by S. Miyagawa and M. Saito, 456–512. Oxford: Oxford University Press.
Villafaña, C. (2000). “Emergence of the Unmarked in Interlanguage Coda Production.” Working Papers of the George Mason University Linguistics Club. 33–48.
Wang, R-J. (2005). Hanyu Zhengyin Jiaocheng. Beijing: Beijing University Press.
Wang, X.C. (2006). “Perception of L2 Tones: L1 Lexical Tone Experience May Not Help.” In The Proceedings of the Third International Conference on Speech Prosody, edited by R. Hoffmann and H. Mixdorff, 85–88. Dresden, Germany: TUD Press.
Wang, Y., A. Jongman, and J.A. Sereno. (2001). “Dichotic Perception of Mandarin Tones by Chinese and American Listeners.” Brain and Language. 78: 332–48.
Wang, Y., J.A. Sereno, A. Jongman, and J. Hirsch. (2003). “fMRI Evidence for Cortical Modification during Learning of Mandarin Lexical Tone.” Journal of Cognitive Neuroscience. 15 (7): 1–9.
Wang, Y., M. Spence, A. Jongman, and J. Sereno. (1999). “Training American Listeners to Perceive Mandarin Tones.” Journal of the Acoustical Society of America. 106: 3649–58.
Wang, Y-J. (1995). “Ye tan meiguo ren xuexi hanyu shengdiao” [On American Learners’ Tone Acquisition]. Language Teaching and Research. 2: 126–40.
Wang, Y-J. (1997). “Yangping de xietong fayin yu waiguoren xuexi yangping” [The Coarticulation of Tone 2 and the Acquisition of Tone 2 by Foreigners]. Language Teaching and Research. 4: 94–104.
Wang, Z.J. and S.L. Feng. (2006). “Shengdiao duibifa yu Beijing hua shuangyinzu de zhongyin leixing” [A Study of Stress Patterns in Disyllabic Words of Beijing Dialect]. Yuyan Kexue [Linguistic Sciences]. 5 (1): 3–22.
Wen, B. and F. Yan. (2015). “The Merging between the Second Tone and the Third Tone in Mandarin Acquisition by L2 Learners.” Journal of the Chinese Language Teachers Association. 50 (1): 19–41.
White, C. (1981). “Tonal Pronunciation Errors and Interference from English Intonation.” Journal of the Chinese Language Teachers Association. 16 (2): 27–56.
White, L. (2003). Second Language Acquisition and Universal Grammar. Cambridge: Cambridge University Press.
Winke, P.M. (2007). “Tuning into Tones: The Effect of L1 Background on L2 Chinese Learners’ Tonal Production.” Journal of the Chinese Language Teachers Association. 42 (3): 21–55.
Wong, P.C.M. and T.K. Perrachione. (2007). “Learning Pitch Patterns in Lexical Identification by Native English-Speaking Adults.” Applied Psycholinguistics. 28: 565–85.
Wu, C-H. (2011). The Evaluation of Second Language Fluency and Foreign Accent. PhD diss., University of Illinois at Urbana-Champaign.
Wu, S-M., Y-M. Yu, Y-H. Zhang, and W-Z. Tian. (2011). Chinese Link. 2nd ed. Upper Saddle River: Prentice Hall/Pearson.
Xu, Y. (1994). “Production and Perception of Coarticulated Tones.” Journal of the Acoustical Society of America. 95 (4): 2240–53.
Xu, Y. (1997). “Contextual Tonal Variations in Mandarin.” Journal of Phonetics. 25: 61–83.
Xu, Y. (1998). “Consistency of Tone-Syllable Alignment across Different Syllable Structures and Speaking Rates.” Phonetica. 55: 179–203.
Xu, Y. (1999). “Effects of Tone and Focus on the Formation and Alignment of F0 Contours.” Journal of Phonetics. 27: 55–105.
Xu, Y. (2001). “Sources of Tonal Variations in Connected Speech.” Monograph series, Journal of Chinese Linguistics. 17: 1–31.
Yang, B. (2012). “The Gap between the Perception and Production of Tones by American Learners of Mandarin: An Intralingual Perspective.” Chinese as a Second Language Research. 1 (1): 31–52.
Yang, B. (2015). Perception and Production of Mandarin Tones by Native Speakers and L2 Learners. Berlin, Heidelberg: Springer.
Yang, B. and N. Yang. (2017). “Development of Disyllabic Tones in Different Learning Contexts.” International Review of Applied Linguistics in Language Teaching. Accessed 23 November 2017. doi:10.1515/iral-2016-0004.
Yang, C-S. (2011). The Acquisition of Mandarin Prosody by American Learners of Chinese as a Foreign Language (CFL). PhD diss., Ohio State University.
Yip, M. (1980). The Tonal Phonology of Chinese. PhD diss., Massachusetts Institute of Technology. Published 1990, New York: Garland Publishing.
Yip, M. (1988). “The Obligatory Contour Principle and Phonological Rules: A Loss of Identity.” Linguistic Inquiry. 19 (1): 65–100.
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Young-Scholten, M. (1994). “On Positive Evidence and Ultimate Attainment in L2 Phonology.” Second Language Research. 10 (3): 193–214.
Yue-Hashimoto, A.O.-K. (1980). “Word Play in Language Acquisition: A Mandarin Case.” Journal of Chinese Linguistics. 8 (2): 181–204.
Zhang, H. (2007). A Phonological Study of Second Language Acquisition of Mandarin Chinese Tones. MA thesis, University of North Carolina at Chapel Hill.
Zhang, H. (2010). “Phonological Universals and Tone Acquisition.” Journal of the Chinese Language Teachers Association. 45 (1): 39–65.
Zhang, H. (2013). The Second Language Acquisition of Mandarin Chinese Tones by English, Japanese, and Korean Speakers. PhD diss., University of North Carolina at Chapel Hill.
Zhang, H. (2014). “The Third Tone: Allophones, Sandhi Rules, and Pedagogy.” Journal of the Chinese Language Teachers Association. 49 (1): 117–45.
Zhang, H. (2015). “Positional Effects in Second Language Chinese Tones.” Journal of Chinese Language Teaching. 12 (2): 1–30.
Zhang, H. (2016a). “Dissimilation in the Second Language Acquisition of Mandarin Chinese Tones.” Second Language Research. 32 (3): 427–51.
Zhang, H. (2016b). “The Effect of Theoretical Assumptions on Pedagogical Methods: A Case Study of Second Language Chinese Tones.” International Journal of Applied Linguistics. 27 (2): 363–82.
Zhang, H. (2016c). “Focal Prominence Marking in Second Language Chinese Tones.” In Integrating Chinese Linguistics Research and Language Learning and Teaching, edited by H. Tao, 195–214. Amsterdam: John Benjamins.
Zhang, H. (2018). “Current Trends in Research of Chinese Sound Acquisition.” In The Routledge Handbook of Chinese Second Language Acquisition, edited by C. Ke, 217–33. New York: Routledge.
Zhang, H.M. and Y.X. Yin. (2012). “Youxuanlun de shiyufei: xiandai yinxixue yanjiu de ruogan fansi” [Pros and Cons of Optimality Theory: Some Thoughts on Phonological Issues]. Zhongguo Yuwen [Studies of the Chinese Language]. 6: 483–99.
Zhang, J. (2002). The Effects of Duration and Sonority on Contour Tone Distribution – A Typological Survey and Formal Analysis. New York: Routledge.
Zhang, J. (2004). “The Role of Contrast-Specific and Language-Specific Phonetics in Contour Tone Distribution.” In Phonetically Based Phonology, edited by Bruce Hayes, Robert Kirchner, and Donca Steriade, 157–90. Cambridge: Cambridge University Press.
Zhang, J. (2009). “Contour Tone Distribution is Not an Artifact of Tonal Melody Mapping.” Studies in the Linguistic Sciences. 33 (1/2): 73–132.
Zhang, J. and Y-W. Lai. (2010). “Testing the Role of Phonetic Knowledge in Mandarin Tone Sandhi.” Phonology. 27 (1): 153–201.
Zhang, P. (2015). Ni Wo Ta. Boston: Cengage Learning.
Zhao, J-M. (1988). “Cong yixie shengdiao yuyan de shengdiao shuodao hanyu shengdiao” [Talking about Tone in Chinese from the Perspective of Some Tone Languages]. Selected Papers from the Second International Symposium on Teaching Chinese as a Foreign Language, 171–81. Beijing: Beijing Language Institute.
Zhao, J-M. and M-Z. Cheng. (1997). “Jichu hanyu yuyin jiaoxue de ruogan wenti” [Several Issues in Teaching the Mandarin Sound System]. In Yuyin yanjiu yu duiwai hanyu jiaoxue [Phonetic Study and Chinese Language Teaching], edited by J-M. Zhao and Z-M. Meng, 276–92. Beijing: Beijing Language and Culture University Press.
Zhu, H. and B. Dodd. (2000). “The Phonological Acquisition of Putonghua (Modern Standard Chinese).” Journal of Child Language. 27: 3–42.
Zoll, C. (2004). “Positional Asymmetries and Licensing in Optimality Theory,” 343–64. Accessed May 2013. http://roa.rutgers.edu/files/282-0998/roa-282-zoll-4.pdf.
Index
Accent 19, 22, 27–30, 38, 101
Accentual Phrase 28–30, 75n See also AP
Accuracy rate 17, 36, 38, 48–52, 54, 57, 59, 74–77, 80, 90, 103, 106
Acoustic similarity 33–34, 41, 83, 106
ACTFL (American Council on the Teaching of Foreign Languages) 92
Allophone 9–10, 24, 34, 36, 83–87, 94, 109–110, 114–115
Anticipatory coarticulation 8, 31, 43–45, 47–50, 52–59 See also Anticipatory effect
Anticipatory effect 9, 43–47, 50–52, 54–55, 58n
AP 28–30 See also Accentual phrase
Arytenoid cartilage 3
Autosegmental phonology 5, 66
Base form 6, 10–11, 85, 105–109, 110, 111, 119, 123
Carryover coarticulation 8–9, 43, 45, 58 See also Carryover effect
Carryover effect 9, 37, 45, 58n See also Anticipatory coarticulation
Citation form 4, 6, 8, 11–12, 66
Coarticulation 8–9, 24, 31, 43–45, 47–50, 52–59, 86, 129
Component tone 6, 47–48, 76 See also Constituent tone
Consonant 1–2, 4, 8, 17–18, 43, 54, 65, 116
Constituent tone 66–67, 76 See also Component tone
Constraint 9, 23–24, 34, 43, 58–59, 61–70, 77–82, 105, 112 See also Sub-constraint
Contextual factor 9
Contextual variation 8
*CONTOUR 62, 69–70, 80
Contrastive feature 1
Correctness judgment 39–41
Cricoid cartilage 3
Dep(T) 63
Dipping 4, 10–11, 17–19, 21, 32–33, 41, 84, 86, 93, 102, 105–106, 111, 119, 124–125, 130
Dissimilation 9, 24, 34, 45, 58, 60, 65–66, 75, 77, 81
Duration 5–6, 27, 29, 41, 95
Error rate 33–35, 41, 49, 68–69, 71, 74–76, 77n, 78, 86–87, 93–101, 103, 105, 108, 110
Error type 49, 55, 57, 59, 94, 100
Faith 63–64, 66–70, 78–81 See also Faithfulness constraint
Faithfulness constraint 62–63, 65, 67n, 70 See also Faith
*FALL 62–64, 68, 74, 80–82
Falling tone 17, 31–34, 40, 49, 59, 62–64, 81, 85, 87, 89, 94, 110
Feature geometry 21, 113
First language acquisition 17, 107 See also L1 acquisition
Focal prominence 14, 27, 29, 122
Foot 27–28
Full-T3 9–11, 32–34, 41, 84–87, 89–90, 92–95, 97–110, 113, 119
Fundamental frequency 3, 13, 29
F0 3, 5, 8, 13–14, 16, 18, 39, 41, 45–48, 53–55
Half-T3 9–11, 33–35, 40–41, 84–87, 89–110, 113, 119, 122–123, 125–126
Half-T3 Rule 85–87, 90–91, 95–96, 98, 101, 105, 108–109 See also Half-T3 Sandhi
Half-T3 Sandhi 10–11, 119 See also Half-T3 Rule
Hertz (Hz) 3, 7, 39, 53–54
Ident(T) 63
Identical tone combination 69–70, 113 See also Identical tone sequence; ITC
Identical tone sequence 33–34, 36, 60, 65, 67, 69–71, 76, 79, 81 See also Identical tone combination; ITC
Input 24, 62–65, 68, 78–79
Interlanguage 20, 23–24, 26, 60–61, 68, 70, 77–79, 81, 112
Intermediate Phrase 27 See also ip; Phonological Phrase
International Phonetic Alphabet (IPA) 1n
Intonation 1–2, 8, 11, 13–14, 21, 26–32, 37, 69–70, 75n, 79, 101, 105, 114, 123, 130
Intonation phrase 27–28 See also IP
Inventory 4, 17, 40, 63, 68, 116, 118–120
ip 27 See also Intermediate Phrase
IP 27–28 See also Intonation phrase
ITC 70–73, 75 See also Identical tone combination; Identical tone sequence
Japanese 19, 22–23, 26–30, 32, 38, 40, 44, 47, 49, 52, 54, 56–58, 65, 68–70, 72–73, 75, 77, 81, 90, 95–96, 100–101, 103, 110, 117
Korean 22–23, 26–27, 29–30, 32, 38, 40, 44, 47, 49, 52, 54, 56–58, 68–70, 73, 75, 77, 81, 90, 96, 100–101, 103, 107, 110, 117
Laryngeal feature 30, 66
Laryngeal muscle 108
*LEVEL 62–64, 68, 70, 74, 80–82
L1 acquisition 17, 34, 60–61, 65, 107 See also First language acquisition
Markedness constraint 62–65, 69, 78, 81
Max(T) 63
Mental representation 2, 112, 114–115, 125
Mora 27–29
Musical ability 15–16 See also Musical experience
Musical experience 16 See also Musical ability
Neutral tone 6–7, 10, 13, 37, 84–85, 88–89, 94–95, 105, 110, 118–119, 122–123, 128–130 See also qingsheng
NITC 70–73 See also Nonidentical Tone Combination
Nontonal language 1, 6, 14, 16, 19, 22, 26–27, 29, 32, 113
Nonidentical Tone Combination 70 See also NITC
Normalization 18
Obligatory Contour Principle 24, 34, 60, 66, 112 See also OCP
OCP 24, 34, 60–62, 65–82, 112 See also Obligatory Contour Principle
OCP(ConstTone) 67, 76–77
OCP(Contour) 67, 69–71, 78–82
OCP(H) 67, 70–71, 77–81
OCP(L) 67–68, 70–71, 75, 78, 80–81
OCP(Level) 67, 80
OCP(LH) 67, 78–80, 82
OCP(WholeTone) 67–68, 76–77, 81
Offset 45, 48, 53, 55, 58
Onset 31, 41, 44–45, 47–48, 50, 52–59, 113, 129
Optimality Theory 23, 34, 61–62 See also OT
OT 23–24, 61–66, 70, 76, 82 See also Optimality Theory
Output 14, 62–63, 65, 68, 79
Particle 6, 37–38, 40, 88–89, 128
Pedagogical practice 82, 111–112
Pedagogy 85, 112
Perception 15–19, 22, 30–31, 34, 41, 83, 86, 88, 90, 92–95, 97, 105, 110, 112
Phoneme 2, 10, 16, 84–86, 90, 115
Phonetics 2, 21, 47, 53, 120
Phonological categorization 7
Phonological knowledge 3, 7
Phonological Phrase 27–28 See also PhP
Phonological representation 3, 6–8, 114
Phonological universal 22–23, 34, 60–61, 82
Phonology 2, 5, 13, 15, 20, 23, 26, 27, 34, 61, 64, 66, 77–81, 105, 108, 113–114
PhP 28, 30 See also Phonological Phrase
Pinyin 4, 37–38, 84–85, 88, 90, 92, 114–115, 121
Pitch accent 28–29, 38, 101
Pitch range 4, 7–8, 14, 21, 40, 53, 122, 124, 128, 130
Pitch target 7, 32, 47, 62, 113, 125, 128
Pitch value 4, 10, 39–41, 47, 59, 84, 94, 118–119, 128
Praat 39–41
Pre-T3 Sandhi 9–12, 18, 85–87, 90–91, 95–96, 98, 101, 105–109, 122–123
Production 2–3, 9, 15, 19–24, 30–31, 33–36, 39–41, 44, 47–49, 53–57, 59, 61, 69–76, 79, 82–83, 86–95, 97–106, 109–110, 112–113, 124
Prominence marking 27–28, 122
Prosodic Structure 21, 23, 26–29, 32–33, 43, 58–59
Prosody 14, 16, 21–23, 26–27, 69, 112, 122
Qingsheng 6 See also Neutral tone
Raised-T3 9–10, 12, 35, 40–41, 66, 71, 73, 75, 80, 84–87, 89–91, 93, 95–98, 101, 104–106, 108, 110, 125–127
Ranking 52–54, 62–71, 78–81, 95, 97–99
Reduplicated adjective 12, 13, 127
Reduplicated form 6
Register 4–6, 9, 19, 21, 33, 40, 94, 113, 118–119
Reliability 40, 92
Re-ranking 24, 61, 65, 69, 78–80, 82
*RISE 62–65, 67–68, 74, 80–81
Rising tone 4, 10, 12, 40, 44, 49, 59, 62–64, 66, 78, 81, 84, 89, 101, 106, 125, 127
Romanization 4, 114–115
Sandhi 2, 8–13, 15, 18, 34, 36, 65–67, 69–70, 75, 83–91, 95–96, 98, 101, 105–110, 116, 119, 122–123, 125–127
Stress 8, 11, 13, 19, 21–22, 27–29, 122–123, 129–130
Stress accent 19, 22, 27
Sub-constraint 24, 67, 77, 79–81 See also Constraint
Suprasegmental element 14
Suprasegmental feature 14
Surface form 6, 12, 66, 69, 89
Syllable structure 2, 23, 61, 65
TBU 5, 29, 33, 62–63, 69 See also Tone-bearing unit
TETU 79 See also The Emergence of the Unmarked
The Emergence of the Unmarked 79 See also TETU
Thyroid cartilage 3
TMS 24, 34, 60–64, 68–70, 73–75, 80–82, 87, 97–99, 105, 112 See also Tonal Markedness Scale
Tonal grammar 23–24, 61, 63–64, 65n, 67–68, 112
Tonal language 1–3, 7–8, 15–16, 18–19, 26, 29, 39, 43–44, 47, 58–59, 63–64, 92, 124 See also Tone language
Tonal Markedness Scale 24, 34, 58, 60, 62–63, 74, 77, 79, 87, 95, 105, 112 See also TMS
Tone-bearing unit 5, 29, 62 See also TBU
Tone language 9, 15, 26, 44, 63–64, 66 See also Tonal language
Transfer 21–23, 26, 30–32, 60, 77, 82, 101, 105, 110, 112
Turning point 41
T3 sandhi 9–10, 12–13, 15, 18, 34, 65–67, 69, 75, 84–85, 87–90, 95, 101, 105–106, 123, 125–127 See also Pre-T3 Sandhi; Half-T3 Sandhi; Half-T3 Rule
T4 Sandhi 34, 66, 127
UG 60–61 See also Universal Grammar
Underlying form 6, 10, 68, 82–85, 107–108, 110 See also Base form
Universal Grammar 23, 60–61, 87 See also UG
Utterance-final 10–11, 21, 31, 84, 91, 93, 102–104, 108, 110, 124
Variation 1, 8–9, 15, 31, 47, 52–53, 58, 61, 80, 83–84, 86, 121
Voicing 1, 8
Vocal fold 1, 3, 7, 128
Vowel 1–2, 4, 8, 17–18, 43, 116, 124