Social Dynamics in Second Language Accent 9781614511762, 9781614512288

This volume offers a definitive source for understanding social influences in L2 pronunciation, demonstrating the import


English Pages 303 [304] Year 2014


Table of contents :
Introduction
Part I: The Nature of Accent
1 The Social Nature of L2 Pronunciation
2 Acoustic-Phonetic Parameters in the Perception of Accent
3 Developmental Sequences and Constraints in Second Language Phonological Acquisition: Balancing Language-internal and Language-external Factors
4 Suprasegmental Measures of Accentedness
Part II: The Learner’s Approach to Pronunciation in Social Context
5 Understanding the Impact of Social Factors on L2 Pronunciation: Insights from Learners
6 L2 Accent Choices and Language Contact
7 Accentedness, “Passing” and Crossing
Part III: The Teacher’s Approach to Accent
8 Problematizing the Dependence on L1 Norms in Pronunciation Teaching: Attitudes toward Second-language Accents
9 Phonological Literacy in L2 Learning and Teacher Training
10 Training Native Speakers to Listen to L2 Speech
Part IV: The Social Impact of Accent
11 Listener Expectations, Reverse Linguistic Stereotyping, and Individual Background Factors in Social Judgments and Oral Performance Assessment
12 Accent and ‘Othering’ in the Workplace
Part V: Conclusions
13 Future Directions in the Research and Teaching of L2 Pronunciation
Subject index


John Levis and Alene Moyer (Eds.) Social Dynamics in Second Language Accent

Trends in Applied Linguistics

Edited by Ulrike Jessner Claire Kramsch

Volume 10

Social Dynamics in Second Language Accent Edited by John Levis and Alene Moyer

ISBN 978-1-61451-228-8 e-ISBN 978-1-61451-176-2 ISSN 1868-6362 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.dnb.de. © 2014 Walter de Gruyter, Inc., Boston/Berlin Typesetting: PTP-Berlin Protago-TEX-Production GmbH, Berlin Printing: CPI buch bücher.de GmbH, Birkach ♾ Printed on acid-free paper Printed in Germany www.degruyter.com

Table of contents

Alene Moyer and John Levis
Introduction

Part I: The Nature of Accent
Alene Moyer
1 The Social Nature of L2 Pronunciation
Rachel Hayes-Harb
2 Acoustic-Phonetic Parameters in the Perception of Accent
Jette G. Hansen Edwards
3 Developmental Sequences and Constraints in Second Language Phonological Acquisition: Balancing Language-internal and Language-external Factors
Lucy Pickering and Amanda Baker
4 Suprasegmental Measures of Accentedness

Part II: The Learner’s Approach to Pronunciation in Social Context
Kimberly LeVelle and John Levis
5 Understanding the Impact of Social Factors on L2 Pronunciation: Insights from Learners
Erik R. Thomas
6 L2 Accent Choices and Language Contact
Cecelia Cutler
7 Accentedness, “Passing” and Crossing

Part III: The Teacher’s Approach to Accent
Stephanie Lindemann, Jason Litzenberg, and Nicholas Subtirelu
8 Problematizing the Dependence on L1 Norms in Pronunciation Teaching: Attitudes toward Second-language Accents
Debra Hardison
9 Phonological Literacy in L2 Learning and Teacher Training
Tracey M. Derwing and Murray J. Munro
10 Training Native Speakers to Listen to L2 Speech

Part IV: The Social Impact of Accent
Okim Kang and Donald Rubin
11 Listener Expectations, Reverse Linguistic Stereotyping, and Individual Background Factors in Social Judgments and Oral Performance Assessment
Gai Harrison
12 Accent and ‘Othering’ in the Workplace

Part V: Conclusions
John Levis and Alene Moyer
13 Future Directions in the Research and Teaching of L2 Pronunciation

Subject index

Alene Moyer and John Levis

Introduction

Late learners rarely become native-like, and although age is often thought to constrain attainment, it is unlikely to be the unitary cause of incomplete acquisition. Some researchers argue that there is no clear critical period for phonology (Flege, 1995), and that factors such as experience with the target language and social influences are central to long-term attainment (Moyer, 2011, 2013). The fact that individual variation is ubiquitous points to the need to better understand what underlies observed age effects (see Moyer, this volume). Many SLA scholars have recently turned their attention to the social, cultural, and psychological circumstances relevant to language acquisition beyond early childhood, and specifically to the ways that these circumstances influence the late learner’s approach to pronunciation (Piske, MacKay & Flege, 2001). It is clear that social contexts can shift language learning processes (Tarone, 2000), and that learners alter their language choices based on contextual and environmental influences. Considering the unique connection between accent and the individual learner’s sense of self, we believe that the field of L2 phonology is positioned to advance a more holistic agenda that focuses on the context for language learning and use. Our goal for this volume is to present new evidence and viewpoints from experts across various empirical approaches, and to outline the directions that future work can take. The chapters included here speak to this goal by illustrating how social factors of both an intrinsic and extrinsic nature contribute to phonological acquisition. At the same time, they emphasize the real-world consequences of speaking with an accent. 
Some of the questions that guide the chapter selections include:
– What can learners’ own views of accent tell us about why their pronunciation differs from native norms?1
– Can accent be easily defined by specific segmental parameters, or is it primarily a question of suprasegmental fluency?
– What is the social impact of accentedness? Do listener perceptions and attitudes toward pronunciation affect learners socially, with implications for long-term outcomes in phonology?

1 Although pronunciation and accent are frequently used interchangeably, Moyer, in Chapter One, distinguishes accent as a more global construct that encompasses communicative, pragmatic, and social meaning and performance.

We believe that such a reorientation is at hand given that second language (L2) learners who have socially-oriented reasons to use the language, and who are connected to more experienced users of the target language, show greater progress in the realm of accent. This surely has to do with their use of L2 relative to L1 (Flege, Munro & MacKay, 1995), but also with the quality of L2 experience that those connections represent (Moyer, 2004, 2011). Richer, more meaningful opportunities for practice and feedback are essential because non-native speakers (NNS) who regularly interact in meaningful ways with native speakers are significantly more likely to acquire a native-like accent than are those without such contact (see Lave & Wenger, 1991; Moyer, this volume).

Shifting to a sociolinguistically-focused approach naturally prioritizes the issue of identity. Identity is neither a straightforward nor a rigid category, however, and we cannot assume that all learners hope to acquire a near-native sound system; some see accent as an important way to separate themselves from the target language and culture (Lybeck, 2002; see also Cutler, this volume). One of the authors’ colleagues had a Puerto Rican student who reminded the author of this reality; the student made little effort to adjust his noticeable accent in a course for international teaching assistants. When told that it might affect his ability to pass the test to be a teaching assistant, he said he preferred his accent just as it was because he was Puerto Rican. Puerto Rico’s dependence on the United States and its dominant use of Spanish rather than English likely led to his hold on accent as a marker of identity. Without understanding the broader context for his choice, we might have ascribed his lack of progress exclusively to factors beyond his control. Researchers in the field of L2 phonology now realize that language users approach accent, in large part, as a response to complex social pressures.
Gatbonton, Trofimovich and Magid (2005) provide evidence of this through their examination of accent choices among Chinese learners in the bilingual context of Quebec, where both French and English are available as language targets. The authors’ Chinese speakers were faced with decisions about affiliation within and beyond their own social group. Ethnic group affiliation factors in this bilingual language contact situation played out as follows: those with stronger ‘foreign’ accents in L2 were considered to be better leaders of the L1 in-group, whereas those with more fluent L2 pronunciation were seen as better representatives of the L1 group in relation to outsiders. Simply put, the learners’ L2 accents affected perceptions of their loyalty toward their own ethnic group. Such studies as these, and others in this volume (e.g., Cutler; LeVelle & Levis), demonstrate the self-aware approach L2 users take to pronunciation based on social dynamics that operate within and among various peer groups.

A focus on the learner naturally leads us to also consider the reception of L2 users who sound identifiably ‘non-native.’ Thus far, little research has been dedicated to this issue. Miller (2003) argues that “…to be authorized and recognized as a legitimate user of English by others, you must first be heard by other legitimate users of English” (p. 47) [italics ours]. She distinguishes between intelligibility, which puts the emphasis on the speaker, and audibility, where the listener plays a central role in determining legitimacy, i.e., who has the right to be heard. Audibility, in other words, is premised on verbally fitting in. This is a function of pronunciation, to be sure, but L2 users must also know how to answer complex questions; they must notice and use features of the group they wish to be a part of (e.g., use of the stigmatized discourse marker like was particularly important for fitting into the group in Miller’s study of high schoolers); and they must be able to interact in pragmatically appropriate ways. Such multi-layered ability, in turn, opens up greater opportunities to speak, and thus to notice (and adjust) one’s own speech patterns relative to one’s interlocutors. This circular relationship deserves more explicit scrutiny as well. How do L2 learners engage in this continual cycle of noticing, restructuring, and refining their speaking abilities? Several chapters herein speak directly to this important phenomenon. Another reason to prioritize a sociolinguistic orientation is in the interest of defining more contextualized classroom goals. Although learners need to interact in the target language in order to steadily improve, authentically social interaction often does not occur. In language classrooms, the language itself may be limited in scope and social concerns may not reflect authentic interaction. Beyond those walls, exposure to the target language may be practically non-existent. 
Alternatively, the target language may be dominant outside the classroom, or it may serve as a lingua franca between otherwise non-native speakers of a target language, depending on the region. All of these contexts imply very different objectives for instruction. Which norms should the learner be encouraged to aim for? In the latter case, if the lingua franca is long-standing, it may have developed its own linguistic and social norms but these may not be explicitly taught. Nativized varieties of English (e.g., Indian English, Ghanaian English, Singaporean English) flourish independently of native speaker norms, yet they are often still seen as second best (see Harrison, this volume). There is a persistent belief that L2 users must speak “British English” or “American English” to be ‘valid,’ and even non-native speaker teachers in these regions may promote this notion for themselves and for their students because of the prestige of the dominant NS models (see Jenkins, 2009). Students may thus be caught between linguistic insecurity and linguistic pride, aware of the stigma attached to their accent by outsiders, and reluctant to communicate with native speakers as a result. The teacher is uniquely positioned to discuss these realities openly and to raise awareness
about the reception of accent. Several chapters in this volume assess current classroom assumptions and provide specific ideas on how to present pronunciation as an integrated part of speaking fluency (e.g., chapters here by Derwing & Munro, Hardison, Harrison, LeVelle & Levis). This volume was inspired by an invited colloquium at the 2011 Second Language Research Forum (SLRF) at Iowa State University, entitled Social Influences on the Acquisition of L2 Phonetics and Phonology. Its three presentations - by Alene Moyer, John Levis & Kimberly LeVelle, and Stephanie Lindemann - explored new ways of understanding the acquisition of pronunciation by adult language learners. There is a resurgence of interest in this topic, particularly regarding the social relevance and reception of L2 accent overall. As researchers and teachers, we feel it is crucial to be aware of the social and contextual factors that contribute to negative attitudes toward ‘sounding foreign’. We also hope to dedicate more attention to learners’ own awareness of social and contextual factors relevant to accent as they formulate an approach to phonological learning. This introductory chapter raises several issues to that end, and the following chapters elaborate on the new perspectives taking shape in the scholarship. Beginning Part I, The Nature of Accent, Moyer’s The Social Nature of Accent (Ch. 1) offers a definition of accent that is inherently dynamic, and suggests a balanced view of the constraints that operate on adult phonological acquisition – age, identity, attitudes, etc. Moyer also introduces a few key concepts for understanding the reception of accent, such as intelligibility, comprehensibility, and accommodation. Given the close connections between accent and self-image, Moyer also reiterates the role of social concerns for learning decision-making and agency. 
The chapters that follow in this section explore in greater detail the many intrinsic and extrinsic influences that shape phonological learning. In Chapter 2, Acoustic-Phonetic Parameters in the Perception of Accent, Rachel Hayes-Harb reviews research on the acoustic-phonetic properties of speech to show how segmental and suprasegmental levels of pronunciation contribute differentially to subjective accent ratings. Her comprehensive analysis reminds us that methodological differences across studies may contribute to misleading assumptions about objectivity in perceptions of ‘foreignness.’ Chapter 3, Developmental Sequences and Constraints in Second Language Phonological Acquisition: Balancing Language-internal and Language-external Factors, by Jette Hansen Edwards, examines the processes that underlie phonological learning in specific domains – vowels and consonants, syllable structure, stress, prosody, etc. – while situating theoretical models of transfer, markedness, and developmental processes in a broader framework. Her analysis is grounded in the understanding that learners have some choice here, e.g., they sometimes
target non-standard over standard L2 variants due to peer pressure or the desire to signal a certain identity. Lucy Pickering and Amanda Baker, in Suprasegmental Measures of Accentedness (Chapter 4) provide an in-depth look at how specific features such as pitch movement, stress, and pause influence perceptions of intelligibility, comprehensibility, and accentedness. Their review raises interesting questions about the subjectivity inherent in judgments of accent based on listener background variables such as native speaker status, sociopolitical factors (including identity), and familiarity with a given accent. Part II, The Learner’s Approach to Pronunciation in Social Context, addresses the challenges of L2 pronunciation from the learner’s point of view. Chapter 5, by Kimberly LeVelle and John Levis, Understanding the Impact of Social Factors on L2 Pronunciation: Insights from Learners, introduces introspective data from non-native speakers on the relevance of identity, awareness of stigma, and ‘imagined communities’ as related to their willingness to communicate and to establish L2 social networks. LeVelle and Levis propose a ‘Sociolinguistic Core’ for pronunciation pedagogy based on a multi-level approach to linguistic and social fluency. Continuing this emphasis on the consequences of L2 pronunciation, we consider how learners in a bilingual community learn to associate social traits with accent, and decisively adopt certain variants in response to cultural difference and ethnic conflict. L2 Accent Choices and Language Contact, by Erik Thomas (Chapter 6) presents a case study of strong ethnolinguistic differentiation among older residents of a small Rio Grande Valley town. Here, Mexican Americans are holding onto unique pronunciation features in their English to preserve and assert ethnic distinctions, even as the Anglo citizens’ English shows phonetic influences from Spanish. 
The fact that L2 users themselves are aware of phonetic/phonological variation, and also of the attitudes that such variation evokes, is increasingly recognized in the research. Chapter 7 by Cecelia Cutler, Accentedness, ‘Passing’ and Crossing, addresses social interaction and identity construction as critical to L2 pronunciation. Her data demonstrate the extent to which advanced learners in bilingual and multilingual settings model and control their phonological performance, even in the absence of contact with the group they aspire to sound like. Importantly, Cutler presents an aspect of the ‘passing’ phenomenon we rarely see: that learners are highly conscious of race and social conflict in target language societies, and may choose to ‘represent’ in solidarity with marginalized groups. In Part III, The Teacher’s Approach to Accent, Stephanie Lindemann, Jason Litzenberg, and Nicholas Subtirelu demonstrate that L2 users have specific, sometimes contradictory, attitudes toward pronunciation, in Problematizing the
Dependence on L1 Norms in Pronunciation Teaching: Attitudes toward Second-language Accents (Ch. 8). As the authors note, learners realize that non-native speech is often stigmatized by native speakers, yet they are prone to such devaluations themselves, even under controlled matched guise conditions (where speech samples are erroneously said to represent people of various backgrounds). Given the evidence for accent-related prejudice, the authors argue for a pedagogical approach to counter the traditional, unrealistic emphasis on ‘accent reduction.’ Further examination of what accent signifies in acoustic terms and how we should best treat it in pedagogical settings comes from Debra Hardison in Phonological Literacy in L2 Learning and Teacher Training (Ch. 9). She reiterates these challenges for teachers and describes how technological tools can be effectively used to advance phonological literacy. Based on data from MA TESOL program participants, Hardison highlights what teachers-in-training think they should know, analyzes the pronunciation stumbling blocks they are likely to encounter as non-native speakers themselves, and argues for a balanced approach to oral interaction skill development – one that prioritizes collaborative tasks and reflective components. If negative accent attitudes are so prevalent, can anything be done to balance the communicative burden and minimize comprehension difficulties between native and non-native speakers? This is the theme of Tracey Derwing and Murray Munro’s Chapter 10, Training Native Speakers to Listen to L2 Speech, which paints an overarching view of the salience of accent in an era of globalization. In the classroom, teachers can address the communicative and social relevance of pronunciation, but to do so, they must become well-versed in a number of areas themselves. Derwing and Munro outline specific recommendations for this preparation, and for targeting accent attitudes – both inwardly- and outwardly-directed.
In Part IV, The Social Impact of Accent, Okim Kang and Donald Rubin further explore the connections between language attitudes and social identity. Their chapter entitled Listener Expectations, Reverse Linguistic Stereotyping, and Individual Background Factors in Social Judgments and Oral Performance Assessment (Ch. 11) examines ‘reverse linguistic stereotyping’, a phenomenon whereby listeners anticipate differences in accent and speech fluency based on a visual (or other) prompt, even before hearing the speaker’s voice. This has major ramifications for the workplace, as Kang and Rubin point out, where important impressions and decisions are made based on how someone ‘sounds.’ Chapter 12 by Gai Harrison, Accent and ‘Othering’ in the Workplace, brings this concern full circle. Harrison provides confirmation that some L2 users ‘self-select’ out of certain jobs based on the attitudes that accompany non-native speech. As she puts it, accent is ‘cultural capital’ which makes it more than simply a marker
of identity – it is an economic commodity. This is important to keep in mind as language learners, researchers, and teachers. The issue of pronunciation standards should be openly questioned within each L2 use context so that the native speaker ideology is appropriately challenged. Researchers across the spectrum of applied linguistics and pedagogy increasingly recognize the central role of phonology for diverse areas, from literacy development (Walter, 2008) to oral fluency (Derwing, Munro, Thomson & Rossiter, 2009), to identity (Piller, 2002). In Part V, Conclusions, we come back to the relevance of L2 pronunciation for overall oral fluency while reiterating the difference between the two. Our final chapter by Levis and Moyer, Future Directions in L2 Pronunciation Research and Teaching, summarizes some of the most significant findings from the preceding chapters, outlines methodological challenges, and recommends ways to move the research forward. L2 phonology scholars must continue to explore the role of social contact and networks for pronunciation outcomes, and must take a more holistic view of the acquisition process itself. We still do not know how learners seek out and utilize target language input, for example, while gaining confidence and trying on a new identity in L2. As for the instructional realm, we maintain (along with our contributing authors) that the classroom is the place where an awareness for, and appreciation of, social context is both necessary and possible. Not only can we potentially shape the learner’s approach to pronunciation skill development, but as teachers we can also raise awareness of accent attitudes that may have real consequences for them. This would likely involve various kinds of training and open discussion, predicated on the ability of students and teachers alike to question their a priori assumptions about accent. August, 2013

References

Derwing, T.M., Munro, M.J., Thomson, R.I., & Rossiter, M.J. 2009. The relationship between L1 fluency and L2 fluency development. Studies in Second Language Acquisition 31(4), 533–557.
Flege, J.E. 1995. Second language speech learning: Theory, findings, and problems. Speech perception and linguistic experience: Issues in cross-language research, 233–277.
Flege, J.E., Munro, M.J., & MacKay, I.R. 1995. Factors affecting strength of perceived foreign accent in a second language. The Journal of the Acoustical Society of America 97, 3125.
Gatbonton, E., Trofimovich, P., & Magid, M. 2005. Learners’ ethnic group affiliation and L2 pronunciation accuracy: A sociolinguistic investigation. TESOL Quarterly 39(3), 489–511.
Jenkins, J. 2009. English as a lingua franca: interpretations and attitudes. World Englishes 28(2), 200–207.
Lave, J., & Wenger, E. 1991. Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Lybeck, K. 2002. Cultural identification and second language pronunciation of Americans in Norway. The Modern Language Journal 86(2), 174–191.
Miller, J. 2003. Audible difference. ESL and social identity in schools. Clevedon, UK: Multilingual Matters.
Moyer, A. 2004. Age, accent, and experience in second language acquisition: an integrated approach to critical period inquiry (Vol. 7). Clevedon, UK: Multilingual Matters Limited.
Moyer, A. 2011. An Investigation of experience in L2 phonology: Does quality matter more than quantity? Canadian Modern Language Review/La Revue canadienne des langues vivantes 67(2), 191–216.
Piller, I. 2002. Passing for a native speaker: Identity and success in second language learning. Journal of Sociolinguistics 6(2), 179–208.
Piske, T., MacKay, I.R., & Flege, J.E. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of phonetics 29(2), 191–215.
Tarone, E. 2000. Still wrestling with ‘context’ in interlanguage theory. Annual Review of Applied Linguistics 20(1), 182–198.
Walter, C. 2008. Phonology in second language reading: not an optional extra. TESOL Quarterly 42(3), 455–474.

Part I: The Nature of Accent

Alene Moyer

1 The Social Nature of L2 Pronunciation

Acquiring a new sound system after a certain age is universally seen as difficult. Few late language learners end up sounding native-like, even if they reside many years in the target language country. While researchers can identify various processes and features that characterize the early stages of phonological development, it remains something of a mystery why, even at the advanced stage, a foreign-sounding accent is such a persistent feature of otherwise fluent speech. It is surely due, in large part, to the sheer complexity of phonology. L2 learners must learn to distinguish unfamiliar sound categories and produce new sounds in sequences that sometimes contradict L1 phonological patterns. Moreover, they must realize that slight variations in intonation, rhythm, speech rate, etc. convey pragmatic, culture-specific levels of meaning. In other words, their communicative fluency relies on utilizing a host of segmental and suprasegmental features to good effect. Age of learning is thought to predict just how far they get in terms of such fluency. But given that L2 learners have individual goals when it comes to pronunciation, and circumstances may not effectively support the acquisition of these many features, it is no surprise that variation in long-term attainment is so widespread.

Accent, “a set of dynamic segmental and suprasegmental habits that convey linguistic meaning along with social and situational affiliation” (Moyer, 2013), is typically seen as either a cognitive skill learned through effortful practice and self-monitoring, or a highly personal reflection of identity and affiliation with the target language. The accent as skill perspective emphasizes the ability to form new perceptual categories. This ability is thought to be encumbered by maturational changes that are neuro-biological and/or cognitive in nature.
By contrast, the accent as identity view seeks to understand the importance of attitudes, motivation, and the extent of one’s desire to sound like a native speaker of the target language. It therefore seeks context-specific explanations for phonological attainment related to the depth and breadth of one’s social networks, one’s attitudes regarding both L1 and L2, and even external (listener) attitudes about L2 speakers and their accents. Both perspectives should be appreciated, because more than any other aspect of language, phonological acquisition draws on both cognitive and affective influences. Accent is both deeply internal – cognitively and psychologically – and inherently social in nature, regardless of the specific learning circumstances. This holds as true for the mother tongue(s) as it does for any subsequently learned languages (Moyer, 2013).

Pronunciation1 in another language is not simply a matter of making oneself understood on an acoustic level. Pronunciation conveys linguistic meaning at the same time that it indicates social identity and communicative stance. This is because the way we sound overall, our accent, is in many ways an expression of who we are, where we come from, and who we would like to be. L2 phonology2 is an area that therefore merits investigation on many levels. The segmental level is obvious in constructs like degree of accent, while a broader understanding of communicative, discursive fluency is front and center when we speak of suprasegmental skills. In L2 pronunciation research we find a correspondingly wide range of investigative approaches, from lab-based studies utilizing spectrographic and wave form analytics, to descriptive analyses of interactions between native and non-native speakers, to statistical and ethnographic studies of individual factors in long-term attainment. This opening chapter draws on several approaches to advance an understanding of pronunciation as a dynamic, inherently social phenomenon. To underscore the complexity of phonological skill building in a second language, we address several prominent issues in the research, including age as a factor in the ability to perceive and produce new sounds; social and psychological influences on pronunciation; and the communicative and social reception of accent. In so doing, the groundwork is laid for the in-depth analyses offered in the following chapters.

1 In this chapter, pronunciation is used somewhat interchangeably with accent, although accent can be seen as a more global construct, reaching beyond articulation to encompass communicative, pragmatic, and social meaning and performance.
2 Phonology is used here in a general way to refer to both phonetics and phonology, as is common in the research on second language accent, except when specific phonetic contrasts are emphasized.

1.1 Accent and Age of the Learner

As a rule, second language learners rely on the mother tongue(s) and previously learned languages as a knowledge base, drawing comparisons between these sound systems at the early stages of learning. Problems sometimes ensue. Features that are similar between L1 and L2 are oftentimes presumed to be identical, and finer L2 contrasts may not be noticed at all (e.g., the initial /t/ in terrific could sound like a /d/ if not sufficiently aspirated). Indeed, Flege (1995) has long argued that very similar features are far more difficult to notice than completely novel ones. If the learner does not perceive such nuances, she is not likely to articulate them. In some cases, altogether new categories must be acquired (typical English examples are the /æ/ as in cat and voiced and voiceless /ð, θ/ as in there and month). In other cases, familiar categories must be applied to new linguistic environments (e.g., voicing a final obstruent, so that bad does not sound like bat). Inappropriate substitutions and deletions – the most common developmental strategies – may allow communication to move forward (see Hansen-Edwards, Ch. 3, this volume), but they also signal to the listener that the speaker is nonnative, which could evoke negative responses, depending on the context (see Lindemann, Litzenberg & Subtirelu, Ch. 8; Rubin & Kang, Ch. 11, this volume). Adding stress, segment length, intonation, speech rate, pitch, and rhythm to the mix only complicates the challenges. These appear to be even harder for learners to notice than segmental distinctions.

So, why is it so difficult to perceive and produce sounds and sound patterns in a new language? The most commonly cited reason for such difficulties is the learner’s age at first exposure to the target language. Lenneberg’s Critical Period Hypothesis (1967) suggested that the onset of puberty – around age 9 or 10 years – marks a turning point in the ability to fully recover, and by implication to acquire, language. He further predicted that phonology would present the greatest challenge in this regard. The assumption is that the neural cells related to phonological acquisition cease to be adaptive past a certain age. Put another way, the intractability of a foreign accent can be blamed on a decline in neural plasticity. Indeed, there is evidence that perceptual faculties become biased toward L1 within the first year of life (Werker & Pegg, 1992), but this does not rule out the ability to learn new categories later on. Escudero and Boersma (2004) demonstrate that adults can form specific phonetic biases in the L2 as a function of exposure frequency, for example.
In point of fact, the plasticity explanation seems to be losing ground in light of some compelling counterevidence. As a skill that requires both higher order (analytical) and lower order (motor-based) processing, phonology relies on multiple neuro-cognitive capacities, any of which could gradually decline over the learner’s lifetime (e.g., hearing and memory). Such declines are not associated with a specific event like maturation. Moreover, recent studies using MRI (magnetic resonance imaging) and ERP (event-related potentials) technologies indicate that the brain is actually responsive and dynamic well into adulthood (see Herschensohn, 2007; Stowe & Sabourin, 2005). Sereno and Wang (2007) show that adults can be taught to accurately perceive tones in an unfamiliar tonal language after just two weeks of training, and that neural activity actually changes in response to new stimuli. In other words, new neural areas are recruited for processing novel sounds, and this shift is still apparent for some time following training. Another important finding in this study is that perception positively impacts production, highlighting the


brain’s continuing adaptability, even for phonological learning. Such evidence casts doubt on a strict interpretation of the critical period. One prominent theoretical debate is whether the faculties used for phonological learning early in life are still accessible later on, or whether other processing mechanisms step in to fill the gap. Best, McRoberts, and Sithole (1987) found that English-speaking adults can be taught to accurately discriminate African click sounds. A cognitive processing argument would suggest that the click category was so unfamiliar it triggered greater ‘noticing’, which then enhanced perception (see also Hancin-Bhatt, 1994). Simply put, what we observe as age-related constraints on categorical perception are really just processing biases, i.e., habits that have solidified over time with repeated exposure. Recent neural imaging data confirm that although late L2 learners process the target language differently from native speakers, this is likely a function of experience rather than age at first exposure (Birdsong, 2006; Steinhauer, White & Drury, 2009; see also Muñoz, 2006). Numerous studies have analyzed the correlation between age and imitative ability (Markham, 1997; Reves, 1978; Thompson, 1991), usually on the basis of tasks wherein words and phrases are first modeled by a native speaker. This is not a true test of attainment, but this kind of immediate, decontextualized performance may demonstrate a connection between perceptual ability and production. The idea that some people are uniquely equipped to master a new sound system as a function of musical ability is also intriguing, but hard evidence is lacking (see Piske, MacKay & Flege, 2001). Perhaps those we think of as ‘talented’ mimics have a special sensitivity to rhythm and melody. 
These prosodic features are primarily processed in the right hemisphere (see Nardo & Reiterer, 2009), so exceptional learners may simply be bilateral rather than ‘left-hemisphere dominant’ in their language processing (see discussion in Moyer, 2013). By other short-term measures, age effects are not consistent with a strict critical period; older learners tend to outperform younger ones on a range of phonology-related tasks (Snow & Hoefnagel-Hoehle, 1982). Looking longer term, Garcia-Lecumberri and Gallardo (2003) studied Basque-Spanish bilingual school children of various age groups acquiring English in a classroom. Targeting the third year of instruction, the authors showed that those who had started learning English as 11-year-olds were rated more intelligible and less foreign-sounding, and they also excelled on a vowel and consonant perception task compared to the two younger groups studied (Age of Onset of 8 years and 4 years, respectively). By most measures, the younger groups did not differ from one another significantly. In another set of tasks on similar groups of learners (also part of the Barcelona Age Factor Project database), Fullana (2006) found that older learners were significantly better on perception tasks regardless of amount of exposure (in classroom hours); however, age differences were negligible after a certain amount of




instruction. Neither younger nor older learners were on par with native speaker controls for any of the tasks (see also Muñoz, 2011). Two conclusions are noteworthy here: (a) instruction can mitigate age effects; and (b) early exposure does not guarantee native-like attainment. Across studies, statistical analyses verify a significant relationship between pronunciation and age of onset (AO) with the target language (Asher & Garcia, 1969; Oyama, 1979; cf. Purcell & Suter, 1980), but numerous factors co-vary with AO. Some are external to the learner  – length of residence and access to instruction, to name two – and some are internal such as attitudes and motivation. Indeed, all of these factors are significant for pronunciation attainment in their own right (Bongaerts et al., 1995, 1997; Flege & Liu, 2001; Moyer, 1999, 2007; Oyama, 1976; Purcell & Suter, 1980; Trofimovich & Baker, 2006). Importantly, some studies have actually tested age against its concomitant factors (see Flege & Liu, 2001) and find it to be less significant than language contact and use (Flege et al., 1999; Moyer, 1999, 2004, 2007, 2011). The need to carefully analyze factors that overlap with AO cannot be overstated. Let us take length of residence as an example. Generally speaking, the younger the arrival, the longer the residence, which certainly affects the quality of target language experience (not to mention one’s motivation to sound nativelike). Younger arrivals predictably enjoy the benefits of formal schooling in the target language and quickly form friendships, thereby expanding their opportunities to learn and use L2 in many different contexts. This surely paves the way for a new sense of self in the target language. By comparison, late arrivers tend to have fewer opportunities to interact meaningfully with native speakers, and must work hard to build social networks that support language acquisition. 
Also, Moyer (2004) has verified that length of residence relates significantly to personal motivation to acquire the target language, and to an overall sense of satisfaction with pronunciation. On its own, a ‘simple’ metric like length of residence (LOR) cannot be taken at face value. Among late arrivers, LOR is unlikely to show significance for pronunciation in the mid-range of 2–5 years, but from about 8–10 years on, its significance is robust, as seen in Flege et al. (1999) and Moyer (2007). This is indicative of a shift in language dominance – a symbolic turning point that deserves much closer examination. L2 dominance implies that the target language has taken on the emotional and social functions typically associated with the mother tongue, and to the extent that this happens, “the relationship between accent and LOR becomes clear and predictable” (Moyer, 2008: 173). Those who speak the target language in the home are surely not comparable to those whose L2 use is far more circumscribed, and whose affiliation with the target language community is limited.


The age effects paradigm is beginning to acknowledge these realities, especially the idea that interaction is key to phonological learning (see Best & Tyler, 2007; Kuhl, 2007; Kuhl et al., 2008). Regarding first language acquisition, Kuhl (2007) specifically connects social interaction in the mother tongue with higher levels of attention and motivation, which in turn enhance the infant’s ability to remember and encode new phonological information. For L2 acquisition, Moyer (2011) demonstrates that accent correlates most significantly to contact with native speakers, especially as that represents interactive L2 use in various domains. This emphasis on interactive experience is also relevant for classroom learners. For eight semesters, Sardegna (2009) studied ESL learners who had taken a pronunciation course, testing their pronunciation at various intervals along the way, and for up to three years post-instruction. Her qualitative data suggest that the variation in long-term attainment can be attributed to an individual learner’s sense of urgency toward improving pronunciation and to the quantity and quality of target language practice. By now it is clear that age is not a unitary explanation for phonological attainment given that it co-varies with many affective, cognitive, and experiential factors. Moreover, early L2 exposure does not guarantee native-like mastery, and some late learners do end up sounding like native speakers (Bongaerts et al., 1995, 1997; Ioup et al., 1994; Muñoz & Singleton, 2007; Moyer, 1999; Nikolov, 2000; Purcell & Suter, 1980). Recent evidence suggests that age-related disparities in phonological attainment have much to do with the learner’s orientation toward, and experience with, the target language. The researcher’s task is therefore to better understand why some learners desire to sound native-like, and how they utilize the resources at hand to improve pronunciation.

1.2 Attitudes, Identity, and Agency in L2 Pronunciation

Language attitudes, the circumstances of L2 learning, and the learner’s own awareness of accent are rarely addressed in age effects research, yet are crucial for understanding how the learner approaches pronunciation. In the field of SLA generally, individual differences are now acknowledged as more revelatory than universal patterns of development, for it is differences in decision-making and goal-setting that set some L2 users on the path to more native-like fluency. Their views of the target language culture, and even of accent itself, all inform their approach to phonological skill-building (see LeVelle & Levis, Ch. 5, this volume). The significance of affective factors was famously explored by Guiora and colleagues in the 1970s in a series of experiments designed to probe the ‘language ego’ as related to pronunciation (Guiora et al., 1972, 1980). Many subsequent (less




controversial) investigations have since confirmed the connection between pronunciation accuracy and attitudes, motivation, and the desire to affiliate with the target language community (Bongaerts et al., 1995; Flege et al., 1999; Moyer, 1999, 2004, 2007; Purcell & Suter, 1980). Evidence shows that both integrative and instrumental motivation make a positive difference for accent, but what likely matters more is a sustained desire over time to improve one’s fluency and/or to sound like a native speaker (see Moyer, 2007; Muñoz & Singleton, 2007). L2 learners often say they want to sound native (e.g., Derwing, 2003, both cited in Derwing & Munro, 2009; Timmis, 2002), yet we know little of their specific efforts toward that goal. It is reasonable to assume that the desire to sound native encourages concrete behaviors directed at improving phonological accuracy, e.g., imitating native speakers, practicing aloud, asking for feedback, reflecting on problem sounds, increasing social contact, and so on (Moyer, 2004). Attitudes, sense of self, and cultural affiliation are certainly relevant for those who immigrate to another country and feel the need to take on a new language identity. Gardner has written that L2 learners must learn not only new information (vocabulary, grammar, pronunciation, etc.) but must also acquire “symbolic elements of a different ethnolinguistic community” [original italics] (1979: 193). Lybeck’s (2002) study of 9 adult American women living in Norway attests to this challenge. Only those who managed to establish solid networks got beyond their feelings of social distance. Lybeck maintains that as a result, their accents were the most authentic-sounding. Along similar lines, Hansen’s (1995) study of 20 German-born immigrants to the U.S. emphasizes language dominance, cultural integration, attitudes, and speech community, alongside length of residence.
Of these, speech community size and cohesiveness were particularly significant for degree of accent. Not surprisingly, fear of embarrassment and ridicule when speaking the target language correlated negatively to accent (p. 313), recalling Schumann’s (1975) prediction that anxiety contributes to a rigid ‘language ego’, increases inhibitions, and hinders the formation of close bonds with native speakers. The implications for language acquisition are clear. A political orientation toward one’s own ethnic group seems to affect L2 pronunciation as well. Gatbonton, Trofimovich and Segalowitz (2011) showed that among French Canadians in Quebec, strong support for their ethnic group’s political aspirations significantly correlated to a less accurate pronunciation of English /ð/. This pattern was strongest for beginners; advanced participants were significantly less political in their ethnic affiliation. The authors’ statistical analyses suggest that L2 use ‘mediates’ the relationship between ethnic group affiliation and oral proficiency; more contact translates to more opportunity for exposure and ‘attunement’ to the correct pronunciation (p. 198).


The struggle to find one’s place vis-à-vis a new language applies to instructional contexts as well, even among younger learners. Miller’s longitudinal, ethnographic study of high school ESL students in Australia highlights the fact that “in a context of second (or third) language acquisition, speaking audibly and without anxiety is an enormous challenge for most students” (2003: 141). Being reticent to speak, and exhibiting ‘resistance’ behaviors related to accent underscore the close connection between accent and identity. In an FL classroom, where the psychological stakes are relatively low by comparison to immersion contexts, some students purposefully mispronounce words to express solidarity with their peers (Lefkowitz & Hedgcock, 2002, 2006.) So, where we might assume that classroom (FL) learners are motivated by a desire for accuracy, they may in fact be more responsive to the discomfort felt by sounding like ‘someone else.’ In other discussions, identity is seen as highly personalized and fluid by virtue of the fact that it is continually (re)constructed with others (as in Norton Peirce, 1995; Pavlenko & Lantolf, 2000) . This means that the roles and discursive voices that we take on are neither static, nor strictly internal (Bucholtz & Hall, 2005). Miller has aptly described language acquisition in multilingual contexts as a messy process; we move “constantly across languages, sites and social memberships, mixing languages, learning languages, resisting languages, tentatively testing them out, and then reverting to more familiar linguistic and social territory” (2003: 142). Several close examinations of the identity/accent connection underscore this notion of fluidity. Marx (2002) offers an inside view of her own struggle to develop a voice in her surrounding community in Germany while studying abroad. She initially tried to pass as French while speaking German to avoid being labeled ‘American’. (As a Canadian, this was a significant concern for her.) 
Once she felt ready to attempt an authentically German accent, one year into her stay, she took on this task in earnest and by her own account ended up with several German features in her English pronunciation. When Marx eventually moved to New York three years later, she saw her German-influenced English as an advantage; it afforded her a more interesting status as an outsider. A similar case is presented by Major (1993), who analyzes the extraordinary language abilities of an American woman who had lived in Brazil for 12 years. Not only was her Portuguese native-like according to both listener perceptions and acoustic analyses, her (native) English phonology was influenced by Portuguese voice onset time (VOT)³ for /p, t, k/ according to casual speech tasks. Deeply immersed in Brazilian culture and language, she stood in contrast to other participants with much longer residences (34–35 years). Their pronunciation was still markedly foreign-sounding, and their assimilation process was clearly stymied by comparison. At the time of data collection, however, the exceptional learner was experiencing a phase of disillusionment with Brazil and moved back to the U.S. shortly thereafter. She subsequently lost all remnants of a Portuguese accent in her (native) English, and being fully aware of the change, reported that this was a result of her renewed sense of affiliation with the U.S.

3 Voice onset time (VOT) is the time elapsed between the release of airflow, or burst, from a closure and the beginning of vocal cord vibration. Measured in milliseconds, it can distinguish phonemes, e.g., /b/ and /p/.

These stories reveal very conscious, ongoing struggles to control pronunciation in both L1 and L2 for social and psychological reasons. Some late learners can manipulate accent, taking pride in their ability to adopt regional and/or social pronunciation patterns in the target language (Piller, 2002; Rindal, 2010). Others simply ‘play’ at sounding native only when outside the native-speaking community (as seen in Moyer, 2004). But negotiating an authentic ‘voice’ is not simply a matter of aligning oneself with others; it is also about maintaining one’s individuality, as explored in Cutler’s chapter (this volume). In other words, authenticity may be more about sounding not-quite- or even not-at-all native; not everyone is willing to give up their own identity and history in an effort to blend in with their L2 community (see Grazia Busa, 2010). For example, Moyer’s (2004) qualitative analysis describes two Turkish immigrants to Berlin with vastly different approaches to cultural assimilation. Although both immigrated by age four (well within the critical period), only the one who had assimilated culturally was rated as native-like for accent across all speaking tasks; the other rejected German culture and was judged to have a strong foreign accent.
Approaches toward language learning and cultural assimilation were also compared for two late learners in the same study, with similar results: One judged to be native-like for most tasks had fully embraced the language as integral to her sense of self despite a number of social and psychological setbacks; the other had consciously chosen to maintain a strong American accent out of discomfort with German culture, even though this conflicted with his professional goals of teaching the language. It is only through such in-depth, ethnographic analyses that we come to appreciate the complex decision-making that shapes long-term phonological development. While no one would claim that all who wish to sound native-like can attain their goal, those who do end up sounding native-like surely have sought this consciously, for without a deep desire to affiliate with the L2 group, pronunciation skills are unlikely to develop past a point of reasonable intelligibility.


1.3 The Reception of L2 Pronunciation

A more holistic appreciation of both processes and outcomes in language acquisition is gaining ground in the SLA literature of late. In light of this shift, a critical contribution of this volume is to situate individual attainment phenomena in their appropriate social frameworks. I therefore address the reception of accent, i.e., accent-related attitudes and their consequences, which is more fully developed in several chapters in Part II of the volume. Our ears apparently have an extraordinary filter when it comes to distinguishing L2 users from native speakers, even for speech samples as brief as 30 seconds (Flege, 1984, cited in Derwing & Munro, 2009). Invariably, perceptions of prestige and social attractiveness arise when we encounter a speaker whose accent is different from our own. These perceptions have much to do with ethnicity, social class, and gender, among other traits (Lindemann, 2003; Rubin, 1992; see also Thomas, this volume). Moreover, decades of empirical work confirm that a priori notions of attractiveness and prestige mediate our responses – and responsiveness – to accent (Giles & Coupland, 1991; Niedzielski & Preston, 2003). As interlocutors, we may be less willing to accommodate others linguistically and communicatively if we see them in a negative light. Rather than trying to approximate the speech patterns of our interlocutor, we may purposely signal distance by avoiding adjustments in vowel and consonant qualities, speech rate, and pause and pitch patterns (see Coupland, 1984). The point is that accent is communicatively dynamic; it can fluctuate in a given situation to signal speaker stance and identity, just as it fluctuates over time as a function of social and cultural affiliation. To claim, or even imply, that all L2 users should be judged against a native-like target is controversial, indeed.
There is now a fervent call to abandon the ‘nativeness principle’ (Levis, 2005), and to embrace a standard of intelligibility instead. Intelligibility is a measure, not of acoustic approximation to a native target, but of the listener’s actual ability to decode an utterance (Derwing & Munro, 2009: 478). Comprehensibility, by contrast, is said to correspond to the effort required to listen and understand (i.e., whether extra measures are necessary for full comprehension). Intelligibility is also used in a general way to refer to a more politic and practical standard for L2 pronunciation, as compared to nativeness. The challenge for researchers is to define the parameters of intelligibility since this, too, is influenced to some degree by context and listener attitudes. Many criteria affect perceptions of intelligibility, including listener familiarity with the topic (Gass & Varonis, 1984); perceptions of speaker personality (Albrechtsen et al., 1980); background noise (Rogers et al., 2004; Van Wijngaarden et al., 2002); and even head movement and facial motions that correspond to pitch and loudness (Munhall et al., 2004).
Various linguistic criteria also come into play, in particular, syntactic complexity and mean length of utterance, voice onset time, syllable duration, word-level and phrasal stress, intonation, and pitch range. Derwing and Munro’s work (e.g., Derwing & Munro, 1997; Munro & Derwing, 1999) has shown that intelligibility may be judged on the basis of grammar as well (see also Ensz, 1982; Gynan, 1985). Looking across studies, it seems that speech rate is more important than any other single criterion (Anderson-Hsieh & Koehler, 1988; Derwing & Munro, 1997; Kang, 2010; Munro & Derwing, 1998, 2001; Trofimovich & Baker, 2006; see Hayes-Harb, this volume). In Chapter 2, Hayes-Harb elaborates on the many acoustic aspects that affect listener perceptions of non-native speech, confirming that even a seemingly simple construct like ‘degree of accent’ is anything but simple, and intelligibility is even harder to nail down. Once we consider discourse-level features, intelligibility becomes even more fluid; every feature seems capable of sending its own communicative, pragmatic, or social message beyond the intended semantic one. Using an unexpected pitch range can make the difference between sounding angry or amused, for example. L2 users may not be fully versed in the effects such a mismatch has on their listeners (see Pickering & Baker, Ch. 4, this volume). For example, many L2 users do not consistently use pitch contrasts to signal meaning or increase pitch levels at rhetorical junctures that correspond to topic shifts (Wennerstrom, 1994, 1998). As Pickering asserts, “intonational cues are particularly vulnerable to misinterpretation” and can trigger negative responses if they are heard as rude, uncooperative or unfriendly (2009: 237).
Research on the intelligibility of international teaching assistants points to discursive and intonational misfires that lead to comprehension problems, and give the impression that the ITA is unfriendly and ill-equipped to teach (see Lindemann, Litzenberg & Subtirelu, Ch. 8, this volume). So it seems that intelligibility is significantly related to context, and listeners are less tolerant of a foreign accent in formal situations (see Ryan et al., 1975). In other words, what we as listeners ‘tolerate’ in terms of degree of accent may depend on whether we are listening in a low-stakes or a high-stakes setting. Miller’s analysis of her own conversations with a non-native speaker of English in an Australian school setting underscores the frustration that can ensue, even when both interlocutors have the best intentions for getting beyond communication breakdowns (2003). Does familiarity with a given foreign accent correlate to greater intelligibility? Smith and Rafiqzad (1979) found that speakers with the same mother tongue background as their listeners were judged to be more comprehensible (as did Bent & Bradlow, 2003; Derwing & Munro, 2005; Kennedy & Trofimovich, 2008), but the linguistic complexity of the speech excerpt played a significant role. It is surely


no coincidence that in their study, native speakers generated much longer, more complex utterances, and they were rated less comprehensible than the accented, non-native speakers. Hayes-Harb et al. (2008), Major et al. (2005), Munro et al. (2006), and Van Wijngaarden et al. (2002) have all observed that speakers with the same linguistic background as their listeners had no advantage in this regard. In fact, Major et al. (2005) found that Chinese listeners judged Chinese-accented speakers of English less comprehensible than Spanish- and Japanese-accented ones (see also Ortmeyer & Boyle, 1985). While a strong foreign accent is not necessarily unintelligible (as Derwing and Munro have pointed out), there is much to clarify about the relationship between accent and actual comprehension. As degree of accent increases, listeners become more lenient in how they judge the gravity of pronunciation errors, according to Schmid and Yeni-Komshian (1999), but leniency notwithstanding, stronger accents can have a negative effect on comprehension and recall. We cannot directly observe what goes on in the listener’s mind as he processes nonnative speech, of course, and the end result may give a false impression. In Riney, Takagi and Inutsuka’s (2005) study, Americans who listened to Japanese-accented English speech tended to assign accent ratings based on segmental accuracy while Japanese listeners did so on the basis of intonation, speech rate, and sentence duration. Nevertheless, these two groups ended up with very similar ratings. Furthermore, Kennedy and Trofimovich (2008) point to listeners either familiar or unfamiliar with specific accents who also arrived at similar ratings for intelligibility even though they differed in terms of actual comprehension (discussed in Moyer, 2013). Current investigations continue to explore the acoustic features – both segmental and suprasegmental – that contribute to perceptions of intelligibility and accentedness.
Without a doubt, a priori perceptions come into play as well, as shown by Luk’s (2010) study of TESOL teachers’ attitudes toward pronunciation errors. First and foremost, her data reveal that a bias toward Received Pronunciation is alive and well within World English contexts. In an effort to get at the particulars of their biases, subjects were asked to judge a number of Hong Kong English features as ‘errors’ vs. ‘L2 accent’, and then determine whether such features were socially stigmatized. Luk’s Chinese subjects tended to cite L2 segmental features as ‘errors’ that affect intelligibility, while the suprasegmental features were more likely to be classified as stigmatized. The native speakers of English were more tolerant of deviations from RP, by comparison, and at the same time, they found suprasegmental features to be far more significant for intelligibility than segmentals. Kang (2010) has similarly shown that native speakers are more critical of features like reduced pitch range and inaccurate phrasal stress patterns (e.g., when emphasis is placed on function words like the).




The question of pronunciation standards in World Englishes has garnered much attention in the literature, especially as we realize that bilingual speakers are evaluated in terms of their approximation to Inner Circle norms. Harrison’s (2012) study confirms that L2 users are quite aware that ‘not all Englishes have the same market value’, depending on geographical location. Even in Australia (an Inner Circle⁴ location), British English has higher status than does American English (p. 6). The upshot of this reality is that teachers, as well as researchers, need to focus more on what learners themselves set out to achieve vis-à-vis pronunciation, and whether pronunciation standards are best determined at the local level (see Harrison, this volume). And although it is true that native listeners could be taught greater tolerance for accents (see Derwing & Munro, Ch. 10, this volume), L2 users would likewise benefit from a greater appreciation for the myriad criteria that affect perceptions of accent (see LeVelle & Levis, Ch. 5, this volume). Furthermore, while teachers may conceive of pronunciation as a matter of comprehensibility (Chuang, 2008), classroom learners nevertheless tend to think of accent as a matter of segmental accuracy. We must therefore consider how teachers can best address this gap, and raise awareness of suprasegmentals as an essential part of discursive fluency (see Hardison, Ch. 9, this volume). Munro (2003) has stated that “having an accent is a normal aspect of second language learning, particularly for adult learners” (p. 48), and yet we know that language identity and how one sounds can affect professional credibility and the ability to advance at the workplace, as in other contexts (Harrison, 2012, and Ch. 8, this volume).
One of the ironies of L2 pronunciation is that those who have mastered it reasonably well tend to have greater confidence in their language abilities overall and to seek out contact with native speakers in order to build social networks, which further increases fluency (Moyer, 2004). The circularity is obvious, and it works in reverse as well: a strong foreign accent can be a barrier to further advancement if it causes self-consciousness and discourages L2 users from pursuing a deeper connection to the target language community. These kinds of issues are now coming to the forefront in the research on L2 pronunciation, as researchers realize that, universal constraints notwithstanding, getting to a point of real fluency relies on the depth of one’s opportunities to engage with native speakers, with all that represents socially and psychologically.

4 ‘Inner Circle’ – a reference to Great Britain, the U.S., Canada, Australia, etc. – signifies those locations where English is traditionally said to be spoken ‘natively’, whereas the ‘Outer Circle’ refers to countries where English is spoken as a second language (India, Pakistan, etc.) (Kachru, 1992).


References

Albrechtsen, D., Henriksen, B. & Faerch, C. 1980. Native speaker reactions to learners’ spoken interlanguage. Language Learning 30, 365–396.
Anderson-Hsieh, J. & Koehler, K. 1988. The effect of foreign accent and speaking rate on native speaker comprehension. Language Learning 38, 561–613.
Asher, J. & Garcia, R. 1969. The optimal age to learn a foreign language. Modern Language Journal 53, 334–341.
Bent, T. & Bradlow, A. 2003. The interlanguage speech intelligibility benefit. Journal of the Acoustical Society of America 114, 1600–1610.
Best, C., McRoberts, G. & Sithole, N. 1987. Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Haskins Laboratories Status Report on Speech Research 91, 1–29.
Best, C. & Tyler, M. 2007. Non-native and second language speech perception. Commonalities and complementarities. In O.-S. Bohn & M. Munro (Eds.), Language experience and second language speech learning. In honor of James E. Flege (pp. 13–34). Amsterdam: John Benjamins.
Birdsong, D. 2006. Age and L2 acquisition and processing. Language Learning 56, 9–49.
Bongaerts, T., Planken, B. & Schils, E. 1995. Can late starters attain a native accent in a foreign language? A test of the critical period hypothesis. In D. Singleton & Z. Lengyel (Eds.), The age factor in second language acquisition (pp. 30–50). Clevedon, UK: Multilingual Matters.
Bongaerts, T., Summeren, C., Planken, B. & Schils, E. 1997. Age and ultimate attainment in the production of foreign language. Studies in Second Language Acquisition 19, 447–465.
Bucholtz, M. & Hall, K. 2005. Identity and interaction: A sociocultural linguistic approach. Discourse Studies 7, 585–614.
Chuang, Y. 2008. Speaking assessment: A study of how language teachers test students’ foreign language oral proficiency and their attitudes toward teaching speaking in the EFL classroom. (Unpublished doctoral dissertation). Texas A & M University.
Coupland, N. 1984. Accommodation at work: Some phonological data and their implications. International Journal of the Sociology of Language 46, 49–70.
Derwing, T. & Munro, M. 1997. Accent, intelligibility, and comprehensibility. Studies in Second Language Acquisition 20, 1–16.
Derwing, T. & Munro, M. 2005. Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly 39, 379–397.
Derwing, T. & Munro, M. 2009. Putting accent in its place: Rethinking obstacles to communication. Language Teaching 42, 476–490.
Ensz, K. 1982. French attitudes toward typical speech errors of American speakers of French. Modern Language Journal 66, 133–139.
Escudero, P. & Boersma, P. 2004. Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition 26, 551–585.
Flege, J. 1995. Second language speech learning: Theory, findings and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 229–273). Timonium, MD: York Press.
Flege, J. & Liu, S. 2001. The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition 23, 527–552.




Flege, J., Yeni-Komshian, G. & Liu, S. 1999. Age constraints on second-language acquisition. Journal of Memory and Language 41, 78–104. Fullana, N. 2006. The development of English (FL) perception and production skills: Starting age and exposure effects. In C. Muñoz (Ed.), Age and the rate of foreign language learning (pp. 41–64). Clevedon, UK: Multilingual Matters. Garcia-Lecumberri, M. & Gallardo, F. 2003. English FL sounds in school learners of different ages. In M. Garcia Mayo & M. Garcia Lecumberri (Eds.), Age and the acquisition of English as a foreign language (pp. 115–135). Clevedon, UK: Multilingual Matters. Gardner, R. 1979. Social psychological aspects of second language acquisition. In H. Giles & R. St. Clair (Eds.), Language and social psychology (pp. 193–220). Baltimore: University Park Press. Gass, S. & Varonis, E. 1984. The effect of familiarity on the comprehensibility of non-native speech. Language Learning 34, 65–89. Gatbonton, E., Trofimovich, P. & Segalowitz, N. 2011. Ethnic group affiliation and patterns of development of a phonological variable. Modern Language Journal 95, 188–204. Giles, H. & Coupland, N. 1991. Language: Contexts and consequences. Pacific Grove, CA: Brooks/Cole Publishing Co. Grazia Busa, M. 2010. Effects of L1 on L2 pronunciation: Italian prosody in English. Linguistic Insights – Studies in Language and Communication 96, 207–228. Guiora, A., Acton, W., Erard, R. & Strickland, F. 1980. The effects of Benzodiazepine (valium) on permeability of language ego boundaries. Language Learning 30, 351–363. Guiora, A., Beit-Hallahmi, B., Brannon, R., Dull, C. & Scovel, T. 1972. The effects of experimentally-induced changes in ego states on pronunciation ability in second language: An exploratory study. Comprehensive Psychiatry 13, 421–28. Gynan, S. 1985. Comprehension, irritation and error hierarchies. Hispania 68, 160–165. Hancin-Bhatt, B. 1994. Segment transfer: A consequence of a dynamic system. Second Language Research 10, 241–269.
Hansen, D. 1995. A study of the effect of the acculturation model on second language acquisition. In F. Eckman, D. Highland, P. Lee, J. Mileham & R. Rutkowski-Weber (Eds.), Second language acquisition theory and pedagogy (pp. 305–316). Psychology Press. Harrison, G. 2012. “Oh, you’ve got such a strong accent”: Language identity intersecting with professional identity in the human services in Australia. International Migration, ISSN 0020–7985. Hayes-Harb, R., Smith, B., Bent, T. & Bradlow, A. 2008. The interlanguage speech intelligibility benefit for native speakers of Mandarin: Production and perception of English word-final voicing contrasts. Journal of Phonetics 36, 664–679. Herschensohn, J. 2007. Language development and age. Cambridge, UK: Cambridge University Press. Ioup, G., Boustagui, E., El Tigi, M. & Moselle, M. 1994. Re-examining the critical period hypothesis: A case study of successful adult SLA in a naturalistic environment. Studies in Second Language Acquisition 16, 73–98. Kachru, B. 1992. Models for non-native Englishes. In B. Kachru (Ed.), The other tongue (pp. 48–74). Chicago: University of Illinois Press. Kang, O. 2010. Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System 38, 301–315.


Kennedy, S. & Trofimovich, P. 2008. Intelligibility, comprehensibility, and accentedness of L2 speech: The role of listener experience and semantic context. Canadian Modern Language Review 64, 459–489. Kuhl, P. 2007. Cracking the speech code: How infants learn language. Acoustical Science & Technology 28, 71–83. Kuhl, P., Conboy, B., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M. & Nelson, T. 2008. Phonetic learning as a pathway to language: New data and Native Language Magnet Theory Expanded (NLM-e). Philosophical Transactions: Biological Sciences 363, 979–1000. Lefkowitz, N. & Hedgcock, J. 2002. Sound barriers: Influences of social prestige, peer pressure and teacher (dis)approval on FL oral performance. Language Teaching Research 6, 223–244. Lefkowitz, N. & Hedgcock, J. 2006. Sound effects: Social pressure and identity negotiation in the Spanish language classroom. Applied Language Learning 16, 13–38. Lenneberg, E. 1967. Biological foundations of language. New York: Wiley & Sons. Levis, J. 2006. Pronunciation and the assessment of spoken language. In R. Hughes (Ed.), Spoken English, TESOL and applied linguistics (pp. 245–270). New York: Palgrave Macmillan. Lindemann, S. 2003. Koreans, Chinese or Indians? Attitudes and ideologies about non-native English speakers in the United States. Journal of Sociolinguistics 7, 348–364. Luk, J. 2010. Differentiating speech accents and pronunciation errors – Perceptions of TESOL professionals in Hong Kong. Hong Kong Journal of Applied Linguistics 12, 25–44. Lybeck, K. 2002. Cultural identification and second language pronunciation of Americans in Norway. Modern Language Journal 86, 174–191. Major, R. 1993. Sociolinguistic factors in loss and acquisition of phonology. In K. Hyltenstam & A. Viberg (Eds.), Progression and regression in language: Sociocultural, neuropsychological and linguistic perspectives (pp. 463–478). Cambridge, UK: Cambridge University Press. Major, R., Fitzmaurice, S., Bunta, F. & Balasubramanian, C. 
2005. The effects of nonnative accents on listening comprehension: Implications for ESL assessment. TESOL Quarterly 36, 173–190. Markham, D. 1997. Phonetic imitation, accent, and the learner. Travaux de l’Institut de Linguistique de Lund 33, 3–269. Marx, N. 2002. Never quite a ‘native speaker’: Accent and identity in the L2 – and the L1. Canadian Modern Language Review 59, 264–281. Miller, J. 2003. Audible difference. ESL and social identity in school. Clevedon, UK: Multilingual Matters. Moyer, A. 1999. Ultimate attainment in L2 phonology: The critical factors of age, motivation and instruction. Studies in Second Language Acquisition 21, 81–108. Moyer, A. 2004. Age, accent and experience in second language acquisition. An integrated approach to critical period inquiry. Clevedon, UK: Multilingual Matters. Moyer, A. 2007. Do language attitudes determine accent? A study of bilinguals in the U.S. Journal of Multilingual and Multicultural Development 28, 1–17. Moyer, A. 2008. Conceptions of L2 phonology: Integrating cognitive and sociolinguistic approaches to research and teaching. In S. Katz & J. Watzinger-Tharp (Eds.), Conceptions of L2 grammar: Theoretical approaches and their application in the L2 classroom (pp. 51–67). Boston, MA: Heinle Cengage Learning.




Moyer, A. 2011. An investigation of experience in L2 phonology. Canadian Modern Language Review 67, 191–216. Moyer, A. 2013. Foreign accent: The phenomenon of non-native speech. Cambridge, UK: Cambridge University Press. Munhall, K., Jones, J., Callan, D., Kuratate, T. & Vatikiotis-Bateson, E. 2004. Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science 15, 133–137. Muñoz, C. 2006. The effects of age on foreign language learning: The BAF project. In C. Muñoz (Ed.), Age and the rate of foreign language learning (pp. 1–40). Clevedon, UK: Multilingual Matters. Muñoz, C. 2011. Input and long-term effects of starting age in foreign language learning. IRAL 49, 113–133. Muñoz, C. & Singleton, D. 2007. Foreign accent in advanced learners: Two successful profiles. EUROSLA Yearbook 7, 171–190. Munro, M. 2003. A primer on accent discrimination in the Canadian context. TESL Canada Journal 20, 38–51. Munro, M. & Derwing, T. 1998. The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning 48, 159–182. Munro, M. & Derwing, T. 1999. Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. In J. Leather (Ed.), Phonological issues in language learning (pp. 285–310). Malden, MA: Blackwell. Munro, M. & Derwing, T. 2001. Modeling perceptions of the accentedness and comprehensibility of L2 speech. Studies in Second Language Acquisition 23, 451–468. Munro, M., Derwing, T. & Morton, S. 2006. The mutual intelligibility of L2 speech. Studies in Second Language Acquisition 28, 111–131. Nardo, D. & Reiterer, S. 2009. Musicality and phonetic language aptitude. In G. Dogil & S. Reiterer (Eds.), Language talent and brain activity (pp. 213–255). Berlin: Mouton de Gruyter. Niedzielski, N. & Preston, D. 2003. Folk linguistics. Berlin: Mouton de Gruyter. Nikolov, M. 2000.
The Critical Period Hypothesis reconsidered: Successful adult learners of Hungarian and English. IRAL 38, 109–124. Norton Peirce, B. 1995. Social identity, investment and language learning. TESOL Quarterly 29, 9–31. Ortmeyer, C. & Boyle, J. 1985. The effect of accent differences on comprehension. RELC Journal 16, 48–53. Oyama, S. 1976. A sensitive period for the acquisition of a non-native phonological system. Journal of Psycholinguistic Research 5, 261–283. Pavlenko, A. & Lantolf, J. 2000. Second language learning as participation and the (re) construction of selves. In J. Lantolf (Ed.), Sociocultural theory and second language learning (pp. 155–178). Oxford, UK: Oxford University Press. Pickering, L. 2009. Intonation as a pragmatic resource in ELF interaction. Intercultural Pragmatics 6, 235–255. Piller, I. 2002. Passing for a native speaker: Identity and success in second language learning. Journal of Sociolinguistics 6, 179–206. Piske, T., MacKay, I. & Flege, J. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29, 191–215.


Purcell, E. & Suter, R. 1980. Predictors of pronunciation accuracy: A re-examination. Language Learning 30, 271–287. Reves, R. (1978, August). The ability to imitate as a characteristic of the good language learner. Paper presented to Fifth International Congress of Applied Linguistics (AILA), Montréal. Rindal, U. 2010. Constructing identity with L2: Pronunciation and attitudes among Norwegian learners of English. Journal of Sociolinguistics 14, 240–261. Riney, T., Takagi, N. & Inutsuka, K. 2005. Phonetic parameters and perceptual judgments of accent in English by American and Japanese listeners. TESOL Quarterly 39, 441–466. Rogers, C., Dalby, J. & Nishi, K. 2004. Effects of noise and proficiency on intelligibility of Chinese-accented English. Language and Speech 47, 139–154. Rubin, D. 1992. Nonlanguage factors affecting undergraduates’ judgments of nonnative English-speaking teaching assistants. Research in Higher Education 33, 511–531. Ryan, E., Carranza, M., & Moffie, R. 1975. Mexican American reactions to accented English. In J. Berry & W. Lonner (Eds.), Applied cross-cultural psychology: Selected papers from the Second International Conference of the IACCP. Amsterdam: Swets & Zeitlinger. Sardegna, V. 2009. Improving English stress through pronunciation learning strategies. (Unpublished doctoral dissertation). University of Illinois at Urbana Champaign. Schmid, P. & Yeni-Komshian, G. 1999. The effects of speaker accent and target predictability on perception of mispronunciations. Journal of Speech, Language and Hearing Research 42, 56–64. Schumann, J. 1975. Affective factors and the problem of age in second language acquisition. Language Learning 25, 209–235. Sereno, J. & Wang, Y. 2007. Behavioral and cortical effects of learning a second language. In O. Bohn & M. Munro (Eds.), Language experience in second language speech learning (pp. 239–258). Amsterdam: John Benjamins. Smith, L. & Rafiqzad, K. 1979. 
English for cross-cultural communication: The question of intelligibility. TESOL Quarterly 13, 371–380. Snow, C. & Hoefnagel-Höhle, M. 1982. The critical period for language acquisition: Evidence from second language learning. In S. Krashen, R. Scarcella & M. Long (Eds.), Child-adult differences in second language acquisition (pp. 93–111). Rowley, MA: Newbury House. Steinhauer, K., White, E. & Drury, J. 2009. Temporal dynamics of late second language acquisition: Evidence from event-related potentials. Second Language Research 25, 13–41. Stowe, L. & Sabourin, L. 2005. Imaging the processing of a second language: Effects of maturation and proficiency on the neural processes involved. IRAL 43, 329–353. Thompson, I. 1991. Foreign accents revisited: The English pronunciation of Russian immigrants. Language Learning 41, 177–204. Trofimovich, P. & Baker, W. 2006. Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28, 1–30. Van Wijngaarden, S., Steeneken, H. & Houtgast, T. 2002. Quantifying the intelligibility of speech in noise for non-native talkers. Journal of the Acoustical Society of America 112, 3004–3013. Wennerstrom, A. 1994. Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics 15, 399–420. Wennerstrom, A. 1998. Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition 20, 1–25.




Werker, J. & Pegg, J. 1992. Infant speech perception and phonological acquisition. In C. Ferguson, L. Menn & C. Stoel-Gammon (Eds.), Phonological development: Models, research, implications. (pp. 285–311). Timonium, MD: York Press.


Rachel Hayes-Harb

2 Acoustic-Phonetic Parameters in the Perception of Accent

2.1 Introduction

Adult native listeners are highly accurate in determining that a given speech sample was produced by a nonnative speaker (e.g., Flege, 1984; Munro, Derwing & Burgess, 2010), and tend to agree with one another concerning the degree to which a sample is accented relative to other speech samples (e.g., Derwing, Thompson & Munro, 2006). Despite this accuracy and consistency among listeners, however, we do not yet fully understand the properties of speech that underlie the perception of accent, or in L2 speech research, accentedness, which is defined here as “the extent to which an L2 learner’s speech is perceived to differ from native speaker…norms” (Munro & Derwing, 1998: 160). Accented speech can impact communication in a number of ways, among them increasing the processing difficulty experienced by the listener, interfering with the successful transmission of information, and even leading to communication breakdown. On the other hand, accented speech can also signal to the interlocutor that a person is a nonnative speaker, possibly triggering the use of nonnative-directed speech by the interlocutor, some features of which may be helpful to the nonnative speaker (Dickman, 2009). Accentedness is crucially distinguished from the related but distinct constructs of intelligibility, which can be defined as “the extent to which a speaker’s message is actually understood” (Munro & Derwing, 1995: 291), and comprehensibility, or “listeners’ perceptions of difficulty in understanding particular utterances” (Munro & Derwing, 1995: 291). Indeed, these constructs can be distinguished not only in terms of the tasks used to probe them; they may also be associated with distinct acoustic-phonetic features of the speech itself (see, e.g., Kang, 2010).
The literature on L2 speech production has considered the relationships among intelligibility, comprehensibility, accentedness, and/or the acoustic-phonetic properties of speech; the present discussion will be focused on the evidence regarding the relationship between acoustic-phonetic properties of speech and accentedness, which has received the fullest attention in previous research. The goal of this chapter is to review the research on speech-related factors, specifically focusing on studies which have investigated the relationship between the acoustic-phonetic properties of speech and subjective listener judgments of accentedness. In the following discussion, I will present the evidence for the relationship between accentedness judgments and the contributions of (1) segmental properties of speech, focusing on composite segment production, segment substitution, vowel quality and duration, and Voice Onset Time (VOT); and (2) nonsegmental properties of speech, including speech rate, pauses, word duration, composite prosody, lexical and phrasal stress, intensity, pitch and pitch range, and syllable structure. I will then discuss the methodological variation among the studies and its potential implications, in addition to the limitations imposed on our understanding of the relationship between acoustic-phonetic features and accentedness when we assess accentedness only from the perspective of native listeners.

2.2 Segmental Contributions to Accentedness

2.2.1 Composite segment production

While many studies have coded acoustic-phonetic parameters for segments in terms of specific segmental features such as VOT or vowel quality, others have used composite measures of segment production, in which a number of acoustic-phonetic (and phonological) parameters are grouped together. For example, Anderson-Hsieh, Johnson and Koehler (1992) investigated the effects of several types of what they call “pronunciation deviance” on “impressionistic judgments of pronunciation”. In their study, three experienced oral test raters assigned pronunciation scores on a seven-point scale¹ to previously recorded oral readings of a passage by sixty male nonnative speakers of English from eleven different native language backgrounds. Two graduate students with transcription training transcribed the sixty speech samples, and the authors calculated deviance from a native speaker norm using three phonetic and phonological categories: (a) segmental errors included phonemic (e.g., substituting /ʊ/ with /u/) and subphonemic (e.g., substituting /ɹ/ with /ɾ/) deviance in the production of vowels and consonants; (b) syllable structure errors included epenthesis, deletion, and metathesis; and (c) “overall prosody” was rated on the basis of an “overall impression of stress, rhythm, phrasing and intonation” (p. 542). Error rates were calculated as the number of errors divided by the number of possible errors. All three error types were significantly correlated with pronunciation ratings, and it was found that more segmental errors resulted in lower pronunciation scores.

1 The scale used by Anderson-Hsieh et al. (1992) differs somewhat from scales used in most of the other studies discussed in this chapter in that the lower end of the scale was labeled “heavily accented speech that was unintelligible” (thus combining accentedness and intelligibility); the high end of the scale, however, was simply labeled “near-native speech”.

Magen (1998), in a study employing separate composite measures for vowel and consonant production, investigated the contributions of several acoustic-phonetic properties of accented speech by comparing accentedness scores assigned to naturally produced sentences and computer-edited versions of the same speech samples. In this study, two native Spanish speakers (referred to as speaker ML and speaker HG) read English sentences out loud, and these productions were edited to make them sound more native-like in a variety of ways, including syllable structure (e.g., deletion of epenthetic schwa as in ‘[ə]speak’), vowel quality (e.g., reduction of unstressed vowels by shortening), consonant features (e.g., addition of aspiration), and lexical and phrasal stress (F0² resynthesis; only for speaker ML). The original and edited versions of the sentences were then rated for “closeness to English” by ten monolingual speakers of American English on a 7-point scale (closer to native English…less close to native English); raters were also shown the target sentence in writing. Magen (1998) found that for speaker ML, the edited speech was rated as significantly closer to English than the original sentences in all four editing categories (for at least one editing procedure per category). However, for speaker HG, edited speech was rated as significantly closer to English in only the syllable structure and consonant categories (vowel quality was not significant; stress was not considered). Thus the segment production edits for vowels and consonants did indeed impact the raters’ judgments of closeness to English; however, while the effect for consonants was found for both talkers, the effect for vowels was found for only one (speaker ML).

Kashiwagi and Snyder (2010) also counted vowel and consonant error rates separately.
In an earlier study by the same authors (Kashiwagi & Snyder, 2008), three American English-speaking college English teachers rated the accentedness of recordings of English passages read by twenty female native Japanese speakers on a seven-point accentedness scale (very strongly accented…no accent). Kashiwagi and Snyder (2010) then listened to the recordings and produced measures of the following: consonant error rate, vowel error rate, stress error rate, speech rate, intensity, pitch, and pitch range.³ They found that both consonant and vowel error rates (in addition to speech rate and intensity) significantly affected accentedness ratings.

2 F0, or fundamental frequency, is related to the perceptual property of pitch.
3 It is unclear how they determined what constituted an error.

Finally, in a study focused primarily on speech rate (and thus described in fuller detail under “Speech rate” below), Munro and Derwing (2001; experiment two) found that a measure of “phonological errors” (which took into account phonemic substitution, deletion and insertion but not subphonemic errors) contributed significantly to ratings of accentedness. (It should be noted that deletion and insertion are considered separately from segmental features, under the section “Syllable structure” below.)

In summary, the studies using a variety of composite measures of segmental accuracy discussed here have all reported a significant relationship between segmental accuracy and accentedness. Of course, these various composite measures of segment production are limited in that each addresses only a handful of the specific ways in which segment production may be accented. In addition, they implicitly assign equal weight to each feature. While it is likely that accentedness is determined by a number of features working together, as captured however imperfectly by the composite measures just discussed, it is also important to investigate the contributions of the individual components of segmental accuracy. The following sections present studies where various segmental features have been treated individually.
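The error-rate measure described above for Anderson-Hsieh et al. (1992), the number of errors divided by the number of possible errors, can be illustrated with a minimal Python sketch. The function name, category labels, and counts below are hypothetical and are not taken from any of the studies reviewed; the pooled figure is included only to show what it means for a composite measure to weight every opportunity for error equally.

```python
def error_rate(errors: int, opportunities: int) -> float:
    """Return errors as a proportion of the possible errors."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= errors <= opportunities:
        raise ValueError("errors must lie between 0 and opportunities")
    return errors / opportunities


# Hypothetical counts for one speaker: (errors made, possible errors).
counts = {
    "segmental": (12, 240),
    "syllable_structure": (6, 60),
}

# One rate per error category.
rates = {name: error_rate(e, n) for name, (e, n) in counts.items()}

# A pooled composite treats every opportunity alike, whatever its category.
pooled = error_rate(
    sum(e for e, _ in counts.values()),
    sum(n for _, n in counts.values()),
)

print(rates)   # per-category proportions
print(pooled)  # single composite proportion
```

In this invented case the pooled proportion sits much closer to the segmental rate than to the syllable structure rate, simply because there are more segmental opportunities; a composite of this kind therefore says little about which component drives listeners' judgments.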

2.2.2 Substitution

Segment substitutions were included as one of multiple features in some of the composite segment measures just discussed (e.g., Anderson-Hsieh et al., 1992; Kashiwagi & Snyder, 2010). However, Riney, Takada and Ota (2000) specifically assessed the influence of segment substitutions on accentedness judgments. Substitutions often arise due to mismatches between the phonemic inventories of learners’ first and second languages. For example, native Japanese speakers often substitute the Japanese apico-alveolar flap /ɾ/ for the English liquids /ɹ/ and /l/ (which are absent from the Japanese inventory). Riney et al. (2000) investigated the relationship between this substitution pattern and accentedness ratings in the English speech of native Japanese speakers. Eleven native Japanese speakers and three native English speakers participated in four English production tasks (word list reading, sentence reading, paragraph reading, and spontaneous speech). Accentedness for each speaker had been determined in an earlier study (Riney & Flege, 1998) by five native English speakers judging the productions from the sentence reading task on a nine-point scale (heavy foreign accent…no foreign accent). To determine the frequency of flap substitution, Riney et al. (2000) asked two native Japanese-speaking judges to listen to target words and to identify whether “any Japanese-type flap substitutions had occurred for English /ɹ/ and /l/” (p. 722). The target words were selected from the four production tasks (which included the sentence productions on the basis of which accentedness ratings had been determined). The authors then calculated substitution rates and found that there was a strong negative correlation between judgments of native-like accentedness and flap substitution. This finding, however, should be interpreted with some caution because there was little overlap between the speech samples used for determining accentedness and those used for determining flap substitution rates for a given speaker. More direct evidence for a relationship between flap substitution rates and accentedness would require that both measures come from the same speech sample.

2.2.3 Vowel quality and duration

Vowel quality and vowel duration also contribute to accentedness ratings; however, the influence of individual measures (F1⁴, F2, F2-F1, duration, etc.) appears to be somewhat vowel- and word-specific. Munro (1993) investigated this relationship in a study of 21 native Arabic speakers and two native English speakers who read lists of /bVt/ and /bVd/ words in carrier sentences. Five native English speakers who were also trained linguists (one of whom was the author) were presented with the target words, and for each word, were informed of the intended vowel. These judges were asked to rate just the vowel portion of the words, not global accentedness, on a scale from 0 (“sounded like a category other than the intended one”) to 100 (“sounded like a perfectly natural, unaccented exemplar of the target vowel”) (p. 55). In addition, F1, F2, F2-F1, vowel duration, movement in F1, and movement in F2 were measured for each vowel. The results for each Arabic speaker token were adjusted by calculating the difference between the measurements for that token and the English talker means. Munro found that the relationship between mean accentedness ratings and the various vowel measures differed by vowel. The specific pattern of results is quite complex; however, Munro’s (1993) findings may be summarized as follows: movement in F1 and vowel duration predicted accentedness for /eɪ/; F1 predicted accentedness for /ɪ, ɛ, æ/; and F1 and/or F2 movement predicted accentedness for /i, ɪ, æ/.

Wayland (1997) also studied the relationship between vowel quality and duration measures and accentedness. In this study, three native speakers of Thai and six native speakers of English read aloud a set of Thai words with five lexical tones in a sentence frame. These word productions were presented to three native speakers of Thai, who judged their accentedness on a five-point scale (strongest degree of accentedness…native-like production).
The productions were also submitted to acoustic analysis (VOT, F1, F2, F2-F1, vowel duration, F0 peak, F0 valley, and F0 range), and the measurements were transformed into differences between nonnative and native data (as in Munro, 1993). Overall, only F0 valley was significantly correlated with accentedness, and follow-up analyses revealed that significant predictors of accentedness varied widely from word to word. For example, F2 was a significant predictor for one high-tone word and across all tones of a second word, and F2-F1 was a significant predictor for one mid-tone word and one high-tone word. It is not clear why the different words, which were presented to the judges in isolation, elicited such different effects.

On the basis of these two studies, and given the mixed findings that they present, we might conclude that the relationship between accentedness and F1 and F2 and their movement is complex, and may differ depending not only on the vowel but also on the specific lexical items in which the vowels appear.

4 F1 and F2 are two of the resonance frequencies of the vocal tract, associated with vowel height and frontness/backness, respectively.
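The adjustment used in both Munro (1993) and Wayland (1997), in which each nonnative token is expressed as a difference from the native talker means, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the formant values are hypothetical, and the sketch assumes that the native means are computed over tokens of the same vowel, which the summaries above do not spell out.

```python
from statistics import mean


def difference_scores(nonnative_tokens, native_tokens):
    """Express each nonnative token as a deviation from the native means.

    Both arguments are lists of dicts mapping a measure name (e.g. "F1")
    to a value for one token of the same vowel.
    """
    measures = list(native_tokens[0].keys())
    # Mean of each acoustic measure over the native tokens.
    native_means = {m: mean(tok[m] for tok in native_tokens) for m in measures}
    # Nonnative value minus native mean, measure by measure.
    return [
        {m: tok[m] - native_means[m] for m in measures}
        for tok in nonnative_tokens
    ]


# Hypothetical formant values in Hz for one vowel (not data from the studies).
native = [{"F1": 400, "F2": 2000}, {"F1": 420, "F2": 2100}]
nonnative = [{"F1": 450, "F2": 1900}]

print(difference_scores(nonnative, native))
```

A positive difference means that the nonnative value exceeds the native mean for that measure and a negative difference means that it falls below it, so the sign indicates the direction of the deviation as well as its size.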

2.2.4 Voice Onset Time

Voice onset time (VOT) is a parameter that differs widely among languages, and therefore may be expected to contribute to accentedness. In all of the studies reviewed here except for Wayland (1997), the target language is English, a language with typically long VOT for voiceless plosives, and the native language of the nonnative speakers is Japanese or Spanish, languages with typically short VOT. For this reason, to the extent that VOT and accentedness are related, correlations might be expected to be positive, with longer VOTs associated with more native-like ratings.

Alba-Salas (2004) asked four monolingual English speakers to determine which /p, t, k/ tokens were produced by a native speaker of English and which by a “foreigner” (in a foreign accent detection task). The tokens were excised from word-initial position in English target words produced in a carrier sentence by six native speakers of American English and six native Spanish speakers. Alba-Salas correlated VOT and the accent detection judgments, revealing a nonsignificant negative relationship when the responses of all four English judges were considered together. When they were considered individually, one of the four judges exhibited a moderate significant positive relationship, where longer VOT values were associated with more “native speaker” responses; however, one judge exhibited a strong significant negative correlation, and the correlation was not significant for either of the other two judges.

Shah (2004), whose study focused primarily on nonsegmental contributions to accentedness (described in greater detail under “Lexical and phrasal stress” below), found a positive but nonsignificant correlation between VOT and accentedness in Spanish-accented English speech. Riney and Takagi (1999) similarly investigated the relationship between accentedness and VOT in the English speech of native Japanese speakers. In their study, which was based on production data first described in Riney and Flege (1998), VOT values for word-initial /p, t, k/ from monosyllabic words read in isolation were found to correlate positively with accentedness scores. However, while the positive correlations were significant for /p/ and /t/, they were not significant for /k/. Finally, in Wayland’s (1997) study of English-accented Thai, VOT did not emerge as a significant predictor of accentedness.

Thus the literature provides mixed results with respect to the influence of VOT on judgments of accentedness by native listeners. In general, research has revealed weak-to-moderate (and often nonsignificant) positive correlations between VOT and accentedness for learners of English as a second language. Surprisingly, though, Alba-Salas (2004) found an overall negative (but nonsignificant) correlation when all judges were considered together, and a significant negative correlation for one judge. While the present focus is on research where accentedness judgments are provided by native listeners, it is worth noting that a separate group of four native Spanish-speaking judges in the Alba-Salas study all exhibited positive VOT-accentedness correlations. Thus while the relationship between VOT and accentedness may be weak for native listeners, it may be stronger for nonnative listeners, though no conclusions about this pattern can be drawn given the small number of nonnative listeners in the study.
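The contrast between pooled and per-judge correlations reported for Alba-Salas (2004) can be made concrete with a small Python sketch. The VOT values and judge responses below are invented for illustration and are not data from that study; the function is a plain Pearson correlation, which with binary responses (1 = “native speaker”, 0 = “foreigner”) amounts to a point-biserial correlation. Whether this matches the statistic actually used in the study is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical VOT values (ms) for six tokens heard by two judges.
vot_ms = [15, 25, 40, 60, 75, 90]
judge_a = [0, 0, 0, 1, 1, 1]  # longer VOT -> more "native speaker" responses
judge_b = [1, 1, 1, 0, 0, 0]  # the opposite pattern

r_a = pearson_r(vot_ms, judge_a)
r_b = pearson_r(vot_ms, judge_b)
# Pooling both judges' responses hides the two opposite patterns.
r_pooled = pearson_r(vot_ms + vot_ms, judge_a + judge_b)

print(r_a, r_b, r_pooled)
```

With these invented responses the two judges show correlations of equal size and opposite sign, and the pooled correlation is approximately zero, which illustrates why results pooled across listeners can mask systematic differences among individual listeners.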

2.3 Nonsegmental Contributions to Accentedness

While the pronunciation of segments has traditionally received primary attention in second language pronunciation teaching (Pennington & Richards, 1986), in fact it has been shown that instruction focused on nonsegmental aspects of pronunciation, such as utterance-level intonation and stress, is associated with more native-like ratings of learner speech (Moyer, 1999). In this section I present findings with respect to the contributions of nonsegmental acoustic-phonetic parameters to accentedness ratings by native listeners, including parameters having to do with speech rate, pausing, stress, intensity, pitch, and syllable structure.

2.3.1 Speech rate

Rate of speech is one of the most widely studied contributors to nonnative accentedness. Multiple factors influence speaking rate, including word duration and the frequency and duration of pauses. Accordingly, speaking rate has been operationalized in a variety of ways. First I present findings where speech rate is measured globally (typically in syllables per second), followed by studies that employed other measures.

A number of studies have provided evidence that the relationship between speaking rate and accentedness is somewhat complex, with nonnative speech that is either too fast or too slow rated as more accented than speech produced at a “normal” rate. Munro and Derwing (1998; experiment one) asked ten native speakers of Mandarin and ten native speakers of English to read an English passage at “normal” and “slow” rates of speech, and had 20 native English speakers rate the accentedness of these speech samples on a 9-point scale (no accent…very strong accent). They found that the native Mandarin speakers tended to speak more slowly overall than a set of native English-speaking controls, and also that the raters judged “slow” speech (M = 2.91 syllables/second) by the native Mandarin speakers as more accented than their “normal”-rate speech (M = 3.81 syllables/second).

Munro and Derwing (2001) provided evidence that speech can be judged as more accented when it is produced at a speech rate that is either too slow or too fast. In their first experiment, 48 native speakers of Canadian English rated the accentedness of sentences read out loud by 48 English as a second language (ESL) learners from mixed native language backgrounds and four native speakers of Canadian English on a 9-point scale (no accent…very strong accent). Speaking rate accounted for a significant (but small) amount of variance in the accentedness scores, and the estimated “optimal” speech rate (at 4.76 syllables/second) was faster than that used by the learners in the study (M = 3.24 syllables/second). They also found a curvilinear relationship between accentedness and speech rate, where speech produced at the “optimal” rate was rated as less accented than speech at faster and slower rates.
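One way to think about a curvilinear rate-accentedness relationship is as a quadratic curve whose lowest point marks the rate that attracts the least-accented ratings. The following Python sketch fits such a curve by ordinary least squares and reads off its vertex. It is an illustration under stated assumptions, not the analysis Munro and Derwing (2001) carried out: the function names are hypothetical, and the ratings are synthetic values generated from a curve whose minimum was placed at 4.76 syllables/second.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    coeffs = []
    for col in range(3):  # Cramer's rule: swap in the right-hand side per column
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)  # (a, b, c)

def optimal_rate(a, b):
    """Vertex of the parabola; a minimum of accentedness when a > 0."""
    return -b / (2 * a)

# Synthetic ratings (1 = no accent ... 9 = very strong accent), not real data.
rates = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]  # syllables/second
ratings = [0.4 * (r - 4.76) ** 2 + 3.0 for r in rates]

a, b, c = fit_quadratic(rates, ratings)
best = optimal_rate(a, b)
print(round(best, 2))
```

On a scale where higher numbers mean a stronger accent, a positive quadratic coefficient corresponds to the pattern described above: ratings worsen as the rate moves away from the vertex in either direction.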
In their second experiment, in order to investigate whether speaking rate contributes to accentedness judgments independent of segmental accuracy, Munro & Derwing (2001) used speech compression-expansion software to speed up and slow down (by 10 % in each direction) English speech samples produced by ten native Mandarin speakers. Twenty-six native speakers of Canadian English (see footnote 5) rated the speech on the 1–9 accentedness scale, and the authors transcribed the productions to compute the total number of phonological errors (phonemic substitution, deletion or insertion; they ignored subphonemic errors). They found that when the number of segmental errors was taken into account, speaking rate still made a small but statistically significant contribution to accentedness scores, and also that the curvilinear relationship between accentedness and speaking rate found in the first experiment was replicated with the manipulated speech samples used in experiment two.

Trofimovich and Baker (2006) also report evidence for a relationship between speech rate and accentedness judgments. In their study, thirty native speakers of Korean and ten monolingual English speakers participated in a delayed sentence repetition task. The speakers’ productions were then low-pass filtered at 450 Hz to remove segmental content, and ten native English speakers rated the accentedness of the filtered speech samples on a 9-point scale (strong foreign accent…no foreign accent). The raters were also shown the target sentence in writing so that they could recover the segmental content. The authors investigated the relationship between accentedness scores and speech rate (syllables/second), pause frequency, stress timing, and peak alignment (see footnote 6). Speech rate was one of four factors that were significantly correlated with accentedness scores (only peak alignment was not significant), with faster speech rated as more native-like. In yet another study where speech rate was calculated as syllables per second, Kashiwagi and Snyder (2010) found that speech rate was a significant predictor of accentedness ratings in Japanese-accented English, again with faster speech receiving more native-like ratings.

Kang (2010: 302) measured speech rate in four ways: syllables per second, articulation rate (syllables per second excluding pauses), phonation-time ratio (percentage of time producing audible speech), and mean length of run (mean number of syllables between pauses). In this study, eleven international teaching assistants from a variety of language backgrounds and three male native speakers of American English performed twenty-minute simulated instructional presentations in their major field. Fifty-eight native speakers of English (undergraduate students with a mean of 1.15 linguistics classes) rated the speech samples on five 7-point scales (e.g., speaks with a foreign accent…speaks with an American accent; has no accent…has a strong accent; speaks like a native speaker of English…speaks like a nonnative English speaker, etc.; pp. 313–314). Five-minute segments of each speech sample were selected for acoustic analysis. Of the four speech rate measures, only articulation rate significantly predicted accentedness ratings, with slower articulation rates perceived as more accented. Other significant predictor variables were pitch range, ratio of stressed words to total number of words, mean length of silent pauses, and ratio of atypical topic boundary pauses.

5 Twenty-seven raters participated; ratings from one were excluded due to apparent misunderstanding of the directions.

6 Peak alignment, as defined by Trofimovich and Baker (2007), is “the location of the highest value (peak) of pitch (or of its acoustic correlate, fundamental frequency) relative to the accented syllable in an intonation phrase” (p. 260).


In summary, when speech rate is measured as syllables per second, it is often but not always significantly correlated with accentedness scores (often with faster speech rated as more native-like). However, the studies reported here have not found evidence of a relationship between accentedness and other measures of speech rate, including phonation-time ratio and mean length of run.
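The four rate measures that Kang (2010) distinguishes can be made concrete with a small Python sketch. It follows the parenthetical definitions given above, but the input format (a list of speech runs, each with a syllable count and a duration, plus a list of pause durations), the function name, and the sample numbers are all assumptions made for illustration rather than details taken from the study.

```python
def speech_rate_measures(runs, pauses):
    """Compute four rate measures from hypothetical annotations.

    runs   -- list of (syllable_count, duration_s) for stretches of
              speech between pauses
    pauses -- list of pause durations in seconds
    """
    syllables = sum(count for count, _ in runs)
    speaking_time = sum(duration for _, duration in runs)
    total_time = speaking_time + sum(pauses)
    return {
        # syllables per second over the whole sample, pauses included
        "speech_rate": syllables / total_time,
        # syllables per second with pause time excluded
        "articulation_rate": syllables / speaking_time,
        # percentage of total time spent producing speech
        "phonation_time_ratio": 100 * speaking_time / total_time,
        # mean number of syllables between pauses
        "mean_length_of_run": syllables / len(runs),
    }

# Invented sample: three runs of speech separated by two pauses.
measures = speech_rate_measures(
    runs=[(8, 2.0), (12, 3.0), (4, 1.0)],
    pauses=[0.5, 1.5],
)
print(measures)
```

Because pause time enters the denominator of speech rate but not of articulation rate, two speakers who articulate equally quickly can differ in speech rate purely because of pausing, which may help explain why the measures pattern differently across studies.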

2.3.2 Variability in word duration

The durations of spoken words vary according to a number of factors, including not only speech rate, but also properties of the words themselves (e.g., lexical frequency or lexical category) and the context in which they are produced (e.g., predictability in a given context). Baker et al. (2011) investigated the relationship between word duration measures and judgments of accentedness in Korean- and Chinese-accented English speech. Previously recorded paragraphs read by 20 native Korean speakers and 20 native Chinese speakers were rated for accentedness by 50 native speakers of English on a 9-point scale (native…foreign). In addition, previously recorded spontaneous speech samples from 18 of the 20 Chinese speakers and 16 of the 20 Korean speakers were rated on the same accentedness scale by 15 native speakers of English.

Baker et al. (2011) then assessed the correlations between word duration measures and the accentedness scores (for both read and spontaneous speech) and found that within-speaker word duration variance (but not word duration itself) exhibited a significant correlation with accentedness ratings for both read and spontaneous speech, with greater word duration variance associated with more native-like ratings. The relative duration of function words was significantly correlated with the spontaneous but not the read speech ratings, with shorter function words receiving more native-like ratings. In addition, the correlation between accentedness and similarity to “native centroids” (Spearman correlation between a given speaker’s word durations and the mean relative word durations as produced by the native speakers, p. 12) was significant for the spontaneous but not the read speech. Speech by nonnative speakers whose relative durations were more similar to the native centroids was judged as more native-like.
Together, these findings provide evidence that nonnative speech that exhibits more within-speaker word duration variance and more function word reduction, in addition to word durations that are more similar to those of native speakers, is judged as more native-like by native listeners. The different patterns of findings for the spontaneous versus the read speech samples may reflect crucial differences in the way that accents are manifested in these two types of speech (but see Munro & Derwing, 1994); however, Baker et al. (2011) note that the spontaneous and read speech samples differed not only in their spontaneity, but also in their length, the number of individual samples associated with each talker, variety in content across talkers, and other factors. Thus their study was not designed to allow for conclusions concerning the effect of speech elicitation type on the relationship between acoustic-phonetic features and accentedness.
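The “native centroid” similarity measure described in this section can be sketched in Python as follows. This is an illustration of the general idea only, assuming that every speaker produced the same words in the same order and that relative duration means a word's share of the speaker's total duration; the function names and the durations are invented, and the procedure in Baker et al. (2011) may differ in its details.

```python
import math

def relative_durations(durations):
    """Express each word duration as a share of the speaker's total."""
    total = sum(durations)
    return [d / total for d in durations]

def native_centroid(native_speakers):
    """Mean relative duration of each word across the native speakers."""
    rel = [relative_durations(s) for s in native_speakers]
    return [sum(word) / len(word) for word in zip(*rel)]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Invented word durations (seconds) for the same four words.
natives = [[0.10, 0.30, 0.20, 0.40], [0.12, 0.28, 0.22, 0.38]]
centroid = native_centroid(natives)
similar = spearman(relative_durations([0.2, 0.5, 0.3, 0.6]), centroid)
dissimilar = spearman(relative_durations([0.6, 0.3, 0.5, 0.2]), centroid)
print(similar, dissimilar)
```

Because the measure is rank-based, a learner who speaks more slowly overall but preserves the native ordering of short and long words still scores as highly similar to the centroid.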

2.3.3 Pauses

Pauses are periods of time in a speech stream where the speaker has either stopped speaking or is producing a “filler” (such as ‘um’ in English). As with other indices of speaking rate, findings with respect to the relationship between pause measures and accentedness have been mixed. Flege (1988) asked native speakers of Mandarin, Taiwanese and English to read a set of English sentences out loud. The author and an assistant identified pauses in the speech samples using the following criterion: both listeners reported hearing a pause and the pause was “visually evident in a display of rms amplitude” (p. 73). The speech samples were then edited to remove the pauses. The edited and unedited sentences were presented to nine native English listeners (two of whom had previously participated in the reading task), who were asked to rate the speech by moving a lever on a response box. (The lever’s range was labeled “no foreign accent” and “strong foreign accent” at the two ends and “medium foreign accent” at the mid-point.) These listeners gave significantly more native-like scores to the edited than the unedited sentences; however, the effect was very small (the mean difference was only three points on the 256-point lever scale).

In a study using only unedited speech, Trofimovich & Baker (2006) found that both pause frequency (where a pause was defined as a break in the speech stream of more than 100 ms) and mean pause duration were significantly correlated with accentedness ratings, where fewer and shorter pauses were associated with more native-like ratings. Kang (2010) found that two out of five suprasegmental variables she investigated were significantly correlated with accentedness ratings. These were mean length of silent pauses and ratio of atypical topic boundary pauses (atypical pauses were defined as those 800 ms or longer within a clause boundary).
The three other suprasegmental variables, number of silent pauses and the number and length of filled pauses, were not significantly correlated with accentedness. Thus removing pauses from nonnative speech can result in a significant but slight increase in native-likeness judgments, and more frequent, longer and inappropriately-located silent pauses may be associated with less native-like speech ratings.
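The duration-based pause criteria mentioned in this section lend themselves to a brief Python sketch. It assumes that the speech has already been segmented into hypothetical (start, end) intervals in seconds; the function name, parameter names, and interval values are invented for illustration. The 100 ms threshold follows the definition reported for Trofimovich and Baker (2006), and the 800 ms threshold follows Kang's (2010) duration criterion, although duration alone cannot establish whether a pause falls within a clause, so the sketch only flags long pauses as candidates.

```python
def pause_measures(speech_intervals, min_pause=0.100, long_pause=0.800):
    """Derive pause counts and durations from speech intervals.

    speech_intervals -- time-ordered list of (start_s, end_s) stretches of speech
    min_pause        -- gaps longer than this count as pauses
    long_pause       -- pauses at least this long are flagged as candidates
                        for the 'atypical' category
    """
    # Silent gaps are the stretches between consecutive speech intervals.
    gaps = [
        next_start - end
        for (_, end), (next_start, _) in zip(speech_intervals, speech_intervals[1:])
    ]
    pauses = [g for g in gaps if g > min_pause]
    mean_duration = sum(pauses) / len(pauses) if pauses else 0.0
    return {
        "pause_count": len(pauses),
        "mean_pause_duration": mean_duration,
        "long_pause_count": sum(1 for p in pauses if p >= long_pause),
    }

# Invented segmentation: gaps of 0.05 s, 0.5 s, and 1.0 s.
intervals = [(0.0, 1.0), (1.05, 2.0), (2.5, 3.0), (4.0, 5.0)]
result = pause_measures(intervals)
print(result)
```

The 0.05 s gap falls below the threshold and is not counted, so the sample yields two pauses, one of which is long enough to be a candidate atypical pause.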


2.3.4 Overall prosody

Acoustic-phonetic parameters that contribute to what might be called “overall prosody” have been treated individually in much of the literature reviewed here. However, the following two studies considered measures of prosody that combine a number of suprasegmental features. Anderson-Hsieh et al. (1992) used an overall measure of prosody which was assumed to encompass stress (both lexical and phrasal), rhythm (e.g., reduction of unstressed syllables and function words), intonation (e.g., pitch range), and phrasing (e.g., where pauses occur). In their study, three phonetically trained individuals first rated nonnative speech samples for overall accentedness (see footnote 1), and they then rated the speech for nativelikeness on the four prosody dimensions in addition to a fifth “overall prosody” dimension. Because all scores on all five prosody scales were strongly correlated with one another, Anderson-Hsieh et al. (1992) used only the overall measure in their accentedness-prosody correlation analysis. They found that overall prosody was significantly correlated with accentedness ratings, with more native-like prosody associated with more native-like accentedness ratings.

Munro’s (1995) approach to investigating the relationship between prosodic factors and accentedness involved the use of low-pass filtered speech. In this study, ten native speakers of Canadian English and ten native speakers of Mandarin participated in a structured interview. Selections of their speech were presented in a low-pass filtered condition (in an effort to remove segmental information, at 300 Hz for the female voices and 225 Hz for the male voices) and an unfiltered condition. Twenty native speakers of English were asked to rate the speech samples on a four-point scale (definitely spoken with a foreign accent…probably foreign-accented…probably spoken by a native speaker of English…definitely spoken by a native speaker of English).
Munro (1995) found that the listeners assigned significantly more native-like scores to productions by native English than native Mandarin speakers in both the filtered and the unfiltered conditions, indicating that prosodic information alone can be used by native listeners to distinguish between native and nonnative speech.

These studies together provide evidence that overall prosody, at least as captured by the controlled elicitation methods used by these researchers and presented to listeners in a mostly decontextualized way, may play an important role in accentedness judgments. In the following sections, I present studies that have focused on individual components of prosody separately, including stress, intensity, pitch, and syllable structure.




2.3.5 Lexical and phrasal stress

Several studies have explored the influence of stress patterns on accentedness ratings. For example, Magen (1998) investigated the effect of lexical and phrasal stress by comparing accentedness ratings of edited and unedited nonnative speech samples. Lexical and phrasal stress was manipulated by resynthesizing the F0 patterns to more closely approximate native-like speech. Accentedness judgments were significantly more native-like for the stress-edited than the unedited speech samples. Kang (2010) investigated the effect of the number of stressed words per minute and proportion of stressed words on accentedness, and found that the ratio of stressed words to total number of words, but not number of stressed words per minute, was significantly related to accentedness, with more stressed words associated with less native-like speech.

While Magen (1998) and Kang (2010) found evidence for relationships between measures of stress and accentedness ratings, other studies have not. In Shah (2004), twenty-two native Spanish speakers and five native English speakers read English sentences containing target words which were extracted for analysis. Ten native English speakers rated the target words on a nine-point scale (least foreign-accented…more foreign-accented). Shah (2004) measured overall word duration, unstressed vowel duration, ratios of stressed to unstressed vowel duration, and VOT in the target words. Lexical stress, which was operationalized as stressed to unstressed vowel duration ratio, was significantly correlated with accentedness ratings for only one of the eight target words. Trofimovich and Baker (2006) found that unstressed to stressed syllable duration ratio did not significantly predict accent ratings.
Finally, Kashiwagi and Snyder (2010) found that their measure of stress, calculated as the number of syllables with irregular (i.e., non-target-like) stress divided by the total number of syllables, was not significantly correlated with accentedness. The studies discussed here have operationalized stress in a number of different ways, and used speech materials of differing lengths. It is therefore not clear whether the mixed findings with respect to the relationship between measures of lexical/phrasal stress and accentedness result from the nature of stress itself or from methodological differences among studies. Studies comparing the various measures of stress and involving speech materials of a variety of lengths may help to clarify this issue.


2.3.6 Intensity

Only one of the studies reviewed here has specifically focused on the relationship between intensity and accentedness. Kashiwagi and Snyder (2010) found that native English speakers rated English speech produced by native Japanese speakers as less native-like when it was louder. In the absence of studies that involve other language pairings, lengths of speech materials, etc., it is unclear whether such a finding with respect to an intensity-accentedness relationship is language- or context-specific.

2.3.7 Pitch range

Studies that have considered the relationship between pitch and accentedness have tended to focus on pitch range (see footnote 7). Pitch range is typically calculated as the difference between maximum and minimum F0 points in a selection of speech. A small pitch range is normally associated with monotonous speech and a wider pitch range with animated, or “lively”, speech (see, e.g., Hincks, 2005 for discussion of liveliness judgments of L2 speech). Findings concerning the influence of pitch range on accentedness scores are mixed: Kang (2010) found that limited pitch range had a strong negative effect on accentedness ratings, with wider pitch ranges associated with more native-like speech ratings. On the other hand, pitch range was not a significant predictor of accentedness scores in Kashiwagi and Snyder (2010).
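The pitch range calculation described in this section, the difference between the maximum and minimum F0 in a selection of speech, can be sketched in a few lines of Python. The function name and the sample F0 track are invented for illustration, and the convention of coding unvoiced frames as 0 is an assumption about the input. The semitone conversion is an addition not discussed in the studies reviewed here; it is included only because a range in Hz is not directly comparable across speakers with different baseline pitch.

```python
import math

def pitch_range(f0_track):
    """Return the pitch range of an F0 track in Hz and in semitones.

    f0_track -- F0 values in Hz, one per analysis frame, with unvoiced
                frames coded as 0 (an assumed convention)
    """
    voiced = [f for f in f0_track if f > 0]  # ignore unvoiced frames
    lowest, highest = min(voiced), max(voiced)
    range_hz = highest - lowest
    # Twelve semitones correspond to a doubling of frequency.
    range_semitones = 12 * math.log2(highest / lowest)
    return range_hz, range_semitones

# Invented F0 track: unvoiced frames at both edges, voiced frames between.
hz, semitones = pitch_range([0, 100, 150, 200, 0])
print(hz, semitones)
```

In this invented track the F0 doubles from 100 Hz to 200 Hz, giving a range of 100 Hz, or one octave.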

2.3.8 Syllable structure errors

The final parameter considered here is syllable structure errors, though they are more accurately classified as phonological than acoustic-phonetic. Syllable structure errors include deletion, epenthesis, and metathesis (i.e., switching the order of segments), and are widely documented in nonnative speech (see e.g., Hancin-Bhatt & Bhatt, 1997; Sato, 1984). One of the measures considered by Anderson-Hsieh et al. (1992) was syllable structure deviance, determined by the rate of vowel and consonant epenthesis, deletion, and metathesis. Syllable structure deviance was significantly correlated with pronunciation ratings, with more errors resulting in lower ratings. Magen (1998) similarly found that synthetically “repairing” epenthetic schwas in nonnative speech led to significantly more native-like assessments of the speech.

7 Wayland (1997) in fact studied absolute pitch, and found that F0 valley (but not F0 peak or F0 range) significantly predicted accentedness ratings.

2.4 Some Methodological Issues in Accentedness Research

The studies discussed here involve various speech elicitation methods, lengths of speech samples, ranges of accents represented in the speech samples, ways of manipulating the accentedness of speech samples, and listener backgrounds, all of which may impact how we should interpret their findings. Here I discuss the potential implications of these methodological differences.

One type of variation among these studies is found in the tasks used to elicit accentedness ratings. Most of the studies discussed here assessed accentedness via Likert-type scales, where raters were asked to select some point between two endpoints representing the degree to which they agree with the labels at either end of the scale. In accentedness studies, these endpoints are typically labeled with something like “sounds like a native speaker…doesn’t sound like a native speaker”; however, it is worth noting that the actual labels provided in the various studies differ widely. In addition, one of the studies discussed here, Alba-Salas (2004), used a foreign accent detection task, where judges were asked to determine which speech samples were produced by a native speaker of English and which by a “foreigner”. It is unknown whether accentedness judgments are robust across these differences in task type, though it may be expected that scalar responses can reveal finer judgment nuances more readily than can categorical judgments (see Southwood & Flege, 1999, for discussion of the related question of the difference between direct magnitude estimation versus interval scaling).

A second type of variation among studies can be found in the length of materials and the elicitation procedures used to collect the speech samples presented to the judges. In the studies discussed here, materials length ranges from single segments excised from word productions (Alba-Salas, 2004) to several-minute-long passages (e.g., Kang, 2010).
In addition, while most of the studies considered here use read speech samples (words, sentences, paragraphs), and some have used extemporaneous speech (e.g., Kang, 2010), very few have considered both for the purpose of comparing the two (e.g., Baker et al., 2011; see Munro & Derwing, 1994). Shorter materials, and those elicited via reading tasks, permit greater control over the speech stimuli, in particular when the researcher is interested in very specific features of speech (e.g., VOT). On the other hand, longer, and extemporaneous, materials are more representative of speech in communicative settings, and may be more appropriate for investigation of some acoustic-phonetic features in particular (such as speech rate, overall prosody, etc.). It is worth noting that the tension between experimental control and ecological validity is of course not unique to this type of research.

A third type of variation concerns the range of accents represented in the speech sample presented to raters. In some studies, e.g., Kashiwagi and Snyder (2010), only nonnative speech samples were presented to subjects. Others such as Wayland (1997) present speech samples from both native and nonnative speakers to listeners. There is evidence that accentedness judgments are more native-like for nonnative speech samples when native speakers are excluded from the set of speech samples (Flege & Fletcher 1992); thus the potential for range effects should be taken into consideration when interpreting accentedness findings.

A fourth type of variation concerns whether the speech samples were computer-edited. For example, both Flege (1988) and Magen (1998) made adjustments to naturally produced speech samples and compared accentedness ratings of the edited and unedited versions in order to determine the contribution of the edited features to accentedness. The extent to which the findings of studies using edited speech can be compared to those of studies using only naturally produced speech is unclear. Here, again, we see the tension between experimental control and ecological validity.

Finally, a fifth type of variation among studies has to do with the characteristics of the listeners who provided the accentedness judgments. In the majority of the studies discussed here, the native speaker raters did not have expertise in either phonetics/phonology or language assessment, while in some, the judges were experienced oral test raters (Anderson-Hsieh et al. 1992), English teachers (Kashiwagi & Snyder, 2010), or linguistics students (Kang, 2010).
Because it has been found that naïve and experienced raters may differ in their accentedness judgments (e.g., Hsieh 2011), differences in rater experience may limit comparisons of results across studies using raters with different profiles. A related consideration is whether the raters have second language experience. While Magen’s (1998) raters were all monolingual English speakers, the raters in the Kang (2010) study reported experience with a number of second languages; in fact, ten of the 58 raters had studied one of the native languages of the talkers whose speech they rated. It seems reasonable to hypothesize that listeners with different amounts of experience as second language speakers themselves will differ in their approach to assigning accentedness ratings.

Another point of difference is whether or not listeners (raters) were native speakers of the targeted language. In order to limit the scope of the present discussion, I have focused only on (portions of) studies where the listeners providing the accentedness ratings were native speakers. However, it may be inappropriate to assess accentedness by means of exclusively native-speaker judges, in particular in the case of English, given that the majority of English speakers in the world are in fact nonnative speakers (e.g., Crystal, 2003). Indeed, a world Englishes perspective challenges even the very definition of accentedness as extent of deviance from a native speaker norm. Perhaps in part due to a push to acknowledge the legitimacy of nonnative speakers, and perhaps also in an effort to better understand speech processing by nonnative speakers, researchers are demonstrating increasing interest in accentedness ratings by nonnative listeners. In fact, some of the studies discussed here involved nonnative raters in addition to their native raters, including but not limited to Flege (1988), Alba-Salas (2004), and Kashiwagi and Snyder (2010). These studies revealed sometimes striking differences between native and nonnative listeners in the acoustic-phonetic parameters that affected their accentedness judgments (though it is premature at this point to make many generalizations about these differences given the relatively limited data available). In order to gain a fuller understanding of the phenomenon of accentedness and its acoustic-phonetic determinants, and in the spirit of a world Englishes perspective, much more investigation of accentedness as assessed by nonnative listeners is currently needed.

2.5 Conclusion

In light of the methodological variation and sometimes mixed findings exhibited among the studies discussed in this chapter, it is difficult to derive many robust generalizations about the relationship between acoustic-phonetic parameters and accentedness judgments. Nonetheless, I believe we may reasonably draw a number of conclusions from this work.

2.5.1 Segmental contributions to accentedness

When assessed in a composite manner, segmental accuracy has been consistently associated with more native-like speech ratings, though this relationship may differ for consonants and vowels (Anderson-Hsieh et al., 1992; Kashiwagi & Snyder, 2010; Magen, 1998; Munro & Derwing, 2001). In addition, certain segment substitutions resulting from differences in phoneme inventories between the L1 and L2 can lead to less-native-like accentedness ratings (e.g., substituting Japanese flap for English /ɹ/ and /l/, as in Riney et al., 2000). At the phonetic level, findings with respect to vowel formant values and duration have been complex, and appear to be vowel- and word-specific (Munro, 1993; Wayland, 1997). For consonants, longer VOT is sometimes associated with more native-like ratings of English as a second language speech; overall, however, the findings concerning the VOT-accentedness relationship have been mixed (Alba-Salas, 2004; Riney & Takagi, 1999; Shah, 2004; Wayland, 1997).

2.5.2 Nonsegmental contributions to accentedness

Speech rate, which has been widely studied in the accentedness literature, appears to consistently influence accentedness ratings. In particular, when nonnative speakers intentionally slow down their speech, it may have a negative effect on ratings of native-likeness (Munro & Derwing, 1998). Interestingly, the relationship between accentedness and speech rate appears to be curvilinear, with speech that is either too fast or too slow rated as more accented (Kang, 2010; Munro & Derwing, 1998; Munro & Derwing, 2001; Trofimovich & Baker, 2006). Variance in word duration has also received some attention in the literature: when nonnative speakers exhibit more word duration variance and when their word durations are more similar to those of native speakers, their speech has been judged as having a more native-like accent (Baker et al., 2011). Also related to speech rate is pausing: pausing more frequently, for longer periods, and in inappropriate places has been associated with less native-like ratings of nonnative speech (Flege, 1988; Kang, 2010; Trofimovich & Baker, 2006).

Prosodic factors have also received a great deal of attention in the literature, and measures of overall prosody have been found to be significantly correlated with judgments of native-likeness (Anderson-Hsieh et al. 1992; Munro, 1995). Studies looking specifically at lexical and phrasal stress patterns have produced mixed results, which may vary depending on how stress is measured (Kang, 2010; Kashiwagi & Snyder, 2010; Magen, 1998; Shah, 2004; Trofimovich & Baker, 2006). The relationship between pitch range and accentedness scores has also been inconsistent (Kang, 2010; Kashiwagi & Snyder, 2010). On the other hand, syllable structure errors, including epenthesis, deletion, and metathesis, have been associated with less native-like accentedness ratings (Anderson-Hsieh et al., 1992; Magen, 1998).
In summary, while some of these patterns have proven consistent across studies, others appear to be less robust. It remains to be seen whether methodological variation, including of the types discussed earlier (e.g., the linguistic backgrounds of the raters, the range of accents represented in the task, etc.), is responsible for mixed findings.




2.5.3 Conclusion

Given that language is an inherently social phenomenon, and that accentedness is by definition determined by the subjective judgments of listeners, the crucial role of the listener cannot be ignored in any discussion of accentedness. Listeners’ judgments may be influenced by the amount and type of experience with nonnative speakers in general and/or speakers with specific native languages in particular, and by their attitudes towards nonnative-speaking individuals and groups. In addition, the increasing attention to nonnative listeners’ judgments of accentedness should be seen as a welcome development. Not only does the inclusion of nonnative alongside native listener judgments acknowledge and legitimize the role of the nonnative speaker, but it can also provide crucial insights into the ways in which nonnative speakers process the target language.

Here we have seen a summary of evidence linking specific acoustic-phonetic properties of speech to accentedness judgments. As mentioned above, accentedness is conceptually related to but crucially independent of constructs that more directly concern the success of communication, namely intelligibility and comprehensibility (e.g., Munro & Derwing, 1995). It remains to be seen how the multitude of features affecting accentedness impact actual communication; it may be the case that distinct (subsets of) features contribute to accentedness and related constructs (see, e.g., Kang, 2010). For example, segmental and suprasegmental features may contribute differentially to accentedness, intelligibility, and comprehensibility. Clearly, research addressing these questions must grapple with a variety of methodological issues that have limited our ability to draw broad conclusions about the relationship between acoustic-phonetic features of speech and their impact on listeners.

References

Alba-Salas, J. 2004. Voice onset time and foreign accent detection: Are L2 learners better than monolinguals? Revista Alicantina de Estudios Ingleses 17, 9–30.
Anderson-Hsieh, J., Johnson, R., & Koehler, K. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42(4), 529–555.
Baker, R.E., Baese-Berk, M., Bonnasse-Gahot, L., Kim, M., Van Engen, K.J., & Bradlow, A.R. 2011. Word durations in nonnative English. Journal of Phonetics 39(1), 1–17.
Crystal, D. 2003. English as a global language. Cambridge: Cambridge University Press.
Derwing, T.M., Thomson, R.I., & Munro, M.J. 2006. English pronunciation and fluency development in Mandarin and Slavic speakers. System 34, 183–193.
Dickman, S.M. 2009. Differences in intelligibility of non-native directed speech and hearing impaired directed speech for non-native listeners. MA Thesis, University of Utah.


   Rachel Hayes-Harb

Flege, J.E., & Fletcher, K.L. 1992. Talker and listener effects on degree of perceived foreign accent. Journal of the Acoustical Society of America 91(1), 370–387.
Flege, J.E. 1984. The detection of French accent by American listeners. Journal of the Acoustical Society of America 76, 692–707.
Flege, J.E. 1988. Factors affecting degree of perceived foreign accent in English sentences. Journal of the Acoustical Society of America 84(1), 70–79.
Hancin-Bhatt, B., & Bhatt, R.M. 1997. Optimal L2 syllables: Interactions of transfer and developmental effects. Studies in Second Language Acquisition 19(3), 331–378.
Hincks, R. 2005. Measures and perceptions of liveliness in student oral presentation speech: A proposal for an automatic feedback mechanism. System 33, 575–591.
Hsieh, C-N. 2011. Rater effects in ITA testing: ESL teachers’ versus American undergraduates’ judgments of accentedness, comprehensibility, and oral proficiency. Spaan Fellow Working Papers in Second or Foreign Language Assessment 9, 47–74.
Kang, O. 2010. Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness. System 38(2), 301–315.
Kashiwagi, A., & Snyder, M. 2008. American and Japanese listener assessment of Japanese EFL speech: Pronunciation features affecting intelligibility. The Journal of Asia TEFL 5(4), 27–47.
Kashiwagi, A., & Snyder, M. 2010. Speech characteristics of Japanese speakers affecting American and Japanese listener evaluations. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics 10(1), 1–14.
Magen, H.S. 1998. The perception of foreign-accented speech. Journal of Phonetics 26(4), 381–400.
Moyer, A. 1999. Ultimate attainment in L2 phonology: The critical factors of age, motivation and instruction. Studies in Second Language Acquisition 21, 81–108.
Munro, M.J. 1993. Productions of English vowels by native speakers of Arabic: Acoustic measurements and accentedness ratings. Language and Speech 36(1), 39–66.
Munro, M.J. 1995. Nonsegmental factors in foreign accent: Ratings of filtered speech. Studies in Second Language Acquisition 17(1), 17–34.
Munro, M.J., & Derwing, T.M. 1994. Evaluations of foreign accent in extemporaneous and read material. Language Testing 11(3), 253–266.
Munro, M.J., & Derwing, T.M. 1995. Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech 38(3), 289–306.
Munro, M.J., & Derwing, T.M. 1998. The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning 48(2), 159–182.
Munro, M.J., & Derwing, T.M. 2001. Modeling perceptions of the accentedness and comprehensibility of L2 speech: The role of speaking rate. Studies in Second Language Acquisition 23, 451–468.
Munro, M.J., Derwing, T.M., & Burgess, C.S. 2010. Detection of nonnative speaker status from content-masked speech. Speech Communication 52(7–8), 626–637.
Pennington, M.C., & Richards, J.C. 1986. Pronunciation revisited. TESOL Quarterly 20(2), 207–225.
Riney, T., Takada, M., & Ota, M. 2000. Segmentals and global foreign accent: The Japanese flap in EFL. TESOL Quarterly 34(4), 711–737.
Riney, T.J., & Flege, J.E. 1998. Changes over time in global foreign accent and liquid identifiability and accuracy. Studies in Second Language Acquisition 20(2), 213–243.




Riney, T.J., & Takagi, N. 1999. Global foreign accent and voice onset time among Japanese EFL speakers. Language Learning 49(2), 275–302.
Sato, C.J. 1984. Phonological processes in second language acquisition: Another look at interlanguage syllable structure. Language Learning 34(4), 43–57.
Shah, A.P. 2004. Production and perceptual correlates of Spanish-accented English. Proceedings of From Sound to Sense 2004. Ed. by J. Slifka, S. Manuel, & M. Matthies. http://www.rle.mit.edu/soundtosense/conference/starthere.htm. C79–C82.
Southwood, M.H., & Flege, J.E. 1999. Scaling foreign accent: Direct magnitude estimation versus interval scaling. Clinical Linguistics and Phonetics 13(5), 335–349.
Trofimovich, P., & Baker, W. 2006. Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28, 1–30.
Wayland, R. 1997. Nonnative production of Thai: Acoustic measurements and accentedness ratings. Applied Linguistics 18(3), 345–373.

Jette G. Hansen Edwards

3 Developmental Sequences and Constraints in Second Language Phonological Acquisition: Balancing Language-internal and Language-external Factors

3.1 Introduction

This chapter focuses on developmental sequences in second language (L2) phonological acquisition and the constraints that impact these sequences. Up to now, research that examines developmental sequences in L2 phonological acquisition has primarily focused on the impact of language-internal factors such as markedness and transfer. However, social factors impact which features learners target for acquisition as well as how they use those features. That is to say, both language-internal and language-external factors shape what the learner targets for acquisition as well as how the learner perceives, and thus acquires, the L2¹ (or L3 or L4, etc.). During L2 acquisition, learners actively target and use L2 structures, and at the same time they are constrained by any language(s) previously learned as well as the articulatory/acoustic facets of the targeted structures themselves. Importantly, learners in this process are “active agents in their language use, language choices, and targets for acquisition… they are not passive recipients of the target language” (Hansen Edwards, 2008: 251).

¹ Henceforth L2 will be used for any language acquired after the native language, e.g., also a third or fourth language.

While there has been a great deal of research on L2 phonology, we still do not have a complete picture of how learners actually acquire sounds in an L2. This is partly due to the myriad factors that have been found to impact the development of an L2 sound system. Even within the research on language-internal constraints on L2 phonological acquisition, research can typically be considered ‘isolationist’; not only is language acquisition typically investigated apart from the social setting, but the structures being examined are often isolated from the learner’s entire L2 phonological repertoire. However, as Baptista’s (2006) research indicates, learners do not acquire L2 structures in isolation from each other, and the acquisition of one structure, or features of a structure, may result in changes to other structures in the learner’s developing L2. Additionally, structures under investigation have often been researched in highly controlled word lists and reading passages. Only examining a very specific subset of structures and investigating them primarily in controlled contexts presents a very limited, and possibly misleading, picture of what the learner can do in the L2.

With this in mind, this chapter presents a state-of-the-art account of the research on developmental sequences in L2 phonological acquisition in order to examine how both language-internal and language-external aspects of L2 acquisition shape what the learner perceives and produces in the L2. Thereafter, major findings on the developmental sequences for L2 segmentals and suprasegmentals are discussed, with reference to how such constraints determine these sequences. The chapter concludes with suggestions for future research on developmental sequences in L2 phonological acquisition, with this complexity in mind.

3.2 Language-internal constraints on L2 phonological acquisition

There are two major lines of research on language-internal constraints on SLA: (a) research that investigates the role of previously learned language(s) in constraining L2 acquisition via positive and negative transfer; and (b) research that explores how universal, cross-linguistic phenomena such as markedness impact L2 acquisition. Each of these approaches is briefly outlined below.

Beginning in 1957 with Lado’s Contrastive Analysis Hypothesis (CAH) (Lado, 1957) and continuing up to current theories and research on L2 phonology, the effect of previously learned languages on the acquisition of later languages has been considered one of the most powerful factors, if not the most powerful, in the development of an L2 phonological system. It has therefore been a primary area of investigation in L2 phonological research as well as a major feature of theories of L2 phonological acquisition. Research on the effect of the L1 on acquisition of the L2 has often relied on a contrastive analysis of the languages in order to examine whether overlaps in the phonetic and phonological inventories of the two languages result in positive transfer of structures that occur in both the L1 and the L2. A substantial body of research evidences the powerful role of transfer across all domains of L2 phonological acquisition (Archibald, 1993b; Bohn & Flege, 1992; Chen, Robb, Gilbert & Lerman, 2001; Guion, Harada & Clark, 2004; Gut, 2010; Halle, Chang & Best, 2004; Hansen, 2001, 2004, 2006; Yu & Andruski, 2010). Findings illustrate that those structures that occur in both the L1 and the L2 are positively transferred to the learner’s emerging L2 repertoire and are among the earliest acquired, in contrast to L2 structures that are not present in the learner’s L1.

Transfer is also the major constraint in a number of prominent L2 phonological theories, for example Flege’s (1995) Speech Learning Model (SLM). The SLM is a model of perception and production which holds that problems in L2 production can be traced to difficulties in perceiving sounds correctly in the L2 due to ‘equivalence classifications’ (Flege, 1987); that is, the level of perceived phonetic similarity or dissimilarity between L1 and L2 sounds affects the extent to which the learner is able to establish a new phonetic category for an L2 sound. The SLM also holds that age is a factor in acquisition, with the ability to establish new phonetic categories decreasing, but not disappearing, as age of learning increases. A substantial body of research provides support for the model (Flege, 1987, 1991; Flege, Frieda, Walley & Randazza, 1998; Flege & Hillenbrand, 1984).

A second line of research on language-internal constraints has mostly focused on the effect of universals, including typological markedness, on L2 acquisition orders. The term markedness was coined by the Prague School of Linguistics (Jakobson, 1968) to name the concept of binary oppositions in language (e.g., voiceless vs. voiced stops such as /p/ vs. /b/ or /t/ vs. /d/). For each binary opposition, one member is typically more frequent across the world’s languages and is therefore considered more basic, natural, and unmarked (Greenberg, 1965, 1966; Ladefoged & Maddieson, 1996). From cross-linguistic analyses, a number of important findings have emerged, including that the most common (least marked) sounds in languages are voiceless stops, nasals, and the voiceless alveolar fricative /s/ (Greenberg, 1965, 1966; Ladefoged & Maddieson, 1996). L2 research on implicational markedness has focused on whether unmarked sounds or features of the L2 are acquired more easily and earlier than more marked sounds and features. It is important to note that markedness may interact with transfer in determining developmental sequences of acquisition (Hansen, 2001, 2004; Weinberger, 1987).
Another focus of language-internal research is the extent to which developmental sequences and processes for L2 learners mirror those found for children who are L1 speakers of that language. Evidence shows that L2 learners are constrained to some extent by the same developmental constraints as child L1 learners, most likely because child language acquisition typically proceeds along a markedness continuum, with less marked structures acquired before more marked structures (Abrahamsson, 2001; Flege & Davidian, 1984; Hancin-Bhatt & Bhatt, 1997; Hansen, 2006; Hecht & Mulford, 1982; Piper, 1984). One problem with research examining transfer versus universal constraints is that both approaches are looking at the same phenomenon though in two different ways. For example, most languages have the voiceless stops /p t k/, which are considered to be unmarked cross-linguistically. A markedness perspective predicts that a learner will be able to produce these sounds in the L2 earlier and with a higher accuracy than other sounds. Another perspective on this phenomenon would be that this is a clear case of transfer from the L1 to the L2.


Despite the important role that language-internal constraints, and particularly transfer, continue to play in both L2 phonological theories and research, there are several major limitations to consider. Firstly, as Major and Faudree (1996) note, “…transfer cannot explain order of acquisition, and why or whether or not the resulting interlanguage systems behave according to the principles of natural languages” (p. 69). Secondly, it can be difficult to determine whether an acquisition issue is due to transfer or markedness. For example, if the learner’s L2 has both voiceless and voiced obstruents in word-final position, such as /t/ in bet and /d/ in bed, but the learner’s L1 only has voiceless obstruents in word-final position, the learner may devoice final voiced obstruents and produce bed as bet. The ‘transfer’ explanation holds that the learner is transferring L1 phonotactics into the L2 and producing the L2 to conform to the L1. However, as voiceless obstruents are less marked than voiced obstruents, and as child learners of languages with voiced obstruents also devoice obstruents in early stages of L1 acquisition, markedness may also be a factor. Ultimately, it may sometimes be impossible to determine whether L1 transfer, markedness, or other factors explain such phenomena as devoicing, or whether it is a combination of several factors.

Another weakness of both language-internal approaches is that they largely ignore the role of input frequency for a given structure within a language. Although input frequency is only now emerging as a research area in L2 phonology, SLA and child L1 phonological research have shown that the more frequently a structure is distributed and used in the target language, the earlier the structure may be acquired. As Zamuner, Gerken, and Hammond (2004) note, “…there is considerable evidence that child language reflects language-specific input at an early age” (p. 516).
Input frequency differences in sounds across languages may also explain cross-linguistic differences in child L1 acquisition orders. For example, the voiced labiodental fricative /v/ may emerge earlier in the speech of children acquiring Swedish or Estonian as their L1 than in the speech of children acquiring English as their L1, given that /v/ is more frequent in Swedish and Estonian than in English (Ingram, 1999). To what extent input frequency impacts and overrides other constraints on L2 phonological acquisition is also now being addressed. As Trofimovich, Collins, Cardoso, White and Horst (2012) state, L2 researchers who examine input frequency:

…hold that language users are sensitive to the frequency of lexical items in linguistic input and that language acquisition involves the learning of phonological, morphological, semantic, and other regularities from input. With respect to L2 phonology, the logic here is that certain aspects of speech (e.g., speech sounds, stress patterns, intonation contours) are easier to learn when they occur within and across a variety of recurrent familiar lexical items. The more frequently L2 learners experience a given phonological pattern in the input, especially across a range of lexical items, the more accurately they will perceive and produce this pattern. (pp. 176–177)
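The distinction drawn in the quotation, between how often a pattern occurs and how many different lexical items it occurs in, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `pattern_frequencies`, the toy input, and the use of the orthographic string "st" as a crude stand-in for the /st/ cluster. It is not a method taken from any of the studies cited.

```python
def pattern_frequencies(tokens, pattern):
    """Return (token_count, type_count) for words containing `pattern`.

    token_count: how many running words contain the pattern.
    type_count:  how many distinct lexical items contain the pattern.
    """
    matching = [word.lower() for word in tokens if pattern in word.lower()]
    return len(matching), len(set(matching))


# Hypothetical toy input; orthographic "st" stands in for the /st/ cluster.
sample = "the student stood still and the student started to study".split()

token_count, type_count = pattern_frequencies(sample, "st")
print(token_count, type_count)
```

In a real study the counts would be taken over phonemically transcribed input rather than spelling, since orthography is only a rough proxy for sound structure.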

Only a few studies in L2 phonology examine how input frequency impacts developmental sequences, and their findings are mixed. Flege, Takagi and Mann (1996) found that learners were more accurate in identifying sounds in more frequently used words. In their research on the acquisition of voiceless versus voiced obstruent codas, Broselow and Xu (2004) found that it was not the frequency itself that impacted L2 acquisition, but what the learners perceived as being more frequent that mattered. In contrast, Cardoso’s (2008) comparison of frequency and markedness as explanations for the acquisition of /sl sn st/ structures found that markedness offered a better explanation for acquisition orders. Overall, the findings for the effect of input frequency in L2 phonology are not as clear-cut as those for child L1 phonology, which may not be surprising given that more factors, both extraneous and internal, impact L2 as compared to L1 acquisition.

As noted previously, research on developmental sequences and the effect of language-internal constraints on acquisition often uses word lists and reading passages as the primary means of data collection, which in effect treats the language in isolation from the context in which it is spoken. These types of analyses provide a snapshot of the learner’s capabilities under certain conditions, but only a glimpse of the whole language learning process. Those working within a sociolinguistic framework have long noted that L2 speakers vary their production of specific sounds based on a variety of factors. The influence of linguistic environment, that is, of the sounds preceding and following the target sound, has been of particular interest for those working under a variationist perspective.
Although they have not yet received widespread attention from L2 phonology scholars, a number of studies indicate that both the preceding and the following linguistic environment can have a strong effect on L2 production (Abrahamsson, 1999; Bayley, 1996; Dickerson & Dickerson, 1977; Gatbonton, 1978; Hansen, 2004, 2006; Hansen Edwards, 2011; Osburne, 1996; Young, 1988). In fact, several studies have found that linguistic environment has a more powerful effect on L2 phonology than markedness. For example, research on factors influencing the modification via epenthesis versus accurate production of /sC/ and /sCC/ onsets, as in start and street, found that preceding linguistic environment (a consonant, vowel, or pause) was a stronger predictor of onset modification than length of the onset (Abrahamsson, 1999; Carlisle, 1997).

Like the effect of linguistic environment, grammatical conditioning has also most frequently been investigated under variationist approaches to SLA (Archibald, 1993a; Bayley, 1996; Hansen, 2001, 2004; Osburne, 1996; Young, 1988), with findings indicating that grammatical conditioning may exert a powerful effect on the acquisition of L2 syllable codas. For example, research on final /t,d/ deletion in CC and CCC codas, such as the omission of the final /t/ in just and text, has shown that L2 learners are more likely to delete /t,d/ in inflected forms such as the past tense than in monomorphemic forms, which indicates a lack of acquisition of grammatical inflections. Conversely, native speakers of English are more likely to delete /t,d/ in monomorphemic forms and rarely delete the /t,d/ in inflected forms (Bayley, 1996).

Language-external factors also impact L2 phonological acquisition, especially with respect to what the learner targets in the L2. This is discussed below.

3.3 Language-external constraints on L2 phonological acquisition

Language-external constraints refer to factors beyond linguistic influences that may affect what L2 learners acquire and how they use what they acquire. Language-external constraints are often called ‘social factors’, as the focus of this research is how the social context of language learning shapes the opportunities, willingness, and ability of the learner to acquire and use the L2. Commonly researched language-external factors include gender, L1/L2 identity, extent of L1/L2 use, interlocutor accommodation, and the effect of peer networks on the use of variants in the L2. While we know a great deal about how these factors impact L2 use, there has been no research to date that directly investigates how they affect developmental sequences in L2 phonological acquisition. As noted, research that investigates developmental sequences characteristically takes a language-internal approach that relies on decontextualized language data. However, it is critical that language-external factors be examined in connection with developmental sequences to offer a wider perspective and deeper understanding of what learners really know about the L2 and how they actually use it.

For example, the interdental fricative /ð/ is rare cross-linguistically, and is therefore considered a relatively marked sound in comparison to other consonants. It exists in English, but it may not exist in the L1 of many L2 learners of English. If an L2 learner pronounces /ð/ as /d/, as in dat for that, it may be erroneously assumed that the learner has not acquired the dental fricative. In fact, the learner may be able to pronounce /ð/ but, for reasons of ethnic and cultural identity, choose not to produce it in certain situations (Gatbonton, 1975).
Indeed, findings from research on language-external factors clearly illustrate that L2 learners are influenced by peer group networks, social identity, and ethnic group affiliation in using or avoiding particular features of the L2 in order to signal a particular group affiliation or identity or to mark L1/L2 group membership. L2 learners have been found to target a non-standard over a standard L2 variant due to peer language use influences. Adamson and Regan’s (1991) Vietnamese and Cambodian L2 English learners targeted either the standard /iŋ/ or the non-standard /ɪn/ variant of {-ing}, based on how widely each was used in the learners’ same-gender English native speaker social groups. Those learners who were women typically used the standard /iŋ/ form, as this was the most common form among the native English speaking women in their social networks, while the male L2 learners used /ɪn/, similar to the native English speaking men with whom they socialized.

Learners may also actively resist using features of the L2 if use of these features conveys an identity in conflict with their L1 identity. For example, Gatbonton (1975) conducted research on the use of the English /ð/ as a pro-French versus pro-English marker for French Canadian learners of English L2. She found that French Canadians who were pro-French were more likely to use /d/ rather than /ð/ in order to mark their pro-French identity, while the learners who were pro-English used /ð/ more frequently. Both groups of French Canadian learners were aware of the effect of the use of /ð/ as a marker of identity. Ohara (2001) also found avoidance of specific L2 features, in this case a high pitch, in her work on American women learning L2 Japanese. Ohara found that some of the American women in her study avoided using a high Japanese pitch in some contexts, even though they had acquired the high pitch and understood why it was sociolinguistically appropriate in these contexts. These women felt that using the high pitch when speaking Japanese would convey a ‘submissive’ or ‘cute’ identity, which was at odds with their L1 identity.
Instead, they employed a normal pitch, although this may have been perceived as being both linguistically and socially incorrect in Japanese. Finally, research has also shown that learners vary the use of L2 and L1 sounds based on who they interact with in order to show ethnic solidarity or dissimilarity with the interlocutor (Beebe & Zuengler, 1983).

While it is unlikely that language-external constraints can override L1 transfer or universal constraints on acquisition and therefore change the developmental sequence per se, language-external constraints can affect what learners are able to acquire of the L2 based on what aspects of the L2 they are exposed to. For example, learners may not be able to acquire features of the L2 that are not used in their social context because they may not be exposed to them, or if exposed to them, the learners may not notice these features if they are not widely used within their peer group networks. Their social and peer networks also affect the variety the learners target for acquisition. Finally, how the learners perceive their own identity in the L1 and L2 affects how they use, or avoid using, the features of the L2 that they have been exposed to and are able to acquire.


3.4 Findings from research on developmental sequences

This section presents a synthesis of the main findings on developmental sequences in L2 phonology. The synthesis is based on a review of over 100 studies on L2 phonology, and covers vowels, consonants, syllable structure, and stress. In line with the research on L2 phonology, the term ‘developmental sequences’ is used to refer to the acquisition orders of L2 sound structures based on the emergence of one structure or feature of a structure before another, based on data collection at a given time or across time. The higher accuracy of one structure compared to another may also imply that the more accurate structure was acquired earlier. Finally, the term developmental sequences also applies to the acquisition of a particular aspect of a structure, such as vowel duration and spectral properties of vowels.

It is important to note that while findings on developmental sequences are fairly consistent, research has been conducted principally by examining the effect of language-internal factors such as transfer and markedness on acquisition, with English as the L2 and Spanish, Brazilian Portuguese, or Mandarin Chinese as the L1. More research across a wider range of languages may verify whether the developmental sequences discussed below are universal for L2 phonological acquisition. Additionally, expansion of the data collection tasks to include data about the social context of language learning will enable a more extensive analysis of developmental sequences.
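The accuracy-based reasoning described in this section, in which a structure produced more accurately is taken to have been acquired earlier, can be sketched as a simple ranking. The function name `rank_by_accuracy` and all counts below are hypothetical and serve only to show the logic; they do not come from any study reviewed here, and accuracy at a single point in time is at best indirect evidence of acquisition order.

```python
def rank_by_accuracy(scores):
    """Order structures from most to least accurately produced.

    scores: dict mapping a structure label to (correct, attempts).
    Structures with zero attempts are skipped, since no rate can be computed.
    """
    rates = {
        structure: correct / attempts
        for structure, (correct, attempts) in scores.items()
        if attempts > 0
    }
    return sorted(rates, key=rates.get, reverse=True)


# Hypothetical counts of target-like productions for one learner.
hypothetical_scores = {"/ð/": (15, 50), "/p/": (48, 50), "/s/": (40, 50)}

inferred_order = rank_by_accuracy(hypothetical_scores)
print(inferred_order)
```

Under the stated assumption, the first item in the resulting list would be read as the earliest-acquired structure and the last as the latest.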

3.4.1 Consonants

Research on syllable margins, both for singleton consonants and consonant clusters, has been the most substantive in L2 phonology, and the findings are both largely consistent and robust. A variety of factors impact the acquisition and acquisition orders of both single and complex onsets and codas, from transfer to markedness and developmental effects, to grammatical conditioning and linguistic environment. As noted above, external factors may not change developmental sequences per se, but they do impact what features and variants of the language the learner is exposed to as well as targets for acquisition and use.

Research on syllable onsets has focused on a number of areas: voice onset time (VOT), the acquisition of singleton onsets, and the acquisition of onset clusters. VOT refers to the duration between the release of the obstructed airflow for stop consonants and the onset of voicing for the following vowel. Languages differ in terms of phonemic and phonetic differences in VOT, and thus a great deal of research has examined the extent to which L2 learners can acquire altogether new VOT settings. As Zampini (2008) notes, stops are classified according to the duration of the lag time between the release burst and the onset of voicing: long-lag VOT is usually over 35 milliseconds; short-lag VOT is between 0 and 35 milliseconds; and pre-voiced stops have voicing during the stop closure, and may even have a negative VOT. Learners may have a difficult time perceiving L1–L2 differences in VOT and therefore use L1 VOT values to produce L2 stops (Flege, 1991; Flege, Frieda, Walley, & Randazza, 1998). At advanced levels, learners may begin to reset their L2 VOT values, although research indicates that their L2 VOT values are still a ‘compromise’ (Zampini, 2008: 223) between the L1 and L2 values (Flege, 1991; Flege & Hillenbrand, 1984). In other words, learners may never completely acquire L2 VOT values unless they begin acquiring the L2 as children (and even then, native-like values may not be fully acquired). In a study of early and late Spanish L1 learners of English, Flege, Frieda, Walley and Randazza (1998) found that English native speakers had the longest VOT values, followed by early learners of English and then late learners, indicating that L2 VOT values were not acquired by the late learners and not fully acquired by the early learners. Third language acquisition research (Llama, Cardoso & Collins, 2010) has also found that learners transfer the L1–L2 compromise values into the L3. The question of whether developmental patterns can be found for VOT acquisition remains largely unanswered, although as noted above, developmental sequences may consist of a gradual merging of L1–L2 values, and native-like VOT values for the L2 may be very difficult to acquire.
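The three-way VOT classification attributed to Zampini (2008) above can be expressed as a small function. This is a minimal sketch under stated assumptions: the function name `classify_vot` is hypothetical, the 35 ms boundary is treated as a hard cut-off although the text says long-lag VOT is "usually" over 35 ms, and any negative value is treated as pre-voiced.

```python
def classify_vot(vot_ms: float) -> str:
    """Classify a stop by its voice onset time in milliseconds.

    Assumptions made for this sketch:
      - negative VOT (voicing begins before the release) -> "pre-voiced"
      - 0 to 35 ms inclusive                              -> "short-lag"
      - above 35 ms                                       -> "long-lag"
    """
    if vot_ms < 0:
        return "pre-voiced"
    if vot_ms <= 35:
        return "short-lag"
    return "long-lag"


# Hypothetical measurements, in milliseconds, chosen only to exercise each category.
for value in (-20.0, 10.0, 60.0):
    print(value, classify_vot(value))
```

A learner's ‘compromise’ value would simply be a measurement falling between the typical L1 and L2 values; whether it crosses the category boundary depends on the languages involved.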
In terms of differences in the acquisition of VOT values for specific consonants, as González López (2012) states, “VOT values for voiceless stops vary in terms of [point of articulation, or] PoA²: the further back in the mouth the PoA and the more extended the contact area, the higher and longer the VOT values. Thus, /p/ tends to show a shorter VOT value than /t/, which shows a shorter VOT value than /k/” (p. 247). This suggests that VOT in L2 English is more easily acquired for /p/ over /t/, and /p/ and /t/ over /k/, and short-lag stops in other L2s are more easily acquired for /p/ over /t/ over /k/ for native English speakers (see also Major, 1987). Linguistic environment may also impact VOT values, as shown by Flege et al. (1998), where native speakers of English and early as well as late L2 learners of English all had longer VOT values before high vowels than before low vowels.

² PoA = point of articulation

There has also been a great deal of research on singleton onsets, with developmental sequences influenced both by L1 transfer and markedness. Voiceless stops and nasals are among the earliest L2 consonants to be acquired due to both markedness and ‘positive’ transfer from the L1 (Hansen, 2006; Hecht & Mulford, 1982; Stockman & Pluut, 1999). The voiceless fricatives /s f/ appear to be the earliest fricatives to emerge as they are relatively unmarked; the more marked dental fricatives, affricates, and voiced fricatives tend to emerge later. Overall, voiceless consonants and front consonants also appear to be acquired earlier than voiced and back consonants, respectively. In terms of developmental patterns in onset clusters, shorter onsets are typically acquired before longer onset structures, with a sequence of acquisition of C > CC > CCC. This pattern has been found for codas as well.

Research on the acquisition of L2 syllable codas has focused on both singleton and complex codas. Some evidence suggests a preference for an open syllable structure, specifically a CV syllable structure over CVC or longer margins, as a CV structure is considered to be universally the least marked structure. Despite an apparent preference for a CV structure (see Tarone, 1980), L1 transfer effects can override this universal preference if the L1 allows a CVC structure (Benson, 1988; Hansen, 2004, 2006; Hodne, 1985; Osburne, 1996; Tarone, 1980). A great deal of research has also focused on word-final stops and/or obstruents, particularly with English as the L2, as English has a binary opposition in the voicing of these sounds. There appear to be markedness effects in both perception and production of these sounds, with voiceless sounds easier to perceive and produce than voiced sounds. Evidence suggests that if learners’ L1 does not have a voicing contrast in final position, it is difficult to learn the contrast in the L2 (Broselow, Chen, & Wang, 1998; Eckman, 1981a, 1981b; Edge, 1991; Flege & Davidian, 1984; Major & Faudree, 1996; Smith & Peterson, 2012). There is also the possibility that the less marked front stops are acquired earlier than the back stop.
Cardoso (2011), for example, found that the accuracy of word final stop perception was related to the place of articulation of the stop consonant, with front sounds (e.g., labial, coronal /p t/) easier to perceive than the back stop /k/. Similar to the findings for onset acquisition orders, research on L2 codas indicates that voiceless stops and nasals are among the first consonants to be acquired, followed by the voiceless fricatives /s f/ (common cross-linguistically and relatively unmarked), and that voiced consonants are acquired later. The voiced dental fricative /ð/ and voiced fricatives /v ʒ/ are among the most difficult, and thus latest, structures to be acquired (Hancin-Bhatt, 2000; Hansen, 2001, 2004, 2006; Hecht & Mulford, 1982). It also appears that markedness affects order of acquisition by place of articulation, and that front consonants are easier to acquire than back consonants. Grammatical conditioning affects coda acquisition as well. In research on Mandarin Chinese speakers learning L2 Swedish, Abrahamsson (2001) found that deletion of /r/ in codas was connected to grammatical category. In Swedish, -er,



Developmental Sequences and Constraints   

   63

-or and -ar function as plural suffixes, such as in bilar (cars). Abrahamsson found more /r/ deletion in these suffixes than in words where /r/ was part of the morpheme as in skor (shoes), indicating a lack of acquisition of the plural inflection. Saunders (1987) also notes an effect of grammatical conditioning in his research on Japanese L1 learners’ production of English /ps ts ks/. He found that /s/ was retained in plural codas more often than in third person singular codas, indicating that learners had begun acquiring the plural form but not the third person inflection. Research on /t, d/ deletion in CC and CCC codas supports the idea that deletion patterns are constrained by grammatical conditioning (Bayley, 1996), and that L2 learners are more likely to delete /t, d/ in inflected forms, such as regular past tense verbs, than in monomorphemes.

3.4.2 Vowels

The data on vowels is not as substantive as that on syllable margins and consonants. While the vast majority of research on the acquisition of L2 consonants and syllable margins has focused on production, most of the research on vowels has focused on perception, typically of one or more vowel contrasts (e.g., /i/ vs. /ɪ/, /æ/ vs. /ɛ/) based on the hypothesis that in order to produce an L2 vowel one must first perceive it. It may therefore be difficult to draw conclusions on acquisition orders per se, though a number of observations can be made. The most common vowels in the world’s languages are /i e a o u/ (Ladefoged & Maddieson, 1996), and these are among the first vowels acquired in L1 acquisition, according to Vihman (1996). Therefore, it stands to reason that based on frequency as well as early acquisition in L1 phonology, the vowels /i e a o u/ may be among the earliest to be acquired by L2 learners. Research has examined the acquisition of these sounds in oppositional contrasts, e.g., /i/ vs. /ɪ/, /e/ vs. /æ/, and /u/ vs. /ʊ/, for learners whose L1 has only one of the sounds in each contrast, typically /i e u/ (note that if English is the L2 in question, it has all of the sounds in these contrasts). Findings consistently suggest that one’s L1 vowel inventory is initially transferred onto the emerging L2 vowel repertoire. As Bundgaard-Nielsen, Best and Tyler (2011) note:

There is abundant evidence that the size and organization of the L1 vowel inventory influences how L2 learners perceive the vowel contrasts in their new language… The perceptual difficulty experienced by an L2 learner is partly determined by the size of the L1 inventory relative to the L2 vowel inventory. Thus, it is harder for speakers of L1s with smaller vowel inventories (such as Spanish) to acquire a rich L2 vowel inventory relative to speakers of L1s with larger vowel inventories (such as German and Norwegian). This is because several L2 vowels may be perceived as similar to just one L1 vowel category and consequently will be hard to discriminate (p. 52).

Whether the learner’s L1 has one or more of the vowel contrasts may affect not only the acquisition order of the vowels in contrast, but also the basis on which the contrast itself is established, i.e., whether it is based on duration (length) or on spectral differences such as high-low tongue height, front-backness, or F1 and F2 formants. As an example, Bundgaard-Nielsen et al. (2011) note that Spanish-speaking learners of English do not have an L1 /i – ɪ/ contrast and they cannot distinguish the two vowels. They merge these vowels into one Spanish category, /i/. On the other hand, learners whose L1 is Serbian distinguish between English /i/ and /ɪ/ based on durational differences, as Serbian has both a long and a short /i/. Learners whose L1 is German, which has both /i/ and /ɪ/, can distinguish between the English /i/ and /ɪ/ based on spectral differences, though the German vowels have slightly different formant values than the English /i/ and /ɪ/. In fact, vowel duration appears to be a “more highly salient cue to vowel identity than spectral information” (Bundgaard-Nielsen et al., 2011: 52), especially in initial stages of L2 vowel acquisition. As Flege, Bohn and Jang (1997) note, duration may be used as a cue to differences in vowel contrasts by less experienced learners while more experienced learners focus on spectral cues. In later stages, typically when learners have more L2 experience (Flege, Bohn & Jang, 1997), learners may begin to distinguish length from spectral differences and widen their L2 phonetic space to accommodate new F1/F2 categories. In effect, a developmental pattern may exist for L2 vowel acquisition as follows: in the initial stage, L2 vowel space is based on L1 vowels (with a full transfer of durational and spectral values); therefore, the similar L1/L2 vowels (e.g., /i/, /e/, and /u/) emerge first.
When a vowel contrast in the L2 represents two phonemes where the L1 only has one, such as /i – ɪ/, /e – ɛ/, and /u – ʊ/, initially the two categories may be assimilated to one (e.g., /i/ for both /ɪ/ and /i/). According to Flege’s SLM (1995), these vowels would be perceived as ‘similar’, and therefore among the most difficult to acquire. As learners begin to perceive that there is a vowel contrast, they may first focus on durational differences. This may also be an effect of instruction, as Rojczyk (2010) notes. In later stages, spectral differences may begin to be acquired. Flege’s SLM also predicts that it may be easier to acquire an entirely new vowel that does not have any spectral similarities with an existing L1 vowel. For example, L1 English learners of French may have a relatively easier time acquiring the French sound /y/ than the French /u/, as /y/ may be perceived as a new phone while the French /u/ is similar to English /uː/ and may therefore be assimilated to this sound.



Evidence also suggests that acquisition of an L2 vowel or features of an L2 vowel (such as backness) affects the development of the other L2 vowels. As Baptista (2006) states, a major problem with research on L2 vowels is that most studies examine vowels in isolation from each other, looking at one vowel contrast rather than the entire L2 vowel space. As she explains, previous research has indicated that:

…listeners perceive each vowel in relation to the speaker’s total acoustic vowel space, which they calibrate from the formant frequency patterns in the rest of the ongoing speech. If this is so, L2 research which treats vowels in isolation may be neglecting some very important information. (p. 20)

Along these lines, Baptista (2006) found that the majority of the English learners in her study, all L1 speakers of Brazilian Portuguese, were not able to acquire the /i – ɪ/ contrast because the learners had a difficult time distinguishing /ɪ/ from /eɪ/, which, due to transfer from the learners’ L1, was produced as a higher vowel than it should be in the L2, and thus closer in height to /ɪ/. Across time, several of the eleven participants were able to acquire the /i – ɪ/ contrast, but only after they lowered /eɪ/. This study provides evidence that one’s entire L2 phonology may be affected when one sound is acquired, and that sounds are not learned in isolation from each other. While specific developmental sequences for L2 vowels are thus difficult to establish, it is possible to offer a few insights. As noted earlier, the relatively unmarked and thus more frequent vowels tend to emerge first, e.g., /i e u/. In terms of which vowel contrasts are likely to emerge first, research suggests that although the /i – ɪ/ contrast is difficult for L2 learners (Jia, Strange, Wu, Collado & Guan, 2006), possibly due to the close spectral proximity of the two sounds, this contrast may emerge earlier than other vowel contrasts, and L2 learners can improve their perception and production of /ɪ/, especially with more L2 experience (Munro & Derwing, 2008; Rojczyk, 2010). Other vowel contrasts, such as the /ɛ – æ/ and the /u – ʊ/ vowel contrasts, may be more difficult and thus acquired later. A number of studies have found that even advanced learners have difficulty with the /ɛ – æ/ contrast and do not improve significantly over time (Baptista, 2006; Dreasher & Anderson-Hsieh, 1990; Escudero & Vasiliev, 2011; Flege, Bohn, & Jang, 1997; Jia, Strange, Wu, Collado & Guan, 2006; Kautzsch, 2010; Munro & Derwing, 2008; Rojczyk, 2010).
For example, Jia, Strange, Wu, Collado and Guan (2006) found the English /ɛ – æ/ contrast the second most difficult for their Mandarin Chinese L1 learners of English. Dreasher and Anderson-Hsieh (1990), in research on native speakers of Brazilian Portuguese learning English, also found that /æ/ was the most difficult vowel for their participants to acquire.

The high back short vowel /ʊ/ is also particularly difficult (Bundgaard-Nielsen et al., 2011; Kautzsch, 2010; Munro & Derwing, 2008) and is typically assimilated to /u/. Finally, linguistic environment has also been found to have an impact on vowel production: Escudero and Vasiliev (2011) found that the following linguistic environment affected whether /ɛ/ was assimilated to /e/ or to /a/. Zhou (2010) also found that vowels were more difficult to produce before bilabial and velar stops, and easier before alveolar stops.

3.4.3 Stress

This final section on developmental patterns in L2 phonological acquisition focuses on word and sentence stress. Work in this area has primarily focused on English L2 and on three main influences on the development of L2 stress patterns: syllable structure, grammatical conditioning, and the effect of stress patterns of words that are phonologically similar. Syllable structure has been investigated with regard to type of vowels (long vs. short) and weight of syllables (heavy syllables have more coda consonants whereas light syllables have fewer or none). Statistical analyses of words have found that long vowels attract stress more than short vowels, and that heavy syllables (those with more coda consonants) are more likely to be stressed than those with zero coda or fewer consonants in the coda. Grammatical conditioning refers to the stress distribution across lexical classes, though the focus is most often on English two-syllable nouns and verbs. Corpus linguistic analyses indicate that the majority (almost 74%) of two-syllable words in English are trochaic. Researchers working in this area have focused on whether L2 learners can correctly place stress in two-syllable nouns and verbs, particularly on the first syllable of two-syllable nouns versus on the second syllable for two-syllable verbs. A number of conclusions can be drawn from the research on L2 word stress acquisition. First, there is ample evidence that L2 learners transfer L1 stress patterns and that the L1’s status as a stress-timed or syllable-timed language can affect the learning of new stress patterns: “…[I]t seems that native speakers of a stress language are more likely to show evidence of learning stress patterns than first language speakers of a non-stress language” (Guion, Harada, & Clark, 2004: 208).
Second, learners are able to acquire and use the cue of grammatical category to assign stress in English L2 words, possibly because it is often taught in ESL/EFL classrooms (Archibald, 1993a, 1993b, 1997; Guion, Harada & Clark, 2004). For example, Archibald (1993a) found that L1 Polish learners of English were able to accurately place stress on the first syllable of English two-syllable nouns and on the second syllable of English two-syllable verbs, evidence that
the cue of grammatical category was salient. However, L1 transfer may affect the use of this cue, as Yu and Andruski (2010) illustrate. They found that while native speakers of both English and Mandarin Chinese were able to accurately recognize trochaic and iambic stress patterns, the native speakers of English had a faster reaction time for identifying trochaic over iambic stress patterns while for Mandarin Chinese L1 learners of English, the pattern was reversed. The cue of syllable structure, in particular stress on a long over a short vowel, also appears to be salient. Guion, Harada and Clark (2004) examined stress placement in English by early and late Spanish L1-English L2 bilinguals and found that nouns received more initial stress than verbs, and that a long vowel also received more stress, regardless of whether it was in the initial or final syllable. A study by Guion (2005) on Korean L1-English L2 early and late bilinguals found that both groups placed more stress on the initial syllable for nouns as opposed to verbs, and on the long vowel in the initial syllable and final syllable. However, the effect of lexical class overall was smaller for the early bilinguals as compared to the native speakers, and even smaller for the late bilinguals. Studies of sentence stress have focused on whether L2 learners are able to acquire sentence stress cues such as F0 (fundamental frequency, the lowest frequency of a waveform), vowel length, and intensity to signal the difference between stressed and unstressed words. Findings indicate that L2 learners are able to use intensity and fundamental frequency to differentiate unstressed and stressed words in English. Intensity, in particular, appears to be easier to acquire, and thus perhaps constitutes an earlier stage in the sentence stress acquisition sequence according to several studies.
For example, Chen, Robb, Gilbert and Lerman (2001) found that there were no differences in intensity between stressed words for the Mandarin Chinese speakers and native speakers of English, indicating that this aspect of sentence stress may be the easiest, and thus earliest, to acquire. Ng and Chen (2011), in a study of Cantonese L1 speakers of English, also found that intensity of stressed syllables in sentences was the same for the native speakers and the Cantonese speakers of English. Like native speakers, the Cantonese speakers of English were able to produce the stressed syllables with a higher F0 than unstressed words. In summary, it may be that, particularly for early learners, the different cues for stress marking are acquired as follows: lexical class first, followed by syllable structure. Within syllable structure, learners may first use the long vowel cue, followed by the cue of a consonant cluster (heavy syllable) over a singleton (light syllable). It also appears that learners can acquire a variety of cues in the L2 for stress marking in both words and sentences.
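The hypothesised ordering of word-stress cues summarised above can be illustrated with a small toy model. The sketch below is not taken from any of the studies cited. It assumes, purely for illustration, that the order in which cues are acquired can be treated as the order in which they are consulted, and that a learner at a given stage has access only to the first n cues. The names Syllable, CUE_ORDER, and predict_stress, and the encoding of a syllable as a vowel-length flag plus a coda-consonant count, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    long_vowel: bool       # True if the syllable has a long vowel
    coda_consonants: int   # number of coda consonants (0 = open syllable)


# Assumed order in which cues become available to the learner:
# lexical class first, then vowel length, then syllable weight.
CUE_ORDER = ["lexical_class", "long_vowel", "heavy_syllable"]


def predict_stress(syllables: List[Syllable],
                   lexical_class: Optional[str],
                   cues_acquired: int) -> Optional[int]:
    """Return the index (0 or 1) of the syllable predicted to carry stress
    in a two-syllable word, or None if no acquired cue decides."""
    if len(syllables) != 2:
        raise ValueError("this toy model only handles two-syllable words")

    # Consult only the cues the learner is assumed to have acquired so far.
    for cue in CUE_ORDER[:cues_acquired]:
        if cue == "lexical_class":
            if lexical_class == "noun":
                return 0   # nouns: stress on the first syllable
            if lexical_class == "verb":
                return 1   # verbs: stress on the second syllable
        elif cue == "long_vowel":
            long_positions = [i for i, s in enumerate(syllables) if s.long_vowel]
            if len(long_positions) == 1:
                return long_positions[0]   # the only long vowel attracts stress
        elif cue == "heavy_syllable":
            first, second = (s.coda_consonants for s in syllables)
            if first != second:
                return 0 if first > second else 1   # heavier coda attracts stress
    return None


if __name__ == "__main__":
    # A nonce word: light first syllable, long vowel in the second syllable.
    nonce = [Syllable(long_vowel=False, coda_consonants=0),
             Syllable(long_vowel=True, coda_consonants=1)]
    print(predict_stress(nonce, "noun", cues_acquired=1))  # 0
    print(predict_stress(nonce, None, cues_acquired=2))    # 1
```

In this sketch a learner who has acquired only the lexical class cue stresses the first syllable of any noun, while the vowel-length and syllable-weight cues matter only for words whose lexical class is unknown or is neither noun nor verb. Whether real learners consult cues in such a strict sequence is an open question that the model does not address.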

3.5 Implications for future research

While the existing research on the acquisition of an L2 sound system has led to important insights on how learners perceive, process, and produce new sounds, the research is far from complete. As yet, there exists no model of L2 phonology that adequately incorporates and accounts for the factors that influence L2 phonological acquisition. As this synthesis makes clear, research and theories of L2 phonology need to expand their focus on transfer from the L1 alone to include other languages learned. It is also clear that both linguistic environment and grammatical conditioning can exert a powerful effect on L2 phonology, sometimes overriding transfer and universal constraints, indicating that phonological developmental phenomena are influenced by linguistic factors other than phonology. Finally, the nexus between language-internal and language-external factors is underresearched but requires more attention and focus if we are to truly understand how learners acquire and use the L2 sound system, as ‘errors’ in production may not reflect lack of acquisition, but rather the use of an L1 sound in order to avoid the L2 feature and/or to signal an L1 identity.

The following suggestions are thus given for future research on developmental sequences:

– The focus of both the L1 and the L2 should be expanded to other languages as the majority of the research has focused on English as the L2 and often on languages such as Spanish, Brazilian Portuguese, Cantonese, or Mandarin Chinese as the L1. Such an expansion would help address the question of whether these developmental sequences are universal across L2 acquisition or restricted to particular L2s.
– Data collection tasks should be carefully developed to enable the researcher to examine the effect of grammatical conditioning and linguistic environment as these constraints have been found to significantly affect L2 acquisition and production. While word lists and reading passages have merit as they enable the researcher to elicit particular L2 features and control the linguistic and grammatical environment, more naturalistic data should also be collected to see how learners actually produce the L2 without the support and time to monitor production.
– Questionnaires and surveys may be used to elicit information about language-external factors that impact L2 acquisition and use. Information gathered from these instruments can then be used to construct both closed and open interview questions, which can then elicit more naturalistic linguistic data, in addition to allowing the researcher to probe further into the social factors that affect the learner. The effect of the interlocutor during these interviews should also be examined, as it is likely that the L2 learner will use different features of the L1 or L2 based on her/his perceived identification – ethnically and/or linguistically – with the interlocutor.
– Data collection should take place over several sessions rather than being only a one-time occurrence so that both linguistic and social findings from initial data collection sessions can be explored in further detail in follow-up sessions. This can be particularly useful if the researcher finds that certain variants may be used for social marking in the L2 learner’s social context and/or if certain sounds are used variably by the L2 learner. In this way, the researcher can determine what causes the variation, e.g., whether language-internal and/or language-external constraints such as preceding linguistic environment or ethnic/peer group affiliation affect which L2 or L1 sound or feature is used.

To conclude, while we know a great deal about developmental sequences and the effect of language-internal factors on L2 acquisition, and are beginning to learn about language-external constraints on L2 use, these different research threads have not yet been connected into a cohesive model of L2 phonological acquisition. A more expansive research model, as suggested above, will hopefully enable us to enrich our understanding of the complex phenomenon of L2 acquisition and use.

References

Abrahamsson, N. 1999. Vowel epenthesis of /sC(C)/ onsets in Spanish/Swedish interphonology: A longitudinal case study. Language Learning 49(3), 473–508.
Abrahamsson, N. 2001. Universal constraints on L2 coda production: The case of Chinese/Swedish interphonology. In Acquiring L2 syllable margins: Studies on the simplification of onsets and codas in interlanguage phonology, 1–38. Stockholm: Center for Research on Bilingualism.
Adamson, H.D., & Regan, V. 1991. The acquisition of community speech norms by Asian immigrants learning English as a second language. Studies in Second Language Acquisition 13, 1–22.
Archibald, J. 1993a. Language learnability and L2 phonology: The acquisition of metrical parameters. Dordrecht: Kluwer.
Archibald, J. 1993b. The learnability of English metrical parameters by adult Spanish speakers. IRAL 31(1), 129–142.
Archibald, J. 1997. The acquisition of English stress by speakers of nonaccentual languages: Lexical storage versus computation of stress. Linguistics 35, 167–181.
Baptista, B.O. 2006. Adult phonetic learning of a second language vowel system. In B. Baptista & M. Watkins (Eds.), English with a Latin beat: Studies in Portuguese/Spanish-English interphonology, 19–40. Amsterdam: John Benjamins Publishing Company.
Bayley, R. 1996. Competing constraints on variation in the speech of adult Chinese learners of English. In R. Bayley & D.R. Preston (Eds.), Second language acquisition and linguistic variation, 98–120. Amsterdam: John Benjamins Publishing Company.
Beebe, L.M., & Zuengler, J. 1985. Accommodation theory: An explanation for style shifting in second language dialects. In N. Wolfson & E. Judd (Eds.), Sociolinguistics and language acquisition, 195–213. Rowley, Mass.: Newbury House Publishers Inc.
Benson, B. 1988. Universal preference for the open syllables as an independent process in interlanguage phonology. Language Learning 38(2), 221–235.
Bohn, O.-S., & Flege, J.E. 1992. The production of new and similar vowels by adult German learners of English. Studies in Second Language Acquisition 14, 131–158.
Broselow, E., Chen, S.-I., & Wang, C. 1998. The emergence of the unmarked in second language phonology. Studies in Second Language Acquisition 20, 261–280.
Broselow, E., & Xu, Z. 2004. Differential difficulty in the acquisition of second language phonology. International Journal of English Studies 4(2), 135–163.
Bundgaard-Nielsen, R.L., Best, C.T., & Tyler, M.D. 2011. Vocabulary size matters: The assimilation of second-language Australian English vowels to first-language Japanese vowel categories. Applied Psycholinguistics 32(1), 51–67.
Cardoso, W. 2008. The development of sC onset clusters in interlanguage: Markedness vs. frequency effects. In R. Slabakova et al. (Eds.), Proceedings of the 9th generative approaches to second language acquisition conference (GASLA 2007), 15–29. Somerville, MA: Cascadilla Proceedings Project.
Cardoso, W. 2011. The development of coda perception in second language phonology: A variationist perspective. Second Language Research 27(4), 433–465.
Carlisle, R.S. 1997. The modification of onsets in a markedness relationship: Testing the interlanguage structural conformity hypothesis. Language Learning 47(2), 327–361.
Chen, Y., Robb, M., Gilbert, H., & Lerman, J. 2001. Vowel production by Mandarin speakers of English. Clinical Linguistics & Phonetics 15(6), 427–440.
Dickerson, L.J., & Dickerson, W. 1977. Interlanguage phonology: Current research and future directions. In S.P. Corder & E. Roulet (Eds.), Actes du 5ème Colloque de Linguistique Appliquée de Neuchâtel: The notions of simplification, interlanguages and pidgins and their relation to second language acquisition, 18–29. Geneva: Droz.
Dreasher, L.M., & Anderson-Hsieh, J. 1990. Universals in interlanguage phonology: The case of Brazilian ESL learners. Papers and Studies in Contrastive Linguistics 26, 69–92.
Eckman, F.R. 1981a. On predicting phonological difficulty in second language acquisition. Studies in Second Language Acquisition 4(1), 18–30.
Eckman, F.R. 1981b. On the naturalness of interlanguage phonological rules. Language Learning 31(1), 195–216.
Edge, B.A. 1991. The production of word-final voiced obstruents in English by L1 speakers of Japanese and Cantonese. Studies in Second Language Acquisition 13, 377–393.
Escudero, P., & Vasiliev, P. 2011. Cross-language similarity predicts perceptual assimilation of Canadian English and Canadian French vowels. Journal of the Acoustical Society of America 130(5), 277–283.
Flege, J.E. 1987. The production of ‘new’ and ‘similar’ phones in a foreign language: Evidence for the effect of equivalence classifications. Journal of Phonetics 15, 47–65.
Flege, J.E. 1991. Age of learning affects the authenticity of voice-onset time (VOT) in stop consonants produced in a second language. Journal of the Acoustical Society of America 89, 395–411.
Flege, J.E. 1995. Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Theoretical and methodological issues, 233–277. Timonium, MD: York Press.
Flege, J.E., Bohn, O.-S., & Jang, S. 1997. Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics 25, 437–470.
Flege, J.E., & Davidian, R.D. 1984. Transfer and developmental processes in adult foreign language speech production. Applied Psycholinguistics 5, 323–347.
Flege, J.E., Frieda, E.M., Walley, A.C., & Randazza, L.A. 1998. Lexical factors and segmental accuracy in second language speech production. Studies in Second Language Acquisition 20(2), 155–187.
Flege, J.E., & Hillenbrand, J. 1984. Limits of pronunciation accuracy in adult foreign language speech production. Journal of the Acoustical Society of America 76, 708–721.
Flege, J.E., Takagi, N., & Mann, V. 1996. Lexical familiarity and English-language experience affect Japanese adults’ perception of /r/ and /l/. Journal of the Acoustical Society of America 97, 3125–3134.
Gatbonton, E. 1975. Systematic variations in second language speech: A sociolinguistic study. Unpublished doctoral dissertation. McGill University, Montreal.
Gatbonton, E. 1978. Patterned phonetic variability in second language speech: A gradual diffusion model. Canadian Modern Language Review 34, 335–347.
Gonzáles-López, V. 2012. Spanish and English word-initial voiceless stop production in code-switched vs. monolingual structures. Second Language Research 28(2), 243–263.
Greenberg, J. 1965. Some generalizations concerning initial and final consonant sequences. Linguistics 18, 5–32.
Greenberg, J. 1966. Language universals: With a special reference to feature hierarchies. Berlin: Mouton de Gruyter.
Guion, S. 2005. Knowledge of English word stress patterns in early and late Korean-English bilinguals. Studies in Second Language Acquisition 27(4), 503–533.
Guion, S.G., Harada, T., & Clark, J.J. 2004. Early and late Spanish-English bilinguals’ acquisition of English word stress patterns. Bilingualism: Language and Cognition 7(3), 207–226.
Gut, U. 2010. Cross-linguistic influence in L3 phonological acquisition. International Journal of Multilingualism 7(1), 19–38.
Halle, P.A., Chang, Y.-C., & Best, C.T. 2004. Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics 32(3), 395–421.
Hancin-Bhatt, B. 2000. Optimality in second language phonology: Codas in Thai ESL. Second Language Research 16(3), 201–232.
Hancin-Bhatt, B., & Bhatt, R.M. 1997. Optimal L2 syllables: Interactions of transfer and developmental effects. Studies in Second Language Acquisition 19, 331–378.
Hansen, J.G. 2001. Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics 22(3), 338–365.
Hansen, J.G. 2004. Developmental sequences in the acquisition of English L2 syllable codas. Studies in Second Language Acquisition 26, 85–124.
Hansen, J.G. 2006. Acquiring a non-native phonology: Linguistic constraints and social barriers. London: Continuum.
Hansen Edwards, J.G. 2008. Social factors and variation in production in L2 phonology. In J.G. Hansen Edwards & M.L. Zampini (Eds.), Phonology and second language acquisition, 251–279. Amsterdam: John Benjamins Publishing Company.
Hansen Edwards, J.G. 2011. Deletion of /t,d/ and the acquisition of linguistic variation by second language learners of English. Language Learning 61(4), 1256–1301.
Hecht, B.F., & Mulford, R. 1982. The acquisition of a second language phonology: Interaction of transfer and developmental factors. Applied Linguistics 3, 313–328.
Hodne, B. 1985. Yet another look at interlanguage phonology: The modification of English syllable structure by native speakers of Polish. Language Learning 35(3), 405–417.
Ingram, D. 1999. Phonological acquisition. In M. Barrett (Ed.), The development of language, 73–97. London: UCL Press.
Jakobson, R. 1968. Child language, aphasia and phonological universals. Berlin: Mouton de Gruyter.
Jia, G., Strange, W., Wu, Y., Collado, J., & Guan, Q. 2006. Perception and production of English vowels by Mandarin speakers: Age-related differences vary with amount of L2 exposure. Journal of the Acoustical Society of America 119(2), 1118–1130.
Kautzsch, A. 2010. Exploring L1 transfer in German learners of English: High front vowels, high back vowels and the bed/bad distinction. Research in Language 8, 63–84.
Ladefoged, P., & Maddieson, I. 1996. The sounds of the world’s languages. Oxford: Blackwell.
Lado, R. 1957. Linguistics across cultures. Ann Arbor, MI: The University of Michigan Press.
Llama, R., Cardoso, W., & Collins, L. 2010. The influence of language distance and language status on the acquisition of L3 phonology. International Journal of Multilingualism 7(1), 39–57.
Major, R.C. 1987. Phonological similarity, markedness, and rate of L2 acquisition. Studies in Second Language Acquisition 9, 63–82.
Major, R.C., & Faudree, M.C. 1996. Markedness universals and the acquisition of voicing contrasts by Korean speakers of English. Studies in Second Language Acquisition 18, 69–90.
Munro, M.J., & Derwing, T.M. 2008. Segmental acquisition in adult ESL learners: A longitudinal study of vowel production. Language Learning 58(3), 479–502.
Ng, M.L., & Chen, Y. 2011. Proficiency in English sentence stress production by Cantonese speakers who speaker English as a second language (ESL). International Journal of Speech-Language Pathology 13(6), 526–535. Ohara, Y. 2001. Finding one’s voice in Japanese: A study of the pitch levels of L2 users. In A. Pavlenko, A. Blackledge, I. Piller & M. Teutsch-Dwyer (Eds.), Multilingualism, second language learning, and gender, 231–254. Berlin: Mouton de Gruyter. Osburne, A.G. 1996. Final cluster reduction in English L2 speech: A case study of a Vietnamese speaker. Applied Linguistics 17(2), 164–181. Piper, T. 1984. Observations on the second-language acquisition of the English sound system. The Canadian Modern Language Review 40(5), 542–551. Rojczyk, A. 2010. Forming new vowel categories in second language speech: The case of Polish learners’ production of English /I/ and /e/. Research in Language 8, 85–97. Saunders, J. 1987. Morphophonemic variation in clusters in Japanese English. Language Learning 37, 247–270. Smith, B.L., & Peterson, E.A. 2012. Native English speakers learning German as a second language: Devoicing of final voiced stop targets. Journal of Phonetics 40(1), 129–140. Stockman, I.J., & Pluut, E. 1999. Segment composition as a factor in the syllabication errors of second-language speakers. Language Learning 49(1), 185–209. Tarone, E. 1980. Some influences on the syllable structure of interlanguage phonology. IRAL XVIII(2), 138–152.




Trofimovich, P., Collins, L., Cardoso, W., White, J., & Horst, M. 2012. A frequency-based approach to L2 phonological learning: Teacher input and student output in an intensive ESL context. TESOL Quarterly 46(1), 176–187.
Vihman, M.M. 1996. Phonological development: The origin of language in the child. Cambridge, MA: Blackwell.
Weinberger, S.H. 1987. The influence of linguistic context on syllable simplification. In G. Ioup & S.H. Weinberger (Eds.), Interlanguage phonology: The acquisition of a second language sound system, 401–416. Cambridge, MA: Newbury House Publishers.
Young, R. 1988. Variation and the interlanguage hypothesis. Studies in Second Language Acquisition 10, 281–302.
Yu, V.Y., & Andruski, J.E. 2010. A cross-language study of perception of lexical stress in English. Journal of Psycholinguistic Research 39, 323–344.
Zampini, M.L. 2008. L2 speech production research. In J.G. Hansen Edwards & M.L. Zampini (Eds.), Phonology and second language acquisition, 219–249. Amsterdam: John Benjamins.
Zamuner, T.S., Gerken, L., & Hammond, M. 2004. Phonotactic probabilities in young children’s speech production. Journal of Child Language 31, 515–536.
Zhou, W. 2010. The production of L2 vowels by Chinese EFL learners: An acoustic perspective on pre-fortis clipping. Asian Journal of English Language Teaching 20, 81–94.

Lucy Pickering and Amanda Baker

4 Suprasegmental Measures of Accentedness

Hansen (2006: 31) states that “there is a scarcity of research on social constraints on the development of an L2 phonology even though most researchers acknowledge the importance of social context in language learning.” We argue that this is particularly the case with regard to the suprasegmental measures of speech. Despite increased interest in the role of prosody in everyday interaction and recognition of its importance in establishing cooperative and successful communication, work in prosodic variation in L2 speech in relation to social factors continues to trail behind other aspects of phonological development research. This chapter begins with an examination of some of the historical reasons for this research gap. The next sections involve a close look at the current view of accentedness in relation to comprehensibility before describing in detail the suprasegmental measures that have contributed to perceptions of L2 accentedness. We then focus on the role of social factors with regard to L2 production of suprasegmentals specifically. In the final section, we consider future directions for this area of research and teaching in the field of L2 pronunciation.

4.1 Historical Contextualization of Accentedness & Suprasegmentals

Accentedness has been a central concern of the field of L2 pronunciation since its inception. Historically, nonnative-like accentedness was viewed negatively, with accent reduction serving as the primary goal of L2 education. Approaches to addressing accentedness in pronunciation teaching, however, were highly constrained due to trends in linguistic analysis and associated pedagogical practice. Instructional approaches derived from Structuralism, such as Audiolingualism in the US and the Oral Proficiency Approach in the UK (Richards & Rodgers, 1986), promoted pronunciation “accuracy” and intensive practice involving repetition and drills that privileged segmental structure (Celce-Murcia, Brinton, Goodwin & Griner, 2010). This focus did not typically extend to suprasegmental structures, which many scholars did not view as a grammatical system and which were therefore relegated to the realm of paralinguistic investigation (Chomsky & Halle, 1968; Sapir, 1921).

There were, however, some early pedagogical treatments of intonation on both sides of the Atlantic. Pike (1945) developed a phonemic model that Chun (2002: 25) describes as “the hallmark of American intonation analysis for the next two decades and beyond.” He proposed four pitch phonemes identified relative to each other (extra-high, high, mid, and low) that form significant components of the intonation contour. Intonational meaning in natural speech was connected primarily to speaker attitude. In the UK, early descriptions of ‘tunes’ were followed by an analysis that divided the intonational unit into a pre-head, head, and nucleus on which falling or rising nuclear tones occurred, and, as with Pike, attitudinal function was considered to be primary (O’Connor & Arnold, 1961).

As theoretical interest in linguistics shifted toward analysis of discourse (Hymes, 1972), prosody became less marginalized. In a comprehensive treatment of intonation in English, Halliday (1967, 1970) proposed a fully grammaticalized model comprising three separate systems: tonality (tone unit divisions), tonicity (internal structure of tone units with particular emphasis on tonic or nuclear syllables) and tone (pitch movement on the final tonic syllable). Within the system of tone, five primary tone choices were possible, and Halliday (1970: 21) describes their role as belonging to the “realm of grammar (and within grammar, the realm of syntax)”. A secondary tonal system identified three pitch levels whose function was to recognise affective meaning. Patterns were labelled with adjectival glosses such as “forceful or querulous” and “awestruck or disappointed” (Halliday, 1970: 32f).

Increasingly, however, discourse-based models recognized a multi-functional role for intonation in discourse beyond grammatical and attitudinal components (Chun, 1988). In applied linguistics, this expanded notion of the communicative value of intonation was most thoroughly expounded in David Brazil’s (1985) model of Discourse Intonation. Brazil proposed that intonation structure directly contributes to the pragmatic message of the discourse by the use of intonational cues to link information to the world or context of the hearer.
Participants in the interaction negotiate toward a state of convergence that allows for successful communication. In this respect, pitch choices have direct significance for indexical or non-referential functions in discourse, including the use of pitch variation to communicate sociolinguistic information such as status differences, solidarity or social distance between interlocutors. Simultaneously, these non-referential functions of prosodic cues were being investigated in discourse models of sociocultural interaction, particularly in the cross-cultural communication work of John Gumperz (1982). Using contextualization cues evident at all levels of the discourse including the prosodic, Gumperz proposed that participants rely on a shared linguistic and sociocultural background for their interpretation. These are tacit behaviours and not easily retrieved by native speakers on a conscious, analytical level. Gumperz’s own work (1982, 1983, 1992) investigating interactions between Indian English speakers and British/North American English speakers reveals that Indian English




prosodic conventions frequently lead North American/British participants to view Indian English speakers as discourteous, aggressive or misleading. Thus, Gumperz, Kaltman and O’Connor (1984: 5) characterise intonation as “among the most important of the devices that accompany cohesion in spoken interaction”. Despite the promising direction of this research, its pedagogical impact was minimal. With the exception of somewhat isolated calls to treat intonation and discourse (see, for example, Bradford, 1988 and Pennington & Richards, 1988) throughout the 1980s and 1990s, the role of suprasegmentals remained a low priority in the field of L2 pronunciation.

4.2 Accentedness versus Comprehensibility

An important turning point for the role of suprasegmentals in L2 speech emerged from the discovery that the relationship between accent and comprehensibility (i.e., increased non-native accent resulted in decreased listener comprehensibility) had been vastly oversimplified. Empirical research models developed by Tracey Derwing, Murray Munro and their colleagues meticulously separated accentedness from independent factors of intelligibility and comprehensibility. Within this framework, foreign-accented speech was defined as “non-pathological speech that differs in some noticeable respects from native speaker pronunciation norms” (Munro & Derwing, 1995: 290). This is distinguished from comprehensibility which is defined as “listeners’ perceptions of difficulty in understanding particular utterances” (1995: 291). In a group of studies, Derwing, Munro and other researchers have consistently shown that comprehensibility is not correlated with accentedness ratings by native speaker (NS) listener judges; in fact, speech is often rated as highly intelligible despite a wide range in accentedness ratings (Anderson-Hsieh, Johnson & Koehler, 1992; Derwing & Munro, 1997), and speakers who succeed in reducing the degree of foreignness in their accents (based on expert NS raters) may still be heard as incomprehensible by lay listeners (Munro & Derwing, 1995).

In much of this work, accentedness is measured as an undifferentiated construct, e.g. ‘no accent’ to ‘very strong accent’ (Lima, 2012). However, some studies have teased apart suprasegmental and segmental components of accent, and these show the importance of intonation, stress, and temporal measures such as speech rate. Munro and Derwing (1995) reported that intonation may play a slightly larger role in NSs’ assessment of both comprehension and accent in comparison to segmentals.
Both Derwing and Munro (1997) and Derwing and Rossiter (2003) further concluded that L2 comprehensibility is improved for NS listeners who perceive enhanced prosodic proficiency in the non-native speakers’ (NNS) speech. Derwing and Munro (1997: 15) state that improvement in NNS comprehensibility “is more likely to occur with improvement in grammatical and prosodic proficiency than with a sole focus on correction of phonemic errors”.

Within this research paradigm, the suprasegmental components of accent (e.g., “goodness of prosody” which primarily considers intonation and rhythm) are measured impressionistically by listener judges using Likert scales or sets of descriptors (e.g., Anderson-Hsieh, Johnson & Koehler, 1992; Derwing & Munro, 1997; Derwing, Rossiter, Munro & Thomson, 2004; Munro & Derwing, 1995). Although there is nothing “inherently pejorative about an impressionistic methodology” (Koffi, 2012: 227), these kinds of subjective auditory ratings may be compromised by their dependence on raters’ backgrounds and experiences, which may also threaten their internal reliability (Edge & Richards, 1998; Kang & Pickering, 2011; Kang & Rubin, 2009). As Kang, Rubin and Pickering (2010: 564) note, we must beware of a “sort of tautological regression that uses human judgements of speech as the criterion for assessing bias in human judgements of speech”.

Innovations in technology now allow for extensive objective measurement of acoustic phonetic variables using computer-assisted acoustic analysis programs (Ingram & Park, 1997; Levis & Pickering, 2004) such as PRAAT (http://www.praat.org/) or WASP (http://www.phon.ucl.ac.uk/resource/sfs/wasp.htm). In addition, they have allowed an expansion of the field of analysis to include detailed assessment of the role of pitch change via fundamental frequency (F0), and variations in volume and speech rate which would be inaccessible to listener judges. In the following section we detail the suprasegmental features of accent currently under consideration in the literature through the use of both auditory and acoustic measurement.

4.3 Suprasegmental Features as Components of Accentedness

As a component of the phonological system of the L2, non-native prosody clearly contributes to perceptions of foreign accent; as Jilka (2007: 77) points out, however, “the non-trivial difficulty lies in two tasks: accurately identifying exactly those intonational deviations that actually constitute relevant manifestations of foreign accents, and ascertaining the respective relative significance of these deviations”. In the following discussion, we take a broad view of suprasegmental features and operationalize them as a comprehensive set of parameters comprising pitch, stress, pause and rate measures (Iwashita, 2010; Kang et al., 2010; Trofimovich & Baker, 2006). A solid understanding of these terms is important as many of these features are central to our subsequent discussion of the role of social factors in the L2 pronunciation of suprasegmentals.




4.3.1 Pitch measures

Pitch measures comprise what is typically defined as intonation, or the linguistically meaningful use of pitch movement at the level of the spoken phrase or unit (as opposed to the lexical level). Although theoretical frameworks vary, models of intonation in discourse are componential and identify sites within intonation or tone units which carry significant pitch movement (Brazil, 1997; Pierrehumbert & Hirschberg, 1990). The Pierrehumbert and Hirschberg model (based on Pierrehumbert, 1980) comprises a series of static tones or tonal targets that, together with a series of phonetic implementation rules, determine the shape of the F0 contour. There are three groups of tones: pitch accents, phrase accents, and boundary tones. Pitch accents occur on stressed or salient syllables and mark the information status of the item. For example, high pitch accents mark the ‘new’ information in the following example:

The train leaves at seven
H* H* H*
(1990: 286)

The second group of tones, phrase accents, associate with the right edge, or closing boundary, of either intermediate phrases or intonational phrases (L, H). The last group of tones, boundary tones, are associated with the right edge of intonational phrases. They are symbolized by (L%, H%). The combinations of phrase accents and boundary tones create four possible ‘complex’ tones at the end of an utterance (LL%, LH%, HL%, and HH%). The following example shows a typical declarative contour with LL%.

The train leaves at seven
H* H* H* L L%
(1990: 286)
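The way phrase accents and boundary tones combine into the four ‘complex’ tones can be sketched in a few lines of Python. This is only an illustration of the combinatorics described above; the variable names are our own and are not part of Pierrehumbert and Hirschberg’s notation.

```python
from itertools import product

# Tone inventories as given in the text; the variable names are illustrative.
PHRASE_ACCENTS = ["L", "H"]
BOUNDARY_TONES = ["L%", "H%"]

# Each utterance-final 'complex' tone pairs one phrase accent with one boundary tone.
complex_tones = [accent + boundary
                 for accent, boundary in product(PHRASE_ACCENTS, BOUNDARY_TONES)]

print(complex_tones)  # ['LL%', 'LH%', 'HL%', 'HH%']
```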

Phrase accents and final boundary tones also indicate whether a section of the discourse is complete, or if further discourse is required for its interpretation. Pierrehumbert and Hirschberg (1990) attach pragmatic values to the tonal combinations in which tonal targets make up a tune “to convey a particular relationship between an utterance, currently perceived beliefs of the hearer or hearers and anticipated contribution of subsequent utterances” (1990: 271). For example, the following final contour – an H* pitch accent followed by a L phrase accent and a L% boundary tone – conveys what the speaker believes to be “new information” to the hearer:


Legumes are a good source of vitamins
H* L L%

(1990: 272)

In contrast, within Brazil’s model, tone unit boundaries are identified on the basis of pitch level and movement on stressed or prominent syllables, which are divided into two categories: an onset syllable carrying what Brazil terms “key” choice (a high ⇑, mid ⇒, or low ⇓ level pitch accent) and a final tonic syllable carrying the termination pitch and tone choice (a high, mid or low rising, falling or level tone). With regard to key, high key choices indicate that the speaker is marking the spoken material as contrastive or highlighted in relation to surrounding information, as shown in the following example:

He took the exam and ↑ FAILED
He did not pass, as you might have expected: contrastive
(Sinclair & Brazil, 1982: 144)

Mid key choices are glossed as additive, whereas low key signifies a reformulation or ‘equative’ function indicating that no new information is added. Together, key and termination choice mark larger units called pitch sequences or speech paragraphs. These comprise a stretch of consecutive tone units that fall between two low termination choices and form a topic-related intonational paragraph. Paragraph beginnings are marked with an initial high key and paragraph endings are closed with a low final termination.

Tone choice is carried on the prominent syllable with the maximum sustained pitch movement and reflects the common ground between speakers. Tones that end in a falling movement (fall ↘ or rise-fall ↗↘) are used to indicate new or asserted information, and tones with a rising pattern (rise ↗ or fall-rise ↘↗) mark material as assumed to be known or recoverable. A fifth, level (or neutral) tone is used to temporarily withdraw from the context of the interaction or to mark routinized language behaviour (e.g., while reading aloud). The following example illustrates typical tone and key interactions in a single teacher-student exchange which the teacher closes with a final evaluative assertion of ‘good’ using a falling tone and low termination:

T: ⇒ What’s the final ↗ANSWER?
S: ⇒ ↗ SIXTEEN?
T: ⇒ ↘ SIXTEEN ⇓↘ GOOD

Both pitch height (high, mid or low) and pitch movement (rising, falling or level) have been shown to contribute to perceived accentedness. NNSs have been consistently shown to manifest a compressed overall pitch range and a smaller pitch declination rate in comparison with NSs (Jenner, 1976; Mennen, 1998; Pickering, 2004; Willems, 1982). These NNS characteristics can compromise speech paragraph structure, as this is perceived by the NS listener in part through an initial extra-high pitch reset and a low pitch close or termination. Pickering (2004) found that non-native English speaking teaching assistants were unable to manipulate key choices consistently to create intonational paragraphs. In conjunction with a narrower pitch range, this obfuscated the prosodic structure of their spoken discourse in the classroom and, as a result, impacted NS student learning in lectures.

Studies have also shown that L2 speakers may use unexpected choices of tone, e.g., a replacement of rises with falls and vice versa (Mennen, 2007: 55). In Wennerstrom (1994, 1997), who uses Pierrehumbert’s intonation model, Japanese, Thai and Chinese speakers used low, falling tones at boundaries between related propositions where NS listeners would anticipate rising or mid-level tones. Pirt (1990) reported similar findings in a study of Italian learners, and Hewings (1995) found a preference for falling tones in the speech of L2 learners from Korea, Greece and Indonesia. Such uniformly falling tones can be problematic in contexts where NSs commonly use rising tones to avoid an impression of rudeness or animosity, or to reduce face threats when expressing disagreement (Pickering, 2001). Miscommunication between speakers related to patterns of intonation can be pervasive and lead to negative stereotyping. Gumperz (1982), for example, describes how the unexpected use of falling tones in the speech of Indian English speakers in a workplace cafeteria in Britain caused them to be perceived as uncooperative by their British English-speaking interlocutors.
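To make the notion of a ‘compressed’ pitch range concrete, the sketch below expresses the span between the lowest and highest F0 values of an utterance in semitones, using the standard relation of 12 semitones per doubling of frequency. The studies cited above do not necessarily quantify pitch range this way; the function name, the convention that 0 Hz marks an unvoiced frame, and the sample F0 values are all assumptions made for illustration.

```python
import math


def pitch_range_semitones(f0_hz):
    """Return the span between the lowest and highest voiced F0 values in semitones.

    Assumption: frames with a value of 0 are unvoiced and are ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the F0 track")
    # One octave (a doubling of frequency) corresponds to 12 semitones.
    return 12 * math.log2(max(voiced) / min(voiced))


# Hypothetical F0 tracks (Hz), invented purely for illustration.
wide_track = [220.0, 180.0, 0.0, 150.0, 110.0]    # spans a full octave
narrow_track = [200.0, 190.0, 0.0, 170.0, 150.0]  # a more compressed range

wide_range = pitch_range_semitones(wide_track)
narrow_range = pitch_range_semitones(narrow_track)
```

Expressing the range on a logarithmic scale rather than in raw Hz is a design choice that makes speakers with different baseline pitch more directly comparable.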

4.3.2 Stress measures

Stress measures may focus on word stress or sentence stress, i.e., the difference between the syllable stressed in the word form umBRElla and the prominent word (green) in the sentence ‘I want my GREEN umbrella’. In an analysis of nonnative teaching presentations, Gallego (1990) found that word stress errors were responsible for 35.8 % of pronunciation problems as identified by ESL specialists, although no specific examples were given. Field (2005) conducted an experimental study in which lexical stress was manipulated so that bisyllabic words were presented in three conditions: their standard form, with a shift of stress and weakened vowel quality, and with a shift of stress and full vowel quality. In general, Field found that intelligibility was more frequently compromised when lexical stress was shifted rightward for both native and nonnative listeners.


Sentence stress, or prominence, is a significant component in discourse intonation structure. Unlike Field, who tested words in their isolated citation form, Hahn (2004) created three versions of a text in which primary stress or sentence stress was manipulated in academic presentations given by nonnative speakers. Native speaker listeners responded more positively to production and recall tasks in which primary stress was normally placed as opposed to misplaced or absent. In Brazil’s discourse intonation model, the tone unit is identified by this feature and at least one, but usually two, prominent syllables delimit the tonic segment. Research has suggested that NNSs frequently misplace these prominences (Jenner, 1976) or obscure primary sentence stress by placing equal stress on every content word in the tone unit as opposed to placing focal prominent stress on the tonic syllable (Backman, 1979; Pickering, 1999; Wennerstrom, 2000). As pitch height on prominent syllables also signals key and termination choice (i.e., high key onset and low key termination choice as speech paragraph boundaries), misplaced or obscured stress patterns also contribute to problems in the identification of pitch sequences as larger organizational structures in spoken text.

4.3.3 Pause and rate measures

Pauses are an important factor in perceptions of the fluency of NNS discourse (Riggenbach, 1991). Measures under investigation include number, length, and location of silent and filled pauses. Analysis of NNS discourse shows qualitative and quantitative differences in both placement and length of pauses in comparison to NSs. These materially affect prosodic structure and contribute to perceived accent and a reduction in comprehensibility (Anderson-Hsieh & Venkatagiri, 1994; Kormos & Dénes, 2004; Riggenbach, 1991). Low proficiency speakers tend to pause frequently and inappropriately, and pause durations are longer (Pickering, 2004). In addition, silent pauses tend to be longer and more irregular in non-native speech and regularly break up conceptual units (Pickering, 1999; Rounds, 1987).

Pause structure also contributes to perceptions of speech rate and together they comprise the temporal components of L2 speech. Temporal contributions to the perception of accentedness include the rate of syllables per second, articulation rate (mean number of syllables per second excluding pauses), phonation-time ratio (percentage of time producing speech), and mean length of run (an average number of syllables between pauses of a specified length). Following Munro and Derwing (2001), studies suggest that there is a curvilinear relationship between speech rate as measured by syllables per second and listeners’ judgments of L2 comprehensibility and accent in which, for optimum comprehensibility, NNS utterances are slower than the typical rate of an NS utterance but faster than the rate that L2 learners often produce. In addition to speech rate, Kormos and Dénes (2004) reported that mean length of utterance and phonation time ratio are the best overall predictors of perceived fluency in NNSs by NSs.

Typically, the temporal and acoustic measures detailed above have been investigated independently. Kang et al. (2010) investigated the conjoint impact of each of these factors in a study examining the relationship between suprasegmental features of accentedness and impressionistic judgments of NNS oral proficiency and comprehensibility by NS listeners. One hundred and eighty-eight US undergraduate students judged 26 speech samples elicited from iBT TOEFL® examinees. Each of the samples was analysed for 29 temporal and acoustic measures using a KayPENTAX Computerized Speech Laboratory. Following a hierarchical cluster analysis, suprasegmental variables accounted for approximately 50 % of the variance in the NS raters’ assessments of proficiency and comprehensibility. The most salient variable was found to be a fluency cluster which included all rate measures (syllables per second, articulation time, phonation time and mean length of runs), one stress measure (average number of prominent syllables per run), and one intonation measure (mid-level, falling tones). Similar fluency clusters comprising rate and stress measures were found to be significant in previous studies (Anderson-Hsieh et al., 1992; Freed, 2000; Kormos & Dénes, 2004). Mid-level falling tones are also the most common tones to appear in NS discourse and are thus not unexpected (Cauldwell, 2003; Pickering, 2001).
Rising tones, unit final low level tones and silent pauses were also positively associated with comprehensibility ratings, suggesting that the ability to mark shared background material (information or knowledge) and clear boundary marking are significant factors in perceived comprehensibility of NNS speech. The Kang et al. (2010) study was ground-breaking to the extent that it privileges objective acoustic measurements of suprasegmental features above NS listener perceptions. In addition, it seeks to understand the relative significance of different suprasegmental features with regard to perceptions of oral proficiency. Future studies will permit situating these findings within a larger, more comprehensive data set.
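The temporal measures defined earlier in this section (speech rate, articulation rate, phonation-time ratio and mean length of run) can be illustrated with a small computational sketch. It is not the procedure used by Kang et al. (2010) or any other study cited here: the `Run` structure, the function name and the sample figures are hypothetical, and the sketch assumes that a recording has already been segmented into runs and pauses using whatever minimum pause length the analyst has specified.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of speech between two pauses (hypothetical structure)."""
    syllables: int
    duration: float  # seconds spent producing speech in this run


def temporal_measures(runs, pauses):
    """Compute four temporal measures from pre-segmented runs and pause durations (s)."""
    total_syllables = sum(run.syllables for run in runs)
    speaking_time = sum(run.duration for run in runs)
    total_time = speaking_time + sum(pauses)
    return {
        # syllables per second over the whole sample, pauses included
        "speech_rate": total_syllables / total_time,
        # syllables per second excluding pauses
        "articulation_rate": total_syllables / speaking_time,
        # percentage of total time spent producing speech
        "phonation_time_ratio": 100 * speaking_time / total_time,
        # average number of syllables between pauses
        "mean_length_of_run": total_syllables / len(runs),
    }


# Invented sample: three runs separated by two half-second pauses.
sample_runs = [Run(10, 2.0), Run(6, 1.0), Run(8, 2.0)]
sample_pauses = [0.5, 0.5]
measures = temporal_measures(sample_runs, sample_pauses)
```

For this invented sample, the speech rate (4.0 syllables per second) is lower than the articulation rate (4.8), because the former includes the one second of pausing and the latter does not.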

4.4 The Role of Social Factors in the L2 Pronunciation of Suprasegmentals

In the following section of the chapter we highlight four sociocultural factors that have been identified in the research as having some influence on the perception and/or production of L2 prosody: listener familiarity, NS status, sociopolitical factors, and identity.1

4.4.1 Listener familiarity with NNS accents

The impact of listener familiarity with non-native speech is an area that has seen much development in the past decade, particularly with regard to World Englishes. Its importance lies in the recognition that perceptions of accentedness and comprehensibility depend on the experiences of the listener as well as the production of the speaker. Thus, listeners’ familiarity with a particular L1 can influence how they receive a message conveyed by a speaker. L2 speakers who use unexpected prosodic patterns may be considered especially difficult to understand by listeners who are unfamiliar with speakers from a particular L1. Much of the research in this area, however, has focused on sentence-level or read materials (Bradlow & Bent, 2008; Kennedy & Trofimovich, 2008; Matsuura, 2007), which makes suprasegmental components of accent that manifest over discourse difficult to assess (Levis & Pickering, 2004).

Only one study speculates on the role of prosodic variables specifically. Major, Fitzmaurice, Bunta and Balasubramanian (2002) asked NNSs to listen to two-minute lecture scripts in English read by native and non-native speakers. One of their findings was that Chinese and Japanese listeners understood the Spanish speakers as well as the North American English speakers despite the Spanish speakers’ accented speech. The authors suggest that the rhythmic characteristics of the Spanish-accented speakers transferred from Spanish (i.e., its syllable-timed nature) assisted the comprehension of Chinese and Japanese listeners whose L1s have a similar lack of vowel reduction; thus, in this study, listener familiarity with a specific L1 may not always determine judgements of accentedness or comprehensibility. Instead, familiarity with the rhythmic patterns that are common to groups of languages (i.e., syllable-timed or stress-timed) may have a much stronger impact on listeners’ perceptions.

1 Although many of the studies discussed here frequently span more than one of these issues, the studies outlined below are grouped thematically according to what are presented as the most significant findings of the research.




4.4.2 Native speaker status

Perhaps to a greater degree than amount of exposure a listener may have to a particular variety of L1 or L2 speech, the NS status of the listener has also been found to have a definite impact on the perception of an L2 speaker’s use of suprasegmentals. While historically studies have focused on the judgment of inner-circle (Kachru, 1992) speaker-hearers rather than outer or expanding circle listeners (Jenkins, 2002), more recently, researchers have expanded their participant pool to include both NS and NNS listener judgments. Jun and Li (2010) investigated whether NNS and NS status would have an impact on raters’ assessments of comprehensibility and accentedness using think-aloud protocols to identify factors that might influence their assessments. The study involved seven NNSs of various levels of proficiency who were rated by three NS and three NNS raters. Results showed that the NNS raters were considerably more aware of both problematic suprasegmentals and segmentals than the NS raters.

Expanding these results to a high stakes environment, Kang (2012) investigated whether NS status would have an impact on listener perception of the oral performance of International Teaching Assistants (ITAs) in relation to prosodic measures of accentedness. Seventy NS and NNS undergraduate students in the US rated 11 ITAs. Five rater background factors were investigated: NNS status; language sophistication (derived by adding the number of college classes taken in linguistics and/or TESL to years of foreign language study); amount of contact with NNSs (percentage of time spent interacting with NNSs in a typical week); amount of teaching and tutoring experience; and negative experience in ITA-taught courses.
Following a cluster analysis, Kang found that approximately 25 % of the variance in the ratings of oral proficiency and instructional competence was attributable to suprasegmental variables (18–19 %) and rater background variables (8–9 %) with NNS status as the most significant background variable. This suggests a correlation between NS perception of NNS production that interacts with NS linguistic experiences and which warrants further investigation.

4.4.3 Sociopolitical factors

Some initial groundwork has been established in this area, showing a possible connection between NS listeners’ association of ‘non-standard’ rhythm or fluency to speakers from ‘stigmatized’ countries. Lindemann (2005) examined the attitudes of undergraduate NS students in an American university concerning the accents of NNSs from 58 countries. Two hundred and eight (208) students rated the English of NSs and NNSs in terms of how “familiar,” “correct,” “pleasant” and “friendly” they considered the speakers’ English to be. In addition, a map labelling task was conducted with 79 students who were asked to describe the English of speakers in various countries in the world. As with prior studies, familiarity played a major role in how listeners rated the accents of NNSs; however, sociopolitical factors were also found to have an important impact. In particular, Lindemann noted that groups that were classified as “non-stigmatized” – those that had “comparatively favourable relationships with the US during the respondents’ lifetimes, and do not have large populations of recent immigrants in the US” (p. 193) – received positive ratings in this study. In contrast, speakers from countries that had less favourable relationships with the US or who were the source of much higher numbers of immigrants to the US tended to receive lower ratings, including Mexican, Japanese, Chinese and Indian English speakers.

In the map labelling task, Lindemann also found that most participants provided negative descriptions of the English used in countries they had evaluated negatively for correctness in the rating tasks. China and Mexico, two highly rated “familiar” countries, received the most comments. Some participants referred to Mexican English as “broken English,” “sloppy” or “lazy sounding” (p. 203), and others considered Chinese or Asian English in general to be “broken,” “choppy,” “slurred,” “blurry” or “heavily accented” (p. 200).

4.4.4 Identity

It is clear that identity, in terms of regional, age- or gender-related and sociocultural groups, has a role in NS intonational choices. It is also clear that identity issues have been a concern of the L2 pronunciation field for some time (e.g. Guiora, Beit-Hallahmi, Brannon, Dull & Scovel, 1972). Thus we should expect that for L2 learners, there may also be a connection between intonational choices and identity. We know that for adult language learners, there may be a direct conflict between a claimed desire for a native-like accent and a strong wish to maintain a sense of personal identity. Primarily, however, accent in these studies is considered as a global construct encompassing both segmental and suprasegmental features (Jenkins, 2005; Sifakis & Sougari, 2005). One study that directly addresses social concerns in L2 suprasegmental production is Ohara (2001), who investigated the acquisition of intonation by American women learning Japanese. She found that two of the five advanced English-speaking learners of Japanese avoided the use of a higher pitch (a sociocultural norm for women’s speech in Japan) as they were unwilling to perform what they considered to be a “gender-based identity” (Hansen, 2006: 27).



Suprasegmental Measures of Accentedness   


An expansion of the notion of identity includes the possible influence of ethnic group membership as it relates to accentedness. Currently, we can find no examination of this factor in L2 pronunciation which deals with suprasegmental measures specifically. Gatbonton, Trofimovich and Magid (2005) report two studies undertaken in Canada in which they investigated the relationship between accentedness and membership in particular ethnic groups for L2 English learners. In the first study, conducted in the 1970s during a time of strong Québécois nationalism, 24 French speakers of English rated themselves as nationalistic, non-nationalistic or liberal listeners. Regardless of the ethnic affinity they most closely associated themselves with, these listeners associated ethnic affiliation with the degree of accent possessed, i.e., the greater the accent, the greater the affiliation with the ethnic group. In addition, only the nationalistic listeners working in monoethnic or intragroup contexts preferred leaders with heavily accented or moderately accented speech, whereas the other two groups working in biethnic contexts preferred either non-accented or moderately accented leaders. Thirty years later, the researchers conducted a similar study to determine whether comparable results would be found in situations without conflict between ethnic groups. Eighty-four Chinese speakers of English participated in the second study and results revealed that, in English, speakers with moderate or minimal accents were ascribed significantly less ethnic group affiliation than those with heavy accents. Although suprasegmental measures were not addressed directly, the studies suggest that components of accentedness play an important role in perceptions of group identity and particularly in the workplace environment. As the discussion above suggests, this is still a nascent area of research in L2 phonological development.
An important methodological issue is that intonation studies have typically used relatively small numbers of participants and have focused on artificial speech tasks conducted in laboratory settings (Gut, 2007). As noted above, a number of current studies in the assessment of accentedness continue to prioritize sentence-level, artificially produced data. There is growing evidence, however, that such data are problematic for assessing the role of suprasegmental features in natural speech. In a discourse analysis of Mandarin Chinese, Tao (1996) argues that pitch register differences supposedly identified between interrogatives and declaratives may in fact be an artefact of studying isolated sentences in an experimental setting. Similarly, Lai (2002) proposes that misconceptions about the stress patterns of Cantonese derive in part from a reliance on experimental production of the language which alters its normal prosodic patterns. With regard to English, Brazil (1985) recognized that speakers’ differing levels of engagement with the language and the listener in reading aloud tasks result in systematically different prosodic composition of the discourse.


A promising alternative approach may lie in recent developments in corpus construction. Currently, at least two learner corpora are available that include annotation of some suprasegmental features. The LeaP (Learning Prosody in a Foreign Language) corpus comprises more than 12 hours of recording time (73,941 words) of L2 learners of German and English and includes six manually annotated and two automatically annotated tiers (Gut, 2007). The corpus includes native speakers of both English and German as controls and a number of L2 learner groups of varying levels of proficiency. The corpus comprises four different types of speech: interview, reading passage, story retelling, and reading of nonsense word lists (p. 151). Gut (2007) reports an initial investigation on vowel reduction and also describes the use of the LeaP corpus as a teaching tool. The Hong Kong Corpus of Spoken English has approximately one million words prosodically transcribed using Brazil’s (1997) discourse model (Cheng, Greaves & Warren, 2008). It comprises sub-corpora of academic, business, conversation and public discourse spoken by both Hong Kong Chinese English speakers and native English speakers from Britain, the U.S. and Australia in normal day-to-day interactions. Initial examination of this corpus has resulted in a great deal of work examining the prosodic characteristics of these speakers, including the intonation of indirect speech acts (Cheng, 2002), speakers’ use of intonation to assert conversational control (Warren, 2004), and the intonation of disagreement (Cheng & Warren, 2005).

4.5 Future Directions

Suprasegmental components of L2 accentedness and the role of social factors are currently under-researched, but newer research designs show considerable promise. A shift from a focus on grammatical contrasts in isolated sentences to a focus on discourse competence in L2 production has resulted in a more equal distribution of research between segmental and suprasegmental aspects of speech. Within pronunciation pedagogy also, the use of speech analysis technology has matured from the application of limited visualization techniques (e.g., de Bot, 1980; Weltens & de Bot, 1984) to expanded contexts that prioritize text-based suprasegmental patterns to improve the L2 production of speech paragraphs (Levis & Pickering, 2004) and pitch variation in oral presentations (Hincks & Edlund, 2009). Testing is also a field in which technology, particularly Automatic Speech Recognition (ASR) systems, is being used to assess suprasegmental structure.




Fully automated tests such as Versant, also known as PhonePass,² incorporate suprasegmental measures (timing and pause structure specifically). Nevertheless, as commercial products, they have not undergone the kind of rigorous and transparent peer-review processes that would be expected within typical research circles (Chapelle & Chung, 2010). Furthermore, while temporal measures such as speaking rate and pause measures may produce high correlations between ASR systems and human raters, Kang (2010) notes that many aspects of intonation and its communicative function have yet to be incorporated into ASR systems. At the same time as we are able to use technology to pinpoint with increasing accuracy the variation in L2 suprasegmental production and its potential significance, the paradigm is also shifting toward an expanded view of social context, both in terms of how speakers wish to make themselves understood, and in terms of how context affects perceptions of accentedness (Levis, 2005). While there is considerable agreement regarding the communicative role of intonation in NS-based interaction, the investigation of intonation as a resource in English as a Lingua Franca (ELF) interaction is very recent. ELF in this context is defined as “communication between fairly fluent interlocutors from different L1 backgrounds, for whom English is the most convenient language” (Breiteneder et al., 2006: 163) and initial research suggests that ELF interactants may process phonological features differently from their NS counterparts and rely more heavily on segmental than suprasegmental features (Deterding, 2005; Jenkins, 2000). Jenkins suggests that this predominant focus on bottom-up processing reflects L2 speakers’ higher dependency on phonological form as opposed to shared contextual knowledge with their interlocutors.
Initial research with regard to suprasegmental features suggests that ELF users employ intonational cues as a resource to negotiate and maintain successful interaction, including the use of prosodic self-repair and stylized voicing in quoted speech (Pickering, 2009; Pickering & Litzenberg, 2011). However, ELF interaction does not mirror NS-based interaction with regard to the functions of prosodic structure. For example, there is no evidence of socially-oriented tone choices such as the rising tones in face-saving contexts that would be anticipated in NS-NS conversation. This research program is in its very early days but promises to contribute important insights regarding the perception and production of prosodic cues in naturally-occurring discourse across varying social contexts. Regardless of status as native or non-native speakers, the importance of sociocultural variables affecting accentedness and comprehensibility is clear: our attitude towards L2 speakers is directly connected to our perception of their use of suprasegmental features in communication. As Mennen (2007: 54) states, “given that we derive much of our impressions about a speaker’s attitude and disposition toward us from the way they use intonation in speech, listeners may form a negative impression of a speaker based on the constantly inappropriate use of intonation.” Thus, we cannot afford to neglect the teaching and learning of prosodic elements in the L2 classroom. In light of the importance of suprasegmentals in communication, one might expect the intersection between prosody, social factors and accentedness to have yielded a larger amount of research and pedagogical interest in the field of L2 phonology. Instead, work has been somewhat sporadic and localized. Our next steps, therefore, need to expand on our currently limited knowledge of cross-linguistic influence and age-related developmental features as significant for the production and perception of L2 prosody. It seems likely that more recent innovations, including corpus methodology, the use of technology, and an expanded notion of communication contexts, promise a more robust program of research in this area. With these new understandings, we can return to a focus on the classroom, and determine an effective pedagogy for addressing the learning and teaching of suprasegmentals in a manner appropriate to the ever-increasing complexity and diversity of our global society.

2 Produced by Ordinate Corporation

References

Anderson-Hsieh, J., Johnson, R., & Koehler, K. 1992. The effect of foreign accent and speaking rate on native speaker comprehension. Language Learning 38, 561–593.
Anderson-Hsieh, J., & Venkatagiri, H. 1994. Syllable duration and pausing in the speech of Chinese ESL speakers. TESOL Quarterly 28, 807–812.
Backman, N.E. 1979. Intonation errors in second language pronunciation of eight Spanish speaking adults learning English. Interlanguage Studies Bulletin 4, 239–266.
Bradford, B. 1988. Intonation in context. Cambridge: Cambridge University Press.
Bradlow, A., & Bent, T. 2008. Perceptual adaptation to nonnative speech. Cognition 106, 707–729.
Brazil, D. 1985. The communicative value of intonation. Birmingham: University of Birmingham, English Language Research.
Brazil, D. 1997. The communicative role of intonation in English. Cambridge: Cambridge University Press.
Breiteneder, A., Pitzl, M., Majewski, S., & Klimpfinger, T. 2006. VOICE recording: Methodological challenges in the compilation of a corpus of spoken ELF. Nordic Journal of English Studies 5(2), 161–187.
Cauldwell, R. 2003. Streaming speech: Listening and pronunciation for advanced learners of English. Birmingham: Speechinaction.
Celce-Murcia, M., Brinton, D.M., Goodwin, J.M., & Griner, B. 2010. Teaching pronunciation: A reference for teachers of English to speakers of other languages (2nd ed.). Cambridge: Cambridge University Press.




Chapelle, C., & Chung, Y. 2010. The promise of NLP and speech processing technologies in language assessment. Language Testing 27, 301–315.
Cheng, W. 2002. Indirectness in intercultural communication. Conference on the Pragmatics of Interlanguage. Munster University, Germany, 22–25 September 2002.
Cheng, W., Greaves, C., & Warren, M. 2008. A corpus-driven study of discourse intonation. Amsterdam: John Benjamins.
Cheng, W., & Warren, M. 2005. //→ well i have a DIFferent// ↘ THINKing you know//: A corpus driven study of disagreement in Hong Kong business discourse. In F. Bargiela-Chiappini & M. Gotti (Eds.), Asian Business Discourse(s) (pp. 241–270). Frankfurt: Peter Lang.
Chomsky, N., & Halle, M. 1968. The sound pattern of English. Cambridge, MA: MIT Press.
Chun, D. 1988. The neglected role of intonation in communicative competence and proficiency. Modern Language Journal 72, 295–303.
Chun, D. 2002. Discourse intonation in L2. Philadelphia: John Benjamins.
de Bot, K. 1980. The role of feedback and feedforward in the teaching of pronunciation: An overview. System 8, 35–45.
Derwing, T.M., & Munro, M. 1997. Accent, intelligibility, and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition 19(1), 1–16.
Derwing, T., & Rossiter, M. 2003. ESL learners’ perceptions of their pronunciation needs and strategies. System 30, 155–166.
Derwing, T.M., Rossiter, M.J., Munro, M.J., & Thomson, R.I. 2004. L2 fluency: Judgments on different tasks. Language Learning 54, 655–679.
Deterding, D. 2005. Listening to estuary English in Singapore. TESOL Quarterly 39(3), 425–440.
Edge, J., & Richards, K. 1998. May I see your warrant please? Justifying outcomes in qualitative research. Applied Linguistics 19, 334–356.
Field, J. 2005. Intelligibility and the listener: The role of lexical stress. TESOL Quarterly 39, 399–423.
Freed, B.F. 2000. Is fluency, like beauty, in the eyes (and ears) of the beholder? In H. Riggenbach (Ed.), Perspectives on fluency (pp. 243–265). Ann Arbor, MI: University of Michigan Press.
Gallego, J.C. 1990. The intelligibility of three nonnative English-speaking TAs: An analysis of student-reported communication breakdowns. Issues in Applied Linguistics 1, 219–237.
Gatbonton, E., Trofimovich, P., & Magid, M. 2005. Learners’ ethnic group affiliation and L2 pronunciation accuracy: A sociolinguistic investigation. TESOL Quarterly 39(3), 489–512.
Guiora, A., Beit-Hallahmi, B., Brannon, R., Dull, C., & Scovel, T. 1972. The effects of experimentally induced changes in ego states on pronunciation ability in a second language: An exploratory study. Comprehensive Psychiatry 13, 421–428.
Gumperz, J. 1982. Discourse strategies. Cambridge: Cambridge University Press.
Gumperz, J. 1983. Language and social identity. Cambridge: Cambridge University Press.
Gumperz, J. 1992. Contextualization and understanding. In A. Duranti & C. Goodwin (Eds.), Rethinking context: Language as an interactive phenomenon (pp. 229–252). Cambridge: Cambridge University Press.
Gumperz, J., Kaltman, H., & O’Connor, M. 1984. Cohesion in spoken and written discourse. In D. Tannen (Ed.), Coherence in spoken and written discourse (pp. 3–19). Norwood, NJ: Ablex.
Gut, U. 2007. Learner corpora in second language prosody research and teaching. In J. Trouvain & U. Gut (Eds.), Non-native prosody (pp. 145–167). Berlin: Mouton.
Hahn, L.D. 2004. Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly 38, 201–223.
Halliday, M.A.K. 1967. Intonation and grammar in British English. Paris: Mouton.


Halliday, M.A.K. 1970. A course in spoken English: Intonation. London: Oxford University Press.
Hansen, J.G. 2006. Acquiring a non-native phonology. London: Continuum.
Hewings, M. 1995. Tone choice in the English intonation of nonnative speakers. International Review of Applied Linguistics 33, 251–265.
Hincks, R., & Edlund, J. 2009. Promoting increased pitch variation in oral presentations with transient visual feedback. Language Learning & Technology 13(3), 32–50.
Hymes, D. 1972. On communicative competence. In J.B. Pride & J. Holmes (Eds.), Sociolinguistics (pp. 269–285). Harmondsworth: Penguin.
Ingram, J.C.L., & Park, S.G. 1997. Cross-language vowel perception and production by Japanese and Korean speakers of English. Journal of Phonetics 25, 343–370.
Iwashita, N. 2010. Features of oral proficiency in task performance by EFL and JFL learners. In M.T. Prior (Ed.), Selected Proceedings of the 2008 Second Language Research Forum (pp. 32–47). Somerville, MA: Cascadilla.
Jenkins, J. 2000. The phonology of English as an international language. Oxford: Oxford University Press.
Jenkins, J. 2002. A sociolinguistically based, empirically researched pronunciation program syllabus for English as an international language. Applied Linguistics 23, 83–103.
Jenkins, J. 2005. Implementing an international approach to English pronunciation: The role of teacher attitudes and identity. TESOL Quarterly 39(3), 535–543. doi: 10.2307/3588493
Jenner, B. 1976. Interlanguage and foreign accent. Interlanguage Studies Bulletin 1, 166–195.
Jilka, M. 2007. Different manifestations and perceptions of foreign accent in intonation. In J. Trouvain & U. Gut (Eds.), Non-native prosody. Phonetic description and teaching practice (pp. 77–96). Berlin: Mouton de Gruyter.
Jun, H.G., & Li, J. 2010. Factors in raters’ perceptions of comprehensibility and accentedness. In J.M. Levis & K. LeVelle (Eds.), Proceedings of the 1st Pronunciation in Second Language Learning and Teaching Conference (pp. 53–66). Ames, IA: Iowa State University.
Kachru, B.B. 1992. The other tongue: English across cultures. Urbana: University of Illinois Press.
Kang, O. 2010. ESL learners’ attitudes toward pronunciation instruction. In J.M. Levis & K. LeVelle (Eds.), Proceedings of the 1st Pronunciation in Second Language Learning and Teaching Conference (pp. 105–118). Ames, IA: Iowa State University.
Kang, O. 2012. Impact of rater characteristics and prosodic features of speaker accentedness on ratings of international teaching assistants’ oral performance. Language Assessment Quarterly 9(3), 249–269. doi: 10.1080/15434303.2011.642631
Kang, O., & Pickering, L. 2011. The role of objective measures of suprasegmental features in judgments of comprehensibility and oral proficiency in L2 spoken discourse. Speak Out! 44, 4–8.
Kang, O., & Rubin, D. 2009. Reverse linguistic stereotyping: Measuring the effect of listener expectations on speech evaluation. Journal of Language and Social Psychology 28, 441–456.
Kang, O., Rubin, D., & Pickering, L. 2010. Suprasegmental measures of accentedness and judgments of English language learner proficiency in oral English. Modern Language Journal 94, 554–566.
Kennedy, S., & Trofimovich, P. 2008. Intelligibility, comprehensibility, and accentedness of L2 speech: The role of listener experience and semantic context. Canadian Modern Language Review 64(3), 459–489. doi: 10.1353/cml.2008.0034




Koffi, E. 2012. Intelligibility assessment and the acoustic vowel space: An instrumental phonetic account of the production of lax vowels by Somali speakers. In J. Levis & K. LeVelle (Eds.), Proceedings of the 3rd Pronunciation in Second Language Learning and Teaching Conference, Sept. 2011 (pp. 216–232). Ames, IA: Iowa State University.
Kormos, J., & Dénes, M. 2004. Exploring measures and perceptions of fluency in the speech of second language learners. System 32, 145–164.
Lai, E. 2002. Prosody and prosodic transfer in foreign language acquisition. Muenchen: LINCOM EUROPA.
Levis, J.M. 2005. Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly 39(3), 369–378.
Levis, J., & Pickering, L. 2004. Teaching intonation in discourse using speech visualization technology. System 32, 505–524.
Lima, E. 2012. A comparative study of the perception of ITAs and native and nonnative undergraduates. In J. Levis & K. LeVelle (Eds.), Proceedings of the 3rd Pronunciation in Second Language Learning and Teaching Conference, Sept. 2011 (pp. 54–64). Ames, IA: Iowa State University.
Lindemann, S. 2005. Who speaks “broken English”? US undergraduates’ perceptions of non-native English. International Journal of Applied Linguistics 15(2), 187–212.
Major, R.C., Fitzmaurice, S.F., Bunta, F., & Balasubramanian, C. 2002. The effects of nonnative accents on listening comprehension: Implications for ESL assessment. TESOL Quarterly 36(2), 173–190.
Matsuura, H. 2007. Intelligibility and individual learner differences in the EIL context. System 35(3), 293–304.
Mennen, I. 1998. Second language acquisition of intonation: The case of peak alignment. Chicago Linguistic Society 34, 327–341.
Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In J. Trouvain & U. Gut (Eds.), Non-native prosody. Phonetic description and teaching practice (pp. 53–76). Berlin: Mouton de Gruyter.
Munro, M.J., & Derwing, T.M. 1995. Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning 45(1), 73–97.
Munro, M.J., & Derwing, T.M. 2001. Modeling perceptions of the accentedness and comprehensibility of L2 speech: The role of speaking rate. Studies in Second Language Acquisition 23, 451–468.
O’Connor, J.D., & Arnold, G.F. 1961. Intonation of colloquial English. London: Longman.
Ohara, Y. 2001. Finding one’s voice in Japanese: A study of the pitch levels of L2 users. In A. Pavlenko, A. Blackledge, I. Piller & M. Teutsch-Dwyer (Eds.), Multilingualism, Second Language Learning, and Gender (pp. 231–254). Berlin: Mouton de Gruyter.
Pennington, M., & Richards, J. 1988. Pronunciation revisited. TESOL Quarterly 20, 207–225.
Pickering, L. 1999. An analysis of prosodic systems in the classroom discourse of native speaker and non-native speaker teaching assistants. Unpublished dissertation, University of Florida.
Pickering, L. 2001. The role of tone choice in improving ITA communication in the classroom. TESOL Quarterly 35, 233–255.
Pickering, L. 2004. The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes 23(1), 19–43.
Pickering, L. 2009. Intonation as a pragmatic resource in ELF interaction. Intercultural Pragmatics 6(2), 235–255.


Pickering, L., & Litzenberg, J. 2011. Intonation as a pragmatic resource, revisited. In A. Archibald, A. Cogo & J. Jenkins (Eds.), Latest trends in ELF. Cambridge Scholars.
Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. Unpublished doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Pierrehumbert, J., & Hirschberg, J. 1990. The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan & M. Pollock (Eds.), Intentions in communication (pp. 271–312). Cambridge, MA: The MIT Press.
Pike, K. 1945. The intonation of American English. Ann Arbor, MI: University of Michigan Press.
Pirt, G. 1990. Discourse intonation problems for nonnative speakers. In M. Hewings (Ed.), Papers in discourse intonation (pp. 145–156). University of Birmingham: ELR.
Richards, J.C., & Rodgers, T.S. 1986. Approaches and methods in language teaching. Cambridge: Cambridge University Press.
Riggenbach, H. 1991. Towards an understanding of fluency: A microanalysis of nonnative speaker conversation. Discourse Processes 14, 423–441.
Rounds, P. 1987. Characterizing successful classroom discourse for NNS teaching assistant training. TESOL Quarterly 21, 643–672.
Sapir, E. 1921. Language. New York, NY: Harcourt Brace.
Sifakis, N.C., & Sougari, A.-M. 2005. Pronunciation issues and EIL pedagogy in the periphery: A survey of Greek state school teachers’ beliefs. TESOL Quarterly 39(3), 467–488. doi: 10.2307/3588490
Sinclair, J., & Brazil, D. 1982. Teacher talk. Oxford: Oxford University Press.
Tao, H. 1996. Units in Mandarin conversation. Amsterdam: John Benjamins.
Trofimovich, P., & Baker, W. 2006. Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28(1), 1–30.
Warren, M. 2004. //î so what have YOU been WORKing on REcently//: Compiling a specialized corpus of spoken business English. In U. Connor & T. Upton (Eds.), Discourse in the professions: Perspectives from corpus linguistics (pp. 115–140). Amsterdam: John Benjamins.
Weltens, B., & de Bot, K. 1984. Visual feedback of intonation II: Feedback delay and quality of feedback. Language and Speech 27, 79–88.
Wennerstrom, A. 1994. Intonational meaning in discourse: A study of non-native speakers. Applied Linguistics 15, 399–420.
Wennerstrom, A. 1997. Discourse intonation and second language acquisition: Three genre-based studies. Unpublished doctoral dissertation, University of Washington, Seattle.
Wennerstrom, A. 2000. The role of intonation in second language fluency. In H. Riggenbach (Ed.), Perspectives on fluency (pp. 102–127). Ann Arbor: University of Michigan Press.
Willems, N.J. 1982. English intonation from a Dutch point of view. Dordrecht: Foris.

Part II: The Learner’s Approach to Pronunciation in Social Context

Kimberly LeVelle and John Levis

5 Understanding the Impact of Social Factors on L2 Pronunciation: Insights from Learners

The acquisition of pronunciation skills in a foreign language has long been a poster child for what goes wrong, or at least never quite goes right, in learning a new language. Adult learners rarely show native-like skills in pronunciation, leading to a widespread belief that decreasing pronunciation ability is a result of increasing age. But there have also consistently been exceptions to this dominant pattern of abbreviated attainment. Native-like pronunciation for speakers who have learned foreign languages as adults has been reported for Dutch learners of English (Bongaerts, van Summeren, Planken, & Schils, 1997), speakers of French (Coppieters, 1989), and among German-English couples (Piller, 2002). In addition, other learners have demonstrated the acquisition of excellent, albeit foreign-accented, pronunciation (Moyer, 1999). Nor does learning the foreign language at a younger age guarantee native-like pronunciation. Piske, MacKay and Flege (2001) demonstrated that Italian-English early bilinguals who spoke Italian more frequently had more noticeable accents in English. Clearly, attainment is varied, even for learners who want to change their pronunciation. One explanation for variation is the influence of individual differences (e.g., motivation, aptitude, field dependence/independence) in ultimate attainment (see Hansen Edwards, this volume). However, there is also considerable evidence that social factors influence ultimate attainment (Flege, 1995; Moyer, 1999, this volume) through the development of new identities (Marx, 2000) and connections with the L2 community (Miller, 2003). The exploitation of sociolinguistic knowledge even allows for the possibility of passing as a native speaker in well-defined social contexts (Piller, 2002).
Unfortunately, the influence of social factors in L2 acquisition is not well-studied, especially from the point of view of the L2 learners. This perspective is critical for understanding how variation in pronunciation acquisition occurs and the kinds of choices L2 learners make about their pronunciation in response to the pressure of social factors. This chapter re-examines research about the influence of social factors on learners’ acquisition of L2 pronunciation. Social factors are often studied in terms of the ways in which they form a barrier to acquisition (Schumann, 1976, 1986), affect learners’ confidence as a result of social stigmatization (Gluszek & Dovidio, 2010b), or form a barrier to acculturation (Lippi-Green, 1997; Munro, 2003), but social factors have also been shown to facilitate acquisition (Miller, 2003), especially in learner access to community networks (Lybeck, 2002), the development of personal relationships (Piller, 2002), and the formulation of group identity (Gatbonton, Trofimovich, & Magid, 2005). Social factors also influence the teachers of L2 learners, many of whom are L2 learners themselves (Jenkins, 2009), affecting their expectations about what is possible, what is likely, and what is necessary in acquiring L2 pronunciation. In this paper, we explore how L2 learners react to the social atmosphere in which they pronounce their L2. Like air, this atmosphere is invisible, yet unavoidable and necessary for speaking. To improve, they need to pronounce and speak and listen to real people in real interactions. If they do not give themselves a chance to fail (what many fear), they will never know success, whereby they are understood and ultimately become audible (Miller, 2003). In examining what learners have said about this social atmosphere, we discuss three different theoretical concepts (social participation, stigma, and imagined communities) that are relevant to the social nature of pronunciation acquisition. We pay special attention to research that uses the voices of the L2 learners themselves because their insights most clearly reveal their awareness of, reaction to, and use of the social context of learning a new language. It is our contention that the voices of L2 learners are underutilized in understanding the social context of L2 pronunciation. Greater attention to ethnographic (e.g., Anton, 1996), participant observation (e.g., McLemore, 1991) and case study research (e.g., Gumperz, 1982) provides a way to help understand the importance of social factors in pronunciation acquisition and the pronunciation choices made by L2 users. Many studies of the social nature of pronunciation do not make use of what the L2 users have said about their experiences and feelings (e.g., Gatbonton et al., 2005), instead relying on less direct measures of the influence of social factors. These studies are invaluable, but necessarily limited.
L2 users’ voices are also limited in what they can tell us about the social nature of pronunciation, but the limits are different and offer a complementary research perspective. This chapter concludes by proposing a sociolinguistic equivalent of Jenkins’ Lingua Franca Core (2002) in which we argue that social factors must be addressed instructionally, especially in second language contexts. Social factors are a stealth issue in pronunciation improvement, in that they are rarely considered or addressed explicitly in pronunciation teaching, but they should not be considered optional. However, unlike Jenkins’ LFC, which is most applicable to communication between nonnative speakers, our “Sociolinguistic Core” is most directly applicable to contexts in which second language learners have to negotiate native language speech communities and norms. Other authors have discussed the role of social factors in second language acquisition (SLA) in great detail. We will not attempt to repeat their work, but we will instead highlight key theoretical concepts that could lead to a deeper understanding of the importance of social factors in L2 pronunciation. We will



Impact of Social Factors: Insights from Learners   


focus on three concepts that have not typically been applied to L2 pronunciation: participation in established social groups, stigma and shame, and imagined communities and identities. After presenting each concept, we present evidence for its application to L2 pronunciation by re-examining what L2 users have said about their experiences learning the pronunciation of an L2 and connecting their descriptions to the theoretical concepts.

5.1 Participation in Established Social Groups

One framework for understanding how social groups influence pronunciation acquisition is legitimate peripheral participation (Lave & Wenger, 1991). This framework describes how learners engage with a social network to acquire knowledge. Indeed, Lave and Wenger argue that it applies to all types of learning (cf. Wenger, 2010; Norton & Toohey, 2011). Learners are successful, according to this framework, if they are ‘legitimate’ (accepted as potential new members of the community), ‘peripheral’ (allowed to be at the edges of the group, watching and learning and slowly moving towards the center), and given the opportunity to participate. Lave and Wenger argue that learning happens through doing rather than through the transmission of knowledge in stated instructions. Lave and Wenger contrast the idea of peripheral participation with marginal participation. A peripheral learner is one on the edge of a group looking in but also one who has the ability to move into a more central position within the group at a later date. In contrast, marginal participation (Wenger, 1998) marks a learner as not part of the group, but outside of it. She does not have the ability to gain access, regardless of her skills, knowledge or abilities. Most crucially, the community of practice that she wishes to join must not only acknowledge her as a potential member but also must grant her access. As Lave and Wenger state:

The key to legitimate peripheral participation is access by newcomers to the community of practice and all that membership entails. But though this is essential to the reproduction of the community, it is always problematic at the same time. To become a full member of a community of practice requires access to a wide range of ongoing activity, old-timers, and other members of the community; and to information, resources and opportunities for participation. (1991: 100)

Lave and Wenger further elaborate on the constructs of oldtimers and newcomers. Norton (2001) takes these terms and applies them to language learners, stating that in “the language classroom, such theories seem particularly apt in situations in which second language learners (newcomers) enter a classroom in which speakers of the target language (old-timers) constitute the more experienced members of the community” (p. 163).

   Kimberly LeVelle and John Levis

In regard to pronunciation, Lave and Wenger’s ideas offer a possible way out of the difficulties with the native-nonnative dichotomy, in which L2 learners cannot gain access to the insider’s group no matter how good their pronunciation or how skilled they are in use of the L2. Newcomer and oldtimer allow for learners to move beyond new without having to reach the goal of native. Thus we can acknowledge an L2 speaker of English as a newcomer during her initial stay in the L2 culture, but she can move into the role of an oldtimer after a period of time without having to meet the often unattainable bar of achieving native-like pronunciation.

5.2 Social Participation – A Doorway or a Wall?

The importance of social participation is clear for pronunciation acquisition, but social groups may look impervious to those on the outside. To L2 users, the element that can help them the most is also the one that can delay their progress. Access to a social network allows L2 learners to begin to construct an identity in the L2. But such access is controlled by others. Miller writes: “[H]ow students are heard by English speakers is just as important as how they speak English. Representation is...representation of social identity, a display of self in social interaction, discursively and jointly constructed in the context of social memberships” (2003: 14). Gaining access to social networks and their potential for affecting pronunciation acquisition is not straightforward. Not only may social network insiders be resistant, outsiders may decide for various reasons not to participate fully.

A particularly rich study of the power of social participation on pronunciation acquisition is Lybeck (2002). Lybeck interviewed nine American women living in Norway with varying degrees of access to the relatively closed social networks in Norway. Their differing reactions to their inclusion or exclusion as they learned Norwegian pronunciation connected to their degree of access. Lybeck measured their pronunciation in an initial interview and then six months later after a second interview. The nine subjects were divided into three groups based on the strength of their social networks (2 with Strong, 2 with Moderately Strong, and 5 with Weak Networks). For pronunciation, she examined both the percentage of words that were native-like and the number of Norwegian /r/ sounds. Native-like judgments were based on the word being sufficiently in line with all phonetic qualities, “including the correct use of syllable length, elision, and stress” (Lybeck, 2002: 178).
The /r/ is a recognized social marker in Norwegian, both to Norwegians and to L2 users. Lybeck says that “the use of American r is not problematic when it comes to comprehension, but it immediately identifies




the speaker as American, and thus was considered a good measure of accent” (p. 178). In general, all the women found that it was difficult to break into established networks. Those who were more accepting of the target culture and had better networks seemed to have better pronunciation in Norwegian. Those who felt labeled as outsiders struggled to find supportive social networks. One woman (B2) improved yet saw pronunciation still marking her as being an outsider, saying “I’d like to be able to pronounce things correctly and to be understood... and I don’t necessarily want to be marked as an outsider all the time” (p. 186). In Lave and Wenger’s (1991) terms, she wants to be a peripheral participant but feels herself a marginal participant. Another woman was more blunt about Norwegian pronunciation and its effect on her willing marginalization, saying “it’s almost like a mental block for me that when I hear this sound...I don’t have a desire to copy it. I don’t want to sound like that” (p. 178). The women’s own judgments of their acceptance were often connected to Norwegian reactions to their pronunciation. The women were reported as saying “I don’t think they want to understand sometimes,” “It pisses me off when I am trying so hard and they don’t seem to be,” and that “They aren’t very good at asking you ‘What do you mean?’ They just give you the ‘look’” (p. 187). Social rules were unevenly enforced as well. One woman (A1) in the strong network group with good pronunciation, found that Norwegians corrected her too much – “you could just go nuts...[you feel like saying] Ok, can we talk now” (p. 187). For most of the women, the connection between having a supportive social group and their pronunciation was clear. Most of the women in the weak social network group had the least native-like Norwegian accents, were the most reluctant to use it, were the most likely to show evidence of culture shock, and were the most willing to switch to English (or avoid Norwegian). 
In this group, participant C1 was a particularly interesting case. She started with among the best pronunciation and high potential for legitimate peripheral participation (she was married to a Norwegian, with a child, a career, and in-laws), but she refused full participation, which showed in her pronunciation. She felt that her in-laws didn’t try to help her and were not willing to recognize her attempts to learn Norwegian, that colleagues at work did not make it easy for her, that she would not be successful in breaking into the culture, and as a result, she would always be an outsider. Her reaction was “I just don’t care as much” (p. 187). She also said that “I don’t want [my son] speaking Norwegian to me...because it’s important that he speaks English” (p. 186). She began to focus on relationships with other Americans. Her decrease in pronunciation accuracy with /r/ seemed to be a way of pulling away, of actualizing her “don’t care” attitude.


The experiences of the women in Lybeck’s study suggest that success in participating in social groups often goes hand in hand with success in pronunciation. It is clear that relative proficiency with the spoken language and the strength of ingroup-outgroup feelings in a culture also make a difference. But the willingness of target language speakers to interact with and accept L2 users, and to provide them with feedback, appears to be critical for pronunciation development and future satisfaction in using the language. This suggests for L2 users that good pronunciation may get worse if there is no social reason to maintain it. Pronunciation in the new language is too fragile to survive feelings of social isolation.

Deep connections to social groups, their norms, and the opportunities they afford may even allow L2 users to develop good enough pronunciation to pass for native speakers in certain contexts (Piller, 2002). In such situations, pronunciation is not seen as a stable feature of L2 competence, but rather as a feature of L2 performance which can create an identity in a particular context. Piller studied bilingual couples (usually with one German and one English speaker) in which one or both of the couple could pass for a native speaker of the L2 in certain contexts, especially in “first encounters, often service interactions” (p. 191). Passing may thus be a mark of good enough pronunciation and cultural knowledge to not be noticed. This is often accomplished by using sociolinguistically marked pronunciations and lexis (i.e., dialect forms) that make the speakers sound as if they fit in. The goal of speakers in these encounters was not necessarily to pass for a native speaker, but rather to disguise origins so others could not guess where they were from. As Piller said, “they prefer not to be reduced to their original national identity. At the same time, they do not necessarily want to be perceived as native speakers, either” (p. 194).
They may achieve this by working to rid themselves of stereotypical markers associated with speakers of their native language or taking on markers that make them sound like they are from somewhere else. Allan, an American in Piller’s study, did not try to hide that he was not a native German, but he talked about having fought against having typical markers of an American accent, saying “I know I can’t avoid that I’m a foreigner, but I enjoy it that some people don’t know where I’m from. They think..sometimes Italy or – they don’t have a clue – and I quite enjoy that” (2002: 195). Another example in Piller’s study was a German in England. She did not want to be stereotyped as a German, and at first people took her for an American, and later for someone from Scotland, which she took as progress toward achieving a British accent. This is similar to Marx’s (2002) deliberate use of a French accent to hide her origins. L2 users may even refuse to be identified as natives of the L2. One woman described in Piller (2002), a native speaker of Danish, could be heard as a German native speaker in more than short interactions, but always identified herself as a nonnative speaker because she did not want her pronunciation to raise expectations of her having insider cultural knowledge that she did not share.

Social involvement may be an important strategy for those who want to improve their pronunciation. Gluszek and Dovidio (2010a) show that there is a connection between better pronunciation and greater social connection, i.e., feeling more a part of the target culture and identifying with it. Their findings suggest that a willingness to engage with others and to see communication as primarily social is important in gaining access to real speakers. Furthermore, it opens up opportunities to notice how people talk, how they interact, the ways in which they package their words and gestures, and the sociolinguistically marked variants that evoke comfort in interactions. We are social beings, and communication and pronunciation live and develop within social contexts.

5.3 Stigma and Shame

Another way that pronunciation intersects with social factors is through the stigmatization of accent, which, in its most extreme form, can cause a sense of shame. Such stigmatization may be felt by both native and nonnative accented speakers (Gluszek, Newheiser, & Dovidio, 2011). Studies of stigmatized characteristics (such as eating disorders, homosexuality, and mental illness) and stigmatized groups show that members of stigmatized groups may feel “devalued, stigmatized or discriminated against because of their particular social identity” (van Laar & Levin, 2006: 6). As Schmader and Lickel (2006) say, “we feel shame when we perceive that a flaw in our character has been revealed to others and/or to ourselves” (p. 264). Another writer says something similar: shame “comes in the recognition of the reality of our own self,” that is, in our innate limitations (Kurtz, 2007). Put another way, shame is related to “an attribute of a person that is deeply discrediting” (Gluszek & Dovidio, 2010b, p. 216), and it can be seen in the deep feelings L2 speakers have toward the social stigma they perceive as a result of their accents.

Accent-related stigma is unique for several reasons (Gluszek & Dovidio, 2010b). First, accent becomes conspicuous simply by beginning to speak, whereas other stigmas may be concealable (Quinn, 2006). Another way in which accent-related stigma is unlike other stigmas lies in its potential to disrupt interactions with others (Gluszek & Dovidio, 2010b). Accent does not necessarily disrupt understanding, as evidenced by Munro and Derwing’s (1995) finding that listeners may be able to understand everything said while still rating speech as heavily accented. However, heavier accents may cause speakers to anticipate greater miscommunication. In other words, “communication challenges associated with possessing a nonnative accent may play a pivotal role in the social initiatives, perceptions, and adjustment of nonnative speakers” (Gluszek et al., 2011: 37).

The stigmatization of accent, like other stigmas, may cause the accented speaker to react to the stigma by changing their own behavior. In studying stigmas related to mental illness, Quinn (2006) found that “most people, prior to being labeled [as mentally ill], know and, to some extent, endorse the stereotypes about the mentally ill” (p. 89). Like accented individuals, they must live with two sets of conflicting ideas about who they are and how others see them. People may have varied reactions to the stigmatizing effects of an L2 accent. Some choose to ignore the stigma, others take refuge in a strong group identity (van Laar & Levin, 2006), others may react with anger or defiance, and yet others may “be more likely to avoid situations in which they think they may experience stigma, be less likely to initiate conversations, and attribute any problems in communication to the listener’s prejudices” (Gluszek & Dovidio, 2010b, p. 221). As related to L2 pronunciation, stigma is unavoidable and likely to be deeply felt by L2 learners. It may cause them to avoid the very things that will help them improve, but accent-related stigma is also deeply unfair precisely because of its wide acceptance.

5.4 How Stigma Influences L2 Pronunciation

In addition to re-examining published research, we have also drawn on unpublished interviews with our own students who have taken part in pronunciation tutoring. In these open-ended interviews, learners often comment on the social aspects of their pronunciation improvement. One of the most surprising things we found when we asked our learners to talk about their pronunciation was that some of them felt shame as a result of their accents. We had heard learners talk about discouragement, not understanding what they were doing wrong, about being misunderstood, but shame was unexpected. However, there is evidence that devaluing a speaker’s language leads to marginalization and shame about the language. In a study of Garifuna-speaking children in Belize, Bonner (2001) found that the marginalization of the language and its association with poverty and disfavored social status caused children to be ashamed to speak their own language. In related research, accent has been identified as a key stress factor in the workplace for accented speakers (Wated & Sanchez, 2006).

One of our students, a Korean pastor, talked about shame in describing why he would not speak English with the members of his church. He felt that his poor English would conflict with the role of authority he had as a pastor, and that he would be ashamed before his congregation. His accent would diminish him and his authority. His accent, he believed, was a moral failing that had to be hidden.




This view of accent is painfully counterproductive and ultimately defeating. He felt he deserved no credit for negotiating the new language. Instead, he believed he deserved opprobrium for not succeeding sufficiently, like being seen as a failure at tennis simply because one is not a professional. Other examples of this sense of shame can be seen in Derwing (2003), which reported the reactions of immigrants to native Canadians’ treatment of them because of their accents:

1. “When I work for a company my colleagues don’t understand. They joke. I feel bad very often.” (p. 557)
2. “‘What? What? Can you repeat that?’ I feel very strange with Canadian people. Rude, impolite – I feel very sad about that.” (p. 558)
3. “When I’m speaking English with an accent and I’m making mistakes, they’re thinking about what I am. Low.” (p. 558)
4. “At the bank, the lady who was talking to me, she cannot understand me. So that the other people who work with her, they started to laugh. Just because I cannot pronounce one word.” (p. 558)

Although none of these reports use the word shame, the emotional isolation of being targeted because of accent is clear. The immigrants remember being laughed at, being the butt of jokes, being seen as “low,” or feeling bad or sad because of the way that people react to them. These are socially traumatic experiences, and their effect may be that L2 speakers try to minimize situations in which they could happen again. As one set of researchers says, “shame elicits a very passive or withdrawing response aimed at reducing feelings of self-consciousness” (Schmader & Lickel, 2006: 264). Another writer says that “shame arises when there is a threat to the social bond” (Scheff, 2002: 95). Since language is a key way to create and manage social bonds (Miller, 2003), language differences such as accent are a constant barrier, or even threat, to such bonds. Accent stigma clearly underlies the tremendous appeal of accent reduction.
Indeed, the accent reduction industry in the United States flourishes, Newman (2002) argues, because some immigrant groups (Indians in Newman’s essay) “do this accent reduction out of shame and a feeling of insecurity...It’s simply that people are ashamed of what they are” (p. 63). We have asked our English learners if they would take a native accent if we could wave a magic wand and change their speech. Invariably, they give an enthusiastic “Yes.” They have no desire to become culturally American; they just want to remove pronunciation as a barrier to communication. The barrier is often draped with fear and frustration on their part, for they see pronunciation as a blockade. Often they also see themselves not as “active agents in their language use, language choices, and targets for instruction” (Hansen Edwards, 2008: 251), but as unwilling victims of a language system they do not control.


As Lippi-Green states, accents can be used to denigrate and diminish the contributions of others (1997). While it is illegal in North America to discriminate on the basis of race, gender, age, ethnicity or sexual orientation, it is more difficult to prove discrimination on the basis of native or nonnative accent (Wolfram & Schilling-Estes, 2006; Purnell, Idsardi & Baugh, 1999). Munro (2003) documents discrimination in Canada based on accent. In one case, a supervisor refused to employ a Polish-accented teacher because the teacher “did not speak English” (p. 44), even though there were no established complaints from students, parents or other teachers. In another case, it appeared that accent over the phone may have been the cause of discrimination for a job. In a third case, a long-term employee was criticized for “her accent and her ‘broken English’” (p. 47) even though oral skills had never been an essential part of the data-processing job she had held for 17 years.

Both L2 users and native speakers rarely question the correctness of stigmatizing accent. As noted, stigma can lead to a reluctance to engage native speakers in conversation, to limited involvement in deeper social interactions, and thus to less access to the cultural opportunities that would help L2 users both to improve their L2 and to feel more content with their use of the language. There may be no way to overcome the deep stigmas associated with accent that pervade any society, but L2 users need to recognize the unfair connections between stigma and accent so that they do not buy into negative assumptions about their abilities to speak the new language.

5.5 Imagined Communities and Identities

Another way that the social nature of L2 pronunciation can be understood is through the lens of imagined communities and imagined identities (Norton, 2001; Norton & Toohey, 2001; Pavlenko & Norton, 2007). Wenger (2010, pp. 4–5) posits three facets of belonging (engagement, imagination, and alignment) that work to help construct our sense of identity. Engagement can be defined as the interaction with others (whether successful or not). Imagination is the creation of our vision of how this piece of the world works based on our evolving understanding. Finally, alignment is the degree to which we coordinate our efforts with those of the group around us. Wenger has argued that these three aspects describe how we can belong to a group – by connecting with it, by creating our own view of it, and by coordinating our efforts with those of the group (see Wenger, 1998, 2010 for more on these ideas). Norton (and later Norton & Toohey, 2001) argues that the concept of imagination is critical in understanding participation and lack of participation by learners. As they stress, imagination in this context is not fantasy




or myth. Instead, it is the construction of a sense of potential, what might become and what one might strive towards. Learners are trying to gain access to the imagined community of speakers of English. To do so, they may take on an imagined identity to allow them greater agency in this new community (Miller, 2003). Norton and Toohey argue that a binary and unchanging view of identity is inadequate. In contrast, they propose that identity is socially constructed, variable and changing, and responsive to inequitable power differentials and influences. This means that identity may be composed of conflicting pieces, as described in the following section.

Miller (2003) suggests that being part of a community is essential to establishing an L2 identity. In terms of pronunciation, Miller talks about the importance of audibility rather than intelligibility. She says: “To be authorized and recognized as a legitimate user of English by others, you must first be heard by other legitimate users of English” (p. 47). In other words, intelligibility, which is speaker-centered, is not enough. Listeners must also recognize the speaker’s legitimacy. The speaker must become audible. Audibility, in turn, “may determine the extent to which a student can participate in social interactions and practices within the educational mainstream. Being audible also provides the means of self-representation, as well as the essential underlying condition for ongoing acquisition of English through practice, and continuing identity work” (pp. 48–49). One’s pronunciation, when it is audible to others, no longer remains a barrier to acquisition. Some of the students Miller wrote about saw themselves as outside of their desired community, even though they were aware that others like them were accepted members of their hoped-for community. The high school L2 users’ ability to be heard by native speakers was connected to their audibility in general and their pronunciation in particular.
In an early assertion of this connection, Zuengler (1988) said that “pronunciation is a domain within which one’s identity is expressed” (p. 34). Identity, however, is not unchangeable but fluid. Speakers adjust their speech patterns constantly to fit into different groups. Marx (2002) goes further and says that “identities do not exist within people, but are constructed between them in interaction” (p. 274). Indeed, part of our identity is connected to how we see others and to how we see ourselves in relation to others. We use the concepts of imagined identities and imagined communities discussed earlier, looking at two cases to illustrate. First, we look at Marx’s identity journey, followed by a study of two teachers imagining themselves to be legitimate teachers despite being accented. In both cases, pronunciation was central to how learners understood their identity in the L2. At the same time, identity must be understood in light of their lack of native-like pronunciation.


Marx (2002) offers one of the most striking examples of how much it can cost to construct a new identity. An Anglophone Canadian, Marx studied abroad in Germany for three years involving planned and unplanned losses and gains in her identity. The shifting boundaries of her identity suggest ways in which accent choices allow one to cross boundaries of new communities, but also how those choices involve factors not completely under a language learner’s control. Marx described her attempt to become more and more German during her stay abroad, connecting her cultural adjustments to her new community as well as to her changing accents. She argued that her “acceptance of and into the C2 [second culture] allowed for a more complete appropriation of accent in this L2, i.e. that this C2 was reflected in the melody and pronunciation of the second language” (p. 268). The stages she went through suggest the ways that pronunciation and other types of accommodation go hand in hand, and that both are involved in the person’s “self-translation” (p. 269) in which she both sees her sense of self going through loss (or displacement) and recovery, thereby constructing a new identity. Her first stage involved a decision to only speak German, even with other English speakers. She imagined a new identity as a German-speaker rather than an English speaker who knows German. The next step in her self-translation was not linguistic, but physical. After about four months, she changed the way she dressed so as to be seen as similar to German university students. Interestingly, coordinated with her physical transformation was a transformation of her accent, not to the German norm she heard around her but rather to French-accented German. She says “part of the initial attempt to eradicate the L1 accent was not due to the desire to acquire a perfect ‘native speaker’ accent in the L2, but rather to cloak the fact that I was a native speaker of English” (p. 272). 
The next step Marx reported was the deliberate attempt to achieve a native German accent after about a year in Germany so that she could be seen as a real member of the C2. While her previous steps could be seen as a way to mask her linguistic and cultural origins, this new step was riskier, but partially successful. She often passed for a native speaker in speaking with strangers. (This is consistent with “passing” as described by Piller, 2002.) Marx’s personal narrative showed that her deliberate and systematic changes in pronunciation helped build bridges to the new communities. Her pronunciation did not simply change; she continually undertook steps toward her identity goals through pronunciation. But her narrative also suggests that building a new identity did not come with a guarantee that she would be comfortable in the new community. New identities are double-edged; they come at a cost and at some point may simply carry too high a price for many L2 users.




In another examination of L2 users imagining new communities, Golombek and Jordan (2005) studied two Taiwanese English teachers who began to re-imagine themselves as legitimate yet accented teachers. The two teachers took part in a TESL training course on the teaching of pronunciation. The first teacher, Shao-mei, revealed a variety of feelings toward English that painted a very conflicted view toward her spoken proficiency. She said that most L2 users of English she knew “don’t think their English is good enough,” that their “English is never sufficient” and that they “feel unconfident of ourselves and dare not to tell others that we were once English majors because we are incompetent to speak English fluently” (p. 519). With this deficit mindset, Shao-mei saw her spoken language in general and pronunciation in particular as markers of incompetence which could not be overcome. Golombek and Jordan also said that her choice of words “suggest that even if an L2 speaker feels confident in her language abilities, native speakers can – and may – still refuse to ratify her as a legitimate speaker” (p. 520). Becoming an active agent will clearly be more difficult if one feels inadequate and believes that target language speakers will not accept one’s efforts, no matter what. Her feelings were likely not objective, but they offered a striking view of her subjective reality. Her sense of accent, and of her own agency, began to change with knowledge about alternative views of accent and language competence. In one reaction piece to a reading, Shao-mei wrote that “ESL learners should not be ashamed of their accent” and that a more realistic view of accent could lead to “more confidence” so that her own students would be able to see themselves as “competent learners instead of disabled” (p. 521). Re-imagining a teaching community was not an easy task for Shao-mei in these comments.
The perceived limitation of her accent, a source of her hidden shame and view of speech as a disability, had to be confronted before she could feel confident and competent. It is striking that she felt that her own changed views may only bear fruit in the lives of her students. It is also striking that she used the concept of shame to describe accent. The second teacher, Lydia, did not accept common beliefs about accent, but she did not have the words to resist what she felt others believed. She was a “black lamb” in Taiwan because her beliefs were not shared by other teachers. For Lydia, this meant working on her English pronunciation, a somewhat paradoxical strategy because she thought others would see her as “stupid or clumsy” and that her pronunciation would be seen as a flaw in her intelligence (p. 524). But Lydia never fully accepted the validity of the outside social pressures though they had been powerful enough to silence her and make her doubt her own abilities as a teacher. A way to negotiate the often unspoken social stigma of accent is for learners to imagine their access to an imagined community of speakers of English – and then to work toward becoming a part of it. This may involve adjustments in their


   Kimberly LeVelle and John Levis

pronunciation or their outward appearance, as in Marx’s continual translation of herself through contact with the German community she lived in, or it may involve questioning the unspoken assumptions regarding community membership, as in the two teachers described by Golombek and Jordan (2005). At the core of imagined communities and imagined identities is the willingness to see social access not as it seems on the surface but as it could and should be. Unfortunately, “the very people to whom the learners were most uncomfortable speaking English were the very people who were members of – or gatekeepers to – the learners’ imagined communities” (Norton, 2001: 166). In other words, even as their imagined communities became real, it still was not easy to gain access and become part of the target language community.

5.6 A Proposal for a Sociolinguistic Core

L2 learners do not pronounce in isolation; they pronounce to make social connections with other people through their speaking. Those social connections then provide the needed opportunities to establish their L2 identity and change their pronunciation in the only way that really matters, by communicating with other people. Accordingly, social factors should also be a priority in teaching and learning L2 pronunciation. What is not clear is how such social factors should be prioritized. In a typical pronunciation classroom, there is a heavy emphasis on segmentals, with some emphasis on suprasegmentals. Most pronunciation points are taught with an emphasis on accuracy with practice at the word and sentence level. Interaction and communication are included only as a low priority, and social factors play little or no part in instruction, even though research tells us that the influence of social factors on L2 pronunciation is undeniable. In one attempt to set priorities for pronunciation teaching, Jenkins (2000) proposed a Lingua Franca Core (LFC) for international English communication. The LFC was heavily weighted toward segmental phonological features that would make speakers of English throughout the world more intelligible to one another. The LFC claimed that some features were important enough to impair intelligibility if pronounced wrongly (e.g., most consonants, vowel length, and nuclear stress) while the mispronunciation of other features (e.g., English rhythm, schwa, word stress, intonation) was relatively unimportant. As an important addition to its core phonological features, the LFC also proposed teaching accommodation strategies to promote interaction. The issues involved with establishing a sociolinguistic core are daunting. Billions of people around the world use English, and the social and cultural assumptions that influence their communication are just as varied as they are. But the



Impact of Social Factors: Insights from Learners   


goal of this chapter is far more modest. We hope to establish the beginnings of a sociolinguistic core for learners of English living in Inner Circle contexts (Kachru, 1986). In these contexts, nonnative users of English often find that pronunciation may add to the substantial social barriers that can impair communication. Intelligibility is essential, but it is not always enough to get beyond feeling like an outsider (Piller, 2002). What might be involved in a sociolinguistic core? Like Jenkins (2000), we argue that certain items should be prioritized in pronunciation learning. Unlike Jenkins, we do not argue that certain items should a priori be excluded from our core. We simply do not have the evidence for those kinds of decisions. For the foundation of our core, we have identified five items based on learner accounts of success interacting in a new culture and language. As such, we offer them as a starting point of essential, socially relevant priorities for pronunciation learning and teaching:

1. Interacting outside of the comfort zone
2. Using interactional strategies that match the targeted social group
3. Judiciously using sociolinguistic markers in pronunciation
4. Looking the part
5. Being realistic about both the stigma of accent and long-term outcomes in pronunciation

5.6.1 Interacting outside the comfort zone

The first priority is rather obvious. To pick up sociolinguistically relevant behaviors and to have opportunities to fit in, a speaker must have opportunities to interact with others outside of transactional contexts. By transactional contexts, we mean brief service encounters rather than more extended interactions. Unfortunately, many L2 users do not have these kinds of interactions. They may be uncomfortable around native speakers. They may be surrounded by opportunities but be “too busy” to see them. This is all too common with graduate students at US universities, the context in which we work. One student from Malaysia had English pronunciation that was quite good and overall was highly intelligible. And though he was also quite personable one-on-one, he saw himself as shy. He spent long hours in a lab surrounded by American graduate students. When we said this sounded like a golden opportunity to use his English, he said that he was so involved in work that he could not take advantage of it. After work, he said he was too tired to go out with the other students. Although he saw himself staying in the United States as a professor, he viewed the social contexts around him not as opportunities but rather as distractions. Another student from Russia


was vibrant and outgoing yet found herself in a similar situation. She worked in a lab setting with other students and professors, and had to use English to communicate. Nevertheless, she rarely communicated beyond transactional language. When she was finished with work for the day, she went home to her family. She stated that she felt like she had used more English in Russia than she had since arriving. These two learners remind us of Coleridge’s poem, The Rime of the Ancient Mariner. “Water, water everywhere, and not a drop to drink.” English, English everywhere, but not a word to speak. Surrounded by opportunities, they do not take advantage of them either because of a misplaced sense of duty or because of a transactional approach to their own language. They were deeply dissatisfied but did not see that greater social involvement was a solution to their dissatisfaction. Not only is it important to use the L2 interactively, it is also important to be involved in supportive social networks in which language use naturally occurs. Lybeck (2002) found that the most successful learners of Norwegian were those who had such networks, whether they were composed of in-laws, work groups, or groups for new mothers. Those who were least successful (and most resentful) felt left out by others, such as Norwegian-speaking in-laws who spoke too quickly or co-workers who went silent when the learner came to sit with them at lunch. Social networks can take time to develop, but churches or other religious groups, conversation groups, civic and social organizations, or clubs, can all provide opportunities to talk and listen while joining others in a common activity.

5.6.2 Using interactional strategies

The second element of our core is using speech strategies to match the interactional patterns of one’s intended social group. Miller’s (2003) ethnographic study of immigrant high school students in Australia argued that students could not establish their new identity without being accepted in the social context, yet without an identity, they were inaudible to others. Establishing their identity was not just a matter of speaking intelligibly; it involved speaking in a way that could be accepted. Miller’s study shows that successful learners use spoken discourse features such as “like,” they mine the speech of others for useful language, and they expand their answers to questions to promote social interaction. The use of “like” is particularly striking. It is a feature of speech that is widely stigmatized but was socially cohesive among the high school students in the study. More successful learners picked up on this function of “like.” Less successful learners did not, perhaps because they did not have the social opportunities to acquire




it. Similarly, Marx (2002) talks about how her use of the discourse marker gel? (y’know? or right?) from Bavarian German allowed her to occupy another identity way-station on her journey to trying to become native-like in standard German. Although interactional strategies cannot make up for unintelligible pronunciation, they can help learners sound socially appropriate.

5.6.3 Sociolinguistic markers of pronunciation

Lexical sociolinguistic markers are important for fitting into a social group, but so are sociolinguistically marked pronunciation features that do not sound like they came out of standard classroom instruction. These may be harder to notice for learners, but they may for that reason be even more important and amenable to instruction. Piller (2002) talks about L2 learners who were able to pass because of their attention to local or regional speech markers. One woman, for example, used [s] rather than [ʃ] in sp and st initial clusters (spielen, Strasse). This somewhat traditional pronunciation had local prestige. Even though she used such features more frequently than did local native speakers, the pronunciation was effective in helping to establish her as a local. In another example, a woman used a marked pronunciation of [ʃ] rather than [s] typical of another stigmatized dialect of German, even though her partner said the use was not accurate. Piller, herself a native speaker of German, thought it sounded perfect, suggesting that the use of marked variants may be a way to pass as better than your pronunciation would otherwise justify. Presumably, English vowel variants associated with regional pronunciations such as the US Northern Cities Shift or the Southern Shift (Labov, Ash, & Boberg, 2005) or even cross-varietal vowel variants (e.g., using a well-known British variant in the US) could serve the same purposes for learners of English. Implicit in the models for most pronunciation teaching is the assumption that learners should always use standard pronunciation variants. This is an unfortunate and untenable assumption for English in the world. Even among native speakers, standard varieties do not command the majority position. In British English, Received Pronunciation (RP) speakers may constitute as little as about three percent of the UK’s population (Trudgill, 2001). 
In American English, General American (GA) speakers are more numerous but hardly constitute an overwhelming majority. Most native speakers will speak a non-standard spoken variety, and the marked pronunciations of these varieties can help nonnative speakers sound as if they linguistically belong in their speech community.


5.6.4 Dressing the part

The fourth priority for a sociolinguistic core is to look like the larger speech community. We suggest that there are two ways to accomplish this: through clothing choice, and through nonverbal communication. Marx (2002), long before she began intensive work to sound German, changed the way she dressed to fit in with the dominant culture:

After approximately four months of immersion in the L2 and C2, and concurrent with my decision to remain in Germany until the completion of the academic year, I started to effect changes to my outer self...I began to follow the practices of German university students: I forfeited my beloved running shoes, men’s jeans and t-shirts, all symbols of my “Canadianness” (p. 271).

In her desire to distance herself from other English speakers and more closely affiliate with the social groups she wanted to be part of, Marx used physical cues of the implicit dress-code. Dress is an obvious bridge to shorten social distance but it is one that many L2 learners do not easily cross, perhaps because the way we dress is as much a part of our identity as the way we sound. As many L1 cultures themselves become more multicultural, there may be more space and more models to work from in terms of dress. Perhaps more importantly, body language and gesture may be critical in the kind of social messages sent because they provide a different kind of visible clue about group membership. Pragmatically inappropriate body language can be a give-away that we are not part of a social group. At the least, it can send confusing messages in an unfamiliar culture. Moreover, body language and gesture are closely connected to pronunciation. In English, some examples include the brief side-to-side head shake to signal negation, the shoulder shrug connected to the idiomatic intonation of the words “I don’t know,” the marking of stressed syllables with body or head movements, and the acknowledgements of another upon giving and pronouncing our names. In all these, the way we sound and the way we appear connect pronunciation to gesture. Bringing sound and appearance together not only will help communication, it will provide a social anchor in the L2.

5.6.5 Being realistic about stigma and about progress

The final element of our sociolinguistic core is an honest recognition of limitations. It is well-known that L2 users rarely become native-like in pronunciation. It is also well-known, though much less discussed, that accent is stigmatized.




Because there is no magic wand to remove foreign accents, we need a way to help L2 users understand and confront the social assumptions that can make them reluctant to communicate. Stigma gains strength by silence, both silence of the L2 learner and the silence of the L1 listeners. While it is hard to change native speakers’ views in the short run, L2 users can begin the change in attitudes by refusing to accept the injustice of others’ assumptions or by using humor to displace the power of the stigma. L2 users should be taught the reality and injustice of stigma. This should be a priority in language classrooms and should be openly discussed in language teaching texts by using examples of L2 users who have successfully negotiated and overcome stigmas. For example, one of our students told a story of asking for the location of a product in the grocery store. Later, she saw the clerk she had asked talking to another clerk. Both were looking at her, pointing and laughing. Rather than feeling ashamed and silenced, our student went to the clerks, confronted them and told them that they had no right to make fun of her. She was a learner of English, and even more, she was a customer who did not deserve the treatment she was receiving. Although one clerk walked away, the other apologized. This suggests that L2 users, instead of taking on the attitudes of those who other them, can control their own reactions to, and understanding of stigma.

5.7 Conclusion

Pronunciation in a new language is particularly fraught with threat. People may not understand you, may pretend not to understand, or may even understand but decide that the way your message is packaged is somehow unacceptable or worthy of mockery. It takes courage, or perhaps desperation, to continue communicating when faced with negative attitudes about your speech, and by extension, your identity (Lindemann, this volume). Disapproving (or neutral) comments about someone’s accent set up barriers to social participation. But L2 users need to develop ways to legitimately participate in a new culture when their own speech betrays them as being an outsider. Such participation may only be on the margins, but if they are to become audible and establish their identity in a social context, they must be heard as legitimate speakers who have something to contribute (Miller, 2003). To learners, accent is not simply part of communication. It is what identifies them as outsiders, openly and unambiguously, every time they open their mouths. We can indirectly learn much about the way social factors influence L2 users by examining patterns of L2 use related to social factors. However, without asking the L2 users about why they pronounce as they do or what their experiences have been in L2 language use, their own or others’ attitudes, or how they fit into intact social groups, we will have a limited understanding of how social interactions impact their pronunciation of the target language. To understand the pressures they feel, the barriers they perceive, and the ways in which they overcome and become successful social users of their new languages, we need to find ways to ask L2 users about their experiences and their attitudes toward them. We also need to openly address the social context of pronunciation improvement so that L2 users can make informed decisions about their own improvement and can react knowledgeably to the often unspoken social pressures and prejudices that may or may not be related to the clarity of their speech. We have put forward an outline for a sociolinguistic core for pronunciation to move beyond the typical emphasis on sounds and prosody. An approach that involves teaching for intelligibility alone is one-dimensional, especially in second language contexts, because it ignores the feelings, insecurities, and uncertainties that learners carry with them, as well as their reluctance to try things that will expose their weaknesses. For this reason, L2 learners need to understand the power of the social context and be shown ways to begin to negotiate it.

References

Anton, M.M. 1996. Using ethnographic techniques in classroom observation: A study of success in a foreign language class. Foreign Language Annals 29, 551–561.
Bongaerts, T., Van Summeren, C., Planken, B., & Schils, E. 1997. Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition 19, 447–465.
Bonner, D.M. 2001. Garifuna children’s language shame: Ethnic stereotypes, national affiliation, and transnational immigration as factors in language choice in southern Belize. Language in Society 30, 81–96.
Coppieters, R. 1987. Competence differences between native and near-native speakers. Language, 544–573.
Derwing, T.M. 2003. What do ESL students say about their accents? Canadian Modern Language Review/La Revue canadienne des langues vivantes 59, 547–567.
Flege, J.E. 1995. Second language speech learning: Theory, findings, and problems. Speech perception and linguistic experience: Issues in cross-language research, 233–277.
Gatbonton, E., Trofimovich, P., & Magid, M. 2005. Learners’ ethnic group affiliation and L2 pronunciation accuracy: A sociolinguistic investigation. TESOL Quarterly 39, 489–511.
Gluszek, A., & Dovidio, J.F. 2010a. Speaking with a nonnative accent: Perceptions of bias, communication difficulties, and belonging in the United States. Journal of Language and Social Psychology 29, 224–234.
Gluszek, A., & Dovidio, J.F. 2010b. The way they speak: A social psychological perspective on the stigma of nonnative accents in communication. Personality and Social Psychology Review 14, 214–237.




Gluszek, A., Newheiser, A.K., & Dovidio, J.F. 2011. Social psychological orientations and accent strength. Journal of Language and Social Psychology 30, 28–45.
Golombek, P., & Jordan, S.R. 2005. Becoming “black lambs” not “parrots”: A poststructuralist orientation to intelligibility and identity. TESOL Quarterly 39, 513–533.
Gumperz, J.J. 1982. Discourse strategies (Vol. 1). Cambridge: Cambridge University Press.
Hansen Edwards, J.G. 2008. Social factors and variation in production in L2 phonology. In M.Z.J.G. Hansen Edwards (Ed.), Phonology and second language acquisition (pp. 251–279). Amsterdam: John Benjamins Publishing Company.
Jenkins, J. 2000. The phonology of English as an international language: New models, new norms, new goals. Oxford University Press, USA.
Jenkins, J. 2002. A sociolinguistically based, empirically researched pronunciation syllabus for English as an international language. Applied Linguistics 23, 83–103.
Jenkins, J. 2009. English as a lingua franca: Interpretations and attitudes. World Englishes 28, 200–207.
Kachru, B.B. 1986. The alchemy of English: The spread, functions, and models of non-native Englishes. Oxford: Pergamon.
Kurtz, E. 1981. Shame and guilt: Characteristics of the dependency cycle: An historical perspective for professionals. Hazelden Publishing & Educational Services.
Labov, W., Ash, S., & Boberg, C. 2005. The atlas of North American English: Phonetics, phonology and sound change. De Gruyter Mouton.
Lave, J. & Wenger, E. 1991. Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Levin, S., & van Laar, C. 2005. Stigma and group inequality: Social psychological perspectives. Lawrence Erlbaum.
Lippi-Green, R. 1997. English with an accent: Language, ideology, and discrimination in the United States. New York: Routledge.
Lybeck, K. 2002. Cultural identification and second language pronunciation of Americans in Norway. The Modern Language Journal 86, 174–191.
Marx, N. 2002. Never quite a native speaker: Accent and identity in the L2 and the L1. Canadian Modern Language Review/La Revue canadienne des langues vivantes 59, 264–281.
McLemore, C.A. 1991. The pragmatic interpretation of English intonation: Sorority speech. University of Texas at Austin.
Miller, J. 2003. Audible difference: ESL and social identity in schools (Vol. 5). Multilingual Matters Limited.
Moyer, A. 1999. Ultimate attainment in L2 phonology. Studies in Second Language Acquisition 21, 81–108.
Munro, M.J. 2003. A primer on accent discrimination in the Canadian context. TESL Canada Journal 20, 38–51.
Munro, M.J., & Derwing, T.M. 1995. Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning 45, 73–97.
Newman, B. 2002. Accent. The American Scholar 71, 59–69.
Norton, B. 2001. Non-participation, imagined communities, and the language classroom. In M.P. Breen (Ed.), Learner contributions to language learning: New directions in research (pp. 159–171). Harlow, England: Pearson Education.
Norton, B., & Toohey, K. 2001. Changing perspectives on good language learners. TESOL Quarterly 35, 307–322.


Pavlenko, A., & Norton, B. 2007. Imagined communities, identity, and English language learning. International handbook of English language teaching, 669–680.
Piller, I. 2002. Passing for a native speaker: Identity and success in second language learning. Journal of Sociolinguistics 6, 179–208.
Piske, T., MacKay, I.R.A., & Flege, J.E. 2001. Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics 29, 191–215.
Purnell, T., Idsardi, W., & Baugh, J. 1999. Perceptual and phonetic experiments on American English dialect identification. Journal of Language and Social Psychology 18, 10–30.
Quinn, D.M. 2006. Concealable versus conspicuous stigmatized identities. In Levin, S., & Van Laar, C. Stigma and group inequality: Social psychological perspectives (pp. 83–103). Lawrence Erlbaum.
Scheff, T.J. 2002. Shame and the social bond: A sociological theory. Sociological Theory 18, 84–99.
Schmader, T., & Lickel, B. 2006. Stigma and shame: Emotional responses to the stereotypic actions of one’s ethnic ingroup. In Levin, S., & Van Laar, C. Stigma and group inequality: Social psychological perspectives (pp. 261–285). Lawrence Erlbaum.
Schumann, J.H. 1976. Second language acquisition: The pidginization hypothesis. Language Learning 26, 391–408.
Schumann, J.H. 1986. Research on the acculturation model for second language acquisition. Journal of Multilingual & Multicultural Development 7, 379–392.
Trudgill, P. 2001. Sociolinguistic variation and change. Edinburgh: Edinburgh University Press.
van Laar, C., & Levin, S. 2006. The experience of stigma: Individual, interpersonal, and situational influences. In Levin, S., & Van Laar, C. Stigma and group inequality: Social psychological perspectives (pp. 1–17). Lawrence Erlbaum.
Wated, G., & Sanchez, J.I. 2006. The role of accent as a work stressor on attitudinal and health-related work outcomes. International Journal of Stress Management 13, 329.
Wenger, E. 1998. Communities of practice: Learning, meaning, and identity. New York: Cambridge University Press.
Wenger, E. 2010. Communities of practice and social learning systems: The career of a concept. In C. Blackmore (Ed.), Social Learning Systems and Communities of Practice (pp. 179–198). London: Springer.
Wolfram, W., & Schilling-Estes, N. 2006. American English, 2nd ed. Malden, MA: Blackwell.
Zuengler, J. 1988. Identity markers and L2 pronunciation. Studies in Second Language Acquisition 10, 33–49.

Erik R. Thomas

6 L2 Accent Choices and Language Contact

6.1 Introduction

Second language learners in close contact with speakers of the L2 are confronted with an array of sociolinguistic variants in their target language. The native speakers of the target language that they encounter may speak a regional dialect, and regardless of the region, target language speakers from different social groups exhibit variation in social dialects and command a range of speaking styles. L2 speakers’ ability to construct their own way of speaking – and with it, their own L2 identity – may depend on their ability to discern the order behind the variety of speech forms they encounter. In addition, L2 learners not only have to figure out the social meanings behind pronunciation variants, they have to decide whether to imitate them. Investigation of how populations of L2 learners accomplish this in a community setting is underdeveloped, largely because previous studies have examined limited numbers of linguistic variables. Two fields that have an interest in these issues, language contact and sociolinguistics, have investigated them only to a limited extent. Language contact researchers tend to focus more on issues such as code switching, language attrition, and the motivation to speak in one language or another in a given situation, and less on linguistic variables, linguistic structures, and their social meanings. Tarone (1979) noted the neglect of sociolinguistic aspects of the L2 in language contact studies, and over three decades later the situation has not improved greatly. Sociolinguists, on the other hand, are interested in issues such as how language changes and the social indexicality of linguistic variants, but most have confined themselves to L1 situations. Surprisingly little quantitative sociolinguistic research has explored how L2 speakers respond to language contact with L1 speakers and how this contact affects their own language choices. 
This review focuses on the sociolinguistic side of contact – its basic approach to research and how it has addressed language contact (mainly for pronunciation)  – and then examines one particular contact variety, Mexican American English (MAE), with a case study to illustrate how this kind of inquiry can proceed.

6.2 The Bottom-Up Approach and Language Contact

One of the key differences between language contact research and sociolinguistic research (especially quantitative sociolinguistics) is the place of theory in the


research process. Studies of language contact are primarily top-down and theory-driven. That is, research commonly revolves around the testing of a pre-existing theory with data or comparisons of theories in the light of new data. Quantitative sociolinguistics, conversely, is primarily a bottom-up, data-driven field. Studies usually are not based on a full-fledged theoretical model, and they may not even involve an explicit hypothesis. Construction of a theoretical model commences only after analysis of the data. Although they have advantages, theory-driven strategies may at times limit the perspectives applied to data or induce researchers to make data fit into patterns that suit the theories poorly. The quantitative sociolinguistic paradigm offers an alternative way of examining data, one that not just permits but encourages open-ended interpretations. In examining the social meanings of pronunciation variants, the quantitative sociolinguistic approach also takes advantage of methodological and theoretical innovations from phonetics. These advances constitute sociophonetics, which is now a major part of sociolinguistics (see, e.g., Thomas, 2011). Phonetic and phonological transfer is one of the largest influences in L2 learning (Thomason & Kaufman, 1988). However, language contact researchers have generally focused more heavily on morphology, syntax, and semantics and less on phonetics and phonology. Nonetheless, quantitative sociolinguistic studies have rarely examined more than a handful of variables. Authors construct elaborate theories about the social composition of the study community on the basis of such limited evidence. Different variables, however, may cut across a community in numerous ways: e.g., by ethnicity, birth cohort, socioeconomic status, gender, or social clique, or even as features of individual identity. 
While one variable may make it appear that one such social division is primary within the community, other variables may be correlated with other social divisions. The case study presented here exemplifies how analysis of large numbers of variables can help to reveal the broad sociolinguistic structure of a community. This chapter considers how sociolinguists have recently studied L1 groups, and the limitations of newer approaches. It also examines how these approaches have been applied to L2 situations: for example, clarifying how the pronunciation of L2 learners overlaps and differs from the variation produced by L1 speakers. The presence of ethnic tension can strongly influence L2 pronunciation choices. Studies from a variety of contexts are used to demonstrate this point, with Mexican Americans in the US being the most widely studied contact situation in sociolinguistics. After a review of studies related to Mexican Americans’ pronunciation of English (most of which examine few linguistic variables), a case study of North Town, a community in Texas, is presented. North Town illustrates




the value of examining larger numbers of linguistic variables in order to verify the social patterning of L2 pronunciation in language contact situations.

6.3 Sociolinguistic Approaches to Communities

Sociolinguistic approaches to variation depend on the ways that social groups are defined. From the early years of sociolinguistics, social divisions were based on broad demographic traits: age group, ethnicity, sex, socioeconomic status, and sometimes neighborhood of residence. Over the years, as discussed by Eckert (2005), some sociolinguistic studies have moved toward a more careful analysis of social connectivity. Examination of the relationship between individual social networks and use of linguistic variants was pioneered by Labov, Cohen, Robins and Lewis (1968). They found that African American adolescents who belonged to “gangs” (then conceived of as small social groups, not as the current meaning of gang) generally showed more features associated with African American Vernacular English than non-gang members. Network analysis became more highly developed with James and Lesley Milroy’s work in Belfast (e.g., Milroy 1987). They introduced measures of how interconnected people were, and found that people in tightly interwoven networks showed more local dialectal features than people in looser networks. Another approach to social connection, introduced by Habick (1980) but better known in work by Eckert (e.g., 1989), involves sociometric analysis, in which networks of larger populations are mapped out. Habick and Eckert have both found that individual networks are part of larger patterns in which certain networks are affiliated with school-sanctioned values and others with non-school-sanctioned values, and that linguistic variables are also aligned with those groupings. With the exception of Fought (1999) and Mendoza-Denton (2008), these methodological innovations have not been applied to language contact situations. 
More recently, in the “community-of-practice,” or COP, approach (e.g., Bucholtz, 1998), individual social groups (the communities of practice) have been examined by looking at the attitudes and values expressed by the members and the ways that members decide who belongs to the group. Alam and Stuart-Smith (2011) used this approach in examining the speech of girls of Pakistani background in Glasgow, Scotland. These speakers came from a Punjabi- or Urdu-speaking background. Their syllable-initial /t/ sounds were subjected to acoustic analysis to determine where on the retroflex-alveolar-dental continuum they fell. Retroflex articulations reflect influence from Punjabi and Urdu. Girls from a conservative, Islamic-oriented COP showed relatively retroflex articulations, while

122   

   Erik R. Thomas

girls from a COP representing a more rebellious outlook showed relatively dental articulations, more like those of non-Asian Glasgow residents. While community-based studies have shed greater light on how people interact, they often have their own drawbacks. With the increased emphasis on sociological factors has come a decreasing emphasis on the language itself. Progressively fewer linguistic variables have been examined even as knowledge of people’s interactions and attitudes has deepened. For example, Mendoza-Denton (2008) conducted a detailed study of how two Latino groups, the norteños and sureños, differed in a school in the Los Angeles area. She examined a range of behaviors, including such icons as the kinds of makeup students wore. However, she quantitatively analyzed only one linguistic variable, the realization of the bit vowel. Her study was more sociological than linguistic. The hazard of this kind of study is the same as the long-standing weakness of sociolinguistics: elaborate theories of the sociolinguistic structure of the community may be based on incomplete evidence, as unanalyzed variables may dissect the community in completely different ways. Of course, from a language contact perspective, it would be desirable to know about a broader array of variables in order to provide more information for applications such as teaching.

6.4 Acquisition of Variation by Shifting Groups

In spite of these drawbacks, the existing studies have yielded some clues about how L2 learners learn to recognize sociolinguistic variation that occurs among native speakers of the L2 and then adapt to their own uses. For the most part, these studies have examined the L2 community as a whole instead of conducting more detailed network or social group studies. Evans, Mistry, and Moreiras (2007) examined the vowels of Gujerati-speaking immigrants and their children in a London neighborhood. The first-generation speakers showed significant differences in vocalic realizations from the matrix community – the surrounding majority group – and had difficulty making some phonological contrasts found in English. However, the second-generation subjects, all but one born in England, produced vowels that were almost indistinguishable from those of matrix speakers. In this case, the researchers found no sensitivity by the first generation to local sociolinguistic variation, yet their children did not appear to have any appreciable effects from their parents’ accented English.

In contrast, Kerswill, Torgerson, and Fox (2008) found strong ethnic effects in London. They compared the diphthongs of residents of two London boroughs: Hackney, with a mixed population of Anglo-British and various immigrant groups, especially from southern Asia and the Caribbean; and Havering, which




was mostly Anglo-British. Young Anglo-British in Havering showed diphthongs with only mild differences from the traditional Cockney speech of elderly working-class Londoners. Hackney was a different story. The offspring of the immigrants produced narrow diphthongs for the bait vowel, forms of the bite and bout diphthongs with low central nuclei, and, often, back variants of the boat vowel. All of these ethnic realizations are quite un-Cockney-like. Young Anglos in Hackney varied, depending on whether their friendship networks included many members of the immigrant groups. The authors attributed the ethnic variants to two sources. One is the fact that Anglophone Caribbean varieties have vowels similar to the emergent ethnic realizations. The other, however, was substrate effects from southern Asian languages, for which they noted that L2 learners of English typically produce forms much like the Caribbean variants. In this instance, reinforcement of interference features from Asian languages by the Caribbean immigrants led to their continuance. Quite a different result was found by Horvath (1985), who examined the vowels, especially the diphthongs, of Greek and Italian immigrants to Sydney, Australia. Anglo-Australian English was, at the time the Greeks and Italians immigrated, characterized by readily definable sociolects. These sociolects ranged from the ‘broad’ speech associated with the working class, which showed relatively wide diphthongs, to the speech of the top social levels, which showed narrower diphthongs. Horvath outlined an intermediate sociolect as well. For example, the bait diphthong could range from a wide [ai] form to a narrow [ei] form, with a gradation of intermediate forms. Surprisingly, the first-generation immigrants, apparently in an attempt to fit in as Australians, outdid the ‘broad’ speech by producing even wider diphthongs on average. 
Nevertheless, the next generation of Greek- and Italian-Australians reversed course and tended toward the intermediate values. Simultaneously, young Anglo-Australians were also moving toward intermediate realizations, so that the speech of immigrants and Australians became indistinguishable for these vowels. A similar case is that of Hall-Lew (2009), who found no evidence of interference from Chinese languages in the vowels of Chinese Americans in San Francisco. Instead, young Chinese Americans were participating fully in vowel shifts occurring among local Anglo Americans.

Studies of L2 speakers who learned L2s in classroom settings have yielded comparable results. Adamson and Regan (1991) examined variation in the use of the two phonological variants, [ɪŋ] and [ɪn], in English unstressed syllables spelled -(C)ing among Vietnamese and Cambodians learning English, while Bayley (1996) examined consonant cluster reduction (CCR) – loss of final stops in syllable-coda consonant clusters – among Chinese natives learning English. Both studies found that L2 learners acquired the same variants that native speakers of English show, but that they did not acquire the same constraints on the variation. Adamson and Regan noted that L2 learners sometimes showed lexicalization (consistent, presumably cognitively encoded pronunciations of particular words) instead of rule acquisition. Bayley found that L2 learners showed more CCR when the past tense morpheme was the final consonant (e.g., bused) than they did for monomorphemic clusters (e.g., bust) – the opposite of the pattern that native speakers exhibit.

Another study of L2 learners who acquired English in a school setting is Rindal (2010). It involved Norwegians and their decisions to opt for either American English or British English variants. Variants studied included rhoticity (articulation of pre-pausal and pre-consonantal /r/), tapping of medial /t/, and the quality of two vowels. While the subjects were taught mostly British English in the classroom, their exposure to American English came largely from movies, television, and comparable media. Perhaps as a result of the respective settings of the exposure, the subjects developed differing associations with each variety: they considered British more prestigious but American more informal. They then gravitated toward variants from one dialect of English or the other to project their own identities.

A further example of how sensitive immigrant groups are to variation in the matrix community and how quickly they can adopt variants, yet in different ways from native speakers’ usage, is found in Schleef, Meyerhoff, and Clark (2011) and Meyerhoff and Schleef (2012). Like Adamson and Regan (1991), they examined variation in unstressed -ing. They compared the use of the two variants by teenage natives of London and Edinburgh with use by Polish-born immigrant teenagers who had moved to those two cities.
The British-born adolescents in both cities showed greater proportions of [ɪn] in spontaneous speech than in reading and among males than among females, both of which are familiar tendencies in studies of this variable across the English-speaking world (see the review in Hazen, 2008). The Polish-born teenagers adopted both variants, but not with the same constraints. In London, Polish-born females showed higher rates of the [ɪn] variant than Polish-born males. In Edinburgh, conversely, Polish-born males showed higher rates of the [ɪn] variant than females, but, unlike the pattern among locally born teenagers, sex was a stronger predictor than speaking style. In this case, the immigrants acquired both variants, but implemented them in novel ways.

Task can affect the variation observed. For example, reading versus spontaneous speech leads to style shifting. Major (2004) compared four casual speech processes in readings of sentences and shorter phrases by native speakers of Spanish and Japanese who had learned English. Males of both groups showed greater use of the casual variants, but Spanish speakers showed far more stylistic differentiation between the sentences and the phrases. Tarone (1979) argued that




L2 learners showed stylistic conditioning in their L2 speech, not just for read vs. spontaneous speech or between different reading tasks, but within spontaneous speech between formal and casual styles. Few researchers have followed up on this notion, however. Studying variation in spontaneous speaking styles can be difficult for logistical reasons. Two of the most effective methods are to have a community member conduct interviews, which usually requires funding to pay the interviewer, or to convince community members to strap recording equipment to their bodies as they go about their daily routines – not easy for those of us, including the present author, with dismal sales skills. One exception, however, is Sharma (2011), who examined the speech of the Punjabi community in a neighborhood on the west side of London. She had subjects carry recorders during their daily activities. She examined four different phonetic variables. It turned out that some speakers exhibited dramatic differences in the incidence of ethnic variants, depending on whom they were conversing with. Younger women showed such stylistic heterogeneity especially strongly, reflecting their adaptation from a traditional Indian society to Western norms. This finding could not have been obtained if Sharma had relied strictly on interview-style speech.

6.5 Impact of Ethnic Tensions in Language Contact

Ethnic tensions can introduce an additional dynamic to language contact situations and can influence the pronunciation variants chosen by L2 users of a language. Nortier and Dorleijn (2008) describe how such a situation developed in The Netherlands, in which some L2 speakers of Dutch chose to employ the pronunciation of other L2 users rather than that of the L1 speakers. They examined the speech of Moroccan-Dutch and Turkish-Dutch subjects in several urban centers. A number of features, mostly phonological, from Moroccan Arabic and Berber had become established in the Dutch of these groups. The dialect took on special meaning after the World Trade Center attacks of 2001, however, as open anti-Muslim sentiment broke out among ethnic Dutch. Assassinations of two prominent Dutch figures increased the tension. Immigrant groups found themselves having to choose between a Dutch identity and a Muslim identity. The Moroccan-Dutch ethnolect crystallized in this situation. The authors wondered why the features were of Moroccan, not Turkish, origin, in spite of the approximately equal numbers of Moroccans and Turks. The reason appeared to be that Turks were less prone to give up their L1 than Moroccans, who had shifted rapidly to Dutch but carried over various interference features. When the Turks did acquire Dutch, it was often the Moroccan-influenced ethnolect, which had


become a Muslim identity marker. They suggested that Turks were more clannish than Moroccans as well, which could suppress their influence on other groups. Surprisingly, though, certain groups of native Dutch teenagers were also adopting the Moroccan ethnolect because they saw it as a marker of urban identity.

In other situations of long-term language contact, pronunciation choices may differ from both the L1 and L2 patterns. This is the case for Turkish speakers of German in Germany. In Germany, Turks constitute the main immigrant group and have experienced a great deal of discrimination. Most notably, German governments have resisted Turkey’s admission to the European Union in part to deny German citizenship to residents of Turkish extraction. This situation has resulted in multi-generation Turkish communities in Germany whose members have an uncertain status. Turks have, in turn, developed a German dialect of their own characterized by features molded, often in ways that are not straightforward, from Turkish interference features. Queen (2001, 2012) describes one such development. Turkish German exhibits two rising phrase-final contours in its intonational system: one that is also found in mainstream German and another typical of Turkish. However, the semantic and pragmatic uses of these contours do not match those of either language. Moreover, the incidence of two other final tones, a falling tone and a low tone, differs for Turkish German bilinguals and ethnic Germans. In the bilingual situation, Turkish Germans have reallocated all of these final contours to form a new system that can serve as a marker of their ethnic identity. Such mixing of intonational systems is probably common in contact situations: Birkner (2004) found a similar mixture among German-Portuguese bilinguals in Brazil when speakers engage in code switching and apply a contour from Portuguese to phrases in German.
Even in cases in which L2 users take on L1 features, tensions may lead to social differentiation, as found by Sharma and Sankaran (2011). This study was part of the same project discussed in Sharma (2011), described above. The Punjabi community in London was established after World War II, but by the late 1960s, overt hostility, including riots, broke out against Punjabis. The Punjabi community responded with anti-racist organizing. By the late 1980s, the tensions had dissipated. In the meantime, however, a Punjabi ethnolect was developing. Among its most salient features was a retroflex variant of /t/, a direct carryover from Punjabi. Punjabis who had grown up in India or Pakistan commonly substituted retroflex [ʈ] for alveolar [t] in any phonetic context, though most commonly in medial position, e.g., later. Those born in England showed progressively lower rates of retroflection, and it became increasingly associated with word-initial position, as in two. Moreover, young females tended to avoid it altogether. Simultaneously, the incidence of glottalized /t/ in medial and final positions, a feature of vernacular Anglo-British dialects, was increasing. A new dialect with features




of both Punjabi and Anglo English and an association with maleness had taken shape.

Even within a relatively homogeneous community, variation is evident in relation to identification with L1 social groupings. Hirson and Sohail (2007) examined English spoken in another Punjabi community, this one in Bradford, England. They analyzed rates of rhoticity. The Punjabi subjects fell into two social orientations: “British Asian,” who were relatively integrated into British culture, and “Asian,” who were not well integrated. The authors noted that the two identifications were related to “religious affiliations” and “politically charged tensions” (p. 1501). The British Asian group showed almost no rhoticity, reflecting the speech of the surrounding Anglo community. The Asian group, conversely, showed a great deal of rhoticity. The high degree of rhoticity in the Asian group reflects the fact that /r/ can occur in syllable codas in Punjabi, including positions in which a consonant or pause follows it.

6.6 Studies of Mexican American English

The above studies from around the world lay out many of the factors that affect linguistic choices in L2 situations: ethnic tensions, size of the L2 group, gender identity, and other kinds of identity. Most of these factors have been identified as affecting the English of the best studied language contact population in the United States, Mexican Americans, who constitute the largest immigrant group over the past hundred years (Barrera, 1979). There were Hispanic people in Texas, New Mexico (including modern-day Arizona), and California when the United States annexed those areas, of course, and immigration from Mexico occurred throughout the remainder of the 19th century. However, as restrictive policies caused immigration from Europe to wane in the early 20th century, and as immigration from other parts of the world, particularly Asia, has yet to challenge that from Mexico, Mexican Americans have become the largest and most visible immigrant group. Moreover, immigration from Mexico accelerated after 1900, both because of problems in Mexico and because other sources of cheap labor became unavailable in the United States (Barrera, 1979). Mexican Americans’ position is somewhat analogous to the various groups of guestworkers in European nations in that they represent a large group providing unskilled and semi-skilled labor who entered speaking a different language from the dominant segment of the population. They have experienced many of the same challenges as well, such as residential segregation, discrimination of various types, and at times ethnic tensions. Much like the speech of their European counterparts, their speech reflects


a mixture of interference from their heritage language and new sociolinguistic patterns. The difficulties that Mexican Americans faced began well before the heavy immigration from Mexico in the twentieth century. De León (1982) describes how Texans of Mexican background (Tejanos) were frequently displaced from land that they owned during the nineteenth century and were at times terrorized, such as by lynchings or by disruption of their business enterprises by Anglos who wanted the business. Tejanos were rarely elected to public offices except in a few majority-Tejano areas, and attempts were made to disenfranchise them. Few of them learned English during that period. Hispanics were deprived of their land, quite often by unfair means, in California and New Mexico as well (Barrera, 1979). During the twentieth century Hispanic groups still endured abuses. Anglo ranchers and farmers commonly kept a Hispanic family for their labor – the men for ranch or agricultural work and the women for domestic chores – in what is termed the patrón system. Not surprisingly, the housing and facilities provided for the Hispanic families were ordinarily substandard. When landowners mechanized their operations during the period from 1920 to 1950, the Tejano families were often summarily evicted (Foley, 1988). They generally moved into towns and cities to find other work. This movement coincides with an overall shift of the Hispanic population from rural to urban areas (Barrera, 1979). In cities and towns, however, they encountered residential segregation, school segregation, a lack of city services in the neighborhoods where they lived, prejudicial treatment by law enforcement officers, and exploitation of their votes by politicians (Foley, 1979). In schools, Tejano children were prohibited from speaking Spanish, though they did finally receive compulsory primary education, which resulted in widespread learning of English. 
In earlier times, the lack of English skills among Hispanics had provided opportunities for unscrupulous Anglos to take their land. It was during this period that the shift from Spanish to English gained momentum and Mexican American English (MAE) became a recognizable entity. As noted, MAE is one of the most heavily studied contact varieties in the world, and certainly the most extensively studied contact variety in the United States. A few other U.S. contact varieties have attracted occasional attention from sociolinguists, such as Chinese English (Bayley, 1996), Native American English (Wolfram et al., 1979), and Vietnamese English (Wolfram & Hatfield, 1984), as well as another Spanish contact variety, Puerto Rican English (Ma & Herasimchuk, 1971; Slomanson & Newman, 2004; Wolfram et al., 1979). Nevertheless, even though research on MAE outstrips that of other contact varieties, many aspects of MAE are poorly understood, including its social and regional heterogeneity, its relationship to social traits of its speakers, the range of linguistic variables that characterize it, and its continuing development.




Although there were some early discussions of MAE such as Lynn (1945), the earliest detailed study of it was by Sawyer (e.g., 1959, 1964) in San Antonio. Sawyer’s work is not held in high esteem today, perhaps unfairly, because she regarded MAE as a transitional variety: to her, MAE represented only the speech of people whose L1 was Spanish. As was the custom of the day, she put considerable emphasis on lexical variation. However, she also determined that her Mexican American subjects were not acquiring the Southern dialect features that Anglos in San Antonio exhibited, and that Mexican Americans often lacked English contrasts with no analogs in Spanish, such as certain vowel contrasts and contrasts among post-alveolar consonants (e.g., /ʧ/ as in chip vs. /ʃ/ as in ship). At the time she was active, most Mexican Americans probably did have Spanish as their L1 and dominant language. That situation soon began to change, however, and by the 1970s, scholars started to recognize that MAE could represent a more stable variety (Bills, 1977; Metcalf, 1972). The 1970s witnessed a small burst of studies of MAE from across the Southwest. These studies had a strong descriptive component though they incorporated some early sociolinguistic methods as well, mainly in surveying numerous speakers and in taking into account factors such as socioeconomic status. Castro-Gingras (1972), Natalicio and Williams (1972), and Register (1977) described Spanish interference features. Garcia (1976) employed a lexical questionnaire in a survey of Brownsville, Texas, comparing subjects from “barrio” (heavily Mexican American) and non-barrio sections of town. Two other studies from Texas, Thompson (1975) and Hamilton (1977), investigated the degree to which Mexican Americans adopted regional features of Anglo dialects, though Thompson also included an interference feature from Spanish, the devoicing of final /z/. Doviak and Hudson-Edwards (1980) examined devoicing of final /z/ in Albuquerque. 
McDowell and McRae (1972) compared two regional variables in the speech of Mexican Americans, Anglos, and African Americans and across social classes and speaking styles in Austin, Texas. The three ethnic groups patterned differently for the two variables combined. Moreover, speaking style affected the variables strongly, while social class affected Anglo speech most consistently and Mexican American speech least consistently. Hartford (1975) examined fourteen variables, such as CCR and realization of interdental fricatives as stops, in the MAE of Gary, Indiana, and tested for correlations with social factors and speaking style. She found that speakers with higher occupational aspirations tended to use more Standard English features than those with lower occupational aspirations. The latter group gravitated both toward features arising from Spanish interference and occasionally toward features such as non-rhoticity borrowed from African American English. In general, these studies found considerable variation


among Mexican Americans, with some moving toward more Anglo-like speech and others showing English speech more heavily influenced by Spanish.

By this time, a wider circle of researchers was finally taking note of MAE as an emerging dialect. Ornstein (1975) represented an early attempt to create a general description of MAE, listing a variety of features associated with it. Other descriptive overviews followed during the 1980s (García, 1984; González, 1988; Peñalosa, 1980; Penfield & Ornstein-Galicia, 1985; Wald, 1984), sometimes with other agendas such as applications to teaching of Mexican American students or demonstrating that MAE was more than a transitional phenomenon. In subsequent years, other overviews of MAE have benefitted from the continual expansion of research on the variety (Bayley & Santa Ana, 2008; Fought, 2003, 2006; Santa Ana & Bayley, 2008). Nevertheless, the 1980s also produced a number of sociolinguistic investigations of MAE (Galindo, 1987, 1988; Melendez, 1982; Merrill, 1987; Wald, 1981). These studies focused on particular variables associated with MAE, especially those typical of speakers whose L1 is Spanish, such as devoicing of final /z/ and the tendency to realize /ʃ/, as in she, as [ʧ], and /ʧ/, as in much, as [ʃ]. Whether these substitutions reflect phonemic categories is not clear. Some outlined differences between speakers who learned Spanish before English and those who learned English from early childhood. The latter consistently exhibited lower incidences of variants traditionally associated with MAE. Although this finding led Merrill (1987) to argue that MAE was merely a transitional speech form, most other MAE researchers were adopting the notion that MAE is an established dialect.

Another research thrust in Mexican American studies has been consonant cluster reduction, or CCR.
CCR, noted in Bayley (1996), is the tendency of syllable-final consonant clusters in which the last member is a stop, as in bust, to lose the stop. Numerous factors affect loss, of which two of the most important are the nature of the next segment and whether the stop forms a separate morpheme, as in bussed. A following vowel, as in bust up, will favor retention of the stop, while a following consonant, as in bust my, will favor deletion of the stop. The stop is also more likely to be retained if it constitutes a discrete morpheme than when there is no morpheme boundary within the cluster. In addition, CCR is more favored in unstressed syllables, as in forest, than in stressed syllables. With regard to social constraints, CCR is generally more common in casual speech and among lower-status speakers and males than in formal speech or among higher-status speakers or females. Hartford (1975) and Galindo (1987) had studied CCR in MAE, finding many of the usual patterns that occur in other dialects. Santa Ana (1992, 1996) and Bayley (1994) analyzed CCR among Mexican Americans in Los Angeles and San Antonio, respectively. Both found that stress could fail to affect CCR, though it depended on the age group in Bayley (1994). Santa Ana’s (1992, 1996) subjects




showed a highly unusual tendency to exhibit more CCR in past participles than in past tense forms, but Bayley’s subjects showed no such difference. The study of MAE has also expanded into acoustic studies, primarily of vowel variation. Much of this work, especially the earlier efforts, has been descriptive. Godinez (1984) and Godinez and Maddieson (1985) began the inquiry by outlining differences between MAE and Anglo English in California. For example, Mexican Americans, both monolingual in English and bilingual, showed higher forms of the bat vowel and less fronting of the boot vowel than Anglos. Veatch (1991) followed with an analysis of a Los Angeles Mexican American’s vowels. Thomas (1993, 2001) examined Texas MAE, finding that many Mexican Americans lacked the raising of the bat vowel before nasals, as in hand, that typifies Anglo dialects. All of those studies represent parts of California or Texas with majority Mexican American populations. In contrast, several recent acoustic studies have examined MAE in the Great Lakes area, where Mexican Americans are clearly a minority and a vowel shift known as the Northern Cities Shift predominates in Anglo speech. Studies in this region all show Mexican Americans accommodating to local Anglo speech to a greater or lesser degree (Konopka & Pierrehumbert 2008; Roeder 2010; Ocumpaugh 2010). Clearly, the relative size of the immigrant group’s population makes a difference. Even so, in North Carolina, where Mexican Americans are also a minority, Wolfram, Carter, and Moriello (2004) found that Mexican Americans were not acquiring glide weakening of the bide vowel, which occurs in the matrix Anglo and African American dialects where the interviews were conducted. A departure from the vowel studies is Van Hofwegen (2009), who used acoustic analysis to show that /l/ was “lighter” (less velar) among Mexican Americans than among Anglos in a Texas community. 
This variable had been examined by Galindo (1987, 1988), but only auditorily. Relatively recently, new social network techniques taken from sociology have been applied in sociolinguistic studies of MAE. Mendoza-Denton (2008) was discussed above. Another such study is Fought (1999), who compared the speech of Mexican American students with and without gang affiliations in a California school. Those without gang affiliations tended to show fronter realizations of the boot vowel than those who were connected with gangs, and variation seemed to be used to signal those identities. Fought (2003) expanded the analysis to other variables, which revealed other social configurations for the additional variables.

6.7 A Case Study of Language Contact: North Town

Sociolinguistic studies of language contact situations, especially the many on MAE, have provided clues as to how speakers construct new lects in diverse


language contact situations. A weakness of prior studies, however, is that they posit (at times elaborate) social structures on the basis of few linguistic variables. While the correlations that these studies have found are undoubtedly accurate, they may well be missing other correlations that affect different linguistic variables. Only a study that includes a large number of variables, some salient and others subtle, can determine whether the more obvious linguistic variables represent the social structure fairly. Here, a case study of a community in Texas, utilizing many variables, is undertaken. Contact between Tejanos and Anglos spans several generations in this community. An analysis of older speakers illustrates how thoroughly long-term social tensions can influence linguistic systems.

“North Town,” a pseudonym created by Foley (1988), lies between San Antonio and the Rio Grande valley. This region was settled later than either the “Valley,” as the lower Rio Grande area is known, or San Antonio (De León, 1982). The town itself was founded in 1882 (Tillotson, 1971). Mexicans began moving into North County and living under the patrón system as soon as the area was settled, and by 1920 the county had a majority Mexican American population (Foley, 1988: 289). By that time, many of them were moving into North Town itself. However, they did not move into the original part of town, but into “Mexican Town,” on flood-prone ground across the railroad tracks. The railroad continues to divide the community between more and less affluent halves to this day. Anglos controlled the municipal government until Mexican Americans began to run for public offices in the 1960s and 1970s. Before that time, lack of education had hindered Mexican Americans. Few attended high school until the 1930s and 1940s.
By the 1950s and 1960s, when they began to push for public improvements such as street paving in their neighborhoods, some were beginning to acquire the advanced degrees that allowed them to take on leadership roles. The town was segregated not only in residence but also in the schools. Segregation of elementary schools did not cease until 1969 (Foley, 1988). During the 1970s, a movement among Mexican Americans called La Raza Unida emerged, aiming to win control of town government. Anglos mounted vigorous opposition and defeated it, but by the 1980s Mexican Americans were quickly displacing Anglos and in the 1990s a Mexican American was elected mayor. Mexican Americans have even integrated the old section of town. Local Anglos have also been losing control of the ranch and farm lands, not to Mexican Americans but to wealthy outsiders who have been buying the land for private deer hunting preserves. Anglos have come to accept their minority status, though they still own many of the important businesses in town. The population of North Town, numbering a little over 9000, is now about 85 % Hispanic (2010 Census), and the proportion is increasing.




In spite of the current calm within North Town, most older Mexican Americans remember the prolonged struggle they endured on several fronts – services, residence, education, and local governance. One would not expect those generations to identify with Anglos, and it was these generations who established the Mexican American speech patterns. Sociolinguistic investigation of North Town showed how the dialectal development played out.

Interviews were conducted in North Town in 2005 and 2007. The interviews were entirely or primarily conversational, and subjects were interviewed in both English and Spanish whenever possible. Although all generations were interviewed, the analysis here focuses on the oldest generation of Mexican Americans, the first one that systematically learned English in North Town, as well as North Town Anglos of the same age range. These subjects, including ten Mexican Americans (two male and eight female) and eight Anglos (six male and two female), were born during the period from 1918 to 1938. All grew up in North Town or surrounding parts of North County, though two Mexican Americans spent part of their childhood in the Texas Panhandle.

Interviews have been analyzed for a large number of variables. Twenty-four were analyzed as continuous variables and are included here. The large number of variables ensures that no single variable can skew the overall patterning found across the community. The variables comprise vocalic and consonantal variables and one prosodic variable, prosodic rhythm (i.e., the degree of stress-timing or syllable-timing, measured with the nPVI method of Low, Grabe & Nolan, 2000). Each of these variables exhibits a variant that is known or suspected to characterize either MAE (see the preceding section) or the rural Southern dialect spoken by the Anglos.
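As a rough illustration of the rhythm measure mentioned above, the normalized Pairwise Variability Index averages the duration difference between each pair of successive intervals, scaled by the mean duration of that pair, and multiplies the result by 100. The sketch below assumes the standard published formula; the function name `npvi` and the duration values are hypothetical and are not taken from the North Town data, and which intervals the study actually measured is not stated in this section. Higher values indicate more alternation between long and short intervals (more stress-timed), lower values more even durations (more syllable-timed).

```python
def npvi(durations):
    """Normalized Pairwise Variability Index for successive interval durations.

    nPVI = 100 * mean( |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2) )
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # One term per pair of neighbouring intervals.
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical interval durations in milliseconds (illustrative only).
even_durations = [100, 100, 100, 100]        # no variation between neighbours
alternating_durations = [60, 140, 60, 140]   # strong long/short alternation

print(npvi(even_durations))
print(npvi(alternating_durations))
```

Because each difference is divided by the local mean duration, the index is largely insensitive to overall speaking rate, which is why it is suited to comparing speakers.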
All variables were measured using acoustic techniques, as outlined in Thomas (2011), or, in the case of certain consonantal variables, acoustic analysis and auditory coding in conjunction. Vowel measurements were subjected to normalization using the technique of Lobanov (1971), and means of the normalized values for each vowel were used in the statistical analyses.

Each of the 24 linguistic variables was analyzed using linear regression. Ethnicity, sex, highest level of education attained, and year of birth were treated as the independent variables. Analysis progressed in two steps. First, a regression analysis was run for each dependent (linguistic) variable with all four independent variables. Second, further regression analyses were run by eliminating independent variables until the model maximizing the R² value was attained. Models that maximize R² are generally more informative than the full models because more of the variance is explained by fewer variables. The results are shown in Table 1. (Numbers from the R²-maximizing models are used in all cases unless only the full-model numbers were available; the latter are marked in italics in Table 1.)
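The normalization and two-step regression procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run for the chapter: the text does not say which software was used, and because plain R² can never rise when a predictor is removed, the sketch assumes that adjusted R² is the quantity being maximized. All function names, the 0/1 coding of ethnicity and sex, and the data values are hypothetical.

```python
from statistics import mean, stdev


def lobanov(formant_values):
    """Z-score one speaker's values for one formant: (F - mean) / sd."""
    m = mean(formant_values)
    s = stdev(formant_values)  # sample sd is an assumption; must be non-zero
    return [(f - m) / s for f in formant_values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def adjusted_r2(rows, y, cols):
    """Adjusted R² of a least-squares fit of y on an intercept plus `cols`."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(X[0])  # number of coefficients, intercept included
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (len(y) - 1) / (len(y) - k)


def backward_eliminate(rows, y, names):
    """Start from the full model; drop predictors while adjusted R² rises."""
    cols = list(range(len(names)))
    best = adjusted_r2(rows, y, cols)
    while len(cols) > 1:
        score, drop = max(
            (adjusted_r2(rows, y, [c for c in cols if c != d]), d) for d in cols
        )
        if score <= best:
            break
        best, cols = score, [c for c in cols if c != drop]
    return [names[c] for c in cols], best


# Hypothetical speakers: ethnicity (0/1), sex (0/1), years of education,
# year of birth minus 1900. The values are invented for illustration.
names = ["ethnicity", "sex", "education", "birth_year"]
rows = [
    [0, 0, 8, 18], [0, 1, 12, 22], [0, 0, 10, 25], [0, 1, 16, 30],
    [1, 0, 12, 20], [1, 1, 8, 27], [1, 0, 16, 33], [1, 1, 10, 38],
]
# Hypothetical per-speaker mean of a normalized vowel measurement.
y = [0.10, -0.10, 0.05, -0.05, 1.90, 2.10, 1.95, 2.05]

print(lobanov([400, 500, 600]))
kept, best = backward_eliminate(rows, y, names)
print(kept, round(best, 3))
```

Dropping a predictor raises adjusted R² only when that predictor contributes less than the penalty for an extra parameter, so the reduced model keeps the variables that carry real explanatory weight. Greedy one-at-a-time elimination is a design choice of this sketch; an exhaustive search over all subsets of four predictors would also be feasible.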


Table 1: Linear regression results for continuous variables. Maximal R² model numbers are used for all cells unless there were results only for the full model, which are listed in italics. For significance levels, × denotes p