
The Sonority Controversy

Phonology and Phonetics 18

Editor

Aditi Lahiri

De Gruyter Mouton

The Sonority Controversy

Edited by

Steve Parker

De Gruyter Mouton

ISBN 978-3-11-026151-6
e-ISBN 978-3-11-026152-3
ISSN 1861-4191

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: PTP-Berlin Protago TEX-Production GmbH, Berlin
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com

This volume is dedicated to the memory of Nick Clements, who taught all of us so much about sonority.

Contents

List of contributors

ix

Introduction Steve Parker

xi

Part 1: Sonority and Phonotactics

Sonority and sonority-based relationships within American English monosyllabic words Karen Baertsch

3

The role of sonority in the phonology of Latin András Cser

39

Is the Sonority Sequencing Principle an epiphenomenon? Eric Henke, Ellen M. Kaisse and Richard Wright

65

Sonority distance vs. sonority dispersion – a typological survey Steve Parker

101

Sonority variation in Stochastic Optimality Theory: Implications for markedness hierarchies Jennifer L. Smith and Elliott Moreton

167

Sonority intuitions are provided by the lexicon Ruben van de Vijver and Dinah Baer-Henney

195

Part 2: Sonority and Phonetics

Sonority and central vowels: A cross-linguistic phonetic study Matthew Gordon, Edita Ghushchyan, Bradley McDonnell, Daisy Rosenblum and Patricia A. Shaw

219

Sonority and the larynx Brett Miller

257


Articulatory bases of sonority in English liquids Michael Proctor and Rachel Walker

289

Part 3: Sonority and Language Acquisition

The Sonority Dispersion Principle in the acquisition of Hebrew word final codas Outi Bat-El

319

Part 4: Sonority and Sign Language

Acceleration peaks and sonority in Finnish Sign Language syllables Tommi Jantunen

347

Part 5: Sonority and Computational Modeling

Sonority and syllabification in a connectionist network: An analysis of BrbrNet Paul Tupper and Michael Fry

385

References

411

Author index

473

Index of languages, dialects, and linguistic families

481

Subject index

485

List of contributors

Dinah Baer-Henney, Universität Potsdam, [email protected]
Karen Baertsch, Southern Illinois University, [email protected]
Outi Bat-El, Tel-Aviv University, [email protected]
András Cser, Pázmány Péter Catholic University, [email protected]
Michael Fry, Simon Fraser University, [email protected]
Matthew Gordon, University of California Santa Barbara, [email protected]
Eric Henke, University of Washington, [email protected]
Tommi Jantunen, University of Jyväskylä, tommi.j.jantunen@jyu.fi
Ellen M. Kaisse, University of Washington, [email protected]
Bradley McDonnell, University of California Santa Barbara, [email protected]
Brett Miller, University of Cambridge, [email protected]
Elliott Moreton, University of North Carolina, [email protected]
Steve Parker, Graduate Institute of Applied Linguistics and SIL International, [email protected], [email protected]
Michael Proctor, University of Southern California, [email protected]
Daisy Rosenblum, University of California Santa Barbara, [email protected]
Patricia A. Shaw, University of British Columbia, [email protected]
Jennifer L. Smith, University of North Carolina, [email protected]
Paul Tupper, Simon Fraser University, [email protected]
Ruben van de Vijver, Universität Potsdam, [email protected]
Rachel Walker, University of Southern California, [email protected]
Richard Wright, University of Washington, [email protected]

Introduction

This is a book about sonority. As used in linguistics today, sonority can be defined as "a unique type of relative, n-ary (non-binary) feature-like phonological element that potentially categorizes all speech sounds into a hierarchical scale" (Parker 2011: 1160, this volume). However, discussions of sonority in the phonological literature have been quite contentious. On the one hand, it has often been appealed to by researchers as an explanatory principle underlying various cross-linguistic phonotactic generalizations, especially in the domain of the syllable. Thus, in the last two decades sonority has continued to be frequently invoked within the framework of Optimality Theory (OT: Prince and Smolensky 1993/2004), the predominant model of phonology in the generative tradition. On the other hand, many phonologists and phoneticians have expressed concerns about the adequacy of formal accounts based on sonority, including even doubts about the very existence of sonority itself (see, for example, Ohala 1990).

More recently, a polemic has reemerged with respect to the role of sonority in Universal Grammar. Specifically, Berent et al. (2007, 2009) argue that native speakers have an innate awareness of phonotactic tendencies rooted in sonority. Nevertheless, works such as Daland et al. (2011) counter that these effects can be explained by statistical patterns extrapolated from language-specific lexicons, without requiring a linguistic predisposition in the form of built-in constraints. Consequently, the field of linguistics would benefit from a comprehensive review of sonority and a fresh look at all aspects of its behavior, from the perspectives of both phonetics and phonology.

Given the wide variety of subtopics in this collection, there is something to appeal to everyone – the list of contributions encompasses areas such as Optimality Theory, language acquisition, computational modeling, acoustic phonetics, typology, syllable structure, speech perception, markedness, connectionism, psycholinguistics, and even MRI technology. What ties all of these issues together is a general emphasis on sonority as a common background phenomenon, either from a positive or a negative point of view. Accordingly, a continuum of opinions about sonority is represented, ranging from complete acceptance and enthusiasm, to neutrality, to moderate skepticism.

Several previous thematic collections have also included phonological analyses involving sonority, strength, and related factors (e.g., Carvalho, Scheer and Ségéral 2008; Nasukawa and Backley 2009). To date, however, the topic of sonority has never been the sole focus of an entire book, either by a single author or a series of contributing authors. Therefore, a complete volume exploring different viewpoints about sonority is a unique addition to the field of phonology. The list of contributors includes many well-known and respected linguists who regularly publish their research in leading academic journals. Furthermore, this collection of chapters is not just a rehash of previous work; rather, each paper contains new, cutting-edge results based on the latest trends in the field. Hence, no other extant piece of literature matches this volume in terms of its coverage of issues and breadth of subthemes, all focused together on phenomena related to sonority.

With respect to the intended audience, the papers in this collection assume a general working knowledge of phonetics and phonology. Consequently, they are not geared towards beginners. Rather, the volume is designed for serious researchers who need to stay abreast of the fields of descriptive and theoretical linguistics. It should therefore appeal to graduate students and professors in departments allied with all aspects of the scientific study of language, including psychology, education, computer science, philosophy, etc.

The twelve chapters in this collection are divided into five main parts. Within each of these five sections the individual papers are presented in alphabetical order, according to the surname of the (first) author. As a preview of the entire book I briefly summarize below the contribution that each of these twelve studies makes to the overall theme.

Part 1 comprises six chapters focusing on phonotactic patterns in one or more specific languages:

Karen Baertsch analyzes the segmental combinations occurring within different subparts of the syllable in American English. She accounts for the sonority relationships among these by positing a series of locally-conjoined optimality theoretic constraints specific to the various margin and nuclear positions occupied by the respective phonemes.

András Cser examines the role of the Sonority Sequencing Principle and the Syllable Contact Law in Latin consonant clusters. The presentation is rich in data, and along the way he proposes three different versions of a general Place Condition that capture the privileged behavior of coronals.

Eric Henke, Ellen Kaisse, and Richard Wright present case studies of consonant clusters in Korean and Greek in order to show that traditional sonority-based accounts are inadequate. They argue instead that sonority sequencing tendencies are best explained by a perceptual model that is sensitive to the relative robustness of acoustic cues in specific phonological environments.

Steve Parker reports the results of a typological survey of 122 languages containing syllable-initial consonant clusters in order to shed light on which kind of complex onset is universally preferred: consonant plus glide, or consonant plus liquid? In the ensuing discussion he identifies two subpatterns: a series of conceptually-related languages in which C2 must always be a glide, and a competing set of languages in which onset clusters must always end with a liquid. These findings are compared and contrasted with the traditional notions of minimum sonority distance and sonority dispersion.

Jennifer Smith and Elliott Moreton consider the question of which type of optimality theoretic constraint family better characterizes markedness hierarchies such as the sonority scale: a scale-partition model with universally fixed rankings, or a stringency (subset) approach allowing permutable rankings? In the accompanying analysis they invoke a stochastic version of OT in order to tease apart the divergent predictions of these two types of constraint families in accounting for intra-speaker variation.

Ruben van de Vijver and Dinah Baer-Henney investigate the nature of onset clusters in German by means of three experiments: a psycholinguistic probe of nonce forms, a statistical analysis of corpus frequency counts, and a computational learning simulation. They conclude that knowledge of sonority is not necessarily innate; rather, German speakers can extrapolate the relevant generalizations from patterns independently found in their lexicon. This outcome thus concurs with that of Daland et al. (2011) mentioned above.

Part 2 of the volume consists of three chapters examining the physical bases of sonority by means of phonetic studies:

Matthew Gordon and four co-authors present the findings of a major examination of four acoustic correlates of sonority among vowels, particularly schwa, in five languages: duration, peak intensity, acoustic energy, and perceptual energy. Their data reveal that, regardless of its language-specific phonological behavior, schwa does generally yield values different from most other vowels in these languages along one or more of these parameters. However, they conclude that vowel sonority may actually derive from a combination of several phonetic dimensions, not just a single measure by itself.

Brett Miller explores two scalar production mechanisms underlying the acoustic and perceptual cues that promote typical segment sequencing classes: sound source and vocal tract aperture. Based on the phonological patterning of implosive and glottal consonants in particular, he concludes that when these two scales might assign contradictory sonority rankings to pairs of similar sound types, the segments in question remain mutually unranked instead.

Michael Proctor and Rachel Walker investigate the articulatory production of American English liquid consonants (/l/ and /ô/) in different syllabic contexts by means of real-time structural MRI. Given the overall greater stability of /ô/ than /l/ in terms of their lingual configurations across environments, they conclude that phonotactic constraints can be impacted by a range of articulatory factors, not just degree of stricture alone, as physical correlates of sonority.

The third section of the volume is assigned to one paper dealing with language acquisition:

As Outi Bat-El notes, Clements' (1990) Sonority Dispersion Principle predicts that sonorant codas should be produced by language learners earlier than obstruent codas, since the latter are more marked in this position universally. Nevertheless, this tendency is contradicted by data from three Hebrew-speaking children who acquired obstruent codas first. She suggests that these learners were influenced by the higher language-specific frequency of word-final obstruents, and attributes this effect to the combined complexity of prosodic markedness (codas) along with segmental markedness (sonorants).

Part 4 consists of one paper analyzing a particular signed language:

Tommi Jantunen carries out an empirical test of the Acceleration Peak Hypothesis, by which nuclear sonority is correlated with rapid movement of the index finger. Using data from Finnish Sign Language, he shows that each syllable can in fact exhibit from zero to three acceleration peaks and, furthermore, that these peaks can occur outside of canonical syllables. Consequently, he concludes, the hypothesis that sonority corresponds directly to acceleration is not substantiated by this study.

The last chapter in the volume deals with computational modeling:

Paul Tupper and Michael Fry examine the operation of BrbrNet, a connectionist network that syllabifies words in Imdlawn Tashlhiyt Berber using just two constraints: Onset and the Sonority Sequencing Principle. They show that the original version of BrbrNet proposed by Legendre, Sorace, and Smolensky (2006) generates the wrong parsing in certain words containing five or more segments. Therefore, they propose a different set of Harmonic Grammar parameters by which BrbrNet produces the correct results in all cases, plus a formal proof of this claim.

Some of the twelve chapters summarized above can be grouped in alternative ways based on certain other shared subthemes. For example, one of the classic motivations for the traditional view of sonority is to explain the nature of phonotactic restrictions on clusters of consonants in syllable-initial and/or word-initial position (Steriade 1982; Selkirk 1982, 1984). This is a major focus of five of the papers in this collection: those by Baertsch; Henke, Kaisse, and Wright; Parker; Smith and Moreton; and van de Vijver and Baer-Henney. Of these, the contributions by Baertsch, Parker, and Smith and Moreton confirm a need for minimum sonority distance constraints of one type or another. In contrast to this, the chapter by Henke, Kaisse, and Wright eschews sonority in favor of constraints encoding the inherent perceptibility of certain cross-linguistically favored sequences, such as initial s+stop in languages like English. This line of attack on an old problem motivates an alternative to potentially awkward mechanisms such as the Sonority Sequencing Principle combined with extraprosodicity. Their approach builds on an intriguing proposal by Steriade (2009) called the P-map. This debate between sonority (an articulatory and acoustic notion) and perception is likely to keep growing in importance in future work on segmental phonotactics.

Another important research agenda in the past two decades has been constraint-based models such as Optimality Theory. Four of the chapters in this volume are cast in this light: those by Baertsch, Smith and Moreton, Bat-El, and Tupper and Fry. Of these, Baertsch and Bat-El utilize the classical model of Prince and Smolensky (1993/2004). The other two papers invoke more recent refinements: Smith and Moreton employ the probabilistic constraints of Stochastic OT (Boersma 1998a, 1998b; Boersma and Hayes 2001), while Tupper and Fry delve into Harmonic Grammar, a hot topic in the current literature (e.g., Pater 2009). Harmonic Grammar employs weighted constraints to account for frequency effects in language-specific lexicons, among other things. Consequently, analyses that describe such phenomena in detail often require computational modeling and simulations in order to exhaustively calculate all of the statistical possibilities. This is the main thrust of one chapter, that of Tupper and Fry. Two other contributions also deal with this issue (mathematical confirmation of theoretical proposals) to a lesser degree: those of Smith and Moreton, and of van de Vijver and Baer-Henney. A hallmark of recent argumentation about linguistic theory is that proposals for revisions to our formal models have become increasingly technical. Thus, it has become standard practice, and seemingly even a requirement, to explore the ramifications of one's claims via in-depth, rigorous proofs of this type.

Finally, given the overlap of subthemes in this collection, a felicitous side effect is that some phenomena are examined from more than one perspective. This is nicely illustrated, for example, by empirical data involving liquid consonants in rhyme position in American English. Specifically, the phonetic study of Proctor and Walker employs instrumental (MRI) technology to plot the gradient behavior of lingual trajectories in actual pronunciations by native speakers. Complementing this, Baertsch provides a more categorical look at how to capture the corresponding phonological patterns in terms of OT constraints. Together these two chapters provide a fresh reexamination and confirmation of the claim that English r is higher in sonority than laterals such as /l/ (Parker 2002, 2008, 2011). This outcome, in turn, casts doubt on the common practice of lumping all liquids together into one monolithic class vis-à-vis the universal sonority hierarchy (Clements 1990).
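Since Harmonic Grammar figures in several of the chapters just described, a toy example may help readers unfamiliar with weighted constraints. The following Python sketch is ours, not drawn from any chapter in the volume; the constraint names, weights, and violation counts are invented solely to illustrate the arithmetic.

WEIGHTS = {"Onset": 2.0, "NoCoda": 1.0, "Max": 3.0}  # hypothetical weights

def harmony(violations, weights=WEIGHTS):
    # Harmony is the negated weighted sum of constraint violations;
    # the candidate with the highest harmony wins.
    return -sum(weights[c] * n for c, n in violations.items())

candidates = {
    "pat.": {"Onset": 0, "NoCoda": 1, "Max": 0},  # keep the final consonant
    "pa.":  {"Onset": 0, "NoCoda": 0, "Max": 1},  # delete it
}
winner = max(candidates, key=lambda name: harmony(candidates[name]))
print(winner)  # 'pat.': one NoCoda violation (cost 1.0) beats one Max (cost 3.0)

Unlike strict domination in classical OT, violations of lower-weighted constraints can accumulate and jointly overturn a higher-weighted constraint, which is precisely what makes the weighted model attractive for capturing lexical frequency effects.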


Acknowledgments

I am indebted to Katja Lehming of de Gruyter for promptly answering numerous questions concerning the logistical details of this volume. I am also grateful to Aditi Lahiri, the general editor of the series in which this book appears. She provided crucial direction and guidance at several stages of the process. Finally, I wish to thank the following colleagues (in alphabetical order) for their input and suggestions on various portions of this volume: John Alderete, Michael Boutin, Rod Casali, Stuart Davis, Paul de Lacy, Colleen Fitzgerald, Maria Gouskova, Min-Joo Kim, Jaye Padgett, Steve Parkhurst, Mary Pearce, Renate Raffelsiefen, and Lorna Rozelle.

Dallas, Texas, USA, June 2012

Steve Parker
Graduate Institute of Applied Linguistics and SIL International

Part 1: Sonority and Phonotactics

Sonority and sonority-based relationships within American English monosyllabic words

Karen Baertsch

Abstract. The sonority scale and minimum sonority distance requirements have long been invoked to describe the restrictions on onset and coda clusters in English syllables, as well as for syllables in other languages. This chapter looks more deeply into the sonority relationships within syllable constituents in American English monosyllables, identifying additional segmental combinations within onset, coda, and rhyme that fail to surface despite meeting the appropriate minimum sonority distance requirement. In these cases, the sequences include ‘worst of the best’ segments that are acceptable but not preferable as singletons. For example, onset clusters may not begin with non-obstruent consonants but sonorants are quite acceptable as singleton onsets. Coda clusters may not consist entirely of non-consonantal segments but non-consonantal glides (vowel offglides) are acceptable coda singletons. Consonantal peaks may not be followed by coda clusters, but syllabic consonants are acceptable singleton peaks.

1. Introduction and background

The concept of sonority has a long history in the phonological literature, going back to the late 19th century (cf. Sievers 1876/1893; Jespersen 1904). Its effects have been discussed at length and in relation to many languages, but a more precise definition of the phonetic correlates of sonority and of the sonority rankings of individual segments has been more elusive. Much of the research into the effects of sonority has framed it in terms of distinctive features (e.g. Steriade (1982), Clements (1988, 1990), Giegerich (1992: 152), Blevins (1995: 211)). More recently, Parker (2002, 2008) has used segmental intensity levels to measure the sonority levels of segments in several languages, resulting in a fairly detailed relative sonority hierarchy backed up by phonetic evidence. Traditionally sonority has been invoked, quite successfully, in order to account for a number of phenomena like the general organization of the syllable (sonority sequencing) and for restrictions on consonant clusters in onset and coda positions (minimum sonority distance). While such analyses have been very helpful in advancing our understanding of how sonority is involved in syllabification, they have also been plagued by persistent exceptions to the sonority-driven general principles of syllabification.

The position we take in this paper is that sonority relationships do exist within the syllable and that the general principles, like the sonority sequencing generalization and the minimum sonority distance constraints, are valid, but that the sonority relationships within the syllable are more complicated than a simple rise or fall in sonority. Our goal in this analysis is to identify some of the sonority-based restrictions within the American English syllable that are not based simply on a sonority rise or fall that meets a categorical threshold. We look here, for example, at segments that are dispreferred but acceptable in a singleton onset or coda and at their failure to combine successfully with other dispreferred but generally acceptable segments even when that combination would meet a stated sonority distance threshold.

The paper is couched in an optimality theoretic framework, in which sonority thresholds are established for singleton segments in each syllable position (onset, peak, and coda) in American English. Those thresholds are then combined in order to examine the more complex behavior evident when onsets and codas branch into clusters, followed by the combination of peak and coda into complex rhymes. At each stage, we identify the sonority-driven interactions among the segments involved.

2. Analysis

The goal of the present analysis is to identify several sonority-driven effects that fall out of the syllabification of monosyllables in American English. The syllable has often been described in sonority-based terms, and a number of principles of syllabification have become commonplace in the literature. The most relevant of these for the present analysis include the sonority sequencing principle (Selkirk 1982; Blevins 1995, among others), which states that the sonority of segments within a syllable must increase to the peak and then decrease to the right edge of the syllable. Stemming from sonority sequencing is the concept of a minimum sonority distance (Steriade 1982; Selkirk 1982, and others), which requires not only an increase in sonority as segments near the peak, but also that the increase meet a language-specific threshold in order for a consonant cluster to be permissible. A minimum sonority distance analysis of onset clusters would identify an obstruent-glide sequence as the best (steepest sonority rise) onset cluster (as in English twin) and the mirror-image glide-obstruent sequence (as in English like) as the best (steepest sonority fall) coda cluster, given a vowel > glide > liquid > nasal > obstruent sonority scale. The sonority dispersion principle (Clements 1988, 1990) also has its roots in sonority sequencing. Here, the syllable is split into two constituents (demisyllables) sharing the peak. Under the sonority dispersion principle with a vowel > glide > liquid > nasal > obstruent sonority scale, an initial (onset-peak) demisyllable beginning with an onset cluster is most harmonic when the sonority distance between the segments is uniform and the sonority slope throughout is steep. Thus, the best initial demisyllable with three members consists of an obstruent followed by a liquid followed by a vowel (as in English play). The final (peak-coda) demisyllable is most harmonic when the sonority distance between segments is uniform and the sonority slope throughout is shallow, making the best final three-member demisyllable a sequence consisting of a vowel followed by a glide followed by a liquid (as in English fire).

Within Optimality Theory, the minimum sonority distance concept has probably been the most influential, as a number of analyses have proposed constraints along the lines of "Minimum Distance 1 ≫ Minimum Distance 2 ≫ Minimum Distance 3 …". For example, Gouskova (2004) combines an onset sonority constraint hierarchy (based on Gnanadesikan 2004) with a coda (mora) constraint hierarchy (based on Morén 1999), creating a set of sonority distance constraints with each constraint militating against a stratum with a sonority distance of X. The drawback to Gouskova's approach (and other, similar, approaches), however, is exactly what she considers one of its strengths: it groups constraints into strata that must treat all sequences of a given sonority distance equally. It is a drawback inherited from its predecessor, minimum sonority distance thresholds.

A minimum sonority distance threshold is entirely dependent on the sonority scale on which it is based. If we adopt a minimal sonority scale 'vowel > glide > liquid > nasal > obstruent' such as the one adopted by Clements (1988), the resulting minimum sonority distance is two for English onsets in order to account for the presence of obstruent-liquid onset clusters (as in play, tree, clear, etc.). We then must adopt some other mechanism (usually an OCP constraint) to exclude nasal-glide onset clusters (*/nwæk/, */mjejt/) that would also meet the minimum sonority distance of two. And with this minimal scale, we run into difficulty in describing the differing behavior of /ô/ and /l/ in English peaks. English /ô/ often fills peak position (bird, word, pearl, turn, etc.), but /l/ can become syllabic only in unstressed syllables (bottle, topple, but *[bl̩k], *[sl̩n]). By recognizing a distinction between the liquids, with /l/ as less sonorous than /ô/ (Wiese 2001; Murray and Vennemann 1983; Giegerich 1992, among others), we still have the same minimum onset threshold of two for English, but now must add a restriction against nasal plus /ô/ onset clusters and /l/ plus glide onset clusters, in addition to the nasal plus glide clusters mentioned above. If we were to adopt a more elaborated sonority scale such as the one Steriade (1982: 98) proposes for Greek, which adds a distinction between stops and fricatives and between voiced and voiceless in both the stop and fricative categories, our minimum sonority distance increases to three (the distance between voiceless fricatives and /l/), but now we are faced with even more exceptions, including a ban on voiceless obstruent plus voiced fricative clusters that meet the minimum sonority distance of three but cannot surface in English. If we were to consider a much more elaborated scale such as the one put forth in Parker (2008: 60), or the scale implicit in Prince and Smolensky's (1993/2004) Peak and Margin Hierarchies, the concept of a simple minimum sonority distance threshold becomes untenable. The more sonority distinctions we identify and the more detailed our scale becomes, the more exceptions to a minimum sonority distance threshold we must account for.

However, we need not throw out entirely the concept of sonority-driven principles in syllabification. We adopt the sonority scale in (1) for our analysis, with some discussion of the sonority distinctions made here as the issues arise in the analysis.1

1. The scale in (1) collapses some of the sonority distinctions at the low- and high-sonority ends of the scale in favor of more detail in the mid ranges. It is the detail in the middle-sonority segments of the scale that is most important for the analysis provided in this paper. Treating groups of segments that act as a class as a single class in the scale is consistent with constraint encapsulation (Prince and Smolensky 1993/2004), discussed below.

(1) Sonority scale: [-hi] > /i/ > /u/ > /ô/ > /l/ > nasals > obstruents
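To make the threshold logic concrete, here is a minimal Python sketch of a classic minimum-sonority-distance check over the classes of the scale in (1) (using the shorthand T = obstruent, N = nasal, L = /l/, R = /ô/, U and I = high back and front vocoids, A = non-high vowels). The numeric ranks are our own illustrative encoding, not part of the chapter's formal apparatus.

SONORITY = {"T": 0, "N": 1, "L": 2, "R": 3, "U": 4, "I": 5, "A": 6}  # scale (1)

def msd_ok(c1, c2, threshold=2):
    # True if the onset cluster c1 + c2 rises in sonority by at least threshold.
    return SONORITY[c2] - SONORITY[c1] >= threshold

print(msd_ok("T", "L"))  # True: obstruent + /l/, as in play
print(msd_ok("T", "R"))  # True: obstruent + /r/, as in tree
print(msd_ok("N", "R"))  # True, yet nasal + rhotic onsets never surface
print(msd_ok("L", "U"))  # True, yet /l/ + glide onsets never surface either

The last two lines show exactly the overgeneration problem discussed above: a bare distance threshold admits clusters that English systematically lacks, which is the gap the conjoined constraints introduced below are designed to close.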

In the approach presented here, we are not limited to a stratal analysis of sonority slope requirements. As a result, the analysis proposed here could incorporate a sonority scale as elaborate as Parker's (2008) scale or as minimal as Clements' (1988) 'obstruent – nasal – liquid – vowel' scale. What is important for the analysis as it is applied to any language is that the major sonority distinctions relevant in that language be identified, and for the current purpose the distinctions made in the sonority scale in (1) are sufficient for American English stressed monosyllables.

The focus here is the core syllable of American English, which can accommodate a two-segment branching onset, a two-segment branching coda, and a single-segment peak (offglides of tense vowels and diphthongs are analyzed here as coda segments rather than as peak segments). The syllable rhyme consists of the peak plus the coda. The reasoning for this structure, in particular the structure of the rhyme, will be discussed in more detail later in the analysis as it becomes relevant. We do not include word-initial /s/ when it clusters with stops or nasals as part of the core syllable, nor do we include word-final coronal obstruents that are often cited as exceptions to the sonority profile of the syllable and exceptions to the permissible number of segments. In both cases, the acceptability of these segments is not determined by the sonority of the segments within the core syllable. They are treated as appendices to the word. A short discussion of initial and final appendices is provided in sections 2.1 and 2.2, respectively.

Differences in the internal structure of the syllable aside, sonority relationships within the syllable are relatively uncontroversial. The peak is the most sonorous segment within the syllable. The preference for single segments preceding the peak is for low sonority. The preference for a branching onset is for increasing sonority. The preference for single segments following the peak is generally for more sonorous segments over less sonorous segments (with much more varied behavior at the ends of words). And when multiple segments follow the peak, the requirement is for decreasing sonority (Prince and Smolensky 1993/2004; Kenstowicz 1994). This is a starting point for our analysis, but it is not the end point, as we will see.

The analysis presented here builds on the split margin approach to the syllable developed in Baertsch (1998, 2002), employing three sonority-based hierarchies of constraints, each militating against segments of a particular sonority 'level' filling a particular slot within a syllable. The constraint hierarchies are universal, as are the rankings of the constraints within the hierarchies. What is language-specific is how the hierarchies are interspersed and how they interact with other constraints. The peak hierarchy in this account is identical to Prince and Smolensky's (1993/2004) Peak Hierarchy and gives preference to high sonority segments in peak position. The margin hierarchies (M1 and M2) employed here differ from the Margin Hierarchy in Prince and Smolensky (1993/2004) in that they govern different syllable positions within the onset and the coda of the syllable. The M1 hierarchy gives preference to low sonority segments in singleton onset position and the same preference to the second segment of a coda cluster (when present). The M2 hierarchy gives preference to high sonority segments and governs singleton coda segments as well as the first segment of a coda cluster and the second segment of an onset cluster (when present). Peak position has prominence over margin position (Prince and Smolensky 1993/2004), and within syllabic constituents the primary (left hand) constituent has prominence over the secondary (right hand) constituent (Baertsch 2002).2 The hierarchies governing each position within the syllable are provided at the terminal nodes of the diagram in (2).

(2) The split margin approach to the English syllable

                         σ
           Onset                  Rhyme
         M1    (M2)         Nuc         (Coda)
                             P         M2    (M1)
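For readers tracking the M1/M2/P labels through the analysis, the positional slots in (2) can be pictured as a simple record structure. The Python sketch below is only a notational convenience; the field names are ours and carry no theoretical weight.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Syllable:
    onset_m1: Optional[str] = None  # singleton onset, or first slot of an onset cluster
    onset_m2: Optional[str] = None  # second slot of a branching onset
    peak: str = ""                  # P: the nucleus
    coda_m2: Optional[str] = None   # singleton coda, or first slot of a coda cluster
    coda_m1: Optional[str] = None   # second slot of a branching coda

# limp [l1 IP m2 p1]: /l/ fills onset M1, /m/ the coda M2 slot, /p/ the coda M1 slot
limp = Syllable(onset_m1="l", peak="I", coda_m2="m", coda_m1="p")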

One way in which the syllable structure given in (2) differs from some other structures for English is in the non-branching nucleus. In the current analysis vowel offglides are considered coda segments.3 This position will be discussed in more detail when we take up the issue of the rhyme in section 2.2. Before taking up the issue of the rhyme, however, we begin our discussion in the next section with the syllable onset and the sonority-based generalizations that can be extracted from the behavior of singleton onsets, followed by onset clusters.

2.1. The onset

Singleton onsets are probably the most discussed of the sonority-based syllabification issues in optimality theoretic analyses. They are treated first in Prince and Smolensky (1993/2004) in their presentation of the Margin Hierarchy. The treatment in the current approach is similar except for the addition of the positional subscript shown in (3) and throughout this chapter. For the purposes of simplicity and space, this paper employs the following cover symbols for each of the sonority levels listed in (1) above: the category of obstruents is indicated by T,4 nasals are indicated by N, /l/ by L, /ô/ by R, high back vowels and glides by U, high front vowels and glides by I, and non-high vowels by A.

2. This is very similar to government and licensing within the syllable in government phonology discussed by Ewen and Botma (2009) and Cyran (2008), but without empty nuclei.
3. The approach outlined here is also not incompatible with the addition of a moraic level of representation. However, moraic theory is primarily a theory of weight rather than sonority. And while weight can – in many languages – be crucial to completing the process of syllabification, one goal of the current analysis is to tease apart the effects of weight and sonority in building syllables. I therefore do not include a moraic level in the current analysis.
4. In English, segmental [h] and [P] pattern as obstruents and are included in the T category here. This is consistent with Parker (2008), who provides additional detail regarding both the phonetic and phonological patterning of the glottal consonants in English and other languages.


Also following Prince and Smolensky (1993/2004, section 8), the assumption here is that a universal fully elaborated sonority scale (similar to the scale discussed in Parker (2008)) is reduced to a more coarse-grained, language-specific scale by encapsulation of constraints – collapsing distinctions between adjacent segments when no competing constraint interferes. The subscript in each case indicates the syllable position – subscript 1 indicates an M1 segment, 2 indicates an M2 segment, and P indicates a peak segment. As discussed in Prince and Smolensky (1993/2004), low sonority segments (obstruents) are the most preferred onset segments, and as the segment becomes more sonorous, the fit as an onset segment becomes less preferred.

(3) M1 constraint hierarchy (with ranking of Faith indicated for American English): *A1 ≫ Faith ≫ *I1 ≫ *U1 ≫ *R1 ≫ *L1 ≫ *N1 ≫ *T1

In American English, Faith5 is dominated only by *A1, indicating that a non-high vowel will never surface in onset position.6 Faith dominates the remainder of the M1 constraint hierarchy, including the constraints that would be violated by an onset glide. While high front vowels are not preferred onset segments, they can surface in onset position (yak, yellow, etc.), as can any of the less sonorous segments governed by the lower ranking constraints in (3). An underlying /ô/, for example, can surface as an onset if it is followed by a more sonorous segment (raw /ôO/ surfaces as [ô1 OP] with [ô] in onset position rather than as [ÄP.OP] where [ô] is parsed in peak position), but if not followed by a more sonorous segment, it would surface as syllabic (burr [b1 ôP]) or as a coda (bear [b1 EP ô2]).7

The constraint ranking in (3) treats the high front vowel/glide as more sonorous than the high back vowel/glide,8 following Kiparsky's (1979) sonority scale, which allows us to capture the asymmetrical behavior of the two high vowels/glides in onset position.9 This position is discussed more thoroughly in Baertsch (2008) and will not be exhaustively covered here, but /u/ makes a better singleton onset in English than /i/ and, when coupled with a second underlying high back vowel (/uu/), will generally surface as an onset glide followed by a lax [U] peak. Compare /uu/ → [wU] in wood, wolf, wool, etc. with the relative rarity of /uu/ → [uw] in ooh, ouzo, where the former set is not uncommon and the latter set is quite rare and populated by expressive and borrowed words. The corresponding high front vowel sequence (/ii/) will generally surface as an onsetless, tense [ij] in English. In this case, compare /ii/ → [jI] in just a few (expressive) words like yin, yip, etc. in contrast to the much more common /ii/ → [ij] in east, eel, easy, etc.10 In neither case is an initial onset glide banned from onset position, but the asymmetry supports a sonority distinction between the two glides in which [w] (surfacing in onset position) is less sonorous than [j] (surfacing in peak position). Of course, an initial sequence /uuu/ would necessarily surface as [wuw], as we see in woo, woozy, and an initial /iii/ sequence must surface as [jij], as in yeast, yield.

The asymmetrical behavior of [w] and [j] is similar in onset clusters as well. The back glide [w] readily surfaces in onset clusters, but the front glide surfaces only in conjunction with peak /u/ – the somewhat controversial ongliding [ju:] diphthong (Davis and Hammond 1995; Baertsch 2008). In this case the back glide patterns with the lower sonority liquids [l] and [ô] while the front glide patterns with the high sonority vowels as part of a syllable peak. Also, Yamamoto (1996) notes that an English intervocalic front glide more readily patterns with the preceding vowel (kay.ak, gi.ant), in contrast to an intervocalic back glide (Ka.wa.sa.ki). The parse with the front glide is also in contrast to the syllabification in the lending language (ka.yak, ka.wa.sa.ki). Intervocalically, the front glide is filling what in this analysis is a (high sonority) coda position (a nuclear position for some) and the back glide is filling an onset position, which prefers lower sonority. In each of these examples – word-initially, in onset clusters, and intervocalically – the front glide is acting as though it is more sonorous than the back glide. The distinction is therefore made in the hierarchies employed here. In a singleton onset, however, both glides are possible onsets in English; therefore both *I1 and *U1 are dominated by Faith.

Branching onsets in this analysis consist of an M1 segment (the slot filled by a singleton onset as well) followed by an M2 segment. The M2 hierarchy in (4) highlights the preference for higher sonority segments as second onset segments (and as singleton coda segments). Because the M2 constraint hierarchy governs both the second slot in an onset cluster and coda segments, the ranking of Faith within this constraint hierarchy must be determined by the least restrictive position, and in the case of American English, the least restrictive position is singleton coda position. Because obstruents can surface as singleton codas in American English, Faith dominates the whole M2 hierarchy, as reflected in (4).

5. I use a generic Faith constraint throughout the analysis to maintain the focus on the sonority phenomena that come out of faithful parses of an underlying sequence.
6. Assuming Richness of the Base (Prince and Smolensky 1993/2004, section 9.3), any input is a possible input for English. The *A1 ≫ Faith ranking crucially prevents a non-high vowel from surfacing as an onset in English.
7. Subscripts in phonetic transcriptions indicate the position of that segment within the syllable.
8. I take the position here that glides carry an identical feature set to their corresponding high vowels underlyingly. It is the process of syllabification that turns the vowel into a glide. A high tense vowel is treated as underlyingly long (/ii/ → [ij]) and a high lax vowel is underlyingly short (/i/ → [I]).
9. Raffelsiefen (2011) also argues for a sonority distinction between the front and back glides/vowels as well. Her argument is that the back glide is more sonorous than the front glide based on the patterning of glides primarily in coda position, on the assumption that low sonority segments are preferred in coda position. In this analysis, we take the position (Prince and Smolensky 1993/2004) that high sonority segments are preferred in coda position. We are both arguing for the same sonority distinction between the glides but from different theoretical stances vis-à-vis the coda.
10. Ohala and Kawasaki-Fukumori (1997) note a typological tendency for languages to avoid [wu] and [ji] sequences as examples of sequential constraints that are difficult to handle using a sonority-based approach to the syllable. The approach developed here does incorporate such sequences into a sonority-based analysis – not only when there is an asymmetry between the two as in English, but also in a more general sense by the interaction of the M1 hierarchy (dispreferring onset glides) and the corresponding peak constraints.

(4) M2 constraint hierarchy (with ranking of Faith indicated for American English): Faith ≫ *T2 ≫ *N2 ≫ *L2 ≫ *R2 ≫ *U2 ≫ *I2 ≫ *A2

Branching onsets in American English do not, however, allow obstruents or nasals in the second onset slot, as the ranking in (4) might seem to imply. Branching onsets in American English allow only the best M1 segments (obstruents) combined with the most sonorous consonants of the M2 segments, preferring a steep sonority rise between the two segments – what would be a minimum sonority distance of two, given the sonority scale in (1) above. Within an onset cluster, it is the sonority relationship between the two segments that is primarily responsible for determining the relative well-formedness of onset clusters. Conjunction of the M1 hierarchy in (3) with the M2 hierarchy in (4), as discussed in Baertsch (1998, 2002), creates a set of complex constraints (5) that capture the sonority distance relationships between the segments involved. The ranking of the complex constraints in (5) is determined by the ranking of the component constraints (Smolensky 1997). Because *A1 has already been shown to dominate Faith in American English, any conjunctions involving *A1 will also dominate Faith, and there is no need to include those constraints in the continuing discussion.

(5) Onset conjunction of M1 plus M2

*I1T2
*I1N2   *U1T2
*I1L2   *U1N2   *R1T2
*I1R2   *U1L2   *R1N2   *L1T2
*I1U2   *U1R2   *R1L2   *L1N2   *N1T2
*I1I2   *U1U2   *R1R2   *L1L2   *N1N2   *T1T2
*I1A2   *U1I2   *R1U2   *L1R2   *N1L2   *T1N2   *T2
*I1     *U1A2   *R1I2   *L1U2   *N1R2   *T1L2   *N2
*U1     *R1A2   *L1I2   *N1U2   *T1R2   *L2
*R1     *L1A2   *N1I2   *T1U2   *R2
*L1     *N1A2   *T1I2   *U2
*N1     *T1A2   *I2
*T1     *A2

Looking more carefully at the diagram in (5), we see that the highest ranking constraints are those that govern (typologically very marked) falling sonority onset clusters. For example, the highest ranking constraint, *I1 T2 would be violated by an onset cluster consisting of a high front glide followed by an obstruent (a sonority distance of −5). Moving down the diagram, the clusters governed by the constraints on each successive line in the diagram are progressively better onset clusters (a distance of −4, −3, −2, etc.). Taking a traditional view of the sonority relationship within American English onset clusters as required to rise in sonority (sonority sequencing) and to meet a minimum sonority distance of two, we would identify the first seven lines of this ranking diagram (enclosed in the dotted triangle) as dominating Faith. These are the highest ranking of the onset constraints and are typologically the least-well-formed onset clusters. They are removed from the diagram in (5) leaving the constraints shown in (6) as the set of constraints governing onset clusters meeting the minimum onset sonority distance of two. As the analysis proceeds, other constraints identified as dominating Faith (enclosed within the dotted line) will also be removed from their respective ranking diagrams, ultimately leaving only the constraints that are violated by well-formed American English onset clusters.
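The geometry of (5) follows mechanically from the two component hierarchies. The short Python sketch below, which is ours rather than part of the published analysis, regenerates the rows of the diagram by ranking each conjoined constraint through the ranks of its components (Smolensky 1997): constraints with the same combined rank land in the same stratum.

from itertools import product
from collections import defaultdict

M1 = ["A", "I", "U", "R", "L", "N", "T"]  # *A1 ranked highest (worst M1 segment)
M2 = ["T", "N", "L", "R", "U", "I", "A"]  # *T2 ranked highest (worst M2 segment)

rows = defaultdict(list)
for x, y in product(M1, M2):
    if x == "A":
        continue  # conjunctions with *A1 dominate Faith and are set aside
    depth = M1.index(x) + M2.index(y)  # smaller depth = higher ranked
    rows[depth].append("*" + x + "1" + y + "2")

for depth in sorted(rows):
    print("  ".join(rows[depth]))
# prints *I1T2, then *I1N2 *U1T2, then *I1L2 *U1N2 *R1T2, and so on, as in (5)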

(6) Onset constraints governing clusters with a sonority distance of at least two

Faith
*T2
*I1     *U1A2   *R1I2   *L1U2   *N1R2   *T1L2   *N2
*U1     *R1A2   *L1I2   *N1U2   *T1R2   *L2
*R1     *L1A2   *N1I2   *T1U2   *R2
*L1     *N1A2   *T1I2   *U2
*N1     *T1A2   *I2
*T1     *A2

The diagram in (6) is as far as a traditional minimum sonority distance approach gets us in onset clusters. This diagram includes constraints that would govern any potential onset cluster that meets a minimum sonority distance of two given the sonority scale in (1) above. But several clusters meeting this distance (governed by the constraints enclosed in the dotted triangle in (6)) are not well-formed onset clusters in American English. Included in this group are clusters that would include a non-high vowel as the second segment of the cluster (e.g. *[w1 a2 -], *[ô1 a2 -], *[l1 a2 -] and *[m1 a2 -]), clusters that would include a liquid or nasal followed by a high glide (*[ô1 j2 -], *[ô1 w2 -], *[l1 j2 -], *[l1 w2 -], *[m1 j2 -], *[m1 w2 -]), and clusters consisting of a nasal followed by [ô] (*[m1 ô2 -]). These potential (but ill-formed) onset clusters are traditionally banned by an OCP restriction against [+son][+son] onset clusters rather than by a restriction based on the sonority profile of the cluster. In this approach, however, the OCP restriction is not necessary. Here it is the sonority relationship between the segments within the potential cluster that becomes fatal. While both singleton obstruent onsets and singleton sonorant onsets are well-formed onsets in English, the higher sonority of (singleton) sonorant onsets makes them less well formed (but still acceptable) than the very low sonority (singleton) obstruent onsets. When combined with a second segment into an onset cluster, however, those less well formed sonorant onsets fatally violate the relevant conjoined constraint. The constraints pertaining to these [+son][+son] onset clusters are immediately dominated only by the constraints identified in the discussion surrounding (5) as dominating Faith. Therefore, in this analysis, the constraints enclosed in the dotted triangle in (6) are also analyzed as dominating Faith and are removed from the ranking diagram, leaving only those singleton onsets and onset clusters (obstruent plus sonorant) that are allowed in English in (7).

(7) Active onset constraints in American English

Faith
*T2
*I1
*U1
*R1
*L1
*N1
*T1L2   *N2
*T1R2   *L2
*T1U2   *R2
*T1I2   *U2
*T1A2   *I2
*T1     *A2

The diagram in (7) now accounts for all of the possible onsets and onset clusters in English. These include obstruent plus lateral approximant (e.g. plank, glue, sly), obstruent plus rhotic approximant (tree, green, throw), and obstruent plus high back glide (quick, twin, switch), but it does not yet rule out some of the banned onset clusters of English. Clusters governed by the constraints within the dashed parallelogram (those that would consist of an onset cluster containing an obstruent followed by [j]11 or an obstruent followed by a non-high glide) also do not occur in American English. However, unlike the previous set of constraints, which were analyzed as dominating Faith, these constraints cannot dominate Faith, because they are dominated by constraints violated by well-formed onset clusters in English. In this analysis, the potential clusters governed by the constraints enclosed within the dashed lines ultimately fail because of the sonority of their segments as well, but in this case it is due to an interaction between the *T1I2 and *T1A2 onset constraints and the *IP and *AP constraints of the Peak hierarchy.12 Both non-high vowels and high front vowels make better peaks in English than onset clusters, and an input sequence consisting of an obstruent plus /i/ or an obstruent plus a non-high vowel will surface as obstruent onset plus peak vowel (violating *T1 and *IP, or *T1 and *AP) rather than as an onset cluster violating *T1I2 or *T1A2 (e.g. pi.ano and pa.ella). Obstruent plus /u/ sequences, on the other hand, may surface as *T1U2 clusters if there is a vowel following the /u/ which is more sonorous than /u/ (e.g. quake), as we see in (8). This interaction will be discussed again later.

11. The American English diphthong [ju] in cute, mute, few, etc. is considered here to be an ongliding diphthong filling the peak position within the syllable. This issue is discussed further in the next section.
12. The Peak hierarchy is discussed more fully in the next section.

(8) Obstruent + /i/ sequences vs. obstruent + /u/ sequences

  /pie/            *T1I2   *I2   *IP   *T1   *AP
  a. p1 j2 EP        *!     *           *     *
→ b. p1 iP.EP                     *     *     *

  /kwe/            *UP   *T1U2   *U2   *T1   *AP
→ a. k1 w2 EP               *      *     *     *
  b. k1 wP.EP       *!                   *     *

The previous discussion completes what, in this analysis, are the sonority-related phenomena active in the syllabification of underlying strings into onsets and onset clusters in English. Remaining restrictions on segments within these categories are considered here to be unrelated to sonority. These include: (1) the lack of onset clusters beginning with voiced fricatives (an accidental gap stemming historically from the complementary distribution of voiced and voiceless fricatives in English);13 (2) the ban on [Labial][Labial] onset clusters;14 and (3) the ban on onset [N].15

Before moving on to a discussion of the rhyme, however, we return to the issue of initial s-clusters. In the analysis presented here, such segments, along with similar word-final coronal obstruents, are treated as appendices to the syllable or to the phonological word (Goad and Rose 2004, among others). In languages like English and other Germanic languages, these segments are singleton obstruents, but in a few languages, like Old Tibetan, appendices may include two segments and more closely resemble branching onsets. As onsets, singleton segments are governed by the M1 hierarchy (and in the case of languages like Old Tibetan, by the conjoined hierarchy). As far as sonority restrictions on such segments are concerned, the M1 hierarchy incorporates the preference for very low sonority; we would therefore expect obstruents as the default appendix segment. As appendices, they are also subject to satisfaction of a constraint militating against such adjunction (*App-L).16 The restriction to /s/ (/S/ in German) is attributed to constraints (place, manner, voicing assimilation, etc.) not based on sonority.17

13. While English has no native words with voiced fricative + sonorant onset clusters as a result of the earlier non-phonemic status of voiced fricatives, it is willing to borrow words with such clusters and maintain those clusters – Vlad, zloty, zwieback, voilà.
14. This is a gap in the native vocabulary as well, as evidenced by borrowings such as bwana, pueblo, voilà.
15. This restriction would follow historically from /ng/ → /N/.

(9) American English s-clusters

  /sla/             Faith   *T1T2   *App-L   *T1L2   *L1   *L2   *T1   *A2
  a. s1 l1 aP                         *!               *          *     *
→ b. s1 l2 aP                                   *            *    *     *
  c. s1 VP l1 aP      *!                               *          *     **

  /sta/             Faith   *T1T2   *App-L   *T2   *T1   *A2
→ a. s1 t1 aP                         *             **    *
  b. s1 t2 aP                 *!              *     *     *
  c. s1 VP t1 aP      *!                            **    **

With the addition of the *App-L constraint (dominated by Faith, which will allow appendices to surface), segments (restricted to /s/ in American English) will surface in the appendix position only when they cannot be included in the core syllable ([sn-], [sm-], [sp-], [st-], [sk-]). For example, an underlying sequence consisting of /s/ followed by an approximant ([sl-], [Sô-], [sw-]), will fatally violate the *App-L constraint, allowing violation of the *T1 L2 constraint in the winning candidate, as in the first tableau in (9). In the second tableau in (9), however, violation of *T1 T2 is fatal and violation of *App-L is preferable to violation of Faith.
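Strict domination in tableaux like (9) can be emulated by comparing violation vectors lexicographically. The Python sketch below hard-codes the /sla/ tableau as reconstructed above; the ranking list and the violation profiles are read off that reconstruction, not taken from any published code.

RANKING = ["Faith", "*T1T2", "*App-L", "*T1L2", "*L1", "*L2", "*T1", "*A2"]

def winner(candidates):
    # Tuples compare position by position, so the lexicographic minimum is the
    # candidate whose highest-ranked violation comes latest: strict domination.
    return min(candidates, key=lambda c: tuple(candidates[c].get(k, 0) for k in RANKING))

sla = {
    "a. s1 l1 aP":    {"*App-L": 1, "*L1": 1, "*T1": 1, "*A2": 1},  # /s/ as appendix
    "b. s1 l2 aP":    {"*T1L2": 1, "*L2": 1, "*T1": 1, "*A2": 1},   # true onset cluster
    "c. s1 VP l1 aP": {"Faith": 1, "*L1": 1, "*T1": 1, "*A2": 2},   # epenthesis
}
print(winner(sla))  # 'b. s1 l2 aP', matching the winning candidate in (9)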

16. *Appendix-Left(Right): A consonant at the left/(right) edge must be immediately dominated by Onset/(Rhyme) (Goad and Rose 2004).
17. See Henke, Kaisse and Wright (this volume) and Wright (2004) for a perceptual account restricting such segments to phonemes (like /s/) with very reliable internal cues for identification. There are also a few sequential constraints usually attributed to OCP restrictions that apply across the word. These include restrictions on words of the form sCiVCi (*skak, *spap) discussed by Clements and Keyser (1983), Fudge (1987), Frisch (2004), and Coetzee (2008).


2.2. The rhyme The sonority relationships within the syllable rhyme have been less thoroughly treated in the literature and in optimality theoretic analyses. While sonority has a major role in determining the well-formedness of rhyme constituents, the relationships within the rhyme are more complex and intertwined than they are in onsets and onset clusters. A number of structural issues relating to the rhyme are also unclear. The existence of the rhyme as a constituent of the syllable is not universally accepted (Blevins 1995; Kessler and Treiman 1997, among others). There is a difficulty in determining whether a particular segment within the English rhyme is a member of the peak or of the coda, exacerbated by the distribution of lax vowels plus consonant clusters vs. tense vowels that cannot be followed by consonant clusters (Kenstowicz 1994; Selkirk 1982; Fudge 1969, 1987, among others). There is a difficulty in determining whether a word-final consonant, both as a property of English in particular and as a parameter for all languages, is a member of the coda or part of an appendix to the syllable not governed by the sonority of the preceding rhyme (Piggott 1999; Kenstowicz 1994; Booij and Rubach 1984; Halle and Vergnaud 1980)). There is a difficulty in identifying the length/tenseness/laxness of the English low vowels and the number of rhyme slots each fills (Halle 1977; Selkirk 1982; Giegerich 1992, among others). Each of these issues will be discussed in more detail below. We adopt the position here that the rhyme is a constituent within the syllable. This is based on the work of several researchers (including Blevins 1995; Fudge 1969, 1987; Giegerich 1992; Selkirk 1982; Kessler and Treiman 1997, among others), who find more (usually sonority-based) restrictions on peak-coda combinations than on onset-peak combinations, as well as arguments based on syllable weight, stress, language games, compensatory lengthening, etc. If the rhyme is a constituent, our analysis should be able to identify a number of restrictions within the rhyme that are due to the sonority relationships among the segments involved. We shall take up this question again in section 2.2.3. The difficulty in determining to which constituent a glide or sonorant consonant belongs is discussed in more detail in our discussion of the peak constituent in the next section. The difficulty hinges on the complementary distribution of lax vowels with up to two consonants following (limp, ink) vs. tense vowels and diphthongs which allow only one following consonant (leak [lijk], but *leank *[lijNk]). In this analysis, the middle segment (the offglide of a tense vowel or the first consonant following a lax vowel) in both cases must be treated in the same way, as the goal of the constraint ranking is inherently to take an underlying sequence of sounds and parse them into a syllable in a way that best fits the sonority profile of English. Introducing four terminal nodes for what turns


Introducing four terminal nodes for what turns out to be a maximum of three segments would necessitate an additional set of constraints to sort out the resulting mess. In the interest of simplicity, we treat both the offglides of tense vowels and the sonorant consonants of coda clusters as coda segments in our analysis (limp [l1 IP m2 p1], leak [l1 iP j2 k1]), as discussed in more detail below.

We adopt the relatively common position that the behavior of word-final coronal obstruents does not always pattern with other word-final consonants. In much of the earlier literature, these segments have been identified as exceptional (in some cases as violating the SSP) but principled (coronal obstruents). The most common treatment of these segments has been to include them as adjoined to the coda (Giegerich 1992), the syllable (Halle and Vergnaud 1980; Fudge 1969), the prosodic word (Booij and Rubach 1984), or as onsets of a syllable with an empty nucleus (Ewen and Botma 2009; Cyran 2008). Piggott (1999) develops in more detail the options available to languages in word-final position, as part of the syllable and/or adjoined to the end of the word. As was the case with initial s-clusters in the previous section, we treat these obstruents as belonging to a word-final appendix in this analysis as well. And as with the initial s-clusters, the appendix is an M1 position and serves as a 'last resort' for incorporating final coronal obstruents into the word. When possible, the constraint hierarchy will parse coronal obstruents as part of the core syllable, but when that fails (in most cases because the core syllable positions are filled by other segments), violation of *App-R (Goad and Rose 2004) allows adjunction. Because final coronal obstruents may be parsed as coda segments or as appendices, and when parsed as coda segments they pattern with similar noncoronal segments and clusters, we avoid including examples involving word-final coronal obstruents in the discussion that follows.

We come back to the issue of the length/tenseness of the low vowels when we again take up the discussion of sonority relationships within the rhyme constituent in section 2.2.3. There is a good deal of agreement on /A/ as a tense vowel, and as a tense vowel, we treat it as filling both a peak position and a coda M2 position as the other tense vowels of English do (mob [m1 AP A2 b1]).

We will begin our discussion of the rhyme with a discussion of the constituents within the rhyme (peak and coda), identifying the restrictions within each category that result from the sonority profile associated with each, then return to the rhyme as a constituent in order to identify other sonority-based phenomena that arise when peaks are followed by codas.


2.2.1. The peak

Looking at the singleton peaks allowed in English, we again see a gradient acceptance of segments based on sonority. The Peak constraints, governing segments that potentially fill a peak position within a syllable, are provided in (10), below. The highest sonority segments, non-high vowels, are readily accepted as peaks, as are both high vowels. Rhotics easily fill a peak slot in a stressed syllable and may be followed by a single coda segment, but while nasal and lateral peaks are allowed in English, they occur only in unstressed syllables, as in the final syllable of mountain or bottle, and may not be followed by a coda.

(10)  Peak constraint hierarchy (with ranking of Faith indicated for American English)

      *TP ≫ Faith ≫ *NP ≫ *LP ≫ *RP ≫ *UP ≫ *IP ≫ *AP

Obstruents are simply banned from peak position in English.18 This set of circumstances is encoded in this account by the constraint ranking *TP ≫ Faith, as shown in (10). As the focus of this paper is on stressed monosyllables, we leave the discussion of nasal and lateral peaks at this point and focus on the segments that can occur in stressed syllables – rhotics and vowels.

In addition to the singleton peaks available in English syllables, we must also consider the possibility of a complex peak. Under many previous accounts of the English syllable (Selkirk 1982; Giegerich 1992; Fudge 1969, etc.), long vowels and diphthongs were treated as complex peaks. This created a difficulty for Selkirk (1982) where a syllable could either contain a complex peak or a complex coda, but not both. Selkirk solved this difficulty by including sonorant consonants ([l ô n m N]) that are both post-vocalic and pre-consonantal as peak segments, forming a complex peak with a preceding lax vowel. Treating both the offglides of tense vowels and preconsonantal sonorant consonants as peak segments is problematic for this analysis. Consonants occur freely in M2 positions, but peak positions are restricted largely to [-consonantal] segments in English. Treating the sonorant consonants as well-formed enough to freely combine with vocalic segments within a peak in this analysis would grossly overgenerate both the possible peaks and the possible codas of English. Therefore we do not adopt Selkirk's account of English peaks.

18. One reviewer suggests that if we were to consider Pssst! to be an English word, we would have to admit the possibility of obstruent peaks in English. This, along with the dental click comprising Tsk-tsk! and other examples from the sound-symbolic lexicon of English, are not considered true English words here.


Giegerich (1992: 143–147) solved the difficulty of a three position rhyme by restricting the peak to a maximum of two X-slots, the coda to a maximum of two X-slots, and the rhyme to a maximum of three X-slots. This approach also runs into an overgeneration problem in the current analysis and is not adopted. The approach adopted here, which was mentioned by Fudge (1969: 276) but neither adopted nor elaborated upon in that work, is to analyze both the second portion of the tense vowels and pre-consonantal sonorants as coda segments rather than peak segments. Fudge (1987: 368–369) returns to this issue, noting that such an analysis would highlight the similarities between the onset-coda restrictions preventing ClVl, CrVr, and CwVw sequences in English (see also Clements and Keyser (1983); Davis (1988); and Cairns (1988) for discussion of these "OCP" restrictions in English). Kenstowicz (1994: 259) states explicitly that the syllabic affiliation of offglides is unclear and provides both a peak analysis and a coda analysis. The split margin approach to the syllable, as discussed in Davis and Baertsch (2011), capitalizes on the similarities in behavior between onset M2 segments and coda M2 segments, and the inclusion of vowel offglides as M2 coda segments further highlights those similarities.

This is not to say that no language allows complex peaks. The behavior of complex peaks in this approach to the syllable should be markedly different, and such an approach should be able to distinguish between languages with phonemically long vowels and languages like English in which length and tenseness are intertwined. For example, contrast the vowel system of English with the vowel system of Yakut, which includes the vowels [i W e A y u ø o]. Each of these vowels occurs as short and as long (the long vowels are two to three times longer than the short vowels) and there are four rising sonority diphthongs [ie yø uo WA], agreeing internally in backness and roundness (Krueger 1962: 47–48, converted to IPA equivalents). The vowels of Yakut are complex peaks and would be analyzed as such in a split margin approach. The palatal glide in Yakut, on the other hand, which may surface after any of the vowels (long, short, or diphthong (Böhtlingk 1964: 109–111)), would be analyzed as an M2 segment, as it is in English.

In an analysis in which a language allows complex peaks, a self-conjunction of the peak hierarchy will govern the well-formedness of such segments/sequences. In the case of English, all of the complex constraints generated by the self-conjunction of the peak hierarchy dominate Faith. This is functionally the equivalent of Rosenthall's (1994: 15–16) No Long Vowel constraint. In English, the singleton peak constraints interact with the coda constraints discussed in the next section to determine the well-formedness of rhyme sequences.


2.2.2. The coda

Singleton coda segments are governed by the M2 hierarchy provided in (11), below. Coda segments in English include obstruents (which violate *T2), indicating that Faith dominates the whole of the M2 hierarchy in this analysis. While the constraints at the low end of this hierarchy may appear to be problematic, we must keep in mind that the second portion of a tense vowel or a diphthong will fill the first coda position (M2). Interaction between the M2 hierarchy and the peak hierarchy will generally pull a vowel into a peak, and it is this interaction that would prevent an ill-formed surface sequence like /pôin/ *[p1 ÄP j2 n1] in favor of [p1 ô2 IP n2].

(11)  M2 constraint hierarchy (with ranking of Faith indicated for American English)

      Faith ≫ *T2 ≫ *N2 ≫ *L2 ≫ *R2 ≫ *U2 ≫ *I2 ≫ *A2

We also meet an interaction with stress at this point. Stressed syllables in English must be heavy. This will require an M2 segment in each stressed syllable. Long/tense vowels and diphthongs are two-position sounds (two X-slots in Giegerich's (1992) analysis). Therefore, any word ending in a vowel in English will violate one of the low-ranking M2 constraints (*U2 or *I2 in the case of an offglide and *A2 in the case of a low, tense vowel). And while the implication of the M2 hierarchy is, in general, that high sonority vowels make the best M2 segments, very high sonority vowels make even better peak segments, as the need for a peak in a syllable (required) outweighs the need for a coda segment (optional). As a result of this interaction the input /pôin/, above, will surface as [p1 ô2 IP n2] rather than *[p1 ÄP j2 n1].

Branching codas in English, consisting of an M2 segment followed by an M1 segment, are not simply the mirror image of branching onsets (M1 followed by M2). While the reverse of all the possible onset clusters are acceptable as coda clusters, coda clusters with less of a sonority slope than are allowable in onsets also surface (in traditional sonority distance terms, a minimum sonority distance of one in the coda). Coda clusters consisting of two [+son] segments are allowed as well. However, rising sonority codas are not allowed. We know from the discussion of singleton onsets above that *A1 dominates Faith; therefore all of the complex constraints in which *A1 is a member will also dominate Faith. Using the same approach for deriving the coda constraints as we did for the onset constraints yields the mirror image of the onset constraints in (5), and eliminating the rising sonority constraints that dominate Faith results in the complex coda ranking diagram in (12).

(12)  Coda constraints governing clusters with a sonority distance of at least one

      Faith
      *T2    *N2T1    *L2N1    *R2L1    *U2R1    *I2U1    *A2I1
      *N2    *L2T1    *R2N1    *U2L1    *I2R1    *A2U1    *I1
      *L2    *R2T1    *U2N1    *I2L1    *A2R1    *U1
      *R2    *U2T1    *I2N1    *A2L1    *R1
      *U2    *I2T1    *A2N1    *L1
      *I2    *A2T1    *N1
      *A2    *T1
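To make the derivation of these conjoined constraints concrete, here is a minimal sketch (my own illustration, not part of the original analysis) that generates the complex coda constraints of (12) by crossing the M2 and M1 sonority hierarchies and keeping only the falling-sonority combinations; the class labels follow the T/N/L/R/U/I/A abbreviations used in the diagrams.

# Illustrative sketch only: generate the conjoined coda constraints in (12).
# SCALE lists the segment classes used in this chapter from low to high sonority.
SCALE = ["T", "N", "L", "R", "U", "I", "A"]

def complex_coda_constraints():
    """Cross the *X2 and *Y1 hierarchies, keeping falling-sonority clusters only."""
    constraints = []
    for i, m2 in enumerate(SCALE):        # class filling the M2 (first coda) slot
        for j, m1 in enumerate(SCALE):    # class filling the M1 (second coda) slot
            if i > j:                     # M2 must outrank M1 in sonority
                constraints.append(f"*{m2}2{m1}1")
    return constraints

print(complex_coda_constraints())  # 21 constraints, from *N2T1 up to *A2I1

Running the sketch yields exactly the 21 conjoined constraints of diagram (12), which makes explicit that the diagram is the full cross-product of the two margin hierarchies minus the rising-sonority combinations.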

More of the constraints in (12) are dominated by Faith than was the case with the onset constraints, but we see here some sonority effects that are not captured by the minimum sonority distance approach to coda clusters as well. Sequences of a tense vowel plus /w/ that would violate *I2U1 do not occur (*[t1 eP j2 w1]), even though the sequence follows the sonority sequencing principle. The non-high vowels are interesting here as well. Tense mid vowels are followed by offglides in the coda and would violate one of the constraints with *U2 or *I2 as a component. The literature regarding these tense vowels and the diphthongs is quite explicit in stating that such vowels take two syllable slots (see, for example, Giegerich 1992; Blevins 1995). The low vowels do not offglide, but /A/ and sometimes /æ/ are considered to be tense/long vowels in the literature. The low back vowel occurs in open monosyllables (spa, thaw, straw, etc.),19 but the low front vowel is much less common (occurring in baa [bæ:] for some speakers). On the other hand, both low vowels may co-occur with consonant clusters ([kArp], [læmp]). Given the somewhat unruly behavior of these low vowels, it is no surprise that there is little discussion as to the number of syllable slots filled by these two vowels in the phonological literature. In durational studies (e.g. Crystal and House 1988), however, these vowels pattern as long, as does syllabic /ô/. Halle (1977) argues that the low back vowel [A] is long but lax and that the low front vowel [æ] may be long or short and when long may be tense or lax.

19. For some speakers, the vowel in thaw, straw is /O/ rather than /A/. For those speakers, the analysis is the same, but with [O:].


Both Halle's (1977) length results and the durational results in Crystal and House (1988) support including the *A2X1 series as acceptable coda cluster sequences; we treat the low vowels here as possible M2 segments and take the issue up again in a later section. But within the *A2X1 series are the *A2I1 and *A2U1 constraints (the two highest ranking *A2X1 constraints enclosed in the dotted triangle in (12)). These represent the least well-formed *A2X1 combinations which still fall in sonority. A surface representation violating one of these constraints (*[AP A2 j1], *[AP A2 w1], etc.) would be difficult to distinguish from the /aj/, /Oj/, /aw/ diphthongs, which violate only the comparatively low-ranking *I2 and *U2 coda constraints. What the *A2I1 and *A2U1 constraints share with the *I2U1 constraint discussed above is a very marked, high sonority M1 segment. The high sonority of the M1 portion of the combination, while possible in singleton onsets, is not well-formed enough to combine with other segments in a coda cluster. These three constraints dominate Faith and have been removed from the diagram in (13).

(13)  Active coda constraints in American English

      Faith
      *T2    *N2T1    *L2N1    *R2L1    *U2R1
      *N2    *L2T1    *R2N1    *U2L1    *I2R1    *I1
      *L2    *R2T1    *U2N1    *I2L1    *A2R1    *U1
      *R2    *U2T1    *I2N1    *A2L1    *R1
      *U2    *I2T1    *A2N1    *L1
      *I2    *A2T1    *N1
      *A2    *T1

The constraints in (13) represent all the possible combinations for well-formed American English codas. There are, however, other interactions with the peak position that we turn to at this point. While there continues to be discussion in the literature over the existence/non-existence of a rhyme constituent (see, for example, Blevins 1995; Selkirk 1982; Davis 1988, 1989; Fudge 1987; Treiman 1986), we argue here for the rhyme. In the discussion of each of the terminal nodes of the syllable above, we argued that sonority plays a role in the selection and relative well-formedness of segments in each syllable position. If the rhyme is a constituent of the syllable, we should also find sonority-based restrictions on combinations of peak and coda that can be captured by this analysis. We do find such restrictions.


In a very general sense, not all of the permissible coda combinations shown in (13) can occur with all of the permissible peak segments shown in (10), above. For example, the unstressed nasal and lateral peaks are just sonorous enough to be allowed as peaks in unstressed syllables, but are banned when in combination with any of the permissible codas in (13). While we can stipulate this in a description of the behavior of these peaks, we have as yet no optimality-theoretic mechanism to eliminate such a possibility. The complex of constraints that prevent light, stressed syllables from surfacing would prevent a simple syllabic nasal/lateral from surfacing in a stressed syllable, but the constraints governing stress/weight would be unable to prevent a potential syllabic nasal/lateral plus stop sequence from surfacing. The rhyme constraints discussed in the next section are responsible for identifying these sequences as fatally ill-formed.

2.2.3. The rhyme revisited

Combining the coda constraints in (13) with each of the active peak constraints in (10) results in a number of additional constraints to juggle, but also brings to the fore a number of sonority-driven restrictions on the structure of the syllable that have not yet been discussed. In sections 2.1 and 2.2.2, we identified several sonority-governed restrictions on the onset and coda, respectively, that are not evident in a simpler 'minimum sonority distance' approach to those constituents. Specifically, we found that only the best singleton onsets are able to support complex onsets and that the best singleton codas are also the codas most able to support complex codas. In this section, we will find that the rhyme is similar in this respect. The best peaks (the highest sonority peaks) are the ones that most readily support codas and complex codas. In the case of the least well-formed, lowest sonority peaks (nasals), all of the constraints governing a potential rhyme violating *NP plus any of the permissible coda segments from the diagram in (13) dominate Faith; thus none of these combinations will be allowed to surface in American English. The only English examples available with nasal peaks are those that are in unstressed syllables and are not followed by coda segments (mountain, prism). The same situation exists with a potential rhyme violating *LP plus any of the permissible coda segments (bottle, bubble). The complex rhyme constraints that include *LP or *NP are the highest ranking of the set of rhyme constraints created by combining the possible peak and coda constraints. The sonority of the peak segment in these combinations is too low to combine with any otherwise well-formed coda segments.

The American English rhotic /ô/ is more sonorous than the other sonorant consonants and can surface as the peak in a stressed syllable.


Recall from the discussion above that syllabic /ô/ patterns with the tense vowels, at least as far as duration. It also patterns with the tense vowels phonotactically. It can end a stressed syllable, as in purr. It can also be followed only by singleton coda consonants (pearl), but is not followed by coda clusters (*pearlk). It should therefore be analyzed as filling two rhyme slots, in this case, the peak and the first margin position (M2). Combining the complex coda constraints in (13) with the peak constraint *RP results in the ranked constraints provided in (14). It is important to note here that each of the constraints in this diagram dominates the corresponding constraint with *UP and is in turn dominated by the corresponding constraint with *LP. Given that all of the complex rhyme constraints containing *LP dominate Faith, there is no difficulty in identifying constraints along the top edge of this diagram as dominating Faith as well. Syllabic /ô/, as a tense/long vowel, does not combine with following consonant clusters,20 indicating that *RPN2T1, *RPL2N1, *RPL2T1, *RPT2, *RPN2, and *RPL2 all dominate Faith. These constraints (in the left columns in (14)) are surrounded by a dotted triangle below and will be removed from the set of active constraints involving *RP in the next diagram.

(14)  American English rhyme constraints including peak [ô]

      Faith
      *RPT2    *RPN2T1    *RPL2N1    *RPR2L1    *RPU2R1
      *RPN2    *RPL2T1    *RPR2N1    *RPU2L1    *RPI2R1    *I1
      *RPL2    *RPR2T1    *RPU2N1    *RPI2L1    *RPA2R1    *U1
      *RPR2    *RPU2T1    *RPI2N1    *RPA2L1    *R1
      *RPU2    *RPI2T1    *RPA2N1    *L1
      *RPI2    *RPA2T1    *N1
      *RP      *RPA2      *T1

20. Note here again that I do not include appended coronal segments (like the final plural morpheme on pearls, perms) as part of the core syllable. I take up these segments again at the end of this section and avoid using any examples with final coronal obstruents in this section to ensure that the example does indeed fit within the core syllable.


English words violating the constraints in the column to the right of that dotted triangle (*RPR2L1, *RPR2N1, *RPR2T1, and *RPR2) are readily attested. A syllabic /ô/, as in purr, violates *RPR2 at the bottom of the column. Words with syllabic /ô/ followed by a consonant (e.g. perk, perm, pearl, etc.) violate the relevant constraints dominating *RPR2 in this diagram.

The constraints in the columns to the right of the *RPR2 column all represent sonority reversals. The low sonority of peak /ô/ followed by a higher sonority (coda) vowel violates the SSP. However, in addition to the violation of sonority sequencing evident in these constraints, the coda sequences involved also include some of the least well-formed of the possible codas in (13). Like the coda constraints involving very high sonority M1 segments in (12), these constraints govern the combination of a comparatively low sonority (dispreferred, but possible) peak with a very high sonority (thus dispreferred) M1 segment, and this combination proves fatal. The three constraints enclosed within the dotted triangle on the right are immediately dominated by Faith in this diagram and, as was the case earlier in this analysis with such constraints, we reanalyze them now as dominating Faith. Those three constraints are also removed from the set of active constraints involving *RP, reflected in the diagram in (15). The violation of the SSP would appear to be a difficulty with the remaining constraints in the columns dominated by these constraints (enclosed in dashes in (15)) as well, but these constraints are each dominated by an active *RPR2X1 constraint and therefore cannot dominate Faith.

(15)  Rhyme constraints with peak [ô] that are active in American English

      Faith
      *RPR2L1    *RPR2N1    *RPU2L1    *I1
      *RPR2T1    *RPU2N1    *RPI2L1    *U1
      *RPR2      *RPU2T1    *RPI2N1    *RPA2L1    *R1
      *RPU2      *RPI2T1    *RPA2N1    *L1
      *RPI2      *RPA2T1    *N1
      *RP        *RPA2      *T1

The remaining constraints within the dashed rectangle in (15) are those that would violate Sonority Sequencing and that cannot dominate Faith.


Recall, however, that these constraints dominate the corresponding constraints in which the vowels fill peak position, and there are other options for parsing such a sequence. For example, given an input /ôim/, the highest sonority segment, /i/, makes a much better peak than does /ô/; /ô/ makes a perfectly acceptable onset. The highest ranking M1 constraint that a [ô1 IP m2] (rim) parse will violate is the *R1 constraint, and the highest ranking rhyme constraint it will violate is the comparatively low-ranking *IPN2 constraint (which we will see below), while the highest ranking rhyme constraint that a *[ôP j2 m1] parse will violate is the much higher ranking *RPI2N1 constraint. We see this in tableau form in (16). A similar situation exists for the other constraints within the dashed rectangle in (15).

(16)  /ôim/ → [ô1 IP m2], *[ôP j2 m1]

      /ôim/         | Faith | *RPI2N1 | *RP | *IPN2 | *N2 | *R1 | *I2 | *N1 | *IP
    → a. ô1 IP m2   |       |         |     |   *   |  *  |  *  |     |     |  *
      b. ôP j2 m1   |       |   *!    |  *  |       |     |     |  *  |  *  |
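The comparison behind a tableau like (16) can also be stated procedurally. The following sketch is my own illustration (not the author's implementation); the candidate profiles are transcribed from (16), with 'r' standing in for the rhotic of the transcriptions above.

# Illustrative OT evaluation for tableau (16).
RANKING = ["Faith", "*RPI2N1", "*RP", "*IPN2", "*N2", "*R1", "*I2", "*N1", "*IP"]

CANDIDATES = {
    "r1 IP m2": {"*IPN2": 1, "*N2": 1, "*R1": 1, "*IP": 1},    # rim: rhotic in onset
    "rP j2 m1": {"*RPI2N1": 1, "*RP": 1, "*I2": 1, "*N1": 1},  # sonority reversal
}

def winner(candidates, ranking):
    """Best candidate = lexicographically smallest violation profile."""
    return min(candidates, key=lambda c: [candidates[c].get(k, 0) for k in ranking])

print(winner(CANDIDATES, RANKING))  # -> 'r1 IP m2'

Because candidates are compared on the highest-ranked constraint where their profiles differ, the single violation of *RPI2N1 is enough to eliminate the second parse, exactly as in the tableau.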

In (17), we see the constraints relevant for peak [UP] followed by the possible singleton codas and coda clusters (*UP combined with each of the constraints in (13)). It is important to note here as well that the *UPR2L1 constraint and all of the constraints it dominates are dominated by *RPR2L1. Other high-ranking constraints in this diagram are immediately dominated by Faith. In addition, each of the constraints in this diagram dominates the corresponding *IP constraint (discussed below) in turn. In this diagram, parses that violate the constraints in the first four columns would involve short /u/ ([UP]) followed by either singleton consonant codas (violating the constraints along the bottom of the columns) or consonant clusters (violating the remaining constraints within those columns). While [U] is one of the least common vowels in English (more common than only [Oj] in Kessler and Treiman's (1997) distributional study of English CVC words), it is readily attested before singleton codas (look, room, pull, poor).21 Before consonant clusters, words containing [UP] are more difficult to find. Fudge (1987: 362–364) identifies only two words in this category (one followed by [lf] and one followed by [lv]).22

21. There is quite a bit of variation in the pronunciation of several of these (and following) words among American English speakers. I include them under the vowel used as the first pronunciation listed in the American Heritage Dictionary of the English Language, 4th edition (Pickett 2000) unless otherwise noted. In this reference, room is cited as [ôuwm], [ôUm].

22. Presumably, Fudge is referring to wolf and wolves. He does not provide examples for all of his collocations.


Both would violate *UPL2T1 on this analysis. Because of the infrequent nature of [UP] in the American English lexicon, and because I (and others) find these clusters pronounceable, I consider the lack of [Ump], [Ulm], etc. to be due to an accidental rather than systematic gap in the lexicon, and leave this detail for further research.

(17)  Rhyme constraints including peak /u/

      Faith
      *RPR2L1
      *UPT2    *UPN2T1    *UPL2N1    *UPR2L1    *UPU2R1
      *UPN2    *UPL2T1    *UPR2N1    *UPU2L1    *UPI2R1    *I1
      *UPL2    *UPR2T1    *UPU2N1    *UPI2L1    *UPA2R1    *U1
      *UPR2    *UPU2T1    *UPI2N1    *UPA2L1    *R1
      *UPU2    *UPI2T1    *UPA2N1    *L1
      *UPI2    *UPA2T1    *N1
      *UPA2    *T1

      (columns 1-4: [ʊP] plus coda; column 5: [uPw2])

The constraints in the fifth column in diagram (17) are violated by parses containing a long/tense /uu/ ([uP w2 ]). Aside from *UP U2 R1 , rhymes violating these constraints are quite common: pool, cool, groom, womb, soon, moon, soup, spook, proof, truth, zoo, stew, blue, etc. Before /ô/, long/tense [uw] (along with [ij], [ej], and [ow]) does not occur, a phenomenon discussed in much of the literature (see, for example, Kahn 1976: 76–77; Rogers 2000: 75–78). As the *UP U2 R1 constraint is immediately dominated by Faith (recall from the previous discussion that *RP U2 R1 dominates Faith), it and the remaining constraints in the dotted triangle in (17) are analyzed here as dominating Faith – the high sonority of the M1 segment in each case causing failure of the combination. This leaves us with the constraints in (18) representing the active rhyme constraints in English with peak /u/.


(18)

Rhyme constraints with peak /u/ that are active in American English

      Faith
      *RPR2L1
      *UPT2    *UPN2T1    *UPL2N1    *UPR2L1
      *UPN2    *UPL2T1    *UPR2N1    *UPU2L1    *I1
      *UPL2    *UPR2T1    *UPU2N1    *UPI2L1    *U1
      *UPR2    *UPU2T1    *UPI2N1    *UPA2L1    *R1
      *UPU2    *UPI2T1    *UPA2N1    *L1
      *UPI2    *UPA2T1    *N1
      *UPA2    *T1

Here, as with the peak /ô/ constraints above, we see some parses (in the dashed parallelogram) that would violate the Sonority Sequencing Principle, but again we see no surface violations of these constraints because other possible parses (with the highest sonority segment parsed as a peak) are preferred. Take, for example, the different parses of /kuup/ coop and /kuip/ quip shown in (19). Here, despite the ranking *UPU2T1 ≫ *UPI2T1 shown above, the winning candidate for coop, [k1 uP w2 p1], successfully violates the complex rhyme constraint *UPU2T1, but a violation of the lower-ranking *UPI2T1 constraint in the losing candidate for quip ([k1 uP j2 p1]) proves to be fatal. In the latter case (quip), this is due to an interaction with the complex onset constraint *T1U2, indicating a preference for /u/ to be parsed as part of a complex onset rather than as a peak when followed by a higher sonority segment (in this case, /i/).

(19)  /uu/ coop vs. /ui/ quip

      /kuup/          | Faith | *UPT2 | *UPU2T1 | *UPI2T1 | *T1U2 | *T1I2 | *IPT2
      a. k1 w2 UP p2  |       |  *!   |         |         |   *   |       |
    → b. k1 uP w2 p1  |       |       |    *    |         |       |       |
      /kuip/          |       |       |         |         |       |       |
    → c. k1 w2 IP p2  |       |       |         |         |   *   |       |   *
      d. k1 uP j2 p1  |       |       |         |   *!    |       |       |

In the former case (coop), the two highest-sonority segments in the sequence are both /u/. Therefore, violation of some *UP constraint is inevitable. The other /u/ in this UR will either be parsed as part of a complex onset or as an offglide of the vowel.


The loser for coop fails by attempting to parse a very low sonority obstruent in a very high sonority (M2) slot.

The constraint ranking diagram relevant for peak /i/ followed by singleton codas and coda clusters is given in (20). The discussion here parallels the discussion of peak /u/ with the exception that short/lax [I] (governed by the constraints in the first four columns of the diagram) is a much more common peak than [U]. As was the case with [U], examples of words with [I] followed by a singleton coda consonant are readily available: pick, ring, mill, peer, etc. Words with [I] followed by non-/ô/ consonant clusters are also readily available: limp, rink, kiln, film, silk, milk. While there are many English words with [IP] followed by coda [ô2] in the lexicon (peer, tear, leer, ear, etc.), words with [I] followed by /ô/-initial clusters are quite rare (beard, with a final coronal obstruent). This is due largely to a historical backing of the vowel and resulting merger with syllabic /ô/ (Kahn 1976) that took words like chirp, firm, girl from this category, but these combinations remain a possible rhyme in English.

(20)  Rhyme constraints including peak /i/

      Faith
      *RPR2L1
      *UPT2    *UPN2T1    *UPL2N1    *UPR2L1
      *IPT2    *IPN2T1    *IPL2N1    *IPR2L1    *IPU2R1
      *IPN2    *IPL2T1    *IPR2N1    *IPU2L1    *IPI2R1    *I1
      *IPL2    *IPR2T1    *IPU2N1    *IPI2L1    *IPA2R1    *U1
      *IPR2    *IPU2T1    *IPI2N1    *IPA2L1    *R1
      *IPU2    *IPI2T1    *IPA2N1    *L1
      *IPI2    *IPA2T1    *N1
      *IPA2    *T1

      (columns 1-4: [IP] plus coda; column 5: [ju]; column 6: [iPj2])

The fifth column in (20) is represented in this analysis by English words containing the [ju:] diphthong. This diphthong is categorized as a long vowel by Fudge (1987) and has a mean duration longer than the average for the long vowels in Crystal and House's (1988) study of vowel duration – stressed [ju:] ([0] in their orthography) has a mean duration of 151 ms. (p. 279), while the average for their stressed 'long vowel' category is 141 ms. and for their 'diphthong' category is 185 ms. (p. 266).


The quality of this vowel is of interest here. We usually consider this to be a 'palatalized' /u/ of some sort. I suggest here, however, that the quality of this vowel is much more like [40]. To test this, we collected a corpus of 94 instances of the English word cute (2 instances each from 47 individuals) and measured the formant frequencies just beyond the halfway point of the vowel. At that point in the vowel, the average F2 was still 1993 Hz, with accompanying lowering of F3 as well, supporting a high front rounded vowel analysis of the first half of this diphthong. At the end of the vowel, F2 remains much higher than the average F2 for [u] as well.23 Compare the formant frequencies in English keep ([ij]), cute ([40]), and cool ([uw]) in Figure 1, spoken by one speaker in the corpus.

Figure 1. Comparison of the vowels in English keep ([ij]), cute ([40]), and cool ([uw]). (Formant displays not reproduced here.)
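A measurement of the kind reported here can be reproduced with standard tools. The sketch below is my own; the library choice, file name, and interval times are assumptions for illustration, not details from the chapter. It reads F2 just past the vowel midpoint using the Parselmouth interface to Praat.

# Hypothetical sketch of the F2 measurement described above.
import parselmouth  # pip install praat-parselmouth

def f2_past_midpoint(wav_path, vowel_start, vowel_end):
    """Return F2 (Hz) just beyond the halfway point of the vowel interval."""
    sound = parselmouth.Sound(wav_path)
    formants = sound.to_formant_burg()
    t = vowel_start + 0.55 * (vowel_end - vowel_start)  # just past the midpoint
    return formants.get_value_at_time(2, t)

# e.g. f2_past_midpoint("cute_token01.wav", 0.12, 0.31) for one token of cute

A persistently high F2 (and lowered F3) at this point in the vowel is what supports the front rounded analysis of the first half of the diphthong.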

As a high front vowel, the diphthong described here as [40] would violate the *IPU2(X1) constraints shown in the fifth column in (20), and English words with rhymes violating all but the highest ranking of these constraints are common. *IPU2 is violated by open syllables containing [40] (few, pew); a final M1 obstruent is also acceptable (huge, cube), as is a final nasal (fume) or a final lateral (fuel, mule). What is unacceptable in combination with this vowel is a final /ô/ (*[f40ô] – an underlying /fiuô/ in this analysis would surface as a two-syllable [f1 4P 02 .ÄP] fewer). *IPU2R1 is currently immediately dominated by Faith and is reanalyzed here as dominating Faith, along with the other constraints included in the dotted triangle in (20).

Long/tense /ii/ violates the constraints in the sixth column in (20). Here again, we find examples easy to come by (sea, keep, leaf, lean, peel, etc.) with the exception of long /ii/ followed by /ô/, paralleling the situation with long /uu/ above.

23. This is a crude measurement at this time. More detailed analysis of this vowel in this and other environments awaits.


And, as we did with the constraints in this category with peak /u/, we reanalyze the constraints within the dotted triangle here as dominating Faith as well, a failure precipitated by the high sonority of *R1.

(21)  Rhyme constraints with peak /i/ that are active in American English

      Faith
      *RPR2L1
      *UPT2    *UPN2T1    *UPL2N1    *UPR2L1
      *IPT2    *IPN2T1    *IPL2N1    *IPR2L1
      *IPN2    *IPL2T1    *IPR2N1    *IPU2L1    *I1
      *IPL2    *IPR2T1    *IPU2N1    *IPI2L1    *U1
      *IPR2    *IPU2T1    *IPI2N1    *IPA2L1    *R1
      *IPU2    *IPI2T1    *IPA2N1    *L1
      *IPI2    *IPA2T1    *N1
      *IPA2    *T1

This leaves the active constraints with peak /i/ in (21), including the constraints that violate the SSP in the dashed rectangle, which are bested by a parse with a non-high peak. When a high front vowel is available as a peak, we don't see the same interaction with the complex onset constraints that we did in our discussion of peak /u/. Compare the tableaux for /piuk/ puke and /piik/ peek in (22). Candidate (a) for /piuk/ incurs a fatal violation of the *UPT2 constraint as coop did above, leaving [p1 4P 02 k1] puke as the winning candidate. Peek, on the other hand, falls prey to the preference for parsing /i/ as a peak rather than as part of a complex onset, despite the ranking *T1U2 ≫ *T1I2 discussed in section 2.1.

(22)  /iu/ puke vs. /ii/ peek

      /piuk/          | Faith | *UPT2 | *T1U2 | *T1I2 | *IPT2 | *IPU2T1 | *IPI2T1
      a. p1 j2 UP k2  |       |  *!   |       |   *   |       |         |
    → b. p1 4P 02 k1  |       |       |       |       |       |    *    |
      /piik/          |       |       |       |       |       |         |
      c. p1 j2 IP k2  |       |       |       |  *!   |   *   |         |
    → d. p1 iP j2 k1  |       |       |       |       |       |         |    *


Finally, we come to the constraints in which a non-high vowel is parsed as the syllable peak. A non-high vowel is uncontroversially the best peak vowel. The determination as to which of these vowels are tense/long and which are lax/short is, however, somewhat controversial. Of the mid vowels, [E] and [2] are consistently treated as lax/short and [ej] and [ow] are consistently treated as tense/long. The true diphthongs [aj], [aw], and [Oj] are consistently treated as long. Fudge (1969, 1987), Kessler and Treiman (1997), and Giegerich (1992) consider [æ] to be a lax/short vowel, in contrast to tense/long [A]. Halle (1977) argues for long and short [æ] in contrast to long [A:]. Crystal and House (1988) treat both low vowels as long, with mean durations in the stressed condition of 159 ms. and 140 ms., respectively, against an average of 141 ms. for all long vowels. As a lax vowel, [æ] follows the restriction against syllable-final lax vowels and can be followed by a consonant cluster (talc, scalp, clamp, damp). [A], on the other hand, easily fits the long vowel category based on its mean duration and can end a stressed syllable (spa), but can also be followed by a consonant cluster (tarp, swamp). I treat [A:P2] as a long vowel here (occupying two rhyme slots) and [æP] as a short vowel (occupying a single rhyme slot), but note that the inclusion of a short [AP] or a long [æ:P2] will not change the details of the analysis, and I leave the determination of long/short options of both low vowels for later research.

The constraints and rankings relevant to non-high peaks are given in (23). In (23), we see the best peak segments (mid and low vowels) combining with all of the coda combinations from (13). The non-high lax vowels ([æP], [EP], [2P]) all combine easily with singleton codas (violating *APT2, *APN2, *APL2, *APR2 – cap, deck, dumb, lamb, bell, dull, care, etc.). These peaks also combine very easily with consonant clusters (lamp, hemp, helm, gulp, elf, snarl, corn, harm, pork, etc.), violating the *APN2T1, *APL2N1, *APL2T1, *APR2L1, *APR2N1, and *APR2T1 constraints in the first four columns of the diagram. The non-high diphthongs [ow] and [aw] (which violate the *APU2(X1) constraints in the next column) can end a word (bow, bough, hoe, how, etc.) and they combine with single consonants in M1 position (couch, cove, town, tone, foul, pole, hour), although less well when they violate the highest ranking of these constraints, *APU2R1. Compare here the varied pronunciations of hour, sour, scour, and perhaps flour and dour with one vs. two syllables ([aP w2 ô1] vs. rhyme [aP w2] followed by peak /ô/ ([aP w2 .ÄP2])).24 Note also the relative rarity of (orthographically) one-syllable [aP w2 ô1] words listed above in contrast to the (orthographically) two-syllable words of similar composition (power, tower, cower, bower, shower, flower, etc.).

24. Two research assistants were asked to impressionistically judge the number of syllables in 188 instances of tower or sour. The vast majority of these utterances were judged to be two syllables, with only a few single syllables. This is in contrast to the same task with 94 instances of towel, in which several (approximately one third) were considered to be single syllables.

(23)  Rhyme constraints including peak non-high vowel

      Faith
      *RPR2L1
      *UPT2    *UPN2T1    *UPL2N1    *UPR2L1
      *IPT2    *IPN2T1    *IPL2N1    *IPR2L1
      *APT2    *APN2T1    *APL2N1    *APR2L1    *APU2R1
      *APN2    *APL2T1    *APR2N1    *APU2L1    *API2R1    *I1
      *APL2    *APR2T1    *APU2N1    *API2L1    *APA2R1    *U1
      *APR2    *APU2T1    *API2N1    *APA2L1    *R1
      *APU2    *API2T1    *APA2N1    *L1
      *API2    *APA2T1    *N1
      *APA2    *T1

      (columns 1-4: [æP], [ɛP], [ʌP] plus coda; column 5: [aPw2], [oPw2]; column 6: [aPj2], [ePj2], [ɔPj2]; column 7: [ɑPɑ2])

The non-high diphthongs [aj], [ej], and very low frequency [Oj] (violating constraints in the *API2(X1) column) are also easily attested in word-final position (tie, day, boy, etc.) and are readily followed by singleton consonants (shape, pipe, wave, noise, dime, lame, pail, smile, boil, fire), again with more difficulty violating the highest ranking of these constraints, *API2R1.25 Finally, words violating the *APA2(X1) series (long low vowel followed by singleton consonant) in the last column of conjunctions in (23) are quite common (lock, con, ball, car, etc.). Note at this point that it is only when we get to the best peaks (non-high vowels) that combinations with all of the possible codas (from (13)) are attested.

Sonority and sonority-based relationships

3.

35

Conclusion

The analysis presented here has highlighted many of the sonority-based processes active in the syllabification of stressed monosyllabic words in American English. It captures the phenomena previously discussed (sonority sequencing, sonority slope restrictions) as well as a number of additional phenomena (the ‘worst of the best’) that are tied directly to sonority. In these cases, the sonority threshold in each of the syllable positions is satisfied (onsets may include singleton sonorant consonants, peaks may include syllabic sonorants, etc.) but these segments fail to appear in the same positions in branching constituents. These include the failure of dispreferred but acceptable high sonority onset (M1 ) segments to surface in initial position in onset clusters. Nasal consonants are acceptable singleton onsets and given the most commonly cited sonority scale, a cluster consisting of a nasal followed by a glide would meet the minimum sonority distance requirement of two based on the same scale, yet the cluster cannot surface in an English onset cluster. Here it is the relatively high sonority of the nasal (in comparison to the best onset, an obstruent) that triggers the failure. The situation is similar in the coda. While glides are acceptable singleton codas as the offglides of tense vowels or diphthongs, they fail to surface in coda clusters consisting entirely of non-consonantal segments (which would push the glide into M1 position). In the rhyme, the situation is more complex. The peak is the head of the rhyme and English allows several relatively low sonority segments to surface in the rhyme. Syllabic nasals and laterals surface in English, but only in unstressed syllables and with no coda following. The syllabic rhotic is a much more acceptable peak in English, surfacing in stressed as well as unstressed syllables, and that rhotic can be followed by a singleton consonant but it may not be followed by a consonant cluster. It is only the best peaks (vowels) that can surface with coda clusters as well as singleton codas. And even with rhymes including a high vowel peak, coda clusters ending with [ô] become unacceptable or awkward. In each case, it is the sonority profile of the sequence coupled with the sonority level of the individual segments that trigger the failure, not just the sonority slope. We also see here the competition between syllable positions for individual segments. This is most evident with high sonority high vowels that can surface as singleton onsets (an M1 position), as the second segment in an onset cluster (an M2 position), as a peak, or as a coda segment (an M2 position). As the sonority of the segment increases, so does the pull toward peak position. And, while very high sonority segments may appear in coda position, they are pulled into the peak as sonority increases relative to the preceding segment. Hence, a high front vowel is pulled into the syllable peak rather than surfacing as part of an onset cluster follow-

36

Karen Baertsch

ing an obstruent, but will surface as a singleton onset when a non-high vowel follows. Our discussion has been framed in optimality theoretic terms and is based on the sonority scale in (1). However, that scale represents a much more detailed sonority scale that has been collapsed to highlight only the sonority distinctions most important for the current analysis. In fact, the top and bottom ends of this scale will be further exploded (subject to paper size and screen size, of course) to capture the details of English phenomena such as the schwa deletion phenomena detailed by Hooper (1978). A similar analysis of another language will also use the same scale, perhaps encapsulated in a different way. What is clear from the analysis presented here is that we cannot afford to rely entirely on a ‘steepness of slope’ model of sonority sequencing if we want to capture the sonority governed behavior within the syllable. The model also captures the comparative markedness of clusters in a more abstract way. Take, for example, just the minimal sonority distinctions vowel/ glide (V) > liquid (L) > nasal (N) > obstruent (T) and limit acceptable onset clusters to just the combinations obstruent + X. The model generates the hierarchy in (24) for onset clusters in this case. The implication in this hierarchy is that obstruent-nasal clusters are more marked than obstruent-liquid clusters are more marked than obstruent-glide clusters. Insertion of Faith between *T1 N2 and *T1 L2 (Faith 1 in (24)) would eliminate the possibility of obstruent-nasal clusters (as is the case in English) leaving only obstruent-liquid and obstruentglide clusters as acceptable in the language. Insertion of Faith at the Faith 2 position in (24) would make obstruent-glide clusters the only acceptable clusters in the language. (24)

Onset constraint hierarchy generated from basic sonority scale (Faith 1) (Faith 2) *T1 N2  *T1 L2  *T1 V2 (*V2 …  … *VP )

Adding interaction with the Peak/Rhyme constraints into the analysis, we see the possibility of reducing the clustering possibilities from the lower-ranking end of the hierarchy. The preference for vowels in peak position, shown in the *V2 …  … *VP ranking in (24), is responsible for what amounts to an increase in the markedness of obstruent-glide clusters, resulting in a language that allows only obstruent-liquid clusters and not obstruent-glide clusters. This has also been shown to be at work in English in the asymmetrical distribution of obstruent-glide clusters.

Sonority and sonority-based relationships

37

Such an analysis also need not be framed in optimality theoretic terms, although the approach employed here does have the advantage of making more of the relationships within syllable constituents accessible than earlier approaches. At its base, we simply took the sonority scale ordered to show sonority preference in each syllable position, combined the scales at each syllable node to represent all logical combinations, and mapped that to the syllables found in American English stressed monosyllables. What resulted from this is evidence of the very active role sonority plays in syllabification. Abbreviations OCP Obligatory Contour Principle SSP Sonority Sequencing Principle Acknowledgments My thanks go to Rachel Melvin, Aracelis Gonzalez Johnson, Tanja Burkhard, and John Dotson for discussion and empirical testing of many of the issues discussed here. Thanks as well for the comments I received from reviewers, Steve Parker, and several students as I was writing this chapter. Finally, thanks to all those who agreed to participate in data collection over the past year.

The role of sonority in the phonology of Latin András Cser

Abstract. Sonority plays an important role in the phonology of Latin consonants not only in the static organisation of segments into syllables but also in syllable contact phenomena and in a number of phonological processes, most notably in total assimilation at prefix–stem boundaries. This chapter gives a detailed description and analysis of these three sonority-related phenomena. The analysis of consonant clusters, syllabification and assimilations reveals the details of the functioning of the Sonority Sequencing Principle and the Syllable Contact Law in Latin. It further reveals the details of the interaction between these sonority-related generalisations and place of articulation; the nature of this interaction, in which the special nature of coronals is highlighted, is captured in three generalisations (the Place Condition, the Inverse Place Condition and the Generalised Place Condition). The analysis also provides insight into how sonority as a principle of phonological organisation interacts with morphological composition, specifically in the case of prefixation.

1.

Preliminary

1.1. Aims, scope and framework; sonority This chapter is mainly descriptive in that its goal is to present the role of sonority in the consonantal phonotactics of Classical Latin, not in some specific theoretical framework but in a way that may prove useful to adherents of various models. The three focal points are (i) syllable structure, especially with respect to the peripheral (non-nuclear) constituents, where consonants are found, and the way the Sonority Sequencing Principle1 manifests itself; (ii) the regularities governing the distribution of consonants in heterosyllabic clusters in simplex forms and the way the Syllable Contact Law2 manifests itself; (iii) the assimilations that affect consonants at prefix–stem boundaries and the role sonority plays (with the mediation of the Syllable Contact Law) in these. Furthermore, 1. Hooper (1976), Steriade (1982), Selkirk (1984), Clements (1990), Zec (2007), Parker (2011) amongst others. 2. Hooper (1976), Murray and Vennemann (1983), Vennemann (1988), Zec (2007), Seo (2011).

40

András Cser

in all three points the interaction of sonority with place of articulation is presented in detail and we propose, as descriptive generalisations, three interdependent conditions (the Place Condition, the Inverse Place Condition and the Generalised Place Condition) whose aim is to give a semi-formal expression to particular aspects of this interaction. We proceed in a synthetic manner in that we generalise over data and establish patterns, but we do not have the space to review all of the arguments in detail that the various points of our analysis are based on (references will be duly given). We assume what may be called a traditional constituent-based notion of the syllable (a hierarchical structure consisting of Onset, Nucleus and Coda, the latter subsumed under Rhyme), including the possibility of certain segments being extrasyllabic (e.g. [s] in stare ‘to stand’). We also assume a minimum of morphological structure, distinguishing only between simplex and prefixed forms. Simplex forms are not necessarily monomorphemic (in fact there are very few monomorphemic words in Latin), but they include at most post-stem formatives; they can be defined as forms that are neither compounds, nor prefixed forms, nor reduplicated forms. In this chapter compounds and reduplicated forms are not discussed at all.3 Sonority will be made use of here as a classificatory notion, a scalar property of segments. We assume without discussion, and in line with much of the relevant literature,4 that (i) sonority is a property based on some physical characteristics of speech sounds, most probably intensity (loudness or the amount of acoustic energy), the openness of the vocal tract, formant structure (resonance) and voicing; (ii) there exists a scale along which segments or segment types can be arranged as a function of their particular sonority value. While there is no a priori reason for the sonority scale to be linear, i.e. non-branching, we will work with a traditional linear scale, viz. Vowels  Glides  Liquids  Nasals  Fricatives  Stops. What will be examined here is to what extent such a scale 3. For phonological aspects of Latin reduplication see Cser (2009a). 4. See Parker (2002, 2003, 2008 and 2011), Clements (2009), Lodge (2009: 77–79), Clements and Hume (1995), Blevins (1995: 210–212), Cser (2003: 28–43), Jany et al. (2007), Szigetvári (2008). In Parker (2011) and (2008) a very detailed scale is given with flaps higher, and trills lower, in rank than laterals (see also Parker 2002: 255–257). It will be seen later that there may be some reason to assume that in Latin [r] had higher sonority than [l] (as Steriade 1982 also claims), although it is uncertain whether it was a flap or a trill. Another aspect of the scale in Parker is the ranking of voiced obstruents above voiceless ones in general, thus the ranking of voiced stops higher than voiceless fricatives, though the author admits that “the ranking of voiced stops over voiceless fricatives is harder to justify than most aspects of this hierarchy” (Parker 2011). Here we assume that stops are generally less sonorous than fricatives.

The role of sonority in the phonology of Latin

41

is helpful in making generalisations about syllable structure and the distribution of consonants in Classical Latin. 1.2. Data and the nature of the evidence In terms of data, the present work is based on volume 1 of the Brepols Corpus (CLCLT-5 – Library of Latin Texts by Brepols Publishers, Tombeur 2002). In filtering the data we have confined ourselves to the period between 100 BC and 400 AD.5 We disregard loanwords that were, in all likelihood, not yet “naturalised” in the period under discussion, such as the pn- and ps-initial technical and religious terms borrowed from Greek (e.g. pneumaticus ‘of air or wind’, psalmus ‘psalm’). When metrical issues arise, we use a “poetic sub-corpus”, which is the entire corpus of the poets Lucretius, Catullus, Vergil, Horace, Propertius, Tibullus, Ovid, Silius Italicus, Persius, Lucanus, Martialis, Statius, Valerius Flaccus, and Juvenal.6 In the case of a dead language the availability and the reliability of the data present special problems. It is only one part of the problem that data for Latin exist only in writing. The other part is that even the documents in which the language has been preserved come overwhelmingly from periods later than that in which Latin was actually spoken and in which the originals of these documents were composed. The written sources of Latin thus fall into two major groups. A smaller part has remained from Antiquity without any mediation (papyri and inscriptions). The larger part, by contrast, i.e. manuscripts, have been transmitted via copying by hand, the only method of transmission until the appearance of the printing press. The vast majority of extant Latin texts fall into the latter category and, by consequence, they do not always reflect faithfully the form of texts as originally produced by their authors. What follows from this is that the linguist who studies Latin has to rely on a large corpus of texts that are burdened with varying degrees of uncertainty of an elementary kind. Of course most texts, especially from the classical era, have been restored with high fidelity and very good editions have been around for some time. But one always has to bear in mind that 5. This period is chosen because it fairly consistently represents what is called Classical Latin. Since it is certain that the language did not remain phonologically unchanged even in the aspects relevant here, we disregard data that fall within this period but only occur towards its end. 6. This subcorpus results simply from the appropriate filtering of data in the Brepolscorpus. The poets are listed in chronological order of birth to the extent that it is known. Lucretius was born in the first years of the 1st century BC; Juvenal died some time in the first half of the 2nd century AD.

42

András Cser

some of the data are conjectural, and some of the conjectures are not necessarily right, though most linguists are not in a position to judge these for themselves, especially on a larger scale. In the case of a dead language the question naturally arises what evidence there is for actual pronunciation. For detailed discussion one may consult works like Allen (1978) or Sihler (1995); here we only list the major types of sources. Evidence is based on a variety of data, like much of reconstructive work, in particular (i) direct comments and descriptions by ancient grammarians; (ii) poetic practice: versification, metrification, rhythm, occasional instances of alliteration, rhyming etc.; (iii) puns and onomatopoeic expressions; (iv) inscriptional evidence, which is contemporary (as opposed to manuscript evidence) and often produced by unschooled people who knew little about standardised orthography and were thus more likely to produce phonetically faithful scraps of writing; (v) comparative evidence involving related contemporary languages as well as daughter languages; (vi) transcriptions of Greek words and other borrowings both ways (i.e. Latin words represented in Greek, early Germanic, Celtic etc. as well as Greek words represented in Latin script); (vii) typological aspects of sound systems, implicational relations etc.; (viii) internal aspects of the phonological system (phonotactics, apparent processes of assimilation, lenition and other). A brief note on metrical evidence is in order here, since much of this chapter is concerned with syllable structure and syllabification, for which evidence comes mainly from scansion. The poetic metres used in the Classical Latin period have been researched for centuries and are well known at the empirical level (see e.g. Halporn, Ostwald, and Rosenmeyer 1963 or any of the standard handbooks; see further Allen 1978 and 1973, two very important works on the topic), and they have also been the object of theoretical inquiry (e.g. Fabb and Halle 2008). Everything we presuppose here in terms of metrical interpretation can be found even in introductory textbooks; the outlines are the following. Latin metre is based on systematic alternations of heavy (-VV., -VC(C)., -VVC(C).) and light (-V.) syllables. There were many patterns in use in the Classical period, all modelled on Greek precursors (for which see West 1982 and Devine and Stephens 1994). The metrical patterns reveal unambiguously in most cases which syllables are heavy and which are light (the exceptions include e.g. line-final syllables, which are not strictly delimited by the metre). In the overwhelming majority of cases the length of vowels is known too on independent grounds (see the list above). Thus in many configurations the syllabic affiliation of consonants can be detected on the basis of the relation between vowel length and syllable weight.

The role of sonority in the phonology of Latin

43

For example, the last foot in a hexametre always consists of two syllables, the first of which is heavy, the second indeterminate. If such a foot includes the word p˘ont¯es ‘bridges’, the only possible syllabification of the cluster is [n.t] with a syllable boundary between the two consonants, because the first syllable of the word is heavy, although its vowel is short, thus the [n] has to be its coda. If a word like ˘ımp˘etr¯o ‘I achieve’ is found to constitute a heavy–light–heavy foot, this shows that the cluster [m.p] is heterosyllabic but [.tr] is a complex onset to the third syllable because the second syllable of the word, being metrically light, cannot have a coda. 1.3. The structure of the chapter The segmental inventory is introduced and some of its controversial aspects highlighted in section 2. Static syllable structure is described in section 3, which neatly illustrates the workings of the Sonority Sequencing Principle. Syllable contact phenomena are discussed in section 4, where a restricted operation of the Syllable Contact Law is observed and the Place Condition and the Inverse Place Condition are introduced. In section 5, the role of sonority in consonantal (total) assimilations is discussed and the restricted form of the Syllable Contact Law is demonstrated in this context too, with interesting parallels and differences between consonant clusters within morphemes vs. at morpheme boundaries. It is here that the Generalised Place Condition is introduced. Section 6 concludes the chapter. 2.

2. The segmental inventory of Classical Latin

2.1. Consonants

The surface-contrastive inventory of Classical Latin consonants is the following:

Table 1. The Classical Latin consonant system

                                   labial   coronal   velar   glottal
  obstruents  stops   voiceless      p         t        k
                      voiced         b         d        g
              fricatives             f         s                  h
  sonorants   nasals                 m         n        N
              liquids                         l r
              glides                (w)        j        w


This system is typologically very simple. Voicing is contrastive only for stops; fricatives are redundantly voiceless, sonorants are redundantly voiced. Three places of articulation cover all consonants but one; [h] is the only glottal segment in the system, while there is no velar fricative in the language at all.7 The glide [j] was phonetically palatal, and [w] was labiovelar.8 Since [j] is the only phonetically palatal consonant we list it as a coronal, a categorisation supported by the word-final occurrence of this segment (see below).

7. Since, however, the only reconstructible historical source of Latin [h] is Proto-Indo-European **[gʰ], it is probable that there was, at some point between the two stages, a velar fricative in the system which then developed into [h] (see e.g. Sihler 1995: 158–160 for a classic handbook-type summary; Stuart-Smith 2004 is a work devoted in its entirety to the development of Proto-Indo-European aspirates in Italic, with the problems surrounding Latin [h] discussed on pp. 43, 47 and passim). It must be mentioned at this point that [h] does not condition or undergo any phonological rule in Latin and therefore a complete description would be equally feasible without it (as in Touratier 2005 or Zirin 1970; in the latter work /h/ is appropriated as a phonological symbol for hiatus), which may well be a sign that by classical times this sound was lost completely. The behaviour of (etymological) (V)[h]V is no different in any respect from that of plain (V)V.

8. There is a tradition of analysing glides as positional variants of the respective high vowels (see e.g. Hoenigswald 1949 or more recently Touratier 2005 and Lehmann 2005). While their environments are partly predictable, cases of contrast are far too numerous to be dismissed. Some of these cases can be explained away with reference to morphological structure (vol[w]it ‘he rolls’ vs. vol[u]it ‘he wanted’, where the [u] is a perfective marker, or s[w]avis ‘sweet’ vs. s[u]a ‘his/her’, where the [a] of sua is a feminine marker), but some cases clearly cannot: bel[u]a ‘beast’ vs. sil[w]a ‘forest’, q[w]i ‘who/which’ vs. c[u]i ‘to whom/which’, ling[w]a ‘tongue’ vs. exig[u]a ‘small’, co[i]t ‘(s)he meets’ vs. co[j]tus ‘meeting’, or aq[w]a ‘water’ vs. ac[u]at ‘(s)he should sharpen’; or consider the possibilities of representing the difference between inicere [injikere] ‘throw in’ vs. iniquus [ini:kwus] ‘inimical’. As an anonymous referee of the present paper correctly points out, some of the generalisations to be made later on would be simpler and less laden with exceptions if glides were analysed as vowels rather than consonants and thus did not fall under the purview of consonantal phonotactics proper. Since, however, glides and high vowels are contrastive in many cases, this is not possible. Another way to achieve this kind of simplicity would be to say that while glides are not identical phonologically to high vowels, they are distinguished from all other consonants by some major class feature (e.g. [+/− vocalic]), and then we go on to present only those regularities that govern the distribution of [− vocalic] segments. For the sake of a more complete description we will not pursue this line here, though we readily admit that it would be theoretically interesting.


There seems to be good evidence that the orthographic sequence gn denoted phonetic [Nn] rather than [gn].9 Thus [N] was in almost complementary distribution with [n] (scil. [N] before velar stops, but note annus ‘year’ and agnus ‘lamb’ with contrast between [nn] vs. [Nn]) as well as in almost complementary distribution with [g] (scil. [N] before [n], but note agger ‘heap’ and angor ‘constriction’ with contrast between [gg] vs. [Ng]). It is because of these marginal contrasts that we include the velar nasal in Table 1 above.

There is further evidence that [l] displayed an allophony somewhat similar to that found in British English, viz. it was velarised before consonants and velar vowels, and unvelarised before palatal vowels and in gemination.10

All the consonants except [h], [N], and [w] occur as geminates, though some of them mostly or only at prefix–stem boundaries. In simplex forms geminates are found only intervocalically.11

All consonants except [N] occur word-initially and intervocalically. The fricative [h] (if it still existed at the time) occurs only in these two environments (homo ‘man’, vehere ‘carry’). For [f] word-initial position is its almost exclusive environment (ferre ‘carry’, fur ‘thief’, frangere ‘break’ etc.); in simplex forms, non-initial [f] occurs only in a handful of words such as vafer ‘cunning’.

Word-final consonants are mostly suffixes or parts of suffixes; this follows from the morphological character of the language. Consonants that constitute or end suffixes are [t d s r j], e.g. venit ‘he comes’, feror ‘I am carried’, puellae ‘girls’ ([-aj]). Non-suffixal final consonants are [s r n l] and marginally [t k] in lexical words and some function words (only [t]) and some deictics (only [k]), and [b w] in some proclitics. Thus the descriptive generalisation for word-final consonants appears to involve a marked preference for coronals; definitely no Classical Latin word ends in [p f h g m].12

9. The evidence is surveyed in Allen (1978: 22–25) and most handbooks, and involves the general phonotactic patterns of Classical Latin (on which more will be said later), diachronic developments, inscriptional as well as ordinary spellings and word-plays. Word-initial gn may have retained or regained the archaic pronunciation [gn] in the proper name Gnaeus. The same spelling-pronunciation cannot be excluded word-internally either. Initial gn is discussed in detail in Cser (2011). Note that Vennemann (1988: 38) lists the change [gn] > [Nn] as one that improves syllable contact by raising the sonority of the first consonant.

10. The evidence, summarily discussed in Allen (1978: 23–25) and other handbooks, again comes from grammarians’ remarks, sound changes conditioned by [l] as well as its Romance reflexes. On English l-darkening see e.g. McMahon (2002: 110).

11. Note that the phonology of geminates and the representation of length will not be discussed here. For simplicity we assume that length is not a contrastive feature and geminates are sequences of identical consonants.

The issue of the labiovelars [kw] and [gw] is a complicated one. Without going through the arguments in detail we note that these can be analysed both as single segments with secondary articulation and as clusters; the structural patterns, including the phonotactics of the language as well as morphophonological alternations, are indecisive. In this chapter we assume that they are clusters rather than monosegmental entities, but much of the present analysis would be the same even if we chose the other option.13

2.2. Vowels and hiatus

The surface-contrastive set of vowels is given in Table 2.

Table 2. The Classical Latin vowel system

             short           long            nasal
          front   back    front   back    front   back
  high      i       u       ī       ū       ĩ       ũ
  mid       e       o       ē       ō       ẽ       õ
  low           a               ā               ã

The Classical Latin vowel system consists of three parallel sets of five vowels: short, long oral and long nasal. While nasal vowels contrast with oral vowels on the surface (des [de:s] ‘you should give’ vs. dens [dẽ:s] ‘tooth’), their distribution is generally predictable with respect to the two nasal consonants. Nasal vowels occur in two environments, finally and before fricatives. Since [m] does not occur in word-final position, and [n] does not occur before fricatives, nasal vowels can be analysed as the positional variants finally of [Vm], and in pre-fricative position of [Vn]. If one analyses nasal vowels as derived through the dropping of a coda nasal consonant with compensatory lengthening (which actually replicates their history), their invariable length is explained, and so are the alternations they enter into (not to be discussed here).

Virtually all discussions of the Classical Latin vowel inventory include a number of complex entities traditionally referred to as diphthongs. These are ae [aj] (a͜i etc.), oe [oj] (o͜i), au [aw] (a͜u), for some also ei [ej] (e͜i), eu [ew] (e͜u), ui [uj] (u͜i), even ou [ow] (o͜u). This practice goes back to a terminological and notational tradition in which a glide that is tautosyllabic with a preceding vowel is said to form a diphthong with it. Such an approach is not only dated now but was already inconsistent before the appearance of modern phonological analysis, not least because it introduced an unwarranted distinction between prevocalic and postvocalic glides. A consistent phonological analysis can hardly support a view of the Classical Latin vowel system that postulates diphthongs in the strong sense of the word, i.e. complex entities that are functionally equivalent to “pure” vowels, or at least to a significant subgroup of them, e.g. long vowels.14

Two heterosyllabic vowels may be adjacent in simplex forms under certain restricted circumstances. Two constraints are very general and almost exceptionless: (i) the first vowel is short (and non-nasal); (ii) the second vowel alternates. To the first constraint there are three sorts of exceptions: one involves the disyllabic forms of the verb fieri ‘to become, happen’, e.g. fīō ‘I become’, fīunt ‘they become’, another the pronominal genitive suffix -īus (as in illīus ‘his’), the third a handful of genitive-dative forms belonging to the fifth declension, most notably diēī ‘day’.15 Exceptions to the second restriction consist in a handful of words with [ie] and [ue] (hiems ‘winter’, abies ‘pine tree’, paries ‘wall’, puer ‘boy’, puella ‘girl’, duellum ‘war’).

As for the melodic content of the vowels in hiatus, restrictions only apply to the first vowel, the choice of the second being governed by paradigmatic regularities. The first vowel can be [u i e]; of the remaining two vowels, [o] is never found on the left of a hiatus; [a] is found in aeneus [ae:neus] ‘bronze’ (adjective) and ait ‘he said’. Vowels separated by [h] show exactly the same regularities, with a single word, trahere ‘to drag’, having [a] in hiatus, and the interjection ēheu ‘alas’ a long vowel before h. Hiatus rules largely prevail in prefixed forms and compounds too.

12. Word-final orthographic m merely indicates the nasalisation (plus lengthening) of the preceding vowel, rather than phonetic [m].

13. Devine and Stephens (1977: 13–104) is by far the most detailed discussion of the Latin labiovelars to date. For a classic summary of some of the arguments, to which later “phonemic” analyses hark back, see Sturtevant (1939). For a less thorough but astute survey see Zirin (1970: 29–40). See also Watbled (2005) and Touratier (2005). The whole issue is thoroughly reviewed in Cser (2009b: 14–23).

14. Functional equivalence involves two aspects: (i) the entity in question is phonotactically equivalent to a vowel, i.e. it only occupies the syllable nucleus and generally patterns with vowels phonotactically; (ii) it is equivalent to a vowel in terms of alternation patterns and generally in terms of triggering and undergoing phonological rules. The arguments will not be presented in detail here; the interested reader is referred to Cser (1999), in even more detail Cser (2009b: 25–31).

15. The rule is that in these genitives/datives the [e] is short if preceded by a consonant, as in rĕī ‘thing’, but long if preceded by a vowel (in fact always [i]), as in diēī ‘day’. This phenomenon may be thought of as a ban on the double application of the short-vowel-in-hiatus rule (viz. short [e] because of the following [i:], and then short [i] because of the following [e]).


3. Syllable structure

Most of the evidence for syllable structure comes from the inspection of attested consonant clusters and their behaviour in verse. As was briefly explained in 1.2, versification in Antiquity was based on the systematic alternation of heavy and light syllables arranged into a variety of patterns. If the length of vowels in individual morphemes is known independently, as it usually is, the weight of specific syllables indicates the syllabic affiliation of most consonants. It is clear, for instance, that not all CC clusters in the context V__V behave in the same way. Thus, if one contrasts the two words in (1), patres ‘fathers’ and hostes ‘enemies’, with vowel length and usual scansion indicated, one sees that the two clusters behave differently:16

(1)             pătrēs    hŏstēs
     scansion:   ∪ —       — —

16. The arc indicates a light syllable and the horizontal line a heavy syllable.

On the basis of metrical evidence, [tr] thus turns out to be an onset cluster, whereas [st] turns out to be – at least internally – a heterosyllabic cluster, since the metrical difference between the two words can only be explained with the syllabification pă.trēs vs. hŏs.tēs – in spite of the fact that both [tr] and [st] are found word-initially too (tres ‘three’, stare ‘stand’).

3.1. The general template

The general syllable template, which describes a hypothetical maximal monosyllabic word, is given in (2).

(2) The syllable template of Latin

                          σ
               ___________|___________
              |                       |
             Ons                      Rh
              |                 ______|______
              |                |             |
              |               Nu            Co
              |                |             |
     [s]  [obs] [son]        V  V      [son] [obs]  [s]

    (The peripheral [s] positions are not dominated by the σ node.)


This template is too general, however, and several restrictions apply to the specific positions it contains. These restrictions can be summarised and illustrated as follows (discussion of the [s] on the peripheries not dominated by the σ node is found in the next subsection).

A non-complex onset, whether initial or post-vocalic, may include any of the consonants (as given in Table 1, except the velar nasal). Internal post-consonantal onsets are more restricted; these restrictions will be seen in more detail in 3 below.17

A complex onset always consists of an obstruent and a non-nasal sonorant and thus conforms to the Sonority Sequencing Principle. More specifically, these clusters include stop+liquid sequences (plenus ‘full’, a.cris ‘sharp’, but note that stop+[l] is much rarer than stop+[r]18), [fr] and [fl] (frater ‘brother’, a.fra ‘black’, flamma ‘flame’, the latter only initially), [sw] (suadere ‘persuade’, only initially), [kw] (quis ‘who’, a.qua ‘water’) and [gw] (san.guis ‘blood’, only internally after [N]). Nasals are not found in complex onsets at all.

As is typical cross-linguistically, the choice of coda consonants is more restricted than that of onset consonants in Latin. Word-final codas, whether simple or complex, overwhelmingly include coronal consonants (e.g. pecten ‘comb’, vult ‘he wants’, sunt ‘they are’, pars ‘part’). The only exceptions are a small set of [k]-final deictics (e.g. hic ‘this’, istaec [istajk] ‘this’, hinc ‘from here’), two [ws]-final nouns (laus ‘praise’, fraus ‘deception’), the noun hiems ‘winter’ and two function words (aut [awt] ‘or’, haud [hawd] ‘not’).

Internal codas, by contrast, clearly prefer sonorants, non-coronal and voiceless obstruents. More specifically, all seven sonorants [m n N l r j w] are found as single coda consonants word-internally, but of the obstruents systematically only [p k s]. The voiced obstruents, the coronal stops, [f] and [h] are generally incompatible with the coda position.19 Complex internal codas are sequences of a sonorant and [p] or [k] (e.g. carp.tus ‘picked’, mulc.tus ‘milked’, emp.tus ‘taken’, sculp.si ‘I shaped’), but [js] and [ws] are also found (caes.pes ‘lawn’, faus.tus ‘favourable’), and the badly irregular hapax cluster [wg.m] of aug.men(tum) ‘growth’ includes the single occurrence of coda [wg].20 As sonorant+obstruent sequences, complex codas are in conformity with the Sonority Sequencing Principle. The only exception to the sonorant+obstruent structure (but not to the Sonority Sequencing Principle) is the final coda [st], found in four words altogether (e.g. est ‘is’).

17. For a generalisation involving the preferred similarity of initial and internal onsets see Vennemann’s Law of Initials (Vennemann 1988: 32).

18. The asymmetry between [l]-final and [r]-final clusters may be due to a difference in the respective sonority values of the two liquids combined with the Minimal Sonority Distance effect, which is suggested in Steriade (1982), and also in Parker (2008 and 2011), though in Parker’s sonority scale this would only work if [r] was a flap rather than a trill, since trills have a lower sonority value than [l].

19. What this means is that final clusters bear more resemblance to heterosyllabic clusters than to medial coda clusters, which is equivalent to saying that final consonants bear more resemblance to onset consonants than to coda consonants. For reasons of space we do not go into this issue in detail but note that this point has been made with respect to other languages; see Harris (1994: 73–4) on English as well as French. In Government Phonology and some other models it is explicitly argued that final consonants are onsets. For a survey of these arguments see Gussmann (2002: 91ff). For important theoretical insights on consonant clusters see also Cyran (2008).

20. By hapax we mean it is only attested in this lexical item. It is also phonologically irregular, on account of the [gm] section; see section 4.

3.2. Extrasyllabic [s]

Of all the consonant clusters found in Latin, one type does not fit into the template given in (2) above. This includes an [s] flanked by stops and/or a word boundary. In particular, this means initial [sp st sk] (e.g. spirare ‘breathe’, stare ‘stand’, scire ‘know’), medial [kst pst] (dexter ‘right’, depstum ‘pastry’) and final [ps ks] (ops ‘help’, rex ‘king’). The problematic status of such clusters, especially at word edges, has long been noticed in the general phonological literature (e.g. Lowenstamm 1981). Specifically in Latin, these clusters show the following peculiarities.

(i) They are the only clusters that do not conform to the Sonority Sequencing Principle. The [s] represents a relative sonority peak between two stops as well as between a stop and a word boundary in any order.

(ii) Final (sonorant +) stop + [s] clusters are highly restricted in that they only appear when a specific inflectional suffix (nominative singular -s) is added to stop-final stems.

(iii) Initial [s] + stop (+ sonorant) clusters are paralleled by medial clusters of the same segmental composition which are clearly heterosyllabic (see the example hos.tes ‘enemies’ vs. stare ‘stand’ above).

(iv) If a vowel-final prefix is added to a simplex form with initial [s] + stop (+ sonorant), leftward resyllabification takes place and the [s] becomes the coda of the preceding syllable. This never happens when obstruent + sonorant initial simplex forms are prefixed (e.g. re+stare → res.tare ‘remain’ but re+trahere → re.trahere ‘pull back’). Unambiguous evidence for this comes from poetic metre, e.g. restant ‘they remain’ always scans as two heavy syllables, while retrahunt ‘they pull back’ as light–light–heavy.

(v) Three verbs with initial [s] + stop have a reduplicated perfect, which other CC-initial verbs never have. The three verbs are stare ∼ steti ‘stand’, spondere ∼ spopondi ‘promise’ and scindere ∼ preclassical scicidi ‘cleave’. Notice that reduplication only copies the stop portion of the cluster, not the [s].21

We think the evidence is compelling enough to regard these (but only these) occurrences of [s] as extrasyllabic, a solution we suggested earlier (Cser 1999: 178).22 A consequence of this is that in the clusters in question only some consonants belong to either the coda or the onset, viz.
– s [pr]Ons in words like spretus ‘disdained’,
– [p]Co s in words like ops ‘help’,
– [k]Co s [tr]Ons in words like extra ‘outside’,
and so on. It is clear that if we did not have recourse to extrasyllabicity, any description of syllable structure and any taxonomy of the consonant clusters, which underlies such a description, would be much more complex – so long, of course, as one remains within a hierarchical model like the one used here.23
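The division of labour between the template in (2) and extrasyllabic [s] can also be stated procedurally. The following sketch is ours, not a rendering of any formal machinery proposed in this chapter; the segment classes and function names are simplifying assumptions. It parses a word-initial cluster into an extrasyllabic [s] plus a template-conforming onset.

    # Toy parse of word-initial clusters into (extrasyllabic [s], onset) per
    # the template in (2): an onset is at most obstruent + non-nasal sonorant,
    # with a possible [s] left outside the syllable proper. Illustration only.

    OBSTRUENTS = set("ptkbdgfsh")
    NONNASAL_SONORANTS = set("lrjw")    # nasals never occur in complex onsets

    def parse_initial(cluster):
        """Split a word-initial consonant cluster; raise on an illegal onset."""
        extra = ""
        if cluster.startswith("s") and len(cluster) > 1 and cluster[1] in OBSTRUENTS:
            extra, cluster = "s", cluster[1:]   # s before a stop: extrasyllabic
        ok = (len(cluster) <= 1
              or (len(cluster) == 2
                  and cluster[0] in OBSTRUENTS
                  and cluster[1] in NONNASAL_SONORANTS))
        if not ok:
            raise ValueError("ill-formed onset: " + cluster)
        return extra, cluster

    print(parse_initial("st"))     # ('s', 't')   as in stare
    print(parse_initial("spr"))    # ('s', 'pr')  as in spretus
    print(parse_initial("tr"))     # ('', 'tr')   as in tres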

4. Syllable contact

Heterosyllabic, i.e. coda–onset, clusters in Latin simplex forms are overwhelmingly in conformity with the Syllable Contact Law; that is, the last segment of the syllable on the left has higher sonority than the first segment of the following syllable.24 The asymmetry of most permitted heterosyllabic cluster types is clear. For instance, [s] can only be followed by voiceless stops (hos.pes ‘host’, hos.tis ‘enemy’, cres.cere ‘grow’); nasals can be followed by stops (an.te ‘before’) but never by liquids; stops can never be followed by nasals or glides ([kw] and [gw] are not heterosyllabic and [gm] is exceptional, see below); liquids can be followed by stops, nasals and [s] (al.bus ‘white’, pul.sus ‘beaten’, cer.nere ‘see’, ul.mus ‘elm-tree’) but not by another liquid (*[rl lr]) or the glide [j] (*[lj rj]).

At the same time, there is an interesting interplay between sonority and place of articulation. In a sequence of a non-coronal and a coronal consonant (in this order) in the lower half of the sonority scale, C1 does not have to be of higher sonority. Four of the five permitted equal-sonority clusters are of this kind (ap.tus ‘fit’, ac.tus ‘done’, am.nis ‘river’, di[N].nus ‘worthy’), as are the two permitted stop+fricative clusters (ip.se ‘himself’, ve[k.s]i ‘I carried’). This may be formalised in the following way:

(3) The Place Condition
    Heterosyllabic [obs][obs] and [nas][nas] clusters are well-formed irrespective of sonority relations if C1 is non-coronal and C2 is coronal (i.e. [pt kt ps ks mn Nn] are well-formed). If C2 is non-coronal, only sonority relations are decisive (i.e. [sp sk] are well-formed, *[tk tp pk kp] are not).

21. Also note the imperfective reduplication (even more moribund morphology in Latin than perfective reduplication) sistere ‘stop’, from the same root as stare.

22. In Lehmann (2005: 168 and passim) the same position is called pre-initial and post-coda. An advantage of referring to it as extrasyllabic is that in internal [pst] [kst] one does not have to decide whether the trapped [s] is pre-initial or post-coda, which would be possible only on an ad-hoc basis. More recently see Vaux and Wolfe (2009) and Green (2003a) on extrasyllabicity in general and Morelli (2003) on s+stop clusters.

23. Internal extrasyllabic [s] is not allowed by the Peripherality Condition (Roca 1994: 213). Note, however, that Rubach and Booij (1990) argue for unsyllabified stray consonants in word-medial position in Polish; similarly, Goad (2011) argues that in English, words like extra include an internal “appendixal” [s], though the appendix is adjoined to the syllable node, not the word node. If we do not wish to explicitly reject the Peripherality Condition, we may note that internal extrasyllabic [s] is almost always followed by a t-initial suffix (e.g. depstum ‘pastry’, cf. depsere ‘to knead’) and thus we may argue that it is at the periphery of a morphological domain. This is a point that will not be pursued here.

24. More recently, the role of sonority in the development of consonant clusters in Late Latin was analysed in Gess (2004), where the validity of essentially the same generalisation is demonstrated in an Optimality Theoretic framework.

On the other hand, the only remaining clusters not in conformity with the Syllable Contact Law, [rw lw jw] (par.vus ‘small’, sil.va ‘forest’, ae.vi [aj.wi:] ‘age’ [genitive]), are all coronal+non-coronal sequences and thus show the opposite distribution of place (e.g. a good parallel to the [mn]-type would be the non-existent cluster *[wj]). It thus seems that in the upper half of the sonority scale the opposite of the above condition holds:

(4) The Inverse Place Condition
    Heterosyllabic clusters consisting of non-nasal sonorants are well-formed irrespective of sonority relations if C1 is coronal and C2 is non-coronal (i.e. [lw rw jw] are well-formed). If C2 is coronal, only sonority relations are decisive (i.e. [wr wl jr jl] are well-formed, *[rl lr lj rj wj] are not).
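Conditions (3) and (4) interact with the Syllable Contact Law to delimit the heterosyllabic CC clusters of simplex forms. The following compact restatement is our own sketch: the five-point sonority scale, the one-letter segment symbols (N = the velar nasal) and the binary coronal class are simplifying assumptions, and the further coda restrictions discussed above (e.g. on voiced obstruents) are deliberately ignored.

    # Well-formedness of heterosyllabic C1.C2 clusters combining the Syllable
    # Contact Law with the Place Condition (3) and Inverse Place Condition (4).

    SONORITY = {**dict.fromkeys("ptkbdg", 1),   # stops
                **dict.fromkeys("fsh", 2),      # fricatives
                **dict.fromkeys("mnN", 3),      # nasals
                **dict.fromkeys("lr", 4),       # liquids
                **dict.fromkeys("jw", 5)}       # glides
    CORONAL = set("tdsnlrj")

    def obstruent(c): return SONORITY[c] <= 2
    def nasal(c):     return SONORITY[c] == 3
    def nonnasal_sonorant(c): return SONORITY[c] >= 4

    def heterosyllabic_ok(c1, c2):
        if SONORITY[c1] > SONORITY[c2]:                 # Syllable Contact Law
            return True
        if (obstruent(c1) and obstruent(c2)) or (nasal(c1) and nasal(c2)):
            return c1 not in CORONAL and c2 in CORONAL  # Place Condition (3)
        if nonnasal_sonorant(c1) and nonnasal_sonorant(c2):
            return c1 in CORONAL and c2 not in CORONAL  # Inverse Place Cond. (4)
        return False

    for c1, c2, expected in [("p","t",True),  ("k","s",True),  ("N","n",True),
                             ("l","w",True),  ("t","p",False), ("r","l",False)]:
        assert heterosyllabic_ok(c1, c2) == expected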

The Place Condition encapsulates an empirical observation that has been made in the literature earlier about a variety of languages. Bailey (1970) briefly discusses metathesis as a diachronic change and generalises that it preferably results in non-coronal+coronal clusters. Clements (1990: 311–314) discusses the issue and the proposals made earlier to explain such effects by assigning lower sonority to coronals than to non-coronals of identical manner of articulation. He rejects such a solution because it leads to conflicting generalisations and opts instead for an explanation based on markedness, i.e., [t] being simpler than other voiceless stops, it is freer to occur in a variety of positions (see also de Lacy 2006 on markedness in general and place of articulation markedness in particular). This, however, does not in itself explain preferred sequential orderings (in other words, it explains why [pt] and [tp] are preferred to [kp] or [pk] but does not explain why [pt] is preferred to [tp]). The same is true of Booij’s (1995: 44–46) analysis of a similar preference in Dutch clusters, which is based at this point on Yip (1991). If coronals had lower sonority than labials and velars within the same manner class, cases like [pt] and [kt] could be accounted for, since these would be falling-sonority clusters, but the possibility of [ks] and [ps] would be left unexplained. Furthermore, as Steriade (1982) points out, the lack of complex onsets [tl] [dl] can actually be an argument for the higher, rather than lower, sonority of [t] [d], because the smaller sonority distance between coronal stops and [l] makes these clusters worse than e.g. [pl] [kl].

The following table summarises the distribution of heterosyllabic clusters in simplex forms as a function of sonority. As in the discussion so far, geminates are disregarded (e.g. the liquid+liquid box is left empty in spite of legitimate and numerous geminate liquids). Cluster types that are generally well-formed are marked with (2) and a chequered box. Cluster types that only admit non-coronal+coronal sequences (i.e. comply with the Place Condition) are marked with (1) and horizontal lines. Cluster types that only admit coronal+non-coronal sequences (i.e. comply with the Inverse Place Condition) are marked with (3) and vertical lines.25 The empty top right-hand half of the table vs. the full bottom left-hand half (marked 2) shows the validity of the Syllable Contact Law.

A couple of comments are in order at this point. One of these concerns fricatives. The segment [f] does not take part in word-internal clusters except for a few words with [fr] (e.g. vafritia ‘cunning, wit’). This cluster appears to vacillate between tautosyllabicity, i.e. being a complex onset, and heterosyllabicity, which means that the cluster type fricative+liquid is marginally attested. The other fricative [s] is never found before voiced consonants (at least internally; initial [sw] is attested in three stems and their derivatives).

25. While here we only give examples of CC clusters, it is to be understood that the C.C portions of larger clusters (CCC and the very rare CCCC clusters) display the same properties. That is, a cluster like [n.tr] or [m.pl] belongs in the nasal+stop category here, [Nk.t] or [lk.tr] in the stop+stop category, and so on.


Table 3. Heterosyllabic cluster types in simplex forms

[shaded grid not preserved in this extraction]

Legend:
1 – Place Condition
2 – Syllable Contact Law
3 – Inverse Place Condition
empty box – no cluster attested

The other comment concerns stop+sonorant clusters. While stop+liquid clusters are overwhelmingly tautosyllabic, i.e. complex onsets, there is some token-level variation in poetic practice. The history of this variation has been well researched, and the consensus is that the (occasional) heterosyllabic scansion of stop+liquid clusters was introduced in the wake of Greek models.26 It is to be noted that while it is customary to speak of stop+liquid (muta cum liquida) clusters as a single type, there is, in fact, a major asymmetry between the [r]-final and the [l]-final subtypes. Clusters of the form C[l] appear to be generally rarer, and are somewhat problematic because of the widespread diachronic vacillation between CV[l] and C[l] forms (e.g. periculum ∼ periclum ‘danger’), which resulted from the conflict between the early tendency to insert a short vowel in the environment C_[l] and the somewhat later tendency to syncopate unstressed vowels in internal open syllables.27

Two specific stop+liquid clusters present difficulties of a rather elementary sort: [dr] is lexically very rare (and appears to be mostly heterosyllabic),28 and non-initial [kl] is found consistently only in one word (periclitari ‘try’) which, however, is never used in poetry, and it is thus impossible to assess the syllabification of this particular cluster. The only stop+glide clusters, [kw] and [gw], are always tautosyllabic. There is, furthermore, one attested stop+nasal cluster, the highly exceptional [g.m]. This cluster is odd because it is (i) the only consistently heterosyllabic and rising-sonority cluster, (ii) the only instance of a coda voiced stop apart from the word-final [d] of neuter pronouns, (iii) the only instance of a post-obstruent nasal and (iv) only ever found in words derived with the suffix -men(tum), e.g. agmen ‘train’. Therefore it can be regarded as truly marginal, hence the empty box in the table above.

A final, brief note concerns nasals. Nasals are not found before fricatives – although sonority relations would not militate against this configuration – except for the single lexical item hiems ‘winter’ (see above the discussion of nasal vowels). In hiems, however, the cluster [ms] is not heterosyllabic but a complex coda.

26. For a good summary see Timpanaro in EV 232–235 and Sen (2006); in a different context, see Hoenigswald (1992), who argues that morphological structure played a part in the syllabification of such clusters at least in early Latin.

27. For a good discussion of the data see Ward (1951) and Sen (2006).

28. The only Latin words containing medial [dr] attested in poetry are derivatives and compounds based on quădr- (e.g. quadratus ‘divided into four parts’, quadrupes ‘four-legged’, quadriiuga ‘drawn by four horses’). Such words occur 127 times in the entire poetic corpus. Out of the 127, 113 (= 89%) scan with a heavy first syllable, indicating a heterosyllabic cluster. This may result from the fact that otherwise the vast majority of words with [dr] are either Greek names/loans (e.g. Hadria, cedrus ‘cedar’, hydrus ‘water-serpent’), where heterosyllabicity of any stop+liquid cluster is the (borrowed) norm, or prefixed forms (e.g. adripere ‘grasp’), where heterosyllabicity is a phonological rule of Latin (see below). The rarity of [dr] in simplex forms, its absence from word-initial position coupled with the pull of the Greek pattern and the syllabification of prefixed forms apparently led to a preference for a heterosyllabic analysis of this cluster even in native words.

5. Assimilations

5.1. Prefixed forms in Classical Latin

In this section we look at consonantal assimilation phenomena that occur at prefix–stem boundaries, the only type of morphological boundary that is relevant here. It will be seen that these assimilations are driven largely by sonority and result in clusters that are in conformity with the generalisations made in earlier sections. Furthermore, similarly to the patterns seen in simplex forms, an interesting interplay between sonority and place of articulation is observed.

A brief discussion of some aspects of resyllabification is in order here, since it involves the reorganisation of syllable structure at such morpheme boundaries (and, of course, at word boundaries, which will not be discussed here). When a prefix is added to a stem, the assignment of the segments at the boundary to syllabic positions shows the following very general regularities in those cases when there is no segmental change.

– A prefix-final consonant is syllabified as onset to the first syllable of the stem if and only if without it there would be no onset to that syllable: ab+ire → a.bi.re ‘go away’ vs. ob+ruere → ob.ru.e.re ‘bury’, in spite of simplex fe.bris ‘fever’ etc.

– A stem-initial [s] that would be extrasyllabic in an unprefixed form is syllabified leftwards if a well-formed coda results: re+stare → res.ta.re ‘remain’ (but ob+stare → ob.s.ta.re ‘obstruct’ with the [s] remaining extrasyllabic).

– There is no leftward syllabification of other stem-initial consonants: re+fractus → re.frac.tus ‘broken’.29

What this shows is that syllabification cannot override the syllable structure that is created within morphological boundaries except when onsetless syllables would result following a closed syllable (as in a.b+i.re). The assignment of stem-initial extrasyllabic [s] to the prefix-syllable (as in re+s.ta.re) does not contradict this, since that [s] is not incorporated into the syllable structure of the stem.30 If we assume that syllable structure is present in the individual morphemes before concatenation, this process of resyllabification can be represented as follows:

(5) Resyllabification at prefix–stem boundaries

    a)  [a b]σ + [i i]σ [r e]σ  →  [a]σ [b i i]σ [r e]σ
        (ab + īre → a.bī.re: the prefix-final consonant becomes the onset
        of the stem’s initially onsetless first syllable)

    b)  [r e]σ + {s}[t a a]σ [r e]σ  →  [r e s]σ [t a a]σ [r e]σ
        (re + stāre → res.tā.re: the extrasyllabic {s} is incorporated
        as coda of the prefix syllable)

    (In the original, each form is drawn as a tree in which σ dominates
    Ons and Rh, and Rh dominates Nu and Co; long vowels occupy two
    nucleus slots, and {s} is attached to no syllable before concatenation.)

29. Despite appearances, the prefixation of gn-initial stems (e.g. ignoscere ‘forgive’ from in + gnoscere) does not involve either nasal deletion or leftward resyllabification. For a detailed discussion see Cser (2011).

30. Note that this account of the interaction between syllable boundaries and morphological boundaries is considerably simpler than that presented in Devine and Stephens (1977: 136–8), where a complex interplay between phonological processes and morphological domains is assumed.
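The three syllabification regularities of 5.1 lend themselves to a procedural summary. The sketch below is ours, not a formal proposal: morphemes are given as '.'-syllabified strings, ':' marks a long vowel, and '{s}' marks an extrasyllabic [s] – all assumed conventions for illustration.

    # Toy resyllabification at a prefix-stem boundary, per 5.1:
    # (a) a prefix-final consonant becomes an onset only if the stem begins
    #     with a vowel; (b) stem-initial extrasyllabic [s] attaches leftwards
    #     as a coda if the prefix ends in a vowel; (c) otherwise no change.

    VOWELS = set("aeiou:")

    def prefix(pfx, stem):
        """Concatenate syllabified strings; '.' separates syllables."""
        if stem[0] in VOWELS and pfx[-1] not in VOWELS:
            # ab + i:.re -> a.bi:.re : final consonant supplies the onset
            return pfx[:-1] + "." + pfx[-1] + stem
        if stem.startswith("{s}") and pfx[-1] in VOWELS:
            # re + {s}ta:.re -> res.ta:.re : [s] becomes a coda
            return pfx + "s" + "." + stem[3:]
        return pfx + "." + stem        # e.g. re + frak.tus -> re.frak.tus

    print(prefix("ab", "i:.re"))       # a.bi:.re
    print(prefix("re", "{s}ta:.re"))   # res.ta:.re
    print(prefix("ob", "{s}ta:.re"))   # ob.{s}ta:.re ([s] stays extrasyllabic)
    print(prefix("re", "tra.he.re"))   # re.tra.he.re (no leftward movement)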


The tangled story of prefixed forms will not, in general, be pursued here. We nevertheless note some salient features because an understanding of these is indispensable for what follows in this section. Prefixation led in many cases to lexicalisation, which in turn resulted in drastic phonological modifications at the prefix–stem boundary as well as within the stem.31 The pace and the extent of lexicalisation, however, was highly variable. Furthermore, prefixation also involved recomposition in all periods of the documented history of Latin. An early case is seen in perjūrare ‘forswear’, which is the recomposed variant of the older form peierare ([pejjera:re], same meaning). Later recompositions can be reconstructed on the evidence of Romance languages; it is well known that reflexes of forms like rétinet ‘he keeps’ (< re + tenet) often derive not from the inherited Classical Latin forms but from recomposed variants such as **reténet (> French retient etc.).

The varying pace of lexicalisation and the varying degree of transparency, coupled with the phonological processes that took place at prefix–stem boundaries, resulted in a cline of productivity and phonological interference, with strongly lexicalised and opaque forms at one end and transparent formations at the other. The situation is further aggravated by the etymologically oriented habits of spelling that began to gain ground beginning with the 1st century AD but did not affect all words of a similar composition to the same extent. The net result is that many of the phonological processes to be described here are variable, and some are quite difficult to decipher because of their equivocal attestation.32 It is also very difficult to say, for the same reason, to what extent prefixed words were prosodically integrated, an issue that will not be discussed here (apart from syllabification).

31. Within stems, these phonological modifications are virtually confined to short vowels; a discussion of these is beyond the limits of the present work.

32. The best summary of these issues to date is Prinz (1949–50 and 1953), which is based on an extensive study of manuscript and inscriptional evidence as well as grammarians’ remarks; one may further consult Leumann (1977: 181–219) on the sound changes that took place in consonant clusters, including those that emerged at prefix–stem boundaries, Buck (1899) on the assimilation of prefix-final consonants, and García González (1996), a short case-study of the prefix ad- and its epigraphic variants based on the Roman inscriptional corpus (Wilhelm 1876). For a thorough summary see Cser (2009b: 64–88).

5.2. Assimilations at prefix–stem boundary

The particular prefixes that we look at here are the following: ad- ‘to’, ob- ‘against’, sub- ‘under’ (stop-final); dis- ‘away’, trans- ‘across’ ([s]-final); con- ‘with’, in- ‘in’ or ‘un-’ (nasal-final); per- ‘across’ or ‘very’, super- ‘above’, inter- ‘between’ ([r]-final); and prae- ‘forward’ ([j]-final). The remaining prefixes are either vowel-final and thus irrelevant (e.g. de- ‘about’, pro- ‘forward’), or attested in too few lexical items to be of interest (e.g. por- ‘forward’, an- ‘around’), or do not show any relevant kind of alternations (e.g. subter- ‘under’, circum- ‘around’, post- ‘after’), or show alternations that fall outside the scope of phonologically based generalisations (e.g. ab-/abs-/as-/a-/au- ‘from’). No prefix ends in a voiceless stop or [g], or in [f h m l w].

The assimilations that take place between the prefix-final and the stem-initial consonant are always regressive, i.e. the first consonant assimilates to the second, e.g. [db] > [bb].

Proceeding in order of decreasing sonority of the prefix-final consonant, the first item is prae- [praj]. The final glide does not appear to undergo assimilation to any kind of consonant. Before vowel-initial stems it is resyllabified as one would expect, e.g. praeacutae [pra.ja.ku:.taj] ‘sharpened to a point’.

The only liquid found in prefix-final position is [r]. This shows occasional assimilation to [l] (intelligere ‘understand’ from inter+legere or perlucere ∼ pellucere ‘transmit light’), and isolated assimilation to [j] in a single item (peierare [pejjera:re] ‘forswear’ from per+jur-, see above). The liquid [r] remains unchanged before stops, fricatives, nasals and [w] in all cases (pertulit ‘he carried through’, perfert ‘he carries through’, pernoscere ‘to know thoroughly’, pervadere ‘to walk across’).

Nasals undergo place assimilation but no manner change before stops. Before fricatives they are dropped with compensatory lengthening and nasalisation of the preceding vowel, but direct, inscriptional and other kinds of evidence show that vacillation must have been great (inscius [ĩ:skius] ‘ignorant’, conferre [kõ:ferre] ∼ [ko.Mferre]? ‘carry’). Total assimilation of the nasal to liquids is found quite frequently (e.g. corrigere ‘correct’ or irrigare ‘make wet’, actually more frequently for con- than for in-). Before glides, nothing seems to happen to the nasal, with one exception: conjicere ‘throw’ is attested more than sporadically as coicere, which suggests a totally assimilated nasal, i.e. [kojjikere]. Other words with glide-initial stems are occasionally found with no orthographic n in inscriptions (e.g. coiux ‘spouse’). Very rarely com is written instead of con before [w]-initial stems (e.g. comvivia ‘banquets’).

Of the fricatives, only [s] is found prefix-finally. This sound is not affected by any change before stops, and undergoes place assimilation to [f] (at least in dis-, e.g. differre ‘scatter’, but transferre ‘take across’ apparently with no assimilation). Before all voiced consonants (a fortiori before all sonorants) the [s] of dis- is regularly deleted with compensatory lengthening of the preceding vowel (e.g. dīluere ‘wash away’, dīmittere ‘send away’, dījudicare ‘judge’). The [s] of trans- is deleted in the same way only before voiced coronals (e.g. trādere ‘hand over’, trālucere ‘transmit light’ vs. transversus ‘crosswise’, transmittere ‘send over’).

Among stop-final prefixes a distinction must be made between [d]-final ad- and [b]-final ob- and sub-. The [d] of ad- undergoes place (and voicing) assimilation to all stops (e.g. appetere ‘try to reach’, accipere ‘receive’ – note that these are not instances of total assimilation). It variably undergoes total assimilation to all remaining consonants except the glides (adjuvare ‘help’, advenire ‘arrive’), though assimilation to liquids appears to be more frequent than assimilation to fricatives or nasals (arrogare ∼ adrogare ‘claim’, adsiduus ∼ assiduus ‘persistent’, admovere ∼ ammovere ‘move near’).

The [b] of ob- and sub- variably undergoes voicing assimilation to obstruents in general and place assimilation to velar stops (e.g. occupare ‘seize’, suggredi ‘approach’). It undergoes total assimilation to [f] but not to [s] (e.g. sufferre ‘endure’, offundere ‘pour’), and it variably undergoes total assimilation to [m] but not to [n] (e.g. obmutescere ∼ ommutescere ‘become dumb’, submittere ∼ summittere ‘put forth’). Before liquids, ob- remains unchanged, whereas the [b] of sub- variably undergoes assimilation to [r] but not to [l] (surripere ‘steal’ but subrogare ‘substitute’, sublevare ‘raise’). Similarly to [d], [b] also does not assimilate to glides.

In Table 4 we summarise the total assimilations that take place between prefix-final consonants and stem-initial consonants. Place assimilations and voicing assimilations are not indicated. A distinction is made between systematically attested assimilations (1, darker shade), sporadic assimilations (2, medium shade), isolates (3, lighter shade) and non-assimilating types (empty box);33 in the last type the cluster surfaces unchanged (or is only affected by place and/or voicing assimilation). The marking n/a in the table means that the clusters in question do not emerge for some reason (nasal deletion before fricatives and [s]-deletion before voiced segments). Table 5 highlights the place assimilations in the stop/stop and the fricative/fricative sections of the above table.

33. Admittedly, this four-way categorisation is an oversimplification, since what we call sporadic here in some cases conflates type-level and token-level variability, and also lexically determined allomorph selection. While it is impossible to do justice to all the data in the space given, we are convinced that this presentation does not distort them in any harmful way. The interested reader is referred to the literature mentioned in the previous note.


Table 4. Total assimilations at prefix–stem boundary

[shaded grid not preserved in this extraction]

Legend:
1 – systematically attested assimilations
2 – sporadically attested assimilations
3 – isolated instances of assimilation
empty box – no assimilation
n/a – cluster does not emerge for independent reasons

Table 5. Systematically attested place assimilations between stops and between fricatives at prefix–stem boundary

  C1 ↓   C2 →     d t     b p     g k     s     f
    d
    b
    s

[cell markings not preserved in this extraction]

It is clear from the data in Table 4 that the assimilations are governed largely by the Syllable Contact Law. Total assimilation is likely to take place if the sonority of C1 is lower than the sonority of C2. This is borne out by the fact that nothing assimilates (totally) to stops, only stops assimilate to fricatives and nasals, both stops and nasals assimilate to liquids, and the glide [j] does not assimilate to anything. That is, Table 4 is by and large the inverse of Table 3. On the other hand, nothing ever assimilates to [w], and assimilation to [j] is sporadic, which means that C+glide clusters are tolerated much better at prefix–stem boundaries than in simplex forms.

Thus the Syllable Contact Law, a sonority-based principle, appears to operate as a static filter in the case of simplex forms and as a filter inducing assimilation processes at these morpheme boundaries. The clusters that are of special interest at this point are those which are rising-sonority or equal-sonority clusters (i.e., at variance with the Syllable Contact Law) and which are practically never remedied by assimilation.34 Some of these clusters are identical in both segmental composition and syllabification to clusters found in simplex forms (viz. [pt], [ps] and [rw], as in obtinere ‘maintain’, obsidere ‘occupy’ and pervadere ‘go through’, respectively). But some are not; in particular, these latter are [bd bn sm] (e.g. subdere ‘put underneath’, obnunciare ‘bring bad news’, transmittere ‘send over’, respectively), and the glide-final clusters [dj dw bj bw sw35 nw rj]. The cluster [bl] is found internally too, but at prefix–stem boundary it is always heterosyllabic (oblectare ‘delight’); thus in spite of the identical segmental composition its syllabification is not the same as that of its word-internal counterpart (see section 5.1).

If one wishes to make a phonologically based generalisation about these stable, categorically non-assimilating equal- and rising-sonority clusters, they clearly present two separate issues. One is that given the lack of active assimilatory capacity on the part of glides, and the compulsory regressive direction of assimilations in Latin,36 most C+glide clusters that emerge at prefix–stem boundaries surface intact. If we discount these, we are left with [ps pt bd bn sm bl].37 Of these, the coronal-final clusters (i.e. all except [sm]38) are covered by what can be termed the Generalised Place Condition (cf. (3) above), which is no longer restricted to [obs][obs] and [nas][nas] clusters:

(6) The Generalised Place Condition (valid at prefix–stem boundary)
    Heterosyllabic clusters are well-formed irrespective of sonority relations if C1 is non-coronal and C2 is coronal (i.e. [ps pt bd bn bl] do not undergo assimilation). If C2 is a non-coronal other than [w], only sonority relations are decisive (i.e. [bf df bm dm] undergo obligatory or optional/variable assimilation, but the falling-sonority clusters do not; by undergoing place assimilation, the clusters [bg dg sf] comply with the same clause).

34. So we do not include here clusters such as [bm] [dn] or [rl], because these are variably repaired to [mm] [nn] [ll], respectively, e.g. submovere ∼ summovere ‘remove’, adnumerare ∼ annumerare ‘count’ and perlucere ∼ pellucere ‘transmit light’. By contrast, we include [rj] in the list of non-remedied clusters in spite of the single item peierare ‘forswear’ discussed above.

35. The cluster [sw] is found in simplex forms too, but only word-initially as a complex onset, never as a heterosyllabic cluster.

36. Which is, of course, not to deny that some progressive assimilations are generally believed to have happened in the prehistory of the language, e.g. **wel-si > velle ‘want’.

37. Although [br] should fall under the generalisation to be made, it is not included here because it variably undergoes assimilation with sub-, though not with ob-.

38. This [sm] is the one found with trans- only. This prefix ends in a non-assimilating [s], and thus it creates the only exception to the Generalised Place Condition in (6).
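The overall logic – Syllable Contact driving total assimilation, checked by the Generalised Place Condition and by the inertness of glides – can be put schematically. The following is our sketch, and a coarse one: it collapses the systematic/sporadic/isolated gradings of Table 4 into a single yes/no, sets aside place and voicing assimilation (e.g. ad- before stops), and ignores lexical idiosyncrasies such as the non-assimilating trans- (footnote 38).

    # Schematic predictor of total assimilation at prefix-stem boundaries.
    # Illustration only; gradience and lexical exceptions are not modelled.

    SONORITY = {**dict.fromkeys("ptkbdg", 1), **dict.fromkeys("fs", 2),
                **dict.fromkeys("mn", 3), **dict.fromkeys("lr", 4),
                **dict.fromkeys("jw", 5)}
    CORONAL = set("tdsnlrj")

    def totally_assimilates(c1, c2):
        if SONORITY[c1] > SONORITY[c2]:
            return False        # falling sonority: cluster is stable
        if SONORITY[c1] == 1 and SONORITY[c2] == 1:
            return False        # stop+stop: place/voicing only, not total
        if c2 in "jw":
            return False        # glides lack active assimilatory capacity
        if c1 not in CORONAL and c2 in CORONAL:
            return False        # Generalised Place Condition (6)
        return True

    assert totally_assimilates("d", "r")        # ad+rogare -> arrogare
    assert totally_assimilates("b", "f")        # sub+ferre -> sufferre
    assert not totally_assimilates("b", "d")    # subdere surfaces intact (GPC)
    assert not totally_assimilates("d", "w")    # advenire: no assimilation
    assert not totally_assimilates("r", "t")    # pertulit: falling sonority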


6. Conclusion

The phenomena we have surveyed in Classical Latin provide insight into how sonority as a principle of phonological organisation is present in the static distributional patterns of consonantal segments. Given that most of these patterns were inherited from Proto-Indo-European and many survived into the Romance languages (aspects we have left untouched), they appear to be historically very stable and persistent. In its phonotactics, Latin is actually a very conservative language, possibly more conservative than other Indo-European languages of the same period, though significant differences can be found too (e.g. the absence of [s] + voiced consonant clusters).

We have also gained insight into how sonority as a principle of phonological organisation interacts with morphological composition, specifically in the case of prefixation.39 This issue also has a diachronic aspect because many prefixed forms arose already in the prehistory of Latin, many of these were lexicalised (sometimes with severe phonological consequences), but many were recomposed, and new prefixed structures continued to arise even in Post-Classical Latin. It has been seen that the Syllable Contact Law, a sonority-based generalisation, operated at prefix–stem boundaries as well as within simplex forms, but in a more restricted fashion. This may be due to the fact that the different morphological domains allowed the Syllable Contact Law to operate differently – yet another issue left for later research, with a lot to disentangle even at the level of data, given the long and intricate history of prefixation in Latin.

The third point of interest is the interaction of sonority with place of articulation. The data reveal a pattern in which certain heterosyllabic arrangements of coronal and peripheral (non-coronal) consonants constitute well-formed clusters in spite of the sonority relations that obtain between them. We have captured these in three interrelated place conditions. Fleshing out this interaction between manner (sonority) and place more fully is again beyond the scope of this work, but it is certainly an issue that should be looked at in more depth in the future.

39. Another aspect we left unmentioned is that many of the consonant clusters in simplex forms arose historically from suffixation.

Abbreviations

x.y   syllable boundary
*     ill-formed representation
**    reconstructed form
C     consonant
Co    coda
nas   nasal
Nu    nucleus
obs   obstruent
Ons   onset
Rh    rhyme
son   sonorant
σ     syllable
V     vowel

Acknowledgments

My thanks must go to Steve Parker and two anonymous reviewers who made invaluable comments on an earlier version of this chapter. If I did not always take their advice, the responsibility is mine, and so are all remaining errors.

Is the Sonority Sequencing Principle an epiphenomenon?

Eric Henke, Ellen M. Kaisse and Richard Wright

Abstract. The behavior of clusters of obstruent consonants and of clusters of sonorant consonants is not well explained by the Sonority Sequencing Principle or the Syllable Contact Law. Their phonotactics find more comprehensive explanations in the predictions of a theory of perceptual acoustic cue robustness. We explain this theory, review and refine the results of cross-linguistic surveys of all-obstruent or all-sonorant sequences, and add specific case studies of Korean and Modern Greek. We conclude that the correct predictions of the Sonority Sequencing Principle and the Syllable Contact Law are subsumed under the cue approach, which makes fewer incorrect predictions, avoids stipulations concerning sibilants, and covers a wider range of phonotactic generalizations.

1. Introduction

Phonologists and phoneticians continue to wonder – are the Sonority Sequencing Principle (SSP) and the related Syllable Contact Law fundamental explanatory principles, or are they only apparent generalizations that are better explained through the intersection of perception-based factors? The SSP (ex. Clements 1990) states that a string of tautosyllabic segments should rise in sonority to the syllable nucleus and fall in sonority thereafter. The Syllable Contact Law (ex. Vennemann 1988) militates against rising sonority over a syllable boundary. Is either of these principles really the best way to view phonotactic generalizations? In this paper we tackle the issue head-on by looking in detail at two languages, Korean and Modern Greek, that display a wide range of consonant clusters and of processes affecting them. We conclude that a perception-based account is more explanatory, since it covers the commonly observed strings that observe the SSP and Syllable Contact Law, as well as frequently encountered exceptions such as prevocalic strident + obstruent plateaus (or reversals, depending on the sonority scale one adopts).1 Using the same set of tools, the perception-based account also deals with matters about which the SSP says nothing, such as changes in place of articulation of adjacent consonants; and with changes in all-sonorant or all-obstruent clusters, about which Syllable Contact or the SSP may also say nothing or quite the wrong thing. Moreover, the perception-based account covers several other universals without additional principles, including the unmarked status of CV syllables and the dispreference for codas (all noted by Clements 1990 but requiring a stipulative set of sonority dispersion principles that differ between CV and VC).

This chapter builds on previous work (e.g. Wright 1996, 2004; Morelli 1999, 2003; Seo 2003b) in extending the role of perception to phonotactic patterns previously attributed to Sonority Sequencing. In lieu of the phonetically problematic and cross-linguistically variable sonority scale, we appeal to scales of auditory cue robustness, based on objective acoustic dimensions and well-understood properties of the human auditory system. For example, the ill-formedness of *[bnIk] vs. the well-formedness of [blIk] are often cited as evidence for the SSP. However, we observe that the place cues to a word-initial [b] in pre-nasal position are a subset of the cues in pre-liquid position. As is more fully discussed in sections 3.2 and 3.3, in the pre-liquid position the cues in the [b] release are perceptible in the transition into the liquid, whereas with the following nasal stop the release burst is likely to be lost due to gestural overlap (as with any stop+stop sequence), while the oral closure of the [n] obscures the formant transitions that would otherwise provide cues to the place of articulation. Indeed, if the release burst and formant transitions are lost, the prenasal stop will likely not be perceived as present at all (Wright 2001, Benkí 2003). As in the sonority-based approach, it is correctly predicted that no language allows word-initial stop+nasal clusters to the exclusion of stop+liquid clusters.2 It will be shown that the empirical coverage of this cue-based approach is broader, extending to a wide variety of phonological phenomena, and its predictions are more accurate than those of the sonority-based approach.

In our survey of languages we find that there are three outcomes for segmental sequences that are perceptually weak. In one set of languages, exemplified in this paper by Korean, these weak sequences are targets of assimilation, deletion, etc. (as discussed in Steriade 2001, 2009, Blevins 2004 and Seo 2003b). In another set of languages, exemplified here by Modern Greek, we see changes to segmental strings (both synchronic and diachronic) that result in a more robust perceptual encoding. In the last set of languages, such as Arrernte, Tsou, and Georgian (discussed briefly in the last section), perceptual cues are maintained and enhanced through phonetic processes of gestural timing.

We remain agnostic in this paper about whether perceptual cues are encoded in constraints in the synchronic grammar, as argued in a large body of work by Steriade and Flemming, among others; or whether their presence or absence rather results in changes over time that yield synchronic patterns where more perceptible strings are over-represented and less perceptible strings have been altered by deletion, assimilation, or enhancement, as in the work of Blevins. What we do dispute is whether the SSP is a universal principle of synchronic grammars. We are convinced by Evans and Levinson (2009) and Levinson and Evans (2010) that it is implausible that highly specific parameters and universals could be encoded in the human genome. But if the SSP is not part of the human genetic endowment, where does it come from? Our view is that the segment sequencing generalizations it is meant to deal with come from the maximization of perceptually recoverable strings, and this explains why the predictions of a perceptual cue approach subsume the good predictions of the SSP while avoiding the incorrect ones. Work like that of Daland et al. (2011) mentioned in footnote 2 thus figures importantly in our reasoning, since it can explain knowledge speakers possess that might otherwise seem to have to be innate.

The chapter is structured as follows. In section 2 we define and discuss sonority and highlight several problems that we see with its usage in phonological theory. Section 3 makes precise our notion of cue robustness and cue precision. We further show, in section 4, how the perceptual factors discussed in section 3 capture a range of typological generalizations previously attributed to Sonority Sequencing. We then exemplify this approach with discussions of the phonotactics of Korean and Modern Greek in sections 5 and 6 respectively. In section 7 we discuss how language-specific phonetic implementation affects the ways in which perception plays a role in phonology, using the apparent problem of Arrernte syllable structure as an example. Finally, we consider some implications of a perceptual approach to formal phonology.

1. We acknowledge that if one treats sibilants and other sonority-violating segments as extrasyllabic or otherwise unparsed in regards to syllable structure, it is difficult to find any sonority-violating segments. However, as we discuss in section 2, we feel that this very common family of formal solutions leaves an important question unaddressed: Why is it that the segments implicated in sonority violations (or which are left unparsed) follow such a strong typological pattern?

2. Daland et al. (2011) counter an appealing argument for the idea that speakers must have knowledge of the SSP because they have comparative judgments about sequences they have never heard, judging non-words beginning with [bn] to be superior to those beginning with [lb], for instance (Berent et al. 2007; Davidson 2006, 2007). Daland et al. argue that speakers have access to the lexical frequency of sequences of features, not only sequences of segments, and thus can judge the [-son][+son] word-initial sequence in [bn] to be superior to the [+son][-son] sequence in [lb].
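As a reference point for the critique that follows, the sonority-based account of judgments like *[bnIk] vs. [blIk] can be stated in a few lines. This sketch is ours; the five-point scale and the two-step minimum-distance requirement for English-type onsets are the assumptions cited from Clements (1990) in section 2 below.

    # The SSP account of *[bnIk] vs. [blIk] vs. *[lbIk], stated as code.
    # Illustration only; scale and distance threshold follow Clements (1990).

    SONORITY = {"obstruent": 1, "nasal": 2, "liquid": 3, "glide": 4, "vowel": 5}

    def onset_ok(classes, min_distance=2):
        """True if sonority rises by at least min_distance at each step."""
        son = [SONORITY[c] for c in classes]
        return all(b - a >= min_distance for a, b in zip(son, son[1:]))

    assert onset_ok(["obstruent", "liquid"])        # bl- as in [blIk]
    assert not onset_ok(["obstruent", "nasal"])     # *bn-: rise of one step
    assert not onset_ok(["liquid", "obstruent"])    # *lb-: sonority fall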


2. Sonority, the Sonority Sequencing Principle, and the Syllable Contact Law

The precise definition of sonority we subscribe to is not of great import here, since we deal with processes and phonotactic generalizations which we argue are best not seen in terms of sonority. We do believe that Parker's (2002, 2008) correlation of sonority with observed intensity was a large step forward. And while we argue against the use of sonority in accounting for apparent sequencing generalizations like the SSP and the Syllable Contact Law, we have no reason to reject the whole notion of sonority or the feature [sonorant]. (See Kaisse 2011 for an overview of how the feature has been deployed and whether there is sufficient support for keeping it.)

The notion of Sonority Sequencing has long played a central role in analyses of phonotactic patterns and related phonological processes. The search for a single underlying Sonority Sequencing Principle has its roots in a strong typological tendency to place severe restrictions on the sequencing of speech sounds. Although there are many variants of Sonority Sequencing (see Baertsch this volume and Miller this volume for discussions), a widely cited version was proposed by Clements (1990), in which phonotactic sequencing generalizations are accounted for in terms of constraints requiring segments to be ordered within syllables according to a "sonority cycle," rising sharply in sonority from the beginning towards the nucleus, and falling in sonority within the coda, if any. To take a familiar example, the ill-formedness of *[bnɪk] and *[lbɪk] as possible words of English, vs. the well-formedness of [blɪk], is attributable to a requirement that English onset clusters rise by a distance of at least two steps on the following sonority scale: vowel > glide > liquid > nasal > obstruent (Clements 1990). The obstruent-liquid-vowel sequence in [blɪk] satisfies this Sonority Sequencing and distance requirement, while the obstruent-nasal sequence in *[bnɪk] has insufficient sonority distance and the liquid-obstruent sequence in *[lbɪk] has a sonority fall before the syllable nucleus.
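The distance computation at work here is simple enough to state procedurally. The following Python fragment is our own illustrative sketch, not part of Clements' proposal: it assumes his five-point scale and the two-step English threshold just described, with toy class assignments for the three segments at issue.

# Illustrative sketch (ours, not Clements'): the two-step minimum sonority
# distance for English onsets, stated over Clements' (1990) five-point scale.
SONORITY = {"obstruent": 1, "nasal": 2, "liquid": 3, "glide": 4, "vowel": 5}
SEGMENT_CLASS = {"b": "obstruent", "n": "nasal", "l": "liquid"}  # toy assignments

def onset_ok(c1, c2, min_distance=2):
    """True if the onset cluster c1+c2 rises in sonority by at least min_distance."""
    rise = SONORITY[SEGMENT_CLASS[c2]] - SONORITY[SEGMENT_CLASS[c1]]
    return rise >= min_distance

print(onset_ok("b", "l"))  # True:  obstruent -> liquid rises two steps ([blɪk])
print(onset_ok("b", "n"))  # False: obstruent -> nasal rises only one step (*[bnɪk])
print(onset_ok("l", "b"))  # False: liquid -> obstruent falls before the nucleus (*[lbɪk])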

However, despite its widespread use, Sonority Sequencing and related sonority-based generalizations such as Syllable Contact (e.g. Vennemann 1988; Davis and Shin 1999; Zec 2007) have remained problematic for a variety of reasons. One of the most frequently cited problems is that although it corresponds loosely to acoustic intensity or degree of constriction, sonority has no straightforward phonetic definition. For example, sibilant fricatives generate high-intensity frication, but still pattern as low-sonority segments; and nasals have maximal oral constriction but relatively high sonority (see Keating 1983, Clements 1990 and Harris 2006 for discussions of problems with proposed phonetic correlates of sonority). While this has led some to propose that sonority be treated as an abstract feature or representational scale, Harris (2006) points out that most other representational scales seen in phonological theory derive fairly straightforwardly from phonetic dimensions and form the basis for lexical contrasts, whereas the sonority scale does neither. Rather, the sonority scale is used in phonology exclusively in regards to phonotactics and syllable structure. Attempts to relate the sonority scale to a complex of acoustic/perceptual dimensions (e.g. Keating 1983; Komatsu et al. 2002; Parker 2002, 2008) have revealed that relative acoustic intensity and relative segmental duration are the most reliable phonetic correlates. However, when these acoustic dimensions (such as the relative intensity of laterals vs. rhotics) are tested in cross-linguistic studies, they are found to vary dramatically across languages in ways that are irrelevant to their phonotactic patterning (e.g. Jany et al. 2007). While the cross-linguistic variance in the relationship between sonority and acoustic dimensions might be attributable to interleaving of constraints (in an OT-based approach), it seems unlikely that the introduction of acoustic detail into formal phonology will resolve many of the shortcomings of sonority discussed in the following paragraphs.

Secondly, the phonological details of the sonority scale appear to vary from language to language. For example, in Steriade's (1982/1990) analysis of Ancient Greek, coronals pattern as higher in sonority than noncoronals and voiced obstruents pattern as higher in sonority than voiceless ones; similarly, Levin (1985) and Dell and Elmedlaoui (1985, 2002) propose much more detailed language-specific scales for Klamath and Tashlhiyt Berber, respectively. However, in most instantiations of the sonority scale, languages observe no place or voicing distinctions. Clements (1990:296) suggests that a universal sonority scale can nevertheless be maintained, by decomposing apparently sonority-based phonotactic patterns such as those found in Greek, Klamath and Tashlhiyt Berber in terms of language-specific interaction between Sonority Sequencing and some independent markedness scales. While this approach may work, once interactions of independent scales, in this case phonetic scales, are fully taken into consideration, sonority loses its relevance and, we would argue, becomes epiphenomenal. That is, sonority is not a single property; rather, the patterns attributed to Sonority Sequencing are the result of a few broad perceptually motivated constraints which interact with other constraints and language-specific lexical contrasts to yield the phonotactics of particular languages. The perceptual motivation derives from scales of auditory cue robustness, based on objective acoustic dimensions and well-understood properties of the human auditory system.

Another problem with Sonority Sequencing is its typological under- and over-generation. As for under-generation, it fails to predict typologically common sonority plateaus,3 particularly word-initial sibilant+stop clusters, as in English [stɪk] (see Goad 2011 for a thorough discussion).4 While sonority violations are not inherently problematic for grammars with violable constraints, proposing sibilant-specific syllable structures, rankings or constraints does not in itself address the problem that sibilants, and not obstruents in general, present. That is, while one could approach the problem with a constraint requiring an [s]+stop cluster to be syllabified, ranked above Sonority Sequencing, the fact that Sonority Sequencing, by itself, cannot derive such patterns results in a segment-specific stipulation. Simply saying that sibilants are an exceptional class of segments vis-à-vis sonority, e.g. by treating them as appendices, as extrasyllabic, or as part of a complex segment, without motivating their exceptionality, is no less stipulative. We are simply saying that however sibilant+stop sequences are treated formally, some discussion of why sibilants, and not other obstruents, participate in these exceptional sequences is merited. Moreover, if one is being consistent, an ad hoc constraint that allows [s] to violate the Sonority Sequencing Principle implies a whole raft of sonority-exception constraints – one for each consonant that is attested in the world's languages – thereby predicting a very large number of unattested and unlikely sonority violations.5

A more insightful approach to phonotactics would account for the typologically common acceptability of commonly attested sonority-violating sequences, such as sibilant+stop onsets, while still predicting the typologically common sequences that are not sonority violations, such as stop+liquid onsets, in terms of the same general principles. This, we will argue, a perception-based explanation can accomplish with ease.

3. In more elaborated scales, fricatives may be ranked as more sonorous than stops, in which case sibilant+stop represents a sonority reversal rather than a plateau, and is all the more problematic.

4. Goad (2011) notes in her survey of prevocalic clusters that s+stop is more frequent than s+sonorant, which is in turn more frequent than s+fricative. She claims that this pattern should not be observed because the loudness of the /s/ should mask the stop cues, while the sonorants and fricatives have internal cues that should survive masking. However, as discussed at some length in Wright (1996, 2004), the recovery period of the auditory nerve fibers is approximately 20 ms; hence the masking will be over before the stop cues occur, as they are primarily carried on the following vowel after the stop closure. On the other hand, her hierarchy nicely illustrates the role of modulation in sequencing preferences, since s+stop clusters will have the greatest modulation, followed by s+sonorant clusters, which are in turn followed by s+fricative clusters.

5. Appealing to markedness arguments about /s/, for instance its common appearance in segment inventories, does not solve this problem, since it raises the same questions about how markedness is defined. After all, /s/ is one of the latest learned sounds in language acquisition (Fee 1995), a pattern not predicted by frequency-based markedness.


As for over-generation, sonority-based accounts of phonotactics must invariably be supplemented with other phonotactic constraints, such as the prohibition on [tl] and [dl] onset clusters in English. Again, it would be preferable to account for the phonotactic unacceptability of [tl] vs. the acceptability of [tɹ] in terms of the same general principles. Consider also the implications of language-specific coarseness in the sonority scale. Dell and Elmedlaoui (1985) posit a relatively fine-grained sonority scale for Tashlhiyt Berber (also adopted in Prince and Smolensky's (1993/2004) OT reanalysis): low vowels > high vowels/glides > liquids > nasals > voiced fricatives > voiceless fricatives > voiced stops > voiceless stops. Applying this particular scale to a language which requires a sonority distance of two steps in the onset, as in the English example above, we predict that [tf], [bz], [fn], [vl], and [nw] onset clusters should be possible (but not [vn], [mr], or [lj]); yet none of the predicted clusters is present in English.
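The over-generation is easy to verify mechanically. In the following sketch (again ours, with the scale values and the two-step threshold taken from the discussion above), every one of the unattested clusters passes the distance test:

# Illustrative sketch (ours): the Dell and Elmedlaoui (1985) scale plus a
# two-step onset distance requirement over-generates English onsets.
SCALE = {"low vowel": 8, "high vowel/glide": 7, "liquid": 6, "nasal": 5,
         "voiced fricative": 4, "voiceless fricative": 3,
         "voiced stop": 2, "voiceless stop": 1}
CLASS = {"t": "voiceless stop", "f": "voiceless fricative", "b": "voiced stop",
         "z": "voiced fricative", "v": "voiced fricative", "n": "nasal",
         "m": "nasal", "l": "liquid", "r": "liquid",
         "w": "high vowel/glide", "j": "high vowel/glide"}

def predicted(c1, c2):
    """True if c1+c2 rises by at least two steps on the fine-grained scale."""
    return SCALE[CLASS[c2]] - SCALE[CLASS[c1]] >= 2

wrongly_admitted = ["tf", "bz", "fn", "vl", "nw"]   # unattested in English
correctly_excluded = ["vn", "mr", "lj"]
print(all(predicted(c[0], c[1]) for c in wrongly_admitted))        # True
print(all(not predicted(c[0], c[1]) for c in correctly_excluded))  # True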

Standard theorizing about sonority within phonology has been largely within the context of a syllable-based approach to phonotactics: as in the account of the ill-formedness of *[bnɪk] above, segment sequencing must conform to the sonority hierarchy in order for the segments to be parsed as tautosyllabic onsets. Steriade (2001, 2009) argues, however, that phonotactic patterns are not predicated on syllabification, but rather follow from the restriction of phonological contrasts to contexts where they enjoy sufficient auditory cues to facilitate recovery on the part of the listener.6 For example, as noted in Steriade (2001, 2009), and as first noted in Bladon (1986), a pre-aspiration contrast is typically restricted to post-sonorant position cross-linguistically, and post-aspiration to pre-sonorant position.7 The partial devoicing of the adjacent sonorant provides an important cue to the aspiration contrast. Crucially, the typology of aspirated stop phonotactics shows no sensitivity to the syllable position of the stop itself or to the tauto- or heterosyllabicity of the sonorant which licenses it. Steriade formalizes the Licensing-by-Cue proposal in terms of Optimality Theoretic hierarchies of context-sensitive featural correspondence constraints. Interleaved with articulatory markedness constraints (reflecting some notion of effort minimization), this constraint system results in a greater propensity for neutralization of a contrast (to the articulatorily less marked value) the weaker its cues are in that context. This approach captures the implicational relationship that pre-aspirated stops in post-obstruent positions occur only if they are also found in post-sonorant positions, but not vice versa. This general approach is applied to the distribution of a broad range of laryngeal contrasts, to height contrasts in nasalized vowels, and to retroflexion and apical/laminal contrasts in Steriade (2001); to further analysis of laryngeal contrasts in Steriade (2009); to place contrasts in consonant clusters in Jun (1995) and Steriade (2001); and to epenthesis in Steriade (2001). A similar approach is applied to the distribution of front rounded vowels in Flemming (2001).

6. Most of the segment sequencing patterns that we treat in this paper seem to be predictable from a combination of perceptual constraints and other constraints that do not refer to the syllable. However, we do not take a stand on whether syllables can be disposed of altogether in phonological descriptions. It may well be syllabification that underlies the difference in realization of the first postvocalic [p] vs. [t] in the brand names Applets [ˈæ.pʰləts] and Cotlets [ˈkɑt̚.ləts], but the SSP does not predict this syllabification distinction any better than a perceptual model does. (Note that the Applets in this example are not those of iPhone fame but rather a candied fruit delicacy of the Pacific Northwest.)

7. Steriade actually motivates a more detailed hierarchy of contexts for aspiration, further supporting the general claim made here that contrasts occur preferentially in contexts with superior cues.

3. Speech Cues, Robustness and Precision

Since the SSP and related phonotactic constraints address restrictions on segment strings, it is useful to consider perceptual cues to place, manner, voicing, voice quality, etc. in light of how segment strings carry perceptual cues. There are several good discussions of perceptual cues in the phonological literature (see for example Wright 2004, among others), so we will keep this section relatively brief. We follow previous work in defining a cue as information in the acoustic signal that allows the listener to apprehend the presence of a segment and discern a phonological contrast. Below we summarize gestural and auditory factors that influence cue robustness. Finally, we introduce the concept of cue precision, the degree to which a cue narrows down the number of perceptual segmental choices, and discuss how the interaction between cue precision and cue robustness influences the recoverability of a segment in a segmental string.

3.1. Cue Robustness and Cue Precision

In the phonological literature the notion of contrast recoverability has become an important concept (e.g. Silverman 1997; Steriade 2001, 2009; Jun 2004; Flemming 2007; among others). Contrast recoverability itself is affected by two factors: robustness and precision. Following Wright (1996, 2001) and Benkí (2003), we define robustness as the degree to which the presence of a segment, and that segment's contrastive information, is likely to be apprehended by a listener under normal listening conditions. We define precision as the degree to which the cue narrows the field of segmental contenders (e.g. Miller and Nicely 1955; McClelland and Elman 1986; Benkí 2003). These two principles are based on well-understood factors of gestural timing and auditory processing.

First among the auditory factors contributing to robustness is audibility: the louder the portion of the signal that bears cues to a contrast, the more robust it is.8 Second is the temporal distribution of cues: cues that unfold over a long period of time (such as frication noise) or are distributed over a long portion of the signal (such as the spectral changes associated with vowel nasalization) are more likely to survive fluctuating background noise and changes in the listener's attention (e.g. Miller and Nicely 1955; Miller 1956; Gordon 1988; Faulkner and Rosen 1999). Third is the impact of segment sequencing on the auditory response: in a pair of sounds, greater amplitude and spectral change (referred to as modulation in Ohala and Kawasaki-Fukimori 1997) results in more robust information. The effect of modulation on robustness of encoding is related to the make-up of the segmental string and to the distribution of cues. Take, for example, a sequence of two fricatives with similar intensity and spectra in prevocalic position, such as /sʃa/. On their own, both have high-precision internal cues, and both are high intensity. However, as a sequence, they will be more poorly discriminated from each other than a sequence of fricatives with different intensities and spectra, such as /sxa/, which will in turn be more poorly discriminated from each other than a sequence of a fricative and a stop, such as /ska/. Moreover, modulation prefers more abrupt onsets (the stop+vowel in the string fricative+stop+vowel) over more gradual onsets (the fricative+vowel in the string fricative+fricative+vowel). Since stops have no internal cues, while fricatives do have them, stops are in greater need of such an abrupt onset.9 Modulation of adjacent obstruents and the need for a stop to be adjacent to a vowel will figure critically in our section 6 discussion of prevocalic obstruent clusters in Modern Greek. Finally, CV strings and their ilk are preferred to VC strings; this is because in a pair of sounds, the lower the intensity of the preceding sound, the greater the robustness of the following sound. Stated in terms of segmental ordering: cues that are found in transitions out of a quiet sound (such as a stop closure) are more reliably recovered than cues that are found in transitions out of a loud sound (such as a vowel). This has led to a frequently replicated finding that consonant cues are typically more recoverable in prevocalic position than in postvocalic position (e.g. Miller and Nicely 1955; Wang and Bilger 1973; Fujimura, Macchi and Streeter 1978; Wright 2001; Benkí 2003). In general (and with certain segment-specific exceptions), manner contrasts are the most robustly encoded, while laryngeal contrasts are more robust than place contrasts (e.g. Miller and Nicely 1955; Benkí 2003).

Perhaps most important to robustness is cue redundancy: the greater the cue redundancy, the more robust the encoding. Cue redundancy is dependent on the interaction of articulatory gestures and their relative timing. In considering perceptual robustness in relation to gestural timing (see Wright 1996, 1999, 2004; Chitoran, Goldstein and Byrd 2002; Goldstein et al. 2007 for discussions), sequences of speech sounds where gestural overlap results in increased cue redundancy create a more robust encoding of segmental information than those where increased overlap results in a paucity of cues. There are two main factors to consider: 1) does the resulting signal create internal cues (such as frication noise or nasal murmur) or transitional cues (such as formant transitions or glide formant movement), and 2) does a segment's constriction obscure information about flanking segments (as obstruents and nasals typically do) or does it carry information about flanking sounds (as vowels and glides do)? Coarsely speaking, ordering segments in terms of their degree of constriction, from greatest constriction to greatest aperture, followed by increasing constriction after the aperture peak, ensures the greatest benefit from gestural overlap. Thus, without resorting to sonority itself or to syllable structure, the resulting sequencing of sounds looks very much like the traditional sonority sequence without some of its problem sounds (such as fricatives in general, and /s/ in particular): obstruents and nasals (full or nearly full occlusion in the oral tract) > approximants (liquids > glides) > vowels.10 In addition, sounds that have internal cues (such as fricatives) can bear more gestural overlap without the loss of cues than sounds that rely on transitions; their cues can therefore remain recoverable without a flanking vowel. The degree to which all of their contrasts are fully recoverable depends on audibility (loudness) and the precision of their cues.

8. For example, while /f/ and /s/ are both fricatives, and therefore carry internal cues in their frication spectrum, they are not of equivalent loudness. Therefore, their internal cues are not equally robust; the /f/ cues are more likely to be masked by noise than those of the /s/. However, in most sonority-based analyses, the two are equal in sonority.

9. While voiced stops may have periodicity during their closure, the signal generated is so low in amplitude that it is on its own a poor cue, even for the voicing contrast, in the absence of a flanking vowel (e.g. van Alphen and Smits 2004).

10. The greater the degree of stricture, the lower the amplitude of a sound and the poorer it is at carrying transitional information about flanking consonants. Therefore this aperture scale is coarser than a fully articulated cue robustness scale would be. For instance, low vowels are typically louder than high vowels and therefore more likely to be associated with characteristics sonority-based treatments attribute to high sonority, such as attracting stress.


3.2. Cues Summary

The acoustic signal is produced by articulatory gestures, constrictions and openings that are continuous and overlapping to a greater or lesser degree. This can be modeled as a time-varying filter with resonances that shapes the spectrum of the sound source (e.g. Fant 1960). For example, when a consonant constriction is superimposed on an adjacent vowel, the deformation of the vocal tract results in localized perturbations of the vowel's formant structure as the vocal tract changes shape, creating what are generally referred to as formant transitions. Similarly, when a stop closure is released into a following segment with a lower degree of stricture, the pressure buildup that occurs during the closure is equalized, resulting in a brief (5–10 ms) release burst. In stops, place cues are found in the formant transition frequencies of the flanking vowels and in the release burst spectrum; manner cues are found in the closure silence (or near silence), the presence of a release burst, and the slope of the formant transitions; laryngeal contrast cues are found in the intensity of the release burst, pitch perturbations at the following vowel onset, the duration of the closure, the duration of the VOT, and in the type of sound that fills the space between the release burst and the onset of the vowel's voicing (e.g. aspiration noise, silence). Consonant place and manner information can also be carried in the shaping of the amplitude envelope and onset spectrum of /s/ in stop+s or s+stop clusters, albeit in a much less precise way than in formant transitions (Bailey and Summerfield 1980; Mann and Repp 1981). Formant transition cues are highly robust (periodic, high intensity, mid- to low-frequency); release burst cues have high precision but low robustness; and the signal attenuation caused by the stop closure carries precise cues to manner but is robust only in the presence of high-intensity flanking sounds. Typically, formant transitions in prevocalic (CV) position are more robust than those in postvocalic (VC) position; however, stops that show a large amount of vowel coloring when they are found in VC position are found in some studies to be more recoverable than their CV counterparts. As discussed in Steriade (2001, 2009) and elsewhere, retroflex stops (and nasals) fall into this class (e.g. Öhman 1966; Hamann 2003: Ch. 3 for retroflexes).

Nasal stops, unlike oral stops, contain highly robust internal cues in the "nasal murmur" and the "pole–zero" pattern in their spectrum. These cues ensure that the presence of a nasal segment and its manner of articulation will be recovered even in the absence of a flanking vowel. However, place cues in the nasal murmur are low precision. That is, while they are heard as nasals, they are heavily reliant on flanking vowels for their place cues. Like retroflexes, velar nasals' place cues are more recoverable in postvocalic position than in prevocalic position (e.g. Harnsberger 2001; Narayan 2008), although for nasals, place cues show less precision overall than for oral stops. Like nasals, oral fricatives have internal cues to place, manner, and voicing contrasts. The fricative noise provides high-precision cues; however, the loudness of the frication noise determines the robustness of these cues. Sibilants (e.g. /s, ʃ, ʂ/) have high-intensity frication noise. As a result, their presence is readily detected by listeners and their place and manner cues recovered. Other fricatives do not have this advantage and are therefore more reliant on transitional cues, formant transitions and vowel rise-time, for their accurate recovery.

Unlike other consonants, glides (such as /j/ and /w/) and liquids (such as /l/ and /ɹ/) have clear formant structure throughout their durations, which acts as a cue to their manner. Glides are distinguished from each other, as their high vowel counterparts are, by the distance between the first and second formant values at the peak of the consonant constriction, and they are distinguished from vowels by the rapidity of the formant movement and the lack of a steady-state portion. Because their cues are dynamic in nature, glides and laterals are highly dependent on the presence of a neighboring vowel (underlying or epenthetic) to carry the transitional information that distinguishes them from their syllabic counterparts. Both glides and liquids have a formant structure. Therefore, while they rely on a flanking vowel to be properly perceived, they can provide transitional cues about a preceding or following consonant. Because the liquids have a closer stricture than glides, especially /l/, which typically has a partial closure in the oral cavity (Ladefoged and Maddieson 1996:183), they are slightly worse than glides at providing transitional cues to flanking consonants. While both glides and liquids can provide cues to flanking consonants, the dynamic-transitional nature of glides and particularly liquids may obscure the formant transition cues to flanking consonants, thereby interfering with place cues.

The more robust a segment's internal cues are, and the better it is at carrying transitional cues to flanking segments, the more optimal a syllable nucleus it is. This makes vowels the most likely candidates for syllable nuclei and stops the least. One other factor to consider is modulation: the greater the modulation (both in frequency and in amplitude), the greater the auditory and attentional impact. Alternating loud resonant segments, such as vowels, with quiet occlusive segments, such as stops, optimizes both modulation and transitional cues. As is discussed in Wright (1996) and elsewhere, studies of the impact of speech sounds on the neurophysiology (both in the auditory pathway and in the auditory cortex) reveal that the greatest auditory benefit of modulation is achieved when there is the largest amplitude and frequency change and when the acoustic edge is sharpest (as in a stop+vowel sequence). In this way, perceptually motivated segment ordering makes predictions similar to the SSP.

We will be talking primarily about sequences of consonants in the remainder of this paper. But to conclude this section, let us consider the status of particular vowels as ideal syllable nuclei and as bearers of stress in quality-sensitive systems. The sonority hierarchy has done well in capturing these kinds of phenomena using the same concepts as it uses for segment sequencing. But cue theory can be equally successful. In considering what makes a sound well-suited to stand on its own (as a bare V syllable, for instance), the robustness of its internal cues is critical. Vowels make good stand-alone sounds because their internal cues are loud, periodic and redundantly encoded (in that they have a long duration, as discussed in Wright 1996, inter alia). We have already seen that vowels are ideal bearers of transitional cues for adjacent sounds and discuss this further in the next paragraph. And what about the likelihood of low and peripheral vowels attracting stress in a quality-sensitive system, such as those discussed by Kenstowicz (1997)? These distinctions have most commonly been viewed in terms of their sonority. The orderings Kenstowicz deduces are roughly these: a > e, o > i, u > ə > ɨ. In other words, in a quality-sensitive stress system, low vowels preferentially receive stress over mid vowels, mid over high peripheral, and high peripheral over high central. This ordering is likewise expected within an acoustic cue approach. It follows well-established hierarchies of the properties of vowel loudness and length and thus meshes logically with prominence, grammaticized as stress in such systems.

As we have said, segmental ordering (and hence syllable structure) is profitably viewed from the perspective of how good a carrier of transitional cues a sound is for flanking segments, together with modulation. Vowels are the best carriers of transitional information (they are periodic, have strong formant structure, and are loud) and therefore they make the best nuclei. The louder a vowel is, the better a carrier it is (low > mid > high) (e.g. Lehiste and Peterson 1959; Peterson and Lehiste 1960). In terms of modulation, the greater the difference between a vowel and a flanking sound, the better a carrier it is as well. Vowel height is inversely correlated with F1 height. All consonants with an oral constriction (i.e. not glottal consonants) cause a dramatic lowering of F1 because their constrictions create a Helmholtz resonator. In a Helmholtz resonator, the narrower the constriction and the longer the area of constriction, the lower the resonance. High and (most) mid vowels are also Helmholtz resonators, with high vowels having the longest and narrowest constrictions – their first formants are very similar to the F1 loci for non-glottal consonants, creating the least modulation.


As for [ɨ], it is the least different of all the vowels (generically) from all consonants (generically) on both the F1 and the F2 dimensions.

The following table summarizes what we have said about the internal cues of various speech sounds, about their best position in a string of segments, and about their suitability as carriers of cues to the identity of their neighboring segments, in terms of manner, of place, and of voicing.

Table 1. Summary of cue robustness as determined by segment-internal cues, and of transitional cues to flanking sounds carried by segment class

                        manner cues        voicing cues       place cues
                        internal  carrier  internal  carrier  internal  carrier  optimal position
  vowels                robust    good     robust    good     robust    good     C_, _C, C_C, _
  glides                poor11    good     robust    good     medium    medium   _V, V_
  liquids               poor      good     robust    good     medium    medium   _V, V_
  nasals                robust    poor     robust    poor     poor      poor     _V, V_
  sibilant fricatives   robust    medium   medium    poor     robust    poor     _V, V_, _CV, VC_
  other fricatives      medium    poor     poor      poor     medium    poor     _V, V_
  stops                 poor      poor     poor      poor     poor      poor     _V, V_

11. Glides and liquids must be dominated by formant movement to be distinguished from their [+syll] counterparts, which usually achieve at least some steady-state duration.
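For concreteness, the content of Table 1 can also be stated as a simple lookup structure. The following Python fragment is our own convenience encoding of the table, nothing more; the class names and ratings are exactly those of the rows and columns above.

# Table 1 as data (our encoding): each class maps each contrast type to a
# pair (internal-cue robustness, quality as a carrier of transitional cues).
CUES = {
    "vowels":              {"manner": ("robust", "good"),   "voicing": ("robust", "good"), "place": ("robust", "good")},
    "glides":              {"manner": ("poor",   "good"),   "voicing": ("robust", "good"), "place": ("medium", "medium")},
    "liquids":             {"manner": ("poor",   "good"),   "voicing": ("robust", "good"), "place": ("medium", "medium")},
    "nasals":              {"manner": ("robust", "poor"),   "voicing": ("robust", "poor"), "place": ("poor",   "poor")},
    "sibilant fricatives": {"manner": ("robust", "medium"), "voicing": ("medium", "poor"), "place": ("robust", "poor")},
    "other fricatives":    {"manner": ("medium", "poor"),   "voicing": ("poor",   "poor"), "place": ("medium", "poor")},
    "stops":               {"manner": ("poor",   "poor"),   "voicing": ("poor",   "poor"), "place": ("poor",   "poor")},
}

def internal_cues(segment_class):
    """Internal-cue robustness of a class, per contrast type."""
    return {contrast: pair[0] for contrast, pair in CUES[segment_class].items()}

print(internal_cues("sibilant fricatives"))
# {'manner': 'robust', 'voicing': 'medium', 'place': 'robust'}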

3.3. Interaction between segmental ordering and cue recovery

Stops, glides and liquids all depend on transitional cues and are therefore largely dependent on formant transitions for the full recovery of their contrasts. Non-sibilant fricatives and nasals, while carrying internal cues to their manner, are largely dependent on transitional cues for their place contrasts. While glides and liquids are most dependent on a flanking vowel, other consonants' information can be carried by perturbations of glide or liquid formant structure. Prevocalic position provides the most robust cues, particularly for stops, because it ensures a release burst in addition to formant transitions, while place neutralization is likely to occur if stops occur before another obstruent, which typically cannot carry cues. Loud fricatives, particularly the sibilant fricatives, are least dependent on formant transitions and are therefore expected to be the "stranded" segment (without a flanking vowel) in obstruent clusters. This predicts that a language that permits other types of obstruent clusters, such as stop+stop or non-sibilant fricative+stop, will also permit sibilant+stop. As summarized in Wright (2004), and as can be deduced from what we have said above, modulation, cue redundancy and cue dispersion favor the following sequences, in order of preference:

(1) – CV > CVC (where coda consonants either have strong transitional cues, like glides and liquids, or strong internal cues, like nasals and fricatives, and where nasal and oral stop place contrasts are likely to be neutralized)
    – CV > CGV ≥ CLV (G = glide, L = liquid)12
    – CV > SCV (S = sibilant fricative) > FCV (F = non-sibilant fricative)

More specific ordering restrictions can also be deduced from the information presented in the previous sections. For instance, a stop+liquid+vowel sequence will be preferred over a liquid+stop+vowel sequence because the liquid is highly dependent on the V for its complete identification, while the stop can benefit from transitional cues perturbing the liquid, even though the liquid is not as ideal a carrier as a vowel.

4. Typology of CCV obstruents and VCC sonorants

In her typological survey of languages that allow obstruent clusters in prevocalic position, Morelli (1999, 2003) found that the most attested pattern was that of a fricative followed by a stop. Furthermore, she presents an implicational relationship: the presence of any other prevocalic obstruent cluster implies the presence of a fricative-stop cluster. Morelli's survey leaves out some important details.13 The most commonly attested pattern is not simply a cluster of any fricative and a stop, but sibilant fricative-stop clusters. Out of the 25 languages for which she provides onset cluster charts, 7 have sibilants (usually /s/) as the only fricatives in their inventory. Out of the remaining 18 languages, 8 allow only sibilants to cluster with stops, 3 also allow some non-sibilants (but not all) to cluster, and the remaining 7 have no restrictions whatsoever on which fricatives may cluster with stops.

12. The distinction between CG and CL is clearly problematic, since there are languages that permit CGV but not CLV and others that permit CLV but not CGV (Clements 1990; Parker this volume).

13. Importantly, she only examines word-initial clusters, and makes no attempt to address the prosodic affiliation of /s/ in clusters.
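Morelli's implicational generalization can be stated as a one-line well-formedness test over a language's attested cluster types. The sketch below is ours and purely illustrative; it treats clusters simply as (manner, manner) pairs.

# Toy checker (ours) for the implicational universal: any other prevocalic
# obstruent cluster implies the presence of fricative+stop.
def obeys_implication(clusters):
    """clusters: a set of (manner1, manner2) pairs attested prevocalically."""
    other_obstruent_clusters = any(
        pair != ("fricative", "stop") and set(pair) <= {"stop", "fricative"}
        for pair in clusters)
    return (not other_obstruent_clusters) or ("fricative", "stop") in clusters

print(obeys_implication({("fricative", "stop"), ("stop", "stop")}))  # True: an attested type
print(obeys_implication({("stop", "fricative")}))                    # False: predicted unattested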


All of the languages in the survey which have more fricatives than /s/ clearly favor the sibilants as the C1 in C1C2V obstruent clusters. All of the languages that allow fricative+stop clusters include sibilant+stop clusters. Most of the languages have more sibilants in their inventory than non-sibilants, and most allow sibilants to precede a greater variety of stops than non-sibilants do. No language, even among those that allow any fricative to cluster, allows any individual fricative to cluster with more stops than /s/ does, although some allow ties: Mawo allows five clusters beginning with /s/, and five each beginning with /x/ and /χ/, while Haida allows five clusters with /s/ and five with /ɬ/.

The pattern above is not surprising from the point of view of phonetic cues. The sibilant fricatives have robust internal cues, due to the intensity of the frication, that allow them to be recovered in the absence of formant transitions, making them better suited to surviving on the edge of syllables. For the languages that allow some, but not all, fricatives to cluster, there is also a preference for the louder, more sibilant-like fricatives. Haida, for instance, allows the relatively loud /ɬ/ to cluster, but not /x/ or /χ/. Seri is more lenient, allowing /x/ and /χ/, but not /ɸ/, its quietest fricative.14 The exception is Mawo, which allows all of its fricatives to cluster except for /ɕ/, a sibilant. The Mawo exception highlights the fact that, like the SSP, perception-based phonotactic restrictions are strong tendencies (indeed far closer to exceptionless than the SSP) rather than inviolable universals. From our viewpoint, perception has a strong influence over diachrony, so that a cluster beginning with a fricative whose identity is hard to recover is unlikely to survive. But as we shall see in section 7, languages do sometimes allow such hard-to-recover sequences to persist, often enhancing their realization in some way that makes them easier to perceive.

14. The voiceless bilabial fricative /ɸ/ is notoriously quiet in all languages in which it has been measured (Ladefoged and Maddieson 1996).

Seo's (2003b) survey of 31 genetically distinct languages reveals that perceptual factors relating to segment sequencing, rather than Syllable Contact, account for phonotactic restrictions on sonorant sequences (see also Seo 2011).15 More specifically, she finds that modulation (viz. Ohala and Kawasaki-Fukimori 1997) restrictions on sequences of otherwise similar segments, together with auditory factors that influence cue robustness, predict the typological patterns that motivated Syllable Contact. Perceptual factors interact with language-specific restrictions, such as the prohibition on /l/ in pre-vocalic position in Korean16, to arrive at language-specific phonotactic restrictions. In particular, she finds that assimilation patterns and other alternations that might at first appear to be motivated by the Syllable Contact Law are rather motivated by more detailed segmental similarity for which the sonority hierarchy is irrelevant. For example, in Moroccan Arabic, intervocalic homorganic nasal plus liquid sequences optionally undergo complete assimilation, while heterorganic nasal plus liquid sequences surface unmodified. Syllable Contact cannot be the sole explanation, since both homorganic and heterorganic nasal plus liquid sequences show a rise in sonority that should violate the Syllable Contact Law. Moreover, in Uyghur, the homorganic /nl/ sequence surfaces as [ll] as the result of assimilation, while the heterorganic, Syllable-Contact-violating /ml/ and /ŋl/ sequences do not undergo any phonological change (Hahn 1991). This sort of place-specific restriction is clearly outside any but the most detailed, and least typologically plausible, sonority hierarchies. As we will see in section 5, Seo offers a satisfying explanation that lies in the greater difficulty of perceiving the identity of both consonants in a homorganic sequence.

15. Seo restricts her observations to intervocalic consonant sequences, presumably because such strings allow her to compare the predictions of Syllable Contact with perceptually motivated segment sequencing.

16. We obviously cannot claim that perceptual sequencing predicts all missing combinations of sounds, nor even that there will be no exceptions to it in any language. However, we are claiming that the exceptions will be random, rather than following typological patterns as many of the exceptions to the SSP do.

Moreover, Seo finds that there are no languages in her survey in which obstruent+liquid assimilation takes place but nasal+liquid assimilation does not. For example, in Tatar, /nl/ and /ml/ surface as [nn] and [mn], respectively, while /kl/ and /tl/ do not undergo any phonological modification. Yet the obstruent+liquid clusters violate Syllable Contact more severely than the nasal+liquid ones.17 Consider also Seo's observation that in Toba Batak /p/ and /t/ become [k] before a liquid. This seems like a very different kind of change from the manner adjustments that are often seen through the lens of Syllable Contact. But she argues that the explanation is related if seen from the standpoint of enhanced perceptibility. [k] in Toba Batak is regularly released in this presonorant position, while /p/ and /t/ are never released except before vowels. Thus the change in place of articulation is motivated by the greater perceptibility of a pre-consonantal released stop, while Syllable Contact has no obvious explanation for such a change. This kind of example falls under our third possible outcome of a perceptually difficult sequence, namely that the articulation of the segments is enhanced so that gestural overlap is limited and the first consonant is better perceived. Our discussion of Tsou in section 7 will provide a more detailed example.

17. Baertsch and Davis (2004) present a possible counter-example to this pattern: in the Turkic language Bashkir, an input obstruent+liquid sequence undergoes manner assimilation to surface as obstruent+obstruent, while a nasal+liquid sequence surfaces as nasal+obstruent.

Another pattern that Seo finds that is not predicted by Syllable Contact but which is predicted by perceptual sequencing involves nasals. Nasals have poor internal place cues and their CV transitions are perceptually weak, leaving the nasals with robust manner cues but weak place cues. While other sonorant+obstruent sequences in her sample typically show more stability when the sequence is hetero- rather than homorganic, nasals show the opposite pattern: they are more likely to show assimilation to the following obstruent when the two consonants are heterorganic. There are also place-specific patterns where cues to place or manner are weaker in certain consonants than in others. Korean provides a good example of this, as discussed in Jun (2004). In the next section, we look at two kinds of patterns in Korean clusters which are well explained by cue theory but puzzling in terms of sonority sequencing or syllable contact.

5. Korean consonant phonotactics

We turn now to a detailed consideration of processes altering consonant sequences in Korean. Some of the phenomena we discuss here have repeatedly been seen as exemplifying the sonority-based Syllable Contact Law. However, we will argue that taken as a whole, they are actually quite problematic for that law and make more sense when viewed from the perspective of cue recovery. Furthermore, we will see that sonority sequencing and Syllable Contact have nothing to say about changes in place of articulation of adjacent consonants, a pervasive process in Korean. The same principles of cue robustness and precision that we have outlined do much better in elucidating the reasons for these additional cases of segment and syllable contact. The processes we will discuss are these:

a. an obstruent becomes a nasal before a nasal
b. a liquid becomes a nasal after a nasal
c. a noncoronal stop followed by a liquid is doubly altered to produce a sequence of two nasals
d. a coronal stop becomes a liquid before a liquid
e. a coronal nasal becomes a liquid before a liquid
f. a coronal nasal becomes a liquid after a liquid
g. coronal stops assimilate in place to the following labial or velar
h. labial stops assimilate in place to the following velar


In each of these cases a perceptually weak sequence succumbs to assimilation, in line with approaches such as Steriade's (2001, 2009) Licensing-by-Cue (for a synchronic analysis) or Blevins' (2004) Evolutionary Phonology (for a diachronic analysis).

5.1. The Korean consonant inventory

We begin with a brief overview of the Korean consonants. Korean has a relatively large obstruent inventory which includes a three-way laryngeal contrast in its stop series, with voiceless, aspirated, and fortis segments at the labial, alveolar and velar places of articulation. The palatal affricates also exhibit this pattern. Three fricatives exist: the alveolar fricative /s/, the fortis alveolar fricative /s'/, and a glottal fricative /h/. The sonorant inventory is typical, with three nasals (labial, alveolar, and velar), one alveolar liquid /Ï/18, and two glides, /j/ and /w/.

(2) Consonant inventory of Korean (adapted from Kim-Renaud 1974)

                Labial    Alveolar   Palatal      Velar     Glottal
   Stop         p pʰ p'   t tʰ t'                 k kʰ k'
   Affricate                         tɕ tɕʰ tɕ'
   Fricative              s s'                              h
   Nasal        m         n                       ŋ
   Liquid                 Ï
   Glide        w                    j

The coda is a heavily restricted position in the Korean syllable. In this position all stop contrasts are neutralized to an unreleased stop at each place of articulation. The palatal affricates also neutralize to an unreleased alveolar stop [t̚], as do the fricatives. The nasals and the liquid are licensed in this position, whereas the glides are not. This allows seven possible coda segments.

(3) Possible coda consonants
    [p̚, t̚, k̚, m, n, ŋ, l]
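The neutralizations just described amount to a many-to-one mapping from the inventory in (2) to the seven codas in (3). The fragment below is our own schematic restatement of that mapping, nothing more (with Ï written as l in its surface coda form, and /h/ included among the fricatives that surface as [t̚], as discussed in section 5.4).

# Schematic sketch (ours) of Korean coda neutralization: obstruents lose
# laryngeal and continuancy contrasts and surface as unreleased stops.
CODA = {
    "p": "p̚", "pʰ": "p̚", "p'": "p̚",        # labial stops
    "t": "t̚", "tʰ": "t̚", "t'": "t̚",        # alveolar stops
    "tɕ": "t̚", "tɕʰ": "t̚", "tɕ'": "t̚",     # palatal affricates
    "s": "t̚", "s'": "t̚", "h": "t̚",          # fricatives
    "k": "k̚", "kʰ": "k̚", "k'": "k̚",        # velar stops
    "m": "m", "n": "n", "ŋ": "ŋ", "Ï": "l",  # licensed sonorants
}

print(sorted(set(CODA.values())))
# ['k̚', 'l', 'm', 'n', 'p̚', 't̚', 'ŋ'] -- the seven coda segments of (3)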

While Korean has constraints against obstruents in the coda, in the onset it is the sonorants that are more limited. The velar nasal /ŋ/ is not licensed in the onset. The liquid may be present underlyingly in Sino-Korean morphemes, but surfaces as [n] in the onset and as the flap [ɾ] between vowels. All other segments freely appear in the onset.

18. We use the cover symbol Ï for the single liquid phoneme in Korean, which varies in its lateral and rhotic properties depending on context (Iverson and Sohn 1994).

5.2. “Syllable Contact” processes

Korean has several well-known sonorant assimilation processes that are typically given as evidence for the sonority-based Syllable Contact Law (Iverson and Sohn 1994; Davis and Shin 1999). Davis and Shin's version of the Syllable Contact Law is a constraint that serves to avoid rising sonority over a syllable boundary. That is, the coda of one syllable should be of higher sonority than the onset of the next. The Korean processes usually given as evidence for this constraint are obstruent-nasalization, liquid-nasalization, and coronal assimilation, brief examples of which are given below.

(4) Obstruent-nasalization
    /kuk-muÏ/   [kuŋ.mul]   'broth'
    soup-water

(5) Liquid-nasalization
    a. /kam-Ïi/    [kamni]     'supervision'
       look-manage
    b. /pak-Ïam/   [paŋ.nam]   'exhibition'
       wide-view

(6) Coronal assimilation
    /tikɯt-ÏiɯÏ/   [ti.kɯl.li.ɯl]   'the letters t and l'
    t-l                              (Davis and Shin 1999)

In each of these processes, a rise in sonority across a syllable boundary is repaired by altering one or both of the segments to create a sonority plateau across the syllable boundary. The most complex process is that shown in (5b), in which both segments are changed, with both the stop and the liquid becoming nasals. Liquid-nasalization only occurs if the nasal is non-coronal. If the coronal nasal /n/ comes into contact with the liquid /Ï/, a geminate [ll] is created, as shown in (7).

(7) Nasal-lateralization
    a. /tan-Ïan/   [tal.lan]   'happiness'
       group-joy
    b. /siÏ-ne/    [ʃil.le]    'indoors'
       room-inside              (Davis and Shin 1999)

When the nasal precedes the liquid, the process looks like another argument for syllable contact: the lower-sonority nasal becomes a liquid to avoid rising sonority. Crucially, however, the process occurs when the liquid precedes the nasal as well. This has been a problem for syllable-contact-based accounts, as it takes a perfectly good sonority sequence and creates a violation. However, a perception-based account can handle nasal-lateralization, the so-called syllable contact processes, and other processes that sonority-based accounts can say nothing about.

5.3. Perception-based account and cue robustness

The perceptual account of the behavior of /n/ and /Ï/ in Korean rests on their phonetic similarities. They are homorganic sonorants and lack the perceptual contrast needed to appear adjacent to one another (Flemming 1995). This is motivated by the fact that a ban on adjacent alveolar sonorants is common cross-linguistically (for a detailed list, see Seo 2003b), as well as by experimental evidence (Seo 2003a). In a perceptual discrimination experiment, Seo asked native speakers of Korean, Moroccan Arabic, and Swedish to label pairs such as [alna]/[alla] and [aŋla]/[alla], among others, as "same" or "different". Her dependent variable was reaction time, where a longer time is associated with greater difficulty discriminating sounds. She found that the pairs with a sequence of homorganic sonorants were more difficult to discriminate from one another (had longer reaction times) than those with heterorganic sonorants, supporting the hypothesis that [n] and [l] are perceptually similar.

The similarity of [n] and [l] can also explain the pattern in (5a), where the liquid becomes a nasal following a noncoronal nasal, if we follow the P-map hypothesis that languages more readily tolerate changes which listeners do not easily perceive to have occurred (Steriade 2009). In her experiment, Seo found that the pair [amla]/[alla] was easier to distinguish than the pair [amla]/[amna]. That is, changing the liquid to a homorganic nasal creates less of a perceptual change than changing both the place and manner of the nasal.

A perception-based account is also available for the data in (4), obstruent-nasalization. In the string stop+nasal+vowel, the preconsonantal stop, which is unreleased and relies on a relatively poor VC transition, is perceptually weak and therefore a better target for assimilation than the prevocalic nasal, which has robust internal cues to manner and the optimal CV transition.

The nasalization of non-coronal obstruent+liquid clusters, as in (5b), presents an example of how language-specific phonotactic restrictions interact with perceptually motivated selection of output candidates. Korean prohibits singleton laterals in prevocalic position. The form that is selected in the output, [n], is well-formed in prevocalic position and is the least different segmental choice. Now that the language-specific restriction against laterals is satisfied, the nasalization of the non-coronal obstruent follows from the explanation in the preceding paragraph.
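The repairs in (4)-(7) can be summarized as a small set of rewrite statements over consonant pairs. The sketch below is our own schematic restatement; it hard-codes the outcomes described above rather than deriving them from perceptual constraints, and it writes the liquid Ï as l.

# Schematic sketch (ours) of the Korean sonorant-sequence repairs in (4)-(7).
NASALIZE = {"p": "m", "t": "n", "k": "ŋ"}

def repair(c1, c2):
    if c1 in NASALIZE and c2 in {"m", "n", "ŋ"}:  # (4) obstruent-nasalization
        return NASALIZE[c1], c2
    if c1 in {"m", "ŋ"} and c2 == "l":            # (5a) liquid-nasalization after a non-coronal nasal
        return c1, "n"
    if (c1, c2) in {("n", "l"), ("l", "n")}:      # (7) nasal-lateralization, in either order
        return "l", "l"
    if c1 == "t" and c2 == "l":                   # (6) coronal assimilation
        return "l", "l"
    if c1 in {"p", "k"} and c2 == "l":            # (5b) both stop and liquid become nasals
        return NASALIZE[c1], "n"
    return c1, c2                                 # otherwise no repair

print(repair("k", "m"))  # ('ŋ', 'm')  cf. /kuk-muÏ/ -> [kuŋ.mul] in (4)
print(repair("k", "l"))  # ('ŋ', 'n')  cf. /pak-Ïam/ -> [paŋ.nam] in (5b)
print(repair("n", "l"))  # ('l', 'l')  cf. /tan-Ïan/ -> [tal.lan] in (7a)
print(repair("l", "n"))  # ('l', 'l')  cf. /siÏ-ne/ -> [ʃil.le] in (7b)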

5.4. Changes in adjacent consonants of the same manner

Sonority deals with the interplay between the manner classes – obstruents (or perhaps stops and fricatives classed separately), nasals, liquids (or perhaps laterals and rhotics separately), and glides – and as such cannot describe or predict processes between segments of the very same manner class, such as two adjacent stops, two nasals, or two lateral liquids. However, appealing to speech perception and phonetic cues can explain processes such as Korean's asymmetries in stop assimilation. When two stops are adjacent in Korean, the first may assimilate to the place of the second. Whether this occurs depends on the places of both stops. A coronal stop will assimilate to the place of a following labial or velar stop.

(8) a. /patʰ-pota/   [pap̚p'oda]19   'rather than the field'
       field-than
    b. /patʰ-kwa/    [pak̚k'wa]      'field and'
       field-and                      (Kim-Renaud 1974)

A labial stop will also assimilate to a following velar. However, if the following stop is coronal, no assimilation will occur.

(9) a. /tʰop-kʰal/   [tʰok̚kʰal]             'handsaw'
       saw-knife
    b. /pap-to/      [pap̚to] *[pat̚t'o]     'rice also'
       rice-also                              (Kim-Renaud 1974)

Finally, when a velar stop precedes a coronal or a labial, no assimilation will occur.

(10) a. /pak'-to/   [pak̚to] *[pat̚t'o]      'outside also'
        outside-also
     b. /kuk-po/    [kuk̚p'o] *[kup̚p'o]     'national treasury'
        nation-treasure                        (Kim-Renaud 1974)

In summary, coronal stops will assimilate to the place of whichever stop follows them, whereas labial stops only assimilate to velar stops, and velar stops do not assimilate at all.

19. Recall that stops' laryngeal contrasts are neutralized in the coda. Additionally, a voiceless unaspirated stop becomes fortis following another stop.
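This asymmetry reduces to a single comparison once unreleased coda stops are ranked by the reliability of their VC place cues (velar > labial > coronal), a ranking justified in the discussion that follows. The fragment below is our own compact restatement of the generalization, not a formal proposal from this chapter.

# Sketch (ours) of the assimilation asymmetry just summarized: C1 assimilates
# to C2 only when C2's place is more strongly cued than C1's.
VC_PLACE_CUE_STRENGTH = {"coronal": 1, "labial": 2, "velar": 3}

def surface_place_of_c1(c1_place, c2_place):
    if VC_PLACE_CUE_STRENGTH[c1_place] < VC_PLACE_CUE_STRENGTH[c2_place]:
        return c2_place  # weakly cued C1 takes on the place of C2
    return c1_place      # otherwise C1 survives unchanged

print(surface_place_of_c1("coronal", "labial"))  # labial: /patʰ-pota/ -> [pap̚p'oda]
print(surface_place_of_c1("labial", "velar"))    # velar:  /tʰop-kʰal/ -> [tʰok̚kʰal]
print(surface_place_of_c1("labial", "coronal"))  # labial: /pap-to/ -> [pap̚to]
print(surface_place_of_c1("velar", "labial"))    # velar:  /kuk-po/ -> [kuk̚p'o]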


While sonority-based approaches like the Syllable Contact Law can say nothing about such processes, the above generalization follows rather straightforwardly from a perceptual approach. The most reliable cues to place in a stop reside in the formant transitions into a vowel (CV), which are more prominent than the transitions into the consonant (VC). Therefore, C1 in a /VC.CV/ sequence is in a perceptually weak position, as it lacks a CV transition. Korean stops are also unreleased in the coda, and so they lack a release burst, making them even more perceptually weak. In this case the only cues to place reside in the VC transition, which varies by place. Of all the stops, the velar has the most reliable cues to place in the VC transition. Before the velar closure, the second and third formants come together to form the 'velar pinch' that cues the listener to the place of articulation. Labial stops cause some lowering of both the second and third formants, although less severe than the transitions for the velars. Finally, the coronal stops cause less movement of the formants than either the labials or the velars, and are therefore less perceptually salient (Jun 2004).

Given the above, the asymmetrical pattern of Korean stop assimilation is logical. The unreleased coronal stop has almost no cues to place, and assimilates to the place of the following consonant, which, with its release burst and CV transition, has robustly encoded place cues. The labial stop has more place cues in its VC transition, so it does not assimilate to the following coronal, but only to the perceptually stronger following velar. The unreleased velar, with the strongest cues in its VC transition, is perceptible in this position, and so it does not undergo any place assimilation.

The lack of robust cues in an unreleased stop also explains the behavior of stops in the so-called syllable contact processes. Unreleased coda stops become nasals before nasals because they lack robust cues, while nasals have robust manner cues. Gestural overlap (anticipatory lowering of the velum) turns a stop into a nasal in this environment. Similarly, in coronal assimilation, an unreleased coronal stop assimilates to the following liquid, producing a geminate [ll]. This follows from the same reasoning: an unreleased stop, especially a coronal, is perceptually weak, and when adjacent to a perceptually strong segment such as a liquid, is masked by the liquid. This masking effect can be reflected in a phonological process.

To conclude this section, we must bring up another Korean coda-neutralization process that does not make particular sense either from a perceptual standpoint or a standard syllable-contact one. /s/, /h/, and the palatal affricates are neutralized to [t] in coda; in other words, only [-continuant] obstruents are permitted in coda position. Since /s/ is rich in internal cues, it is not expected that it should suffer alteration in this position. We suspect this is a case where the phonological generalization disallowing [+continuant] in coda trumps perceptual motivation. Korean simply disallows most obstruent feature contrasts in coda, including laryngeal contrasts and continuancy. (See Hayes and Steriade 2004: 15-17 for a discussion of phonological generalizations vs. phonetically motivated patterns of this sort.)

6. Modern Greek onset obstruent clusters

Modern Greek provides an excellent case against which to test the contention that cue-based theory does a more satisfactory and inclusive job of describing consonant phonotactics than the sonority sequencing principle does. Because Morelli's (1999, 2003) large survey of onset obstruent cluster typology perforce results in a superficial treatment of all the languages she includes, it is worthwhile to investigate the details of obstruent clusters in a language whose phonology is well-documented and well-studied. Since sonority sequencing has little to say about such clusters, the Greek case further supports the point we made in our discussion of Korean, namely that cue theory can supplant the need for sonority sequencing.

Modern Greek represents a different kind of outcome for perceptually weak sequences: segmental strings which are perceptually weak have changed in a way that has increased perceptual robustness. Modern Greek has undergone a sound change in its development from its parent, post-Classical Greek, whereby obstruent clusters have been altered to avoid sequences that agree in continuancy (Philippaki-Warburton 1980; Kaisse 1993). The sound change endures as a productive synchronic pattern of alternations in the language. The majority of clusters are altered to achieve the pattern fricative+stop. For example, the inherited fricative+fricative cluster in *xθes 'yesterday' emerges as [xtes], while the underlying stop+fricative cluster in /plek-θik-e/ also achieves the template by changing the continuancy of both members, emerging as [plextike] 'it was knitted.' /s/ has proven inalterable, so we find stop+s and s+stop whatever the underlying source: /e-ɣraf-sa/ → [eɣrapsa] 's/he wrote', for instance. Since Modern Greek does not permit any coda clusters (Kaisse 1977; Eleftheriades 1985; Joseph and Philippaki-Warburton 1987), all these adjustments are either unarguably in onset position (word-initially) or, in the intervocalic cases, probably in onset (Joseph and Philippaki-Warburton 1987), though conceivably divided between coda and onset (a syllable contact position).20

20. We use the phrase 'in onset' loosely, since, as we have mentioned earlier, there are certainly conceivable structural analyses where the first of two word-initial consonants is not strictly in the onset but rather in some appendix that precedes the onset.
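The alternations just illustrated can be summarized as a small rewrite schema over voiceless obstruent pairs (the voiced series patterns analogously). The fragment below is our own schematic restatement of the processes spelled out in section 6.1 below; the stop~fricative pairings at each place, and the immunity of /s/, are as described in the text.

# Rough sketch (ours) of the Modern Greek continuancy adjustments.
FRICATIVE_OF = {"p": "f", "t": "θ", "k": "x"}  # stop -> corresponding fricative
STOP_OF = {"f": "p", "θ": "t", "x": "k"}       # fricative -> corresponding stop
STOPS, FRICS = set(FRICATIVE_OF), set(STOP_OF)

def adjust(c1, c2):
    if c2 == "s" and c1 in FRICS:      # fricative+s -> stop+s; /s/ itself never alternates
        return STOP_OF[c1], "s"
    if c1 in FRICS and c2 in FRICS:    # fricative+fricative -> fricative+stop
        return c1, STOP_OF[c2]
    if c1 in STOPS and c2 in STOPS:    # stop+stop -> fricative+stop
        return FRICATIVE_OF[c1], c2
    if c1 in STOPS and c2 in FRICS:    # stop+fricative -> fricative+stop (both change)
        return FRICATIVE_OF[c1], STOP_OF[c2]
    return c1, c2                      # s+stop, stop+s, etc. already fit the template

print(adjust("x", "θ"))  # ('x', 't'):  *xθes -> [xtes] 'yesterday'
print(adjust("k", "θ"))  # ('x', 't'):  /plek-θik-e/ -> [plextike] 'it was knitted'
print(adjust("f", "s"))  # ('p', 's'):  /e-ɣraf-sa/ -> [eɣrapsa] 's/he wrote'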


The relevance of the Greek case is clear. First, because these adjustments involve obstruents adjacent to other obstruents (like the Morelli cases summarized above), a relatively coarse sonority scale says nothing about what order is best for them. Stops and continuants are equal in sonority and thus form a sonority plateau. A finer-grained scale such as that advocated by Parker (2002, 2008), Baertsch (this volume) and Miller (this volume) would predict that onset obstruent clusters should consist of stops plus fricatives, an increase in sonority as one approaches the nucleus – quite the opposite of what the Greek sound change and synchronic continuancy adjustment processes produce and thoroughly contradicted by the results of Morelli’s wide-ranging survey. Second, the Greek obstruent clusters, though quite permissive in what fricatives they permit in the initial position of the cluster, nonetheless follow the generalization that we noted in our survey and refinement of Morelli’s results. There are more clusters involving [s] than any other fricative – a generalization we ascribe to the excellent internal cues of sibilants compared to other fricatives. Third, the possibility of [s] (and no other fricative) appearing as the second member of a cluster can be explained by the relative intensity of sibilants, which allows them to actually carry cues to the identity of preceding consonants. We will go through the clusters occurring in Modern Greek in some detail, showing that cue theory predicts most of the data we encounter. In a few cases we will see that neither cue theory nor the SSP has anything helpful to say.

The evidence for maximization of the onset in V+Obstruent+Obstruent+V sequences is suggestive but circumstantial. Native speaker intuitions reported by Joseph and Philippaki-Warburton (1987) and Malikouti-Drachman (1987), among others, place both obstruents in the onset. No Greek word can end in an obstruent other than [s], suggesting again that obstruents cluster in the onset even after a vowel. Finally, we will see that Greek is unusually permissive in the wide range of word-initial clusters it countenances, suggesting that maximizing onsets will produce clusters that are fully acceptable in the language. Unfortunately, Greek does not have quantity-sensitive stress or other phonological processes that could provide converging evidence to support this phonotactic and judgment-based division. Therefore it is possible that only word-initial cases differentiate sonority-based versus cue-based explanations in favor of cues. If the consonants in a V-fricative-stop-V sequence are heterosyllabic, and if stops are less sonorant than fricatives (a controversial assumption), then Syllable Contact would equally predict the improvement that intervocalic, heterosyllabic fricative+stop represents over other obstruent sequences.


Modern Greek contains the following obstruents:

(11) Underlying obstruent system of Modern Greek
     stops:                p t k
     voiceless fricatives: f θ s x
     voiced fricatives:    v ð z ɣ

We will now consider what happens when these obstruents are adjacent to one another.

6.1. Processes achieving the fricative+stop onset template and/or eliminating adjacent obstruents of the same manner

Five types of clusters are eliminated by processes that alter the feature [continuant]:

a. clusters of adjacent voiceless fricatives show defricativization of the second consonant 21
b. adjacent stops undergo fricativization of the first consonant
c. a cluster of stop plus fricative is doubly altered and emerges as a fricative plus a stop
d. clusters where the second fricative is /s/ undergo defricativization of the first consonant
e. clusters of two /s/'s or of /z/ plus /s/ are degeminated (as are all clusters of coronals)

We will illustrate each of the processes in turn.

a. Fricative + fricative becomes fricative + stop

When the passive suffix /-θ-/ is added to a stem ending in a fricative, it emerges as the stop [t]. The first example in (12) shows the underlying continuant character of the suffix, which appears postvocalically.

(12) a. timi-θ-ik-e         'it was honored'
        honor-PASS-PST-3SG
     b. raf-t-o             'I will be sewn'
        sew-PASS-1SG
     c. filax-t-ik-a        'I took care of myself'
        guard-PASS-PST-1SG
     d. peras-t-ik-e        'it was passed'
        pass-PASS-PST-3SG

21. Voiced fricatives do not dissimilate except in the isolated Cypriot dialect:
(i) avɣo 'egg' (Cypriot avgo); evðomaða (Cyp. evdomaða) 'week' (cf. efta 'seven'); (na) vɣo '(that) I go out' (Cyp. vgo); oɣðoos (Cyp. oɣdoos) 'eighth'.
This apparently peculiar restriction is explained in note 24.

The underlying form of this suffix must contain the fricative /θ/, since underlying stops do not spirantize in this (or any) position, as evidenced by forms such as [zitima] 'matter'. /-θ-/ is the only Modern Greek suffix that begins with something other than /s/ or a vowel, so most of our following examples will come from historical developments from Postclassical Greek (P.Gk.) to the modern language. Sequences of fricatives also occurred morpheme-internally in the parent language, and these too have undergone defricativization:

(13) P.Gk. > M.Gk.
     aisθanomai > estanome      'I perceive'
     sxolio > skolio            'school'
     exθes > xθes > xtes        'yesterday'
     efθinos > fθinos > ftinos  'cheap'

Obstruents which are identical in manner are thus altered, increasing modulation. We will discuss why the preferred outcome is usually fricative+stop at the end of this survey of processes.

b. Stop + stop becomes fricative + stop

(14) P.Gk. > M.Gk.
     hepta > efta    'seven'
     okto > oxto     'eight'
     ktizo > xtizo   'I build'
     pteron > ftero  'wing'

Again, the elimination of a cluster of segments with identical manner increases modulation.

c. Stop + fricative becomes fricative + stop

The examples in the second column of (15) show stop-final stems to which has been added the same underlyingly fricative-initial suffix, passive /-θ-/, as we saw in (12). In these cases, both obstruents change their continuancy value:

(15) a. paralip-omai         'I am neglected'   paralif-t-ik-a        'I was neglected'
        neglect-PASS.PRS.1SG                    neglect-PASS-PST-1SG
     b. katadiok-omai        'I am pursued'     katadiox-t-ik-a       'I was pursued'
        pursue-PASS.PRS.1SG                     pursue-PASS-PST-1SG


The change illustrated by examples such as /paralip-θ-ik-a/ → [pa.ra.li.fti.ka] is particularly striking. Both consonants are altered in their continuancy, suggesting forcefully that fricative plus stop plus vowel is the sequence being aimed at where possible.

d. Clusters of fricative + s

Unlike the three other voiceless fricatives of Modern Greek, /s/ does not have a systematic stop partner. The historical source of this phenomenon is the fact that the other voiceless fricatives, [f], [θ], and [x], come from the aspirated counterparts of *p, *t, and *k, while /s/ has always been /s/.22 If /s/ is the second member of a cluster, defricativization is not possible and the first fricative becomes [-continuant], increasing modulation by eliminating a string of fricatives.

(16) a. graf-o              e-grap-s-a              'I write, I wrote'
        write-PRS.1SG       PST-write-PRF-PST.1SG
     b. pros-ex-o           pros-ek-s-a             'I took care'
        take.care-PRS.1SG   take.care-PRF-PST.1SG
     c. dulev-o             dulep-s-o               'I work, I will work'
        work-PRS.1SG        work-FUT.PRF-1SG

[ps] and [ks] clusters also occur word-initially:

(17) psomi 'bread'    ksenos 'strange'

Once again, we avoid unmodulated strings of fricatives. Because /s/ is loud, it is a better carrier of stop transitional cues than Greek's other fricatives. So while the stop+s outcome is not ideal from a perceptual standpoint, it is a good compromise with the structural properties of the language, which prevent /s/ from undergoing alteration. Indeed, a proponent of the P-map approach (Steriade 2009) might argue that /s/ is not targeted for any defricativization process because of its loudness; there is less perceptual damage if /θ/ turns into [t]. A proponent of Evolutionary Phonology (Blevins 2004) might argue that the loudness of [s] and its perceptual distinctness from [t] would be expected to protect the sibilant, as opposed to the interdental fricative, over time.

22. The current health of the t-θ pairing is illustrated in the Cypriot forms discussed in Kaisse (1993), where a /t/ comes to stand before a newly created palatal stop. The /t/ becomes [θ], never [s]: /mati+a/ → matja → matca → maθca 'eyes'.


e. Coronal clusters

While roots ending in /f/ or /x/ regularly show defricativized allomorphs before s-initial suffixes, roots ending in /z/ or /t/ or /θ/ show deletion in a process that deletes any coronal before /s/:

(18) a. arxiz-o           arxi-s-a            'I begin, I began'
        begin-PRS.1SG     begin-PRF-PST.1SG
     b. θet-o             e-θe-s-a            'I place, I placed'
        place-PRS.1SG     place-PRF-PST.1SG
     c. aleθ-o            ale-s-o             'I grind, I will grind'
        grind-PRS.1SG     grind-PRF-PST.1SG

We can again attribute such changes to the need for modulation; they are often loosely described as instances of the Obligatory Contour Principle (OCP). However, the OCP is a phonological principle, standardly invoked to avoid adjacent identical elements within the same morpheme. These Greek cases are often heteromorphemic and more likely have the perceptual motivation of modulation.

6.2. Survey of occurring prevocalic clusters

The result of all the processes we have now exemplified is an inventory of 13 or 14 prevocalic obstruent clusters in the core vocabulary.23 (In addition, Modern Greek allows most obstruents to combine with most liquids, with the sole glide [j], and with some nasals to form SSP-obeying clusters such as [pl], [tr], [xj] and [tm]. We do not discuss these here, as the SSP and cue theory make similar predictions concerning the unmarked nature of such sequences.)

23. Modern Greek is diglossic. The core ('demotic') vocabulary consists of words which have undergone normal phonological development. There is an archaizing, literary language, 'katharevousa,' devised by scholars in the nineteenth century, which suppresses some of the normal phonological and morphological developments of the language with the goal of making the modern language more closely resemble Ancient Greek. The continuancy adjustments discussed in this section are suppressed in katharevousa. This occasionally results in doublets where a katharevousa form with adjacent stops or fricatives has entered the spoken language. We do not consider these forms here, since they are deliberate coinages that violate the normally developed phonotactics of the core vocabulary.

(19) voiceless obstruent clusters24
     ps  ft  ks  sp  st  sk  sf  xt
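The inventory in (19) can be checked mechanically against the template that the processes of section 6.1 aim at: fricative+stop, with /s/ additionally tolerated after a stop. The following Python sketch is our illustration only; it flags exactly the residual cluster that, as noted below, resists explanation.

    # A quick sanity check of (19) against the fricative+stop template.
    STOPS = set("ptk")
    FRICATIVES = set("fθsx")

    def fits_template(cluster):
        c1, c2 = cluster
        return ((c1 in FRICATIVES and c2 in STOPS)   # ft, xt, sp, st, sk
                or (c1 in STOPS and c2 == "s"))      # ps, ks

    for cl in ["ps", "ft", "ks", "sp", "st", "sk", "sf", "xt"]:
        print(cl, fits_template(cl))
    # Only [sf] fails: the one fricative+fricative cluster in the core
    # vocabulary, discussed as an unexplained residue later in this section.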

These clusters occur word-initially as well as intervocalically, where, as we have seen, the evidence suggests, but does not insist, that they are not divided between syllables. Either the word-initial clusters are tautosyllabic and in onset, in which case they are problematic for the SSP, or, in an analysis to which we would not subscribe, they are in some sort of appendix outside the onset. (We again refer the reader to Goad 2011 for a summary of the structures that have been proposed for extrasyllabic /s/.) In the latter case, an additional set of statements would be required to locate the first of two obstruents outside the syllable, begging the question of why such structures are needed. To these voiceless clusters we can add the following involving voiced obstruents:

(20) voiced obstruent clusters
     vð  vɣ  zv  zɣ  ɣð

Voiced obstruents have never participated in the continuancy template, as discussed at some length in Kaisse (1993). It is proposed there that because there is no voiced stop series in Modern Greek, the distinctive feature system of Greek does not make alternations producing voiced stops feasible in the lexical phonology – in other words, the continuancy template is imposed only insofar as it is structure-preserving; it cannot produce novel segments such as voiced stops. We will largely neglect discussion of these clusters in what follows, simply because they have not been able to achieve continuancy dissimilation for a systematic phonological reason.25

24. Data from Eleftheriades (1985), modified by our own searches and with non-core, katharevousa clusters omitted.

25. When the rule is postlexical (Kaisse 1993), as in the Cypriot dialect, the fricative+stop template applies even to voiced obstruents, yielding [vd], [vg], etc. (Data from Newton 1972.) See also note 22.

Now let us consider the voiceless obstruent clusters more carefully. Let us first and most importantly inquire why Greek should have actively evolved clusters where the first member is a fricative and the second a stop. As we said in section 3.1, three factors related to robustness work in favor of the fricative+stop sequence over both fricative+fricative and stop+fricative sequences. First, in a pair of sounds, modulation (greater amplitude and spectral change) militates for dissimilation of adjacent stops or adjacent fricatives. Second, cues that unfold over a long period of time (such as frication noise) are more robust than the cues for stops, which lack a flanking vowel in word-initial clusters and occupy the weak VC position elsewhere. Third, in stop1+stop2 clusters, the first stop has a high probability of losing its release burst due to gestural overlap, leaving it with no cues in word-initial position, or weak VC cues in VCCV position. It is therefore more crucial for stops to be next to a vowel, and more specifically in prevocalic (CV) position, to be fully recovered. Fricatives, even lower-intensity ones like [f] and [x], contain internal cues that reduce their need to be in prevocalic position in order to be robustly perceived.

Another point to note is that the clusters involving an initial [s] are more numerous (n = 4) than those involving all the other fricatives combined (one for initial [f], one for initial [x], and none for initial [θ]). This accords with (our improved accounting of) Morelli's data set in section 4. As we said there, sibilants, being more robust in their internal cues due to their high-intensity frication noise, are more able to take a position that is not adjacent to a vowel, on the outer margins of a syllable.

We should point out that when the first consonant is not [s], but rather [f] or [x], the second consonant must be the coronal [t].26 Here, cue theory has little to contribute – though neither does the SSP. It may be that the parent language did not provide inputs with a non-coronal in the correct position, but that of course only moves the explanation back a millennium or so. There may be an articulatory explanation, in that movement of the blade of the tongue is more easily coordinated with other points of articulation than movement of the body of the tongue or the lips is. An anonymous referee suggests that the ability of the tongue tip to move rapidly may result in an anterior coronal articulation producing less hiding of the preceding fricative than another place of articulation would (Jun 2004). Or it may be that restricting the second element to a single place of articulation increases the likelihood that the listener will correctly identify the acoustically far-from-ideal cluster. The observation that coronals are preferred in the second position of a cluster can be extended to clusters where a nasal is the second member.

26. A few lexical items beginning with [ft] have sporadically developed alternative pronunciations with [fk]. For instance, [ftiari] 'shovel' (from Anc. Gk. ptu-on) has the alternative form [fkiari]. Even this sporadic development makes sense when seen from the point of view of cues. [t] and [k] have similar F2 transitions. The labial quality of a preceding [f] lowers the center of gravity of the [t] release burst, making it even more similar to [k]. The F2 transitions thus produced are not appropriate for a labial, so the listener may (mis)perceive [fk], but not [fp].


Eleftheriades (1985) lists initial clusters of [pn], [kn], [θn], [xn], [ɣn] and [mn], but the other nasal, [m], appears as the second member only in one morpheme (tmi- 'section').27

27. [zm] occurs, but [z] is regularly the result of voicing assimilation when it appears as the first member of a cluster, so this case falls under the same generalization permitting /s/ in extra, absolute-initial position and thereby admitting [sp], [st], [sk] and [sf].

Finally, modulation results in a number of OCP-like gaps in the cluster inventory relating to place of articulation. Eleftheriades (1985) lists onset clusters of [ft] and [xt] but not of [θt]. Presumably, the two coronals are too similar to be easily perceived. We have already noted (section 6.1) that Ancient Greek early on eliminated coronal consonants that occurred before [s], another modulation phenomenon. There are also gaps in obstruent+nasal sequences, not discussed above. Greek has [tm] and [pn] but not [tn] or [pm].

We want to be up-front about the inability of cue theory to explain everything about the distribution of Greek clusters. As already mentioned, we think the lack of dissimilation of voiced fricative clusters has a phonological explanation. Happily, when we find a dialect (Cypriot) where the process is postlexical, they too alter so as to become fricative+stop clusters. The restriction of the second-position stop to [t] (and the second-position nasal to [n]), we have said, probably has nothing to do with cue recovery. There are other small glitches in the data. [sf] is quite common as an onset, and we do not know why it has not dissimilated to [sp] in every case. While [sf] is the only non-learned fricative+fricative cluster recognized by Eleftheriades (1985), dictionaries (for instance Mandeson 1961) also list several forms with [sx], such as [sxoli] 'leisure'. Again, we do not know why these clusters persist occasionally in the core vocabulary.

7. Conclusion

The Sonority Sequencing Principle and the Syllable Contact Law have several well-known shortcomings. We have seen some of these illustrated in our treatment of Korean and of Greek. In Korean, some heterosyllabic sequences containing sonorants are altered in ways that happen to improve them from the point of view of a sonority-based theory of optimal syllable contact, but another such sequence, a lateral followed by a nasal, is altered in entirely the wrong direction. A more integrated explanation of all the Korean syllable contact phenomena is available from the standpoint of cue recoverability. Moreover, cue recoverability extends satisfyingly to processes involving changes in place of articulation among heterosyllabic obstruents in Korean, a type of sequence about which sonority sequencing and syllable contact have nothing to say. Similarly, Modern Greek has several processes that aim to produce an onset sequence of fricative + stop from any input sequence of stops and fricatives in any order. Again, sonority sequencing has little to say about such adjustments. Indeed, if stops are less sonorant than fricatives, the SSP makes the wrong predictions about onset obstruent clusters in Greek and in the other languages surveyed in Morelli (1999, 2003).

More generally, we have noted that while the sonority scale that underlies the SSP corresponds roughly to relative intensity and relative duration (e.g. Parker 2002, 2008), these dimensions also factor into robustness and recoverability, and therefore do not distinguish a sonority scale from a perceptual scale. Other phonetic dimensions, such as degree of stricture, are poorly correlated with sonority.28 Moreover, the level of phonological detail in the scale varies from language to language. Most important is the SSP's typological over- and under-generation. While it fails to predict typologically common obstruent sequences, such as those involving sibilant fricatives+stops (such as those discussed in Goad 2011), it must be supplemented with a variety of other constraints to capture other types of common phonotactic restrictions, such as those against glides before high vowels with matching backness features (*wu, *ji), or restrictions on sibilant fricative sequences (*Ss, *Sʃ), or place-specific restrictions like those against /l/ in clusters with coronal stops (*tl, *dl) in languages that otherwise permit coronal stops to cluster with /ɹ/ or with /r/. We argue that the majority of the common exceptions to the Sonority Sequencing Principle are well accounted for by a perceptual sequencing approach that takes into account three factors that relate to recoverability: cue precision, auditory robustness, and gestural robustness.

28. On the other hand, stricture is related to whether or not a segment is a good carrier of transitional cues to flanking segments: coarsely speaking, segments with lower stricture are typically better cue carriers than those with higher degrees of stricture. In this way stricture could be seen as having a complex relationship to sonority.

Finally, the same principles that we argue underlie most phonotactic sequencing constraints also motivate other phonological processes, such as those that motivate Steriade's (2001, 2009) P-map hypothesis, which states that perceptual similarity determines the degree of violation of faithfulness in the mapping of the input to the output.

There are unresolved formal issues in the implementation of the P-map approach. As Flemming (2008) and McCarthy (2009) point out, many of the kinds of markedness repair strategies that motivate the P-map require evaluations of percepts that are not fully knowable given the underlying input. That is, the degree to which a segmental string contains robustly encoded cues is determined by language-specific details of phonetic implementation. For example, Flemming points out that some languages, like English, permit released stops in word-final position, while others, like Korean, mandate unreleased stops. Korean can neutralize place and manner precisely because its unreleased obstruents are perceptually weak – and therefore are much more perceptually similar to one another in place and manner. The phonology cannot evaluate the amount of information lost in neutralizing the Korean segments (or potentially lost, in the case of English segments) until it knows how the winning candidate will be pronounced. While referring to phonetic detail addresses apparent contradictions to perceptual sequencing found in various languages, it introduces the complexities discussed by Flemming and McCarthy into the formalization of perceptual constraints or of perceptual weighting. Also relevant to patterns discussed in this chapter are the phonetics of stop+stop gestural coordination in languages that seem to violate perceptual sequencing constraints. We discuss these next.

Arrernte (Breen and Pensalfini 1999; Tabain, Breen and Butcher 2004), a Pama-Nyungan language of Australia, provides an excellent illustration of how phonetic implementation resolves seeming contradictions to perceptual sequencing constraints. There is strong evidence for an underlying VC syllable structure from a variety of phonological areas in the language, including reduplication, stress assignment, prosodically conditioned allomorphy, and language games. This would seem problematic for cue theory, since we have seen repeatedly that postvocalic position is less optimal for the recovery of consonants than prevocalic position. However, on the surface several processes conspire to ensure that in post-pausal "utterance-initial" position, any underlying initial vowel is deleted or transposed to ensure a phonetic CV word onset. More strikingly, in word-internal consonant clusters, an epenthetic vowel or a transposed vowel is inserted to ensure that there are no CC sequences. Finally, word-final consonants are always strongly released, sometimes with a resultant excrescent vowel. The result is surface forms that have a CVCV structure, ending with word-final consonants that are strongly released. From a perceptual point of view, the resulting phonetic representation of Arrernte (and related languages) is nearly optimal.

Similarly, Tsou (Wright 1996, 1999), an Austronesian language of Taiwan, has an unusually large number of word-initial obstruent clusters, including a large number of stop+stop clusters. Phonetically, in word-initial position, where the C1 cues are encoded exclusively in the stop release burst, gestural overlap is limited so that the C1 stop is always ensured a release burst, even in speeded production tasks; however, in intervocalic position, where preceding vowel formant transitions provide cues to the C1 stop, a greater degree of gestural overlap is permitted, resulting in the more frequent loss of the C1 release burst.


No such word-initial limitation on gestural overlap is observed when the onset clusters are made up of a stop and a fricative (either stop+fricative or fricative+stop). Similar patterns are seen in Georgian (Chitoran, Goldstein and Byrd 2002) and Montana Salish (Flemming, Ladefoged and Thomason 2008). In all these cases, language-specific phonetics lead to the preservation of contrast at the phonetic level. In these languages, phonotactics permit sequences that in other languages are resolved through deletion or neutralization.

How can phonologists resolve the need for P-map constraints, or other functional constraints such as ease of articulation, that relate to details in the output of the phonetics? Several possibilities have been suggested in the literature. The first, discussed in Flemming (2008), is that there is an intermediate stage after the underlying form, the "realized input", in which language-specific phonetic details act as the input to the P-map. McCarthy (2009) proposes something similar, within the derivational version of OT known as harmonic serialism, in which candidates are evaluated at each stage of the derivational process. An alternative to this approach is to use sound change that is motivated by perceptual factors. This latter approach is typified in the work of Ohala (e.g. 1981), Blevins (2004), and Boersma (2009).

By way of conclusion, we would like to discuss some shortcomings of an exclusively perception-based approach. We will take sibilants as an illustrative set: while they are ideal segments from a perceptual point of view, they are nevertheless targeted for changes in phonology. For example, in Korean, as we have noted, word-final /s/ surfaces as an unreleased [t], and in many dialects of Spanish, coda /s/ debuccalizes to [h]. More strikingly, in Tsou (Tung 1964; Wright 1996) clusters of homorganic fricative and stop are not permitted (presumably due to OCP violations): *fp, *st, *pf, *ts. When a prefix containing an /s/ is affixed onto a stem beginning with a /t/, the /s/ is debuccalized, even in word-initial position, creating a word-initial cluster with the glottal fricative followed by the voiceless alveolar stop: [ht]. While underlying /h/+stop clusters are attested in word-initial position in Tsou, the creation of an [ht] cluster from an underlying /st/ cluster is perceptually unfortunate. It would make more sense to resolve the violating cluster by creating [sʉ], which is also attested in the language and which is much better from a perceptual point of view. Clearly perception cannot be the sole or even dominant factor here.


Abbreviations

1         first person
3         third person
Anc.Gk.   Ancient Greek
C         consonant
Cyp.      Cypriot Greek
F         non-sibilant fricative
FUT       future
L         liquid
M.Gk.     Modern Greek
ms        milliseconds
OCP       Obligatory Contour Principle
OT        Optimality Theory
P.Gk.     Postclassical Greek
PASS      passive voice
PRF       perfective
PRS       present tense
PST       past tense
S         sibilant fricative
SG        singular
SSP       Sonority Sequencing Principle
V         vowel
VOT       voice onset time

Acknowledgments

We would like to thank Stuart Davis, Robert Kirchner, Steve Parker, an anonymous referee, and the members of Dr. Parker's Autumn 2011 seminar on sonority at the Graduate Institute of Applied Linguistics. We know we have not convinced Stuart and Steve, but we have found their friendly skepticism invaluable.

Sonority distance vs. sonority dispersion – a typological survey

Steve Parker

Abstract. This study presents a corpus of data from several languages illustrating the typological range of variation among certain kinds of consonant clusters in syllable-initial position. These facts further confirm the relevance and importance of the sonority hierarchy as a theoretical primitive of Universal Grammar. Specifically, I examine the claims of two extant models in accounting for bisegmental onsets that strictly follow the Sonority Sequencing Principle: the Minimum Sonority Distance approach and the Sonority Dispersion Principle. Minimum Sonority Distance is a general tendency by which specific languages may impose a parametric requirement that sonority rise by at least x ranks from C1 to C2 in a syllable-initial consonant cluster (Steriade 1982; Selkirk 1984). Assuming the typical five-category sonority scale (vowel > glide > liquid > nasal > obstruent), sonority distance favors glides as the default (unmarked) class of segments in C2 position since glides are higher in relative sonority than all other consonants. In contrast to this, the Sonority Dispersion Principle posits that in a C1C2V sequence, these three segments should be maximally and evenly dispersed (separated) from each other in terms of sonority, all else being equal (Clements 1990). This results in a preference for liquids rather than glides in C2 position since liquids are halfway between obstruents and vowels in most sonority scales. Nevertheless, a major theoretical gap is that the divergent predictions of these two competing formal devices have never been systematically tested with empirical data from a robust sample of languages. To help remedy this situation I report here the findings of a survey of 122 languages containing onset clusters, designed to shed fresh light on this topic. The results partially validate both generalizations simultaneously: glides are the preferred C2 segments in some languages, while other languages require all syllable-initial clusters to end with a liquid. Therefore, neither the Minimum Sonority Distance model by itself nor the Sonority Dispersion Principle alone can account for all languages exhibiting onset clusters; i.e., neither of them holds true as an absolute statement of markedness concerning preferred sequences of onset consonants in all cases. Furthermore, this typological study also provides evidence for two semi-novel cross-linguistic patterns rooted in sonority; I call these the glide offset continuum and the liquid offset continuum. The former comprises four distinct yet conceptually related types of languages in which C2 is always a glide. In this cluster of glide offset languages permissible natural classes in C1 position consist of either (1) obstruents alone, or (2) obstruents plus nasals, or (3) both of these plus liquids, or (4) all three of these plus glides. The liquid offset continuum is analogous to this except that C2 is always a liquid.


It therefore limits the natural classes of segments in C1 position to either (1) obstruents only, or (2) obstruents plus nasals, or (3) obstruents, nasals, and liquids. Hence some notion of sonority differential between C1 and C2 is still ultimately crucial, with the added twist that the fixed terminus of an onset cluster (C2) can be specified in each language to be either a glide or a liquid. However, because of this latter condition on the quality of C2, classical sonority distance approaches such as Steriade (1982) and Selkirk (1984) are not quite restrictive enough to generate these two language continua without further constraints.

1. Introduction

The goals of this paper are to (1) document the typological range of certain kinds of syllable-initial consonant clusters, and (2) provide an informal descriptive analysis of the corresponding phonological generalizations that characterize such linguistic patterns. For lack of space a more formal account of these facts in terms of Optimality Theory (OT) has to be left for future work. Nevertheless, even in this informal treatment the notion of sonority will be crucial in explaining the observed tendencies. The most frequently cited sonority scale is probably the following (Clements 1990; Kenstowicz 1994; Smolensky 1995b):

(1) Modal sonority hierarchy
    natural class:       vowels > glides > liquids > nasals > obstruents
    abbreviation:        V        G        L         N        O
    sonority index (SI): 5        4        3         2        1
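Since the indices in (1) are used below for simple arithmetic over clusters, it is convenient to restate them programmatically. The following Python snippet is an illustration of mine only, not part of the formal analysis (which, as noted above, is deferred to future work):

    # The scale in (1) as a lookup table.
    SONORITY_INDEX = {"O": 1, "N": 2, "L": 3, "G": 4, "V": 5}

    def sonority_distance(cluster):
        # Sonority differential SI(C2) - SI(C1) for a two-member onset.
        c1, c2 = cluster
        return SONORITY_INDEX[c2] - SONORITY_INDEX[c1]

    assert sonority_distance("OL") == 2   # e.g. the /pl/ of /pla/
    assert sonority_distance("OG") == 3   # the maximal possible rise
    assert sonority_distance("GG") == 0   # a sonority plateau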

The natural class abbreviations in (1) are helpful in referring to the constituents of consonant clusters in a compact way. For example, the syllable /pla/ has an OL (obstruent+liquid) onset. Similarly, the corresponding sonority indices provide a convenient basis for modeling the sonority distances that exist between adjacent segments in mathematical terms. This will be illustrated shortly. The domain in which sonority is most often invoked is the syllable. A common analogy is that the syllable is like a wave of energy (Sievers 1876/1893; Pike 1943). Specifically, syllables tend to universally abide by the following constraint:

(2) Every syllable exhibits exactly one peak of sonority, contained in the nucleus (Parker 2011).

This is known as the Sonority Sequencing Principle (SSP). Key works assuming this principle as a basis for analysis include Hooper (1976), Blevins (1995), and Zec (2007). This generalization underlies a very strong and important cross-linguistic implicature:

(3) (a) In many languages all tautosyllabic consonant clusters obey the SSP.
    (b) In many other languages some tautosyllabic consonant clusters obey the SSP while other clusters violate it.
    (c) However, there is no known language in which all tautosyllabic consonant clusters violate the SSP.
    (d) Therefore, the presence of SSP reversals in a particular language entails the existence in that same language of clusters that satisfy the SSP.

The statement in (3d) appears to be exceptionless (Parker 2011). It also leads to the conclusion that consonant clusters respecting the Sonority Sequencing Principle are universally preferred (natural). For example, one standard criterion for defining markedness is the following: "the frequency with which one sort of situation obtains, as opposed to some other situation, in a large number of widely divergent languages has traditionally been a basis for determining the unmarked status of that situation" (Kenstowicz and Kisseberth 1973: 3). See also de Lacy (2006) and Hume (2011) for summaries of more recent work on markedness.

Among those onset clusters that strictly obey the Sonority Sequencing Principle, two additional subtendencies have been identified. Accordingly, two formal mechanisms have been previously proposed to capture these generalizations. First, languages can impose a Minimum Sonority Distance (MSD) condition on the members of a tautosyllabic consonant cluster. This parameter requires that the two segments be separated by at least x ranks on the sonority scale (Steriade 1982; Selkirk 1984; Zec 2007; Topintzi 2011). For example, a complex onset of the type OL (such as /kr/) involves a sonority distance of 3 − 1 = 2 (cf. (1)). This leads to the following typological prediction:

(4) Assuming the Sonority Indices in (1), the largest possible MSD among onset clusters is 4 − 1 = 3. Therefore, some languages can and should exist in which the only possible syllable-initial clusters contain an obstruent followed by a glide (OG, such as /pj/).

Later sections of this paper confirm that such languages are attested, and are probably even abundant. The second extant formal device computing relative sonority among onset clusters is the Sonority Dispersion Principle (SDP: Clements 1990). This posits that initial demisyllables (onset+nucleus combinations) are optimal when their constituents are maximally dispersed in sonority; e.g., /ta/. However, when the initial demisyllable contains three members (CCV), the Sonority Dispersion Principle prefers these segments to be spaced apart as evenly as possible (in sonority). This results in the best evaluation for complex onsets of the type OL. This is because liquids fall midway between obstruents and vowels in terms of their sonority indices in the five-category scale in (1). Consequently, from the Sonority Dispersion Principle we derive a different prediction:

(5) Some languages can and should exist in which the only possible onset clusters consist of OL (such as /tr/).
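Clements (1990) makes this evaluation concrete by summing the inverse squared sonority distances over all segment pairs in the demisyllable, with lower totals counting as better. The sketch below applies that formula using the indices in (1); it is illustrative only, and it assumes that each demisyllable string is ordered by rising sonority:

    from itertools import combinations

    SI = {"O": 1, "N": 2, "L": 3, "G": 4, "V": 5}

    def dispersion(demisyllable):
        # D = sum over all segment pairs of 1 / d**2, where d is the pair's
        # sonority distance (Clements 1990).
        return sum(1 / (SI[b] - SI[a]) ** 2
                   for a, b in combinations(demisyllable, 2))

    for demi in ("OLV", "OGV", "ONV", "NLV"):
        print(demi, round(dispersion(demi), 3))
    # OLV (≈0.56) beats OGV and ONV (≈1.17 each), which in turn beat NLV
    # (≈1.36): OL is the optimal complex onset, and OG/ON outrank NL --
    # the SDP predictions discussed in section 1 and again in section 3.3.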

As later sections show, (5) is true as well. This raises an important typological question: which kind of onset cluster is universally unmarked, OL or OG? Ultimately this may reduce to a simpler issue: which natural class is preferred in C2 position, glides or liquids? This tension is a recurring theme in this paper.

Our dilemma is this: neither the Minimum Sonority Distance approach nor the Sonority Dispersion Principle is completely and universally true of all languages simultaneously. They cannot be, since they make divergent and contradictory claims. Nevertheless, a basic tenet of Optimality Theory is that constraints are violable. Therefore, we can generate both of these language types (in different grammars) by permuting the crucial rankings of the constraints that encode the relevant generalizations. However, for reasons of space the exact nature of these formal constraints and their potential interactions cannot be pursued here. Rather, the intriguing cross-linguistic variety of phonological systems uncovered by this survey is discussed in descriptive terms only. Nevertheless, I still appeal to the notions of sonority distance and sonority dispersion as a general framework for the informal analysis when appropriate.

Furthermore, two partially new types of language continua rooted in sonority also emerge in this chapter. In addition to the OG-only type of language previewed in (4), there are certain other languages in which the only attested onset clusters consist of OG and NG (nasal+glide). Still other languages allow only OG, NG, and LG (liquid+glide). Finally, at least one language allows all four possible consonant+glide (CG) combinations and no others: OG, NG, LG, and GG (glide+glide). Thus the CG continuum I document here monotonically increases in inclusiveness as it incorporates more marked onset types. That is, the existence of GG in the inventory of a particular language implies LG as well, but not vice-versa. Similarly, LG entails NG, etc. (among the languages in this subset; see (41)). This pattern obviously derives from the sonority scale and the related concept of sonority distance, but at the same time it is subtly unique too. This is because, in addition to a parametric requirement on sonority distance, all of the languages in this continuum exhibit the further condition that C2 must be a glide. Therefore, typical Minimum Sonority Distance approaches, such as Steriade (1982), Selkirk (1984), and Zec (2007), cannot directly accommodate these language types without modification. To illustrate why, assume a hypothetical language with a classical minimum distance setting of 2. This permits the onset cluster types OG and NG but rules out *LG and *GG. However, all else being equal it also generates OL, which in fact is ungrammatical in many of the languages sketched here. The problem with an OL cluster is not in the sonority distance per se (n = 2), but rather in the fact that C2 is a liquid rather than a glide. Now there certainly do exist some languages in which OL clusters are attested along with OG and NG; see Parker (2011) for some examples. Thus traditional minimum sonority distance constraints are independently required. However, at the same time we also need the flexibility to capture languages in which sonority distance is fixed but which also require the end point of the onset cluster (C2) to always be a glide. This is the sense in which a semi-novel CG continuum is motivated by the results presented here.

Similarly, an analogous CL (consonant+liquid) continuum is also posited in this chapter. Languages in this category share two common characteristics: (1) a minimum sonority distance condition between C1 and C2; and (2) a further requirement that C2 must be a liquid. To illustrate, with sonority distance set to 1 we generate a language allowing OL and NL clusters, but no others. Crucially, languages in this continuum systematically prohibit *OG onsets, despite their otherwise optimal sonority distance of 3. The problem with the latter is that C2 is a glide rather than the liquid prescribed by these language types. Obviously, the classical minimum distance approach cannot produce this effect either. What is more, the Sonority Dispersion Principle also encounters difficulties with this kind of pattern. As we will see in section 3.3, Clements' (1990) model predicts that onsets of the type OG and ON are less marked than NL due to how sonority dispersion is calculated. Nevertheless, some of the languages described here permit NL clusters along with OL, yet not *OG nor *ON. Consequently, the Sonority Dispersion Principle by itself is also incapable of accounting for all of the empirical facts summarized in this paper. Hence I also propose a partially new continuum of CL language types.

In summary, the overall purpose of this work is twofold. First, I present a corpus of languages illustrating the range of variation ascribable to these two phonological tendencies – the CG continuum and the competing CL continuum. Second, I provide an informal analysis of the relevant facts in which I appeal to the sonority hierarchy as a unifying explanatory device.

The remainder of this paper is organized as follows. In section 2 I establish some important nomenclature for referring to different types of onset clusters and the segmental positions within them.


In section 3 I present an overview of the results of my typological survey and introduce the CG and CL continua. In section 4 I list the criteria employed in sampling languages for the purposes of this study. In section 5 I present the actual linguistic data documenting the existence of a CG continuum, and in section 6 I do the same for the CL continuum. Finally, in section 7 I present my conclusions.

2. Preliminary issues

In section 3 I summarize the overall results of my typological survey and relate those findings to sonority in a preliminary and informal way. But first I deal with a few terminological matters. Sonority, as used in this paper, can be defined as "a unique type of relative, n-ary (non-binary) feature-like phonological element that potentially categorizes all speech sounds into a hierarchical scale" (Parker 2011: 1160). The most robust correlate of sonority discovered to date in measurable phonetic terms (physiological, acoustic, and/or perceptual) is intensity/loudness (Ladefoged 1975: 219; Parker 2002, 2008; Jany et al. 2007). Clements (2009) suggests that sonority is also related to perceived resonance or relative power, which he equates with a repeated, prolonged, or augmented sound signal. Furthermore, Galves et al. (2002) posit that sonority in a rhythmic sense is inversely proportional to spectral entropy (Nespor, Shukla, and Mehler 2011).

Before examining the typology of consonant clusters in detail, it would be helpful to establish some nomenclature for referring to different positions within onsets. For syllable-initial sequences I employ the following labels:

(6) Assume a syllable beginning with exactly two adjacent consonants, followed by a vowel: σ[C1C2V...]. Assume furthermore that the Sonority Index of C2 is greater than or equal to the SI of C1. In that case, designate C1 the anchor of the cluster, and designate C2 the offset.

For example, in the onset clusters /kl/, /pj/, and /sm/, the anchors are /k/, /p/, and /s/ respectively. Similarly, the offsets are /l/, /j/, and /m/. These terms allow us to discuss phonotactic patterns in a compact way. A precedent for the use of offset to refer to C2 is Nakagawa (2006), cited in Miller (2011).1 By design the definitions in (6) apply to onset clusters only. In syllable codas the relationships are different and potentially more complicated. Hence I do not deal with them here. Furthermore, a condition stipulated in (6) is that the sonority slope from C1 to C2 either rises or is level. The latter is illustrated by a hypothetical syllable such as /mni/. Clements (1990) refers to this type of onset as a sonority plateau. It seems appropriate to still refer to C1 in a plateau as the anchor of the cluster and to C2 as the offset. However, in onset clusters that violate the Sonority Sequencing Principle (such as /lt/), these labels do not apply in the same way. On the one hand it may be more appropriate to refer to C2 (/t/) rather than C1 as the anchor in reversed clusters. On the other hand, it does not seem logical to call C1 (/l/) the offset in this case. Therefore, the definitions proposed in (6) are intentionally restricted to onset clusters that do not reverse the Sonority Sequencing Principle.2

1. For other phonological senses of the term offset, see Murray and Vennemann (1983), Vennemann (1988), Newkirk (1998), Harris (2011), Padgett (2011), Seo (2011), and Wilbur (2011).

2. Other names for C1 and C2 have been proposed elsewhere. See, for example, Pike and Pike (1947), Anderson (1986), Durand (1990), Kaye, Lowenstamm, and Vergnaud (1990), Ewen (1995), Baertsch (1998, 2002), Nakagawa (2006), Harris (2007), Goad (2011), and Miller (2011). The general idea in most of these approaches is that the leftmost element in biconsonantal onset clusters is dominant. That is, C1 is the central part of the cluster while C2 is in some sense dependent on C1. Thanks to Paul de Lacy and John McCarthy (p.c.) for discussion of this point.

More than 100 distinct versions of the sonority hierarchy have been posited in the literature (Parker 2002). There is fairly strong evidence that sonority distinctions can and should be made between different subclasses of obstruents, liquids, and vowels, contra (1). For example, many works claim that fricatives often pattern as higher in sonority than stops (Parker 2002, 2008, 2011; Zec 2007; see also Vaux and Miller 2011). Nevertheless, the five natural classes in (1) are the easiest ones to motivate and the most useful ones to employ. Furthermore, Clements' (1990) Sonority Dispersion Principle crucially relies on this same five-way sonority scale (section 3.3). Consequently, in this paper I limit my analysis to the five categories in (1) to keep things as simple and consistent as possible.

Nevertheless, to digress briefly, an expansion of the sonority hierarchy to more than five classes potentially affects the Sonority Dispersion Principle more than it does minimum sonority distance approaches. The latter just prefer a cluster to be maximally differentiated. Consequently, all that changes (for them) with a more detailed scale is that the quantity of predicted language types increases. With the Sonority Dispersion Principle, however, the relative sonority distances between all natural classes on the scale are simultaneously crucial. Therefore, if we make a distinction between laterals and rhotics in the middle of the hierarchy, for instance, this could significantly impact (and possibly undermine) the results. Along these lines, Al-Ahmadi Al-Harbi (2002) proposes an interesting adaptation of the Sonority Dispersion Principle to Acehnese in which fricatives outrank stops (cf. section 6.2.1). Nevertheless, for lack of space this issue cannot be pursued here in depth, but see section 7 for a brief follow-up comment.


Returning to the fivefold scale in (1), in complex onsets the four categories of consonants can be logically combined into bisegmental clusters in 16 possible ways:

first consonant nasal liquid NO (–1) LO (–2) NN (0) LN (–1) NL (1) LL (0) NG (2) LG (1)

glide GO (–3) GN (–2) GL (–1) GG (0)

As Table 1 graphically shows, the 16 clusters can be grouped into three major blocks: (7) (a)

(b)

(c)

sonority rises, 6 clusters (enclosed in dark bold borders): ON, OL, OG, NL, NG, LG [Sonority Differential > 0] sonority equal, 4 clusters (diagonally from top left to bottom right): OO, NN, LL, GG [Sonority Differential = 0] sonority falls, 6 clusters (in shaded cells): NO, LO, LN, GO, GN, GL [Sonority Differential < 0]

In (7a) I list the six onset types in which sonority rises. In these the relative sonority differential between the offset consonant and the anchor is at least 1. Let us refer to these as core clusters, following Al-Ahmadi Al-Harbi (2002). They are clearly less marked cross-linguistically than those listed in (7b) and (7c). I call these latter two groups the noncore clusters. This paper focuses on languages having different combinations of the six core types. As (3a) notes, many languages restrict all of their onset clusters to a subset of these six. One language having all six of these in phonetic forms and no others is Abau (Bailey 1975; Lock 2007).

Sonority distance vs. sonority dispersion – a typological survey

109

In (7b) I list the four types of plateau clusters. In these, sonority neither rises nor falls (Sonority Differential = 0). I know of no language in which all permissible complex onsets are restricted to these only. However, there are many languages which allow one or more of these along with certain core clusters: Hixkaryána (Derbyshire 1979, 1985), Muniche (Gibson 1988, 1996), Sobei (Sterner 1975; Sterner and Ross 2002), and Teribe (Koontz and Anderson 1974; Oakes 2001). None of these four languages has any of the reversed clusters from (7c). Therefore, given the statements in (3), we can conclude that sonority plateaus are less marked than sonority reversals (Kreitman 2006). The latter six types are listed in (7c). Their sonority differential values are less than or equal to –1. They are thus the mirror image or inverse of the six core types. In Table 1 their cells are shaded to highlight their irregular status. In this chapter I ignore them completely.3 To summarize, we can posit the following universal markedness scale for complex onsets, where “ ” is used in the optimality sense of inherent harmony (Prince and Smolensky 1993/2004). This scale has been confirmed by psycholinguistic experiments with nonce forms in English and Russian (Berent et al. 2007; Daland et al. 2011). (8)

Cross-linguistic markedness hierarchy for syllable-initial consonant clusters: sonority rise (obey the SSP) sonority plateau sonority fall (SSP reversal)

In short, this paper deals primarily with biconsonantal onsets that strictly fulfill the Sonority Sequencing Principle – the six core types. Sonority plateaus are also included in the analysis when relevant. However, for reasons of space and simplicity four categories of clusters are entirely avoided here. These are (1) geminates; (2) syllable codas; (3) sonority reversals in onset position, such as /nt/ and potentially /sp/ (if fricatives are more sonorous than stops); and (4) onsets containing more than two consonants, such as hypothetical /plj/, /smw/, etc. The latter are rare anyway. All four of these phenomena involve additional complications that would take us too far afield of our main focus.4

3. At the phonemic level Santa María Quiegolani Zapotec has 15 of the 16 onset types from Table 1 – all except NN (Regnier 1993; Black 1995). 4. For further analysis and discussion of SSP-reversing clusters, see e.g. Parker (2002, 2011), Kreitman (2006), Goad (2011), and Henke, Kaisse, and Wright (this volume).

110 3.

Steve Parker

Overview of results

In this section I preview the findings of my typological survey of onset clusters. During 2010 I browsed through hundreds of books and articles that posit phonological descriptions and analyses of specific languages. All of these reside in the library of the Graduate Institute of Applied Linguistics in Dallas. Among these works I identified 122 languages permitting syllable-initial consonant clusters. For each of these 122 languages I noted what types of onset clusters are attested, following Table 1. These languages form the basis of my typological claims here. For lack of space I cannot directly include all of them in this paper. Nevertheless, I do take them into account in my phonological generalizations. Consequently, none of these languages contradicts any of the conclusions posited in this study.5

5. Anyone interested in the full corpus can write to me at [email protected]. I have not cleaned this database up yet, so it is still in rough shape. That is why I have not posted it on the internet.

3.1. Minimum Sonority Distance

In this section I briefly discuss previous proposals employing a minimum sonority distance approach, such as Steriade (1982), Selkirk (1984), and Zec (2007). These are alluded to in section 1. I refer to these here as general, traditional, typical, or classical sonority distance models. By this I mean that they do not necessarily require C2 to be a glide or liquid, unlike the CG and CL continua introduced in section 1. The following formal definition is adapted from Parker (2011):

(9) Minimal Sonority Distance (MSD)def = Assuming the Sonority Indices in (1), and given an onset composed of two segments, C1 and C2, if a = SI(C1) and b = SI(C2), and if a ≤ b, then the language-specific MSD = x such that b − a ≥ x, where x ∈ {0, 1, 2, 3}.
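Definition (9) translates directly into code. The toy check below, again purely an illustration of mine, also makes concrete the typological implicature discussed next: raising x monotonically shrinks the set of licensed clusters.

    SI = {"O": 1, "N": 2, "L": 3, "G": 4}

    def satisfies_msd(cluster, msd):
        # True iff SI(C2) - SI(C1) >= msd, per definition (9); clusters with
        # falling sonority lie outside the definition and simply fail here.
        c1, c2 = cluster
        return SI[c2] - SI[c1] >= msd

    NON_FALLING = ["ON", "OL", "OG", "NL", "NG", "LG", "OO", "NN", "LL", "GG"]

    for msd in range(4):
        print(f"MSD = {msd}:", [c for c in NON_FALLING if satisfies_msd(c, msd)])
    # MSD = 0 licenses all ten rising and plateau types (approximated by Leti,
    # discussed below); MSD = 3 licenses only OG, the prediction in (4).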

The typological implicature of (9) is as follows. If a language permits biconsonantal onset clusters having a certain sonority distance, then all cluster types involving the same or greater sonority distance are also attested in that language, ceteris paribus. For example, with a sonority distance setting of 0, ten different combinations of onset clusters are possible. These are the six core (rising sonority) clusters from (7a) plus the four plateau clusters from (7b). One language approximating this prediction is Leti (van Engelenhoven 2004). It has nine of the ten types – all except GG, which can reasonably be considered the most marked type cross-linguistically (except for actual sonority reversals). Several other languages amenable to a classical minimum sonority distance analysis are listed and discussed in Zec (2007) and Parker (2011). Furthermore, many of the 122 languages examined in my survey also fall into this category. These are not discussed in detail here since they do not necessarily require all onset clusters to end with either a glide or a liquid. Consequently, I do not deal with this more generalized typology at length in this paper.6

6. Among the 122 languages in my database, 20 allow obstruent offset clusters and 20 permit nasal offset clusters. Most of these languages in fact attest both types simultaneously (CO and CN) and thus occur in both sets of 20 (n = 17). That is, I found only three languages that have CO but not *CN, and three others with CN but not *CO.

However, it does bear mentioning that Zec (2007) limits her treatment of Minimum Sonority Distance phenomena to the three sonority classes obstruent, nasal, and liquid. That is, she intentionally avoids referring to CG clusters completely. She explained (p.c.) that this is due to the inherent difficulty of interpreting glide offset clusters in languages where no other class of consonants occurs in C2 position. In section 5.1 I confront this issue (interpreting CG clusters) in more detail. I argue there that strong reasons can be found in some languages for positing CG sequences as the only true cluster types allowed. Therefore, one of the contributions of this paper is to extend Zec's (2007) partial typology of sonority distance effects by adding glide offset languages to the mix. The work of Baertsch (1998, 2002) should also be consulted for further empirical facts. She proposes an optimality theoretic model for analyzing typical minimum sonority distance effects using local conjunction of constraints. See also the paper by Smith and Moreton in this volume.

3.2. Introduction to the glide offset continuum

As noted in section 1, a quasi-new type of language hierarchy is posited in this paper. I call this the glide offset continuum. All of the languages in this category restrict C2 to one or more glides. They differ, however, in what type(s) of anchors they allow in C1 position. Some permit only obstruents. Others add nasals to this, while still others include liquids too. Finally, some languages attest all four categories of anchors, including glides. I refer to these glide offset clusters schematically with the abbreviation CG, where C = any consonant.7 The following table summarizes the relevant facts:

7. A more literal abbreviation of glide offset would obviously be GO. However, the latter already exists to refer to glide+obstruent clusters. Therefore, I avoid GO here and use CG instead to refer to any biconsonantal onset cluster in which the offset is a glide, such as [pwV] or [ljV].

112

Steve Parker

Table 2. The glide offset (CG) continuum of language types, where MSD = minimum sonority distance (the numbers in parentheses indicate the sonority distance (SD) between C2 and C1)

row 1: language type           A (GG)      B (LG)      C (NG)      D (OG)
row 2: permissible (✓) and non-permissible (*) onset clusters
       ON (1)                    *           *           *           *
       NL (1)                    *           *           *           *
       OL (2)                    *           *           *           *
       GG (0)                    ✓           *           *           *
       LG (1)                    ✓           ✓           *           *
       NG (2)                    ✓           ✓           ✓           *
       OG (3)                    ✓           ✓           ✓           ✓
row 3: phonological
       generalization          MSD/CG=0    MSD/CG=1    MSD/CG=2    MSD/CG=3
row 4: name of language(s)     Shilluk     Ga'dang     Angaataha   Bambassi
                                           Kham        Pame        Dadibi
                                           Koonzime

In the first row of Table 2 I assign each type of language a distinct alphabetical code (A, B, etc.). This facilitates the comparative discussion. After each of the letters in these four cells (A, B, C, and D) I summarize the language type in that column by indicating the most marked glide offset cluster permitted in that category of language. This corresponds to the CG cluster involving the lowest sonority distance. Row 2 displays the attested and non-attested onset cluster types present in each kind of language, drawing from the list in Table 1. These are indicated by ✓ and *, respectively. I include here seven specific combinations in each case: the six core (rising sonority) clusters from Table 1, plus one plateau – GG. The latter is necessary because it occurs in Shilluk (a type A language). At the top of row 2 I list the three core clusters that do not end with a glide (ON, NL, OL). These are included for completeness only. None of them occurs in any of the languages mentioned in this table, with one minor exception (Pame; see section 5.4.2). This is because Table 2 displays glide offset languages only. Underneath these three non-CG cluster types I list (from top to bottom) the remaining four (CG) clusters, in order of increasing sonority distance between C2 and C1. Thus OG is at the bottom because it involves the maximal sonority distance (4 − 1 = 3). The critical cut-off point between permissible and non-permissible clusters for each language type can be read directly off the ✓ and * marks. As the display proceeds from left to right, the location of this threshold boundary decreases across columns A–D of row 2. The corresponding typological generalization is as follows. In the CG continuum (in which the offset consonant must be a glide), the occurrence of a higher sonority anchor in the inventory of cluster types entails the inclusion of all natural classes of anchors with a lower
sonority index. The converse of this is not true. The phonological motivation for this continuum is plausibly a pressure to maximize the perceptual contrast in sonority between the anchor and the offset. It is thus analogous to the more general Minimum Sonority Distance parameter mentioned in section 3.1. It differs, however, in requiring C2 to always be a glide. Classical minimum distance constraints do not necessarily include this second condition (C2 = glide) and therefore cannot produce language types A, B, and C in this table. To illustrate, suppose we attempt to capture type B languages with a generic minimum sonority distance setting of 1. This correctly permits the three attested clusters LG, NG, and OG. However, it overgenerates by also allowing the three ungrammatical clusters *ON, *NL, and *OL. The crucial distinction here is that the latter three types do not end with a glide, even though their sonority differential is large enough. Consequently, the typical minimum distance approach is unable to correctly account for this glide offset continuum by itself because it is not sufficiently restrictive.

Language type D, on the other hand, sets this minimum sonority distance between the anchor and the offset as high as it can go, namely to 3. It achieves this by fixing the sonority index of the anchor at the low end of the scale (n = 1) and prescribing an offset as high in sonority as possible (n = 4). It thus limits all onset clusters to the type OG (obstruent+glide). It therefore confirms and exemplifies the prediction in (4). In fact, this OG cluster is the only one that necessarily occurs in all four language types posited in Table 2. It thus has a special status as the maximally unmarked complex onset universally (modulo the Sonority Dispersion Principle, to be discussed next). Given this, I propose a new technical term to refer specifically to this OG type of onset sequence: maximal distance cluster. This language type (D) is the only one in Table 2 that classical minimum sonority distance constraints can adequately generate.

In row 3 of Table 2 I list the corresponding phonological "generalization" that encapsulates each language type. These labels are abbreviated here as "MSD/CG = x". In this context MSD still refers to the language-specific minimum sonority distance parameter, analogously to (9). However, in this case the respective onset inventories are a priori limited to glide offset clusters only. This is indicated by the subcategorization /CG. This condition is added to distinguish these language types from the more general minimum distance continuum by restricting the former to systems in which the sonority index of C2 must equal 4 (a glide). However, these parametric statements are not intended to represent the formal optimality theoretic constraints that actually encode the relevant phonological patterns. Rather, they are employed here as informal and somewhat brute force abbreviatory devices that simply describe the corresponding language types. As such they can be thought of as preliminary cover terms for the family of OT constraints that may eventually be posited in future work. At the very least, though, one would hope that the official constraints that "replace" these statements are not this ad hoc.

In row 4 of Table 2 I list the names of individual languages that exemplify the four CG types: A, B, C, and D. These are displayed in alphabetical order from top to bottom within each cell. In section 5 I discuss these specific languages one-by-one. There I note certain sociolinguistic details such as genetic affiliation, number of speakers, and geographic location. I also present actual data examples from primary sources to substantiate the attested onset cluster types.

3.3. Introduction to the liquid offset continuum

I now return to the Sonority Dispersion Principle mentioned in sections 1 and 2. Recall that this predicts the existence of languages having the cluster type OL, and no others (see (5)). The reason for this is the way in which sonority dispersion is calculated. Clements (1990, 1992) proposes a model of phonotactics that is also rooted in sonority. In this approach, syllables are divided into two partially overlapping constituents called demisyllables (a term borrowed from Fujimura and Lovins 1978). The initial demisyllable contains the onset + nucleus. The final demisyllable comprises the nucleus + coda (the rhyme). For example, the word /plænt/ consists of the demisyllables /plæ/ and /ænt/. As previewed in sections 1 and 2, the Sonority Dispersion Principle prefers initial demisyllables whose segments are maximally and equally dispersed in sonority. More precisely, initial demisyllables containing the same number of segments are more harmonic when they minimize a value called D, defined in (10) below. The formula for D is anticipated in the work of Hooper (1976) and Vennemann (1988) on sonority and syllable structure.

(10) Sonority Dispersion Principle

    D = \sum_{i=1}^{m} \frac{1}{d_i^2}

where d_i = the distance between the sonority indices of each pair of segments, and m = the number of pairs of segments (including nonadjacent ones), where m = n(n − 1)/2, and where n = the number of segments.

Clements (1990: 304) paraphrases (10) as follows: "D ... varies according to the sum of the inverse of the squared values of the sonority distances between the members of each pair of segments within" a demisyllable. In calculating values of D he crucially assumes the five-category scale in (1), with its corresponding sonority indices. Given this, D yields the following values of relative markedness among initial demisyllables containing exactly two consonants. In this model, demisyllable types reversing the Sonority Sequencing Principle are intentionally ignored.

(11) Sonority Dispersion values of D for initial CCV demisyllables (based on (10))

    OLV = 0.56    most natural
    OGV = 1.17
    ONV = 1.17
    NGV = 1.36
    NLV = 1.36
    LGV = 2.25    least natural
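
To make the computation behind (11) concrete, the following short Python sketch (my own illustration, not part of Clements' original proposal) implements the formula in (10) and reproduces the six values just listed. The segment labels and sonority indices follow the five-category scale assumed in the text; note that D is undefined for demisyllables containing a sonority plateau (d = 0), which is consistent with the exclusion of such types from (11).

```python
# Sketch: compute Clements' dispersion value D for a demisyllable.
# Sonority indices: obstruent = 1, nasal = 2, liquid = 3, glide = 4, vowel = 5.
from itertools import combinations

SONORITY = {"O": 1, "N": 2, "L": 3, "G": 4, "V": 5}

def dispersion(demisyllable: str) -> float:
    """Sum 1/d^2 over every pair of segments (adjacent or not), where d is
    the difference between the two segments' sonority indices."""
    indices = [SONORITY[segment] for segment in demisyllable]
    return sum(1 / (a - b) ** 2 for a, b in combinations(indices, 2))

for demi in ("OLV", "OGV", "ONV", "NGV", "NLV", "LGV"):
    print(demi, round(dispersion(demi), 2))
# Output matches (11): OLV 0.56, OGV 1.17, ONV 1.17, NGV 1.36, NLV 1.36, LGV 2.25
```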

Given the definition of D in (10), what is important in CCV clusters is that the total of the sonority distances between all three pairs of segments be simultaneously maximized. This leads to a preference for OLV demisyllables because their three constituents are evenly spaced apart (in sonority). In other words, the sonority rank of liquids (3) is precisely halfway between that of obstruents (1) and that of vowels (5). Thus initial demisyllables of this type have the lowest D value in (11), viz. 0.56. The Sonority Dispersion Principle therefore claims that onset clusters consisting of an obstruent followed by a liquid are inherently optimal (unmarked) relative to all other types. This leads to the prediction in (5): some languages should contain this kind of cluster and no others. The following table lists three such languages. It also includes three other languages allowing just OL and NL clusters. I call both of these groups liquid offset languages. Together they challenge my earlier claim that glides are inherently preferred in C2 position. This point is further discussed later in this section.8

8. The caption of Table 3 contains the abbreviation CL. This should not be confused with the process of compensatory lengthening, for which it often stands. Rather, in this paper CL signifies onset clusters in which C2 is any liquid, such as [kRV] or [mlV]. It therefore refers to liquid offset clusters and the corresponding languages that attest only such types. It is thus analogous to CG, which in this context stands for glide offset clusters and languages.

Table 3. The liquid offset (CL) continuum of language types, where MSD = minimum sonority distance (the numbers in parentheses indicate the sonority distance (SD) between C2 and C1)

row 1: language type           E (NL)      F (OL)
row 2: permissible (✓) and non-permissible (*) onset clusters
       ON (1)                    *           *
       LG (1)                    *           *
       NG (2)                    *           *
       OG (3)                    *           *
       NL (1)                    ✓           *
       OL (2)                    ✓           ✓
row 3: phonological
       generalization          MSD/CL=1    MSD/CL=2
row 4: name of languages       Aceh        Catío
                               Garo        Tshangla
                               Isirawa     Wa

In row 2 of Table 3 I include only the six core (rising sonority) clusters from Table 1. This is because to date I have not discovered any CL languages permitting sonority plateaus (like LL, etc.). This is probably an accidental gap. These six core types are arranged vertically from top to bottom in order of increasing sonority distance between C2 and C1. However, the two liquid offset clusters NL and OL are purposely placed below the other four types. This is unlike Table 2. The reason for this is that these two language types (E and F) share a preference for liquid offsets, combined with a positive sonority differential. The corresponding generalization is that the presence of NL in an inventory implies the inclusion of OL as well, but not vice-versa. This is true of all languages, not just CL types. See the discussion of Greenberg's (1978) universals in section 3.4. In section 6 I describe the six languages in Table 3 individually and in more detail.

A third predicted language type that would also conceivably fit in Table 3 involves a minimum sonority distance of 0. The corresponding phonological generalization would be expressed as MSD/CL = 0. This hypothetical system would thus permit all three types of liquid offset clusters – OL, NL, and LL, the last of which is a plateau. Such a language would then be analogous to type A in the glide offset continuum (Table 2). This would round out the theoretically expected typology in a satisfying way. To date I have not discovered a language fitting this description. This is not surprising since LL is the most marked kind of liquid offset cluster possible (Sonority Distance = 0). The lack of an example of this type of language is considered here to be accidental. From a phonological point of view there is no reason why it could not exist. It is at least expected to be possible if the sonority account posited here is on the right track. See section 6.4 for further discussion.

Returning to the six attested CL languages in Table 3, we have thus identified a second type of syllable-initial cluster that patterns as maximally unmarked cross-linguistically: OL. I therefore propose another technical term for this type of preferred complex onset (OL): equal distance(s) cluster. However, there is a concomitant problem with the Sonority Dispersion Principle. As (11) illustrates, the formula in (10) evaluates ON onsets as better than NL, ceteris paribus. Nevertheless, as previewed in section 1 and confirmed in Table 3, at least three languages attest NL clusters while disallowing *ON.9 Consequently, this particular prediction of the Sonority Dispersion Principle is incorrect as an absolute universal hypothesis. Therefore, Clements' (1990) proposal cannot account for certain of the empirical facts documented here, even among liquid offset languages. Furthermore, as foreshadowed in section 1 and illustrated in section 3.2, the existence of many CG-only languages also contradicts the claims of the Sonority Dispersion Principle. Hence this model by itself cannot account for all onset cluster patterns, just as classical minimum sonority distance approaches cannot. At the same time, whatever constraint system is ultimately posited to capture the glide offset continuum in Table 2 will not be able to generate the two attested types of CL languages in Table 3 without further enrichment (additional constraints). To keep matters simple I continue to refer to these two competing tendencies informally as the glide offset continuum and the liquid offset continuum. Nevertheless, see section 7, as well as Parker (2002, 2011), for more discussion.

9. One language in which the opposite is true is English: ON is good yet *NL is bad.

3.4. Summary of section 3

In conclusion, there are two different types of unmarked onset clusters cross-linguistically: OG and OL. Hence it is natural that many languages attest both of these sequences simultaneously. In fact, some languages allow precisely these two cluster types and no others. Three such examples are Boikin (Freudenburg and Freudenburg 1974, 1994; Hemmilä 1998), Bukiyip (Conrad 1992; Conrad and Wogiga 1991; Hemmilä 1998), and Urim (Luoma 1985). To summarize, based on all of the languages mentioned in this section (including those in my survey), we can now posit two broad and significant phonological generalizations that have no exceptions, as far as I am aware:

(12) (a) All languages with complex onsets allow some clusters with obstruent anchors.
     (b) All languages with complex onsets allow some clusters with either liquid or glide offsets.

The positive universal statements in (12) can be supplemented with other typological claims that are negative in scope:

(13) (a) Not all languages with complex onsets allow clusters with nasal anchors.
     (b) Not all languages with complex onsets allow clusters with liquid anchors.
     (c) Not all languages with complex onsets allow clusters with nasal offsets.
     (d) Not all languages with complex onsets allow clusters with obstruent offsets.
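
Stated as predicates over a language's inventory of core cluster types, the exceptionless universals in (12) are easy to check mechanically. The sketch below is illustrative only (the inventories are those given in Tables 2 and 3), and it returns True for every language surveyed here.

```python
# Sketch: the exceptionless generalizations in (12) as inventory checks.
def obeys_12a(onsets: set[str]) -> bool:
    return any(cluster[0] == "O" for cluster in onsets)   # some obstruent anchor

def obeys_12b(onsets: set[str]) -> bool:
    return any(cluster[1] in "LG" for cluster in onsets)  # some liquid/glide offset

for name, inventory in [("Shilluk", {"OG", "NG", "LG", "GG"}),
                        ("Aceh", {"OL", "NL"}),
                        ("Dadibi", {"OG"})]:
    print(name, obeys_12a(inventory) and obeys_12b(inventory))  # True each time
```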

As argued above, the six hypotheses in (12) and (13) are firmly rooted in sonority as an explanatory feature. In other words, they are not random in nature, but principled. Some of the generalizations posited here confirm claims made by Greenberg (1978) concerning the composition of initial (onset) clusters. For example, his hypothesis #33 states, "In initial systems the existence of at least one cluster consisting of nasal + liquid implies the existence of at least one cluster consisting of obstruent + liquid" (p. 266). This is illustrated by the three type E languages in Table 3, and is not falsified by any of my other findings.

4. Documenting the facts – preliminary matters

In sections 5 and 6 I present further descriptive facts pertaining to many of the languages just sketched. But first I list and explain the criteria used in evaluating the appropriateness of certain data for inclusion in this study. It is important to ensure that the onset clusters analyzed in this paper are truly canonical, productive sequences in the respective languages. Consequently, all of the cases cited below were screened to fulfill several a priori conditions. Specifically, the onset clusters reported here all consistently exhibit ten characteristics: (1) They appear in core vocabulary items, not just in marginal lexical categories such as loanwords, proper names, ideophones, etc. Similarly, forms noted as occurring only in non-standard or restricted dialects are excluded from this corpus, or at least noted as such. (2) As explained in section 2, clusters violating the Sonority Sequencing Principle are mostly ignored in this paper. (3) Triconsonantal onset clusters such as /splV/, /tRjV/, etc. are also excluded. (4) Ideally, the clusters in question are not restricted to word-initial position only. Rather, they should also be attested word-medially (when the language permits that). (5) Furthermore, word-medial onset clusters must be unambiguously syllable-initial. One way to ensure this is if they are clearly preceded by a coda consonant. Otherwise, when they are intervocalic (VC1C2V), the syllable break must fall before C1, without any doubt. This needs to be confirmed by independent, language-specific evidence.
For example, if C1 can never be a legitimate coda consonant, then it must belong together with C2 in the following onset. (6) The surface cluster must not be derived from an underlying sequence like /#CVCV/ via deletion of the first vowel, especially when this process (syncope) is optional and/or gradient. Similarly, biconsonantal clusters that can always be broken up by an intrusive or transitional vocoid are considered here to be noncanonical onsets. (7) The source(s) of data should provide an exhaustive list of every specific core onset cluster that occurs, such as /pl/, /bl/, /kl/, etc. On the other hand, in reporting the data below I sometimes omit minor phonetic details (such as allophonic variation) when this is not crucial to the analysis. (8) For languages categorized here as liquid offset (CL) only, there must be at least one phonemic glide occurring in simple (CV) onsets. Otherwise the lack of glide offsets could be independently explained by a more general constraint against glides as contrastive segments in the inventory of the language as a whole. (9) Similarly, in order for a language to qualify (for our purposes) as glide offset only, it must contain at least one liquid phoneme in simple (CV) onsets. (10) Finally, for CG-only languages, the source(s) must provide explicit motivation (argumentation) for interpreting such sequences as true biconsonantal onsets. Otherwise a phonetic sequence such as [kwa] could be reanalyzed in an alternative way that does not involve an onset cluster. See sections 5.1 and 6.1 for further discussion of this issue (interpreting glide offset sequences).

With respect to the sources of data presented below, certain additional characteristics are also desirable in principle. In an ideal world all of the references cited in a study of this type should display the following features: (1) They correspond to a primary source, i.e., a linguist who has directly elicited the data from live speakers of the language. (2) The coverage should be relatively comprehensive, e.g., based on years of contact with the language community, rather than just a preliminary sketch. (3) The data are formally published in some way, rather than just representing a work in progress. And (4) if more than one authoritative source on the language exists, these should all be in agreement about the relevant facts.

All of the criteria listed above are theoretical desiderata for any survey of this type. In actual practice, however, not all of them are always fulfilled, even in this paper. In the presentation of data below I note any exceptions to the conditions enumerated in this section. Furthermore, there are many other potential CG-only or CL-only languages that my survey identified but which are not included in this analysis due to one of the problems noted in this section. See Appendices A and B for a list of such languages. It may well be worth the effort for someone else to follow up on these leads. See also sections 5.6 and 6.4 for further discussion.

5. Documentation of glide offset (CG) languages

In this section I present data confirming the existence of CG-only languages. The order in which these languages are discussed follows that of row 4 in Table 2. But first I deal with a preliminary background issue that is potentially very important.

5.1. The problem of interpretation

A major complication in glide offset languages is the issue of interpretation (see section 4). That is, a phonetically-transcribed sequence such as [pja] can correspond to several different phonological representations, all else being equal (Pike 1947). This is the thorny problem also known in classical phonemics as resegmentation (Robinson 1970; Rensch 1982). Glide offset sequences a priori can potentially be parsed in at least three different ways:

(14) Possible phonological representations of a phonetic CGV sequence
(a) [pja]: The glide element is in the onset and forms a true bisegmental cluster together with the previous consonant ([C1C2V]).
(b) [pʲa]: The glide element is in the onset and forms a single complex (monosegmental) phonemic unit together with the previous consonant ([C1V]) – in this case a palatalized stop. This is often called a secondary articulation or offglide.
(c) [pj͡a]∼[pi̯a]: The glide element is in the nucleus and forms a vowel cluster or diphthong together with the following vowel segment ([C1V1V2]).

The analyses in (14) summarize the three primary interpretations of glide offset sequences. Variations and subtypes of these also exist, differing according to the theoretical and representational assumptions of the analyst.10

10. For example, another partially distinct interpretation of the glide is as the surface manifestation of a prosodic feature at the level of the syllable, morpheme, or word: [PC1V1], where P = prosody. One such approach is Firth's (1948, 1957) prosodic framework, which has few adherents currently. In this model a prosody is posited when a feature's phonetic distribution is predictable. For example, there are two situations when this is appropriate: (1) the feature in question is limited to only one occurrence per prosodic domain, and/or (2) the feature appears to equally affect all segments in the syllable, morpheme, or word. In more recent models these types of patterns are normally encoded by autosegmental and feature geometry representations.

In this paper all glide offset clusters ex hypothesi correspond exclusively to situation (14a). That is, they are claimed by the sources to represent true sequences in which the glide is explicitly in the onset, not in the nucleus. Furthermore, the glide crucially patterns as a separate consonantal segment distinct from the preceding anchor element, not as a secondary articulation together with it. This is the conclusion shared by all of the authors cited in section 5. What is more, based on their arguments and evidence, as listed in the detailed discussion of each language below, this onset cluster analysis is arguably the most appropriate interpretation in all of these cases, as far as I can tell.

Parenthetically, the [CVV] interpretation in (14c) may involve a different kind of issue than the question of two onset consonants (14a) vs. one onset consonant (14b). I illustrate this with a brief look at a fragment of tagmemic theory (Pike 1947), one model of classical phonemics. In this approach the issue of whether a high front vocoid in a sequence that sounds like [CjV] is a consonant ([j]) or a vowel ([i]) is a matter of phonetic resegmentation. In other words, it possibly involves a relatively low-level phonological rule. The criterion typically appealed to in support of this kind of analysis is canonical syllable structure. For example, Pike's fourth premise states, "Characteristic sequences of sounds exert structural pressure on the phonemic interpretation of suspicious segments or suspicious sequences of segments" (Pike 1947: 60). That is, the clear, unambiguous phonotactic patterns observed in a specific language establish a precedent that should guide us in analyzing more difficult cases. The resulting conclusion can then affect phonetic transcriptions retroactively. For example, a sequence originally "heard" as [Cja] might be retranscribed as [Cia] after interpretation. In my opinion this type of adjustment is dubious. It is true that in some languages the phonetic sequence [Cja] is best derived from underlying /Cia/ (rather than /Cja/) via a process of demorafication. However, the question of its precise surface status should be determined in principle from phonetic criteria alone. That is, in theory the sequences [Cja] and [Cia] normally sound different. Furthermore, acoustically [Cia] should exhibit an appreciable and fairly steady-state high front vocalic segment of moderate duration between the onset consonant and the more nuclear [a]. In actual practice, however, the segmental boundaries of glide-like approximants are notoriously difficult to determine (Ladefoged 2003). This unfortunately complicates the matter.

Returning to the cluster interpretation in (14a), there is an inherent problem when CG is the only possible onset sequence observed in a particular language: the danger of circular reasoning. To illustrate, suppose a language has no canonical (unambiguous) onset clusters. The best exemplar of these would be an obstruent followed by a heterorganic liquid: [pl], [kR], etc. In such a case the analyst faces a dilemma when confronted with CGV sequences.
One option is to simply posit a new and unprecedented syllable type with complex onsets: [C1C2V]. However, if this lacks independent phonological support, it is a brute force move. Its advantage, on the other hand, is that it minimizes the inventory of individual consonant segments (as in 14b) and/or complex vowel sequences (14c). A second option is to economize on the inventory of syllable types in the language by interpreting glides as secondary modifications of the preceding consonant (labialization, etc.), or by assigning them to the nucleus. This has the advantage of leaving the canonical syllable structure intact (unmodified). Its cost, however, is that it can greatly increase the inventory of contrastive segmental "phonemes" – consonant units occurring in the onset position of simple (CV) syllables, as in (14b). In this kind of situation, then, we may have to choose a more or less arbitrary representation for such sequences without compelling evidence either way. For further general discussion of this issue and its implications, see Suárez and Suárez (1961), Bendor-Samuel (1965), Bendor-Samuel (1966), Bearth and Zemp (1967), Lloyd and Healey (1970), Pet (1979), Fiore (1987), Hofmann (1990), Vissering (1993), Fried (2000), Parker (2002, 2011), Beachy (2005), Olson (2005), and Ahland (2009).

Perusal of the references above suggests that the question of a [CG] cluster vs. a [Cᴳ] monosegmental unit is often an unsolved problem. On the one hand it is possible to cite various criteria posited in the literature for deciding this issue (see (15) and (16) below). On the other hand there is very little consensus about the criteria themselves, much less their relative importance or weight. Furthermore, because of this uncertainty there are disagreements about the best interpretation of CG sequences in many specific languages. As a result, it may not be possible to conclusively establish a single correct analysis for all cases. Rather, in such languages, the decision may involve a more-or-less subjective judgment based on fairly subtle evidence. Consequently, it may be impossible to refute someone who insists on an alternative analysis. For example, in Baertsch's (1998) model the glide in CGV sequences is automatically assigned to the nucleus in the default case, for representational reasons. Similarly, when a language allows several kinds of anchor consonants but only glides in the offset, some researchers take this as evidence that the glides are part of a nuclear diphthong rather than an onset cluster (Duanmu 1990; Kim 1998). In summary, then, at present there is no clear solution to this problem that applies universally. Neither do I have a definitive answer to propose here. Nevertheless, as shown in the subsections below, it is still possible to come to a reasonably firm conclusion that CG-only clusters do exist in some languages. The key, I suggest, is to confirm this interpretation with the accumulation of language-specific evidence.
In other words, one logical way to proceed is to establish some principled a priori criteria and then apply them consistently to the problem cases. Furthermore, a crucial point to emphasize is that in every language containing glide offset clusters, one of the three interpretations in (14) must be the right one. Which of these is correct obviously differs from one language to the next. The solution is to seek independent evidence in each language to help disambiguate CG sequences when these are the only potential clusters. Each language presents its own idiosyncrasies and clues and must therefore be deciphered on a case by case basis. Nevertheless, several general guidelines can be gleaned from the references listed above. Specifically, all else being equal, a true glide offset cluster (as in 14a) is best posited when one or more of the following factors are present:

(15) Evidence/arguments for interpreting CG sequences as an onset cluster (C1C2V)
(a) The glide element never bears (contrastive) tone or stress.
(b) There are no canonical vowel sequences in syllable nuclei.
(c) The glide element does not contribute to syllable weight, such as in fulfilling prosodic minimality or attracting stress.
(d) Other canonical vowel clusters are resolved by epenthesizing an intervening consonant or deleting one of the vowels (truncation).
(e) The total measurable duration of the glide element is approximately as long as that of a single unambiguous onset consonant in CV syllables (Ladefoged and Maddieson 1996).
(f) The glide element can occur independently (as the phoneme /j/ or /w/) in onset position in simple CV syllables.
(g) There is a pressure or desire to minimize the total number of phonemic units in the segmental inventory.

On the other hand, the glide element in CGV sequences is best assigned to the syllable nucleus in the following situations:

(16) Evidence/arguments for interpreting glides as part of a complex vowel or diphthong (CV1V2)
(a) The glide can bear (contrastive) tone or stress.
(b) There are canonical vowel sequences in syllable nuclei.
(c) The glide element contributes to syllable weight, such as in fulfilling prosodic minimality or attracting stress.
(d) The glide element does not occur independently (as the phoneme /j/ or /w/) in onset position in simple CV syllables.
(e) There is a pressure or desire to keep the maximal syllable template as uncluttered as possible, without expanding it to allow onset clusters.

The criteria in (15) and (16) obviously conflict with each other. Taken together, they often consistently point to the preferred interpretation of CG sequences in many languages. Consequently, in the presentation of glide offset languages below I note the language-specific arguments given by each source in support of this conclusion. However, in other languages the evidence is mixed, and then the correct interpretation is more difficult to ascertain. As noted in section 4, Appendix A lists a number of apparent CG-only languages. In some of these no explicit argumentation is given by the author, so I decided not to include those languages in the analysis here. On the other hand, the data presented in the body of this paper (section 5) represent the languages in my sample for which the CG cluster interpretation is most strongly supported, in my opinion.

Among the criteria in (15) and (16), the last point in each list constitutes opposing desiderata: either minimize the inventory of phonemic segments or else keep the inventory of syllable types simple. These two pressures are at odds with each other. Furthermore, they are not language-specific in nature but rather constitute a priori theoretical principles and therefore potentially apply to all cases. However, for this reason they are also the criteria most in danger of circular reasoning when language-specific confirmation is lacking. The tension is this: regardless of the ultimate interpretation of CG-only sequences, in every language we have to complicate the overall analysis in one way or another. For example, we can posit a greater number of contrastive segments, but this entails secondary articulations (palatalization and/or labialization) which otherwise may not occur. On the other hand, we can expand the list of syllable types to permit complex onsets without independently-observed canonical clusters. I am not aware of any work establishing a critical cut-off point in terms of how many new complex phonemes can be "added" before it becomes more economical in the long run to just assume the existence of onset clusters. Nevertheless, at the very least we can posit the following cross-linguistic heuristic principle as a logical guideline in the default case. This is one specific instantiation of Pike's (1947) general premise quoted above.

(17) All else being equal, the greater the number of additional phonemic units (/Cᴳ/) concomitantly required by a non-cluster interpretation in a particular language, the greater the structural pressure to complicate the canonical syllable template by allowing for onset clusters (/C1C2/). Conversely, the fewer the number of additional phonemic units required by a non-cluster interpretation in a particular language, the smaller the structural pressure to complicate the canonical syllable template by allowing for onset clusters.

In classical phonemics the structural pressure referred to in (17) falls on the descriptive analyst. Nevertheless, most works of that period do not define this pressure in more concrete terms (unlike (18) below). This pressure in fact derives from Occam's razor – the principle that the simpler the better. Whether it also guides children acquiring specific languages is an interesting but separate issue that cannot be pursued here. With respect to (17) the number of resulting complex phonemic segments (/Cʲ/ and/or /Cʷ/) increases proportionally to two different factors. One is the quantity of contrastive glides appearing in offset (C2) position in that language. The other is the number of anchor consonants occurring in C1 position. At the same time this latter quantity (potential C1 consonants) is usually correlated with the number of different sonority-based natural classes permitted in that slot. Consequently, concerning the four types of glide offset languages in Table 2, we can make a general prediction: the lower the setting (numerical value) of the MSD/CG = x parameter, the greater the number of possible clusters and therefore the greater the inherent capacity to maximize the pressure described in (17). Thus type A languages, for instance, should naturally tend to display the largest inventory of CG onset cluster combinations, ceteris paribus. This is because they select potential anchor segments from four different natural classes: O, N, L, and G. Conversely, type D languages are expected to exhibit the smallest number of biconsonantal onsets overall since these are a priori limited to OG only.

In fact, it is possible to precisely quantify the pressure described in (17). This will be done for each CG-only language analyzed here. For this purpose I propose a new quantitative measure to directly compare and contrast languages and thus determine their relative typological markedness with respect to this statistical parameter. I call this numerical value the Cluster-to-Segment Ratio. It is defined and calculated as follows:

(18) Cluster-to-Segment Ratio: CSR =def c / s

where c = the number of specific, individual types of bisegmental onset clusters occurring in language X, given the interpretation in (14a), and ignoring SSP reversals, and s = the number of individual phonemic consonant units occurring in language X as simple onsets in CV syllables.
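
Since (18) is just a ratio, it is trivial to compute. The sketch below (mine, for illustration only) spells it out and checks it against both the hypothetical case discussed next and the values reported for the four languages documented in this section (with c and s taken from Tables 4–7).

```python
# Sketch: the Cluster-to-Segment Ratio of (18), CSR = c / s.
def csr(c: int, s: int) -> float:
    """c = distinct bisegmental onset cluster types (interpretation (14a),
    SSP reversals ignored); s = consonant phonemes in simple CV onsets."""
    return c / s

print(round(csr(15, 12), 2))  # the hypothetical language discussed below: 1.25

for name, c, s in [("Shilluk", 31, 19), ("Ga'dang", 17, 16),
                   ("Kham", 35, 22), ("Koonzime", 18, 18)]:
    print(name, round(csr(c, s), 2))
# Shilluk 1.63, Ga'dang 1.06, Kham 1.59, Koonzime 1.0
```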

To illustrate, suppose a language contains exactly 12 consonant phonemes in simple CV syllables, including /j/ and /w/. This is the denominator in (18). Suppose furthermore that no unambiguous onset clusters such as OL are attested in this language. However, suppose there are a number of phonetic [CGV] sequences such as [pwa], [kjo], etc. Let us say hypothetically that 15 specific glide offset clusters of this type would result if we posit a bisegmental C1C2 interpretation. This value (15) is then the numerator in (18). Given these two values, the Cluster-to-Segment Ratio for this language is 15 ÷ 12 = 1.25. This amounts to a mathematical approximation of the phonological "cost" of rejecting onset clusters in this case. In other words, instead of analyzing [pw] and [kj] as consonant clusters in this language, let us say we interpret them as the unit phonemes /pʷ/, /kʲ/, etc., with secondary articulations. If 15 such sequences are in fact attested, a unit interpretation of them (as in 14b) would entail 15 additional consonant "phonemes". This would increase the inventory of phonemic segments from 12 to 27, an increase of 125%. This in effect more than doubles the number of contrastive phonemes in this language (among the consonants). Therefore, as the number of C+G clusters occurring in a language increases, so does the obtained Cluster-to-Segment Ratio. This, in turn, leads to an increase in the structural pressure described in (17).

Therefore, for each glide offset language described below, I first list all of the individual consonant phonemes posited by the source (not counting CG sequences). I then list the specific, individual types of C1C2 clusters also posited by the source. Given these two numbers, I then calculate the Cluster-to-Segment Ratio for that language. After this has been done for all CG languages analyzed here, I consider additional statistical characteristics of the sample, such as Cluster-to-Segment values averaged across language types. This is discussed in section 5.6.

To summarize this section, in structuralist analyses of specific languages some types of onset clusters (such as OL) have only one possible interpretation or representational status. Other potential clusters, such as OG, are inherently ambiguous. The latter are interpreted in light of independently-motivated canonical patterns in the language. In generative works there is typically less emphasis on this kind of "discovery procedure" (Pike 1947). Nevertheless, this issue must be confronted in a survey of this type since descriptions of individual languages have appealed to a variety of criteria whose usefulness and relative force is largely a matter of opinion. Consequently, for some CG-only languages it is difficult, or even impossible, given current phonological theory, to be certain which interpretation is right. Thus in some cases the choice between the complex segment analysis vs. the onset cluster interpretation is still somewhat subjective in the end. Therefore, a potential problem in these cases is that no decisive standard exists for evaluating the relative (dis)advantages of an increase in the number of phonemes vs. an increase in the number and types of clusters.

For example, we might attempt to establish some hypothetical Cluster-to-Segment value as a threshold that crucially determines the outcome one way or the other. However, there appears to be no way to do this that does not involve arbitrary stipulation. Given this, I opt here for the (arguably) best solution available in the circumstances. Specifically, CG-only interpretations of the languages below are followed in this paper when the source of a description provides reasonable language-specific argumentation that is not obviously problematic. Furthermore, the degree to which a single-segment interpretation would lead to a fairly large increase in the size of the phoneme inventory serves as a concrete confirmation of these analyses. Nevertheless, it must be acknowledged that these two criteria are not necessarily conclusive in and of themselves. Consequently, a reader who insists on a higher standard – absolute proof – may not be persuaded by every piece of evidence summarized below. However, the existence of glide offset languages seems inherently plausible, and classical Minimum Sonority Distance constraints directly predict that some of these should in fact be found, namely, those with OG clusters only. In conclusion, then, the discovery of examples of each of the four theoretically possible CG-only language types (A, B, C, and D) is a satisfying outcome of the survey summarized here. At the same time, it reconfirms the importance of sonority in accounting for these patterns.

5.2. Type A languages (OG + NG + LG + GG clusters)

5.2.1. Shilluk (ISO 639-3 code: shk)

I begin the presentation of actual linguistic data with the sole language categorized as type A in Table 2. The following display summarizes the relevant facts. The different sections of this table are explained immediately afterwards.

Table 4. Synopsis of descriptive facts for the Shilluk language
classification: Nilo-Saharan, Eastern Sudanic, Nilotic, Western, Luo, Northern, Shilluk
country: Sudan
population: 175,000
sources of data: Gilley 1992; Remijsen, Ayoker, and Mills 2011
dialect of focus: Pachoda (in Gilley 1992)
consonant phonemes (19): p ”t t c k b d” d é g m n” n ñ N l R j w
$CGV clusters (31): pj ”tj tj cj kj bj d”j dj éj gj mj ñj Nj lj Rj pw ”tw tw cw kw bw d”w dw éw gw mw nw ñw Nw lw jw
CSR: 1.63
$CLV clusters: none
other #CCV clusters: none

In Table 4 the first entry is "classification". This refers to the full genetic grouping established for this language. Then comes the country or countries in which it is spoken (Sudan). Next is the total estimated population of speakers (175,000 persons). In all such tables these first three items invariably follow the conventions and spelling of the latest edition of the Ethnologue (Lewis 2009).
The sources of data for the phonological description assumed here are Gilley (1992) and Remijsen, Ayoker, and Mills (2011). Then comes the specific dialect of the Shilluk language focused on by one or both of these sources (Pachoda). Not all sources in my survey include this detail. "Consonant phonemes" is a list of all underlying segments posited by the source(s). These are arranged first by manner of articulation, in order of increasing sonority (lowest to highest). Within each of these natural classes the segments are then arranged by place of articulation. The number in parentheses after the category "consonant phonemes" (19) indicates how many of these units there are in total. The next section of the table is labeled "$CGV clusters", where "$" stands for a syllable boundary. This is an exhaustive list of all the specific syllable-initial clusters of type CG (glide offset). It follows the same general order as the previous entry (consonant phonemes). Combinations with the glide /j/ come first, then those with /w/. Next is the Cluster-to-Segment Ratio, as defined in (18). Recall that this is computed by dividing the number of onset clusters (31 in this case) by the number of segmental consonant phonemes (19 in this table). The penultimate entry is called "$CLV clusters". This is a list of any observed syllable-initial liquid offset clusters. Here in section 5 all such tables indicate "none". This is just a reminder and confirmation that none of the languages of type CG contains any CL clusters. Finally, I list any additional (noncore) types of word-initial consonant clusters that may be attested in the language, for completeness. In most cases this entry is vacuous, indicated by "none".

I now present a few actual Shilluk words to illustrate the different types of onset clusters that occur. These are taken from Gilley (1992) and/or Remijsen, Ayoker, and Mills (2011) – the sources noted in Table 4. A period (.) indicates syllable boundaries. Underlined vowels are contrastively [+ATR], following Gilley (1992), who calls this feature expanded pharynx. There is one minor analytical difference between the two sources: for Remijsen, Ayoker, and Mills (2011) the lone rhotic phoneme is a trill while Gilley (1992) describes it as a flap. The latter is used in Table 4. This discrepancy does not crucially affect any conclusions here.

(19) Shilluk data examples (Gilley 1992; Remijsen, Ayoker, and Mills 2011)
OG: [t”jEw] 'also'   [bj´El] 'millet'   [kwE:R] 'small hoe'
NG: [`a.mj´El] 'stubborn'   [NjEl] 'to trundle'   [ñwE.lO] 'earthworm'
LG: [lj¯Ec] 'elephant'   [´o.Rj´al] 'mongoose'   [lw´ak] 'barn'
GG: [jw`Ot] 'flying termites'   [jw´e”t] 'defile (agentive deverbal noun)'

Together with Table 4, the data in (19) indicate that Shilluk exhibits all four types of glide offset clusters: OG, NG, LG, and GG. By definition, then, a type A language displays two consistent phonological characteristics: (1) C2 is always
and only a glide, and (2) C1 segments consist of at least one member from each of the four sonority natural classes. So the sonority index of C2 = 4 and the sonority index of C1 ranges through the values 1, 2, 3, and 4. Therefore, the minimum sonority distance between C2 and C1 is 0 (cf. Table 2). Among the 19 Shilluk consonant phonemes, 15 combine as anchors with /j/ offsets and 16 combine with /w/. The distribution of C1 and C2 in complex onsets, then, is almost completely exhaustive. In other words, "there is virtually no cooccurrence restriction on consonants" (Gilley 1992: 25). Of the 38 theoretically possible CG clusters, seven do not occur: [nj], [n”j], [n”w], [Rw], [wj], [jj], and [ww]. In explaining these lacunae Gilley notes that [nj] has probably merged with [ñ], accounting for the absence of the former. Furthermore, /n”/ is a rare phoneme to begin with. Consequently, she considers the lack of /n”j/ and /n”w/ clusters to be accidental, based on insufficient data. The glide plateaus /jj/ and /ww/ are independently ruled out by a general constraint against geminates. This leaves just */Rw/ and */wj/ as the only true systematic gaps. As observed in section 3, both of the latter two cluster types (LG and GG) are very marked cross-linguistically anyway. Nevertheless, as Table 4 and (19) show, Shilluk does have /lj/, /Rj/, /lw/, and /jw/. In fact, Leoma Gilley (p.c.) reports the existence of at least 19 different morphemes beginning with the sequence /jw/. Overall, then, Shilluk permits a total of 31 kinds of CG clusters. The ratio of posited clusters to individual phonemic segments is thus 31 ÷ 19 = 1.63. This is the largest obtained Cluster-to-Segment value among my sample of languages (see section 5.6). Therefore, the hypothetical "cost" of positing labialized and palatalized consonant units in Shilluk, instead of expanding the onset from C to CC, would be an increase in the segmental inventory by 163% (cf. (18)). I argue that this is prohibitive, so a single minor adjustment in the maximal syllable template is preferred. Gilley (1992: 24–26) gives other reasons (besides economy) for her cluster interpretation. First, she notes that rising sonority diphthongs do occur in Shilluk but behave differently from the CGV sequences listed here. Second, there are no clear (unambiguous) vowel sequences. Third, both glide offsets /j w/ can be followed by front and back vowels. Remijsen, Ayoker, and Mills (2011) concur with this interpretation but do not offer any argumentation.

5.3. Type B languages (OG + NG + LG clusters)

5.3.1. Ga'dang (ISO 639-3 code: gdg)

Ga'dang is an Austronesian language of the Philippines (Table 5). A total of 16 consonant phonemes are combined into 17 different CG clusters. All but three of the non-glide segments (/P F r/) are attested as anchors in at least one CG cluster each. However, *GG is systematically missing. This is not surprising since it is a noncore cluster – a sonority plateau (section 2).

Table 5. Synopsis of descriptive facts for the Ga'dang language
classification: Austronesian, Malayo-Polynesian, Philippine, Northern Luzon, Northern Cordilleran, Cagayan Valley, Ibanagic, Gaddangic
country: Philippines
population: 6,000
source: Troyer 1959
consonant phonemes (16): p t k P b d g F s m n N r l j w
$CGV clusters (17): pj tj kj bj dj gj nj Nj lj tw kw bw dw sw mw nw lw
CSR: 1.06
$CLV clusters: none
other #CCV clusters: none

Representative data are presented in (20):11

(20) Ga'dang data examples (Troyer 1959)
OG: [n@p"pja] 'good'   ["kwi] 'to'   [gina"bwat] 'got up'
NG: [pakaPam"mwan] 'joking'   [ma"man.nwet] 'fishing with a hook'
LG: [tal.ljok] 'bend in river'   [pal"ljot] 'short flute'   [mal"lwag] 'boiling'

11. The form for 'bend in river' occurs in a text and thus is not marked for stress. Troyer (1959) notes that stress is somewhat unpredictable but falls most often on the penultimate syllable.

In Troyer's (1959) data it is curious that so many CG clusters occur in words where the anchor consonant is the second half of a geminate, as (20) illustrates. Nevertheless, she lists many other examples where this is not the case, analogously to [gina"bwat] 'got up' in (20).12 However, all of the latter involve obstruent anchors, never nasals or /l/. This is puzzling and merits further investigation to determine whether it is a systematic restriction. Troyer does clarify that "[g]eminate clusters are made up of re-articulated phonemes when pronounced syllable by syllable, but are long phonemes in fast speech" (p. 96). Troyer's (1959: 96–97) discussion provides several arguments for her biconsonantal CG interpretation. For example, there are no non-suspect vowel clusters (where both members are syllabic). Furthermore, a sequence like [djob] is always pronounced as one syllable phonetically, even in very slow speech. Also, strings like [djob] contrast with similar sequences containing syllabic high vowels, e.g., [ma"di.jot] 'will bathe'. Finally, "… to conclude that y [j, S.P.] and w are in these instances palitalization [sic] and labialization of the preceding consonant would more than double the number of consonant phonemes" (p. 97). Accordingly, the Cluster-to-Segment Ratio for Ga'dang is 1.06. Mike Walrod (p.c.) concurs with this analysis, adding that vowel sequences always have a [P] between them.

12. Mike Walrod (p.c.) reports that the /p/ in [n@p"pja] 'good' in (20) is phonetically short, not long. He worked on the language for many years and wrote a dissertation on its discourse structure (Walrod 1983). Unfortunately, however, he knows of no published analyses of Ga'dang phonology other than Troyer (1959).

An important theoretical assumption of Optimality Theory is Richness of the Base (ROTB). This principle states that there are no systematic restrictions holding of input (underlying) forms (Prince and Smolensky 1993/2004; Smolensky 1996; Smolensky and Legendre 2006; Booij 2011). Thus we might ask, what would Ga'dang speakers do with a hypothetical word containing an "illicit" cluster, such as OL? There are several possible ways to empirically access the rich base and hence test how specific languages repair prohibited clusters: (1) morpheme concatenation, (2) neologisms, (3) loanwords (Hyman 1970; Jacobs and Gussenhoven 2000; Smith 2009), (4) psycholinguistic experiments involving nonce forms (Hayes and Londe 2006; Zuraw 2007; Coetzee 2008, 2009), (5) speech errors such as spoonerisms, (6) ludlings (word games), and (7) prior stages in the history of the language. In this case Troyer (1959) provides us with a few concrete illustrations. On p. 100 she gives a list of consonant clusters occurring across syllable boundaries. One of these is br, attested in the word ["leb.ro] 'book'. She notes that this was borrowed from Ilocano; undoubtedly it ultimately comes from Spanish, where it is syllabified as ["li.BRo]. In native Ga'dang words the phoneme /b/ does otherwise occur in syllable codas, at least word-finally and in intervocalic geminate clusters. It thus patterns like all other consonant segments of the language. Another such heterosyllabic sequence in Troyer's list is /tl/: [tat.li"Fan] 'Pass by!'. We thus have direct evidence that Ga'dang speakers actively avoid tautosyllabic liquid offset clusters, at least those starting with /b/ and /t/. Furthermore, in both of these cases the observed strategy to deal with the "problematic" input is the same: place a syllable break between the two consonants. Nevertheless, Troyer unfortunately does not give any evidence to support these syllabifications. Two other nativized forms that are probably also borrowed from Spanish occur in her data, but without any comment on her part: [nataram"poso] 'very wicked' (cf. tramposo) and [matara"baFo] 'will work' (cf. trabajo). In discussing the remaining languages below I similarly note any specific examples of loanwords or other analogous forms that illustrate Richness of the Base. However, not all sources mention this detail (how ungrammatical input clusters are resolved).

5.3.2. Western Parbate Kham (ISO 639-3 code: kjl)

Western Parbate Kham is a Sino-Tibetan language of Nepal (Table 6). Elsewhere in this paper it is referred to simply as Kham. Sample data are presented below. In these transcriptions there appears to be a syllable-final /h/. This is actually not a consonant but rather represents a lax phonation register of the preceding vowel, following Watters (1998).13 An interesting detail of Kham's phonemic inventory is a series of contrastively aspirated stops and affricates. These can also cluster with glide offsets, as illustrated in (21) and (23).

13. In onset position, however, the symbol /h/ indicates a voiceless glottal "approximant" or "frictionless continuant": [ha:] 'tooth', [hja: hi:] 'flinging'. Watters (1998: 55) notes that "[t]he consonant /h/ has no particular vocalic shape. It is the voiceless equivalent of the vowel that follows".

Table 6. Synopsis of descriptive facts for the Kham language
classification: Sino-Tibetan, Tibeto-Burman, Himalayish, Mahakiranti, Kham-Magar-Chepang-Sunwari, Kham
country: Nepal
population: 24,500
sources: Watters 1971, 1998, 2002, 2003
dialect of focus: Takale
consonant phonemes (22): p t k pʰ tʰ kʰ b d g ts tsʰ dz s z m n N l R j w h
$CGV clusters (35): pj tj kj pʰj tʰj kʰj bj dj gj tsj tsʰj dzj sj zj nj lj Rj hj tw kw pʰw tʰw kʰw bw dw gw tsʰw dzw sw mw nw Nw lw Rw hw
CSR: 1.59
$CLV clusters: none
other #CCV clusters: none

(21) Kham data examples (Watters 1998)
OG: [gjo:h] 'long, big'   [tsʰjam] 'day'   [sw˜I:] 'hot'
NG: [mw˜I:] '(to) warm'   [nw˜I:] 'milk (n)'   [zal-nja] '(pour-inf) to pour'
LG: [bah.lja] 'cock (rooster)'   [lwi:-] '(to) insert'   [Rwi:h] 'bug (n)'

As listed in Table 6 and illustrated in (21), Kham exhibits OG, NG, and LG onset clusters. Steve Watters (p.c.) confirms that */jw/ and */wj/ are not attested.14 The ratio of total clusters (35) to consonant phonemes (22) is 1.59. This is nearly as large as that observed for Shilluk in section 5.2.1. In fact, it is the second highest Cluster-to-Segment value among the languages analyzed here (section 5.6). Watters (1998) also includes /mj/, /tsw/, and /zw/ in his list of permissible clusters. His resulting generalization is as follows (excluding */jw/ and */wj/): all consonants except /N/ cluster with /j/, and all except /p/ combine with /w/. He comments, "This restriction appears to be arbitrary. In principle, there is no good reason why they shouldn't occur, since clusters like /my/ and /ny/ are possible, as well as /phw/. The initial cluster /Ny/ occurs in Gamale" (Watters 1998: 41, fn. 2).15 However, he also notes that some of the clusters he posits occur only in onomatopoeic words (p. 41). Unfortunately, he does not specify which ones these are.

14. David Watters passed away a few years ago. Steve Watters is his son and continues to work on the language.

15. Gamale is a language related to Kham. In this quote, /my/ = IPA /mj/, /ny/ = /nj/, /phw/ = /pʰw/, and /Ny/ = /Nj/.

Sonority distance vs. sonority dispersion – a typological survey

133

ters listed in Table 6, I found at least one canonical example in Watters (1998). That is, these 35 clusters occur in (at least) some words whose glosses do not appear to involve onomatopoeia. They are therefore included in Table 6. The > remaining three clusters listed in Watters (1998) are /mj/, /tsw/, and /zw/. I assume these are the ones limited to marginal items such as onomatopoeia. Steve Watters (p.c.) confirms this, at least for /zw/: (22)

A cluster type restricted specifically to onomatopoeic items in Kham /zwaR zwaR/ ‘quickly, with a large flow (as in milking)’ /zwa zwa/ ‘burn steadily (as in a fire)’ /zwoN zwoN/ ‘sparkling, shiny’

As (22) illustrates, words of this type often involve reduplication. In such cases the glide offset is copied too. Due to the marginal nature of these three specific clusters, I do not include them in Table 6. In arguing for CG sequences as bisegmental onsets, Watters (1998) notes that neither of the two consonants contributes to syllable weight. Thus CGV patterns as light in terms of “rhythm groups” (p. 74). The canonical syllable rhyme is maximally bimoraic: *[CV:C]σ. Therefore a coda consonant is possible only when the vowel is short. When there is a contrastively lengthened vowel or a diphthongal nucleus, codas do not occur. Nevertheless, complex CG onsets do occur in syllables with long vowels or coda consonants. This fact indicates that prevocalic glides do not reside in the rhyme (Watters 1998: 76):16

(23) Kham bimoraic syllables (see also (21) and (22))
/thju:/ ‘in disgust’
/tw˜I:za/ ‘short’
/kwa:/ ‘clothing’
/phwak/ ‘immediately in place’
/phwi:-/ ‘to pump bellows’

16. The onset cluster interpretation followed here is posited in Watters’ most recent works on Kham (1998, 2002, 2003). This appears to be a change from Watters (1971). In the latter he analyzes the glide in CGV sequences as part of a complex syllable nucleus.

In terms of Richness of the Base (section 5.3.1), Kham presents two specific examples of directly avoiding OL clusters. First, when Cr clusters exist in words borrowed from Nepali, they are repaired by inserting a schwa between the two consonants: jhãkri > [dz˜a:hk@Ri] ‘shaman’ (Watters 1998: 56). Second, the same strategy (vowel epenthesis) is also observed in historically reconstructed forms with CL clusters. Normally [@] is the default vowel in Kham. However, when the next syllable contains a rounded vowel, this epenthetic nucleus completely harmonizes with it:17

(24) historical vowel insertion in Kham (Watters 1998: 67)
*ble > [b@le:-] ‘to ruin’
*kri > [k@Ri:-] ‘to slice’
but
*kru > [kuRu:] ‘a bur’
*phlo > [pholo:-] ‘to split bamboo’

17. In (24) the original vowels have undergone diachronic lengthening. The explanation for this is as follows. In order for a Kham lexical item to occur as an independent prosodic word, it must be minimally bimoraic. Thus in monosyllabic morphemes pronounced in isolation the vowel is predictably long. This correlates with stress placement as well. In the forms in (24) the lengthening of the (final) vowels took place before the epenthetic vowel was inserted between the two consonants. That is, prosodic minimality triggered automatic vowel lengthening in unaffixed roots and subsequently schwa epenthesis rendered this earlier process opaque (Watters 1998).

5.3.3. Koonzime (ISO 639-3 code: ozm)

Koonzime is a Bantu language of Cameroon (Table 7). It has contrastive labiovelar stops, both voiced and voiceless. It also has a labiovelar nasal. However, it lacks a rhotic phoneme. Consequently, LG onsets are limited to anchors with the lateral /l/. Furthermore, dorsal consonants do not combine with /j/ offsets. Sample data are presented in (25). Tone is marked with a superscripted number after each syllable, following Beavon (1977). In his transcription system, “1” is the highest phonetic pitch, etc.

Table 7. Synopsis of descriptive facts for the Koonzime language
classification: Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Northwest, A, Makaa-Njem (A.80)
country: Cameroon
population: 30,000
sources: Beavon 1977, 1983
dialect of focus: Lomié
consonant phonemes (18): p t c k kp b d é g gb s m n ñ Nm l j w
$CGV clusters (18): pj tj bj dj sj mj nj lj pw tw kw bw dw gw sw mw nw lw
CSR: 1.00
$CLV clusters: none
other #CCV clusters: NO

(25) Koonzime data examples (Beavon 1977)
OG: [e4 bja1-3] ‘to give birth’  [gwa1] ‘almost’  [e4 swam1-3] ‘to be shrivelled up’
NG: [mj˜ah4-5] ‘bag’  [mw˜ah4] ‘crayfish’  [nwaP1-3] ‘wild mango’
LG: [ljEN1-3] ‘habit’  [lwaP1-3] ‘eyedropper’

The last entry in Table 7 indicates that Koonzime has some word-initial clusters of the type NO, in addition to CG onsets. An example is [nte1 me1] ‘jealousy’. Beavon (1977) interprets these as true sequences rather than prenasalized unit phonemes. The reasons for this are not relevant for our purposes. Furthermore, as noted in section 2, reversed sonority clusters are ignored here for the sake of simplicity.

Beavon (1977: 6–8) provides several motivations for a cluster interpretation of syllable-initial CG sequences. One argument is that /w/ and /j/ exist as independent phonemes, contrasting with all other onset consonants in CV syllables. Also, glides combine with a wide variety of anchor consonants, both voiced and voiceless (Cluster-to-Segment Ratio = 1.00). Furthermore, phonetic syllables having the shape [CjV] and [CwV] clearly contrast with [CiV] and [CuV], respectively. This is because each of the two vocoids in [CiV] and [CuV] sequences can exhibit different tones, whereas those in [CjV] and [CwV] cannot. In other words, glide offsets never bear phonemic tone (Beavon 1983). There is also a contrast between palatally-released alveolar stops (/tj/ and /dj/) and the pure palatal phonemes /c/ and /j/. What is more, interpreting CGV sequences as vowel clusters (CVV) would result in several trisyllabic stems, which otherwise are not attested. That is, in all canonical cases the stem is maximally disyllabic, and no unambiguous vowel sequences are ever tautonuclear, so V1V2 would have to be parsed into separate syllables. This would lead to some stems with the unprecedented pattern [CV.V.CV]. A final argument for this analysis is the same as one given in section 5.3.2 for Kham: prevocalic glides do not contribute to syllable weight. Thus the maximal syllable is bimoraic, so a rhyme may have a lengthened vowel or a coda consonant, but not both. Nevertheless, CG onsets occur in syllables with heavy rhymes, so they are not moraic. Some examples are given in (25). Here are a few other illustrative forms:

(26) Koonzime words with bimoraic rhymes (Beavon 1977)
[e4 pjab4-5] ‘to winnow’
[e4 djEl4-5] ‘to bind’
[kwan4-5] ‘honey’
[e4 dwe:4-5] ‘to load a gun’
[lwal4-5] ‘duck’

5.3.4. Summary of section 5.3

To summarize section 5.3, I have described three languages classified here as type B. Their common phonological characteristic is the inclusion of complex onset clusters consisting of three general combinations: OG, NG, and LG. Furthermore, all of these languages proscribe GG clusters entirely. In schematic terms, Sonority Index(C2) = 4 and Sonority Index(C1) ∈ {1, 2, 3}, so Sonority Distance ≥ 1. The corresponding prose generalization is that the offset must be a glide and the set of anchor consonants must include at least one liquid, as well as members of the other two natural classes lower in sonority: nasals and obstruents. This is abbreviated in Table 2 as MSD/CG = 1.
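To make the schematic statement concrete, the sonority arithmetic can be sketched in a few lines of Python. This is my own illustration of the four-rank scale and the Minimum Sonority Distance (MSD) check assumed throughout the paper, not code drawn from any of the sources cited:

    # The four-rank sonority scale used here: obstruent < nasal < liquid < glide.
    SONORITY_INDEX = {"obstruent": 1, "nasal": 2, "liquid": 3, "glide": 4}

    def sonority_distance(anchor: str, offset: str) -> int:
        """Sonority Index(C2) minus Sonority Index(C1) for an onset cluster C1C2."""
        return SONORITY_INDEX[offset] - SONORITY_INDEX[anchor]

    def licensed(anchor: str, offset: str, msd: int) -> bool:
        """A rising-sonority cluster is licensed iff its distance meets the MSD."""
        return sonority_distance(anchor, offset) >= msd

    # Type B grammars set MSD/CG = 1: OG, NG, and LG pass, but *GG (distance 0) fails.
    assert licensed("obstruent", "glide", 1)   # OG, distance 3
    assert licensed("liquid", "glide", 1)      # LG, distance 1
    assert not licensed("glide", "glide", 1)   # GG, distance 0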


5.4. Type C languages (OG + NG clusters)

5.4.1. Angaataha (ISO 639-3 code: agm)

Angaataha is a Trans-New Guinea (non-Austronesian) language of Papua New Guinea (Table 8). It has only 11 consonants in its phonemic inventory, forming five specific clusters. Among these, /Pw/ is rare (Huisman, Huisman, and Lloyd 1981). In the sample data below an acute accent marks high tone and low tone is unmarked, following the source.

Table 8. Synopsis of descriptive facts for the Angaataha language
classification: Trans-New Guinea, Angan, Angaatiha
country: Papua New Guinea
population: 2,100
principal source: Huisman, Huisman, and Lloyd 1981; secondary sources: Lloyd 1973; Huisman 1992
consonant phonemes (11): p t k P tS m n N R j w
$CGV clusters (5): nj pw kw Pw mw
CSR: 0.45
$CLV clusters: none
other #CCV clusters: none

(27) Angaataha data examples (Huisman, Huisman, and Lloyd 1981)
OG: [t´a.m.pw´ai.Po] ‘lizard sp.’  […] ‘tree sp.’  [´a.Pwi.p´a.t@] ‘type of trap’
NG: [nja.nj´@.Pa] ‘flower sp.’  [ma.nj´I.nj´ai] ‘children (obj.)’  [mw´@…@] ‘axe’

The word glossed ‘children’ in (27) is transcribed differently in Huisman (1992), as [ma.n.ji.n.jai]. This seems to indicate that he now analyzes the two /n/s in this word as phonetically syllabic. This casts some doubt on the /nj/ cluster in Table 8, at least for this particular form. This is the only type of complex onset formed with a palatal glide offset in Angaataha. Huisman, Huisman, and Lloyd (1981: 53–54) provide three arguments for their CG cluster interpretation. One of these is economy, i.e., avoiding the addition of new phonemes. They also note that in onset clusters, [w] and [j] are non-syllabic and do not bear any tone. Furthermore, in all of these sequences both segments occur elsewhere as independent phonemes.

5.4.2. Northern Pame (ISO 639-3 code: pmq)

Northern Pame is an Oto-Manguean pitch-accent language of Mexico (Table 9). Elsewhere in this paper it is referred to simply as Pame. Berthiaume (2003) posits 40 consonant phonemes. This relatively large number is due in part to contrastive aspirated and glottalized (ejective) series for many of the voiceless stops, affricates, nasals, and laterals. CG clusters involve labial and laryngeal anchors plus /j/, and velar and laryngeal anchors with /w/. Coronals do not combine with either glide. The two contrastive tones, high and rising, are limited to one occurrence per lexical root.

Table 9. Synopsis of descriptive facts for the Pame language
classification: Oto-Manguean, Otopamean, Pamean
country: Mexico
population: 5,620
source: Berthiaume 2003
consonant phonemes (40): p t k P ph th kh t’ k’ b d g ts tS tsh tSh ts’ tS’ s S h m n ñ mh nh ñh m’ n’ ñ’ l L lh Lh l’ L’ R Rj j w
$CGV clusters (13): pj Pj phj bj hj mj mhj (m’j) kw Pw khw k’w gw hw
CSR: 0.33
$CLV clusters: none
other #CCV clusters: OO, ON

In the illustrative data below the two tones are marked with the typical accent diacritics, following Berthiaume (2003):

(28) Pame data examples (Berthiaume 2003)
OG: [pj´aPa] ‘tomorrow’  [khwˇ@n] ‘they dragged’  [gwˇan@n] ‘we are going’  [hj´@P] ‘you (sg)’
NG: [mj´ahaw] ‘his stomach’  [mj´æPæb@t] ‘donkeys’  [mhj´˜@n] ‘soup’

In Table 9 the only glide offset cluster involving an ejective is m’j. Berthiaume (2003) clarifies that this sequence is realized phonetically as [mbj] due to a process of laryngeal buccalization. An example is /j-m’´u/ → [mbj´u] ‘cacti’; this form also involves metathesis of the two underlying consonants. Due to the marginal nature of this particular cluster it is placed in parentheses in Table 9. Furthermore, it is not included in the calculation of the Cluster-to-Segment Ratio. The latter value (0.33) is the smallest of all the glide offset languages described in this paper (see section 5.6). As this table indicates, Pame also exhibits word-initial clusters composed of either two obstruents or an obstruent followed by a nasal. In both of these C1 is one of the voiceless oral fricatives /s/ or /S/. The second consonant is either a voiceless stop/affricate (with or without aspiration and glottalization), or a plain nasal (with no laryngeal modification). Here is one example of each of these two cluster types in word-initial position: [stsh´awP] ‘ruler’ and [sn@gwˇ@h@ts’] ‘his hat’. The upshot of these facts is that Pame is not a pure type C language, i.e., its core clusters are not strictly limited to glide offsets only. Nevertheless, for our purposes I ignore this particular complication. The crucial detail for this section is that Pame clearly does not have any liquid offset clusters. A more consistent type C language is Angaataha (section 5.4.1).

Berthiaume (2003: 84–87) argues for his CG cluster interpretation on the basis of distribution: none of these clusters occurs word-finally, whereas less ambiguous, monosegmental (unit) phonemes like /ts/, /ñ/, aspirates, and ejectives do. Gibson (1956) and Manrique (1967) concur with this interpretation; they invariably transcribe all such examples as a consonant followed by a separate glide. However, Avelino (1997) posits a series of onglide diphthongs instead, but gives no justification for this analysis (Berthiaume 2003: 85).

There is very strong evidence that Pame actively avoids onset clusters other than those listed in Table 9. For example, labial consonants are not attested with /w/ offsets. When such a sequence would otherwise surface, the glide is deleted instead. The following paradigm involves the ‘third person subject’ prefix (or infix) /w/ attached to verb roots.

(29) CG is repaired by deletion of the /w/ after a labial anchor in Pame (Berthiaume 2003: 114)
/kw´˜ats/ → [kw´˜ats] ‘he sets soft’
/Pw´@ts’/ → [Pw´@ts’] ‘he writes’
/hwˇ@tS/ → [hwˇ@tS] ‘he hunts’
but
/pwˇæp/ → [pˇæp] ‘he helps him’
/pw´æ/ → [p´æ] ‘he braids’
/mw´ehu/ → [m´ehu]18 ‘he lives’
/mw´uhuj/ → [m´uhuj] ‘he passes by’

As shown in (29), /w/ is permitted to cluster with velar and glottal anchors, in confirmation of Table 9. However, when /w/ follows a labial consonant like /p/ or /m/, the glide is elided. This restriction against adjacent labial consonants is language-specific rather than being a general feature of all type C languages; cf. Table 8. Berthiaume (2003) attributes this gap to a high-ranking OCP-like Place cooccurrence constraint.

18. Berthiaume (2003) transcribes this phonetic form without the high tone on the first vowel: [mehu]. He confirmed (p.c.) that this is a typo.


Another strategy to fix illicit consonant clusters in Pame is to break them up by inserting an intervening vowel. This happens, for example, when a coronal segment is followed by /w/. In this case the vowel [u] is epenthesized, as illustrated in (30a) below. Also, when morpheme concatenation places one of the glottal segments /P/ or /h/ immediately after another onset consonant, vowel insertion applies again. This is seen with the clusters /ph/ in (30b) and /thP/ in (30c). This behavior shows that clusters of this type are not really aspirates or ejectives. In such cases the nucleus that is predictably inserted is a copy of the underlying vowel in the following root syllable:

(30) CC is repaired by vowel epenthesis in Pame (Berthiaume 2003: 124–127)
(a) /s@Pnwˇ˜a/ → [s@Pnuˇ˜w˜a] ‘my nose’
(b) /ph´æ/ → [p´æhæ] ‘he carries’
(c) /n̩thP´˜æ/ → [n̩th´˜æP˜æ] ‘tamale’

Finally, another interesting process involves a prefixal /j/ that metathesizes with the following root-initial consonant. This is illustrated in the verb paradigm below. Compare these forms with those in (29). When j+C metathesis would produce a syllable-initial sequence of three consonants, the /j/ is syllabified as [i]. In the following words /n-/ is a default class marker and /Pj-/ is the third person singular possessive prefix:

(31) CCC is repaired by glide vocalization in Pame, after metathesis (Berthiaume 2003: 213)
/n-Pj-p´uP/ → [n̩P.pj´uP] ‘his butter’
/n-Pj-m´æPp/ → [n̩P.mj´æPp] ‘his donkey’
but
/n-Pj-ph´æ/ → [n̩P.píhæ] ‘his cargo’
/n-Pj-mP´aj/ → [n̩P.míPaj] ‘his animal’

In (31) the first two words show that a metathesized [j] surfaces when the root begins with only one underlying consonant. However, when the Underlying Representation of the root begins with a cluster of two consonants, the prefixal [j] cannot “land” between them since this would violate the Sonority Sequencing Principle. Consequently, it is realized as the vowel [i] instead, resulting in an additional syllable. In this case the preservation of the “palatal” melody trumps the default realization of these surface vowels with the same quality as the nucleus of the following syllable. For example, compare the form [p´æhæ] ‘he carries’ from (30b) with the related word [n̩P.píhæ] ‘his cargo’ in (31).
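The division of labor between metathesis and vocalization just described amounts to a small decision procedure. The following Python sketch is my own schematic restatement of Berthiaume’s generalization; the function name and the flat string representation of roots are assumptions made purely for illustration:

    # A schematic restatement of the Pame pattern in (31): prefixal /j/ lands
    # after the first root consonant, unless the root already begins with two
    # consonants, in which case /j/ surfaces as the vowel [i] instead.
    def attach_palatal_prefix(onset: str, rest: str) -> str:
        if len(onset) >= 2:
            # C + j + C would yield a three-consonant onset, violating the
            # Sonority Sequencing Principle, so the glide vocalizes to [i],
            # creating an extra syllable.
            return onset[0] + "i" + onset[1:] + rest
        # Otherwise /j/ simply metathesizes to the right of the anchor: a Cj onset.
        return onset + "j" + rest

    assert attach_palatal_prefix("p", "uP") == "pjuP"   # cf. [n̩P.pj´uP] ‘his butter’
    assert attach_palatal_prefix("ph", "æ") == "pihæ"   # cf. [n̩P.píhæ] ‘his cargo’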

5.4.3. Summary of section 5.4

In this section I have presented data from two different languages, Angaataha and Pame. Both of these are classified here as type C. However, while Angaataha is a prototypical exemplar of this category, Pame is not. The former has only two onset cluster types: OG and NG. Pame, on the other hand, permits both of these plus two other kinds of biconsonantal clusters: OO and ON. OO sequences are noncore types since they involve a sonority plateau (section 2). This leaves the existence of the rising sonority onset ON as a minor complication in Pame. Setting this aside, the common phonological feature of these two languages is a minimum sonority distance of at least two ranks between the offset and anchor consonants when C2 is a glide. Thus *LG and *GG are systematically not attested in this category of language since the sonority distances within the latter two types are only one rank and zero ranks, respectively. The corresponding generalization from Table 2 is MinimumSonorityDistance/CG = 2.

5.5. Type D languages (OG clusters)

5.5.1. Bambassi (ISO 639-3 code: myf)

Bambassi is an Afro-Asiatic language of Ethiopia (Table 10). All of its syllable-initial consonant clusters are of the type OG only. It thus requires the difference in sonority ranks between offsets and anchors to be as large as possible, viz., 4 − 1 = 3. Recall from section 3.2 that I suggest the term maximal distance cluster to describe this type of complex onset. Ahland (2009) notes that his general discussion of phonotactics is limited to isolated, monomorphemic words. In his list of data he specifically distinguishes between word-initial and word-medial clusters, so I do the same in Table 10. Crucially, all four non-initial clusters also occur word-initially, but not vice-versa. A few generalizations can be made about the specific clusters that are attested. For example, /w/ offsets do not occur with labial anchors, nor does the /j/ offset combine with coronals. Also, the only voiced consonant occurring in complex onsets is /g/, attested with both glides. The list of permissible clusters in Table 10 includes /sw/. This is not mentioned specifically in Ahland’s (2009) analysis. Nevertheless, it does occur in one of the words cited in (32). Michael Ahland (p.c.) confirmed that this is a valid example and that he inadvertently forgot to add /sw/ to his cluster list. Sample data are presented below. Bambassi has three contrastive levels of tone: H(igh), M(id), and L(ow). In my transcriptions I follow Ahland (2009) and specify the tonal melody at the end of each word, separated by a comma.

In Table 10 the affricate /tS/ is parenthesized to highlight its marginal status. According to Ahland this phoneme occurs in only one non-borrowed word, the far-distal demonstrative [gjetSe,HH] ‘that’. Furthermore, only some speakers pronounce it this way; all others pronounce it with a [S] instead of [tS]. Consequently, it is not included in the count of consonant phonemes (21), nor does it contribute to the Cluster-to-Segment Ratio.

Table 10. Synopsis of descriptive facts for the Bambassi language
classification: Afro-Asiatic, Omotic, North, Mao, East
country: Ethiopia
population: 5,000
source: Ahland 2009
consonant phonemes (21): p t k p’ t’ k’ b d g (tS) ts’ s S h z m n N l R j w
$CGV clusters word-initially (12): pj kj p’j k’j gj tw kw t’w k’w gw sw Sw; word-medially (4): kj p’j kw k’w
CSR: 0.57
$CLV clusters: none
other #CCV clusters: none


(32) Bambassi data examples (Ahland 2009)
OG: [ha.kjam.ba,MHM] ‘hunt’  [k’jaN.k’i.la:.pe,HLLL] ‘kidney’
OG: [gja:.je,HL] ‘many’  [twaN.gi.le,MLL] ‘elephant’
OG: [ha.k’wins.ka,MLH] ‘kneel’  [gwi:n.t’e,LH] ‘sweeping’  [kwi:n.t’e,HH] ‘hair’  [swi:.Re,LH] ‘hawk’

Ahland (2009: 21–22) provides several reasons for his CG cluster interpretation. His argumentation and discussion of alternatives is very apropos as a summary of the theme of section 5.1 of this paper. Consequently, I reproduce it here literally, in its entirety:19

These consonant-approximant sequences are ambiguous in that they could be interpreted as a single C (that is, as a labialized or palatalized consonant), as a CC cluster, as a consonant followed by a VV sequence with [u] or [i] as the first vowel, or as a diphthong [uV] and [iV], formed with the following vowel. These phenomena are interpreted as CC clusters on the grounds that positing complex consonants would increase the consonant inventory by 11 and lead to an inventory which does not follow a principle of economy nor which exhibits natural class symmetries; that is the sets of labialized and palatalized consonants would not be found systematically distributed throughout the inventory. Additional observations, which are perhaps less convincing as phonological arguments but which are relevant to the consonant-approximant sequences, include: 1) there are no non-geminate (i.e. non-identical) VV sequences in monomorphemic words; 2) there is no evidence of diphthongs, and the distribution of the approximants would require positing five diphthongs; 3) in the vast majority of cases, they are found word-initially and when they do occur medially, consonant distribution and syllable structure suggest they must be seen as onset clusters; it might be expected that were these single Cs, they could be found more often internally – more generally distributed. All unambiguous Cs which occur initially also occur as medial onsets, apart from [dZ], which occurs only in borrowed words. In short, as all analyses are problematic, it is preferable to minimize the consonant inventory rather than complicate it in a nonsymmetrical, nonsystematic manner. It is the assumption of the author that more data may yield other examples of these CC clusters, where additional obstruents may be followed by either of the approximants (Ahland 2009: 21–22; quoted with permission).

19. Thanks to Michael Ahland and Lindsay Whaley (p.c.) for permission to include this quote here.

Another argument that can be made is the following. The maximal syllable in Bambassi normally consists of just one coda consonant, plus contrastive vowel length. An example of this is the form [gwi:n.t’e,LH] in (32). This has a total of three timing units in the rhyme (the word [ha.k’wins.ka,MLH] in (32) is the lone example with a complex coda). However, CG onset clusters can crucially cooccur in the same syllable with a trisegmental rhyme. If the glide offset in these cases were interpreted as belonging to the nucleus, there would then be four segments in the rhyme. This is otherwise unattested, so it makes more sense to place the glide in the onset.20 Michael Ahland (p.c.) concurred with this reasoning.

20. Beachy (2005) advances a contrary interpretation of CG sequences in Dizin, a related language.

Finally, Ahland (2009) includes a couple of words in which the lateral /l/ is followed by a glide. These two forms appear in a list of consonant clusters that are all heterosyllabic. Consequently, they constitute evidence that Bambassi actively avoids onset clusters of the type LG. This confirms the generalization that all anchor consonants in this language must be obstruents.

(33) liquid–glide clusters are parsed into separate syllables in Bambassi (Ahland 2009: 20)
[k’il.je,MH] ‘leaving’
[a.kil.wa.je,MMLL] ‘name of a Mao clan’

There are several good reasons why the two lateral+glide sequences in (33) should not be parsed as tautosyllabic onset clusters (analogously to the forms in (32)): (1) In both words the /l/ and the glide belong to different morphemes. (2) /l/ can independently occupy the coda position in other intervocalic consonant clusters with falling sonority, such as /lt/, /lk/, /lb/, /ld/, /lg/, and /lm/. None of these sequences occurs word-initially. (3) Both /j/ and /w/ appear by themselves as lone onset segments in word-initial CV syllables, as well as intervocalically (VCV). (4) The sequences /lj/ and /lw/ do not occur word-initially, whereas all of the OG clusters do. (5) If /lj/ were a tautosyllabic onset cluster, it would be the only example of a coronal anchor with a palatal offset. (6) Nasal consonants are never followed by glides: */mj/, */nw/, etc. Consequently, it would be completely anomalous to have a language with OG and LG onsets but not *NG. This would entail a typologically unprecedented gap in the natural classes of attested anchor consonants with respect to the sonority scale. In summary, positing a syllable break between [l.j] and [l.w] does not “cost” anything in terms of requiring additional phonological structures, while positing [.lj] and [.lw] onsets would complicate the analysis in several respects.

5.5.2. Dadibi (ISO 639-3 code: mps)

Like Angaataha (section 5.4.1), Dadibi is a Trans-New Guinea language of Papua New Guinea (Table 11). However, since these two languages belong to different subfamilies, their genetic relationship is distant. Furthermore, they exhibit different inventories of CG cluster types. Consequently, it seems harmless to include both of them in this study.

Table 11. Synopsis of descriptive facts for the Dadibi language
classification: Trans-New Guinea, Teberan
country: Papua New Guinea
population: 10,000
principal source: MacDonald and MacDonald 1974; secondary sources: MacDonald 1973, 1992; Hemmilä 1999
consonant phonemes (13): p t k ph th kh s x m n l j w
$CGV clusters (5): pw tw kw phw xw
CSR: 0.38
$CLV clusters: none
other #CCV clusters: none

MacDonald and MacDonald (1974) posit two series of oral stops, which they transcribe as /p t k/ vs. /b d g/. However, they describe /p t k/ as being aspirated in phonetic forms. Hence I transcribe these segments in Table 11 and (34) as [ph th kh], respectively. Similarly, MacDonald and MacDonald’s /b d g/ are pronounced as voiceless unaspirated in the default case (word-initially). Consequently, I transcribe them here as [p t k], respectively. As an aside, these latter three phonemes have continuant and/or voiced allophones in medial (intervocalic) position. Finally, MacDonald and MacDonald posit a phoneme /h/ which they describe as a voiceless velar fricative. I transcribe this segment as [x]. The following example presents illustrative data.

(34) Dadibi data examples (MacDonald and MacDonald 1974; Hemmilä 1999)
OG: [twai] ‘bad’  [kw˜e] ‘bird’s call’  [au.kwa] ‘they’
OG: [phwai] ‘named’  [xwã] ‘axe’  [xw˜ı] ‘fight’

MacDonald and MacDonald (1974: 145–147) give two main reasons for their CG cluster interpretation. These are economy of phonemes and “literacy experiments” (p. 146). Unfortunately they do not expand on the latter point. They also note that the phonetic sequence [CwV] involves just one syllable and one tone, whereas the contrastive sequence [Cu.V] patterns as having two syllables and two tones.

5.6. Summary of section 5 (glide offset languages)

In this section I have described eight languages sharing a common phonological characteristic with respect to their inventory of complex onsets. Specifically, all of these languages exhibit the maximal distance cluster type OG. Two of them in fact allow this type and no others (Bambassi and Dadibi). Furthermore, these eight languages together form a novel kind of sonority-based hierarchy. Specifically, each one adheres to the following generalization: among glide offset (CG) clusters, the presence of an anchor having a higher Sonority Index implies the existence of all anchor classes having a lower Sonority Index. The following table summarizes the main details of these eight languages:

Table 12. The glide offset continuum of language types (cf. Table 2)

language type   permissible onset clusters   language    phylum             country
A               OG, NG, LG, GG               Shilluk     Nilo-Saharan       Sudan
B               OG, NG, LG                   Ga’dang     Austronesian       Philippines
                                             Kham        Sino-Tibetan       Nepal
                                             Koonzime    Niger-Congo        Cameroon
C               OG, NG                       Angaataha   Trans-New Guinea   Papua New Guinea
                                             Pame        Oto-Manguean       Mexico
D               OG                           Bambassi    Afro-Asiatic       Ethiopia
                                             Dadibi      Trans-New Guinea   Papua New Guinea

Table 12 displays the distribution of the eight CG-only languages in terms of major genetic phyla and geographic locations. There is a fairly wide spread of these two factors among the sample, with only one overlap (repeated stock and country). Specifically, Angaataha and Dadibi are from the same macrofamily and both are spoken in Papua New Guinea. Nevertheless, as noted in section 5.5.2, they pertain to different typological subclasses (type C vs. type D, respectively). Furthermore, each of these two language types is independently confirmed by one other attested language in this survey: Pame is another type C language and Bambassi further exemplifies type D. Consequently, this partially new CG-only language continuum appears to be fairly well established by this initial sketch. Nevertheless, one unfortunate gap in this sample is the lack of a CG language from South America. However, at least two possible glide offset languages from that continent appear in Appendix A: Puinave (Caudmont 1953) and Terêna (Harden 1946; Bendor-Samuel 1966). In addition, there are potentially many more languages that could easily turn out to be canonical exemplars of these four language types. As noted in section 4, a list of these languages is presented in Appendix A. For our purposes the languages cited there do not fulfill all of the ideal prerequisites enumerated in section 4, at least among the bibliographic sources I have been able to access. Nevertheless, in many of these cases the only “shortcoming” is that the source does not list each specific combination of phonemes exemplifying the different cluster types. Therefore, with a broader search of references one could probably add a significant number of cases to the four language types summarized in Table 12. I refer the reader to Appendix A for a robust list of other likely candidates for glide offset-only languages.
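The implicational generalization underlying Table 12 can also be verified mechanically. The Python sketch below is my own encoding of the table (the dictionary and the rank numbers are assumptions of this illustration, matching the four-rank scale used here); it checks that each language’s set of attested anchor ranks is downward-closed:

    # Anchor classes attested with glide offsets, by sonority rank:
    # 1 = obstruent, 2 = nasal, 3 = liquid, 4 = glide (from Table 12).
    TABLE_12_ANCHORS = {
        "Shilluk": {1, 2, 3, 4},                                         # type A
        "Ga'dang": {1, 2, 3}, "Kham": {1, 2, 3}, "Koonzime": {1, 2, 3},  # type B
        "Angaataha": {1, 2}, "Pame": {1, 2},                             # type C
        "Bambassi": {1}, "Dadibi": {1},                                  # type D
    }

    for language, ranks in TABLE_12_ANCHORS.items():
        # An anchor of rank x implies anchors of every rank below x.
        assert ranks == set(range(1, max(ranks) + 1)), language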


The last matter to discuss before concluding this section is the issue of Cluster-to-Segment values. These are summarized in the following table for all eight glide offset languages analyzed here:

Table 13. Statistical summary of Cluster-to-Segment Ratios across language types (see (18))

language    type   CSR value   mean for languages   grand mean for all
                               grouped by type      glide offset languages
Shilluk     A      1.63        (A) 1.63
Ga’dang     B      1.06
Kham        B      1.59        (B) 1.22
Koonzime    B      1.00
Angaataha   C      0.45        (C) 0.39
Pame        C      0.33
Bambassi    D      0.57        (D) 0.48
Dadibi      D      0.38
                                                    0.88

In Table 13 Shilluk has the highest Cluster-to-Segment value of the sample: 1.63. As noted in conjunction with (17), this is predicted since type A languages draw from anchor consonants having all four sonority ranks. The next highest language type overall is B (mean Cluster-to-Segment Ratio = 1.22). Among these, Kham has the second highest value of the sample (1.59). One other type B language, Ga’dang, also has an obtained value (slightly) higher than 1.00. The lowest value among all CG languages is that of Pame (0.33), a type C language. Consequently, the mean for type C overall (0.39) is actually a little lower than that of type D (0.48). This is not expected since type C languages permit two classes of anchor consonants (obstruents and nasals), whereas type D languages allow only one (obstruents). An obvious generalization here is that languages in the top half of the table (types A and B) exhibit values higher than or equal to 1.00. In all four of these cases, then, positing complex monosegmental onset units would at least double the quantity of language-specific consonant phonemes. Conversely, the bottom four languages (types C and D) consistently display values lower than 1.00. The grand mean for all eight languages together is 0.88. Nevertheless, given the limited number of languages in this table, these preliminary mean values are necessarily tentative from a statistical point of view. Hence they need to be confirmed with many more Cluster-to-Segment Ratios from a larger sample.
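These figures are straightforward to recompute. The following Python sketch is my own, with the CSR values copied from Table 13:

    from statistics import mean

    # Cluster-to-Segment Ratios from Table 13, keyed by language.
    CSR = {"Shilluk": ("A", 1.63), "Ga'dang": ("B", 1.06), "Kham": ("B", 1.59),
           "Koonzime": ("B", 1.00), "Angaataha": ("C", 0.45), "Pame": ("C", 0.33),
           "Bambassi": ("D", 0.57), "Dadibi": ("D", 0.38)}

    by_type: dict = {}
    for lang_type, ratio in CSR.values():
        by_type.setdefault(lang_type, []).append(ratio)

    type_means = {t: mean(ratios) for t, ratios in by_type.items()}
    grand_mean = mean(ratio for _, ratio in CSR.values())
    # type_means: A = 1.63, B ≈ 1.22, C = 0.39, D ≈ 0.48; grand_mean ≈ 0.88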

6. Documentation of liquid offset (CL) languages

In this section I describe the relevant facts pertaining to six languages that, almost exclusively, require their onset clusters to end with a liquid. The format is the same as that of glide offset languages in section 5. The order of presentation follows that of Table 3.

6.1. Interpretation again

A preliminary matter is the issue of interpreting potential CGV sequences in such languages. Recall that in glide offset languages it is important to demonstrate that consonant sequences ending in a glide truly are onset clusters (section 5.1). As we have seen, this problem is especially difficult when no other canonical clusters occur. With liquid offset languages, however, a different issue is at stake. It is generally agreed that OL sequences are not phonologically ambiguous, unlike CG sequences. Thus in a CLV demisyllable the liquid can only reside in the onset, not in the nucleus. Furthermore, CL sequences are normally analyzed as a cluster, not as complex phonemic units analogous to labialized and palatalized stops. Occasionally minor exceptions to this generalization may arise, such as lateral affricates like /dl/. Nevertheless, all of the languages mentioned in this section permit some heterorganic clusters such as /pR/ and/or /kl/. Another minor complication is that some phonetic CL sequences derive from underlying CVL due to vowel syncope. This is especially common in African languages, where this deletion process is often optional and gradient (cf. section 4). Therefore, even CL sequences may potentially be ambiguous at times, albeit in a different way. Nevertheless, none of the six liquid offset languages described here exhibits this phenomenon, as far as I am aware.

What may be problematic, though, is when a language allowing CL clusters also attests phonetic CGV sequences. In this case it is a priori possible that CG is also a true onset cluster. The principle here is that the existence of CL establishes a precedent for other cluster types (such as CG) in the same language (cf. section 5.1). Hence if a language is claimed to be liquid offset only, we must initially be cautious when that language also has phonetic CGV sequences. In such a case it is necessary to show explicitly that CG is not also an onset cluster. For our purposes it does not matter whether this is accomplished by interpreting the glide as a secondary release of a lone onset consonant (cf. 14b), or as part of a complex nucleus/diphthong (14c). What is crucial is that the CG sequence not be analyzable as parallel to CL – a sequence of two different segments, both in the onset (14a). Consequently, in the description of CL languages below I include any language-specific arguments noted by the source(s) in rejecting an interpretation whereby CGV demisyllables (if present) are structurally equivalent to CLV. In order for such languages to be included here, the conclusion must be that CLV is truly a unique type of syllable onset. Furthermore, since CL clusters are generally unambiguous, there is little point in calculating Cluster-to-Segment Ratios (18) for liquid offset languages. Thus in the tables below I omit this detail.

6.2. Type E languages (OL + NL clusters)

6.2.1. Aceh (ISO 639-3 code: ace)

Aceh is an Austronesian language spoken in the Sumatra region of Indonesia (Table 14). The phoneme /s/ is described as a voiceless postlaminal dental-alveolar fricative with a wide channel area (Durie 1985). In Table 14 the alveopalatal fricative /S/ is parenthesized since it is mainly restricted to Arabic loanwords. It is not included in the count of phonemes. The alveolar trill /r/ has a tap allophone intervocalically. This phoneme is pronounced differently – as a uvular approximant – in some dialects. Oral stops generally cluster with both /l/ and /r/, except that /t c d é/ do not combine with /l/. The cluster /Sr/ also occurs in some dialects. The combination /Nr/ has only been attested in one word (see (35)). Therefore the status of Aceh as a type E language is marginal. An important condition on all onset clusters in Aceh is that they only occur in stressed syllables. Usually this is the final syllable of the word. In unstressed syllables only simple (non-branching) onsets are found: [CV(C)]. The following list presents some illustrative data.

Table 14. Synopsis of descriptive facts for the Aceh language
classification: Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Chamic, Achenese
country: Indonesia (Sumatra)
population: 3,500,000
sources: Durie 1985, 1995, 2001; Daud and Durie 1999
dialect of focus: North Aceh, conservative (older people)
consonant phonemes (19): p t c k b d é g s (S) m n ñ N r l j w P h
$ˈCLV clusters (13): pr tr cr kr br dr ér gr pl kl bl gl (Nr: one token)
$CGV clusters: none
other $ˈCCV clusters: C+h

(35) Aceh data examples (Durie 1985, 1995, 2001; Daud and Durie 1999)
/Or/: [tri@N] ‘bamboo’  [croh] ‘fry’  [m˜an.dr˜Et] ‘type of spicy drink’
/Ol/: [klO] ‘dumb’  [blO@] ‘buy’  [gli] ‘ticklish’
NL: [ˆNram] ‘angry’

The last form in (35) begins with a velar nasal transcribed with a circumflex diacritic on top ([ˆN]). Durie (1985) calls this type of segment a “funny” nasal. These are realized with longer duration and decreased nasal airflow compared with “plain” nasals. The two series of nasals are in complementary distribution and thus do not contrast with each other. The plain allophones occur before nasalized vowels while the funny allophones appear syllable-initially before a consonant or an oral vowel.21

21. A reviewer asks whether the nasal in [ˆNram] could be phonetically syllabic. I could not find any mention of this point in Durie’s works.

As indicated in Table 14, Aceh also attests initial clusters consisting of an obstruent or a liquid followed by the voiceless glottal fricative /h/. Two examples are [that] ‘very’ and [lhoP] ‘deep’. Some dialects have /mh/ and /nh/ as well. Durie (1985) interprets these as true clusters rather than aspirates based on native speaker intuition: Aceh people use a separate letter (h) to write these consonants, showing that they are psychologically real. Furthermore, this analysis simplifies the description of phonotactics. For example, /C+h/ sequences can be split apart by an infix. Assuming that this /h/ is in fact an independent segment, it raises a difficult question: where do glottal consonants fall on the sonority scale? Specifically, are they obstruents or sonorants? To my knowledge this issue has never been conclusively resolved. For general discussion of this topic, see Parker (2002, 2008, 2011), Mielke (2008, 2009), Kaisse (2011: 290), Walker (2011: 1853), and Miller (this volume). Due to the controversial nature of this issue I refrain from taking a firm stance here concerning the relative sonority of the laryngeal “glides” /h/ and /P/. Consequently, I do not attempt to categorize the C+h clusters in this language in terms of the typological parameters proposed in this paper. Instead, I simply ignore them as a complication that transcends this discussion. The C+h cluster interpretation has also been posited for Aceh by Al-Ahmadi Al-Harbi (2002). His analysis involves a modified version of the Sonority Dispersion Principle (Clements 1990; see also sections 1, 2, and 3.3).

Finally, as noted in Table 14, Aceh crucially does not permit syllable-initial CG clusters of any type. Mark Durie (p.c.) confirmed that phonetic CGV sequences never occur. In fact, in some forms involving an underlying hiatus beginning with a high vowel, the two nuclei are predictably broken up by an epenthetic glottal stop. For example, phonemic /éioh/ ‘far’ is pronounced phonetically as [éiPoh]. This establishes Aceh as basically a liquid offset-only language.

6.2.2. Garo (ISO 639-3 code: grt)

Garo is a Sino-Tibetan language spoken in India and Bangladesh (Table 15).


Burling (1961: 3) describes the phoneme /s«/ as a “voiceless lamino-alveolar rill [grooved, S.P.] spirant”. Similarly, /ts«/ is intermediate between /ts/ and /tS/. In Table 15 the lateral /l/ is parenthesized since it does not occur syllable-initially except in loanwords. However, it does appear in coda position while [R] is limited to syllable onsets. Therefore these two liquid segments are technically in complementary distribution in the native vocabulary and can be considered allophones of the same phoneme. Consequently, [Cl] clusters are not observed. The rhotic /R/ is normally a flap but at times is pronounced as a “very brief trill”, even in onset clusters. The voiceless stops /p t k/ are phonetically aspirated in onset position, including before /R/. The generalization is that any oral obstruent can combine with an /R/ offset, and /mR/ clusters are attested as well.

Table 15. Synopsis of descriptive facts for the Garo language
classification: Sino-Tibetan, Tibeto-Burman, Jingpho-Konyak-Bodo, Konyak-Bodo-Garo, Bodo-Garo, Garo
countries: India, Bangladesh
population: 900,000
sources: Burling 1961, 1992, 2003
dialect of focus: standard A’we (northeastern corner of the Garo hills)
consonant phonemes (16): p t k P b d g ts« dz« s« h m n N (l) R w
$CLV clusters (10): pR tR kR bR dR gR ts«R dz«R s«R mR
$CGV clusters: none
other $CCV clusters: s«p s«t s«k s«pR s«kR

Here are some sample forms:

(36) Garo data examples (Burling 1961, 1992, 2003)
/OR/: [pR1N] ‘morning’  [s«Rek] ‘balcony’
/OR/: [ts«aP.pReta] ‘eat excessively’  [nam.s«RaNa] ‘completely good, thoroughly good’
/OR/: [bRi] ‘four’  [mesok.dRaa] ‘show against one’s desires’  [b1l.gRi] ‘weak’
/mR/: [mR1t] ‘razor’  [mRoN] ‘main supporting post of a house’  [mRipa] ‘submerge, go under water’

The onset clusters /bR dR gR/ are noted in Burling (1961, 1992) but omitted in Burling (2003). Rob Burling (p.c.) clarified that the latter is an oversight. Similarly, /mR/ is listed in Burling (1992, 2003) but not in Burling (1961). This too is an error. Burling reports that /mR/ onsets are relatively rare but have been “securely established”. He knows of 16 words beginning with this sequence, and it occurs word-internally as well. Therefore, while the total quantity of /mR/ clusters is somewhat limited, it is not trivial. As noted in Table 15, syllable-initial clusters composed of /s«/ followed by a voiceless stop are also permitted. Two examples are [s«popReta] ‘blow too much, burst’, and [s«kaN] ‘before’. Finally, syllable-initial /Cw/ clusters do not occur. Furthermore, whenever a sequence such as /Cua/ arises, the two vowels are always in different syllables. Once again some phonologists might consider this a matter of interpretation.

The onset clusters /bR dR gR/ are noted in Burling (1961, 1992) but omitted in Burling (2003). Rob Burling (p.c.) clarified that the latter is an oversight. Similarly, /mR/ is listed in Burling (1992, 2003) but not in Burling (1961). This too is an error. Burling reports that /mR/ onsets are relatively rare but have been “securely established”. He knows of 16 words beginning with this sequence, and it occurs word-internally as well. Therefore, while the total quantity of /mR/ clusters is somewhat limited, it is not trivial. As noted in Table 15, syllable-initial clusters composed of /s«/ followed by a voiceless stop are also permitted. Two examples are [s«popReta] ‘blow too much, burst’, and [s«kaN] ‘before’. Finally, syllable-initial /Cw/ clusters do not occur. Furthermore, whenever a sequence such as /Cua/ arises, the two vowels are always in different syllables. Once again some phonologists might consider this a matter of interpretation. In this

150

Steve Parker

case, however, the transcriptional choice between [Cwa] and [Cua] could presumably be settled on phonetic grounds. See sections 5.1 and 6.1 for further background and discussion. 6.2.3. Isirawa (ISO 639-3 code: srl) Isirawa is classified by Lewis (2009) as a Tor-Kwerba language (Table 16). It is therefore unrelated to Aceh (Table 14). Furthermore, although both languages are spoken in Indonesia, their locations are geographically distant from each other. This eliminates a potential areal hypothesis – the possibility that the phonological system of one language has been influenced by the other (see section 6.4). In Table 16 the marginal phoneme /h/ is parenthesized since it is restricted to a small number of lexical items. In addition, in some of these it is in free variation with /s/: [ma"sIta] ∼ [ma"hIta] ‘daughter’. The only liquid phoneme is /R/, which is basically a flap. However, adjacent to another consonant it is trilled, as in the clusters here. The generalization in Table 16 is that rhotic offsets combine with /m/ as well as with any oral obstruent except the af> fricate /tS/. Some of these onset clusters in fact optionally surface with a short, transitional vocoid between the two consonants. However, certain sequences, such as /pR/, never exhibit this schwa-like vowel. Furthermore, this excrescent segment does not perturb the default placement of stress on the penultimate syllable. Therefore it is clearly not phonemic. Following are some examples. Table 16. Synopsis of descriptive facts for the Isirawa language classification: Tor-Kwerba, Greater Kwerba, Isirawa country: Indonesia (Papua) population: 1800 source: Oguri and Erickson 1975 consonant phonemes (12): p t > k tS f s (h) B m n R j w $CLV clusters (7): pr tr kr fr sr Br mr $CGV clusters: none other #CCV clusters: none

(37) Isirawa data examples (Oguri and Erickson 1975) /OR/: /OR/: /OR/: /mR/:

[a"prEsa] ‘lower arm’ ["tri] ‘thunder’ ["krai krai] ∼ ["k@ rai k@ rai] ‘bird sp.’ ["srifa] ‘language’ ["naB.sra] ‘three’ [mra"Rimanapi] ‘bathed’ ["mri] ∼ ["m@ ri] ‘one’

["so.kra] ‘mouse’ [En"to.fra] ‘ring finger’ [po"ma.Braun] ‘bring it’ [fita."mra] ‘corpse’

Concerning the issue of interpretation (sections 5.1 and 6.1), a number of vowel clusters begin with /i/ or /u/. However, Oguri and Erickson (1975) analyze these as complex nuclei (diphthongs), for several reasons: (1) The two vowels together count as only one mora. (2) Both segments have equal length. (3) In many cases stress can optionally shift from one half to the other. (4) Reversed sequences such as [ua] vs. [au] occur. (5) There are non-ambiguous vowel clus-

Sonority distance vs. sonority dispersion – a typological survey

151

ters such as [oa]. And (6) stress never falls prior to the antepenultimate vowel except in words containing a vowel sequence somewhere after the stress. One example is ["wamuR2i] ‘shark’. Hence, the lack of preantepenultimate stress is unexceptional as long as the VV sequences in question are treated as tautosyllabic. Consequently, in phonetic [CVV] sequences the first vowel is clearly in the nucleus, not in the onset, even when it is [+high]. This confirms Isirawa as a liquid offset-only language. 6.2.4. Summary of section 6.2 In this section I have described three languages classified here as type E. The common phonological characteristic of this category is the existence of just two basic kinds of onset clusters: OL and NL. Among these languages, Aceh exhibits two liquid phonemes, /l/ and /r/. Both of these combine with a range of different obstruent anchors (section 6.2.1). Garo, however, has only one liquid phoneme, the flap /R/ (section 6.2.2). Isirawa only has /R/ as well, but as we have seen, this is pronounced as a trill when in a cluster (section 6.2.3). One general trend among these languages, then, is that rhotic offsets are more common than lateral offsets. This is especially true of NL clusters. Among these, Aceh attests only one token, beginning with /Nr/. Garo also has only one kind of sequence with a nasal anchor, /mR/. However, this cluster is more statistically productive in this language than /Nr/ is in Aceh. Finally, Isirawa also permits only one kind of NL cluster: /mR/ → [mr]. While this is sometimes split apart by an optional low-level reduced vocoid, so are certain OL sequences. An interesting aspect of all of these NL clusters, then, is that the nasal anchor is always non-coronal; /nL/ sequences do not occur in any of the languages sampled here. This is somewhat analogous to another cross-linguistic tendency: while the heterorganic clusters /pl kl bl gl/ are relatively common, languages attesting onsets of the type /tl dl/ are less frequent. This is often ascribed to a constraint against adjacent homorganic consonants, at least those in which C1 is [–continuant] (Harris 1983). Consequently, in this sample of type E languages, OL clusters are much more robust than NL clusters. This is true both in terms of types and tokens. This is not surprising since OL involves a larger sonority distance (n = 2) than NL does (n = 1). In other words, sonority provides a logical basis for why this outcome is expected. In conclusion, the existence of NL onsets is more tentative overall than most other types of core clusters examined in this paper. Nevertheless, we have observed enough canonical examples of them among these three cases to establish, in a preliminary way, a new category of language: type E.

152

Steve Parker

6.3. Type F languages (OL clusters) 6.3.1. Emberá-Catío (ISO 639-3 code: cto) Emberá-Catío is a Choco language spoken in Colombia and Panama (Table 17). Elsewhere in this paper it is simply called Catío. Mortensen (1994) describes > /p’ t’ k’ tS’s’/ as “constricted” and therefore often ejectivized. That source is followed here since it is the most recent comprehensive analysis of the phonology available. Furthermore, it is confirmed by acoustic measurements. Catío has a contrast between a trilled /r/ and a flap /R/. The latter combines as an offset with four different voiceless obstruent anchors (Table 17). Some sample data are given in (38) below. Table 17. Synopsis of descriptive facts for the Catío language classification: Choco, Embera, Northern countries: Colombia, Panama population: 15,040 principal source: Mortensen 1994; secondary sources: Mortensen 1999; Rex and Schöttelndreyer 1973 consonant phonemes (19): ph th kh p’ t’ k’ b d >h > > h tS tS’ dZ s s’ m n r R w h $CLV clusters (4): th R kh R t’R s’R $CGV clusters: none other #CCV clusters: none

(38) Catío data examples (Mortensen 1994, 1999) > ˜ h Re] ‘above (in heaven)’ [kh R˜ıñtSh a] ‘think’ /OR/: [th Ro] ‘armadillo’ [Wn.t [ph ˜en.t’R5] ‘widow’ /OR/: [kh R˜ıña] ‘want’ [t’Rua] ‘land (n)’ /OR/: [s’R˜o˜a] ‘old’ [s’Roma] ‘big’ According to Chaz Mortensen (p.c.), phonetic sequences such as [kua] also occur. However, when Catío speakers pronounce syllables like this slowly and carefully, they only use [u], never [w]. This indicates that such segments are vowels rather than consonants (glides). Consequently, they form part of the nucleus rather than the onset. In the loanword /plata/ ‘money’, the Catíos pronounce the lateral faithfully, i.e., just as it is articulated in Spanish. 6.3.2. Tshangla (ISO 639-3 code: tsj) Tshangla is a Sino-Tibetan language spoken in three countries of southern Asia (Table 18). It is therefore related to Garo (section 6.2.2). The standard dialect of Tshangla corresponds to the Trashigang district of Bhutan. It has 29 consonant phonemes, listed in Table 18, including contrastive aspiration for voiceless stops and affricates. Three additional segments (in parentheses) occur in loanwords. The fricatives /C/ and /ý/ are described as lamino-postalveolar (Andvik 2010). Trashigang speakers combine /R/ with labial stops only, to form three complex onsets. In other dialects the three velar stops cluster with /R/ as well.

Sonority distance vs. sonority dispersion – a typological survey

153

Table 18. Synopsis of descriptive facts for the Tshangla language classification: Sino-Tibetan, Tibeto-Burman, Himalayish, Tibeto-Kanauri, Tibetic, Bodish, Tshangla countries: Bhutan, China, India population: 175,200 principal source: Andvik 2010; secondary sources: Andvik 1999, 2003 dialect of focus: “standard” Tshangla (Trashigang district) consonant phonemes (29): p t ú k ph th úh >>> > > > kh b d ã g ts tC tsh tCh (dz) dý s C z (ý) m n ñ N l (ì) R j w h $CLV clusters (3): pR ph R bR (kR kh R gR) $CGV clusters: none other #CCV clusters: pC (two tokens)

These are parenthesized in Table 18. Some examples are given below, including clusters with velar anchors from non-Trashigang varieties. In the form [kh RaFe] ‘to meet’, the voiceless bilabial fricative is a lenited allophone of /ph / in intervocalic position. (39) Tshangla data examples (Andvik 2010) /OR/: [pRospe] ‘to plow’ [ph RaNga] ‘underneath’ /OR/: [bRo] ‘taste (n)’ [bRa] ‘other’ /OR/: [kRemtala] ‘lean, thin’ [kh RaFe] ‘to meet’ [gRale] ‘to like, to enjoy’ Andvik (2010: 14) clarifies that historical velar-initial clusters have simplified in Trashigang to a monosegmental onset containing only a retroflexed coronal. For example, /kRame/ ‘to distribute’ in Written Tibetan corresponds to /úame/ in Trashigang. As Table 18 indicates, the initial sequence /pCV/ also occurs, but only in two lexical items. One of these is onomatopoeic; the other is [pCi] ‘four’. Tshangla also exhibits /C+l/ sequences, but only in morphologically derived environments (Andvik 2010: 62–67). Two examples are [jek-la] ‘is speaking’ and [lap-la] ‘is learning’. Both of these end with the present imperfective suffix /-la/. In bimorphemic forms such as these the syllable break plausibly falls between the two consonants since C+l clusters never occur within a single morpheme. Furthermore, there are many words ending with voiceless stops and many others beginning with /l/. Hence Tshangla enforces the prohibition against non-rhotic offset clusters by heterosyllabification. Finally, the sequence /Cui/ can arise, but only across morpheme boundaries. An example is /bu-i/ ‘take-imperative’. Eric Andvik (p.c.) clarifies that in such forms the /u/ rather than the /i/ is stressed, in keeping with the regular pattern of stress falling on the penultimate vowel or syllable. Furthermore, the sequence /bwV/ never occurs in monomorphemic contexts. Consequently, the /u/ here is clearly in the nucleus rather than in the onset.

154

Steve Parker

6.3.3. Parauk Wa (ISO 639-3 code: prk) Parauk Wa is an Austro-Asiatic language spoken in Myanmar and China (Table 19). Elsewhere in this paper it is referred to simply as Wa. All of its consonant phonemes except /P h s/ contrast two varieties: plain and aspirated. Its voiced obstruent stop series is inherently prenasalized. Watkins (2002) describes /ô/ as a median alveolar approximant. Each of the two unaspirated liquids /l ô/ combines as an offset with all bilabial and velar stops, yielding 16 OL clusters. The following sample data illustrate these patterns. A dieresis underneath a vowel indicates contrastive breathy register. Table 19. Synopsis of descriptive facts for the Wa language classification: Austro-Asiatic, Mon-Khmer, Northern Mon-Khmer, Palaungic, Eastern Palaungic, Waic, Wa countries: Myanmar, China population: 1,188,000 source: Watkins 2002 dialect of focus: standard Wa consonant phonemes (35): p t c k P ph th ch kh m b n d ñ é N g m bh n dh ñ éh N gh s h v vh m n ñ N mh nh ñh Nh l lh ô ôh j jh $CLV clusters (16): pl kl ph l kh l m bl N gl m bh l N gh l pô kô ph ô kh ô m bô N gô m bh ô N gh ô $CGV clusters: none other #CCV clusters: none

(40) Wa data examples (Watkins 2002) /Ol/: [pluat] ‘stop (v)’ [m blah] ‘few’ /Oô/: [kôaWN] ‘drum’

[N gliah] ‘sixty’ ¨

[m bh laP] ‘kind of tobacco’

[kôaWN] ‘clothes’ [ph ôu] ‘blow (v)’ ¨

Watkins (2002) notes that there is a lot of dialect variation. Lewis (2009) reports a total of 300 different varieties of Wa spoken in China; only 3–5 of these had been surveyed by the government as of 2006. Justin Watkins (p.c.) clarifies that some Wa languages have only /Cl/ and /Cô/ clusters while other Wa speakers also produce some phonetic syllables beginning with [Cj] and [Cw]. The latter can be analyzed as /CjV/ and /CwV/ in some dialects, but the most accurate phonemicization of these is /CiV/ and /CuV/ in many other cases. The variety of Wa assumed here is the one described in Watkins (2002), which only attests OL onsets, not *OG. Nevertheless, without more details this may be one of those situations in which it is difficult to be sure of the correct interpretation of glide offset sequences (sections 5.1 and 6.1). Consequently, further clarification would be welcome. 6.4. Summary of section 6 (liquid offset languages) In this section I have described six liquid offset languages. All of these contain the equal distance cluster type OLV. Three of them in fact allow this kind of initial demisyllable and no others. Those languages are classified here as type

Sonority distance vs. sonority dispersion – a typological survey

155

F. The other three languages (type E) attest NL clusters as well as OL. This typological array of languages, grouped according to the kinds of onset clusters that occur, parallels the glide offset continuum summarized in section 5.6. The difference here is that all clusters end with a liquid rather than a glide. Apart from this one principled distinction, a strong and consistent generalization across both scales is that the existence of clusters containing anchor consonants of sonority rank x implies the existence in that same language of anchors having all sonority ranks lower than x. In terms of glide offset languages this pattern generates four typological classes: A, B, C, and D (Table 12). With respect to liquid offset languages, in contrast, we expect at most three categories, all else being equal. This is because when C2 is a glide, C1 can also be a glide. However, when C2 is a liquid, C1 cannot be a glide (lest a sonority reversal result). Therefore, when the sonority index of the offset consonant is fixed at 3 (a liquid), the sonority index of the anchor consonant must also range from 1 to 3 only (among CL languages). This generates a maximum of three language types, according to whether the sonority distance between C2 and C1 is set to a value of ≥ 0, ≥ 1, or ≥ 2. Of these three possibilities, the last two are actually attested. Specifically, language type E requires a minimum sonority distance of just 1, whereas type F is more restrictive and thus requires a distance of 2. This latter parameter setting effectively limits C1 to just obstruents in type F languages. However, a third possibility is also theoretically conceivable – a liquid offset language in which the Minimum Sonority Distance is 0. Such a system would exhibit LL onset clusters (sonority plateaus). Naturally it would also have OL and NL clusters too. This type of language does not occur among those I surveyed. If such a language is ever discovered, it will be analogous to type A, which allows GG plateaus. Recall from section 5 that only one language of type A is reported in this paper, Shilluk. It is thus the rarest kind of glide offset language in this sample. We can also predict that A should be the least common type of CG language in the world overall. The lack of a corresponding liquid offset language (with a Minimum Sonority Distance of 0) is considered here to be accidental (cf. section 3.3). Hopefully, future research will turn up some examples of such a system. If so, it will fill in this logical gap and nicely complete the typology. I now summarize the genetic and areal distribution of the six CL languages discussed here (cf. Table 12). In Table 20 below the relative spread of the six languages is not quite as ideal as that of the eight glide offset languages in Table 12. For example, there are no liquid offset languages from Africa in this sample. This is unfortunate. However, providentially there is one language from South America (Catío). Furthermore, while two of the type E languages are spoken in Indonesia (Aceh and Isirawa), they are located in different regions of that


Table 20. The liquid offset continuum of language types (cf. Table 3)

language type   permissible onset clusters   language   phylum           countries
E               OL, NL                       Aceh       Austronesian     Indonesia
                                             Garo       Sino-Tibetan     India, Bangladesh
                                             Isirawa    Tor-Kwerba       Indonesia
F               OL                           Catío      Choco            Colombia, Panama
                                             Tshangla   Sino-Tibetan     Bhutan, China, India
                                             Wa         Austro-Asiatic   Myanmar, China

Also, these two languages pertain to different genetic phyla. There are also two Sino-Tibetan languages in this table (Garo and Tshangla), but they belong to different subtypes (E vs. F, respectively). Another felicitous characteristic of this table is that each of the two types (E and F) is represented by three different languages.

Overall, then, this initial survey has provided enough evidence to establish a preliminary continuum of liquid offset languages composed of two different types. Furthermore, as noted in section 4, a number of other potential CL languages are listed in Appendix B. These are not included in the analysis here since they do not meet all of the ideal criteria laid out in section 4. Nevertheless, they tentatively appear to be exemplars of these two language types, should someone else wish to follow up on them.

Another noteworthy detail of this sample is that a single phylum (Austronesian) attests both a glide offset language (Ga’dang in Table 12) and a liquid offset language (Aceh here in Table 20). Similarly, Sino-Tibetan is represented by a mix of language types – Kham is CG in Table 12, while Garo and Tshangla are CL in Table 20.

A final feature worth highlighting is the precise phonetic quality of the liquid offset consonants occurring in these languages. Recall that in the three type E languages, C2 is most often a rhotic (section 6.2.4). All three of these languages attest rhotic offsets, but only one of them (Aceh) also permits a lateral in this position. For the three F languages in section 6.3 a similar pattern is observed. In all three cases rhotic offsets are allowed, and two of the languages (Catío and Tshangla) permit only rhotics in C2. The third language, Wa, has clusters with /l/, yet it has an equal number of clusters with /ɹ/. We have thus identified a statistical trend among the languages sampled here: in systems where C2 is limited to liquid consonants, rhotics are more common than laterals, all else being equal. In contrast, among the glide offset languages documented in section 5, neither glide is more frequent overall. That is, numerically speaking, neither the palatal /j/ nor the labiovelar /w/ predominates among the languages listed in Table 12.
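
The classification logic behind this typology can be made concrete with a short sketch. The following code is my own illustration rather than part of the survey itself: it assumes the sonority indices used in this paper (obstruent = 1, nasal = 2, liquid = 3, glide = 4), encodes a language's onset clusters as (anchor, offset) class pairs, and returns the predicted language type. The function and variable names are hypothetical.

```python
# A sketch of the survey's typology, under the sonority indices assumed in
# this paper: obstruent (O) = 1, nasal (N) = 2, liquid (L) = 3, glide (G) = 4.
SONORITY = {"O": 1, "N": 2, "L": 3, "G": 4}

# Language types by offset class and minimum sonority distance (offset - anchor):
# glide offset: A (>= 0), B (>= 1), C (>= 2), D (>= 3); liquid offset: E (>= 1), F (>= 2).
TYPES = {("G", 0): "A", ("G", 1): "B", ("G", 2): "C", ("G", 3): "D",
         ("L", 1): "E", ("L", 2): "F"}

def classify(clusters):
    """Classify a set of onset clusters, e.g. {("O", "L"), ("N", "L")} = OL + NL."""
    offsets = {c2 for (c1, c2) in clusters}
    if len(offsets) != 1:
        return "not a pure glide offset or liquid offset system"
    offset = offsets.pop()
    distances = {SONORITY[offset] - SONORITY[c1] for (c1, c2) in clusters}
    if min(distances) < 0:
        return "ill-formed: sonority reversal in an onset cluster"
    # Implicational generalization: an anchor of rank x entails anchors of
    # every lower rank, i.e., all distances from the minimum up to the maximum.
    expected = set(range(min(distances), SONORITY[offset]))
    if distances != expected:
        return "gap in the continuum (unattested in this survey)"
    return TYPES.get((offset, min(distances)), "unattested type")

print(classify({("O", "L"), ("N", "L")}))  # E (e.g., Aceh, Garo, Isirawa)
print(classify({("O", "L")}))              # F (e.g., Catío, Tshangla, Wa)
print(classify({("O", "G"), ("N", "G"), ("L", "G"), ("G", "G")}))  # A (Shilluk)
```

Note that the same function also captures the predicted but unattested systems: a GG-only inventory violates the implicational hierarchies summarized in the conclusion below, while an LL + NL + OL inventory comes out as the unattested liquid offset analog of type A.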

7. Conclusion

In this survey of 122 languages containing onset clusters, I have identified two partially new phonological patterns rooted in sonority. One of these is the glide offset (CG) continuum. It confirms the need for minimum sonority distance constraints, as posited by Steriade (1982), Harris (1983), and Selkirk (1984). It also indicates that glides are one major type of default, unmarked segment in C2 position. At the same time, however, a liquid offset (CL) continuum exists side by side with this CG hierarchy and directly competes with it. What CL languages tell us is that liquids can be inherently preferred in C2 position just as much as glides can. This is a partial validation of Clements’ (1990) Sonority Dispersion Principle. Together these two formal proposals go a long way towards accounting for the majority of onset clusters in most languages. Furthermore, this study gives us a preliminary idea of the relative strength of these two cross-linguistic generalizations by documenting the tendencies with hard empirical data, to a degree that is not likely due to chance. This is a significant contribution if for no other reason than that it has never been done before. Consequently, insofar as these results accurately reflect the prototypical linguistic situation, we can now make the following prediction. Based on the number of CG languages analyzed in section 5 (n = 8), together with the quantity of potential glide offset languages listed in Appendix A (n = 51), glides are probably more common than liquids in C2 position universally. Nevertheless, it remains to be seen whether this claim will prove to be statistically reliable.

All of this raises a crucial question: how can there be two unmarked onset clusters cross-linguistically at the same time, OG and OL? On the one hand I suggest that glides are optimal offset segments since they contrast the most with preferred anchor consonants in terms of sonority. On the other hand, however, glide offsets are a liability because they are too similar to the following vowel. If avoiding this latter “pressure” is of paramount importance, a language can opt for the next best thing – a liquid offset. This has the advantage of being midway between the surrounding segments (an obstruent and a vowel) – a compromise in effect. In other words, liquids encroach on neither of the segments they are adjacent to. Since the optimality theoretic constraints that encode these two general tendencies are violable, there is no problem; languages differ because rankings differ.

To summarize, the two sonority-based language continua posited here can be graphically expressed as the following cross-linguistic implicational hierarchies (cf. Gipp 2011):


(41) (a) The glide offset hierarchy of onset types: GG → LG → NG → OG
     (b) The liquid offset hierarchy of onset types: LL → NL → OL

In (41) the rightward pointing arrows indicate universal, asymmetric entailments that appear to be absolute. Thus, for example, the presence of the cluster type GG in a specific language necessarily implies the existence of LG clusters in that same inventory, but not vice-versa. Similarly, NL clusters cannot exist without OL clusters, etc.

For lack of space I have not pursued here a formal analysis of these facts. Nevertheless, we can at least briefly speculate about a potential direction for future work. Let us assume some type of general Minimum Sonority Distance constraint family, as described in section 3.1 (see also Smith and Moreton in this volume). Let us furthermore assume that there is some way to restrict C2 to glides. A combination of these two types of constraints would then produce the glide offset language continuum. Turning now to CL languages, suppose that there is a constraint specifically against glides in syllable onsets. One such possibility is Prince and Smolensky’s (1993/2004) margin constraint *Onset/Glide. However, this is problematic because we need to target glide offsets specifically, yet not glide anchors; the reason is that all of the CL languages described in section 6 allow at least one glide in CV syllables. Nevertheless, assuming that this technical challenge can be overcome, a high ranking for this hypothetical *Onset/Glide type of constraint would prohibit glides in offset position. Then if classical minimum distance constraints are enforced in the usual way, the interaction of all of these together would result in what look like CL languages: highly separated sonority classes in onset clusters, yet no glides in C2. The upshot of this line of analysis is that Clements’ (1990) Sonority Dispersion Principle would then largely be an epiphenomenon.

All of these findings provide fresh support for the notion of sonority as a basic organizational feature of Universal Grammar. That is, given that phonological theory should capture principled correlations between languages, such as systematic restrictions on inventories of well-formed consonant clusters in different prosodic positions, sonority helps us explain why the human world is the way it is. For example, we can now predict that no language should exhibit the opposite of what we have observed here: maximum sonority distance constraints. Such a hypothetical language (or a continuum of imaginary languages) might require C2 to be a glide while simultaneously restricting anchor consonants to one of the following sets of natural classes: (a) glides, liquids, and nasals; or (b) glides and liquids; or (c) glides only. In this infelicitous universe, however, obstruents crucially never occur in C1 position in at least some languages that permit other glide offset clusters. This would necessitate constraints that are the logical inverse of those motivated by the languages occurring in this sample. The reason such constraints should not exist is that they would impede perception by minimizing the contrast in sonority between anchor and offset segments (section 3.2).

Finally, we can now return to a point noted in passing in section 2. If minimum sonority distance constraints are on the right track, and if fricatives outrank stops on the sonority scale, then we expect to find some languages having the following two characteristics: (1) all offset consonants are glides (or else liquids), and (2) permissible anchors include stops (e.g., /tR/, /tw/, etc.) but not fricatives (*/sl/, */sw/). I leave this as an interesting prospect to be explored in future research.

Appendix A: Other potential glide offset (CG) languages

As discussed in sections 4 and 5.6, this survey uncovered a number of languages in which the only licit onset clusters end with a glide. Eight such canonical languages are described in section 5. In the chart below I list 51 other contenders for this type of language (glide offset only). However, as noted in conjunction with (14), in some of these cases the status of these sequences as true onset clusters is in doubt. Consequently, each analysis must be confirmed by language-specific argumentation. Furthermore, in section 4 I discuss several a priori theoretical and practical factors for determining whether possible data sources suit the purposes of this study. Unfortunately, each of the 51 cases below falls short on one or more of these criteria. Therefore it seems prudent not to incorporate them (yet) into the analysis. Nevertheless, this does not imply that the linguistic descriptions in these sources are necessarily incorrect. Rather, in many instances the only hesitation I have is that the coverage of details is not as exhaustive as it could be (see section 4). Therefore, if the analyses contained in these works can be confirmed more robustly, many of them will prove to be true CG languages. They are thus listed here in the hope that further studies will take them into account.

Both Rod Casali and Ken Olson (p.c.) point out that apparent glide offset languages are especially common in Sub-Saharan Africa. This prediction is in fact consistent with my findings; nearly one-half of the languages in this appendix are from that continent. Therefore we might call this the African syndrome.

The 51 languages listed below are presented in alphabetical order. The spelling of language names follows the conventions of Lewis (2009; cf. section 5.2.1). Within each entry the name of the language is given first and is underlined. In two cases (Lango and Mono) I also include the country where the language is spoken, since other languages with those same names also appear in the Ethnologue. The next detail in each entry is one or more references to the works containing the corresponding linguistic data and discussion. After that I also provide, for some of the languages, a brief comment summing up the analysis presented by the source(s). Finally, I indicate the potential type of CG system each language could turn out to be if the analysis is confirmed: A, B, C, or D, based on Table 2. In cases where the exact type cannot be determined from the information given by the source(s), I annotate this as unclear. For an analogous list of potential liquid offset (CL) languages, see Appendix B.

1. Akha. Katsura 1973; Hansson 2003. Potential type: C.
2. Akoose. Hedinger and Hedinger 1977; Hedinger 2008. Potential type: B.
3. B30 Bantu (Tsogo, Vove, Viya, Pinzi(pinzi), Kande, Himba(ka), and Bongwe). van der Veen 2003. Potential type: unclear.
4. Bana. Hofmann (1990) considers [Cw] labialization and [Cj] a syllable prosody. Potential type: D.
5. Baruya. Lloyd and Healey (1970) interpret CG as complex onsets. However, a complication is that Lloyd (1981) later posits a competing analysis with a different number of phonemes. Potential type: C.
6. Burmese. Wheatley 2003; Green 2005. Potential type: A or B.
7. Cebaara Senoufo. Mills (1984) interprets CG as a word-level secondary release. However, Burquest (2001) notes that both glides (/j/ and /w/) occur with virtually all anchor consonants in C1 position, so he concludes that a cluster interpretation is preferable; otherwise the number of consonant phonemes is nearly tripled. Potential type: unclear.
8. Chantyal. Noonan 2003. Potential type: unclear.
9. Chinese. Duanmu 1999, 2007, 2011; Daland et al. 2011. Potential type: B.
10. Darai. Kotapish and Kotapish 1973. Potential type: unclear.
11. Deg. Crouch and Herbert (2003) interpret [Cw] as labialization. It appears that [Cj] may not occur at all. Potential type: B.
12. Ganda. Tucker 1962; Cole 1967; Katamba 1985, 2001; Clements 1986. Potential type: B.
13. Garawa. Furby (1974) posits two clusters involving a retroflexed rhotic-like offset that may technically be a semi-consonant or glide. Potential type: D.
14. Gbari. Rosendall 1992. Potential type: A.
15. Hanga. Hunt and Hunt (1981) interpret CG as complex consonants (one phonemic unit). Potential type: C.
16. Haya. Byarushengo 1977. Potential type: unclear.
17. Herero. Elderkin 2003. Potential type: B.
18. Hmwaveke (Fa Tieta). Campbell (1987) interprets CG as single complex units. Potential type: C.
19. Iu Mien (Yao). Downer 1961; Purnell 1965. Potential type: B.
20. Ivatan. Hidalgo and Hidalgo 1971. Potential type: B.
21. Japanese. Bloch (1950), Martin (1952), Vance (1987, 2008), and Kondo (2000) analyze /Cj/ as complex onset clusters. In contrast, Hinds (1986) and Ito and Mester (1995) interpret these as palatalized units (/Cʲ/), but give no evidence to support this conclusion. Potential type: B.
22. Kasem. Callow (1965) interprets CGV as [CVV], i.e., vowel clusters. See also Awedoba (2002). Potential type: A.
23. Koluwawa. Guderian and Guderian 2005. Potential type: C.
24. Korean. Martin 1951; Kim 1994; Lee 1994; Kang 2010; Daland et al. 2011; Smith to appear. Potential type: B.
25. Lango (Uganda). Noonan 1992. Potential type: B.
26. Lealao Chinantec. Rupp 1990. Potential type: B.
27. Lega. Botne 2003. Potential type: B.
28. Limbum. Fiore (1987) interprets CG as single complex units. Potential type: C.
29. Logo. Goyvaerts 1983; Wright 1995. Potential type: B.
30. Lubuagan Kalinga. Gieser (1958) analyzes [CGV] as /CVGV/. Potential type: unclear.
31. Makhuwa. Kisseberth 2003. Potential type: B.
32. Mansaka. Svelmoe and Svelmoe 1990. Potential type: unclear.
33. Mono (Democratic Republic of the Congo). Olson 2005. Glide offsets are normally realized as mid vowels rather than high vowels. Also, the language has a marginal CLV type of onset cluster that optionally alternates with CVLV. Potential type: B.
34. Ndali. Botne 2008. Potential type: A.
35. Newar. Hargreaves 2003. Potential type: unclear.
36. Ngas. Burquest (1971) interprets [Cw] and [Cj] as releases: labialization and palatalization. Potential type: B.
37. Parecís. Rowan (1967) posits /Cw/ as an onset cluster. However, according to Silva (2009 and p.c.), the only such sequence in his data is [kw]. He interprets this as a unit phoneme, just like /tʲ/. Potential type: B.
38. Puinave. Caudmont 1953. Potential type: C.
39. Purepecha. Foster 1969. Potential type: D.
40. Qawasqar. Clairis 1985. Potential type: D.
41. Rawang. Morse 1962. Potential type: C.
42. Silacayoapan Mixtec. North and Shields 1977. Potential type: C.
43. Swahili. Contini-Morava 1997. Potential type: C.
44. Tagakaulu Kalagan. Dawson 1958. Potential type: B.
45. Tangut. Hwang-Cherng 2003. Potential type: unclear.
46. Tepetotutla Chinantec. Westley 1971. Potential type: B.
47. Terêna. Harden 1946; Bendor-Samuel 1966. Potential type: B.
48. Toda. Emeneau 1957. Potential type: C.
49. Tol. Fleming and Dennis 1977. Potential type: B.
50. Vagla. Crouch and Smiles (1966) interpret CGV as [CVV], i.e., vowel clusters. Potential type: unclear.
51. Yuchi. Wagner 1933; Crawford 1971; Ballard 1975. Potential type: D.

Appendix B: Other potential liquid offset (CL) languages

As highlighted in sections 4 and 6.4, some languages require all onset clusters to end with a liquid consonant. Six clear examples of this type of language are described in section 6. Eleven other candidates are listed in the chart below. For the purposes of this paper the following languages are all problematic in one way or another. For example, in some of these the CL clusters are marginal or rare. In others the data suggest that there may in fact be phonetic glide offset clusters too. For each of these 11 cases I include summary comments indicating the reason(s) for not including that language in my analysis above.

In the table below the format for presenting the relevant details is the same as that of Appendix A. For example, language names are listed alphabetically and are underlined. See Appendix A for further related discussion. That appendix includes many other potential CG (glide offset) languages.

1. Araki. François 2002. Most onset clusters are of the type CL, but he posits no glides in the inventory of phonemic consonants. Potential type: E.
2. Big Nambas. Fox 1979. There are many clusters involving /l/ or /r/ in offset position. However, there are apparently no phonetic or phonemic glides. Potential type: F.
3. Isaka. Donohue and San Roque 2004. The language has [Cl] and [Cr] (or [CR]) clusters phonetically, but all of these offset consonants are allophones of /d/; there are no liquid phonemes per se. Furthermore, such clusters occur only word-initially, and only in monosyllabic words. Also, in slow speech such clusters are often separated by an intrusive schwa. There are no CG clusters. Potential type: F.
4. Kemtuik. van der Wilden and van der Wilden 1975. The only onset clusters, which occur strictly in stressed syllables, have /l/ as the second member. However, in the sequence /CuV/ the /u/ can be phonetically glided (nonsyllabic) and thereby produces contrastive (non-predictable) labialization on some syllable-initial consonants. Potential type: E.
5. Kobon. Davies 1980, 1981. No CG clusters occur. There are a few clusters of the type CL. However, (1) these are always word-initial, never word-medial; (2) there are just a couple of tokens of each type and only nine such tokens overall in the entire language; and (3) the two consonants are often separated phonetically by an intrusive schwa. Potential type: F.
6. Merei. Chung 2005. Jeremiah Chung (p.c.) reports that the language has no phonemic glides; what he transcribes orthographically as a glide is phonologically the affricate /d͡z/. Potential type: F.
7. Pacoh. Alves 2006. In the native vocabulary syllable-initial clusters are limited to CL. However, numerous loanwords from Vietnamese have introduced the cluster type CG into the language. Furthermore, Watson (1966) does not posit any phonemic glides, even though his earlier work (1964) does. Potential type: F.
8. Upper Ta’oih. van der Haak 1993. The language has CL clusters as well as the sequences /Cia/ and /Cua/. Feikje van der Haak (p.c.) clarifies that in such cases the high vowel is nuclear and in fact receives more stress than the /a/ does. She is not aware of any words in which /i/ or /u/ sounds like an onset glide. However, her published analysis is very brief and appears to be tentative. Potential type: F.
9. Vinmavis. Crowley 2002; Musgrave 2007. The only productive (frequently attested) onset cluster root-initially is /ns/. All other syllable-initial clusters occur in just a couple of forms each: /tl/, /tn/, /sn/, and /dR/. Potential type: F.
10. Wayu. Michailovsky (2003) reports initial clusters consisting of /Cl/ and /CR/, but these are rare except in “phonaesthetic adverbs”. Also, an earlier work (Michailovsky and Mazaudon 1973) includes two words with /xw/ clusters. Potential type: F.
11. Yanomámi. Aikhenvald and Dixon (1999) list four clusters of the type /CR/. However, there also appear to be some sequences of the type /CiV/, which could potentially be realized phonetically as [CjV]. One of the sources they refer to (Gómez 1990) is actually discussing a different language of the cluster (Ninam). I was not able to access any of Aikhenvald and Dixon’s (1999) primary sources and thus could not directly confirm their analysis. Potential type: E.

Abbreviations

C     consonant
C1    the first consonant in a tautosyllabic onset cluster (= the anchor)
C2    the second consonant in a tautosyllabic onset cluster (= the offset)
CG    glide offset clusters and languages
CL    liquid offset clusters and languages
CSR   Cluster-to-Segment Ratio (see (18))
G     glide
H     high tone
inf   infinitive
IPA   International Phonetic Alphabet
ISO   International Organization for Standardization
L     liquid (also low tone)
M     mid tone
MSD   minimum sonority distance
n     noun
N     nasal
O     obstruent
obj.  object
OCP   Obligatory Contour Principle
OT    Optimality Theory
ROTB  Richness of the Base
SD    sonority distance and/or sonority differential
SDP   Sonority Dispersion Principle
sg    singular
SI    Sonority Index
sp.   species
SSP   Sonority Sequencing Principle
v     verb
V     vowel


Acknowledgments

For very helpful and detailed suggestions on an earlier draft of this paper I am especially grateful to Rod Casali and Jen Smith. I also thank Michael Boutin, as well as the students in my Current Issues course taught at GIAL in the fall of 2011 (Brianna Gipp, Christy Melick, David Pate, and Paul Williamson). Please do not assume that any of these people necessarily agrees with everything I say here. Finally, I am grateful to the following persons for clarifying the facts of the respective languages: Michael Ahland (Bambassi), Erik Andvik (Tshangla), Scott Berthiaume (Pame), Rob Burling (Garo), Mark Durie (Aceh), Leoma Gilley (Shilluk), Chaz Mortensen (Catío), Mike Walrod (Ga’dang), Justin Watkins (Wa), and Steve Watters (Kham).

Sonority variation in Stochastic Optimality Theory: Implications for markedness hierarchies

Jennifer L. Smith and Elliott Moreton

Abstract. Constraints based on the sonority scale and other markedness hierarchies have been formalized in two different ways: as scale-partition constraint families, with a universally fixed ranking, and as stringency constraint families, where the constraints stand in subset relations and their ranking is not universal. We identify a new empirical domain where these two approaches make different predictions: within-speaker variation. Namely, in the framework of Stochastic Optimality Theory (Boersma and Hayes 2001), the scale-partition approach predicts harmony reversals in the presence of variation (i.e., sometimes a more marked form should be chosen in preference to a less marked form), while the stringency approach does not. We further demonstrate that the pattern of harmony reversals predicted by the scale-partition approach can be numerically quantified, based on the constraints’ ranking values; this allows true harmony reversals to be efficiently distinguished from cases where an unrelated constraint causes apparent harmony-reversal behavior. Known cases of variation involving markedness hierarchies have not been described at the level of detail necessary to distinguish true harmony reversals from apparent ones, but we have identified empirical conditions that would strongly support the scale-partition approach to markedness hierarchies if they were to be uncovered in future work.

1. Introduction

Sonority-related effects belong to a large class of phenomena – in phonology, and in linguistic theory more generally – that have been analyzed in terms of markedness hierarchies. A markedness hierarchy is a multi-step scale designed to model implicational universals, such as the cross-linguistic preferences for high-sonority syllable nuclei (Dell and Elmedlaoui 1985) and for low-sonority syllable onsets (Steriade 1982; Zec 1988). Sonority-related effects have been studied for many years (Sievers 1876/1893), and have influenced the study of markedness hierarchies in general (Aissen 1999). In Optimality Theory (OT), markedness hierarchies have been formalized in one of two ways: as a scale-partition constraint family (Prince and Smolensky 1993/2004; see also much subsequent work on markedness scales in OT), in which there is one constraint per level of the hierarchy and the constraints in the family are ordered in a universally fixed ranking, and as a stringency constraint family (Prince 1997, 1999; de Lacy 2002, 2004, 2006), in which constraints are formalized as increasingly larger subsets of the hierarchy and can be freely ranked on a language-particular basis.

Previous comparison of the two formal approaches to markedness scales has focused on differences in their between-language typological predictions (see especially de Lacy 2004). This chapter identifies an additional empirical domain where the two approaches make distinct predictions, but where the implications for markedness hierarchies have remained largely unexplored: speaker-internal phonological variation. Specifically, we examine the role in variation patterns of harmony reversals – the selection of a less harmonic or less desirable form in preference to a more harmonic one, as seen schematically in the mapping of the form /plapna/ to [pV.la.pna]. Here, the potential onset cluster [pl] has been avoided through epenthesis, but the cluster [pn], which is more highly marked because of its smaller sonority distance, has surfaced faithfully.

We show that in the framework of Stochastic OT (Boersma and Hayes 2001), formalizing sonority effects (or other types of markedness hierarchies) as scale-partition constraints predicts the existence of harmony reversals under phonological variation, while the stringency approach makes no such prediction (sections 2–3). We confirm that patterns of variation involving multiple levels of the sonority scale exist, so the empirical scenario of interest is attested and theoretically relevant (section 4). However, it is difficult to test whether or not any particular case of sonority-related variation really does show harmony reversals, because interference from other constraints might lead to a pattern that coincidentally resembles a harmony-reversal pattern. We present a new empirical method for testing whether a harmony reversal has actually been found, by showing that a true harmony reversal exhibits a particular mathematical relationship between the probabilities of ranking reversals among multiple constraints in the markedness-hierarchy constraint family (section 5).

As our introductory discussion in this section illustrates, we take sonority to be a phonological property, to which formal phonological constraints can make reference, rather than a primarily phonetic property (although it may be ultimately based on or grounded in phonetic factors; see Parker 2002 for an extensive review). In addition, we view sonority in terms of a scale, to which, again, the phonological grammar can refer. However, the arguments we make in this chapter do not crucially depend on whether sonority is formalized as a single multivalued feature (de Lacy 2006), or whether instead the sonority scale is derived from combinations of values of binary major-class features such as [±vocalic, ±approximant, ±sonorant] (Clements 1990). Moreover, our work shows that whether the sonority scale is predicted to be universally consistent, or to have language-particular or variable aspects, crucially depends at least in part on the formalization of sonority constraints in a particular phonological framework; identical assumptions about the sonority scale itself will lead to cross-linguistic consistency in the stringency approach but to cross-linguistic variation in the scale-partition approach when implemented under Stochastic OT.

While this chapter focuses on examples of sonority-related variation, the basic point about scale-partition versus stringency constraints and predicted patterns of speaker-internal variation is more widely relevant, with implications for any domain of linguistics where constraint families based on markedness hierarchies have been proposed.
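
As a concrete preview of the argument, the core prediction can be illustrated with a minimal simulation. The sketch below is our own illustration of the logic, not an implementation from the Stochastic OT literature: it assumes a five-point consonant sonority scale (stops < fricatives < nasals < liquids < glides), so that /pl/ has an onset sonority distance of 3 and /pn/ a distance of 2; every constraint's ranking value receives Gaussian noise with the same standard deviation (2.0) on each evaluation, following Boersma and Hayes (2001); and the constraint names and ranking values are hypothetical.

```python
import random

NOISE_SD = 2.0  # shared noise standard deviation (Boersma and Hayes 2001)

def keeps_cluster(dist, points, stringent):
    """True if the faithful candidate wins: the highest-ranked markedness
    constraint it violates must select below Dep, which epenthesis violates."""
    if stringent:   # *Dist<=k penalizes every distance d such that d <= k
        worst = max(points[f"*Dist<={k}"] for k in range(5) if dist <= k)
    else:           # *Dist=d penalizes distance d and nothing else
        worst = points[f"*Dist={dist}"]
    return worst < points["Dep"]

def reversal_rate(ranking_values, stringent, trials=100_000):
    """Proportion of evaluations showing a harmony reversal: epenthesis into
    the better cluster /pl/ (distance 3) while the worse cluster /pn/
    (distance 2) surfaces faithfully, as in /plapna/ -> [pV.la.pna]."""
    reversals = 0
    for _ in range(trials):
        points = {c: rv + random.gauss(0.0, NOISE_SD)
                  for c, rv in ranking_values.items()}
        if not keeps_cluster(3, points, stringent) and keeps_cluster(2, points, stringent):
            reversals += 1
    return reversals / trials

# Hypothetical ranking values, close enough together that Dep varies
# with respect to the whole markedness family.
partition = {"*Dist=0": 102.0, "*Dist=1": 101.0, "*Dist=2": 100.0,
             "*Dist=3": 99.0, "*Dist=4": 98.0, "Dep": 100.0}
stringency = {"*Dist<=0": 102.0, "*Dist<=1": 101.0, "*Dist<=2": 100.0,
              "*Dist<=3": 99.0, "*Dist<=4": 98.0, "Dep": 100.0}

print(reversal_rate(partition, stringent=False))   # reliably > 0
print(reversal_rate(stringency, stringent=True))   # always 0.0
```

Because every stringency constraint violated by the better cluster is also violated by the worse one, no sampled ranking can ever epenthesize into /pl/ while sparing /pn/, so the second rate is exactly zero; the scale-partition family, by contrast, yields a reversal whenever noise causes *Dist=3 to select above Dep while *Dist=2 selects below it.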

2. Markedness hierarchies in Optimality Theory

The concept of the markedness hierarchy has proven useful in multiple domains of linguistics, in morphosyntax (Silverstein 1976; Keenan and Comrie 1977; Dixon 1979; Croft 1988; Aissen 1999, 2003; Lee 2006) as well as in phonology. In broad terms, a markedness hierarchy is a family of related linguistic features – such as the features encoding sonority, place of articulation, animacy, or definiteness – that is structured in a cross-linguistically consistent hierarchy of implicational relationships, and plays a role in multiple linguistic patterns within and across languages. A number of markedness hierarchies are based on the sonority scale.1 For example, it has been proposed that syllable peaks in general (Dell and Elmedlaoui 1985), stressed syllable peaks (Kenstowicz 1997), and moraic segments (Zec 1995; Gnanadesikan 2004) each show a preference for segments of the highest possible sonority level; that onsets show a preference for segments of the lowest possible sonority level (Steriade 1982); that onset clusters show a preference for rising and maximally dispersed sonority (Selkirk 1982; Clements 1990; Baertsch 1998; see also Parker, this volume); and that with respect to syllable contact, adjacent coda-onset sequences show a preference for falling and maximally dispersed sonority (Murray and Venneman 1983; Gouskova 2004). As a simplified illustration of a markedness hierarchy involving sonority, consider the preference for higher-sonority syllable peaks. Vowel height cate-

1. Indeed, sonority-related markedness hierarchies were among the first to be formalized within OT (Prince and Smolensky 1993/2004).

170

Jennifer L. Smith and Elliott Moreton

gories have the following sonority relationship2 (where ‘>’ means ‘is greater in sonority than’). (1)

Sonority scale (partial) high sonority low vowels > mid vowels

>

low sonority high vowels

Since syllable peaks prefer higher-sonority segments, this basic sonority scale corresponds to a harmony scale (Prince and Smolensky 1993/2004: section 8.1), as in (2) (where ‘ ’ means ‘is more harmonic than,’ i.e., ‘is phonologically preferable to’). (2)

Harmony scale more preferred peak/lowV

peak/midV



less preferred peak/highV

In Optimality Theory and related frameworks, the two main approaches to modeling such preference relationships by means of phonological constraints are the scale-partition approach (section 2.1) and the stringency approach (section 2.2). 2.1. The scale-partition approach to markedness hierarchies Originally, a harmony scale such as that in (2) was mapped directly onto a family of phonological constraints according to a formal operation known as constraint alignment (Prince and Smolensky 1993/2004: section 8.1).3 By this operation, each phonological structure gives rise to a constraint that penalizes that structure (and only that structure). The least-preferred phonological configuration (here, a syllable peak with a high vowel) is associated with the highest-ranked constraint, i.e., incurs the most severe penalty. (3)

Scale-partition constraint family: *Peak/X *Peak/HighV  *Peak/MidV  *Peak/LowV

2. This fragment of the sonority scale is presented in order to facilitate a concrete discussion of the formalization of sonority-related constraints; for simplicity, no central (reduced) vowels and no consonants are considered here. See Kenstowicz (1996), Parker (2002), and de Lacy (2002, 2004, 2006) concerning the place of central vowels on the sonority scale; consonants have lower sonority than all vowels. 3. Constraint alignment, the operation which derives a family of universally ranked constraints from a markedness scale as discussed here, is distinct from Alignment constraints (McCarthy and Prince 1993a), which are phonological constraints requiring particular morphosyntactic and phonological constituents to align their edges in output representations.

Sonority variation in Stochastic Optimality Theory

171

The constraints in this family assign violations to candidates with different syllable peaks as follows. (4)

Violations assigned by *Peak/X constraints4 *Peak/HighV *Peak/MidV *Peak/LowV a. [a] * b. [e] * c. [i] *

As the tableau in (4) indicates, in order for the *Peak/X constraints to select output candidates in accordance with the harmony scale in (2), they must always be ranked in the order shown in (3) in every language. This is because each constraint penalizes exactly one point on the harmony scale, rather than an interval or set of points along the scale. If these constraints could be ranked differently in different languages, then languages would be predicted to vary widely as to which sonority levels were most preferred as nuclei. In the extreme case, taking the entire sonority scale into account, this predicts that some languages should prefer obstruent nuclei over low-vowel nuclei, a prediction that is not empirically supported.5 Therefore, Prince and Smolensky (1993/2004: section 8.1) explicitly propose that constraint families that are derived from markedness scales through constraint alignment in this way have a universally fixed ranking determined by the associated harmony scale (as in (2)). 2.2. The stringency approach to markedness hierarchies An alternative to scale-partition constraint families for modeling markedness hierarchies is stringency, proposed by Prince (1997, 1999) and extensively developed by de Lacy (2002, 2004, 2006). In the stringency approach, for every point along the harmony scale, there is a constraint that assigns violations to that point and all points up to and including the least-preferred end of the scale. In other words, each constraint in the stringency family refers to the least-preferred structure on the harmony scale, and if the constraint refers to more than one 4. This is not a standard tableau, since no input is shown. This display simply shows how violation marks would be assigned to the given outputs by the constraints under discussion. 5. Sonority is not the only factor that influences which vowel is the least marked syllable peak in a given language; other constraints (such as those for featural markedness) are relevant as well. But it is clear that typological facts about default syllable nuclei do not support a constraint family such as that in (3) with free reranking.

172

Jennifer L. Smith and Elliott Moreton

point on the scale, all such points form a contiguous interval. For example, the harmony scale in (2) would give rise to the family of constraints in (5) (5)

Stringency constraint family: *Peak/≤X *Peak/≤HighV penalizes peaks associated with {HighV} *Peak/≤MidV penalizes peaks associated with {HighV, MidV} *Peak/≤LowV penalizes peaks associated with {HighV, MidV, LowV}

With this set of constraints, violations are assigned to candidates with different syllable peaks as in (6). (6)

Violations assigned by *Peak/≤X constraints *Peak/≤HighV ¦ *Peak/≤MidV ¦ *Peak/≤LowV a. [a] ¦ ¦ * b. [e] ¦ * ¦ * c. [i] * ¦ * ¦ *

As discussed in detail by both Prince and de Lacy, stringency constraints differ from scale-partition constraints in that they do not require a universally fixed ranking in order to ensure that only grammars compatible with the harmony scale can be generated. Crucially, no ranking of the constraints in (6) can possibly produce a grammar in which a lower-sonority peak is allowed but a higher-sonority peak is not allowed, because any constraint that penalizes a higher-sonority peak will necessarily also penalize all peaks that are lower in sonority; lower-sonority peaks are harmonically bounded (with respect to the constraints in the stringency family). In other words, no single constraint of the *Peak/≤X constraint family ever prefers [i] over [e] as a syllable peak, because any *Peak/≤X constraint that assigns a violation to [e] necessarily assigns a violation to the lower-sonority [i] as well. Prince and de Lacy make an empirical case for the stringency approach to markedness scales on the basis of factorial typology. That is, they demonstrate that there are categorical phonological patterns occurring in natural-language phonologies that are predicted under a stringency approach, but not under a fixed-ranking approach. Crucial examples involve scale conflation, a pattern in which two or more points on the scale are treated by some phonological pattern as equally (un)desirable. This chapter identifies a second domain in which the two approaches make distinct empirical predictions: in variation patterns involving the reranking of a single constraint with respect to multiple members of the constraints in a markedness scale. If we assume the scale-partition approach, then learning the


variation pattern entails learning a pattern that produces harmony reversals – instances of variation in which a structure found lower on the harmony scale is actually chosen over a structure that is more harmonic. If we assume the stringency approach, then the variation pattern can be learned with a grammar that nevertheless continues to prohibit harmony reversals. As we discuss in section 5, variation data described at the level of detail necessary to distinguish between the two approaches is not yet available. However, we identify empirical conditions that would provide strong support for the scale-partition approach if found. 3.

Markedness hierarchies, phonological variation, and harmony reversals

3.1. Phonological variation in Stochastic OT A given constraint ranking produces one consistent output for each input. This means that a speaker (or a language community) showing variation between two or more output forms must in some way be making use of two or more distinct constraint rankings. One influential approach to modeling linguistic variation in Optimality Theory is Stochastic OT (Boersma 1998b; Boersma and Hayes 2001). In this framework, constraint rankings correspond to points on a number line. While the exact numerical values assigned to the constraints are arbitrary, a greater value corresponds to a higher ranking, and so domination relations between constraints can be represented numerically. Stochastic OT models variation by adding an element of unpredictability to the ranking relationships between constraints. In the grammar of a particular language, each constraint has an intrinsic and consistent ranking value. However, every time an input is mapped to an output by the grammar, the ranking value for each constraint is perturbed by a “noise” component (stated informally, for each constraint in the grammar, a small amount is added to or subtracted from the ranking value), resulting in a value known as the selection point for the constraint in question. The noise component is drawn from a normal distribution whose mean is the constraint’s ranking value and whose standard deviation is some constant value, often set at 2.0 units by convention; each constraint’s selection point will be within three standard deviations (±6.0 units) of the ranking value in more than 99% of all input-output mappings. Boersma and Hayes (2001: 50) propose that all constraints have the same standard deviation for their noise distribution, because the noise function is part of the grammar as a whole and not the property of an individual constraint. The proposal that the noise distribution is the same for all constraints has crucial consequences for


variation involving markedness scales, as is discussed in detail in section 3.2 below. Because the value of the noise component varies for each constraint each time the grammar is invoked, so does the constraint’s selection point. Since it is the relative ordering of two constraints that determines which constraint dominates the other, then when two constraints have ranking values that are very close together, the domination relation between them can actually vary, depending on whether their selection points on a given input-output mapping are higher or lower than their intrinsic ranking values. Specifically, if constraint C1 has a ranking value that is higher than that of constraint C2 by five standard deviations (here, 10.0 units) or more, then the relative ranking C1  C2 is stable, because the probability of C1 having a selection point that is lower than C2 ’s selection point on a given input-output mapping is vanishingly small. This scenario is illustrated in (7), where ranking values and noise distribution curves are shown for two constraints whose ranking values are far apart. (7)

Ranking values and noise distributions for two constraints whose ranking is stable C1 C2 ↑ C1 ranking value

↑ C2 ranking value

On the other hand, if two constraints C1 , C2 have ranking values that are close together, then the relative ordering of their selection points will vary, as shown in (8). Such a grammar does in essence make use of two different rankings because on some evaluations, C1  C2 , as in (8)(b), but on others, C2  C1 , as in (8)(c). (The probability of occurrence of C1  C2 versus C2  C1 depends on their ranking values, a point whose implications for markedness scales will be explored in detail in section 5.) (8)

Variable constraint ranking in Stochastic OT (a) Ranking values and noise distributions for two constraints whose ranking is variable C1 C2 ↑ C1 ranking value

↑ C2 ranking value

Sonority variation in Stochastic Optimality Theory

175

(b) Example of selection points on an evaluation where C1  C2 C1 C2 ↑ C1 selection point



↑ C2 selection point

(c) Example of selection points on an evaluation where C2  C1 C1 C2



↑ ↑ C2 selection point C1 selection point A grammar in which the relative ranking of two conflicting constraints can vary in this way is a grammar that produces discernable phonological variation, because the choice of which output wins on a given evaluation will depend on which of the two constraints happens to have a higher selection point for that evaluation. 3.2. Markedness hierarchies and harmony reversals under Stochastic OT Under the assumption (Boersma and Hayes 2001) that the standard deviation of the noise distribution is the same for all constraints, the Stochastic OT model places restrictions on possible patterns of variation (a point discussed by Anttila 2007: 534 as well). [...] it is worth noting that this model is quite restrictive: there are various cases of logically possible free rankings that it excludes. Thus, for example, it would be impossible to have a scheme in which A “strictly” outranks B (i.e., the opposite ranking is vanishingly rare), B “strictly” outranks C, and D is ranked freely with respect to both A and C. This scheme would require a much larger standard deviation for D than for the other constraints. (Boersma and Hayes 2001: 50)

In other words, given a ranked set of constraints A  B  C, and variation in the relative ranking between these constraints and a fourth constraint D, scenarios (9)(a–b) are possible, but (9)(c) is not.

176

Jennifer L. Smith and Elliott Moreton

(9)

Variation scenarios for one constraint (D) versus ranked constraints (A  B  C) (a) Possible: – The relative ranking A  B  C does not vary – D varies with respect to at most two consecutive points on the scale B

A

C

D

(b)

Possible:

– D varies with respect to each of A, B, C – The relative ranking A  B  C also shows variation B

A

C

D

(c)

Impossible:

– The relative ranking A  B  C does not vary – D varies with respect to each of A, B, C

As noted in section 2.2, a major difference between the scale-partition and stringency approaches to markedness scales is whether or not the family of constraints requires a universal ranking in order to produce patterns in accordance with the associated harmony scale; the scale-partition approach does require such a universal ranking, but the stringency approach does not. In the Boersma and Hayes (2001) implementation of Stochastic OT, this difference leads to a difference in predicted phonological patterns. Assume a situation as in (9), in which A, B, and C are specifically a family of constraints associated with a markedness hierarchy. If there is another constraint D whose ranking is known to vary with respect to that of multiple members of the constraint family, then the ranking of the constraints within the family (as determined by their selection point values at the time of evaluation) must also be variable. Crucially, if the constraints in the family are scale-partition constraints, then allowing them to vary will lead to cases of harmony reversal, in which a structure lower on the harmony scale is variably preferred to a structure higher on the scale. However, if the constraints in the family are stringency constraints,

Sonority variation in Stochastic Optimality Theory

177

then they generate patterns consistent with the harmony scale under any ranking, as demonstrated in section 2.2 above. As a consequence, harmony reversals should never be observed, even in cases of variation as described here, because no constraint in a stringency family ever assigns a violation to a less-marked point on the scale without necessarily assigning a violation to all more-marked points at the same time. This difference can be illustrated with a schematic example involving one sonority-based markedness scale, onset-sonority distance. (See section 4 for language examples involving this and other sonority-based markedness scales.) Phonological patterns are sometimes attested in which there is variation in the production of a target syllable with an onset cluster (CCV): in some cases the cluster is produced intact (CCV), and in other cases the onset cluster is avoided through vowel epenthesis (CV.CV or VC.CV, where V indicates an epenthetic vowel).6 Crucially, the variation can be sensitive to the sonority profile of the onset cluster, so that a more harmonic cluster (such as obstruent+liquid) is produced with epenthesis less frequently than a less harmonic cluster (such as obstruent+obstruent). This pattern indicates that the ranking of the anti-epenthesis constraint Dep (McCarthy and Prince 1995) is varying with respect to multiple members of a sonority-based constraint family on onset clusters. Sonority-based restrictions on onset clusters can be stated as a harmony scale, where a greater distance in sonority between the segments in a cluster is more strongly preferred (Selkirk 1982; Baertsch 1988; Clements 1990; Zec 2007). For example, consider the simplified consonant sonority scale in (10). (10)

Sonority scale [j] > [l] > [n] > [s] > [t]

On this scale, the cluster [tl] would have a sonority distance of 3, because [l] is three steps away from [t].7 The cluster [nl] would have a distance of 1. The cross-linguistic preference for larger sonority distance within an onset cluster can be modeled with the following harmony scale. 6. Epenthesis is, of course, not the only possible way to avoid onset clusters, but it is used here in this schematic example for concreteness. Other types of faithfulness violations are discussed in section 4 below. 7. Exact numerical values for sonority distance will depend on the precise version of the sonority scale adopted. The scale shown in (10) is intended as a concrete illustration for use in the discussion, not a substantive claim about the exact structure of the sonority scale. Note also that this approach would model onset clusters with falling sonority as having a negative sonority distance, correctly predicting that they would be even more marked than a cluster with distance 0.


Harmony scale for onset sonority distance Dist=4 Dist=3 Dist=2 Dist=1 Dist=0

This harmony scale in turn corresponds to the following scale-partition and stringency constraint families respectively. (12)

Constraint families for onset sonority distance (a) Scale-partition constraint family *Dist=0  *Dist=1  *Dist=2  *Dist=3  *Dist=4 (b) Stringency constraint family *Dist≤0, *Dist≤1, *Dist≤2, *Dist≤3, *Dist≤4

In a language that avoids all potential CC onset clusters through epenthesis, Dep is ranked below the sonority-distance constraint against the least problematic cluster, so that clusters are broken up no matter what their sonority distance. (13)

Ranking for a language with epenthesis into all potential onset clusters (a) With scale-partition constraints: Dep ranked below lowest constraint in scale8 *Dist=0  *Dist=1  *Dist=2  *Dist=3  *Dist=4  D EP /pja/

*Dist=0 *Dist=1 *Dist=2 *Dist=3 *Dist=4 Dep

a. pja

*!W

L

→ b. pVja

*

(b) With stringency constraints: Dep ranked below most stringent constraint *Dist≤4  D EP (the other *Dist ≤ n constraints can be ranked anywhere) /pja/ a. pja → b. pVja

*Dist≤4 Dep ¦ *Dist≤2 ¦ *Dist≤0 ¦ *Dist≤3 ¦ *Dist≤1 *!W

L

¦

¦

¦

¦

*

¦

¦

¦

¦

Conversely, in a language that allows all potential CC onset clusters to surface and never shows epenthesis, Dep is ranked above all sonority-distance 8. The tableau format used here and below is a “combination tableau” (McCarthy 2008), which indicates for each winner-loser pair the constraints that would favor the winner (W) and those that would favor the loser (L).

Sonority variation in Stochastic Optimality Theory

179

constraints, so that epenthesis is never chosen no matter how close the sonority distance in the onset cluster. (14)

Ranking for a language where all potential onset clusters surface (a) With scale-partition constraints: Dep ranked above entire scale D EP  *Dist=0  *Dist=1  *Dist=2  *Dist=3  *Dist=4 /pta/ Dep *Dist=0 *Dist=1 *Dist=2 *Dist=3 *Dist=4 → a. pta * b. pVta *!W L

(b) With stringency constraints: Dep ranked above all stringencyfamily constraints D EP  { *Dist≤0, *Dist≤1, *Dist≤2, *Dist≤3, *Dist≤4 } (the ranking among the *Dist ≤ n constraints is irrelevant) /pta/ Dep *Dist≤4 ¦ *Dist≤2 ¦ *Dist≤0 ¦ *Dist≤3 ¦ *Dist≤1 → a. pta * ¦* ¦ * ¦ * ¦ * b. pVta *!W ¦ ¦ ¦ ¦ L L L L L

Consequently, a language that shows variation between epenthesis and no epenthesis for target CCV forms of all sonority distances is one in which the ranking of Dep must vary with respect to the sonority constraints – sometimes the grammar in (13) is invoked (epenthesis in even the best cluster), sometimes the grammar in (14) is invoked (no epenthesis even in the worst cluster), and sometimes Dep takes an intermediate position (epenthesis in some clusters but not in others). Concretely, this means that under the stringency approach, the ranking of Dep must vary with respect to at least *Dist≤4 (and possibly with other *Dist≤n constraints as well, depending on the precise pattern of variation). And, crucially, under the scale-partition approach, the ranking of Dep must vary with respect to the entire scale-partition constraint family, as shown in (15) (15)

Variation between Dep and all members of the scale-partition *Dist = n family *D0 *D1 *D2 *D3 *D4

DEP

180

Jennifer L. Smith and Elliott Moreton

Under standard Stochastic OT (Boersma and Hayes 2001), the scalepartition approach therefore predicts that the members of the scale-partition family *Dist=n must also be able to vary with respect to each other (see (9) and the discussion thereof), leading to harmony reversals. A certain proportion of the time, clusters from lower on the sonority-distance harmony scale in (11) should actually be chosen in preference to clusters that are higher on the harmony scale, meaning that a form such as /plapna/ might surface as [pV.la.pna], with epenthesis into the more harmonic cluster /pl/ but not into the less harmonic cluster /pn/. 3.3. Implications for the phonological system From the perspective of sonority, the empirical question is this: When there is phonological variation involving more than one member of the sonority scale, are patterns of harmony reversal ever observed? If sonority-related harmony reversals are observed under phonological variation, then this supports a model that includes standard Stochastic OT and sonority constraints as scale-partition constraint families. If, on the other hand, sonority-related harmony reversals are never observed even under phonological variation, then this supports at least one of the following conclusions: – Sonority-related constraints are instantiated as stringency families, not scalepartition families. – Stochastic OT must be modified so that the addition of the noise component to each constraint’s ranking value never causes constraints in a fixed-ranking family to alter their family-internal ordering. One implementation of this modification would be to add the same exact noise value to each member of a fixed-ranking constraint family on every evaluation, so that the relative ranking of the whole family might vary with respect to other constraints but the ranking distance between members of the family would never vary. – Stochastic OT must be modified so that it is not necessary for the standard deviation of the noise function for all constraints to be the same. For example, one approach would be to give any constraints that belong to a scalepartition family a noise distribution with a much smaller standard deviation – a “narrower curve” – than constraints that do not belong to such a family. A more radical modification would be a Stochastic OT model in which every constraint has a potentially distinct noise distribution, allowing some constraints to vary in their selection point much more widely than others; in such a model, what must be learned for each constraint in the course of language acquisition would be not only its relative ranking value as compared to other constraints, but the standard deviation of its noise distribution as well.

Sonority variation in Stochastic Optimality Theory

181

The remainder of this chapter first presents language examples confirming that sonority-related variation is observed in phonological patterns (section 4). Then, a new empirical heuristic for distinguishing a true harmony reversal from the interference of an additional constraint in an otherwise harmonyscale-consistent pattern is presented in section 5. Conclusions and implications are considered in section 6. 4.

Sonority-related phonological variation: Examples

The preceding discussion has shown that the empirical predictions of scalepartition and stringency constraint families are different under Stochastic OT in cases of phonological variation involving a markedness hierarchy. This section reviews a selection of case studies demonstrating that phonological variation involving multiple points on a sonority-related harmony scale does indeed exist, and therefore that the theoretical points raised in this chapter have empirical relevance. For reasons discussed below and in section 5, these case studies are not described in the literature in enough detail for us to determine whether harmony reversals are actually observed. The goal of section 4 is specifically to confirm that the right types of phonological variation occur such that it is possible, in principle, to look for harmony reversals among them. Anttila (1997) presents an analysis of genitive plural allomorphy in Finnish according to which a markedness-hierarchy family of constraints preferring higher sonority for stressed vowels shows variation in ranking with respect to other constraints on syllable weight, stress, and sonority.9 Berent et al. (2007) and Berent et al. (2009) report indirect evidence for phonological variation between epenthesis and the faithful realization of different onset-cluster types. When listeners were exposed to clusters that were illegal in their native language, they sometimes perceived the target clusters accurately, and other times as though they had been separated by an epenthetic vowel. Such ‘perceptual epenthesis’ occurred at a higher rate for clusters with a less desirable sonority profile, but variability was shown at several sonority levels. A role for sonority-related constraints has been found in first-language (L1) acquisition; for example, in determining which consonant in a cluster is retained when the cluster is reduced to a singleton (Pater and Barlow 2003; Gnanadesikan 2004). Sonority-related variation in the L1 acquisition of Dutch is described by Jongstra (2003a, 2003b); see section 5 below for discussion. 9. We strongly suspect that additional cases of sonority-related variation occur in adult L1 phonology as well, but examples discussed in detail are difficult to find.


Cases of sonority-related variation have been reported in studies of second-language (L2) phonological acquisition as well. For example, Petrič (2001) studied the pronunciation of German word-final consonant clusters by 48 children, aged 11 to 13, who were learning German as a second language in school in Slovenia. For German-legal clusters consisting of a liquid, nasal, or fricative followed by a nasal, fricative, or stop, the pattern in the aggregated data is that the production error rate increases as the sonority distance falls (Petrič 2001, Table 8, summarized in our Table 1 below), although stop-fricative and stop-stop clusters were an exception to this pattern, being easier than expected.

Table 1. Proportion of target clusters successfully produced by L2 German learners in the study of Petrič (2001, Table 8)

C1         | C2: Nasal | C2: Fricative | C2: Stop
Liquid     | 0.62      | 0.94          | 0.96
Nasal      | –         | 0.84          | 0.97
Fricative  | –         | 0.78          | 0.92
Stop       | –         | 0.92          | 1.00

There are a number of L2 studies investigating cases in which learners' productions show variation between the target-language realization of an onset cluster and some non-target form, generally involving epenthesis, in which the frequency of a target CC production decreases as the sonority profile of that cluster becomes more marked. For example, Cardoso (2008) presents results from a study of Brazilian Portuguese speakers learning English that examined the production of target [st], [sn], and [sl] clusters in word-initial position, and whether these clusters were produced accurately or with vowel epenthesis ([is.C]). A variable-rules analysis in Goldvarb10 indicated that accurate CC clusters were more likely for [sn] and [sl] than for [st] in Cardoso's learner corpus. A similar study is presented in Boudaoud and Cardoso (2009), examining the production of target [st], [sn], and [sl] clusters in the L2 English of Farsi speakers. This time, the Goldvarb results showed greater CC accuracy for [sl] as compared to both [sn] and [st]. Both cases are consistent with the generalization that clusters with a higher sonority distance are produced accurately more often than clusters with a lower sonority distance, indicating ranking variation

10. Goldvarb is multivariate analysis software for performing a variable-rules analysis based on the methodology first developed in Cedergren and Sankoff (1974). The program outputs factor weights, which indicate the degree to which each linguistic or sociolinguistic factor in the analysis (here, onset cluster type) predicts the appearance of a variable pattern (here, accurate [CC] realization).


between constraints in the onset-sonority distance family and the anti-epenthesis constraint Dep.

Carlisle (2006) examines the L2 English productions of initial [sl], [sn], and [st] clusters by Spanish-speaking learners; realizations varied between target CC productions and forms with epenthesis ([es.C]). When the results of all speakers were pooled, success at target [sl] was highest, followed by [sn] and then [st], once again in accordance with decreasing sonority distance in the cluster.

One additional example may be found in Broselow and Finer (1991), who present results from a study of English onset-cluster production by Japanese and Korean speakers which they interpret as showing better accuracy for clusters with a larger sonority distance. However, it is possible that other aspects of segmental markedness (such as the marginal status and non-existence of [f] in Japanese and Korean respectively) might also be a factor in their findings (in particular, error rates for most clusters involved epenthesis, CCV → CVCV, but errors for [fC] clusters largely concerned a featural change from [f] to [p]).

As this example from Broselow and Finer (1991) illustrates, a generally sonority-based phonological pattern can sometimes include aspects that do not follow directly from the predictions of the sonority scale. In some cases, these sonority-exception patterns really are caused by interactions with other phonological constraints or processes, as is likely true in the case of [f] in Broselow and Finer's results. Along similar lines, Pater and Barlow (2003) present an analysis in which the outcome of cluster simplification in the phonology of children acquiring L1 English is generally driven by a sonority-related markedness scale, specifically, by constraints based on a harmony scale that relates onset consonants to low sonority. Some of the children's productions appear to go against a sonority-based pattern, as when a word with a target [sn] onset cluster is realized as [n], rather than [s], even though [n] has higher sonority. However, Pater and Barlow (2003) account for these, not as cases of actual harmony reversal (requiring a reranking among the sonority constraints), but as cases where other, unrelated constraints such as *Fricative interact with the sonority-based constraint family in particular ways. Likewise, Boudaoud and Cardoso (2009) consider whether their findings on cluster production in Farsi speakers' L2 English (in which [sl] clusters are more accurately produced than either [sn] or [st]) are best explained with sonority constraints, or instead with reference to the [±continuant] values of segments in clusters, such that a cluster whose members are both [+continuant] ([sl]) is preferred over clusters with a mismatch in [±continuant] values ([sn], [st]).

Consequently, although section 3 has shown that the scale-partition approach predicts harmony reversals under variation, and the stringency approach does


not, it is not a trivial problem to determine whether harmony reversals are actually observed. In order to compare these two competing approaches to markedness scales, it is essential that we find a way to distinguish true cases of harmony reversal from cases of harmony scales merely interacting with additional constraints. Section 5 uses the properties of constraints and their noise distributions under Stochastic OT to propose a method for making this distinction.

5. Deriving empirical predictions

Stringency hierarchies exclude all possibility of harmony reversal under within- or between-speaker variation, whereas scale-partition hierarchies do not (section 3, above). The stringency hypothesis would thus at first glance seem to have the virtue of easy falsifiability, since a single case of markedness reversal would refute it. However, the effects of a stringency hierarchy can be interfered with by constraints outside it in ways that could produce the appearance of a harmony reversal. For example, the hierarchy in (12) predicts that no language should favor epenthesis into a stop-liquid onset (distance = 3) more than epenthesis into a fricative-liquid onset (distance = 2). English seems to counterexemplify this prediction by permitting syllables to begin with /sl/ but not with /tl/. The apparent markedness reversal is really due to a constraint outside the sonority-distance hierarchy; several such plausible constraints are discussed by Bradley (2006). We are thus faced with the problem of distinguishing actual counterexamples from spurious ones. This section of the paper describes a class of situations in which the effects of a scale-partition hierarchy can be recognized unambiguously, in the form of a transparent relationship between the frequencies of the observed variants that is highly unlikely to arise by a chance conspiracy of unrelated constraints.

5.1. Example: Cluster simplification

For a concrete example of a harmony reversal involving sonority, we consider the simplification of onset clusters from two consonants (C1C2) to one (C1 or C2) by first-language learners. The process is illustrated by data from Gita, a two-year-old American-English learner studied by Gnanadesikan (2004). Gita regularly reduced target biconsonantal onsets to the less sonorous of the two consonants; e.g., blue [bu], sky [kaI], snow [soU]. Pater and Barlow (2003) propose that the simplification is driven by highly ranked *ComplexOnset, while the output consonant is chosen by a markedness scale that penalizes sonorous segments in onsets:

(16) *x-Ons: Give one violation mark for every segment in sonority class x that is in an onset

     *Glide-Ons ≫ *Liquid-Ons ≫ *Nasal-Ons ≫ *Fricative-Ons11

When only one consonant can surface in the onset, the *x-Ons hierarchy favors the retention of the less sonorous one. This is shown in (17) (after Pater and Barlow 2003, Tableau 7).12

(17) Retention of the less sonorous onset in onset-cluster simplification

                   *Glide-Ons   *Liquid-Ons   *Nasal-Ons   *Fricative-Ons
  sky /skaI/
     a. [saI]                                               *!W
  →  b. [kaI]
  smell /smEl/
  →  a. [sEl]                                               *
     b. [mEl]                                 *!W           L

Gnanadesikan (2004) does not describe variation in Gita's choice of reduction output, but the scale-partition hypothesis predicts that it is possible for a learner to show such variation. We can imagine (though we know of no concrete examples) a Gita-like grammar in which the *x-Ons constraints are ranked close enough to each other to be observed exchanging places, so that, e.g., smell surfaces sometimes as [sEl] and sometimes as [mEl], depending on whether *Fricative-Ons is sampled above or below *Nasal-Ons.

The pattern of simplification to the less sonorous segment is common across children and is well attested in Dutch as well as English learners (see Jongstra 2003b, Ch. 2, for a review). In a picture-naming study of 45 typically-developing Dutch-learning children around two years of age, Jongstra (2003a, 2003b) found that when word-initial two-consonant clusters were reduced to a single consonant, most children followed consistent reduction patterns, but there were also several cases of within-child variation (Jongstra 2003b: section 4.2.2.3). Clusters of the form plosive+/l/, plosive+/r/, fricative+/r/, and fricative+plosive were reduced to the less-sonorous member by most children. Clusters of the form fricative+/l/, fricative+nasal, and /sx/ were more variable within and/or

11. Pater and Barlow (2003) assume an approach to markedness scales that does not include a constraint against the least marked member of the scale; here, stop onsets.
12. Deletion in (17) is motivated by the ranking *ComplexOnset ≫ Max (Pater and Barlow 2003). These two constraints are not shown.


across children; e.g., /sm/ is produced consistently as [s] by five children, consistently as [m] by three, and variably as [s] or [m] by four (as in the hypothetical smell example above). This pattern of variation is consistent with the predictions of a scale-partition constraint family whose members are ranked close enough together to change places, as described above and as illustrated in the sketch below. This variation would not be predicted by an alternative grammar model in which the *x-Ons constraints were replaced by a stringency hierarchy – e.g., *Glide-Ons, *(Glide or Liquid)-Ons, *(Glide or Liquid or Nasal)-Ons, etc. – because the less-sonorous output harmonically bounds the more-sonorous one (section 2.2, above, and (18) below).13
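The variable [sEl]~[mEl] pattern is easy to simulate. The toy sketch below is our illustration, not Pater and Barlow's analysis; the ranking values and noise level are assumptions chosen so that the two constraints are close enough to swap:

import random

MU = {"*Nasal-Ons": 100.0, "*Fricative-Ons": 99.0}  # close enough to swap
SD = 2.0                                            # ranking-noise s.d.

def reduce_smell():
    """Return 'sEl' if *Nasal-Ons is sampled above *Fricative-Ons."""
    nasal = MU["*Nasal-Ons"] + random.gauss(0.0, SD)
    fric = MU["*Fricative-Ons"] + random.gauss(0.0, SD)
    return "sEl" if nasal > fric else "mEl"

outputs = [reduce_smell() for _ in range(10000)]
print(outputs.count("sEl") / len(outputs))   # about 0.64 for these values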

(18) A failed alternative using a stringency hierarchy; the output is the same no matter how the ranking of the constraints varies

                   *Glide-   *(Glide or    *(Glide, Liquid,   *(Glide, Liquid, Nasal,
                   Ons       Liquid)-Ons   or Nasal)-Ons      or Fricative)-Ons
  sky /skaI/
     a. [saI]                                                 *!W
  →  b. [kaI]
  smell /smEl/
  →  a. [sEl]                                                 *
     b. [mEl]                               *!W               *

However, that does not allow us to reject outright the hypothesis that the sonority effects are governed by a stringency hierarchy. It is clear that factors other than sonority can be involved in the choice of output. For example, many children consistently or usually reduce /sm/ and /sn/ to the more-sonorous [m] and [n] (Pater and Barlow 2003). A stringency constraint family cannot be ranked to produce those outputs, but neither can a scale-partition constraint family, since the ranking value of *Nasal-Ons cannot be smaller than that of *Fricative-Ons. Some other constraint from outside the hierarchy must be

13. Following most first-language acquisition work in Optimality Theory, we adopt the hypothesis that adult and child grammars differ only in constraint ranking, and our conclusions are only valid under that assumption. It has been clear from the start that constraints must to some extent be learned (see, e.g. the discussion of Lardil Free-V in Prince and Smolensky 1993/2004: section 7.1), and hence that the constraint set may differ between adults and children, or even between children (Pater and Werle 2003). It may be possible to construct alternative explanations based on maturation or constraint learning for any of the phenomena discussed in this paper.


responsible, e.g., a context-free *Fricative constraint (Pater and Barlow 2003).14 Such a constraint could produce the apparent harmony-reversal effect with either kind of sonority-constraint hierarchy. An apparent harmony reversal therefore cannot be taken as evidence against the stringency hypothesis unless we are quite certain that no interfering constraint, known or unknown, exists outside the hierarchy.

If even outright reversals of harmony cannot necessarily distinguish the hypotheses, what can? The next two subsections of this paper describe a characteristic signature left by interactions among scale-partition constraints in the form of a particular relationship among the frequencies of the different variants: If the odds of preferring A to C are oAC to 1, the odds of preferring B to C are oBC to 1, and the odds of preferring A to B are oAB to 1, then log oAB should be approximately equal to log oAC − log oBC. This relationship is very specific and unlikely to arise from a chance conspiracy of unrelated constraints, even unknown ones; if observed, it is good evidence for a scale-partition hierarchy. (Failure to observe it may be due to interference from other constraints and so does not have an unambiguous interpretation.)

5.2. Ranking distance and domination probability

Consider a system of three constraints, C0, C1, and C2, in a Stochastic OT grammar. Ultimately, we will want to interpret them as markedness constraints from the same harmony scale, but since the logic applies equally well to any three constraints, we will start out speaking as generally as possible. For simplicity's sake, we renumber the ranking scale so that the noise distribution has a standard deviation of 1, and we let μi be the ranking value of Ci. Let pij = Pr(Ci ≫ Cj), the probability that Ci will be seen to dominate Cj on any particular optimization, and suppose our language data allow us to estimate p10 and p20. Since these probabilities tell us how far C1 and C2 are ranked from C0, they also tell us how far they are from each other, and so p12 is predictable from them. The next bit of this paper derives a simple approximate method for making that prediction.

It is intuitively clear that if p10 and p20 are similar, then C1 and C2 must be ranked near each other, and so p12 must be about 0.5. If, on the other hand, p10 is much larger than p20, then C1 must be ranked well above C2, and so p12 should be close to 1. This intuition can be made more precise quantitatively. The difficult step is converting back and forth between ranking values and variation frequencies. For any given selection point x, the probability that Ci is observed in an interval of width dx around x is φ(x − μi)dx, where φ(x) is the standard

14. For justification of *Fricative on typological and developmental grounds, see Pater and Barlow (2003).


normal probability-density function. The probability that Cj is observed below x is Φ(x − μj), where Φ(x) is the standard normal cumulative distribution function. The probability of both events happening at once is Φ(x − μj)φ(x − μi)dx. Summing this up for each possible selection point x, we get the following equation.

(19) pij = ∫_{−∞}^{∞} Φ(x − μj) φ(x − μi) dx

Although this equation can be solved numerically for any specific values of pij or of μi − μj (e.g., via a simulation using the Gradual Learning Algorithm of Boersma 1998b, Boersma and Hayes 2001), it is opaque to intuition (i.e., it does not suggest an interpretation in terms of linguistically-relevant concepts) and provides no help in thinking about general relationships between constraint ranking probabilities. To find a more convenient approximation, we start by restricting our attention to variant frequencies between 1% and 99%. Next, we convert the observed frequencies from probabilities to log-odds, where log-odds(p) = ln(p/(1 − p)). As Figure 1 shows, it turns out that log-odds(pij) is approximately linear in μi − μj over the range from 1% to 99%. Thus, the log-odds of the probability of observing Ci ≫ Cj is approximately a constant factor s times μi − μj, the difference in ranking values:

(20) log-odds(pij) ≈ s(μi − μj)

That is, if variation probabilities are expressed as log-odds, they can be treated as distances between constraints, as if we had simply rescaled the ranking continuum using a different length unit. Consequently,

(21) log-odds(p12) ≈ s(μ1 − μ2) = s(μ1 − μ0) − s(μ2 − μ0) ≈ log-odds(p10) − log-odds(p20)

In other words, because of the near-linear relationship between log-odds and ranking distance, we can (approximately) predict log-odds directly from log-odds without going through ranking values at all.15

15. If the normally-distributed noise in Stochastic OT is replaced by logistically-distributed noise, then the log-odds is exactly proportional to the ranking distance, and there is no need to approximate. The normal and logistic distributions are very similar, making the two different versions of Stochastic OT difficult to distinguish empirically (Evanini 2007). In the absence of compelling empirical evidence favoring either distribution, there is no reason for linguists not to prefer the mathematically more tractable option.
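Equations (19)-(21) can be checked numerically. In the sketch below (ours), the exact pij is computed from the closed form Φ((μi − μj)/√2), which follows from (19) because the difference of two unit normals has variance 2; the approximate p12 comes from the log-odds subtraction in (21). For p10 = 0.025 and p20 = 0.100 the two values differ by about 0.061, the largest discrepancy reported in the text below:

import numpy as np
from scipy.stats import norm

def p(d):
    """Exact Pr(Ci >> Cj) when mu_i - mu_j = d and the noise has s.d. 1."""
    return norm.cdf(d / np.sqrt(2.0))   # closed form of equation (19)

def log_odds(prob):
    return np.log(prob / (1.0 - prob))

def inv_log_odds(x):
    return 1.0 / (1.0 + np.exp(-x))

# Predict p12 from p10 and p20 via equation (21).
p10, p20 = 0.025, 0.100
mu1 = np.sqrt(2.0) * norm.ppf(p10)   # recover ranking values relative to C0
mu2 = np.sqrt(2.0) * norm.ppf(p20)
exact_p12 = p(mu1 - mu2)
approx_p12 = inv_log_odds(log_odds(p10) - log_odds(p20))
print(exact_p12, approx_p12, abs(exact_p12 - approx_p12))   # difference ~0.061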


Figure 1. There is an approximately linear relationship between μi − μj and the log-odds of the probability that Ci will be observed to dominate Cj, as long as the log-odds is between about −6 and 6 (corresponding to a probability between about 1% and 99%). The dashed line, an ordinary-least-squares regression line, has slope s = 1.371.

To show how much accuracy is lost in the approximation, we calculated p12 as p10 and p20 jointly ranged over 0.025, 0.05, 0.10, 0.15, 0.25, 0.5, 0.75, 0.85, 0.9, 0.95, and 0.975, excluding combinations for which log-odds(p10) and log-odds(p20) differed by more than 4. The exact and approximate p12 are plotted against each other in Figure 2. The largest difference in absolute terms was 0.061, which occurred when p10 and p20 were 0.025 and 0.100 (in either order).

Figure 2. Approximate (predicted) vs. exact (actual) Pr(C1 ≫ C2), as calculated at multiple levels of p10 and p20

5.3. Variant frequency in scale-partition hierarchies

The foregoing is general and abstract, applying to any three constraints whatsoever as long as their relative ranking probabilities pij can be unambiguously inferred from the data. Conveniently, when the three constraints belong to a

scale-partition hierarchy, there are circumstances in which the pij are not just inferable from, but actually equal to, the variation probabilities. A concrete example can be constructed from Pater and Barlow's (2003) analysis of the data from Gnanadesikan (2004), shown above in (17). Since the sets of forms assigned violations by each *x-Ons constraint do not intersect (there are no entailed violations between constraints in this family), it is only *Fricative-Ons and *Nasal-Ons that are relevant for the [sEl]/[mEl] decision. Hence the probability that fricative-nasal clusters are reduced to the fricative rather than the nasal is exactly equal to Pr(*Nasal-Ons ≫ *Fricative-Ons), and likewise for any other pair of the sonority classes listed in (16). The non-intersecting property of scale-partition constraints thus means that the domination probabilities pij can be read directly off the variation probabilities in the data. Therefore, by the argument made in section 5.2, the variation probabilities should stand in predictable relations to each other, e.g.,

(22) log-odds(Pr(sk → s)) ≈ log-odds(Pr(sl → s)) − log-odds(Pr(kl → k))

If the empirical variation probabilities are not so related, then one of the hypotheses must be wrong: Either the relevant sonority constraints do not form a


scale-partition hierarchy (the sets of forms to which they assign violations intersect, perhaps as in a stringency hierarchy, so that the variation probabilities are not equal to the domination probabilities), or other constraints are also involved in the choice between sonority classes. On the other hand, if the predicted relationship does hold, it would be strong evidence in favor of a scale-partition hierarchy.

We have not succeeded in finding any published data sets which conform to the scale-partition variation predictions. Only a few have the relevant quantitative data in any case (e.g., Tropf 1987; Ohala 1999; Hansen 2001; Jongstra 2003a, 2003b), so not too much should be made of this failure yet. To illustrate how the predictions are tested, we applied the model sketched above to a subset of the Jongstra (2003a, 2003b) data. We focus here on the three Dutch-legal clusters [sk sn kn], and on the ten children who reduced each of those clusters (C1C2) to a single consonant at least eight times in the sample (that being the author's criterion of frequent attestation). Table 2 shows the rate of reduction to C1, or to a consonant in the same sonority class, as a proportion of all reductions to a single consonant.

Table 2. Reductions of [sk sn kn] to a segment in the same sonority class as one of the onset consonants, showing the proportion where the class of the output consonant was the same as that of the first target consonant rather than the second (Jongstra 2003b, Table 5.2b). Each proportion is based on at least eight observations

         Target cluster
Child    sk       sn       kn
3        0.08     1.00     0.86
4        0.00     0.97     0.69
5        0.00     0.96     0.95
6        0.00     0.85     1.00
13       0.00     0.75     0.91
14       0.13     0.94     0.86
15       0.05     1.00     1.00
23       0.00     0.92     1.00
28       0.08     1.00     1.00
34       0.00     1.00     1.00
MEAN     0.034    0.939    0.927

All ten of these children preferentially reduce [sk] to a stop and [sn] to a fricative; i.e., they choose the least-sonorous segment, just as Gita did. Since


stops are preferred to fricatives, and fricatives to nasals, we expect [kn] should be reduced to a stop, as indeed it is. If the choice is determined entirely by a scale-partition constraint family like the *x-Ons constraints, then we would expect the preference for stops over nasals to be even greater than that for stops over fricatives or that for fricatives over nasals; indeed, expressed as log-odds, it should be approximately equal to their sum. However, this is not the case. Child 14, for example, prefers stops over fricatives 87% of the time (a log-odds of 1.90) and fricatives over nasals 94% of the time (2.75). The *x-Ons hypothesis predicts that he or she should prefer stops over nasals 99% of the time (4.65 = 1.90 + 2.75), but the observed rate is only 86% (1.82), which is numerically less than the rate of preferring stops to fricatives or fricatives to nasals.

If we assume that all of the children have the same constraint ranking values, and combine their data (by averaging with equal weight), the same pattern occurs. Stops are preferred to fricatives for [sk] 96.6% of the time (log-odds of 3.35), and fricatives to nasals for [sn] 93.9% of the time (2.73), which predicts that stops should be preferred to nasals for [kn] 99.7% of the time (6.08 = 3.35 + 2.73). The actual rate is 92.7% (2.54), less than either of the other two preferences. By (21), there is no way to assign ranking values to the constraints in (17) that will match the observed frequencies: In order to model large preferences for stops over fricatives and fricatives over nasals, the anti-stop and anti-nasal constraints have to be far apart, but in order to model the smaller preference for stops over nasals, the same two constraints have to be close together.16

This section has identified a clear empirical signature of a scale-partition hierarchy, which crucially depends on the lack of intersecting sets of violations between the constraints which are in variation. Deviation from the predicted relationship indicates such intersection, either between constraints in the hierarchy itself, or between constraints inside and outside the hierarchy. Cases which conform to the relationship may be rare (since there are many outside constraints that could interfere), but would strongly support the scale-partition hypothesis if found.

16. We caution against making too much of this particular example, since the number of data points per child does not allow statistically significant comparison between such close proportions, and there are other complications such as changes in children’s productions over the five-month course of the study.
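The additivity check applied to the Table 2 means is a direct transcription of the calculation above; a minimal sketch:

import math

def log_odds(p):
    return math.log(p / (1.0 - p))

p_stop_over_fric = 1 - 0.034   # [sk] reduced to the stop 96.6% of the time
p_fric_over_nasal = 0.939      # [sn] reduced to the fricative
p_stop_over_nasal = 0.927      # [kn] reduced to the stop

predicted = log_odds(p_stop_over_fric) + log_odds(p_fric_over_nasal)
print(round(predicted, 2))                          # 6.08 = 3.35 + 2.73
print(round(1 / (1 + math.exp(-predicted)), 3))     # ~0.998: predicted rate
print(round(log_odds(p_stop_over_nasal), 2))        # 2.54: observed log-odds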

6. Conclusions and implications

In this chapter, we have shown that variation provides a new empirical domain for comparing the two competing approaches to markedness hierarchies in a constraint-based model. The scale-partition approach predicts harmony reversals, while the stringency approach does not. Further, we have shown that the pattern of harmony reversals predicted by the scale-partition approach can be empirically distinguished from superficially similar patterns caused by interactions of markedness-hierarchy constraints with other, unrelated constraints.

The ideal data set for distinguishing the two hypotheses would describe a sonority-sensitive process involving at least three distinct sonority classes (three, because (21) presupposes three distinct constraints). It would provide individual-level data, so that within-speaker variation could be separated from between-speaker variation. If an acquisition study, it would break the data down further by recording session, to distinguish variation from change over time. Finally, it would have enough tokens in each cell to allow reasonably precise probability estimates. All this could prove a tall order, since pinning down even a single variant frequency to a 95% confidence interval of ±0.10 could take up to 100 observations,17 and we would need to determine at least three such frequencies per speaker. Data sets of this size may become more common as technology improves.

It is worth noting that the general predictions we have made about the differences between scale-partition constraint families and stringency constraint families are independent of whether a constraint-based phonological framework is implemented as Optimality Theory (Prince and Smolensky 1993/2004), in which higher-ranked constraints strictly dominate lower-ranked constraints, or as Harmonic Grammar (Legendre, Miyata and Smolensky 1990b; Smolensky and Legendre 2006), in which constraints are weighted rather than strictly ranked and the effects of violations of different constraints are additive. The same predictions are made under HG as under OT because even in HG the scale-partition constraints will not show additive effects, as their violation profiles are completely independent of one another. As for the stringency constraints, they will show additive effects under HG, and this would likely affect their overall position with respect to the entire constraint hierarchy in a given language, but it

17. In order to estimate p10 to a precision of ±0.10 with 95% confidence, we need a sample of size at least (1.96)^2 · p10(1 − p10)/(0.10)^2 = 384 · p10(1 − p10) (Tortora 1978, Eqn. 3.1). The value of p10 is unknown, but in the worst case it could be 0.5, requiring 96 observations. The number will be more favorable the smaller p10 really is; e.g., for p10 = 0.05, we need only 18 observations.


does not change the fact that stringency constraints rule out harmony reversals altogether.

The results of this chapter have implications beyond sonority, and in fact beyond phonology. The use of markedness hierarchies, and of constraint families based on harmony scales, is a technique that has been applied in morphosyntax as well. Moreover, analyses involving just the crucial scenario we have identified here, where there is variation in the ranking of some constraint with respect to multiple members of a harmony-scale constraint family, have been proposed by, for example, Aissen (2003) and Lee (2006). However, the implications for harmony reversals have not generally been explored, beyond a brief remark by Dingare (2001: 8) acknowledging that Stochastic OT might allow for the selection points for constraints in a markedness hierarchy to end up in reverse order from their usual harmony scale. Thus, the predictions we identify and questions we raise may be fruitfully pursued both within and beyond phonology.

Abbreviations

C     consonant
Dist  (sonority) distance
HG    Harmonic Grammar
L1    first language
L2    second language
OT    Optimality Theory
Pr    probability
V     vowel

Acknowledgments

Many thanks to Paul de Lacy, Jaye Padgett, and Steve Parker for comments and suggestions.

Sonority intuitions are provided by the lexicon

Ruben van de Vijver and Dinah Baer-Henney

Abstract. Do the intuitions of German native speakers concerning the role of the Sonority Sequencing Principle in onset clusters go beyond what they can directly observe in their lexicon, or do their intuitions reflect observable patterns in their lexicon? We addressed this question by means of a rating study involving nonce words, an analysis of onset clusters in a corpus and a phonotactic learning simulation. It turns out that the intuitions of German native speakers reflect patterns they can observe in their lexicon, provided it is analyzed in terms of distinctive features.

1. Introduction

The Sonority Sequencing Principle (Sievers 1876/1893; Jespersen 1904; Selkirk 1982; Clements 1990; Blevins 1995) constrains the phonotactics of syllables in most languages. The Sonority Sequencing Principle states that the sonority of segments – analyzed as their relative loudness (Sievers 1876/1893; Parker 2002, 2008) – rises from onset to nucleus and falls from nucleus to coda. An onset such as [kl] in which the sonority rises is more well-formed than an onset such as [kt] in which the sonority is flat, and the onset [kt] is still more well-formed than the onset [mt] in which the sonority falls. Onsets with a sonority rise are more common across languages than onsets with a flat sonority, which, in turn, are more common than onsets with a sonority fall (Greenberg 1978). In fact, implicational statements for the occurrence of clusters in a language can be derived from the distribution of particular types of onsets across languages (Berent et al. 2007). The presence in a language of onsets with a falling sonority profile implies the presence of both onsets with a flat sonority and onsets with a rising sonority profile ([mt] implies [kt] and [kl]). In other words, if a cluster of a [+nasal] segment followed by a [–sonorant, –continuant] segment is grammatical in a language, a cluster of two [–sonorant, –continuant] segments is also grammatical. The presence of onsets with a sonority plateau implies the presence of onsets with a sonority rise, but not of onsets with a sonority fall ([kt] implies [kl], but not [mt]). The presence of clusters with a sonority rise does not imply the presence of clusters with flat sonority profile nor of


clusters with a falling sonority profile ([kl] implies neither the presence of [kt] nor of [mt]).

Berent et al. (2007) argued that human beings possess knowledge of the relative well-formedness of onset clusters that goes beyond their lexicon and could perhaps be attributed to innate knowledge. In a series of ingenious experiments they showed that English native speakers have knowledge of the relative well-formedness of onset clusters – as expressed by cross-linguistic occurrence – that do not occur as consonant sequences in English. They interpret this to mean that English native speakers have knowledge of sonority that is not provided by the lexicon. In a similar vein, Zuraw (2007) found that native speakers of Tagalog, a language without any clusters in the native lexicon, tend to break up clusters in borrowings from English and Spanish. However, not all clusters are equally likely to be split up. The greater the rise in sonority, the more likely it is that a cluster is broken up. Zuraw concludes that native speakers of Tagalog must have knowledge of sonority relations between members of onsets, and, since Tagalog has no native complex onsets, this knowledge goes beyond lexical knowledge and could, therefore, perhaps be attributed to innate and universal grammatical knowledge.

The studies of Berent et al. (2007) and of Zuraw (2007) imply that speakers have knowledge of the well-formedness of onset clusters that cannot be derived from existing consonant sequences in their lexicon. However, phonotactic restrictions can be most insightfully explained in terms of distinctive features (Jakobson, Fant and Halle 1952). Even though English has no words that start with [kn], the existence of a word such as [klæn] clan implies that a sequence of [–sonorant, –continuant][+sonorant, –continuant] is not impossible.1 Native speakers could use this as evidence that a nonce word such as [knIk] is not completely ungrammatical, even though no word in English begins with [kn].

Recently Daland et al. (2011) argued that the knowledge of sonority may be derived from features that define the natural classes of segments in the lexicon.2 They showed in a rating experiment that native speakers of English have preferences for onset clusters with rising sonority profiles over flat and falling sonority profiles, and that they prefer onset clusters with flat sonority profiles over falling sonority profiles. They then went on to create a grammar from a lexicon and a feature matrix using the phonotactic learner model developed in Hayes and Wilson (2008) and used the grammar to assess the well-formedness

1. See Wiese (1996) for arguments that [l] is [–continuant].
2. Daland et al. (2011) was posted on the Rutgers Optimality Archive while we were completing our experiments.


of the words used in the rating experiment. It turned out that there was a good correlation between the intuitions of the native speakers and the assessment of the words by the grammar (r = .83). Daland et al. conclude that the intuitions concerning the well-formedness of onset clusters can be learned on the basis of the lexicon, provided the segments of the lexicon are analyzed in terms of distinctive features.

The central question of this chapter is: where does knowledge of sonority come from? To answer this question we followed a research strategy similar to the one in Daland et al. (2011). We conducted a study of the onsets in a corpus of 11,427 words from the spoken version of the German CELEX (Baayen, Piepenbrock and Gulikers 1995). The sonority scale for German, as derived from a universal scale (Clements 1990) by Wiese (1996), is reflected in the frequency of the clusters in the corpus. Clusters with a larger sonority difference are more frequent than clusters with a smaller sonority difference. It is therefore not inconceivable that native speakers derive their knowledge of the sonority scale from the lexicon. We will then study the intuitions concerning the relative well-formedness of onset clusters by German native speakers by means of a judgment experiment, and we will compare their intuitions, as assessed in the experiment, with the relative well-formedness of the onset clusters in the words as established by a machine-learned maximum entropy grammar (Della Pietra, Della Pietra and Lafferty 1997; Goldwater and Johnson 2003; Jäger 2007; Hayes and Wilson 2008). It will turn out that the intuitions of native speakers are matched by the assessment of the phonotactic grammar. We will therefore conclude that sonority can be derived from the lexicon.

On the basis of our experiment, we will conclude that German speakers' intuitions mirror the frequency of onsets in their lexicon, and that their intuitions about clusters which consist of segment combinations that do not occur in German follow the Sonority Sequencing Principle. On the basis of our machine-learned grammar we will conclude that these intuitions can be learned from the lexicon, provided that words are represented with phonological features and constraints are learned over at least three adjacent segments.

This paper is organized as follows. In section 2 we will provide the theoretical background. In section 3 we will provide an analysis of the sonority profiles of onsets in a corpus of 11,427 words. In section 4 we will describe our experiment and in section 6 we will present our learning simulation, the results of which will be presented in section 5. We will discuss the results in section 7. The paper ends with a conclusion in section 8.
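In a maximum entropy grammar of the kind referred to here, a word's score is the weighted sum of its constraint violations, and its well-formedness value falls off exponentially with that score (the scheme is described further in section 2). The sketch below is ours; the constraints, weights, and violation counts are invented purely for illustration:

import math

# Invented constraints and weights, for illustration only.
weights = {"*[+nas][-son]": 4.1,        # would penalize [mt]-type onsets
           "*[-son][-son,-cont]": 3.2}  # would penalize [kt]-type onsets

def score(violations):
    """Weighted sum of constraint violations (higher = more marked)."""
    return sum(weights[c] * n for c, n in violations.items())

def maxent_value(violations):
    """exp(-score), so that more marked words receive lower values."""
    return math.exp(-score(violations))

print(maxent_value({}))                          # violation-free word: 1.0
print(maxent_value({"*[-son][-son,-cont]": 1}))  # a [kt]-type nonce word
print(maxent_value({"*[+nas][-son]": 1}))        # a [mt]-type nonce word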

2. Background

Natural classes of segments can be grouped according to their sonority, or intrinsic loudness. Obstruents, for example, have a very low intrinsic loudness, while vowels have a very high intrinsic loudness (Sievers 1876/1893; Jespersen 1904; Selkirk 1982; Clements 1990; Parker 2002, 2008). Cross-linguistically sonority tends to increase from onset to nucleus and to fall from nucleus to coda (Greenberg 1978; Berent et al. 2007).

This preference is reflected in the types of clusters a language allows. If a language allows an onset cluster with falling sonority, such as [mt], it will also allow an onset cluster with a sonority plateau, such as [kt]. A case in point is Polish, in which onsets with rising sonority, flat sonority and falling sonority are allowed ([prOS˜E] 'please', [pt1C] 'cookie, cream puff' and [rdza] 'rust'). Greek allows rising and flat sonority in clusters ([plAsma] 'plasma, creature', [ptOma] 'corpse'), but it does not allow clusters with falling sonority.3 English only allows clusters with rising sonority ([kli:n] 'clean', [kjut] 'cute').

Berent et al. (2007) addressed the question of whether knowledge of the relative well-formedness of sonority profiles in onsets is part of the grammar of native speakers, and if it is, whether this knowledge goes beyond what they can directly observe in their lexicon. In other words, do English native speakers know that onset clusters with a flat sonority profile are more well-formed than clusters with a falling sonority profile, despite the lack of overt clusters with such a profile in their lexicon? On the basis of their experiments they conclude that knowledge of sonority is part of the grammatical knowledge of native speakers and that this knowledge goes beyond what can be directly observed.

Berent et al. (2007) asked native speakers of English how many syllables were in nonce words that they heard. These monosyllabic nonce words had onsets with a rising sonority profile ([bnIf]), a plateau sonority profile ([bdIf]) and a falling sonority profile ([lbIf]). It turns out that the participants tended to perceive monosyllabic nonce words with a falling sonority profile as disyllabic, but words with a flat or rising sonority profile as monosyllabic. This shows that knowledge of the relative well-formedness of sonority profiles is part of the grammar of native speakers – even of clusters which consist of segment combinations that are not directly observable in English. They concluded that since the knowledge of these segment combinations cannot be learned from the lexicon, the knowledge of sonority profiles must come from another source, which

3. The Polish examples were provided to us by Agata Renans, and the Greek ones by Stella Gryllia.


they identified as Universal Grammar (Chomsky 1965; Prince and Smolensky 1993/2004).

A similar argument is made in Zuraw (2007) concerning cluster splittability in Tagalog. Tagalog only has complex onsets in recent borrowings, as in the stem graduate. These complex onsets may optionally be split by an infix, for example –um–, which marks realis aspect (gr-um-aduate or g-um-raduate), and the nature of the cluster determines the frequency of splits. The more the second member of the cluster resembles a vowel – put differently, the greater its intrinsic sonority – the more likely it is that the cluster is split. Native speakers of Tagalog agree, in an experiment reported by Zuraw (2007), about the splittability of clusters that hardly occur with an infix. This suggests that Tagalog speakers must have knowledge about clusters that they cannot have learned from directly observable splits.

Even though both Berent et al. (2007) and Zuraw (2007) conclude that the knowledge of sonority of native speakers cannot be directly derived from the lexicon, the authors of both papers also emphasize that they cannot exclude the possibility that lexical statistics can, in the end, explain their findings, had they counted units other than segments. Phonotactic generalizations are crucially formulated in terms of features (Selkirk 1982; Steriade 1982; Clements 1990; Blevins 1995) and are incorporated in the grammar as constraints. German words may start with the coronal continuants [z] ([zOn@] Sonne 'sun') or [S] ([Sul@] Schule 'school'), but not with [s] (*[sOn@]). This restriction can be formulated with distinctive features. Words may not begin with a voiceless, coronal, continuant obstruent: *word [–voice, –sonorant, +coronal, +continuant, –high]. This constraint bans [s] at the beginning of a word, but allows both [z] and [S]. The Sonority Sequencing Principle, which is also a phonotactic generalization, can also be formulated as (a set of) constraints, which are defined in terms of distinctive features. Native speakers do not have direct evidence for clusters with a flat or a falling sonority profile according to Berent et al. (2007), and native speakers must therefore rely on different sources of evidence in order to judge the grammaticality of such clusters. However, this claim could be undermined by finding evidence that native speakers rely on features, rather than on segments, for their generalizations.

Daland et al. (2011) pursued the question of whether knowledge of the Sonority Sequencing Principle can be derived from the lexicon. They did this by comparing the results of a judgment experiment with a learning experiment. Participants had to rate pairs of words which differed only in their onset cluster in the judgment experiment. The words consisted of existing onset clusters, marginal onset clusters and non-existing onset clusters. They used the sonority


scale of Clements (1992), in which obstruents are least sonorous, followed, in increasing sonority, by nasals, liquids, glides and vowels. In the non-existing onset clusters, some onsets had a large sonority rise, such as 'zr', some a smaller sonority rise, such as 'km' or 'ml', some a flat sonority, such as 'pk', some a slightly falling sonority, such as 'rn', and some had a large falling sonority, such as 'lt'.4 Each onset was combined with one of six disyllabic tails, such as '-ottiff' or '-eppid'. The words were presented in pairs and participants had to choose which word was the best English word. The sonority of the unattested words best predicted the preference of the participants for a word.

Daland et al. then used the phonotactic learner, developed in Hayes and Wilson (2008), to learn phonotactic constraints from a lexicon and a set of features, which characterize the segments in the lexicon. The constraints that the learner comes up with are defined in terms of natural classes. An example of such a constraint would be *word [–voice, –sonorant, +coronal, +continuant, –high], mentioned above, which prohibits words starting with [s]; a constraint that is important in the phonology of German (Wiese 1996). The phonotactic learner creates constraints and assigns a weight to each one (Legendre, Miyata and Smolensky 1990c; Jäger 2007; Smolensky and Legendre 2006; Coetzee and Pater 2008; Potts et al. 2010; Pater 2009). The well-formedness of a word is calculated by multiplying each violation of a constraint by its weight and summing all products.

The participants in the experiment of Daland et al. had the same preferences as the participants in Berent et al.: English native speakers have preferences for sonority profiles that adhere to the Sonority Sequencing Principle, even if the target words have clusters that are not overtly attested in English. The experiment was carried out as a head-to-head comparison, in which participants were asked to state which word in a pair of words is more likely to be an English word. The results of their learning simulations showed that these preferences mirror the lexicon frequencies of clusters provided these are analyzed in terms of distinctive features.5

German is interesting – and different from English in this respect – in that it tolerates clusters of all sorts. German native speakers accept and produce

4. The words were presented orthographically. The clusters are therefore given in letters here and not in the IPA.
5. As the native speakers need to have analyzed their segments in terms of distinctive features and as these are part of Universal Grammar – either innately (Jakobson, Fant and Halle 1952) or through induction (Hayes and Steriade 2004) – their finding can be interpreted as the result of the interplay between Universal Grammar and frequency (see Coetzee (2008) for a similar interplay between Universal Grammar and frequency in Obligatory Contour Principle-place effects).


clusters with a flat sonority sequence, such as [ps] in [psYx@] Psyche 'psyche', [pt] in the name [ptol@meus] Ptolemäus, or [tm] in the word Tmesis 'tmesis', without changing their pronunciation (Wiese 1996). A further remarkable cluster is [kv], which consists of an obstruent followed by a fricative, as in the word [kvA5k] Quark 'low-fat curd cheese'. Even though this fricative is often transcribed as voiced, [v] is often realized voiceless, especially if it follows a voiceless obstruent (Wiese 1996). Other examples of clusters with [v] as its second member are [tsvaI] Zwei 'two' and [SvaIn] Schwein 'pig, wild boar'. This type of cluster is marked from the point of view of sonority because it contains two obstruents, even though [v] is sometimes realized as voiced and therefore slightly more sonorous than [k]. It is more marked – typologically less frequent – than a cluster such as [km] or [ml]. If the lexicon serves as a basis, then we would expect that clusters which contain [kv] are judged better than less marked clusters for which there is less direct evidence; the segment combinations in the less marked clusters are less frequent. We also expect that, since clusters with a flat sonority profile are allowed in German, a non-existing cluster such as [kt] will be judged better than a non-existing falling cluster such as [mt]. We can now turn to a study of the phonotactics of sonority in onsets in German and a corpus study of the frequency of onset clusters.

3. Sonority and phonotactics of onsets in German

Wiese (1996) proposes that the following sonority scale, very similar to the one proposed by Clements (1990), is appropriate for German.6 There are two differences between Wiese's scale and Clements' scale. In Clements' scale liquids occupy the same position; in Wiese's scale the non-continuant liquid [l] is less sonorous than the continuant liquid [ö]. The argument for this distinction between the liquids comes from their behavior in codas. There are words in which both liquids [öl] are a complex coda – [kEöl] Kerl 'guy, bloke, lad' – but in the reverse order the liquids are always separated by a schwa, as in [kEl@ö] Keller 'basement' (Wiese 1996; Féry 2000; for differences in the sonority in liquids in General American English see Proctor and Walker this volume). Assuming that the sonority in a coda falls, this justifies splitting up the liquids according to their sonority (see also Henke, Kaisse and Wright this volume). The second difference concerns the role of high vowels, which are less sonorous than

6. The phonetic symbol [ö] is used to refer to the [+continuant] liquid, as opposed to the [–continuant] liquid [l]. Not all German speakers have a uvular trill, though, notwithstanding the IPA symbol [ö] (Wiese 1996).


non-high vowels in Wiese's scale. High vowels are realized as a glide before vowels in German, which Wiese attributes to the lower sonority of high vowels.

(1) The German sonority scale

    less sonorous                                          more sonorous
    obstruents < nasals < [l] < [ö] < high vowels < vowels
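The scale in (1) can be encoded as a simple ranking function. In the toy sketch below (ours; the segment classes are simplified and 'R' stands in for [ö]), a two-consonant onset is classified as rising, flat, or falling by comparing the sonority ranks of its members:

SONORITY = {**dict.fromkeys("ptkbdgfvzsS", 1),   # obstruents
            **dict.fromkeys("mn", 2),            # nasals
            "l": 3, "R": 4}                      # [l] below [ö] (written R)

def profile(onset):
    """Classify a two-consonant onset by its sonority slope."""
    delta = SONORITY[onset[1]] - SONORITY[onset[0]]
    return "rising" if delta > 0 else "flat" if delta == 0 else "falling"

for cluster in ["kl", "kR", "km", "ml", "kv", "kt", "mt"]:
    print(cluster, profile(cluster))   # kv and kt are flat; mt is falling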

This scale implies that a cluster which consists of an obstruent and a nasal is more marked than a cluster which consists of an obstruent and [l], which in turn is more marked than a cluster which consists of an obstruent and [ö]. The scale in (1) explains most of the clusters in (2), which is adapted from Wiese (1996). The table depicts the first member of a cluster vertically and the second member of a cluster horizontally. Each + indicates a possible cluster and each – indicates an unattested cluster. A + between parentheses indicates rare clusters in non-adapted loans. Liquids combine with more consonants than nasals, which may be seen as a reflection of their relatively unmarked sonority.

(2) German complex onsets

    C1\C2   m     n     l     ö     s     v
    p       –     (+)   +     +     +     –
    t       –     –     –     +     –     (+)
    k       (+)   +     +     +     (+)   +
    b       –     –     +     +     –     –
    d       –     –     –     +     –     –
    g       (+)   +     +     +     –     –
    f       –     –     +     +     –     –
    v       –     –     (+)   +     –     –
    z       –     –     (+)   –     –     +
    pf      –     –     +     +     –     –
    ts      –     –     –     +     –     +
    S       +     +     +     +     –     +

In addition to sonority, the Obligatory Contour Principle (Goldsmith 1976; Frisch and Zawaydeh 2001; Frisch, Pierrehumbert and Broe 2004; Kager and Shatzman 2007; Coetzee 2008) plays a role in constraining complex onsets. Consonants in a complex onset may not share a place of articulation. This


explains the absence of these clusters: *[pm], *[pv], *[tn], *[tl], *[pfm], *[pfv], *[tsn], *[tsl], *[fm], *[fv] and *[zn].7

7. There are clusters that are sonority plateaus according to (1): [St], [Sp] and [sk]. These sequences are the only ones that occur in clusters of three consonants ([Stö], [Spö], [Spl], [skl], [skö]).

Some attested clusters are very rare and occur in just a handful of words, such as [pn] in [pnOYmolog@] Pneumologe 'pneumologist', and [km] in [kmE5] Khmer 'Khmer'. The clusters [zl], [zv] and [ml] only occur in Serbo-Croatian names, such as [zlatan] Zlatan, [zvOnomi5] Zvonomir and [mladItS] Mladic. The latter clusters are produced without any change, which is why we think they are acceptable clusters, even if they might not be judged as grammatical as, for example, [kl] clusters.

The clusters with [s] and [v] as second members form a sonority plateau. Clusters with [v] as second member are remarkable, not only because they have a flat sonority, but also because they are the only clusters in which a voicing difference between obstruents is allowed. Wiese (1996) argues that [v] is /U/ underlyingly, and suggests that the sonority scale affects strings that are more abstract than the surface form. However, there are no alternations between [v] and [U], and it seems unlikely that the sonority scale, with its intrinsic appeal to a physical property of segments – sound level protrusion (Parker 2002, 2008) – would refer to mental representations. Moreover, in recent loans and in foreign names Germans do use a rounded glide. The name of the Chinese artist Ai Weiwei is pronounced [Aj we:we:] in German and Nintendo's Wii is pronounced [wi:] (not [vi:]). So, there is reason to assume that [v] is indeed a voiced fricative and not a veiled glide.

3.1. Frequency of clusters in German

Should the markedness of onset clusters be read off the lexicon, and their relative well-formedness follow from their probability, then we need to establish their frequency. If clusters with a greater sonority distance occur more often than clusters with a smaller sonority distance, this might shape the preference of native speakers for clusters. We created a corpus of 11,427 words taken from the spoken version of CELEX (Baayen, Piepenbrock and Gulikers 1995). We took only words that consist of three or fewer syllables, since there are relatively few words that are longer and these words probably provide no new types of syllables. The corpus contains 6071 nouns, 3278 verbs, 1691 adjectives and 387 adverbs. There are 4792 words with three syllables, 5487 words with two syllables and 1148


words with one syllable. The words were syllabified by assigning boundaries on the basis of the Maximum Onset Principle (Selkirk 1982), which states that as many prevocalic consonants are parsed into the onset of a syllable as the phonotactics of German allow. Boundaries were also added at morpheme boundaries, except between consonant-final stems and vowel-initial suffixes. In those cases, the final consonant of the stem was parsed as an onset. In the infinitive fahren /faö/stem + /@n/suffix 'to drive', the stem-final [ö] is syllabified as the onset of the last syllable: [fa.ö@n]. The frequencies of complex onsets derived from this corpus are provided in Table 1. The cells with the dashes indicate absent onsets (see (2) above).
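The Maximum Onset Principle parse just described can be sketched as a greedy procedure. The illustration below is ours; the onset inventory is a tiny stand-in for table (2), with 'r' standing in for [ö] and '@' for schwa:

import re

LEGAL_ONSETS = {"", "f", "r", "b", "br", "fr"}   # tiny illustrative subset

def syllabify(word):
    """Maximum Onset Principle: '.' goes before the longest legal onset."""
    chunks = re.findall(r"[aeiou@]+|[^aeiou@]+", word)
    syllables, current = [], ""
    for j, chunk in enumerate(chunks):
        if chunk[0] in "aeiou@":
            current += chunk
        elif 0 < j < len(chunks) - 1:            # intervocalic consonants
            split = next(i for i in range(len(chunk) + 1)
                         if chunk[i:] in LEGAL_ONSETS)
            syllables.append(current + chunk[:split])
            current = chunk[split:]
        else:
            current += chunk                     # word-initial or word-final
    syllables.append(current)
    return ".".join(syllables)

print(syllabify("far@n"))    # -> fa.r@n, like fahren [fa.ö@n]
print(syllabify("fabr@n"))   # -> fa.br@n: [br] is parsed as a full onset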

Table 1. Frequencies in German onsets. The number of types is given and the number of tokens is in parentheses.

    C1\C2   m          n           l             ö              s       v            Total
    p       –          1 (1)       83 (1022)     121 (1833)     2 (9)   –            207 (2865)
    t       –          –           –             291 (4766)     –       0 (0)        291 (4766)
    k       0 (0)      67 (390)    162 (2240)    171 (2347)     0 (0)   38 (328)     438 (5305)
    b       –          –           143 (2494)    194 (2601)     –       –            337 (5095)
    d       –          –           –             85 (1109)      –       –            85 (1109)
    g       0 (0)      10 (76)     81 (2078)     164 (4585)     –       –            255 (6739)
    f       –          –           140 (1100)    140 (4238)     –       –            280 (5338)
    v       –          –           0 (0)         2 (5)          –       –            2 (5)
    z       –          –           0 (0)         –              –       0 (0)        0 (0)
    pf      –          –           36 (302)      0 (0)          –       –            36 (302)
    ts      –          –           –             0 (0)          –       53 (1033)    53 (1033)
    S       69 (303)   72 (544)    224 (3000)    85 (1081)      –       138 (1689)   588 (6314)
    Total   69 (303)   150 (1011)  869 (12236)   1253 (22565)   2 (9)   229 (3050)   –


The numbers are types and the numbers in parentheses are tokens. The table uses the order of the sonority sequence scale given in (1). Not only do liquids combine with more consonants, they are also more frequent.

This frequency distribution is indeed skewed; clusters with a relatively well-formed sonority profile tend to be more frequent than clusters with a relatively less well-formed sonority profile. Clusters with liquids are more frequent than clusters with nasals. Within onsets with liquids, clusters with the more sonorous [ö] are more frequent than clusters with the less sonorous [l]. However, the Sonority Sequencing Principle does not completely determine the frequency of a cluster, as can be seen from the relatively high frequency of the cluster [Sv], which is quite ill-formed from the point of view of sonority. This is illustrated in Figure 1, which is a plot of the residuals of the χ2 distribution of the type frequency of the clusters (in Figure 1 the affricates are annotated with ts and pf, and [S] is annotated with S). Clusters with liquids tend to have positive values and clusters with nasals tend to have negative values.

The sonority scale in (1) is reflected in the distribution of onset clusters in Figure 1. The larger the sonority difference between the members of a cluster, the more often a cluster occurs. Clusters with [ö] are more likely to have positive residuals than clusters with nasals. Clusters with fricatives are more likely to have negative residuals than clusters with [l]. The scale in (1) is a model to explain the well-formedness of onset clusters, and makes no prediction about their relative frequency.8 It is clear, though, that native speakers of German receive more evidence for the well-formedness of larger sonority distances between the consonants in a cluster than for smaller sonority distances. It also suggests that native speakers of German receive evidence for clusters with smaller sonority profiles and even for clusters with flat sonority profiles.

It remains to be seen, of course, if this intuition is correct. As Daland et al. (2011) did for English, we need to derive a grammar from the lexicon and the features in the lexicon which describes the phonotactics of natural classes. This grammar should then be used to assess the well-formedness of the words in our experiment. If the relative well-formedness of the words can be derived from the grammar, there should be a correlation between the judgments of native speakers and the assessment by the grammar. First we will discuss the rating experiment and then we will discuss the learning experiment.

8. The residuals are calculated as follows: (Observed − Expected) / √Expected.
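Footnote 8's computation can be sketched in a few lines of numpy; purely for illustration, the sub-table below uses only the [k]- and [S]-initial type counts for [l] and [ö] from Table 1:

```python
import numpy as np

# Type counts from Table 1: rows C1 = k, S; columns C2 = l, ö.
observed = np.array([[162.0, 171.0],
                     [224.0,  85.0]])

# Expected counts under independence of C1 and C2.
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()

# Standardized residuals: positive = overrepresented, negative = underrepresented.
residuals = (observed - expected) / np.sqrt(expected)
print(residuals)
```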


Figure 1. Over- and underrepresentation of clusters.

4. The experiment

We ran a head-to-head comparison of onset clusters in which we tested whether German native speakers have preferences for sonority profiles in nonce words. We used clusters that consist of segment combinations that are attested in the lexicon and clusters that consist of unattested segment combinations. The attested combinations differed in frequency and sonority profile; the unattested clusters differed – obviously – only in sonority profile.9

9. The experiment was also done with 7 children who were 7 years old. The aim was to be able to compare the sonority preferences of children and adults. The children were not able to reliably repeat the items. Their sonority preferences can therefore not be assessed, but the intention to compare children and adults influenced a number of design decisions.


We compared 6 onset clusters: [kl], [kv], [km], [ml], [kt], [mt]. These clusters not only allow us to establish whether the lexicon determines sonority intuitions, but also how the lexicon is organized if it does. If it is organized in terms of segments, the last four clusters should all be dispreferred – their combinations (almost) never occur in German. Yet, if the lexicon is organized in terms of features, [km] should be preferred over [ml], since nasals are often preceded by obstruents, but rarely followed by liquids. Both [km] and [ml] should be preferred over [kt] and [mt], since sonorants occur more frequently in the second position of a cluster than obstruents. The onset [kt] should be preferred over [mt], since obstruents in the second position of a cluster are usually preceded by another obstruent. The clusters were combined with four rimes each: [If], [Ef], [Of] and [Uf]. These rimes were chosen because monosyllabic words with a complex onset that end in a rime with a lax vowel and a labial fricative are rare in German. The target words were excised from the carrier sentence Papa sah X öfters "daddy saw X from time to time". This sentence guaranteed a comparable prosodic environment. All words have stress on the first syllable and there is a pitch accent on the target word (Parker 2008). We created pairs in which each nonce word was compared with each other nonce word, which resulted in 60 pairs. In each pair the nonce words differed only in their onsets. Each onset was compared with all the others, which means that each onset was offered 20 times ((6 − 1) pairing partners × 4 rimes) as one member of a pair. The onset was offered ten times as the first member of a pair and ten times as the second member of a pair. The order of presentation was randomized for each participant. The 17 participants, 15 women and 2 men, were all students at the University of Potsdam, but none of them studied linguistics. They were given a small amount of money or study credits. They were told that they would hear words that are being considered as names for new toys, and that they would have to pick the word best suited as a name for a toy. After they had heard the words they first had to repeat them; this made sure that the participant had heard the intended onsets. If a participant made a mistake in a word, the words were replayed and the participant was asked to repeat them again. The words could be repeated up to ten times, but no participant required more than 3 repetitions. Most mistakes were made with the [km] cluster, which was initially repeated as [kn] in a few cases, but then correctly repeated as [km] upon hearing the repetition. The experiment was run in Praat (Boersma and Weenink 2011) on a MacBook. The participants indicated their choice by a mouse click on the track pad


of the laptop. The preferences for onsets are calculated as the percentage of winners in all pairs in which the cluster was one of the two alternatives.
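The pairing scheme can be made concrete in a short sketch; the ASCII spellings of onsets and rimes below are conveniences of the sketch, not the authors' stimulus labels:

```python
from itertools import combinations

onsets = ["kl", "kv", "km", "ml", "kt", "mt"]
rimes = ["If", "Ef", "Of", "Uf"]

# One pair per unordered pair of onsets, for every rime: C(6, 2) * 4 = 60 pairs.
pairs = [(o1 + r, o2 + r) for r in rimes for o1, o2 in combinations(onsets, 2)]
assert len(pairs) == 60

# Each onset appears in (6 - 1) * 4 = 20 pairs.
assert sum(w.startswith("km") for pair in pairs for w in pair) == 20
# (The experiment additionally balanced first/second position within a pair
# and randomized the order of the pairs for each participant.)
```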

5. Results

The results of the experiment are given in Table 2. The clusters that consist of combinations of segments that appear in the lexicon – [kl] and [kv] – were chosen more often than those that do not. Of the clusters that consist of segment combinations that do not occur in the lexicon, there was a preference for clusters with a (slightly) rising sonority profile over clusters with a flat or falling sonority profile. Table 2 lists the raw number of winners, the probabilities and the odds. The probabilities are the number of times a cluster is the winner divided by the total number of times the cluster appeared as a choice (which is 17 · 20 = 340). The odds are an indication of the chance that the cluster is chosen as a winner compared to the chance that it would be chosen as the loser (Agresti 1990). They are calculated by dividing the chance of a cluster being chosen as a winner by the chance that it would not be chosen as a winner. If the odds are greater than 1, the chance that the cluster will end up as the winner is greater than the chance that it will be chosen as the loser. The test words with clusters that appear as segment sequences in the lexicon ([kl] and [kv]) are more likely to be chosen than test words that begin with unattested segment sequences. Of the test words that begin with sequences that are unattested in the lexicon ([km], [ml], [mt] and [kt]), [km] and [ml], which have a less marked sonority profile, are less likely to lose than [mt] and [kt]. The attested clusters, [kl] and [kv], are preferred over unattested clusters. As for the unattested clusters, they roughly follow the Sonority Sequencing Principle (Sievers 1876/1893; Jespersen 1904; Selkirk 1982; Clements 1990; Blevins 1995). The [km] cluster is preferred over the [ml] cluster, and both of these are preferred over the [kt] and [mt] clusters. The fact that there is no preference for [kt] clusters over [mt] clusters is a consequence, we think, of the fact that we have relatively few different clusters in our rating experiment. An analysis of the contingency table of the proportions of the times a cluster emerges as a winner shows that the proportions are not evenly distributed over all candidates (χ²(5) = 39.89, p < .0001). An inspection of the standardized residuals shows that [kl] and [kv] are overrepresented as winners and the other clusters are underrepresented as winners. The words with a less marked sonority profile, [km] and [ml], are less underrepresented as winners than [mt] and [kt] (this is illustrated in Figure 2).
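As a worked example of the odds computation: [kl] won 283 of the 17 × 20 = 340 pairs in which it appeared, so its probability of winning is 283/340 ≈ .832, and its odds are .832/(1 − .832) ≈ 4.9, the value in the first row of Table 2.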


Table 2. Counts of winners from the experiment

Cluster   Winner (raw)   Winner (probability)   odds
kl        283            83.2                   4.9
kv        208            61.1                   1.6
km        163            48.0                   0.9
ml        144            42.3                   0.7
mt        112            32.9                   0.5
kt        110            32.3                   0.5

Figure 2. Distribution of standardized residuals in the experimental items

Now we can address the question of whether Germans could derive these intuitions from their lexicon. They cannot have derived them directly from sequences of segments, since many of the onsets in the experiment do not occur as such in the lexicon. Segments, however, are not the smallest building blocks in phonology; features are (Jakobson, Fant and Halle 1952). Many phonotactic generalizations can be most economically expressed in terms of features. The fact that German lacks words that start with [s], even though there are words that start with [z] and [S], is expressed by the constraint in (3).

(3) Words do not start with [s]
    *word[–voice, –sonorant, +coronal, +continuant, –high]

It is likely that other phonotactic constraints, for example sonority constraints, can also be expressed in terms of distinctive features. If this is true, then we


expect that sonority relations can be derived from the lexicon, since our corpus analysis showed that clusters that are more well-formed from the point of view of sonority are also more frequent.

6. Phonotactic learner simulations

The phonotactic learner is a model that derives and weights phonotactic constraints from positive evidence and was proposed by Hayes and Wilson (2008).10 The learner deduces these constraints from a corpus of words and an index of features that define the segments in the corpus. The features are used to create a list of natural classes of segments that occur in the corpus. The search algorithm then looks for constraints that cover a large number of words in the lexicon using constraints that are as economical as possible – the natural classes it uses contain as few features as possible. The algorithm can be constrained to stop after a specified number of constraints are found and also to only look for constraints consisting of 1, 2, 3 or 4 natural classes. The weights are established by the algorithm in such a way as to maximize the probability of the words in the lexicon (Hayes and Wilson 2008). The grammar establishes the harmony of each word on the basis of its score, which is the sum of the number of constraint violations multiplied by the weight of each constraint. The learner creates a constraint-based grammar (Prince and Smolensky 1993/2004) that uses weighted constraints (Legendre, Miyata and Smolensky 1990c; Jäger 2007; Smolensky and Legendre 2006; Coetzee and Pater 2008; Potts et al. 2010; Pater 2009). Such a grammar can adequately deal with variation and gradience (Boersma 1998a; Boersma and Hayes 2001; Coetzee and Pater 2008; Potts et al. 2010; Pater 2009), and in addition is proven to converge (Goldwater and Johnson 2003; Jäger 2007). In our simulations we used our corpus of words as lexicon. We created a feature matrix based on a standard analysis of German (Wiese 1996; see the Appendix below). We used two different feature matrices, one with a feature for [syllable boundary] and one without. The phonotactic learner generalizes over linear strings without prosodic structure (Chomsky and Halle 1968). In order to simulate prosodic structure we added the feature [syllable boundary] to the feature matrix, but we do not claim that syllable structure in German is contrastive. We also used two different lexicons, one with syllable structure and one without. The syllable structure was indicated by an extra segment which 10. The software used in running the simulations can be downloaded from: http://www.linguistics.ucla.edu/people/hayes/Phonotactics/index.htm


was only characterized by the feature [+syllable boundary].11 This was done because word-internal onsets often have a simpler structure. We terminated the learner at four different points: after it had found 100, 200, 300 and 400 constraints. We specified that the constraints consisted of sequences of maximally 2 or maximally 3 natural classes. We obtained a grammar for each of these specifications.

6.1. Phonotactic learner results

After training we used the grammar created by the algorithm to assess the words of the experiment. To this end, we fed each grammar the words of the experiment. The weighted constraints of each grammar then assigned a score to each word. The scores assigned to the words by each grammar with 400 constraints are given in Table 3: one set of scores for the bigram grammars with and without the feature [+syllable boundary] and one set of scores for the trigram grammars with and without it. The lower the score, the more harmonic a word is. In other words, if the sonority preferences of German native speakers derive from their lexicon, there should be a negative correlation between the score of a word and its odds.
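To make the scoring explicit: in Hayes and Wilson's (2008) maxent formulation the score of a word is a weighted sum of constraint violations, and the probability of a word falls off exponentially with its score,

\[ h(w) = \sum_i w_i \, C_i(w), \qquad P(w) = \frac{e^{-h(w)}}{Z}, \]

where C_i(w) is the number of times w violates constraint i, w_i ≥ 0 is the weight of that constraint, and Z normalizes over the space of possible strings. Lower scores thus correspond to more harmonic and more probable words.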

Table 3. Scores of the test words assigned by different grammars consisting of 400 constraints

      2grams, –syl.   2grams, +syl.   3grams, –syl.   3grams, +syl.   odds
kl    4.257           1.572           0.00            0.00            4.9
kv    4.654           1.938           1.779           2.734           1.6
km    6.91            7.331           5.999           10.201          0.9
ml    6.34            8.43            4.327           15.151          0.7
mt    6.623           3.536           10.802          22.017          0.5
kt    5.09            2.684           8.588           18.353          0.5

The trigram grammars, based on constraints that span maximally three segments, do better than the bigram grammars, based on constraints that span maximally two segments. This is because bigram grammars rate the clusters [mt] and [kt], which occur frequently at the end of a word, the same as, or better than, [km] and [ml] clusters. Trigram grammars, on the other hand, take into account that [mt] and [kt] are never followed by a vowel – they never occur word-initially. Native speakers have a preference for [km] and [ml] clusters over 11. We collapsed all vowels in the lexicon, because otherwise the number of features was too large for the phonotactic learner to work with. It ran out of working memory after having found only 50 constraints.


[kt] and [mt]. The trigram grammars are, therefore, a better model of the intuitions of native speakers. Moreover, the difference between a grammar based on a lexicon and a feature set with and without the feature [+syllable boundary] seems relatively small. Both grammars rate [mt] onsets as worst, followed by [kt], with [km] and [ml] in between and [kv] and [kl] clusters as best. In Table 4 we show the scores of grammars without the feature [+syllable boundary] that contain 100, 200, 300 and 400 constraints; the third column in Table 3 corresponds to the fourth column in Table 4. Table 4 shows that trigram grammars without the feature [+syllable boundary] accurately model the responses of the participants in the rating experiment. The grammars with 200, 300 and 400 constraints model the ratings of the participants somewhat better than a grammar of 100 constraints, but there is little difference between grammars of 200, 300 and 400 constraints.

Table 4. Scores of the test words assigned by different grammars consisting of 100, 200, 300 and 400 trigram constraints without the feature [+syllable boundary]

              100         200          300         400         odds
kl                                                 0.00        4.9
kv            2.265       2.415                    1.779       1.6
km            3.386       2.934        6.490       5.999       0.9
ml            4.412       4.177        4.572       4.327       0.7
mt            12.046      10.233       10.876      10.802      0.5
kt            8.010       7.780        6.492       8.588       0.5
correlation   .73         .81          .80         .84
              (p = .01)   (p < .001)   (p = .01)   (p = .006)
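To make the bigram/trigram difference concrete, consider two hypothetical constraints of the kind the learner works with (illustrative only; these are not constraints actually reported by the learner):

*[+nasal][–continuant]                     bigram: penalizes [mt] everywhere, including well-formed word-final Amt
*[+nasal][–continuant][–consonantal]       trigram: penalizes [mt] only when a vowel follows, i.e. in onset position

The trigram version encodes exactly the contextual information – 'prevocalic' – that the bigram version lacks.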

A correlation of .84 is achieved between the ratings and the scores of the unsyllabified words in a grammar of 400 constraints. Each of these grammars used trigram constraints. The trigram grammars correlate well with the ratings, but the bigram grammars do not: there is no correlation between the ratings and the scores of a bigram grammar with 300 constraints, and only a non-significant correlation of .36 between the ratings and the scores of a bigram grammar with 400 constraints.
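The chapter does not state which correlation coefficient was used, nor whether it was computed over the six cluster means or over all 24 test words. Assuming, purely for illustration, a Pearson correlation over the cluster means of Table 4 against the odds of Table 2, the check can be sketched as follows:

```python
from scipy.stats import pearsonr

# 400-constraint trigram scores (Table 4) and rating odds (Table 2), per cluster.
scores = [0.00, 1.779, 5.999, 4.327, 10.802, 8.588]   # kl, kv, km, ml, mt, kt
odds = [4.9, 1.6, 0.9, 0.7, 0.5, 0.5]

r, p = pearsonr(scores, odds)
print(r, p)   # a sizeable negative correlation is expected
```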

7. Discussion

Our study has shown that the preferences for onset clusters can be derived from the lexicon. This is shown by the high correlation between the ratings of clusters and their assessment by the phonotactic learner. The lexicon has enough structure to allow native speakers to derive the Sonority Sequencing Principle, provided it is analyzed in terms of distinctive features, a conclusion also reached


for English by Daland et al. (2011). This is indicated by our study of the corpus, by which we showed that the sonority scale in (1) not only describes the relative well-formedness of onsets, but also their frequency. We ran a learning simulation to establish whether the lexicon provides enough evidence for native speakers to derive intuitions concerning the Sonority Sequencing Principle. We used the phonotactic learner (Hayes and Wilson 2008) to derive a grammar from our corpus and a feature matrix, which describes the segments in the corpus. There was a high correlation between the assessment of the test words by this grammar and the assessment of the test words by native speakers. This shows that the intuitions of native speakers can be derived from the lexicon. The best correlation was achieved by a grammar that uses constraints that are deduced from sequences of three segments (trigrams) rather than constraints deduced from sequences of two segments (bigrams). The syllable structure may be learned, we think, from trigram constraints. The problem with bigram constraints is that German has many words that end in [kt] ([pAkt] Pakt 'agreement') and [mt] ([Amt] Amt 'profession') clusters. Such clusters are, of course, grammatical, but only at the end of a syllable. However, the bigram constraints do not include the crucial contextual piece of information, and these grammars rate sequences of [kt] or [mt] independently of whether they occur prevocalically, where they are ungrammatical, or postvocalically, where they are grammatical. If the constraints consist of trigrams, word-final clusters include information about the end of the word, and [kt] or [mt] are therefore not accepted at the beginning of a syllable. We implemented syllable structure by means of a feature [+syllable boundary], but found that it did not improve our results. This is because the constraints based on trigrams have enough contextual information – specifically the notions 'postvocalic' and 'prevocalic' – to mimic the effect of syllable structure. Since sonority can be expressed in terms of distinctive features, native speakers can derive restrictions on onsets due to sonority from their lexicon. If sonority is understood as sound level protrusions – relative loudness – (Parker 2002, 2008) and these are represented in features, sonority can be derived from the lexicon. It may turn out that sound level protrusion is but one aspect of sonority, and that other perceptual effects also play a role in determining the phonotactics of consonant sequences. Henke, Kaisse and Wright (this volume) argue that sound level protrusion is one of the perceptual cues that determines the phonotactics of segment sequences and that in addition the perceptual cues to place, manner and their interplay influence the phonotactics of consonant sequences. Native speakers can infer the phonotactics of consonant sequences from their lexicon even with these additional perceptual cues, as long as these cues are represented as distinctive features.

8. Conclusion

Preferences for sonority profiles can be derived from the lexicon, provided speech sounds are represented in the lexicon with distinctive features. If this is true, then the phonotactics of languages can be described by statements that prohibit sequences of natural classes of sounds (Daland et al. 2011). These statements allow native speakers to have intuitions about the phonotactics of their language that go beyond what is directly observable in the words of the language. Even if German does not have words that start with [ml] or [mt], this does not mean that Germans cannot use the lexicon to assess that [ml] is slightly more well-formed than [mt]. In fact, our study suggests that phonotactic generalizations can be deduced from the lexicon if the words in it are analyzed in terms of distinctive features. The high correlation between the ratings of words and the assessment of these words by a phonotactic grammar shows that this analysis has some merit. It is striking that the sonority preferences of languages are very similar (Sievers 1876/1893; Jespersen 1904; Selkirk 1982; Clements 1990; Blevins 1995). This raises the question of why lexicons across the world seem to adhere to the Sonority Sequencing Principle. There are presently two competing theories available that address this question. In one theory, called Phonetically Based Phonology, the phonology is derived from phonetics (Hayes and Steriade 2004), and in the other, called Evolutionary Phonology, typological patterns are the result of language change (Blevins 2004, 2006). In Phonetically Based Phonology speakers analyze the words in their lexicon in terms of distinctive features and they make generalizations about their distribution. These distinctive features are phonologized information about the perception and production of speech sounds, and their distribution is to some extent determined by their perception and production. The preference for large sonority profiles is therefore a consequence of our knowledge of what is easy to perceive and easy to produce, coupled with an active avoidance of sound patterns that are difficult to produce or perceive. Since human beings all have the same production and perception system, this would explain the similarity across languages (Hayes and Steriade 2004). In Evolutionary Phonology (Blevins 2004, 2006) the explanation is historical. Clusters that have a large sonority profile, [kö] for example, are more frequent cross-linguistically because they are less likely to undergo changes. Clusters with smaller sonority profiles, such as [kn] for example, are more likely to be changed in the history of a language. Changes occur because clusters with smaller sonority profiles are more likely to be perceived as having an epenthetic vowel, or because they are more likely to be produced with an epenthetic vowel. There are no grammatical pressures to
produce a certain pattern. There is evidence in other work, though, that children and adults are more likely to generalize phonetically likely patterns than phonetically unlikely patterns (Wilson 2006; van de Vijver and Baer-Henney 2011; Baer-Henney and van de Vijver to appear). On the basis of the evidence provided in this chapter we cannot decide between these proposals. We know that knowledge of sonority may come from the lexicon, but we do not know why lexicons are structured the way they are.

Appendix

Features of the German consonant system as used by the phonotactic learner. The features are slightly adjusted from Wiese (1996). [The appendix table cross-classifies the segments p, b, t, d, k, g, f, v, s, z, S, Z, ç, J, x, G, h, pf, ts, m, n, N, l, ö and the syllable-boundary symbol $ by the features voice, consonantal, obstruent, continuant, delayed release, nasal, labial, coronal, dorsal, front, and syllable boundary; the individual +/− specifications are not recoverable from this scan.]

Abbreviations

IPA   International Phonetic Alphabet

Acknowledgments

We’d like to thank Laura Hahn and Kirill Elin for their help in running the experiment, Ralf Ditsch and Frank Kügler for their help in recording the material, Stella Gryllia for furnishing the Greek examples and Agata Renans for furnishing the Polish examples. We also would like to thank Steve Parker and Aditi Lahiri for their constructive comments on our chapter.

Part 2: Sonority and Phonetics

Sonority and central vowels: A cross-linguistic phonetic study

Matthew Gordon, Edita Ghushchyan, Bradley McDonnell, Daisy Rosenblum and Patricia A. Shaw

Abstract. This paper reports results of a cross-linguistic study of four potential acoustic correlates of vowel sonority. Duration, maximum intensity, acoustic energy, and perceptual energy are measured in five languages (Hindi, Besemah, Armenian, Javanese, and Kwak’wala) in order to determine whether there is an acoustic basis for the position of schwa at the bottom of vocalic sonority scales. The five targeted languages belong to two groups. In three languages (Armenian, Javanese, and Kwak’wala), the reduced phonological sonority of schwa relative to peripheral vowels is manifested in the rejection of stress by schwa. In two languages (Hindi and Besemah), on the other hand, schwa is treated parallel to the peripheral vowels by the stress system. Results indicate that schwa is differentiated from most vowels along one or more of the examined phonetic dimensions in all of the languages surveyed regardless of the phonological patterning of schwa. Languages vary, however, in which parameter(s) is most effective in predicting the low sonority status of schwa. Furthermore, the emergence of isolated contradictions of the sonority scale whereby schwa is acoustically more intense than one or more high vowels suggests that phonological sonority in vowels may not be quantifiable along any single acoustic dimension.

1. Introduction

Sonority refers to the relative prominence of different sounds. Scales based on sonority have proven very useful in characterizing a wide range of phonological phenomena, including syllabification, phonotactic constraints, and stress (see Parker 2011 for an overview of sonority in phonological theory). One of the productive research programs belonging to the study of sonority is the examination of the physical properties defining sonority distinctions (Parker 2002, 2008, 2011). This paper contributes to this research agenda by examining the acoustic basis for one type of sonority distinction that is particularly important in the description of many stress systems: the sonority contrast between central and peripheral vowels. An acoustic study of five languages shows that a number


of acoustic properties successfully predict sonority distinctions based on vowel quality, though there is no single parameter that correlates in all five languages with all of the sonority distinctions involving vowels. Central non-low vowels such as /@/ and /1/ rank lower on many sonority scales than more peripheral vowel qualities (Parker 2002, 2008). Much of the evidence for their sonority profile is drawn from stress systems in which unstressed vowels reduce to schwa, e.g. in English, and languages in which central vowels reject stress in words containing at least one peripheral vowel (see Kenstowicz 1997 and de Lacy 2002, 2004 for overviews), e.g. Mari (Itkonen 1955, Kenstowicz 1997), Javanese (Herrfurth 1964, Horne 1974), Aljutor (Kodzasov and Muravyova 1978, Kenstowicz 1997). For example, stress in most varieties of Armenian (Vaux 1998b) falls on the final syllable (1a) unless this syllable contains schwa (1b).

(1) Armenian stress (examples from Vaux 1998b: 132)
    a. [mo"rukh] 'beard', [ArtA"sukh] 'tears', [jerkrAkedronA"kAn] 'geocentric'
    b. ["mAn@r] 'small', [jer"phem@n] 'sometimes'

The sonority distinction between interior and peripheral vowels potentially presents a challenge to the quantification of sonority in terms of acoustic phonetic scales, because intensity, the most reliable acoustic correlate of sonority (Parker 2002, 2008), has been shown to be greater for mid-central vowels like schwa than for high peripheral vowels due to the increased degree of vocal tract aperture associated with central vowels relative to high vowels. Furthermore, the cross-linguistic tendency for lower vowel qualities to be longer than higher vowels might superficially appear to preclude another potential phonetic correlate of sonority, duration, from predicting the reduced phonological sonority of non-low central vowels. An articulatory-based measure of sonority is potentially informative as a predictor of the phonological status of mid-central vowels like schwa since the tongue position associated with such vowels is closer to its default location in the center of the vocal tract. This holds true of all schwas regardless of whether they are underlying or the result of vowel reduction, epenthesis or excrescence triggered by a consonant (see Silverman 2011 for an overview of the various sources of schwa). Mid-central vowels thus require less movement of the tongue, and presumably less articulatory effort, than their more peripheral counterparts requiring vertical or horizontal movement of the tongue and jaw. Nevertheless, although an articulatory-driven account of the behavior of central vowels is intuitively appealing, it suffers from the notorious difficulty as-


sociated with quantifying physical effort. In contrast, measuring sonority along various acoustic dimensions is a far more tractable endeavor even if historically it has been a difficult task to pinpoint a single acoustic property that predicts all sonority distinctions (see Parker 2002, 2008 for discussion of this research program). For this reason, we believe it is worthwhile to exhaustively explore the potential acoustic basis for sonority before appealing to more elusive articulatory-based accounts. The merits of pursuing an acoustically-driven analysis of sonority are further justified by an extensive literature proposing numerous acoustic correlates of sonority (see Parker 2002, 2008 for an overview). Although most of these proposals are not accompanied by supporting acoustic evidence, Parker's (2002) multi-dimensional phonetic examination of sonority in English and Spanish finds that a measurement of intensity correctly predicts the order of most classes of segments in cross-linguistic sonority scales assuming that other factors such as pitch are held constant. Building on his earlier work, Parker (2008) expands his study to include data from Quechua in addition to Spanish and English. He shows that a function based on a measure of intensity extremes, the peak intensity of vowels and the intensity nadir of consonants, fits closely to established sonority hierarchies. Jany et al. (2007) find a similarly close fit between Parker's intensity-based equation and data from consonants in four additional languages (Mongolian, Hindi, Arabic, and Malayalam), though they employ mean RMS amplitude rather than intensity extremes. Given the demonstrated success of a measure of acoustic intensity as a correlate of sonority, we adopt the working hypothesis that the sonority of central vowels is predictable on acoustic grounds. Nevertheless, we will also explore the possibility that intensity is not the sole acoustic dimension on which sonority is projected.

2. The sonority of schwa

Vowel sonority adheres to a hierarchy predictable from height and centrality (Kenstowicz 1997, de Lacy 2002, 2004, Gordon 2006) as shown in Figure 1.

High sonority                                                          Low sonority
Low V (æ, a)  >  Mid V (e, o)  >  High V (i, u)  >  Mid-central V (@)  >  High-central V (1)

Figure 1. Sonority scale for vowels


Low vowels such as /æ, a/ are the highest sonority vowels, followed by peripheral mid vowels /e, o/, followed by the peripheral high vowels /i, u/, the mid central vowel /@/, and, at the bottom of the sonority scale, the high central vowel /1/. This hierarchy is deducible from cross-linguistic observation of stress systems (see Kenstowicz 1997, de Lacy 2002, 2004), although most languages do not distinguish all levels of the hierarchy. The Kobon stress system (Davies 1981, Kenstowicz 1997) appears exceptional in exploiting all five distinctions in the hierarchy in Figure 1. Despite the intuitive basis of the vowel sonority scale in Figure 1, its phonetic grounding is not entirely transparent and, like consonantal sonority, may not be reducible to a single phonetic parameter (Ohala 1990, Ohala and Kawasaki-Fukumori 1997). An inverse correlation between vowel height and both duration and intensity is well established (Lehiste 1970), such that lower vowels are longer and more intense than higher vowels. However, a purely height-based correlation between the phonetic properties of duration and intensity and the phonological feature of sonority is insufficient to account for the reduced sonority of the mid central vowel /@/ relative to the high peripheral vowels /I, U/. This problem is apparent in the classic publications by Gordon Peterson and Ilse Lehiste (Lehiste and Peterson 1959, Peterson and Lehiste 1960), which show that stressed schwa in English, which is found in many British varieties of English including RP, is characterized by greater duration and/or average intensity (RMS amplitude) than many other more peripheral vowels. Intensity, the dimension shown by Parker (2002, 2008) to be the best predictor of sonority, is found by Lehiste and Peterson (1959) to be 5–8 decibels greater (depending on the experimental condition) for schwa than for the high vowels /I, U/. On the other hand, it is clear that unstressed schwa in English is shorter and less intense than its stressed counterparts. Parker (2008) finds that peak decibel levels for schwa are 2.3dB less than those averaged over the high vowels /i, I, u, U/ in American English. He also finds that barred-1, e.g. in the second syllable of words like roses, is a further 3.7dB less intense than schwa in keeping with the lighter status of the high central vowel relative to the mid central vowel in Kobon. It is unclear, however, whether the lower intensity of the central vowels relative to their more peripheral counterparts in Parker’s study is due to vowel quality or to stress since the two types of vowels are in virtual complementary distribution in American English. Gordon (2002, 2006) presents phonetic data comparing schwa and the peripheral vowels /a, i/ of Javanese, a language that treats schwa as light in its stress system. He finds that schwa is much shorter and has less overall perceptual energy than the peripheral vowels. He does not, however, present intensity data independent of his measure of auditory energy, which is a temporal inte-


gration of intensity transformed to reflect auditory loudness. It is thus unclear how much of the energy difference in his data is due to the shorter duration of schwa as opposed to its reduced acoustic intensity. Comparison of the Lehiste and Peterson (1959, 1960), Gordon (2002, 2006), and Parker (2008) studies suggests that schwa may be phonetically quite different between languages in keeping with differences in its phonological status (see Silverman 2011 on the phonological behavior of schwa cross-linguistically). On the one hand, in languages like British English in which it is an underlying phoneme that may carry stress, schwa may display properties that are expected of its more peripheral mid vowel counterparts. On the other hand, in languages in which schwa resists stress whether underlying, as in Javanese, or the result of vowel reduction, as in American English, it appears to display phonetic characteristics that make it less prominent than not only the mid but also the high peripheral vowels. It is nevertheless unclear given the varied measurements taken in previous research (average intensity, duration, peak intensity, perceptual energy) exactly how central vowels phonetically differ on a language-specific basis. The current work seeks to remedy this lacuna in our understanding of the phonetic basis for the sonority of the mid central vowel schwa by examining its characteristics in several languages. Of particular interest is the phonetic sonority of schwa relative to high peripheral vowels, which might be expected to be phonetically less prominent than schwa, but which nevertheless occupy a higher position on phonological sonority scales. The set of studied languages includes those in which schwa behaves as a lower sonority vowel than more peripheral vowels as well as those in which schwa lacks any phonological characteristics suggesting that it is less sonorous than other vowel qualities. We explore various potential phonetic correlates of sonority in both types of languages in order to determine whether there is any universal correlate of sonority that predicts its assumed position at the bottom of the sonority scale for vowels or whether its phonetic characteristics differ across languages according to its phonological sonority. More generally, the present work also provides an opportunity for cross-linguistic investigation of the phonetic correlates of sonority in all vowels, including peripheral vowels of different heights as well as pairs of vowels distinguished on the basis of the tense/lax feature. Cross-linguistic examination of the phonetic basis for the sonority of schwa and other vowels promises to shed light on broad issues related to the mapping between phonetic properties and phonological patterns and the role of language-specificity in the phonetic and phonological domains.

3. Methodology

Five languages were targeted for inclusion in the study. Three of these languages, Armenian, Javanese, and Kwak’wala, possess phonemic schwa that asymmetrically rejects stress in contexts in which more peripheral vowels attract stress. The other two languages, Besemah and Hindi, have a schwa phoneme that does not display any propensity to reject stress. The targeted languages, including genetic affiliation (according to the 16th edition of the Ethnologue, online at http://www.ethnologue.com), primary places where they are spoken, and sources of data on each, are summarized in Table 1. Further information concerning each language and the corpus of data analyzed for each is presented in the respective results sections (section 4) for individual languages.

Table 1. Languages targeted in the phonetic study

Language    Genetic affiliation   Primary location     Primary data source(s)
Armenian    Indo-European         Armenia              Vaux (1998b)
Besemah     Austronesian          Sumatra, Indonesia   McDonnell (2008)
Hindi       Indo-European         India                Dixit (1963), Kelkar (1968), Ohala (1977, 1999)
Javanese    Austronesian          Java, Indonesia      Clynes and Rudyanto (1995), Horne (1974)
Kwak’wala   Wakashan              British Columbia     Boas (1947), Bach (1975), Wilson (1986), Shaw (2009)

225

mum intensity, and average intensity) were calculated using Praat (Boersma and Weenink 2010), while the last two, the intensity and perceptual energy summations, were computed using Cricket, custom software developed at UCSB (Gordon and Nash 2007; downloadable at http://www.linguistics.ucsb.edu/faculty/ gordon/projects.htm). This software is designed to perform an intensity summation as well as an auditory transform designed to capture the perceptual prominence of a sound rather than its physical intensity (see below for further discussion). Most of the targeted measurements are straightforward with the exception of auditory energy, which is described in detail below. The beginning and end points of the target vowels were marked using a waveform in conjunction with a time-aligned spectrogram with the second formant onset and offset serving as the demarcation point for the beginning and end of the vowel, respectively. Duration, maximum intensity, and first formant values were collected in Praat using a script. Values for the first formant were calculated over a 25 millisecond window centered on the midpoint of the vowel. Formant values were collected in order to assess whether any differences in the other measured properties might be predictable from phonetic differences in vowel quality that might not emerge in broad phonemic transcriptions used in phonological descriptions. Total acoustic intensity was calculated in Cricket by sliding an 11.6millisecond (256points) window over the entire duration of each target vowel with a new window starting at the end point of the previous one. Within each window, intensity was averaged over the frequency range of 0–10 kHz with a resolution of 86Hz. The average intensity values for all the windows were summed together to yield a total intensity value integrated over time. In case the duration of the vowel was not a multiple of 11.6ms, the last window factored into the summation was smaller than 11.6ms. We turn now to the measure of total auditory energy. The auditory energy values in the current work are based on power spectra calculated using the same sliding 11.6millisecond (256points) window used to perform the intensity integration. Within each window, intensity was calculated throughout the frequency range of 0–10kHz with a resolution of 86Hz. Spectra were computed for successive windows stretching over the entire duration of each target rime. These spectra were submitted to a series of filters representing various processes that take place in the mapping of an acoustic signal to an auditory one. The first two stages in the auditory transform model the bandpass filtering properties of the outer and middle ear. The first filter is an outer ear filter capturing the bandpass filtering characteristics of the pinna and the outer auditory canal (the meatus). The natural resonating frequency of the outer ear is about 2.5kHz with an approximately 10dB per octave attenuation on either side of 2.5kHz (Shaw 1974).

226

Matthew Gordon et al.

The lower skirt of this filter becomes flat at 1.25kHz, one octave below 2.5kHz. The next filter represents the bandpass filter provided by the middle ear, where pressure fluctuations on the eardrum are converted to mechanical energy by the ossicles. The middle ear is a maximally effective transducer of energy at approximately 1.5kHz, with a 15dB per octave attenuation at frequencies above and below 1.5kHz (Nedzelnitsky 1980). Because a greater proportion of the measured frequency range falls above the center frequency of 1.5kHz, the result is a greater relative diminution of energy at higher frequencies. The next step in the auditory transform models the bandpass filtering characteristics of the auditory system (Patterson et al. 1982, Moore and Glasberg 1983). Cricket uses a symmetric filter with a 60dB per octave attenuation linearly interpolated from the center frequency to the base of the skirts, which increase in breadth as the center frequency increases. The filter was slid over the entire frequency range of each spectrum starting at the highest frequency and working downward calculating the attenuation of each intensity value in the range of frequencies affected by the filter (Bladon and Lindblom 1981). The net response at any frequency was the sum of the responses to the filter as it progresses through the frequency domain. The overall loudness of each spectrum was then calculated by summing the outputs of all the bandpass filters. The next step involves the modeling of temporal effects in the auditory response as adaptation and recovery functions (e.g. Plomp 1964, Wilson 1970, Viemeister 1980). The adaptation function captures the gradual decline in sensation to a continued stimulus, while the recovery function reflects the boost in auditory response after a reduction in stimulus intensity. In the present model, adaptation and recovery were implemented as follows. First, the total loudness value for the second spectral slice in the rime is compared with the loudness values for the first spectral slice. If the loudness of the second frame exceeds that of the first frame, the difference in loudness between the two frames is multiplied by a recovery factor yielding a value that is added to the loudness value of the second frame to yield an output loudness value for the second frame. If, however, the loudness of the first frame is greater than that of the second frame, the difference in loudness between the two frames is multiplied by an adaptation factor that is subtracted from the loudness value of the second frame to yield an output loudness value for the second frame. The loudness of the third frame is compared with the output loudness value averaged over the previous two frames. This procedure proceeds from left to right throughout the entire duration of the rime by comparing the loudness of a given spectrum with a baseline loudness value reflecting the average of the output loudness values for all the previous spectra. An adaptation factor of 2decibels per frame was employed, while a recovery factor of 1decibel per frame was assumed based on a synthe-


sis of results from a variety of sources (Plomp 1964, Wilson 1970, Viemeister 1980). The final step in the auditory model is a simple summing of the auditory loudness values for each spectrum, thereby yielding a single measure of auditory energy. Implementation of the auditory model potentially has a number of effects on the acoustic data feeding into it. We describe here some of these effects and their relationship to predictions about vowel sonority, focusing first on the frequency dependencies. The outer and middle ear filter provide a boost in loudness to sounds characterized primarily by energy in the bottom half of the examined 0–10 kHz frequency range. In particular, frequencies falling between the peak of the middle ear filter at 1.5kHz and the peak of the outer ear filter at 2.5kHz receive the greatest boost. This frequency selectivity potentially accounts for the propensity of lower vowel qualities to attract stress in several languages. The first and the second formants for a low central vowel like the prototypical one found in most languages with a single low vowel lie close together near to the 1.5kHz peak associated with the middle ear filter. For this reason, low vowels are perceived as louder than higher vowel qualities. In contrast, the first formant for higher vowel qualities is much lower than 1.5kHz and would not benefit from the auditory boost. We also might expect high front vowels to have greater auditory energy than high back vowels due to the location of the second formant for high front vowels in the 2kHz to 3kHz range. The damping of acoustic energy associated with increases in frequency, however, potentially offsets any auditory advantage of the more forward articulation in the case of high vowels. The predictions relating vowel backness to auditory energy are thus less clear-cut. Central vowels like schwa might also be predicted to receive a perceptual boost relative to high back vowels since their second formant values are closer to the 1.5kHz center frequency of the middle ear filter. The auditory prominence of central vowels is potentially compromised, however, by two properties. First, they are often shorter than other vowels, at least in languages such as Javanese (Gordon 2002, 2006), in which they pattern as low sonority vowels in the stress system. Second, it is conceivable that the acoustic intensity of central vowels is low enough relative to other vowel qualities, again perhaps on a languagespecific basis, to offset any perceptual boost they might receive due to their distribution of energy. Comparison of the acoustic measurements of duration and intensity with perceptual energy values in the present work will allow for assessment of the relative contribution of different acoustic parameters to the overall perceptual prominence of vowels. Furthermore, perceptual energy is another potential phonetic correlate of sonority whose efficacy in predicting phonological sonority can be compared with that of acoustic properties.
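As a rough sketch of this measurement pipeline (a simplification, not the Cricket implementation: it assumes a 22.05 kHz sampling rate so that 256 points come to roughly 11.6 ms, folds the outer- and middle-ear filters into one per-frequency gain, analyzes the full 0 to SR/2 range rather than cutting off at 10 kHz, and omits the cochlear bandpass-filter stage; the adaptation and recovery factors follow the values given above):

```python
import numpy as np

SR = 22050    # sampling rate (Hz) assumed here so that 256 points ~ 11.6 ms
WIN = 256

def frame_spectra(x):
    """Non-overlapping 256-point windows -> per-frame power spectra in dB."""
    nframes = len(x) // WIN
    frames = x[:nframes * WIN].reshape(nframes, WIN)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-12)

def total_acoustic_energy(x):
    """Mean dB per frame, summed over frames: intensity integrated over time."""
    return float(frame_spectra(x).mean(axis=1).sum())

def ear_filters(freqs):
    """Outer ear (~2.5 kHz peak, 10 dB/octave skirts, lower skirt flat below
    1.25 kHz) plus middle ear (~1.5 kHz peak, 15 dB/octave), as dB gains."""
    f = np.maximum(freqs, 1.0)
    outer = -10.0 * np.abs(np.log2(f / 2500.0))
    outer[freqs < 1250.0] = -10.0
    middle = -15.0 * np.abs(np.log2(f / 1500.0))
    return outer + middle

def total_perceptual_energy(x, adapt=2.0, recover=1.0):
    """Filtered frame loudness with adaptation/recovery, summed over frames."""
    spectra = frame_spectra(x)
    freqs = np.fft.rfftfreq(WIN, 1.0 / SR)
    loud = (spectra + ear_filters(freqs)).mean(axis=1)
    out = []
    for L in loud:
        if not out:
            out.append(float(L))
            continue
        baseline = float(np.mean(out))   # average of previous output values
        if L > baseline:                 # recovery: boost a rise in loudness
            out.append(float(L) + recover * (float(L) - baseline))
        else:                            # adaptation: damp sustained loudness
            out.append(float(L) - adapt * (baseline - float(L)))
    return float(sum(out))

rng = np.random.default_rng(0)
token = rng.standard_normal(SR // 10)    # 100 ms of noise as a stand-in vowel
print(total_acoustic_energy(token), total_perceptual_energy(token))
```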

4. Results

Sections 4.1–4.5 present the results for the five languages targeted in our study, beginning in sections 4.1 and 4.2 with those for languages (Hindi and Besemah) in which schwa does not pattern differently from other vowels with respect to stress. In sections 4.3–4.5, we move on to languages in which schwa tends to reject stress (Armenian, Javanese, and Kwak’wala).

4.1. Hindi

4.1.1. Background

Standard Hindi has a ten vowel system in the native vocabulary that is based partially on length and partially on vowel quality. Most of the peripheral vowels come in pairs, one of which is slightly more peripheral, i.e. tense, as well as longer than the other (lax) member of the pair (Ohala 1999). There are two central vowels, a schwa and a low vowel. The vowel phonemes of Hindi appear in Table 2 following the conventions of Ohala (1999) with the exception of replacing /A/ with /a/.

Table 2. Vowels of Hindi

        Front   Central   Back
High    i I               u U
Mid     e E     @         o O
Low             a

There is disagreement in the literature about the exact principles governing the location of stress in Hindi (see Ohala 1977 and Hayes 1995 for overview and analysis), though the weight criterion described is relatively consistent across accounts. Most scholars are in agreement that both closed syllables and those containing a tense vowel are treated as heavy, with certain scholars (e.g. Kelkar 1968) pointing to a third superheavy degree of weight assigned to syllables closed by two consonants and to closed syllables containing a tense vowel. The simplest characterization of the primary stress rule is the one adopted by Dixit (1963), as discussed in Ohala (1977). Stress falls on the rightmost non-final heavy syllable (2a) or on the final syllable if the only heavy syllable is final (2b). In words in which all syllables are light, stress falls on the penultimate syllable (2c). Examples are from Ohala (1977).

(2) Hindi stress
    a. [a"v@Sj@k] 'necessary', ["kim@t] 'price', [pa"kIstan] 'Pakistan', [In"sanIj@t] 'humanity'
    b. [go"b@r] 'cow dung', [r@"soi] 'kitchen', [@mI"ta] (proper name)
    c. [tO"lIja] 'towel'1
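The weight criterion just illustrated can be stated as a small function; the sketch below collapses the superheavy degree into 'H' and encodes only the Dixit (1963) variant summarized above:

```python
def hindi_stress(weights):
    """Index of the stressed syllable under the rule adopted by Dixit (1963):
    rightmost non-final heavy; else a final heavy; else the penult.
    `weights` is a list like ['L', 'H', 'H'] (superheavy collapsed into 'H')."""
    n = len(weights)
    for i in range(n - 2, -1, -1):      # rightmost NON-final heavy syllable
        if weights[i] == 'H':
            return i
    if n > 1 and weights[-1] == 'H':    # only heavy syllable is final
        return n - 1
    return max(n - 2, 0)                # all light: penultimate

# [pa"kIstan] L-H-H -> 1; [go"b@r] L-H -> 1; [tO"lIja] L-L-L -> 1 (penult)
assert [hindi_stress(w) for w in (["L", "H", "H"], ["L", "H"], ["L", "L", "L"])] == [1, 1, 1]
```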

As the examples in (2) show, schwa patterns with other lax vowels in attracting stress when it is followed by a coda consonant and rejecting stress when it is in an open syllable. Other variants of the stress system reported by Dixit (1963) differ in the location of primary stress (see Hayes 1995 for discussion), but crucially for our purposes, schwa patterns together with all non-schwa lax vowels with respect to its ability to attract stress.

4.1.2. Methodology

The target vowels for Hindi all appeared in the first and stressed syllable of a disyllabic word. Each measured vowel appeared in two words, one in which the target vowel occurred in a closed syllable and the other in which it was found in an open syllable; measured vowels were followed by a sonorant consonant (except for the vowel in the first syllable of "wESja 'prostitute'). The Hindi words were embedded in the middle of the carrier phrase ham ab ________ bolte hain 'We now say ________'. Each phrase and thus its embedded target word was read five times by a speaker of standard Hindi and recorded on a SONY DAT recorder in a soundproof booth at a sampling rate of 44.1 kHz using a headworn microphone (Shure SM10) before being converted to .wav files in preparation for acoustic analysis. Data from three male speakers were analyzed.

4.1.3. Results

Figure 2 depicts first formant values for the measured Hindi vowels averaged across the three speakers. Bars are ordered from left to right in order of height (from lower to higher vowels) and then frontness (front to back) with schwa on the far right. Individual speaker means along with the number of tokens and standard deviations appear in Table 3. As expected, given its low tongue body position, first formant values are highest for the low vowel /a/. The mid vowels, in turn, have higher first formant

1. In the appendix, Ohala (1977: 336) transcribes certain all-light words with antepenultimate stress, although her text description (p. 330) predicts penultimate stress for them.


Figure 2. First formant values averaged across five tokens each produced by three Hindi speakers. Whiskers indicate one standard deviation from the mean.

Table 3. Mean first formant values (in Hz) for three Hindi speakers

         Speaker M1           Speaker M2           Speaker M3
Vowel    N   Mean  Std.Dev.   N   Mean  Std.Dev.   N   Mean  Std.Dev.
a        10  729   25         10  789   31         10  739   21
E        10  550   106        10  544   39         5   629   28
O        10  547   64         10  620   142        10  718   51
e        10  516   113        9   400   16         10  457   49
o        10  476   44         9   403   28         10  466   15
I        10  412   15         10  431   38         8   462   88
U        10  402   32         9   381   25         10  469   56
i        10  352   25         9   416   66         10  358   28
u        10  352   30         8   383   62         9   381   37
@        10  576   31         9   604   33         10  614   52

values than the high vowels, although this relationship only holds for vowels of equivalent tenseness/laxness. The lax high vowels /I/ and /U/ interestingly do not have reliably lower first formant values than the tense mid vowels /e/ and /o/, suggesting that the contrast between these two sets of vowels resides at least partially in the second formant. Schwa occupies a height equivalent to that of the lax mid vowels /E, O/, lower than the tense mid vowels but considerably higher than the low vowel. Graphs depicting results for the four measured correlates of sonority appear in Figure 3, duration in the top left, maximum intensity in the top right, acoustic energy in the bottom left, and perceptual energy in the bottom right. Individual speaker values for each dimension are given in Tables 4–7.


Figure 3. Duration (top left), maximum intensity (top right), acoustic energy (bottom left), and perceptual energy (bottom right) values averaged across three Hindi speakers. Whiskers indicate one standard deviation from the mean.

Table 4. Mean duration values (in seconds) for three Hindi speakers

         Speaker M1            Speaker M2            Speaker M3
Vowel    N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a        10  0.116  0.011      10  0.134  0.013      10  0.178  0.022
e        10  0.089  0.005      9   0.093  0.023      10  0.142  0.017
o        10  0.103  0.015      9   0.101  0.019      10  0.144  0.019
E        10  0.091  0.013      10  0.082  0.014      5   0.081  0.014
O        10  0.119  0.012      10  0.109  0.008      10  0.164  0.024
i        10  0.077  0.014      9   0.079  0.022      10  0.143  0.010
u        10  0.083  0.020      8   0.083  0.019      9   0.137  0.009
I        10  0.055  0.011      10  0.053  0.015      8   0.069  0.027
U        10  0.050  0.014      9   0.049  0.010      10  0.064  0.011
@        10  0.078  0.010      9   0.082  0.010      10  0.081  0.015


Table 5. Mean maximum intensity values (in decibels) for three Hindi speakers

         Speaker M1            Speaker M2            Speaker M3
Vowel    N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a        10  78.5   1.0        10  83.7   0.9        10  84.2   2.9
e        10  77.3   2.8        9   79.1   1.6        10  79.0   1.8
o        10  71.5   1.8        9   80.1   3.1        10  80.8   2.4
E        10  78.2   2.7        10  77.1   1.1        5   82.0   3.1
O        10  75.5   2.9        10  78.2   2.2        10  81.6   2.2
i        10  70.2   3.2        9   77.5   4.8        10  76.1   1.8
u        10  67.9   1.5        8   73.7   1.1        9   73.7   3.6
I        10  74.9   3.2        10  75.8   2.5        8   79.8   4.6
U        10  70.0   2.3        9   75.9   2.4        10  76.5   3.7
@        10  75.8   6.2        9   76.5   4.4        10  81.1   2.8

Table 6. Mean total acoustic energy values (in decibel seconds) for three Hindi speakers

         Speaker M1             Speaker M2             Speaker M3
Vowel    N   Mean    Std.Dev.   N   Mean    Std.Dev.   N   Mean     Std.Dev.
a        10  76777   7806       10  94819   10505      10  126039   18538
e        10  60047   5742       9   64274   17181      10  94660    11761
o        10  61761   9364       9   66534   12431      10  93346    15738
E        10  59910   10184      10  55069   8914       5   60115    9216
O        10  73482   10606      10  68342   7213       10  106016   14825
i        10  46503   8780       9   53215   12261      10  88999    8872
u        10  44983   11856      8   50882   11229      9   79843    9784
I        10  37118   8911       10  34964   10131      8   49650    20924
U        10  29537   8774       9   32052   7280       10  41486    8327
@        10  50633   10075      9   54157   5489       10  57627    9426

Table 7. Mean perceptual energy values (in arbitrary units) for three Hindi speakers

         Speaker M1              Speaker M2              Speaker M3
Vowel    N   Mean     Std.Dev.   N   Mean     Std.Dev.   N   Mean     Std.Dev.
a        10  1.70E+6  3.87E+5    9   3.16E+6  5.30E+5    10  4.94E+6  1.65E+6
e        10  1.06E+6  3.77E+5    9   1.36E+6  3.80E+5    10  2.31E+6  4.88E+5
o        10  8.87E+5  2.54E+5    10  1.76E+6  8.11E+5    10  2.52E+6  1.16E+6
E        10  1.23E+6  4.71E+5    10  1.01E+6  2.59E+5    5   1.68E+6  3.99E+5
O        10  1.09E+6  2.56E+5    9   1.39E+6  3.28E+5    10  2.72E+6  6.27E+5
i        10  5.85E+5  1.75E+5    8   1.06E+6  2.74E+5    10  1.67E+6  5.36E+5
u        10  5.08E+5  1.35E+5    10  7.73E+5  2.66E+5    9   1.45E+6  5.73E+5
I        10  6.67E+5  2.44E+5    9   6.93E+5  3.46E+5    8   1.43E+6  7.43E+5
U        10  3.85E+5  1.54E+5    9   4.92E+5  2.09E+5    10  8.13E+5  3.34E+5
@        10  1.07E+6  6.27E+5    10  9.54E+5  2.46E+5    10  1.75E+6  6.03E+5


One-factor analyses of variance (ANOVA) conducted for each of the four phonetic parameters indicated a significant effect of vowel quality on all parameters: for duration, F(9, 275) = 37.312, p < .001; for maximum intensity, F(9, 275) = 16.171, p < .001; for acoustic energy, F(9, 175) = 35.793, p < .001; for perceptual energy, F(9, 275) = 24.086, p < .001. Table 8 summarizes the vowels differentiated at p < .05 or less according to Scheffe posthoc tests along each of the four phonetic parameters measured. "D" stands for duration, "I" for maximum intensity, "A" for acoustic energy, and "P" for perceptual energy. Sonority reversals, cases in which a phonetic parameter contradicts the ranking of two vowel qualities along phonological sonority scales, are indicated by "!" after the relevant parameter. Thus, for example, schwa has greater intensity than /u/ even though /u/ ranks higher on sonority scales.
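For readers who wish to reproduce this kind of omnibus test, a sketch with scipy follows; the values are illustrative placeholders drawn from the speaker means reported above rather than the per-token data actually analyzed, and the Scheffe posthoc step is not shown:

```python
from scipy.stats import f_oneway

# Illustrative duration values (seconds) for three of the ten vowel groups.
a = [0.116, 0.134, 0.178]
i = [0.077, 0.079, 0.143]
schwa = [0.078, 0.082, 0.081]

F, p = f_oneway(a, i, schwa)
print(F, p)
```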

a

e o

i u

DAP

IP DA

e DAP

o IAP D

i u DIAP DIAP DIAP DIAP DA I DA DIAP DA DIA DA DIAP I DA DAP I DA DA DI DA

DIAP DA DA DIA A! I!
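The ANOVA-plus-Scheffe procedure just summarized can be sketched as follows. The samples below are random placeholders rather than the measured Hindi values, and the Scheffe criterion is coded directly from its textbook form, not taken from whatever statistics package the authors used (which is not identified here).

```python
# A minimal sketch: one-way ANOVA over vowel categories plus Scheffe
# post-hoc pairwise tests. Toy data, not the measured Hindi values.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(0)
groups = {"a": rng.normal(0.14, 0.02, 10),
          "i": rng.normal(0.10, 0.02, 10),
          "@": rng.normal(0.08, 0.02, 10)}

k = len(groups)
N = sum(len(g) for g in groups.values())
F, p = stats.f_oneway(*groups.values())
print(f"F({k - 1}, {N - k}) = {F:.3f}, p = {p:.4f}")

# Within-group mean square, needed for the Scheffe criterion.
msw = sum(((g - g.mean()) ** 2).sum() for g in groups.values()) / (N - k)
crit = (k - 1) * stats.f.ppf(0.95, k - 1, N - k)   # Scheffe cutoff
for (va, ga), (vb, gb) in combinations(groups.items(), 2):
    t = (ga.mean() - gb.mean()) ** 2 / (msw * (1 / len(ga) + 1 / len(gb)))
    print(va, "vs", vb, "distinguished" if t > crit else "n.s.")
```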

The most reliable sonority distinction among the vowels is between the low vowel /a/ and all other vowels with /a/ having greater perceptual energy and, in most cases, greater duration, maximum intensity and acoustic energy, than other vowel qualities. There is also a somewhat weaker tendency for the mid vowels to be differentiated from the high vowels, in particular, the lax high vowels, although the relevant differentiating parameter(s) varies depending on the vowels involved. Although schwa is distinguished from most of the mid vowels (with the exception of /E/), the difference between schwa and the high vowels is not robustly manifested along the measured phonetic dimensions. In fact, there are two sonority reversals in which schwa is more prominent than a high vowel occupying a position higher on the sonority scale. Schwa thus has greater maximum intensity than /u/ and schwa has greater acoustic energy than /U/. On the other hand, these reversals are only reversals when considered from a phonological standpoint. Phonetically, the fact that schwa may be more prominent than high vowels along certain dimensions is not surprising given schwa’s lower tongue body position. Of the measured correlates of sonority, duration and acoustic energy make the most sonority distinctions, both distinguishing 25 of the 45 possible pairwise comparisons. Both fare well in differentiating the low vowel from most other vowels and distinguishing the mid vowels (with the exception of /E/) from the high vowels. Neither duration nor acoustic energy, however, differentiates the high lax vowels from schwa in the direction predicted by sonority scales. Maximum intensity and perceptual energy predict 16 and 12, respectively, of the pairwise comparisons. Of all the measures, perceptual energy is the best at distinguishing the low vowel from other vowels, although it does not distinguish any of the high vowels from schwa. Maximum intensity is the most successful differentiator of mid and high vowels, although it too fails to capture the lesser phonological sonority of schwa relative to the high vowels.

4.2. Besemah

4.2.1. Background

Besemah (McDonnell 2008) is an Austronesian language of Sumatra with a conservative vowel system consisting of four vowel phonemes (see Table 9) including schwa.

Table 9. Vowels of Besemah

        Front   Central   Back
High    i                 u
Mid             @
Low             a

Schwa contrasts with the other vowel phonemes in the penultimate syllable in both open and closed syllables, but is in complementary distribution with /a/ in word-final position with /a/ occurring in closed syllables and schwa in open syllables. Although the properties of word-level stress are not entirely clear, it is clear that in words in isolation stress falls on the final syllable, even when schwa is in the final syllable (3).

(3) Besemah stress
    [ti"pu] ‘spy’, [ka"t@] ‘word’, [ti"tu] ‘that’, [a"pi] ‘fire’


4.2.2. Methodology

The Besemah data was recorded as part of a study of vowel quality in Besemah, so the recording conditions differ slightly between Besemah and the other examined languages. The measured vowels for Besemah appeared in the penultimate (unstressed) syllable of a disyllabic word, a context in which all four vowel phonemes are found. In stressed (final) syllables, /a/ and schwa are in complementary distribution, with schwa occurring in open syllables and /a/ in closed syllables. The analyzed Besemah words were uttered twice in isolation. Each word was recorded on a Marantz PMD670 solidstate recorder with an Audio-Technica AT825 stereo microphone at a sampling frequency of 48kHz. The sound files were stored as .wav audio files in preparation for analysis. Data from two male speakers and two female speakers were collected.

4.2.3. Results

Figure 4 shows mean first formant values averaged across the male and female speakers. Individual speaker results for all speakers appear in Table 10.

Figure 4. Mean first formant values averaged across two male (left) and two female (right) Besemah speakers. Whiskers indicate one standard deviation from the mean.

Table 10. Mean first formant values (in Hz) for four Besemah speakers

         Speaker F1           Speaker F2           Speaker M1           Speaker M2
Vowel   N   Mean  Std.Dev.   N   Mean  Std.Dev.   N   Mean  Std.Dev.   N   Mean  Std.Dev.
a       4   886   55         4   736   13         4   637   34         4   610   27
i       4   400   25         4   369   20         4   307   15         4   297   15
u       4   459   37         4   393    8         4   372   22         4   350   25
@       4   527   159        4   435   55         4   367   25         4   458   30


Formant values suggest a four-way height distinction with /i/ being highest followed in turn by /u/, /@/ and /a/. The higher F1 values associated with /u/ are consistent with a common cross-linguistic pattern (de Boer 2011).2 This four-way distinction is most clearly evinced in the data from speakers F1 and M2. The distinction between schwa and /u/ is least robust across speakers as one of the male speakers (M1) has virtually identical first formant values for the two vowels, and one of the female speakers (F2) has a relatively small 42Hz difference in first formant values between these two vowels. Speaker F2 also displays the smallest difference between the two phonemic high vowels /i, u/. Graphs depicting results for the four measured correlates of sonority appear in Figure 5, duration in the top left, maximum intensity in the top right, acoustic energy in the bottom left, and perceptual energy in the bottom right. Individual speaker values for each dimension are given in Tables 11–14.

Figure 5. Duration (top left), maximum intensity (top right), acoustic energy (bottom left), and perceptual energy (bottom right) values averaged across four Besemah speakers. Whiskers indicate one standard deviation from the mean.

One-factor analyses of variance (ANOVA) conducted for each of the four phonetic parameters indicated a significant effect of vowel quality on all parameters 2. Thanks to Steve Parker for pointing out this cross-linguistic tendency.


Table 11. Mean duration values (in seconds) for four Besemah speakers

         Speaker F1             Speaker F2             Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a       4   0.082  0.012      4   0.090  0.011      4   0.088  0.006      4   0.086  0.005
i       4   0.062  0.016      4   0.088  0.025      4   0.078  0.006      4   0.097  0.025
u       4   0.072  0.011      4   0.089  0.006      4   0.081  0.009      4   0.079  0.012
@       4   0.032  0.016      4   0.026  0.007      4   0.034  0.016      4   0.035  0.009

Table 12. Mean maximum intensity values (in decibels) for four Besemah speakers

         Speaker F1             Speaker F2             Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a       4   79.06  2.30       4   78.90  1.97       4   79.63  2.61       4   80.90  2.41
i       4   79.14  3.57       4   78.74  3.45       4   80.47  2.13       4   81.59  3.64
u       4   78.88  2.06       4   81.26  4.06       4   79.18  2.29       4   82.45  3.61
@       4   73.15  2.42       4   72.43  2.63       4   74.56  1.21       4   80.08  1.83

Table 13. Mean acoustic energy values (in decibel seconds) for four Besemah speakers

         Speaker F1              Speaker F2              Speaker M1                Speaker M2
Vowel   N   Mean    Std.Dev.   N   Mean    Std.Dev.   N   Mean      Std.Dev.    N   Mean    Std.Dev.
a       4   10121   18872      4   106510  12677      4   1458371   494461      4   115498  10496
i       4   80351   18679      4   99784   27526      4   726867    731605      4   133150  25653
u       4   87783   11180      4   94670   11277      4   1323827   320330      4   107883  18360
@       4   37841   18229      4   30574   79561      4   491690    259154      4   52961   8724

Table 14. Mean perceptual energy values (in arbitrary units) for four Besemah speakers

         Speaker F1               Speaker F2               Speaker M1               Speaker M2
Vowel   N   Mean     Std.Dev.   N   Mean     Std.Dev.   N   Mean     Std.Dev.   N   Mean     Std.Dev.
a       4   1.61E+6  4.13E+5    4   1.36E+6  3.30E+5    4   1.46E+6  4.94E+5    4   2.09E+6  4.41E+5
i       4   1.19E+6  4.92E+5    4   1.22E+6  6.57E+5    4   1.32E+6  1.42E+5    4   2.14E+6  5.35E+5
u       4   9.25E+5  1.73E+5    4   1.32E+6  4.07E+5    4   1.32E+6  3.20E+5    4   2.30E+6  1.18E+6
@       4   4.44E+5  2.11E+5    4   2.89E+5  6.41E+4    4   4.92E+5  2.59E+5    4   7.26E+5  4.83E+4

except acoustic energy: for duration, F(3, 60) = 50.227, p < .001; for maximum intensity, F(3, 60) = 10.531, p < .001; for perceptual energy, F(3, 44) = 20.432, p < .001. Note that data from the second male speaker was excluded from the two ANOVAs involving energy since his acoustic and perceptual energy values were sharply divergent from those of the other speakers. Table 15 summarizes the various phonetic parameters distinguishing (at p < .05 or less according to Scheffe posthoc tests) the vowels of Besemah. Schwa is differentiated from the other three vowels in duration, maximum intensity, and perceptual energy, a result that accords with the position of schwa


Table 15. Summary of vowels distinguished by different phonetic parameters in Besemah

        i     u     @
a                   DIP
i                   DIP
u                   DIP

lower on the phonological sonority scale than other vowels cross-linguistically, even though schwa is not distinguished from other vowels in terms of its ability to attract stress in Besemah itself. None of the three non-schwa vowels, however, are distinguished along any of the measured dimensions.

4.3. Armenian

4.3.1. Background

Standard Western Armenian possesses six vowel phonemes, one of which is schwa (Vaux 1998b).

Table 16. The vowels of Armenian

        Front   Central   Back
High    i                 u
Mid     E       @         O
Low             a

Primary stress in most varieties of Armenian falls on the final syllable unless the final syllable contains schwa (4a), in which case stress retracts onto the penult (4b). There are no content words whose only vowels are schwa. Vaux (1998b) reports that secondary stress characteristically falls on the initial syllable. Examples of Armenian stress, repeated from (1), with secondary stress also marked, appear in (4).

(4) Armenian stress (Vaux 1998b:132)
    a. [­mO"rukh] ‘beard’, [­ArtA"sukh] ‘tears’, [­jErkrAkdrOnA"kAn] ‘geocentric’
    b. ["mAn@r] ‘small’, [­jEr"phEm@n] ‘sometimes’
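As a toy illustration of this generalization, the sketch below encodes the final-unless-schwa pattern in (4); the function name and the representation of words as lists of vowel nuclei are mine, and syllabification is assumed to be given.

```python
# Toy encoding of the Armenian primary-stress rule in (4).
def armenian_primary_stress(nuclei):
    """Return the index of the syllable bearing primary stress."""
    if nuclei[-1] != "@":
        return len(nuclei) - 1   # default: final stress
    # No content word has schwa as its only vowel, so a non-final
    # syllable is guaranteed to exist when retraction applies.
    return len(nuclei) - 2

assert armenian_primary_stress(["A", "A", "u"]) == 2   # [ArtA"sukh]
assert armenian_primary_stress(["E", "E", "@"]) == 1   # [jEr"phEm@n]
```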


4.3.2. Methodology

The measured vowels for Armenian appeared in the penultimate (secondary stressed) syllable of a disyllabic word. The Armenian words were uttered five times in the carrier phrase A"sA ___ nO"rits ‘Say ___ again’. Each word was recorded on a Marantz PMD660 solidstate recorder as a .wav file using a unidirectional microphone (Shure SM10) at a sampling frequency of 44.1kHz. Data from two male speakers and two female speakers of Eastern Armenian were collected. Three of the four speakers were born in Yerevan, Armenia before emigrating to the United States as children, while one of the speakers was born in the United States of parents who speak the Eastern Armenian dialect. All of the speakers spoke Armenian as their first language.

4.3.3. Results

Mean first formant values averaged across the male speakers and female speakers are shown in Figure 6. Individual speaker results appear in Table 17. Both the male and female speakers make a clear three-way height distinction between high vowels, mid vowels, including schwa, and a low vowel. For the male speakers, the back high and mid vowels are associated with slightly higher first formant values, i.e. a lower tongue body position, than their front counterparts, i.e. /u/ is slightly lower than /i/, and /O/ is somewhat lower than /E/, in keeping with a common cross-linguistic tendency (de Boer 2011) found earlier in the Besemah data. Graphs depicting results for the four measured correlates of sonority appear in Figure 7, duration in the top left, maximum intensity in the top right, acoustic

Figure 6. Mean first formant values averaged across two male (left) and two female (right) Armenian speakers. Whiskers indicate one standard deviation from the mean.


Figure 7. Duration (top left), maximum intensity (top right), acoustic energy (bottom left), and perceptual energy (bottom right) values averaged across four Armenian speakers. Whiskers indicate one standard deviation from the mean.

Table 17. Mean first formant values (in Hz) for four Armenian speakers

         Speaker F1           Speaker F2           Speaker M1           Speaker M2
Vowel   N   Mean  Std.Dev.   N   Mean  Std.Dev.   N   Mean  Std.Dev.   N   Mean  Std.Dev.
a      10   790   21         9   870   66        10   608   24        11   690   25
E       6   601   50         8   511   19        11   451   29        10   465   17
O       9   575   39        10   474   51        10   488   34        10   573   40
i      10   417   16        10   302   20        10   314   30        10   313   26
u      10   455   71        10   347   44         9   358   34         9   365   18
@      19   678   80        24   607   83        19   471   45        29   497   73

energy in the bottom left, and perceptual energy in the bottom right. Individual speaker values for each dimension are given in Tables 18–21. One-factor analyses of variance (ANOVA) conducted for each of the four phonetic parameters indicated a significant effect of vowel quality on all parameters: for duration, F(5, 277) = 21.775, p < .001; for maximum intensity, F(5, 277) = 16.309, p < .001; for acoustic energy, F(5, 277) = 22.865, p < .001; for perceptual energy, F(5, 277) = 20.296, p < .001.


Table 18. Mean duration values (in seconds) for four Armenian speakers

         Speaker F1             Speaker F2             Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a      10   0.099  0.016      9   0.090  0.021     10   0.077  0.017     11   0.076  0.028
E       6   0.072  0.009      8   0.078  0.011     11   0.064  0.009     10   0.064  0.006
O       9   0.090  0.021     10   0.092  0.029     10   0.070  0.018     10   0.067  0.017
i      10   0.064  0.013     10   0.055  0.007     10   0.055  0.019     10   0.043  0.014
u      10   0.068  0.015     10   0.061  0.027      9   0.062  0.010      9   0.045  0.014
@      19   0.068  0.026     24   0.067  0.015     19   0.044  0.013     29   0.045  0.017

Table 19. Mean maximum intensity values (in decibels) for four Armenian speakers

         Speaker F1             Speaker F2             Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a      10   74.36  2.32       9   73.86  3.11      10   70.54  2.34      11   74.35  4.11
E       6   69.48  2.99       8   74.39  2.45      11   67.16  2.71      10   72.08  3.49
O       9   70.42  2.72      10   76.51  2.65      10   67.40  2.85      10   70.95  2.24
i      10   66.97  2.14      10   71.34  2.24      10   61.28  4.00      10   64.95  4.90
u      10   68.17  3.41      10   73.16  1.33       9   63.17  1.02       9   63.21  4.17
@      19   71.85  2.14      24   74.77  2.79      19   66.93  3.20      29   69.67  4.82

Table 20. Mean acoustic energy values (in dB seconds) for four Armenian speakers

         Speaker F1             Speaker F2             Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a      10   99881  19327      9   96047  22997     10   74659  21545     11   78864  31529
E       6   70810  18160      8   83159  13331     11   59432  9865      10   68402  13132
O       9   78161  17671     10   95687  35052     10   65726  14752     10   64273  19634
i      10   61045  10325     10   54820  6621      10   45961  16927     10   40352  18176
u      10   60238  10671     10   58676  22705      9   51905  9761       9   38037  8187
@      19   70084  21754     24   68958  14328     19   43151  11438     29   47163  18995

Table 21. Mean perceptual energy values (in arbitrary units) for four Armenian speakers

         Speaker F1               Speaker F2               Speaker M1               Speaker M2
Vowel   N   Mean     Std.Dev.   N   Mean     Std.Dev.   N   Mean     Std.Dev.   N   Mean     Std.Dev.
a      10   1.09E+6  2.96E+5    9   1.17E+6  4.22E+5   10   8.04E+5  1.96E+5   11   1.08E+6  4.48E+5
E       6   6.65E+5  1.78E+5    8   1.04E+6  2.37E+5   11   5.92E+5  1.35E+5   10   7.31E+5  2.13E+5
O       9   8.18E+5  2.60E+5   10   1.30E+6  6.24E+5   10   6.13E+5  1.49E+5   10   7.52E+5  2.28E+5
i      10   5.35E+5  9.03E+4   10   5.86E+5  1.30E+5   10   4.13E+5  1.47E+5   10   4.38E+5  2.36E+5
u      10   6.04E+5  1.43E+5   10   7.05E+5  2.36E+5    9   4.60E+5  6.09E+4    9   3.66E+5  1.35E+5
@      19   7.63E+5  2.19E+5   24   9.08E+5  2.68E+5   19   4.22E+5  1.21E+5   29   5.44E+5  2.62E+5

Table 22 summarizes the various phonetic parameters distinguishing (at p < .05 or less according to Scheffe posthoc tests) the vowels of Armenian. The clearest distinction emerging overall is the bifurcation between the high vowels and schwa, on the one hand, and the peripheral mid vowels and /a/, on the other hand. The low vowel is also distinguished from the front mid vowel


Table 22. Summary of vowels distinguished by different phonetic parameters in Armenian

        E     O     i      u      @
a      DAP         DIAP   DIAP   DIAP
E                  IAP    DIAP   DIAP
O                  DIAP   DA     DAP
i                                I!
u                                I!

/E/ along three of the four measured dimensions. Maximum intensity draws a further distinction between the high vowels and schwa with schwa displaying greater peak intensity values than the two high vowels, a contradiction of phonological sonority scales placing schwa below high vowels.

4.4. Javanese

4.4.1. Background

The standard Javanese of central Java is typically characterized as having six or eight vowel phonemes (Clynes and Rudyanto 1995, Horne 1974). In the six vowel system, which appears to characterize the speech of our consultants, [E] and [O] are in complementary distribution with [e] and [o], respectively. The lower allophone occurs when the following vowel is schwa or another mid vowel, or /i, u/ in an open syllable. The schwa occurs in pre-final syllables, both open and closed, but only in closed syllables word-finally.

Table 23. Vowels of Javanese

        Front    Central   Back
High    i                  u
Mid     e (E)    @         o (O)
Low              a

Stress in Javanese falls on the penultimate syllable (5a) unless the penult has a schwa (Herrfurth 1964, Horne 1974), in which case, stress shifts to the final syllable (5b).

(5) Javanese stress
    a. ["pantun] ‘rice plant’, ["kates] ‘papaya’
    b. [k@"tes] ‘slap’, [k@"tan] ‘sticky rice’, [j@n"t@n] ‘cumin, caraway seed’
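The Javanese pattern in (5) is the mirror image of the Armenian rule sketched earlier, and can be encoded the same way; again the representation and function name are illustrative only.

```python
# Toy encoding of (5): penult stress by default, shifting to the final
# syllable when the penult holds schwa.
def javanese_primary_stress(nuclei):
    if nuclei[-2] != "@":
        return len(nuclei) - 2   # ["pantun]
    return len(nuclei) - 1       # [k@"tan]

assert javanese_primary_stress(["a", "u"]) == 0   # ["pantun]
assert javanese_primary_stress(["@", "a"]) == 1   # [k@"tan]
```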


4.4.2. Methodology

The target vowels for Javanese appeared in the final (stressed) syllable of a disyllabic word containing a schwa in the penult. Each vowel appeared in a closed syllable in three words in the corpus. Each Javanese word was uttered four times in isolation and recorded using a high quality unidirectional microphone connected to a solidstate recorder. Data from two male speakers were collected, one speaker from Bojonegoro in East Java (at a sampling rate of 44.1kHz) and one speaker from Semarang in Central Java (recorded as part of a larger phonetic study of Javanese at a sampling rate of 22.05kHz). The sound files were stored as .wav audio files in preparation for analysis.

4.4.3. Results

Figure 8 shows first formant values for the measured Javanese vowels averaged across the two speakers. Individual speaker means appear in Table 24.

Figure 8. First formant values averaged across two Javanese speakers. Whiskers indicate one standard deviation from the mean.

Table 24. Mean first formant values (in Hz) for two Javanese speakers

         Speaker M1           Speaker M2
Vowel   N   Mean  Std.Dev.   N   Mean  Std.Dev.
a      12   681   21        12   671   28
e       6   607   25         9   616   14
o      11   628   19        12   586   30
i      12   465   33        12   480   14
u      12   497   14        12   490   24
@      12   525   44        12   558   42


First formant values distinguish five vowel heights with the two peripheral mid vowels /e, o/ being slightly lower (consistent with a transcription as /E, O/ rather than /e, o/) than schwa, which occupies the middle of the vowel space. Graphs depicting results for the four measured correlates of sonority appear in Figure 9, duration in the top left, maximum intensity in the top right, acoustic energy in the bottom left, and perceptual energy in the bottom right. Individual speaker values for each dimension are given in Tables 25–28.

Figure 9. Duration (top left), maximum intensity (top right), acoustic energy (bottom left), and perceptual energy (bottom right) values averaged across two Javanese speakers. Whiskers indicate one standard deviation from the mean.

One-factor analyses of variance (ANOVA) conducted for each of the four phonetic parameters indicated a significant effect of vowel quality on all parameters: for duration, F(5, 128) = 21.195, p < .001; for maximum intensity, F(5, 128) = 3.826, p = .003; for acoustic energy, F(5, 128) = 9.972, p < .001; for perceptual energy, F(5, 128) = 28.750, p < .001. Table 29 summarizes the phonetic dimensions that distinguish (at p < .05 or less according to Scheffe posthoc tests) the vowels of Javanese. The reduced prominence of schwa relative to other vowels is consistent with phonological scales placing schwa at the bottom of the sonority hierarchy


Table 25. Mean duration values (in seconds) for two Javanese speakers

         Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a      12   0.108  0.038     12   0.113  0.021
e       6   0.102  0.018      9   0.098  0.018
o      11   0.114  0.028     12   0.113  0.016
i      12   0.079  0.018     12   0.089  0.014
u      12   0.105  0.025     12   0.098  0.016
@      12   0.061  0.018     12   0.060  0.010

Table 26. Mean maximum intensity values (in decibels) for two Javanese speakers

         Speaker M1             Speaker M2
Vowel   N   Mean   Std.Dev.   N   Mean   Std.Dev.
a      12   78.97  2.52      12   76.84  3.23
e       6   78.71  1.81       9   77.00  2.04
o      11   79.43  2.09      12   77.47  2.53
i      12   77.87  1.93      12   76.48  1.85
u      12   76.18  1.73      12   75.38  3.54
@      12   76.80  1.90      12   75.69  2.58

Table 27. Mean acoustic energy values (in decibel seconds) for two Javanese speakers

         Speaker M1               Speaker M2
Vowel   N   Mean     Std.Dev.   N   Mean     Std.Dev.
a      12   6.77E+4  2.11E+4   12   1.21E+5  1.71E+4
e       6   6.76E+4  1.39E+4    9   1.05E+5  1.88E+4
o      11   7.25E+4  1.75E+4   12   1.18E+5  1.60E+4
i      12   5.24E+4  1.20E+4   12   9.30E+4  1.47E+4
u      12   6.41E+4  1.56E+4   12   9.90E+4  1.66E+4
@      12   3.96E+4  1.17E+4   12   6.39E+4  9.79E+3

Table 28. Mean perceptual energy values (in arbitrary units) for two Javanese speakers

         Speaker M1               Speaker M2
Vowel   N   Mean     Std.Dev.   N   Mean     Std.Dev.
a      12   1.50E+6  3.76E+5   12   1.61E+6  2.36E+5
e       6   1.18E+6  3.90E+5    9   1.45E+6  2.70E+5
o      11   1.35E+6  4.89E+5   12   1.43E+6  2.40E+5
i      12   7.94E+5  2.04E+5   12   1.17E+6  1.98E+5
u      12   9.24E+5  2.91E+5   12   1.12E+6  1.55E+5
@      12   5.84E+5  2.27E+5   12   7.51E+5  1.60E+5


Table 29. Summary of vowels distinguished by different phonetic parameters in Javanese

        e     o     i      u      @
a                   DAP    P      DAP
e                   P             DAP
o                   DAP    IP     DAP
i                                 DP
u                                 DAP

and also accords with the light status of schwa in the Javanese stress system. All pairwise comparisons involving schwa are distinguished through duration, acoustic energy, and perceptual energy, with the exception of the comparison of schwa with /i/, which is not differentiated through acoustic energy. The low and mid vowels are distinguished from the high vowels with the exception of /e/ and /u/. These distinctions, however, are manifested along different acoustic dimensions. Perceptual energy is the most reliable differentiator of high vowels from both low and mid vowels, failing only to distinguish /e/ and /u/. Duration and acoustic energy differentiate /i/ from both /a/ and /o/, while maximum intensity serves to separate /o/ from /u/.

4.5. Kwak’wala

4.5.1. Background

The Kwak’wala vowel system can be characterized in terms of three “full” vowel phonemes distributed around the periphery of the vowel space: /i, a, u/, plus a shorter central vowel, schwa /@/ (Boas 1947, Grubb 1977). The relative height and/or backness of particular realizations within each of these phonemic vowel categories varies across dialects and speakers, generally reflecting a broad spectrum of co-articulatory effects with adjacent consonants. The distinction between schwa and the other full vowels plays a fundamental role in the locus of stress (Boas 1947, Bach 1975, Wilson 1986, Zec 1988, Shaw 2009). The basic generalization is that primary stress falls on the leftmost full vowel of a word (6a) and, in the absence of a full vowel anywhere in the word, on the rightmost schwa (6b). The dichotomous stress behavior of the “full” vowels vs. schwa is attenuated by the relative sonority of coda consonants. A schwa followed by a sonorant coda thus follows the same stress generalizations as syllables with a full vowel nucleus, and whichever is leftmost will receive primary stress (6c). Laryngealization in a coda (represented by an apostrophe in (6)), on the other hand, has a prominence-reduction effect. Syllables with a glottalized


resonant or a glottal stop in the coda (Shaw 2009), regardless of vowel quality, are thus skipped in the scan for the leftmost stressable syllable (6d).

(6) a. ["kwak’wala] ‘Kwakwala’, [s@"baju] ‘searchlight’, [b@q’@ì@"la] ‘sleepy, drowsy’
    b. [ts@"G@ì] ‘thimbleberry’, [ts@G@ì"m’@s] ‘thimbleberry plant’, [dz@"Gw@d] ‘coal’
    c. ["t’@mxw.m’@s] ‘wild gooseberry plant’, [ì@."n@m.Gi.la] ‘red elderberry plant’
    d. [g@l’."dzud] ‘to crawl onto a flat thing’, [gwaP.s@."la] ‘people of Smith’s Inlet’

4.5.2. Methodology

The four vowels in Kwak’wala all appeared in a stressed final syllable in isolation. One male speaker of Kwak’wala was recorded repeating each word five times. Recordings were made at a sampling rate of 48kHz using a Marantz PMD670 solidstate recorder via a desktop Audio-Technica AT-831b cardioid condenser microphone.

4.5.3. Results

A one-way ANOVA indicates a significant effect of vowel quality on first formant values: F(3, 87) = 50.068, p < .001. Figure 10 shows first formant values for the measured Kwak’wala vowels, followed by mean values in Table 30.

Figure 10. First formant values averaged across one male Kwak’wala speaker. Whiskers indicate one standard deviation from the mean.


Table 30. Mean first formant values (in Hz) for one Kwak’wala speaker

Vowel   N    Mean   Std.Dev.
a      14    743    63
i      24    319    21
u      15    327    12
@      10    631    64

As the phonemic transcription of the four vowels suggests, first formant values confirm that there is a three-way height distinction with /a/ lowest in quality, /i/ and /u/ highest, and schwa intermediate in height. Graphs depicting results for the four measured correlates of sonority appear in Figure 11, duration in the top left, maximum intensity in the top right, acoustic energy in the bottom left, and perceptual energy in the bottom right. Individual speaker values for each dimension are given in Tables 31–34.

Figure 11. Duration (top left), maximum intensity (top right), acoustic energy (bottom left), and perceptual energy (bottom right) values averaged across one male Kwak’wala speaker. Whiskers indicate one standard deviation from the mean.


Table 31. Mean duration values (in seconds) for one Kwak’wala speaker

Vowel   N    Mean    Std.Dev.
a      14   0.177    0.039
i      24   0.127    0.043
u      15   0.174    0.033
@      10   0.078    0.017

Table 32. Mean maximum intensity values (in decibels) for one Kwak’wala speaker

Vowel   N    Mean    Std.Dev.
a      14   61.81    7.63
i      24   62.86    2.54
u      15   62.08    3.09
@      10   56.94    5.12

Table 33. Mean acoustic energy values (in decibel seconds) for one Kwak’wala speaker

Vowel   N    Mean     Std.Dev.
a      14   134559    46159
i      24    95414    29547
u      15   122239    23230
@      10    53109    11717

Table 34. Mean perceptual energy values (in arbitrary units) for one Kwak’wala speaker

Vowel   N    Mean      Std.Dev.
a      14   1204508    313873
i      24    799776    240869
u      15   1054565    189172
@      10    906745    323689

One-factor analyses of variance (ANOVA) conducted for each of the four phonetic parameters indicated a significant effect of vowel quality on all parameters: for duration, F(3, 59) = 19.376, p < .001; for maximum intensity, F(3, 59) = 3.981, p = .012; for acoustic energy, F(3, 59) = 15.908, p < .001; for perceptual energy, F(3, 59) = 20.331, p < .001. Table 35 summarizes the phonetic dimensions that distinguish (at p < .05 or less according to Scheffe posthoc tests) the vowels of Kwak’w ala.


Table 35. Summary of vowels distinguished by different phonetic parameters in Kwak’wala

        i     u     @
a      DAP          DAP
i            DP     DIAP
u                   DAP

The clearest phonetic distinction is between schwa and the three full vowels, all of which are more prominent than schwa along at least three dimensions. Only maximum intensity fails to differentiate all three full vowels from schwa. Interestingly, the low vowel differs from /i/ (in all measures except maximum intensity) but not from /u/. On the other hand, /u/ is differentiated from its front high counterpart /i/ in both duration and perceptual energy, suggesting a sonority distinction, at least phonetically, between higher sonority /u/ and lower sonority /i/.

5. Discussion

Comparison of the results across the five examined languages indicates a number of similarities as well as certain differences in the relative prominence of different vowel qualities along the various studied phonetic dimensions, as well as in the particular phonetic parameters used to differentiate vowels. Table 36 encapsulates the phonetic distinctions between different vowel qualities in the examined languages and the properties used to distinguish the vowels. To facilitate comparison across the five languages, none of which except Hindi makes tense vs. lax distinctions, the tense vowels but not the lax vowels of Hindi are included in the table, and the mid vowels of all other languages are represented as /e, o/ regardless of their phonetic height within the mid vowel subspace, i.e. whether they are phonetically /e/ or /E/ and /o/ or /O/. Sonority reversals in which a lower sonority vowel according to phonological scales has greater prominence along a given dimension are represented with “!” after the relevant phonetic parameter. Phonemic vowel pairs that are not differentiated phonetically along the measured dimensions in a given language are indicated by Ø. Light shaded cells occur at the intersection of contrasts that do not occur in a given language.


Table 36. Summary of the phonetic distinctions between vowels in five languages

[Matrix with rows a, e, o, i, u and columns e, o, i, u, @; each cell contains five sub-rows, one per language (Hindi, Besemah, Armenian, Javanese, Kwak’wala). The individual cell entries are not legibly preserved in this copy.]

5.1. The universality of the link between phonetic prominence and phonological sonority

As the table shows, the vowel that is most consistently distinguished phonetically from all other vowels along at least one of the measured dimensions is schwa. The only pairwise comparison involving schwa that is not manifested phonetically is the distinction between schwa and /i/ in Hindi. Nevertheless, although schwa is nearly universally differentiated from other vowels, the phonetic dimension(s) along which these distinctions are expressed differ between languages and even within languages between vowels paired with schwa. The low vowel is differentiated in 11 of 18 pairwise comparisons involving vowels other than schwa. Of these 11 successful comparisons, 8 involve /a/ and the high vowels, which differ along at least one of the measured dimensions in four of the five languages (the exception being Besemah). The low vowel is less reliably distinguished (3 of 8 pairwise comparisons) from the mid vowels in the four of the five languages with mid vowels (the exception being Kwak’wala). Mid and high vowels are distinguished in 9 of 12 pairwise comparisons. Of the 9 distinctions, three entail a single distinguishing phonetic parameter. By comparison, 9 of the 11 pairwise distinctions involving a low vowel are conveyed by at least three phonetic properties. Results point to uniformity among classes of vowels sharing height features. The two mid vowels are thus not phonetically differentiated along the measured dimensions in any of the languages. Furthermore, the two high vowels are only distinguished in prominence in one of the five languages, Kwak’wala.

5.2. Language specificity in the phonological sonority of schwa and its phonetic properties

In the current data, differences between languages in the phonological status of schwa in the stress system do not correlate with interlanguage variation in the phonetic prominence of schwa relative to other vowels. Rather, the dominant pattern is for schwa to be phonetically less prominent than other vowels regardless of its phonological behavior. Of the three languages with mid vowels, both the one in which schwa behaves parallel to other vowels with respect to stress placement, Hindi, and the two in which schwa rejects stress, Javanese and Armenian, have a schwa that is phonetically weaker than both mid vowels along at least two phonetic dimensions. The phonetic strength of the high vowels relative to schwa is not as consistent across languages, but this variation is not predictable from the phonological behavior of schwa. In one language with a schwa that attracts stress, Besemah, and in two languages with a schwa that rejects stress, Javanese and Kwak’wala, high vowels are more prominent than schwa according to at least two phonetic parameters. In Hindi, which treats schwa like other vowels for stress, /i/ is not more prominent than schwa and /u/ is actually less intense than schwa. Perhaps more surprisingly, in Armenian, which avoids stress on schwa, there is no phonetic property among those measured that predicts the light status of schwa relative to the two high vowels. In fact, schwa has greater maximum intensity than both high vowels in contradiction of the sonority hierarchy. The failure of the measured parameters to distinguish schwa from the high vowels in Armenian in the correct direction raises questions about Gordon’s (2002, 2006) hypothesis that phonological weight distinctions are predictable from acoustic properties or from perceptual properties ultimately derived from the acoustic signal via auditory transforms. The present work suggests that it might be necessary to explore an alternative hypothesis that syllable weight, and perhaps more generally, sonority, is at least partially grounded in


speech production. Under this view, the reduced sonority of schwa could be to some degree attributed to the proximity of schwa to the tongue’s rest position, an articulatory setting that would require less physical effort to achieve than more peripheral vowel qualities. A potential complication for this effort-based approach to sonority is the fact that high vowels may require greater tongue displacement from the rest position than low vowels, even though high vowels rank lower in phonological sonority than low vowels. It is conceivable that a measure of effort that penalizes jaw movement more than tongue movement due to the greater mass of the jaw could be invoked to account for the greater sonority of low vowels relative to high vowels (see Mooshammer, Hoole and Geumann 2007 for an overview of articulatory studies of jaw height). Appealing to effort-based considerations, of course, predicts that sonority could be context-sensitive. For example, one might expect high vowels to be less sonorous than low vowels in coronal and velar contexts where less movement is required to produce a high vowel than in bilabial contexts, where a greater articulatory excursion is necessary to produce a low vowel than a high vowel.

5.3. Assessing the robustness of different phonetic correlates as a predictor of sonority

Excluding cells representing the intersection of vowels of equivalent phonological height, there are 51 possible pairwise comparisons of vowels in the five examined languages. Of these 51 pairs, duration and perceptual energy each distinguish 32, acoustic energy distinguishes 27, and maximum intensity differentiates 22. Interestingly, of the 22 distinctions made by maximum intensity, three are cases in which schwa, a vowel of low phonological sonority, has greater intensity than a vowel ranking higher in phonological sonority, /u/ in Hindi and /i/ and /u/ in Armenian. (It may be noted that an additional reversal involving schwa and the high back lax vowel based on acoustic energy was found in Hindi but is not included in Table 36, which excludes the lax vowels of Hindi.) It thus appears that the close link between maximum intensity and vocal tract aperture is adept at predicting phonological sonority distinctions based on vowel height, but is less successful in predicting sonority differences between peripheral and central vowels. There is, however, no other single parameter that adequately predicts the sonority distinction between peripheral vowels and schwa. Duration is the most consistent property differentiating schwa from other vowels, being used in 18 of 21 total vowel distinctions involving schwa across the five languages. Yet, as mentioned above, duration fails to distinguish schwa from either /i/ or /u/ in Armenian, even though schwa is demonstrably lower in sonority than the


high vowels in Armenian on the basis of its stress system. Perceptual energy and acoustic energy differentiate only 14 and 13, respectively, of the 21 total vowel pairs involving schwa and their success is likely attributed in large part to the fact that both measures are integrated over time and thus receive a boost in longer vowels. In summary, the failure of any single phonetic property to accurately predict all aspects of sonority scales for vowels suggests that phonological sonority may not be quantifiable along any single dimension, at least in the case of vowels, but rather may reflect some weighted aggregate of multiple phonetic factors, a view espoused by Ohala and colleagues in earlier work (Ohala 1990, Ohala and Kawasaki-Fukumori 1997). The relevant parameters for predicting sonority may thus be multidimensional in nature, encompassing some, as yet undiscovered, combination of acoustic, perceptual, and articulatory properties.

6. Conclusions

The primary goal of this paper was to explore the hypothesis that the phonological status of schwa as a lower sonority vowel than peripheral vowels is predictable on phonetic grounds. Results from five languages with schwa indicate that, although sonority distinctions in individual languages are typically predicted by at least one phonetic parameter, there is no single parameter that predicted sonority distinctions across all languages. Although schwa is characteristically less prominent than other vowel qualities along multiple phonetic dimensions, there emerged several instances in which schwa not only failed to display reduced prominence relative to other vowels, but actually was characterized by greater prominence. Thus, in two languages (Armenian and Hindi), schwa was associated with greater peak intensity values than at least one of the peripheral vowels. Most strikingly, one of these sonority reversals even occurred in a language (Armenian) that treats schwa as phonologically lighter than peripheral vowels in its stress system. Although appealing to a property other than maximum intensity as a correlate of sonority eliminates instances of reversals in the data, there is still no single phonetic property that correctly predicts the lower sonority of schwa in all languages. The dimensions that are most successful cross-linguistically in our data set, duration and perceptual energy, do not distinguish schwa from high vowels in Armenian. Furthermore, these two properties are only partially successful in making sonority distinctions between the peripheral vowels. The present results thus underscore the challenges confronted by any model of the phonetics-phonology interface that posits a single phonetic dimension underlying phonological sonority.


Appendix: Corpora (target vowels in bold)

Hindi
"m@nI ‘jewel’, "g@rd@n ‘neck’, "wESja ‘prostitute’, "pEnS@n ‘pension’, "mIl@ ‘obtain’, "bIrla ‘rare’, "bOn@ ‘dwarf’, "kOmtSa ‘skimmer’, "sOnO ‘listen’, "kursi ‘chair’, "lan@ ‘bring’, "sarda ‘goddess’, "len@ ‘take’, "keln@ ‘play’, "nil@ ‘blue’, "kirt@n ‘mentioning, praising’, "son@ ‘sleep, gold’, "Sorb@ ‘broth’, "sun@ ‘lonely’, "murti ‘statue’

Armenian
­gA"tA ‘kind of pastry’, ­hA"sAk ‘age’, ­gE"tAk ‘small river’, ­hE"sAn ‘grindstone’, ­go"ti ‘belt’, ­hO"sAnk ‘stream’, ­gi"tAk ‘connoisseur’, ­hi"sun ‘fifty’, ­gu"thAn ‘plough’, ­hu"sAl ‘to hope’, ­g@"tAkh ‘you found’, ­h@"Ki ‘pregnant’

Besemah
pa"tah ‘snap’, ta"tap ‘touch’, pi"tuN ‘hold’, ti"tu ‘this’, pu"tih ‘white’, tu"tus ‘strike’, p@"taN ‘evening’, t@"tak ‘cut’

Javanese
k@"tan ‘sticky rice’, k@"tat ‘tight, constricting’, @n"tas ‘bring in out of the rain’, j@n"t@n ‘cumin seed, caraway seed’, b@n"t@t ‘exact amount, completely filled’, m@n"t@s ‘well filled out’, @m"pet ‘crowded’, k@"ten ‘agile’, g@"ten ‘hardworking, industrious’, g@n"tos ‘change, replacement’, b@"ton ‘pit of a jackfruit’, j@"pot ‘go away to avoid conversation’, g@"pit ‘squeeze’, k@"tis ‘(of voice) sweet and clear’, s@"tin ‘satin’, k@n"tut ‘fart’, g@"tun ‘remorseful’, b@n"tus ‘bump’

Kwak’wala
s@"bas ‘reply, echo’, b@s"bas ‘to eat biscuits’, d@"nas ‘inner bark of red-cedar’, g@"pud ‘to unbutton, unwedge something’, g@p"stud ‘to tuck in, to stuff up a hole; to plug it up’, @p"sut ‘the opposite side’, g@"p’id ‘to button up, to tip, slip money (etc.) to someone’, s@"biì ‘sun rays striking floor’, b@χ"sis ‘to lance the foot’, m@"ìik ‘salmon, sockeye’, @"nis ‘aunt’, n@p"b@s ‘always throwing rocks’, n@"p’@p ‘hair on chest’


Acknowledgments The authors gratefully acknowledge the many speakers providing the recordings discussed in this paper. We thank the several Kwak’w ala consultants (in particular, the late Lorraine Hunt, Beverly Lagis, Margaret Pu’tsa Hunt, Chief Robert Joseph, Daisy Sewid-Smith, and Pauline Alfred) contributing the Kw ak’wala forms and generalizations about stress, many of which appear for the first time in print here. We would also like to thank Universitas Sriwijaya for their support in collecting the Besemah data as well as many Besemah speakers from the village of Karang Tanding for volunteering to take part in this research. We also extend our gratitude to Marc Garellek for assisting in recording one of the Javanese consultants. Furthermore, we acknowledge the generous financial support of NSF grant BCS0343981 to Matthew Gordon, SSHRC grant Kan’s kwak’wale’ xan’s yak’anda’s! Let’s keep our language alive! to Patricia A. Shaw (in partnership with U’mista Cultural Center (the late Andrea Sanborn, Director), Lilawagila School, Kingcome Inlet (Mike Willie), ’Namgis First Nation (Chief William T. Cranmer), and T’lisalagi’lakw School, Alert Bay), as well as the financial, intellectual and institutional support of the 2008 InField Institute held at the University of California at Santa Barbara and directed by Carol Genetti. Finally, thanks to Steve Parker and two anonymous reviewers for their many constructive and useful comments on an earlier draft of this paper.

Sonority and the larynx

Brett Miller

Abstract. Most of the phonological patterns conventionally grouped under the term sonority can be explained by perceptual factors that promote the sequencing of segment classes in certain orders. This chapter explores the production mechanisms behind the relevant acoustic cues. Two scalar primitives, sound source and vocal tract aperture, synergize to motivate the sonority classes most solidly established by the phonological evidence. Pairs of sound types to which the two scales would assign opposite rankings appear to be mutually unranked in sonority. This claim is supported with a variety of evidence, largely relating to the phonological behavior of implosive and glottal consonants. Sonority itself is identified as a useful term for the role played in segment-sequencing patterns by the relative intrinsic perceptibility of natural classes.

1. Phonological evidence for sonority

When segment strings are assigned contours by giving each segment a higher or lower relative value according to the scale in (1), several patterns are reportedly consistent across the world’s languages, as exemplified in (2–5). The property defining the scale and seen in the patterns is conventionally called sonority. Various further divisions of the scale have been proposed, some of which are discussed below; for extensive reviews of the literature and phonological evidence see Parker (2002, 2011).

(1) Basic sonority scale
    obstruents < nasals < liquids and glides < vowels

(2) Sonority Sequencing Principle
    Syllable nuclei are consistently associated with sonority contour peaks in one-to-one fashion, with the qualification that for this procedure, some languages ignore peaks below a certain threshold (cf. Samuels 2009: 119–121).
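To make the peak-matching idea in (2) concrete, the following sketch ranks segments by the scale in (1) and locates the contour peaks; the ranking dictionary and threshold parameter are illustrative, and the numeric values merely order the classes.

```python
# Illustrative implementation of (2): nuclei are predicted at sonority
# contour peaks; `threshold` models the qualification that some
# languages ignore peaks below a certain level.
RANK = {"p": 0, "t": 0, "k": 0, "s": 0, "m": 1, "n": 1, "l": 2, "r": 2,
        "j": 2, "w": 2, "i": 3, "e": 3, "a": 3, "o": 3, "u": 3}

def sonority_peaks(segments, threshold=0):
    vals = [RANK[s] for s in segments]
    padded = [-1] + vals + [-1]          # word edges act as troughs
    return [i for i in range(len(vals))
            if padded[i] < padded[i + 1] >= padded[i + 2]
            and padded[i + 1] >= threshold]

print(sonority_peaks(list("planta")))    # -> [2, 5]: two syllable nuclei
```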

(3) Sonority Dispersion Principle
    The rising slope in a syllable’s sonority contour tends to be steep while the falling slope tends to be minimal (Parker 2002: 18). In other words, low-sonority onsets and high-sonority rhymes are cross-linguistically more common.


(4) Accent by sonority
    If word accent is sensitive to sonority, it either selects the highest peak in a word or is limited to peaks above a certain sonority threshold (Parker 2011 section 2.4).

(5) Weight by sonority
    If coda moraicity is sensitive to sonority, then codas are moraic above a certain sonority threshold (Parker 2011 section 2.4).

2. Basis and status of sonority

Attributing the sonority scale to innate formal constructs in the universal human cognitive endowment is redundant if the scale can be motivated from other, independently evident factors such as phonetic forces. Perhaps the most commonly cited phonetic correlate is vocal tract aperture or some closely related notion of “openness” (see Parker 2002: 43–49), but this is hard to quantify given the complex shape of the vocal tract (section 7) and makes unclear or incorrect predictions about the relative sonority of glottal consonants and implosives (sections 5.3–5.4). Of several proposed acoustic and aerodynamic correlates for sonority, the strongest contender seems to be acoustic intensity (Parker 2002, 2008). Yet even with this correlate, Parker found many local mismatches between a ranking of segments by average intensity on one hand and phonologically defensible scales of segmental sonority on the other (not to mention cross-linguistic differences in the intensity rankings; see also Jany et al. 2007). Another pervasive problem with proposals to motivate sonority from single phonetic correlates is that the correlates are typically too gradient to explain the locations of the boundaries that define the classes in the sonority scale. Perception-driven explanations like those of Henke, Kaisse and Wright (this volume) address these problems by motivating the sonority scale as a segment sequencing principle that maximizes the hearer’s ability to identify the segments in an utterance. The present chapter builds on cue-based accounts of the sonority of more common segment types (Henke, Kaisse and Wright this volume; Kaisse 2011; Steriade 2001, 2009; Wright 2004) by exploring the articulatory interactions underlying the cues and expanding the picture to include less common types of sounds. Perceptually driven explanations cover much more ground, of course, than is traditionally staked out for sonority. The question of why we may still wish to classify some perceptually driven phenomena under the heading of sonority is explored at the end of this chapter (section 8).

3. Articulatory primitives underlying sonority

Context-dependent perceptual factors are determined partly by the intrinsic or context-free perceptual properties of the segments involved and partly by the properties of the transitions between them. Transitions are addressed below (sections 6, 8), but the bulk of this chapter focuses on the intrinsic perceptual properties of segments. As revealed in detailed acoustic treatments of the diverse array of phonologically significant acoustic cues (e.g. Stevens 1999; Wright 2004), two major factors determining these properties are (a) the nature of the sound source minus (b) attenuation from impedance higher in the vocal tract. These two variables each have a range of possible values, giving us two scales which we will refer to as the sound source scale and the aperture scale. Aperture here means the inverse of the degree of impedance in the vocal tract between the sound source and the hearer.

3.1. Source scale

Speech sound sources include the periodic (regular, repeating) movement of the vocal folds, which generates a sound’s fundamental frequency, perceived as pitch. We will refer to this as the periodic glottal source (PS). Non-periodic sound sources include turbulent and release sources. Turbulent sources (TS) are vocal tract configurations that present enough of an obstacle to the airflow to cause noise. One turbulent source is the shape of the glottis and surrounding laryngeal muscles when the glottis is abducted, which we will refer to as the glottal source (GS); another turbulent source is the shape of a supraglottal constriction sufficient to generate frication, which we will call the fricative source (FS). A transient or release source (RS) is an act of releasing a constriction in such a way as to generate an audible burst of air that had built up behind the constriction. Finally, the closure portion of voiceless stops involves a negligible level of sound exhibiting no systematic pattern, so we will distinguish this state of virtual silence as having no sound source (NS). The properties of these sources have been richly studied; a good starting point is Stevens (1999: 464–470) who discusses the periodic and turbulent glottal sources, frication, and the transient release source. Trills are also classified as periodic sound sources, though they vibrate slowly enough that their cycles are perceived as a succession of events rather than a single continuous sound (Brosnahan and Malmberg 1976: 54; compare the frequency of most trills at 20–35 Hz with the frequency range for voice, from around 40 Hz for creaky voice to 1000 Hz for a soprano or child’s voice: Shadle 1999: 50). Due to this intermittent quality as well as the fact that the very low frequency produced


by actual trilling is always combined with some other sound source (voice, or glottal turbulence in the case of breathy trills), we will not distinguish trills in the discussion of sound sources below. Another range of fine distinctions that we will largely ignore is seen in the complexity of fricative place articulations, which may create multiple sound sources at the same time (both the teeth and the tongue constriction play this role in sibilants, for example: Evers, Reetz and Lahiri 1998: 353) and which can be produced in a diverse range of frequencies (see Fujimura and Erickson 1997: 76; Howe and Fulop 2005; Shadle 1999: 47). For the most part we will subsume these distinctions under the more general fricative category in our discussion of sound sources, though Henke, Kaisse and Wright (this volume) discuss phonotactic environments in which the higher-frequency sibilants are typologically preferred over other fricatives. The relative acoustic salience of different sound sources can be estimated by examining the average acoustic intensity of sounds where individual sources operate nearly in isolation from other sound sources and with minimal localized impedance between the source and the hearer. Sounds that serve this purpose include modally voiced vowels for the periodic glottal source (PS), h for glottal turbulence (GS), and voiceless oral fricatives for frication (FS). Since the presence or absence of glottal turbulence in fricatives appears difficult to perceive and not phonologically significant (Stuart-Smith 2004: 206; Vaux 1998a: 509), we will categorize the sound source of voiceless fricatives simply as turbulence (TS), regardless of how much glottal turbulence is present in addition to supralaryngeal frication. The question of the status of RS will be deferred (section 6). Parker’s experimental results (2002: 114–133, 2008: 85–87) consistently indicate that PS has higher acoustic intensity than TS when one compares vowels with oral fricatives and h (following the rationale just described). We will therefore adopt the working hypothesis that voicing has more perceptual salience than fricative or glottal turbulence when there is minimum impedance between the sound source and the hearer. Further informal support for this idea comes from the fact that people are more likely to use their vocal folds than to make an h or s when they want to make a loud sound, as in shouting or whistling for attention. Finally, the combination of PS and GS in breathy voice does not sum the perceptual salience of the two sound sources, because their articulatory and aerodynamic properties mitigate one another: there is less turbulence due to lower airflow in H than in h, but there is also weaker voicing due to slower and/or more limited movement of the vocal folds than in modal voicing (on the phonetics of voice with glottal turbulence, i.e. breathy and slack voice, see Brunelle 2010; Stuart-Smith 2004). Acoustically, breathy vowels have lower


intensity than modal vowels (Gordon 1998: 94). Since breathy voice only seems to be perceived and phonologically functional when it is phonetically realized during sonorants, not during stop closure (Ladefoged 2001: 124; cf. Kagaya and Hirose 1975) and not during fricatives (see above), we do not have to compare the relative perceptibility of breathy voice and frication as independent sound sources. These considerations motivate the following scale of sound source perceptibility:

(6) Source scale
    no source (NS) < turbulence only (TS) < breathy voice (PS+GS) < modal voice (PS)

3.2. Aperture scale

The aperture parameter is relatively familiar, though some of the more finely-graded proposals are difficult to support experimentally (see section 7). A relatively coarse scale is presented in (7). Our concern is the degree to which articulatory impedance attenuates the sound wave transmitted from a sound source further back in the vocal tract.

(7) Aperture scale
    explosive stops < voiced implosives < fricatives < nasals < liquids and glides < vowels

At one end of the scale we have stops, which involve maximum impedance of the airflow. In the case of voiced stops, this is airflow traveling up from the periodic glottal source; in the case of voiceless stops there is no systematic sound source during closure, which simply means that the impedance has blocked virtually all sound. Voiced implosives tend to have amplitude increase during the latter half of their closure – the opposite of what is seen in voiced explosive stops. This amplitude difference has been linked to faster transglottal airflow in implosives caused by cavity-resizing gestures (Cun 2005: 3; Ladefoged and Maddieson 1996: 84; Zeng 2008: 400) which help stave off oral air pressure buildup that would otherwise eventually cause voicing failure (section 5.3). Even the syllabic unreleased implosive reported in the Bantu language Hendo is shown with greater amplitude than that of a following voiced explosive stop, though in this unusual preconsonantal environment the implosive has amplitude decrease leading into the following consonant (Demolin, Ngonga-Ke-Mbembe and Soquet 2002: 4–8, 14; see section 5.3). Another difference with voiced ex-


plosive stops is that voiced implosives tend to raise the pitch of a following vowel (Odden 2007: 67–69). With respect to rising amplitude and possibly also higher pitch, then, voiced implosives appear to be acoustically more salient than voiced explosive stops. We will explore voiced implosives in greater detail later; for now, they are provisionally placed second in the aperture scale, with the idea that the amplitude increase resulting from their cavity-enlarging gestures is comparable to the higher amplitude typical of larger aperture in general, as we will see in the rest of this section. In the case of implosives with significant ingressive airflow at the moment of release, this burst of air might be considered a separate release source comparable to the egressive release burst of explosive stops. Reasons for not including release sources in the sound source scale have already been discussed, and as we will see later, the degree of ingressive airflow in the phonological category conventionally called implosives can be negligible or zero. After stops come fricatives, where the air is not completely blocked but is still so constricted in its escape that congested turbulence, or frication, results at the oral constriction point. This constriction thus considerably attenuates any sound transmitted by a lower source. Fricative constriction simultaneously constitutes a noise source of its own, but we accounted for this separately in the sound source scale above. Next we have nasals (m n etc.). In voiced nasals, the periodic sound source is significantly attenuated by dispersal through the sinus cavity, a complex maze of membranes which the air must navigate before any of it can reach the relatively small openings of the nostrils (see Mayo 2011). In spite of this attenuation, the nasal air egress path is not normally constricted enough at any point to cause a significant pressure increase comparable to that of obstruents; this allows nasals to transmit important acoustic cues like formants relatively well, albeit at reduced amplitude compared to liquids and glides (cf. Cser 2003: 33). Liquids and glides occupy the next part of the scale. Finally we have vowels, which are generally even less constricted, and higher in intensity, than liquids and glides (section 7). For more on the articulatory and aerodynamic traits of these classes of sounds, see Kenstowicz (1994: 36–37) and Wright (2004).

3.3. Laryngeal constriction

Laryngeal and epilaryngeal constrictions play different roles in noise source and aperture manipulation, some of which challenge traditional phonetic categories (Moisik and Esling 2011). For example, since laryngeal constriction and glottal spreading are controlled by different muscles, they are not mutually exclusive but can combine to generate hoarse whisper (Hirose 1997: 133), in contrast to


Of special phonological interest is creaky voice, where laryngeal constriction reduces the amplitude of the first harmonic relative to modal voice. Noting this, Stevens and Keyser (1989: 93) regard creaky voice as having phonetically less “sonorancy” than modal voice. Buckley (1992, 1994: 36–56) presents data suggesting that creaky voice is also phonologically less sonorous than modal voice. In Kwakw’ala (Wakashan: Canada; cf. Ladefoged and Maddieson 1996: 378), a coda consisting of a non-glottalized sonorant is moraic, while one consisting of an obstruent or /n̰/ or /l̰/ is not. In Kashaya (Pomoan: Northern California), onset /m̰ n̰/ surface as [b d], and /n̰/ behaves like coronal obstruents in debuccalizing before a coronal obstruent, while non-glottalized nasals do not show these behaviors.

This naturally provokes the question of whether creaky-voiced vocalic segments are also less sonorous than plain voiced ones. The classic proof would be a case where both types of sound occur adjacently and the former is assigned to a syllable margin while the latter is nuclear. Ladefoged and Maddieson (1996: 76–77) discuss an orally placeless consonant with both voicing and laryngeal constriction that occurs in Gimi (Papua New Guinea), where it contrasts with /ɦ/ and /h/ (the latter realized as [ɦ] between vowels). They provisionally symbolize the sound as /*/ and give four words to illustrate the contrasts, adapted in (8):

(8) Gimi creaky-voiced glottals
    a. /rahoʔ/ [ɹaɦoʔ] no gloss
    b. [haʔo] ‘shut’
    c. [ha*oʔ] ‘many’
    d. [hao] ‘hit’

Ladefoged and Maddieson (1996: 76) suggest calling /*/ “a creaky voiced glottal approximant” (glide). This appears to be the marginal segment we are looking for, the creaky-voiced counterpart of [ɦ]. Since the latter could also be phonetically symbolized as [V̤̯] (a non-syllabic breathy vowel), we may tentatively notate the Gimi creaky-voiced glide as [V̰̯], a clumsy but clearer alternative to the asterisk. Ladefoged and Maddieson (1996: 75) report a sound similar to this occurring as the realization of /ʔ/ in Lebanese Arabic. Similar realizations are reported in Newcastle English for what is elsewhere normally transcribed as [ʔ] (Docherty and Foulkes 1999). Creaky voice contrasting with a true glottal stop is also reported in some Northeast Caucasian languages (Moisik and Esling 2011: 1408).

Ladefoged and Maddieson (1996: 76) suggest calling /*/ “a creaky voiced glottal approximant” (glide). This appears to be the marginal segment we are looking for, the creaky-voiced counterpart of [H]. Since the latter could also be phonetically symbolized as [V ] (a non-syllabic breathy vowel), we may tentatively ¨“ notate the Gimi creaky-voiced glide as [V ], a clumsy but clearer alternative to ˜“ the asterisk. Ladefoged and Maddieson (1996: 75) report a sound similar to this occurring as the realization of /P/ in Lebanese Arabic. Similar realizations are reported in Newcastle English for what is elsewhere normally transcribed as [P] (Docherty and Foulkes 1999). Creaky voice contrasting with a true glottal stop is also reported in some Northeast Caucasian languages (Moisik and Esling 2011: 1408).


It is unclear whether creaky voice and hoarse whisper are better treated as sound sources or as laryngeally impeded relatives of voice and breath, so I have not included laryngeally constricted categories in either the source or the aperture scale. The phonetic and phonological evidence for creaky-voiced sonorants being less sonorous than modally voiced ones will be taken into account in the following section. Creaky voice will be mentioned again later when we return to implosives (section 5.3).

4. Complex sonority hierarchy

Combining the sound source and aperture scales produces a more complex hierarchy of sonority, or the intrinsic relative perceptibility of sound classes. The source and aperture scales are reproduced in (9–10) with internal numbering, ascending in the direction of increasing perceptibility:

(9) Source scale: 1) no source, 2) turbulence only, 3) breathy voice, 4) modal voice

(10) Aperture scale: 1) explosive stops, 2) voiced implosives, 3) fricatives, 4) nasals, 5) liquids and glides, 6) vowels

The combination of these two scales is mapped in (11). The following cover symbols are used: T voiceless explosive stop, D voiced explosive stop, Ɗ voiced implosive stop, S voiceless fricative, Z voiced fricative, N nasal, L liquid, V vowel. Glides specified with their own supraglottal constriction are included with liquids, while glides not specified with any constriction (h ɦ, as already discussed) are included with vowels. The categories N, L, and V are modally voiced unless marked otherwise (e.g. N̥ N̤). After each cover symbol, the source and aperture scale values shown in (9–10) are given in parentheses; for example, the highest entries in the two scales, 4) modal voice and 6) vowels, generate the modally voiced vowel entry V (4,6). Laryngeally constricted sonorants are unrated due to peculiarities discussed in the previous section. Association lines connect pairs of sound types where one member of the pair ranks higher than the other on at least one of the two scales in (9–10) and ranks lower on neither scale. For each pair of sound types connected by an association line, the line descends from the less sonorous sound type to the more sonorous one.
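To make this construction concrete, the two scales and the rated entries of (11) can be expressed in a few lines of code. The following Python sketch is purely illustrative; the identifiers (RATINGS, and ASCII stand-ins such as IMP for the implosive Ɗ or N_vl for the voiceless nasal) are expository inventions, not notation used elsewhere in this chapter:

    # Source scale (9) and aperture scale (10), keyed by ascending perceptibility.
    SOURCE = {1: "no source", 2: "turbulence only",
              3: "breathy voice", 4: "modal voice"}
    APERTURE = {1: "explosive stop", 2: "voiced implosive", 3: "fricative",
                4: "nasal", 5: "liquid/glide", 6: "vowel"}

    # Rated cover symbols with their ({source values}, aperture) ratings.
    # Z carries two possible source values (breathy or modal); creaky-voiced
    # sonorants are omitted because they are unrated in (11).
    RATINGS = {
        "T": ({1}, 1), "D": ({4}, 1), "IMP": ({4}, 2),
        "S": ({2}, 3), "Z": ({3, 4}, 3),
        "N_vl": ({2}, 4), "N_bre": ({3}, 4), "N": ({4}, 4),
        "L_vl": ({2}, 5), "L_bre": ({3}, 5), "L": ({4}, 5),
        "h": ({2}, 6), "hh": ({3}, 6), "V": ({4}, 6),   # hh stands for [ɦ]
    }

    # The two highest scale entries jointly generate the entry V (4,6):
    sources, aperture = RATINGS["V"]
    print(f"V ({max(sources)},{aperture}):",
          f"{SOURCE[max(sources)]}, {APERTURE[aperture]}")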


(11) Complex sonority hierarchy

     T (1,1)     D (4,1)     Ɗ (4,2)
     S (2,3)     Z (3/4,3)
     N̥ (2,4)    N̤ (3,4)    N (4,4)    N̰
     L̥ (2,5)    L̤ (3,5)    L (4,5)    L̰
     h/V̥ (2,6)  ɦ/V̤ (3,6)  V (4,6)    V̰̯

Most of the rankings follow a simple scheme: diagonal relationships from the upper right to the lower left show increasing sonority of sound source, from voiceless through breathy to modal voice (or, in the case of stops, from voiceless through voiced to non-explosive closures). Diagonal relationships from the upper left to the lower right show increasing aperture from stops through fricatives, nasals, liquids, and vowels. In the lower left region of the hierarchy, a third dimension incorporates creaky voice, in tune with the previous section. This aspect of the hierarchy may be incomplete; internal ranking by aperture among creaky-voiced sonorants (N̰ < L̰ < V̰̯) and rankings between these sounds and less sonorous ones have not been shown, due to lack of phonological evidence.

Unlike the other sound types, voiced fricatives are listed as having two possible source values (breathy or modal) because glottal abduction is considered possible but not distinctive in this case (section 3.1). The ranking between phonetically breathy-voiced fricatives and breathy-voiced nasals is the one shown by the association line; modally voiced fricatives and breathy-voiced nasals are unranked.

Several types of sounds are formally possible given the source and aperture scales in (9–10) but are ruled out by additional phonetic factors. When the sound source value is 1 (no source), the aperture value cannot be higher than 1 (stop), because the only way to create silence without ceasing the pulmonic activity that underlies normal speech is to stop the airflow.


When the source value is 2 (turbulence only), the aperture value cannot be lower than 3 (fricative), because turbulence does not seem perceptible enough during stop closure to serve any phonological purpose, as already mentioned.

The multi-dimensionality of (11) has some antecedents in the literature, discussed by Parker (2002: 79–84). The “strength hierarchies” of Lass (1984: 178) are two interacting scales: an aperture scale of stops < affricates < fricatives < vowels and /h/, and a “sonorization” scale with two values, voiceless and voiced. Lenition processes change inputs to outputs that rank higher in either aperture or sonorization, while fortition processes do the reverse. Gnanadesikan (1997) posits three ternary scales: vowel height (high, mid, low), stricture (stop, fricative/liquid, vocalic), and “inherent voicing” (voiceless obstruent, voiced obstruent, sonorant). She too focuses largely on applying her model to elucidating lenition processes, especially in the Celtic languages. The two parameters of voicing and oral aperture have long been objects of attention in Celtic linguistics, where different mutations affect the features [voice], [continuant], and [consonantal] (see Green 1997, 2003b). Neither Lass’s nor Gnanadesikan’s model is formally presented as a sonority scale, and the basic intuition of deriving sonority from multiple potentially conflicting parameters – hardly a striking idea, given the complexity of the vocal tract – occurred independently in the genesis of this chapter. Two aspects of this proposal which may be innovative are the formal consideration of the multiple sound sources provided by the vocal tract and the concept of clashing parameters causing formally unranked sonority relationships.

5. Unranked pairs

5.1. Definition

The complex sonority hierarchy in (11) was constructed using a simple principle: if x and y are types of segments, then x is more sonorous than y if x outranks y on the source or aperture scale (or both) and is outranked by y on neither. Where x outranks y on one of the two scales and is outranked by y on the other, the two are mutually unranked in sonority. The sonority hierarchy in (11) contains the following unranked pairs of sound types. The order in which the segments are listed in each pair does not matter.

(12) Unranked pairs
     DS, DN̥, SƊ
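Since this principle is mechanical, the unranked pairs can also be enumerated automatically. The following Python sketch is again purely illustrative (single-valued ratings are assumed, with Z set aside because of its double source value); it recovers the pairs in (12) together with the others that follow from the hierarchy:

    from itertools import combinations

    # (source, aperture) values from (11); identifiers as in the earlier sketch.
    RATINGS = {
        "T": (1, 1), "D": (4, 1), "IMP": (4, 2), "S": (2, 3),
        "N_vl": (2, 4), "N_bre": (3, 4), "N": (4, 4),
        "L_vl": (2, 5), "L_bre": (3, 5), "L": (4, 5),
        "h": (2, 6), "hh": (3, 6), "V": (4, 6),
    }

    def more_sonorous(x, y):
        """x outranks y on at least one scale and is outranked by y on neither."""
        (sx, ax), (sy, ay) = RATINGS[x], RATINGS[y]
        return sx >= sy and ax >= ay and (sx > sy or ax > ay)

    unranked = [(x, y) for x, y in combinations(RATINGS, 2)
                if not more_sonorous(x, y) and not more_sonorous(y, x)]
    # D/S, D/N_vl, and IMP/S all appear here: each member of such a pair
    # wins on one scale and loses on the other.
    print(unranked)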