Phonetic Transcription in Theory and Practice 9780748691012

The first book-length monograph to address all aspects of phonetic transcription The aim of phonetic transcription is t

English Pages 336 [328] Year 2013

Phonetic Transcription in Theory and Practice

Barry Heselwood

© Barry Heselwood, 2013
Edinburgh University Press Ltd
22 George Square, Edinburgh EH8 9LF
www.euppublishing.com
Typeset in Times by Servis Filmsetting Ltd, Stockport, Cheshire, and printed and bound in Great Britain by CPI Group (UK) Ltd, Croydon CR0 4YY
A CIP record for this book is available from the British Library
ISBN 978 0 7486 4073 7 (hardback)
ISBN 978 0 7486 9101 2 (webready PDF)
ISBN 978 0 7486 9102 9 (epub)
The right of Barry Heselwood to be identified as author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

Contents

List of Tables
List of Figures
Preface
Acknowledgements

Introduction

1 Theoretical Preliminaries to Phonetic Notation and Transcription
1.0 Introduction
1.1 Phonetic Transcription and Spelling
1.1.1 Logography and phonography
1.1.2 Sound–spelling correspondence
1.1.3 Speech, writing and the linguistic sign
1.1.4 Spoken and written languages as translation equivalents
1.2 Phonetic Symbols and Speech Sounds
1.2.1 Speech sounds as discrete segments
1.2.2 Complexity of speech sounds
1.2.3 Speech sounds vs. analysis of speech sounds
1.3 Phonetic Notation, General Phonetic Models and the Role of Phonetic Theory
1.3.1 Phonetic transcription as descriptive phonetic models
1.3.2 Phonetic transcription as data reduction-by-analysis
1.4 Content of Phonetic Models
1.5 Respelling as Pseudo-Phonetic Transcription
1.5.1 Transliteration as pseudo-phonetic transcription
1.6 Orthographic Transcription
1.6.1 Interpretation of spellings and transcriptions
1.7 Status and Function of Notations and Transcriptions

2 Origins and Development of Phonetic Transcription
2.0 Introduction
2.1 Representation of Pronunciation in Writing Systems
2.2 Phonographic Processes in Writing Systems
2.2.1 The rebus principle
2.2.2 Syllabography
2.2.3 The acrophonic principle
2.2.4 The notion ‘segment’ revisited
2.2.5 Subsegmental analysis
2.2.6 Diffusion and borrowing of writing systems
2.2.7 Anti-phonography
2.3 The Development of Phonetic Theory
2.3.1 Phonetic theory in the pre-Modern world
2.3.2 Phonetic theory in the Early Modern world
2.3.3 Phonetic terminology in the ‘English School’
2.3.4 Phonetic theory in the late eighteenth and nineteenth centuries
2.3.5 From correspondence to representation
2.3.6 Spelling reform

3 Phonetic Notation
3.0 Introduction
3.1 Organic-Iconic Notation
3.1.1 Korean Hangŭl
3.1.2 Helmont’s interpretation of Hebrew letters
3.1.3 Wilkins’s organic-iconic symbols
3.1.4 Bell’s Visible Speech notation
3.1.5 Sweet’s organic-iconic notation
3.1.6 The Passy-Jones organic alphabet
3.2 Organic-Analogical Notation
3.2.1 Wilkins’s analogical notation
3.2.2 Lodwick’s analogical notation
3.2.3 Sproat’s analogical notation
3.2.4 Notation for a voiced alveolar trill in Wilkins, Bell/Sweet and Passy-Jones
3.3 Analphabetic Notation
3.3.1 Jespersen’s analphabetic notation
3.3.2 Pike’s analphabetic notation
3.4 Alphabetic Notation and the Structure of Symbols
3.4.1 Pre-nineteenth-century alphabetic notation
3.4.2 Lepsius’s Standard Alphabet
3.4.3 Ellis’s palaeotype notation
3.4.4 Sweet’s romic notation
3.4.5 IPA notation
3.4.6 Extensions to the IPA
3.4.7 IPA Braille notation
3.4.8 Pitch notation

3.4.9 Notation for voice quality and long domain categories
3.4.10 SAMPA notation
3.4.11 Notation for infant vocalisations
3.4.12 Using notations
3.5 Ordering of Components and Homography in Composite Symbols
3.6 Hierarchical Notation

4 Types of Transcription
4.0 Introduction
4.1 Specific and Generic Transcriptions
4.2 Orientation of Transcriptions
4.3 Broad and Narrow Transcriptions
4.4 Systematic and Impressionistic Transcriptions
4.5 General Phonetic Transcription
4.6 Phonemic Transcription
4.7 Allophonic Transcription
4.8 Archiphonemic Transcription
4.9 Morphophonemic Transcription
4.10 Exclusive and Inclusive Transcriptions
4.11 Dynamic Transcription
4.11.1 Parametric transcription
4.11.2 Gestural scores
4.11.3 Intonation and rhythm
4.12 Instrument-Dependent and Instrument-Independent Transcriptions
4.13 Transcriptions as Performance Scores
4.13.1 Nonsense words
4.13.2 Transcriptions as prescriptive models
4.13.3 Spelling pronunciation
4.13.4 Active and passive readings of transcriptions
4.14 Third Party Transcriptions
4.15 Laying Out Transcriptions

5 Narrow Impressionistic Phonetic Transcription
5.0 Introduction
5.1 Pressure-Waves, Auditory Events and Sounds
5.2 The Auditory System and Auditory Perception of Speech
5.2.1 Just noticeable differences
5.3 Perception of Speech
5.4 Is Speech Processed Differently from Non-Speech Stimuli?
5.5 The Issue of Consistency
5.6 The Issue of Veridicality
5.7 The Content of Perceptual Objects
5.8 The Objects of Analysis for Impressionistic Transcription
5.9 Phonetic Judgements and Ascription
5.10 Objections to Impressionistic Transcription

5.11 Who Should Make Impressionistic Transcriptions?
5.12 Conditions for Making Transcriptions
5.13 Comparing Transcriptions and Consensus Transcriptions
5.14 Are Some Kinds of Data Harder to Transcribe Than Others?

6 Phonetic Transcription in Relation to Instrumental and Other Records
6.0 Introduction
6.1 Instrument-Dependent Transcriptions
6.1.1 Instrument-determined transcriptions
6.1.2 Instrument-informed transcriptions
6.2 Functions of Instrument-Dependent Transcriptions
6.2.1 Annotating function
6.2.2 Summarising function
6.2.3 Corpus transcriptions
6.3 Indexed Transcriptions
6.4 Impressionistic Transcription and Instrumental Records
6.5 Phonetic Domains, Phonetic Theory and Their Relations
6.5.1 Articulatory domain
6.5.2 Aerodynamic domain
6.5.3 Acoustic domain
6.5.4 Auditory domain
6.5.5 Perceptual domain
6.5.6 Phonetic categories as domain-neutral
6.6 Multi-Tiered and Multilayered Transcriptions

7 Uses of Phonetic Transcription
7.0 Introduction
7.1 Transcription in Dictionaries
7.2 Transcription in Foreign Language Learning and Teaching
7.3 Transcription in Phonetics Learning and Teaching
7.4 Transcription in Speech Pathology and Therapy
7.5 Transcription in Dialectology, Accent Studies and Sociophonetics
7.6 Transcription in Conversation Analysis
7.7 Transcription in Forensic Phonetics

Glossary
References
Appendix: Phonetic Notation Charts
IPA Chart Revised to 2005
Elaborated Consonant Chart from Esling (2010)
ExtIPA Chart Revised to 2008
VoQS Chart 1994
IPA Braille Chart 2009
Index

List of Tables

Table 1.1 Types of writing-system units and their corresponding pronunciation units
Table 1.2 Separate letters corresponding to front and back allophones of /ɡ/ in written Azeri
Table 2.1 Consonantal manner terminology in the ‘English School’ of phonetics in the sixteenth and seventeenth centuries
Table 3.1 Examples of Jespersen’s notation for phonetic categories
Table 3.2 Conventions for interpreting Pike’s analphabetic notation for [t]
Table 5.1 Pressure-waves, auditory events and sounds
Table 5.2 Alignments of variant transcriptions
Table 5.3 Comparison of variant transcriptions and what they have in common

List of Figures

Figure 1.1 Two views of the relationship between language, speech and writing
Figure 1.2 Classification of notation in writing
Figure 1.3 The relationship of phonetic transcription to language
Figure 1.4 Correspondences and equivalences between expression-forms in translations
Figure 1.5 Segmentation of So does she keep chickens? into acoustic classes
Figure 1.6 Categories, dimensions and models in a small, two-dimensional, abstract taxonomic space
Figure 1.7 The mapping of speech phenomena onto a theoretical model creates a descriptive model
Figure 1.8 Transliteration as pseudo-transcription and respelling
Figure 1.9 Classification of phonetic notation and transcription in terms of status
Figure 2.1 Units used for spelling the written signs of language A are used for representing the pronunciation of spoken signs in language B
Figure 2.2 Late twelfth- or early thirteenth-century vocal tract diagram entitled Sūrat makhārij al-hurūf ‘Picture of the outlets of the letters’ from Miftāh al-‘Ulūm ‘The Key to the Sciences’ by Al-Sakkāki
Figure 2.3 (a) Robinson’s ‘scale of vowels’ diagram of 1617; (b) Bell’s ‘scale of lingual vowels’ of 1867 with his Visible Speech symbols; (c) Jones’s drawings of cardinal vowel tongue positions of 1918, based on X-ray photographs
Figure 2.4 Wallis’s 1653 sound chart ‘Synopsis of all letters’
Figure 2.5 Wilkins’s sound chart of 1668
Figure 2.6 Holder’s table of consonants (left) and ‘scheme of the whole alphabet’ (right)
Figure 3.1 Articulatory configurations motivating the Hangŭl letters



Figure 3.2 Helmont’s diagram of Hebrew bēth (left) and his vocal tract diagram (right)
Figure 3.3 Wilkins’s organic alphabet and articulatory diagrams of 1668
Figure 3.4 Bell’s vocal tract diagrams for consonants and vowels
Figure 3.5 Sweet’s (1906) organic symbols for (a) consonants and (b) vowels
Figure 3.6 The Passy-Jones organic alphabet
Figure 3.7 The analogical symbols of Wilkins
Figure 3.8 The analogical symbols of Lodwick with a transcription of the Lord’s Prayer
Figure 3.9 Sproat’s analogical symbols for consonants
Figure 3.10 Organic symbols for a voiced alveolar trill
Figure 3.11 Structural classification of alphabetic phonetic symbols with examples
Figure 3.12 Vowel symbols of Iceland’s ‘First Grammarian’
Figure 3.13 Hart’s new letter-shapes
Figure 3.14 EPG frames showing simultaneous central and lateral channels for airflow during (a) [lsˁ] in the word θˡˁaim ‘pain’ (Al-Rubū‘ah dialect), (b) [lzˁ] in the word ðˡˁahr ‘back’, and (c) [lzˁ] in the word ðˡˁabʕ ‘hyena’ (Rijāl Alma‘ dialect)
Figure 3.15 Halliday’s use of musical staves to show pitch dynamics in speech
Figure 3.16 Consonant chart from Canepari (2005: 168)
Figure 4.1 Steele’s (1775: 47) adaptation of musical notation
Figure 4.2 Overlapping but distinct sets of allophones of /d/ and /b/ at an assimilation site
Figure 4.3 Dynamic transcriptions in Pike’s ‘sequence diagrams’ for (a) [abop] and (b) [zʒɣn]
Figure 4.4 Parametric transcription of Good morning
Figure 4.5 Gestural score for palm
Figure 4.6 Steele’s transcription of a ‘bombastic’ manner of reciting lines from Thomas Leland’s Orations of Demosthenes
Figure 4.7 (a) F0 trace; (b) orthographic transcription with accent and tone marking; (c) interlinear tonetic transcription with iconic representation of pitch height, accentual prominence, and pitch movement; (d) ToBI transcription
Figure 4.8 Relations between speech, instrumental records and transcriptions in instrument-determined, instrument-informed and instrument-independent transcriptions
Figure 5.1 The human auditory response area
Figure 5.2 Korean ‘denasalised’ alveolar stop, with IPA symbol alternatives, from the phrase miguŋ nodoŋ ‘American labour’
Figure 6.1 Praat waveforms, spectrogram and labelled text grids for segmentation and annotation

Figure 6.2 Spectrogram of a dragonfly with aligned multi-tiered transcription showing segment overlap
Figure 6.3 Palatographic frames showing onset, steady state and offset of a lateral articulation
Figure 6.4 Example of an annotated spectrogram and waveform incorporating measurement data
Figure 6.5 Acoustic and palatographic displays of Libyan Arabic /miʃ ɡdar/ ‘was not able to’ showing total overlap of alveolar and velar articulations and the release of /d/
Figure 6.6 Acoustic display of Libyan Arabic wagt ‘time’ with epenthetic [ə] separating /ɡ/ from /t/
Figure 6.7 Spectrogram, waveform, laryngoscopic images and spectrum (FFT and LPC) of the Iraqi Arabic word /saʕiːd/ ‘happy’ realised as [saˁʕ̆iːd]
Figure 6.8 Annotated waveform and spectrogram focusing on a particular realisation of English /t/
Figure 6.9 Intensity, Fx (pitch) and Qx (closed quotient) traces from an utterance of What are you talking about? annotated with ExtIPA, IPA and VoQS notation
Figure 6.10 Averaged FFT spectrum and laryngogram indexed to a specific transcription of the Arabic word /waʕʕad/ ‘to make someone promise’ showing voice quality features in the realisation of the geminate pharyngeal /ʕʕ/
Figure 6.11 Spectrograms indexed to a generic allophonic transcription of English lilt to show typical clear and dark allophones of /l/ with formant tracks
Figure 6.12 Multi-tiered transcription showing (A) signal-oriented transcription summarising acoustic records (spectrogram and speech waveform); (B) speaker-oriented transcription summarising an articulatory record (larynx waveform); (C) listener-oriented impressionistic transcription
Figure 6.13 Phonetic domains in a chain of cause and effect which map independently to phonetic categories
Figure 6.14 Domain-neutral theoretical model and domain-specific descriptive models
Figure 6.15 (a) Midsagittal vocal tract diagram representing generic physical articulatory space with IPA symbol [s] at the relevant place of articulation; (b) region of abstract articulatory space containing [s] as the product of category intersection
Figure 6.16 Vowel plot as a model of normalised acoustic space showing the grand mean distributions and standard deviations of the English dress, trap and strut vowels for different groups of speakers
Figure 6.17 Centroid for a token of [s]
Figure 7.1 Pages from Ellis’s SED fieldwork notes with IPA transcriptions

Preface

Why write a book on phonetic transcription? After more than half a century of major advances in instrumental phonetics which have rightly taken credit for broadening and deepening our knowledge of the structure of speech, it can appear to many that symbols and transcription have had their day. What, it might be asked, can [d] ever tell us that spectrograms, palatograms and the like cannot? If traditional transcription is not to fade away or be made the amanuensis of automated forms of analysis, then a case must be made for it on the grounds that it can express something which instruments cannot. Arguments need to be put against the view that there is nothing to be gained in phonetics by listening analytically to people speaking and transcribing what we hear. Marshalling the arguments provides the opportunity not only to examine critically the aims and methods of transcription but also to think about how phonetic symbols work in relation to phonetic theory on the one hand and phonetic data on the other; to consider, that is, the manner of their semiosis. This book attempts to address these issues and to place them in the context of the historical emergence of transcriptional resources from resources for writing language, the development of phonetic theory, and their coming together to make what I refer to as proper phonetic transcription possible. If any time and place can be identified as when and where the ideas for this book originated, it is nearly twenty-five years ago when I started teaching phonetics to speech and language therapy students at Leeds Polytechnic, later Leeds Metropolitan University. There were quite intensive practical phonetics classes and tests involving transcriptions of clinical as well as non-clinical speech samples which had to be marked.
Anyone who has had to transcribe difficult clinical speech data, and judge the accuracy of others’ transcriptions, might agree that there is nothing quite like it for making one realise that fair copies do not, and cannot, exist. And yet not all transcriptions are equally insightful. It was the knowledge, expertise and insightfulness of my then colleague Stephen Mallinson which showed me that the twists and turns of the transcription process which threaten to entrap one in endless indecision can be transformed from a maze of blind alleys into a labyrinth whose path, after leading you deeper into a chaotic world of sounds, leads you out again past a pleasingly ordered array of symbols
and diacritics. It is a transformation that only takes place once one has a thorough practical grasp of phonetics, a good understanding of phonetic theory in all its aspects, and the right balance of faith and doubt in one’s ability to make a good transcription: belief that it is possible, but uncertainty that one has ever quite managed to do it. I have been fortunate enough to collaborate over many years with Sara Howard on various aspects of phonetic analysis and transcription, benefitting greatly from her knowledge and experience, and finding her appetite and enthusiasm for intractable phonetic data a true inspiration. Much of the content of this book would hardly have been imaginable otherwise. The scope of the book has had to be limited to keep it within constraints of space and pressures of time. Consequently I have not looked at shorthand systems, despite their obvious relevance and historical contribution to the representation of pronunciation, on the grounds that they are not used by phoneticians for phonetic transcription and are not as independent of language-specific lexical, grammatical, phonological and spelling systems as phonetic notation aims to be. Transcription of non-speech vocal phenomena inseparably woven into spoken communication, such as laughter and sighing, has not been included although infant pre-speech vocalisations are briefly looked at. Transcriptional resources for other aspects of human communicative behaviours such as gesture, gaze and proxemics, and notation for discourse structure, have also been omitted as being outside the usual meaning of ‘phonetic’ as pertaining to the sounds of speech. Intonationists will probably be disappointed in the greater emphasis on segmental transcription, but one aim of the book is to bolster the legitimacy of segments as theoretically respectable elements of auditory-perceptual speech analysis and denotata for phonetic symbols.

Barry Heselwood
February 2013

Acknowledgements

Many people have indirectly influenced the content of this book, far too many to list. But I should like to mention, in alphabetical order, those whose direct advice and assistance, on small points or on larger issues, have been a help even if they were not aware of it at the time: Munira al-Azraqi, Michael Ashby, Martin Ball, Martin Barry, Helen Barthel, Monica Bray, Emanuela Buizza, Elena Coope-Bellido, Ian Crookston, James Dickins, Gerry Docherty, Martin Duckworth, Robert Englebretson, John Esling, Paul Foulkes, Tony Fox, Alaric Hall, Zeki Majeed Hassan, Sara Howard, Mark Jones, Miho Kamata, Pat Keating, Ghada Khattab, Maha Kolko, Young-Shin Kim, Rachael-Anne Knight, Sujuan Long, Michael MacMahon, Reem Maghrabi, Stephen Mallinson, Samia Naïm, Sue Peppé, Leendert Plug, Robin Le Poidevin, Rawya Ranjous, Raouf Shitaw, Mark Shouten, Fiona Skilling, Alison Tickle, Clive Upton, Gareth Walker, Juan Wang, Janet Watson, Dominic Watt, Frances Weightman, John Wells, Anne Wichmann; also all those, not already named, who attended meetings of the Phonetic Transcription Group in Leeds convened by Sara Howard and myself. Needless to say, they bear no responsibility for how I have used their advice and assistance, any errors and inconsistencies being entirely mine. I am also grateful to students who over the years have contributed their ideas in phonetic transcription classes, often noting things which I missed and raising issues I had not before thought about. Thanks also to David Thomas for agreeing to have his painting on the cover, to the Faculty of Arts and the School of Modern Languages and Cultures at Leeds University for funded sabbatical leave, and to colleagues in Linguistics and Phonetics for their much-valued support and collegiality.
I would also like to express gratitude to Gillian Leslie at Edinburgh University Press for her patience and advice in steering the book towards publication, and to Fiona Sewell for diligent copy-editing, Sue Lightfoot for compiling the index, and Rachel Arrowsmith for assistance with proof-reading. Last but very far from least, I am grateful to my wife and family for their forbearance while much of my time and attention was consumed in pursuit of completing this book.

(Stoop) if you are abcedminded, to this claybook, what curios of signs (please stoop), in this allaphbed! Can you rede (since We and Thou had it out already) its world? It is the same told of all. Many. Miscegenations on miscegenations.
James Joyce, Finnegans Wake

Introduction

Phonetic transcription is concerned with how the sounds used in spoken language are represented in written form. The medium of sound and the medium of writing are of course very different, having absolutely no common forms or substance whatsoever, but over the ages people have found ways to represent sounds using written symbols of one kind or another, ways that have been more or less successful for their purposes. This book aims to explore the history and development of phonetic transcription as a particular example of technographic writing and to examine critically the problems attending its theory and practice. A good many academic books include ‘theory and practice’ in their title, and I offer no apology for doing so in a work on phonetic transcription. Theory and practice have shaped the resources for transcription by pulling often in contrary directions through obedience to different priorities. Theory, being concerned with the logic and consistency of category construction, has made many attempts to impose itself on the design of phonetic notation systems, but practice has almost always rebelled, finding the demands of theory too inflexible and too forgetful of the practical need to make and read transcriptions with a minimum of difficulty. The failure of many proposed notation systems has illustrated that the only valid test for a notation is ‘practice, not abstract logical principles’ (Abercrombie 1965: 91). It is in phonetic transcription that theory and practice have to make compromises – practice must not ignore the rigour of theory or it will lose its accuracy of expression, and theory cannot afford to overlook the needs and constraints of practice or practitioners will lose patience with it. 
It might be objected that I have over-theorised in places, that we can get by perfectly well using symbols as imitation labels with attached definitions and be guided by professional intuition, but if we are to understand what we are really doing with notations and transcriptions and be able to justify them, then we do need to expose their theoretical foundations to critical scrutiny, and strengthen them if need be. It is as well to understand the tools of one’s trade conceptually and structurally if one can. The idea of representing something by means of something else is inherently problematic and contradictory but lies at the very heart of language itself. Phonological forms of words, themselves meaningless, are used in spoken
language to stand for meaningful things; likewise orthographic forms in written language. How is it possible for one thing to stand for, or represent, something else? If I write the word roses no roses appear on the page. Even a good painting of roses gets us no closer. We might be tempted to think that a photograph of an object is somehow a more faithful representation than a word-form or an artist’s painting, but there are still no roses on a photograph of roses; and it may, after all, have been plastic or paper roses which were photographed. In representations of sound there is the same absence of the thing represented. No sounds emanate from the notes on a musical score, or from a page of phonetic symbols. Phonetic notation, orthographic word-forms, crotchets and quavers, artists’ paintings and even photographs can only represent something by convention. Whatever means are developed for representing things, they have to be interpreted, and there has to be sufficient agreement on how to interpret them if they are to do their job. Phonetic theory is the source of interpretation for phonetic symbols and is what essentially distinguishes them from the characters used in written language; it is the difference, for example, between the phonetic symbol [b] and the alphabetic letter <b>. I have just said that representation works by one thing standing for something else, and yet it also has to stand for itself if it is to be recognised. The phonetic symbol [b] stands for a particular bundle of phonetic categories but it also stands for a type of graphic shape, or glyph, consisting of a bowl and an ascending stroke attached to the left of it, for without that shape it would not be recognised as that symbol. There is always, therefore, a self-signifying function in the figura (see Section 1.1.3) of any sign or symbol as well as a deictic function.
It is as if it is saying ‘Look at me, I look like this and I stand for that.’ Once we have recognised it, however, we need to forget the symbol and attend to that which it represents. The less distracted we are by the symbol itself, the easier this will be. But this conflicts with the commonly held, and on the face of it reasonable, belief that a good representation should resemble the thing it represents as faithfully as possible, which implies profusion of detail. At the head of his section on ‘Symbols’, Jespersen (1889: 12) quotes from Thomas Carlyle’s Sartor Resartus: ‘In a symbol there is concealement [sic] and yet revelation.’ A central purpose of this book is to try to understand what it is that remains in concealement and to explicate what it is that is revealed when we use phonetic symbols, and to show that much of this depends on the principles according to which symbols are constructed; furthermore, that this in turn is crucially dependent on phonetic theory. The inevitable circularity in these relationships means that a symbol as part of a notation system cannot tell us anything theoretical that we do not already know, but can in transcriptions tell us particulars which we do not already know, and that indeed is symbols’ ultimate purpose in transcriptions. For example, if someone tells me that such-and-such a variety of English realises final singleton /t/ as [h], I know nothing more about /t/ or [h] as phonological and phonetic entities, but I do now know more about that variety of English. It would be a mistake, however, to think I now know everything about the realisation of final singleton /t/ in that variety because [h] as a representation normalises for all kinds of variables, such as pharyngeal volume and tongue elevation, not considered by phonetic theory to be important in relation to [h]. Theory, therefore, determines
what a symbol reveals and what it conceals, and symbol design determines how its revelations are displayed. A practical solution to the inflexibilities and impracticalities of theory, and to the problem of overly detailed representation, is to acknowledge with Abercrombie (1967: 120) the advantages of arbitrariness in symbol systems just as Saussure acknowledged it in his theory of the linguistic sign. It is the arbitrariness of the relation between a word-form and its meaning that gives human language its extensive and enduring power to signify, and the same principle applies to a sophisticated symbol system such as phonetic notation. The seventeenth-century project to design a universal philosophical language failed to acknowledge this fundamental point, and so have iconic and analogical phonetic notation systems. In both cases, theories about the phenomena to be represented have dictated the forms of representation, the consequence being that, should the theory be revised, the forms of representation become obsolete. The same happens to phonographically reformed spellings when pronunciations change. Sweet’s first response to Bell’s Visible Speech organic notation recognised this weakness (Sweet 1877: 100–1), but it was not long before Sweet succumbed to the familiar delusion of every age, that things are now, at last, properly understood well enough. Writing only four years later, he declared himself a committed champion of Bell’s approach with the justification that ‘[i]f we impartially survey the whole field of phonetic knowledge, we shall see that the great majority of the facts are really as firmly established as anything can well be’ (Sweet 1881: 184).
One only has to call to mind a few of the many, many discoveries in phonetics over the course of the twentieth century, and the continuing additions to our knowledge and revisions to our theoretical frameworks as we make our way through the first quarter of the twenty-first, to see how wide of the mark Sweet was. Arbitrariness of symbols should not prevent us from appreciating their power to activate representations in the minds of those exposed to them and thus to appear, from a subjective point of view, to have a necessary connection with what they signify, becoming subjectively iconic in a Peircean sense. How many phoneticians trained in the cardinal vowel system can see [e] and not ‘hear’ cardinal vowel number 2, perhaps even hear Daniel Jones’s production on F natural in New Philharmonic pitch if they are familiar with his recordings, before starting to retrieve its IPA phonetic label? The iconic power symbols accrue, despite their logical arbitrariness, tends to protect and preserve symbol–denotatum relations, thereby conferring considerable stability on a notation system once it has been adopted, very much as with the spellings of written language. The relationship we have with symbols, as with written words, is more materially immediate than with what they signify, an insight which has led psychoanalysts such as Jacques Lacan to declare the primacy of the signifier from a psychological point of view in contrast to the logical parity of signifier and signified in Saussure’s conception of the structure of the linguistic sign (Benvenuto and Kennedy 1986: 24). Proposals to make changes to how things are symbolised have to be well founded and well argued to have a chance of success. That there is a certain irrationality in our psychological relations with symbols is evident if one asks how likely is it that anyone would seriously propose a swastika glyph for a new phonetic symbol. The world-wide success of IPA-style notation in the discipline of phonetics
rides on the near-universal familiarity among literate peoples with the basic stock of symbol shapes experienced through exposure to written forms of languages using roman alphabetic letters. This is true even of users of other writing systems such as Arabic, Chinese, Hindi and Thai, who can hardly escape the reach of roman-based writing systems. No doubt this is due in large part to the spread of English as an international language in the wake of political and economic influence and domination by English-speaking nations. Roman alphabetic letters themselves have come about through adaptation of letters by literate speakers of many different languages over millennia in a process which is quite accurately captured in Joyce’s phrase ‘miscegenations on miscegenations’. To regard IPA notation as historically misbegotten, however, does not mean we should regard it as unfit for its purpose. Its fitness or otherwise will be determined by the practical needs of phoneticians requiring resources for transcription. Whether this notation will meet the needs of future generations of phoneticians is something we cannot be in a position to know, but it is unlikely that they will not engage with the practicalities of transcription whilst continuing to theorise about phonetics, and either stick with the principles of the IPA and its notation or give birth to a new ‘miscegenation’.

1 Theoretical Preliminaries to Phonetic Notation and Transcription

1.0 Introduction

In this first chapter, a number of points of theory need to be clarified concerning both the relationship between spoken and written language, and the status of phonetic transcription as a particular kind of technographic writing for representing speech. In the course of clarification I hope to define proper phonetic notation and proper phonetic transcription, to distinguish them from the notion of a phonographic orthography, and to give theoretical expression to respelling and transliteration in relation to phonetic transcription. An issue of overriding importance throughout the book is what exactly phonetic symbols denote and what transcriptions represent. The issue is tackled largely from an assumption that the notion of a ‘segment’ is valid providing we take a sophisticated view of it as being rooted in the mental world of perception, not the physical world of measurable properties. Arguments for this position are put forward in Section 1.2.1 and returned to in Chapter 2 Section 2.2.4. Like the concept of the phoneme in phonology, the segment is often denied, but something remarkably like it seems to be reinstated quickly if only to provide a concept about which statements can be predicated.

1.1 Phonetic Transcription and Spelling

Much of the discussion of phonetic transcription in this chapter is concerned with the differences between transcription and spelling and thus between spoken and written language.¹ In any consideration of written language there has to be some account of the many different writing systems that have arisen in the relatively short time since written language first appeared around the end of the fourth millennium BCE. Writing also features prominently in Chapter 2, where the emergence of transcription out of phonographic processes in writing systems is traced. It will therefore be useful to outline briefly the main conceptual division of how writing represents language, that is to say whether its units represent meaningful words and morphemes (logography) or meaningless units of sound structure such as syllables, or consonants and vowels (phonography).² The division is based on Sampson (1985: 32–5).


1.1.1  Logography and phonography

Although none of the writing systems we know about are completely logographic, and few if any are completely phonographic, the distinction is a crucial one in principle. Logography means that a word or morpheme is written with its own character and contains no information about how the corresponding spoken word is pronounced. Words with identical or similar pronunciations may have entirely different written characters. In Chinese, for example, 握 ‘hold, grasp’ and 卧 ‘lie down’ are both pronounced [ˋwo] but the characters are silent about any phonetic similarity. By contrast, phonography means that each character corresponds to an expression unit of spoken language such as a syllable, a consonant or a vowel. Words with identical pronunciations will be written the same. The English words date (fruit) and date (calendar) are pronounced and spelt identically although they are clearly different lexical items synchronically and etymologically. While it is easy to see that logography has little to do with phonetic transcription, it is also easy to assume that phonetic transcription is a phonographic writing system, an assumption that has in fact been made by scholars of writing such as Sampson (1985: 33). I will explain below in Section 1.1.3 why I think this is a mistake.

The logography–phonography distinction is in practice more of a continuum when actual writing systems are analysed and we see both logographic and phonographic principles at work. For example, written Chinese is often held to be logographic (Sampson 1985: 145–71, but see DeFrancis 1989: 99–121, who argues it is morphosyllabic) but makes extensive use of phonography albeit in a rather opaque manner. Written English is more obviously phonographic but not all homophones are spelt the same – hair–hare, blue–blew, sight–site, moat–mote and so on. Even in Spanish, often cited as highly phonographic in its spellings, there are a few non-homographic homophones – for example vaca ‘cow’ and baca ‘roofrack’, both pronounced [ˈbaka], haya ‘beech tree’ and halla ‘(he/she) finds’, both pronounced [ˈaja]. In written languages the extent to which logographic and phonographic principles are in evidence in typical written texts varies so that some writing systems, such as Ancient Egyptian and Chinese, are more logographically oriented than others, and some, like Spanish and the Japanese kana syllabaries, more phonographically oriented than others. Processes of phonography in writing increase the orientation towards pronunciation and create resources which can be used for transcription as well as for spelling (see Chapter 2 Section 2.2).

A type of writing that manifests both logographic and phonographic features is what is sometimes called morphophonemic writing or morpho-phonography. English exhibits this category when morphemes are given invariant spellings despite variant phonological forms. The regular plural inflection, for example, has the phonological variants /-s, -z, -ɪz/ in spoken English but invariant <s> in written English, although of course <s> does not always spell the plural morpheme (see also Chapter 2 Section 2.2.7 and Chapter 4 Section 4.9).

1.1.2 Sound–spelling correspondence

Following common practice, I shall refer to relationships between elements of writing and elements of pronunciation as correspondences. It will be useful first, and in preparation for discussions in later sections and chapters, to summarise and exemplify the different kinds of units in writing systems that can be put into correspondence with units of pronunciation. Daniels (1996: 4, 2001: 43–4) proposes six fundamental kinds of characters in writing systems, distinguished by their relationships of correspondence to units of pronunciation in spoken language, and which cannot be further analysed into components having their own correspondences.

Logosyllabograms (or morphosyllabograms) are units that function in written language to spell whole words or morphemes but which also correspond to discrete syllables in spoken language if, in the language in question, words are typically monosyllabic as is the case in Chinese. The character 撒 ‘to scatter’ spells the whole written word and the spoken language equivalent is pronounced [ˇsa]. The character can therefore be said to correspond to the pronunciation-form [ˇsa]. A syllabogram is a unit of writing that corresponds to a discrete syllable in speech and which is used for spelling any words whose spoken equivalents contain that syllable regardless of meaning. The characters of an abjad, or consonantary, correspond only to consonants in spoken language while those of an abugida correspond to a consonant-plus-vowel sequence. Vowels in abugidas correspond to systematic additions to a base consonant character which on its own often represents a consonant plus /a/ as a kind of default vowel – an abugida is thus a vocalically augmented abjad. Note that an abjad can, as in Arabic, have optional diacritics corresponding to vowels whereas the vocalic augmentation in abugidas is obligatory. In an alphabet there are autonomous characters which can be put into correspondence with vowels as well as consonants. The final type is a featural system in which ‘the shapes of the characters correlate with distinctive features of the segments of the language’ (Daniels 1996: 4).
Written Korean is given as an example; Arabic and Hebrew pointing, and the nigori and maru diacritics in Japanese kana scripts, are also featural (see Chapter 2 Section 2.2.5). Table 1.1 presents examples of the six types.

TABLE 1.1: Types of writing-system units and their corresponding pronunciation units

Writing system and unit type                     Characters     Pronunciation units
Chinese logosyllabograms                         撒, 苏, 色      /ˇsa, ˉsu, `se/ (‘scatter’, ‘revive’, ‘colour’)
Japanese hiragana syllabograms                   さ, す, せ      /sa, su, se/
Arabic abjad consonant letter                    س              /s/
Amharic abugida consonant-plus-vowel letters     ሠ, ሡ, ሤ        /sa, su, se/
Spanish alphabet consonant and vowel letters     s, a, u, e     /s, a, u, e/
Korean featural feature letter                   ㅅ             [dental]ᵃ

ᵃ Sampson (1985: 124–5) calls this feature ‘sibilant’.
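As an informal restatement, the six-way typology can be written down as a small data structure. The following is only a sketch: the class name `UnitType`, its field names and the helper `scripts_for` are hypothetical labels introduced here for illustration, not terminology from Daniels, and the entries do no more than repeat the examples of Table 1.1 (the Korean letter shown is an assumption about the character intended in the table).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitType:
    name: str             # kind of character in Daniels' typology
    script: str           # example writing system from Table 1.1
    characters: tuple     # example characters from Table 1.1
    unit: str             # pronunciation unit the characters correspond to
    spells_morpheme: bool  # True if the character also spells a word or morpheme


TABLE_1_1 = (
    UnitType("logosyllabogram", "Chinese", ("撒", "苏", "色"), "syllable", True),
    UnitType("syllabogram", "Japanese hiragana", ("さ", "す", "せ"), "syllable", False),
    UnitType("abjad letter", "Arabic", ("س",), "consonant", False),
    UnitType("abugida letter", "Amharic", ("ሠ", "ሡ", "ሤ"), "consonant plus vowel", False),
    UnitType("alphabetic letter", "Spanish", ("s", "a", "u", "e"), "consonant or vowel", False),
    # Assumption: the Korean feature letter is taken here to be the basic sibilant letter.
    UnitType("featural letter", "Korean", ("ㅅ",), "distinctive feature", False),
)


def scripts_for(unit: str) -> list:
    """Return the example scripts whose characters correspond to the given unit."""
    return [t.script for t in TABLE_1_1 if t.unit == unit]


if __name__ == "__main__":
    for t in TABLE_1_1:
        print(f"{t.name:18} {t.script:18} -> {t.unit}")
    print(scripts_for("syllable"))
```

Keeping `unit` separate from `spells_morpheme` reflects the point that a logosyllabogram and a syllabogram correspond to the same size of pronunciation unit and differ only in whether the character also identifies a word or morpheme.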

‘Sound–spelling correspondence’ is a general term, neutral with respect both to type of writing-system unit, and to the size of the sound elements of speech. It is common to come across the term ‘grapheme–phoneme correspondence’ in literature dealing with reading and writing but there are problems with it. ‘Grapheme’ means different things in different theoretical approaches to writing systems, and ‘phoneme’ means different things in different phonological theories, the implications of which for phonemic transcription are considered in Chapter 4 Section 4.6.

Concerning ‘grapheme’, some writers follow Pulgram (1965) in using it for the minimal distributional element of writing in a given writing system whether this be a logogram, syllabogram or alphabetic letter. Others, such as DeFrancis (1989: 54), reserve the term for written characters that correspond systematically to minimal elements of sound in spoken language. The latter use brings its own problems in cases of so-called ‘silent’ letters, which occur frequently in, for example, English and French spelling. English made is spelt <made> and transcribed phonemically as /meɪd/ (or /mejd/). The final <e> can be regarded either as part of a discontinuous digraph <a…e> corresponding to the diphthong /eɪ/, or, as Venezky (1970: 50) advocates, as a diacritical letter telling us that the grapheme <a> in this context corresponds to /eɪ/, preventing made becoming mad. Similar problems attend the <b> in comb and climb. Daniels (2001: 66–7) favours ditching the term ‘grapheme’ altogether.

The notoriously many and contentious definitions of ‘phoneme’ in the phonological literature preclude review here (see Chapter 4 Section 4.6), but on a very general level the term can be understood as a distinctive consonant or vowel without regard for contextual (allophonic) variation. It is rare for the allophones of a phoneme to have separate corresponding letters but Azeri furnishes an example. In this Turkic language /ɡ/ has a front allophone before front vowels and a back allophone before back vowels. Azeri, at different periods, has been written using Arabic, Roman and Cyrillic letters and in each case the two allophones of /ɡ/ have had their own letter as shown in Table 1.2.

TABLE 1.2: Separate letters corresponding to front and back allophones of /ɡ/ in written Azeri (from Coulmas 1996: 30)

                   Roman    Cyrillic    Arabic
Front allophone    g        Ҝ           گ
Back allophone     q        Г           ق

The rarity of different allophones of a phoneme being in correspondence with different letters depends to some extent on how one does one’s phonological analysis. For example, many languages have vowel–glide pairs which are in complementary distribution, e.g. English [u] and [w], [i] and [j], and which have their own corresponding letters <w>, <y>. If these glides are regarded as non-nuclear allophones of vowels, then examples of allophone–letter correspondences may not be so hard to find.

Letters can correspond to what structuralist phonologists call an ‘archiphoneme’, which is the result of the neutralisation of a phonemic opposition in a particular phonotactic context. Trubetzkoy (1933/2001: 12 n.1) gives the following three examples. The three-way oppositions between voiced, voiceless and aspirated plosives in Ancient Greek were neutralised before /s/. Letters were invented to correspond to the sequence of the neutralised stop + /s/. For example, <ψ> corresponded to the sequence comprising the archiphoneme /P/, resulting from neutralisation of the /b–p–pʰ/ oppositions, plus a following /s/. A letter in the Avestan alphabet corresponded to an archiphoneme /T/ representing the neutralisation of /t–d/ in prepausal and pre-obstruent positions. The Devanagari script has a letter representing the archiphoneme resulting from the neutralisation of the nasals /m–n–ɳ–ɲ–ŋ/ before stops (see Bright 1996: 385). The correspondence of letters to archiphonemes is rather surprising because it demonstrates that whoever invented letters for that purpose realised that there was something different, not necessarily about the sound itself at that position in the phonotactic structure, but about its distinctiveness in that position. It attests to some conscious appreciation of distinctiveness as an abstract structural property of a system. Some writing resources have thus developed as a consequence of an analysis as deep, if not as detailed, as any in modern phonological theory.

By conceiving of the relationships between sound units of spoken language and graphic units of written language as relations of correspondence I am deliberately taking a non-representationalist view of written language. That is to say, I do not take the Aristotelian view (De Interpretatione 16a3) that writing represents speech (Figure 1.1a). I take instead the view, elaborated in Section 1.1.3, that language can be expressed in spoken and written forms but that its ontology as a system of lexis and grammar is equally independent of, and dependent on, both (Figure 1.1b). It is the purpose of phonetic transcription to embody an analysis of its spoken expression. A theoretical account of how it does so is outlined in Section 1.3.

FIGURE 1.1: Two views of the relationship between language, speech and writing: (a) that speech expresses language and writing represents speech; (b) that both speech and writing independently express language. The dotted arrow in (b) indicates that relations of correspondence can be set up between elements of speech and elements of writing.

1.1.3 Speech, writing and the linguistic sign

Resemblances between phonetic transcription and phonographic writing are obvious but potentially misleading. They are both forms of writing in the wider sense of graphic representations of some aspect of language, and they may even employ notation which is visually the same, but their purposes are quite different. Spelling uses notation to write items of lexis and grammar which by definition are language-specific, whereas phonetic transcription uses notation to write an analysis of pronunciation-forms using language-independent symbols. By a pronunciation-form I mean something pronounced, either real words of a particular language or nonsense words, looked at from a perspective which is neutral with respect to speaking and listening.

The general term I shall use for the elements of spelling is character (Coulmas 1996: 72), a term that includes logograms, syllabograms, the letters of consonantaries and abugidas, alphabetic letters and also punctuation marks. For the elements of phonetic transcription I shall use the general term symbol to include all resources for segmental, suprasegmental and parametric transcription, including diacritics. The term glyph is a superordinate term for characters and symbols and is useful for referring to the graphic form of a character or symbol. Figure 1.2 shows this classification of notation by purpose.

FIGURE 1.2: Classification of notation in writing. WRITING uses NOTATION (glyphs), which divides into SPELLING (characters: graphic resources for expressing lexis and grammar) and TRANSCRIPTION (symbols: graphic resources for expressing analyses of pronunciation).

The three attributes of a ‘letter’ discussed by Abercrombie (1949/1965) – figura, potestas and nomen – are applicable to symbols as well as characters. They obviously both have written shape (figura), and can be referred to by some kind of name (nomen), for example the names given to phonetic symbols in Pullum and Ladusaw (1996). What is meant by potestas ‘power, ability, value’ is not so straightforward. Abercrombie takes it to be the pronunciation, in which case there would be no difference between a character and a symbol, and indeed he points out that the term ‘letter’ has traditionally been ambiguous between written character and speech sound. It is perhaps more useful to interpret potestas as the value a character or symbol has in its contexts of usage, that is to say, its power or ability to distinguish one linguistic form from another; this interpretation seems to have been given to it by the Icelandic ‘First Grammarian’ in the twelfth century who took the littera doctrine from the Ars Grammatica of Donatus (Haugen 1972: 51–61). The value of a character is that it is a distinguishable unit of spelling, while the value of a phonetic symbol is its ability to express an analysis of a distinguishable unit of pronunciation (see Figure 1.2) or, to put it another way, to denote a model onto which a distinguishable unit of pronunciation can be mapped (see Section 1.3).

Because phonetic transcription is a form of writing, there is a temptation to think of it as an alternative way of spelling, one that is more faithful to pronunciation-forms than orthographies usually are, particularly in languages notorious for complicated sound–spelling correspondences such as English and French, or in languages that use writing systems which are more logographically oriented such as Chinese. This temptation is likely to be strengthened by the fact that most of the symbols of the IPA, currently the most commonly used phonetic notation system, are derived from roman alphabetic letters and have the same or similar shapes. But it is of fundamental importance to understand that phonetic transcription is not an orthography for the words and morphemes of any language. Its purpose is to express, in a language-independent notation, an analysis of pronunciation-forms.

There is also a widespread misunderstanding that the main purpose of spelling, especially in phonographically oriented writing, is to provide information about pronunciation, and that writing systems are defective to the extent that they cannot provide for one-to-one sound–spelling correspondences, and spellings are defective to the extent that they do not employ sound–spelling correspondences consistently and systematically.
While information about pronunciation can be gleaned from spelling with varying degrees of reliability, the primary purpose of spelling is to identify which words and morphemes are being written. The reader will generally already know how to pronounce the spoken form of those words and morphemes. As the philologist Max Müller expressed it using Isaac Pitman’s 1876 alphabet in the magazine Fortnightly Review, ‘[r]aitiŋ woz never intended tu foutograf spouken laŋgwejez’ (quoted in Baker 1919: 209). To appreciate these points and their implications more fully, it is necessary to consider briefly what a linguistic system is, and the relationship between spoken and written language.

There has been a long tradition, already alluded to in Section 1.1.2 above, stretching back to Aristotle in ancient Greece and persisting through to the writings of Saussure, that written language represents speech (Coulmas 2003: 2–12). The view still has currency, having been more recently expressed for example by DeFrancis (1989: 6–7) and Daniels (1996: 3). But challenges to this view have come from the recognition that spoken and written discourses have their own particular features such that the one cannot be seen merely as the transfer of the other into a different medium (Vachek 1945–9, 1973; McIntosh 1961; Pulgram 1965; Halliday 1985; Mulder 1994), and from theorising about the relationship between language, speech and writing. Critical perspectives on the relationship between spoken and written language are found in Harris (1986) and Olson (1994). For written language to be a representation of spoken language, concepts relating to linguistic structure such as ‘word’ and ‘syllable’, Olson argues, would already have to have been explicitly recognised before the invention of writing. Olson (ibid.: 68) proposes the reverse, that ‘awareness of linguistic structure is a product of a writing system not a precondition for its development’ (my italics). Olson’s claim, that linguistic structure is only accessible for analysis once language has a written form, may, however, be mistaken. A vigorous tradition of grammatical scholarship arose in India during the early centuries of the first millennium BCE culminating in descriptions of Sanskrit still regarded as exemplary linguistic analyses, for example Pāṇini’s Aṣṭādhyāyī ‘Eight Books’. It is very possible that these analyses were first carried out in the absence of literacy and were orally transmitted from memory, only later being set down in written form (Allen 1953: 15; Misra 1966: 19; however, for evidence of Pāṇini’s possible literacy see Bronkhorst 2002).

Whether Olson is correct or not, there is no logical precedence of spoken language over written language. While it is accepted that spoken language existed for tens of thousands of years before writing was invented, and that human beings acquire spoken language before learning to read and write, it is logically possible for there to be a written language without a corresponding spoken language. Words and morphemes, the basic abstract items of language that possess meaning and grammatical properties, are equally independent from sound and from visual marks, but without sound they cannot be spoken or heard and without visual marks they cannot be written or seen. The fact that phylogenetically and ontogenetically the linguistic harnessing of sound predates the linguistic harnessing of visual marks has little if anything to do with any intrinsic properties of lexis and grammar.
Explanation for these historical and developmental facts has to be sought in the evolution of cultural practices in human society (Trigger 2004) and the course of biological maturation in individuals from birth through infancy into childhood and beyond (Locke 1993). Because language was originally manifested only through speech, when language started to be written it might well seem as if it were speech that was being written.

The adaptation of Saussure’s concept of the linguistic sign in Figure 1.3 shows that the relationship of phonetic transcription to spoken language is not analogous to the relationship of spelling to written language. Saussure’s linguistic sign has two aspects (Saussure 1974: 65–7): the ‘signified’, which can be interpreted broadly as the meaning of the sign, and the ‘signifier’, which I will interpret as pertaining to the observable manifestation of the sign.³ The terms ‘content’ and ‘expression’ are often used instead of signified and signifier respectively. ‘Expression’ can be thought of as the clothing that a sign wears so that it can be recognised. In written language, spelling is the clothing while in spoken language it is the pronunciation. Phonetic transcription is a way of setting down in notation an analysis of what the clothing of spoken language is made of. An analogous description of what the clothing of written language is made of would be the naming of the characters used in the spelling of written signs. We can also, of course, name the symbols used in a phonetic transcription using, for example, the symbol names given in Pullum and Ladusaw (1996) and recommended, although not officially adopted, by the IPA (IPA 1999: 31, 166–84, 188–92). In doing so, we are treating a transcription symbol as a sign whose content is its phonetic definition and whose expression is a glyph, that is to say the glyph is the ‘spelling’ of the sign. The point is that, unlike spelling, phonetic transcription does not express linguistic-semantic meaning; it expresses an analysis of pronunciation. For example, the IPA transcription [ˈtʰeɪbəɫ] does not express the same as the spelling <table> – the latter expresses the word table whereas the former comprises symbols which express categories such as aspirated alveolar plosive, close-mid front closing diphthong, etc.

FIGURE 1.3: The relationship of phonetic transcription to language. LANGUAGE is expressed in SPOKEN LINGUISTIC SIGNS (content; expression: pronunciation using speech sounds) and WRITTEN LINGUISTIC SIGNS (content; expression: spelling using characters); phonetic transcription uses symbols to express an analysis of pronunciation.

When spellings for a written language become fixed and an orthography is established, the criterion for correct spelling is not how closely it matches pronunciation but whether the correct expression units, i.e. characters, have been used and are in the right sequence. Pronunciation can vary widely, and change over time, without affecting spelling. To take an example from current British English, the correct spelling of the word party is <party> whether or not the spoken word uses plosive [t], spirantised [s̝] or glottal [ʔ] to realise the /t/ phoneme, or just a hint of breathy voice in a pronunciation we can transcribe as [pʰɑː̤ɪ]; even if its pronunciation became homophonous with the word pie the identity of party would still be expressed in written English by its spelling as <party>.

Having said that phonetic transcription is not an alternative spelling system, it has to be pointed out that there are transcriptions which do have functions more like those of spelling, and may be considered a type of spelling. This is most true of representations of postulated invariant underlying forms in phonology, such as in morphophonemic transcription, which are discussed in Chapter 4 Sections 4.6 and 4.9.

To summarise, phonetic transcription embodies in a written form an analysis of the expression elements of spoken language by using symbols which have phonetic definitions drawn from phonetic theory. By contrast, spelling uses characters as the written expression of language. The characters themselves have no theoretical definitions.

1.1.4 Spoken and written languages as translation equivalents

It is justifiable to regard the relationship between the spoken and written forms of a language as a translation relationship (Mulder 1994: 54). To write a spoken word down, or to read out a written word, involves identifying equivalent items in two different systems in much the same way that translating from one language into another does (the difficulty, or even impossibility, of finding precise translation equivalents between languages does not affect the argument, nor am I necessarily claiming that written and spoken words are absolute equivalents within the same language). When literate translators translate between English book and French livre, six correspondences and equivalences between expression-forms are implicated (indicated by double-headed arrows) as shown in Figure 1.4.

FIGURE 1.4: Correspondences and equivalences between expression-forms in translations (spoken and written forms of English book and French livre).

The expression-forms of English are completely different from the equivalent expression-forms of French; it is (near-)equivalence of meaning that connects them all. The same is true if we look only at English or only at French. Ignoring the visual similarity of characters and symbols, the spoken expression-form [bʊk] and the written expression-form <book> have nothing in common as expression-forms: the former is a pronunciation-form, the latter a spelling-form. Their only connection is via the abstract lexical item book for which they are both expressions in different media.

It is worth pursuing this point a little further by looking at logographic writing, where the translation nature of spoken language and written language relationships is more obvious. The Chinese logogram for ‘below’ is <下> while the spoken form of the word is [ˋɕiɛ].⁴ No properties of the one in any way suggest any properties of the other any more than properties of the English spelling-form <book> suggest the French pronunciation-form [livʁ], or properties of the French spelling-form <livre> suggest properties of English [bʊk]. The Saussurean doctrine of the arbitrariness of the linguistic sign holds sway over all these relationships of cross-linguistic equivalence and cross-medium correspondence.

Insight into these relations, and into the question of whether writing is used to represent speech, is provided by the phenomenon of xenography (from Greek ξένος ‘stranger’), also called heterography. A xenogram, or heterogram, is a loanword written in the spelling of the donor language but pronounced as the spoken translation equivalent in the borrowing language. An example would be if English were to spell book as <livre> but read it aloud as [bʊk]. The French spelling would provide no information about the English pronunciation but would identify the lexical item in logographic fashion. The similarity to translation is apparent when we see that xenography is the exploitation of the relation shown in Figure 1.4 between [bʊk] and <livre>. Xenograms have occurred here and there throughout the history of writing in situations of language contact and the borrowing of writing systems. Coulmas (1996: 564, see also Gelb 1969: 105–6, where they are referred to as allograms) mentions Sumerian spellings being used to correspond to Akkadian pronunciations (sometimes called Sumerograms), Aramaic spellings corresponding to Middle Persian pronunciations (sometimes called Aramaeograms; see also Skjærvø 1996: 517–20), and Chinese characters corresponding to Japanese pronunciations in Japanese kanji. Xenography shows that the only absolutely crucial correspondences between written and spoken language are at the level of lexis and grammar.
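The count of six relations among the four expression-forms of book and livre, and the place of a xenogram among them, can be checked with a short sketch. It is illustrative only: the names `FORMS`, `relations` and `kind` are hypothetical, and the label given to pairs that differ in both language and medium is an assumption made here, not a term used in the text.

```python
from itertools import combinations

# Four expression-forms of one lexical item, keyed by (language, medium).
FORMS = {
    ("English", "spoken"): "[bʊk]",
    ("English", "written"): "<book>",
    ("French", "spoken"): "[livʁ]",
    ("French", "written"): "<livre>",
}


def relations(forms):
    """Return every unordered pair of expression-forms."""
    return list(combinations(sorted(forms), 2))


def kind(a, b):
    """Label a pair according to whether language, medium or both differ."""
    if a[0] == b[0]:
        return "cross-medium correspondence"
    if a[1] == b[1]:
        return "cross-linguistic equivalence"
    # Assumed label for pairs differing in both language and medium.
    return "cross-linguistic and cross-medium"


pairs = relations(FORMS)

# A xenogram exploits the pair (English spoken form, French written form).
xenogram = (("English", "spoken"), ("French", "written"))

if __name__ == "__main__":
    for a, b in pairs:
        print(FORMS[a], "<->", FORMS[b], ":", kind(a, b))
    print("xenogram pair:", FORMS[xenogram[0]], "written as", FORMS[xenogram[1]])
```

Nothing in the sketch relates the forms to one another except their shared key to a single lexical item, which mirrors the argument that the connection between them runs through lexis and not through any property of the expression-forms themselves.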

1.2 Phonetic Symbols and Speech Sounds

At first sight it may seem self-evident that what phonetic symbols denote are speech sounds. They are often talked of in this way, but there are three major difficulties to consider: the notion of a single discrete speech sound itself as an identifiable object, the indeterminate complexity of speech, and the problem of real-world extension.

1.2.1 Speech sounds as discrete segments

The notion of a single discrete speech sound, often referred to as a ‘segment’, is highly problematic in the context of spoken language. It has become commonplace in phonetics and phonology to regard the segment as a ‘fiction’ (Abercrombie 1965: 122, 1991: 30–1; Laver 1994: 568) and to stress the parametric nature of continuous speech, but the fictional status of the segment needs some critical discussion if we are not to fall into the trap of dismissing it as something devoid of any kind of reality. It is perfectly possible to produce an isolated steady-­state vowel sound such as [a], or nasal such as [m], or fricative such as [s], or lateral approximant such as [l], and quite feasible with some practice to produce isolated stops of various kinds with release bursts unaccompanied by vowels, such as [p]. These sounds can be produced by speakers and perceived by listeners, they are discrete, and they are every bit as materially real as speech. But we cannot meaningfully call them segments because they are not part of a larger item: the term ‘segment’ implies ‘segment of’ an articulated structure. When we look at the phonetic structure of speech we do not find it composed of discrete sounds strung together, in Hockett’s (1955: 210) simile, like beads on a string. The phenomenon of formant transitions nicely illustrates the problem of segmentation. Experiments in speech perception have shown that information about the place of articulation of a stop consonant is contained in the formants of adjacent vowels as they undergo changes in frequency caused by changes in


vocal tract shape. The presence of the transitions is enough to cause listeners to hear the stops. Formant transitions are, from an auditory-­perceptual perspective, part of the structure of the stops as much as they are part of the acoustic structure of the vowels. The resonant properties of the transitions are vocalic but they are encoding information about stops which are not vocalic. The form of the transition information, we might say, belongs to the vowels carrying it but the value of the information belongs to the stop articulations causing, and being perceptually cued by, the transitions. It is impossible to segment between the form and the value of the information. The ‘fiction’ that Abercrombie and Laver talk about comes from treating speech as if it were constructed from the kind of discrete vocal sounds which we know can exist outside of speech. But it does not take much to abstract sounds perceptually from speech and equate their qualities with the qualities of these discrete sounds, for example equating the vowel sound in the pronunciation of cat with an isolated [a]. We can then treat phenomena such as formant transitions as if they result from contextual influences on otherwise discrete and spectrally stable sounds. The fact that we can do this attests to some normalising and integrating processes in our perceptual and cognitive systems enabling us to identify segments in our perceptions (Repp 1981: 1462; Raphael 2005: 200–1; and see Chapter 5 Section 5.3) and to operate with the notion ‘segment’ as a pre-­theoretical model of the kind that may have facilitated the development of alphabetic writing. Postulated contextual influences on putatively discrete and stable segments are referred to in phonetics and phonology as ‘coarticulation’, a phenomenon which Laver points out is a further fiction necessitated by the fiction of the segment, an ‘antithetic error’ which Abercrombie (1989/1991: 31) sees as a case of enabling two wrongs to make a right. 
It needs to be appreciated, though, that in phonetics as in literature, fiction is not the same as fantasy. Analysing and describing speech in segmental terms, and transcribing it with discrete phonetic symbols, are based on a principled understanding of the structure of speech and how it can fruitfully be analysed, not on unbridled invention or naïve assumption. It may even parallel quite closely how we process the time-­varying speech signal in terms of stable percepts when we listen to it, rather than parallelling speech production processes (see Chapter 5 Section 5.3). Nonetheless, it is absolutely necessary to remember that symbols in a segmental transcription do not in themselves accurately reflect the temporal structure of speech as revealed instrumentally; readers of a transcription with sufficient knowledge of phonetics will not be misled into thinking that they do. Because we can analyse speech in terms of segments does not, and should not, commit us to the view that it is produced in terms of segments. One way to align segmental transcriptions with the temporal structure of speech is to exploit the fact that the acoustic signal can be segmented into discrete acoustic classes (Fant 1962: 11; Barry and Fourcin 1990: 32–3, 40). The prime acoustic classes in speech are silence, transience, aperiodicity and periodicity. Silence occurs in the structure of speech as the acoustic correlate of the articulatory hold phase of a voiceless oral stop; transience occurs when there is a sudden release of air pressure causing a single pressure pulse, for example the release




burst of a plosive; aperiodicity is found as a result of air being forced through a partial articulatory stricture under pressure to create the turbulence of fricatives characterised by the quasi-­random variation of frequency and amplitude; periodicity is characterised by regularly repeated pressure pulses of very similar frequency and amplitude resulting from vocal fold vibration, occurring in all voiced sounds. Acoustic classes can occur singly or in certain combinations. A voiced fricative, for example, combines aperiodicity and periodicity; a voiced plosive combines transience and periodicity. All in all we can set up six basic acoustic classes: four simple ones and two compound ones. The spectrogram and synchronised waveform of the utterance So does she keep chickens? in Figure 1.5 show how speech can be segmented into these acoustic classes. The phonetic transcription underneath is an approximate indication of how the classes relate to the phonetic structure of the utterance. Further acoustic subclasses could be set up by reference to spectral and amplitude discontinuities such as can be seen in Figure 1.5 at the points marked on the waveform by the arrows. Yet further subclasses could be established on the basis of the distribution of acoustic energy (see Turk, Nakai and Sugahara 2006 for discussion of criteria for acoustic segmentation). If a different symbol were to be assigned to each subclass then we could use symbols to express categories that occur discretely and objectively in speech as segments. The reason we do not do this may be partly because phonetic notation is still firmly rooted in the tradition of focusing on the articulatory domain of speech (MacMahon 1996: 821), but it is surely mostly because we would lose track of the linguistic-­phonetic information

FIGURE 1.5:  Segmentation of So does she keep chickens? into acoustic classes. s = silence, t = transience, a = aperiodicity, p = periodicity


which is distributed across acoustic class boundaries (see for example Fowler 1986: 11–13). This information is important because what phonetics is most interested in is not speech as a catenation of noises but speech as the pronunciation of language. There is experimental evidence that we perceive speech in ‘temporal compounds’ (Warren 2008: 198–9) which may contain many changes of acoustic class extending over at least a whole syllable and encompassing realisations of several phonemes, from which we can then ‘infer’ the presence of a segmental structure (see Chapter 5 Section 5.3). General phonetic categories are of interest because of how they can be put into relations with the phonological categories of spoken language. It is more fruitful to deal in phonetic categories that more closely match our phonological categories than in ones that refer only to acoustic classes. For example, it is useful if the symbol [d] can be interpreted to include formant transitions in adjacent vowels as well as a hold phase and a burst; all these phenomena and the acoustic classes in which they are embedded are relevant to [d] as the realisation of a phonological item /d/. They may well also be highly relevant to the stability of the auditory correlate of [d], despite considerable differences in formant transition patterns depending on the frontness or backness of adjacent vowels. Further discussion of the notion of a segmental speech sound, and a defence of its legitimacy in phonetic description, is presented in Chapter 2 Section 2.2.4.

1.2.2  Complexity of speech sounds

The second difficulty with the claim that symbols denote speech sounds is that, even in the case of an isolated steady-state sound, the processes and events going on are too numerous to identify. As Pike (1943: 152) has counselled, ‘no phonetic description, no matter how detailed, is complete’. Speech is a series of overlapping events taking place in articulation, aerodynamics, acoustic transmission, auditory reception and perception, which are interlocking domains connected in a chain of cause-and-effect relations of a complex and often non-monotonic kind. No transcription can ever hope to denote all the events in even one of these domains, never mind all of them, nor can any phonetician claim to know everything about them all. To take a very simple example, the vowel sound transcribed by the IPA symbol [ɑ] involves the following events (the list is by no means exhaustive):

1. In the articulatory domain: contractions and relaxations of the intercostal and abdominal muscles, contraction of various intrinsic laryngeal muscles, repeated opening and closing of the glottis, lowering of the jaw and tongue, and retraction of the tongue root into the pharynx.
2. In the aerodynamic domain: movement of air up the trachea, increases and decreases in subglottal air pressure, and jets of air releasing into the pharynx.
3. In the acoustic domain: rapid oscillations of countless air particles at thousands of different frequencies and amplitudes, and the formation of a standing wave in the vocal tract with pressure and velocity nodes.
4. In the auditory domain: rapid oscillations of the eardrum and the perilymph fluid, repeated stimulations of the hair cells in the inner ear, and repeated firings of many auditory nerves.
5. In the perceptual domain: awareness in consciousness of a sound having a particular pitch, timbre and loudness.

In transcription all these are distilled down to [ɑ], a static visual object, and it is far from clear how we should characterise the relationship between all these myriad events and a single symbol. We cannot describe or observe all the individual events, even by marshalling a whole battery of instrumental techniques. We cannot even know, at the lower levels of detail, how many events take place. If we claim that phonetic symbols denote sounds, then we have to admit that we do not fully know what it is they actually denote because we cannot fully know everything about sounds. This situation is not of course unique to phonetic notation but is shared by all forms of representation – whatever is represented, we cannot know everything about it. Our view of the thing represented is selective, shaped by properties of our perceptual and cognitive systems, by our experiences and by the purpose for which we wish to represent it, otherwise it would be an exact copy, like the map in Borges’s story ‘that was of the same Scale as the Empire and that coincided with it point for point’.5 Because phonetic symbols express an analysis of speech, and because we can only analyse things in terms of what we know about them, it follows that phonetic symbols cannot, at any one time, denote anything beyond the limits of what is known about speech at that time.
It is the role of phonetic theory to systematise our knowledge of speech by identifying the important parameters along which it varies to give rise to distinguishably different sound-types – place of articulation, degree of stricture, glottal settings and so on. It is from these parameters and parameter-values that phonetic theory constructs its models, and, as discussed in Section 1.3, it is these models that phonetic symbols denote.

1.2.3  Speech sounds vs. analysis of speech sounds

The third and final of the three serious difficulties attending any claim that phonetic symbols denote speech sounds concerns the problem of real-­world extension. The same problem is encountered by the claim that in language words directly denote things (Lyons 1977: 216). Suppose we did want to use symbols to denote actual speech sounds. We hear a sound si and denote it with the symbol σi. We then hear another sound, sj, which to our ears sounds the same as si, but we cannot use the symbol σi because that denotes si. Things soon get out of hand because of the sheer numbers involved. If symbols denote individual sounds


then each symbol must denote a sound produced at a particular time and place and no other. With this restriction all transcriptions would have to be specific transcriptions (see Chapter 4 Section 4.1 for the distinction between specific and generic transcriptions) and all transcriptions would have to be unique, just as, if words denoted individual things, there could be no generic reference, only specific reference. Furthermore, symbols in these conditions would only serve as substitutes for sounds, needed for no reason other than that sound cannot be put onto paper – we could instead carry round sacks of recordings in the manner of Swift’s Lagado professors.6 Symbols in transcriptions would therefore not be capable of embodying a phonetic analysis of the expression elements of spoken language because they would not be denoting theoretical categories, but would only denote specific non-­equivalent events (or sets of events); they would not even be embodying pre-­theoretical analyses of the kind required to judge that two things share some common property. Nor is it a solution to say that a phonetic symbol in a transcription denotes the set or class of sounds of which specific sounds are members. A set of sounds is a potentially ever-­growing collection of individual sounds simply giving us more and more of the same. It is only when we come to consider criteria for assigning sounds to sets that we start to get somewhere. If we assign sounds to the same set because they sound the same then we are indeed applying a pre-­theoretical analysis to recognise the similarity; our symbol can then denote this similarity. If we have a theory that can account for the similarity then we are applying a theoretical analysis and our symbol can denote the theoretical category or categories in terms of which we make that analysis. These issues are explored further in Section 1.3.

1.3  Phonetic Notation, General Phonetic Models and the Role of Phonetic Theory

The answer to the problems raised in Section 1.2 is to regard a system of phonetic notation as a system for denoting general phonetic models. Models are either theoretical or pre-theoretical. Theoretical models are generated by the categories of a theory whereas pre-theoretical models are abstractions from experience and more like prototypes in recognition memory (Johnson 2007: 30–2) or imitation labels. If we use a symbol in the absence of a phonetic theory then we have to find some way of defining the model it denotes without recourse to a theory. The alternative to a theoretical definition is an ostensive definition. What «b» denotes can be defined ostensively as the sound at the beginning of the spoken word bee.7 Phonetic theory plays no part in such a definition. Ostensive definitions can be refined into something more general and abstract by saying that «b» is what the spoken words bee, boot, bark and so on have in common. Ostensive definitions of this kind rely firstly on one’s having experienced the relevant spoken words, and secondly on one’s ability to notice and abstract the relevant similarity from them. Pre-theoretical phonetic models can therefore be defined in terms of the commonalities shared by members of sets of known pronunciation-forms. But there is a circularity here: the very phenomena one wishes to model are furnishing the models to be used for modelling them. Circularity is broken if we have an adequate phonetic theory to provide the definitions for our models




and the categories for their generation. What we think of as the sounds of speech are constellations of events whose complexity, as we have seen in Section 1.2, defies exhaustive description. To deal with this intransigence we theorise about the most salient identifiable events marking off one distinguishable sound from another and set them up as a network of interrelated theoretical categories, in the dimensions of an abstract taxonomic space. The intersections of these categories generate theoretical models as shown in a simple two-­dimensional space in Figure 1.6; it should be understood that in fact there is no limit to the number of dimensions that can be set up. The role of phonetic theory in relation to phonetic notation is thus crucial on two counts: it furnishes us with categories for the analysis of speech, and it enables us to set up these categories as models in a non-­circular way. In other words, it provides the denotata for phonetic notation. Part of the task of phonetic theory is to chart abstract taxonomic space by setting up the kinds of dimensions and categories that observable phonetic data can be mapped onto, to decide how many dimensions and categories are required, and to work out which categories can and cannot co-­occur.
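The generation of models from intersecting categories can be sketched in a few lines of Python. This is only an illustrative toy, not part of the account given here: the function name `generate_models` and the variable names `rows` and `columns` are assumptions made for the example, and the category labels are simply those of Figure 1.6.

```python
from itertools import product

# Category labels as used in Figure 1.6. Which set belongs to dimension x
# and which to dimension y does not matter for the sketch.
rows = ["c", "d", "e"]
columns = ["i", "j", "k"]


def generate_models(*dimensions):
    """Name one model for every intersection of one category per dimension."""
    return ["".join(combo) for combo in product(*dimensions)]


models = generate_models(rows, columns)
print(models)  # ['ci', 'cj', 'ck', 'di', 'dj', 'dk', 'ei', 'ej', 'ek']
```

Because the function accepts any number of dimensions, adding a further hypothetical dimension, for example `generate_models(rows, columns, ["p", "q"])`, simply multiplies the number of models, which mirrors the point that there is no limit to the number of dimensions that can be set up.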

DIMENSION y / DIMENSION x

|            | Category i | Category j | Category k |
| Category c | Model ci   | Model cj   | Model ck   |
| Category d | Model di   | Model dj   | Model dk   |
| Category e | Model ei   | Model ej   | Model ek   |

FIGURE 1.6:  Categories, dimensions and models in a small, two-dimensional, abstract taxonomic space

I shall call any notation system not underpinned by a phonetic theory ‘pseudo-notation’ and its symbols ‘pseudo-phonetic symbols’; a system of notation which is underpinned by phonetic theory I shall call proper notation and its symbols proper phonetic symbols. ‘Pseudo’ and ‘proper’ are not to be taken as value terms. The role of phonetic theory in relation to phonetic notation is therefore crucial. It is responsible for distinguishing between a proper notation which qualifies as a technographic writing system with a scientific basis (Mountford 1996: 628) on the one hand, and a pseudo-notation based on abstraction from experienced exemplars on the other hand. Commonly encountered forms of pseudo-transcription are respelling and transliteration (see Section 1.5 below). Any expression element from any glottographic writing system can be used to represent some aspect of pronunciation on the basis of correspondences between elements of a writing system and elements of pronunciation without phonetic theory playing any part. This is how the rebus principle arose and how phonography has gained ground in the diachrony of writing systems (see Chapter 2 Section 2.2). Phonetic theory is not a prerequisite in such cases; all


that is needed is an ability to make same-­or-­different judgements about pronunciation in a pre-­theoretical manner. If phonetically untrained literate English speakers hear a proper name they have not heard before and do not know how to spell it, they can try to write it using the letters of the English alphabet to represent the sounds that they identify. The result will be a pseudo-­transcription and the person will have used the letters as a pseudo-­notation system. It is important to understand that they will not thereby have spelled the name whether or not the result is the same arrangement of letters as the spelling. One can only spell a word if one knows the spelling. If one guesses a spelling, one does so via pseudo-­transcription – witness the idiosyncratic sound–spelling relations in proper names such as -­[ˈʧʌmli], -­[ˈkiːθli], -­[ˈsafəld] (from Wells 2008). In pseudo-­transcription, sounds will not have been identified through theoretically informed phonetic analysis, and therefore the transcription will not be expressing such an analysis. It does, however, express a pre-­theoretical analysis of the kind needed to make similarity judgements. Conversely, if a reader is presented with an unknown name in written form, the spelling can take on the properties of a pseudo-­transcription if the reader tries to extract information about its pronunciation. The key point about a proper phonetic transcription is that it expresses an analysis into theoretically defined categories. A pseudo-­transcription does express some kind of analysis, but into elements that are not theoretically defined. They will be known through ostensive definition which by its nature relies on experience, not on knowledge of theory. Compare, for example, the theoretical definition of [b] as ‘voiced bilabial plosive’ and the ostensive definition of «b» as ‘the first sound in the word bee’. 
Different kinds of knowledge are required to understand these definitions and different kinds of analyses are undertaken by applying them. Pseudo-notation is a set of graphic resources for expressing a pre-theoretical analysis of pronunciation, and pseudo-transcription is the deployment of a pseudo-notation to express a pre-theoretical analysis. Proper phonetic notation is a set of graphic resources for expressing a theoretically informed analysis, and proper transcription is the deployment of proper phonetic notation. Transliteration tends in practice also to be pseudo-transcription (see Section 1.5.1 below). The process by which one language borrows and adapts a writing system from another language involves pseudo-transcription in which the expression elements are transferred into the borrowing language as pseudo-notation (see Figure 2.1 in Chapter 2). A distinction needs to be made between graphic resources for notation being taken, on the one hand, entirely from an orthography and, on the other hand, being developed or created as a special phonetic system of notation. I shall call notation ‘proto-phonetic’ if it is based on phonetic theory but uses only orthographic resources. We therefore have three possibilities for the status of a phonetic notation system (see also Figure 1.9 in Section 1.7 below):

1. Pseudo-notation – denoting models not defined by phonetic theory; comprising orthographic characters which then take on the status of pseudo-phonetic symbols; enclosed in double angled brackets, e.g. «b».
2. Proto-notation – denoting models defined by phonetic theory; comprising orthographic characters which then take on the status of proto-phonetic symbols; enclosed in ornate parentheses, e.g. ﴾b﴿.
3. Proper notation – denoting models defined by phonetic theory; comprising a special notation system of proper phonetic symbols; enclosed in square brackets, e.g. [b].

The status of a transcription is defined by the status of the notation system in which it is written. The same glyph can be a spelling letter, a pseudo-phonetic symbol, a proto-phonetic symbol, or a proper phonetic symbol depending on the purpose for which it is used and how it is read and interpreted. The glyph ‘b’ can be used as the letter in spelling the English written words bat, blue, debt, climb, or as a pseudo-phonetic symbol «b» in transcribing a spoken word perceived to contain a sound that the spoken words bee, boot, bark etc. have in common, or as a proto-phonetic symbol ﴾b﴿ in transcribing a spoken word containing a sound analysed as a voiced bilabial plosive where the symbol comes from an orthography, or as a proper phonetic symbol [b] in transcribing a spoken word containing a sound analysed as a voiced bilabial plosive where the symbol comes from a phonetic notation system. A phonetic symbol can be defined as a glyph in relation with a phonetic denotatum. Proper symbols and proto-symbols can be defined formally as in (1.1) where R is a denoting relation:

(1.1)  Phonetic symbol = Glyph R theoretical phonetic model
       Example: [b] or ﴾b﴿ = ‘b’ R voiced bilabial plosive

Pseudo-phonetic symbols are glyphs in relation with non-theoretical denotata such as ostensive definitions based on commonalities as in (1.2).

(1.2)  Pseudo-phonetic symbol = Glyph R ostensive definition
       Example: «b» = ‘b’ R what bee, bat, crab have in common
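The definitions in (1.1) and (1.2) treat a symbol as a glyph paired with a denotatum, and this pairing can be illustrated with a small data structure. The following Python sketch is offered only as an informal aid: the class and field names (`PhoneticSymbol`, `theoretical`, `from_orthography`) are assumptions made for the example, and reducing the proto/proper distinction to a single flag is a simplification.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PSEUDO = "pseudo"   # denotatum is an ostensive definition
    PROTO = "proto"     # theoretical denotatum, orthographic glyph
    PROPER = "proper"   # theoretical denotatum, special notation glyph


# Enclosing brackets for each status, following the conventions in the text.
BRACKETS = {
    Status.PSEUDO: ("«", "»"),
    Status.PROTO: ("﴾", "﴿"),
    Status.PROPER: ("[", "]"),
}


@dataclass(frozen=True)
class PhoneticSymbol:
    glyph: str
    denotatum: str           # what the glyph stands in relation R to
    theoretical: bool        # is the denotatum defined by phonetic theory?
    from_orthography: bool   # is the glyph taken from an orthography?

    @property
    def status(self) -> Status:
        # Assumption for the sketch: any non-theoretical denotatum
        # makes the symbol a pseudo-phonetic symbol.
        if not self.theoretical:
            return Status.PSEUDO
        return Status.PROTO if self.from_orthography else Status.PROPER

    def __str__(self) -> str:
        left, right = BRACKETS[self.status]
        return f"{left}{self.glyph}{right}"


pseudo_b = PhoneticSymbol("b", "what bee, bat, crab have in common", False, True)
proto_b = PhoneticSymbol("b", "voiced bilabial plosive", True, True)
proper_b = PhoneticSymbol("b", "voiced bilabial plosive", True, False)

print(pseudo_b, proper_b)  # «b» [b]
```

The three objects share one glyph, and the proto- and proper symbols share one denotatum, which is the point made above: the status of a symbol depends on what the glyph is put in relation with and on the system it is drawn from, not on its shape.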

What distinguishes a proper symbol from a proto-symbol is that it is a member of a set of symbols which is not co-extensive with the set of orthographic letters used for spelling a written language. The IPA symbol [b] has systematic relations with symbols such as [ɓ] and [ʘ] which are not used for spellings; the letter has sequential relations with its neighbouring letters in the order of the alphabet. Proper symbols and proto-symbols denote analytic models whereas pseudo-symbols tend to denote holistic prototype models. Proper phonetic notation will not be as constrained as pseudo- and proto-notation by limits on the graphic resources available and on the number of distinctions among the sounds and parameters of speech that can be notated. It ought also to be less biased towards particular languages and types of languages, although language biases are probably always going to feature to some extent in transcriptional practice (see Chapter 5 Section 5.11). As Ladefoged (1990: 343–4) has pointed out, ‘[o]nce a language has been learned one is living in a room with a limited view. [. . .] Even skilled phoneticians will fail to recognise


auditory distinctions to which they are completely unaccustomed.’ It has to be acknowledged also that special systems of phonetic notation such as the IPA have in-built biases reflecting the linguistic context of their origins and development (see Chapter 3 Section 3.4.5). Once it is set up, phonetic theory generates its complex models from categories independently of experience. For example, the IPA chart generates the model ‘pharyngeal nasal’ from the categories ‘pharyngeal’ and ‘nasal’ although no such sound is possible, and therefore no symbol has been provided for it. Obviously, no such model could come about as a result of abstraction from experience because no such sounds will ever have been experienced. In so far as the models denoted by phonetic notation are constructed by phonetic theory independently of specific languages, they are general phonetic models. Phonetic symbols can be said to denote descriptive phonetic models when they are used in relation to language data in transcriptions, and to represent, or refer to, those phenomena which are mapped onto the general phonetic models.

1.3.1  Phonetic transcription as descriptive phonetic models

A phonetic notation system on its own denotes the categories and models in terms of which analyses of pronunciation can be made. When used in a transcription of speech data the theoretical models denoted by symbols become descriptive models through having observed phenomena mapped onto them (for the distinction between theory and description on which this approach is based see Mulder 1975). Transcribers have to judge whether the phenomena meet the criteria for being mapped onto a particular model (see Chapter 5 Section 5.9). The phenomena in question may be linguistic, in the sense of realising categories of linguistic structure such as phonemes or tones, or may be paralinguistic or extralinguistic – the only limitation is that they must be produced by the human vocal tract. They may also belong to any of the domains of phonetic phenomena, of which it is useful to recognise five: articulatory, aerodynamic, acoustic, auditory and perceptual (see Chapter 6 Section 6.5). At this point we need to distinguish between denoting on the one hand, and representing or referring to on the other hand, in relation to phonetic symbols. A descriptive model is the conjunction of a theoretical model which is denoted by the phonetic symbol, and certain speech phenomena which are mapped onto the theoretical model and which are referred to, or represented by, the symbol; these relations are diagrammed in Figure 1.7. Symbols in transcriptions are descriptive models. Whenever I talk about phonetic symbols representing sounds or referring to sounds in the ensuing sections and chapters it should be understood in the way just explained. In addition to representing and referring to sounds, symbols also express an analysis of them by virtue of the theoretical models they denote – the representing/referring capacity is extensional, while the analysis-­expressing capacity is intensional. 
That is to say, a potentially infinite number of referents can have one and the same analysis, or, in other words, an infinite number of descriptive models can relate to a single theoretical model.
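The relation between one theoretical model and its many descriptive models can be pictured as a one-to-many mapping. The Python sketch below is a loose illustration under stated assumptions: the names `TheoreticalModel`, `DescriptiveModel` and `transcribe` are invented for the example, the tokens are placeholder labels rather than real data, and the transcriber's judgement is reduced to a simple membership check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TheoreticalModel:
    symbol: str
    categories: frozenset  # intension: the categories defining the model


@dataclass(frozen=True)
class DescriptiveModel:
    theoretical: TheoreticalModel  # what the symbol denotes
    referent: str                  # the phenomenon the symbol represents


def transcribe(observations, model, meets_criteria):
    """Map each observation judged to meet the criteria onto the model."""
    return [DescriptiveModel(model, obs) for obs in observations if meets_criteria(obs)]


model_a = TheoreticalModel("a", frozenset({"low", "front", "unrounded", "vowel"}))

# Placeholder tokens; `accepted` stands in for the transcriber's judgement.
tokens = ["token 1", "token 2", "token 3"]
accepted = {"token 1", "token 3"}

described = transcribe(tokens, model_a, lambda token: token in accepted)
print(len(described))  # 2
```

Every item in `described` has a different referent but the same theoretical model, so the extension can grow without limit while the intension stays fixed.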



[Figure 1.7 diagram: the symbol [a] denotes the THEORETICAL MODEL [a] (low front unrounded vowel) and represents the PRONUNCIATION PHENOMENA (sounds judged to meet relevant criteria for the theoretical model [a]); a mapping relation links the phenomena to the theoretical model; together these form the DESCRIPTIVE MODEL.]

FIGURE 1.7:  The mapping of speech phenomena onto a theoretical model creates a descriptive model

Phonetic transcriptions, then, are composed of descriptive phonetic models. A phonetic transcription is a proper phonetic transcription if the descriptive models derive from pronunciation phenomena being mapped onto theoretical models and a special phonetic notation is used for writing it; it is a proto-phonetic transcription if it is written with orthographic characters; and it is a pseudo-phonetic transcription if the descriptive models derive from phenomena being mapped onto pre-theoretical models of what several pronunciation-forms have in common. Again it should be stressed that the terms ‘proper’, ‘proto’ and ‘pseudo’ are not value terms. Proper phonetic transcription is not intrinsically better than pseudo- or proto-transcription; how good a transcription is depends on how well it fulfils its aims and purposes. The differences are nevertheless very important and hinge on whether there is a body of phonetic theory underpinning the notation to provide it with consistent phonetic definitions, and whether the notation comprises a set of special symbols linked to the theory by interpretative conventions such as those of the IPA.

1.3.2  Phonetic transcription as data reduction-by-analysis

Representing the myriad events of continuous speech as a linear sequence of a relatively small number of stationary graphic objects, rather than being an unfortunate limitation, is precisely what makes transcription useful. It is a process of data reduction in which the transcriber tries to make static order out of a seeming dynamic chaos by analysing an utterance in terms of known phonetic categories. It can furnish us with a visual record of an analysis of a particular observed utterance by denoting the categories which, in the judgement of the transcriber, are the most appropriate ones for mapping the phonetic phenomena onto. Sounds as auditory events appear and disappear in an audio recording just as they do in live speech. Although it is possible to slow playback down without affecting the pitch of the speaker’s voice, the constantly changing signal makes it difficult to recognise recurring patterns in a speaker’s pronunciation of the kind a phonetician, dialectologist, sociolinguist, conversation analyst, speech pathologist or forensic phonetician might be interested in. Patterns can be seen much more easily in a transcription when the eye can scan the page at leisure. But a specific transcription does both more and less than arrest the sounds of


speech as they fly by. Whereas an audio recorder simply registers whatever hits the microphone, a transcriber has to make judgements about what hits his or her ear and make decisions about how to represent it. Inevitably during this analytic and interpretative process certain aspects of the raw speech signal will escape the transcriber’s notice, or be judged not worth including in the transcription. The transcriber’s own language background, and experience in doing transcription, will partly determine what escapes notice and what is judged relevant. In this sense, in addition to the impossibility of capturing all speech events, a transcription contains less than the utterance it purports to represent. That is to say, a narrow phonetic transcription could always contain more if more time and effort were spent on it, though one has to recognise the law of diminishing returns. On the other hand, a consideration of the theory-­dependence of transcription leads to the conclusion that in a crucial sense a transcription contains more than the raw utterance contains. It contains a classification, based on the categories of phonetic theory, of what the transcriber thinks are the relevant constituent parts of the phonetics of the utterance. Abercrombie makes precisely this point when he says ‘phonetic transcription records not an utterance but an analysis of an utterance’ (Abercrombie 1967: 127). This truth should never be overlooked when we think of phonetic transcription as a form of data reduction: the fact that it expresses a data analysis means that it is also data-­enhancing. This is the import of Thomas Carlyle’s observation that ‘[i]n a symbol there is concealement [sic] and yet revelation’. Phonetic transcription helps to make spoken language more available for further phonological analysis by, ironically, representing it in a written form. 
By so doing it does, to some extent, imprison it in ‘the written language bias’ that Linell (1982) saw in linguistics in general. For example, segmental transcriptions usually take the ‘word’ as the basic unit of utterance structure and employ the convention of bounding words with spaces despite the absence of spaces between the pronunciations of words in continuous speech. Parametric transcription is more faithful to speech in this respect. Nevertheless, weighing against this written language bias is the ability of phonetic transcription to capture aspects of the prosody of spoken language, and paralinguistic and extralinguistic features such as voice quality, tempo and loudness, most of which have no common parallels in written language. Although writing can use devices such as enlarged characters, changes of case and font, different colours and so on for emphasis and other effects, these are not systematic and are not all routinely employed outside of advertising and graphic design. By contrast, it is impossible for spoken language not to have voice quality, pitch, tempo and loudness, all of which are manipulated by speakers for communicative purposes of one kind or another. Any system of phonetic notation should provide resources for representing these kinds of features in transcriptions.

1.4  Content of Phonetic Models

Theoretical models belong to theories not to data. It follows that the content of a theoretical model cannot be of the same kind as the contents of data. In Chapter 6 Section 6.5 I propose that the categories of phonetic theory should be conceived of as neutral with respect to the domains of articulation, aerodynamics, acoustics, auditory processing and perception, despite the largely articulatory terminology of systems such as the IPA, so that phonetic symbols are independent of these domains whilst being interpretable within each domain through domain-specific conventions. That is to say, the theoretical categories of phonetics inhabit first and foremost taxonomic phonetic space, and inhabit specific domains by general phonetic conventions. What, then, is the content of the theoretical model denoted by, for example, the IPA symbol [b]? According to Principle 2 of the IPA (1999: 159), it is ‘voiced, bilabial, plosive’, the categories that intersect to generate the model. This is surely the correct way to define the content of a theoretical model so that it can be exhaustively defined, providing that we can maintain domain-neutrality. When we use the term ‘labial’, does it always and only refer to labial activity, that is to say is it confined to the articulatory domain? This question is taken up and discussed in Chapter 6 Sections 6.4 and 6.5 in relation to multi-tiered transcriptions in which each tier takes a different perspective on the data: speaker-oriented transcriptions take an articulatory perspective in which symbols have articulatory interpretations, signal-oriented transcriptions take an acoustic perspective in which symbols have an acoustic interpretation, and listener-oriented transcriptions take an auditory-perceptual perspective in which symbols need to be interpreted accordingly. Transcriptions expressing an interpretation of articulatory and acoustic records have to denote, respectively, articulatory and acoustic categories to be meaningful; likewise transcriptions expressing an auditory-perceptual analysis. ‘Labial’ from an acoustic perspective denotes negative formant transitions and whatever else is thought to be an acoustic correlate of ‘labial’.
In an impressionistic transcription ‘labial’ denotes auditory-­perceptual correlates – what labiality sounds like – and, importantly, from an articulatory perspective it denotes articulatory correlates rather than being defined exclusively in articulatory terms. That is to say, phonetic transcription is better served if phonetic categories are set up as domain-­neutral with domain-­specific correlates. Historically, phonetic categories have tended to be overwhelmingly articulation-­based, which has led to problems in making and reading transcriptions without direct access to articulatory data (Heselwood 2008b: 90–2). Exhaustive definition of a theoretical model does not entail exhaustive definition of a descriptive model. While it is true that what a symbol denotes is exhaustively determined by the structure of taxonomic phonetic space, what it represents, or refers to, is a mixture of known and unknown real-­world properties in whichever domain the transcription is oriented to. In the case of [b] the speech phenomena we map onto this model may have many unknowns, such as the position of the tongue-­tip, the volume of the buccal chamber, the tilt of the epiglottis, the height of the larynx and so on. Until we know everything about speech phenomena and can structure phonetic space so finely that no detail need ever be unaccounted for, a descriptive model in transcription will in a sense always represent more than it denotes. This means that the analysis expressed by the theoretical model is not an exhaustive analysis in so far as our knowledge of the speech phenomena in question is incomplete. That is to say, we must not mistake classifications for descriptions (O’Connor 1973: 125–8; Howard and

Heselwood 2013: 73–9). Our understanding of [b] as a descriptive model in a transcription depends not on knowing everything it is made of as a datum, but on knowing how it relates to other objects in taxonomic phonetic space along certain dimensions. The question of what something is made of is a question to be levelled at the speech phenomena which are mapped onto theoretical models, not at the theoretical models themselves. Phonetic instruments have a pivotal role when our ears cannot answer such questions. Their revelations can lead to the setting up of additional dimensions in abstract articulatory or acoustic space so that its structure becomes finer and more of the content of speech phenomena can be mapped onto models defined in that enriched space. Taking this view of the content of phonetic models allows us, I suggest, to accept Ladefoged’s (1990: 338) assertion that ‘the symbols are not symbols for phones; they are simply shorthand for what a phonologist would regard as a bundle of features’, whilst also accepting Ashby’s (1990: 23) rival claim that ‘they represent sound types’. Accommodation of these apparently conflicting positions is achieved if we take Ladefoged’s view to be true of the theoretical model denoted by a symbol in a notation system, and Ashby’s to be true of the descriptive models represented by a symbol in a transcription.
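
The distinction drawn in this section between what a symbol denotes and what it represents can be made concrete with a small data-structure sketch. It is only an illustration under stated assumptions: the class name, the function names and the correlate table are hypothetical, and the correlates listed are placeholders adapted from the remarks above on ‘labial’, not a worked-out phonetic theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TheoreticalModel:
    """What a symbol denotes: an intersection of domain-neutral categories."""
    symbol: str
    categories: frozenset


# Hypothetical, deliberately incomplete table of domain-specific correlates.
CORRELATES = {
    "bilabial": {
        "articulatory": "lip activity",
        "acoustic": "negative formant transitions",
        "auditory-perceptual": "what labiality sounds like",
    },
}


def interpret(model, domain):
    """Read each category of a model through one domain's conventions."""
    return {
        category: CORRELATES.get(category, {}).get(domain, "not listed in this sketch")
        for category in sorted(model.categories)
    }


def shared(a, b):
    """Categories two models have in common in taxonomic phonetic space."""
    return a.categories & b.categories


B = TheoreticalModel("b", frozenset({"voiced", "bilabial", "plosive"}))
P = TheoreticalModel("p", frozenset({"voiceless", "bilabial", "plosive"}))

# A descriptive model maps an observed token onto B. The unknown properties
# belong to the speech phenomena, not to what [b] denotes.
token = {"model": B, "unknown": ["tongue-tip position", "larynx height"]}

print(interpret(B, "acoustic"))
print(sorted(shared(B, P)))
```

The model itself stays the same whichever domain is chosen; only the reading supplied by `interpret` changes, which is the sense of domain-neutrality argued for above.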

1.5  Respelling as Pseudo-Phonetic Transcription

Respelling is a strategy, used in some monolingual and bilingual dictionaries and language teaching materials, for indicating pronunciation more accurately than the normal spelling does. Respelling uses orthographic conventions but regularises their correspondences with elements of pronunciation so that, as far as possible, the same character always corresponds to the same pronunciation element. The pronunciation elements they correspond to can be thought of roughly as phonemes, although usually no explicit phoneme theory is invoked for identifying them. A need for respelling is often felt when spelling has become standardised and fixed while pronunciation has continued to change. In such conditions sound–spelling correspondences become more opaque and irregular so that readers who do not know the pronunciation of the word cannot reliably work it out from the orthography. Respellings are a means of trying to re-establish more direct sound–spelling correspondences and maintain a transparently phonographic written language.

I will try now to characterise what respellings are from a theoretical point of view. The important question is whether respellings are best seen as a type of spelling or a type of phonetic transcription. This question in effect asks if they are expressions of written words, or analyses of the expressions of spoken words. Expressions of written words have the function of enabling the reader to recognise those words via their written form. Respellings, it could be argued, do not have this function because the word has usually already been identified by its normal spelling. A respelling could only clearly be said to have a word-identifying function if it were replacing the conventional spelling as part, for example, of a spelling reform programme. The purpose of the respellings we are considering is to give the reader a better idea of the pronunciation of the item than the normal spelling provides. But how far can a respelling be said to embody explicitly an analysis of the spoken form? The analysis embodied in a proper phonetic transcription relies on phonetic theory for its recovery. A reader with no knowledge of phonetic theory cannot recover that analysis. Yet some analysis of sound–spelling correspondences in the language has to have taken place to decide which letters should be used in the respelling. Analysis into sound-types of the kind required for phonographic writing systems is therefore presupposed. We can characterise this awareness of sound-types as pre-theoretical phonetic knowledge and characterise respellings as embodying a pre-theoretical analysis of pronunciation. Being pre-theoretical, it has no explicit classificatory framework within which to make its analysis, whereas proto- and proper phonetic transcriptions do. Respellings are in effect transcriptions made outside of any theoretical phonetic framework and qualify as pseudo-transcriptions as defined in Section 1.3 above. The orthographic resources used in respelling therefore take on the status of a pseudo-phonetic notation.

1.5.1  Transliteration as pseudo-phonetic transcription

Transliteration is defined by Coulmas (1996: 510) as the ‘one-to-one conversion of the graphemes of one writing system into those of another writing system’. It involves replacing the expression elements of written language signs with a different set of expression elements, e.g. writing English words using Arabic letters, or Hindi words using Japanese syllabograms. The English and Hindi words still have to be recognisable as English and Hindi words but they no longer wear their normal clothing because the spelling–sound correspondences of Arabic and Japanese have been transferred into the writing of English and Hindi. The conversion cannot proceed without reference to the pronunciation of both of the languages involved. Examination of an example of the English word boot transliterated into Arabic characters will make this clear. If someone with sufficient knowledge of English and Arabic is asked to transliterate English ⟨boot⟩ into Arabic characters they are very likely to write it as ⟨بوت⟩.8 There is nothing about ب to suggest it is the appropriate character to transliterate ⟨b⟩, and the same is true of the other characters. The characters are chosen not because of any intrinsic properties they have linking them to the English characters (although as it happens there may be distant historical links – see Gardiner 1916) but because they have correspondences with closely comparable phonemes in the two languages. The English letter ⟨b⟩ corresponds to the English phoneme /b/, exceptions such as debt and comb notwithstanding, and the Arabic letter ب corresponds to the Arabic phoneme /b/; the English digraph ⟨oo⟩ corresponds mostly to English /uː/, and Arabic و corresponds to Arabic /uː/ (also to /w/); English ⟨t⟩ corresponds to the English phoneme /t/ and in written Arabic ت corresponds to /t/. It is these correspondences that determine the form of the transliteration. In fact, there need be no reference to the English spelling at all.
When ب is used in writing English boot it is in effect a transcription of spoken English /b/. If it is carried out outside of a phonetic theory it is a pseudo-transcription which can then function as a respelling, or even as a first spelling if the language has not previously had a written form. This is the principal process by which writing systems are adapted for writing other languages, a process that has been repeated many, many times in the history of human literacy. Most transliteration, then, is a process of pseudo-transcription which can become established as a spelling or a respelling. That is to say, it can function as the expression of the written sign as well as expressing a pre-theoretical analysis of the corresponding spoken sign. The two functions, spelling and pseudo-transcription, will share the same glyphs unless and until the spoken sign is affected by pronunciation changes without corresponding changes in spelling.

The fact that the process we have been considering can be carried out on unwritten languages as well as on written languages rather shows the term ‘transliteration’ as we have used it so far to be a misnomer. This is clear also in cases of ‘transliterating’ logograms. Transliteration of a logogram into elements of a phonographic writing system can only be done through reference to the pronunciation in spoken language of the word represented in written language by the logogram. For example, the Chinese character 下 meaning ‘below’ is written as ⟨xià⟩ using the roman alphabet Pinyin system. The choice of which roman letters to use is not determined by any sound–spelling correspondences to be found in the relationship between spoken /ˋɕiɛ/ and written ⟨下⟩ (see Section 1.1.1). Clearly the choice of letters in the Pinyin spelling is determined instead by properties of the spoken form with reference to the kinds of sound–spelling correspondences that the letters take part in in languages that use roman letters. It is therefore really a case of pseudo-transcription functioning as a respelling. Figure 1.8 diagrams the process.

              Spoken sign    Pinyin written sign    Logographic written sign
Content       BELOW          BELOW                  BELOW
Expression    ˋɕiɛ           xià                    下

Arrow labels: ‘Transcribed as’ (spoken expression to Pinyin expression); ‘Respelled as’ (logographic expression to Pinyin expression).

FIGURE 1.8:  Transliteration as pseudo-transcription and respelling
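
The reasoning behind the boot example can be sketched as a two-step lookup in which the phoneme sequence, not the source spelling, selects the target characters. This is a minimal sketch, assuming toy correspondence tables that cover only the three correspondences discussed above; the table and function names are hypothetical.

```python
# Toy correspondence tables covering only the 'boot' example.
ENGLISH_GRAPHEME_TO_PHONEME = {"b": "b", "oo": "uː", "t": "t"}
PHONEME_TO_ARABIC = {
    "b": "\u0628",   # Arabic letter ba
    "uː": "\u0648",  # Arabic letter waw
    "t": "\u062A",   # Arabic letter ta
}


def segment(word, graphemes):
    """Split a spelling into graphemes, trying longer graphemes first."""
    candidates = sorted(graphemes, key=len, reverse=True)
    result, i = [], 0
    while i < len(word):
        for g in candidates:
            if word.startswith(g, i):
                result.append(g)
                i += len(g)
                break
        else:
            raise ValueError(f"no grapheme matches at position {i} of {word!r}")
    return result


def phonemes_from_spelling(word):
    """Step 1 (optional): recover phonemes from the English spelling."""
    return [ENGLISH_GRAPHEME_TO_PHONEME[g]
            for g in segment(word, ENGLISH_GRAPHEME_TO_PHONEME)]


def write_in_arabic(phonemes):
    """Step 2: choose Arabic letters by phoneme correspondence alone."""
    return "".join(PHONEME_TO_ARABIC[p] for p in phonemes)


# Starting from the spelling and starting from the spoken form give the same
# result, because only the phonemes determine the choice of letters.
via_spelling = write_in_arabic(phonemes_from_spelling("boot"))
via_speech = write_in_arabic(["b", "uː", "t"])
print(via_spelling == via_speech)
```

Step 1 can be skipped altogether, which is the point made above: the process also works for a language with no written form, so ‘transliteration’ is something of a misnomer for it.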

An example of transliteration in the other direction, from roman letters to Chinese logograms, also makes it clear that transliteration makes reference to pronunciation. In the People’s Republic of China the three-syllable foreign word Obama (the surname of the president of the USA at the time of writing) is written in Chinese using the syllabograms 奥巴马, corresponding from left to right to the Pinyin spellings ào bā mǎ. The syllabograms originated as logograms for ‘mysterious’, ‘adhesive’ and ‘horse’ respectively. The choice of Chinese characters is determined by a matching of pronunciation, not letter–character equivalences.

It is clear from the above discussion and examples that transliteration as usually practised is not, strictly speaking, transliteration. That is to say, one cannot simply list the characters of two writing systems, find some criteria such as positions in the lists or visual similarity for pairing them up, and expect the result to make linguistic sense. Even if one uses criteria based on pronunciation the result may not be satisfactory. For example, the English postalveolar fricative /ʃ/ corresponds in spelling to the English digraph ⟨sh⟩, but Arabic /ʃ/ corresponds to the Arabic grapheme ش. A strict transliteration into Arabic of the English orthographic form ⟨shy⟩ would be ⟨سهي⟩ using the one-to-one conversions ⟨s⟩ → س, ⟨h⟩ → ه, ⟨y⟩ → ي, but this would not be regarded as a helpful way to write the English word shy using Arabic spelling.9 The preferred solution ⟨شَي⟩ is a respelling via pseudo-transcription (the diacritic corresponds to a short /a/ vowel).

Strict transliteration does, however, have its uses. One of these uses, ironically enough, concerns phonetic notation systems. For example, Ellis (1869: 15) presents tables showing one-to-one equivalence between Bell’s organic notation and his own palaeotype symbols, and MacMahon (1996: 837) adds equivalent IPA symbols. A symbol in a cell in one table is equivalent to the symbol in the corresponding cell in the other table, e.g. Bell’s symbol [] is equivalent to Ellis’s [sh] and IPA [ʃ]. Dobson’s edition of Robert Robinson’s Art of Pronuntiation (1617) transliterates Robinson’s invented symbols into a mixture of IPA symbols and English orthographic letters (see discussion in Dobson 1957: xi–xv); Robinson’s symbol [ƨ] transliterates as the IPA symbol [u], for example.
‘IPA Braille’ is another and very recent example (see Englebretson 2009; and see Chapter 3 Section 3.4.7), in which every symbol on the IPA chart has an equivalent braille form, as is the SAMPA notation for use in emails (Wells 1995b; and see Chapter 3 Section 3.4.10). The criterion for transliteration of phonetic notation is that the symbols denote comparable models and can represent the same pronunciation phenomena.

Further uses for strict transliteration are in the classification of documents and other bibliographic control measures (Wellisch 1978: 31–7), and for assigning keystrokes on a keyboard to characters other than standard keyboard characters. For example, the Wǔbǐzìxíng method of typing Chinese characters, also known as Wubi, or Wang Ma, assigns keys to characters on the basis of a character’s stroke-structure, not its pronunciation. The character 银行 meaning bank (financial) is entered by typing ‘qvtf’, which is a kind of transliteration process although nobody would use ⟨qvtf⟩ as a spelling for writing the word. The Q key is used for the left-hand part of the 银 component because it is in the area of the keyboard for characters with strokes falling to the left, while the V key is in the area for characters with a hook stroke so is used for entering the right-hand part; the left-hand and right-hand parts of the 行 component are assigned to the T and F keys because of left-falling and horizontal strokes respectively. The pronunciation is represented in the Pinyin pseudo-transcription spelling ⟨yínháng⟩, which has no relationship at all to the assigning of keystrokes.
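
Strict transliteration between notation systems, as opposed to the pronunciation-mediated kind, amounts to a reversible symbol-for-symbol table. The sketch below illustrates this with a handful of IPA and SAMPA equivalents; the table is a small excerpt chosen for illustration, and the function names are hypothetical.

```python
# A small excerpt of IPA-to-SAMPA equivalents (illustrative, not complete).
IPA_TO_SAMPA = {
    "ʃ": "S",
    "ʒ": "Z",
    "θ": "T",
    "ð": "D",
    "ŋ": "N",
    "ə": "@",
    "ː": ":",
}


def invert(table):
    """Reverse a table, refusing any table that is not one-to-one."""
    inverse = {target: source for source, target in table.items()}
    if len(inverse) != len(table):
        raise ValueError("table is not one-to-one, so it cannot be reversed")
    return inverse


def transliterate(text, table):
    """Replace each symbol by its equivalent; unlisted symbols pass through."""
    return "".join(table.get(symbol, symbol) for symbol in text)


SAMPA_TO_IPA = invert(IPA_TO_SAMPA)

sampa = transliterate("ʃuː", IPA_TO_SAMPA)
print(sampa)
print(transliterate(sampa, SAMPA_TO_IPA) == "ʃuː")
```

No reference to pronunciation is needed at any point: the conversion is licensed only by the fact that paired symbols denote comparable models, which is the criterion stated above.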

1.6  Orthographic Transcription

When a piece of spoken language is written down using spelling conventions it is not the expression elements of spoken language which are being transcribed; rather it is the expression elements of the corresponding written language which are being written. There is therefore a process not unlike translation taking place in which the spoken language is in a sense translated into written language (see Section 1.1.4). An orthographic transcription of a word will be the same regardless of how pronunciation of the word might vary within and across speakers, because it is the orthography that determines how the words will be written, not their pronunciation. In many varieties of English the phonetic form [ɹoʊd] (or something similar) will occur three times in Jane rode down the road and rowed across the river, but it will correspond to three different spellings in an orthographic transcription because there are three different words, each with its own sequence of letters. To carry out an orthographic transcription of spoken language one has to recognise the words and know how to spell them (and recognise the grammar and know how to punctuate it), but one does not have to do any phonetic analysis of the spoken forms.

The road–rode–rowed example shows the influence of morphology on English orthography. The -ed past tense inflection distinguishes the weak verb row from the strong verb ride as well as from the noun road. In addition to the effects of historical changes in pronunciation, it is the intrusion of morphology into orthography that makes English spelling appear illogical to anyone who thinks the job of spelling is to represent pronunciation. English spelling is best categorised as morpho-phonographic in so far as morphemes that have alternations in spoken language tend to have a single invariant spelling in written language.
This is particularly true of inflectional morphology such as plural -s, third singular indicative -s, possessive -s, past tense -ed and past participle -ed, and only marginally less true of stem morphology where some exceptions can be noted: witness the spelling differences in maintain~maintenance, rigour~rigorous, for example. We will see in Chapter 4 Sections 4.6 and 4.9 how a quest for phonological invariance in spoken language can affect how transcriptions are made and interpreted.

Orthographic transcription of a more logographically oriented language such as Chinese makes even more obvious the fundamental difference between transcribing the expression elements of spoken language and writing the expression elements of the corresponding written language. It is impossible to make an orthographic transcription of spoken Chinese using traditional Chinese characters unless one recognises the lexical items and knows the characters with which they are written. Perhaps a little surprisingly, the same is in fact true of any language, no matter how phonographic its orthography might be. If we do not know a word, or do not know its written form, and we write it on the basis of knowledge of its pronunciation, then we are not transcribing it orthographically but making a pseudo- or proto-transcription using the orthographic resources as a pseudo- or proto-notation. What we write may be identical to an orthographic transcription, but it will have come not from knowledge of how to spell that particular word but from knowledge of general sound–spelling correspondences.
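
The road–rode–rowed example from this section can be sketched as a lexicon lookup, to show why orthographic transcription depends on recognising words rather than on analysing their pronunciation. The lexicon entries and identifiers below are hypothetical labels invented for the illustration.

```python
# Toy lexicon: word identity -> (phonetic form, spelling).
LEXICON = {
    "RIDE+PAST": ("ɹoʊd", "rode"),
    "ROAD": ("ɹoʊd", "road"),
    "ROW+PAST": ("ɹoʊd", "rowed"),
}


def orthographic_transcription(word_ids):
    """Spell recognised words; pronunciation plays no part in the lookup."""
    return [LEXICON[word_id][1] for word_id in word_ids]


def spellings_for(phonetic_form):
    """All spellings compatible with one phonetic form."""
    return sorted(spelling for form, spelling in LEXICON.values()
                  if form == phonetic_form)


print(orthographic_transcription(["RIDE+PAST", "ROAD", "ROW+PAST"]))
print(spellings_for("ɹoʊd"))
```

Going from the phonetic form alone leaves three candidate spellings, whereas going from word identity gives exactly one, which is why writing an unknown word from its sound is a pseudo- or proto-transcription and not an orthographic one.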



Failure to appreciate this point has led sometimes to misuse of the term ‘orthographic transcription’, at least from the point of view of the distinction between transcription and spelling that I have been at pains to draw. In Guendouzi and Müller (2006), for example, the term is used to cover what I would describe as the employment of orthographic resources in pseudo- or proto-transcription. The authors are concerned with producing, for clinical purposes, accurate transcripts of the spoken language of speech and language therapy clients so that these can be analysed in a largely Conversation Analysis framework. It is therefore very useful for the transcripts to represent aspects of speech behaviours such as voice quality and tempo. When the authors say that an orthographic transcription ‘has to be detailed and as faithful as possible to the data at hand’ (Guendouzi and Müller 2006: 36), they are moving away from translation of spoken into written language and moving towards representing aspects of the expression elements of spoken language. An orthographic transcription leaves no room for variation of detail or faithfulness – it translates the grammar and lexis of a piece of spoken language (an utterance) into written language by adhering to spelling practices.

1.6.1  Interpretation of spellings and transcriptions

If we consider the question of interpretation of characters in spellings and symbols in transcriptions, the lack of analogy between spelling and transcription ought to be apparent. The symbols used in phonetic transcription have to be interpreted as standing for something, which is not true of individual characters used in spellings. Proto- and proper symbols denote theoretical models and therefore are interpreted in terms of those models and the dimensions of phonetic space that define them. But when we see the letter ⟨g⟩ in the written form of the English words goat, ghost, gaol, sign, badge, cough, weigh and so on it is pointless to ask what it denotes or represents, or how to interpret what it means; indeed, it is not in the least clear that it has anything in common across this set of written words beyond its graphic form, being called ‘Gee’ and being numbered seventh in alphabetical order. Any further synchronic interpretation is likely to be no less fanciful than Clarence’s assertion, on the authority of a wizard, that G is a disinheritor.10 A literate user of English only needs to know which words contain it in their spelling and where to put it, or where to expect it when reading. It is not even necessary to know explicitly how it corresponds to units of pronunciation, although literate language users do have some explicit knowledge of sound–letter correspondences which enables them to attempt pronunciations of newly encountered written words and to make a stab at spelling newly encountered spoken words; proper names commonly pose these problems. The essential point here is that the letter is not there primarily to supply information about how to pronounce the words. The primary function of the arrangement of letters in spellings is to identify words the pronunciation of which will already be known. The interpretative process in relation to spelling is primarily at the level of lexis and grammar.
In conclusion, we can make the general statement that the characters of written language do not denote anything at all except their function as distinct characters in an orthography. That is to say, they have only a self-signifying function in writing system scripts.

By contrast, when it comes to seeing the symbol [ɡ] in a phonetic transcription we do need to know how to interpret it. What it denotes, every time it is used, is a model fixed and defined by phonetic theory comprising an intersection of particular categories in phonetic space. We need to understand the categories to understand the symbol, and know what kinds of phenomena can be mapped onto the model comprising them. This point is perhaps clearer when there is no close resemblance between a symbol and a character in a phonographically oriented writing system, for example the IPA symbols [ʘ] and [ʢ]. The symbol [ɡ] in the IPA notation system denotes a theoretical model which may be defined as a ‘voiced posterodorso-­velar plosive’. Each term in the definition can only be properly interpreted through knowledge of the phonetic theory underpinning these categories. The category ‘voiced’ is a category onto which can be mapped vocal fold vibrations of the modal type which, according to current understanding, involves aerodynamic-­myoelastic action throwing the true vocal folds into sustainable quasi-­periodic vibration; ‘posterodorso-­’ is a category onto which can be mapped actions of that part of the dorsum of the tongue lying opposite the soft palate and identified as the active articulator; the category ‘velar’ is for mapping involvements of the soft palate identified as having the role of passive articulator; ‘plosive’ is a category onto which can be mapped the complex sequences of events in which intra-­oral pressure is manipulated and converted into transient acoustic energy. Each constituent category thus has a necessary connection with particular parts of a comprehensive theoretical account of how speech is produced by the human vocal tract, an account which at its deeper levels draws on theoretical knowledge from the disciplines of anatomy and physiology, aerodynamics and acoustics. 
Full interpretation of proper phonetic notation and proper phonetic transcription is therefore heavily theory-dependent, which is not the case with characters such as alphabetic letters, syllabograms or logograms. While the latter two invoke the concepts of syllable and word respectively, theoretical understanding of these concepts is not a requirement for literacy. To enable correct use of any phonetic notation system, a set of conventions for its interpretation must be supplied, defining what the symbols denote.

This brings us to consideration of pseudo-phonetic notation and pseudo-phonetic transcription. What is the interpretative process when «g» is used as a pseudo-phonetic symbol? Because it is a pre-theoretical model its interpretation is not dependent on any body of theory. Instead it is dependent on experience of pronunciation-forms containing phenomena that map onto «g», a pre-theoretical model abstracted from experiencing what word-forms such as goat, again, bag and so on have in common, though not gnat, sign, badge. We can think of «g» as an imitation label for a particular type of sound which we can recognise and repeat. Spellings can be read as transcriptions and vice versa, as we have seen in Section 1.5 on respelling and transliteration. But it should always be borne in mind that spellings are expression-forms in written language which can be put into correspondence with expression-forms in spoken language, whereas transcriptions, whether pseudo-, proto- or proper, represent analyses of spoken language expression-forms through denoting pre-theoretical or theoretical models.



1.7  Status and Function of Notations and Transcriptions

The status of a phonetic notation system, and of transcriptions made with it, is crucially dependent on its relationship to a body of theoretical phonetic knowledge and on the graphic resources available. A further factor in assessing status is whether a transcription is specific or generic (see Chapter 4 Section 4.1). Figure 1.9 illustrates this classification. It is possible for the status to be different for a transcriber and a reader of a transcription depending on their level of familiarity with phonetic theory.

NOTATION and TRANSCRIPTION
PSEUDO- | PROTO- | PROPER
Denotes models established through abstraction from experience | Denotes models established by phonetic theory
Notated with orthographic characters which become pseudo- or proto-phonetic symbols (pseudo- in « », proto- in ( )) | Notated with phonetic symbols in [ ]
GENERAL PHONETIC MODELS: Not in a mapping relation with phonetic data. Denote only phonetic models/categories.
DESCRIPTIVE PHONETIC MODELS: In a mapping relation with phonetic data. Represent an analysis of data in terms of phonetic models/categories.
SPECIFIC TRANSCRIPTIONS: Data are from a single observed pronunciation.
GENERIC TRANSCRIPTIONS: Data are from an indefinitely large class of observed and/or postulated pronunciations.

FIGURE 1.9:  Classification of phonetic notation and transcription in terms of status

Function refers to the purpose to which a transcription is put by a transcriber or a reader. The most common function of a transcription is probably to express an analysis of pronunciation, whether specific or generic (see Chapter 4 Section 4.1). This is a passive function in so far as it does not influence pronunciation but is providing knowledge about pronunciation. However, we shall see in Chapter 4 Section 4.13 that transcriptions can have active functions as performance scores and prescriptive models. The various functions of transcriptions can be used in different contexts, some of which, such as lexicography, language teaching, speech therapy and conversation analysis, have already been mentioned. These contexts will be revisited along with other contexts such as dialectology and forensic phonetics in Chapter 7.

Notes

1. For the first two chapters of this book I shall be using the term ‘phonetic transcription’ in a wide sense without distinguishing between broad and narrow, impressionistic and systematic, or even between phonetic and phonemic, except where such distinctions are explicitly indicated.
2. The terms morphography or morphemic writing are sometimes used where the unit represented is a morpheme rather than a word. I shall use logography to include morphographic writing unless otherwise stated.
3. Saussure conceived of it as a ‘sound image’ in the speaker’s mind.
4. This character derives historically from the incorporation of the phonetic element 卜 having the phonetic value [bǔ], which bears no relation to /ˋɕiɛ/, the phonological form corresponding to 下 in modern Mandarin Chinese. It is one of the approximately 33 per cent of characters in written Chinese that do not have any components that correspond to any elements of the spoken form of the word (DeFrancis 1989: 110–12); it thus truly qualifies as a logogram.
5. ‘On Exactitude in Science’ in Jorge Luis Borges (1975), A Universal History of Infamy, London: Penguin.
6. In Swift’s Gulliver’s Travels, part III, ch.V: ‘since Words are only Names for Things, it would be more convenient for all Men to carry about them, such Things as were necessary to express the particular Business they are to discourse on’.
7. I shall use double angle brackets to enclose symbols representing pre-theoretical models, and square brackets to enclose symbols representing theoretical models.
8. The direction of Arabic writing is from right to left.
9. ⟨ه⟩ is the isolated form.
10. Shakespeare’s Richard III, act 1, scene i.

2 Origins and Development of Phonetic Transcription

2.0 Introduction

In Chapter 1 I described proper phonetic transcription as a technographic form of writing in which the symbols have phonetic definitions supplied by phonetic theory. In this chapter I will look at how writing became available as a means of representing pronunciation and consider the rise of the discipline of phonetics as a means of analysing and describing it. I will then attend to how writing and phonetics have come together to provide the practical and theoretical resources that have enabled proper phonetic notation and transcription to develop. Going back through history it is apparent that these resources have arisen independently in different cultures and periods, and that what I call pseudo-notation and pseudo-transcription have been widespread in the transmission and adaptation of writing systems. Proper phonetic notation and transcription require phonetic theory and analysis, and have therefore not been so widespread. They did, however, develop in the work of the phoneticians of ancient India and Greece, among the medieval grammarians of the Middle East, and among the spelling reformers of Renaissance and Early Modern Europe. But it was not until the nineteenth century that phonetic notation started to become systematically separate from the characters of written language, and transcription systematically and conceptually separate from spelling.

2.1 Representation of Pronunciation in Writing Systems

Whether or not writing has been language-dependent from its very beginnings is partly a matter of definition. Systematic use of visual marks may have started independently of language as a means of expressing extralinguistic meanings and concepts directly rather than as a means of identifying language-specific words. A modern-day example is the use of a graphic sign to warn of danger. It will be read very differently depending on the language of the reader – English ‘danger!’, German ‘Achtung!’, Spanish ‘¡peligro!’ etc. – but it can also be read differently in the same language, e.g. ‘hazard!’, ‘be careful!’, ‘keep away!’ etc., because it represents a concept or set of concepts, not a word. It is technically a semasiogram; it expresses a meaning independently of any particular language, as do mathematical symbols. Some scholars are happy to call this kind of graphic communication writing, while others prefer to call it proto-writing or partial writing, or exclude it from writing altogether (see critical discussions in Sampson 1985: 29–32; Harris 1986: 57–75; DeFrancis 1989: 3–64; Boone 2004: 313–17). Because it is not tied to any specific language we can classify it as ‘non-glottographic’.

Pronunciation can start to be represented once writing has become glottographic and takes on the function of expressing language-specific words in visual form. Because written words have spoken equivalents expressed through pronunciation, it becomes possible to link the visual marks of writing with recurrent aspects of pronunciation and to systematise these links into explicit sound–spelling correspondences. Once this happens the resources are there for pseudo-notation and pseudo-transcription. One question for us is when and how, and also why, the conditions for this have arisen in the history of writing.

2.2 Phonographic Processes in Writing Systems

Phonographic writing could not have come into existence without some kind of analysis of pronunciation, albeit of a pre-theoretical kind. Characters of written language take on, in addition to their status as written language expression elements, the status of pseudo-phonetic symbols representing properties that auditory-perceptual experience suggests are shared by the expressions of different spoken words. Historically, these properties have been at various levels: the whole word-form, the syllable, the segment, or segment constituent.

2.2.1 The rebus principle

A simple kind of analysis of pronunciation is that which enables homonymic relations to be established. It is on this kind of analysis that the rebus principle rests. Supposing we had in English the logogram <۲> for the word rye. We could use it to represent the homophonous word wry as well. To recognise homophones one has to pay attention to the pronunciations as well as the meanings of the words and be able to notice that they sound the same, although without necessarily being able to give any sort of phonetic account of the similarity. The judgement as to the sameness of pronunciation only need be holistic for rebus writing, so there is no call for analysis of the pronunciation into any constituent parts and no notion of a ‘speech sound’ other than the sound-impression of the spoken word-form as a whole. Punning exploits homophony, occurring among non-literate as well as literate speech communities. Rebus writing is an early step in the phonographic orientation of writing although, as Harris (1986: 67) points out, it is still logographic. In the above example the word wry is represented by the logogram just as much as is the word rye, but the choice of that particular logogram is made by reference to pronunciation and is therefore phonetically motivated, whereas logograms themselves typically have semantically motivated origins, although these may become opaque over time as has happened in Chinese (Sampson 1985: 150; see examples of diachronic change in Li 1992). The process by which <۲> would extend to be the expression of the written word wry is a process of pseudo-transcription in which ‘۲’, at least temporarily, has the status of a pseudo-phonetic symbol, «۲», representing an abstraction of what the pronunciations of rye and wry have in common. It would not have this status when used as the expression of the written word wry, having instead the status of a character.

Baines (2004: 163) cites studies advancing the claim that both logographic and phonographic writing of Ancient Egyptian are exemplified in archaeological finds dating from the late fourth millennium bce, and that the rebus principle may already have been employed at that time. These finds from the site known as tomb U-j at Abydos in Upper Egypt might be the oldest language-dependent writing that we know about. If the archaeologists’ interpretations of the U-j finds as reported in Baines (2004) are accurate, then the rebus principle may be as old as writing itself, in which case phonography has been present in glottographic writing since its beginnings.

2.2.2 Syllabography

If <۲> started to be used in the writing of all words containing the syllable [raɪ] in corresponding spoken words – writing, riding, ripen, arise and the like – then it would correspond recurrently and systematically to that syllable and could be used as a pseudo-phonetic symbol to transcribe it. The invention of syllabograms requires analysis of pronunciation at a deeper level than rebus writing. Instead of the judgement of sameness being made over whole words it has to be made over syllables, therefore requiring segmentation of speech into syllables even if syllable boundaries are not precisely or consistently established and there is no formal definition of a syllable. The real significance of this only becomes apparent in the context of polysyllabic words in which constituent syllables are themselves meaningless, having no semantic or grammatical content. The pseudo-phonetic symbol «۲» is now available to represent that abstracted spoken syllable on its own, as an expression element divorced from content. The pre-theoretical model denoted by the symbol can be defined ostensively as what pronunciations of rye, wry, writing, arise and so on have in common. Divorcing expression from content, it could be argued, is the single most important step that has to be taken in order for any form of phonetic notation to develop. In languages where words are generally monosyllabic, such as Chinese, it may not be so obvious that expression can be divorced from content because all occurring syllables will be word-forms. This might account for why written Chinese is not as phonographic as most other written languages (Robertson 2004: 34). Although most compound characters in Chinese consist historically of a ‘phonetic’ and a ‘signific’, i.e. one character present for its spoken expression value and another for its content value, the logic underlying the structure of compounds has lost its systematicity due to three thousand years or more of pronunciation changes (Sampson 1985: 156).

Syllabography has arisen historically in contexts of what Wellisch (1978) calls script conversion, using the writing system of one language to write another. Script conversion often involves the reinterpretation of spelling elements such that they change their relationships of correspondence with the spoken language in a phonographic direction. For example, as we saw in Chapter 1 Section 1.1.4, the Akkadians and Japanese adapted, respectively, Sumerian and Chinese logograms as syllabograms. The fact that script conversion tends to increase phonographic orientation may be responsible for the view, current until relatively recently and articulated particularly by Gelb (1969: 200–5), that there is some teleology at work guiding the development of writing from hazy beginnings in pictography to the polished clarity of alphabetic writing. This view has been heavily criticised by Harris (1986), Olson (1994) and Coulmas (2003: 197–8) and is hard to reconcile with a number of facts, chief of which is the observation that most languages are written using stable mixed systems of writing in which logographic and phonographic elements co-exist. Akkadian happily continued to use Sumerian logograms as xenograms alongside syllabograms derived from logograms, and Japanese continues to do the same with its Chinese logogram-derived kanji (Coulmas 2003: 74), although kana spellings are gradually replacing kanji in some morpheme and word classes (Nomura 1988, cited in Smith 1996: 210).

2.2.3 The acrophonic principle

Acrophony takes a logogram or syllabogram and uses it to correspond to the first sound in that word or syllable; it can then be used in the spelling of any word containing that sound in its pronunciation. For example, our logogram for the word rye could be used to correspond to the initial [r]; we could then use it in the spelling of red, crab, berry and so on. It therefore takes pre-­theoretical phonetic analysis of pronunciation further than syllabography and provides the means to represent speech as a segmental structure below the level of the syllable. Once speech is seen as segmental, and the segments are associated with individual characters, they become objects with an abstracted existence of their own; written characters, in addition to spelling words, can then take on the function of representing these segments independently of the words they occur in, and we have the conditions for a pre-­theoretical kind of segmental pseudo-­notation. The character can be seen as denoting a pre-­theoretical model abstracted from what we perceive the spoken forms of red, crab, berry and so on to have in common. Acrophony thus involves establishing an initial sound and separating it from the rest of the pronunciation-­form. The Ancient Egyptian consonantal signs, in use by 3000 bce, came about through acrophony (Sampson 1985: 78) coupled with the need to be able to write proper names, particularly foreign ones. Segmental pseudo-­transcription can therefore be said to date from at least this far back in history, at least with respect to consonants. Examples of the manipulation of expression elements as objects independent from the words they are used to spell can be seen in the Early Dynastic inscriptions of Ancient Egypt. It is not known if they were ever pronounced, but their significance lies in the conceptual and physical separation of expression from content without which the development of any form of phonetic notation and transcription would not be possible. 
Centuries later, the Chinese in the third century ce developed fǎnqiè, a kind of acrophonic procedure in which characters could be used for their syllable onset values and others for their syllable rhyme values; writers could thus create nonsense words by combining them to write non-occurring syllables in a pseudo-phonetic transcription. Except for explicit phonetic analysis of tones, the Chinese did not develop phonetic analysis and classification beyond division into onsets and rhymes until phonetic scholarship came in from India some centuries later (Halliday 1981: 131–5). Once phonetic analysis was incorporated as a result of Indian influences, the syllabogram characters in fǎnqiè could be regarded as changing their status from pseudo-phonetic to proto-phonetic notation.

2.2.4 The notion ‘segment’ revisited

In Chapter 1 Section 1.2.1, the notion of a speech sound as a discrete segment realising a discrete phonological element was critically examined. We need to return to it here in the light of the claim that the notion is dependent on the prior existence of an alphabetic writing system. This claim has been advanced by Faber (1992) using psycholinguistic evidence from studies of reading ability alongside evidence from the history of writing. It sits comfortably with other claims by scholars such as Olson (1994) that written language provides models for the analysis of spoken language (see Chapter 1 Section 1.1), and has become quite strongly entrenched in modern linguistics. Fraser (2005: 116), for example, in a generally insightful discussion of types of representation of speech, confidently claims that ‘[i]t is well-established that it is only through acquisition of alphabetic literacy that an analysis of speech into segments becomes available to language users’. Faber’s arguments can, I think, be met on two fronts: firstly, whether acceptance of her case means that the notion of segments has no legitimacy in linguistic and phonetic theory; secondly, whether her case is persuasive and ought to be accepted. On the first point, I think the answer has to be no. If it is true that segmental awareness only arises among users of an alphabetic writing system, this is no reason to regard the segment as an illegitimate analytic concept for phoneticians and phonologists. In the sphere of syntax, language users can only parse sentences if they have been taught grammar, but we do not take this to mean that we have to dispense with notions such as noun and verb, particle and affix. Whatever contingencies might be responsible for the notion of a segment as a constituent of the structure of speech, whether we should apply the notion or not depends on how well it facilitates analysis. All theoretical notions are arbitrary, but some are more appropriate than others.
I agree with Laver (1994: 110) that the segment is an appropriate notion in phonetic theory providing we understand how to apply it. Regarding the persuasiveness of Faber’s arguments, I find it lacking. Her arguments are essentially of two kinds: psycholinguistic and historical. The psycholinguistic evidence is cited in the main from three papers published in an issue of Cognition. One of these studies is Morais, Bertelson, Cary and Alegria (1986), in which illiterate and ex-­illiterate (having become literate in adulthood) speakers of European Portuguese were tested on various consonant, vowel and syllable segmentation tasks. Illiterate subjects were able to segment initial [p] with 18.6 per cent accuracy, compared to 62.5 per cent and 83.3 per cent for poor readers and better readers respectively. Figures for vowel segmentation were 55.2 per cent for illiterates and 85.0 per cent for both groups of readers. Literate subjects performed considerably better, but the task was by no means

beyond all the illiterate subjects, refuting the claim that alphabetic literacy is a prerequisite for consonant and vowel segmentation. Responses of illiterates were 15.2 per cent correct for separation of a [pl-] cluster into [p] and [l]. Another study cited is Mann (1986), which compared phoneme awareness in school-age Japanese readers of kanji and syllabaries, and school-age American alphabetic readers. Awareness of phoneme-sized units was exhibited by Japanese fourth-grade children (c. 9 years) who had had no instruction in alphabetic reading. In the light of her results, Mann (ibid.: 89) suggests that ‘the capacity for manipulating phonemes could be part and parcel of a language acquisition device’. The third study cited by Faber is Read, Yun-Fei, Hong-Yin and Bao-Qing (1986), which compared segmentation ability in two groups of literate Chinese speakers: one group who had learned the alphabetic Pinyin spelling system in addition to learning traditional Chinese characters, and one group who had only learned the traditional logographic characters. Non-alphabetic readers scored 21 per cent correct on non-words and 37 per cent correct on real words, compared to 83 per cent and 93 per cent correct responses by the alphabetic readers (ibid.: 38). Again, the results confirm that segmentation skills are by no means completely lacking in the absence of alphabetic knowledge and experience. All three studies cited by Faber in fact suggest that segmentation at the level of individual sounds can be performed by around a quarter of language users without prior familiarity with an alphabetic writing system, although accuracy and consistency of performance improve dramatically among those who are in the habit of using one. There is anecdotal fieldwork evidence of illiterate speakers undertaking quite sophisticated segmental analysis.
Trubetzkoy (1937/2001: 37), for example, relates how an illiterate Circassian speaker told him: ‘Where we pronounce a strong s the H̤ak˚əc˚ pronounce it that way too, but in words where we pronounce a very weak s, they replace it by č.’ The historical argument concerns the supposed uniqueness of the early Greek alphabet in having letters for vowels as well as consonants and thus being fully segmental. The introduction of vowel letters into the alphabet by the Greeks was at one time hailed as a major intellectual advance on the Semitic abjad (Carpenter 1933), suggesting implicitly or explicitly that the Semitic speakers had lacked the insight into spoken language structure to appreciate the existence, or importance, of vowels (see Bernal 1987b: 393–9). The segmental nature of the Greek alphabet as it existed after being adapted from the Canaanite abjad is explained by Faber, following Sampson (1985: 100–2; see also Gelb 1969: 181, and Coulmas 2003: 127), as having arisen not through segmental analysis but through a misinterpretation of certain letters which corresponded in Semitic languages to consonants that had no equivalents in Greek. The Greeks instead used them to represent Greek vowels similar in quality to the vowels in the Canaanite letter-­names (Allen 1981: 115), perhaps thinking that this was how they had always been used. Once this had happened, and only once this had happened, Greek letters could be seen as representing individual discrete vowel sounds as well as consonant sounds. Beforehand, the notion ‘segment’ in relation to vowels could not, according to Faber, be said to have existed. The historical evidence, to my mind, supports a contrary view. The practice of matres lectionis in archaic Semitic writing shows clearly that resources for



representing vowels had in fact been developed before the Greeks, during the second millennium bce (Gelb 1969: 197), and may in fact have influenced the Greek usage of vowel letters (Bernal 1987a; Coulmas 1996: 329). In matres lectionis (‘mothers of reading’), letters corresponding to glide consonants were used to indicate vowels of a similar auditory quality to the glides. The letters corresponding to consonantal /w/ and /j/, for example, were used to indicate the long /uː/ and /iː/ vowels. Pairing of semivowels and vowels relies on accurate recognition of phonetic similarity and suggests that experimental observation may have been involved in the process: it is by holding steady in the form of discrete sounds the articulation of [w] and [j] that one observes them becoming, respectively, [u] and [i]. Characters corresponding solely to vowels date in fact from very early in the history of writing. They are found in Ancient Egyptian from before 2000 bce (Gelb 1969: 168). Although they were not used very often, and never became systematically integrated into the Ancient Egyptian writing system, they attest to awareness of vowel sounds separate from consonantal sounds a long time before the Greek alphabet appeared. They cannot be explained away as mistakes arising from the adaptation of a writing system to another language with a different inventory of consonants. Even a consonantary without any letters corresponding to vowels would attest to the same ability to segment as an alphabet containing vowel letters. The only difference is that the vowel segments have no corresponding letters. If a spoken [CVC] structure corresponds to a written <CC> structure, then the [V] has been left out of account, but it can only be left out by detaching it from the Cs, unless one claims it was simply not noticed at all.
The small vowel inventories of Ancient Egyptian and Semitic languages, and the lexico-­semantic stability of their consonantal roots, placed less importance on vowels than on consonants for word identification. Vowels mainly expressed inflections; their distribution would have been much more predictable from grammatical context than in an Indo-­European language like Greek, so their representation in writing was not so necessary. It is still the case in Arabic, a modern Semitic language, that written texts are typically unvowelled for precisely these reasons. Although our word ‘consonants’ implies their dependence on vowels, in traditional Sanskrit grammar the word is vyañjana, which, according to one authority, comes from the verb vy-­añj-­ ‘to manifest’ because consonants manifest meaning (Allen 1953: 81). Because languages tend to have many consonants but fewer vowels, consonants will have a higher functional load and differentiate word-­forms more than vowels. Faber (1992: 127) regards the Chinese fǎnqiè as non-­segmental and adduces it to support the view that the segment is a notion dependent on alphabetic writing, not one that helped to shape it. However, the fǎnqiè process of separating a syllable onset from a syllable rhyme will result in segmentation into a consonant and a vowel in any open syllable with a single onset consonant. Although there is now some doubt whether CV syllables are universally the first syllable type to appear in language acquisition (Savinainen-­Makkonen 2007), it is generally accepted that CV and V are the most widely attested syllable types across the world’s languages, being found in all known languages (Kenstowicz 1994: 254), and CV is certainly an extremely common syllable type in Chinese (Yip 2000: 20). If the syllable is the basic unit of production and perception (Levelt and

Wheeldon 1994), then, as Warren (2008: 201) points out, speakers and listeners will have direct access to monosyllabic lexical items, and if the structure is V the process of inferring segmental content will be maximally easy. To infer the segmental content of a CV syllable only requires recognition that something has been appended to the V. This analytic process can be repeated to deal with more complex syllables. The history of phonographic writing contains, from its earliest stages, evidence that language users were able to segment speech into the same kinds of consonantal and vocalic elements that IPA symbols denote. Segmentation may even predate writing, if the Indian phoneticians of the first half of the first millennium bce did not use a writing system. Segmentation is identified by Allen (1953: 18–19) as the second of the three main stages of ancient Indian phonetic analysis – between articulatory processes and prosodic features – resulting in the establishment of much the same consonantal and vocalic segments as modern analysis would establish (see table in Allen 1953: 20). Daniels (2001: 70) claims that by the time writing reached India discrete consonants and vowels were already fully understood. Whether the segment in speech is a ‘natural unit’ of auditory perception or is a notion that arose when people deliberated about how language could be analysed or written amounts to the same thing: that the human mind is capable of applying a segmentation procedure to spoken language without the idea having been suggested by alphabetic letters adapted through misinterpretation. Perceptual and cognitive constraints determine which kinds of properties of the speech signal tend to be noticed and, as a result, can come to be regarded as objects which combine together to build speech. 
Pre-­literate children’s sensitivity to syllables and to onset–rhyme division, as evidenced in studies such as Bowey and Francis (1991), and evidence from naming tasks that speakers parse syllables and store them in their mental lexicons (Levelt and Wheeldon 1994), attest to a perceptual-­cognitive bias in humans which may be responsible for driving the development of writing in the direction of syllabograms and alphabetic letters via rebus writing and acrophony. The same biases seem to underlie the poetic devices of alliteration, assonance and rhyme which are found in pre-­literate oral poetry as well as in written literatures (Finnegan 1977: 93–6). The prevalence of CV syllable types in languages means that these perceptual-­cognitive biases will encounter ample input to feed and reinforce an analysis into two segments: a consonant and a vowel. These then become models in terms of which analyses of more complex structures can be made. We ‘find’ segments in the structure of speech not because they are there in any physically objective sense, but because we are predisposed to conclude they are there, either innately or through learning. Modern physics describes a world very different from the world as it appears to us, or the way it appears to a bee or a bat (see for example Nagel 1974). What causes it to appear to us the way it does is our perceptual-­cognitive make-­up. A physical description of the world includes descriptions of pressure-­waves, but we do not experience speech as pressure-­waves; we experience it as sound with a concatenated structure (see Chapter 5 Sections 5.1, 5.7 and 5.8). The phenomenologist Merleau-­Ponty (1945/2002: 240, original italics) comments that ‘seen from the inside, perception owes nothing to what we know in other ways about



the world, about stimuli as physics describes them and about the sense organs as described by biology’. If the notion ‘segment’ or ‘speech sound’ exists as a pre-theoretical model in a listener, and if it is available to take part in complementary processing (the interaction of auditory input with higher-level stored information; see Chapter 5 Section 5.4), it will predispose the listener to experience speech as consisting of segments of the kind that can be produced as isolated sounds. It could arise as a pre-theoretical model through judgements about what is common to words such as bee, bar, boot and bee, tea, key, etc.

2.2.5 Subsegmental analysis

So far we have seen that phonography has existed from very early in the history of literacy, perhaps from its very beginnings, and that the units implicated in phonography can be whole word-forms (rebus writing), syllables (syllabograms) and phoneme-like segments (the acrophonic principle). Phonography can also implicate units smaller than the segment. The introduction of naqt ‘diacritical pointing’ into the Arabic alphabet in the seventh century ce indicates that analysis of place and manner of articulation of Arabic consonants had already been carried to some level of sophistication. For instance, the Aramaic letter ح was used in early Arabic writing not only for spelling words having the pharyngeal /ħ/ in their spoken form, but also for words with uvular /χ/ and postalveolar /ʤ/ (or possibly palatal /ɟ/), resulting in homographs and near-homographs. A diacritical dot was placed over the letter to create a new letter خ corresponding to /χ/, and placed below to create ج corresponding to /ʤ/. According to Revell’s analysis of the pointing system (Revell 1975: 182–3), the criterion for dot placement was place of articulation: it was placed above for sounds further back in the vocal tract and below for those further forward. Arabic phoneticians, like the earlier Indian phoneticians, started their descriptions at the back of the vocal tract, which they described as being ‘higher’ than the front. Diacritical dotting can thus be seen to be motivated by the iconicity of this perspective. The Hebrew dagesh diacritical dotting was introduced for similar reasons of disambiguation. It indicated a stop consonant while its absence corresponded to a homorganic fricative (Coulmas 2003: 116). In the Japanese katakana and hiragana syllabaries the niguri (double slanted ‘ditto’ marks) and maru (small circle) diacritics represent voice and voicelessness respectively (cf. the IPA voicelessness diacritic [ ̥]) when added to CV syllabograms in which the C corresponds to labial /p/ or /b/: katakana ビ corresponds to /bi/, and ピ corresponds to /pi/; hiragana び corresponds to /bi/, and ぴ corresponds to /pi/. The niguri was introduced in the twelfth century ce, the maru in the sixteenth (DeFrancis 1989: 135). They are added to a base character which on its own corresponds to a CV syllable where the C is /h/. Uniquely in the history of written language, there is one example of a writing system designed so that characters consistently correspond to elements smaller than segments. The Hangŭl (also spelled Han-gul, Han’gŭl, Hankul, Hangeul) system, developed in the fifteenth century ce for writing Korean, is founded on an analysis of consonants and vowels into component articulatory features

(Sampson 1985: 124–9; King 1996: 219–20). The importance of this has been downplayed by some on the grounds that not all the features needed are represented, and that literate Koreans are unaware of a featural dimension in the system (DeFrancis 1989: 196–8). Nevertheless, Hangŭl does provide examples of feature-level correspondence between written and spoken forms similar in principle to the examples furnished by Arabic and Hebrew pointing, and by the Japanese niguri and maru diacritics, but extending through almost the whole system rather than being peripheral additions. A proper phonetic notation based on Hangŭl characters has in fact been developed and is exemplified in the 1999 IPA Handbook (p. 123; see also Chapter 3 Section 3.1.1).

2.2.6 Diffusion and borrowing of writing systems

When a writing system which was developed for one language is borrowed to write another language it often has to be adapted to suit the structural properties of the borrowing language (Wellisch 1978). In addition to differences of morphological structure, there will also be different consonants and vowels, so that close attention to the pronunciation of words is necessary in order to decide how to deploy the writing system. An ability to compare pronunciations in the two languages would seem to be an obvious prerequisite for adapting any elements of a writing system to write another language phonographically. The process by which the spelling units of one language are used to write another language can be modelled using the concept of pseudo-transcription as diagrammed in Figure 2.1. Spelling units from language A are interpreted as standing for sounds on the basis of sound–spelling correspondences in language A. They are then used to stand for the sounds in the expression of spoken words in language B. That is to say, they are used as a pseudo-notation to make pseudo-transcriptions which become the expressions of the written signs in language B.

[Diagram: LANGUAGE A and LANGUAGE B each have a SPOKEN SIGN and a WRITTEN SIGN, each sign comprising content and expression; the diagram carries the label 'Pseudo-transcription'.]

FIGURE 2.1:  Units used for spelling the written signs of language A are used for representing the pronunciation of spoken signs in language B. This pseudo-transcription then becomes the spelling for the written signs in language B.
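The model in Figure 2.1 can be illustrated with a small sketch. It assumes a toy set of sound–spelling correspondences for language A and an invented spoken word of language B; the names `LANGUAGE_A_CORRESPONDENCES` and `pseudo_transcribe`, and all of the sample data, are hypothetical and are not drawn from any real writing system.

```python
# Hypothetical sound-spelling correspondences in language A:
# each spelling unit is interpreted as standing for one sound.
LANGUAGE_A_CORRESPONDENCES = {
    "p": "p",
    "t": "t",
    "k": "k",
    "a": "a",
    "i": "i",
    "sh": "ʃ",
}


def pseudo_transcribe(spoken_word, correspondences):
    """Spell a spoken word of language B with language A's spelling units.

    spoken_word is a list of sounds. Sounds that have no counterpart in
    language A are marked "?" and returned separately, since the borrowed
    system would have to be adapted to cover them.
    """
    # Invert spelling -> sound into sound -> spelling.
    sound_to_spelling = {sound: unit for unit, sound in correspondences.items()}
    spelling, unmatched = [], []
    for sound in spoken_word:
        if sound in sound_to_spelling:
            spelling.append(sound_to_spelling[sound])
        else:
            spelling.append("?")
            unmatched.append(sound)
    return "".join(spelling), unmatched


# An invented spoken word of language B, given as a sequence of sounds.
print(pseudo_transcribe(["ʃ", "a", "t", "i"], LANGUAGE_A_CORRESPONDENCES))
# ('shati', [])
```

The list of unmatched sounds stands in for the point made above that a borrowed system often has to be adapted where the borrowing language has different consonants and vowels.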




2.2.7  Anti-phonography

There are counter-­influences to phonography at work in the world of writing which can be seen clearly in modern English spelling. Although written English is predominantly phonographic, different spellings for homophonous words such as hair–hare and too–two pull in a direction away from the rebus principle and represent a resistance to phonography which displeases proponents of spelling reform but enables ambiguity to be avoided in written texts. Attempts by spelling reformers to change this through further phonographic orientation have not met with any great success among the general literate English-­speaking public, who seem to value the morpho-­phonographic features of English spelling whereby alternations correspond to invariant spellings. Among the beneficiaries of this resistance are the regular plural/possessive/third singular present alternants all spelt , the regular past tense/past participle alternants all spelt , and stems that undergo pronunciation changes in suffixation but whose spellings remain intact, e.g. atom~atomic, climate~climatic, photograph~photography. Invariant spellings facilitate lexical and grammatical recognition in reading and prevent what would otherwise be an increase in the number of spelled forms that have to be learned and remembered. The use of logograms in written languages alongside phonography suggests that phonography does not necessarily yield preferred resources for writing. Logograms maintained a vivid presence in Ancient Egyptian writing for three millennia, actually increasing from some 700 in the Middle Kingdom (c. 2000 bce to 1650 bce), when a full consonantary was already in use, to around 5,000 in the Graeco-­Roman period (332 bce to c. 400 ce) (Ritner 1996: 74). 
In China, the introduction of phonographic roman Pinyin spellings was not intended to replace traditional Chinese characters and shows no sign of doing so, despite some voices calling for this since the Cultural Revolution in the 1960s (Wellisch 1978: 77–81). Although there are indications that logographic Japanese kanji, Chinese characters that spell Japanese translation equivalents as xenograms, are in decline in some word classes in written Japanese (see Section 2.2.2), they are still very much a living part of the written language despite the presence of the highly phonographic hiragana and katakana syllabaries. As Coulmas (2003: 180) points out, logography seems to have some appeal, which may also help to explain the practice of xenography. If written language is supposed to represent spoken language then xenography is an exceedingly strange way to do it. Commenting on the continuing use of logographic resources in writing systems, Cooper (2004: 92) warns us not to underestimate ‘the ideological investment a culture has in its traditional script’. The power of ideology is also evident in modern spelling reform debates; see for example Johnson (2005: 119–48) for a critical analysis of the debate on the 1996 German orthographic reforms. It is interesting that Korean Hangǔl writing, devised according to phonetic analysis and therefore unambiguously phonographic in conception, has become increasingly morpho-­phonographic over the centuries (King 1996: 223), receiving an official impetus in this direction in 1933 with the publication of the Guide for the Unification of Korean Spelling by the Korean Language Research Society (Sampson 1985: 139). Hangǔl readers and writers have thereby shown

a ­preference for spellings to be invariant with respect to morphemes rather than with respect to pronunciation. In examples of what I have called ‘anti-­phonography’ in writing – and more could be given – greater importance is put on lexical and grammatical identity than on sound–spelling correspondences, a practice inconsistent with the Aristotelian view that written language consists of signs for representing spoken language.

2.3  The Development of Phonetic Theory

I have discussed how phonographic processes in the history of writing made the expression units of written language available as a resource that can be used for what I have termed pseudo-notation and pseudo-transcription. We have seen that language users seem to have been able to deploy characters in this capacity for writing proper names and in adapting writing systems from other languages from very early on in historical times. However, for characters to become a proto- or proper phonetic notation system there has to be, as I have said earlier, a body of theoretical phonetic knowledge that can provide phonetic definitions and interpretations for the elements of the notation system. This existed at various levels of sophistication in the ancient world in India and Greece, and in medieval times among the grammarians of the Middle East, but in western Europe ‘[t]he discipline of phonetics did not appear until the early modern period’ (Law 1997: 262). The lack of interest in phonetics in the Europe of the Middle Ages (Robins 1990: 87) is symptomatic of a wider lack of interest in observational method. Philosophy in Europe at that time was overwhelmingly theological; debate among medieval scholastics concerning the correct way to obtain knowledge tended to revolve around whether divine revelation was the only source of true knowledge or whether knowledge could also be arrived at through human reasoning. Everyday observable facts were hardly accorded any importance (Russell 1961: 428). But phonetics cannot really be studied if this epistemological attitude prevails. Attempts to establish the facts of speech production can only be founded on observation. This may be why at the present day those phonologists in the generative tradition who take a rationalist stance on linguistics tend not to be much interested in phonetic detail or its representation (for example, Bromberger and Halle 2000: 24–5).
In order to trace how proper phonetic notation evolved from pseudo-notation and proto-notation, in the sections that follow I review the emergence of the main theoretical approaches to phonetics in the pre-Modern world up to the European Renaissance (Section 2.3.1), and the Early Modern world up to the late eighteenth century (Sections 2.3.2 and 2.3.3), followed by crucial developments during the nineteenth century (Section 2.3.4) which end with the establishment of the International Phonetic Association. The Association set the general tenor of what phonetic notation would be like up to the present day (see Chapter 3 Section 3.4.5). Phonetic theory continues to develop, pushed along by technologies such as sound spectrography, laryngoscopy and other instrumental means of phonetic research, but the basic formula of the International Phonetic Alphabet, roman-based base symbols with diacritics, keeps pace and continues to provide for the transcriptional needs of phoneticians. For an account of the first century of the IPA, see MacMahon (1986). The chapter concludes with a section comparing and contrasting sound–spelling and sound–symbol relations, and a final section on spelling reform.

2.3.1  Phonetic theory in the pre-Modern world

As far as we know, theorising about pronunciation was first indulged in in ancient India before the time of Pānini, possibly in the absence of written language (Allen 1953: 15; Varma 1961: 12; Misra 1966: 19 – but see Bronkhorst 2002) and therefore possibly without the resources for notating sounds in any manner at all. Consonants and vowels were classified according to the articulatory criteria of place, manner and voicing in much the same way as in the modern IPA system of phonetic classification. In fact Allen (1953: 7) takes the view, with regard to the development of phonetics in western Europe in the nineteenth century, that ‘Henry Sweet takes over where the Indian treatises leave off.’ The motivation behind the development of phonetic theory in ancient India was religious. Sacred Vedic texts were recited, not written, and accuracy of pronunciation was highly valued to the point where mispronunciation ran one the risk of damnation. In order to instruct believers in correct pronunciation it was necessary to understand how speech is produced. When they had use of the Brāhmī and Brāhmī-­derived alphabets from about the third century bce, the Indian phoneticians did not explicitly distinguish between letters as units of spelling and letters as symbols for representing aspects of pronunciation. Letters thus had a dual use: as units of expression for written language and, because phonetic descriptions attached to them, as transcription symbols for representing the expression units of spoken language. Because their descriptive framework for classifying consonants and vowels was the product of theorising about speech production, each letter in its pronunciation-­representing capacity was part of a proto-­phonetic notation system. In the hands of the Indian phoneticians the letters could be used as proto-­ symbols having precise phonetic definitions, and therefore could be used for proto-­transcription. 
These phoneticians could read written language either, like the literate layman, as spellings for words, or as representing a phonetic analysis of spoken language. In ancient Greece the rudiments of a science of phonetics, including a division into consonants and vowels, can be seen in writings by Plato and Aristotle in the fourth century bce. It developed further under the Stoics in the third to second centuries bce. As was the case with the Indian grammarians, the motivation behind the study of phonetics in Greece was often prescriptive. Grammarians wished to preserve the pronunciations of Hellenic Greek and protect them from changes taking place due to koineisation and the spread of Greek to speakers of other languages whose pronunciation of Greek was influenced by those languages (Robins 1990: 20). The Greeks developed methods for phonotactic analysis and analysed sounds into manners of articulation, dividing them into stops and continuants and setting up three triads of aspirated–unaspirated–voiced plosives. Although the Greeks fell short of an accurate account of voicing, there are hints in certain texts that they understood more about it than they have often

been given credit for. Terminology was used with explicit phonetic definitions such that alphabetic letters came to have phonetic definitions associated with them, giving them the status of proto-­symbols for use in phonetic analysis and proto-­transcription in addition to their status as letters for use in spelling. It is notable, though, that no terms were coined by the Greeks, or by the Romans after them, for denoting places of articulation. Turning attention to the Middle Eastern grammarians of the medieval period, it has been suggested that they learnt their phonetics from India (Danecki 1985). However, there is no direct evidence for this and the circumstantial evidence is very thin (Law 1990); Bakalla (1983: 49), for example, believes that ‘Arabic phonetics grew up largely independently of the general scientific tradition of the pre-­Muslim world.’ Greek influences may be more likely (Semaan 1963: 10; Versteegh 1977: 21–5; Odisho 2011), although Carter (2007) argues against this possibility, pointing out that Arab scholars were careful to acknowledge external sources but no such acknowledgments are found in their phonetic writings. We have already seen that the deployment of diacritical pointing in written Arabic around the late seventh century ce was guided by phonetic observation. By the time of Sībawayh, the most renowned of the medieval grammarians of the Middle East, in the late eighth century ce, a situation existed similar to that which obtained in India over a thousand years previously: there was a comprehensive framework for phonetic classification based on careful observation of articulatory processes in which the letters of the Arabic abjad were given phonetic definitions, and allophonic and dialectal variants were described (Al-­Nassir 1993). The Middle Eastern grammarians therefore had the means at their disposal for proto-­phonetic notation and transcription. 
In fact advances were made beyond the bounds of the writing system when ways were devised of notating features such as vowel nasalisation, which is not contrastive in Arabic. Bakalla (1983: 55–7) relates that dots, circles and superscript letter-­shapes were used for this purpose in the tajwīd tradition for instructing correct recitation of the Qur’ān. These non-­orthographic resources can be regarded as proper phonetic notation according to the definition proposed in Chapter 1 Section 1.3. Similar transcriptional devices were independently invented by Iceland’s ‘First Grammarian’ in the twelfth century ce (Haugen 1972: 15–19), attesting to a phonetic knowledge which has been described as unrivalled in western Europe at that time (Robins 1990: 82; Vineis and Maierú 1994: 187) but which remained virtually unknown until the nineteenth century. The ‘First Grammarian’ carried out a classificatory analysis of Icelandic vowel distinctions based on length, nasality and openness and, significantly for us, proposed new letters for them by systematically adding diacritics to the five vowel letters of the roman alphabet (Haugen 1972: 15–19, 34–41; and see Chapter 3 Section 3.4.1). There was no attempt, however, to classify consonants other than by their letter-­names, and noting that whether their names have a CV or VC structure correlates with the stop–continuant distinction: is called ‘bee’, is called ‘eff’ etc. Prescriptivism provided the initial motivation for phonetic scholarship in the medieval Middle East in much the same way as in ancient India and Greece. Accurate pronunciation of the Qur’ān was and remains important for Muslims. New converts whose first language was not Arabic had to be taught how to recite



sacred verses, but in the ideas of the Middle Eastern phoneticians one can see an interest in phonetics for its own sake, reaching levels of analysis over and above what is required for instruction in ‘correct’ pronunciation. Commenting on the Sirr al-­Sinā‘at al-­‘Irab ‘The Secret of the Inflectional Endings’ by Ibn Jinni (tenth century ce), which is ostensibly a prescriptive work, Mehiri (1973: 76) describes it as ‘un véritable traité de phonétique’. Ibn Jinni likened the vocal tract to a flute through which air is blown, with the places of articulation functioning like the finger-­holes to give different qualities of sound. This is the insight of a phonetician, not a prescriptivist. The first known diagram of the vocal tract appeared in the late twelfth-­or early thirteenth-­century Arabic treatise Miftāh al-­‘Ulūm ‘Key to the Sciences’ by Al-­Sakkākī and is reproduced in Figure 2.2. Each letter is written beside the place of articulation of the corresponding consonant. We can interpret the diagram to the effect that the letters become proto-­symbols and the places of articulation are identified as part of the theoretical models that the proto-­symbols denote. I am not, of course, claiming that Al-­Sakkākī would have explained it in these terms.

FIGURE 2.2:  Late twelfth- or early thirteenth-century vocal tract diagram entitled Sūrat makhārij al-hurūf ‘Picture of the outlets of the letters’ from Miftāh al-‘Ulūm ‘The Key to the Sciences’ by Al-Sakkākī. Dotted line indicates the nasal passage with a nostril above the lip.

2.3.2  Phonetic theory in the Early Modern world

Challenges to medieval European modes of thought brought in the Renaissance at around the time that vernacular languages were gaining status in Europe. A burning question in many quarters was how these languages, regarded heretofore
as inferior illiterate dialects, should be written. Attention to this question, along with a more empirical approach to knowledge, was probably a major impetus to the emergence of phonetic theory in western Europe in the sixteenth and seventeenth centuries. In deciding how words in French, Italian, Spanish and other Romance vernaculars should be spelled, two guiding principles came into conflict, namely etymology and pronunciation. Proponents of etymological spellings tended to be Roman Catholic by religion and socially hierarchical, desiring to show close links between their own spoken language and Latin, the language of the Roman Catholic Church. By contrast, those who favoured taking pronunciation as the guide tended to be Protestant and socially egalitarian. They saw etymological spellings as a barrier to literacy for the population at large and an attempt to preserve written language for social and religious elites. An influential figure in the fight against etymological spellings for French was the Calvinist Louis Meigret in the mid-­sixteenth century. Spoken French had drifted further from its Latin origins than most other Romance dialects and there was an anxiety that phonetically based spelling would not only seriously obscure the Latin etymologies but also create large homograph sets and render grammatical and lexical identities opaque. Meigret, however, did not accept these objections, taking his justification from the Aristotelian thesis that writing is the representation of speech. He regarded any spelling that was not true to pronunciation as a ‘superstition’ – we can perhaps see contempt for Roman Catholicism in his use of this term. Meigret went to the length of insisting that some of his works be printed in his own phonetically motivated respellings, as a result of which they were not widely read (Tavoni 1998: 25). 
A somewhat similar fate befell Le Maître phonétique, the forerunner of the current Journal of the International Phonetic Association, which published its contributions in IPA notation until 1971. Daniel Jones was lamenting already in 1912 that because of this policy ‘many valuable articles are simply lost to the world’ (Collins and Mees 1999: 128). A compromise form of writing French was proposed in which phonetic spellings would be written on a lower line with etymological ones above wherever the etymology was obscured by a phonetic spelling (Tavoni 1998: 25). Like many compromises, it pleased no one and no one took it up. The headmaster of St Paul’s School in London, Alexander Gill, practised a more acceptable kind of compromise for English, resorting to etymology only where sounds he described as ‘indistinct or wavering’ made phonetic spelling problematic. He seems to have been referring to reduced vowels and proposing that non-­reduced alternants should motivate their spelling, a strategy found in some phonological analyses of English schwa, for example Hammond (1999: 206), and which in effect is what English spelling does anyway. Another compromise was proposed by Desainliens (aka Holyband), in which unnecessary letters were to be retained but identified by ‘a speciall marke’ (Desainliens, The French Littelton, Dedication, cited in Danielsson 1955: 65). It is not hard to see that this would make spellings even more complicated and written texts more taxing to read. Attention to the spelling of vernacular languages in Europe was not confined to the Romance world. The same debates were going on in Germany, Denmark, the Netherlands and England, often mixing nationalism into the arguments to



advocate spellings that would mirror the national tongue and mark it as different from neighbouring cognate languages. The egalitarians who favoured the phonetic orientation of spellings over the etymological were following the injunction of Quintilian in the first century ce to write a language as it is spoken rather than speak it as it is written. Writing it as it is spoken is, in the absence of phonetic theory, to practise pseudo-­transcription by prioritising the identity of sounds in spoken language equivalents over the identity of words and morphemes in written language. It means that awareness of pronunciation is sharpened and before long a need is felt for a better understanding of speech and speech sounds. When this need is felt acutely enough it can only be satisfied by developing a theoretical approach to phonetics. A nascent general phonetic theory can be seen in sixteenth-­century western Europe in the works of Jacob Madsen in Denmark and Petrus Montanus in the Netherlands (Kemp 2006: 473–7), who coined hundreds of new technical terms but had little subsequent influence (Abercrombie 1993: 311), but it gained its strongest momentum in England in the work of John Hart (c. 1501–74) and other scholars of the time who were motivated by a commitment to spelling reform in the wake of the sound–spelling dislocations occasioned by the English Great Vowel Shift, and by an interest in observing how speech sounds are made. They are the first of the ‘English School of Phonetics’ discussed by Firth (1946; see also Albright 1958; Collins and Mees 1999: 455–71). Hart acknowledged Meigret as a key influence on his thinking and rejected etymological spellings almost as vigorously, arguing strongly in favour of phonetic spellings. Speech sounds he likened to Aristotelian ‘elements’ and regarded letters as ‘their markes’ and ‘the Images of mannes voice’ (Hart 1551: 29–­34, in Danielsson 1955: 118). 
These views are similar to those of Sir Thomas Smith (1513–77), an English diplomat stationed in Paris, who wrote that ‘writing may truly be described as a picture of speech’ (Smith 1568: 5, in Danielsson’s edition, 1983: 31). Smith puts forward an Aristotelian case for the naturalness of sound–letter relationships, despite recognising that writing takes its nature ‘by a postulate’ rather than, as he says speech does, ‘by creation’. Arguing syllogistically that ‘if a by itself is a, and b, b; taken together they make ab’ (Smith 1568: 8, in Danielsson’s edition, 1983: 43), he claims that for spellings to disturb this simple orthographic logic upsets the natural order, for example using digraphs such as
and for single sounds; curiously, though, he has no objection to a single letter standing for a cluster of two sounds, as for final /-­ks/, even proposing Greek for English final /-­ps/, which suggests he did not fully understand the archiphonemic nature of in Greek orthography (see Trubetzkoy 1933/2001: 12 n.1). Hart displays a similar attitude when he makes the case for writing to be governed by ‘due order and reason’ (Hart 1569: title page) instead of the disorder he saw in contemporary English spellings. Hart’s descriptions of the production of sounds are more perceptive and detailed than Smith’s, and on the whole reasonably accurate as far as they go. He noted the presence of aspiration in English voiceless plosives, which Smith did not (though he remarks on it in Welsh), and represented it in writing, for example writing pipe as , albeit somewhat inconsistently in relation to /t/ and /k/ (Jespersen 1907: 13–14). He did not provide any description

or explanation of aspiration, though, beyond saying that ‘ui brẹð ðe h softli’ (Hart’s spellings). There are other important gaps in Hart’s accounts. He offers no description of the production of [l], for example, and nor did Smith; and although Hart distinguished between voiced and voiceless sounds, like the Greeks and Middle Eastern grammarians he did not appreciate the mechanism of voicing, describing the difference only in auditory-­impressionistic vocabulary such as ‘soft’ (voiced) and ‘hard’ (voiceless). Salmon (1995: 142–­6) gives an account of Hart’s attempt to establish triads of aspirated–voiceless–voiced stops in English on Thrax’s model for Greek, abandoning it when faced with the facts of his own phonetic analyses of English sounds. Smith also mentions the Greek categories as subdivisions of the ‘mute’ consonants, but never actually fully applied the terms to English, probably because he was unable to make them fit. Moreover, his statements that /p/ and /t/ are the same in English as in Latin indicate either that he was unaware of the unaspirated–aspirated difference between the two languages, or that he was referring to an English-­accented Latin. Both Hart and Smith realised that the phonography of the Latin alphabet was inadequate for expressing the sounds of English and devised some notational devices of their own (see Chapter 3 Section 3.4.1). If we take their respective versions of letters for /ʃ/ and look at how they defined them we can see the extent to which their definitions are theoretical or ostensive.1 Taking first Smith’s [ ], which he names [ɛʃ], he gives a list of keywords such as she, shed, shine, ash, blush but provides no description of how the sound is produced. 
An experimental analysis is performed in which he compares it on the one hand to the sequence [sh-­] constructed by prepending [s] to hell in order to show that the result does not sound like shell, and on the other to the sequence [sj-­] constructed by prepending [s] to yell in order to show that this yields a pronunciation more like shell. Smith thus defines [ ] ostensively and justifies it experimentally by drawing attention to its palatality (without identifying it as such) but does not offer an account of its production. Hart gives two descriptions of the production of [ʃ] (Hart 1569: §38b, in Danielsson 1955: 195; Hart 1570: §2b, in Danielsson 1955: 242) for which he provides the new letter . Both descriptions are less than precise about tongue configuration, saying that the tongue is drawn ‘inward’ to the upper teeth and that [ʃ] is distinguished from [s] and [z] by the tongue not touching the palate. In contrast to Smith, Hart does attempt to define the uniqueness of [ʃ] in articulatory terms, although not as accurately as Danielsson (1955: 221) is prepared to give him credit for. But it does mean that of the two, Hart is the more theoretically inclined in providing an interpretation of his letter which is not solely ostensive. Consequently, Hart’s [ȣ] has more of the proper phonetic symbol about it than Smith’s [ ] and reaches a level of phonetic description comparable to that achieved by the medieval Middle Eastern linguists such as Sībawayh and Ibn Sīnā (Avicenna), whose descriptions of Arabic [ʃ] refer to a narrowing relation between the middle part of the tongue and the hard palate (El-­Saaran 1951: 247; Semaan 1963: 39–40; Al-­Nassir 1993: 15). Danielsson (1955: 54) is clear that Hart ‘had devised his new orthography to serve both as a reformed spelling of English and as a general phonetic alphabet’.



Hart’s primary aim, however, was to reform spelling. In so far as he developed a phonetic theory it was to guide orthographic decisions away from the irregularities and morpho-phonographic tendencies of English spelling firmly towards a completely phonographic writing. His notation was there to provide the resources for it. It is clear that he desired to go a long way in the direction of phonography to provide spellings which are ‘shallow’ in Sampson’s (1985: 43–5) sense of being close to the surface phonetics of speech. His distinct spellings for strong and weak forms of English gradable words show sensitivity to differences in their pronunciation, and he provides spellings for assimilated and elided forms – for example, weak-form and spelled as before vowels and before consonants, as with before voiced sounds and before voiceless ones (Danielsson 1955: 187). Although primarily a spelling reformer, Hart shows the kind of observational acuity without which an adequate theory of phonetics cannot develop. He is part of the wider trend towards observation and description that formed the beginnings of the scientific methods that became more firmly established in the following century. Additional observations about speech sounds and speech production were made in the late sixteenth and seventeenth centuries which helped to advance phonetic understanding and provide the knowledge for more detailed phonetic descriptions. In talking of the seventeenth-century scholars who wrote on phonetics, Abercrombie (1993: 310) has remarked: ‘Their contribution to the history of the subject is not to be despised. They succeeded in constructing the foundations of a true general phonetics.’ Robert Robinson, a contemporary of Shakespeare, published The Art of Pronuntiation in 1617 not so much to reform spelling as to devise a way of describing pronunciation so that learners of foreign languages could learn native-like forms of speech. 
He created a vowel chart, perhaps the first ever, showing in a diagrammatic representation of the mouth the relationship of the tongue to five points along the palate (see Figure 2.3a). At each point Robinson indicated five associated vowel qualities, in short and long variants, using his own set of symbols, although neither the open–close dimension nor lip-­shape is incorporated into the scheme. Figures 2.3b and 2.3c show that very similar scalar diagrams were used by Bell (1867: 74) and Jones (1918/1972: 32). For consonants, Robinson used his own adaptations of existing letters and designed new ones, using diacritics to distinguish between voiced and voiceless (Dobson 1957: xii–xiii, 23–4). He defined the characters in terms of five locations for vowels and three for consonants (‘outer’, ‘middle’ and ‘inner’), and four consonantal manner distinctions (‘mute’ = plosive, ‘semi-­mute’ = nasal, ‘greater obstrict’ = fricative, ‘lesser obstrict’ = approximant) plus a fifth for ‘the peculiar’ [l] (ibid.: 14–24). Assignment of sounds to these categories is not always in agreement with modern phonetics: [θ] and [ð] are placed in the ‘inner’ region along with velars, behind [s] and [z]. Comparing his solution for [ʃ] with Smith’s and Hart’s, Robinson tells us in a passage reminiscent of Smith that he derived his symbol [xx] from the sequence [xox] (= [jsj]) because ‘it seems to be but one consonant sound, nor indeed can it be discerned to be otherwise, vnlesse by a very diligent obseruation’ (Robinson 1617 (not paginated), italics added). That he did not give [ʃ] the status of a primitive suggests he thought in reality

FIGURE 2.3:  (a) Robinson’s ‘scale of vowels’ diagram of 1617. A = larynx, B = front of palate, C = tongue root. Robinson (1617), The Art of Pronuntiation, facsimile edition, edited by R. C. Alston, Menston: The Scolar Press, 1969; (b) Bell’s ‘scale of lingual vowels’ of 1867 with his Visible Speech symbols. Bell (1867), Visible Speech: The Science of Universal Alphabetics, London: Simpkin, Marshall and Co.; (c) Jones’s drawings of cardinal vowel tongue positions of 1918, based on X-ray photographs. Jones (1918/1972), An Outline of English Phonetics, Cambridge: Cambridge University Press, ninth edition.

it was two sounds, which would explain why he did not classify it or give it a description to compare with Hart’s. Nevertheless, Robinson’s scheme marks an advance on the work of Hart for its conception of a notation free from the influence of any irregularities in the sound–spelling correspondences of traditional orthography, and for the setting up of a small number of theoretical phonetic categories to account for all the consonants and vowels he could discern. His notation therefore meets the requirement of a proper phonetic notation more fully than Smith’s or Hart’s because it is more explicitly based on theory, however inadequate we might nowadays judge that theory to be. Its purpose was not to replace extant orthography but to be able to represent the expression elements of spoken language. His symbols can therefore be said to denote general phonetic models that have theoretical definitions. Their use in proper phonetic transcription is exemplified in a number of surviving manuscripts in the Bodleian Library, most extensively in a transcription of a poem by Richard Barnfield,



Origins and Development

57

Lady Pecunia, which runs to 56 six-­line stanzas. Robinson may therefore arguably be the first phonetician to produce proper running phonetic transcriptions in English; they can be classed as generic, broad and systematic (see Chapter 4 Sections 4.1, 4.3 and 4.4). An interesting feature of Robinson’s notation is the way he represented voice and voicelessness as consonantal prosodies or ‘long domain’ features, which ‘strikingly anticipated Firthian prosodic analysis’ (Abercrombie 1993: 311). Voiced and voiceless cognates were given the same base symbol and an ‘aspirate’ mark was placed above the first consonant symbol of a syllable if the onset and/or coda contained any voiceless consonants: [↼] = voiceless onset, [⇁] = voiceless coda, [ϟ] = voiceless onset and coda. Dobson (1957: xiii) complains that this is ‘ill-­conceived’, but it has some merit as an analysis of English onset and coda clusters in which, with a handful of optional exceptions, obstruents agree in voicing (Gimson 1980: 239–53). In the latter half of the seventeenth century four figures are generally credited with having made the most progress in the English School of phonetics: John Wallis, John Wilkins, William Holder and Francis Lodwick. Wallis (1616–1703) attracted controversy for accusations and counter-­ accusations regarding claims about his achievements, for which Firth (1946: 109) is unforgiving, but Kemp (1972: 13), while not excusing Wallis’s dishonesty, is a little more understanding of how academics sometimes succumb too much to vanity. In the Tractatus de Loquela, prefaced to his Grammatica Linguae Anglicanae of 1653, Wallis, a founding member of the Royal Society, presents a classificatory scheme for vowels and one for consonants.2 These are summarised in tables of intersecting categories much like the modern IPA chart in principle if not in detail (reproduced in Figure 2.4). 
Vowels are defined as the intersections of two dimensions, front–back and close–open, each having three values: guttural–palatal–labial and wide–medium–narrow respectively, specifying nine vowel qualities; Bell’s (1867: 73) nine primary vowels, and Sweet’s (1877: 12), are defined by almost identical categories (Kemp 1972: 46) but presented in tabular form more iconically with the high–mid–low categories on the vertical axis, where Wallis places his wide–medium–narrow on the horizontal axis. Wallis gives other dimensions (open, round, obscure, fat, thin) in the cells in a somewhat ad hoc manner. The table for consonants shows four dimensions: the manner dimension mute–semi-­mute–semivowel built on the place dimension labial–palatal–guttural, and a thin–fat dimension (which Wallis describes variously as a spread–rounded or narrow–wide distinction) built on an aspirate–non-­aspirate dimension (although the thin–fat distinction does not apply to non-­aspirates). For an extensive discussion of Wallis’s knowledge of phonetics, how it compared to that of other scholars of the time, and the meanings of his terms, see Kemp (1972: 39–66). For our purposes we should note that his terminology originates in a theoretical approach even if it is at times rather vague (Kemp 1972: 48), and that Wallis tried to fit vowels and consonants into the same place-­of-­articulation dimension of ‘labial’, ‘palatal’ and ‘guttural’, anticipating some modern attempts such as Catford’s polar coordinates (Catford 1977: 182–7).
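The way Wallis's two vowel dimensions intersect can be sketched as a Cartesian product. The following is only an illustration of the tabular principle described above, not a reconstruction of Wallis's own table: the category labels come from the description in the text, while the names `PLACES`, `APERTURES` and `cross_classify` are assumptions introduced for the example.

```python
from itertools import product

# Category labels follow the description of Wallis's vowel table above;
# the variable and function names are illustrative assumptions.
PLACES = ("guttural", "palatal", "labial")    # the front-back dimension
APERTURES = ("wide", "medium", "narrow")      # the close-open dimension


def cross_classify(*dimensions):
    """Return every combination of one category from each dimension."""
    return list(product(*dimensions))


vowel_cells = cross_classify(PLACES, APERTURES)

for place, aperture in vowel_cells:
    print(f"{place} + {aperture}")

print(len(vowel_cells))  # 3 x 3 = 9 cells
```

Each cell is defined purely by the categories that intersect in it, and adding a further dimension would simply multiply the number of cells.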


FIGURE 2.4:  Wallis’s 1653 sound chart ‘Synopsis of all letters’

The significance of a tabular presentation of sounds in the development of phonetic theory can hardly be overestimated. By setting up phonetically defined dimensions whose categories intersect, phonetic models are generated which become the denotata for phonetic notation. That is to say, instead of symbols denoting real-world phenomena with all the problems that that conception of symbols brings (see Chapter 1 Section 1.2.3), they can denote products of a theory. In this manner, orthographic characters are transmuted into proper phonetic symbols. Tables with dimensions defined in terms of articulatory phonetic theory are models of an abstract taxonomic phonetic space in a way that labelled diagrams of the vocal tract such as Robinson’s for vowels and Al-Sakkākī’s for consonants are not. Labelled vocal tract diagrams associate parts of the vocal tract with particular sound qualities whereas tables define the articulated dimensions of a more abstract conception of phonetic space with at least the potential to be domain-neutral. Wallis may not, of course, have thought of his tabular arrangement in quite these terms, but it liberated symbols from their orthographic origins to guarantee them the potential for a freedom they had never had before, allowing them to be put to the service of phonetics as a scientific notation. Abercrombie’s (1993: 312) verdict on Wallis, that ‘his De Loquela is an unsatisfactory book in many ways’, overlooks this very significant step in the often parallel development of phonetic theory and phonetic notation. We can see in Wallis’s table how it generates sound-types which he recognises as not occurring in speech (mugitus ‘mooing’, gemitus ‘groaning’),3 just as we saw in Chapter 1 Section 1.3 how the IPA chart generates ‘pharyngeal nasal’ although no such sound is possible. Compared to Robinson, Wallis is less venturesome in his symbol set – his only new symbol is [ɴ̄], which denotes a voiced velar nasal – but the models they denote have firmer theoretical foundations resulting from a more systematic attempt to chart taxonomic phonetic space.

There is a line of development from Hart through Robinson to Wallis in which phonetic observations become more systematic though not always more accurate, phonetic theory is more prominent, and a more universalist perspective is evident. Where exactly we draw a line and say that proper phonetic notation in the western Early Modern world starts will be to some extent arbitrary, but there is enough to show that Wallis was clearly operating in a manner informed by observation and theorising which was closer in method to modern phonetics than that of his predecessors. He also showed more concern to make his scheme applicable to other languages, including the non-European languages Hebrew and Arabic. Like any other pre-Modern phonetician, he can be criticised for errors that seem elementary to us. For example, he says that in the production of [θ, ð] the air exits through ‘a round shaped hole’ while for [s, z] it escapes ‘through a slit’ (Wallis 1765: 23, tr. Kemp 1972: 173) and he fares no better than Robinson, and rather worse than Smith and Hart, on the ‘esh test’. Wallis excluded [ʃ], and the affricates [ʧ, ʤ], from his table, regarding them as compounds made up of the sequences [sj, tj, dj].
Kemp (1972: 60) conjectures that Wallis may have based his analyses on pre-coalescent pronunciations of words such as nation, nature, soldier (see Cruttenden 2001: 76, 190) rather than on words such as shop, ash, church, judge in which [ʃ, ʧ, ʤ] do not result from coalescence. This greater uncertainty about [ʃ] in the later writers Robinson and Wallis, also seen in eighteenth-century accounts of English pronunciation (e.g. Walker 1791: 4), may be connected with coalesced pronunciations of words such as sugar starting to be perceived as vulgarisms (see Beal 1999: 144–51).

Bishop John Wilkins, brother-in-law to Oliver Cromwell and, like Wallis, a founder member of the Royal Society, lived from 1614 to 1672. His reputation among modern linguists is for his work on a ‘universal language’, the famous Essay Towards a Real Character and a Philosophical Language (1668), that would by-pass natural languages and allow world-wide communication in terms of supposedly universal semantic categories each having its own written character. This semasiographic project was carried out in an intellectual climate much influenced by Francis Bacon (Salmon 1983: 128) in which there was little faith in the ability of natural languages to express truth clearly and distinctly. In the five years prior to his death in 1626, Bacon had written in his unfinished work, The Great Instauration, about what he called the ‘idols of the mind’, four types of preconceptions or inclinations in the minds of human beings which tend to prevent us from apprehending truths. The type he called ‘idols of the marketplace’ were responsible for the false belief that we have rational control over our use of language, and for our failure to see that language can control our thought. In a sentence which looks forward to activation models of the mental lexicon, Bacon asserts that ‘words react on the understanding; and this it is that has rendered philosophy and the sciences sophistical and inactive’ (Spedding, Ellis and Heath 1858: IV 60–1, quoted in Carlin 2009: 19).

The desire to establish a universal philosophical language in the seventeenth century had both religious and scientific motivations. On the religious side, it was a programme to tackle the linguistic chaos which ensued, according to the Old Testament, after the destruction of the Tower of Babel. Latin had functioned as a kind of universal language in Roman Christendom but the rise of vernaculars, and the strength of the Reformation, had weakened its status (Clauss 1982: 532–3). In the opinion of many, a state of linguistic homogeneity needed to be restored to mankind. On the scientific side, advances in the taxonomic classification of the natural world led to a belief that all reality and human experience could be similarly classified and a system of universal categories set up as the content elements of a universal language. Reality and language would then ‘form two isomorphic systems’ (Hüllen 1986: 119) over which the idols of the marketplace would have no power. Each category would be assigned a written character which in some versions would be pronounced as the translation equivalent of the language of the reader – that is to say, the character would be a semasiogram – while in other versions, Wilkins’s being one, each character would be assigned a pronunciation. For this purpose, Wilkins tried to establish universal phonetic categories, much as does the IPA.
The linking of a universal perspective on phonetics with the idealism of international communication came about again in the late nineteenth and early twentieth centuries when spelling reformers and the Esperanto movement made common cause in challenging national orthographies and national languages, bolstering their positions with reasoning from phonetics.

Wilkins is important for his contribution to both phonetic theory and phonetic notation. Regarding phonetic theory, his classification of consonants showed more awareness of articulatory structures than Wallis’s and he made a more successful attempt to incorporate vowels into the same scheme. His cross-classificatory sound chart, shown in Figure 2.5, is therefore a more sophisticated model of articulatory phonetic space and each symbol consequently denotes a more exact general phonetic model.

FIGURE 2.5:  Wilkins’s sound chart of 1668. Reproduced with the permission of the Brotherton Collection, Leeds University Library

Regarding notation, Wilkins devised symbols based on the postures of the speech organs during the production of consonants and vowels in so far as they were understood. The symbols of this ‘organic alphabet’ bear no relation to alphabetic letters but are motivated by the shapes of the articulators and the passage of the airstream, their iconicity depending on observation and theory. Wilkins makes no mention of the Dutch philosopher and alchemist Franciscus Mercurius ab Helmont, who the year before had published his account of the Hebrew alphabet (Helmont 1667) with cutaway sagittal drawings of the vocal tract to try to prove that Hebrew letters constituted a ‘natural’ organic alphabet. Wilkins’s drawings are stylistically and anatomically very similar, including an ‘at rest’ diagram with numbered articulators. Although Wilkins did not intend his organic symbols to be used as transcription symbols, they marked an important step away from orthographic thinking. The importance of this step is summed up by Heselwood et al. (2013: 12):

Organic symbols explicitly identify sounds as objects of study independently of any writing system and therefore imply the possibility of phonetics as a language-independent discipline drawing on the disciplines of anatomy and physiology. In their role as ‘pictures of the letters’ the organic symbols linked the letters to articulation to give them concrete phonetic interpretations, thus acting as shorthand definitions for the accompanying letters which were used as phonetic notation.

For example, the organic symbol for [F] (= IPA [f]) shows the two lips touching but with a line bisecting them to indicate that air is passing between them. For [P] (= IPA [p]) there is no bisecting line, and for [V] (= IPA [v]) the line has a single oscillation at the left end to indicate vibration of the epiglottis, which Wilkins took to be the source of voicing (the vocal tract is oriented to face right; see Chapter 3 Section 3.1 on organic notation).

One year after Wilkins’s Essay, William Holder’s Elements of Speech was published, although it was probably completed before Wilkins’s work appeared (Salmon 1972: 152). Holder lived from 1616 to 1698. That he continues the general Aristotelian view of writing’s relation to speech is evident when he says (Holder 1669: 63) that ‘[l]anguage is a connexion of audible signes [. . .] Written language is a description of the said audible signes by signes visible.’ Holder has a view of spoken language very similar to Smith’s and Hart’s in which the sounds we make are ‘natural elements’ but the meanings are ‘artificial’ and come about by ‘institution and agreement’ (ibid.: 9–11). How the ‘audible signes’ are to be written is something which can be reasoned about rather than resulting from the operation of ‘uncertain fabulous relations’ beyond our knowledge. Although he talks of written language as providing a ‘description’ of spoken language, Holder did not propose organic symbols.
Like Smith, Hart and Wallis, he gave phonetic definitions to existing roman letters, took [θ] from Greek, and used a few extra ones, for example [ȣ] representing a ‘labio-guttural’ vowel, a glyph previously used by Hart for [ʃ]. Holder employed the diacritic [‘] to denote voicelessness when added to a sonorant consonant, for example [L‘] (= IPA [l̥]), but for nasalisation when added to a fricative, for example [S‘] (= IPA [s̃]); see Figure 2.6. The general strategy of co-opting roman alphabetic letters, taking letters from other alphabets and adding new letters and diacritics to create a notation system was to become, over two centuries later, the recognised strategy of the IPA for enlarging its stock of symbols (see Chapter 3 Section 3.4.5). Albright (1958: 8–12) thinks Holder’s lasting importance in phonetics can be reduced to his invention of the [ŋ] symbol for a velar nasal, although the symbol itself did not appear because, as Holder explains, the printer had no type for it. In fact, Alexander Gill had already come up with something very similar in his Logonomia Anglica of 1619 (Abercrombie 1981: 212). Albright’s rather dismissive evaluation, perhaps premised on the erroneous view that Holder merely followed Wilkins (Albright 1958: 11), ignores some quite profound passages in Holder which have led Kemp (1981a: 42) to compare him to the ancient Indian grammarians and Abercrombie (1993: 315) to hail him as ‘the most important 17th century figure’ in phonetics. Holder’s description of voicing (Holder 1669: 23) is the first comprehensive account in western phonetic literature which, even if it does not quite attain the accuracy of modern descriptions (Abercrombie 1986: 4–5, 1993: 318–19), ‘provides the conceptual rudiments of what we know as the aerodynamic-myoelastic theory of phonation, and the source–filter model of speech production’ (Heselwood et al. 2013: 12). It refers to breath from the lungs passing between approximated vibrating cartilages in the larynx to create a tone which is ‘sweetened and augmented’ by resonance in the supralaryngeal vocal tract. In Abercrombie’s (1986, 1993) discussions of the ‘hylomorphism’ of Holder’s framework, we can see a clear identification of ‘matter’ and ‘form’ in speech production with the ‘source’ and ‘filter’ respectively of modern speech acoustic theory. The matter, or material of speech, is the airstream, which can be voiced or voiceless and which remains undifferentiated until given different forms by the variable filter of the supralaryngeal vocal tract. Holder’s hylomorphic scheme and the modern source–filter scheme can be mapped onto the three functional components of speech production in parallel, as in (2.1).

(2.1)   Holder:                  Matter                      Form
        Functional components:   Initiation    Phonation     Articulation
        Acoustic theory:         Source                      Filter

Some confusion over whether glottal [h] and [ʔ] count as sounds comes through in Holder (1669: 72–­3), which is not a great surprise when we consider difficulties later writers have had in distinguishing between phonatory and articulatory functions in the larynx. Holder’s descriptions of several sounds are notable for detail and accuracy.

FIGURE 2.6:  Holder’s table of consonants (left) and ‘scheme of the whole alphabet’ (right). From Holder (1669: 62, 96)

His account of [l] and [r] would only look outdated in a modern phonetics textbook for its seventeenth-­century language. Muscles are identified, and the trilling action of [r] is described in aerodynamic-­myoelastic terms: ‘born stiffely, as with a Spring, by the Muscles, (especially by the Genioglosse) and agitated by strong impulse of Breath’ (Holder 1669: 50). The syntagmatic axis of speech gets more attention from Holder than from other writers of the time. He sees speech as successive openings and closings of the vocal tract, each cycle separated by an ‘appulse’, an approach of an active articulator towards a passive one, very much in the same vein as the ‘frame and content’ view of syllables based on mandibular cycles of Davis and MacNeilage (2005). Analysis of places and manners of articulation is more modern-­sounding in Holder than in previous accounts, with greater consistency in situating the terminology in relation to the different domains of phonetics, and there is more emphasis on what would nowadays be called the phonemic or phonological function of speech sounds (Fromkin and Ladefoged 1981: 4). Finally, it is worth drawing attention to Holder’s account of the process of hearing in the Appendix to Elements of Speech, where he identifies the components of the outer and middle ear, the ‘three very little Bones’, and refers to the ‘inward ear’ which connects to the auditory nerve. The last of the English School phoneticians to be considered here is Francis Lodwick (1619–94). His Essay Towards an Universall Alphabet was published by the Royal Society in 1686 but had already been circulating for some years amongst scholars interested in universal languages (Abercrombie 1948/1965: 49). It presents an organic-­analogical alphabet (see Chapter 3 Section 3.2.2, and Figure 3.8) in tabular form which only partly follows the structure of the vocal tract and uses numbers to label the rows and columns (‘ranks’ and ‘files’) instead of phonetic terminology. 
Although we can see the network of cross-­classifications showing which consonantal correlations are proportional to other correlations, Lodwick does not identify the phonetic bases of these relationships, leaving the reader to work them out. This absence of phonetic explanation means that Lodwick did not really add very much to phonetic theory, although his principle ‘that no one Character have more than one Sound, nor any one Sound be expressed by more than one Character’ (Lodwick 1686: 127, in Salmon 1972: 236) is close to the IPA principle, first articulated in 1888, that ‘[t]here should be a separate letter for each distinctive sound’. One other interesting point of theory, although he gives no rationale for it, is that voiceless obstruents are derived from more ‘primitive’ voiced ones. In the history of how voiced and voiceless obstruents have been handled in descriptive frameworks, we have here perhaps for the first time the suggestion that voiced obstruents are more basic than voiceless ones. It is not clear whether Lodwick conceived of the relationship being one in which voicelessness is added to derive voiceless obstruents, or voice is taken away, though the former is implied by the device of adding a stroke to denote voicelessness. By the late seventeenth century phonetic knowledge in England had reached a level broadly comparable to the Middle Eastern grammarians of some eight hundred years before. It would not reach the level of attainment of the ancient Indian grammarians of over two thousand years before until the nineteenth century.
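Lodwick's principle that no character should have more than one sound, and no sound more than one character, amounts to requiring a one-to-one mapping between characters and sounds. A minimal sketch of such a check follows; the function names and the two toy inventories are hypothetical illustrations and are not drawn from Lodwick's alphabet.

```python
def violations(pairs):
    """Return (characters with several sounds, sounds with several characters)."""
    sounds_of = {}
    chars_of = {}
    for char, sound in pairs:
        sounds_of.setdefault(char, set()).add(sound)
        chars_of.setdefault(sound, set()).add(char)
    many_sounds = {c: s for c, s in sounds_of.items() if len(s) > 1}
    many_chars = {s: c for s, c in chars_of.items() if len(c) > 1}
    return many_sounds, many_chars


def is_one_to_one(pairs):
    """True if every character has one sound and every sound one character."""
    many_sounds, many_chars = violations(pairs)
    return not many_sounds and not many_chars


# Hypothetical toy inventories of (character, sound) pairs.
spelling_like = [("c", "k"), ("c", "s"), ("k", "k"), ("s", "s")]
notation_like = [("k", "k"), ("s", "s"), ("ʃ", "ʃ")]

print(is_one_to_one(spelling_like))   # False
print(is_one_to_one(notation_like))   # True
```

In the spelling-like inventory the character "c" has two sounds, and the sounds "k" and "s" each have two characters, so the principle fails in both directions.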




2.3.3  Phonetic terminology in the ‘English School’

One indication of a mature scientific discipline is a stable and consistent terminology so that the same phenomena are referred to in the same way by different scholars. By this indicator, phonetics was still making its way through early adolescence in the seventeenth century, with no two writers using the same set of classificatory terms. Table 2.1 presents the manner of articulation terms employed by the major figures from Smith to Holder against the closest IPA equivalents; Lodwick has been left out because he did not use phonetic terminology to classify sounds. We get a sense of each scholar trying to find the most appropriate terms for the categories as they understood them. Influence from classical writings surfaces most clearly in Wallis but there are differences in how classically derived terms are used. For example, Robinson and Holder use ‘mute’ for plosives, in line with the term’s classical origins (from Greek aphōna via Latin mutae; Allen 1981: 117–18), and ‘mute’ was used in this sense as late as the early 1840s by Pitman (see Kelly 1981: 251–2), while Wallis and Wilkins use it for all voiceless sounds. There is conspicuous uncertainty here about whether ‘mute’ refers to absence of sound generated at places of articulation or in the glottis, probably because the mechanism of voicing was not known except by Holder. Terms vary in their relations to different phonetic domains. They had not settled into the predominantly articulatory basis of modern phonetic categories. ‘Mute’, ‘sonorous’, ‘hard’ and ‘soft’ are auditory-perceptual concepts; ‘obstrict’, ‘aspirate’, ‘breathless’, ‘breath’ and ‘pervious’ are aerodynamic; while ‘closed’, ‘occluse’, ‘open’ and ‘partial’ are articulatory, as are ‘thin’ and ‘fat’, which Wallis uses to refer to the size and shape of articulatory constrictions. Wilkins’s inclination towards aerodynamic terms may reflect the focus on airflow expressed in his organic-iconic diagrams and symbols (see Chapter 3 Figure 3.3).
This mixture of terms from different domains shows an empirical taxonomic approach which had not yet decided on its methods of observation and classification and had not oriented itself into a single overall direction. It was later to do so by attending to physiological causes of speech sounds and the anatomical structures responsible for them. Much of the impetus in this direction came from the Indian tradition, which came to the notice of modern western linguists only towards the end of the eighteenth century, but we should not overlook the steps which were taken in this direction by the ‘English School’. For example, we have already seen that Holder had greater insight into phonation than his contemporaries because of knowledge of laryngeal structure and vibration. Holder’s conception of a basic distinction between ‘breath’ and ‘voice’ is the one the Indians operated with under the terms śvāsa and nāda (Allen 1953: 33–4), and which may have been independently developed in the medieval Middle East by Sībawayh, who coined the terms mahmūs (participle form of Arabic hams ‘whisper’) and majhūr (Arabic jahr ‘clear, outspoken’), possibly as a result of Greek influence. Holder may have got it from Hart’s ‘breath–sound’ dichotomy by applying his more accurate knowledge of phonation. It is the distinction used by Sweet (1906: 9–12), based on Bell (1867: 45–6), and perpetuated in Jones (1918/1972: 19–22), who equates breath with ‘voiceless’, the latter being preferred by Abercrombie (1967: 26–7) and now in widespread use. Of all the terms in Table 2.1, these are the only ones with a presence in modern phonetic taxonomy, although ‘sonorous’ and the concept of sonority have become centrally important in theories of the syllable (Laver 1994: 503–5; see also Botma 2011).

TABLE 2.1:  Consonantal manner terminology in the ‘English School’ of phonetics in the sixteenth and seventeenth centuries

| IPA | Smith | Hart | Robinson | Wallis | Wilkins | Holder |
| Plosive | Mute | Stopped breath | Mute | Primitive/Closed | Breathless | Plenary/Occluse/Mute |
| Fricative | ‘Blæse’ (lisping) and sibilant | Continual breath | Greater obstrict | Derived/Open/Aspirate; Thin | Mouth-breathing | Partial/Pervious |
| Approximant | Semi-vocal/Liquid | Semi-vocal | Lesser obstrict | Fat | Semi-vocal | |
| Voiceless | Hard | Breath, hard | Aspirate | Mute | Mute | Breath |
| Voiced | Soft | Sound, soft | No term | Semi-mute | Sonorous | Voice |
| Nasal | Semi-vocal/Liquid | Semi-vocals | Semi-mute | Semi-vocal | Nose-breathing | Nasal |

2.3.4  Phonetic theory in the late eighteenth and nineteenth centuries

The eighteenth century saw very little progress in phonetics until the final quarter. This is in great contrast to the nineteenth, by the end of which huge advances had been made in phonetic theory and also in the application of technology to the study of speech. It is not an exaggeration to say that by the start of the twentieth century phonetics had become a science in Europe linked with the scientific study of anatomy and physiology and of acoustics (Albright 1958: 19), but first it had to forge its own identity separate from the interests of language teaching and spelling reform. It is in the nineteenth century, particularly the second half, that we see most directly the roots of modern theoretical and experimental phonetic science and the development of our current resources for phonetic transcription. Notation systems are dealt with in Chapter 3, where their relations to phonetic theory will be examined in some detail and in a historical context; consequently at this point comments on these matters will be kept brief. In general at this period phonetic theory was more closely tied to issues of notation than to instrumental methods and experimental procedures, the latter being carried out by physical scientists who viewed speech as the product of a system of pumps, tubes and valves rather than as the spoken manifestation of language. Symbols in notation systems had to be defined, and this was usually done in relation to how the symbolised sound was understood to be produced, that is to say in terms of articulatory phonetic theory. Several ingredients came together from the late eighteenth through to the mid-nineteenth centuries which all contributed significantly to the formation of phonetics as a science.

Marking the start of the last quarter of the eighteenth century was Joshua Steele’s (1700–91) An Essay Towards Establishing the Melody and Measure of Speech of 1775. Steele was concerned with the prosodic structure of speech and particularly with its representation. He adapted terms and notational devices from music in his analyses of rhythm, intonation and other dynamic features. Steele’s work went largely unappreciated at the time (Sumera 1981: 103), but some of his resources have made a reappearance in the extensions to the IPA with the same applications to speech, for example allegro, f(orte), p(iano) (Duckworth, Allen, Hardcastle and Ball 1990); there are also resemblances to later interlinear intonational transcriptions (see Chapter 4 Section 4.11.3) and to Halliday’s (1970: 52) representations of intonational pitch.

One of the first representations of vowels in an abstract vowel space was presented in the 1781 Dissertatio Physiologico-Medica de Formatione Loquelae of Christoph Hellwag (1754–1835) in the form of a ‘vowel triangle’ (Kemp 2001: 1469–70). It has clear similarities to the cardinal vowel system of Daniel Jones (e.g. Jones 1918/1972: 31–9) and the modern IPA vowel quadrilateral.

Lexicography rather than general phonetics was more in the ascendency at this time as shown in the number of English pronouncing dictionaries which appeared with various ways of representing consonants, vowels and word-accent (Beal 2008). In his Grand Repository of the English Language of 1775 Thomas Spence (1750–1814) produced ‘a genuine, scientific, phonetic alphabet’ (Abercrombie 1948/1965: 68). The letters of this alphabet are modifications of the roman alphabet and are presented in alphabetical order with keyword exemplifications but without phonetic descriptions.
It is questionable whether Spence really adds anything to general phonetic science, although he can be applauded for showing that it is possible to regularise the grapheme–phoneme correspondences of English into a ‘broad phonemic system’ (Beal 1999: 89). John Walker (1732–1807) achieved greater fame than Spence with his A Critical Pronouncing Dictionary of 1791. Walker’s classification scheme shows no advance on those of Wilkins or Holder, and his phonetic descriptions are sometimes less perceptive. He does not appear to have understood Holder’s account of voicing despite referring to it. Labiodental fricatives he describes as produced ‘by pressing the upper teeth upon the under lip’ (Walker 1791: 6), which fails to assign active and passive roles accurately to the articulators. He seems unsure whether the sounds corresponding to English orthographic and
are single sounds or not, describing them rather confusedly as ‘mixed or aspirated’, having ‘a hiss or aspiration joined with them, which mingles with the letter’ (Walker 1791: 4–5). While Walker seems to have had an acute ear for detecting subtleties of sound, he lacked a corresponding acuity in matters of phonetic theory.

A huge influence on phonetics, because of the need to apply it as a tool in historical and comparative linguistics, came from the work of Sir William Jones, a British legal official stationed in India. Although resemblances between Sanskrit and European languages had been noted from the late sixteenth century (Robins 1990: 150), it was Jones’s presentation of his famous paper with the unpromising title Third Anniversary Discourse in 1786 that established beyond doubt the systematic relationship of Sanskrit to Greek and Latin, and set historical linguistics on a footing where it could apply the taxonomic approaches current in botany and biology to historical linguistic data. Interest in Sanskrit and the availability of Sanskrit texts brought ancient Indian phonetics into European scholarship such that, according to Allen (1953: 7) as we saw above (Section 2.3.1), ‘Henry Sweet takes over where the Indian treatises leave off’, although Alexander J. Ellis had already made a study of Indian ideas, as had the German-trained American linguist W. D. Whitney. The two biggest influences on Sweet were probably Ellis and Bell, but he also greatly admired the Norwegian phonetician Johan Storm and took serious note of what was going on in Germany in the work of Carl Merkel, Eduard Sievers and Wilhelm Viëtor. Sweet’s contributions to phonetic theory have been evaluated by Kelly and Local (1984), who stress the attention to detail, consistency of description and comprehensiveness of scope of his work compared to his predecessors such as Bell.

A. J. Ellis (1814–90) is best known for his researches on, and conjectured phonetic descriptions of, English pronunciation from the Old English period through to his own time, and for his work with Isaac Pitman on systems of notation (see Kelly 1981) leading to his own palaeotype system (Ellis 1867). Alexander Melville Bell (1819–1905) is probably best known for his Visible Speech, also dated 1867, an experiment in organic alphabet creation based on detailed analyses of consonant and vowel production. It combines the principles of Wilkins’s organic and systematic alphabets so that ‘all Relations of Sound are symbolized by Relations of Form’ (Bell 1867: 35). Sweet at first resisted the organic approach to notation but soon became a convert (see Chapter 3 Sections 3.1.4 and 3.1.5). Advances in phonetic theory at this time owed much to comparative and historical linguistics on the one hand, and to medical understandings of anatomy and physiology on the other.
Although the development of the comparative method in the first half of the century by scholars such as Rasmus Rask, Franz Bopp and Jakob Grimm sought to establish language relationships through shared sounds, the emphasis was on their lexical distribution rather than the phonetic structure of the sounds themselves (Morpurgo Davies 1998: 163). It soon became clear, however, in the attempts at internal reconstruction by scholars such as August Schleicher and Friedrich Schlegel, that their methods would require a phonetic theory sophisticated enough to account for phenomena covered by sound laws such as Grimm’s Law and Verner’s Law. On the practical side, a good notation system makes for more concise, accurate and systematic descriptions of historical and comparative data, as Ellis (1867: 1–2) remarked.
From the 1850s, articulatory and acoustic frameworks of phonetic description became available in Germany through the work of the physiologists Ernst Brücke and Carl Merkel, and the physicist Hermann von Helmholtz, who used their scientific knowledge to study properties of speech sounds. Helmholtz (1821–94), for example, identified separate vowel resonances in the mouth and pharynx, and undertook experiments in synthetic speech (Dudley and Tarnoczy 1950). By applying advances in medical technology, techniques of laryngoscopy, aerometry and direct palatography were developed for investigation of the articulatory domain of phonetics, and the acoustic domain started to become more amenable to investigation with Scott’s Phonautograph, invented in 1859. These developments in the understanding of the physical properties of speech made it possible to give a more explanatory account of historical sound changes, and formed the foundation for Eduard Sievers’s achievements in general phonetic theory and its application to historical linguistics and linguistic phonetics (see Kohler 1981).
But perhaps the invention with the greatest impact on phonetic transcription took place in 1877, the year after Sievers’s Grundzüge der Lautphysiologie and the year of Sweet’s Handbook of Phonetics. This was the invention by Thomas Alva Edison, himself hard of hearing since childhood, of a device for audio recording and playback. Without audio recording we would not be able to collect speech from different speakers and store it for later analysis, nor would we be able to listen to the same utterance again and again, which is essential for analytic listening. All impressionistic transcription would have to be live, and its inability to keep up with continuous speech would make it a rather poor tool. Nowadays we take recorded speech very much for granted, but as phoneticians we should probably be more thankful for this invention than for anything else to be found in phonetics laboratories. It is hard to overestimate the impact that sound recording has had on the development of phonetics as a data-driven science.
Henry Sweet’s Handbook of Phonetics of 1877 and Sievers’s Grundzüge der Phonetik of 1881 show us the state of phonetic theory in western Europe in the years leading up to the formation of the IPA. Both authors stressed the importance of accurate phonetic descriptions of living languages and the value of practical phonetic skills. Both were also suspicious of instrumental phonetics and tried to discourage it, which in hindsight looks rather Canute-like given the ubiquity of instrumental methods in phonetics today, and somewhat misplaced in the light of what they have revealed to us about the articulatory and acoustic structure of speech.
Nevertheless, it would be unwise to dismiss Sweet’s plea that instrumental methods should not be allowed to supersede auditory methods, a plea taken up later in this book in Chapters 5 and 6.
The context in which the International Phonetic Alphabet, the most well-known and widely used phonetic notation system, had its beginnings was formed from the influences outlined above coupled with the desire to make pronunciation clearly representable in written form. Two groups who made common cause in pursuit of this desire were spelling reformers and teachers of modern languages, and several of the most influential and energetic founders of the International Phonetic Association were both, including the leading figure Paul Passy.

2.3.5  From correspondence to representation

In summary, the process by which the phonographic orientation of writing and the development of phonetic theory have made possible a proper phonetic notation and proper phonetic transcription is one where relations of correspondence change into relations of denotation and representation. Phonography provides written characters which correspond to units of pronunciation. Phonetic theory provides models for units of pronunciation. If written characters are used to denote these models then they are being used as general phonetic symbols. When speech phenomena are mapped onto these models, then the phonetic symbols denote descriptive models and can be said to represent those phenomena. It is the difference in function between correspondence and representation that essentially distinguishes spelling from phonetic transcription. Failure to distinguish correspondence relations from representation relations is responsible for the Aristotelian doctrine that written language represents spoken language, and it sustains the energies of spelling reformers who wish to regularise sound–spelling correspondences. Even in a fully regular and consistent phonographic writing in which sound–spelling correspondences were entirely isomorphic with symbol–sound representations, spelling and transcription would still be different activities with different purposes and interpretations: the former identifies meaningful items of linguistic content in written language, the latter embodies an analysis of meaningless items of linguistic expression in spoken language.
Pseudo-phonetic transcription was possible more or less from the beginning of glottographic writing. Writing a foreign name in Ancient Egyptian uniconsonantal characters results in a spelling of that name, but the procedure by which the spelling is constructed is one which exploits the possibility of a representational relation between the speech phenomena observed when the name is pronounced and the pre-theoretical models abstracted from experiences of hearing similar sounds. That is to say, pseudo-transcription has always been one way of producing new spellings.

2.3.6  Spelling reform

We have seen how phonetics in sixteenth-century England began in the service of spelling reform to make literacy easier to acquire and foreign languages easier to study, and then became increasingly focused on description and taxonomy. The emergence of phonetics as a more scientific discipline in the nineteenth century gave a surer basis to taxonomic categories and terminology. It also loosened the ties with spelling reform, but it was some time before it cut them. The 1949 Principles of the International Phonetic Association recognises reformed spelling as an application of IPA symbols, thus giving symbols the status of letters, an aim dropped in the 1999 version. The application of phonetics to language learning and teaching does not now have such a strong presence in the IPA’s journal as it did in the early 1990s. Phonetics is not now primarily seen as existing to support the learning and teaching of languages but as a body of theoretical and practical knowledge about how speech is structured to be put more in the service of phonology, sociolinguistics and speech technology than language pedagogy.
Spelling reform is usually thought of as a policy to increase the transparency of sound–spelling correspondences, but several of the lasting examples of reformed spellings in the history of English orthography have in fact had the opposite effect. They date from the fifteenth through to the seventeenth centuries, when some letters were introduced into spellings for etymological reasons where the spoken forms had no corresponding sounds to motivate them (Scragg 1974: 56–9). The consequence is that sound–spelling correspondence becomes complicated and increasingly lexically specific. The ⟨b⟩ introduced into the French loan dette to give us debt through conscious reference to Latin debitum raises the question as to whether we should say that ⟨eb⟩ corresponds to /ɛ/, or ⟨bt⟩ corresponds to /t/, or even ⟨ebt⟩ corresponds to /ɛt/, or whether we can simply leave ⟨b⟩ out of correspondence relations altogether as preferred by Carney (1994: 213). All options are lexically restricted (cf. bet, met, set, get, let, web) such that the spelling has similarities to a xenogram (see Section 1.1.4): we use the Latin-influenced spelling for the written language form of debt, and pronounce it /dɛt/ in the spoken form. The lasting success of Latinising etymologically driven changes to English spelling, whether based on true or false etymologies, further exemplifies the power of anti-phonography in glottographic writing and the systemic independence of written and spoken language despite their obvious close association.
Phonographically motivated spelling reformers have generally had an uphill battle. They advocate in effect a state of affairs in which spelling would be isomorphic with transcription and spellings would be performance scores functioning as prescriptive models. Their motives are socially progressive, arguing that it will facilitate literacy for the masses and open up greater access to foreign languages by making them easier to learn from written sources. However, the egalitarian aims of spelling reform tend to be undermined when it comes to deciding whose pronunciation a reformed spelling should be based on. John Hart, one of the earliest proponents of reforming English spelling phonographically (see Section 2.3.2), was forthright in his views on this, deliberately echoing Quintilian in saying it should be based on the speech of the learned, and most emphatically not on the speech of ‘the unexpert vulgar’. How it should be decided whose pronunciation will shape a reformed orthography is a serious problem which is likely to cause attempts at spelling reform to flounder, particularly in the case of a language like English with social and geographical variation extending over nations and continents.
If reformed spellings were to follow the speech of the learned elite in Quintilian’s quomodo sonat fashion, then an Alcuinian policy of ad litteras (see Chapter 4 Section 4.13.3) would have to be imposed on the ‘unexpert vulgar’ if they were to gain any benefit from the enterprise. The benefit would come at the price of abandoning local norms of speech in a top-down, centralised policy of prescriptive accent levelling. Henry Sweet (1877: 196), for example, advocated the teaching and testing of pronunciation in schools so that it would match a reformed spelling. If bath and trap words were to have different vowel letters because different vowel qualities are used by the social elite, then either everyone has to use those vowel qualities or the spelling reform is only meaningful to those native speakers who already make the vowel quality distinction and do not need to be told; it would of course have benefits for non-native learners of English. Spelling reform in a language exhibiting large-scale social and regional variation can hardly be other than anti-democratic if it is to have any significant effect for its native-speaker population. The only way to avoid this totalitarianism is for each variety to develop its own spellings, in which case reading will be either more restricted or more demanding, and cross-variety written communication put in jeopardy.
The strongest linguistic arguments against phonographically driven spelling reform are founded on the view, expressed in Section 1.1, that the ontology of language as a lexico-grammatical system is equally independent of writing and speech, and that characters and sounds are alternative sets of clothing enabling language to be made manifest in different media for communicative purposes of all kinds. Any correspondence relations that can be set up between speech and writing are merely incidental and irrelevant for the functioning of the language qua language. Weighing against this view, however, is the undeniable importance of phonographic processes in the history of written language, as outlined in Section 2.2, which seems to be evidence that literate language users have always valued at least some transparency in sound–spelling correspondence. It may be that two different needs have to be reconciled: the need for spoken language and written language each to function effectively on its own terms, including social-indexical functions, and the need for literate users to be able to translate between spoken and written language as effectively as possible. Trying to force reforms to meet the latter need may upset the balance that has to be struck. Nevertheless, the phonographic tendencies in written language which have given hope to spelling reformers have been fortuitous for the development of resources for phonetic notation and transcription.

Notes

1. The term ‘letter’ did not mean the same among earlier writers as it does today. Instead of meaning only the alphabetic letters of written language, it was formerly used to mean an element, or unit, of linguistic analysis neutral with respect to written and spoken language which could manifest as a written character or a sound (see Abercrombie 1949, 1993: 316–18).
2. The first edition was 1653. The edition consulted is the sixth, of 1765, in Kemp’s (1972) facsimile edition with translation from the original Latin.
3. In fact these categories would generate nasalised continuants which do occur.

3  Phonetic Notation

3.0 Introduction

The purpose of a system of phonetic notation is to function as a resource for denoting theoretical models which become descriptive models when used in transcriptions (see Chapter 1 Section 1.3.1, Chapter 4 Section 4.0). There are two sides to phonetic notation, namely the design of the glyph and its denotation. The history of written language and phonetic notation is full of the same glyph being used with different values. Just to take a random example, the ‘bullseye’ glyph ‘ʘ’ seems to have started life as a variant of the Greek letter theta in the Umbrian alphabet, for which it was also used in Tocharian; it was drafted several centuries later into the Gothic alphabet invented by the Greek Bishop Ulfilas (aka Wulfila) in the fourth century CE for IPA [w] (Coulmas 1996: 168), appears with the phonetic value [nd] in the Turkish Yenisei runes (ibid.: 515), corresponds to [s] in the Berber Tifinagh alphabet (ibid.: 504), was used in late eighteenth-century America by William Thornton for IPA [ʍ] (Abercrombie 1981: 210, 216), turns up in the Vai syllabary in 1820s Liberia for [ku] (Coulmas 1996: 538), and then in 1976 became the IPA symbol for the bilabial click (Pullum and Ladusaw 1996: 132). Fascinating as the history of individual glyphs is, the focus in this chapter will be on the principles behind notation systems and how they function as a whole to denote phonetic categories.
Phonetic notations come in different types. They can be constructed according to different principles and be used in transcriptions to express analyses at different levels of phonetic, phonological and morpho-phonological structure. This chapter is concerned with describing principles of notation construction and how they relate to phonetic theory; Chapter 4 will consider the different types of transcriptions which can be made by employing notation systems.
Any phonetician engaged in transcription is likely to be in sympathy with Sweet (1877: 100) when he asserts that ‘[t]he notation of sounds is scarcely less important than their analysis’. Of course, analysis is more important than notation because without it there is nothing to symbolise, but without notation we cannot express analyses so succinctly and conveniently. Once sufficient familiarity with phonetic theory and transcription conventions is attained, phonetic analysis can be read from notation relatively quickly and easily providing the notation is user-friendly.
The ultimate aim of a system of proper phonetic notation is to be able to denote all the categories of phonetic classification that one’s phonetic theory identifies, and thus to denote all points in taxonomic phonetic space. Each point in that space is a model onto which phonetic data can be mapped (see Chapter 1 Section 1.3). Another way to think of this is to say that a notation system should be able to populate the taxonomic phonetic space mapped out by phonetic theory with symbols so as to leave no yawning gaps. How symbols denote categories is an issue that leads to looking at the internal structure of symbols as well as relationships between symbols and denotata. These issues will be addressed for each type of notation considered in the following sections. The issue of whether there is information value in the sequential arrangement of symbol components – that is to say, the question of whether symbols are functionally ordered or functionally simultaneous – will be considered in Section 3.5.
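
The idea of populating taxonomic phonetic space with symbols, and of checking for gaps, can be made concrete with a toy sketch. The Python fragment below is an illustration only: the three dimensions, their values and the small symbol inventory are assumptions chosen for the example (the symbols carry their usual IPA values), and the names PLACES, MANNERS, VOICINGS, NOTATION, taxonomic_space and gaps are hypothetical. The space is modelled as the set of all combinations of category values, the notation as a mapping from points to symbols, and a gap as a point with no symbol.

```python
from itertools import product

# Toy dimensions of classification (an assumption for illustration only;
# a real phonetic theory identifies many more categories than these).
PLACES = ("bilabial", "alveolar", "velar")
MANNERS = ("plosive", "nasal", "fricative")
VOICINGS = ("voiceless", "voiced")

# A deliberately partial notation: each key is a point in the toy space,
# each value the symbol that denotes it.
NOTATION = {
    ("bilabial", "plosive", "voiceless"): "p",
    ("bilabial", "plosive", "voiced"): "b",
    ("alveolar", "plosive", "voiceless"): "t",
    ("alveolar", "plosive", "voiced"): "d",
    ("velar", "plosive", "voiceless"): "k",
    ("velar", "plosive", "voiced"): "ɡ",
    ("bilabial", "nasal", "voiced"): "m",
    ("alveolar", "nasal", "voiced"): "n",
    ("velar", "nasal", "voiced"): "ŋ",
    ("alveolar", "fricative", "voiceless"): "s",
    ("alveolar", "fricative", "voiced"): "z",
    ("velar", "fricative", "voiceless"): "x",
    ("velar", "fricative", "voiced"): "ɣ",
}


def taxonomic_space():
    """Return every point in the toy space: one value per dimension."""
    return list(product(PLACES, MANNERS, VOICINGS))


def gaps(notation):
    """Return the points of the space that the notation cannot denote."""
    return [point for point in taxonomic_space() if point not in notation]


if __name__ == "__main__":
    for point in gaps(NOTATION):
        print("no symbol for:", " ".join(point))
```

In this toy inventory the unfilled points are the bilabial fricatives and the voiceless nasals. A fuller notation closes such gaps either with further symbols or with diacritics that modify existing ones.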

3.1  Organic-Iconic Notation

An organic notation is one in which symbols denote categories defined in terms of articulators or articulatory states and actions. It is therefore anchored firmly in the articulatory domain and can be thought of as populating abstract articulatory space with symbols. In organic notation, abstract articulatory space is the taxonomic phonetic space. It has been customary to classify as ‘organic’ only those notation systems which explicitly and systematically set out to denote sounds by their articulatory formation such that each symbol can be analysed into components denoting individual articulators. I shall follow this custom, but it should be noted that any phonetic notation is organic to the extent that the conventions for its interpretation take an articulatory perspective. The great problem with an organic bias in phonetic notation is that in practice most phonetic analysis is not directly articulatory but either perceptual or, since the invention of spectrography, acoustic.
In the history of phonetic notation, organic systems have either been iconic, so that there is some visual similarity between the symbol and what it denotes, or analogical, so that the same denotatum is always denoted by the same symbol but without visual similarity. In analogical notation the relation between symbol and denotatum is therefore arbitrary. Examples of the iconic type are Bishop John Wilkins’s organic alphabet of 1668 and Alexander Melville Bell’s Visible Speech symbols of 1867. The analogical type is exemplified by the symbols of Francis Lodwick’s 1686 Universall Alphabet and Amasa D. Sproat’s symbols of 1857. The division into iconic and analogical notations is not, however, always clear-cut. The characters of the Korean Hangŭl orthography and the symbols of the Passy-Jones alphabet are somewhere in the middle, but they will be dealt with under the ‘organic-iconic’ heading here.
The most complete and transparent kind of organic-iconic notation would be one where the whole configuration of the vocal tract during the production of a sound was depicted in a symbol, but such symbols would not be easy to read and write, being in effect highly detailed drawings of physical vocal tract space; nor would they be selective in expressing an analysis of the particular sound being represented – all parts of the vocal tract would appear to be equally implicated and equally important in contributing to the formation of the sound. To be useful and informative, segmental organic-iconic symbols need to be selective, stylised diagrams of those articulators identified as responsible for producing the sound in question, the selection being the responsibility of phonetic theory. There should be a one-to-one relationship between an organic-iconic symbol and an articulatory category such that, for example, labiality is always denoted by the same graphic representation of the lips, and likewise for plosiveness, voicing, and so on. Each organic-iconic symbol thus denotes an articulatory category. Whole consonants and vowels are then represented by composite multi-category symbols.
The great advantage claimed for a good organic-iconic notation is that it is maximally analytic and maximally transparent to any reader with sufficient knowledge of the vocal tract. One great disadvantage is that it tends to be difficult to use in practice, but a further disadvantage is that it cannot be used to denote dimensions of classification which cannot be tied to a particular articulatory parameter, for example sonority, sibilance or rhoticity. These disadvantages are no doubt partly responsible for the fact that none of the organic-iconic notations which have been devised have been widely or lastingly adopted by phoneticians, despite enthusiastic support from leading phoneticians such as Henry Sweet (e.g. Sweet 1881). Some examples of organic-iconic notation systems are discussed in the following sections.

3.1.1  Korean Hangŭl

The first known notation system on organic-iconic principles is the Korean Hangŭl orthography (see Chapter 2 Section 2.2.5) introduced in the fifteenth century to replace Chinese characters for the spelling of Korean words. Most Hangŭl letters are complexes constructed from characters that represent particular articulatory configurations, as shown in Figure 3.1.

FIGURE 3.1:  Articulatory configurations motivating the Hangŭl letters. Reproduced with kind permission from King Sejong the Great: The Everlasting Light of Korea, p. 92

For example, in the letters (transliterated respectively as ), the upper horizontal line is a component character corresponding to the palate oriented with the front to the left; the lines contacting it correspond, respectively, to closures at the alveolar and velar places of articulation. Under the influence of the structure of Chinese characters, Hangŭl letters are composed into blocks corresponding to syllables, so that trisyllabic datugo ‘fighting, quarrelling’ is written with three syllable blocks (separated from each other here for ease of identification) as 다 투 고. The Hangŭl letters belong to an orthography but have a phonetic theory underpinning their design (Sampson 1985: 124–9; King 1996: 219–20). They therefore constitute a proto-phonetic notation system as well as an orthographic system.
This system has been developed into a proper phonetic notation system by Hyun Bok Lee of Seoul National University. Called the International Korean Phonetic Alphabet (IKPA), it was first published by the Korean Language Society in 1971. Using the Hangŭl organic principles for the construction of complex symbols, Lee uses diacritics and modifying strokes to extend the notation to cover sounds not found in Korean and arranges the symbols linearly instead of in syllable blocks. Transcription of datugo then becomes [ᄃ ᅡ ᄐ ᅮ ᄀ ᅩ] with each consonant and vowel clearly separate in sequence from left to right. IKPA transforms Hangŭl characters from proto-symbols into proper phonetic symbols, although organic-iconicity is difficult to identify in relation to some of the characters and they are not all systematically deployed throughout the system. The category ‘fricative’, for example, is denoted by a subscript circle similar to the IPA voicelessness diacritic, but not all fricative symbols have it. However, in principle, each articulatory category is denoted by a separate symbol the graphic shape of which is based on some aspect of how the vocal tract implements that category.
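
The difference between syllable-block and linear arrangement described above has a rough computational analogue in the way Unicode encodes Hangŭl, and a short Python sketch may help to make it concrete. It is offered as an illustration only: Unicode canonical decomposition is a property of the character encoding, not an implementation of IKPA, and IKPA’s diacritics and modifying strokes are not modelled. The function names to_linear_jamo and to_syllable_blocks are hypothetical. The sketch relies on the standard library’s unicodedata.normalize, whose NFD form splits each precomposed syllable block into its component letters and whose NFC form recomposes them.

```python
import unicodedata


def to_linear_jamo(text):
    """Split precomposed Hangul syllable blocks into a left-to-right
    sequence of component letters (Unicode canonical decomposition)."""
    return list(unicodedata.normalize("NFD", text))


def to_syllable_blocks(jamo):
    """Recompose a sequence of component letters into syllable blocks
    (Unicode canonical composition)."""
    return unicodedata.normalize("NFC", "".join(jamo))


if __name__ == "__main__":
    word = "다투고"  # datugo, written as three syllable blocks
    letters = to_linear_jamo(word)
    # Each block yields one consonant letter followed by one vowel letter.
    for letter in letters:
        print(f"U+{ord(letter):04X}", unicodedata.name(letter))
    # Recomposition restores the original three-block spelling.
    print(to_syllable_blocks(letters) == word)
```

Decomposing the three blocks of datugo yields six letters, one consonant and one vowel per syllable, in the same left-to-right order as the linear transcription quoted above.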
An example transcription using IKPA is given in Lee (1999: 123).

3.1.2  Helmont’s interpretation of Hebrew letters

A curious twist on the organic-­iconic approach is found in a book by the Dutch philosopher and alchemist Franciscus Mercurius ab Helmont published in 1667. Helmont tried to show that the letters of the Hebrew alphabet represented the articulatory configurations for the corresponding sounds. But his interpretations of the letters as a kind of phonetic tablature notation, as if they were like Korean Hangŭl, led him to incorrect conclusions about the formation of the sounds. His description of [b], based on the shape of the letter bēth < ‫> ב‬, would have it that ‘[l]ingua cum maxima corporis sui parte, valide admodem palato applicatur, adeo, ut propterea mucro ejus antrorsum quadantenus incurvetur’ (‘the largest part of the body of the tongue is applied fully to the palate, so much so that its tip is to some extent curved forwards’) (Helmont 1667: 60–1). A similar tongue position is attributed to [m] from the letter-­shape of mēm < ‫> ם‬: ‘Lingua palatum leniter attingit, prout et labia sese leniter exosculantur’ (‘the tongue strikes the palate softly, and according as the lips are gently kissed by each other’) (ibid.: 74). His diagram for [b] is given in Figure 3.2 alongside his vocal tract diagram.




FIGURE 3.2:  Helmont’s diagram of Hebrew bēth (left) and his vocal tract diagram (right). Reproduced with the permission of the Brotherton Collection, Leeds University Library

3.1.3  Wilkins’s organic-­iconic symbols

The Frenchman Honorat Rambaud may have been the first European to experiment with an organic alphabet (Abercrombie 1948/1965: 50), but better known and more influential is the one devised by John Wilkins, bishop of Chester, in the seventeenth century (see Chapter 2 Section 2.3.2). As with Hangŭl and IKPA, the organising principle is that each subsegmental category is denoted by a symbol, and symbols combine into complex symbols – ‘natural Pictures of the Letters’ – to represent segment-­sized sounds. Wilkins’s organic alphabet is reproduced in Figure 3.3, which shows for each sound how, according to the phonetic understanding of the time, the vocal tract is modified compared to the partly labelled at-­rest diagram in the lower right of the table. For voiced sounds the epiglottis is shown in two positions to indicate its oscillation, which Wilkins erroneously thought was the voicing mechanism, despite his claim to have read Holder (Wilkins 1668: 357). Airflow is also represented for all sounds except oral stops and the first three vowels, the latter presumably because the view is frontal in order to show lip-­shape; airflow is shown bifurcated in the case of laterals and issuing from the nose in the case of nasals. In the top right of each picture is an organic-­iconic symbol intended to capture the essential articulatory state shown in the diagram; note that these symbols are oriented to the right whereas the diagrams are oriented to the left. Wilkins did not intend these organic symbols to be used in transcriptions; instead he assigned to each one a non-­organic upper case roman alphabetic symbol, shown in the top left corner.


FIGURE 3.3:  Wilkins’s organic alphabet and articulatory diagrams of 1668. Reproduced with the permission of the Brotherton Collection, Leeds University Library




3.1.4 Bell’s Visible Speech notation

The Visible Speech notation of Alexander Melville Bell (Bell 1867) is nowadays the most well-known organic-iconic notation system (MacMahon 1996: 838). As Bell himself explained, ‘[i]t is the aim of this System of Letters to write every sound which the mouth can make, and to represent it exactly as the mouth makes it’ (Bell 1867: 70, italics added). It is devised on basically the same principles as Wilkins’s organic alphabet, but there are differences in the categories denoted and how they are expressed. Bell provides diagrams of the vocal tract for consonants and vowels, reproduced here in Figure 3.4. The principal organs of speech are labelled with numbers and shown in their neutral at-rest positions except for the tongue-body and tongue-tip, which are shown in both lowered and raised positions. The epiglottis is represented (not with great anatomical accuracy) but not labelled, reflecting the fact that Bell knew it played no role in voicing, which he correctly attributed to vibration of the vocal ligaments (Bell 1867: 46). Voicing as a separate feature is denoted by the symbol [ɪ], indicative of the vocal folds meeting along the midline of the glottis; when combined with other features into a segmental symbol, it becomes a short line ‘inserted within the consonant curve’ (ibid.: 66), for example [] represents a velar articulation, [] a velar articulation with voicing. Bell’s symbols are less explicitly organic and more diagrammatic than Wilkins’s but they go a considerable way towards justifying his claim that ‘the sound of every symbol is deducible from the form of the symbol itself’ (ibid.: 99), though the further claim that this can be done ‘without any encumbrance to the reader’s memory’ is perhaps less justifiable. Although the symbols are iconically motivated, one has to learn and remember what they stand for; it is hardly self-evident.

FIGURE 3.4:  Bell’s vocal tract diagrams for consonants and vowels (Bell 1867: 38)

One thing which is soon apparent when looking at proposed organic-iconic notations is that the same vocal organ can motivate different iconic representations. The symbol [] could denote open lips if attention were to be focused on the right-hand part of the symbol, and there is nothing intrinsic in [] to tell us it stands for a low back vowel with widened pharynx if we have not memorised the conventions. Interpretative conventions are no less necessary with iconic symbols than with other kinds. That is to say, their denotation is not completely determined by their form. It is also questionable whether users find it more convenient to interpret a complex symbol in terms of its constituent parts than to memorise it as a whole.
The great value of Bell’s Visible Speech to us nowadays, apart from its value as an experiment in organic-iconic notation, is that it shows us explicitly the state of phonetic theory in the latter half of the nineteenth century, reminding us that much of what we take to be the sophistication of modern phonetics was in fact current at that time despite the absence of modern instrumentation. His appreciation of English contextual devoicing is a good example (Bell 1867: 67).

3.1.5  Sweet’s organic-iconic notation

Although Sweet recognised that Bell’s organic symbols were at the mercy of changes in phonetic theory (Sweet 1877: 100–1), within three or four years he had come to the view that enough was known for certain about speech production to justify opting for an organic-iconic notation system (Sweet 1881: 183). Any tinkering about with it that might become necessary was, in his opinion, a small price to pay for avoiding the arbitrariness and ‘cross-associations’ of symbols based on roman alphabetic letters. By ‘cross-associations’ Sweet meant the problem of, for example, English and French phoneticians interpreting roman-based symbols in terms of their typical letter–sound correspondences in English and French, which he saw as particularly likely in the case of vowels (Sweet 1881: 181–2).

FIGURE 3.5:  Sweet’s (1906) organic symbols for (a) consonants and (b) vowels

Sweet revised aspects of Bell’s notation (see Figure 3.5) to increase the simplicity and distinctiveness of certain symbols, for example the symbols for nasals, thus making them easier to use. But he also made changes based on theoretical differences concerning the production of certain sounds, for example glides. While he was highly respectful of Bell’s analysis of vowels, Sweet did not adopt Bell’s set of glide symbols, objecting to his category ‘glide’ on two grounds (Sweet 1881: 197–9). The first was that it confused two distinctions: consonant–vowel and syllabic–non-syllabic (cf. the consonant–contoid and vowel–vocoid distinctions introduced by Pike (1943: 143–5)). Secondly, Sweet did not accept that there could be a category of stricture between close vowel and fricative consonant. It is not clear from Bell’s description of glides as ‘intermediate to consonants and vowels’ (Bell 1867: 69) whether he really meant intermediate in stricture or in some other sense, but modern phonetic theory does in fact recognise that the stricture for [j], for example, tends to be closer than for [i], as can be seen when they occur in sequence in English yeast, but not close enough to produce the friction of [ʝ]. Sweet proposed that non-syllabic vowels be symbolised by reducing the size of the vowel symbol, so that IPA [j] becomes [], a smaller version of [] (= IPA [i]), being then the same height as a consonant symbol (Sweet 1881: 204–5). Sweet (1906: 52–62) then used the term ‘glide’ for coarticulatory transitional sound qualities produced epiphenomenally as a result of the vocal tract moving from the articulation of one sound to the articulation of the following sound, or between a sound and silence.

3.1.6  The Passy-Jones organic alphabet

The last serious attempt to launch an organic notation was by Paul Passy and Daniel Jones (see Passy 1907). Although the symbol shapes are obviously heavily influenced by those of Bell and Sweet (see Figure 3.6), they are made to look more like familiar roman letters (Collins and Mees 1999: 52–3) and thus to loosen their iconic connection with vocal tract structures. In consequence, any advantages conferred by iconicity are diminished, while the disadvantages of unfamiliarity remain, which may be one reason why this notation was soon abandoned.

FIGURE 3.6:  The Passy-Jones organic alphabet (Le Maître phonétique 1907, Supplement)




In the Passy-Jones system, the size of the symbol also has signification. A small version of a symbol denotes a retracted place of articulation relative to the larger version. Labiodental symbols are smaller versions of bilabial ones, alveolars of dental ones, and uvulars of velar ones. The system also contains ‘bronchiales’ (probably because of Sweet’s view that Arabic [ħ] and [ʕ] are produced below the glottis (Sweet 1904: 37)) which are symbolised by smaller versions of the ‘laryngeale’ symbols. An obvious problem with the distinctive use of symbol size is knowing which is intended if a symbol is used on its own. Size is also used to distinguish between a ‘roulée’ (trill) and a ‘semi-roulée’ (tap or flap). The straight line inside the consonant curve is halved in length, no doubt motivated by the idea that a tap is like half a trill, i.e. one beat instead of the typical two or three beats found in singleton trills (Laver 1994: 219). Jones (1918/1972: 47) describes a trill as ‘a rapid succession of taps’ and a flap as ‘a single tap’ without mentioning the very different mechanisms modern phonetic theory takes to be responsible for their production (see Laver 1994: 224).
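
The size convention just described can be made concrete with a minimal sketch. The pairings are those stated in the text; the table name `SMALL_VERSION_DENOTES` and the functions `interpret` and `readings_in_isolation` are hypothetical names introduced purely for illustration, and the sketch is not a formalisation offered by Passy or Jones.

```python
# Illustrative sketch only: the category pairings come from the text,
# but the names and the data structure are assumptions.

# For each category written with the full-size symbol, the category
# that the reduced-size version of the same shape denotes.
SMALL_VERSION_DENOTES = {
    "bilabial": "labiodental",
    "dental": "alveolar",
    "velar": "uvular",
    "laryngeale": "bronchiale",
    "roulée": "semi-roulée",  # trill vs. tap or flap (inner line halved)
}


def interpret(category: str, small: bool) -> str:
    """Return the category denoted by a symbol shape at a known size."""
    return SMALL_VERSION_DENOTES[category] if small else category


def readings_in_isolation(category: str) -> tuple[str, str]:
    """Return both readings left open when the size cannot be judged."""
    return (category, SMALL_VERSION_DENOTES[category])


print(interpret("bilabial", small=True))
print(readings_in_isolation("velar"))
```

The second function reflects the problem noted above: a symbol shape met on its own, with nothing beside it for scale, leaves both the larger and the smaller reading available.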

3.2  Organic-Analogical Notation

Symbols in organic-analogical notation systems are more arbitrary than those in organic-iconic systems. The principle of analogical notation is that each phonetic category is consistently denoted by the same symbol or symbol component. However, the way this is done varies considerably in different notation systems, as does the way in which the notation system relates to phonetic theory in terms of explicitness and accuracy. These differences can be seen in the examples considered in the following sections.

3.2.1  Wilkins’s analogical notation

In the same work in which he published his organic symbols, Wilkins provided a chart of analogical symbols (Wilkins 1668: 376). It is reproduced here in Figure 3.7, where we can see his list of consonantal roman letters and digraphs (and one trigraph) given in lower case in column 1 and upper case in column 9. In row 1 he gives the vowel letters ([ ] = the strut vowel, IPA [ʌ]; see Wilkins 1668: 363). Column 2 and row 2 contain, respectively, the equivalent analogical symbols for the consonants and vowels in isolation. Sounds represented in rows 3–17 are based on a straight vertical stroke, which is tilted for the semivowels: backwards for [w], forwards for [j], perhaps motivated by their respective relationships to back and front vowels. A short stroke adjoined at the top of the obstruent symbols in this set denotes voice; adjoined at the base it denotes voicelessness; this device is also used to distinguish voiced and voiceless liquids.1 Place of articulation is denoted by the way this short stroke is adjoined: at a 45° angle it denotes labial; horizontal extending in one direction from the vertical denotes apical; horizontal extending in both directions denotes dorsal. Manner of articulation is expressed through the addition of a curve at the end of the short stroke to denote a fricative or affricate. Sounds in rows 18–25 are based on curves. The sibilant fricatives in rows 18–21 have the orientation


[FIGURE 3.7: Wilkins’s chart of analogical symbols (Wilkins 1668: 376), with ‘The Lords Prayer in English’ written in the notation]