Introduction to Communication Sciences and Disorders: The Scientific Basis of Clinical Practice
ISBN 9781597562973, 1597562971


Table of Contents
Preface
Acknowledgments
Reviewers
1. Introduction to Communication Sciences and Disorders
Introduction: Communication Sciences and Disorders as a Discipline
Communication Sciences and Disorders: The Whole Is Greater Than the Sum of Its Parts
An Interdisciplinary Field
Translational Research
Does the Basic Science Work? Does the Clinic Work?
Evidence-Based Practice
A Typical Undergraduate Curriculum
Who Are the Professionals in Communication Sciences and Disorders?
Preparation for, and the Profession of, Speech-Language Pathology
Preparation for, and the Profession of, Audiology
Order of Chapters in the Text
Chapter Summary
References
2. The Nervous System: Language, Speech, and Hearing Structures and Processes
Introduction
Central and Peripheral Nervous Systems
The Neuron
The Synapse
Tour of Gross Neuroanatomy
Frontal Lobe
Occipital Lobe
Temporal Lobe
Parietal Lobe
Hidden Cortex
Subcortical Nuclei
Brainstem, Cerebellum, and Spinal Cord
The Auditory Pathways
The Dominant Hemisphere and the Perisylvian Language Areas
Arcuate Fasciculus (Dorsal Stream) and Ventral Stream
Functional Magnetic Resonance Imaging and Speech and Language Brain Activity
Functional Magnetic Resonance Imaging
Diffusion Tensor Imaging
Chapter Summary
References
3. Language Science
Introduction
What Is Language?
Language: A Conventional System
Language: A Dynamic System
Language Is Generative
Language Uses Mental Representations
Language Is Localized in the Brain
Components of Language
Form
Social Use of Language (Pragmatics)
Language and Cognitive Processes
Why
How
When
Chapter Summary
References
4. Communication in a Multicultural Society
Introduction
Why It Matters
Difference Versus Disorder
Standardized Testing and Language Difference Versus Disorder
Accent, Dialect, and Culture
Accent
Dialect
Code Switching
Foreign Accent
Bilingualism and Multilingualism
Chapter Summary
References
5. Preverbal Foundations of Speech and Language Development
Introduction
Preparatory Notes on Developmental Chronologies
0 to 3 Months: Expression (Production)
0 to 3 Months: Perception and Comprehension
3 to 8 Months: Production
3 to 8 Months: Perception and Comprehension
8 to 12 Months: Production
8 to 12 Months: Perception and Comprehension
Gesture and Preverbal Language Development
Chapter Summary
References
6. Typical Language Development
Introduction
12 to 18 Months
18 to 24 Months
Three Years (36 Months)
Multiword Utterances, Grammatical Morphology
Expanding Utterance Length: A Measure of Linguistic Sophistication
Grammatical Morphology
Typical Language Development in School Years
Metalinguistic Skills
Pragmatic Skill: Discourse
Complex Sentences
Sample Transcript
Chapter Summary
References
7. Pediatric Language Disorders I
Introduction
Specific Language Impairment/Developmental Language Disorder
Language Characteristics of Children with SLI/DLD
Summary of the Language Disorder in SLI/DLD
What Is The Cause of SLI/DLD?
The Role of Genetics in SLI/DLD
Language Delay and Autism Spectrum Disorder
Language Characteristics in ASD
Language Delay and Hearing Impairment
Epidemiology of Hearing Loss
Language Characteristics in Hearing Impairment
Speech and Language Development and Hearing Impairment
Chapter Summary
References
8. Pediatric Language Disorders II
Introduction
Criteria for a Diagnosis of ID
Down Syndrome (DS): General Characteristics
Epidemiology and the DS Phenotype
Language Characteristics in DS
Fragile X Syndrome: General Characteristics
Epidemiology of FXS
Language Characteristics in FXS
Chapter Summary
References
9. Language Disorders in Adults
Introduction
Review of Concepts for the Role of The Nervous System In Speech, Language, and Hearing
Cerebral Hemispheres
Lateralization of Speech and Language Functions
Language Expression and Comprehension Are Represented in Different Cortical Regions of the Left Hemisphere
Connections Between Different Regions of the Brain
Perisylvian Speech and Language Areas of the Brain
Adult Language Disorders: Aphasia
Classification of Aphasia
Aphasia Due to Stroke: A Summary
Traumatic Brain Injury and Aphasia
Nature of Brain Injury in TBI
Language Impairment in TBI
Dementia
Brain Pathology in Dementia
Language Disorders in Dementia
Chapter Summary
References
10. Speech Science I
Introduction
The Speech Mechanism: A Three-Component Description
Respiratory System Component (Power Supply for Speech)
The Respiratory System and Vegetative Breathing
Speech Breathing
Clinical Applications: An Example
The Larynx (Sound Source for Speech)
Laryngeal Cartilages
Laryngeal Muscles and Membranes
Phonation
Characteristics of Phonation
Clinical Applications: An Example
Upper Airway (Consonants and Vowels)
Muscles of the Vocal Tract
Vocal Tract Shape and Vocalic Production
Velopharyngeal Mechanism
Valving in the Vocal Tract and the Formation of Speech Sounds
Coarticulation
Clinical Applications: An Example
Chapter Summary
References
11. Speech Science II
Introduction
The Theory of Speech Acoustics
The Sound Source
The Sound Filter
Vowel Sounds Result From the Combination of Source and Filter Acoustics
Resonant Frequencies of Vowels Are Called Formants: Spectrograms
The Tube Model of Human Vocal Tract Makes Interesting Predictions and Suggests Interesting Problems
A Spectrogram Shows Formant Frequencies and Much More
Speech Synthesis
Speech Recognition
Speech Acoustics and Assistive Listening Devices
Speech Perception
The Perception of Speech: Special Mechanisms?
The Perception of Speech: Auditory Theories
Motor Theory and Auditory Theory: A Summary
Top-Down Influences: It Is Not All About Speech Sounds
Speech Intelligibility
Chapter Summary
References
12. Phonetics
Introduction
International Phonetic Alphabet
Vowels and Their Phonetic Symbols
Consonants and Their Phonetic Symbols
Clinical Implications of Phonetic Transcription
Chapter Summary
References
13. Typical Phonological Development
Introduction
Phonetic and Phonological Development: General Considerations
Phonetic and Phonological Development
Phonetic Development
Phonological Development
Typical Speech Sound Development
Determination of Speech Sound Mastery in Typically Developing Children
Possible Explanations for the Typical Sequence of Speech Sound Mastery
Phonological Processes and Speech Sound Development
Phonological Development and Word Learning
Chapter Summary
References
14. Motor Speech Disorders in Adults
Introduction
Classification of Motor Speech Disorders
Dysarthria
Subtypes of Dysarthria
The Mayo Clinic Classification System for Motor Speech Disorders
The Dysarthrias: A Summary
Apraxia of Speech
Chapter Summary
References
15. Pediatric Speech Disorders I
Introduction
Speech Delay
Diagnosis of Speech Delay
Quantitative Measures of Speech Delay and Speech Intelligibility
Speech Delay: Phonetic, Phonological, or Both?
Additional Considerations in Speech Delay and Residual and Persistent Speech Sound Errors
Speech Delay and Genetics
Childhood Apraxia of Speech
CAS Compared With Adult Apraxia of Speech (AAS)
CAS: Prevalence and General Characteristics
CAS: Speech Characteristics
CAS and Overlap With Other Developmental Delays
CAS and Genetics
Chapter Summary
References
16. Pediatric Speech Disorders II
Introduction
Childhood Motor Speech Disorders: Cerebral Palsy
Subtypes of Cerebral Palsy
Dysarthria in Cerebral Palsy
Childhood Motor Speech Disorders: Traumatic Brain Injury and Tumors
Traumatic Brain Injury
Brain Tumors
Treatment Options and Considerations
Chapter Summary
References
17. Fluency Disorders
Introduction
Incidence and Prevalence of Stuttering
Genetic Studies
Diagnosis of Developmental Stuttering
The Natural History of Developmental Stuttering
Stage I: Typical Dysfluencies
Stage II: Borderline Stuttering
Stage III: Beginning Stuttering
Stage IV: Intermediate Stuttering
Stage V: Advanced Stuttering
Recovery of Fluency
Possible Causes of Stuttering
Psychogenic Theories
Learning Theories
Biological Theories
Acquired (Neurogenic) Stuttering
Symptoms of Neurogenic Stuttering Compared With Developmental Stuttering
Treatment Considerations
Chapter Summary
References
18. Voice Disorders
Introduction
Epidemiology of Voice Disorders
Initial Steps in the Diagnosis of Voice Disorders
Case History
Perceptual Evaluation of the Voice
Viewing the Vocal Folds
Measurement of Basic Voice Parameters
Classification/Types of Voice Disorders
The Hypo-Hyperfunctional Continuum
Phonotrauma
Organic Voice Disorders
Functional Voice Disorders
Neurological Voice Disorders
Pediatric Voice Disorders
Prevalence of Childhood Voice Disorders
Types of Childhood Voice Disorders
Treatment of Childhood Voice Disorders
Chapter Summary
References
19. Craniofacial Anomalies
Introduction
Definition and Origins of Craniofacial Anomalies
Embryological Development of the Upper Lip and Associated Structures
Embryological Errors and Clefting: Clefts of the Lip
Embryological Errors and Clefting: Clefts of the Palate
Cleft Lip With or Without a Cleft Palate; Cleft Palate Only (Isolated Cleft Palate)
Epidemiology of Clefting
Speech Production in CL/P and CPO
Diagnosis of VPI
VPI and Hypernasality
VPI, Consonant Articulation, and Speech Intelligibility
Clefting and Syndromes
Cleft Palate: Other Considerations
Chapter Summary
References
20. Swallowing
Introduction
Anatomy of Swallowing
Esophagus
Stomach
The Act of Swallowing
Oral Preparatory Phase
Oral Transport Phase
Pharyngeal Phase
Esophageal Phase
Overlap of Phases
Breathing and Swallowing
Nervous System Control of Swallowing
Role of the Peripheral Nervous System
Role of the Central Nervous System
Variables That Influence Swallowing
Bolus Characteristics
Development
Age
Measurement and Analysis of Swallowing
Videofluoroscopy
Endoscopy
Client Self-Report
Health Care Team for Individuals With Swallowing Disorders
Chapter Summary
References
21. Hearing Science I: Acoustics and Psychoacoustics
Introduction
Oscillation
Waveform
Spectrum
Waveform and Spectrum
Resonance
Psychoacoustics
Pitch
Loudness
Sound Quality
Chapter Summary
References
22. Hearing Science II: Anatomy and Physiology
Introduction
Temporal Bone
Peripheral Anatomy of the Ear
Outer Ear (Conductive Mechanism)
Middle Ear
Inner Ear (Sensorineural Mechanism)
Chapter Summary
References
23. Diseases of the Auditory System and Diagnostic Audiology
Introduction
Hearing Evaluation
Case History
Otoscopy
Immittance
Tympanometry
Acoustic Reflex Threshold
Audiometric Testing
Physiological Responses
Vestibular Assessment
Audiometric Results
Type, Degree, and Configuration of Loss
Hearing and Balance Disorders
Patient Examples
Chapter Summary
References
24. Assistive Listening Devices
Introduction
Hearing Aids
Steps in Selecting and Fitting a Hearing Aid
Types of Hearing Aids
Hearing Aid Components
Auditory Implantable Devices
Bone-Anchored Implant
Middle Ear Implant
Cochlear Implant
Chapter Summary
Hearing Aids
Auditory Implantable Devices
References
25. Aural Habilitation and Rehabilitation
Introduction
Aural Habilitation
Assessment of Communication Needs in Children
Pediatric Intervention
Components of a Family-Centered Intervention
Auditory Training in Aural Habilitation
Communication Options
Outcome Measures for Children
Aural Rehabilitation
Assessment of Communication Needs in Adults
Adult Intervention
Auditory Training in Aural Rehabilitation
Communication Strategies
Speechreading
Outcome Measures for Adults
Group Aural Rehabilitation
Chapter Summary
Aural Habilitation
Aural Rehabilitation
References
Index


Introduction to Communication Sciences and Disorders
The Scientific Basis of Clinical Practice


Gary Weismer, PhD
David K. Brown, PhD

5521 Ruffin Road
San Diego, CA 92123
e-mail: [email protected]
Website: https://www.pluralpublishing.com

Copyright © 2021 by Plural Publishing, Inc.

Typeset in 10/12 Palatino by Flanagan’s Publishing Services, Inc.
Printed in Canada by Friesens Corporation

All rights, including that of translation, reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, including photocopying, recording, taping, Web distribution, or information storage and retrieval systems without the prior written consent of the publisher.

For permission to use material from this text, contact us by
Telephone: (866) 758-7251
Fax: (888) 758-7255
e-mail: [email protected]

Every attempt has been made to contact the copyright holders for material originally printed in another source. If any have been inadvertently overlooked, the publishers will gladly make the necessary arrangements at the first opportunity.

Disclaimer: Please note that ancillary content (such as documents, audio, and video, etc.) may not be included as published in the original print version of this book.

Library of Congress Cataloging-in-Publication Data

Names: Weismer, Gary, (Professor Emeritus), author. | Brown, David K. (Professor of audiology), author.
Title: Introduction to communication sciences and disorders : the scientific basis of clinical practice / Gary Weismer, David K. Brown.
Description: San Diego, CA : Plural Publishing, Inc., [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2019029827 | ISBN 9781597562973 (paperback) | ISBN 1597562971 (paperback)
Subjects: MESH: Communication Disorders | Voice Disorders | Hearing Disorders | Language Development | Speech — physiology | Hearing — physiology
Classification: LCC RC423 | NLM WL 340.2 | DDC 616.85/5 — dc23
LC record available at https://lccn.loc.gov/2019029827


Preface

Introduction to Communication Sciences and Disorders: The Scientific Basis of Clinical Practice is a textbook designed and written for undergraduate students who enroll in a course that lays out the scientific foundations for the clinical disciplines of speech-language pathology and audiology. The great majority of departments in our field that offer an undergraduate major have a regularly taught introductory course among their course offerings. Introductory courses in any field, whether in psychology, anthropology, linguistics, or communication sciences and disorders (hereafter, CS&D), are survey courses in which nearly all aspects of a field are presented. For academic disciplines that have many aspects — and most do — breadth of coverage takes precedence over depth of coverage. Simplification of complicated material is inevitable, and long-standing, ongoing debates in a field cannot be described in detail. An introductory course in CS&D is subject to these characteristics, and these constraints. That being said, we have attempted to provide a carefully measured depth in each chapter, in the hope of conveying the sense of excitement in the continuing expansion of the scientific basis of clinical practice in CS&D.

This textbook is organized with a general plan of matching individual chapters to individual lectures, or perhaps to one-and-one-half lectures. The textbook is written to give the instructor the option of not including selected chapters in the classroom lectures, or not assigning them as required reading material, if that is desired. For example, there are two chapters that present information on pediatric language disorders, and two chapters that present information on pediatric speech sound disorders. For each pair of chapters, one chapter presents information on two or three disorders, and the other presents information on two or three other disorders. An instructor who decides to present examples of a particular pediatric language or speech sound disorder can surely choose one chapter for a lecture and assign (or not) the other chapter for reading. The same can be said of several other chapters in the textbook. In this sense, we believe the textbook is a flexible instructional companion for both instructors and students.

The graduate training of speech-language pathologists (SLPs) and audiologists (AuDs) is a significant mission of CS&D departments. Communication Sciences and Disorders is, at its core, a clinical discipline. But if a clinical endeavor is to be disciplined, the core must include material that supports and motivates clinical practice with knowledge that has emerged from the research laboratory. This text is primarily concerned with the scientific basis of clinical practice, the former being a first step to qualify for the latter professional skill. Clinical information is not ignored in the textbook. In fact, all chapters that present the nature of language, speech, and hearing disorders include some information on diagnosis and treatment of communication disorders. In some chapters, this information is integrated with the presentation of the main material; in others a brief section describes clinical issues relevant to the communication disorder(s) under discussion. A fixed formula is not used for the inclusion of clinical information in various chapters of the textbook; rather, in each chapter that presents information on communication disorders, the clinical information is placed in the location that seemed (in our opinion) to make the most sense.


Curricula in departments of CS&D are structured to include classes on typical and disordered language, on typical and disordered speech, and on typical and disordered hearing. This is to say that language, speech, and hearing occupy three different categories of coursework. The categories are organized for the structure of a curriculum rather than from a belief that language, speech, and hearing processes are separate. They are not. The integrated nature of language, speech, and hearing processes, whether typical (normal) or disordered, is known by all clinicians and scientists concerned with communication sciences and disorders. For example, a child who is seen in the clinic for a delay in the mastery of speech sounds often has delays in language acquisition as well, and is at risk for reading delays. Similarly, an American child who is born deaf may have delays in oral language development but have typical language development in American Sign Language (ASL). This textbook follows the approach of separating language, speech, and hearing chapters. But we ask students to keep in mind that this is a teaching decision (much like the organization of courses, as stated earlier), not a statement that the areas are separate. Language chapters are presented first, followed by speech chapters and then hearing chapters; this sequence is arbitrary. One of us (GW) taught the introductory course in the University of Wisconsin–Madison CS&D department for 20 years, changing the order of the language, speech, and hearing categories several times to see if one sequence was more effective than others; the order did not seem to make a difference.

The textbook covers a lot of information; this is a necessary feature of a text designed to be the primary reading material for a survey course in communication sciences and disorders. Some areas of the field may be mentioned only briefly, which does not mean we believe they do not merit careful discussion. Decisions were made to limit discussion of certain areas to a minimum to accommodate the goal of a compact textbook.

Two final comments are in order. First, the use of pronouns is an efficient and straightforward way to construct sentences in a textbook with frequent references to people. In most cases, we have chosen to limit pronouns to “he” and “she,” and to alternate between the two when the reference is to a person who is (for example) a clinician or a person seeking services. Second, the pattern and extent of citations vary across chapters. Every effort has been made to provide interested students and instructors with up-to-date references, and with review papers that provide overviews of the current state of both the research and clinical aspects of a topic under study.

We hope the textbook and the course are effective in creating an enhanced understanding of the importance of successful communication, and of the need to understand the impact of a communication disorder on every aspect of an individual’s life. Happy learning!

Acknowledgments

Kalie Koscielak, Valerie Johns, and Angie Singh, we are indebted to you for years of support and encouragement. Susan Ellis Weismer had a profound influence on the shaping of Chapters 3, 5, 6, 7, and 8. Professor Ellis Weismer read and reread successive drafts of these chapters, each time making spot-on suggestions for revision. We cannot thank her enough. Once again, as with previous textbooks, Maury Aaseng’s beautiful artwork is a defining feature of this textbook. Thanks, Maury. Thanks to Professor Susan Thibeault and Eileen Peterson for their gracious offer and preparation of images for Chapter 18. Thanks to Denny and Shelley Weismer for the photo of Friday, their African gray. Thanks to Professor Jenny Hoit for her enormous and generous influence on several parts of this textbook. Anna Ollinger read drafts of several chapters and made excellent suggestions for clarification of concepts and organization. Thanks to Professor Steven Kramer for his influence on the audiology portions of this textbook.

The people named are not responsible for any errors that may exist in the textbook; whatever errors exist are solely our responsibility.


Reviewers

Plural Publishing, Inc., and the authors would like to thank the following reviewers for taking the time to provide their valuable feedback during the development process:

Gretchen Bennett, MA, CCC-SLP, NYS Licensed Speech-Language Pathologist; Coordinator of Speech-Language Clinical Services; Clinical Associate Professor/Supervisor, SUNY at Buffalo Speech-Language and Hearing Clinic

Kate Bunton, PhD, CCC-SLP, Associate Professor, Speech, Language, and Hearing Sciences, University of Arizona

Jaime Fatás-Cabeza, MMA, Associate Professor, Director of Translation and Interpretation, Department of Spanish and Portuguese, University of Arizona

Vicki L. Hammen, PhD, CCC-SLP, Professor and Program Director, Communication Disorders, Indiana State University

Jennifer M. Hatfield, MHS, CCC-SLP, Speech-Language Pathologist; Clinical Assistant Professor, Indiana University, South Bend

Rachel Kasthurirathne, MA, CCC-SLP, Indiana University, Bloomington

Breanna Krueger, PhD, CCC-SLP, University of Wyoming

Florence Lim-Hardjono, MA, PhD (ABD), CCC-SLP, Mount Vernon Nazarene University

Avinash Mishra, PhD, CCC-SLP, University of Connecticut

Elisabeth A. Mlawski, PhD, CCC-SLP, Assistant Professor, Monmouth University

Nikki Murphy, MS, CCC-SLP, University of Nevada, Reno

Kelly S. Teegardin, MS, CCC-SLP, LSLS Cert AVT, Instructor I, Communication Sciences and Disorders, University of South Florida

Angela Van Sickle, PhD, CCC-SLP, Texas Tech University Health Sciences Center

Jason A. Whitfield, PhD, CCC-SLP, Bowling Green State University


For Susan
For Dianne

1 Introduction to Communication Sciences and Disorders

We would build a profession independent of medicine or psychology or speech, based in colleges and public schools.
— Van Riper, 1981

Introduction: Communication Sciences and Disorders as a Discipline

This is how Charles Van Riper, one of the pioneers of the field of Communication Sciences and Disorders, remembered the early 20th-century beginnings of the discipline. From the time he began to speak as a child, Van Riper had a severe stuttering problem. In young adulthood, he continued to stutter and desperately sought a “scientific” explanation for his problem. He reasoned that if an explanation could be identified through a program of systematic discovery — a program of scientific research — treatment methods would follow from the explanations, perhaps leading to a cure for stuttering.

Van Riper interacted with a small group of individuals, several of whom were also people who stuttered; jointly they decided to break away from the domination of medical and Freudian perspectives on speech disorders. In 1925, approximately 25 individuals established an independent society called the American Academy of Speech Correction. This society was intended as a research organization. One of the charter members of this organization was Dr. Sara Mae Stinchfield, who was the first person in the United States to be awarded a PhD (from the University of Wisconsin) in the field of Speech Pathology.

In 1929, the organization changed its name to the American Society for the Study of Disorders of Speech. The word “Study” in the organization’s new name highlighted the scientific goals of the group. This contrasted with the more practical but (in the opinion of some of the founding members of that society) less lofty goal of treating communication disorders. “Speech teachers,” or people who attempted to help individuals with problems such as stuttering, articulation disorders, language delay, speech and language problems associated with neurological disease, or unintelligible speech resulting from absence or loss of hearing, were well known in society but certainly not professional mainstays in schools and hospitals.

The newly minted American Society for the Study of Disorders of Speech struggled a bit because of small membership and some disagreements among members. As recounted by Van Riper (1981), several of the influential members wanted the group to focus on scientific investigation of stuttering, but others saw the world of Communication Sciences and Disorders more broadly. Pauline Camp, who was serving as the head of speech correction in the State of Wisconsin, proposed that the field could grow by establishing speech correction clinics in universities. These clinics would train future “speech correctionists” as well as scientists interested in the nature and cause of speech disorders. As trained clinicians found employment in public schools and demonstrated their ability to help children with speech problems, the need for additional trained professionals would increase, and the American Society for the Study of Disorders of Speech would grow.

Camp’s proposed strategy for growing the profession was right on target. University programs were developed, with the training of “service providers” (clinicians) and scientists conducted in the same environment. The guiding principle of this training concept was the presence of clinicians and scientists in a common environment, teaching each other and enhancing their respective knowledge and performance. Scientists formulated more specific and worthy research questions by obtaining information about the clinical details of communication problems in actual patients, and clinicians sharpened their diagnostic procedures and practice techniques by learning from the research. This training model has persisted until the present day, and has been successful.

In 1934, the young speech organization, much larger than it was in 1930, was reconstituted under a third name: the American Speech Correction Association. This name stuck until 1947, when the association was renamed the American Speech and Hearing Association, or ASHA. In 1978, the group was renamed the American Speech-Language-Hearing Association, to recognize the equivalent importance of language function (as compared to the act of producing speech, or the ability to hear) in the understanding of normal and disordered communication function. The association has retained this name to this day but is still referred to as “ASHA.”

As of 2018, ASHA reported a membership (including student members) of 203,945 individuals (https://www.asha.org/uploadedFiles/2018-Member-Counts.pdf). Among the members of ASHA are 12,480 who have their primary training in Audiology and practice as Clinical Audiologists. Many of these professionals are also members of the American Academy of Audiology (AAA), an organization whose mission is to define the training and practice guidelines for professionals who work as clinical audiologists (https://www.audiology.org/about-us/academy-information). AAA was founded in 1988, in recognition of the need for an organization whose primary purpose would be serving the profession of Clinical Audiology. Many of the 12,000+ Audiologists who are members of ASHA are also members of the American Academy of Audiology.

There is a difference between the perspectives of ASHA and AAA on the right to practice Clinical Audiology. ASHA currently argues that a Clinical Audiologist must have a Certificate of Clinical Competence in Audiology (CCC-A), issued by ASHA, as the proper credential for the practice of audiology. AAA’s position is that the CCC-A is not necessary for the practice of audiology; what is required is that students-in-training in audiology have a sequence of courses that is recognized as the foundation for training professional audiologists, and that a year of professional work (much like an internship) follows the completion of the coursework training. In the view of AAA, this training prepares the student for state licensure as a Clinical Audiologist, which when obtained provides the “legal” right to practice clinical audiology. The different perspectives on the credentials needed by trainees to practice clinical audiology are complicated; readers are encouraged to visit https://www.audiology.org/publications-resources/document-library/audiology-licensure-vs-certification. There is a concerted effort among several different associations, including ASHA and AAA, to resolve these different perspectives (https://www.asha.org/uploadedFiles/AlignedSense-of-Purpose-for-the-Audiology-Profession.pdf).

Communication Sciences and Disorders: The Whole Is Greater Than the Sum of Its Parts

When Van Riper remembered the early vision of a discipline “independent of medicine or psychology or speech,” he was not thinking of abandoning the content of these other fields of study. Rather, he imagined an academic and clinical field with a separate identity, forged from the concepts and facts of medicine, psychology, and other disciplines, but clearly something different and new — a field with its own identity, able to stand on its own merits.

It is comically ironic (to this author, at least) that over the past 10 to 15 years, two buzzwords on college campuses have been “interdisciplinary research” and “translational research.” The field of Communication Sciences and Disorders embraced these two activities — in fact, defined itself by an interdisciplinary and translational mentality — long before they became fashionable and fundable claims in university settings.


An Interdisciplinary Field

Communication Sciences and Disorders is a field practiced and studied by individuals with expertise in a variety of academic and clinical disciplines. It is truly interdisciplinary, the product (but not merely the sum) of many different areas of knowledge.

Speech is produced by moving structures of the respiratory system, larynx, and vocal tract (the latter sometimes referred to as the “upper articulators,” including the tongue, lips, and jaw). Scientists and clinicians who are interested in communication disorders must understand the anatomy (structure) and physiology (function) of these body parts. When a person speaks, air pressures and flows are generated throughout the speech mechanism, and an acoustic signal (what you hear when someone talks) is emitted from the lips and/or nose. An understanding of these aerodynamic and acoustic phenomena of speech requires at least a foundation of knowledge of basic physics. When the acoustic signal emerges from the talker’s mouth (or nose), it is metaphorically “aimed” at another person who receives it through his or her auditory mechanism. This makes it clear that the anatomy and physiology of the auditory system must be mastered by the person specializing in Communication Sciences and Disorders. As with the process of speech production, hearing and comprehending acoustic signals involve complex mechanisms understood properly only with a decent amount of knowledge in the areas of anatomy, physiology, and physics (and other areas as well).

Of course, when talkers produce speech, they want to communicate a message. The nature and structure of the message — what is being communicated, and the form it takes when it is spoken — is determined by linguistic-cognitive processes. For example, linguistic-cognitive processes are set into motion by the simple act of asking someone to have coffee. An idea must be developed and structured in linguistic terms according to the intent and wishes of the person doing the asking. The idea is something like, “I want to spend time with this person and suggesting we have coffee at a comfortable café seems like a good approach,” but the manner in which this “want” is structured as a message can vary wildly, depending on many factors. “Would you like to have coffee?” “Hey, how ’bout we grab some coffee?” “I’m really sleepy, let’s stop at Completely Wired and get some coffee.” “I’d really like to talk to you over coffee.” “Let’s have a no-obligation date over coffee.” “Coffee?” These different ways to convey the same message reflect variation in underlying cognitive processes and linguistic structure, both of which are critical to language usage. The clinician and scientist in Communication Sciences and Disorders deal with disorders of language structure and usage, and must therefore have expertise in the broad areas of hearing, cognition, and linguistics.

The term “cognitive-linguistic” refers to psychological processes applied to the use of language forms. “Cognition” refers to several psychological processes, including memory; executive function (e.g., planning behavior, connecting current behavior with future consequences); the development, refinement, and stabilization of mental representations; brain computation speeds; and transfer of information from one type of memory (e.g., short-term memory) to another (e.g., long-term memory). These various aspects of cognition are listed here as separate processes but in fact may overlap and in some cases be different reflections of a single psychological process. “Linguistic” refers to any aspect of language form — sounds, words, sentences, tone of voice, and so forth. The term “cognitive-linguistic” is used here to indicate that the psychological processes previously listed (among others) are applied to language forms and therefore to communication. The same cognitive processes are applied to other forms of knowledge, as well (such as spatial reasoning or mathematics).

We are not done. Because speech and language develop throughout infancy and childhood and may change throughout the lifetime and especially in old age, expertise in Communication Sciences and Disorders requires a solid knowledge of child development and aging. Most obvious, perhaps, is the need to have a broad and deep expertise concerning the many diseases and conditions associated with speech, hearing, and language disorders. Extensive medical knowledge is absolutely necessary to function as an effective specialist in Communication Sciences and Disorders. This knowledge ranges from how surgeries on structures of (for example) the brain, tongue, and ear affect speech, hearing, and language function, to how pharmaceutical interventions (such as drugs for Parkinson’s disease, or schizophrenia, or even chronic arthritis) may change a patient’s ability to communicate.

Finally, legal and technical issues are relevant to the profession of Communication Sciences and Disorders. These issues concern a person’s right to receive the proper services when he or she has a speech, hearing, or language disorder, as well as the requirements for professional accreditation as someone who can provide services or train people to provide services, or the requirement of extensive training in research to mentor students who intend to devote their careers to research. Our field has been fortunate to have professional leaders who can lay claim to both clinical and research expertise.

Table 1–1 provides a partial summary of the areas of knowledge and, in many cases, expertise, required of the professional in Communication Sciences and Disorders. This list includes the areas previously mentioned and adds a few more for good measure. There are (at least) two ways to react to this list. One is to feel intimidated by the need to know so much about so many areas. The other is to look at the combination of these different types of knowledge as something special, as an opportunity to be informed about many different areas of study and, most importantly, to employ an integrated and synthesized fund of this information in an understanding of the most human of behaviors, communication. Of course, a single individual is not likely to be an accomplished expert in each of these areas, but a commitment to learn the basic principles of each of the disciplines listed in Table 1–1, to use this knowledge when providing clinical services to a person with a communication disorder, to function as an effective member of a clinical or research team, or to develop an answer to a research question, is genuinely exciting. Communication Sciences and Disorders is the original, lifelong learning discipline.1

Table 1–1. Some Areas of Knowledge Required for People to be Effective Professionals in the Field of Communication Sciences and Disorders

Neuroscience
    Brain anatomy (structure)
    Brain physiology (function)
    Neuropharmacology (chemicals and their role in brain function)
    Motor control (how brain controls movement)
    Sensory function (how brain processes sensation)
Anatomy and Physiology of the Speech Mechanism (muscles, ligaments, membranes, cartilages, etc., associated with the respiratory, laryngeal, and upper airway system, which collectively are called the “speech mechanism”)
Anatomy and Physiology of the Hearing Mechanism (bones, membranes, ligaments, special structures of the ear)
Child Development
Aging
Diseases of the Head, Neck, Respiratory System, Auditory System, and Brain
Syndromes
Physics
    Aerodynamics
    Acoustics
    Movement
Cognition
    Memory and Processing
    Planning
    Manipulation and Use of Symbols
Linguistics
    Phonetics and Phonology
    Morphology
    Syntax
    Semantics
    Pragmatics

1. As a university professor in Communication Sciences and Disorders, I more than once told students that it was hard to believe someone was willing to pay me to come to my office every day, learn new things in many different areas, and use this information in my research, in the classroom, and in mentoring teaching (one-on-one instruction, as with graduate students training to be researchers).

Translational Research

Researchers and clinicians are often trained in the same department and yet do not interact professionally to a significant degree. This has been a concern in various branches of medicine, as well as in departments such as Psychology, and Communication Sciences and Disorders. Many scientists in these professions are trained to do something they understand as “basic science.” In basic science, research questions are asked for the sake of improving the knowledge base in a field, or to address purely theoretical questions. An assumption of this approach to research has been that basic science, if done well, will eventually have an effect on clinical practice. In this way of thinking, “basic science” does not need to be motivated or prompted by immediate clinical concerns; any improvement in knowledge of the world must have implications for the betterment of humankind.

Let’s consider an example of a possible link between basic science and clinical application. A fair number of scientists have investigated birdsong and its relationship to the evolution of human language (reviews can be found in Fitch, 2000, 2006, and Deacon, 1998). Much of this work has been funded by a federal agency, the National Institutes of Health (NIH), whose primary mission is to sponsor research that ultimately improves health care in the United States. The research on birdsong (and vocalizations produced by other, nonhuman species) has been “sold” to the federal agency by claiming potential links between, on the one hand, an understanding of why and how birds sing, and on the other hand, a better understanding of speech and language capabilities in humans. The link between birdsong and human communication is evolutionary, in which birdsong is a “step” along the evolutionary path to human vocalization for purposes of communication. The reasoning is extended by arguing that a better understanding of the basic “mechanisms” of vocal communication, which can be studied in birds using techniques that cannot be used in humans,2 should eventually lead to a better understanding of the partial or complete failure of similar mechanisms in humans. A better understanding of disease-related problems in human vocalization should, this reasoning concludes, result in better ways to diagnose and treat human vocalization disorders. Basic science such as work on birdsong has been criticized for occupying federal funds that might be used to fund “applied” research. “Applied science” is research with more immediate clinical consequences, research with less distance between the results of a study and its potential use in clinical settings. For example, funding could be provided for a research program in which participants with healthy voices are enrolled in a vocal exercise regime (like the kind of warm-up exercises used by many professional singers) and compared to a group of participants who do not engage in this exercise (a “control group”). The applied research question is, do nonspeech vocal exercises generalize, or translate, to the use of the voice in everyday speech? Perhaps the effect of the vocal exercise could be evaluated by having listeners judge the quality of participants’ voices, with the critical comparison being the “goodness” (pleasing quality?) of voices pre- versus postexercise. This is basic, nonclinical research — nonclinical because the participants do not have voice disorders — but a positive result, where exercise produces a more pleasing voice, points more directly to a specific clinical application in patients with voice problems. 2 

5

The relatively new buzzword for applied science is “translational research,” or research in which the results of basic science can be translated relatively quickly to clinical application. The hypothetical vocal exercise study is one example of translational research; many others have been proposed (see Ludlow et al., 2008; Raymer et al., 2008). The National Institutes of Health (NIH), the federal agency having the mission of funding and setting priorities for health-care-related research activities in the United States, published in 2008 the following text on its website concerning translational research: To improve human health, scientific discoveries must be translated into practical applications. Such discoveries typically begin at “the bench” with basic research — in which scientists study disease at a molecular or cellular level — then progress to the clinical level, or the patient’s “bedside.”

Scientists are increasingly aware that this bench-to-bedside approach to translational research is a two-way street. Basic scientists provide clinicians with new tools for use in patients and for assessment of their impact, and clinical researchers make novel observations about the nature and progression of disease that often stimulate basic science.

See https://nexus.od.nih.gov/all/2016/03/25/nihs-commitment-to-basic-science/ for a summary of the benefits of funding both kinds of research. The National Institute on Deafness and Other Communication Disorders (NIDCD), the NIH institute that is the primary funder of research in Communication Sciences and Disorders, has a specific funding program for translational research (as of 2017). This funding mechanism is called the Research Grants for Translating Basic Research into Clinical Tools. The stated objective and requirements of these grants are as follows:

[T]o provide support for research studies that translate basic research findings into better clinical tools for human health. The application should seek to translate basic behavioral or biological research findings, which are known to be directly connected to a human clinical condition, to a practical clinical impact. Tools or technologies advanced through this FOA [Funding Opportunity Announcement] must overcome existing obstacles and should provide improvements in the diagnosis, treatment or prevention of a disease process.


For the purposes of this FOA, the basic science advancement must have previously demonstrated potential for clinical impact and the connection to a human clinical condition must be clearly established. The research must be focused on a disease/disorder within one or more of the NIDCD scientific mission areas: hearing, balance, smell, taste, voice, speech, or language. Research conducted under this FOA is expected to include human subjects. Preclinical studies in animal models are allowed only for a candidate therapeutic that has previously demonstrated potential for the treatment of communication disorders. The scope of this FOA allows for a range of activities encouraging the translation of basic research findings to practical impact on the diagnosis, treatment, and prevention of deafness and other communication disorders. [https://grants.nih.gov/grants/guide/pa-files/PAR-17-184.html]

The first statement presents the issue of “translational research” with molecular or cellular work as the basic science, but basic science exists at the behavioral level of analysis, as well. This is why the NIDCD description mentions a “range of activities” in its mission to fund translational research in Communication Sciences and Disorders. Both of these NIH statements imply that it is the basic scientist’s obligation to show how laboratory results can be “translated” to clinical settings. This is in contrast to earlier models of the basic science/applied science dichotomy, in which the basic scientist might have said, “I’ll do the bench work (very basic science) and down the road, perhaps way down the road, clinicians can figure out how to use my findings when they diagnose and treat patients.” In this view, the clinician, not the scientist, has the primary responsibility for translating the basic science to clinical application. The second paragraph of the statement sounds remarkably similar to the concept, described previously, of training “speech correctionists” in university settings where clinical practice informs the direction of research programs, and research findings enhance clinical practice. Pauline Camp suggested this concept in 1934, and our discipline has been guided by the “two-way street” philosophy since that time. As a field, we have understood the potential value of “translational research” for a long time.

Does the Basic Science Work? Does the Clinic Work?

It is all well and good to claim that people in the field of Communication Sciences and Disorders understood the value of interdisciplinary work, and practiced translational research well before the concept was so christened and attained the status of an official movement on 21st-century university campuses and in government funding agencies.

It is quite another thing to claim scientific success as the result of interdisciplinary efforts, or to show that basic science has indeed been translated to clinical application. A major goal of this text is to present introductory information on normal and disordered communication processes in a way that highlights previous, and the latest, scientific findings that have emerged from interdisciplinary thinking. For the time being, the reader is asked to trust the claim that the growth of the scientific basis of normal communication processes, and of Communication Sciences and Disorders, has been nothing short of spectacular over the last 50 years. None of this would have been possible if speech, language, and hearing scientists had not been open to the influences and thinking of scientists in areas such as linguistics, physiology, neuroscience, and psychology (among others). Most importantly, the openness of these scientists to the experience and knowledge of clinical speech-language pathologists and audiologists has made a huge difference to the growth of the scientific knowledge base in normal and disordered communication.

It is not a goal of this text to present detailed information on therapy (management) techniques for persons with speech, language, and/or hearing disorders. Readers will learn a great deal about speech, language, and hearing disorders, but a full treatment of clinical processes and procedures is a topic for a more advanced course of study, typically in graduate programs (see later in the chapter).

An aspect of the clinical process that is discussed throughout this text is the diagnosis of speech, language, and/or hearing disorders. Technically, diagnosis involves the identification and determination of the nature and cause of a disorder. Notice the inclusion of “nature” in this definition. Proper techniques must be employed to describe a disorder and to document the characteristics of a communication disorder that make it different from other communication disorders. A good part of this text is therefore devoted to descriptions of how we know a specific speech, language, and/or hearing disorder is “x” and not “y.”

This text does not shy away from controversies in our field about the nature and causes of certain communication disorders. As in any health-care-related field, many diagnoses remain unclear and are the subject of ongoing debate. In the best of all worlds (sorry, Voltaire), we would welcome absolute certainty concerning the diagnosis of human diseases and conditions. The world-as-is, however, does not allow such certainty, but let’s not regard the gray areas as defeats; they are opportunities. Uncertainty and controversy have always been the engines of scientific advancement.


Not knowing, or disagreement about what we do know, pushes science forward. Diagnosis, then, is a critical part of the scientific underpinnings of a health-care-related discipline such as Communication Sciences and Disorders. In many cases, questions concerning clinical diagnosis and the basic science foundation of our field are completely intertwined.

The second part of the heading for this section asks, “Does the Clinic Work?” Do speech-language pathologists and audiologists make a difference in the lives of people with communication disorders? Although this text does not present detailed information about treatment of communication disorders, there is widespread evidence for treatment success. It is important for the reader to know that many of the services offered by clinicians in our field have been documented as being effective. In the absence of such documentation, the entire enterprise of training clinicians to treat communication disorders could be questioned. Fortunately, our interdisciplinary and translational approach to understanding communication disorders has produced diagnosis and management techniques that are effective for many patients. A selective sampling of publications in which this clinical success is reviewed includes results for voice therapy (Angadi, Croke, & Stemple, 2019; Desjardins, Halstead, Cooke, & Bonilha, 2017; Ramig & Verdolini, 1998; Ruotsalainen, Sellman, Lehto, Jauhiainen, & Verbeek, 2007), hearing disorders (Ferguson, Kitterick, Chong, Edmonson-Jones, Barker, & Hoare, 2017; Kaldo-Sandström, Larsen, & Andersson, 2004; Mendel, 2007), stuttering (Baxter et al., 2015; R. Ingham, J. C. Ingham, Bothe, Wang, & Kilgo, 2015; Tasko, McClean, & Runyan, 2007), childhood articulatory disorders (Gierut, 1998; Wren, Harding, Goldbart, & Roulstone, 2018), and childhood language disorders (Law, Garrett, & Nye, 2003; Tyler, Lewis, Haskill, & Tolbert, 2003). Students who obtain undergraduate and graduate degrees in our field learn the scientific basis and technical details of these successful clinical strategies.

This is not to say that we have conquered all, or even many, of the communication disorders affecting people around the world. Indeed, there is a substantial amount of disagreement concerning precisely what constitutes therapy “success” for people with communication disorders, and a specific therapy technique may work for some patients but not others. But the articles listed previously show a pattern of success for many communication disorders; continuing research will add to this list.

Evidence-Based Practice


Although this text does not present detailed information on management (treatment) of communication disorders, the concept of evidence-based practice (EBP) and its role in speech, language, and/or hearing therapy is integral to an understanding of how knowledge of typical and disordered communication is related to treatment of communication disorders. EBP, a movement with roots in the medical world, takes as its central concept that any treatment approach should be supported by scientifically based evidence of the treatment’s effectiveness. (The term “efficacy” is often used to refer to the effectiveness of a therapy procedure, but the technical sense of “efficacy” is an experimental demonstration that a particular clinical technique shows promise as an effective management tool; it is like a first step in the determination of a treatment’s real-world effectiveness.) The need to formalize such a notion may at first glance seem surprising: why would a treatment ever be administered in the absence of solid evidence that it works? Again, in the best of all worlds this would never happen, but in much of medicine and the behavioral sciences, including Communication Sciences and Disorders, the effectiveness of treatments is often unknown or only partially supported by research data.

EBP must be based on proper outcome measures. Evidence for the success of a therapeutic approach requires the measurement of one or more variables after (or sometimes during) the treatment. Outcome measures should have the best possible face validity, meaning that the measures provide good indices of the phenomena they are supposed to represent. An example from basketball helps in understanding the face validity of outcome measures. If an outcome measure is desired for a player’s in-game shooting accuracy following several months of intense practice of nongame, unguarded shooting, the percentage of shots made over 100 attempts has good face validity if the measure is taken during games. The measure has much poorer face validity if it is taken over 100 shots attempted during multiple games of HORSE. Shooting percentage during games is a much better outcome measure for “real-world” shooting than shooting percentage during games of HORSE.

An example from health care, closer to the concerns of this textbook, is drug treatment for epilepsy, for which there may be multiple potential outcome measures with face validities that are only subtly different. The question is, after 6 months of drug treatment, are there fewer seizures as reported by the patient (one potential outcome measure)? As reported by the patient, are there no seizures over the same time period (a second potential outcome variable)? After 6 months of drug treatment, can a seizure be induced in the clinical setting by very bright flashing lights (a third potential outcome variable)?


Or, after 6 months, are the blood levels of the drug in the “correct” range based on values reported in the scientific literature (a fourth potential outcome variable)? At first glance, the first two outcome measures have the best face validity — the best evidence for reduction of seizures is a report from a patient that seizure episodes have been reduced or eliminated. Some clinicians and scientists, however, may think that patient-reported data are unreliable because they are subject to the notorious uncertainties of memory or even a patient’s misrepresentation of seizure history. Measures such as inducement of a seizure by flashing lights or drug blood levels are regarded as more objective (and have a clearly quantitative basis) and therefore may seem more reliable than patient reports of seizure history. Yet, from the perspective of the patient, inducement of a seizure in a controlled clinical setting or “good” drug blood levels mean very little when he or she is losing consciousness two or three times a week or even having many episodes of preseizure activity. The choice of a proper outcome measure (or measures) is not straightforward and is often the subject of considerable debate. The debate is lively and even heated when the behaviors of speech, language, and hearing disorders are evaluated for their response to therapy. Readers may want to keep this in mind when considering the concept of EBP.

The concept of EBP has taken on a life of its own as an academic discipline, and there is no end to the debate about precisely what serves as “good” scientific evidence for the efficacy of a treatment. Table 1–2 presents a six-level EBP model of “goodness” of evidence, with the “best” evidence at the top (Level I) and the worst at the bottom (Level VI). This simplified model of EBP serves the purposes of this discussion well and has been presented several times in the Communication Sciences and Disorders literature (Dodd, 2007; Dollaghan, 2004; Moodie, Kothari, Bagatto, Seewald, Miller, & Scollie, 2011).
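The face-validity problem can be made concrete with a small sketch. The following Python fragment is purely illustrative; the seizure diaries, time window, and lab values are hypothetical and are not drawn from the text or from any real study. It simply shows how the same patient data can yield different “outcomes” depending on which measure is chosen.

```python
# Hypothetical, illustrative data only: weekly seizure counts reported by one
# patient before and after 6 months of a drug treatment. The point is that
# different outcome measures summarize the same diary very differently.
pre_treatment = [3, 2, 4, 3, 2, 3]    # seizures per week, before treatment
post_treatment = [1, 0, 2, 1, 1, 0]   # seizures per week, after 6 months

# Outcome measure 1: fewer seizures than before (patient report)?
fewer_seizures = sum(post_treatment) < sum(pre_treatment)

# Outcome measure 2: completely seizure free over the same period?
seizure_free = sum(post_treatment) == 0

# Outcome measure 3 (more "objective," but lower face validity for the patient):
# is a lab value in a recommended range? Numbers are invented for illustration.
drug_blood_level = 14.2
in_therapeutic_range = 10.0 <= drug_blood_level <= 20.0

print(fewer_seizures, seizure_free, in_therapeutic_range)  # True False True
```

A treatment that looks successful by the first and third measures still leaves this hypothetical patient having seizures, which is exactly the tension between “objective” measures and measures that matter to the patient described above.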

Levels of Evidence

Table 1–2.  Levels of Evidence Applied to Evidence-Based Practice: A Simplified Model

Level I | Systematic reviews and meta-analyses of randomized controlled trials (RCTs)
Level II | A single RCT
Level III | Nonrandomized, controlled (well-designed) treatment studies
Level IV | Nonexperimental studies
Level V | Case reports and/or narrative literature reviews
Level VI | Expert/authority opinion

Level I and II evidence are usually based on large numbers of participants to generate the most reliable statistical results. In Table 1–2, Level I evidence is summarized as “systematic reviews” or “meta-analyses” of RCTs. An RCT is an experiment in which each individual from an initial, large pool of participants is randomly assigned to one of two (or more) treatments. Ideally, neither the experimenters nor the participants know which treatment has been assigned to any participant in the study; both are “blind” to the status of each participant’s treatment condition (real treatment group or placebo group). This is an example of a “double-blind” experiment. A systematic review is the organization and evaluation of data from many different, individual RCTs, and a meta-analysis is a quantitative (statistical) analysis of the data from many such studies. A meta-analysis of the results of many different studies can only be done when the data from each study are sufficiently comparable — as when the same pretreatment and outcome measures were used in the different studies (such as number of seizures per week), the same blinding conditions, the same dosage levels, and so forth. Level II evidence is the result from a single RCT. Level II evidence is high-level scientific evidence but is not as trustworthy as having many different demonstrations, from different laboratories and different scientists, of the same outcome. In other words, when Level II evidence is replicated several times, Level I evidence has been produced.

Level I and II Evidence in Communication Sciences and Disorders.  In Communication Sciences and Disorders, it is relatively difficult to obtain Level I and II evidence. How easy is it to find, for example, 100 people who have a similar stuttering problem, or 100 people who have had a stroke and who have very similar problems with expressing or comprehending speech? How easy is it to find 100 children with autism, who all have the same communication challenges and similar characteristics in noncommunication domains? In each of these cases, the answer is: Not easy at all. In addition, it is unusual for different laboratories that study communication disorders, and even a single communication disorder such as stuttering (as one example), to use the same measure of stuttering frequency (perhaps number of stuttered words per 100 words produced). For these reasons and others, RCTs are unusual in our field.

Many RCTs in medical fields contrast an experimental group that receives a trial drug for a condition or disease, and a control group that receives a placebo. Both groups take pills on a schedule, but do not know if they are taking the experimental medication or sugar pills. Such experiments in Communication Sciences and Disorders raise an ethical question: how do you withhold treatment from a group of individuals with a communication disorder? RCTs are also difficult to execute because of many factors, not the least of which is assembling an initial participant pool, with the same kind and degree of speech/language/hearing challenges, from which random assignment to different treatment types is possible. Perhaps this explains why some introductory texts in Communication Sciences and Disorders (e.g., Justice & Redle, 2014) choose to talk broadly about EBP from sources external to scientific pursuits. These sources include patient values and preferences, and clinician expertise (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). These factors are considered jointly with scientific data as contributions to EBP in the life of a speech-language pathologist or audiologist. The absence of solid Level I and II “high-level” evidence in our field places greater weight on the other factors (patient preference, clinical experience) in treatment decisions made by speech-language pathologists and audiologists.
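To make the randomization, blinding, and pooling ideas behind Level I and II evidence concrete, here is a minimal Python sketch. Everything in it is hypothetical (the participant labels, group codes, and effect numbers are invented for illustration); it is not a description of any study cited in this chapter.

```python
import random

def randomize_double_blind(participants, seed=42):
    """Randomly split participants into two groups labeled only "A" and "B".
    The key mapping labels to treatment/placebo is held by a third party,
    so clinicians and participants stay "blind" until the analysis stage."""
    rng = random.Random(seed)          # fixed seed makes the example reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = {"A": shuffled[:half], "B": shuffled[half:]}
    key = {"A": "treatment", "B": "placebo"}   # revealed only after outcomes are measured
    return groups, key

def pooled_effect(estimates, variances):
    """Inverse-variance weighted average: the basic arithmetic of pooling
    effect estimates from several comparable studies (a meta-analysis)."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

groups, key = randomize_double_blind([f"P{i:02d}" for i in range(1, 21)])
print(groups["A"])                                            # coded group lists only
print(pooled_effect([0.40, 0.55, 0.30], [0.04, 0.09, 0.02]))  # invented study results
```

The simplicity of the arithmetic is the point: the hard part in our field, as discussed above, is assembling comparable participants and comparable outcome measures in the first place.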


Level III Evidence.  The description of Level III evidence in Table 1–2 is “nonrandomized, controlled (well-designed) treatment studies.” As in the case of RCTs, two groups are typically studied and compared, one receiving Treatment X, the other Treatment Y (or no treatment). Level III evidence does not involve randomization from a pool of eligible subjects but must be well controlled in other ways. Studies that produce Level III evidence are relatively common in the Communication Sciences and Disorders literature. Level III evidence often comes from studies with a relatively small number (e.g., 10 to 20) of participants in each group, certainly smaller than the group numbers in (for example) drug trials. In addition to the absence of randomization of participants to treatment conditions, the relatively small number of participants in Level III studies renders them less powerful statistically and, therefore, less “valued” than RCTs.

Level IV Evidence.  Level IV evidence is produced when a study is performed in the absence of proper experimental controls. The lack of a control group whose performance can be compared to an experimental group is a common problem in experiments that align with Level IV evidence. Level IV-type evidence is found in the speech, language, and hearing literature. Treatments are applied to a group of individuals with communication disorders, in the absence of proper controls. People with communication disorders improve following the treatment, and a conclusion is reached that the specific treatment is to be valued for its positive effect on the communication impairment. In the absence of controls, however, any form of treatment, not the specific treatment employed, may have improved the communication skills of a group of persons with a communication impairment.

Levels V and VI Evidence.  Levels V and VI are types of evidence considered to be poor support for a treatment approach in any field. Case reports, which consider the outcome of a specific treatment applied to a single patient, or to a series of patients with similar characteristics, lack controls and cannot be generalized to a larger group of patients. The absence of experimental controls and the study of only a single or a few individuals contribute heavily to the evaluation of this kind of evidence as “poor quality.” Even so, case reports are common in the health care literature, including the treatment literature in Communication Sciences and Disorders. An argument can be made that case reports gain value when they are organized and synthesized in a single publication, with conclusions drawn from the careful analysis of results across reports. The problem with this line of thinking is that the lack of experimental controls in each case report is not solved by accumulating many case studies. The shared flaw of most case studies (no experimental controls) means that a summary of many cases offered as evidence to support a treatment approach is a summary of many flawed experiments.

Another type of Level V evidence is the narrative literature review. Narrative literature reviews are publications in which a large number of research papers, most often those that provide Level III evidence, are organized and evaluated for the purpose of drawing qualitative conclusions about a focused issue. Narrative reviews are popular in Communication Sciences and Disorders and are published in leading journals. Narrative reviews have poor evidence quality for the purpose of supporting a treatment approach, because ultimately they are position papers, like editorials, with a primary aim of persuading readers that their conclusions are preferable to alternative conclusions.3


The narrative review, with its aim of persuading through summaries of existing research findings and theoretical issues, is a more scholarly version of the lowest evidence level, that of expert/authority opinion. Anyone can have an opinion that is stated as the likely truth. When “anyone” turns out to be an authority in a discipline, and asks that his or her position be accepted not on the basis of published data but on his or her authority, the evidence has little or no value.

The concept of EBP is firmly grounded in the interaction and co-dependency of laboratory experiments and clinical practice. Scientists construct experiments to generate results in support of proper diagnosis and effective clinical management, and clinicians apply the findings to their patients and evaluate their real-world results. On the basis of those clinical results, scientists may adjust their experiments to provide additional and improved data for EBP.

3. The author feels free to point to the evidentiary weakness of narrative reviews because he has published several of them. Conversely, narrative reviews may organize the literature in a way that is useful for clinicians and scientists as they pursue their professional goals.

A Typical Undergraduate Curriculum

Table 1–3 shows the undergraduate major curriculum in Communication Sciences and Disorders at the University of Wisconsin–Madison. This sequence of courses is more or less representative of curricula in any department in the United States that offers an undergraduate degree in our field (some variation will occur from department to department). The course for which this text was written is shown in parentheses because it is not a requirement in the UW–Madison department for an undergraduate major in the field. Rather, this course is taken each semester by a large number of students to satisfy a breadth requirement in the College of Letters and Science. Many students who choose to major in Communication Sciences and Disorders at UW–Madison do take the introductory course, and in many cases, the exposure to the field provided by the class is the reason they choose Communication Sciences and Disorders as their major.

A group of courses in the curriculum (Speech Science; Hearing Science; Neural Bases of Speech, Hearing, and Language; Speech Acoustics and Perception; Language Development in Children and Adolescents; the Phonetic Transcription module of Phonological Development and Disorders) establishes a solid scientific foundation for normal (typical) processes of communication.

Other courses (the second part of Phonological Development and Disorders; Voice, Craniofacial, and Fluency Disorders; parts of Neural Bases of Speech, Language, and Hearing Disorders; Auditory Rehabilitation; Child Language Disorders: Assessment and Intervention) provide basic information on the classification, causes, and nature of the many diseases and conditions associated with communication disorders. Some curricula may have a course called “Preclinical Observation,” in which students are introduced to the clinical process by observing clinical sessions, rather than being directly involved in diagnosing or treating communication disorders.

Who Are the Professionals in Communication Sciences and Disorders?

Students obtain undergraduate and graduate degrees in preparation for a job. In the field of Communication Sciences and Disorders, this preparation is for employment as a speech-language pathologist or audiologist in an educational or health care setting. Or, a student may prepare for a career as a professor in a college or university setting. At the undergraduate level, training is not differentiated across these different career paths. Nearly everyone who intends to be a professional in Communication Sciences and Disorders learns a common scientific foundation for the field, as summarized in Table 1–3.

Preparation for, and the Profession of, Speech-Language Pathology

The requirements to practice as a speech-language pathologist (SLP) include coursework that furnishes a knowledge base specified by ASHA, completion of a master’s degree, a clinical fellowship, and successful performance on a national exam. The information presented here is based on ASHA’s published certification standards as of 2014, as well as some revisions and amendments to these standards published in 2016. ASHA documents are available at https://www.asha.org

Students finishing an undergraduate major in Communication Sciences and Disorders apply to master’s degree training programs in the fall semester of their senior year (or later, if they decide to take a year or two off before beginning graduate school).


Table 1–3.  The Undergraduate Curriculum for a Major in Communication Sciences and Disorders at University of Wisconsin–Madison

Course | Comments
(Introduction to Communication Sciences and Disorders) | Survey of field
Speech Anatomy and Physiology (speech science) | Anatomy and physiology of speech mechanism (respiratory system, larynx, upper articulators)
Hearing Anatomy and Physiology (hearing science) | Anatomy and physiology of hearing mechanism; basic acoustics
Neural Bases of Speech, Language, and Hearing Disorders | Basic neuroanatomy and diseases of nervous system that affect communication
Language Development in Children and Adolescents | Typical language development from infancy through adulthood, with information on atypical language development
Speech Acoustics and Perception | Speech acoustics, speech perception, role of speech acoustics in understanding articulation processes and understanding the speech signal
Phonological Development and Disorders | Basic phonetics; typical development of the speech sound systems of languages; definition, causes, and nature of developmental speech sound disorders
Voice, Craniofacial, and Fluency Disorders | Classification, causes, and nature of disorders of the larynx (voice disorders), syndromes and other related genetic/embryological disorders affecting the speech mechanism, and fluency disorders such as developmental stuttering
Introduction to Audiology | Hearing science, approaches to evaluation of auditory disorders, interpretation of frequently used hearing tests
Preclinical Observation | Introduction to clinical issues via lectures and observation of clinical interactions
Auditory Rehabilitation | Principles and techniques for auditory training of individuals with hearing impairment
Child Language Disorders: Assessment and Intervention | Child language disorders in various populations, their classification, nature, and causes, plus the scientific basis of assessment

There are currently over 200 such training programs in the United States, as well as about 10 in Canada. Clinical training programs, either established or under development and based closely or more generally on the ASHA model, are available in Australia, Brazil, Belgium, China, England, Finland, Germany, Hong Kong, Ireland, Italy, the Netherlands, New Zealand, Scotland, South Africa, South Korea, Sweden, and Taiwan, among others. Many opportunities exist to find a program that fits an individual student’s needs.

Master’s degree training typically includes 2 years of advanced coursework designed to build on the foundation developed by the undergraduate course of study. Coursework at the undergraduate and master’s degree levels is designed to meet certain training standards established by ASHA (for American universities) or Speech-Language and Audiology Canada (SAC, for Canadian universities). A critical component of training at the master’s level is direct clinical experience in the diagnosis and treatment of speech-language disorders. ASHA standards require students to obtain 400 hours of direct clinical experience by the end of their master’s program.


The knowledge and skills derived from coursework and clinical experience are one component of eligibility for certification by ASHA. ASHA sets certification standards for students in training, and develops other documents and standards for professional SLPs (described in full detail at https://www.asha.org/Certification/2014-Speech-Language-Pathology-Certification-Standards/). The steps toward clinical certification are summarized in Table 1–4.

SLPs work in a variety of settings. Hospital clinics, private medical practices, public and private schools, rehabilitation centers, and nursing homes are common work sites for SLPs. SLPs also work in private practice, setting up businesses that offer on-site diagnosis and therapy (like a physician’s private practice) or contracting their services to other sites. A significant number of SLPs work in university settings, supervising the training activities of future SLPs. Finally, SLPs who earn a PhD typically work on a university faculty where they teach and do research. These individuals — probably like your instructor in this course — continue their schooling past their master’s training for (on average) between 3 and 5 years to earn the PhD degree. At most universities, faculty members are expected to perform and publish research, provide classroom instruction at the undergraduate and graduate levels, serve on committees, and mentor students in laboratory or clinical settings.

SLPs diagnose and treat a wide variety of speech and language problems. These clinical activities range from early intervention with a child who has delayed language development or is showing an early form of stuttering, to diagnosis and treatment of voice problems, to serving as a member of a team of health care professionals who provide services to children with cleft palates. There are too many areas of professional speech-language pathology involvement to mention here, but most are discussed in this text. The scope of speech-language pathology practice is published at https://www.asha.org/policy/SP2016-00343/ (2016 revision).

Preparation for, and the Profession of, Audiology

The entry-level degree for clinical practice in audiology is the Doctor of Audiology (AuD). This is a professional doctorate, analogous to the professional doctoral degree required for the practice of (for example) optometry and pharmacy.4 At the current time (based on 2012 standards), an individual enrolled in an AuD program who seeks the Certificate of Clinical Competence in the area of Audiology (CCC-A) must obtain a minimum of 75 credit hours of postbaccalaureate (graduate-school) study. Like students in the master’s degree program in Speech-Language Pathology, students enrolled in an AuD program take academic courses and are engaged in clinical practica.

4. The reader may wonder if there has been a similar movement among SLPs to require a professional doctorate in speech-language pathology for clinical practice. There have been several attempts to specify standards and training objectives for something that might be called the “SLPD” (Doctor of Speech-Language Pathology), but the movement has never been sufficiently focused or persuasive to prompt a serious consideration of abandoning the clinical master’s degree in favor of a professional doctorate as the “entry-level” degree to practice clinical speech-language pathology. Professional doctorates in Speech-Language Pathology are offered at some universities. One model for this degree and its requirements is the program at the University of Pittsburgh (https://www.shrs.pitt.edu/CScD).

Table 1–4.  Steps to Obtaining the Certificate of Clinical Competence in Speech-Language Pathology (CCC-SLP) or Audiology (CCC-A)

CCC-SLP: Undergraduate degree (or equivalent) in accredited university program → Clinical master’s degree from accredited university program (2 years of coursework + total of 400 hours of supervised clinical training) → Clinical Fellowship Year (36 weeks, full-time clinical work, supervised by professional who already holds CCC-SLP) → Pass national exam

CCC-A: Undergraduate degree (or equivalent) in accredited university program → AuD degree from accredited university program (3 years of coursework + 1 year of full-time, clinical practice, supervised by professional who already holds CCC-A) → Pass national exam



The clinical practica in an AuD program are designed to prepare students to diagnose and treat hearing and balance disorders, and to counsel patients about these disorders and their ongoing management. Unlike master’s-level training for SLPs, the final year of the AuD program is spent in full-time clinical practice under the supervision of an individual who has the CCC-A. In total, AuD students obtain a minimum of 1,820 hours of clinical practicum before they receive the degree.5 Full details of ASHA requirements for earning an AuD are published at https://www.asha.org/Certification/2012-Audiology-Certification-Standards/ The AuD program typically requires 3 or 4 years of postbaccalaureate study, including the final year of full-time clinical practice. Currently, there are approximately 75 AuD programs in the United States.

Audiologists work in many of the same settings as SLPs, with some exceptions. For example, audiologists may work for hearing aid companies, contributing to the design and use of hearing aids and dispensing them to patients. The fitting of hearing aids is done by finding the best style and amplification characteristics for the specific characteristics of a patient’s hearing loss; an important component of AuD training involves hearing aid fitting (see Chapter 24). Audiologists are also regularly employed by otologists (physicians who deal with diseases of the ear). In addition to testing hearing and fitting hearing aids, audiologists assess balance problems (the mechanisms for balance and hearing are closely related), help patients manage excessive production of earwax, and train people with hearing loss to enhance their communication skills. The full scope of practice for persons with an AuD is published at https://www.asha.org/policy/SP2004-00192/

5. The difference between this requirement for the AuD degree and the degree requirement for clinical training of SLPs is more logistical than substantial. As described in the text, students who complete an accredited master’s degree in speech-language pathology must have a 36-week, full-time clinical experience before they are eligible for the CCC-SLP. This experience is obtained after the degree is completed, usually in the form of a job. AuD programs incorporate this requirement into their degree programs.

Pauline Camp’s 1934 plan to train SLPs for work in the public schools looked like prophecy when, in 1975, Public Law 94-142 was enacted. Public Law 94-142 is now called the Individuals with Disabilities Education Act (IDEA) and is linked with the Americans with Disabilities Act (ADA), both originally formalized as laws in 1990. These laws had the specific purpose of protecting the rights of children with disabilities (and their parents) and guaranteeing access to a public education and the special services required to make that education effective. Specifically, the law had four purposes: (a) “to assure that all children with disabilities have available to them . . . a free appropriate public education which emphasizes special education and related services designed to meet their unique needs,” (b) “to assure that the rights of children with disabilities and their parents . . . are protected,” (c) “to assist States and localities to provide for the education of all children with disabilities,” and (d) “to assess and assure the effectiveness of efforts to educate all children with disabilities.” The law provides that each child with a disability who attends public school be provided with an Individualized Education Program (“IEP” in the jargon of school officials) designed by a team of specialists. These specialists include (among others such as occupational therapists, physical therapists, and special education teachers) SLPs and audiologists. To meet the needs of children with disabilities, these professionals must be employed by public school systems. The specifics of IDEA have undergone some changes since the original 1975 enactment of the law, but children with disabilities are still guaranteed, by law, a public education and a specially designed educational plan. Public school systems in the United States therefore offer many employment opportunities for SLPs and audiologists. More information on PL 94-142, its history, and specifics can be obtained by typing “IDEA” into any Web search engine. The curious Web surfer will find the roots of PL 94-142, ADA, and IDEA in the Civil Rights Act of 1964.

Order of Chapters in the Text

This text is organized into three general areas: language, speech, and hearing. Each of these general areas is initiated with a chapter (or two) on normal processes. These normal processes are presented to support an instructional philosophy that disordered language, speech, or hearing is understood best by reference to “normal” processes and behavior.

Two important aspects of the material in this text should be kept in mind when reading the chapters. Although the text is arranged in the order of language, speech, and hearing, the sequencing of material and the content of each chapter is somewhat arbitrary. The separation of Communication Sciences and Disorders into language versus speech versus hearing disorders, and the separation of their normal processes, is not realistic for real-world clinical and scientific settings. The material in this textbook is separated in this way for instructional purposes, but the reader should keep in mind the interconnected processes and disorders of the three areas. When appropriate, the reader is reminded of these interconnections.

Chapter Summary

Communication Sciences and Disorders was formalized as an academic discipline in the 1920s and 1930s, and has enjoyed enormous growth since that time. The field grew because of its interdisciplinary roots and interactions but forged a separate identity. The field is also committed to taking the results of laboratory research and “translating” them to the clinic for both diagnostic (determining what the problem is) and treatment purposes.

Clinical and academic degrees are available in the areas of speech-language pathology and audiology. Most of the regulations that govern the training and conduct of professionals in Communication Sciences and Disorders are developed and overseen by the American Speech-Language-Hearing Association (ASHA).

This text is written to introduce students to Communication Sciences and Disorders; it surveys a wide range of normal processes of language, speech, and hearing, and the ways in which diseases and other conditions can disrupt these processes and, with them, communication. Emphasis is placed on what we know about normal (typical) communication processes as well as on the nature of the language, speech, and hearing disorders that affect an individual’s ability to communicate.

References

Angadi, V., Croke, D., & Stemple, J. (2019). Effects of vocal function exercises: A systematic review. Journal of Voice, 33, 124.e13–124.e34.
Baxter, S., Johnson, M., Blank, L., Cantrell, A., Brumfitt, S., Enderby, P., & Goyder, E. (2015). The state of the art in non-pharmacological interventions for developmental stuttering. Part 1: A systematic review of effectiveness. International Journal of Language and Communication Disorders, 50, 676–718.
Deacon, T. W. (1998). The symbolic species. New York, NY: W. W. Norton.
Desjardins, M., Halstead, L., Cooke, M., & Bonilha, H. S. (2017). A systematic review of voice therapy: What “effectiveness” really implies. Journal of Voice, 31, e13–e32.
Dodd, B. (2007). Evidence-based practice and speech-language pathology. Folia Phoniatrica et Logopaedica, 59, 118–129.
Dollaghan, C. A. (2004). Evidence-based practice in communication disorders: What do we know and when do we know it? Journal of Communication Disorders, 37, 391–400.
Ferguson, M. A., Kitterick, P. T., Chong, L. Y., Edmonson-Jones, M., Barker, F., & Hoare, D. J. (2017). Hearing aids for mild to moderate hearing loss in adults. Cochrane Database of Systematic Reviews. doi:10.1002/14651858.CD012023.pub2
Fitch, W. T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4, 258–267.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition, 100, 173–215.
Gierut, J. A. (1998). Treatment efficacy: Functional phonological disorders in children. Journal of Speech, Language, and Hearing Research, 41, S85–S100.
Ingham, R. J., Ingham, J. C., Bothe, A. K., Wang, & Kilgo, M. (2015). Efficacy of the Modifying Phonation Intervals (MPI) stuttering treatment program with adults who stutter. American Journal of Speech-Language Pathology, 24, 256–271.
Justice, L. M., & Redle, E. E. (2014). Communication sciences and disorders: A clinical evidence-based approach (3rd ed.). Boston, MA: Pearson.
Kaldo-Sandström, V., Larsen, H. C., & Andersson, G. (2004). Internet-based cognitive-behavioral self-help treatment of tinnitus: Clinical effectiveness and predictors of outcome. American Journal of Audiology, 13, 185–192.
Law, J., Garrett, Z., & Nye, C. (2003). Speech and language therapy interventions for children with primary speech and language delay or disorder. Cochrane Database of Systematic Reviews. doi:10.1002/14651858.CD004110
Ludlow, C. L., Hoit, J., Kent, R., Ramig, L. O., Shrivastav, R., Strand, E., . . . Sapienza, C. M. (2008). Translating principles of neural plasticity into research on speech motor recovery and rehabilitation. Journal of Speech, Language, and Hearing Research, 51, S240–S258.
Mendel, L. L. (2007). Objective and subjective hearing aid assessment outcomes. American Journal of Audiology, 16, 118–129.
Moodie, S. T., Kothari, A., Bagatto, M. P., Seewald, R., Miller, L. T., & Scollie, S. D. (2011). Knowledge translation in audiology: Promoting the clinical application of best evidence. Trends in Amplification, 15, 1–18.
Ramig, L. O., & Verdolini, K. (1998). Treatment efficacy: Voice disorders. Journal of Speech, Language, and Hearing Research, 41, S101–S116.
Raymer, A. M., Beeson, P., Holland, A., Kendall, D., Maher, L. M., Martin, N., . . . Gonzalez Rothi, L. J. (2008). Translational research in aphasia: From neuroscience to neurorehabilitation. Journal of Speech, Language, and Hearing Research, 51, S259–S275.
Ruotsalainen, J. H., Sellman, J., Lehto, L., Jauhiainen, M., & Verbeek, J. H. (2007). Interventions for treating functional dysphonia in adults. Cochrane Database of Systematic Reviews. doi:10.1002/14651858.CD006373.pub2
Sackett, D. L., Rosenberg, W. M., Gray, J. A., Haynes, R. B., & Richardson, W. S. (1996). Evidence based medicine: What it is and what it isn’t. British Medical Journal, 312, 71–72.
Tasko, S. M., McClean, M. D., & Runyan, C. M. (2007). Speech-motor correlates of treatment-related changes in stuttering severity and speech naturalness. Journal of Communication Disorders, 40, 42–65.
Tyler, A. A., Lewis, K. E., Haskill, A., & Tolbert, L. C. (2003). Outcomes of different speech and language goal attack strategies. Journal of Speech, Language, and Hearing Research, 46, 1077–1094.
Van Riper, C. (1981). An early history of ASHA. ASHA Magazine, 23, 855–858.
Wren, Y., Harding, S., Goldbart, J., & Roulstone, S. (2018). A systematic review and classification of interventions for speech-sound disorder in preschool children. International Journal of Language and Communication Disorders, 53, 446–467.

2. The Nervous System: Language, Speech, and Hearing Structures and Processes

Introduction

This chapter presents an overview of the nervous system, and how it functions in language, speech, and hearing. The chapter has been written specifically to support brain-related material covered in subsequent chapters. Students interested in detailed presentations on brain anatomy and function, including information relevant to speech, language, and hearing, are encouraged to consult the outstanding texts by Kandel, Schwartz, Jessell, Siegelbaum, and Hudspeth (2012), and Bear, Connors, and Paradiso (2015). Additional information on brain function for speech, hearing, and language can be found in Kent (1997), Bhatnagar (2013), and Hoit and Weismer (2016).

In the current chapter, the anatomy and physiology of the basic unit of the nervous system — the neuron — is presented first. A quick tour of gross neuroanatomy follows. The topic of gross neuroanatomy applies to the structural components of the nervous system. The term neurophysiology denotes the study of brain function.

Knowledge of both neuroanatomy and neurophysiology is relevant to the role of the nervous system in speech, hearing, and language. Selected examples of neuroanatomical and neurophysiological topics that are matched — that is, function associated with specific structures — are presented in Table 2–1.

Central and Peripheral Nervous Systems

The nervous system includes all neural tissue in the body. The nervous system has two subcomponents, the central nervous system and the peripheral nervous system. The central nervous system (CNS) includes the cerebral hemispheres and their contents — the “brain” housed inside the skull — and the entire mass of tissue beneath the hemispheres (including the cerebellum, brainstem, and spinal cord; see later in the chapter). The peripheral nervous system (PNS) includes the many nerves extending from the CNS to innervate (control the function of) various parts of the body.


Table 2–1.  Selected Examples of Neuroanatomy and Neurophysiology Topics in the Study of the Nervous System

Neuroanatomy | Neurophysiology
Structure of neuron membrane | Nature of electrical impulses conducted by neurons
Structure of nerve attachment to muscle | Release of neurotransmitter and its effect on muscle fibers
Clusters of cells in brainstem that send nerve fibers to muscles of the tongue | Movement of tongue when part of these brainstem cells are affected by disease
Relative volume of auditory cortex in left versus right hemisphere of brain | Observation via functional brain imaging of difference between left hemisphere and right hemisphere auditory cortex when speech is presented to listener

Note.  The examples are meant to clarify the difference between the two aspects of studying the nervous system.

For example, a nerve in the foot runs up the leg and into the spinal cord; this nerve is part of the PNS. When the fibers of the nerve enter the spinal cord, they are in the CNS. Figure 2–1 provides an image of these two broad components of the nervous system; structures labeled “nerves” are part of the PNS, and the remaining neural structures are part of the CNS. Figure 2–2 shows a simple way to understand the distinction between the CNS and PNS.

The Neuron

Figure 2–3 shows a schematic drawing of two neurons. The neuron is the basic cell unit of the nervous system. The neuron has a cell body, with a nucleus at its center. This cell body issues a long fiber called an axon. When a brain is removed from an organ donor, fixed in a special solution, and dissected, two shades of tissue are seen. These two types of tissue are called the gray matter and the white matter of the brain. Gray matter consists of clusters of cell bodies, and white matter is composed of bundles of axons. Groups of cell bodies in the nervous system are organized together for a specific function. Similarly, bundles of axons are pathways that connect groups of such specialized cell bodies in one part of the nervous system to another group of cell bodies elsewhere in the nervous system. The concepts of gray matter and white matter are critical to understanding both the anatomy and physiology of the brain.

Nervous System Cells

Neurons are not the only cell type in the nervous system. The nonsignaling cells serve the purpose of providing structural and metabolic support to neurons. What is meant by “nonsignaling”? Neurons communicate with each other; they pass information from one neuron to other neurons. The nonsignaling cells in the CNS do not transmit information in the brain but nonetheless have critical functions. Most of these cells are called glial cells, and they are more numerous than neurons. Glia is the Greek word meaning “glue,” a fitting name for cells that hold neurons together. Glial cells also support neurons by “feeding” them with nutrients and oxygen. Some tumors of the brain have their beginnings in these glial cells, rather than in the neurons.

Axons have a whitish appearance because they are covered with a substance called myelin. Myelin functions like an electrical insulator, allowing axons to conduct electrical impulses at high speeds. Some axons (fewer than the myelinated ones) lack myelin and conduct electrical impulses at relatively slow speeds. Axons may lose myelin as a result of disease, as in multiple sclerosis.


Figure 2–1.  The divisions between the central nervous system (CNS) and peripheral nervous system (PNS). Structures of the CNS include the cerebrum (cerebral hemispheres), the diencephalon, the brainstem (midbrain, pons, and medulla), the cerebellum, and the spinal cord. Structures shown in the PNS include the cranial nerves and the spinal nerves (cervical, thoracic, lumbar, sacral, and coccygeal). The cranial nerves emerge from the brainstem; the spinal nerves emerge from the spinal cord. See text for additional details.



Figure 2–2.  Simple summary of the components of the CNS and PNS. CNS: cerebral hemispheres, brainstem, cerebellum, and spinal cord. PNS: nerves to and from the brainstem, nerves to and from the spinal cord, and sensory receptors and motor endplates.

Figure 2–3.  Two neurons. The image shows the cell bodies and their nuclei, axons, dendrites, axon terminals (with the synaptic cleft between them), and a single synapse.

Figure 2–3 shows spiny-like projections from the cell bodies of the two neurons. At the ends of the axons, there are long projections ending in button-like structures. The spiny projections are called dendrites, and the projections from the ends of the axons are called terminal buttons (or simply, terminal segments). Both projections are specialized structures that allow information to be received from or sent to other neurons; they are the places where one neuron “talks” to another. Neurons conduct electrical impulses that originate in the cell body and travel like electrical current down the axon to the terminal buttons.

energy reaches the projections at the end of the axon, a small amount of a chemical substance, called a neurotransmitter, is released into the space between two or more axons. A portion of the chemical is deposited on the dendrites of another neuron’s cell body, which causes the membrane to change its electrical properties. If everything goes well (and it usually does), the changes in the membrane’s properties cause this neuron’s cell body to “fire” an impulse, which repeats the process of conducting electrical energy down the axon and releasing more chemical to affect another neuron’s cell body. Neurons talk to each other in this electro-

2  The Nervous System:  Language, Speech, and Hearing Structures and Processes

21

Myelin Trivia Here are two interesting myelin facts for your next trivia contest. First, many axons that conduct pain and temperature impulses from various parts of the body to the CNS are unmyelinated. The relatively slow “conduction time” of these pathways explains why reactions to extreme temperatures and painful stimuli often seem to take a long time. Most of us have experienced this when touching something very hot but apparently not realizing it for a second or two. Second, there are specific processes that “wrap” myelin around neurons, a process that has a long course of devel-

chemical way: electrical energy turned into chemical energy, turned again into electrical energy, and so forth. The transfer of information in the nervous system depends critically on this chain of events. Diseases that affect either the dendrites or the terminal buttons can have a significant effect on information transfer within the PNS and/or CNS and, therefore, on the behavioral function of human beings.

The Synapse As illustrated in Figure 2–3, a synapse is the space between the projections at the end of the axon (terminal buttons) and the spiny projections (dendrites) from the cell body of an adjacent neuron. The synapse also includes connections between the terminal buttons and dendrites, as illustrated by the dashed lines in Figure 2–3. Thus, the termination of the axon and dendrites of an adjacent cell body are joined together, linking them as a unit. Figure 2–3 does not do justice to the incredibly complicated structure and function of synapses throughout the nervous system. Two observations illustrate this complexity. First, within the CNS there are between 10 and 100 billion neurons. It really is not important if the actual number is 10, 20, 70, or 100 billion, it is sufficient to note the tremendous number of neurons packed into a relatively small space. With so many neurons in such a small volume, the size of the synaptic “spaces” between adjacent neurons is very, very tiny (roughly 20 nanometers, or 20/1,000,000,000th of a meter). The cell body of a single neuron is contacted by the projections from many axons, and the terminal buttons of one neuron contact the dendrites of multiple cell bodies. The pattern of connections among the huge number of neurons within the CNS is highly overlapped and dense.


Second, the transmission of information between neurons depends on several different neurotransmitters, which are critical for normal brain function. When there is too much or too little of a specific neurotransmitter, signs of neurological disease may appear. Communication problems in neurological disease can be related to deficiencies or excesses of neurotransmitters.
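For readers who like to see an idea in concrete form, the threshold notion behind "firing" can be sketched with a toy simulation. The following is a minimal "leaky integrate-and-fire" model, a standard simplification that is not taken from this chapter; the threshold, leak, and input values are arbitrary and purely illustrative.

```python
# Toy "integrate-and-fire" neuron: synaptic inputs nudge the membrane
# potential; when the potential crosses a threshold, the cell "fires"
# and resets. A standard textbook simplification, not the chapter's model.

def simulate_neuron(inputs, threshold=1.0, leak=0.9, reset=0.0):
    """Return the time steps at which the model neuron fires."""
    potential = 0.0
    spikes = []
    for t, synaptic_input in enumerate(inputs):
        potential = leak * potential + synaptic_input  # leaky integration
        if potential >= threshold:                     # threshold crossed
            spikes.append(t)                           # "fire" an impulse
            potential = reset                          # return to baseline
    return spikes

# Weak inputs alone never reach threshold; the stronger input at step 4 does.
print(simulate_neuron([0.3, 0.3, 0.3, 0.0, 0.9, 0.9]))  # -> [4]
```

The only point of the sketch is the qualitative behavior described above: inputs change the membrane's electrical state, and a sufficiently large change triggers an impulse.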

Tour of Gross Neuroanatomy

Gross anatomy is the study of the large-structure components of a human body part. What follows is a brief tour of the gross anatomy of the human nervous system. Figure 2–4 shows the cerebral hemispheres of the CNS as seen from above. The front of the cerebral hemispheres is to the left of the image. From this vantage point, the cerebral hemispheres are seen to consist of two symmetrical halves. These halves are the right (top of image) and left (bottom of image) cerebral hemispheres. The right and left hemispheres are grossly symmetrical, but in certain cases, parts of one side are bigger or differently shaped than the analogous part on the other side. More importantly, perhaps, the two halves of the brain are not symmetrical in their functions. The left hemisphere is specialized for certain functions and the right hemisphere for other functions. Some of these specializations are discussed in greater detail later in this chapter. The visible tissue in Figure 2–4 is the cortex. The cortex is composed of densely packed cell bodies — gray matter — that perform the most complex functions of an organism. The highly developed and extensive cortical tissue in humans is responsible for the wide range of human abilities that are far more sophisticated than the abilities of non-humans.

Figure 2–4.  View from above of a fixed human brain, with the front of the brain to the left of the image. Note the two hemispheres of the brain (right hemisphere, top half of image; left hemisphere, lower half of image). The gyri are the hills of tissue, the sulci the fissures (grooves) between the gyri.

The two hemispheres are connected by a massive bundle of axons (white matter) called the corpus callosum. Axons arising from cortical cells in one hemisphere travel to the other hemisphere where they make synapses with other cortical cells. There are roughly 200 million axons in the corpus callosum; the connections between hemispheres are fine-grained and dense. One hand knows what the other hand is doing. The surface of the brain appears as thick, humped ridges separated by deep fissures. The ridges are called gyri (singular = gyrus), and the fissures are called sulci (singular = sulcus) or fissures. The gyri and sulci give the cortex its characteristic appearance. Humans have much more cortical tissue than even the most advanced primates, such as chimpanzees. As suggested previously, the volume and complexity of human cortical tissue constitute one reason for the huge cognitive “edge” we enjoy relative to other members of the animal kingdom. This edge almost certainly includes, and perhaps is defined by, our ability to use speech and language in creative and novel ways. Figure 2–5 shows an artist’s rendition of a side view of the left hemisphere of the brain, plus parts of the brainstem and spinal cord (discussed later). Both hemispheres have four lobes, including the frontal, parietal, temporal, and occipital lobes. The same four lobes can also be identified in the right hemisphere.

Figure 2–5.  Side view of the left hemisphere, showing the four lobes as well as the brainstem, cerebellum, and upper part of the spinal cord.

Frontal Lobe

The frontal lobe, as suggested by its name, is the front part of the cerebral hemispheres. It is separated from the back part of the brain by a long, deep sulcus running down the side of the brain. This central sulcus is a dividing line between the front and back of the brain. The frontal lobe is separated from the temporal lobe below by the sylvian fissure (also called the lateral sulcus, see Figure 2–5). The frontal lobe has many functions, including executive function. Executive function includes the skill of planning actions, connecting current behavior with future consequences of that behavior, and imposing organization on the tasks of everyday life. Diseases that affect the areas of the frontal lobe that control executive function have a major impact on behavior, including communication.

The frontal lobe contains a gyrus, directly in front of the central sulcus, called the primary motor cortex (Figure 2–5, shaded blue). This gyrus extends from the top of the brain to the sylvian fissure and contains the cell bodies of neurons that control muscle contractions in all parts of the body. The arrangement of these cells is systematic, from the top to bottom of the primary motor cortex. Muscles toward the bottom of the body, such as the lower leg and foot, are controlled by cells at the top of the primary motor cortex. Muscles of the face, tongue, and lips, on the other hand, are controlled by cells at the bottom of the primary motor cortex, close to the sylvian fissure. The top-to-bottom arrangement of cells in the primary motor cortex is therefore an inverted representation of muscles in the body. This systematic arrangement of cells in the primary motor cortex is called somatotopic representation.

Other parts of the frontal lobe take part in the planning of action. Even a simple action, such as reaching for a doorknob to open a door, requires a complex and properly sequenced set of muscular contractions. The force of contraction and the timing of the sequence of muscles to be used must be based on a plan. The frontal lobe plays a major role in the learning and activation of these plans. Such plans are essential to speech production. To speak the word "production," the contraction patterns of the many muscles of the head, neck, and respiratory system must be planned for accurate articulation of speech sounds, their sequencing, and the prosody of the three-syllable utterance. A disorder called apraxia of speech affects this planning of speech sequences and may be caused by damage to the frontal lobe of the left hemisphere.

Broca's area is also located in the frontal lobe, just in front of the primary motor cortex (see previous description). It is known to be a "speech center" of the brain, containing tissue that is essential for speech production. Broca's area is discussed in greater detail in the section, "The Dominant Hemisphere and the Perisylvian Language Areas."

Occipital Lobe

The occipital lobe forms the back of each cerebral hemisphere. The occipital lobe has front boundaries with both the parietal and temporal lobes (see Figure 2–5). The sulci (plural of sulcus) separating the occipital lobe from the parietal and temporal lobes are not as easily seen as the more dramatic central sulcus and sylvian fissure. The primary function of the occipital lobe is to process visual stimuli.

Temporal Lobe

The temporal lobe forms much of the lateral (side) part of the cerebral hemispheres and is separated from the frontal and parietal lobes by the sylvian fissure. The temporal lobe plays an important role in hearing. In Figure 2–5, three gyri of the temporal lobe can be seen, each one oriented more or less horizontally, with a slight upward tilt from the forward tip of the lobe to the back boundary adjoining the occipital lobe. The top-most gyrus contains the primary auditory cortex (see Figure 2–5), which contains neurons that receive auditory impulses originating in the sensory end organ of the ear (called the cochlea, covered in more detail in Chapter 22). The temporal lobe also plays an important role in the lexicon (words and their meanings stored in the cerebral hemispheres) and in the relations between words, as well as in aspects of speech and language perception, memory, and emotion.

Parietal Lobe

The parietal lobe extends from the central sulcus back to the front boundary of the occipital lobe. A portion of the parietal lobe is separated from the temporal lobe by the sylvian fissure. The parietal lobe shares boundaries with the other three lobes of the cerebral hemispheres. The parietal lobe contains a gyrus, immediately in back of the central sulcus, called the primary sensory cortex (Figure 2–5, shaded yellow). This gyrus parallels the course of the primary motor cortex of the frontal lobe, extending from the top of the brain to the sylvian fissure. The cell bodies of the primary sensory cortex can be thought of as the final station in the brain for the collection of tactile (touch) information (other types of sensory information also find their way to the primary sensory cortex). Like the primary motor cortex, cells in the primary sensory cortex are arranged somatotopically, with touch information from the lower part of the body represented at the top of the gyrus, and touch information from the face, tongue, and lips represented toward the bottom of the sensory cortex. In addition to the primary function of touch sensation, the parietal lobe integrates large amounts of sensory data and plays an important role in coordinating various sources of information critical to cognitive functions, including language. Extensive connections exist between the parietal lobe and each of the other three lobes, allowing the integration of visual, auditory, and touch sensations, as well as motor control information. This integration is fundamental for higher-level control and cognitive functions.

Hidden Cortex

The four lobes in each hemisphere are seen easily on the surface of the brain. There is more to the cortex than just these surface features, however, and especially so in humans. The evolution of the human brain required fitting the enormous computing power of cortical neurons into a relatively small space (the skull). Clearly, there is only so much room on the surfaces shown in Figure 2–5. Human evolution took a novel approach to packing more cortex into this small space by "burying" millions of cortical cells in the deep sulci separating the many gyri that define the surface of the brain. In a "fixed" brain (one that has been hardened somewhat with a special solution, prior to removal from the skull), like the one shown in Figures 2–4 and 2–5, adjacent gyri can be pulled apart to reveal previously hidden, interior walls of cortical cells. The cortex is not simply the surface of the brain and its thickness of cell bodies, but also the cell bodies buried within the deep sulci of the human brain.

When a human brain is compared to that of other animals, a striking feature is the greater complexity and much deeper sulci of the human version. In so-called lower animals, the surface of the brain may look positively smooth compared to the human brain. The smoothness reflects the absence of deep sulci and, therefore, the absence of extra cortical cells.

In the human brain, there is also hidden cortex beneath the temporal, frontal, and parietal lobes. In a fixed human brain, the front end of the temporal lobe can be pulled away from the rest of the brain to expose additional cortical gyri. These gyri make up the insular cortex (or, simply, the insula). Many scientists believe the insular cortex plays an important role in speech, as well as in memory and emotional functions.

Subcortical Nuclei

Figure 2–6 shows a view of the cerebral hemispheres in which the cortical surface and white matter in the cerebral hemispheres have been made transparent, revealing structures inside the hemispheres. As previously described, clusters of cell bodies within the brain are called nuclei; the structures shown in Figure 2–6 are several of the subcortical nuclei (below the cortex, within the hemispheres). Subcortical nuclei include the basal ganglia (sometimes called the basal nuclei) and the thalamus.

The basal ganglia include five separate nuclei, which for simplicity are shown in Figure 2–6 as a single, but complex, structure. The basal ganglia play an important role in control of movement, including movements of the speech mechanism. Damage to the basal ganglia may impair a person's ability to produce speech. In addition, the basal ganglia play a role in language. Although the language ability of humans is often thought of as powered by cortical cells, the basal ganglia and cortex are extensively interconnected and communicate with each other. Many scientists believe these subcortical nuclei have an important role in human communication.

The thalamus, the egg-like structure in the middle of the hemispheres (shaded pale red in Figure 2–6), is a collection of nuclei that serves as the main connection between the basal ganglia and cortex. It is also the main "relay station" for the transmission of sensory events (e.g., touch, vision, auditory) from the outside world to the cortex.

Figure 2–6.  Selected subcortical nuclei, contained within the cerebral hemispheres. Structures of the basal ganglia and thalamus are included in this picture.

The Basal Ganglia and Parkinson's Disease

Various neurological diseases, such as Parkinson's disease, Tourette's syndrome, and Huntington's disease, are known to be related to damage to structures of the basal ganglia. In Parkinson's disease, for example, patients show several signs (evidence of a disease that can be observed on clinical examination), including rigid (stiff) limbs, slow or absent movement, and tremors. These signs are evidence of disruption of basal ganglia physiology, mostly having to do with a deficiency of a neurotransmitter called dopamine. Some of the lost dopamine can be replaced with drugs. Another form of therapy involves insertion of a tiny electrode in the brain to provide electrical stimulation to structures in the basal ganglia. The therapeutic procedure, called deep brain stimulation (DBS), often provides some relief from the signs noted above.


Figure 2–7 shows another view of the brain that illustrates the relationship of subcortical to cortical structures. This is a frontal cut through the cerebral hemispheres of a human brain, as if the hemispheres are facing you and cut through to separate them into front and back halves; the interior of the brain is exposed as the front or "face" of the back half. Figure 2–7 shows the difference between gray and white matter in the cerebral hemispheres. The deep sulcus in the middle and toward the top of the image is the dividing line between the left and right hemispheres of the brain. Note the grayish, top layer of the brain, having a thickness somewhat like the rind of a watermelon. Note also the deep sulci in which the gray matter forms interior "walls" of cortex. Immediately below the rind of cortical cell bodies there is whitish tissue — the white matter of the brain. These are bundles of axons — fiber tracts — running between different groups of cell bodies throughout the CNS. Any "chunk" of white matter within the cerebral hemispheres is a densely interwoven mesh of all these different connections. The individual structures and their names are not the point of Figure 2–7. Rather, the image presents a good orientation to the distinction between gray and white matter. In addition, it shows the cortical gray matter as well as the subcortical gray matter (subcortical nuclei). Finally, the image provides a good look at the dense network of white matter within the cerebral hemispheres.

Brainstem, Cerebellum, and Spinal Cord

The part of the central nervous system extending below the cerebral hemispheres includes the brainstem, cerebellum, and spinal cord. These parts of the brain also play important roles in communication and its disorders.

Brainstem

In Figure 2–5, a stalk-like structure descends from the cerebral hemispheres. Its top part is the brainstem; its downward continuation is the spinal cord. The brainstem is divided into three major parts, including the midbrain (hidden in Figure 2–5 by the lower edges of the temporal lobe), pons, and medulla. The brainstem serves a host of functions including regulation of blood pressure, breathing, production of saliva and perspiration, and level of consciousness. Very serious damage to the brainstem is, in most cases, not consistent with the maintenance of life.

Figure 2–7.  Image of brain structures as if the brain is cut into front and back halves; the perspective is looking toward the "face" of the back half. Gray and white matter is shown, including the gray matter of the cortex and of subcortical nuclei (labeled structures include the caudate, putamen, globus pallidus, thalamus, subthalamic nucleus, and substantia nigra). White matter is seen directly below the cortex, and surrounding the subcortical nuclei.


The brainstem also contains motor neurons — neurons specialized for motor control, such as muscle contraction — that control muscles in the head and neck. There are also sensory neurons that receive sensation from head and neck structures. The cell bodies of these neurons are organized into dense, small nuclei in various parts of the brainstem. The motor neurons in the brainstem nuclei receive commands from cortical motor neurons, and together the cortical and brainstem motor neurons exert control over muscles of the larynx, throat, tongue, lips, and jaw. The motor neurons in the brainstem send out fiber tracts that exit the CNS as cranial nerves (see Figure 2–1). The term "cranial" nerve is used to distinguish these from nerves running to and from the spinal cord ("spinal" nerves), as explained later in this chapter. There are 12 paired cranial nerves (CNs). They are referred to by Roman numerals (e.g., CN V; CN XII) as well as by their names (e.g., CN V = trigeminal nerve; CN XII = hypoglossal nerve). Five of the cranial nerves carry information from brainstem motor neurons to muscles of the head and neck that control movements of the jaw, tongue, soft palate, pharynx, and larynx during speech. One of the nerves serves the sense of hearing, and another nerve plays a role in breathing during speech. The brainstem and the cranial nerves attached to it play a critical role in the control of the speech mechanism and in hearing. Damage to brainstem nuclei or the cranial nerves results in weakness or paralysis of speech muscles as well as loss of sensation in head and neck structures.

Cerebellum

The cerebellum is the large mass of tissue immediately behind the brainstem (see Figure 2–5). It is recognizable in fixed brain preparations because of its size, its unusual appearance — somewhat like a cauliflower — relative to other surface features, and its location at the back and base of the cerebral hemispheres. Like the cerebral hemispheres, the cerebellum contains gray and white matter. The cerebellum has extensive connections with the cerebral cortex, brainstem, spinal cord, and basal ganglia.1 The cerebellum has been likened to a computer that integrates vast amounts of information about muscle contraction, signals coming into the brain from the outside world including the body surface, the location of the head relative to the body, and the general state of brain activity. This integration produces the smooth, coordinated movement of everyday life, from walking to making a jump shot while surrounded by nine other players. The cerebellum is important for other brain functions as well, but its role in movement coordination is very prominent. Patients with diseases of the cerebellum often lose the ability to produce smooth movement patterns.

1 Connections between the cerebellum and basal ganglia have not always been included in neuroanatomy textbooks, but research in the past 10 years has identified these connections (Bostan, Dum, & Strick, 2010, 2018).

Spinal Cord

The spinal cord extends from the bottom of the brainstem down the back, terminating a little below the waist. The spinal cord contains cell bodies that control muscles of the arms, legs, chest, and other parts of the torso, as well as cell bodies that receive sensory information from those structures. The nerves that run to and from the spinal cord are called, not surprisingly, spinal nerves. Of primary importance for the purposes of this text are the spinal nerves that serve muscles and structures of the respiratory system. These muscles are located between the ribs, in the abdomen, and in the neck. Because the respiratory system plays a critical role as the "power supply" for speech production (Chapter 10), damage to cell bodies within the spinal cord or to the spinal nerves can result in speech breathing problems.

The Auditory Pathways

The nervous system pathway that connects the sensory organ for hearing to the auditory cortex is specialized for auditory analysis. The auditory pathways are shown schematically in Figure 2–8. The bilateral auditory pathways begin in the cochlea (marked number 1 on Figure 2–8), which is the sensory end organ of hearing. "Bilateral" means the pathways are the same on both sides of the head. The auditory nerve (CN VIII) (number 2) emerges from the cochlea and transmits auditory information to the brainstem. The nerve enters the CNS, and its fibers make synapses with several nuclei in the lower brainstem (numbers 3 and 4). At brainstem location number 4, the pathways make an interesting turn. Roughly 75% of the ascending fibers cross over to the other side of the brainstem (red pathway) where they make a synapse with another nucleus; the remaining 25% of the fibers (green pathway) make synapses with cells of the corresponding nucleus on the side of entry in the brainstem. As the auditory fibers ascend, they make synapses in more brainstem nuclei (represented by number 5) before ascending to the thalamus (number 6). The final destination of the auditory pathway is the auditory cortex (number 7). At each succeeding level along the auditory pathway, the analysis of the auditory information becomes more complex and sophisticated. The crossing of auditory fibers from one side of the brainstem to the other (point number 4 in Figure 2–8) means that auditory analysis in the cortex of one hemisphere is primarily from the ear on the opposite side.

Figure 2–8.  A simplified view of the peripheral and central auditory pathways, from cochlea to cortex. The numbered locations (1–7) are referred to in the text.
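As a quick arithmetic illustration of that crossing pattern, the sketch below weights each ear's signal by the rough 75%/25% split given above. The weighting scheme and the signal values are illustrative assumptions only, not a model of actual auditory physiology.

```python
# Back-of-the-envelope illustration of the ~75% crossed / ~25% uncrossed
# auditory projections described in the text. The simple weighted sum and
# the signal values are illustrative assumptions only.

CROSSED, UNCROSSED = 0.75, 0.25   # rough proportions of ascending fibers

def cortical_input(left_ear, right_ear):
    """Approximate share of each ear's signal reaching each auditory cortex."""
    left_cortex = CROSSED * right_ear + UNCROSSED * left_ear
    right_cortex = CROSSED * left_ear + UNCROSSED * right_ear
    return left_cortex, right_cortex

# A sound presented only to the right ear is analyzed mostly by the
# left auditory cortex, as the text describes.
print(cortical_input(left_ear=0.0, right_ear=1.0))   # -> (0.75, 0.25)
```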

The Dominant Hemisphere and the Perisylvian Language Areas

The concept of specialization of function is that within the brain there are certain clusters of cells and their connections to other clusters of cells that are specialized or even dedicated to certain functions, behaviors, or processes. Two well-known examples of such specialization are the gyri in the left frontal lobe called Broca's area, and the area of tissue in the top gyrus of the left temporal lobe and perhaps extending to a small area of the parietal lobe called Wernicke's area. These two areas of a preserved human brain are shown in the left image of Figure 2–9. The central sulcus (see Figure 2–5) is shown for orientation. Broca's area is thought to be specialized for the production of speech, and Wernicke's area for the reception (comprehension) of speech and language. Of particular importance is the specialization of these areas in the left hemisphere of the brain. Broca's and Wernicke's areas are therefore not only specialized but also lateralized. Lateralization of function is the term used to denote brain tissue specialized for a specific function, but only on one side of the brain. According to research estimates, approximately 90% of humans have speech and language functions lateralized to the left hemisphere (Bear, Connors, & Paradiso, 2016).

How was this specialization and lateralization of function for speech and language discovered? Paul Broca (1824–1880), a 19th-century French physician who followed up on similar work of other physicians, had a patient who suffered brain damage resulting from syphilis and was almost completely unable to produce speech. The patient's only speech consisted of one syllable, "tan," which he repeated over and over. The patient seemed to understand speech well. Broca suggested the lesion, wherever it was in the brain, affected speech production but not speech perception and language comprehension. "Lesion," from the Latin laedere ("to injure"), means damaged or destroyed tissue. Following the patient's death, an autopsy showed a large lesion in the lower part of the left frontal lobe, adjacent to and just forward of the primary motor cortex. The location of this lesion is indicated in Figure 2–9 as "Broca's area." Broca, as well as other physicians, saw additional patients with the same lesion location and the same speech symptoms. Broca concluded that this part of the brain was specialized for the articulation of speech. The lateralization of this speech articulation center to the left hemisphere was discovered in the following way: patients whose brains showed lesions in the same location as Broca's area, but in the right hemisphere, did not have speech articulation deficits. The brain was therefore not specialized for articulation in a symmetrical way. For these reasons, the term "Broca's area" is reserved for the left hemisphere.

A similar story can be told for Wernicke's area. Carl Wernicke (1848–1905) was a Prussian (his birthplace now part of Poland) physician who was interested in the relationship between brain function and speech and language. Wernicke published a book in 1874 in which he described a link between lesions in the top gyrus of the left temporal lobe, close to where the temporal lobe meets the parietal lobe, and difficulty comprehending speech and language but little difficulty articulating speech. However, the speech produced by these patients often conveyed jumbled ideas and even jargon — speech that sounded like strings of words but lacked meaning. The lesion location described by Wernicke (Wernicke's area) is indicated in the left image of Figure 2–9. Wernicke's area was thought of as the language comprehension center of the brain. Like the lateralized function of Broca's area, the language comprehension problems emerged with damage to the left hemisphere. Damage to the same location in the right hemisphere did not result in language comprehension problems.

Research to the present day confirms, in general ways, the specialization of Broca's area for speech articulation and Wernicke's area for language comprehension. In the great majority of humans, this specialization is lateralized to the left hemisphere. The left hemisphere is often referred to as the dominant hemisphere because it houses the specialization for speech and language. There is a good deal of controversy concerning the specifics of Broca's and Wernicke's areas, including their exact boundaries and the restriction of one area to speech articulation (Broca's) and the other to language comprehension (Wernicke's). For example, damage to Broca's area may result in certain language comprehension problems, and damage to Wernicke's area can produce problems with speech articulation. More is said about these issues in Chapter 9.

Figure 2–9.  Left, Broca's area and Wernicke's area are shown on a preserved brain; the central sulcus is shown as a landmark separating the frontal and parietal lobes. Right, two pathways connecting Wernicke's and Broca's areas — the arcuate fasciculus and the ventral stream.

Arcuate Fasciculus (Dorsal Stream) and Ventral Stream

Not surprisingly, Wernicke's area and Broca's area are connected by thick bundles of axons. One tract, shown on the right side of Figure 2–9, is called the arcuate fasciculus ("fasciculus" is a bundle of fibers; "arcuate" is descriptive of the arch-like configuration of the tract). The arcuate fasciculus connects cell bodies in Wernicke's area to cell bodies in Broca's area. The tract in Figure 2–9 is superimposed on the surface of the left hemisphere but actually runs deep to (beneath) the cortex within the temporal, parietal, and frontal lobes. A fiber tract that connects Broca's and Wernicke's areas seems to make perfect sense. For example, our ability to repeat what someone says must involve some auditory analysis of speech, in and around Wernicke's area, and the transfer of that analysis to Broca's area where the heard speech is readied for production. In fact, scientists have argued that the arcuate fasciculus is the pathway for connecting the auditory analysis of incoming speech sounds (Wernicke's area) to the articulatory characteristics that produced the sound (Broca's area). The arcuate fasciculus is discussed further in Chapter 9.

Speech perception must lead to meaning, however, so the arcuate fasciculus cannot account for linking sounds to words; it is mainly for the identification of speech sounds. The ventral stream pathway runs between Wernicke's area and cortical regions adjacent to (or in) Broca's area, and is thought to connect sounds with word meanings. The arcuate fasciculus is considered the "upper loop" (dorsal stream) connecting auditory analysis areas in the temporal lobe to articulatory areas (Broca's area) in the frontal lobe. The ventral stream runs between Wernicke's area and the lower part of the temporal lobe before it connects to frontal lobe regions in and around Broca's area. The upper-loop/lower-loop model of integrating auditory analysis with articulation (upper loop) and auditory analysis with meaning (lower loop) is called the dual-stream model (Fridriksson et al., 2018).

Notice in Figure 2–9 how the arcuate fasciculus (dorsal stream) and ventral stream form a flattened loop around the sylvian fissure. In a landmark study on brain activity and speech and language, the Canadian neurosurgeon Wilder Penfield (1891–1976) used electrical currents to stimulate the cortex in and around the tissue enclosed in this flattened loop. The patients on the operating table were having tissue excised because of severe epilepsy. Penfield was able to stimulate the perisylvian areas with electrical currents while the brain was exposed. Penfield found that almost any stimulus around the sylvian fissure of the left hemisphere evoked some form of language behavior in his awake patients (Penfield & Roberts, 1959). Autopsies of patients who had suffered strokes and who had speech and language problems also revealed frequent damage in this loop of tissue. The part of the cerebral hemispheres enclosed within the dorsal and ventral streams is referred to as the perisylvian cortex ("peri" being a prefix meaning "around" or "enclosing").2 The perisylvian cortex is assumed to be important to normal speech and language functioning, and damage to this region of the brain is likely to result in a communication impairment, ranging from mild to severe depending on the extent of the damage.

2 The term "perisylvian language areas" typically refers to the tissue in the left, speech- and language-dominant hemisphere. This is consistent with the facts in a general way, but of course, when you start digging it is more complicated. For example, the arcuate fasciculus (dorsal stream) is thought to be strongly lateralized to the left hemisphere. In contrast, the ventral stream, which connects auditory analysis to word meaning, is thought to be active in both hemispheres (Fridriksson et al., 2018).

Speech and Language, Together Always

Readers may notice the use of "speech" and "language" as different components of the communication process (see Chapter 3). To some extent they are. "Speech" usually refers to the planning and production of the sound sequences that form words. "Language" refers to the representations and the "public" usage of those symbols to communicate. Some examples of differences between speech and language are (a) a stroke survivor who produces speech with no errors but with no meaning — the speech sounds and words are correct but the words do not fit the situation (such as a response to a specific question) and are combined in meaningless ways (as in the previous example of damage to Wernicke's area); (b) a 9-year-old child with cerebral palsy who has difficulty producing clear speech sounds as a result of a speech motor disorder, but who has an age-appropriate vocabulary, sequences words for sentences like a typically developing 9-year-old child, and comprehends speech perfectly; and (c) a typically developing 4-year-old child who has age-appropriate speech production but has not mastered morphemes, the minimal units of meaning discussed in Chapter 3. Of course, speech and language skills interact, but examples like those presented show how they can be separable.

Functional Magnetic Resonance Imaging and Speech and Language Brain Activity

The identification of speech and language areas of the brain is often based on an approach of linking parts of the brain with specific functions. A patient is seen, her symptoms documented, and if the patient passes away her brain may be autopsied to locate lesions that might explain the symptoms. If a sufficient number of patients are studied with the same symptoms and lesion location, the damaged part of the brain is thought to be critical to the normal behavior compromised by the neurological disease. For example, if patients who have suffered strokes share a common symptom of reading problems, and later autopsies reveal that all these patients had lesions in the same part of the parietal lobe, that part of the brain is thought to be critical to normal reading ability.

This approach to identifying functions of the brain is limited. It is unusual to find a sufficiently large group of patients with exactly the same symptoms, and precisely the same lesion location. Thus, the ability to generalize between lesion locations and brain function is limited by the lack of a sufficiently large sample of cases to "prove" the point that brain structure "x" causes behavior "y."

The last half-century has seen a technological revolution in the ability to generate images of the brain in living individuals. This revolution includes imaging techniques to identify specific brain structures and make precise measures of their length, width, volume, and tissue type. Enhancements of these techniques are also available to monitor the activity of specific brain structures as a person performs different tasks. This chapter closes with a brief discussion of how one technique to monitor brain activity during speech and language tasks has enhanced understanding of speech and language functions of the brain.
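The logic of the lesion approach can be shown with a tiny sketch: across patients who share a symptom, look for the regions damaged in all of them. The patient labels, region names, and lesion "maps" below are invented for illustration; real lesion-symptom mapping works with image data and statistics rather than simple set intersection.

```python
# Minimal sketch of the lesion-overlap idea described above. Patient labels,
# region names, and lesion "maps" are invented for illustration.

patients_with_reading_problems = {
    "patient_1": {"left_parietal", "left_occipital"},
    "patient_2": {"left_parietal", "left_temporal"},
    "patient_3": {"left_parietal"},
}

# Regions lesioned in every patient who shares the symptom.
common_lesion = set.intersection(*patients_with_reading_problems.values())
print(common_lesion)   # -> {'left_parietal'}
```

The text's caution applies here as well: with only a handful of patients, an overlapping region may reflect chance rather than a true structure-function link.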


Functional Magnetic Resonance Imaging

Magnetic resonance imaging (MRI) is a technique that was developed in the 1970s and 1980s to produce very detailed images of body structures. A strong magnetic field surrounding the body part of interest reacts to properties of biological tissue that affect the magnetic field. These reactions are reconstructed as an image having exquisite detail. Figure 2–10 shows an MR image of the right side of the head and neck. The image was programmed to show structures of the medial (inside) wall of the left hemisphere, as if the other hemisphere has been removed from the view. The gyri and fissures that define the surface and thickness of the cortex are easy to see, as is the white matter below the cortex. The cerebellum, midbrain, pons, and medulla of the brainstem, and the cervical (neck) part of the spinal cord are also imaged clearly. The thick, arch-like band of white matter just below the cortical tissue and in the center of the image is the corpus callosum, the fiber tract that connects the two hemispheres.

Figure 2–10.  MR image of the medial surface of a cerebral hemisphere, brainstem, and spinal cord.



About 30 years ago, MRI technology was enhanced to monitor brain activity as an individual performed a task. The technology is called functional MRI (abbreviated as fMRI). fMRI shows which parts of the brain "light up" for different tasks. Neurons use more oxygenated blood when active, as compared with resting. Neurons in a region of more heavily oxygenated blood emit a different magnetic signal compared with neurons in a less heavily oxygenated blood region. With the proper equipment and software, an MRI scanner can detect these oxygen differences and show locations in the brain that are presumed active during the performance of specific tasks.

A lesson learned from fMRI studies of speech and language behavior is that many areas of the brain "light up" for these tasks. Broca's and Wernicke's cortical areas are far from the complete story of how the brain functions for human communication. Figure 2–11 shows data from multiple studies on brain region activation for oral language tasks (i.e., speech production) (Ardila, Bernal, & Rosselli, 2016a). The surface of the left hemisphere is shown on the left, the medial wall of the left hemisphere is shown on the right. On the surface of the left hemisphere, Broca's area is active for oral language (purple) but frontal lobe regions adjacent to but more forward than Broca's area are also active (red). The blue areas include not only the motor cortex in the frontal lobe (gyrus immediately anterior to the central fissure) but cortical tissue in the region of Wernicke's area. Activity is also seen on the middle wall of the left hemisphere, both in cortical and subcortical structures.

The "big picture" conclusion from these analyses is that the brain areas that "control" speech production (or, perhaps more generally, language expression) are more widespread than Broca's area. The same conclusion has been reached about Wernicke's area — more brain regions are active for language comprehension than the region around the upper temporal lobe (Ardila, Bernal, & Rosselli, 2016b). Because these different areas "light up" together during communication behavior, it appears they are connected as a network. It is the network that is important, not simply brain centers such as Broca's and Wernicke's areas.

Figure 2–11.  Summary of areas of cerebral cortex of left hemisphere that are active during oral language. Note activity in multiple areas. See text for additional details. Reproduced with permission from Ardila, A., Bernal, B., & Rosselli, M. (2016a). How localized are language brain areas? A review of Brodmann areas involvement in oral language. Archives of Clinical Neuropsychology, 31, 112–122.
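At its core, the "light up" logic amounts to comparing a voxel's signal during a task with its signal at rest. The sketch below simulates that comparison with made-up numbers; real fMRI analysis adds hemodynamic modeling, motion correction, and proper statistics, none of which is attempted here.

```python
import numpy as np

# Toy task-versus-rest contrast. All signal values are simulated; only the
# basic idea (task blocks vs. rest blocks, per "voxel") is illustrated.

rng = np.random.default_rng(0)
n_voxels, n_scans = 5, 40
is_task = np.tile([0] * 5 + [1] * 5, 4).astype(bool)    # alternating rest/task blocks

signal = rng.normal(100.0, 1.0, size=(n_voxels, n_scans))
signal[0, is_task] += 2.0                                # voxel 0 is "task active"

task_mean = signal[:, is_task].mean(axis=1)
rest_mean = signal[:, ~is_task].mean(axis=1)
contrast = task_mean - rest_mean                         # larger = more task-related

print(np.round(contrast, 2))                             # voxel 0 should stand out
print("active voxels:", np.where(contrast > 1.0)[0])     # crude threshold
```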

Diffusion Tensor Imaging

If communication is controlled by a network in which clusters of cell bodies (such as in the cortex) are connected, it follows that the proper connections — fiber tracts — must exist. Computed tomography (CT), MRI, and fMRI images show gray and white matter but do not show specific fiber tracts that connect specific groups of cell bodies such as Wernicke's and Broca's areas. An MR technique called diffusion tensor imaging (DTI) creates images of specific fiber tracts (such as the arcuate fasciculus) that connect cell bodies.
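DTI can trace tracts because water diffuses more freely along axon bundles than across them. One standard way to summarize that directionality is fractional anisotropy (FA), computed from the three eigenvalues of the diffusion tensor at a voxel. The FA formula below is standard DTI practice rather than something given in this chapter, and the eigenvalues are made-up illustrative values.

```python
import math

# Fractional anisotropy (FA) from the three diffusion-tensor eigenvalues.
# FA near 0 = diffusion is equal in all directions (no dominant fiber
# orientation); FA near 1 = strongly directional, as along a fiber tract.
# The eigenvalues used below are invented for illustration.

def fractional_anisotropy(l1, l2, l3):
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(0.5) * num / den

print(round(fractional_anisotropy(1.7, 0.3, 0.3), 2))   # ~0.8, fiber-tract-like voxel
print(round(fractional_anisotropy(0.8, 0.8, 0.8), 2))   # 0.0, isotropic voxel
```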



At least two important lessons have been learned from DTI research. First, the necessary connections exist. Second, certain brain diseases have as much or more damage in the fiber tracts as they do in cortical structures. In these diseases, the disruption of connections is as important to the disease process as is damage to gray matter. A good example is Alzheimer’s disease, in which dementia is a primary symptom. Dementia is almost always associated with a communication disorder. The presence of white matter disease in dementia-associated communication disorders shows that the planning, production, and comprehension of language are dependent as much on the connections (white matter) between nuclei and cortical cell bodies as they are on the cell bodies themselves (gray matter). This recently acquired knowledge lends further support to the idea of a brain network for the production and perception of speech and language.

Chapter Summary

The nervous system includes the CNS and PNS. The CNS includes the cerebral hemispheres and their contents, the brainstem, cerebellum, and spinal cord; the PNS includes all the nerves connected to the brainstem and spinal cord, these nerves carrying information from the CNS to different parts of the body (motor), or conveying information from different parts of the body to the CNS (sensory).

The basic cellular unit of the nervous system is a neuron. The neuron has a cell body and an axon. Clusters of cell bodies make up the gray matter of the brain, and bundles of axons the white matter. The basic function of a neuron is to conduct electrical impulses from the cell body via its axon to other neurons; neurons communicate with other neurons at synapses, where electrical energy is converted into chemical energy and then back into electrical energy. The electrical energy of one neuron causes a neurotransmitter to be released at the end of its axon, which affects the cell body of another neuron by causing it to "fire" (that is, conduct an electrical impulse).

The surface of the cerebral hemispheres shows ridges (gyri) and deep fissures (sulci); the surface is called the cortex, a thick layer consisting of gray matter that covers the cerebral hemispheres somewhat like the rind of a watermelon. Many millions of neuron cell bodies are packed into a relatively small volume in the human brain, partly by "hiding" additional cortical surfaces in the walls of the sulci.


Clusters of cell bodies within the cerebral hemispheres but below the cortex are called subcortical nuclei; the cortex and subcortical nuclei are connected by fiber tracts. Both cerebral hemispheres are organized into four lobes: the frontal, parietal, occipital, and temporal lobes.

The left hemisphere is specialized for speech and language in about 90% of people. Broca's area, in the frontal lobe, appears to be strongly associated with the production of speech, and Wernicke's area, mainly in the temporal lobe but also involving a small region of parietal cortex, is strongly associated with comprehension of speech/language. The specialization of these areas in the left, and not right, hemisphere was originally discovered in patients who had suffered strokes; damage in one of these areas in the left hemisphere resulted in speech and language problems, whereas damage to the same areas in the right hemisphere did not have the same effect on communication abilities. In the left hemisphere, cortical tissue surrounding the sylvian fissure (the perisylvian language areas) plays a prominent role in speech and language. This tissue includes Broca's and Wernicke's areas, other parts of the cortex, and fiber tracts that connect different cortical areas.

The brainstem is the part of the CNS that contains nuclei for the control of head and neck muscles — that is, the muscles that control speech articulation and phonation; the brainstem also contains nuclei for sensation from head and neck structures. The spinal cord contains analogous nuclei, for the control of muscular contraction of, and sensation from, respiratory structures and other important body structures (such as the arms and legs); the spinal cord is important for breathing in general, and breathing for speech in particular.

The basal ganglia, a group of subcortical nuclei, play an important role in movement. The thalamus, a subcortical nucleus consisting of many smaller nuclei, is the main sensory relay to the cortex; the information from almost all sensory stimuli makes a final synapse in the thalamus before being sent to the cortex. The cerebellum communicates with all parts of the brain and is important for coordination of movement.

Modern techniques of imaging the brain allow for precise identification of the size and activity levels of specific brain structures. Studies using these techniques suggest that the function of the brain is very complex, and that a network of many different brain structures is involved in speech and language function.


References

Ardila, A., Bernal, B., & Rosselli, M. (2016a). How localized are language brain areas? A review of Brodmann areas involvement in oral language. Archives of Clinical Neuropsychology, 31, 112–122.
Ardila, A., Bernal, B., & Rosselli, M. (2016b). How extended is Wernicke's area? Meta-analytic connectivity study of BA20 and integrative proposal. Neuroscience Journal, https://doi.org/10.1155/2016/4962562
Bear, M. F., Connors, B. W., & Paradiso, M. (2015). Neuroscience: Exploring the brain (4th ed.). Philadelphia, PA: Wolters Kluwer.
Bhatnagar, S. C. (2013). Neuroscience for the study of communicative disorders (4th ed.). Philadelphia, PA: Wolters Kluwer/Lippincott Williams & Wilkins.
Bostan, A. C., Dum, R. P., & Strick, P. L. (2010). The basal ganglia communicate with the cerebellum. Proceedings of the National Academy of Sciences, 107, 8452–8456.
Bostan, A. C., Dum, R. P., & Strick, P. L. (2018). Functional anatomy of basal ganglia circuits with the cerebral cortex and the cerebellum. Progress in Neurological Surgery, 33, 50–61.
Fridriksson, J., den Ouden, D.-B., Hillis, A. E., Hickok, G., Rorden, C., Basilakos, A., . . . Bonilha, L. (2018). Anatomy of aphasia revisited. Brain, 141, 848–862.
Hoit, J. D., & Weismer, G. (2016). Foundations of speech and hearing: Anatomy and physiology. San Diego, CA: Plural Publishing.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S. A., & Hudspeth, A. J. (2012). Principles of neural science (5th ed.). New York, NY: McGraw-Hill Medical.
Kent, R. D. (1997). The speech sciences. San Diego, CA: Singular Publishing.
Penfield, W., & Roberts, L. (1959). Speech and brain mechanisms. Princeton, NJ: Princeton University Press.

3  Language Science

Introduction

What is language? In lay terms, it is the "thing" we use to communicate. Ask the average person on the street this question, and he or she may respond, "You know, it's the words, the sentences, stuff like that." And indeed, it would be difficult to argue with this answer, because the words and sentences and stuff like that are all clearly important parts of language. Why is language so much more than stuff like that?

For those of us who as children had a typical history of speech and language development and who have been fortunate enough to avoid illnesses associated with communication disorders, language does not seem like a very big deal, at least in its daily use. For humans, language is a bit like bipedal locomotion, sleeping, and enjoying chocolate cake: it comes naturally and seems like it should not be any other way. We are all aware, at some level (often implicit), of the profound way in which language defines us as a species. After all, you do not hear zoo animals asking each other for another piece of chocolate cake. Because speech and language are so intertwined with being human and are so natural for us, we sometimes run the risk of treating speech and language as our old, uninteresting friend, someone we know everything about and who does not surprise us.

When the author was a doctoral student at the University of Wisconsin (UW)–Madison, he visited his home on the East Coast after his first semester, full of (what he thought of as) the wonder of speech and language production, perception, and comprehension. One of his brothers asked him, "So, exactly what is it that you are studying?" When the author answered this question with an excruciatingly boring monologue about the intricacies of tongue movement, the nature of the speech acoustic signal, the fascinating changes in air pressures and air flows within the speech mechanism, and their role in producing a series of speech sounds to create words and convey meaning, his brother responded, "I don't get it, what's the big deal, you open your mouth and you talk." Indeed, you do just "open your mouth and talk," or at least most of us do. For those who cannot, however, either because of developmental problems or as a result of a stroke, a degenerative neurological disease, or a structural problem within the speech and/or auditory mechanism, the deficit or loss of this natural ability has devastating consequences. It is precisely because language is so naturally human that its disruption is such a big deal.

What Is Language?

Language is studied by scientists from many different disciplines, including speech-language pathology, linguistics, cognitive psychology, general medicine, neuroscience, computer science, and even engineering. Individual scientists may disagree on the details of exactly what they study as "language," but we can offer a fairly broad and noncontroversial definition to organize our discussion of language science. Language is a conventional, dynamic, and generative system of components; the relationships between these components are used to express ideas, feelings, and facts in communication with other people. Language also uses mental representations to guide linguistic behavior. And, in a general sense, language is controlled by a network of specific, connected regions of the brain. Each of these claims is discussed briefly, below.

Language:  A Conventional System

Language is conventional because, to a large extent, its use is based on arbitrary specifications and rules. The arbitrary characteristic of language components is not a problem, provided that a group of people agree on their use for communication. People adopt and agree upon arbitrary language characteristics as conventions to be followed for maximum benefit to the group using them. In this case, the benefit is communication. The most obvious (but hardly the only) example of the arbitrary nature of language usage is found in words and their meanings. Different words are used in different languages to mean the same thing. Speakers of English understand the sequence of sounds forming the word "home" because they have agreed, even if implicitly, on its meaning as a place where people live. There is nothing in the sequence of "h," "o," and "m" sounds that captures the concept of a place of your own, where you eat, sleep, and raise children, any more so than the sequence of a "k," "ah," "s," and "ah" ("casa": Spanish), or "j," "ee," and "p" ("jeep": Korean). Users of an imaginary language may call the place where you live a "glerkin"; if everyone agreed on this use, "glerkin" would induce exactly the same warm feelings as "home," "casa," and "jeep."

Words are not the only arbitrary characteristic of languages. The phonetic contrasts used to make differences in word meanings vary widely across languages (see Chapter 12), as does word order within a sentence. In English, for example, the subject-verb-object word order is required ("The dog chased the rabbit," not "chased the rabbit the dog" or "rabbit the dog chased"). In Russian, word order for these kinds of sentences is optional, unless the larger conversational context of the sentence requires a specific order. Even social uses of language may be arbitrary. For example, the characteristics of conversational turn-taking vary quite a bit across different ethnicities and cultures. In some cultures, it is imperative as an indicator of good manners for a listener to wait until the speaker has paused sufficiently for the talking turn to shift to the listener. Other cultures may allow lots of verbal interruptions, indeed may encourage them in a kind of Darwinian struggle to be heard, with no suggestion of lack of manners.

Language is conventional. One language's conventions are no better or worse than those of another language. When a group of people agree to a set of conventions, whatever they might be, communication happens. English is referred to a great deal throughout the following discussion, but only because it is the language of instruction for this course. Examples from other languages are provided to emphasize specific points.

Language:  A Dynamic System

Language is said to be dynamic because it changes over time; language evolves. Language usage is not the same from generation to generation, or even within generations as individuals move through stages of life. The claim of language as an evolving system is easy to verify by, for example, comparing dialog from films or TV shows from the 1950s to those of the present. Certain aspects of language — usually words — may become more or less "extinct" over time; certain others are adaptable and are rarely pushed to the margins by contemporary culture. For example, if a sophomore attending college in 2018 was asked if he wanted to attend a football game and responded, "That would be swell," other sophomores overhearing the response may ask where this student's spaceship is parked and how it escaped the gravitational pull of his home planet. On the other hand, if the student said "cool," few of his contemporaries would take notice. The interesting thing about these two words is that "swell" succumbed to linguistic evolutionary pressures after frequent use in the 1950s (watch a Leave it to Beaver or Father Knows Best rerun and you will probably hear the word used in a context similar to the one given above). "Cool," on the other hand, adapted and maintained its usefulness, beginning its widespread usage in English as a marker of the beatnik/jazz culture of the early 1950s and then making its way through hippie culture, hair bands, the frighteningly empty 1980s, and all the way to the present. Cool is cool, then and now. Somewhere in between these extremes along the continuum of verbal viability, we used copasetic, groovy, far out, solid, tight, rad, and awesome, to mean basically the same thing. For an interesting dissection of the origins of "cool" as we now understand it to mean hip, good, great, and many other shades of "okay," see the Slate column (http://www.slate.com/articles/life/cool_story/2013/10/cool_the_etymology_and_history_of_the_concept_of_coolness.html) on the long history and evolution of the word.

Language is also dynamic because it is used as a group marker to indicate, "I belong to this group" without saying so explicitly. High school students and young college-age students typically sound different from their parents, instructors, and other adults from whom they are one or two generations removed. That is to say, high school students, within the same broad culture (e.g., American culture), have subtly different accents and not-so-subtle differences in dialect from their parents and grandparents. Everything from the choice of specific words to prosody is often used as a group marker. A 55-year-old male and 20-year-old male may both use the word "cool" as a response without attracting attention, but the 55-year-old who explains how he likes to fish by saying, "That's how I roll on the weekend" will attract attention because of the mismatch between his age and language usage; so, too, the contemporary 20-year-old who says, "I'm attending a rock and roll concert tonight."

The dynamic nature of language is inextricably tied in with cultural shifts. When your author was a teenager, certain words spoken in public created a major, scandalous incident. The open-air utterance of at least one of these words suggested a deep character flaw on the part of the speaker, like dipping your face into a mound of mashed potatoes and shouting, "Look, no hands!" at an upscale restaurant to which your future in-laws have taken you to get to know you better. In contemporary culture, those words — and in particular, that one — have more or less entered the mainstream as judged by their frequent use in public, on TV shows, and in films, from the mouths of people from all generations. An interesting, mostly nontechnical survey of language evolution is found in David Crystal's The Stories of English (2004).

Language Is Generative Speech and language define the human species. The thoughtful reader may question this claim by pointing to various animal languages or in some cases an apparent animal ability to produce speech. Honey bees have an elaborate communication system for guiding their fellow workers to sources of food, chimps have been taught rudimentary sign-language skills, and African Grey parrots in captivity learn and produce an enormous number of phrases over the course of their 50- or 60-year life span.
How are these examples different from human communication? Human language differs from animal languages by virtue of its creative and consistently novel characteristics. Humans take a small number of language rules and use them to combine and recombine words to produce interesting and, typically, never-before-heard or -spoken utterances. This is the generative nature of human language, its ability to produce new utterances by applying conventional rules and word meanings to the needs of a communication goal. African Grey parrots produce lots of different utterances, apparently because of their outstanding mimicry and memory skills, but evidence for their ability to generate novel phrases is scant to nonexistent (although Dr. Irene Pepperberg has claimed that the famous, late African Grey Alex invented new phrases by combining previously unrelated words). The evidence for novel phrase production is somewhat more compelling in primates who have been taught sign language and who have been reported to combine individual signs in unique ways to achieve communication goals. Nevertheless, even if one accepts these reports as accurate (and not everyone does), the tremendous effort required to teach primates sign language is in stark contrast to the human infant/early toddler’s ability to generate new and useful phrases with a small set of words in the absence of formal instruction; to a significant degree, it just happens. Certainly it “just happens” for little humans in an environment rich with linguistic stimulation; put a chimp in the same rich linguistic environment and she will not produce spontaneous two- or three-word utterances, vocal or signed, around 2 years of age. There is something radically different about the human child’s language capabilities and potential from those of a primate, even one who has been intensively instructed. The generative nature of human language is often taken as proof of its genetic legacy. Clearly, if the generative nature of language is what makes it peculiarly human, the human genome must have a lot to do with our language ability (Fisher & Vernes, 2015; Fisher, 2017). But it would be a mistake to discount environmental influences on human language development and skills. Human language characteristics reflect an interaction between the environment and the human genetic endowment for sophisticated linguistic behavior.

Language Uses Mental Representations Language learning and usage are based on the development and manipulation of mental representations. This
is a central idea in much of contemporary cognitive psychology, an idea we will adhere to in this chapter, even though there are scientists who do not believe in mental representations. A mental representation is an idea or image in the mind that has information content. Many language scientists believe there are mental representations of the sounds that are used contrastively in a language. According to this view, in English there is a mental representation of the vowel category “ee” (in phonetic symbols, /i/), which can replace other vowel categories in various syllable forms to create alternative word meanings. For example, the vowel in the word “heat” distinguishes the word for a temperature-based concept from the word “hat”; the words differ only by the vowel separating the “h” from the “t.” The mental representation of “ee” is built up and stabilized by exposure to the language. In contrast, in English there is presumably no mental representation — or at least not a linguistic one — for the vowel “ee” produced with the lips rounded. This is because an “ee”-like vowel made with rounded lips does not distinguish words in English. “Heat” spoken in English with the lips spread (the typical production) or with the lips rounded would strike a listener as the same word, even if the rounded version sounds odd or like a poor attempt to imitate a foreign accent (such as Swedish). The rounded “ee” is one of the variants of the mentally represented category of “ee” in the English-speaker’s mind. This means it is recognized as a member of the “ee” phoneme, albeit one with a slightly different sound. A similar example can be made for semantic categories. Language users have a mental representation for the category “dog,” which is developed from exposure and contact with many dogs. This mental representation contributes to a listener understanding of the kind of animal that has just relieved himself on the front lawn when someone says, “A dog just made a mess on our lawn.” This mental representation and its connection to language prevent the listener from imagining the offending animal to be, say, a lion, a horse, a cat, or some other type of animal with fur, primarily quadrupedal locomotion, a tail, and so forth. The mental representations for language units are thought to be powered by a variety of psychological processes. These processes create, maintain, adjust, and store the representations.

Language Is Localized in the Brain There is good evidence that speech and language capabilities are controlled by a specific network of brain regions and their connections. In classical terms, language is said to be “localized” in the nervous system.

When neuroscientists talk about “localized function,” they refer to a region or regions of the brain, and connections between these regions, in which the tissue is specialized for a specific function, or at least has developed a critical role in a specific function. There are many debates about brain localization for specific functions, especially in the case of speech and language. It is safe to conclude, however, that in the overwhelming majority of people, language is localized to tissue in the left hemisphere of the brain. We know about the localization of speech and language from natural diseases (e.g., stroke) that affect specific parts of the brain, from brain imaging studies of speech and language activities, and from surgical procedures in which parts of the brain are stimulated or rendered nonfunctional for a brief (not permanent) time. Left-hemisphere brain localization for speech and language is yet another argument for the species-specific nature of speech and language. The brain regions associated with speech and language are discussed in greater detail in Chapter 2.

Components of Language Language is made up of well-defined components which are grouped into three categories. These categories are form, content, and use.

Form The form category includes the components of language that are referred to as “structural.” The three subcategories of form are phonology, morphology, and syntax.

Phonology Phonology is the study of the sound system of a language. The phonological component of a language includes all the sounds used by a language (the phonetic inventory), the phoneme categories of the language (the sounds that are “contrastive”), and the rules for sound sequences that can create words (phonotactic rules). Phonetic Inventory and Phonemes. Imagine an expedition to Madison, Wisconsin, from some faraway planet where the language is entirely different from English. These space travelers have a scientific tradition of language study, like our own, and resolve to gather information on the native tongue of Madisonians. Like any good phonetician, the space travelers use their highly evolved recording instruments, which
have become a permanent part of their brains; their ears serve as microphones, covered by only weakly effective, cartilaginous windscreens to collect a large speech sample from several native speakers. Because there is a coffee shop on every corner of Madison, Wisconsin, they have no trouble finding people willing to talk at length and provide an extensive sampling of the sounds used in their language. The space travelers examine the collected speech samples carefully, using narrow phonetic transcription to identify every sound heard in the recordings. “Narrow” transcription means they record very fine details of the sound production. These aliens have superhuman skills in phonetic transcription — they will get it right. When the space travelers have finished their transcriptions and analyzed them, they construct a chart showing all the sounds they have transcribed. The chart is a record of the phonetic inventory of the language spoken in Madison, Wisconsin. The chart records about a dozen vowel sounds, as well as glides, stops, fricatives, and affricates (see Chapter 12). An interesting feature of their analysis is the case of stop consonants (such as “p,” “t,” “k”), for which the space travelers find very small but noticeable differences for a specific sound. A good example is the “t” sound, which sometimes sounds as if it is produced with a strong burst of air; they transcribed this sound from words like “top,” “type,” and “attack,” but they do not know the words yet, they are just listening to the sounds. At other times the “t” sounds as if it lacks this burst (words like “mitt” and “stop.” They also record a stop sound that sounds as if the speaker suddenly closed the vocal folds for a short time before suddenly releasing a puff of air. This sound resembles “t” in some ways, especially ones without a burst of air, but the space travelers have sufficiently good transcription skills and can hear the sound as slightly different from the other “t” sounds. When the phonetic inventory is established, the space travelers must determine which of those sounds function as phonemes. A phoneme can be defined as a speech sound category that can change the meaning of a word, when it is exchanged for another speech sound category. The most straightforward example of this sound exchange is when they both appear in exactly the same location within the words. Two simple examples will make this definition clear. In English, the “k” and “g” sounds are both phonemes, because they can change the meaning of a word when substituted for each other. The word “duck,” for example, is changed to “dug” simply by exchanging the word-final “k” with a “g.” Native speakers of English above the age of 6 or 7 years know that these are two different words — they have different meanings. Similarly, “girl” and “curl” are distinguished in meaning by the exchange of the

“g” and “k.” To offer an example with another sound pair, in English the “s” and “sh” sounds are phonemes, as easily shown by such word pairs as “so”-“show” and “sip”-“ship.” The space-traveling phoneticians learn the changes in word meanings with these sound exchanges by asking the humans what the words mean when the sounds are exchanged; this is how they learn the phonemes of Madison English — identifying the sounds from the phonetic inventory that result in the change of a word meaning. The space travelers jet off to Amsterdam, in the Netherlands, to identify the phonetic inventory and phonemes of the local Dutch dialect. They use the same methodology as in Madison — sitting in coffeehouses and engaging people in conversations, to determine if the phonetic patterns of Dutch are the same or different from those in English. Dutch, they learn, has the “k” and “s” sounds but these do not contrast with “g” and “sh,” respectively. Although these sounds may sometimes be heard in Amsterdam Dutch, there are no “k”“g” or “s”-“sh” exchanges in Dutch that change word meanings. That is, “g” and “sh” do not function as phonemes in Dutch. When they occur, they are phonetic variants of the “k” and “s” phonemes. Consideration of these examples leads to an interesting insight: the phonetic inventory and phonemes of a language are not the same thing. Here is another example, based on the space travelers’ recording of Madison English “t” sounds. In a word such as “light,” the space travelers may hear the word-final “t” produced with (a) no burst of air, (b) a burst of air, or (c) a sound made as if the vocal folds were being closed tightly for a brief time, and then released. These phonetic variants can be heard with training, yet the word “light” can be spoken with each of the variants without changing its meaning. The variants are separate entries in the phonetic inventory but are all versions of the phoneme category “t.” Phonetic variants of a single phoneme category are called allophones. Languages differ widely in their phonetic inventories as well as in the way those sounds are used as phonemes. Here is a very important point. Phonemes are the minimal sound unit that can change word meaning, but phonemes have no meaning of their own. Phonemes are therefore not the minimal unit of meaning in languages. Minimal units of meaning are discussed in the section titled, “Morphemes.” Phonotactic Rules. Phonotactic rules specify the allowable sequences of phonemes for word formation, as illustrated by the three following examples. First, in English, words cannot start with a velar nasal sound (the sound at the end of the English word in “sang”), but they can in several languages of the world (e.g.,

Burmese: Ladefoged, 2001). Second, English allows syllables to be initiated by a fricative followed by a stop (as in the word “stop”), but many languages (e.g., Japanese: Avery & Ehrlich, 1992) do not allow consonant clusters (such as the “st” in “stop”) to initiate syllables. Third, English syllables can take a variety of forms for words, including ones that end with a final consonant (CVC [consonant-vowel-consonant] form, as in the word “kick”). In Mandarin Chinese, however, all syllables end in vowels — there are no CVC, or closed syllable forms (i.e., syllables “closed” with a consonant), although there is one exception to this rule: CV (consonant-vowel) syllables in Mandarin can end in the velar nasal sound mentioned above. These examples illustrate phonotactic rules (also known among linguists as phonotactic constraints) in different languages, or restrictions on the sequences of sounds that are permitted to form words.
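For readers who find it helpful to see these ideas made concrete, the short Python sketch below mimics, in a very simplified way, the space travelers’ two analyses: finding minimal pairs (one-sound substitutions that change word meaning, the evidence for phoneme status) and checking candidate words against phonotactic constraints. The tiny lexicon, the informal sound spellings, and the two constraints are invented for illustration; this is not a real phonological analysis.

# Toy illustration of two ideas from this section:
# (1) phonemes are revealed by minimal pairs (one-sound swaps that change word meaning);
# (2) phonotactic rules restrict which sound sequences may form words.
# The lexicon and the two simplified English-like constraints are invented for the example.

LEXICON = {
    ("d", "uh", "k"): "duck",
    ("d", "uh", "g"): "dug",
    ("s", "oh"): "so",
    ("sh", "oh"): "show",
    ("s", "ih", "p"): "sip",
    ("sh", "ih", "p"): "ship",
}

def minimal_pairs(lexicon):
    """Return word pairs that differ by exactly one sound in the same position."""
    pairs = []
    forms = list(lexicon)
    for i, a in enumerate(forms):
        for b in forms[i + 1:]:
            if len(a) == len(b):
                diffs = [(x, y) for x, y in zip(a, b) if x != y]
                if len(diffs) == 1:
                    pairs.append((lexicon[a], lexicon[b], diffs[0]))
    return pairs

def violates_phonotactics(sounds):
    """Check two simplified English-like constraints on a candidate word."""
    if sounds[0] == "ng":          # no word-initial velar nasal in English
        return True
    if len(sounds) > 1 and sounds[0] == "s" and sounds[1] in {"b", "d", "g"}:
        return True                # no "s" + voiced stop at the start of a word
    return False

for word_a, word_b, (sound_a, sound_b) in minimal_pairs(LEXICON):
    print(f'{word_a} / {word_b}: the "{sound_a}"-"{sound_b}" contrast signals two phonemes')

print("ng + ih + p allowed?", not violates_phonotactics(("ng", "ih", "p")))
print("s + t + ih + p allowed?", not violates_phonotactics(("s", "t", "ih", "p")))

The only point of the sketch is that phoneme status is a claim about meaning contrasts in a particular language, and phonotactic legality is a claim about permissible sound sequences; both must be discovered from data, just as the space travelers do.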

Morphology Morphemes are the smallest meaningful units in language. Morphology is the study of the rules applying to morphemes and how they are used and modified in communication. There are free morphemes and bound morphemes. Table 3–1 shows examples of both types. A free morpheme is a minimal unit of meaning that stands alone; words such as “dog,” “wait,” “run,” “hoot,” and “giraffe” are examples of free morphemes. A bound morpheme is a minimal unit of meaning that cannot stand alone but must be attached to a free morpheme to implement its meaning. As shown in Table 3–1, the plural “s” is a bound morpheme whose meaning is “more than one” when attached to a free morpheme such as “dog.” Similarly, the past tense “ed” is a bound morpheme whose meaning is “occurred in the past” when attached to a free morpheme such as “wait.” The words “dogs” (dog + s) and “waited” (wait + ed) therefore consist of two morphemes, one free and one bound. Other familiar bound morphemes in English include “-ing” (running), “-er” (taller), “-able” (laughable), “pre-” (prenuptial), “un-” (unusual), and “-ish” (foolish). Morphemes are important in language development, showing a typical pattern (on average) of mastery as children develop and gain levels of language sophistication. Morphemes also have significance in certain delays and disorders of language development, as discussed in Chapters 7 and 8.

Table 3–1.  Free Morphemes and Bound Morphemes

Free Morphemes          Bound Morphemes
dog                     -s
wait                    -ed
perfect                 im-
certain                 un-; -ly; -ty
run                     -ing
laugh                   -able
fool                    -ish
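Because free and bound morphemes combine in systematic ways, the combinations in Table 3–1 can be imitated by a very small program. The Python sketch below is a toy illustration only; the stem and affix lists are invented for the example, and it ignores real-world complications such as spelling changes (“run” + “-ing” = “running”) and the sound changes described later in this chapter.

# Toy decomposition of a word into one free morpheme plus bound morphemes,
# using invented stem and affix lists. Real morphological analysis must also
# handle irregular forms, spelling changes, and sound changes.

FREE_MORPHEMES = {"dog", "wait", "run", "laugh", "fool", "certain", "perfect", "tall"}
PREFIXES = ["un", "im", "pre"]
SUFFIXES = ["s", "ed", "ing", "er", "able", "ish", "ly", "ty"]

def decompose(word):
    """Return (prefixes, free stem, suffixes), or None if the toy lists cannot build the word."""
    prefixes, suffixes = [], []
    changed = True
    while changed and word not in FREE_MORPHEMES:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append(p + "-")
                word = word[len(p):]
                changed = True
                break
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s) and word not in FREE_MORPHEMES:
                suffixes.insert(0, "-" + s)
                word = word[:-len(s)]
                changed = True
                break
    return (prefixes, word, suffixes) if word in FREE_MORPHEMES else None

for w in ["dogs", "waited", "laughable", "imperfect", "uncertainly"]:
    print(w, "->", decompose(w))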

Syntax Syntax specifies the rules for ordering of words to form “legal” sentences. For example, in English, “The cat ate the mouse” is a legal sentence, whereas “Ate mouse the cat” is not, because the verb must follow the subject of the sentence. This is true for English even though the meaning of “Ate mouse the cat” can be worked out. As noted, some languages have very strict rules about the positions of verbs, adjectives, and nouns in sentences, whereas others do not. Another example of a syntactical rule in English is the requirement for noun phrases to link with adjectives by means of the “to be” verb and its variants. “John is happy” is legal, “John happy” is not.

Content The content component of language refers to meaning. Morphemes have already been identified as the minimal units of meaning in language, but here we are concerned with the nature and organization of all meaning units. The branch of language science concerned with meaning is called semantics. Semantics includes not only the meaning of words, and how sets of words may be similar or different, but also the meaning of phrases. As noted earlier, words and their meanings are established arbitrarily and become useful when the linguistic community agrees on the correspondence between them. Semanticists are interested in how speakers employ and organize the words they produce, and how listeners interpret those words. Readers may wonder, “Why should there be a difference in the meanings of a word or phrase when users of a language agree on them?” This is one of the many interesting aspects of semantics — that language users may agree in general on semantics but differ on specifics.

Big Bits of Language The example in the text of the bound morpheme “-able” in the word “laughable” (which transforms the verb “laugh” to an adjective as in, “His novel is laughable” or as in “What a laughable novel”) offers an opportunity to illustrate the complex nature of morphology, and the tricky relationship in English between orthography (the printed representation of a word) and its spoken version. In “laughable,” the same orthographic sequence (able) is also a free morpheme, but pronounced differently: “uh-bl” when it functions as a bound morpheme, “ay-bl” for the free morpheme; the meaning of both the bound and free versions is essentially the same. In “laughable,” the addition of the bound morpheme changes the verb (laugh can also be a noun) to mean, “able to evoke laughter (or laughs).” When “able” is attached to the free morpheme “laugh,” note how its pronunciation changes from the free-morpheme version (“ay-bl”) to “uh-bl.” If you told someone, “I’m uh-bl (able) to meet that deadline,” you would receive a strange look; your listener might not understand what you mean. You may get an equally strange look if you said, “That’s laugh-ay-bl,” but in this case, your listener is likely to know what you mean, even if recognizing there is something wrong with your pronunciation. These changes in sound when morphemes are combined are called morphophonemic alternations. They are often rule based (the same change from “ay” to “uh” occurs for words such as “portable,” “notable,” “changeable,” and “stackable”). A nicely complicated example of morphophonemic alternation is the spoken versions of the word “harmony” expanded to “harmonic,” “harmonious,” and “harmonizing.” As the morphology changes, notice how the sound “ee” after the “n” in the “base” morpheme (“harmony”) changes to “ih” in “harmonic,” back to “ee” in “harmonious,” and to “eye” in “harmonizing”; in “harmonious” the sound after “m” changes to “oh.” An additional piece of this morphology puzzle is that the stressed syllable in each word (shown in capital letters in the list that follows) depends on the morphological structure. These shifts in stress can affect the pronunciation of the vowel following the “n.” The phonetic transcriptions of this sequence of morphophonemic alternations are listed below, with a guide to the sound symbols that differ from English orthography.

HARmony /hɑrməni/ (phonetic symbols: /ɑ/ = “ah”; /i/ = “ee”; /ə/ = very short “uh,” or schwa)

harMONic /hɑrmɑnɪk/ (phonetic symbol: /ɪ/ = “ih”)

harMOnious /hɑrmoniəs/ (phonetic symbol: /o/ = “oh”)

HARmonizing /hɑrmənɑɪzɪŋ/ (phonetic symbols: /ɑɪ/ = “eye”; /ŋ/ = “ing”)

Some scientists believe these subtle differences in word meanings can wreak havoc in certain communication settings. For example, most people would agree on the general meaning of the word “fine” as a response to the question, “How do I look?,” but the more specific interpretation of the response may depend on the gender of the question-asker (Tannen, 1994). A more subtle example, one in the arena of the social use of language (discussed later in this chapter), is the meaning of a word such as “adorable.” The obvious content of this word is clear, as applied to a child, perhaps, or a small, cuddly pet, but a deeper level of meaning may be implied by the gender of the speaker. Semanticists, therefore, explore all sorts of meaning aspects of words and phrases, and the nuance of meanings depending on who is speaking, the context of the conversation, and so forth (Tannen, 1994).

In speech and language development, vocabulary is of interest as children begin to understand the hierarchical structures of meaning. “Hierarchical” means that certain meanings are subsumed, or embedded, within larger meanings, like subcategories of more general categories. At an early stage of development, “dog” may mean only Muffy, the family dog, but as the child accumulates life and language experience, “dog” takes on a more global lexical status. “Dog” may include any of those sniffing four-legged creatures attached to humans by a rope, and especially that nasty slobbering Spike who scares poor Muffy when they meet on the street. Lexical (vocabulary) development is covered in more detail in Chapter 6.
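The idea that word meanings are organized hierarchically, with specific meanings subsumed under more general ones, can be pictured as a simple lookup structure. The Python sketch below is an invented toy; the particular categories and links are illustrative only and are not a claim about how semantic memory is actually organized.

# Toy semantic hierarchy: each entry points to the broader category that subsumes it.
# The categories and links are invented for illustration.

IS_A = {
    "Muffy": "dog",
    "Spike": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
    "cat": "feline",
    "feline": "mammal",
}

def subsumed_by(term, category):
    """True if 'category' sits somewhere above 'term' in the hierarchy."""
    while term in IS_A:
        term = IS_A[term]
        if term == category:
            return True
    return False

print(subsumed_by("Muffy", "dog"))      # True: the family pet is one instance of "dog"
print(subsumed_by("Muffy", "animal"))   # True: "dog" is in turn subsumed by "animal"
print(subsumed_by("cat", "dog"))        # False: a cat is not a kind of dog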

Social Use of Language (Pragmatics) Social use of language is distinguished from the form and content components of language partly because it does not have easily identified units (such as phonemes, morphemes, words) or rules (as in the case of syntax), and partly because it concerns communication on the “big stage” of social interaction. Social use of language is sometimes discussed under the term pragmatics, sometimes under the term speech acts. Even if units and rules of language pragmatics are hard to specify because they are less obvious than the units and rules for form and syntax, language pragmatics is not random or unguided by convention. For the following discussion, the term “pragmatics” will be used to designate the social use of language. Pragmatics often involves aspects of communication that are implicit to the language user’s communication skills. People know successful communication when they see it, but if asked to identify characteristics of successful communication, they are not necessarily able to verbalize them in explicit terms. Nevertheless, pragmatics are an essential part of successful communication. Conversations are maintained between two or more people by observing these implicit rules. Speakers engaged in a conversation cooperate to maintain it by sticking to the topic under discussion or using the appropriate means to change the topic, by observing turn-taking rules, and by speaking in a voice appropriate to the setting (e.g., not overly loud). Cooperation and politeness in conversation seem like obvious

requirements for communication success, but we all know people who have trouble with these requirements. It is as if these individuals do not recognize the mismatch between their conversational techniques and those of the majority of people with whom they communicate. Pragmatics, like other aspects of language, must be learned. These learned pragmatics skills can be undone by certain neurological diseases. The deterioration of pragmatics skills is part of a language disorder. Pragmatics is a complex, culture-bound aspect of language. The domain of pragmatics extends well past actual spoken conversation to such things as body language, gestures, patterns of eye contact during conversation, distances between speakers engaged in a conversation, and choice of vocabulary. For example, the simple act of waving a friendly goodbye to someone with an open hand, fingers spread, and palm facing the person may be interpreted in Greece as an insult. In certain cultures, the way in which someone leaves a room after concluding a conversation with a superior (a boss, a teacher, a parent) may differ in accordance with pragmatic rules (e.g., backing out of a room versus turning and leaving). Just as pragmatic rules may be variable across different cultures, where cultures are defined geographically, the rules may vary within cultures according to age group, ethnic and racial background, and so forth.

Language and Cognitive Processes This chapter reviews the components of language as separate entities. The presentation seems to make sense — an adult can accept the idea of phonemes, morphemes, words, syntax, and even pragmatics as components of language. The definition of these components, however, does not include discussion of how the components interact in both language development and communication among people with well-developed language skills (e.g., individuals at least 5 years of age). It also does not discuss why language develops so rapidly, how language develops, and when components of language and their interactions are mastered throughout language development. Each of these is considered in turn in the following sections. As with the components of language, the why, how, and when of language development and mastery are not independent. The discussion of “how” and “when” is expanded in Chapter 6. These questions are important because they have direct implications for the understanding, diagnosis, and treatment of speech and language disorders.

Why The question of why language develops, and so rapidly, is answered in different ways depending on (among other things) the initial premise of the question. If the premise is the existence of an innate speech and language property of the human brain, language develops because it is driven by brain tissue dedicated to it. In this view, there is a human brain mechanism for speech and language, different from the communication brain mechanism of any other animal. At some point in the first year of life, the mechanism is “turned on,” to initiate speech and language development. Language advances so rapidly from first words through morphology, vocabulary, and sentence structure (syntax), presumably because it is coded to do so by this mechanism. This view has been championed by the famous linguist Noam Chomsky (Chomsky, 1957, 1975). He called this brain mechanism a language acquisition device (LAD). The idea of an innate, human-specific language mechanism has been disputed and rejected by many scientists. One compelling argument against an innate mechanism for language is the large variability among typically developing children in the rate of language learning. Across typically developing children, first words may be spoken over a wide range of months; at 2 years of age, there is significant variability across children in vocabulary size and utterance length (e.g., single word versus multiword utterances). If an innate mechanism guides language development, why is there so much age variability among children in the mastery of language development? Many scientists argue that the human brain does not need a specialized mechanism for language development. The brain has an extraordinary number of interconnections between its approximately 100 billion neurons (brain cells that transmit information); the brain can function like a supercomputer. The mastery of language is powered by this massive biological computing ability, which processes an enormous volume of speech and language data, organizes patterns, and learns from these data how to use language components for effective communication (see Kidd, Donnelly, & Christiansen, 2018, for a review of these issues).

How As discussed more completely in Chapters 5 and 6, language development moves from a preverbal stage (prior to first words) through simple language skills (single words, “dog”), which are later expanded into multiword utterances (“dog eat”) to use and mastery

of bound morphemes (“want”-“wanted”), through more sophisticated syntax, and on through more complex levels of language usage. Development of the form, content, and usage (pragmatics) components of language are most notable between ages 1 and 9 or 10 years, with especially rapid advances in the first five years after the first word. Included in this rapid advance is complete mastery of the sound system of language by no later than 8 or 9 years of age. The succeeding levels of language development overlap in time; vocabulary, for example, grows rapidly as utterance length (multiword sentences) expands. Language development includes both comprehension and production (expression) skills. Language comprehension and expression are interdependent processes in language development. Comprehension and expression can also be understood as partially (or sometimes wholly) independent. For example, in the preverbal stage of language development, comprehension skills are more advanced than skills of expression. The preverbal stage for expression is largely phonetic (as in babbling), rather than phonemic (because there are no true words), but comprehension of language and sophisticated speech perception skills for phonemic contrasts are present. For example, a 6-month-old infant is typically on the verge of comprehending her first words but is (on average) an additional 6 months away from producing a first word. Language development proceeds well into the teenage years, and even into adulthood (e.g., expansion of vocabulary, or understanding the meaning of subtle language use as in humor). Reading skills, as one example of language development, are affected significantly by language comprehension skills. The development of language may therefore be influenced by cross-modality skills (in this case, auditory and visual modalities, see Hogan, Adlof, & Alonzo, 2014).

When The “when” of language development is different for each child. Average-age benchmarks for mastery of different components of speech and language are summarized here and discussed more fully in Chapters 5, 6, and 13. First words are expected around 1 year of age, a 50-word vocabulary at 18 months of age, two-word utterances at age 2 years, longer utterances between the ages of 2 and 3 years, as well as the comprehension and expression of adjectives, verbs, and free morphemes. Bound morphemes are usually mastered by age 4 or 5, and longer, more complex utterances from age 5 years onward. Complete mastery of phonol-

ogy ranges between the ages of 5 and 8 years. Finally, production and comprehension of complex language usage — jokes, irony, and other abstract language phenomena — may not be mastered before the teenage years. These age benchmarks for language components and the very large variation across children in the time of their appearance are discussed more fully in Chapters 5 and 6. This short summary of “when” is introduced in this chapter because of the systematic nature of language development across childhood years. Why are first utterances single words? Why are most of these initial words nouns? Why does vocabulary development accelerate dramatically when two-word utterances appear in the child’s expressive skills? A person who subscribes to Chomsky’s position imagines these sequential elaborations of language form, content, and usage as a set of switches that are turned on throughout development. The switches are a characteristic of the language-acquisition device. A person who views learning as the basis for language development is interested in the development of cognitive skills as a foundation for increasingly sophisticated language skills. Nonlanguage cognitive skills (memory, organization, speed of processing, and so forth) mature throughout childhood. If nonlinguistic cognitive skills are employed to power the development of language skills, the schedule for increasing language sophistication can proceed in parallel with and, in fact, influence, nonlanguage cognition, which in turn influences language development. Brain development continues well into childhood and the teenage years, providing increasing brain power to develop very sophisticated language skills. In this view, special brain mechanisms are not required for language development.
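One way to picture the age benchmarks just listed is as a small lookup table that can be compared against an individual child’s history. The Python sketch below is a toy illustration built from the rough averages given in this chapter; as the text emphasizes, typically developing children vary widely around these averages, so nothing like this simple comparison could serve as a diagnostic tool.

# Rough average-age benchmarks drawn from this chapter, stored as a lookup table.
# Ages are in months and are approximate midpoints of wide typical ranges;
# the table illustrates the idea of benchmarks and is not a screening instrument.

BENCHMARK_MONTHS = {
    "first words": 12,
    "50-word vocabulary": 18,
    "two-word utterances": 24,
    "bound morphemes mastered": 54,   # roughly age 4 to 5 years
    "phonology mastered": 78,         # roughly age 5 to 8 years
}

def months_relative_to_average(milestone, age_reached_months):
    """Positive value: the milestone was reached later than the rough average age."""
    return age_reached_months - BENCHMARK_MONTHS[milestone]

# Example: two-word utterances first produced at 30 months is about 6 months past
# the rough average, which by itself says nothing about delay or disorder.
print(months_relative_to_average("two-word utterances", 30))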

Chapter Summary Human language is unique not only because of its conventional, dynamic, and generative nature, but also because it seems to require and make use of specific brain regions. It is convenient to separate the components of language into the categories of form, content, and use. Form includes the phonological, morphological, and syntactic components of language, content the semantic component, and use the pragmatic component (social use of language). Both comprehension and expression (production) of speech and language development are important skills in the mastery of communication.

One view of speech and language development is that there is a language acquisition device specific to the human brain, that the device “turns on” around 1 year of age and initiates a series of increasingly sophisticated language steps. Another view is that language development is guided by cognitive processes (e.g., memory, processing speed, attention) that become increasingly skilled and complex with development. These cognitive processes are not specific to language but are well suited to the organization and manipulation of the massive amount of speech and language data to which a child is exposed. Three questions can be asked about the development of language: why, how, and when. “Why” is answered by language in the service of communication: language develops to serve a critical need of humans, to communicate complex ideas and actions with flexible use of semantics, sentences, and pragmatics. “How” language develops is by starting out very simply (babbling and first words) and adding new layers of complexity to language skills as a child develops. “When” language develops refers to an age-related sequence of steps of language mastery at succeeding levels of complexity; mastery of each stage of language proceeds through systematic phases for the “average,” typically developing child, but from child to child there is substantial variation in the ages at which mastery of a specific aspect of language is accomplished.

References Avery, P., & Ehrlich, S. (1992). Teaching American English pronunciation. Oxford, UK: Oxford University Press. Chomsky, N. (1957). Syntactic structures. The Hague, Netherlands: Mouton. Chomsky, N. (1975). Reflections on language. London, UK: Fontana. Crystal, D. (2004). The stories of English. New York, NY: Overlook Press. Fisher, S. E. (2017). Evolution of language: Lessons from the genome. Psychonomic Bulletin and Review, 24, 34–40. Fisher, S. E., & Vernes, S. C. (2015). Genetics and the language sciences. Annual Review of Linguistics, 1, 289–310. Hogan, T. P., Adlof, S. M., & Alonzo, C. N. (2014). On the importance of listening comprehension. International Journal of Speech Language Pathology, 16, 199–207. Kidd, E., Donnelly, S., & Christiansen, M. H. (2018). Individual differences in language acquisition and processing. Trends in Cognitive Sciences, 22, 154–169. Ladefoged, P. (2001). Vowels and consonants. An introduction to the sounds of languages. Oxford, UK: Blackwell. Tannen, D. (1994). Gender and discourse. New York, NY: Oxford University Press.

4 Communication in a Multicultural Society Introduction If asked to define “culture,” how would you respond? “Culture” is a concept most people understand but find it difficult to define in precise terms. Here are a few (admittedly academic) definitions of “culture” selected from a search on the Internet:

• “a set of learned beliefs, values and behaviors; the way of life shared by members of a society”

• “The accumulated habits, attitudes, and beliefs of a group of people that define for them their general behavior and way of life”

• “understandings, patterns of behavior, practices, values, and symbol systems that are acquired, preserved, and transmitted by a group of people . . . ”

• “Learned behavior of a group of people, which includes their belief systems and languages, their social relationships, their institutions and organizations . . . ”

These four statements define culture in similar ways. The common threads among them are that culture is learned, shared among a group of people, and determines the way they behave and construct their societies.
It is easy to see how communication, and more specifically speech and language, fit with these concepts of culture. The first two definitions imply the major role of communication in culture with the phrases, “a set of learned behaviors . . . ,” “way of life shared by members of society . . . ,” and “accumulated habits . . . that define for them their general behavior and life.” The final two definitions make an explicit link between culture and communication by saying that culture consists of “symbol systems that are acquired, preserved, and transmitted by a group of people . . . ,” and, “includes their . . . languages . . . ” (emphasis added). Language, as shown in Chapter 3, is a conventional set of symbols used by members of a community to communicate. “Conventional,” in this description of language, means “arbitrary” but agreed upon by members of a group. Consider the following anecdote, from the author’s experience. I came to Madison, Wisconsin, in August 1972, never having ventured west of State College, Pennsylvania, where I earned my undergraduate and master’s degrees. When admitted to the doctoral program at the University of Wisconsin–Madison, I was fortunate enough to have been awarded a fellowship. This award required me to fill out some paperwork at a university administration building, which after arriving
in town I located (with difficulty). I walked into the lobby and had no idea where to go, but there, at an information desk, sat an obviously sightless gentleman, wearing dark glasses, who was likely to know where I should go. I approached him and asked, “Can you tell me where I can find Window 20?” He responded, “Walk down the hallway and turn right at the bubbler.” A bubbler? I had no idea what a “bubbler” was, but I imagined a decorative, in-ground water feature — I knew water must be involved — perhaps with small stone figures of angels or ducks bathing in gently percolating water emerging from an elegant spout in its center. I walked down the long hall and saw nothing so peaceful or watery, not even anything vaguely close to this image that had popped into my head when the gentleman said, “bubbler.” Back at the information desk, head bowed and feeling, well, incompetent, I said to the gentleman, “Okay, I’m really sorry, I don’t see a bubbler or maybe I don’t even know what that is.” The sightless man, who, I would learn through repeated contacts over the next several years, was not a smiler, grinned ever so slightly and said, “Son, where are you from?” “Philadelphia, sir.” He allowed the grin to turn a tad more obvious and said, “Down the hall, right at the drinking fountain.” I immediately, of course, made the connection between “bubbler” and “drinking fountain.” (To the best of the author’s knowledge, the term “bubbler” was coined by one Harlan Huckleby, whose idea for a drinking machine that shot water upward toward the drinker’s mouth was patented in 1888 by the Kohler company of Kenosha, Wisconsin. The word may also be used in parts of New England, Michigan, and Australia.) I had never heard the word “bubbler” used in this way, and even the term “drinking fountain” was slightly foreign sounding; in Philadelphia, we called these things “water fountains.” The point is not that the use of the term “bubbler” was an impossible hurdle to my understanding of Wisconsin talk — like most adults, I learned what this new label meant after a single trial. Rather, this strange spoken label for “water fountain” made me feel — different. Over the next several weeks, I would have this same experience many times, learning terms such as “stop-and-go light” and “hoser” as alternate expressions for the vocabulary items I used to designate a traffic signal and a gentle insult directed at males, respectively. I had never considered the words I grew up with for these items or people as merely a collection of arbitrary communication signs. I was in a different culture, where communication was subtly and sometimes not-so-subtly different from the culture I knew, and was unaware of as being defined by the four bullet points previously presented.

Dictionary of American Regional English The “Dictionary of American Regional English” (DARE: http://dare.wisc.edu/, now in six volumes) lists the many regional variants of words and phrases that people in specific geographical regions of the United States have agreed to understand as having a specific meaning, and that people in other regions of the country are likely not to understand. This dictionary is like a manual for the concept of the arbitrary meanings of words.

The linkage between culture and communication has been studied for many years. The American anthropologist Ruth Benedict, in her famous (and still controversial) 1934 book Patterns of Culture, noted the arbitrary nature of cultural customs and the error of thinking, common among anthropologists who were her contemporaries, that Western Society was a “reference” culture to which all others could and should be compared (see Huntington, 1996, for an extended consideration of this idea). For Benedict, and for many subsequent scholars of linguistics and anthropology, this idea of a “reference” culture extended to speech and language. Benedict pointed out that the speech sounds used by a specific language are a very small subset of the total number of speech sounds that can be produced by the human speech mechanism. The speech sounds used in a specific language serve the functional role of communication. These speech sounds and the language forms (e.g., words) they create are a critical component of a culture. There is nothing inherently special or “correct” in the pronunciation of English spoken by Caucasians, in, say, Chicago, Illinois. The pronunciation of speech sounds, the word choices, the idioms, even the distance maintained by a talker from his or her listener — these communication behaviors reflect culture, something shared among a group of people that reflects their way of life. The biological anthropologist Terrence Deacon summed up this issue by saying, “If symbols ulti-

mately derive their representation power, not from the individual, but from a particular society at a particular time, then a person’s symbolic experience of consciousness is to some extent society-dependent — it is borrowed” (Deacon, 1997, p. 452). The “symbols” include the arbitrary linguistic forms and usages as previously mentioned. More importantly, Deacon claimed that an individual’s language experience is really a cultural experience of his or her society. Deacon says this experience is “borrowed” because the language/culture matrix is constantly changing, even within the same culture. An individual’s perspective on life is mediated — some may even say dictated — by the language/culture in which he or she is raised. Many cultural factors influence communication. A partial list is provided in Table 4–1.

Table 4–1.  Selected Cultural Factors That Can Influence Communication

Race and ethnicity
Social class, education, and occupation
Geographical region
Gender
Sexual orientation
Situation or context
Peer group association/identification
First language community/culture
Relationship between speaker and listener

Source: Based in part on Giri, 2006.

Why It Matters These introductory comments may seem far afield from the topic of Communication Sciences and Disorders, but in fact they are highly relevant to much of the material in this text. As pointed out in several publications (e.g., Larroudé, 2004), the United States is becoming increasingly diverse in its racial and ethnic identity. Population growth in the United States is disproportionately accounted for by nonwhite persons, pointing to group diversity as a long-term characteristic of American society. This diversity includes a multitude of cultures. If language is a prominent feature of culture and not separable from it, a panorama of communication styles can be expected as part of everyday life in the United States.

Everyday life includes health conditions, many of which affect a person’s ability to communicate. Even perfectly healthy persons who enter the United States for family, work, or educational reasons, who speak English as a second language, may feel as if their communication skills are impaired. At some point in their lives, many of these individuals may seek the services of a speech-language pathologist (SLP) and/or audiologist (AuD). Here is one good reason for specialists in Communication Disorders, whether clinicians or college professors who help train clinicians and researchers, to be well-versed in the influence of cultural variation on speech and language behavior. Clearly, an SLP or AuD cannot possess the entire range of multicultural knowledge and skill required for equal effectiveness among the many diverse groups in the United States. It is unreasonable, for example, to expect a professional in Communication Disorders and Sciences to understand all aspects of African American, Asian, Hispanic, and Native American cultures, including their speech and language usage. The sheer variety of cultures and their communication components form an intimidating body of knowledge; for speech and hearing professionals, “Becoming competent cross-culturally is among the greatest of challenges” (Cheng, 2000, p. 40). Rather than trying to master this impossibly large amount of information, a multiculturally competent professional in speech and hearing should master a “multicultural framework.” This framework focuses on principles that are applicable across persons from the many different cultural groups a speech and hearing professional is likely to encounter in his or her practice (see Cheng, 2001, p. 125). A few of these principles are discussed later in this chapter. An important “super principle” of this framework is discussed here to illustrate why multicultural sensitivity and knowledge are important to the SLP and audiologist. This principle involves the distinction between a difference and a disorder.

Difference Versus Disorder

Multicultural competence for a speech and hearing professional is especially important when a patient has communication behaviors that, in one or more environments (such as the classroom), call attention to themselves. A principle of multicultural competence is the recognition of the distinction between a communication difference and a communication delay or disorder. This distinction can be clarified with an example from the area of typical language development in children. Language scientists have studied in great depth various aspects of typical language development,

Ebonics “Ebonics,” a term blending “ebony” and “phonics,” is generally understood today to be a largely historical term for African American Vernacular English (AAVE), the dialect used by many North American African American people. Ebonics became a national issue in 1996 when the Oakland Public Schools, California, passed a resolution recognizing it as a legitimate language system. The resolution had the aim of obtaining federal programs and support for instruction in Ebonics. Learning of standard English, it was argued, could be facilitated by approaching instruction through the child’s “home” language — Ebonics. This resolution, and its educational implications, provoked spirited controversy among public figures, politicians, and linguists. Whatever the fine points of the controversy might be, there is widespread agreement among linguists that AAVE has language forms, content, and use employed in a rule-based system for effective communication. Often these rules are not the same as the rules we associate with “Standard American English” (whatever that might be). McWhorter (2001) and Kretzschmar (2008) have written histories of the controversy, as well as differing opinions about the legitimacy of Ebonics as a language system.

including the learning of grammatical rules. As discussed further in Chapter 6, during the course of typical language development, it is not unusual for children aged 2 years (or a little older) to produce simple sentences such as “He running.” Children communicate the idea of a boy who is currently in the act of running with a sentence lacking the “to be” verb (in this case, the word “is” or the contracted form, “He’s”). “He running” is considered a “typical” sentence form for a child in the early stages of language learning, but the same sentence produced by a child aged 4 years may create an impression of language delay. This is because the “typical” course of language development involves a fairly quick transition from “He running” to “He is running” (or, “He’s running”). This is all well and good, but this “typical” course of verb development does not apply to all dialects spoken in the United States. A relevant case is AAVE,2 in which “He running” is a proper grammatical represen2 

tation to convey the meaning of a male in the act of running (more precisely, it is “He runnin’”). When a 4-year-old African American child who is a speaker of AAVE says, “He runnin’,” there is no language delay (as evidenced by this particular utterance) because the child uses the correct grammatical form for his cultural/linguistic community (see Rickford, 1997). In other words, the language community in which the child is developing speech and language skills recognizes “He runnin” as typical development, and as a good match to fully developed language among adult speakers of AAVE.
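The logic of difference versus disorder can be made concrete with a short sketch: the same utterance is scored differently depending on which variety’s grammar is treated as the reference. The Python example below is an invented toy with two drastically simplified pattern sets; it is not how clinicians evaluate language samples, and the patterns do not begin to describe either variety fully.

# Toy illustration: the same utterance is judged differently depending on which
# variety's (highly simplified, invented) grammar patterns serve as the reference.

REFERENCE_PATTERNS = {
    "General American English": [
        ("he", "is", "VERB+ing"),          # copula required: "He is running"
    ],
    "AAVE": [
        ("he", "is", "VERB+ing"),          # copula form is acceptable
        ("he", "VERB+in"),                 # zero copula is also grammatical: "He runnin'"
    ],
}

def matches(pattern, tokens):
    if len(pattern) != len(tokens):
        return False
    for slot, token in zip(pattern, tokens):
        if slot == "VERB+ing":
            if not token.endswith("ing"):
                return False
        elif slot == "VERB+in":
            if not token.endswith("in"):
                return False
        elif slot != token:
            return False
    return True

def grammatical_in(variety, utterance):
    tokens = utterance.lower().replace("'", "").split()
    return any(matches(p, tokens) for p in REFERENCE_PATTERNS[variety])

utterance = "He runnin'"
for variety in REFERENCE_PATTERNS:
    print(variety, "->", grammatical_in(variety, utterance))
# A form counted as an "error" against one reference can be fully grammatical
# against the norms of the child's own speech community.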

Standardized Tests and Sample Size In the discipline of Statistics, there is the concept of a “population” and of a “sample.” When a scientist chooses a sample of people to participate in an experiment, she hopes the sample is sufficiently representative of the population so that her results can be generalized (that is, not restricted to the specific participants she has studied in her experiment). There are several approaches to creating a sample that is representative of the population. Among these is the selection of a sample consisting of many participants (as compared to relatively few participants). The thinking is, the larger the sample, the more the results approach the “true” population characteristics. Based on this principle, most standardized tests are based on data collected from relatively large numbers of participants. Few people trust the results of an age-normed, standardized test based on data collected from, say, 10 children at each age. The precise number required to make the sample a good estimate of population characteristics depends on many factors, but the larger the sample, the more likely is the generalizability of the results.
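The sampling point in the box can be illustrated with a short simulation: estimates of a population average computed from larger samples wander less around the true value than estimates computed from small samples. The Python sketch below is hypothetical; the “population” of scores is generated artificially and stands in for raw scores on an imaginary test.

# Toy simulation of the sampling idea in the box: the mean of a larger sample
# tends to fall closer to the true population mean than the mean of a small sample.
# The population of test scores here is invented purely for illustration.

import random
import statistics

random.seed(0)
TRUE_MEAN, TRUE_SD = 100, 15   # imaginary test on a standard-score-style scale
population = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(100_000)]

for n in (10, 100, 1_000):
    sample_means = [statistics.mean(random.sample(population, n)) for _ in range(200)]
    spread = statistics.stdev(sample_means)
    print(f"n = {n:>5}: sample means vary around the true mean with SD of about {spread:.2f}")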

2. Some refer to this dialect as African American English (AAE), or Black English, but in this chapter AAVE is used. See the Box on Ebonics.

Standardized Testing and Language Difference Versus Disorder The example in the Box above can be extended to formal, standardized testing of speech and language skills. In
the case of speech and language development in children, it is useful to have knowledge of age milestones for specific events. For a typically developing child, how many words are included in the spoken vocabulary at age 2 years? At age 3 years? Or, at age 2 years, which verbs are understood and/or spoken by a typically developing child? The answers to these questions are clearly relevant to clinical diagnoses of delayed or disordered language development. How are answers to these questions determined, and how are they used in clinical settings? Age-normed, standardized tests provide one way to answer these questions. The purpose of these tests is to generate an accurate estimate of the age at which most typically developing children master a particular speech or language skill, and to express the test scores in a way that permits direct comparisons across ages. The results of an age-normed, standardized test can be used to document the amount of language delay, expressed as the number of years behind performance expected based on the child’s chronological age. For example, a four-year-old child who has language scores that are at the mean (average) of the distribution of scores obtained from the three-year old, normative sample, is said to be a year behind in language development. And, these normative distributions can be used to track a child’s progress during speech-language therapy. In the example immediately above, the desired outcome of the therapy is to move the child’s standardized score in the direction of the age-appropriate distribution (i.e., the four-year-old distribution). The standardized scores of typically developing children at any age depend on the nature of the sample used to construct the test. The nature of the sample is critical to understanding the limitations of a standardized test in making clinical decisions; the results of the test, and their interpretation, depend entirely on how the test was normed. Consider the four-year-old, AAVE-speaking child, described above, who produces verbs consistent with adult use in AAVE (“He runnin”). Let’s assume we have a standardized test of verb development, based on samples of children living in the state of Wisconsin. The sampling takes account of the proportions of Caucasian, African-American, Hispanic, Asian, and other racial and ethnic groups in the state population. The method used to develop this standardized test may follow excellent principles of test development, including the proportionate sampling of children within the state population. However, the norms in the test are not likely to reflect “typical” performance in other states where the proportion of (for example) AAVE-speak-

ing children is much greater than it is in Wisconsin. For example, Illinois and Florida have a substantially higher proportion of African Americans than Wisconsin, and a greater number of speakers of AAVE. The most obvious use of an age-normed test is clinical, in which a diagnosis can be made concerning a possible delay in speech and/or language development. The issue posed concerning an African American child’s use of AAVE exposes one major, potential problem with age-normed tests: the results, and their interpretation, depend entirely on how the test was normed. Our hypothetical test of verb performance was developed from data collected in Wisconsin, a state with a much lower proportion of African Americans than, say, Illinois or Florida (http://www.wadsworth.com/sociology_d/special_features/ext/census/african.html). If the test considers “standard English grammar” as the requirement for correct verb performance, the norms are likely to misrepresent “typical” language development depending on the geographical location of data collection. A “normative” test can reflect substantial cultural biases. Here is a hypothetical example of cultural bias in normative testing. An African American child who uses AAVE in the home and with his friends is referred by his preschool teacher to a speech-language clinician for language evaluation. The teacher and the SLP are not well informed about potential cultural bias in standardized testing. These well-meaning individuals were educated many years ago, before our discipline had a clear sense of the professional implications of cultural sensitivity. The 5-year-old child is given standardized tests based on samples from primarily white populations, and his scores are like those of 3-year-old, typically developing children who contributed to the sample on which the test was based. In the absence of cultural sensitivity and an awareness of the strong link between culture and language, the child’s score may be interpreted as an indicator of developmental language delay. The child is scheduled for sessions with the SLP to address this clinical diagnosis. The child’s parents are confused when informed of this diagnosis, because they have not noticed a problem with the child’s ability to communicate. Their child’s language abilities seem age appropriate, judging from their experience with other children in their family, and in their community. Formal or informal testing of this child’s language skills in AAVE would show, in fact, that the child’s language development is age appropriate. The SLP must have cultural competence, including cultural sensitivity, to determine if a child’s speech and language skills represent a difference from other cultural expectations, or a true delay or disorder. A speech/language difference is not something to be

A speech/language difference is not something to be treated (unless the patient or his parents request such treatment; see later in this chapter); a delay or disorder should be treated. This rather long example is the most obvious case in which unintentional cultural biases may influence the interpretation of formal, standardized tests. In the evaluation of all aspects of language, including nonverbal language, social use of language, and many other factors, SLPs and audiologists must have cultural competence to be most effective. Cultural variations in all components of speech and language are too numerous to document here, or to be known in full by any individual SLP or AuD. The key is for professionals in our field to be aware that differences from their own ideas of communication normality may be cultural, not clinical. This awareness should include the knowledge and skills to identify sources — in the clinical and research literature, or of the human variety — that assist in evaluating communication behaviors as cultural differences versus communication delays/disorders. A framework for knowledge and skills in cultural sensitivity is now incorporated into the training of SLPs and AuDs. More specifically, when an individual SLP works with patients who identify with a variety of cultures, whether African American, Hispanic, Asian, Native American, Deaf, or any other group, developing a solid understanding of communication characteristics within each group is critical. The SLP or AuD may not have a full range of cultural knowledge for each group but, based on the multicultural framework that is part of SLP and AuD training, should know how to identify and obtain the knowledge relevant to any child or adult seeking diagnostic or management (therapy) services. The questions about speech-language and audiology specialists and the possible intertwining of their professional competencies and cultural identities can be stated more broadly. Can Caucasian SLPs be effective in a school setting where the majority of students use AAVE to communicate, or vice versa? Can an SLP who is not familiar with Hispanic culture and language usage diagnose and treat speech and language delays or disorders in a school where the students are primarily of Hispanic heritage (or vice versa)? The steps taken by ASHA to ensure that all persons being trained as SLPs and AuDs receive instruction in cultural competence are likely to address these questions in a positive way. ASHA’s perspective on cultural competence is summarized in an overview statement on its Web page (http://www.asha.org/Practice-Portal/Professional-Issues/Cultural-Competence/). Cultural competence is viewed by ASHA as an issue much broader than the few examples provided in this chapter. Cultural competence may involve cultural

issues associated with age, gender, and socioeconomic status, to name a few.

Accent, Dialect, and Culture Some aspects of cultural variation are plainly obvious. For example, differences between Japanese and American culture, or between German and French culture, are indisputable. A clear and dramatic difference between these cultures, of course, is language. Even within the same language, however, dramatic cultural differences may exist. The “face” of these within-language differences is often speech and language patterns, even when the actual face does not suggest a difference. Good examples are British versus American English, the Chinese spoken in Taiwan versus Mainland China, and Egyptian versus Jordanian Arabic. Accent and/or dialect are often strong identifiers of geographic, ethnic, and even cultural roots. Accent and dialect are sometimes used interchangeably, but as noted in the next sections, there is an important distinction between them.

Accent In lay terms, accent refers to how a person “sounds” when he or she speaks. More technically, accent refers to how the sounds of speech and the melody and rhythm of speech (prosody) are heard. The statement, “Bill has a Boston accent,” refers to the way Bill says certain sounds, and perhaps the melody and rhythm of his speech. “Susan has a Southern accent” means the same thing, and distinguishes the sound of her speech from Bill’s. The way these two speakers sound allows almost any speaker of American English to identify their general geographic origin within the United States. Accent is often thought to be conveyed primarily by the way vowels are spoken, although consonants and prosody can contribute in important ways to a regional accent. How many distinguishable accents exist in the United States? This is a difficult question to answer, but Dr. William Labov of the University of Pennsylvania, who has studied regional accents in the lower 48 states, identifies six major accent regions (Labov, Ash, & Boberg, 2006). These regions can also be thought of as cultural regions. For example, among the six accent regions are a Western accent and a Southern accent. Most Americans would agree that Western culture (as found in, say, California or Washington State) differs from Southern culture (as found in Louisiana or Georgia, for example). These same Americans would not expect much difficulty in identifying speakers with Western versus Southern accents. The person with an easily identifiable southern accent probably con­

siders himself a southerner, and not simply because of his accent. The six major accent regions identified by Labov and his colleagues are almost certainly an oversimplification of accent variation in the United States. For example, the “North” accent in this system includes speakers in Minnesota, Wisconsin, Chicago, Michigan, and western New York. Many people who have grown up in Wisconsin can hear the difference between a native Wisconsinite and a native Minnesotan, and the Chicago accent is (to the author’s ear) different from the typical Wisconsin accent. Labov and his colleagues include 11 states and parts of two others in their “Western” accent, but speakers from Colorado and Southern California, included in the “Western” accent group do not sound alike (at least to these ears). The accent regions identified by Labov and his colleagues serve an important purpose even though each of the categories contains accent variation. The average person walking on the street in Madison, Wisconsin, who is introduced to someone from Mississippi, Georgia, or South Carolina is likely to hear their accent as “Southern.” The Georgian may hear the difference between her accent and that of the South Carolinian or Mississippian. But it is unlikely for a Georgian without special training to detect the differences between a Wisconsin, Minnesota, and Michigan accent; they all sound “Northern.” So, the variation within any one of the six accent groups is likely to be detected by someone within the accent group but not by someone from a different accent group (see Clopper, Levi, & Pisoni, 2006, for experimental work on perceptual identification and discrimination of regional accents). Talker accents often highlight “us” and “them” (Müller, Ball, & Guendouzi, 2000). Accents are an identifier of geographical/cultural allegiances and associations. Regional accents can contribute powerfully to how we are perceived, and how we perceive ourselves. When I first came to Madison, Wisconsin, it was not only my initial inability to identify a “bubbler,” or to understand what it meant when someone handed me a hammer and said, “Hold this once,” that made me feel different. I also sounded funny when I spoke. Within 10 minutes of driving into town and interacting with several people as I signed a lease and got something to eat at the (now defunct) Marc’s Big Boy, I was acutely aware of my funny-sounding vowels. Among all of these many accents, can one be defined as “standard”? And what does it mean to have a standard accent? This question and its possible answer have as much, if not more, social-political meaning than linguistic importance. In Great Britain, for example, there has been a long-standing debate concerning the advantages of a standard accent. This accent is defined as that heard in the speech produced

by educated people who are likely to be from the London area or have learned London-accented English (Müller et al., 2000). In the United States, the debate is not as public as it is in England, but the tendency among some accent groups to regard other accents as “substandard” or to associate certain accents with certain personality traits cannot be denied (see review in Fridland, 2008). The issue of whether or not a standard accent exists or should exist continues to be a matter of debate among linguists and educators. In this text, the position is taken that accent is such an important part of regional identification and culture that any one accent cannot be viewed as “better” or more desirable than another. This is consistent with the acceptance of all cultures, and specifically all the subcultures in America, as equally worthy and equal partners in the creation and shaping of the American social landscape.

Dialect Labov and his colleagues actually called the six accent regions described above “dialect” regions. As mentioned earlier, “accent” refers to an impression of the sound of speech, which is what Labov meant when he enumerated the six “dialect” regions. But the term “dialect” is technically different from “accent” because dialect includes aspects of language that go well past the sound of speech. Dialect is defined as a language variant, typically associated with a geographic region or group of people. A dialect may include unique sound and prosodic characteristics — that is, an accent — but may also include unique vocabulary items, grammatical structure, and even rules for how people communicate. Accent is therefore a component of dialect. The Wisconsin dialect, for example, not only includes different-sounding vowels than the Philadelphia dialect but as described earlier has vocabulary items such as “bubbler,” “hoser” (probably borrowed from Canada), and “pop” (as in, soda), which are typically not part of the Philadelphian’s vocabulary. A less subtle dialect difference in the United States is between AAVE and the several accent and dialect variants of American English spoken by white persons around the country. In addition to the phonological characteristics of AAVE which are different from most white regional dialects, AAVE has vocabulary items not heard in white American English, and may also have different rules for social communication. AAVE is a dialect of English, not simply an accent difference from the several accent varieties of white American English. What does it mean to say that a language such as American English has several different dialects? Dialects of a language are typically mutually intelligible; speakers of different dialects can communicate

effectively, even if occasionally they are confused by a word or grammatical usage, or if they find their communication partner’s concept of personal body space a little odd. This idea of mutual intelligibility, even when accent, vocabulary, grammatical form, and other aspects of language usage vary quite a bit, is the usual standard for differentiating a dialect difference from a language difference. When two people talk and cannot understand each other, it is likely they are speaking different languages, not different dialects of the same language. The mutual intelligibility criterion for distinguishing a dialect difference from a language difference is a technical distinction that does not always fit easily into real-world experience. Most Americans have had the experience of listening to rapidly produced British or Australian English — in a movie theater, for example — and having great difficulty following the dialogue. The languages spoken in various parts of the United Kingdom, Canada, Australia, New Zealand, and parts of India are surely English, technically are dialects of English, but they are not always mutually intelligible. If you travel to Manchester, England, and ask someone on the street for directions and her reply seems unintelligible, is she speaking a dialect of your language? The answer seems to be “no” when “mutual intelligibility” is the criterion for different speech patterns/styles to qualify as dialects of the same language. But the Manchester native is speaking English; how do we resolve this? Perhaps the resolution is to admit the uncertainty of a dividing line between different dialects and different languages (Backus, 1999). Languages evolve — they are constantly changing — and over time a changing

dialect may diverge enough to become unintelligible to other users of the parent language. The dialect difference then becomes a language difference. The relationship between dialect and accent is summarized in Figure 4–1. Accent is shown as a component of dialect but as separate from the other components of language that contribute to dialect variation (e.g., morphology, discussed in Chapter 3).

Code Switching

When dialect differences exist between two groups of people who have extensive contact as a result of a common neighborhood, common workplace, or friendship (among other factors), speakers in one group may develop the skill of switching to the dialect of the other group. This skill is referred to as code switching. Language is a code, and the ability to switch between different versions of the code is valuable. An SLP who can code switch among children with different native dialects or languages has a distinct advantage in the planning and execution of (for example) language therapy. This advantage may enhance the language learning process among the children. Code switching takes place for all aspects of language: for phonetics, phonemes, lexicon, morphology, word order in sentences, and even pragmatics. The role of code switching in American society is becoming more important as the number of multiple-language homes increases and parents emphasize that their language-learning children master both the language spoken in the home and the majority language of the society in which they live.

The Angel’s Share When whisky is aged in oak barrels, the fluid may fill the cask nearly to the top. After aging, when the barrel is finally opened for bottling, the volume of the whisky is less than it was when originally poured for aging — the fill line has decreased. The evaporated whisky, lost during the aging process, is called the “angel’s share” — people on earth will have plenty to drink, they won’t miss this small tribute to those on a different plane. The 2012 film The Angel’s Share is a story about a whisky heist by a small group of Scottish men, looking to profit from the removal of a priceless barrel of aged whisky from a famous distillery. The main actors are Scottish, and the one British actor does a spot-on Scottish brogue; they all speak English throughout the film. If you see the film (highly recommended), you may be surprised to see subtitles. It is English, right? Why the subtitles? After the first two or three minutes of the film, you understand the use of subtitles perfectly: the dialogue is nearly unintelligible to the typical American ear. Dialect or language difference? Watch the film; you decide.

Figure 4–1.  Dialect and accent. The figure shows accent as one component of dialect, separate from the other language components that contribute to dialect variation (grammar, morphology, vocabulary).

Language Components

The language units referred to in the text are described in Chapter 3 and appear frequently in subsequent chapters of this textbook. By way of review, single-sentence definitions of each component of language are as follows: (a) phonetics designates the speech sounds of a language; (b) phonemes designate the speech sounds of a language that, when exchanged in the same position of a sequence of sounds (e.g., the /k/ and /g/ of the words “coat” and “goat”), change the word meaning; (c) morphology designates the meaningful units of speech that “inflect” words and change their grammatical identity (e.g., making a word plural, as in “dog” versus “dogs,” or indicating past tense, as in “want” versus “wanted”); (d) vocabulary (the lexicon) includes the word forms people implicitly agree upon as having specific meanings; (e) syntax designates the grammatical rules of language that permit certain word orders but not others for the formation of sentences (e.g., in English, “Big dog” follows grammatical rules but “Dog big” does not); and (f) pragmatics designates the social use of communication, which depends on factors such as age, gender, ethnic group, and so forth.

Foreign Accent

Scientists and clinicians are interested in the characteristics and possible modification of foreign accent. The characteristics of foreign accent are interesting because

they provide phoneticians (people who study the sounds produced in different languages of the world) and phonologists (people who study speech sound systems — the rules governing the use of speech sounds for purposes of communication) with the opportunity to ask, “How do the speech sounds of one language affect a person’s ability to produce the speech sounds of a second language?” To illustrate, consider the Swedish, Greek, and American English languages. Swedish has a more complicated set of vowels as compared to English, which in turn has a more complicated set of vowels than the relatively simple, five-vowel system in Greek. When a Swede and a Greek are attempting to learn American English, does the relative complexity of their native vowel systems affect the way they learn English vowels? The best answer we can give today for this question is: yes, the relationship between the vowel systems of two languages influences the ability to learn the nonnative system, but the specifics of this influence are complicated. This scientific question, of how the sound systems of two languages influence the ability to learn the sounds of a second language, is relevant to the practical issue of modifying a foreign accent. SLPs with relevant training provide services to people who want to modify a foreign accent. Modification of foreign accent is also a rather controversial aspect of our field (see, e.g., Fitch, 2000; Müller, Ball, & Guendouzi, 2000; Powell, 2000; and Winkworth, 2000). One reason for the controversy is found in the potential for independence between accent severity and speech intelligibility. A speaker may have a substantial accent when speaking a nonnative language — that is, the speaker can clearly be identified as a nonnative speaker by a native speaker — yet be perfectly intelligible. If, for example, a native speaker of Mandarin Chinese produces accented

but perfectly intelligible English, why seek modification of the accented English? Intelligibility is often the main concern of people who seek to reduce their foreign accent; after all, being understood is the primary goal of communication. Some speakers may seek the help of an SLP, or a specialist in English as a foreign language, to reduce their foreign accent simply to make them sound less “different.” Speakers may also recognize that their foreign accent does not compromise their intelligibility but still causes their listeners to work harder to extract the fully intelligible message (Floccia, Butler, Goslin, & Ellis, 2009). Accent reduction therapy is most commonly initiated by the person seeking to reduce an accent (rather than by a health care or education specialist). There may be certain situations, however, in which a speaker’s accent is judged as a potential challenge to his or her professional success, and a recommendation is made to seek accent reduction therapy. For example, students enrolled in a master’s-level SLP training program who are not native speakers of English may have accents that interfere with aspects of therapy for articulation disorders in children or adults. At least part of articulation therapy involves clinician-produced models of the speech sounds being trained. A clinician (the student with the accent) who produces a sound model that is insufficiently native-like may be recommended for accent reduction therapy. Part of the controversy surrounding the idea of accent (or even dialect) reduction therapy is who decides what the “reference” accent or dialect is, and even whether there should be such a reference. The interplay of accent, dialect, culture, and professional training is complicated. A few years ago, the author had a chance to visit a Communication Sciences and Disorders training program at a southern university, where the majority of graduate students in clinical training had a strong southern accent (to these northern ears, and I suspect most northerners). I asked myself how these students, having completed their clinical training program and seeking jobs, might fare in an interview at a school or hospital in a northern state. I reached a tentative conclusion: two candidates for an SLP position in a northern school or hospital who had equivalent credentials, graduate school success, and comparable letters of recommendation, and who both performed well at an interview, but who differed in their regional accents, would not be viewed as equally qualified for the job because of the accent difference. (I suspect the same would be true with the regions reversed — a native northerner and a native southerner applying for an SLP position in a southern school or hospital.)

Bilingualism and Multilingualism

The deep connection between language and culture is a professional challenge when language development takes place in two or more languages. A child who has roughly equal exposure to multiple languages (and, likely, cultures) as she learns language from birth is a “simultaneous” bilingual (or, more rarely, trilingual, or however many languages the child is immersed in). Children who learn one language first, are immersed in a second language around the age of 3 or 4 years, and develop roughly equal competence in both languages are called sequential bilinguals. Bilingualism is not restricted to oral language. A person who is fluent in both an oral language and American Sign Language (ASL) is considered bilingual. Bi- or multilingual language development and multilingualism in adults raise questions for the SLP and audiologist. For example, in the early stages of language learning, does equal exposure to two languages affect language development, either in a positive or a negative way (i.e., is typical bilingual language development more or less the same as monolingual language development)? If a child has language delay in one of the languages but not the other, is speech-language therapy appropriate for the language with delay? When speech-language therapy is indicated for developmental delay in both languages, does it make a difference which language is used by the therapist for language stimulation? Many other questions have been asked about the role of multilingualism in language development (see Goral & Conner, 2013, for a review of these issues). Scientists and clinicians have also been interested in the effect of bilingualism on speech and language perception and comprehension. For example, is a listener’s comprehension (or the cognitive processes that support comprehension) affected when a speaker uses the same language as the listener, but with a mild to moderate accent? Let’s say that a monolingual, English-speaking child listens to a native speaker of Spanish who is speaking accented English; is the listener’s comprehension affected by the accent, compared to listening to a native speaker of English? At first glance, the role of accented speech in language comprehension may seem no more than an academic, laboratory exercise. The implications for clinical practice, however, are potentially substantial, as illustrated by the following two questions. First, when language stimulation services are provided by an SLP with accented English, for an English-speaking child with language delay or an adult with a comprehension

deficit resulting from a stroke, does the accent result in poorer comprehension as compared to speech without an accent? Second, when an audiologist performs speech perception tests as part of a diagnostic workup for a possible hearing disorder, are the test results affected by the accent of the speaker who produced the words or sentences used in the testing (Shi, 2014)? Based on research to date, the answer to both questions seems to be “yes,” even if all the details of accent influence have not been determined (Harte, Oliveira, Frizelle, & Gibbon, 2016). The study of the effect of accented speech on language comprehension is worth the effort; it is a significant aspect of multicultural and multilinguistic influences in the speech and hearing clinic. Accent and its potential relevance to speech and language therapy may also apply to accent variation within a language (e.g., the effect of New England–accented English on comprehension in listeners who hail from the Pacific Northwest, or of Irish-accented English on comprehension in listeners with an American Southern accent).

Chapter Summary

Culture can be defined in many ways, but each definition mentions beliefs, behaviors, and symbol systems that are shared and agreed upon by members of the cultural community. Language is intertwined with culture in the agreement among members on the connection between symbols and meaning; language is conventional. The population change under way in the United States, and in many other countries, requires SLPs and audiologists to understand different cultures and the influence of these cultures on speech and language behaviors. A major consideration in diagnosing and managing speech, language, and hearing disorders is the recognition that there is no dominant language, meaning that evaluation of a potential communication disorder must account for cultural differences. Standardized tests of speech, language, and hearing disorders that are normed on a group of children or adults from one culture are not likely to be valid as assessment tools for children or adults from another culture. In the evaluation of a possible communication disorder, a cultural difference must not be confused with a communication disorder. Accent refers to the “way people sound” when they talk; accent includes speech sounds and prosody,

the latter including variations in the melody of speech (intonation), loudness of speech, and rhythm of speech. Accent may refer to regional accent (varying accent among native-born speakers of one language) or to foreign accent (accented speech of a speaker having one native language who speaks a second language). Dialect includes accent, but also word and phrase choices, the order of words in sentences, and the use of minimal units of meaning in language (morphemes). Regional accent, dialect, and foreign accent may affect speech, language, and hearing testing and management, depending on the similarities between the therapist or tester accent/dialect and the accent/dialect of the person receiving management services or being tested.

References

Backus, E. (1999). Mixed native language: A challenge to the monolithic view of language. Topics in Language Disorders, 19, 11–22.
Benedict, R. F. (1934). Patterns of culture. New York, NY: Houghton Mifflin.
Cheng, L-R. L. (2000). Children of yesterday, today, and tomorrow: Global implications for child language. Folia Phoniatrica et Logopaedica, 52, 39–47.
Cheng, L-R. L. (2001). Educating speech-language pathologists for a multicultural world. Folia Phoniatrica et Logopaedica, 53, 121–127.
Clopper, C. G., Levi, S. V., & Pisoni, D. B. (2006). Perceptual similarity of regional dialects of American English. Journal of the Acoustical Society of America, 119, 566–574.
Deacon, T. W. (1997). The symbolic species. New York, NY: W. W. Norton.
Fitch, J. (2000). Accent reduction: A corporate enterprise. Advances in Speech-Language Pathology, 2, 135–137.
Floccia, C., Butler, J., Goslin, J., & Ellis, L. (2009). Regional and foreign accent processing in English: Can listeners adapt? Journal of Psycholinguistic Research, 38, 379–412.
Fridland, V. (2008). Regional differences in perceiving vowel tokens on Southerness, education, and pleasantness ratings. Language Variation and Change, 20, 67–83.
Giri, V. N. (2006). Culture and communication style. Review of Communication, 6, 124–130.
Goral, M., & Conner, P. S. (2013). Language disorders in multilingual and multicultural populations. Annual Review of Applied Linguistics, 33, 128–161.
Harte, J., Oliveira, A., Frizelle, P., & Gibbon, F. (2016). Children’s comprehension of an unfamiliar speaker accent: A review. International Journal of Language and Communication Disorders, 51, 221–235.
Huntington, S. P. (1996). The clash of civilizations and the remaking of world order. New York, NY: Simon and Schuster.
Kretzschmar, Jr., W. A. (2008). Public and academic understandings about language: The intellectual history of Ebonics. English World Wide, 29, 70–95.

Labov, W., Ash, S., & Boberg, C. (2006). Atlas of North American English. Berlin, Germany: Mouton de Gruyter.
Larroudé, B. (2004). Multicultural-multilingual group sessions: Development of functional communication. Topics in Language Disorders, 24, 137–140.
McWhorter, J. (2001). The power of Babel: A natural history of language. New York, NY: Times Books.
Müller, N., Ball, M. J., & Guendouzi, J. (2000). Accent reduction programmes: Not a role for speech-language pathologists? Advances in Speech-Language Pathology, 2, 119–129.
Powell, T. W. (2000). The turn of the scrooge: One Yank’s perspective on accent reduction. Advances in Speech-Language Pathology, 2, 145–149.

Pullum, G. K. (1999). African American Vernacular English is not standard English with mistakes. In R. S. Wheeler (Ed.), The workings of language (pp. 39–58). Westport, CT: Praeger.
Rickford, J. R. (1997, December 1). Suite for ebony and phonics. Discover Magazine. Retrieved from http://discovermagazine.com/1997/dec/suiteforebonyand1292
Shi, L-F. (2014). Speech audiometry and Spanish-English bilinguals: Challenges in clinical practice. American Journal of Audiology, 23, 243–259.
Winkworth, A. (2000). Promoting intelligibility not terminology: The role of speech-language pathologists in accent reduction programmes. Advances in Speech-Language Pathology, 2, 139–143.

5  Preverbal Foundations of Speech and Language Development

Introduction

“Preverbal speech and language development” refers to the set of communication skills developed by an infant roughly between birth and the production of the first word, typically around 1 year of age. This straightforward description does not do justice to the many controversies surrounding exactly how and why children move from producing no words and understanding little at birth to uttering their first word around 12 months of age and at the same time understanding many more words. An understanding of preverbal speech and language development requires knowledge of emerging production, perception, and comprehension skills. Issues of motor maturity (production), auditory perceptual skill (perception), and the ability to represent auditory percepts as linguistic categories, including those for meaning (e.g., words), are all relevant to preverbal language development. The issues in preverbal language development are controversial. In the Chomsky view, speech and language is not “learned” in the traditional sense but rather triggered biologically at some point in develop-

ment. The alternative viewpoint regards speech and language skills strictly as things to be learned, and the first year of life as an intensive, immersion crash course in language learning. Of course, biological maturity plays a role in the efficiency and success of this learning, but the learning perspective typically rejects the idea of a special speech-language device in the brain that “turns on” the ability. There is a substantial body of facts concerning the development of speech and language skills during the first year of life. Scientists have made many observations concerning the preverbal speech and language skills of babies, and written pages and pages of theory to explain their results. In this chapter, the facts of preverbal skills — those specifically relevant to speech and language development — are presented chronologically, as they evolve through the first year of life. This chronology is supplemented by some ideas about how speech and language development following the preverbal year — after 1 year of age — emerges from this preverbal skill development. This is very important, because the way in which the skills underlying speech and language develop during the first year of life is not strictly logical — the way it does happen is not the only way it could happen.

Preparatory Notes on Developmental Chronologies Typical speech and language development in the first year of life is chronicled in this chapter according to three broad time periods: 0 to 3 months, 3 to 8 months, and 8 to 12 months. These time periods are “loose”; they represent average developmental sequences and are by no means applicable to every typically developing baby. Among children who are developing without disease or obvious delay resulting from an undiagnosed problem, there is substantial variability in the chronology of development. The notion of “typical” development recognizes this variability by understanding the emerging skills to fall within a fairly large range. An understanding of this variability also explains the use of rather broad time intervals for stages in the chronology (for example, 3 to 8 months). In addition, the discussion within each of the three time periods often refers to developmental processes in one of the other time periods. For example, preverbal skills in the 0- to 3-month period are presented along with the implications for preverbal skills in the 3- to 8-month and 8- to 12-month periods. Throughout this chapter, keep in mind the variability in preverbal language skills across typically developing children, as well as the links between earlier and later preverbal skills. The chronologies are first separated into production and comprehension skills, followed by additional information on interactions between the two. Children in the first year of life must learn to use their lungs, larynx, tongue, lips, and jaw (as well as other structures of the head and neck) to perform the motor skills required to make speech sounds; or they reach an age where speech motor ability becomes available to them for the purpose of producing speech sounds (for this subtle distinction, see the section, “3 to 8 Months: Production,” on babbling). This is the production part. Likewise, children must learn to make perceptual distinctions relevant to their native language, to associate specific sequences of sound distinctions (e.g., words) with meaning as a linguistic representation, and to use their memory skills to access the link between the acoustic signals and their meanings. Or, from the perspective of the Chomsky view of language development, they must reach an age where the ability to perform the linguistic interactions between perception, mental representation, and memory is “turned on” by maturation of brain mechanisms dedicated to language. This is the comprehension part. During the first 1

year of life, language comprehension skills are typically more advanced than expressive (production) skills.

0 to 3 Months: Expression (Production) Babies cry a lot in the first few months of life; this is hardly a surprise. Crying in very young infants is a reflexive vocalization to hunger and other forms of discomfort (e.g., being too cold or too hot, being in pain because of gas, and so forth). Most experts do not regard crying among very young babies to have propositional value, in the sense of the vocalization having meaning. Infant cries and the variation of their quality clearly affect parent perception of a baby’s comfort level and needs (Lagasse, Neal, & Lester, 2005). At approximately 2 months of age, babies may produce what Oller (1980) called quasi-resonant nuclei. These are clearly not reflexive expressions of discomfort and may occur as apparent “happy responses” when a parent talks to the baby. Oller called them quasi-resonant nuclei because they give the impression of slightly muffled, nasalized vowels that often seem to be produced with the lips closed. If you have held a baby at this age and heard these kinds of sounds, you will recognize them from this description, and may remember thinking, “How is the baby making a sound that seems vowel-like even though her mouth is closed?” Toward the end of this period, the baby may produce vocalizations called “coos” and “goos.” The range of speech sounds in these early, nonreflexive vocalizations is limited and often includes the vowels “ah” (as in “hot”) and “oo” (as in “boot”), sometimes with a consonant-like sound resembling a “k” or “g.” Most likely, coos and goos are not intentional; the baby does not intend to communicate some meaning with these vocalizations. An issue in the initial months of preverbal sound development concerns the interaction between the baby’s anatomical structures and sensorimotor capa­ bilities,1 and the sounds the baby produces. When a gesture is produced, such as hand waving for “byebye” or shaping the oral and throat cavities with a narrow oral constriction between the tongue and the front of the hard palate (bony roof of the mouth) and a wide throat passageway (as in the vowel “ee”), specific patterns of muscle contraction must be produced. In addition, sensations from these contractions (such as the feel of the sides of the tongue against the teeth when the front constriction is made) are part of the package

1. “Sensorimotor,” as used here, denotes the brain mechanisms used to control movement. The inclusion of both “sensory” and “motor” in the term reflects the role of sensory and motor capabilities, and the integration of the two, in movement control.

of information used to verify the “correctness” of the gesture. These sensorimotor skills mature in the first year of life, but their relative immaturity in the first several months is one limiting factor on the kinds of sounds produced by a baby. Of equal interest is the effect of vocal tract growth on sound production during the first year of life. The vocal tract is the air passageway between the lips and the vocal folds (often called the vocal cords). The shaping of this air passageway by movements of the lips, tongue, jaw, and pharynx (the throat) determines which speech sound is produced. Figure 5–1 shows an artist’s rendition of two vocal tracts from a side view. The drawing on the left shows a vocal tract for a newborn and on the right for a young adult. The vocal tracts are shaded light blue in these drawings, making it easy to see not only the age-related difference in length but also in shape. The shaded area is an air-filled, flexible tube. Note the shortness of the newborn’s pharynx (the distance from the posterior tip of the soft palate to the vocal folds) in comparison to the adult’s. This can be best appreciated by looking at the near-contact of the posterior tip of the velum and upper edge of the epiglottis in the newborn as compared to the clear separation between these structures in the adult. The shape differences are further highlighted by showing the bend of the two vocal tracts, from the oral to throat cavities, with simple straight lines. In the adult, the pharynx (throat) cavity is oriented roughly at a right angle to the mouth cavity. In the newborn, the two cavities form a more open angle, and thus have a gentler transition between them. The close approximation of the epiglottis and soft palate in the very young infant most certainly con-

tributes to the sound of the “quasi-resonant nuclei,” as previously described. This is because the airway is continuous from the vocal folds through the nasal cavities in the newborn — in the adult a clear airway path from the vocal folds through the mouth is more available. Throughout the first year of life, a major growth pattern of the infant vocal tract is a lengthening and descent in the neck of the pharynx. As the pharynx lengthens and the vocal folds at the bottom of this tube move down and away from the velum, the pharynx rotates relative to the mouth, creating the 90° bend seen in adults (see Vorperian, Kent, Lindstrom, Kalina, Gentry, & Yandell, 2005, for measurements of patterns of vocal tract growth from birth to nearly 7 years of age, and in the adult years). Why are the shape differences between the newborn/early infancy and adult vocal tracts interesting with respect to sound production? The vocal tract is an air-filled tube with resonant frequencies that vary by length and shape. “Resonant frequency” is an acoustic term that denotes a frequency (rate of vibration) at which the amplitude of vibration is maximum. This description may gain some clarity by considering pipes in concert organs. The pipes of the instrument vary in length. The longer the pipe, the lower is its resonant frequency. In the human vocal tract, the tube not only can be of a different length (e.g., the difference between the short length of a baby and the long length of an adult male), but because it is flexible, it can also change shape. Changes in positions of the articulators create different vocal tract shapes, which result in changes in resonant frequencies of the vocal tract. The different resonant frequencies for different vocal tract shapes are recognized as different vowels. Chapter 11 provides more

Figure 5–1.  An artist’s rendition of a newborn (left image) and adult (right image) vocal tract, as seen from the side with one side of the head removed. The blue lines on both vocal tracts show the angle of the mouth (oral) and throat (pharyngeal) cavities.

information on how changes in vocal tract shape result in changes in resonant frequencies of the vocal tract. For many years, the gently curved vocal tract shape of newborns was thought to limit the kinds of vowels produced in cooing and gooing, and even in babbling behaviors occurring later in the first year of life. The frequent occurrence of vowels such as “ah” and “oo” in coos, therefore, was thought to be as much, if not primarily, a result of the baby’s anatomy as of the baby’s limited sensorimotor control. Even if the baby’s sensorimotor control was adult-like, the story went, infant’s vocal tract size and shape prevented the creation of vocal tract shapes required for certain vowels. Based on recent research, it seems this interpretation is only partly true. The infant vocal tract does not actually prevent the occurrence of certain vowels, but the nonadult anatomy may promote the production of a limited type of vowel such as the “eh” in “bed” and the “ah” in “hot” (Menard, Schwartz, & Boë, 2004). Other vowels can, in theory, be produced by the baby vocal tract, but during the first three months it is easier to produce a limited set. The ability to make certain speech sounds during the first year of life, and especially the first three months of life, is therefore related to a host of factors. These include the child’s anatomical, sensorimotor, and cognitive maturity. The development of speech sounds also depends on the child’s ability to hear speech sounds and distinguish them from each other.
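The organ-pipe analogy in this section can be made concrete with the standard quarter-wavelength approximation for a uniform tube that is closed at one end (the vocal folds) and open at the other (the lips). The short sketch below is offered here only as an illustration, not a calculation from the textbook or from Figure 5–1; the tube lengths are rough, commonly cited approximations, and a real vocal tract is neither uniform nor rigid.

```python
# Idealized quarter-wavelength resonances of a uniform tube closed at the vocal
# folds and open at the lips: F_n = (2n - 1) * c / (4 * L). Tube lengths are
# rough approximations, not measurements from Figure 5-1.

SPEED_OF_SOUND_CM_PER_S = 35_000  # approximate speed of sound in warm, moist air

def tube_resonances(length_cm, how_many=3):
    """Return the first few resonant frequencies (Hz) of a closed-open tube."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
            for n in range(1, how_many + 1)]

for label, length_cm in [("newborn, ~8 cm", 8.0), ("adult male, ~17.5 cm", 17.5)]:
    formatted = ", ".join(f"{f:.0f} Hz" for f in tube_resonances(length_cm))
    print(f"{label}: {formatted}")

# The 17.5 cm tube gives roughly 500, 1500, and 2500 Hz (the familiar
# neutral-vowel resonances); the much shorter newborn tube resonates at
# correspondingly higher frequencies.
```

The point of the comparison is simply that shortening the tube raises every resonant frequency in proportion, which is one reason infant vocalizations have much higher resonances than adult speech; shaping the tube with the articulators then shifts those resonances to produce different vowels.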

0 to 3 Months:  Perception and Comprehension In this chapter, the term “perception” refers to the ability to detect an auditory feature in an acoustic signal or to discriminate one acoustic signal from another. The focus is on acoustic signals and the auditory abilities required to hear and process them, because these are most relevant to preverbal speech and language skills. One can just as easily imagine cases in which perceptual skills for visual and tactile signals are important for communication. The combination of all perceptual skills plus cognitive processes (such as memory) must be considered in the meaning of “comprehension,” which is the ability to understand communicative intent and meaning.

As an example, it is entirely possible for a baby to have reasonable perceptual skills but poor comprehension, or to be able to comprehend well even with nonoptimal perceptual skills. This distinction between perception and comprehension is illustrated by the baby’s skills in the 0- to 3-month period. It has been known for many years that babies as young as 1 month of age can discriminate between very similar sounds (such as “p” and “b,” or “s” and “sh”) in much the same way as adults (Eimas, Siqueland, Jusczyk, & Vigorito, 1971).2 Scientists agree, however, that the ability of infants to comprehend speech, to extract meaning from communicative situations, is extremely limited. Babies’ abilities to discriminate subtle phonetic differences in much the same way as adults may reflect general auditory skills, rather than skills specific to speech perception. In other words, the baby’s detection of “p” and “b” as different auditory events is not relevant to “p” and “b” as linguistic events — that is, as phonemes. Some scientists believe the newborn auditory system3 is equipped for auditory distinctions just like the adult auditory system. In fact, very young babies can discriminate virtually all phonetic distinctions that are used in languages of the world (Vihman, 2017). The auditory capability for any distinction is available early in the first year of development. How do babies begin the process of learning the distinctions that are relevant to their native language? There are theories that address this issue. A useful theory must explain an additional phenomenon that unfolds during the first year of life: the almost universal set of phonetic distinctions in the infant’s auditory repertoire gets “pared down” to only the ones used in the native language. Vihman (2017) said that progress in phonetic perception is best defined as loss of the ability to discriminate contrasts that are not relevant in the native language (Kuhl et al., 2006). As the baby develops, auditory distinctions relevant to phoneme distinctions in the baby’s language are “tuned up” by exposure, whereas distinctions not relevant to the native language weaken and then disappear. The baby hears a huge amount of native contrasts, creating a special sensitivity for them. Over time, the nonnative contrasts cannot compete for the baby’s attention with native contrasts; the ability to discriminate the nonnative contrasts disappears.

2. The adult data on discrimination of phoneme contrasts were based on volitional responses (writing down the phoneme heard, or pushing a button labeled with the phoneme heard). Infants obviously cannot make the same kinds of responses, so Peter Eimas and his research group exploited a well-known baby skill — sucking on a nipple — to demonstrate the baby’s ability to discriminate between two sounds having just slightly different acoustic characteristics.

3. The auditory system includes all auditory structures from the external ear (the part attached to the side of the head) to the cortex of the cerebral hemispheres; see Chapter 22.

We are a little bit ahead of the chronology that organizes this chapter. The paring down to perceptual sensitivity for acoustic contrasts used in the native language and the disappearance of sensitivity to other acoustic contrasts is complete by 10 to 12 months of age. (The research by Segal, Hejli-Assi, and KishonRabin [2016] is an example of this kind.) The process begins, however, in the first three months with a goal at the end of the first year of life of maximal sensitivity to relevant phonetic contrasts — the establishment of phonemic categories. As outlined later, these categories are essential to word learning and word production as well. One apparent skill possessed by infants as young as 4 days of age that cannot be explained as part of general auditory mechanisms is the ability to distinguish utterances spoken in their native language from utterances spoken in a foreign language. Jacques Mehler and his colleagues (Mehler et al., 1988) demonstrated this ability in 4-day-old French infants who appeared to be sensitive to the difference between French and Russian utterances, and in 2-month-old American infants who gave evidence of hearing the difference between Italian and English utterances. Mehler and his colleagues believed the infants’ remarkable ability to distinguish the languages was based on knowledge of the prosodic characteristics of their own language, as compared to other languages. This knowledge may have been gained by native language exposure both in the womb and after birth. Nonnative prosodic patterns that do not match the native language may be detected as a “new” event. No matter how recent the exposure, babies appear to attend to and retain the melodic and rhythmic characteristics of the language used in their home. Scientists have argued that this ability to recognize the rhythmic and melodic characteristics of their native language is a foundation for infants’ developing ability to recognize words (Werker, 2012). The ability to detect the unique melody and rhythm of the native language rhythm “bootstraps” the extraction of words from the speech signal. Babies typically comprehend their first words around 6 months of age, using their knowledge of the rhythmic aspects of speech to isolate a word (Werker, 2012). Although babies between 0 and 3 months of age almost certainly have very limited comprehension of speech and language, other behaviors may lay the groundwork for future communication skills. For example, caregivers may interpret baby sounds and mutual eye gaze as having communicative intent. Based on this assumed communicative intent, the caregiver may engage in turn-taking behavior, exchanging vocalizations with the baby and using eye contact according to the “typical” rules of conversation. Even

in the absence of true comprehension, turn-taking may provide a model for the baby’s learning about communicative interaction.

3 to 8 Months:  Production Most babies do not produce true babbling until 6 or 7 months of age. What does “babbling” mean? The term is reserved for those vocalizations in which consonants and vowels are clearly recognized but do not form words. Early in the 3- to 8-month period, the coos and goos have a few vowels, but as noted above, the consonant-like sounds may have only a vague resemblance to the real thing. Prior to the onset of babbling, there is an expansion of the baby’s vocal repertoire that may be partially supported by an increasing ability to mimic vocal behavior. As the baby moves toward the first half-year of life, she is likely to produce squeals, growls, yells, and Bronx cheers (“raspberries”), all of which seem like vocal play, practice, and exploration. When the baby starts producing consonants, they are likely to be labials (p, b, m) and those made with the front of the tongue (t, d, n). “Back” consonants such as “g” and “k” typically come later as a regular feature of babbling. Babbling typically begins between 6 and 8 months of age but in many cases of typical development may not begin until 10 months or a bit later. Babbling has a specific form. Its basic unit is a consonant-vowel (hereafter, CV) syllable, where the consonant is likely to be a “b,” “m,” or “w,” and the vowel is an “eh” like in the word “bet” and “uh” as in “but,” or an “ih” as in “bit.” The syllable may be produced once or repeated in a sequence (“buh-buh-buh-buh”). When the same syllable is repeated in sequence, there is little variation in syllable-to-syllable duration, pitch, and loudness. These syllable sequences are called reduplicative babbling. Although the CV syllable is most frequently observed in early babbling, other forms (such as vowelconsonant [VC] syllables) may also be heard. Why do most early babble syllables have a CV form favoring bilabial consonants like “b” and only a few of the possible vowels? Several proposals have been set forth to account for this fact, but here we briefly describe one perspective, chosen for its carefully developed background and theoretical simplicity. Peter MacNeilage and Barbara Davis, in work done at the University of Texas at Austin, view babbling as the evolutionary product of the discovery by nonhuman primates, and ultimately early humans, of the sound-producing capabilities of the moving articulators (MacNeilage & Davis, 2000). Chewing is characterized by rhythmic up-and-down motions of the

mandible as the tongue and teeth position and grind food. At some point in early history, primates accidentally or purposely phonated (created sound by vibrating their vocal folds) during this rhythmic mandible movement and took notice of the modulation of the sound (try it: generate a steady voice and move your mandible up and down, see how the motion generates a repeating “syllable-like” effect). According to MacNeilage (1998), this is how the basic syllable was born. Presumably speech evolved in humans from this basic syllabic “frame.” In fact, MacNeilage and Davis refer to the mandible opening-closing movement as a frame, a vocal tract movement capable of holding the content of a syllable. The closing of the vocal tract is conducive to forming the kind of tight constriction that is characteristic of consonants, and the opening permits the full acoustic resonance typical of vowels. Content of the basic syllable is the specific identity of the consonant and vowel making up the syllable. From the simple “buh-buh-buh” or “bih-bih-bih” resulting from rhythmic motions of the mandible with the tongue resting passively within the mouth, the tongue hitching a ride on the jaw, so to speak, humans learned that different content (different speech sounds) could be inserted into the basic frame by changing the motion and position of the tongue, lips, and jaw. This gave evolving humans a wide range of sound combinations for the labeling of different objects, actions, and people. When the basic CV was expanded to include different syllable shapes (e.g., VC, CCV) and different sequences of syllable shapes (e.g., “baby,” an elaboration of “buh-buh”; “ice cream” as an elaboration of “ay-ay”), the ability to label objects, actions, and feelings became immensely flexible. Together with advances in brain structure and function, a vast collection of words was assembled, and phrases to sequence those words allowed even wider communication of meaning. The creation and evolution of sophisticated human language skills may possibly be traced back to the basic babble syllable. The “Frames-then-Content” theory of MacNeilage and Davis (2000) is appealing for several reasons. First, it is a theory not only about the evolutionary basis of speech sound development, but also about the developmental course of speech sound development in the first year of life. Prebabbling babies can often be observed to wag their mandibles without sound, and it is not unreasonable to imagine them adding sound to the wags and discovering the syllable-like results, like their early human predecessors. A “buh” or “bih” is a common first syllable in babbling, just as one might expect from a simple mandible wag with sound added and a relaxed tongue. Second, the theory derives the soundproducing abilities of humans from very basic movements of the lips, jaw, and tongue within the human
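As a toy illustration of the frame/content distinction (mine, not a model used by MacNeilage and Davis), the sketch below treats the frame as a repeated close-open (consonant-vowel) cycle and the content as whichever consonant and vowel fill each cycle.

```python
# Toy illustration only: the "frame" is the repeated close-open (CV) cycle,
# and the "content" is whichever consonant and vowel fill each cycle.

def reduplicated_babble(consonant="b", vowel="uh", cycles=4):
    """Return a reduplicated CV string such as 'buh-buh-buh-buh'."""
    return "-".join(consonant + vowel for _ in range(cycles))

print(reduplicated_babble())            # buh-buh-buh-buh (one frame, one content)
print(reduplicated_babble("d", "ih"))   # dih-dih-dih-dih (same frame, new content)
```

The point is purely structural: the repeated cycle stays the same while the consonant and vowel that fill it can change, which is how, in this view, babble and later syllables gain their variety.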

vocal tract. The motorically simple “frame” of rhythmic mandibular movements can, as motor capabilities of the articulators develop and mature, be “filled” with increasingly complex content. This content will include the motions, positions, and configurations of the tongue, lips, jaw, and other parts of the vocal tract required for the production of different consonants and vowels. Third, the theory has universal implications: If early babbling is indeed derived from simple motions of the mandible, at least the early sound content of babbling should be the same in all languages because all babies are using the same mechanism. This last point is important and invites the question, “Do babies from different language environments produce a uniform set of babbling sounds, or do they also produce sounds showing the unique influence of their native language?” For babies just beginning to babble, a firm answer to this question is unavailable because there are not enough relevant data from vari-

Phonetic Practice or Random Sound Play?

Before scientists began careful, detailed studies of babbling, it was thought to be no more than sound play. Babies discovered the sound-making capabilities of their little speech mechanisms: adjustments to the respiratory system (e.g., loudness change), larynx (e.g., pitch change), and articulators (speech sound change) resulted in a variety of speech sounds, some of them resembling those produced by adults and therefore very amenable to phonetic transcription. The sound play perspective on babbling implied that complete phonetic inventories for babies from all over the world would be the same — even in babbling from babies close to 1 year of age. In these inventories, most if not all of the possible phonetic events capable of being produced by the human speech mechanism would be found. There would be no language-sensitive influences on babbling, because the onset of true language was thought to be initiated by the special language acquisition mechanisms described in Chapter 3. We now know that babies do start this way but, as they progress toward the end of the first year of language learning, they sharpen and mold their phonetic inventories under the influence of their native language phonetics. Babbling is a well-practiced dress rehearsal for the opening night of language performance — the first words — rather than a cacophony of random sounds from performers without a script.


Some studies of early babbling suggest a phonetic inventory that is essentially uniform across several languages, with little evidence of the native-language phonetic inventory. More data are available on the preword phonetic inventories of later babblers whose native language is Swedish, Japanese, French, or English. These data show an influence of the native language on the phonetic inventory of babbling (Boysson-Bardies, Hallé, Sagart, & Durand, 1989; Boysson-Bardies & Vihman, 1991; Lee, Davis, & MacNeilage, 2010). A conclusion from this work is that babbling and its development is a kind of practice for the use of native language speech sounds in first words.

Babbling with a "true" CV form in which the syllable is repeated is called canonical babbling. D. Kimbrough Oller of the University of Memphis, in Tennessee, has been studying babbling for many years and believes the CV form is a basic structure of human sound systems (Lee, Jhang, Relyea, Chen, & Oller, 2018). A CV form, for example, is the most common syllable in many languages, and its frequent appearance in early (canonical) babbling is revealing of a basic structural characteristic of speech sound systems. A baby is said to produce canonical babbling when her previously difficult-to-transcribe sounds become recognized as "real" consonants and vowels in a CV form. Later babbling clearly shows influences from the phonetics of the parent language (as in a comparison between Mandarin Chinese and American English; see Lee et al., 2018). Other factors, such as parental interaction style, may also affect babbling onset and its content.

Canonical babbling is not just a cute characteristic of infants taking their speech mechanism for a test drive. There is evidence that the age at onset of babbling predicts the age of onset of first words, and that the specific phonetic content heard in babbling predicts the phonetic content of first words (McGillion et al., 2017). In other words, the practice of specific speech sounds during babbling predicts early words having the same speech sounds ("the consonants used in babble are typically the ones used in first words" [McGillion et al., 2017, pp. 157–158]). The age of onset of babbling also seems to predict the development of vocabulary at later childhood stages. Finally, canonical babbling is delayed or absent in several developmental speech disorders and may contribute to the diagnosis of certain conditions (such as in autism, or in children with intellectual disabilities [Lohmander, Holm, Eriksson, & Liberman, 2017]).


3 to 8 Months:  Perception and Comprehension

During the 3- to 8-month period, babies lose the ability to discriminate between selected sound pairs as described previously. Phoneme contrasts that are not used in the native language "drop out" of the infant's perceptual repertoire, and the auditory system is gradually tuned to those contrasts used in the native language. A good example of this is the difficulty with the "r"-"l" distinction experienced by native speakers of Japanese when they are listening to (and producing) English. The loss of the ability to discriminate /r/ from /l/ does not affect the Japanese baby's ability to master her language, because the "r" and "l" sounds do not create minimal pairs in Japanese.4 In English they do, as evidenced by word pairs such as long-wrong, light-right, hail-hair.

Toward the end of the 3- to 8-month period, another remarkable auditory-perceptual skill can be observed among infants. Around 6 to 7 months of age, infants show an ability to identify specific words within the ongoing stream of speech. The speech acoustic signal from connected speech does not show obvious boundaries between words, and indeed the continuous stream of sounds often makes the process of word identification difficult for digital speech recognition programs. Adults have little difficulty with this skill, but at some point infants must (and do) learn to perform segmentation of the speech signal to extract words from the continuous sequence of auditory speech events.

An accumulation of evidence over the past 20 years suggests that even though infants may not comprehend the language being spoken around them, they are paying attention to its structural properties. Language exposure in the first half-dozen months of life and beyond allows infants to build up a cognitive, statistical model of these structural properties. Presumably, a product of this cognitive model around 7 months of age is knowledge of the likely forms of words. This knowledge allows the baby to begin extracting words from the stream of speech. The learned skill of identifying words in the continuous stream of speech is probably the basis for an increase in comprehension of language toward the end of the 3- to 8-month period.
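As one illustration of the kind of statistical model such learning might involve, the sketch below (an illustration only, not a claim about this chapter's sources) computes transitional probabilities between syllables in a made-up stream built from three repeated nonsense "words." Within-word syllable transitions come out high, and transitions that span a word boundary come out low, which is one way researchers have modeled word segmentation in the absence of acoustic cues to word boundaries; the nonsense words and the 0.5 threshold are assumptions made for the example.

# Illustrative sketch only: segmenting a syllable stream with transitional probabilities,
# in the spirit of statistical-learning accounts of infant word segmentation.
# The three nonsense "words" (bidaku, padoti, golabu) and the 0.5 threshold are assumptions.

from collections import Counter

stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()

syllable_counts = Counter(stream)
pair_counts = Counter(zip(stream, stream[1:]))

def transitional_probability(current, nxt):
    # P(next syllable | current syllable), estimated from counts in the stream
    return pair_counts[(current, nxt)] / syllable_counts[current]

for current, nxt in zip(stream, stream[1:]):
    tp = transitional_probability(current, nxt)
    boundary = "  <- possible word boundary" if tp <= 0.5 else ""
    print(f"{current} -> {nxt}: {tp:.2f}{boundary}")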

4. Recall from Chapter 3 that a minimal pair contrast occurs when one sound substitutes for another in the same position in a word (such as initial consonants) and results in a change in word meaning.


Infants begin to show comprehension of words as they approach 8 months of age, but only within rich context. This means the word is accompanied by supporting gestures and intonation, and perhaps is spoken at a particular time of day (e.g., waking up, or at meals), all of which contribute to the infant's comprehension. For example, a Mom who leaves the house every morning with her baby may say the word "out?" while pointing to the door; the baby understands the word because of the rich context in which it is spoken. The same word spoken at a different time of day and without the supporting gesture may not be understood by the 8-month-old baby.

Werker (2012) describes a more specific example, relevant to the early comprehension of words, of how the baby's environment may shape the phonetic distinctions that come to have phonemic status in the native language. In her own words, "the cooccurrence of two phones [she means, phonetic events] with two different objects could help pull them apart, whereas the cooccurrence of two phones with a single object could help collapse the distinction" (Werker, 2012, p. 55).

What does Werker mean? Let's imagine that a baby who is learning American English hears the words "lap" and "rap" in her environment. ("Come sit in Mommy's lap" as Mom lifts the baby into her lap, and, "Mommy loves rap" as she turns up a tune in the car; Mommy is old school.) Baby knows, of course, that Mommy's lap is a place she loves to be, and understands that Mommy loves music with heavy beats and a steady stream of human speech. The baby hears many versions of both words, because Mommy takes her in her lap a lot, asking her if she wants to sit there, and listens to a lot of rap in the car and is always telling baby the name of the music style. Baby develops the idea of the sounds "l" and "r" being associated with different meanings because, clearly, Mommy's lap and the music she calls "rap" are different things.

This linkage between sound differences on the one hand, and meaning differences on the other hand, "hardens" the "l" versus "r" sound contrast as a categorical difference — that is, the sound difference is phonemic, functioning to distinguish words.

A Japanese baby may have a stuffed bunny rabbit, which her parents call "rini" (Japanese word for "little bunny"). The Japanese "r" is sometimes described as being articulated between an English "r" and "l," and Mom and Dad may vary the way they say the "r" at the beginning of "rini," but the word is always spoken in the context of the stuffed bunny. Baby hears the variations in the Japanese "r" sound and in so doing learns to dismiss the phonetic differences, because the variation between "r"-like and "l"-like phonetic sounds is not tied to meaning differences (in this case, object differences). This is what Werker means by a phonetic distinction being "collapsed" — the variants of the sound are not critical to signaling differences in meaning. In Japanese, they are phonetic variations of the same phoneme category.

Werker's (2012) idea about how phonetic differences may or may not become phonemic differences, and how a child's learning of the sound system of his or her language is a pathway to word learning, is interpreted in Figure 5–2. The process of learning sound distinctions and their relationship to word learning is imagined in this figure as a circular, interactive process. The left side of Figure 5–2 depicts the process of learning the "l"-"r" distinction for babies whose native language is English. Different speech sounds are heard and paired with different objects or actions. The linkage of the phonetic variation with different objects suggests "l" and "r" as different categories, contributing

Figure 5–2.  The learning of sounds as different phonemes (left circle: different speech sounds heard with different objects or actions lead to different phonemes and, ultimately, to words established) or as variants of the same phoneme (right circle: different speech sounds heard with the same object or action lead to one phoneme and to words established). The learning of phonemes "primes" the baby to be sensitive to words with the same sounds, and to develop early comprehension of words beginning with these sounds. Based on Werker, J. (2012). Perceptual foundations of bilingual acquisition in infancy. Annals of the New York Academy of Sciences, 1251, 50–61.


to the learning of unique vocabulary items beginning with these different sounds ("Words Established" in Figure 5–2). The circular nature of the process represents the establishment of word items beginning with "l" and "r," and identifies these sounds as good candidates for the learning of other word items beginning with the same sounds. The newly learned "r"-"l" contrast primes the baby to be on the lookout for new words with these initial sounds. The process is interactive because it is not simply the accumulation of massive amounts of speech acoustic data, as happens when a baby is exposed to so much human speech, but rather the organization of these sound data for meaning by environmental data — people, objects, and actions.

The right side of Figure 5–2 shows the same process for the baby learning Japanese. As in the case of the English-learning baby, the Japanese baby is exposed to phonetic variations that on close inspection by a trained phonetician seem to be sometimes "r"-like, sometimes "l"-like. In Japanese, these phonetic variations are applied to the same object or action, as if the single object/thing can be represented by the "r"-like or "l"-like variant. Unlike the case of English, the phonetic variants are not paired with different objects but are "mapped" onto the same object. When the child is exposed to multiple instances of the sound variations applied to a single object, and many other objects, names, and actions, she "collapses" the phonetic variations into a single phonemic category. New words can be established within this category, but the phonetic variants do not prime the child for "r"-like versus "l"-like words.
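A toy way to make the split-versus-collapse idea concrete is to track which objects each phone variant has been heard with. The sketch below is an illustration only (it is not Werker's model); the "l"/"r" inputs for the hypothetical English-learning and Japanese-learning babies are invented to mirror the lap/rap and "rini" examples above.

# Illustrative sketch (not a model from the chapter): deciding whether two phone variants
# behave like different phonemes or like variants of one phoneme, based on whether they
# co-occur with different objects or with the same object.

from collections import defaultdict

def objects_heard_with(phone_object_pairs):
    """Map each phone variant to the set of objects it has been heard with."""
    heard = defaultdict(set)
    for phone, obj in phone_object_pairs:
        heard[phone].add(obj)
    return heard

# Hypothetical English input: "l" and "r" variants are heard with different things.
english_input = [("l", "lap"), ("r", "rap"), ("l", "lap"), ("r", "rap")]

# Hypothetical Japanese input: "l"-like and "r"-like variants both label the same toy.
japanese_input = [("l", "rini"), ("r", "rini"), ("l", "rini"), ("r", "rini")]

for language, data in [("English", english_input), ("Japanese", japanese_input)]:
    heard = objects_heard_with(data)
    # If the variants never share an object, treat them as separate phoneme categories.
    if heard["l"].isdisjoint(heard["r"]):
        print(f"{language}: 'l' and 'r' pattern like two phonemes")
    else:
        print(f"{language}: 'l' and 'r' pattern like variants of one phoneme (collapsed)")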

8 to 12 Months:  Production

Canonical babbling may continue for several months after its onset, certainly into the last part of the first year of life. In some children, this most simple type of babbling is followed by continued use of the repeated frame, but with varied content across the syllables. Now, along with (or instead of) "buh-buh-buh-buh," the baby may produce "bah-dee-goo-gae," or any number of other CV syllable combinations. This is called variegated babbling in recognition of its varied content. Variegated babbling retains certain features of canonical babbling, namely, a "metered" syllable sequence, as if every syllable has the same duration, and a "flat" melody, as if produced on a single pitch.


If you listen to children or adults engaged in conversation, you hear their voice pitch change frequently, rising across certain groups of syllables and falling across others. Pitch changes in speech are called intonation, and most people can recognize "normal intonation" even if they cannot define it explicitly. Unusual intonation, whether the monotone speech of certain individuals who are sad or depressed or the wild pitch changes of a game-show host after too many cups of coffee, is also easily recognized.

Adults are sensitive to small changes in intonation — they recognize something momentous when a baby begins to produce variegated babbling with intonation. It is as if the baby is trying to mimic the sound of normal conversation but does not have the words to convey meaning. This stage of babbling is called "jargon" because the child's utterances sound like "real" speech even if they consist of strings of nonsense syllables with varied sound content. When children produce jargon, usually close to a year of age, parents can expect a first, "real" word at any time.

Transitional forms between babbling and real words, called protowords (also called phonetically consistent forms), may also be heard. These are syllable-sized utterances (usually more complex than canonical syllables) produced by the child in a consistent way, with a recognizable referent, apparently meant to convey meaning. Protowords, however, are not part of the adult lexicon — they are not real words. For example, one of the author's nephews consistently said "bahmp" to refer to bread, either to identify it for the delight of his easily delighted audience or to request a piece for eating or shredding. The consistent use of "bahmp" to refer to bread qualifies it as a protoword because "bahmp" is not part of the English lexicon.5 Protowords are often mixed in with jargon, and their use is another signal that "real" words are just around the corner.

8 to 12 Months:  Perception and Comprehension

In the 8- to 12-month period, infants continue to tune their ability to discriminate speech sounds of their native language. The corollary to this is the loss of discrimination ability for nonnative sound contrasts. Scientists believe this "sculpting" of speech perceptual skills is a result of continued exposure to the language. The infant's cognitive model of the structural

5. It could be part of the English lexicon, if users of the language would agree on its meaning. But at the current time, if you approached 50 people on the street and asked them what "bahmp" means, most would probably shrug their shoulders and say they do not know. All of those people, however, would recognize that the form of this protoword — a consonant-vowel-nasal-consonant (CVNC) form — is an allowable sequence of sounds for the formation of English words (see the discussion of phonotactics in Chapter 3).


properties of the language includes not only prosodic form for words (e.g., the relationship between stressed and unstressed syllables) but also important segmental (sound-level) properties. As infants approach 1 year of age, they begin to comprehend more language, including an increasing number of words and, apparently, more complex sentences. Before and after the first birthday, children typically comprehend more words than they produce. Comprehension of complex sentences in this age period requires lots of supporting context, as described previously for word comprehension by children around 8 months of age.

As the first year of life is completed, infants begin to show comprehension of some paralinguistic aspects of communication. The term "paralinguistic" is used here to denote nonsegmental (not associated with speech sounds and their sequencing to form words) aspects of voice and speech that nevertheless convey meaning. Often, this meaning concerns the mood, intent, or state of mind of the speaker. For example, rising pitch across an utterance signals a question, as in the difference between "He's here?" versus "He's here" (for wh- questions, the rising pitch is not necessary).6 Another example is loudness of the voice, which can signal a range of speaker moods and intents. "Bed time" means something very different spoken loudly as compared to a soft, gentle version of the utterance. Paralinguistic aspects of language are complex (ask anyone who has aspired to be an actor), but by the end of the first year, infants are beginning to comprehend their meaning.

Gesture and Preverbal Language Development

People gesture when they speak; this is hardly a surprise. Perhaps a less obvious aspect of gesture is its integral role in communication. Adults, and children who are past their first words and on their way to two-word utterances, coordinate gesture with oral speech. Gestures play more than a supporting role in conveying meaning. Indeed, gesture may accomplish communicative goals not easily conveyed by oral speech, such as representation of the shape, size, and orientation of an object that is being discussed (Brentari & Goldin-Meadow, 2017; Goldin-Meadow, 2017).

Gesture plays an important role in preverbal language skills as well. Many gestures are initiated by parents in their interactions with the baby. The gestures accompany recurring actions ("where is it?"; "more"; "all gone!"), emotional states (smiling, surprise, sad face), and representation of object properties, as noted just previously.

As the baby enters the fourth month and becomes a more active communication partner, pointing becomes an important gestural component of communication interactions. Parents connect spoken words to people, objects, pictures, and actions by pointing at them. The baby hears and sees this coordinated communication act many times, even for the same person or object, and learns the idea of a word (the spoken label) as well as the potential utility of her own pointing gestures to request a spoken label. Pointing therefore serves as a builder of comprehension vocabulary in early babyhood, throughout the first year of life, and in the months after as well.

Pointing also serves to establish what is called joint attention. Around 6 months of age, babies will follow a point and look at the object or person to which (or whom) the point is directed (Rohlfing, Grimminger, & Lüke, 2017). The point joins the attention of both parent and child to the object, or to the person, or in more advanced cases to an action such as "running" ("Look, the boy is running"). The emergence of pointing is a landmark stage in preverbal language skills. Early pointing may even predict the speed and sophistication of language development in the first few years of life. Conversely, some scientists believe delayed pointing or its absence can be predictive of delay or disorder in language development (Lüke, Ritterfeld, Grimminger, Liszkowski, & Rohlfing, 2017).

Chapter Summary

Throughout the first year of life, babies learn a range of linguistic and general knowledge skills that serve as the foundation for language skills. The skills include preparation for producing and understanding language. Three chronological age periods, 0 to 3 months, 3 to 8 months, and 8 to 12 months, are presented as age ranges during which preverbal skills are developed; the age ranges are somewhat arbitrary because of the large chronological variability in children's mastery of these skills.

6. The distinction between the intonation of statements (declarative utterances) and questions may be disappearing to some (or a large) degree. In young people, there is a growing tendency to produce statements with a rising pitch. This is called "Uptalk" and seems to be more common in young women compared with young men. Some writers trace this trend to the early 1990s. The author believes a catalyst for the widespread use of this conversation style is the dialogue in the film "Clueless" (1995). For more on Uptalk, see Tyler (2015).


In the 0- to 3-month period, babies are able to distinguish closely related speech sounds but do not truly comprehend language; during this period, their sound production is dominated by vocalizations that do not convey intentional meaning.

In the 3- to 8-month period, babies begin to comprehend simple aspects of language, but a rich context is needed to support comprehension. Comprehension skills improve throughout the 3- to 8-month period, but a rich context remains an important component of language understanding. Early in the 3- to 8-month period, sound production is characterized by coos and goos, and toward the end of the period babbling emerges. Canonical babbling, consisting of repeated consonant-vowel (CV) syllables, is the first syllable type produced by babies, followed by variegated and then jargon babbling.

In the 8- to 12-month period, children begin to understand more complex utterances and have especially good comprehension skills with rich context. Production skills in the 8- to 12-month period may include protowords, also called phonetically consistent forms, which are not "real" words but are used consistently to refer to a specific toy, pet, parent, and other objects/people. First words are usually produced around 1 year of age.

A good deal of theoretical controversy surrounds how and why babies develop language skills throughout the first year of life.

References

Boysson-Bardies, B., Hallé, P., Sagart, L., & Durand, C. (1989). A cross-linguistic investigation of vowel formants in babbling. Journal of Child Language, 16, 1–17.
Boysson-Bardies, B., & Vihman, M. M. (1991). Adaptation to language: Evidence from babbling and first words in four languages. Language, 67, 297–319.
Brentari, D., & Goldin-Meadow, S. (2017). Language emergence. Annual Review of Linguistics, 3, 363–388.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception in infants. Science, 171, 303–306.
Goldin-Meadow, S. (2017). What the hands can tell us about language emergence. Psychonomic Bulletin and Review, 24, 213–218.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science, 9, F13–F21.
LaGasse, L. L., Neal, A. R., & Lester, B. M. (2005). Assessment of infant cry: Acoustic cry analysis and parental perception. Mental Retardation and Developmental Disabilities Research Reviews, 11, 83–93.


Lee, C.-C., Jhang, Y., Relyea, G., Chen, L.-M., & Oller, D. K. (2018). Babbling development as seen in canonical babbling ratios: A naturalistic evaluation of all-day recordings. Infant Behavior and Development, 50, 140–153.
Lee, S. A. S., Davis, B., & MacNeilage, P. (2010). Universal production patterns and ambient language influences in babbling: A cross-linguistic study of Korean- and English-learning infants. Journal of Child Language, 37, 293–318.
Lohmander, A., Holm, K., Eriksson, S., & Liberman, M. (2017). Observation method identifies that a lack of canonical babbling can indicate future speech and language problems. Acta Paediatrica, 106, 935–943.
Lüke, C., Ritterfeld, U., Grimminger, A., Liszkowski, U., & Rohlfing, K. J. (2017). Development of pointing gestures in children with typical and delayed language acquisition. Journal of Speech, Language, and Hearing Research, 60, 3185–3197.
MacNeilage, P. F. (1998). The frame-content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499–511.
MacNeilage, P. F., & Davis, B. L. (2000). On the origin of internal structure of word forms. Science, 288, 527–531.
McGillion, M., Herbert, J. S., Pine, J., Vihman, M., dePaolis, R., Keren-Portnoy, T., & Matthews, D. (2017). What paves the way to conventional language? The predictive value of babble, pointing, and socioeconomic status. Child Development, 88, 156–166.
Mehler, J., Jusczyk, P., Lambertz, G., Halstead, N., Bertoncini, J., & Amiel-Tyson, C. (1988). A precursor of language acquisition in young infants. Cognition, 29, 144–178.
Menard, L., Schwartz, J.-L., & Boë, L.-J. (2004). Role of vocal tract morphology in speech development: Perceptual targets and sensorimotor maps for synthesized French vowels from birth to adulthood. Journal of Speech, Language, and Hearing Research, 47, 1059–1080.
Oller, D. K. (1980). The emergence of the sounds of speech in infancy. Child Phonology, 1, 93–112.
Rohlfing, K. J., Grimminger, A., & Lüke, C. (2017). An interactive view on the development of deictic pointing in infancy. Frontiers in Psychology, 8, 1319. doi:10.3389/fpsyg.2017.01319
Segal, O., Hejli-Assi, S., & Kishon-Rabin, L. (2016). The effect of listening experience on the discrimination of /ba/ and /pa/ in Hebrew-learning and Arabic-learning infants. Infant Behavior and Development, 42, 86–99.
Tyler, J. C. (2015). Expanding and mapping the indexical field: Rising pitch, the Uptalk stereotype, and perceptual variation. Journal of English Linguistics, 43, 284–310.
Vihman, M. M. (2017). Learning words and learning sounds: Advances in language development. British Journal of Psychology, 108, 1–27.
Vorperian, H. K., Kent, R. D., Lindstrom, M. J., Kalina, C. M., Gentry, L. R., & Yandell, B. S. (2005). Development of vocal tract length during early childhood: A magnetic resonance imaging study. Journal of the Acoustical Society of America, 117, 338–350.
Werker, J. (2012). Perceptual foundations of bilingual acquisition in infancy. Annals of the New York Academy of Sciences, 1251, 50–61.

6  Typical Language Development

Introduction

Typical (normal) language development is variable in children. Some children develop language skills early, some late, but most typically developing children attain approximately equivalent levels of language skill by the age of 5 or 6 years. Like preverbal language skills, the sequence and milestones of normal language development described in this chapter are appropriate for the "average" child. Some may argue that an "average" child does not exist, and that variability in language development among typically developing children can be tied to environmental and cultural variables. For example, an infant who has been spoken to a great deal by primary caretakers may emerge as a child who develops language very quickly. Substantial language stimulation, however, may be culture specific; in some cultures, adults do not direct much speech to children. A child reared in such an environment, or in any environment with relatively infrequent language stimulation, may have fewer words at 18 months compared with an 18-month-old child who is spoken to a lot, but the relatively small spoken vocabulary must be evaluated relative to the prevailing environmental/cultural influences. This child's small vocabulary may reflect a substantial influence of his culture. A good summary of potential environmental/cultural influences on language development is given by Roseberry-McKibben (2007, pp. 47–49).

The social setting in which a child matures, her perceptual, cognitive, and conceptual skills, plus linguistic factors, all have the potential to influence language development (Johnston, 2006). These factors probably interact in different ways for different children, accounting in part for the wide variability in language development milestones observed across children. The reader should keep in mind these potential influences on a child's language development.

Table 6–1 presents a summary of language development sequences and milestones for the typically developing child. This table includes information covered in Chapter 5, summarizing how infants move through coos, goos, and babbling stages as they approach the end of the first year and the production of their first word. Between 12 and 18 months, toddlers add 7 to 11 words per month to their vocabulary, for a total of about 50 production words at age one and a half years. These words are most often spoken as single-word utterances, as if the child does not have the concept of a sentence. At this point in development, the comprehension vocabulary is typically larger than the production vocabulary. Around 18 to 24 months, toddlers experience a vocabulary "spurt," adding a huge number of words to their vocabulary and beginning to combine them into simple two-word utterances. Between 2 and 3 years, vocabulary continues to grow, utterances become longer, and children begin to use bound morphemes in an appropriate way.


Table 6–1.  Summary of Stages (by Age Range) of Typical Language Development (Age Range: Language Skills)

Newborn: May be able to discriminate certain phonemes; may attend to specific voices
1–3 months: Coos, goos, probably not used intentionally for communication purposes
3–6 months: Onset of canonical babbling toward later end of age period
6–8 months: Variegated babbling toward end of period, comprehension of words in rich contexts
8–12 months: Jargon, consistent phonetic forms (protowords), first word toward end of first year, comprehension of sentences in context
12–18 months: New words gained until a total of about 50 at the end of this period; more words comprehended than produced
18–24 months: Vocabulary "spurt" as a result of naming insight; two-word utterances toward the end of the period, vocabulary of about 200–300 words, many more words comprehended and sentences understood in rich context
2–3 years: Usage of grammatical morphemes, longer utterances (three-word, possibly four-word utterances), typical mean length of utterance (MLU) around 1.5–2.5; comprehension excellent for simple sentences, more complex sentences understood with context and in familiar settings
3–4 years: Vocabulary growth, continued mastery of grammatical morphology, MLU ~3–4
4–5 years: More complex sentences, developing pragmatic skills for conversation and understanding of more complex sentences; MLU ~3.5–4.7; relative clauses, coordination, passive forms; metalinguistic skills begin to develop
5–6 years: Mastery of grammatical morphemes complete, continued development of conversational and narrative skills
7 years and beyond: Expanding vocabulary, production and comprehension of complex sentence forms, various metalinguistic skills improve into college-age years

During this period, children begin to learn conversational and narrative skills as well, developing the pragmatic skills for effective communication in real-life situations. During the fourth year, vocabulary continues to increase along with longer and more complex sentences. By age 5 years, most typically developing children are beginning to recognize that language has a structure built from individual components (e.g., children begin to recognize that words consist of individual sounds). Past age 5 years, language gains include expanding vocabulary, comprehension and production of increasingly complex sentences, and language subtleties such as understanding and telling jokes, social

use of language ("pragmatics," e.g., turn-taking in conversation and understanding nonverbal cues to communication such as facial expression and gesture). The development of reading skills is likely to affect oral language, both comprehension and expression (production), and the development of oral language skills is likely to affect reading skills. Mastery of high-level vocabulary skills and sentence-level utterances continues into college-age years and beyond. Throughout these developmental stages, language comprehension, including morphology, syntax, vocabulary, and pragmatics, is likely to be more sophisticated compared with language expression (production).


12 to 18 Months

Between the ages of 12 and 18 months, a typical child learns to produce about 50 words. These 50 words are mostly nouns, including specific names for people ("Mama," "Dada," "Muffy"), body part names, food names, and names of other familiar objects (e.g., "book," "cup"). The nouns included among the first 50 words tend to be very concrete — abstraction is not prominent in the toddler vocabulary — and are used frequently by both caretakers and the child. A smaller number of the first 50 words are verbs such as "run," "walk," and "play." There may also be a few adjectives such as "allgone," "dirty," and "cold," and a few greeting words ("bye-bye") and function words (e.g., "where," "for"). Many of these observations about the first 50 words are described by Nelson (1973, 1981).

As in preverbal language development, comprehension outpaces production in the 12- to 18-month period. If the relationship between comprehension and production is measured in the number of words recognized versus spoken, an 18-month-old toddler understands many more words than she produces. Children in this age range may also appear to understand complex sentences, but they are probably using context and familiarity to comprehend these utterances. With context eliminated, as in a laboratory experiment, sentence understanding is probably limited to fairly simple utterances.

Much has been made of the phenomena of overextension and underextension in the child's first 50 words. When a toddler overextends the meaning of a word, the semantic category is too broad. For example, the toddler who refers to all four-legged animals as "doggie" is generalizing from experience with the family dog. Similarly, if Daddy has a beard, all men with beards become worthy of the name. When overextension is viewed within the framework of linguistic category formation, the establishment of different and distinctive semantic categories is still at a very early stage, as are the mental representations for different four-legged animals. There is an ongoing process of refinement of these categories as the child is exposed to more linguistic data provided by parents, other adults, and video.

The opposite phenomenon is underextension, in which a single example of an object-word link becomes sufficient to define an entire semantic category. Muffy, the family dog, becomes the only dog in the world. When Mom points out Spike the slobbering bulldog to her 16-month-old and asks, "What's that?," the baby does not respond because Spike, or any dog other than Muffy, has not yet contributed a valid representative


to the doggie category. At this stage of language learning only Muffy qualifies as a doggie. For the child who underextends a word such as this, the link between object and meaning is entirely specific to the context in which the object is named — the familiar four-legged animal in the baby’s house, whom she has known forever as a doggie.

18 to 24 Months

As the end of the second year is approached, something interesting happens to change the rate at which toddlers add new words to their spoken vocabulary. Up to this point, children have added roughly 7 to 11 new words per month, largely by constant repetition by caregivers of object-word links. Mom or Dad has pointed to the dog many times and said the word "doggie" before the toddler produces the word. Sometime late in the second year, toddlers have a "naming insight," which allows them to do fast mapping of spoken words to objects, actions, descriptions, and so forth. It is as if the child suddenly realizes, "I get it, when something is pointed to and Mom uses a word at the same time, that must be the name of the object!" Now the child appears to need only a single instance of object-name or action-name pairing, and the word enters the spoken vocabulary.

When the child has this naming insight, he or she experiences a large spurt in spoken vocabulary. The child's newly found skill of fast mapping is likely to result in 20 to 40 new words per month. By 24 months, this vocabulary spurt results in a vocabulary size of 200 to 300 words.

Toward the end of the second year, language skills gain in sophistication, possibly as a result of the rapid increase in vocabulary. To this point, children communicate by producing single words, but now two-word utterances are heard for the first time. Some scientists believe the vocabulary must reach a relatively large size (for a toddler, at least) before the two-word utterance stage is entered. In this view, a critical vocabulary size "bootstraps" the grammatical step to two-word utterances. Additional detail on multiword utterances is provided later in the section entitled "Multiword Utterances, Grammatical Morphology."

At 2 years of age, language comprehension skills continue to improve, sharing many of the characteristics of earlier stages of language development. Parents often overestimate their toddler's comprehension abilities, when comprehension is defined as the ability to understand utterances in the absence of context. Children use their growing fund of world knowledge, however, to comprehend the meaning of relatively


complex utterances, and of course, comprehension always includes context.1

Three Years (36 Months)

By 3 years of age, an average, typically developing child has a vocabulary size between 600 and 2,000 words. The range is so large because "typical" vocabulary size depends on many factors, including the toddler's history and primary environment (socioeconomic status, caregiver style of communication, and so forth), current toys (and the range of words associated with them), exposure to television, day care setting, and even the way in which it is decided that the child "has" words (such as parent report, or specific experimental procedures). At 3 years of age, the largest proportion of a toddler's words is still nouns.

There is another reason to be cautious about identifying a "typical" number of words in a toddler's vocabulary (lexicon) at a given age (the same caution applies to the "typical" course of almost any aspect of language development). Most of the information we have concerning language development has come from studies of English-speaking children. A recent but growing body of knowledge is demonstrating how the facts of language development differ across languages and even across dialects within a language. Elin Thordardottir, a multilingual language scientist from McGill University in Quebec, Canada, showed that 3-year-old children learning Quebec French have significantly smaller vocabulary sizes than children learning Canadian English (Thordardottir, 2005). Clearly, the French-speaking children should not be regarded as having delayed vocabulary development relative to English-speaking children. Rather, the difference in vocabulary size reflects differences between the structure of the two languages, cultural differences, or some complex combination of the two factors. Other examples of different vocabulary development in different languages can also be found in the research literature. Throughout the remainder of this chapter, the reader should keep in mind the potential cross-linguistic differences in language development.

Multiword Utterances, Grammatical Morphology

Beginning around 2 years of age, two notable and related aspects of language development are the use of grammatical morphology and the production of multiword utterances. Between the ages of 18 and 24 months, typically developing children make first attempts at combining two words to enhance their communication skills. Early attempts may not be two "real" words, but rather an approximation to a multiword utterance supported by combining words with gestures. Bates and her colleagues showed that toddlers often paired gestures with words to create early, "two-word" propositions (Bates, 1980; Bates, Thal, Whitesell, & Fenson, 1989). For example, a child may say "Daddy" and point to a chair, meaning "Daddy sit." In this case, the communicative act is equivalent to two spoken words. Similarly, some children may mix jargon (see Chapter 5) with real words, creating a two-"word" utterance in which only one of the words is recognizable to an adult. In these examples, the child's use of two units of meaning suggests an awareness of the potential to combine words for communicative purposes. Both examples include one "unit" that is not a "real" word — a gesture in one case, a spoken nonword in the other. The transition from one- to two-word utterances may therefore not be clear-cut. Children may produce something in between, indicating they have the idea of two-word utterances even if there are not actually two well-formed words.

When children begin to produce two-word utterances in which both words are "real," the utterances have certain "baby-ish" characteristics. The utterances have only content words and lack articles, prepositions, and grammatical morphemes. For example, the child who wants to indicate to Daddy that the dog has possession of a toy says "Muffy toy," not "Muffy's toy"; the grammatical morpheme for possession is omitted (see later in this chapter).

A good deal of effort has been devoted to identifying the regularities and rules of toddlers' two-word utterances. There are several different ways to review this information; we have chosen to present Brown's

1. There are two points to be made here. First, an experimenter can estimate a child's "pure" comprehension abilities by presenting utterances isolated from any context to obtain responses that show the presence or absence of understanding. Results may not be particularly revealing of "typical" language usage but can contribute to models and theories of language competence across development. It is important to understand that a child's difficulty in comprehending a complex utterance in a laboratory study does not mean the child cannot understand the same utterance when spoken in a familiar situation. Second, certain developmental language delays may involve an inability or deficit in using world knowledge to comprehend utterances that are beyond current "pure" comprehension abilities. Scientists may want to separate the two sources of comprehension ability ("pure" comprehension of language from use of world knowledge to assist comprehension) to better estimate the relative effects in certain disorders.


(1973) perspective on this stage of utterance development. Brown interpreted two-word utterances as expressions of broader semantic relations. In other words, toddlers organize these simple utterances in terms of larger, relational categories that serve as “frames” for many different possible utterances. Table 6–2 presents some of the semantic relations proposed by Brown, with examples of how different utterances “fit” the frames. Also included are possible adult versions of the toddler utterances. “Agents” are people, animals, action figures, and so forth — any individual who can cause something to happen or stand in relation to an object. “Actions” are typically verbs, indicating action (e.g., eat, cry), occurrences (e.g., shine, rain), or states of being (e.g., happy, sad). “Objects” have an obvious definition. “Locatives” are words that indicate locations and directions, as in “Muffy down.” When children use locatives, it suggests they know about objects or agents in different locations (e.g., “Muffy bed” versus “Muffy out”). Note how Muffy falls into different semantic relations depending on her status. When she is doing something, such as eating, she is an agent of the action; when she is located on the bed (“Muffy bed”), she is an entity in a specific location. Other semantic categories are also used as frames for multiword utterances. For example,

"demonstratives" is a semantic category for specifying something that may be ambiguous ("That one," not the one you're pointing to, Dad, wake up!). "Possessor and possession" are important categories for the toddler who views connections between agents or objects and their owners as critical to an orderly world ("Mommy sock" is not so critical, of course, as "Daddy TV"; a sock is a sock, but a big-screen TV . . . ). The combinations of semantic categories, some of which are listed as semantic relations in Table 6–2, give the toddler a great deal of flexibility in creating a variety of utterances. Words can be "slotted" into these frames, providing the toddler a way to communicate about important agents, actions, ownership, and locations in her environment. The semantic relations listed in Table 6–2 also provide the beginning of a grammar for the toddler, amounting to a set of simple rules for how words can be combined to produce meaningful, effective utterances.

The semantic categories proposed by Brown can be combined to create longer but still incomplete utterances. Between 2 and 3 years of age, for example, the toddler may generate utterances such as "Daddy watch TV" and "Muffy eat shoes," using the agent+action+object framework, or "Daddy watch TV den" or "Muffy eat shoes den" (ruining Daddy's expensive loafers and therefore his big-screen TV experience), using the agent+action+object+locative framework. None of these utterances are adult-like, but they show increasingly sophisticated language skills as more complex relations are expressed by fitting words into the semantic relations categories shown in Table 6–2.

Table 6–2.  Brown's Semantic Relations as "Frames" for Two-Word Utterances (toddler examples, with how an adult would say it in parentheses)

Agent + Action: "Muffy eat" (Muffy is eating her food); "Jenny cry" (Jenny is crying); "Daddy yell" (Daddy is yelling at Bobby)
Agent + Object: "Man hat" (The man is wearing a hat); "Jenny dress" (Jenny has a dress); "Mommy treat" (Mommy has a treat)
Action + Object: "Kick ball" (I just kicked the ball); "Eat popsicle" (I want to eat this popsicle); "Drive car" (Mommy is driving the car)
Action + Locative: "Go park" (We're going to the park); "Fly up" (The bird flew into the tree)
Entity + Locative: "Muffy down" (Muffy is downstairs); "Car there" (The car is over there)
Demonstrative + Entity: "That car!" (That other car, not this one)
Possessor + Possession: "Mommy sock" (This sock belongs to Mommy); "Daddy TV" (Daddy has a big-screen TV)


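To show how a few of the semantic-relation "frames" in Table 6–2 can generate many toddler-like utterances by filling word "slots," here is a small illustrative sketch; the frame templates and the toy vocabulary are assumptions made for the example, not part of Brown's analysis.

# Illustrative sketch: semantic relations as "frames" whose slots are filled with words.
# The frames and vocabulary below are toy examples chosen to echo the utterances in the text.

frames = [
    ("agent + action", "{agent} {action}"),
    ("agent + action + object", "{agent} {action} {object}"),
    ("agent + action + object + locative", "{agent} {action} {object} {locative}"),
]

word_sets = [
    {"agent": "Muffy", "action": "eat", "object": "shoes", "locative": "den"},
    {"agent": "Daddy", "action": "watch", "object": "TV", "locative": "den"},
]

# Slotting different words into the same frames yields a variety of toddler-like utterances.
for words in word_sets:
    for name, template in frames:
        print(f"{name}: {template.format(**words)}")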

Expanding Utterance Length:  A Measure of Linguistic Sophistication

After typically developing children master two-word utterances, they extend utterance lengths in a systematic way. Utterance length has been a prime focus of scientists interested in language development. As we will see, consideration of how utterance length changes as a child develops must include an account of grammatical morphology and how it is mastered.

In a 1973 book, Roger Brown (1925–1997) reported a detailed analysis of the early language development of three children, whom he christened Adam, Sarah, and Eve (Brown, 1973). The observations were longitudinal, meaning that for each of the three children, language samples were collected and analyzed over time, as the children developed their skills. In addition, the language samples analyzed for Adam, Sarah, and Eve were taken from spontaneous utterances produced in conversation with their mothers (and occasionally other caretakers). Brown was interested in an account of genuine, functional language development, rather than the kind of data collected in a highly structured experiment. Eve's language was first sampled when she was 18 months old, and Adam's and Sarah's when they were 27 months old. Importantly, all three children had roughly the same average length of utterance when the longitudinal observations began. In this sense, at the beginning of the longitudinal observations, the three children were more or less at equivalent stages for the complexity of their multiword utterances.

Mean Length of Utterance (MLU)

The idea of using the average (mean) length of a child's utterances as a measure of language sophistication was one of the conclusions of Brown's study. In fact, the measure he called mean length of utterance (MLU) has become an "industry standard" as an index of language sophistication. The computation of MLU is straightforward, even though the process of collecting and analyzing usable data demands great care and patience.

First, an adequate sample of spontaneous speech must be obtained from a child, perhaps while the child is playing with a parent. The utterances produced during this interaction are recorded, along with Mom's part of the conversation. The number of morphemes in each child utterance is then counted, and the sum of morphemes across all utterances is divided by the number of utterances.

Table 6–3 shows a simple example of the computation of MLU. Ten utterances from a 3-year-old child are shown along with Mom's part of the conversation; the number of morphemes for each child utterance is given in parentheses. Some utterances are short (e.g., "Throw there," "In tree!," both with two morphemes), and two are longer ("Daddy's big TV broked," six morphemes; "Daddy don't got football," five morphemes).2 This kind of variability in utterance length for a given child at a given point in language development is not unusual. When utterance lengths are averaged across the 10 child utterances in Table 6–3, the computed MLU is 3.3, which is representative for a typically developing child of this age. Note especially utterances 6, 7, and 8, where the number of morphemes exceeds the number of words. The child is using grammatical morphology (see later in this chapter), and the free and bound morphemes are counted as separate "units" in the utterances, even when the grammatical morphology is applied incorrectly (utterance 6, "broked" = two morphemes).

Brown's MLU data for Adam, Sarah, and Eve are shown in Figure 6–1. Age in months is shown on the x-axis and MLU on the y-axis. Two general characteristics of these data are clear: (a) MLU increases for each child as he or she gets older, and (b) there are differences between the children. Most notably, Eve increases her MLU from about 1.5 to just under 4.5 over a younger age range (18 to 27 months) as compared with Adam and Sarah (27 to 43 months). All three children increase their MLU in the same way, but Eve does so at a younger age.

Like vocabulary development, data on MLU may vary across languages and cultures. In Elin Thordardottir's study of children's language learning of Canadian English and Quebec French, at comparable ages French-speaking children had greater MLUs than English-speaking children. Does this mean the French-speaking children have more sophisticated language development skills than the English-speaking children? The answer is "no." Rather, the difference between the two groups of children can be explained by differences between the languages.

2. These morpheme counts demonstrate that there are cases in which the number of morphemes is not crystal clear: Is "Daddy's" two or three morphemes? (Dad + y, a morpheme for the diminutive, + the possession morpheme 's.) In the two examples, we count "Daddy" as one morpheme.

Table 6–3.  Example of the Computation of Mean Length of Utterance for Ten Utterances, with "Units" Being Morphemes (Morpheme Counts Shown in Parentheses for Child Utterances Only)

Mom:  What do you have?
1.  Child:  See I got ball. (4)
Mom:  What are you going to do with the ball?
2.  Child:  Throw there. (2)
Mom:  Be careful, you don't want to break anything.
3.  Child:  This, break! (2)
Mom:  Throw it over there (points), that's safe.
4.  Child:  Break our house! (3)
Mom:  Then where would we live?
5.  Child:  In tree! (2)
Mom:  That wouldn't be much fun.
6.  Child:  Daddy's big TV broked. (6)
Mom:  With the ball?
7.  Child:  I'm laughing! (4)
Mom:  What's funny?
8.  Child:  Daddy don't got football. (5)
Mom:  Not a bad idea!
9.  Child:  Throw at TV? (3)
Mom:  Um, not a good idea . . .
10.  Child:  Not good. (2)

Total utterances: 10; Total morphemes: 33; MLU = 33/10 = 3.3
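For readers who want the arithmetic spelled out, the short sketch below recomputes the MLU for the child utterances in Table 6–3. It is an illustration only; the morpheme counts are simply copied from the table, whereas in practice they are assigned by hand from a transcribed language sample.

# Minimal sketch: computing mean length of utterance (MLU) in morphemes.
# The utterances and morpheme counts are copied from Table 6-3; in clinical practice,
# morphemes are counted by hand from a transcribed spontaneous language sample.

child_utterances = [
    ("See I got ball.", 4),
    ("Throw there.", 2),
    ("This, break!", 2),
    ("Break our house!", 3),
    ("In tree!", 2),
    ("Daddy's big TV broked.", 6),
    ("I'm laughing!", 4),
    ("Daddy don't got football.", 5),
    ("Throw at TV?", 3),
    ("Not good.", 2),
]

total_morphemes = sum(count for _, count in child_utterances)
total_utterances = len(child_utterances)
mlu = total_morphemes / total_utterances

print(f"MLU = {total_morphemes}/{total_utterances} = {mlu:.1f}")  # prints: MLU = 33/10 = 3.3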

Figure 6–1.  Plot of MLU data (y-axis, in morphemes, from about 1.5 to 4.5) as a function of age (x-axis, 18 to 42 months) for the three children studied by Brown (1973): Eve, Adam, and Sarah. Adapted and modified from Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.


French has a much more extensive system of grammatical (bound) morphemes as compared with English, which tends to make MLU greater for French-speaking children. In other words, French words are more likely than English words to be inflected — that is, to have a bound morpheme (or multiple morphemes) attached. Interestingly, when Thordardottir expressed length of utterance using words as the unit within utterances, French- and English-speaking children did not differ.

The lesson from this cross-linguistic comparison is that a single measure such as MLU may reflect different things in different languages. Because MLU is typically computed by counting morphemes, the meaning of the measure varies across languages with different types of grammatical morphology. "Heavily inflected" languages — that is, languages with lots of frequently used grammatical morphemes — may appear to have higher MLUs at a given age as compared with languages with fewer grammatical morphemes. What may be true for English may not be true for other languages.

Grammatical Morphology

The child who combines the semantic categories previously described for two-, three-, and four-word

utterances still sounds "baby-ish" when the utterances do not include grammatical morphemes. When children enter the two-word stage, they begin to use some grammatical morphemes that make utterances sound "complete." English grammatical morphemes are listed in Table 6–4, in the order proposed by Brown for their acquisition. Early grammatical morphemes such as the present progressive (-ing, as in "She running"), prepositions, and plurals are mastered by many children sometime around the third birthday or a little after. Other grammatical morphemes may not be used accurately even in the fourth year. What is clear, however, is the degree to which the continued refinement of the grammatical morpheme system — think of it as the elegant window dressing of language — adds to the adult-like sound of a child's speech.

Grammatical Morphology and Rule Learning

Children's learning of grammatical morphology reflects rule learning. For example, the "-ed" grammatical morpheme for past tense is applied as a rule to verbs when the child wants to express something that has already happened. The child does not need to learn the past tense morpheme for every verb; rather, he learns the following generalization (that is, a rule): when expressing an action that has occurred, attach "-ed" to the verb.

Table 6–4.  Grammatical Morphemes of English (Morpheme: Example)

Present progressive (-ing): She running
Preposition "in": Muffy in bed
Preposition "on": Spoon on floor
Plural inflections (e.g., "s," "es"): dogs, dresses
Past inflections on irregular verbs: I went (go) home; I ate (eat) candy
Possessive inflections: Muffy's ball
Uncontractible copula (is, am, and are): Here it is! They were naughty!
Articles (the, a, an): The dog; A man
Past inflections on regular verbs (e.g., "ed"): He walked fast; The baby cried
Regular third person forms (-s): She walks fast
Irregular third person forms (has, does): He has some; She does
Uncontractible auxiliary forms: Is Daddy home? You were there
Contractible copula (e.g., 's and 're): Muffy's there; They're gone
Contractible auxiliary forms (e.g., 'd): He'd play every day

Note. The order in which the morphemes are listed is roughly the order in which they are acquired, starting at about age 2 years. Based on Brown, 1973.


Ironically, a proof of rule learning is that children attach the morpheme to a verb that has an irregular past tense form. In English, verbs such as "go," "hit," and "run" all have irregular past tense forms ("went," "hit," "ran"). Typically developing children often overgeneralize the grammatical morpheme for past tense by saying "goed," "hitted," and "runned." This demonstrates knowledge of the rule and the ability to combine free and bound morphemes. Part of language development is the elimination of these overgeneralized inflections and the learning of irregular forms.

As with the other aspects of language development, children vary in their order of development and the rate at which they learn grammatical morphemes. The order of mastery proposed by Brown is not fixed, and in fact, varied somewhat for Adam, Eve, and Sarah. The rate of mastery of grammatical morphemes also varies among typically developing children. "Typical" development may include the child who masters correct use of all grammatical morphemes by age 3 years as well as the child who continues to have difficulty with some grammatical morphemes at age 4.5 years.
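
The rule-learning idea can be pictured in a toy sketch like the one below; the small irregular-verb list is only a sample, and spelling details such as consonant doubling (which would turn "hited" into "hitted") are ignored here.

# Toy sketch of past-tense rule learning and overgeneralization.
# A child who knows only the regular "-ed" rule applies it to every verb,
# producing forms like "goed"; learning the irregular forms later removes
# these overgeneralizations. (Consonant doubling is ignored in this toy.)
IRREGULAR_PAST = {"go": "went", "hit": "hit", "run": "ran"}   # small sample set

def childs_past_tense(verb, knows_irregulars=False):
    if knows_irregulars and verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    return verb + "ed"   # the overgeneralized regular rule

for verb in ["walk", "go", "hit"]:
    early = childs_past_tense(verb)                          # rule only
    later = childs_past_tense(verb, knows_irregulars=True)   # irregulars learned
    print(f"{verb}: early '{early}', later '{later}'")
# walk: early 'walked', later 'walked'
# go:   early 'goed',   later 'went'
# hit:  early 'hited',  later 'hit'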

Typical Language Development in School Years

By age 5 or 6 years, about the time a child enters first grade, typically developing children have a relatively large and expanding vocabulary as well as mastery of grammatical morphology. What else is there to learn apart from new words, grammatical morphemes, and multiword utterances? As it turns out, there is still plenty to learn. Here we describe three aspects of advanced language skills and their development throughout the school-age years. These include metalinguistic, pragmatic (specifically discourse), and complex grammar skills.

Metalinguistic Skills

Metalinguistic skills include the ability to reflect on language and its components, to use language in a way that demonstrates knowledge of these components and the arbitrary way in which they are combined. Selected examples of metalinguistic skills include the ability to decompose words into their speech sound components, to use (or understand) the same word with very different meanings, to engage in linguistic wordplay, to be able to judge the correctness (or incorrectness) of word order in sentences, and to recognize ambiguity in the meaning of a single sentence.

Words and Their Speech Sound Components

An adult who is asked the question, "What happens to the word 'about' when the first sound 'uh' is taken away?" will answer, "You have the word 'bout.'" This answer demonstrates the ability to break a word apart into its speech sound components. This metalinguistic skill is generally not part of the language capability of toddlers. Ask a 4-year-old the same question, and he or she will likely be baffled by it. This metalinguistic skill appears around the age of 5 or 6 years. Before children develop this skill, they seem to treat words as "unbreakable" units: break one part (take away the "uh" from "about"), and you break the whole word. Perhaps it is not a coincidence that the ability to recognize words as being made up of individual speech sounds appears around the same time as early reading skills.
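
Viewed abstractly, the sound-deletion task is just a list operation, as in the minimal sketch below; the rough phonemic segmentation is an assumption made for the example.

# Toy sketch of a sound-deletion task: "Say 'about' without the 'uh'."
# The segmentation is a rough, hand-coded approximation.
about = ["uh", "b", "ow", "t"]        # "about"
without_first_sound = about[1:]       # drop the initial "uh"
print(without_first_sound)            # ['b', 'ow', 't']  -> the word "bout"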

Same Word, Different Meanings

As children figure out that words can be broken down into sound components, they begin to understand and possibly use the same word with different meanings. Preschool children tend to be rigid in their understanding of words and may have difficulty separating a particular word from its referent. "Cold" for a 4-year-old is specific to temperature and does not make sense as a description of someone's personality ("She is cold") or as the effect of a particularly vicious right hook in a prize fight ("Knocked him out cold"). School-age children begin to understand how one word may have multiple meanings and, in fact, how certain words may function in metaphors and idioms. This metalinguistic skill is developed throughout the school-age years and into and through adolescence. Some words can be used in very subtle ways, distant from their most obvious meanings (as in the previous "cold" example, or "clam," "clam up," "clammy"). This language skill takes a good deal of time to reach full maturity.3

3. Just how often words are used by native speakers in these multiple ways, with no sense of unusual language usage (i.e., recognition of frequent use of idiomatic linguistic forms), becomes obvious when you communicate with people who are in a second-language environment and who have what appear to be pretty good skills in that language. The author has had a number of doctoral students from Taiwan and Korea, and when meeting with them and speaking casually, he finds himself using phrases and expressions loaded with words having a variety of metaphorical and idiomatic meanings. It is only after these expressions have produced a puzzled expression on the students' faces that he realizes how much our language is like an express train, avoiding all the local stops and getting to the final destination as fast as possible.


Linguistic Wordplay

Preschool children may experiment with words by changing sounds to make them funny, but real wordplay does not emerge until the school years. There are many different types of wordplay; much of it is designed to be humorous. Some forms involve simple changes in sounds of words ("Captain Brunch" to describe someone who enjoys those Sunday trough feedings at the local hotel chain) or mimic a foreign accent ("zomewhere" for French-accented "somewhere"). These linguistic abilities depend on the earlier-acquired skill of decomposing words into their component sounds. Other types of wordplay may require the skill of understanding multiple meanings of different words and the same meaning for different words, as in the following silly exchange: "Question: How can you tell when a bucket gets sick? Answer: It looks a little pale." Still others require much more sophisticated knowledge of cultural slang, possibly combined with very low-level humor: "You can tell if your doctor is a quack when you see his large bill."4

Pragmatic Skill: Discourse

Pragmatics covers a wide range of behaviors that are thought to be important to language use in social contexts (see Chapter 3). The question of pragmatic development as part of language development can be stated simply as, "How do children learn the rules and customs of social communication?" These rules and customs may range from something as straightforward as politeness to the more complex consideration of how children figure out what someone means when what they say is not what they really mean. People who study this latter skill often refer to it as the "presupposition" part of pragmatics, which means placing a conversational experience within the speaker's and listener's world knowledge, and the child's ability to place a current communicative exchange within the proper context. For example, assume you have been attending each lecture of a class, and it is the 13th week of a long, hard, 15-week semester. In each lecture, the instructor has droned on without mercy, rarely making eye contact with students or changing the melody of his speech. He reads information from his slides, never making impromptu remarks or elaborating on a point in a natural, spontaneous way. The exams have been as uninteresting and obtuse as the instructor. On this day of the never-ending semester, you have brought your 7-year-old brother to class; he is visiting and wants to experience a real college class. As the instructor begins yet another long monologue, you turn to a friend sitting on the other side of your brother and say, "This class is awesome" without a surface trace of sarcasm or irony — nothing in your facial expression, your intonation, or your rhythm. Your statement suggests nothing other than a straightforward declaration. Hearing this, your friend knows you mean exactly the opposite, but your little brother thinks, if this class is awesome, maybe I want to reconsider college and instead learn a useful trade. This is an important part of pragmatics, and of pragmatic development. The friend, hearing "This is awesome" within the history and context of the class, having had experiences with other, less immobile instructors, knowing the big brother, and so forth, used his presuppositions about the comment to interpret it correctly: the class is clearly not awesome. The little brother, however, does not have some of the presuppositions available to his brother's friend (he had not been sitting in this mind-melting lecture hall for 13 weeks), and even in the immediate context of an obviously boring professor, with other students in various stages of sleepiness, does not have the pragmatic skill to understand the true meaning of his brother's comment.

Here we summarize the development of discourse skills as an important part of pragmatic development. Discourse is the broad term that includes conversation skills and narrative (storytelling). According to Brinton and Fujiki (1989), conversation skills include (a) turn-taking, (b) topic management, and (c) conversational repair.

Turn-Taking

Turn-taking during conversation is a skill, guided by learned rules. In a conversation, you speak and when you are finished with a thought, your conversation partner replies, and so on and so on. People taking part in this conversation typically know when to stop speaking (if they have the floor), how to anticipate that their turn to speak is near, and when to begin speaking as the current speaker finishes. Turn-taking behaviors seem obvious because we learn them relatively early and so many people do take turns in a socially acceptable way. But all of us have encountered people who begin to speak long before their conversational partner has finished speaking, who seem to miss those cues that say, "I'm done with my turn in a second or two."

4. The word "quack" as a description of a doctor with questionable (or even fraudulent) skills is apparently derived from the 16th century Dutch word "quacksalver," meaning someone who boasts while they apply a salve to a wound, or who sells useless medicines on the street.


The foundation for turn-taking skills is apparently developed in infancy, when caregivers respond to infant vocalizations as if they are participating in a true communication exchange. Children as young as 1 year of age seem to get the general idea of waiting until someone speaking to them is finished before beginning their own turn. By the preschool years — roughly between the ages of 3 and 6 — children’s turn-taking skills are fairly sophisticated. Turn-taking in conversation becomes more sophisticated throughout adolescence and into adulthood. Speaking turns between members of a conversational pair become more related, with the content of consecutive utterances containing increasingly more agreement in theme and factual basis (Nippold, 1998). As turn taking becomes more sophisticated, there are fewer off-topic turns, and people engaged in communication begin to value the perspectives of their communication partners.

Topic Management

Topic management is what conversational partners do to maintain a sensible spoken interaction. Conversation about a shared topic makes for sensible communication. If this sounds slightly ridiculous as an aspect of language development, consider this: In a study of conversations between toddlers aged 19 to 38 months and their mothers, the youngest children shared topics with their moms about 56% of the time, the oldest about 76% of the time (Bloom, Rocissano, & Hood, 1976). Sometime between the ages of 2 and 3 years, toddlers learn something about how to stay on topic during a conversation. The mastery of topic management as a component of language development is difficult to pin down because the definition of "topic" is loose and open to different interpretations. Even with disagreements concerning the fine points of what does and does not constitute a conversational topic, the following two examples make the point nicely, and easily, as 5-year-old Bart tries to sample opinion from two of his friends on his favorite football team:

Example 1

Bart: What's wrong with the Packers?
Milhouse: Brett Favre is my favorite.
Bart: He . . . he's too old.
Milhouse: He's probably cold up there, he's from way down.
Bart: Sacked again!

Example 2

Bart: What's wrong with the Packers?
Ralphie: My doggie doesn't know where to go to the bathroom.
Bart: Maybe they need a new quarterback . . .
Ralphie: Watch my nose open and close all by itself!
Bart: Do you care about the Packers?
Ralphie: Okay, I'll come back and play later.

These examples suggest that even if the nature of the "topic" is hard to define, in many cases, it is easy to tell when one member of a communication "dyad" (two people in conversation) is off topic. Anyone who has interacted with a small child knows that Example 2 is certainly possible. Throughout adolescence and into adulthood, topic management is likely to change, with fewer topic shifts per conversation and longer times spent on a given topic (Nippold, 1998).
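
In the spirit of the Bloom, Rocissano, and Hood (1976) comparison, topic management can be quantified by coding each child turn as on or off the partner's topic and computing a proportion. The sketch below uses invented codes, not data from that study.

# Sketch: proportion of child turns coded as staying on the partner's topic.
# The turn codes are hypothetical, for illustration only.
turn_codes = ["on", "on", "off", "on", "off", "on", "on", "off", "on"]

proportion_on_topic = turn_codes.count("on") / len(turn_codes)
print(f"Topic maintenance: {proportion_on_topic:.0%}")   # 67%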

Conversational Repairs

One aspect of joint conversational efforts is the way in which miscommunications are handled. In conversations between two adults, an adult and child, or two children, there are instances of uncertain meanings, unintelligible words, and ambiguous referents (e.g., when a speaker says "he," who does he mean if the person has not been identified previously?). The process by which speakers or listeners "flag" a miscommunication and fix it is called conversational repair. Here are two examples of conversational repair:

Example 3

Bart: The Packers are huge!
Milhouse: But I thought you said they're no good . . .
Bart: No, they're big guys, look at those linemen!

Example 4

Bart: What is up with those CB's?
Milhouse: Huh, CB's?
Bart: You know, cornerbacks, the guys who can intercept passes.

In the first case, Milhouse misinterprets Bart's use of the word "huge" to mean "good." Milhouse is confused because Bart had expressed his concern with the Packers' poor performance (Example 1). Bart repairs the conversational breakdown by specifying his use of the word in literal terms. The second example, referred to as a clarification request, is a little more straightforward as Milhouse simply does not know what "CB's" means. Bart repairs the conversation by defining the term. Conversational repair skills develop throughout childhood, although it is difficult to specify ages at which particular kinds of repair (and there are many more than the two examples given previously) are learned. Clarification requests like the one in Example 4, or simpler versions, as when a listener says "Huh," "What," or responds to a statement with a quizzical look, may be observed in children as young as 24 to 26 months. Other types of conversational repair, such as a sequence of clarification requests that converge on the intended meaning of the speaker (several requests for clarification that become increasingly more specific) probably develop at a later age, perhaps around 36 months. More advanced conversational repair skills continue to develop throughout the school-age years and into adolescence.

Narrative

The term narrative means, in the simplest sense, storytelling. A narrative can be an account of an event, a persuasive speech, or any extended sequence of utterances (or paragraphs, in writing). A narrative can be elicited from a child by asking a question such as, "Tell me about your summer vacation" or "Tell me what happens in the movie Moana." The child responds by producing a series of utterances which are meant to "hang together," so that the result is a sensible narrative. Utterances within the narrative can hang together in a number of ways. They can have a consistent theme, can be sequenced appropriately, can have explicit referents (can include all the information required by the listener to understand the narrative), can use cohesive devices such as pronouns and conjunctions to link individual sentences with each other, and can be told with a minimum of repetitions and alternate formulations of the same statement. Here are two narratives in response to the same question, produced by male children around the age of 6 years; one hangs together, and the other does not.

"Tell me what you did on your summer vacation."

Well, we went to a mountain camp, which was at a lake. They had guides who helped us and fed us lunch every day, you know, hot dogs, chips, and pop and stuff. They showed us how to, like, hike through the forest around the lake, and sometimes there would be animals, like bighorn sheep and these funny little guys that looked like beavers. One time, one of those beaver guys ran right across the trail and he like scared us, even the guide. So he said we should keep our eyes open for other animals on the trail, and then we were really wide awake, man. Really. At night we would sit around a, um, campfire and eat dinner and sing and tell scary stories. Those were fun, lots of fun. Sometimes we wouldn't go to bed until midnight, or maybe even later. I think I'll go back next summer.

"Tell me what you did on your summer vacation."

This place had water and animals right in front of us. He told us it was dangerous. The sheep and everything. Those sheep, um, something else, too. I had a hot dog and chips. We had some food, hot dogs. A bunch of us went hiking. One time the guy told us to watch it. The beaver animal. He said some scary stories; I covered my ears. The campfire was really hot; they gave us paper plates for our food. I liked staying up late. I wish the pop was a different flavor, I don't like cola.

Some of you reading these examples are probably familiar with a slightly less obvious version of the disorganized, second narrative — the kind you hear in some lecture halls. The point is made by comparing the two narratives: relating something in a coherent way requires a consistent theme, a sensible sequence, and grammatical devices to tie the different sentences together. Cohesive devices of this kind appear throughout the first narrative and are nearly absent in the second one. Notice especially the pronouns "he" in the fourth and fifth sentences of the first narrative; these refer to different entities (the first to the beaver-animal, the second to the guide), but the listener has no problem knowing who or what is being referred to. This is because the narrative is well formed and therefore removes any ambiguity that may arise as a result of the consecutive use of the same pronoun. In contrast, the referents of the two "he" pronouns in the second narrative (second and tenth sentences) are not at all clear, because the overall theme is unfocused, and the sentences are oddly sequenced. Narrative skills develop continuously, probably well into adulthood, but it is not easy to attach ages to the development of specific skills. Nippold (1998) has said that from about age 5 years onward, narrative skills are improved by increasing the length of stories, including more details within them, incorporating subthemes within the major theme, and providing smooth transitions between the various episodes of the story (that is, not just jumping from episode to episode, as might be the case for young children). The development of narrative skills is important not only because it serves an important social function, but because adults — especially teachers — may make judgments concerning a child's intelligence and potential based on the child's ability to tell a coherent story.

Complex Sentences

When children begin school, their utterances are likely to be relatively short and simple in grammatical form. From about age 5 years, children begin to use more compound and complex sentences, which increase the length of their utterances (Nippold, 1998). A compound sentence contains two independent clauses joined by a conjunction (e.g., "I have a test today so I am going to study"), and a complex sentence contains at least one independent and one dependent clause ("I have a test today if the instructor did his job"). These two examples are a simplification of the many ways in which sentences can have complex syntax. The important point is that the use and understanding of these complex forms develops through the school-age years and adolescence. Syntactically complex sentences — both their production and their comprehension — are mastered throughout the school years and beyond. Some constructions, such as passives with a conjunction ("The dance number was turned down by the ballerina because it was too difficult") or the use of gerunds plus past perfect voice ("Running through the woods was to be her lifelong choice of exercise") are clearly not heard from children in the first few grades and perhaps even through middle school. As a general rule, the greater the syntactic complexity of an utterance, the later the age at which it will be comprehended and mastered in spoken language.

Sample Transcript

Table 6–5 contains a sequence of transcripts of conversations between an adult and a typically developing child at three ages — 30 months, 42 months, and 54 months. These longitudinal transcripts for a single child allow direct comparison across age to demonstrate developmental changes in language skills. The transcripts are brief, but the child utterances show clear developmental trends. Study the transcripts for the changing language structures and usage, and compare the changes to the discussion of language development previously presented.

Table 6–5.  Transcripts of a Single Child's Language Performance During Conversation With an Adult

Sample From 30 Months Old

C: THIS BABY IS CRY/ING.
A: WHY?
A: WHY IS HE CRY/ING?
C: MOMMY AND DADDY GET HIM.
A: OH, GOOD.
C: THEY (ARE) GO(ING) UPSTAIRS.
C: MOMMA GOT HIM.
C: MOMMA GOT HIM OUT (OF) HIS CRIB.
A: OH, GOOD.
C: YEAH.
A: WHAT WAS THE MATTER?
C: I DO/N'T KNOW.
C: HE/'S SLEEPING WITH HIS DIAPER.
A: SLEEPING WITH HIS DIAPER.
A: THAT/'S PRETTY SILLY.
C: DADDY (IS) TAKE/ING (A) SHOWER.
A: HE/'S TAKE/ING
C: {SHOWER NOISES}.
C: (DO) (YOU) HEAR WATER COME/ING OUT (OF) HERE?
A: I HEAR IT.
A: I HEAR THE WATER.
C: IT/'S IT/'S IT/'S IT/'S WATER RIGHT HERE.
C: ME CLOSE/(ED) (THE) DOOR/S.
A: YOU DID.
A: THAT/'S A GOOD IDEA.
A: THEN THE WATER STAY/3S INSIDE.
C: {SHOWER NOISES}.
C: HE/'S DONE TAKE/ING (A) SHOWER.
A: HE/'S DONE [G]?
A: OKAY.

Sample From 42 Months Old

A: WHAT HAPPEN/ED?
C: SHE FELL RIGHT UP.
A: SHE FELL UP HUH?
C: {SCREECH}.
A: WELL WHO/'S GONNA WATCH THE BABY/S CHRISTOPHER?
C: THEY DON'T WANT TO HAVE GUY/S TO WATCH THEM.
C: THEY DON'T WANT ANYONE WATCH/ING THEM.
A: HEY, (CAN YOU) WHILE YOU/'RE OUT THERE, CAN YOU CHECK THE MAILBOX AND SEE IF THERE/'S ANY MAIL?
C: THERE/'S NOTHING.
A: NO MAIL TODAY?
C: NO.
A: OH THAT/'S TOO BAD.
C: HERE/'S THEIR DOGGY.
A: OH WHAT/'S HIS NAME?
C: HIS NAME IS GOODO.
A: GOODO [G]?
C: HEY I WONDER IF THIS MIGHT BE SUPERMAN/Z DOG/S?
A: OH.
C: (SO) SO HE X HIM TO BE KRYPTO.
A: WHAT?
C: BECAUSE HIS NAME (COULD) COULD BE KRYPTO BECAUSE HIS HIS HIS NAME IS (THAT/'S THAT/'S) THAT/'S (WHAT) SUPERMAN/Z DOG IS NAMED.
A: SUPERMAN/Z DOG IS NAMED KRYPTO?
C: MHM.
A: OH.
A: I THINK, IS/N'T THERE SOMETHING ABOUT SUPERMAN AND KRYPTONITE, IS/N'T THERE?
C: MHM.
A: WHAT HAPPEN/3S IF SUPERMAN GO/3S BY KRYPTONITE?
C: HE HE GET/3S HITTED BY IT.
A: YEAH.
A: IT/'S NOT GOOD FOR HIM, IS IT?
C: UHUH.

Sample From 54 Months Old

C: WAIT.
C: I THINK I KNOW WHAT THIS IS FOR.
A: OH.
A: WHAT IS IT FOR?
C: I THINK THIS IS LIKE IF SOMEBODY IS TOO FAR AND THEY (ARE) TOO TIRED FROM DRIVE/ING THEY HOOK THIS HOOK UP AND THEN THEY DRIVE BY.
C: AND THERE/'S A FIRE ON THE HOOK AND (THEY) BRING IT UP HERE SO THEY CAN SPRAY IT.
A: OH YEAH.
A: THAT SOUND/3S LIKE A GOOD IDEA.
A: I WONDER IF THEY NEED SOME HOSE/S.
C: THIS IS A HOSE.
A: IT IS A HOSE.
A: YOU/'RE RIGHT.
A: THAT/'S A BIG ONE.
C: LOOKIT.
C: (THEY/'RE LAY/ING) THAT ONE FIT INTO THE BACK.
C: SEE?
A: THEY ARE SO LUCKY.
C: AND THEN WATCH.
C: THE PEOPLE WITH SPECIAL LITTLE BED/S SO THEY ALL LAY DOWN ON THE BED [EU].
A: MHM.
C: SEE?
C: (THEN) AND THEN (IT) BRING/3S THE LITTLE BABY UP AND THEN IT BRING/S THE SOCCER PERSON UP.
C: AND THEN IT BRING/3S THIS GUY UP.
C: (AND THEN IT CLOSE/3S) AND IT CLOSE/3S (AND THEN DRIVE/ING) AND THEN EVERY SINGLE PEOPLE [EW:PERSON] GET/(S) HURT IN THE FIRETRUCK.
C: SO THEY HAVE TO GO TO THE AMBULANCE.
A: .
C: ?
A: YEAH.
A: I WANNA SEE THAT.
C: THEY TAKE THE THING OUT.
C: THEN THEY TAKE OUT THE BED.
C: THEN THE BABY GET/3S ON (AND THEN) AND THEN THE OTHER PERSON GET/3S ON.
C: AND THE OTHER PERSON GET/3S ON AND THEN>
C: (AH) THAT PERSON (IS/N'T) IS PRETTY SAFE.
C: SO THIS GUY ISN/'T GO/ING.

Note.  The samples were taken at 30, 42, and 54 months. Child utterances are indicated by "C," and adult utterances are indicated by "A." Grammatical morphemes that were produced are preceded by a forward slash; grammatical morphemes and other words that were omitted are enclosed in parentheses, so the complete utterances can be inferred from reading all the words. "3S" indicates third-person singular (as in "GET/3S" = "gets").
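
Because the transcripts mark bound morphemes with a forward slash and omitted items with parentheses, a produced-morpheme count can be scripted along the following lines. The handling of punctuation and bracketed codes is a simplifying assumption for this sketch, not part of any published transcription manual.

import re

# Sketch: counting produced morphemes in a transcript line coded as in
# Table 6-5 ("/" precedes a bound morpheme; parenthesized items were
# omitted by the speaker and are not counted).
def produced_morphemes(utterance):
    utterance = re.sub(r"\([^)]*\)", "", utterance)    # drop omitted items
    utterance = re.sub(r"\[[^\]]*\]", "", utterance)   # drop codes like [EU]
    tokens = [t for t in utterance.split() if t.strip(".?!,{}")]
    # Each "/" marks one bound morpheme attached to the word it follows.
    return sum(1 + t.count("/") for t in tokens)

print(produced_morphemes("THIS BABY IS CRY/ING."))            # 5
print(produced_morphemes("DADDY (IS) TAKE/ING (A) SHOWER."))  # 4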


Chapter Summary

Typical language development occurs over a broad range of ages at which milestones are reached. Age benchmarks for major language-development accomplishments, such as 50 words at 18 months, initial use of grammatical morphology around age 2 years, and the metalinguistic skill of decomposing words into component sounds around age 5 years, show substantial variability across children for age of mastery.

Children's lexicons — their vocabularies — begin to grow rapidly when they have a naming insight and, with a single trial of having an object labeled, "get" the connection between words and objects and actions and properties (such as colors or size). The vocabulary spurt that begins with the naming insight occurs around the same time as the appearance of the two-word sentence.

Multiword utterances are typically combinations of semantic categories, as originally discussed by Roger Brown. The semantic categories are "frames" into which children insert different words; the frames are blueprints for acceptable sentence structures. Two-word utterances are gradually expanded to three- and four-word utterances. The mastery of grammatical morphology plays an important role in creating longer sentences.

MLU, a measure of utterance length, is a good index of the linguistic sophistication of a child; the comparison of MLUs across different languages must account for the different grammatical morphology across languages.

Language development from the beginning of grade school through adolescence and into young adulthood involves expansion of the lexicon, increased use and comprehension of sentences with complex syntax, and development of pragmatic skills in conversation and narratives; although the focus of language development is typically on the dramatic changes occurring in the first several years of life, these later-life language changes have a great deal of importance for social and academic skills.

References

Bates, E. (1980). Vocal and gesture symbols at 13 months. Merrill-Palmer Quarterly, 26, 407–423.
Bates, E., Thal, D., Whitesell, K., & Fenson, L. (1989). Integrating language and gesture in infancy. Developmental Psychology, 25, 1004–1019.
Bloom, L., Rocissano, L., & Hood, L. (1976). Adult-child discourse: Developmental interaction between information processing and linguistic knowledge. Cognitive Psychology, 8, 521–552.
Brinton, B., & Fujiki, M. (1989). Conversational management with language-impaired children. Rockville, MD: Aspen.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Johnston, J. (2006). Thinking about child language: Research to practice. Eau Claire, WI: Thinking Publications.
Leadholm, B. J., & Miller, J. F. (1992). Language sample analysis: The Wisconsin guide. Madison, WI: Wisconsin Department of Public Instruction.
Loban, W. (1976). Language development: Kindergarten through grade twelve (Research Report No. 18). Urbana, IL: National Council of Teachers of English.
Nelson, K. (1973). Structure and strategy in learning to talk. Monographs of the Society for Research in Child Development, 38, 1–135.
Nelson, K. (1981). Acquisition of words by first-language learners. Annals of the New York Academy of Sciences, 379, 148–159.
Nippold, M. A. (1998). Later language development: The school age and adolescent years (2nd ed.). Austin, TX: Pro-Ed.
Roseberry-McKibbin, C. (2007). Language disorders in children: A multicultural and case perspective. Boston, MA: Pearson Education.
Thordardottir, E. T. (2005). Early lexical and syntactic development in Quebec French and English: Implications for crosslinguistic and bilingual assessment. International Journal of Language and Communication Disorders, 40, 243–278.

7  Pediatric Language Disorders I

Introduction

There are many reasons why a child may have significant difficulties in acquiring language. Hearing impairment, intellectual disabilities associated with a variety of conditions, acquired brain injury (resulting from a traumatic brain injury, childhood stroke, seizure disorder, or tumor), or psychiatric problems can result in a clinically significant language delay. The current chapter covers language disorder of unknown cause, language disorder associated with autism spectrum disorder (ASD), and language disorder associated with hearing loss. As discussed later, language disorder that has no obvious cause and is not attributable to a related condition is referred to as either specific language impairment (SLI) or developmental language disorder (DLD). The debate over diagnostic terminology, SLI versus DLD, is ongoing. In this chapter, the label "SLI/DLD" is used to reflect the usage of both terms even though clinicians and researchers in different countries (including the U.S.) use one or the other (see summary in Volkers, 2018). These types of language disorder affect a large number of children and continue into adulthood. These disorders are not specific to race, ethnic group, or country; they are observed and diagnosed around the world.

Specific Language Impairment/Developmental Language Disorder

SLI is largely defined using exclusionary criteria — no intellectual disability, hearing impairment, motor deficits, autism, or other conditions that might account for language difficulties (Leonard, 2014). Traditionally, SLI has been defined in terms of normal-range nonverbal IQ (a score of 85 or better on a standardized measure of IQ, which is within 1 SD of the mean of 100). It should be noted that, at least in the United States, the term SLI has mostly been used by researchers rather than by clinicians. A proposed alternative for the diagnostic category of SLI is developmental language disorder (DLD). Children diagnosed with DLD, like children with SLI, have no obvious condition or disease that explains their language delay. DLD may be preferred because it is a more general term than SLI and does not exclude children with nonverbal IQs of 70 to 85. In addition, the diagnostic term DLD may be more widely understood by parents, educators, and administrators who approve speech-language therapy for children evaluated for language delay (Bishop, Snowling, Thompson, Greenhalgh, & CATALISE consortium, 2016, 2017).


SLI/DLD is the most frequent disability among children. According to several estimates, SLI/DLD occurs in around 7% of the population at school entry (5 years of age) (Norbury et al., 2016; Tomblin, Records, Buckwalter, Zhang, Smith, & O’Brien, 1997). Children diagnosed with SLI/DLD are at increased risk for reading problems as well as other academic problems. The early language impairment and later reading problems in school-aged children accumulate and lead to poor school performance and a poor career outlook. Society clearly benefits by an understanding of SLI/DLD, its causes, characteristics, and treatment. Even with the large variability in age milestones and rates of learning in typical language development (see Chapter 6), there are a surprisingly large number of otherwise typically developing children whose language skills are judged to be significantly impaired. In these cases, deficits in language performance may be primarily in the area of language production or in both language comprehension and production.

Standardized Tests

A brief review of standardized tests (see Chapter 4) is as follows. A large number of typically developing children are tested on a particular skill (such as vocabulary size) at a particular age, resulting in a range of scores. These scores are converted to standardized units (z-scores), which are then expressed on a scale with a mean of 100 and a standard deviation of 15 points. Sixty-eight percent of the tested children have scores between 85 and 115 on the standardized test, and 95.5% of the children have scores between 70 and 130.
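
The score conversion described in the box is simple arithmetic, sketched below; the NormalDist calls just recover the familiar 68% and 95% figures for a normal distribution.

from statistics import NormalDist

# Sketch: converting a z-score to a standard score with mean 100 and SD 15,
# and recovering the proportions of scores expected within 1 and 2 SD.
def standard_score(z, mean=100.0, sd=15.0):
    return mean + z * sd

print(standard_score(-1.0), standard_score(1.0))   # 85.0 115.0
print(standard_score(-2.0), standard_score(2.0))   # 70.0 130.0

scores = NormalDist(mu=100, sigma=15)
print(f"Within 1 SD (85-115): {scores.cdf(115) - scores.cdf(85):.1%}")   # ~68.3%
print(f"Within 2 SD (70-130): {scores.cdf(130) - scores.cdf(70):.1%}")   # ~95.4%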

There is a lot of variation in the language characteristics of children diagnosed with SLI/DLD. A typical description of a child diagnosed with this disorder is (a) the child did not produce a first word until 20 months of age, and at 3 years of age has an unusually small vocabulary (e.g., less than 200 words) and is just starting to combine words into short phrases; (b) the child shows only minimal mastery of grammatical morphemes; and (c) the child's comprehension of language is better than the child's production of language, but both are delayed relative to typical development. In addition, the child does not produce complex sentences, may be hesitant to talk, and has problems with social interactions.

When adults listen to a child diagnosed with SLI/DLD, the language sounds "babyish," as if the language performance is mismatched with the child's age. Formal tests of hearing (Chapter 23) place these children in the normal range for auditory sensitivity. There are no obvious signs of neurological disease or evidence of autism or psychiatric disturbance. There is no evidence that the child has been raised in a language-impoverished environment. Nonverbal tests of intelligence (IQ tests), in which children do not have to make responses requiring the use of language, reveal IQ scores within the normal range. SLI/DLD is a disorder with "fuzzy" boundaries. The diagnosis is not always made with a high degree of confidence because the delayed language may resemble delayed language characteristics not only in other disorders, but in language development in typically developing children who are learning language more slowly than most of their peers. Interested readers are referred to the reviews published by Conti-Ramsden and Durkin (2012), Kamhi and Clark (2013), Ellis Weismer (2013), and Laasonen et al. (2018).

Language Characteristics of Children with SLI/DLD

Children with SLI/DLD are like typically developing children in at least one important sense: they have variable language skills at a specific age. Children with SLI/DLD may have impairments in vocabulary development, in phonology, in the use of grammatical morphemes, in syntactic constructions, and in pragmatics. Some children diagnosed with SLI/DLD seem to have near-normal comprehension of language but significant problems with the production (expression) of language; others have marked problems in both comprehension and production. A common thread in children diagnosed with SLI/DLD is a marked impairment in grammatical morphology, especially in productive language.

Phonology

Children with SLI/DLD have more difficulty learning the sound structure of their language than typically developing children. At a given age, children with SLI are likely to have more speech sound errors than typically developing children. The problems with speech sound development may be related to a deficit in phonological memory, a kind of short-term memory important for the encoding of speech sound characteristics (Gathercole & Baddeley, 1990; Gathercole, 2006). Children with SLI/DLD make many more errors than typically developing children when asked to repeat nonwords ("nonword repetition tasks").


Phonology

This chapter and the next summarize the language disorders of several groups of children according to the components of language summarized in Chapter 3. These components are phonology, morphology, syntax, content, and pragmatics. The first four are referred to as "structural" components of language; the last is the social (usage) component. In this textbook, pediatric phonological disorders are presented in separate chapters. Why, then, are phonological disorders presented as a potential component of pediatric language disorders? Phonology, the structure of speech sound systems and the rules governing the use of those sounds, is not the same as articulation. Articulation (phonetics) is the physical act of producing a speech sound; it is a motor behavior that is not necessarily unique to a specific language or dialect. Phonology can be thought of as the bridge between phonetics and language. Much as language is organized and produced according to rules, phonology is the organization of the phonetic inventory into a rule-based sound system. This is why we include phonology and its disorders as a potential (and often-observed) component of pediatric language disorders.

Nonword repetition errors are thought to reflect impairment of phonological working memory. This is important because impaired phonological memory may create difficulty in learning new words. The smaller vocabulary size in children with SLI/DLD may therefore be due, in part, to impairments in phonological working memory (Cody & Evans, 2008; Jackson, Leitao, & Claessen, 2016).

Grammatical Morphemes

Problems with grammatical morphemes are generally viewed as the "signature" of SLI/DLD. Children diagnosed with SLI/DLD may have minimally delayed vocabularies and even age-appropriate phonology but significantly impaired production and/or comprehension of grammatical morphemes. The "babyish" characteristic of expressive language in children with SLI/DLD, mentioned earlier, is associated with this type of difficulty. When a 4-year-old child produces sentences such as, "That my doggie," "Yesterday I play with Bobby," "Man in street," and "My brother run to Mommy," the mismatch of age and grammar is noticeable. Children with SLI/DLD may have problems with various grammatical morphemes (English grammatical morphemes are listed in order of acquisition in Table 6–4 of Chapter 6). Some researchers propose that children with SLI/DLD have particular difficulty with verb tense markers (Rice, 2014).

Syntax

The delayed mastery of grammatical morphemes in children with SLI/DLD may reflect a more general problem of sentence comprehension. Children with SLI/DLD who are diagnosed with a primarily expressive language disorder still show comprehension skills that lag those of their typically developing peers. It is possible that the incorrect use of grammatical morphemes is related to comprehension of sentence structure. The difficulty with sentence processing (i.e., comprehension) provides a poor model for the expression of grammatical morphemes. This view is supported by the finding of a relationship between sentence comprehension and grammatical morpheme expression in children with SLI/DLD: better comprehension is associated with better expression of grammatical morphemes (Bishop, Adams, & Norbury, 2006).

Vocabulary

When children with SLI/DLD are given formal tests for vocabulary size, their scores tend to be lower than those of typically developing children of the same age (Laws & Bishop, 2003). Vocabulary tests can be given separately for expressive (productive) and receptive (comprehension) abilities, and children with SLI/DLD often show a greater difference from typically developing children on the expressive test. This is consistent with the idea that children with SLI/DLD tend to have greater delays in expressive, as compared to receptive, language skills. Besides early delays in acquiring vocabulary during the preschool period, older children with SLI/DLD display deficits in their breadth (i.e., number of words they can define) and depth (i.e., amount of information for each word) of vocabulary knowledge (McGregor, Oleson, Bahnsen, & Duff, 2013). Older children and adolescents also have difficulties with more abstract or less frequent meanings of words (e.g., "cold" as a personality trait rather than a temperature) and with nonliteral (figurative) expressions (e.g., "feeling blue").

Pragmatics

Helland and Helland (2017) review the evidence for pragmatic difficulties among children diagnosed with SLI/DLD. These children may have problems initiating and maintaining conversations and may have poor turn-taking skills. In addition, topic management during conversations and narratives tends to be poor, much like the examples given in Chapter 6. As pointed out by Roseberry-McKibbin (2007) and Helland and Helland, pragmatic problems lead to social and academic difficulties; children with SLI/DLD are often regarded as "different" by teachers and fellow students. The perception by others as "different" may be prompted by pragmatic difficulties such as trouble with ordinary greetings ("Hi, what's going on?"), with being attentive to the content and context of a conversation so as to be an effective communication partner, and with understanding nonliteral aspects of language ("Let's go" said as "Let's roll"). As discussed later in this chapter, pragmatic difficulties among some children with SLI/DLD are similar to those seen in ASD but are usually less severe.

Summary of the Language Disorder in SLI/DLD

In the preceding profile of how language may be delayed in children with SLI/DLD, each component of language was presented separately. This approach has some merit for instructional purposes, but in many cases too much focus on the individual trees blinds us to a proper view of the forest. Delays in particular components of language almost certainly have consequences for mastery of other components. As noted previously, a child with significantly impaired mastery of grammatical morphemes who receives negative social cues because of the "babyish" sound of their speech is likely to initiate conversations less frequently. This, in turn, may result in substantially less practice of language skills, possibly resulting in lost opportunities to develop vocabulary. A similar situation may occur for children who have a substantial problem comprehending language. These children may have problems following conversations; when they try to make contributions to conversations, their utterances may not be consistent with the topic being discussed because they have not understood it. Children with SLI/DLD may also lose language practice opportunities when other children or even adults do not choose them as conversation partners because of their problems with topic management.

Individual language components may, in fact, be selectively disordered in children. This lends some real-world validity to identification of the status of separate language components in a child diagnosed with SLI/DLD (or any pediatric language disorder). The analysis of separate components of language may also have use in guiding an intervention plan for the child diagnosed with SLI/DLD. For example, separate analyses of grammatical morphemes and vocabulary may identify which component is most delayed and, therefore, most in need of therapy. Whether the child's language disorder is viewed through the lens of separate components of language, or as a disorder of multiple components of language development, the goal is to understand how language is affected for use in everyday life.

What Is the Cause of SLI/DLD?

The cause or causes of SLI/DLD are not well understood. There are differing viewpoints among researchers about possible factors underlying SLI/DLD. Processing views suggest deficits in lower-level auditory processing skills, in higher-level memory abilities such as working and phonological memory, and in executive function — the ability to guide language-learning behavior by focusing attention on important stimuli (and, by implication, to know how to ignore other stimuli for the best learning outcomes). An alternate view is that SLI/DLD can be explained by language-learning problems that are specifically related to grammatical morphology or sentence-level syntactic constructions. More broadly, a genetic basis for disordered learning of these specific language components has been proposed to have a causal connection with SLI/DLD (Dale, Rice, Rimfeld, & Hayiou-Thomas, 2018).

The Role of Genetics in SLI/DLD

There is strong evidence of a genetic component in SLI/DLD. SLI/DLD appears to be heritable, most likely as a result of the interaction of several genes whose structures predispose an individual to have a developmental language disorder. In the case of SLI/DLD, these several genes are likely to interact with environmental conditions (Peterson, McGrath, Smith, & Pennington, 2007). The genetic predisposition to SLI/DLD means that a language delay is not an inevitable outcome of a child's genetic profile. It is the predisposition that is heritable; environmental influences such as extensive versus minimal language input to the developing child may work against or in favor of the predisposition.

Evidence for the heritability of SLI/DLD comes from a range of studies, including twin and broader familial studies. Twin studies reveal that SLI/DLD is more likely to occur in both members of monozygotic (identical) twins compared with both members of dizygotic (fraternal) twins. Stated in a different way, if one member of a twin pair has SLI/DLD, the probability of the other member having SLI/DLD is significantly higher in the identical versus fraternal twin pair. This supports a genetic component in SLI/DLD because identical twins have the same genetic profile, whereas fraternal twins do not. Other familial studies have uncovered evidence of greater probability of language delay, or a history of language delay, in relatives of a child with SLI/DLD compared with relatives of typically developing children (that is, a control group). The heritability of a genetic predisposition for SLI/DLD is supported by these findings (Rice, 2013; Tomblin, 2009).
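
The logic of the twin comparison can be made concrete with hypothetical counts, as in the sketch below; the numbers are invented for illustration and are not drawn from any particular study.

# Toy illustration of the twin-study logic with invented counts.
# Concordance: among twin pairs in which at least one member has SLI/DLD,
# the proportion of pairs in which both members are affected.
def concordance(pairs_both_affected, pairs_at_least_one_affected):
    return pairs_both_affected / pairs_at_least_one_affected

mz = concordance(28, 40)   # identical (monozygotic) pairs, hypothetical
dz = concordance(14, 40)   # fraternal (dizygotic) pairs, hypothetical

print(f"MZ concordance: {mz:.0%}")   # 70%
print(f"DZ concordance: {dz:.0%}")   # 35%
# A markedly higher MZ than DZ concordance is the pattern consistent with a
# genetic component, because identical twins share all of their genes.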

Language Delay and Autism Spectrum Disorder

As defined by the Diagnostic and Statistical Manual of Mental Disorders, fifth edition1 (DSM-5) (http://www.aappublications.org/content/early/2013/06/04/aapnews.20130604-1), a diagnosis of Autism Spectrum Disorder (ASD) is made when a child has problems with social communication/interactions and demonstrates repetitive and restricted behaviors. Within these two general categories of behavior, there are more specific criteria for the diagnosis. Our focus is on the speech and language characteristics of children diagnosed with ASD according to the DSM-5 criteria. Like SLI/DLD, the causes of ASD are unknown. The Centers for Disease Control and Prevention (CDC) estimated in 2018 that 1 in 59 children at 8 years of age in the United States were diagnosed with autism (https://www.cdc.gov/mmwr/volumes/67/ss/ss6706a1.htm?s_cid=ss6706a1_w). Between 2000 and 2014, there was a dramatic increase in the prevalence of autism (see review in Graf, Miller, Epstein, & Rapin, 2017). Many children diagnosed with autism have some form of language disorder in addition to the core deficit of social communication (pragmatics), which is part of the diagnostic criteria for this condition.

Language Characteristics in ASD

Chapter 3 described three major components of language — form, content, and use. Form includes phonetics, phonology, and morphology; content is meaning (the lexicon); and use refers to pragmatics (social communication). At kindergarten age, 70% to 75% of children with ASD are verbal. This group includes children with well-developed language skills and varying degrees of language disorders. Currently, there are not good estimates of the proportion of children with ASD who have language disorders in addition to difficulties with social communication. Preschool children with ASD who have limited language skills are referred to as "preverbal," and those without functional spoken language after 5 years of age are referred to as "minimally verbal." Twenty-five to 30% of children diagnosed with ASD are estimated to be minimally verbal at kindergarten age. Some minimally verbal children may have a few words or phrases but do not routinely use spoken language to communicate. Although the absence of multiple-word utterances at 5 years of age represents a significant language disorder (Chapter 6), these children are expected to make gains in their language skills as they progress through the school years (Tager-Flusberg & Kasari, 2013). According to Boucher (2012), even older children and adults with autism who appear to have typically developing language skills may have subtle language differences from neurotypical children and adults (see Box, "Neurotypical and Neurodiverse").

Neurotypical and Neurodiverse

"Neurotypical" is a term used primarily in the autism community (autistic persons and their allies) to indicate typical skills and behaviors; the term is descriptive of what used to be classified as "normal." Many research publications on autism use the term "neurotypical" to describe control groups of typically developing (or developed) individuals. The term is not used as the standard, but rather as a point on a continuum of brain types. A different point on that continuum is "neurodiverse," suggesting (in the case of ASD) the brain type of persons with autism. These two terms do not stand opposed as "normal" versus "abnormal." Rather, neurodiverse is regarded as different from neurotypical, not disordered.

1. The Diagnostic and Statistical Manual of Mental Disorders, fifth edition, is published by the American Psychiatric Association to specify diagnostic criteria for disorders it classifies as "mental disorders." ASD is included in this classification. The criteria for a diagnosis of ASD have been updated over the last 20 years and in some respects are controversial.


Phonology

Phonological delays (speech sound delays/disorders; see Chapters 13 and 15) have not been studied extensively in children with autism. The lack of data may be due partly to the clinical impression of normal or near-normal phonetic/phonological skills among children with autism. Speech sound delays have been noted for a small percentage of children with autism who have near-typical speech and language development. The nature of the speech sound errors is rarely reported, and the resolution of the errors as the children mature has not been addressed. This is important because typically developing children have speech sound errors that resolve during the course of phonetic/phonological development. The speech sound errors reported for children with autism may be similar to the errors made by typically developing children and may resolve without intervention (see Boucher, 2012, p. 224). A review of speech sound disorders in children with autism is available in Broome, McCabe, Docking, and Doble (2017).

Morphology

Some children with ASD have deficits in morphological development. The delay may include morphological markers such as tense (walk-walked) and possessives (Bob-Bob's) as well as other morphemes. In children with speech and language skills that are not severely impaired, the development of morphology is often within age-level expectations or mildly delayed. In fact, delayed morphological development in some children with ASD has been described as similar to the profile of morphological errors observed in SLI/DLD.

Syntax

The profile of syntactic development in ASD highlights the difference between expressive and receptive (comprehension) language development. Syntactical development in verbal children with ASD seems to be near typical, and especially so for children with more advanced language skills even when those skills are delayed. Like morphology, expressive syntax may seem near typical, but comprehension of syntax is not. This split between expressive and comprehension development becomes more obvious with more complicated syntactical forms. An example of a difference between a simple and more complicated syntactical form is the sentence, "The boy came home" (simple) versus "The boy who owns the bike came home" (more complicated). It certainly is possible that children with ASD may be less likely than typically developing children to use (express) sentences with more complicated syntax, but their comprehension of syntactical complexity is clearly impaired.

Vocabulary

As with other language components, vocabulary skills vary to a large degree among children with ASD. In general, vocabulary skills in children with ASD are delayed relative to those of typically developing children. This is similar to the vocabulary profile of children with SLI/DLD, as well as children with intellectual disabilities (Chapter 8). The relationship between receptive and expressive vocabulary in ASD seems to be different compared with typically developing children and groups of children with delayed or disordered language development. Larger receptive compared with expressive vocabularies are the rule for these latter groups — they understand more words than they say. In contrast, a significant number of children with ASD have receptive vocabularies that are more impaired than their expressive vocabularies, based on age expectations (Kover, McDuffie, Hagerman, & Abbeduto, 2013; Kover & Ellis Weismer, 2014). Children with ASD may have delayed expressive vocabularies because the new words they add tend to be limited to word possibilities that sound alike (e.g., "house," "mouse," "ball," "fall"; see Kover & Ellis Weismer, 2014; McDaniel, Yoder, & Watson, 2017). Typically developing children add words from a broader range of possibilities. It is not clear why this may be the case, but if true, it points to a therapeutic strategy to build expressive vocabulary in children with ASD: extend vocabulary training items to words that do not share sounds with words already in the expressive vocabulary.

Why is receptive vocabulary affected to such a significant degree in children with ASD? The answer is not clear. One possibility is that nonverbal cognitive abilities have a disproportionate effect on receptive, as compared with expressive, vocabulary (Kover et al., 2013). An analysis of longitudinal expressive and receptive vocabulary skill in McDaniel et al. (2017) was done to evaluate the influence of receptive vocabulary on expressive vocabulary. A large group of children with ASD were studied over a 16-month period; the investigators reasoned that change in receptive vocabulary skill over this period would predict change in expressive vocabulary skill. This is a very reasonable expectation from the idea that the size of the receptive vocabulary "drives" the size of the expressive vocabulary. Surprisingly, the analysis did not support this expectation (McDaniel et al., 2017).

The results of McDaniel et al. (2017) may have important clinical implications. If the receptive vocabulary does drive the expressive vocabulary, therapy directed at improving the receptive vocabulary is a sound idea for expanding the expressive vocabulary. McDaniel et al.'s results, however, do not support this approach. Perhaps therapeutic efforts to expand the expressive vocabulary are best directed at expressive tasks, or expressive plus receptive tasks.
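
A bare-bones version of the question McDaniel et al. (2017) asked, using invented gain scores, is sketched below; their actual analysis was considerably more sophisticated.

from statistics import correlation   # available in Python 3.10+

# Sketch (hypothetical gain scores): asking whether change in receptive
# vocabulary is related to change in expressive vocabulary.
receptive_change = [5, 12, 3, 20, 8, 15, 2, 10]
expressive_change = [4, 6, 9, 5, 11, 7, 3, 8]

r = correlation(receptive_change, expressive_change)
print(f"Correlation between change scores: r = {r:.2f}")
# A weak relationship would fail to support the idea that receptive growth
# "drives" expressive growth, which is the pattern McDaniel et al. reported.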

Pragmatic Language

Recall that one of the two core deficits that must be observed for a diagnosis of ASD is "problems with social communication/interactions." (The other core deficit is "repetitive and restricted behaviors.") In this section, we focus on the core deficit of social communication/interactions. A more precise understanding of the core impairment of pragmatic language in ASD is its presence across individuals despite a range of structural language abilities. Structural language abilities include phonological, morphological, syntactic, and semantic components, which may range across individuals diagnosed with ASD from typically developing to severely impaired. Whatever the range of structural language skills may be, pragmatic language impairments are always present in children diagnosed with ASD.

Many individual behaviors contribute to appropriate pragmatic language. These behaviors are grouped into five categories for observation of children 5 years and older, listed in Table 7–1 (Cordier, Munro, Wilkes-Gillan, Speyer, & Pearce, 2014). Several of the items include behaviors that are not verbal but describe a more abstract level of social communication (Baird & Norbury, 2016). Observed impairments of the items in Table 7–1 are not part of a formal checklist for diagnosis of ASD but serve as one way to appreciate the wide range of pragmatic language behaviors and their potential to contribute to a diagnosis of pragmatic language impairment. Symptoms of pragmatic language impairment in ASD may not appear until children are old enough to engage in social situations in which the impairments can be reliably observed (Baird & Norbury, 2016).

Table 7–1.  Five General Categories, Each of Which Includes Several Behaviors That May Be Observed in the Pragmatic Language Impairment in Autism Spectrum Disorder

Category: Introduction and responsiveness
Examples: Selects and introduces a range of conversational topics. Initiates verbal information appropriate to the context.

Category: Nonverbal communication
Examples: Uses and responds to identifiable, clear, intentional body actions and movements. Uses and responds to a variety of facial expressions to express consistent meanings.

Category: Social-emotional attunement
Examples: Considers/integrates another's viewpoints/emotions. Appropriate use of social language within context.

Category: Executive function
Examples: Attends to communicative content; plans and initiates appropriate responses. Versatile ways to interpret/connect/express ideas.

Category: Negotiation
Examples: Uses appropriate methods for resolving disagreement. Expresses feelings appropriate to the context.

Note.  Adapted from "Reliability and Validity of the Pragmatics Observational Measure (POM): A New Observational Measure of Pragmatic Language for Children," by R. Cordier, N. Munro, S. Wilkes-Gillan, R. Speyer, and W. M. Pearce, 2014. Research in Developmental Disabilities, 35, pp. 1588–1598.


Social Communication Disorder

The DSM-5 includes specific criteria for the diagnosis of Social (Pragmatic) Communication Disorder (SCD). This is a disorder of social communication that does not include the repetitive and rigid behaviors seen in ASD but shares with it features of pragmatic disorder, including nonverbal and verbal impairment. In fact, some experts contend that SCD was previously diagnosed as "Pragmatic Language Impairment," and others think these children were labeled "Autism Spectrum Disorder Not Otherwise Specified" — a category in the previous version of the DSM, meaning that a child had autistic-like characteristics but did not fully meet the criteria for a diagnosis of autism. SCD, diagnosed in childhood, is likely to persist into adulthood. SCD is not explained by low cognitive ability but interferes substantially with social relationships and academic and career performance. Children diagnosed with SCD may also have impairments of structural language (phonology, morphology, syntax, and content). SCD, a controversial diagnosis, is discussed comprehensively by Norbury (2014) and Baird and Norbury (2016).

Language Delay and Hearing Impairment

Children born with hearing impairment have hearing losses ranging from mild to profound. Degree of hearing loss is defined on the basis of audiometric findings. Audiometry, discussed fully in Chapter 23, includes the quantification of sound energy (measured in decibels) required for a listener to detect tones at a series of different frequencies. According to the American Speech-Language-Hearing Association (ASHA), mild hearing loss requires 21 to 40 decibels (dB) to reach this "just detectable" criterion, moderate hearing loss 41 to 55 dB, severe loss 71 to 90 dB, and profound hearing loss 91 dB and greater (https://www.asha.org/public/hearing/degree-of-hearing-loss/). The "profound" category includes people who are legally deaf, some of whom may be able to respond to very high levels of sound energy with amplification (e.g., a hearing aid). Some individuals do not respond to any level of sound energy, even with amplification.

Epidemiology of Hearing Loss

The incidence of hearing loss is one to three per thousand births (approximately 0.1% to 0.3%). Between the ages of 5 and 9 years, the prevalence of hearing loss increases to 2.7 to 3.5 per 1,000 children. Why is the prevalence in school-age children higher than the incidence in newborns? Possible reasons include the inclusion in prevalence estimates of children who were not diagnosed with hearing loss at birth, the influence of certain medications on hearing, and the inclusion of children who have acquired hearing loss due to various conditions/diseases or accidents (Kremer, 2019). It has been estimated that 54% to 68% of hearing loss at age 4 has a genetic basis (Morton & Nance, 2006). Although many genes are implicated in congenital hearing loss, and especially hearing loss in the severe and profound categories, a more limited number of genes have been identified with direct influence on development of the cochlea, the sense organ for hearing. These genes are often referred to as "deafness genes."
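To make the degree-of-hearing-loss boundaries quoted above concrete, the following short Python sketch maps a detection threshold (in dB) onto a degree label. It is an illustration only, not a clinical tool: the function name is ours, it uses just the ranges quoted in this section (ASHA also defines an intermediate, moderately severe range between 56 and 70 dB that the summary above omits), and real audiometric classification considers thresholds at several frequencies in both ears.

def degree_of_hearing_loss(threshold_db):
    # Maps a detection threshold (dB) to the degree labels quoted in this section.
    if threshold_db <= 20:
        return "within normal limits (below the 21-dB mild cutoff quoted above)"
    if threshold_db <= 40:
        return "mild"      # 21 to 40 dB
    if threshold_db <= 55:
        return "moderate"  # 41 to 55 dB
    if threshold_db <= 70:
        return "not covered in the summary above (ASHA: moderately severe)"
    if threshold_db <= 90:
        return "severe"    # 71 to 90 dB
    return "profound"      # 91 dB and greater

print(degree_of_hearing_loss(45))   # moderate
print(degree_of_hearing_loss(95))   # profound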

Language Characteristics in Hearing Impairment

All individuals with hearing loss are at risk for developmental speech and language impairment. The degree of hearing loss predicts the risk of speech and language delays in a general sense, but not absolutely. There are children with severe hearing loss whose speech and language skills are equivalent to those of children with moderate hearing loss, and children with greater hearing loss who have better speech and language skills than children with lesser loss (see Fitzpatrick, Crawford, Ni, & Durieux-Smith, 2011, for examples of this). Lederberg, Schick, and Spencer (2013) have published an excellent review of factors that may or may not contribute to speech and language development in children with hearing loss.

Oral speech and language impairments in hearing-impaired children make sense, because the input that typically drives the development of speech and language is degraded to varying degrees (mild, moderate, severe) or unavailable (deaf). The input from parents, siblings, and other people who speak to a baby, to a toddler, and to older children plays a huge role in the development of both receptive and expressive skills for oral language. We use the term "oral language" to make clear that other modalities for communication exist, and in fact are critical to understanding language skills in persons with congenital hearing loss, especially children who are born deaf. For example, deaf individuals who are native users of American Sign Language (ASL) learn language via a visual, rather than auditory, medium. Information is communicated as efficiently by ASL as it is by speech, and the language has a structure of linguistic rules, just like oral languages. In short, ASL is a language like any other (Lederberg et al., 2013). ASL is the native language of the Deaf community, the capital "D" signifying the community whose members do not view deafness as a disability. Roughly 10% of deaf children are born to Deaf parents (Mitchell & Karchmer, 2004). The discussion in the current section focuses primarily on hearing-impaired and deaf people who choose to communicate orally, or whose parents choose oral communication for their children. This choice of oral communication may involve some support from a manual language such as ASL or other manual communication systems.

Speech and Language Development and Hearing Impairment

As stated by Fitzpatrick et al. (2011, p. 605), "children with hearing loss of all degrees of severity continue to lag behind expectations for children with normal hearing . . . in multiple communication domains." Typically, the issue is not so much whether there is a delay, but the extent of the delay. This is critical because delays in any or all aspects of speech and language have significant potential to affect academic and social skills.

The following discussion of developmental speech and language characteristics in children with hearing loss is broad. The summaries do not necessarily apply to every child with a hearing loss; just as in the typically developing population, different children have different paths to speech and language learning. In particular, and as previously noted, the pattern of speech and language development varies broadly with degree of hearing loss. Keep in mind, however, that the influence of the degree of hearing loss on speech and language development is often offset to some degree by amplification (e.g., hearing aids) and/or cochlear implants. Almost all of the research papers cited here include excellent reviews of language development in children with hearing loss, including deafness. A comprehensive review of language development in deaf children with cochlear implants is provided by Ganek, Robbins, and Niparko (2012).

Phonology

Expressive phonological skills develop more slowly in hearing-impaired children than in typically developing children. In 4- and 5-year-old children with moderate-to-severe hearing loss and either hearing aids or cochlear implants, Fitzpatrick et al. (2011) found dramatically lower scores on the Goldman-Fristoe Test of Articulation compared with scores of typically developing children of the same ages. The Goldman-Fristoe is a standardized test that counts the number of correctly articulated sounds and relates the score to expected articulatory skills at a given age.

A significant deficit in speech intelligibility results from the frequent occurrence of speech sound errors among children with hearing loss. Listeners have difficulty understanding children with hearing loss partly (and probably largely) due to speech sound errors; in general, the greater the number of errors, the greater the speech intelligibility deficit.

Speech sound development in children with hearing impairment depends on a number of receptive-language factors. Hearing loss makes it difficult to learn the acoustic properties required to develop a cognitive representation of speech sounds. Some consonants, such as fricatives (e.g., "s," "sh"), have less sound energy than other consonants and are more susceptible to the effects of hearing loss on forming phonological representations. Phonological memory, a short-term memory specialized for speech sound information, also seems to be impaired in hearing-impaired children (Halliday, Tuomainen, & Rosen, 2017a). Impairments in phonological memory work against the establishment of good cognitive representations of speech sound categories.

Data on French-speaking children with mild-to-moderate hearing loss and no other disabilities suggest that phonological gains are made throughout childhood but do not "normalize" in adolescence. Halliday et al. (2017b) and Nittrouer, Muir, Tietgens, Moberly, and Lowenstein (2018) point to phonological skill as the most impaired aspect of language skill in children at all levels of hearing impairment. Phonological delays that extend from grade school into high school and beyond have the potential to affect academic performance, especially in the areas of reading and writing (Delage & Tuller, 2007).


Morphology and Syntax

Morphological and syntactic skills are delayed in children with hearing impairment. The degree and nature of these delays depend on many factors, including severity of hearing loss, cognitive skills, and history of amplification (hearing aids) and/or cochlear implantation. Delay in the mastery of grammatical morphemes has been reported for children with mild, moderate, and severe hearing loss. Syntax is also delayed, sometimes into the adolescent years (Delage & Tuller, 2007). The delays in mastery of grammatical morphemes and syntax are observed for both receptive and expressive language (reviewed in Halliday et al., 2017b).

Cochlear implants have a significant influence on language learning among prelingually deaf children (deaf before the age of 5 years). Cochlear implants provide deaf individuals with auditory stimulation that contributes to skills in all areas of language development. In one study, deaf children aged 5 to 13 years who had received a cochlear implant (or implants, one in each ear) prior to age 5 years, and who had used them for an average of approximately 6 years, developed significant language skills, but their expressive morphological and syntactic skills continued to be delayed relative to a typically developing control group (Boons, De Raeve, Langereis, Peeraer, Wouters, & van Wieringen, 2013). Within the group of children with cochlear implants, some had morphological and syntactic skills equal to those of typically developing children. In general, even children who are implanted at early ages (around 1 year old, or even younger) may struggle with morphological and syntactic skills as they grow older. This may be the case even when other aspects of their language (e.g., vocabulary) are at typically developing levels (Ganek, Robbins, & Niparko, 2012). Chapter 24 presents additional information on cochlear implants.

Vocabulary

Vocabulary is often regarded as a strength among children with hearing impairment. Children between the ages of 8 and 16 years with mild-to-moderate hearing loss may have receptive and expressive vocabulary skills only slightly lower than age expectations (Halliday et al., 2017b). When tested at age 5 years, children who had received cochlear implants before the age of 2 years and had at least 2 years of experience with the implants had significantly lower receptive vocabulary skills than typically developing children. However, when the same children were tested over two consecutive years, their rate of vocabulary growth was greater than that of typically developing children (Hayes, Geers, Treiman, & Moog, 2009). In general, better language outcomes can be expected with earlier age at implantation (Ganek et al., 2012).

Pragmatic Language

As pointed out by Goberis, Beams, Dalpes, Abrisch, Baca, and Yoshinaga-Itano (2012), very little research has been done on social language use in hard of hearing and deaf individuals. Goberis et al. developed a questionnaire in which social language use items were rated by parents of hard of hearing and deaf children; these ratings were compared to ratings by parents of children with typical (normal) hearing. The primary findings were as follows: (a) children with hearing loss had slower development of pragmatic language skills compared with typically developing children, and (b) the rate of social language learning from age 3 to 7 years depended on hearing loss category. Rate of learning was relatively high in the mild hearing loss group, and relatively low in the profoundly hearing impaired group. Observation of children with hearing loss in social language situations is needed to specify the characteristics of pragmatic language use suggested by the parent questionnaire data.

One reason that there has not been much research on social language use in this population is that it is not a primary area of difficulty, as it is for children with ASD.

Chapter Summary

Specific language impairment (SLI) and developmental language disorder (DLD) are two terms used to designate essentially the same disorder in children, that of delayed language development in children who are typically developing in every other way; in this chapter, the disorder is labeled "SLI/DLD."

SLI/DLD is the most frequent disability among children between 3 and 5 years of age, occurring in about 3% to 7% of the population. SLI/DLD is typically diagnosed around the age of 3 years.

SLI/DLD is a diagnosis of language disorder in the absence of known causes; hearing loss, autism and other developmental disabilities, intellectual disability, craniofacial anomalies, psychiatric disturbance, and neurological disease must be ruled out as potential causes for the language delay.

Children diagnosed with SLI/DLD may have difficulties with all aspects of language development, both expressive and receptive; expressive delay can exist in the absence of receptive delay, but children with a receptive delay are likely to have expressive delays. All components of language are likely to be delayed in SLI/DLD, with especially prominent delays in vocabulary and grammatical morphology.

SLI/DLD is a significant problem because many of the preschool children who receive this diagnosis are likely to experience substantial academic, social, and career difficulties. Strong evidence exists for a genetic component in SLI/DLD, based on familial patterns of the disorder, but the precise combination of genes that makes a child susceptible to SLI/DLD has not been identified.

Autism spectrum disorder (ASD) is diagnosed when a child has a well-identified social communication disorder and repetitive and restricted behaviors. Some children diagnosed with ASD are nonverbal, but the great majority have language skills ranging from significantly delayed to typical or even advanced; these skills are likely to improve as the child matures. Children with ASD have variable severity (and in some cases, no delay) of structural language skills. Like SLI/DLD, familial patterns of ASD have been firmly established, but the specific genes underlying the disorder have not been identified.

All children with hearing loss, no matter the severity, are at risk for language impairment; the degree of loss predicts the severity of the language impairment to a substantial but not perfect degree. A genetic, hereditable basis has been identified for persons with deafness, although the precise combination of deafness genes has not been identified.

Expressive phonological skills develop more slowly in hearing-impaired children than in typically developing children; some researchers believe that the most impaired language component in children with hearing loss is phonological. Impairments in speech intelligibility, resulting from the phonological impairments, are a significant problem in children with hearing loss. Receptive and expressive morphological and syntactic skills are delayed in hearing-impaired children. Vocabulary is regarded as a relative strength in children with hearing impairment. Pragmatic language has not been fully studied in children with hearing loss. Language development in children with profound hearing impairment seems to be optimal when a child receives cochlear implants before the age of 2 years.


References

Baird, G., & Norbury, C. F. (2016). Social (pragmatic) communication disorders and autism spectrum disorder. Archives of Disease in Childhood, 101, 745–751.

Bishop, D. V. M., Adams, C. V., & Norbury, C. F. (2006). Distinct genetic influences on grammar and phonological memory deficits: Evidence from 6-year-old twins. Genes, Brain and Behavior, 5(2), 158–169.

Bishop, D. V. M., Snowling, M., Thompson, P., Greenhalgh, T., & CATALISE consortium. (2016). CATALISE: A multinational and multidisciplinary Delphi consensus study. Identifying language impairments in children. PLoS ONE, 11(7), e0158753. https://doi.org/10.1371/journal.pone.0158753

Bishop, D. V. M., Snowling, M., Thompson, P., Greenhalgh, T., & CATALISE consortium. (2017). Phase 2 of CATALISE: A multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology. Journal of Child Psychology and Psychiatry, 58, 1068–1080.

Boons, T., De Raeve, L., Langereis, M., Peeraer, L., Wouters, J., & van Wieringen, A. (2013). Expressive vocabulary, morphology, syntax and narrative skills in profoundly deaf children after early cochlear implantation. Research in Developmental Disabilities, 34, 2008–2022.

Boucher, J. (2012). Research review: Structural language in autism spectrum disorder — Characteristics and causes. Journal of Child Psychology and Psychiatry, 53, 219–233.

Broome, K., McCabe, P., Docking, K., & Doble, M. (2017). A systematic review of speech assessments for children with autism spectrum disorder: Recommendations for best practice. American Journal of Speech-Language Pathology, 26, 1011–1029.

Coady, J. A., & Evans, J. L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). International Journal of Language and Communication Disorders, 43, 1–40.

Conti-Ramsden, G., & Durkin, K. (2012). Language development and assessment in the preschool period. Neuropsychology Review, 22, 384–401.

Cordier, R., Munro, N., Wilkes-Gillan, S., Speyer, R., & Pearce, W. M. (2014). Reliability and validity of the Pragmatics Observational Measure (POM): A new observational measure of pragmatic language for children. Research in Developmental Disabilities, 35, 1588–1598.

Dale, P. S., Rice, M. L., Rimfeld, K., & Hayiou-Thomas, M. E. (2018). Grammar clinical marker yields substantial heritability for language impairments in 16-year-old twins. Journal of Speech, Language, and Hearing Research, 61, 66–78.

Delage, H., & Tuller, L. (2007). Language development and mild-to-moderate hearing loss: Does language normalize with age? Journal of Speech, Language, and Hearing Research, 50, 1300–1313.

Ellis Weismer, S. (2013). Specific language impairment. In L. Cummings (Ed.), Cambridge handbook of communication disorders (pp. 73–87). Cambridge, UK: Cambridge University Press.

Fitzpatrick, E. M., Crawford, L., Ni, A., & Durieux-Smith, A. (2011). A descriptive analysis of language and speech skills in 4- to 5-yr-old children with hearing loss. Ear and Hearing, 32, 605–616.

Ganek, H., Robbins, A. M., & Niparko, J. K. (2012). Language outcomes after cochlear implantation. Otolaryngologic Clinics of North America, 45, 173–185.

Gathercole, S. E. (2006). Nonword repetition and word learning: The nature of the relationship. Applied Psycholinguistics, 27, 513–543.

Gathercole, S., & Baddeley, A. (1990). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360.

Goberis, D., Beams, D., Dalpes, M., Abrisch, A., Baca, R., & Yoshinaga-Itano, C. (2012). The missing link in the language development of the deaf and hard of hearing: Pragmatic language development. Seminars in Speech and Language, 33, 297–309.

Graf, W. D., Miller, G., Epstein, L. G., & Rapin, I. (2017). The autism "epidemic": Ethical, legal, and social issues in a developmental spectrum disorder. Neurology, 88, 1371–1380.

Halliday, L. F., Tuomainen, O., & Rosen, S. (2017a). Auditory processing deficits are sometimes necessary and sometimes sufficient for language difficulties in children: Evidence from mild to moderate sensorineural hearing loss. Cognition, 166, 139–151.

Halliday, L. F., Tuomainen, O., & Rosen, S. (2017b). Language development and impairment in children with mild to moderate sensorineural hearing loss. Journal of Speech, Language, and Hearing Research, 60, 1551–1567.

Hayes, H., Geers, A. E., Treiman, R., & Moog, J. S. (2009). Receptive vocabulary development in deaf children with cochlear implants: Achievement in an intensive auditory-oral educational setting. Ear and Hearing, 30, 128–135.

Helland, W. A., & Helland, T. (2017). Emotional and behavioural needs in children with specific language impairment and in children with autism spectrum disorder: The importance of pragmatic language impairment. Research in Developmental Disabilities, 70, 33–39.

Jackson, E., Leitao, S., & Claessen, M. (2016). The relationship between phonological short-term memory, receptive vocabulary, and fast mapping in children with specific language impairment. International Journal of Language and Communication Disorders, 51, 61–73.

Kamhi, A. G., & Clark, M. K. (2013). Specific language impairment. In O. Dulac, M. Lassonde, & H. B. Sarnat (Eds.), Handbook of clinical neurology, Vol. 111 (3rd series), Pediatric Neurology Part I (pp. 219–227). Amsterdam, the Netherlands: Elsevier.

Kover, S. T., & Ellis Weismer, S. (2014). Lexical characteristics of expressive vocabulary in toddlers with autism spectrum disorder. Journal of Speech, Language, and Hearing Research, 57, 1428–1441.

Kover, S. T., McDuffie, A. S., Hagerman, R. J., & Abbeduto, L. (2013). Receptive vocabulary in boys with autism spectrum disorder: Cross-sectional developmental trajectories. Journal of Autism and Developmental Disorders, 43, 2696–2709.

Kremer, H. (2019). Hereditary hearing loss: About the known and the unknown. Hearing Research. https://doi.org/10.1016/j.heares.2019.01.003

Laasonen, M., Smolander, S., Lahti-Nuuttila, P., Leminen, M., Lajunen, H.-R., Heinonen, K., . . . Arkkila, E. (2018). Understanding developmental language disorder — The Helsinki longitudinal SLI study (HelSLI): A study protocol. BMC Psychology, 6(24). https://doi.org/10.1186/s40359-018-0222-7

Laws, G., & Bishop, D. V. M. (2003). A comparison of language abilities in adolescents with Down syndrome and children with specific language impairment. Journal of Speech, Language, and Hearing Research, 46, 1324–1339.

Lederberg, A. R., Schick, B., & Spencer, P. E. (2013). Language and literacy development of deaf and hard-of-hearing children: Successes and challenges. Developmental Psychology, 49, 15–30.

Leonard, L. B. (2014). Children with specific language impairment (2nd ed.). Cambridge, MA: MIT Press.

McDaniel, J., Yoder, P., & Watson, L. R. (2017). A path model of expressive vocabulary skills in initially preverbal preschool children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 47, 947–960.

McGregor, K., Oleson, J., Bahnsen, A., & Duff, D. (2013). Children with developmental language impairment have vocabulary deficits characterized by limited breadth and depth. International Journal of Language and Communication Disorders, 48, 307–319.

Mitchell, R. E., & Karchmer, M. A. (2004). Chasing the mythical ten percent: Parental hearing status of deaf and hard of hearing students in the United States. Sign Language Studies, 4, 138–163.

Morton, C. C., & Nance, W. E. (2006). Newborn hearing screening — A silent revolution. New England Journal of Medicine, 354, 2151–2164.

Nittrouer, S., Muir, M., Tietgens, K., Moberly, A. C., & Lowenstein, J. H. (2018). Development of phonological, lexical, and syntactic abilities in children with cochlear implants across the elementary grades. Journal of Speech, Language, and Hearing Research, 61, 2561–2577.

Norbury, C. F. (2014). Practitioner review: Social (pragmatic) communication disorder conceptualization, evidence and clinical implications. Journal of Child Psychology and Psychiatry, 55, 204–216.

Norbury, C. F., Gooch, D., Wray, C., Baird, G., Charman, T., Simonoff, E., . . . Pickles, A. (2016). The impact of nonverbal ability on the prevalence and clinical presentation of language disorder: Evidence from a population study. Journal of Child Psychology and Psychiatry, 57, 1247–1257.

Peterson, R. L., McGrath, L. M., Smith, S. D., & Pennington, B. F. (2007). Neuropsychology and genetics of speech, language, and literacy disorders. Pediatric Clinics of North America, 54, 543–561.

Rice, M. L. (2013). Language growth and genetics of specific language impairment. International Journal of Speech-Language Pathology, 15, 223–233.

Rice, M. L. (2014). Grammatical symptoms of specific language impairment. In D. V. M. Bishop & L. B. Leonard (Eds.), Speech and language impairments in children: Causes, characteristics, intervention and outcomes (pp. 17–34). London, UK: Psychology Press.


Roseberry-McKibbin, C. (2007). Language disorders in children: A multicultural and case perspective. Boston, MA: Pearson Education. Tager-Flusberg, H., & Kasari, C. (2013). Minimally verbal school-aged children with autism spectrum disorder: The neglected end of the spectrum. Autism Research, 6, 468–478. Tomblin, J. B. (2009). Genetics of child language disorders. In R. G. Schwartz (Ed.), Handbook of child language disorders (pp. 232–256). New York, NY: Psychology Press.


Tomblin, J. B., Records, N. L., Buckwalter, P., Zhang, X. Y., Smith, E., & O'Brien, M. (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research, 40, 1245–1260.

Volkers, N. (2018). Diverging views on language disorders: Researchers debate whether the label "developmental language disorder" should replace "specific language impairment." ASHA Leader, 23, 44–53.

8  Pediatric Language Disorders II

Introduction

This chapter presents information on language characteristics of children (and to some extent, adults) with intellectual disability (ID). The focus is on ID and its effect on speech and language in children with Down syndrome (DS) and Fragile X syndrome (FXS), but the information applies to other disorders in which ID is a prominent characteristic.

The effects of hearing impairment on speech and language development, discussed in Chapter 7, must be factored into the effects of ID on speech and language development. This is because children with intellectual disabilities are much more likely to have hearing impairment than children in the general population (Carvill, 2001). The combination of an ID with a hearing impairment makes the challenge of language development more difficult than the effect of either disability alone. Although the combined effects of ID and hearing impairment are not discussed in detail here, the reader should keep in mind the potential effect of the combination of ID and hearing impairment on speech and language development in DS and FXS.

DS and FXS account for the great majority of cases of ID. Speech and language development are affected by the intellectual disability, although not necessarily in the same way for DS and FXS.

The term "syndrome," common to DS and FXS, deserves a formal definition: a syndrome is a group of symptoms that occur together, are seen in a series of children (not just one child), and represent a disease process. In DS and FXS, the grouping of symptoms is an important part of the diagnosis; in both syndromes, the diagnosis can also be confirmed by genetic analysis, as discussed below.

Criteria for a Diagnosis of ID

According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) (American Psychiatric Association, 2013), ID is diagnosed when chronic impairments of general mental abilities have an impact on adaptive functioning in three areas: (a) conceptual skills, including language, reading, writing, math, reasoning, knowledge, and memory; (b) social behaviors, including empathy, judgment, interpersonal communication skills, and making and maintaining friendships; and (c) practical behaviors, including personal care, job responsibilities, management of money, recreation, and organization of tasks.1 These impairments must be observed during childhood. Adults who acquire these impairments, due to (for example) stroke, head trauma, or dementia (Chapter 9), are not diagnosed with ID.

1. The wording of these requirements for a diagnosis of ID is a very close paraphrase of the wording in the DSM-5.


A 2011 analysis estimated the worldwide prevalence of ID to be roughly 1% of the population (Maulik, Mascarenhas, Mathers, Dua, & Saxena, 2011). A follow-up analysis of a large number of international studies, extending the work of Maulik et al. (2011), found the prevalence of intellectual disability in children/adolescents, and in adults who as children met the criteria for ID, to range from 0.05% to 1.55% (McKenzie, Milton, Smith, & Ouellette-Kuntz, 2016). The vast majority of individuals with ID (roughly 85%) have mild, rather than more severe, impairments.

Formal diagnosis of ID is based on standardized scores obtained from an intelligence quotient (IQ) measure, as well as clinical observation of social and practical skills. A child diagnosed with ID typically has an IQ score below 70. An IQ score of 70 is two standard deviations below the average IQ score of 100 in the general population. When "raw" IQ scores are standardized, meaning they are transformed to form a normal distribution with an average of 100, one standard deviation equals 15 points; thus, an IQ score of 70 or below is at least two standard deviations below the average of 100.

IQ scores typically reflect language skills, which may lead to an underestimate of a child's intelligence. For example, a child diagnosed with a language disorder may have a relatively low IQ score due to the language-based nature of the test. There are nonverbal IQ tests that provide an estimate of intelligence that is independent of language skills. It is not uncommon for children to have nonverbal IQ scores that are higher than IQ scores that are based in part on language skills. The nonverbal IQ scores reflect the child's cognitive ability, which includes skills such as reasoning, memory, and processing speed.

ID is diagnosed based on IQ and deficits in adaptive behavior/functioning. Adaptive functioning refers to the social and practical skills needed to get along in the world. Clinical evaluation of adaptive functioning, as well as scores on standardized tests of these skills, is as important as an IQ score in making a diagnosis of ID. IQ scores and scores on standardized tests of social and practical behaviors are usually well below the normal range for children who are evaluated for a diagnosis of ID, and together they support the diagnosis. In some cases (probably very few), a child may have an IQ score below 70 but have scores within the normal range for both social and practical skills. Such a child is not diagnosed with ID.
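The standard-score arithmetic described above can be illustrated with a few lines of Python (an illustrative sketch only; the function name is ours, and actual IQ tests convert raw scores using published norm tables rather than this formula alone):

def standard_deviations_from_mean(iq_score, mean=100.0, sd=15.0):
    # Number of standard deviations an IQ score falls above (+) or below (-) the mean.
    return (iq_score - mean) / sd

print(standard_deviations_from_mean(70))   # -2.0: an IQ of 70 is two standard deviations below 100
print(standard_deviations_from_mean(85))   # -1.0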

Down Syndrome (DS): General Characteristics

The 23 pairs of chromosomes in the human genome can be shown in an image called a karyogram, which shows an individual's genotype (see Box, "Genetic Terminology," for definitions of genetic terms). Figure 8–1 shows karyograms for a typical human male (left) and female (right). The chromosome pairs are numbered from 1 to 22; the 23rd pair is shown within the red circles. The 23rd chromosome pair in the male genotype has an X and a Y chromosome, compared with the two X chromosomes of the 23rd pair in the female genotype. An individual's sex is coded by the 23rd chromosome pair.

Figure 8–1.  Karyograms showing the karyotype for typically developing (and developed) human males (left) and females (right).

The genetic basis of DS is a de novo mutation (see Box, "Genetic Terminology") of the 21st chromosome. A de novo mutation is not inherited, but rather occurs when the sperm joins the egg. The mutation is an additional chromosome — hence the term "trisomy 21," designating a third chromosome added to the typical pair at chromosome 21. The DS karyotype is shown in Figure 8–2, where the added chromosome is indicated by an arrow. There are other genotypes (genetic profiles) associated with Down syndrome, but trisomy 21 is the most common and is the focus of the presentation in this chapter. DS is the most common genetic cause of ID. A photograph of a child with DS is shown in Figure 8–3.

Figure 8–2.  Karyogram of Down syndrome, showing Trisomy 21. Arrow indicates the third chromosome at pair 21.

Epidemiology and the DS Phenotype

Based on data from the years 2004 to 2006, the prevalence of DS is estimated to be 1 in 691 births, or 6,037 new cases annually (Kirby, 2017, adapting data from Parker et al., 2010). DS is a complex condition. Like FXS, there are many possible problems in DS, some or all of which may be present in a given individual. In other words, the genotype (trisomy 21) has a wide range of phenotypes. Typically, individuals with DS have some degree of intellectual disability and accompanying speech and language impairment. Additional characteristics of a person with DS may include hearing loss (often associated with middle ear disease), visual impairment, congenital heart defects, sleep apnea due to an obstructed airway, mental illness, and dementia. Furthermore, the person with DS may be of small stature and have poor muscle tone and loose joints. The presence or absence of these characteristics in a person with DS may change over time. For example, dementia, a chronic and usually progressive brain disease that causes memory loss, impaired reasoning, and behavioral changes, is likely to be observed in older individuals with DS. The phenotype variability includes severity: any of the characteristics may be present in mild, moderate, or severe form. An individual with DS is also likely to have facial differences, which may include a flattened face, eyes that slant upward and have an almond shape, small ears, a tongue that often sticks out of the mouth, and a short neck.
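As a quick arithmetic check on how the two prevalence figures above relate (an illustrative calculation only; the variable names are ours, and the implied annual birth total is derived from the cited numbers rather than reported in the text):

prevalence = 1 / 691                # about 0.00145, or roughly 1.45 cases per 1,000 births
implied_births = 6037 * 691         # the number of annual births implied by the two cited figures
print(round(prevalence * 1000, 2))  # 1.45
print(implied_births)               # 4171567, i.e., about 4.17 million births per year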

Genetic Terminology

Chromosomes:  Twenty-three pairs of strands of DNA and proteins, each strand carrying genes; the 23rd pair is often called the sex chromosome because it is different for males and females. The nuclei of most cells in the body contain the genetic information carried by the 23 chromosomes.

Karyotype:  A description of a person's chromosomes, such as, "The karyotype of a person with Down syndrome is Trisomy 21 — a third chromosome added to the 21st pair of chromosomes"; or, "The karyotype of a person with Fragile X syndrome is damage to the X chromosome of the 23rd pair." A karyogram is an image of the chromosomes arranged in the way shown in Figure 8–1. The karyogram shows the karyotype (https://www.quora.com/What-is-revealed-by-a-karyotype).

Gene:  A unit of DNA, located on a chromosome, that controls the development of anatomical structures and traits. Genes are passed from one generation to the next.

Genotype:  The complete set of genes in an organism, which varies across species; the human genome is estimated to contain about 20,000 genes.

Phenotype:  The observable characteristics of an individual, including traits, anatomical features, biochemical characteristics, and so forth. A phenotype reflects the interaction of a genotype with environmental factors. For a group of individuals with the same genotype (e.g., a group of individuals with FXS, all of whom have the same genotype), the phenotype is typically variable.

Mutation:  A gene mutation is a change in a gene's structure that differs from that gene's structure in most of the population. The change is not reversible (that is, it is permanent) and may be inherited (passed from generation to generation). A de novo mutation is not hereditary but occurs when the sperm and egg are joined and an error occurs during cell division, resulting in a gene (or genes) that differ from the corresponding genes found in most of the population.

Monogenic:  A trait or condition, possibly with many phenotype characteristics, associated with a single gene (as in FXS).

Polygenic:  A trait or condition associated with multiple genes (as in autism spectrum disorder [ASD]), typically with many phenotype characteristics.

Figure 8–3.  Photo of a boy with Down syndrome. Reproduced from https://en.wikipedia.org/wiki/Down_syndrome. This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

Language Characteristics in DS

Speech and language impairments are common in DS. The impairment may include deficits in phonology, morphology, syntax, content, and social use of language (pragmatics). Many of these deficits can be attributed to ID, which is characteristic of DS (Wester Oxelgren, Myrelid, Annerén, Westerlund, Gustafsson, & Fernell, 2018). In addition, hearing loss may contribute to speech and language impairments in DS (Martin, Klusek, Estigarribia, & Roberts, 2009).

Phonology

Acquisition of the sound system of the language is delayed in most children with DS, with some errors identified as disordered (i.e., not seen in typical speech sound development; Kent & Vorperian, 2013). The pattern of speech sound development often follows the pattern seen in typically developing children, with a variable age-of-acquisition delay and a slower rate of speech sound mastery (Chapter 13). Many adults with DS also have speech sound errors that may be lifetime impairments. Some authors (e.g., Kaderavek, 2014) have stated that the delay in speech sound acquisition in DS is a result of anatomical differences in the speech mechanism. These differences include a protruding tongue, a small oral cavity (Xue, Caine, & Ng, 2010), and atypical laryngeal structures, as well as weakness of the respiratory muscles. Precisely how these anatomical differences affect speech sound development is unknown. Another factor likely to contribute to speech sound errors is dysarthria, or poor control of speech mechanism structures (e.g., the articulators and larynx) due to neurologic deficits in speech motor control regions of the central nervous system.

Disordered speech sound development in children and adults with DS may involve errors for sounds that are mastered early by typically developing children. For example, sounds such as /t/, /d/, and /n/, typically mastered no later than age 3 years, may be produced incorrectly by children with DS throughout childhood and into adulthood. In addition, children with DS make vowel errors as they learn their sound system; as discussed in Chapter 13, vowel errors in typically developing children are unusual past the age of 3 years (see review in Kent & Vorperian, 2013).

Speech intelligibility is affected significantly by the speech sound errors in DS. Wild, Vorperian, Kent, Bolt, and Austin (2018) administered a single-word test of speech intelligibility to typically developing individuals and individuals with DS. In this kind of speech intelligibility test, speakers record single words (such as sheep, boot, bath, and hot), and listeners respond to each of the recorded words by entering them via keyboard. Speech intelligibility is expressed as the percentage of the total words presented that were heard correctly. For example, 25 correctly heard words from a presentation of 50 words is expressed as 50% speech intelligibility. In Wild et al. (2018), typically developing children between the ages of 4 and 5 years were close to 80% intelligibility, on average; children with DS in this age range had speech intelligibility ranging between 10% and 65%. Even at 20 years of age, individuals with DS had speech intelligibility of only 60% to 70% (Wild et al., 2018), well below adult speech intelligibility, which is close to 100% in typically developing individuals. For individuals with DS, poor speech intelligibility is a significant problem in social interaction. Finally, stuttering behaviors and other dysfluencies have been observed in approximately 30% of children with DS; this compares with a prevalence of stuttering among the typically developing population of around 1% (Eggers & Van Eerdenbrugh, 2018). Fluency disorders are covered in Chapter 17.
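The single-word intelligibility score described above is a simple percentage, sketched here in Python for clarity (an illustration only; the function name is ours, and published studies such as Wild et al., 2018, use controlled word lists and multiple listeners):

def percent_intelligibility(words_heard_correctly, words_presented):
    # Speech intelligibility = (correctly identified words / words presented) x 100
    return 100.0 * words_heard_correctly / words_presented

print(percent_intelligibility(25, 50))   # 50.0, the worked example given in the text
print(percent_intelligibility(40, 50))   # 80.0, near the average reported for typically developing 4- to 5-year-olds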

Morphology and Syntax

Morphology and syntax are both impaired in children with DS. The impairment in these aspects of language seems to be greater than expected from the children's cognitive skills. In contrast, vocabulary in children with DS is often consistent with cognitive skill; thus the frequent statement in the research and clinical literature that morphology and syntax are disproportionately impaired in DS, compared to other language components (Martin et al., 2009). Language comprehension and expression for morphology and syntax are both affected. Bound morphemes (e.g., tense markers such as -ed, and the third person singular, as in "he does" versus "I do") are examples of morphological problems in DS. Syntactic impairments include difficulty with pronouns (e.g., "Him pushed her" instead of "He pushed her"), production of short, simple sentences unlike the more complex sentences produced by typically developing children with the same cognitive skill,2 and increased comprehension difficulties as utterances increase in complexity (Martin et al., 2009). These problems in language expression make individuals with DS sound much younger than their chronological age. Problems with language comprehension limit the learning of more complex forms and therefore limit the ability to incorporate these forms into expressive language.

2. In other words, when typically developing children and children with DS are matched for cognitive skill using a nonverbal estimate of IQ, the morphological and syntactic abilities of children with DS are significantly poorer than the abilities of the typically developing children. This means that the poor morphological and syntactic abilities of children with DS are not accounted for by their cognitive level — the language deficit is in excess of what would be predicted by cognition. Of course, when typically developing children and children with DS are matched in this way, the children with DS are older than the typically developing children.

Vocabulary

Vocabulary has often been regarded as a relative strength in individuals with DS. This applies to both receptive and expressive vocabulary, with better skills on the receptive side. When receptive vocabulary is compared between children with DS and typically developing children who are matched to the children with DS on cognitive skill (see footnote 2), performance among children with DS is close to mental age expectations. Expressive vocabulary skill is not as much of a strength as receptive skill. Delays in expressive vocabulary relative to receptive vocabulary are common, and the expressive vocabulary may be smaller than expected based on nonverbal cognitive skills. Like typically developing children, children with DS continue to add to their expressive vocabulary as they get older, albeit at a slower rate than their typically developing peers (Martin et al., 2009).

A recent study (Loveall, Moore Channell, Abbeduto, & Connors, 2019) demonstrates the potential influence of one aspect of language (verb vocabulary) on another (syntax) in language development. These authors used a storytelling approach to obtain expressive language samples from children with DS and from typically developing children. Children with DS produced as many different verbs as typically developing children but used those verbs less frequently. Verbs, which are typically acquired after nouns in both typically developing children and children with DS (Loveall, Moore Channell, Philips, Abbeduto, & Connors, 2016), require additional words to make their meaning clear. This is unlike nouns, which can stand alone and often do so in the early speech of children. For example, "ball" does not need other words to make its meaning clear, but "throw" is clarified when it is joined to other words ("I throw the ball"). Verbs specify the role of nouns in a sentence — that is, they require syntax for support. Loveall et al. (2016, p. 83) explain this interaction between verbs and syntax well: "verbs are responsible for linking words within a sentence together, and as such, they play a key foundational role in syntax. If disrupted, then syntactic development could also be impacted." Loveall et al. (2019) argue that the less frequent use of verbs in the expressive language of children with DS may have an important influence on the weakness noted previously for expressive syntax. Expressive syntax is disproportionately impaired in DS (see earlier) — perhaps the less frequent use of this important "linking" vocabulary affects the development of expressive syntactic skill. This information, generated by carefully performed research, has potentially important implications not only for diagnosis of language problems in DS but also for therapy strategies. If expressive use of a specific vocabulary category such as verbs stimulates the growth of syntactic skills, a therapeutic focus on increased use of verbs may be more than an exercise in vocabulary building.

Language Use (Pragmatics)

Pragmatic language use is a complex skill with many different aspects, as discussed in Chapters 3 and 7. Examples of pragmatic language use include how much talking is done in social situations, understanding appropriate language use in different communication settings, language redundancy (multiple repetitions of the same sentences), patterns of eye contact during conversations, and the ability (or willingness) to initiate conversations (Klusek, Martin, & Losh, 2014). Pragmatic language use in DS is widely viewed as a weakness in younger children, older children, and adults (e.g., Klusek et al., 2014; Lee, Bush, Martin, Barstein, Maltman, Klusek, & Losh, 2017; Smith, Næss, & Jarrold, 2017). Not all aspects of pragmatic language use are affected equally. For example, teenagers with DS may have a deficit in indicating that they have not understood a statement, but may have relatively strong skills when they are asked to clarify something they have said, or when they are narrating stories (see review in Martin et al., 2009). Nonverbal skills in social communication may also be a relative strength among individuals with DS.

Pragmatic language skills in DS are typically delayed relative to those of typically developing children but are not as affected as in FXS (especially FXS + ASD) or ASD (see below). As with pragmatic deficits in FXS and ASD, skills in pragmatic language among children with DS are not only delayed but also improve at a slower rate when compared to typically developing children.

Fragile X Syndrome: General Characteristics

FXS is diagnosed by genetic testing. In FXS, there is a mutation of the X chromosome of the 23rd pair. Figure 8–4 shows a karyogram with an arrow pointing to the area of mutation on the X chromosome of the 23rd pair (enclosed within the red oval); there is a slight break or discontinuity of the chromosome. The mutation results in the group of characteristics (or a subgroup of those characteristics) that make up the phenotype of FXS (see Box, "Genetic Terminology"). Both girls and boys are diagnosed with FXS (both have X chromosomes), with boys usually having a more severe version of the syndrome. This is because girls have a second X chromosome that can compensate for the mutated one; boys, with a single X chromosome, do not have this protection. Much of the research literature in FXS is concerned with boys.

Figure 8–4.  Karyogram for a male with FXS. The arrow points to the region of the X chromosome in which there is a mutation; the chromosome appears to be broken in this location.

The phenotype in FXS includes facial differences, intellectual disability, and cognitive and language disorders, as well as other characteristics, which may include anxiety, depression, and visual, auditory, and psychiatric problems. The facial differences are not necessarily obvious at birth and may not be apparent for some time. Figure 8–5 displays photographs of a male with FXS, as a child (A) and as an adult (B). Note the long face with high forehead, large jaw, and large ears that are characteristic of the syndrome. The facial characteristics often gain greater prominence with age.

Figure 8–5.  A male with fragile X syndrome, as a young child (A) and young adult (B). Photos provided by permission of Kelly Randels Coleman.

Genetic testing may be done when ID is suspected because of increasing developmental delays, perhaps accompanied by facial and behavioral characteristics that are increasingly like those observed in children with FXS. The genetic testing confirms or disconfirms the syndrome. The child may also show behaviors consistent with ASD, such as poor eye contact, rocking, and hand flapping. In fact, up to 90% of boys diagnosed with FXS have autistic-like behaviors, and 30% to 50% meet the diagnostic criteria for ASD (Niu et al., 2017). As reviewed in Chapter 7, ASD is diagnosed with formal and semiformal tests of behavior; there is no genetic test for the condition.

FXS is the leading heritable cause of ID. The mutation on the X chromosome of the 23rd pair is passed from parent to child. A child who receives the fragile X mutation and is diagnosed with FXS may have any of the characteristic anatomical (e.g., facial) and behavioral features of the syndrome, and the severity of those features may range from very mild to very severe. This is an example of a genotype being associated with a wide range of phenotypes.

Epidemiology of FXS

The prevalence of FXS is approximately 1 in 5,000 males and 1 in 4,000 to 8,000 females. Additional details of prevalence and patterns of hereditary transmission can be found in Niu et al. (2017) and Saldarriaga et al. (2014). Children diagnosed with FXS are much more likely to have intellectual disabilities compared with children diagnosed with nonsyndromic ASD (i.e., children with ASD who are not diagnosed with FXS). This has implications for the separate and combined language disorders in FXS and ASD.

Language Characteristics in FXS

When relevant, the following description of the language disorder in FXS includes discussion of the language disorder in ASD. This is because of the overlap between the two disorders.

Language in all areas is typically delayed in boys with FXS relative to typically developing agemates because of the intellectual disability; therefore, most investigations compare language profiles to those of typically developing children using mental-age matches. For example, the language abilities of a 10-year-old with FXS might be compared to those of a 5-year-old typically developing child because both children score similarly on a nonverbal IQ measure. This allows us to answer the question, "How is the language of children with FXS similar to, or different from, that of children at the same cognitive level?"

A prominent language disorder in children (and later, adults) with FXS is social communication impairment (pragmatic language deficit). Other areas of language deficit exist and contribute to the social communication problems. Following the structure of the previous chapter, language capabilities of children with FXS in the areas of phonology, morphology, syntax, content (lexicon), and pragmatics are presented. Keep in mind the variation across individuals in phenotype — children with FXS, even with a common genetic source of the syndrome, do not have the same language (or other) characteristics.

Phonology Typically developing children learn the sound system of their native language in a systematic way (Chapter 13). Sounds such as stop consonants are mastered before fricatives, /w/ before /r/ and /l/, single consonants before consonant clusters, to name a few wellknown examples. Phonological processes also follow a systematic trend. For example, the process of final consonant deletion (“dah” for “dog”) is typically eliminated earlier than the process of cluster reduction (“pay” for “play” or “sop” for “stop”). Barnes and colleagues (2009) reviewed previous evidence for speech sound development in singleword productions of children with FXS. This review suggests that children with FXS have delayed, not different, speech sound development relative to typically developing children. This means that the children with FXS acquire the sound system of English with the same error patterns as typically developing children but at a slower rate. The literature reviewed by Barnes et al. (2009) included studies in which single-word productions were analyzed. Barnes et al. wanted to know if the same speech sound learning patterns were found in children with FXS when the utterances were from connected, “natural” speech. The participant groups included children with FXS, children with FXS plus ASD, and typically developing children. When the data were analyzed for speech sound error patterns, there was almost no difference between the two groups of children with FXS (FXS alone and FXS plus ASD). Children in both groups had more speech sound errors compared with typically-developing children. Consistent with these speech sound errors, speech intelligibility was significantly lower for the connected speech utterances of children with FXS, and FXS plus ASD, when compared to speech intelligibility of typically developing children. But when the children with FXS and FXS plus ASD were matched to typically-developing children for the number of their speech sound errors, the children with FXS were significantly less intelligible than the typically-developing children. This is somewhat surprising because speech intelligibility is dependent to a large degree on the “goodness” of consonants and vowels, and the matching across the groups of sound errors might lead to the expectation of equivalent intelligibility for the groups. In summary, current research indicates that children with FXS have speech delays with speech sound error patterns very much like speech sound errors observed in typically developing children during the course of their speech sound development (Chapter 13). The pattern of speech sound errors is similar to those


The pattern of speech sound errors is similar to that of typically developing children, but the process of speech sound mastery is delayed. This is similar to the description in Chapter 15 of speech delay in otherwise typically developing children. Even when children with FXS are matched to typically developing children for the stage of their speech sound development, children with FXS are less intelligible than typically developing children. The severity of the speech intelligibility problem in FXS is variable, ranging from essentially normal speech intelligibility to severely unintelligible speech. The variability of the phenotype is not well understood. (See Box, "Embrace the Gray.")
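One common way to quantify speech sound accuracy in samples like those described above is to score the percentage of target consonants produced correctly. The sketch below is illustrative only and is not the scoring procedure used by Barnes et al.; the toy symbol set, the alignment, and the function name are assumptions, and real analyses use phonetic transcription, alignment conventions, and vowel scoring as well.

```python
# Illustrative sketch only: percentage of target consonants produced correctly.
# The symbol set, alignment, and data are assumptions for this example; the
# studies reviewed above used phonetic transcription and their own scoring rules.

VOWELS = {"a", "e", "i", "o", "u"}  # toy vowel symbols

def percent_consonants_correct(target: list[str], produced: list[str]) -> float:
    """Score pre-aligned phoneme lists; only consonant positions are counted."""
    correct = attempted = 0
    for t, p in zip(target, produced):
        if t in VOWELS:
            continue  # vowels are skipped in this consonant-only measure
        attempted += 1
        if t == p:
            correct += 1
    return 100.0 * correct / attempted if attempted else 0.0

# "stop" produced as "sop" (cluster reduction): the /t/ is omitted ("-")
target = ["s", "t", "o", "p"]
produced = ["s", "-", "o", "p"]
print(round(percent_consonants_correct(target, produced), 1))  # 66.7
```

An accuracy score of this kind supports the kind of matching described above (groups equated for the number of speech sound errors), while intelligibility is measured separately, for example by asking unfamiliar listeners to write down what they hear.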

Morphology and Syntax A fairly large research literature is available on the language characteristics of children with FXS. Studies of language deficits in FXS often report data for boys because their language deficit, as well as the severity of their ID, is typically worse than the deficits observed in girls. The current review is based on information found in several recent reviews and experiments (Finestack, Sterling, & Abbeduto, 2013; Haebig & Sterling, 2017; Komesidou, Brady, Fleming, Esplund, & Warren, 2017; Martin, Losh, Estigarribia, Sideris, & Roberts, 2013; Oakes, Kover, & Abbeduto, 2013; and Sterling, 2018). Children with FXS, from toddlers to teenagers, have language deficits that affect their ability to communicate and engage in social experiences. Deficits in morphological and syntactic skills are frequent among children with FXS. Morphological deficits include (for example) tense marking ("wait"-"waited"), the proper form of the verb "to be" ("he is"-"they are"), and third person singular ("I do"-"he does"). Recall that incorrect use of morphemes is characteristic of a brief phase of typical language development; mastery of most if not all morphemes is completed around age 4 years (Brown, 1973). Children with FXS take much longer to master grammatical morphemes and may maintain a deficit in morpheme use into adulthood. Similarly, typically developing children master simple sentence forms ("The boy fed the dog") earlier than more complex forms, such as longer sentences or sentences with embedded clauses ("The boy who fed the dog is Joe"). The severity of these deficits in language form varies within the population of children with FXS, and the deficits do not necessarily involve the same morphemes or syntactic forms for all children.

Embrace the Gray An introductory text for Communication Sciences and Disorders must provide information on the most current, research-generated knowledge in a broad way. Coverage of the material must be selective and avoid many of the research and clinical details that are relevant to the conclusions reached by authors. Higher-level courses build on the general information presented in this textbook, largely by adding details left out of an introductory textbook. A good example is the coverage in this chapter of the Barnes et al. study. Their conclusions are that the phonological skills of children with FXS, FXS + ASD, and typically developing children are similar when matched for stage of speech sound development, but that speech intelligibility is lower in the first two groups compared with the typically developing group. This seems to be a little odd. After all, it seems reasonable to expect a strong relationship between the "goodness" of the sound system and the goodness of speech intelligibility. But the details of the study are revealing. For example, Barnes et al.'s analysis is focused on consonants and does not take account of vowels, the articulatory "goodness" of which makes a significant contribution to speech intelligibility. Measures of other variables that are likely to contribute to speech intelligibility are also left out of the analysis. Another example is that children in the FXS, FXS + ASD, and typically developing groups were matched using a technique that makes the language skills of children in the three groups "equal." The use of this matching technique means that the children with FXS were much older than the typically developing children when consonant "goodness" was compared across groups. These strategies are controversial. Now, do not get this author wrong: the study by Barnes et al. is a fine study, and the results are relevant to both basic science and clinical practice. But there is a lot of gray area in the findings. This is the attraction of science: embrace the gray area of current knowledge and, by careful analysis and experiment, make it less gray. This is the attraction of clinical practice: embrace the gray area of an individual's cognitive, language, and social skills, apply clinical expertise including familiarity with research findings, and by a carefully structured therapeutic plan improve the individual's language and social skills.


Not all morphemes are subject to incorrect use, and children with FXS do not necessarily have difficulties with the same morpheme(s) (Sterling, 2018). Language researchers and clinicians who treat developmental language disorders are interested in both comprehension and production (expression) of language forms. Comparison of comprehension with expression skills may provide insight into the most productive approach to language therapy. For example, a child with relatively strong comprehension skills but weaker expression skills may receive the greatest language benefit from therapy focused on expression. There are several ways to assess morphological and syntactic skills for both comprehension and expression in children with FXS (as well as in other children with language disorders). These assessments include formal, standardized tests, as well as analysis of language patterns in more natural conversational settings. A general measure of expressive language sophistication, taken from natural language samples, is mean length of utterance (MLU), discussed in Chapter 6. In general, children with FXS have impairments in both comprehension and expression of morphology and syntax, with expression more affected than comprehension. Children with FXS have shorter MLUs compared to typically developing children and increase their MLU throughout early language development at a much slower rate than typically developing children. As a general measure of language sophistication, the difference in MLU between children with FXS and typically developing children reflects a significant deficit in the morphological and syntactic forms of expressive language.
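Because MLU comes up repeatedly in this literature, a toy computation may help. The sketch below is a rough illustration only: it counts each word as one morpheme and adds one for a few common inflections, rather than applying Brown's (1973) full morpheme-counting conventions, and the sample utterances are invented.

```python
# Rough MLU sketch (illustrative only; not Brown's full morpheme-counting rules).
# Each word counts as one morpheme; one is added for a few common inflections.

INFLECTIONS = ("ing", "ed", "s")  # simplified list, an assumption for this sketch

def morpheme_count(word: str) -> int:
    """Approximate morphemes in a word: 1, plus 1 if a simple inflection is attached."""
    for suffix in INFLECTIONS:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return 2
    return 1

def mlu(utterances: list[str]) -> float:
    """Mean length of utterance in (approximate) morphemes."""
    totals = [sum(morpheme_count(w) for w in u.split()) for u in utterances]
    return sum(totals) / len(totals)

sample = ["doggie eating", "he walked home", "want cookie"]  # invented utterances
print(round(mlu(sample), 2))  # 3.0 for this toy sample
```

Clinical language sample analysis applies much more detailed counting rules, but the principle is the same: total morphemes divided by the number of utterances in the sample.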

Vocabulary Like other aspects of language development, both receptive and expressive vocabulary of children with FXS lag behind the vocabulary of typically developing children. Even with this lag, the pattern seen in typically developing children, in which receptive vocabulary is larger than expressive vocabulary, is also found in FXS. The relative deficit of receptive and expressive vocabulary in children with FXS, compared with typically developing children, may be more apparent in younger children (around 8 years of age). Older children (around 12 years of age) may "catch up" to the vocabulary skills of typically developing, mental age–matched children. This summary, and the one that follows, is based on studies and reviews published by Finestack, Sterling, and Abbeduto (2013), Haebig and Sterling (2017), Kover and Abbeduto (2010), Lewis et al. (2006), and Martin et al. (2013).

Expressive vocabulary development in children with FXS is slower than vocabulary development in typically developing children. In addition, a greater deficit in expressive vocabulary may be observed in children with FXS only compared with children with FXS plus ASD. The increase in expressive vocabulary over time is not predicted very strongly from a child's nonverbal, cognitive abilities. The deficit in receptive vocabulary seems to be different from the deficit in expressive vocabulary. An important difference is that receptive vocabulary appears to increase with development in a manner consistent with increasing nonverbal cognitive ability. In other words, unlike expressive vocabulary, receptive vocabulary can be predicted from nonverbal cognitive abilities.

Language Use (Pragmatics) Children with FXS have problems with language use in social situations. As described by Klusek, Martin, and Losh (2014), pragmatic language skills include "the selection of conversational topics fitting to the situation, appropriate word choice, and the ability to modify language in order to match the expectations and knowledge base of the communication partner" (p. 1692). Based on research to date, pragmatic language skills are more affected in children (and adults) with FXS who have also been diagnosed with ASD, compared with children diagnosed with FXS "only" (see review in Klusek et al., 2014). As discussed in Chapter 7, social communication deficits (nonverbal and verbal pragmatic abilities) are a hallmark of ASD. An interesting issue is the possible difference in the pragmatic language deficit in boys with FXS "only" and boys with FXS plus ASD. Is the pragmatic language deficit more severe in FXS plus ASD compared with FXS "alone"? The answer seems to be "yes." Niu et al. (2017) have argued that research and clinical evidence indicates that pragmatic language deficits in "nonsyndromic" ASD are more severe than pragmatic language deficits in FXS plus ASD. Notice the difference between these group comparisons: FXS "only" compared to FXS plus ASD, and "nonsyndromic ASD" compared to FXS plus ASD. The determination of how ASD adds to or changes FXS, or how FXS adds to or changes nonsyndromic ASD, is very complicated and awaits further research. A specific example of a pragmatic language deficit is difficulty with topic maintenance during a conversation. Conversing is a joint, socially reinforcing pastime, in which the participants discuss a topic until the conversation turns to a related or different topic. It is as if the participants know the rules for supporting productive and socially reinforcing talk. Individuals who struggle with this skill may suddenly change the topic under discussion, in effect not understanding the rules of conversation.


Within the same context, the individual with a pragmatic language deficit may continue to contribute this new, unrelated topic to the discussion, even when the original topic continues to be discussed by the other participants. Perseverative/repetitious language is a prominent feature of language use in individuals with FXS. A pragmatic language deficit is potentially devastating to an individual's social development and life. The deficit can partially or largely limit the ability to make friends, share information, participate in sports, and engage in many other aspects of life.

Chapter Summary Intellectual disabilities, which have a prevalence worldwide of approximately 1% of the population, are defined as chronic impairments of mental abilities that affect adaptive functioning in conceptual skills, social behaviors, and practical behaviors. In most cases of ID, the individual is mildly affected. Diagnosis of ID is made in childhood and is not used for adults who acquire mental deficits by (for example) strokes, head injury, or degenerative neurological conditions. Overall IQ scores, which are used as a component of the diagnosis of intellectual disabilities, reflect verbal skills in addition to nonverbal skills; IQ scores can also be obtained by tests that include only nonverbal test items for an estimate of cognitive ability independent of language skills. Down syndrome (DS) is the most common genetic cause of ID. DS occurs when there is a third chromosome at pair 21, hence, the term “trisomy 21,” a description of the genotype in DS. As in other genetic disorders, there is wide variability in the phenotype in DS (i.e., across individuals), which includes variability in language skills. Phonological skills are primarily delayed in DS, meaning that the learning of the speech sound system follows the same progression as in typically developing children, but at a slower rate; some speech sound errors in DS may be unusual, such as later learning of sounds that are acquired early in typical speech sound development. An important outcome of the delay in speech sound development in DS is that speech intelligibility is affected, which may limit social interactions. Morphology and syntax are thought to be particular areas of weakness for language skills in DS; both receptive and expressive language are affected, with a prominent weakness in expressive skills.


Vocabulary in DS is characterized by better receptive skills compared to expressive skills; the expressive vocabulary may be smaller than expected based on a child's mental age. Pragmatic language skills are generally impaired in DS, but some components of pragmatics, such as story narration and the ability to respond to questions that ask for clarification, are relative strengths. FXS, which occurs in approximately 1 in 4,000 boys and 1 in 4,000 to 8,000 girls, is the leading heritable cause of ID, and many children diagnosed with FXS are also diagnosed with autism. The ID and other characteristics in FXS, including impaired language skills, are more severe in boys compared with girls. The phenotype in FXS is variable, in both physical and behavioral characteristics; the language skills component of the behavioral phenotype varies from mildly to severely impaired. A prominent characteristic of the language disorder in FXS is in the area of pragmatics. Children with FXS acquire the sound system of English with the same error patterns as typically developing children but at a slower rate; their phonological development is delayed, not different. The development of comprehension and production (expression) of morphology and syntax is delayed in children with FXS, although the specific morphemes and syntactic structures that are affected vary across children. Both receptive and expressive vocabulary of children with FXS lag behind the vocabulary of typically developing children; as in typically developing children, receptive vocabulary skills are more advanced than expressive skills. Comparison of the language skills of children with FXS "only" and children with FXS plus ASD does not reveal a consistent difference, but children with FXS plus ASD may have more severe language deficits, especially in the area of pragmatic language.

References American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author. Barnes, E., Roberts, J., Long, S. H., Martin, G. E., Berni, M. C., Mandulak, K. C., & Sideris, J. (2009). Phonological accuracy and intelligibility in connected speech of boys with fragile X syndrome or Down syndrome. Journal of Speech, Language, and Hearing Research, 52, 1048–1061. Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.


Carvill, S. (2001). Sensory impairments, intellectual disability and psychiatry. Journal of Intellectual Disability Research, 45, 467–483. Eggers, K., & van Eerdenbrugh, S. (2018). Speech disfluencies in children with Down syndrome. Journal of Communication Disorders, 71, 72–84. Finestack, L. H., Sterling, A. M., & Abbeduto, L. (2013). Discriminating Down syndrome and fragile X syndrome based on language ability. Journal of Child Language, 40, 244–265. Haebig, E., & Sterling, A. (2017). Investigating the receptive-expressive vocabulary profile in children with idiopathic ASD and comorbid ASD and fragile X syndrome. Journal of Autism and Developmental Disorders, 47, 260–274. Kaderavek, J. N. (2014). Language disorders in children: Fundamental concepts of assessment and intervention (2nd ed.). New York, NY: Pearson. Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down syndrome: A review. Journal of Speech, Language, and Hearing Research, 56, 178–210. Kirby, R. S. (2017). The prevalence of selected major birth defects in the United States. Seminars in Perinatology, 41, 338–344. Klusek, J., Martin, G. E., & Losh, M. (2014). A comparison of pragmatic language in boys with autism and Fragile X syndrome. Journal of Speech, Language, and Hearing Research, 57, 1692–1707. Komesidou, R., Brady, N. C., Fleming, K., Esplund, A., & Warren, S. F. (2017). Growth of expressive syntax in children with fragile X syndrome. Journal of Speech, Language, and Hearing Research, 60, 422–434. Kover, S. T., & Abbeduto, L. (2010). Expressive language in male adolescents with fragile X syndrome with and without comorbid autism. Journal of Intellectual Disability Research, 54, 246–265. Lee, M., Bush, L., Martin, G. E., Barstein, J., Maltman, N., Klusek, J., & Losh, M. (2017). A multi-method investigation of pragmatic development in individuals with Down syndrome. American Journal on Intellectual and Developmental Disabilities, 122, 289–309. Lewis, P., Abbeduto, L., Murphy, M., Richmond, E., Giles, N., Bruno, L., . . . Orsmond, G. (2006). Cognitive, language and social-cognitive skills of individuals with fragile X syndrome with and without autism. Journal of Intellectual Disability Research, 50, 532–545. Loveall, S. J., Moore Channell, M., Abbeduto, L., & Connors, F. A. (2019). Verb production by individuals with Down syndrome during narration. Research in Developmental Disabilities, 85, 82–91. Loveall, S. J., Moore Channell, M., Phillips, B. A., Abbeduto, L., & Connors, F. A. (2016). Receptive vocabulary analysis in Down syndrome. Research in Developmental Disabilities, 55, 161–172.

Martin, G. E., Klusek, J., Estigarribia, B., & Roberts, J. E. (2009). Language characteristics of individuals with Down syndrome. Topics in Language Disorders, 29, 112–132. Martin, G. E., Losh, M., Estigarribia, B., Sideris, J., & Roberts, J. (2013). Longitudinal profiles of expressive vocabulary, syntax, and pragmatic language in boys with fragile X syndrome or Down syndrome. International Journal of Language and Communication Disorders, 48, 432–443. Maulik, P. K., Mascarenhas, M. N., Mathers, C. D., Dua, T., & Saxena, S. (2011). Prevalence of intellectual disability: A meta-analysis of population-based studies. Research in Developmental Disabilities, 32, 419–436. McKenzie, K., Milton, M., Smith, G., & Ouellette-Kuntz, H. (2016). Systematic review of the prevalence and incidence of intellectual disabilities: Current trends and issues. Current Developmental Disorders Reports, 3, 104–115. Niu, M., Han, Y., Dy, A. Du, Jin, J., Qin, J., . . . Hagerman, R. K. (2017). Autism symptoms in fragile X syndrome. Journal of Child Neurology, 32, 903–909. Oakes, A., Kover, S. T., & Abbeduto, L. (2013). Language comprehension profiles of young adolescents with fragile X syndrome. American Journal of Speech-Language Pathology, 22, 615–626. Parker, S. E., Mai, C. T., Canfield, M. A., Rickard, R., Wang, Y., Meyer, R. E., . . . Correa, A., for the National Birth Defects Prevention Network. (2010). Updated national birth prevalence estimates for selected birth defects in the United States, 2004–2006. Birth Defects Research (Part A): Clinical and Molecular Teratology, 88, 1008–1016. Saldarriaga, W., Tassone, F., González-Teshima, L. Y., Forero-Forero, J. V., Ayala-Zapata, S., & Hagerman, R. (2014). Fragile X syndrome. Colombia Médica, 45, 190–198. Smith, E., Næss, K-A. B., & Jarrold, C. (2017). Assessing pragmatic communication in children with Down syndrome. Journal of Communication Disorders, 68, 10–23. Sterling, A. (2018). Grammar in boys with idiopathic autism spectrum disorder and boys with fragile X syndrome plus autism spectrum disorder. Journal of Speech, Language, and Hearing Research, 61, 857–869. Wester Oxelgren, U., Myrelid, A., Annerén, G., Westerlund, J., Gustafsson, J., & Fernell, E. (2019). More severe intellectual disability found in teenagers compared to younger children with Down syndrome. Acta Paediatrica, 108, 961–966. Wild, H., Vorperian, H. K., Kent, R. D., Bolt, D. M., & Austin, D. (2018). Single-word speech intelligibility in children and adults with Down syndrome. American Journal of Speech-Language Pathology, 27, 222–236. Xue, S. A., Caine, L., & Ng, M. L. (2010). Quantification of vocal tract configuration of older children with Down syndrome: A pilot study. International Journal of Pediatric Otorhinolaryngology, 74, 378–383.

9 Language Disorders in Adults Introduction Language disorders in adults are usually the result of an acquired condition/disease that disrupts previously normal language skills. An acquired, adult language disorder contrasts with the chronic language impairment that extends from early childhood into adulthood due to conditions such as Down syndrome or Fragile X syndrome. For the purposes of this chapter, a language disorder is considered to be acquired if the condition/disease responsible for the disorder occurs after the mid-teenage years. Many adult conditions/diseases can result in a language disorder. Examples include stroke, traumatic brain injury (TBI), adult-onset degenerative diseases such as Parkinson's disease, and dementia-related diseases (such as Alzheimer's disease). The current chapter focuses on adult speech and language disorders in stroke, TBI, and dementia-related diseases. Speech impairments are included, when relevant, because the conditions/diseases that result in adult language disorders often include speech disorders as well. We use the term "adult language disorders" throughout the rest of this chapter to include both speech and language.

A brief review of speech, hearing, and language structures of the brain precedes the discussion of adult language disorders. Chapter 2 provides more detailed information on each of the review points.

Review of Concepts for the Role of the Nervous System in Speech, Language, and Hearing Critical review concepts for the understanding of adult language disorders are (a) the cerebral hemispheres; (b) lateralization of language functions to the left cerebral hemisphere; (c) language expression and comprehension as represented in different regions of the left hemisphere, called Broca's area (expression) and Wernicke's area (comprehension), respectively; (d) the significant role of connections between different regions of the brain for language production and perception; and (e) the perisylvian language areas.

Cerebral Hemispheres The cerebral hemispheres include the left and right hemispheres. Each hemisphere contains a frontal, parietal, temporal, and occipital lobe. The frontal, parietal, and temporal lobes play a major role in language production and comprehension (reception). Figure 9–1 shows the four lobes of the left hemisphere as well as the sylvian fissure (see Chapter 2).


Figure 9–1.  The four lobes of the left hemisphere (frontal, parietal, temporal, and occipital). Note the sylvian fissure, an important landmark.

Lateralization of Speech and Language Functions Both the left and right cerebral hemispheres contain brain tissue associated with speech and language functions. Tissue in the left hemisphere is specialized for language production and comprehension. “Lateralization of language functions” means that, even though the two hemispheres have the same lobes, language functions are represented most prominently in one hemisphere — they are lateralized to the left hemisphere.

Language Expression and Comprehension Are Represented in Different Cortical Regions of the Left Hemisphere Language expression is primarily represented in the frontal lobe of the left hemisphere, and language comprehension is primarily represented in the temporal and parietal lobes of the left hemisphere.

This does not mean that language expression is represented exclusively in the frontal lobe, or that language comprehension is represented exclusively in the temporal/parietal lobes. All three lobes play a role in language expression and comprehension. The regions of Broca's and Wernicke's areas are shown in Figure 9–2. The blood supply to the brain, shown in this figure, is discussed below.

Connections Between Different Regions of the Brain Different regions of the brain are connected by fiber tracts, so the regions can exchange information. The connecting tracts are not secondary to the cortical regions but are equally important to brain function for speech and language. Disruption of fiber tracts, even when cortical regions are healthy, can result in speech and language impairments.

Perisylvian Speech and Language Areas of the Brain A side view of the left hemisphere of the brain is shown in Figure 9–3. The front of the hemisphere (i.e., of the head) is to the left. The lobes previously identified are labeled, as is the sylvian fissure.


Figure 9–2.  General regions of Broca's and Wernicke's areas, shown by the shaded, oval areas. The blood supply to these regions, via branches of the middle cerebral artery, is also shown.

Figure 9–3.  View of the left hemisphere showing the region of cortical tissue, as well as the underlying white matter, called the perisylvian speech and language areas.

The sylvian fissure is the deep groove in the cortical tissue that separates the temporal lobe from the frontal and parietal lobes.

A red oval encloses cortical tissue and the underlying fiber tracts that are thought to be of critical functional importance for speech and language. These brain regions are called the "perisylvian speech and language areas."


The name implies that the enclosed regions are active in speech and language functions and, when damaged, are likely to cause speech and language impairments.

Adult Language Disorders: Aphasia Aphasia is an impairment of language expression and/or comprehension resulting from brain damage. Stroke, the most common cause of aphasia, is a blockage or rupture of the arteries that supply the brain with blood. The blood supply of the cerebral hemispheres is extensive and detailed. It is extensive because it reaches all parts of the hemispheres, and it is detailed because main arteries into the hemispheres branch into smaller and smaller vessels that supply precise, local regions of the brain. A loss of blood supply to a region of the brain prevents the affected neurons from sustaining their functions; the neurons die. This compromises the ability of that brain region to contribute to the control of a behavior, such as language. Strokes occur for several reasons. A blood vessel may be blocked completely by a tissue fragment that breaks off from an artery wall; the fragment may travel a long way through the bloodstream before blocking an artery and depriving the regions beyond the blockage of blood. An artery or vessel may have thickened walls due to a buildup of plaque, which narrows the vessel, limiting blood flow to regions beyond the point of narrowing. The neurons beyond the narrowed vessel lose functional ability either partially or completely. A third possibility is a ruptured vessel, which spills blood into the brain and does not allow sufficient blood to reach regions beyond the rupture. Figure 9–2 shows the blood supply to the surface of the left hemisphere. Note the large artery (called the middle cerebral artery) emerging between the tip of the temporal lobe and the bottom of the frontal lobe. The artery turns toward the back of the hemisphere and gives off a branch to furnish blood to Broca's area (Figure 9–2, upward-pointing red arrow). As the main artery continues in the direction of Wernicke's area (arrow pointing in the direction of the back of the hemisphere), blood is supplied to Wernicke's area. Blood is also supplied to other areas of language-related tissue within the cerebral hemispheres, both in the cortex and in subcortical structures. Figure 9–2 shows the potential for independent strokes in Broca's area and Wernicke's area.

For example, a blockage can occur in the branch to Broca's area without affecting blood flow to Wernicke's area. If Broca's area is the primary brain area for language expression, a stroke like this is expected to affect language expression but not language comprehension. The reverse is also possible: a blockage after the main artery has passed the branch to Broca's area does not affect the frontal lobe (the location of Broca's area) but may affect Wernicke's area. In this case, language comprehension is likely to be affected without any effect on language production. This simplified account of how blood loss affects language function is not the whole story, but it makes the point that the location of a disrupted blood supply determines whether expressive or receptive language function is at risk. Stroke is not a rare occurrence. There are approximately 800,000 cases of stroke per year in the United States (https://www.cdc.gov/stroke/facts.htm), and many more worldwide. A significant number of strokes have aphasia as a prominent deficit (Ellis & Urban, 2016). Many patients who have aphasia recover most, if not all, of their language abilities in the days, weeks, or months following a stroke. A smaller number of patients have a chronic language impairment.

Classification of Aphasia The history of aphasia, in both clinical and research work, includes a substantial effort to classify different types of the disorder. Six types of aphasia, linking damage to specific brain structures with language comprehension and expression difficulties, are presented here: Broca's aphasia, Wernicke's aphasia, conduction aphasia, anomic aphasia, global aphasia, and primary progressive aphasia. Primary progressive aphasia is not the result of a stroke but is discussed here because it is characterized by aphasic deficits (among other deficits). A brief summary of apraxia of speech, which in adults is typically the result of a stroke, is also provided. There are other categories of aphasia as well, but here we discuss the ones most relevant to clinical practice and theories of brain function for language. Aphasia types can be classified in one of two ways (and possibly other ways, not discussed here). One way is to identify the location of brain damage with an imaging technique such as CT (computed tomography) or MR (magnetic resonance). The type of aphasia is then diagnosed based on the language disorder expected from brain-behavior relationships. For example, damage to Broca's area, confirmed by imaging, is expected to result in Broca's aphasia (see later in the chapter), which has certain characteristics that distinguish it from other types of aphasia.


Similarly, damage to Wernicke's area is expected to result in Wernicke's aphasia. This is referred to as a "localization" view of aphasia type: specific areas of the brain are associated with specific behaviors. What does it mean to confirm a lesion (damage to tissue) with an imaging technique? Figure 9–4 shows two CAT (computerized axial tomography) scans, one of a lesion in Broca's area (top, left) and the other of a lesion in Wernicke's area (top, right).

These images were taken from slices through the brain in the horizontal plane. The slice level is shown in the bottom photograph of a real brain; note that the slice cuts through Broca's and Wernicke's areas. The sides of the images are flipped: the right side of the images is the left side of the brain. The dark regions indicated by the pointers are lesions.

Figure 9–4.  Two CAT (computerized axial tomography) scans, one of a lesion in Broca's area (top, left) and the other in Wernicke's area (top, right). The labeled pointers show the lesions as darkened areas of the scans. These images were taken from slices through the brain in the horizontal plane, as shown in the lower image (which also labels the central sulcus, Broca's area, and Wernicke's area). The sides of the images are flipped: the right side of the images is the left side of the brain.


As expected, the lesion that includes Broca's area is toward the front of the left hemisphere, and the lesion that includes Wernicke's area is toward the back.1 A different way to classify aphasia is on the basis of the speech and language impairments, regardless of lesion location. For example, a patient who suffers a stroke and has agrammatic, effortful speech, with hesitations but close-to-normal comprehension skills, is diagnosed with Broca's aphasia, regardless of the lesion location and its extent (how much of the brain has been damaged). This chapter does not attempt to resolve the classification controversy; the presentation is neutral, with both probable lesion location and speech-language symptoms described.

Broca's Aphasia Much of the effort to classify aphasia according to the location of damage within the brain was inspired by the mid- to late-19th century work of Paul Broca (1824–1880). Broca was a French physician who studied a small group of patients who were able to produce only a single syllable or a few words, but who comprehended language with little or no problem. When the patients died, Broca performed autopsies of their brains and found similar locations of brain damage in most of the patients. The damage was in the lower part of the frontal lobe, close to the motor cortex, just above the sylvian fissure (see Figure 9–2). Broca suggested that the ability to articulate syllables and words was localized to this part of the frontal lobe. Damage to this area resulted in a disorder of articulation, but not comprehension. Broca published his results in 1865, and for more than 150 years the region he identified as the articulation center of the brain has been called "Broca's area."

Broca's aphasia is primarily an expressive language problem, presumably associated with damage to Broca's area. The patients hesitate before speaking, and when they do speak, they produce single words or utterances of just a few words. Speaking seems to a listener to require a great deal of effort; word finding is a problem. The words that are spoken are usually content words; function words are not used, or are seldom used. Patients with Broca's aphasia are also said to have agrammatism, because of the lack of function words and the resulting lack of grammatical completeness. Comprehension of language is good (perhaps not quite normal). Broca's aphasia is called a nonfluent aphasia because of the hesitations and slow, effortful speech patterns. Here is a transcript of a brief conversation between a speech-language pathologist and a stroke survivor who was diagnosed with Broca's aphasia.2 SLP:  Can you tell us your name? Patient:  John Doe (very carefully articulated). SLP:  And John, when was your stroke? P:  Seven years ago (each word carefully articulated, with an even, robotic sequence of syllables). SLP:  Okay . . . P:  And . . . (nods, because he sees SLP wants to continue a question). SLP:  And what did you used to do? P:  Um . . . well, um . . . worked (nodding head) . . . um . . . on a desk (this phrase spoken rapidly, does not sound impaired) . . . um . . . seven . . . seven . . . Spouse: Sales. P:  Sales (nodding, confirming spouse's assistance and verifying the type of work). Sales . . . and . . . worldwide, and . . . very good, yeah! SLP:  Okay and who are you looking at over there? When you turned your head over there? P:  That's my . . . wife. SLP:  And why is she helping you . . . to talk? P:  Um . . . she's . . . a speech . . . (forms lip for a "p") . . . um . . . (again, tries to position articulators but no sound comes out). SLP:  So you have trouble with your speech? P:  Yeah, yeah. SLP:  And what's that called? P:  Um, phasia. SLP:  Alright . . . and so why don't you work now? P:  Um . . . I . . . I . . . well I do (each word separated by very brief pauses).

1 As used here, the term "includes" means that the lesions are larger than would be expected for lesions confined strictly to the cortical regions suggested by the aphasia type (i.e., Broca's aphasia, Wernicke's aphasia). In other words, brain lesions are typically larger than the anatomical tissue identified as a specific area. Also, the other dark areas on the scans, such as the curved structures in the center and toward the front, are not lesions but other structures where you would expect the scan to show dark. The curved structures are the lateral ventricles, through which cerebrospinal fluid flows.

2 This conversation, and all subsequent transcripts, were transcribed from YouTube clips. "John Doe" substitutes for the patient's name. Comments are enclosed in parentheses, and long pauses are indicated by ellipses ( . . . ).


SLP:  What do you do now? P:  Um . . . Voices ah home, aphasia (patient names an organization). In this 1-minute exchange between the SLP and the patient, the frequent hesitations ("um"), pausing ( . . . ), word-finding problems, agrammatism ("Sales . . . and . . . worldwide, and . . . very good, yeah"), and good comprehension (he clearly understands the questions) are clinical markers for a diagnosis of a mild-to-moderate Broca's aphasia.

Wernicke’s Aphasia Carl Wernicke (1848–1905), a German physician, was Broca’s equal in his influence on classification of aphasia types. Wernicke studied a series of patients who had great difficulty in comprehending language but gave no clinical evidence of a general (nonlanguage) hearing loss. The comprehension problem seemed to be specific to language. The patients had only slight difficulty with articulation, but their expressive language lacked meaning and often was not responsive to questions even though they replied (but without relevant content). Autopsies of the patients’ brains showed damage to the upper gyrus of the temporal lobe, toward the back of the sylvian fissure and close to the lower edge of the parietal lobe (see Figure 9–2). These patients had no apparent damage in Broca’s area. Wernicke concluded that the region of damage for these patients represented the language comprehension part of the brain. This region of the brain has since been known as Wernicke’s area. Here is a transcript of a brief conversation between a speech-language pathologist and a stroke survivor who was diagnosed with Wernicke’s aphasia. SLP:  Hi John, how are you? Patient:  I’m happy are you pretty? You look good. (wants to speak more but SLP asks a question). SLP:  What are you doing today? P:  We stayed with the water over there at the moment, talked with the people at the dam over there, they’re diving for them at the moment, he had a-water very soon . . . for him . . . with luck . . . for him. SLP:  So we’re on a cruise and we’re about to — 

3 Jane Doe.


P:  We will sort right here and they’ll save their hands right there (pointing). SLP:  And what were we just doing with the iPad? P:  Uh . . . right at the moment they don’t show darn thing (laughs). SLP:  Where’s the iPad that we were doing? P:  (hesitates) I’d like my change for me and change hands for me, it was happy, I would talk with Jane3 sometimes, we’re out with them, other people are working with them with them, I’m very happy with them —  SLP:  Good! —  P:  This girl was really (misarticulated) good. And happy, and I played golf, and hit a trees, we play out with the hands, we save a lot of hands on hold for peoples for us, other hands, I don’t know what you get but I talk with a lot of hand grands . . . The 1-minute transcript shows the comprehension problem — for the most part, the patient is not responding to the questions with relevant answers. The patient does a lot of talking, with well-formed speech sounds, a rapid speaking rate, and mostly grammatical utterances, which nevertheless are nonsensical. There is repetition (“hands” occurs half a dozen times in the brief transcript) and one or two articulation errors (not transcribed here). Wernicke’s aphasia is called a fluent aphasia because utterances show little hesitation, and articulatory sequences are smoothly produced and even run on with excessive utterance length. The diagnosis of Wernicke’s aphasia is consistent with the transcript.

Conduction Aphasia Fiber tracts in the cerebral hemisphere (“white matter”; see Chapter 2) connect one region of gray matter to another. As demonstrated by Wernicke, and others after him, a massive fiber tract called the arcuate fasciculus connects Wernicke’s and Broca’s areas (Figure 9–5, blue lines) (Smits, Jiskoot, & Papma, 2017). A tract connecting the same areas but lower (more ventral) in the cerebral hemispheres is called the ventral stream. These tracts run beneath the cortex — like all white matter in the cerebral hemispheres, fiber tracts cannot be seen on the hemisphere surfaces.


Figure 9–5.  Two fiber tracts that connect receptive and expressive regions of the left hemisphere (Wernicke's area with Broca's area). The fiber tract represented by the blue-line bundle is the arcuate fasciculus, and the tract represented by the green-line bundle is the ventral stream. The color of the labels matches the color of the areas (magenta, Broca's area; blue, Wernicke's area).

Conduction aphasia is believed to be the result of a stroke that damages the arcuate fasciculus and functionally disconnects Wernicke's area from Broca's area. Wernicke's area is not damaged, so the patient can hear and understand a word or sentence. Broca's area is not damaged, so words and sentences can be produced in a normal way. Damage to the arcuate fasciculus partially or completely prevents the heard utterance from being transferred to Broca's area for repetition. It is possible that the behavioral characteristics of conduction aphasia (i.e., expressive and receptive language; see below) are related to damage to other areas of the brain in addition to the arcuate fasciculus (Yourganov, Smith, Fridriksson, & Rorden, 2015). The signature deficit in conduction aphasia is difficulty repeating others' utterances, even though language comprehension is good. Expression is often fluent and grammatically well formed. However, expression may also include paraphasias, which are unintended errors on speech sounds, syllables, and words (Ardila, 2010; Pandey & Heilman, 2016). A transcript of a patient with conduction aphasia illustrates some of these characteristics.

SLP:  The producer asked us to count to 10, do you remember that? Can you do that again? P:  Yes. Why. (not a question). One, two, three, four, five, (rapid intake of air), seven, nnn uh, boin, too thuh, gehvry, beople, go I can’t oh I —  SLP:  No, that’s hard, let me get you started —  P:  I was ston —  SLP:  Okay, one, two, three (SLP is counting with the patient) —  P:  One, two, four, five, ss-ff-sixth, better send poined, um . . . SLP:  Good enough, let’s stop —  P:  Is that alright? The patient clearly understands the questions, requesting her to count. The patient begins counting accurately but soon produces paraphasias (e.g., “boin” and “gehvry”) and a fluent, short, well-formed sentence (“Is that alright?”).


Anomic Aphasia "Anomia" is from Greek, meaning "without name." In anomic aphasia, a patient is fluent but has great difficulty with word-finding, especially for nouns, including names and objects. Often, a patient with anomia can describe the function of an object in great detail, or a person's appearance, but not be able to name the object or person. Anomia occurs in most types of aphasia — almost all patients, regardless of the particular kind of aphasia, have some anomia. Aphasia in which anomia is the dominant symptom is usually mild. Comprehension is typically good in anomic aphasia. The following transcript is an example of anomia in a patient with aphasia. The patient is looking at pictures of four tools, including a saw, a hammer, an axe, and a screwdriver. The SLP is pointing to the saw and asking the patient to name it.

P:  You knew what it is, I can't tell you, maybe I can . . . If I was to carry the wood and cut it in half with that . . . you know . . . if I had to cut the wood down to bring it in . . .

SLP (pointing to the saw):  Then you'd use one of these . . .

P:  It's called a . . . I have them in the garage . . . they are yee-ar . . . you cut the wood with them . . . it . . . sssaw!

SLP:  How is it? (asking patient to repeat the name of the saw).

P:  Unnn . . . (sigh) . . . sss . . . sah (cuts off the end of the vowel quickly) . . . sah (cuts off again) . . . I can't state it. I know what it is and I could cut the word with it and it's in my garage . . . and it's . . . a . . . and also when you go out and you want to stay out in the woods with people you always take those with you and you . . . um . . . it is a . . . it's not a knife, it's not a . . . sssah (vowel quickly cut off), I know what it is, I think.

SLP:  Well finish saying what you started to say.

P:  I'm not sure I'm stating it right.

SLP:  Try again.

P:  Ssssss. No, I'm incorrect.

The patient knows what the object is, what it does, and why you use it. The patient recognizes his inability to retrieve the name ("I can't state it," "I'm not sure I'm stating it right," "No, I'm incorrect"). Several times he almost retrieves the word ("sssah" [cut off], "ssssss").

Global Aphasia Global aphasia is a nonfluent aphasia in which both expressive and receptive language are impaired. The patient may be able to express short, automatic-type phrases such as "Hello" and "How are you?" but longer, less automatic utterances are not produced or are very rare. The impairment of receptive language is similar, with occasional understanding of short, familiar utterances (e.g., "How are you?") but poor understanding of more complex language. In global aphasia due to stroke, the blood supply to both Broca's area and Wernicke's area is blocked. Other areas of the brain, such as parts of the temporal lobe that are important for word and sentence processing, are also affected. In fact, the entire perisylvian language region (see Figure 9–3) is affected, with widespread effects on language expression and comprehension. Global aphasia has been reported as the most common (Flamand-Roze, Flowers, Roze, & Denier, 2013) or third most common (Hoffman & Chen, 2013) type of aphasia. Global aphasia is often diagnosed shortly after a stroke. As a patient recovers from overall stroke symptoms, language recovery may also occur. Global aphasia may "evolve" to different types of aphasia as the patient recovers (Klebic, Salihovic, Softic, & Salihoic, 2011). Global aphasia may continue to change to other aphasia types along the pathway to language recovery. A patient whose global aphasia evolves to functional expressive and receptive skills is likely to have residual anomia ranging in severity from mild to severe. Patients with global aphasia who do not show improvement within a month following a stroke have a poor prognosis for future improvement of language skills (Alexander & Loverso, 1992). This transcript illustrates global aphasia in a patient who had recently suffered a stroke.

{R1 = Relative 1} {R2 = Relative 2}

R1:  Alright, say hi!

P:  (Smiling) Hi!

R1:  What's your name?

P:  What's what?

R1:  What's your name?

P:  Uh . . .

R1:  Do you know?

P:  . . . John?

R1: John.

P:  Yeah, John.


R1:  What's my name?

P:  Wha- what's whose name? (weakly).

R1:  My name (pointing to herself).

P:  Uh . . . hard to get 'em all (weakly).

R1:  Yeah. Jane.

P: Yeah.

R1: Yeah.

R2:  Can you say Jane?

P:  Yeah (weakly).

R2:  Say it.

P:  Nndeer (unintelligible, weakly).

R2: Jaaane.

P:  I can't say it . . . (weakly).

R1:  That's okay.

P:  Takes me a little longer to say . . .

R1:  Do you know where you are?

P:  Do I know who it is?

R1:  Do you know where we are?

P:  Uh . . . no you live in uh . . . in the . . . park.

R1:  Close! We're at the hospital.

P: Yeah.

R1: Yeah.

The patient's anomia is apparent in his struggle to retrieve his own name and his daughter's name. The content of this transcript suggests a mild-to-moderate expressive component (R2: "Can you say Jane?" P: "Yeah" (weakly). R2: "Say it." P: "Nndeer" (unintelligible, weakly)), as well as a mild-to-moderate comprehension impairment (e.g., R1: "What's my name?" P: "Wha- what's whose name?").

Primary Progressive Aphasia Primary progressive aphasia (PPA) is a rare adult language disorder in which a patient has an isolated language problem that increases in severity over time (Mesulam, 2018). PPA does not appear to be the result of a stroke, but rather a separate disease in which deterioration of brain tissue is specific to language areas and networks. In the initial stages, the language impairment is not accompanied by significant memory or psychiatric problems and does not appear to be a result of dementia (see section, "Language Disorders in Dementia"). Throughout the course of PPA, dementia-like symptoms appear, but the language impairment remains a significant characteristic and barrier to daily function (Montembeault, Brambati, Gorno-Tempini, & Migliaccio, 2018). A person with PPA may have speech and language characteristics similar to those of Broca's aphasia, including dysarthria and apraxia of speech (Grossman, 2012; Chapter 14); or may have specific difficulties with word retrieval and word comprehension, specifically for low-frequency (unusual) words; or may have sentence repetition and comprehension impairments together with reading and written-language problems (Henry & Grasso, 2018). The specific nature of the language impairment may suggest one of three different types of PPA (Montembeault et al., 2018). Two different types of PPA are transcribed here.

Transcript 1

SLP:  We can speak about your concerns . . . Hi Jane!

P:  How are you?

SLP:  I'm fine! So, tell me something . . . are you, um, are you working on your speaking?

P: Yes.

SLP:  And what are you working on, specifically, do you know?

P:  Uuh-tho (shakes head, means to say "I don't know").

SLP:  Okay, so you don't know, can you say "I," "don't," "know"? (slowly spoken, separating each word with a short pause).

P:  I tho.

SLP:  Okay, now let's go a little bit slower, because you've got a little bit of a dysarthria.

P: Yeah.

SLP:  And a little bit of an apraxia (P shakes head "yes"), for your speech to come out really nice, we've got to slow it down and to put a nice space between each word. Now watch me, see if you can do this: "I."

P: "I."

SLP: "Don't."

P: Own't.

SLP: Know.


P:  (Makes face as if saying this word is going to be difficult) "Owe."

SLP:  Good. Can you say that again?

P:  I tho doh.

SLP:  Good. Very good. And where do you live?

P:  Um . . . Iss? (answered as if uncertain, with rising pitch).

SLP:  Do you know the name of the state where you live?

P: Yeah.

SLP:  What state is it?

P:  Biss? (again as if uncertain).

SLP:  Biss (very precisely, as if asking P if that's what he meant).

P:  No (shaking head, meaning that's not what he meant).

SLP: No, Miss!

P: Yes.

SLP:  Oh, so you shorten it, don't you, because it's got a lot of syllables in it.

P: Yes!

Transcript 2

R1:  Okay. Hey Dad, can you look at me?

P:  (unintelligible, does not turn to look).

R1:  Can you tell me your name?

P:  John Doe (spoken rapidly, with good articulation of sounds).

R1:  And how old are you?

P: Sixteen.

R1:  When were you born?

P:  . . . . . . 19xx.

R1:  Good! And where were you born? (P looking around the room as if he doesn't understand the question; long pause). Do you know where you were born?

R2:  In the hospital (laughter from other family members).

R1:  Where-where did you grow up?

P:  (No response).

R1:  Do you remember where you went to school?

P: Yep.

R1: Rrrrrrrr.

P: Rickford.

R1:  Good. And, where did you meet your wife?

P:  My wife.

R1:  Who's your wife?

R1:  Is it that lady over there, is that your wife?

P:  (Looks at his spouse).

P: Mmm-hmm.

R1:  And what's her name?

P: Jane.

R1:  Good! And then you guys moved to Arkansas, where did you live in Arkansas?

P:  (Looks around, doesn't answer, looks to spouse).

R2:  We lived in . . .

R1:  In Little Rockstar?

P:  Little Rockstar.

The patient in Transcript 1 has poor articulatory skills (the sound errors in the transcript, e.g., "doh" for "know") and possibly poor ability to transform sounds as represented in the brain to sounds produced by the speech mechanism. This latter problem is often referred to as an impairment of phonological coding, which can be independent of the ability to articulate sounds. It is best illustrated by the patient's labored attempts to position the articulators to initiate an utterance, as if searching for the correct position. The patient's comprehension skills are good — she responds to questions with no apparent problem. The patient in Transcript 2 has no problems with articulatory skills — his speech is fluent and easy to understand. But there are word-finding problems (the several examples of not responding and searching for a response, particularly for proper nouns such as the names of towns or of his spouse), which are solved when he is given a prompt (e.g., "In Little Rockstar?" and his immediate response, "Little Rockstar"). He may also have a comprehension impairment.

Apraxia of Speech Apraxia of speech is discussed in Chapter 15 as a childhood speech sound disorder.


As noted there, the diagnosis of childhood apraxia of speech (CAS) is based in large part on speech characteristics in adults with brain damage due to stroke and other brain injuries. These characteristics include difficulty initiating speech; articulatory "groping," in which the patient has difficulty positioning articulators, such as the lips and tongue, to produce a speech sound correctly; and speech sound errors, especially for phonetically complex words. The patients recognize their own speech sound errors and make attempts to correct them. An example of this was noted above in the transcript for the second patient with PPA. Apraxia of speech in adults ("AAS" to distinguish it from "CAS") is often observed in aphasic patients and is considered by some to be a type of aphasia. Others, perhaps most clinicians and scientists, consider AAS to be strictly a motor speech disorder related to an impairment of programming (planning) sound sequences. In this view, correct speech sounds are represented in the brain but are subject to errors in planning their articulation and sequencing. For example, the phonemes for the word "strategy," /s/, /t/, /r/, /æ/ (as in "hat"), /d/, /ə/ (a very short "uh"), /dʒ/ (as in "fudge"), and /i/, are represented in the brains of people with AAS in the same way as they are represented in the brains of people without brain damage. The speech impairment in AAS is an incorrect plan for how these sounds are sequenced, the muscle contraction patterns to produce the sequence, and other variables such as the timing of different sounds. A word like "strategy," which is phonetically complex with its word-initial, three-consonant cluster /str/ and another complex sound, /dʒ/, is particularly challenging for people with AAS. Clinicians and scientists who regard AAS as a speech motor disorder argue that phonetically complex words are more susceptible than phonetically simple words (such as "sag") to speech motor planning disorders. The view of AAS as an aphasia is not that phoneme sequencing is impaired, but rather that the phoneme representation and/or the ability to select the correct phonemes is impaired. This view does not regard AAS as a motor speech disorder. For example, the /s/ in strategy may be incorrectly represented by a /t/, resulting in a word production like "trategy." The patient could produce the /s/ well if the representation of the sound in the brain was intact. Also, the process of selecting the sounds — a stage of word production preceding the programming of sound sequences — may be impaired. Eling (2016) provides a review of the debate concerning the nature of AAS. The following transcript of a patient with AAS following a stroke is taken from a monologue in which she describes her speech and other impairments: P:  The two areas that I have problems with, um . . . is in my left hand (holds up her right hand),

I have that I’m gonna remember, I’m gonna . . . have this meh, ber, my (syllable cut off quickly), the word wrong. It’s struggity, wait my studiservist . . . ssstext derity, and . . . that’s my hand, and . . . that’s in my left hand (again, holding up right hand) and I’m right-handed. So . . . if you notice I really don’t have a lot of . . . uh . . . makeup, I tried it fir-fir, th, the-fir first time today (syllable repetitions in preceding phrase spoken very rapidly), a lot of that I couldn’t I couldn’t write, du-couldn’t write, I couldn’t write, I even tried my hair spr- my . . . my . . . umm . . . my . . . bandana . . . umm . . . but I had trouble with it, so, . . . um . . . but that’s, that’s the little pit of the problem, uh, my biggest problem is . . . with my speech. Sooo, I started with, therapy, just last week — this week. So, Monday I had . . . just, um, an evaluation, and then, yesterday, Wednesday, . . . and then, hockey therapist, OT, I can’t really pronounce it, opcheetherah . . . um, ah, oxeetelabis, and then there’s, um, filicul, filucul, therapis. The individual recognizes her speech sound errors (attempts to correct “struggity” and “studiservist,” and the several attempts to produce “occupational”). Simple words are well articulated, but multisyllabic, phonetically complex words such as “struggle,” “dexterity,” and “occupational” are not. For the most part, when the individual is not making speech sound errors, her speech does not sound abnormal.

Aphasia Due to Stroke: A Summary

Language disorders in adults are a common impairment following stroke. The aphasia types summarized in this chapter are in common use among SLPs and aphasia researchers. However, aphasia types are not always clearly defined — speech and language characteristics of two or more types may be observed in a single patient. Indeed, “pure” aphasia types are unusual. For example, patients diagnosed with Wernicke’s aphasia are likely to have some language impairment expected from other aphasia types; the same is true for other diagnoses (e.g., Broca’s aphasics having some comprehension deficits). Perhaps a better way to think about aphasia types is that any individual type reflects the primary deficit, with recognition that other language deficits are likely as well. Global aphasia may be the endpoint of these multiple language deficits — all aspects of language expression and comprehension are affected by a stroke that damages most perisylvian speech and language areas. With this in mind, Table 9–1 provides a summary of speech and language characteristics in the “classic” aphasia types, as previously discussed.

Figure 9–6 presents this idea in schematic (simple) form. One arrowhead is in Broca’s area, the other arrowhead in Wernicke’s area. Localized lesions in the far anterior region of the perisylvian language area are likely to result in an aphasia with a primary impairment in expression. Localized lesions in the far posterior region of the perisylvian language area are likely to result in an aphasia with a primary impairment in comprehension. Lesions between these two endpoints, within the perisylvian language area, are likely to produce a “mixed” aphasia, in which expressive and receptive impairments are observed to varying degrees. Aphasiologists often use the terms “anterior lesion” and “posterior lesion” to indicate the general location of damage because these terms are less restrictive than “Broca’s area” and “Wernicke’s area.”

It just so happens that the brains of two of Broca’s patients have been preserved. These brains, located in a Paris museum until 2016 and now still in Paris but at a different location, have been scanned by researchers using modern imaging technology. When Dronkers, Plaisant, Iba-Zizen, and Cabanis (2007) scanned these brains, the lesions were not restricted to the classical, relatively small cortical region named Broca’s area. In fact, the lesions were much larger, extending into deeper structures, including white matter and specifically the arcuate fasciculus, the tract that connects Wernicke’s and Broca’s areas. Research findings like this suggest caution in linking expressive or receptive speech and language deficits to damage in the classical, cortical brain areas associated with the work of Drs. Broca and Wernicke.

Table 9–1. Summary of Speech and Language Characteristics in Aphasia Types. (Each entry lists the aphasia type, followed by its speech and language characteristics.)

Broca’s aphasia: Expressive aphasia — nonfluent; slow, effortful speech; agrammatism; good comprehension; anomia.

Wernicke’s aphasia: Receptive aphasia — fluent; poor comprehension of language in the absence of hearing loss; articulation close to normal; fluent, expressive speech lacks meaning; not specifically responsive to questions.

Conduction aphasia: Good receptive and expressive skills; impaired repetition; paraphasias for speech sounds, syllables, and words; anomia.

Global aphasia: Expressive and receptive aphasia — nonfluent; short, automatic phrases may be preserved both in expression and comprehension; poor comprehension of complex language; anomia.

Primary progressive aphasia: Like Broca’s aphasia (one type); poor word retrieval and comprehension (one type); sentence repetition and comprehension impairments; problems with reading and written-language skills.



Figure 9–6. The concept of anterior and posterior lesions, shown on a schematic of the hemisphere with Broca’s area, Wernicke’s area, and the frontal, parietal, occipital, and temporal lobes labeled. See text for details.

Traumatic Brain Injury and Aphasia

The definition of a traumatic brain injury (TBI) is “a bump, blow or jolt to the head, or penetrating head injury, that results in disruption of the normal function of the brain” (Marr & Coronado, 2004). TBI is a major health concern. According to a fact sheet published by the Centers for Disease Control and Prevention (CDC), in 2013 there were approximately 2.8 million emergency room visits for head injuries in the United States (https://www.cdc.gov/traumaticbraininjury/get_the_facts.html). Both adults and children are included in this estimate. Around 80% of people who are seen in an emergency room for a TBI are diagnosed with a mild injury (Douglas et al., 2019). In this section, the focus is on closed head injuries, in which there is no penetration of the skull and brain by a flying object such as a bullet. The evolution of TBI symptoms varies widely across individuals and depends on factors such as age, severity of the injury, and the consciousness status of the patient immediately after the trauma. A TBI often evolves as a sequence of events from the time of injury to near or full recovery of function. If the individual is unconscious after the head injury, the time from the trauma to regaining consciousness may vary from very short (e.g., a few minutes) to very long (e.g., a week or more, or, in some cases, a permanent comatose state). When consciousness is

regained, a period of confusion is typical. During this time, the individual may be confused about where he is, how or even if an injury occurred, and personal information. The individual’s language may reflect this confusion. The person may not be able to speak for a short period, may speak in disconnected sentences with little content, and may have comprehension problems. After the period of confusion, most individuals improve steadily; this includes significant improvement in language abilities.

Nature of Brain Injury in TBI

Closed-head injuries are associated with brain damage that is different from that observed in stroke. The “bump or jolt” to the head causes the cerebral hemispheres to slam into the bony casing of the skull. In the case of a powerful blow to the head, the cerebrospinal fluid in which the brain floats is not sufficient to protect the soft tissues of the brain. The damage to brain tissue is often in the frontal lobes, where the impact may be greatest due to the forward, high acceleration of the cerebral hemispheres in response to the blow to the head. Structures of the temporal lobe are also likely to be damaged in a closed head injury. The blow to the head may cause the cerebral hemispheres to twist inside the skull at high accelerations. This results in shearing injuries to neural tissue. Shearing injuries occur in axons, which are stretched and rotated so violently that they are torn. The injuries to axons affect the integrity of white matter. Gray matter areas, such as in the cortex or in subcortical nuclei, are partially disconnected by these shearing injuries, which have profound consequences for the brain networks serving both cognitive and high-level language functions (Douglas et al., 2019).

Language Impairment in TBI Speech and language impairments are common in TBI. The impairments are typically most severe immediately after the trauma and improve over time. The improvement often brings the patient back to “normal” language skills. Language skills in TBI are often assessed by formal tests used to evaluate individuals who have suffered a stroke and have aphasia. These tests evaluate language skills such as word recall, naming of objects, and production and comprehension of syntax. According to a different perspective, as the patient recovers from the injury formal tests of aphasia may suggest a return to normal language skills but miss a continuing language impairment of social language use. Language skills in the area of phonology, morphology, syntax, and to a large degree, content, are evaluated by these aphasia tests as normal or near-normal (Steel, Ferguson, Spencer, & Togher, 2015; Vas, Chapman, & Cook, 2015). The remaining language impairment, potentially having substantial functional consequences, is of higher-level skills such as expressive discourse, comprehension of the overall message of discourse, and social language use (language pragmatics).

Structural Components of Language Structural components of language include phonology, morphology, syntax, and content (reviewed in Chapter 3). Dysarthria, a motor speech disorder caused by damage to the speech motor control centers of the brain, is not considered a phonological disorder, but may affect speech sound production in persons with TBI. Although dysarthria is not a language disorder, it is important to mention because about 33% of individuals with TBI have dysarthria (Beukelman, Nordness, & Yorkston, 2011). AAS following a head injury appears to be rare (Cannito, 2014). Morphology and syntax are often affected in the initial phases of recovery from TBI. For all but the most severe cases, morphological and syntactical skill improve over time and are likely to return to normal use. Anomia, an impairment of the content component


of language, is a prominent characteristic of the early phases of language recovery in TBI. Like morphology and syntax, anomia improves over time but may persist past the time when other structural language skills have returned to normal.

Pragmatics (Social Use of Language) Language use for social communication is a primary area of concern in adults with TBI. Higher-level, discourse-type language skills may be impaired, even when structural components of language are intact. Comprehension of metaphors (“A veil of secrecy surrounded the committee’s work”) and figurative language (“He inhaled his lunch”) may be impaired. Persons with TBI may not be able to narrate a story in a coherent way. Individual utterances of the story may not be sequenced correctly, and information that is not relevant to the story may be included in the narrative; story details may be repeated several times. As summarized by Vas et al. (2015), there are many other aspects of social language use: “we engage in conversational discourse during speaker–listener interactions and social exchanges, we use descriptive discourse to explain attributes and features of an object, we use narrative discourse to describe an event, we use procedural discourse to explain a task procedure, and we use expository discourse to inform a listener of a topic through facts or interpretations” (p. 499). Narrative language impairments have been demonstrated in individuals with moderate TBIs whose structural language skills are not impaired (Marini, Zettin, & Galleto, 2014). When words, individual sentences, and grammar appear to be normal, and people are not aware that the person they are interacting with has a TBI, it is easy to understand how a social language impairment can be a major challenge to everyday life. Social language use requires a synthesis of cognitive and linguistic skills (Steel, Ferguson, Spencer, & Togher, 2017). Cognitive skills such as memory, processing speed, flexibility in processing, attention, and inhibition are joined with linguistic skills to make social language use as effective as possible (Vas, Chapman, & Cook, 2015). An example of the interaction and mutual dependence of cognitive and linguistic skill is in turn taking during conversation. Linguistic skills of following content and knowing when an individual is about to finish a contribution to a discussion are joined to attention to the structure of the conversation, memory of what has been said, and inhibition of interrupting the speaker before he or she is finished. Damage to the frontal lobes, and its connections with almost all other area of the brain, is almost certainly the basis of impaired social communication in



individuals with TBI. A healthy frontal lobe is critical to everyday functioning because of its central role in executive function of the brain. Executive function is important to many aspects of behavior, among which are several of the cognitive skills mentioned previously, such as memory, processing speed, and attention. Executive function directs the brain to pay attention to certain stimuli, and not others, directs the brain to adjust its sensitivity to stimuli, controls impulsive behavior, and is critical to organizational and emotional skills (Wood & Worthington, 2017). Impaired executive function in TBI, due to damage to the frontal lobes, affects an individual’s ability to use language effectively in social situations.

Dementia

The CDC estimates that 5.7 million people in the United States are living with Alzheimer’s disease (AD), the leading cause of dementia (https://www.cdc.gov/chronicdisease/resources/publications/aag/alzheimers.htm). AD is most often diagnosed in people above the age of 65 years. The growing number of elderly people in the United States suggests a future with ever-increasing diagnoses of AD (https://www.cdc.gov/chronicdisease/resources/publications/aag/alzheimers.htm). In a 2011 to 2013 review of Medicare data, Alzheimer’s disease accounted for roughly 45% of all diagnosed dementias. The other subtypes of dementia, such as vascular dementia, Lewy body dementia (as well as Parkinson’s disease dementia), and frontotemporal dementia, each had substantially lower prevalence compared with AD (Goodman et al., 2017) but together accounted for slightly more than half of all diagnosed dementias. Thus, the total number of people in the United States living with dementia is probably in excess of 10 million. Worldwide, the prevalence of dementia and its impact on families is staggering. Dementia has many different behavioral characteristics, which depend to some degree on the specific type of the disease. Problems are observed in memory, general cognition, executive function, psychiatric disturbance, depression, and agitation. These all lead to profound effects on everyday life functions. In all cases of dementia, the underlying brain disease is progressive and irreversible. Death of neurons, either in specific regions of the brain or throughout the brain, is the cause of the dementia symptoms and their worsening over time. For each of the dementias described later in this chapter, a general statement of the underlying brain

pathology is provided. More detailed accounts of brain pathologies in dementia are available in the scientific literature. Speech and language disorders are a prominent characteristic of dementia. To some degree, the specific characteristics of a speech and language impairment, and the way they change throughout the course of the disease, are different for the different types of dementia.
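The “in excess of 10 million” estimate earlier in this section follows from simple arithmetic on the two figures quoted there (5.7 million people with AD, and AD accounting for roughly 45% of diagnosed dementias). The short Python sketch below only restates that back-of-the-envelope calculation; the inputs are the approximate figures from the text, not new data.

```python
# Back-of-the-envelope check of the dementia estimate discussed above.
# Inputs are the approximate figures quoted in the text.

people_with_ad = 5_700_000     # CDC estimate of people living with Alzheimer's disease
ad_share_of_dementias = 0.45   # AD as a rough proportion of all diagnosed dementias

estimated_total_with_dementia = people_with_ad / ad_share_of_dementias
print(f"Estimated total living with dementia: {estimated_total_with_dementia:,.0f}")
# Prints roughly 12,666,667 -- consistent with "probably in excess of 10 million."
```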

Brain Pathology in Dementia The brain pathology in AD includes an extracellular (on the outside of a brain cell) accumulation of a substance called amyloid, and an intracellular (inside the cell) accumulation of neurofibrillary tangles which are accumulations of proteins that prevent normal function of a neuron (Kumar, Kumar, Keegam, & Deshmuk, 2018). These abnormal accumulations increase over time and cause dysfunction and eventual death of neurons. In vascular dementia, localized or general regions of the brain are destroyed when blood flow to these regions is limited or blocked completely (hence, the term “vascular” dementia). The lesions created by the loss of blood flow are the cause of many of the dementia problems mentioned previously. Memory problems are typically absent in this type of dementia, at least at the beginning of the disease. O’Brien and Thomas (2015) have published an excellent review of vascular dementia and the debate surrounding the diagnosis. Lewy body dementia is a broad category of different dementias. Lewy bodies are pathological clusters of proteins within nerve cells. Symptoms of Lewy body dementia include cognitive impairment, visual difficulties including hallucination, thinking and reasoning impairments, and confusion that varies within a day or from day to day (Walker, Possin, Boeve, & Aarsland, 2015). At the outset of Lewy body dementia, very few patients have speech and/or language impairments, but with progression of the disease, language impairments are likely. Frontotemporal dementia (FTD) is a complex brain disease that takes one of several forms (Olney, Spina, & Miller, 2017; Mesulam, 2018). Progressive primary aphasia, discussed previously, is considered to be a form of FTD, or at least to progress over time to FTD. Regardless of the form, the brain pathology common to a diagnosis of FTD is atrophy of the frontal and temporal lobes (Bang, Spina, & Miller, 2015). The initial signs of an FTD that does not begin with language deficits are primarily behavioral, including personality changes, loss of inhibition, and impairment of executive function.


Language Disorders in Dementia Speech-language characteristics in several types of dementia are listed in Table 9–2. Most of these characteristics have been discussed in the earlier section

“Classification of Aphasia.” Like the nonlanguage symptoms of dementia, the speech-language symptoms evolve during the progression of the disease and may not be the same at two points in time. Speech and language impairments may be observed in the early

Table 9–2. Selected Speech and Language Characteristics of Several Types of Dementia. (Each entry lists the type of dementia, followed by its speech and language characteristics.)

Alzheimer’s disease: Anomia; semantic paraphasias (e.g., “toes” for “hand”); poor word comprehension; reduced word fluency (poor ability to name as many animals as possible); loss of narrative cohesion; relatively preserved phonological and syntactical skills; formulaic speech preserved (“Hi, how are you?”; “Excuse me”).

PPA (FTD), nonfluent aphasia: Nonfluent speech (effortful, slow); agrammatism; phonemic paraphasias; impaired repetition; late-stage mutism.

PPA (FTD), semantic dementia: Fluent, empty speech; semantic paraphasias (e.g., “toes” for “hand”); severe anomia; intact automatic speech (“Hi, how are you?”); loss of meaning in both expression and comprehension; preservation of syntax.

PPA (FTD), logopenic: Poor word retrieval; frequent pausing (presumably searching for words); impaired repetition.

FTD (behavioral): Loss of narrative cohesion.

Vascular: Loss of phonemic fluency (poor ability to name things when required to confine the names to a single beginning sound); reduction of expressive grammatical complexity.

Lewy body dementia: Early preservation of language skills; later narrative incoherence; anomia; reductions in word fluency.

Note. Not all characteristics are seen in a specific type of dementia. FTD = frontotemporal dementia; PPA = primary progressive aphasia. Based on information in “Cognition, Language, and Clinical Pathological Features of Non-Alzheimer’s Dementias: An Overview,” by J. Reilly, A. Rodriguez, M. Lamy, and J. Neils-Strunjus, 2010, Journal of Communication Disorders, 43, pp. 438–452; “Connected Speech in Neurodegenerative Language Disorders: A Review,” by V. Boschi, E. Catricalà, M. Consonni, C. Chesi, A. Moro, and S. F. Cappa, 2017, Frontiers in Psychology, 8, p. 269; Steel, Ferguson.



stages of dementia or become apparent with disease progression. The possibility that type-specific impairments of speech and language may contribute to a differential diagnosis of dementia type is intriguing (Reilly, Rodriguez, Lamy, & Neils-Strunjus, 2010). The differential diagnosis of type of dementia is not always agreed upon, even by experienced medical professionals. For example, the diagnostic distinction between Alzheimer’s disease and vascular dementia may be particularly challenging. Speech-language pathologists can have an important role in sharpening diagnostic distinctions in the dementias.

Chapter Summary

Stroke, TBI, and dementia are common causes of acquired, adult language disorders. Language disorders in adults may occur because of damage to cortical and subcortical structures associated with speech and language skills, because of damage to the connections between these structures, or both; these structures are located (for most people) in the perisylvian speech and language area, which includes parts of the frontal, temporal, and parietal lobes. Aphasia is an impairment of language expression and/or comprehension, resulting from brain damage; aphasia can be the result of a stroke, TBI, or dementia. Stroke, which results from a loss of blood flow to regions of the brain, may result in an expressive, receptive, or mixed (both expressive and receptive) aphasia. There are several types of aphasia, including Broca’s aphasia, Wernicke’s aphasia, conduction aphasia, anomic aphasia, global aphasia, and primary progressive aphasia. Broca’s aphasia is a nonfluent, mostly expressive aphasia likely due to anterior damage in the perisylvian speech and language areas. Wernicke’s aphasia is a fluent, mostly receptive aphasia, likely due to posterior damage in the perisylvian speech and language areas. Conduction aphasia is a fluent aphasia in which comprehension and expression are more or less intact (Broca’s and Wernicke’s areas are unaffected by the stroke), but repetition is impaired because of damage to the arcuate fasciculus, the fiber tract that connects Broca’s and Wernicke’s areas. Global aphasia is a nonfluent aphasia in which both expressive and receptive language is impaired; it is usually the result of both anterior and posterior damage in the perisylvian speech and language area. Primary progressive aphasia is not due to stroke but often begins as an isolated language problem that

increases in severity over time; the brain disease selectively affects the frontal and temporal lobes, including speech and language tissue in those lobes. Apraxia of speech is thought by some to be a motor speech disorder and by others to be an aphasia (or both); it is most commonly regarded as a motor speech disorder in which the problem is poor speech motor planning. Aphasia is a frequent result of TBI, but the expressive and receptive language deficits often do not easily fit the aphasia types discussed in the chapter; as people recover from TBI, their structural language deficits resolve, but in many cases, a social language use disorder remains. Dementia takes several different forms, including Alzheimer’s disease, vascular dementia, Lewy body disease, and frontotemporal dementia. Language deficits in dementia are common and vary depending on the type of dementia.

References Alexander, M. P., & Loverso, F. (1992). A specific treatment for global aphasia. Clinical Aphasiology, 21, 277–289. Ardila, A. (2010). A review of conduction aphasia. Current Neurology and Neuroscience Reports, 10, 499–503. Bang, J., Spina, S., & Miller, B. L. (2015). Non-Alzheimer’s dementia 1: Frontotemporal dementia. The Lancet, 386, 1672–1682. Beukelman, D. R., Nordness, A., & Yorkston, K. M. (2011). Dysarthria associated with traumatic brain injury. In K. Hux (Ed.), Assisting survivors of traumatic brain injury: The role of speech-language pathologists (2nd ed., pp. 185–226). Austin, TX: Pro-Ed. Boschi, V., Catricalà, E., Consonni, M., Chesi, C., Moro, A., & Cappa, S. F. (2017). Connected speech in neurodegenerative language disorders: A review. Frontiers in Psychology, 8, 269. https://doi.org/10.3389/fpsyg.2017.00269 Cannito, M. P. (2014). Clinical assessment of motor speech disorders in adults with concussion. Seminars in Speech and Language, 35, 221–233. Douglas, D. B., Ro, T., Toffoli, T., Krawchuk, B., Muldermans, J., Gullo, J., . . . Wintermark, M. (2019). Neuroimaging of traumatic brain injury. Medical Sciences, 7, 2. Dronkers, N. F., Plaisant, O., Iba-Zizen, M. T., & Cabanis, E. A. (2007). Paul Broca’s historic cases: High resolution MR imaging of the brains of Leborgne and Lelong. Brain, 130, 1432–1441. Eling, P. (2016). Broca’s faculté du langage articulé: Language or praxis? Journal of the History of the Neurosciences, 25, 169–187. Ellis, C., & Urban, S. (2016). Age and aphasia: A review of presence, type, recovery, and clinical outcomes. Topics in Stroke Rehabilitation, 23, 430-439. Flamand-Roze, C., Flowers, H., Roze, E., & Denier, C. (2013). Diagnosis and management of language impairment


in acute stroke. In E. Holmgren & E. S. Rudkilde (Eds.), Aphasia: classification, management practices, and prognosis (pp. 91–114). New York, NY: Nova Science. Goodman, R. A., Lochner, K. A., Thambisetty, M., Wingo, T., Posner, S. F., & Ling, S. M. (2017). Prevalence of dementia subtypes in U.S. Medicare fee-for-service beneficiaries, 2011–2013. Alzheimer’s and Dementia, 13, 28–37. Grossman, M. (2012). The non-fluent/agrammatic variant of primary progressive aphasia. Lancet Neurology, 11, 545–555. Henry, M. L., & Grasso, S. M. (2018). Assessment of individuals with primary progressive aphasia. Seminars in Speech and Language, 39, 231–241. Hoffmann, M., & Chen, R. (2013). The spectrum of aphasia subtypes and etiology in subacute stroke. Journal of Stroke and Cerebrovascular Diseases, 22, 1385–1392. Klebic, J., Salihovic, N., Softic, R., & Salihovic, D. (2011). Aphasia disorders outcome after stroke. Medical Archives, 65, 283–286. Kumar, K., Kumar, K., Keegam, R. M., & Deshmuk, R. (2018). Recent advances in the neurobiology and neuropharmacology of Alzheimer’s disease. Biomedicine and Pharmacotherapy, 98, 297–307. Marini, A., Zettin, M., & Galleto, V. (2014). Cognitive correlates of narrative impairment in moderate traumatic brain injury. Neuropsychologia, 64, 282–288. Marr, A. L., & Coronado, V. G. (2004). Central nervous system injury surveillance data submission standards — 2002. Centers for Disease Control and Prevention. Atlanta, GA: National Center for Injury Prevention and Control. Mesulam, M. M. (2018). Slowly progressive aphasia without generalized dementia. Annals of Neurology, 11, 592–598. Montembeault, M., Brambati, S. M., Gorno-Tempini, M. L., & Migliaccio, R. (2018). Clinical, anatomical, and pathological features in the three variants of primary progressive aphasia: A review. Frontiers in Neuroscience, 9, 692. https://doi.org/10.3389/fneur.2018.00692


O’Brien, J. T., & Thomas, A. (2015). Vascular dementia. The Lancet, 386, 1698–1706. Olney, N. T., Spina, S., & Miller, B. L. (2017). Frontotemporal dementia. Neurologic Clinics, 35, 339–374. Pandey, A. K., & Heilman, K. M. (2016). Conduction aphasia with intact visual object naming. Cognitive and Behavioral Neurology, 27, 96–101. Reilly, J., Rodriguez, A., Lamy, M., & Neils-Strunjus, J. (2010). Cognition, language, and clinical pathological features of non-Alzheimer’s dementias: An overview. Journal of Communication Disorders, 43, 438–452. Smits, M., Jiskoot, L. C., & Papma, J. M. (2017). White matter tracts of speech and language. Seminars in Ultrasound CT and MRI, 35, 504–516. Steel, J., Ferguson, A., Spencer, E., & Togher, L. (2015). Language and cognitive communication during post-traumatic amnesia: A critical synthesis. NeuroRehabilitation, 37, 221–234. Steel, J., Ferguson, A., Spencer, E., & Togher, L. (2017). Language and cognitive communication disorder during post-traumatic amnesia: Profiles of recovery after TBI from three cases, Brain Injury, 31, 1889–1902. Vas, A. K., Chapman, S. B., & Cook, L. G. (2015). Language impairments in traumatic brain injury: A window into complex cognitive performance. Handbook of Clinical Neurology, 128, 497–510. Walker, Z., Possin, K. L., Boeve, B. F., & Aarsland, D. (2015). Non-Alzheimer’s dementia 2: Lewy body dementia. The Lancet, 386, 1683–1697. Wood, R. L., & Worthington, A. (2017). Neurobehavioral abnormalities associated with executive dysfunction after traumatic brain injury. Frontiers in Behavioral Neuroscience, 11, 195. https://doi.org/10.3389/fnbeh.2017.00195 Yourganov, G., Smith, K. G., Fridriksson, J., & Rorden, C. (2015). Predicting aphasia type from brain damage measured with structural MRI. Cortex, 73, 203–215.

10 Speech Science I

Introduction

The term “speech science” is used to designate the discipline in which normal (typical) processes of speech production, acoustics, and perception are studied. An underlying body of knowledge for these areas of study includes the anatomy of the speech and hearing mechanism and an appreciation for the basic principles of acoustics. The value of this knowledge is similar to the requirement of full understanding of anatomy, biology, chemistry, and physics for individuals preparing for a career in clinical medicine, or the value of anatomy and movement in the training of physical therapists. The current chapter presents information on anatomy and physiology of the speech mechanism. The acoustics of speech are presented in Chapter 11, and the anatomy and physiology of the hearing mechanism are presented in Chapter 22.

The Speech Mechanism: A Three-Component Description

The speech mechanism is composed of three major divisions: the respiratory system, the larynx, and the upper and nasal airways (Figure 10–1). Each division has many anatomical components, including bones, ligaments, cartilages, membranes, and muscles. Each division can also be assigned a global function in the production of speech. These global functions simplify the actual workings of each part, but they provide a useful organizing perspective. The anatomical structures and functions are as follows: (a) the respiratory system, which is the power supply for speech; (b) the larynx, which by using airflow from the respiratory system to vibrate the vocal folds is the primary sound source for speech; and (c) the upper and nasal airways, which by movements of structures modify the sound source to form different speech sounds. Detailed presentations of speech anatomy and physiology are available in Zemlin (1997) and Hixon, Weismer, and Hoit (2020). Brain anatomy and physiology for speech production and hearing, which is often included as part of speech and hearing science, is covered in Chapter 2.

“Functional anatomy” is discussed for each component of the speech mechanism. Functional anatomy presents the anatomy of a component at a level that is sufficient to understand the broad physiology (function) of the component. The functional anatomy presented in these sections is a much-simplified view of the anatomy presented in courses such as the “Speech Anatomy and Physiology” course listed in Table 1–3.

Respiratory System Component (Power Supply for Speech)

It is useful to think of the respiratory system as consisting of two major parts, one being the lungs and the other being the chest wall. The lungs are composed of



Figure 10–1. View of the three-part speech mechanism, including the respiratory system, larynx, and upper airways/nasal passages.

spongy, elastic tissues that inflate and deflate as air passes into and out of them. Air in the atmosphere is taken into the lungs via a series of tubes (called bronchioles and bronchi) that terminate in the alveoli, where oxygen and carbon dioxide are exchanged between the bloodstream and air. As air is expelled from the lungs, it travels through the increasingly large tubes and passes through the largest tube, the trachea, before passing through the vocal folds and entering the air spaces in the throat (pharynx), mouth, and nose. The chest wall includes all structures of the respiratory system that are outside the lungs but are capable of compressing or expanding the air within the lungs. The anatomical structures that can compress or expand the lungs include a large set of muscles in the thorax (often called the chest), abdomen, and diaphragm (a large muscle that separates the thorax and abdomen). Elastic properties of structures such as the ribs and the sac-like membranes that enclose the lungs also contribute to compression or expansion of the lungs. Compression or expansion of the lungs exerts a force on the air within the lungs. When the lungs are compressed, air molecules within them are compressed

(forced together more closely) which increases lung pressure; when the lungs are expanded, air molecules within them are expanded (pulled apart from each other) which decreases lung pressure.
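The compression-pressure relationship described in this paragraph can be made concrete with the standard gas law for a fixed quantity of air at constant temperature (Boyle’s law, P1V1 = P2V2). The text itself does not present this equation, so the sketch below is only an illustration with assumed round numbers: a momentary compression of the air in the lungs, with the airway closed, raises lung pressure above atmospheric pressure.

```python
# Illustrative sketch (assumed values, not from the text): Boyle's law,
# P1 * V1 = P2 * V2, applied to air briefly trapped in the lungs.

ATMOSPHERIC_PRESSURE_CMH2O = 1033.0  # ~1 atmosphere expressed in cm H2O

def pressure_after_compression(p1: float, v1: float, v2: float) -> float:
    """For a closed volume of gas, P2 = P1 * V1 / V2."""
    return p1 * v1 / v2

v1 = 3.0           # liters of air in the lungs before compression (assumed)
v2 = v1 * 0.995    # the chest wall squeezes that air into 0.5% less volume

p2 = pressure_after_compression(ATMOSPHERIC_PRESSURE_CMH2O, v1, v2)
relative = p2 - ATMOSPHERIC_PRESSURE_CMH2O
print(f"Lung pressure relative to atmosphere: {relative:+.1f} cm H2O")
# A compression of only ~0.5% yields roughly +5 cm H2O, the order of magnitude
# of the lung pressures used for speech later in this chapter.
```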

The Effect of Lung Pressure on Airflow Changes in lung pressure are critical to moving air in the expiratory (breathing out) and inspiratory (breathing in) directions. When the lungs are open to the atmosphere, which means there are no blockages along the pathway from lungs to lips, or from lungs to the nares (outlets of the nostrils), the pressure inside the lungs and in the atmosphere are the same. In this circumstance, air does not flow between the lungs and atmosphere, in either direction. Air flows from one point to another only when there is a pressure difference between the points. Lung pressure is raised when the lungs are compressed by muscular contraction and other forces (such as elastic forces). When lung pressure is greater than pressure outside the mouth, which we refer to as atmospheric pressure, air flows from the lungs to the atmosphere (as in exhalation). When lung pressure


is lowered relative to atmospheric pressure, air flows from the atmosphere to the lungs (as in inhalation). Speech is produced on exhaled air; inhaled air is used to fill the lungs and make the air supply ready for the next utterance.
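The rule in the preceding paragraphs (no flow when lung and atmospheric pressures are equal, outward flow when lung pressure is higher, inward flow when it is lower) can be restated as a tiny decision function. The sketch below is only an illustration; the function name and the convention of expressing pressures relative to atmosphere (atmosphere = 0, as in Figure 10–3) are assumptions for the example.

```python
# Direction of airflow for an open airway, following the pressure-difference
# rule described in the text. Pressures are in cm H2O relative to atmosphere.

def airflow_direction(lung_pressure_cmh2o: float) -> str:
    """Return the direction of airflow implied by the lung-atmosphere pressure difference."""
    if lung_pressure_cmh2o > 0.0:
        return "expiratory flow (lungs -> atmosphere), as in exhalation and speech"
    if lung_pressure_cmh2o < 0.0:
        return "inspiratory flow (atmosphere -> lungs), as in inhalation"
    return "no flow (lung and atmospheric pressures are equal)"

print(airflow_direction(6.5))    # positive lung pressure: air flows outward
print(airflow_direction(-1.0))   # negative lung pressure: air flows inward
print(airflow_direction(0.0))    # equal pressures: no airflow
```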

The Respiratory System and Vegetative Breathing Vegetative breathing is the exchange of air between the lungs and atmosphere that is required to sustain life. Vegetative breathing, also called rest breathing, is what you do when sitting at your desk and studying, sleeping, watching television, or the many other activities that involve quiet and seemingly effortless inhalation and exhalation. The purpose of vegetative breathing is simple. Air is inhaled to bring oxygen (O2) to millions of alveoli, the tiny, sac-like structures that are the terminal chambers within the lungs. The O2 is stored in the alveoli and distributed to the bloodstream in a slow, continuous process. As the O2 travels in the bloodstream, it is picked up and used in the normal functioning of every cell within the body. The functioning of these cells produces byproducts, one of which is carbon dioxide (CO2). CO2 is a toxin that must be continuously eliminated from the body. This is done by carrying CO2 through the bloodstream to the alveoli — the same

organ that stores O2 — where it is “leaked out” to the airways and exhaled to the atmosphere. The alveoli function as a two-way valve, storing O2 collected from inhaled air for delivery to the bloodstream, and storing CO2 collected from the bloodstream’s uptake of by-products of cellular activity for expulsion from the body via exhalation. Figure 10–2 is a breathing record called a spirogram, generated with an instrument called a spirometer or respirometer. The respirometer is an instrument that measures the volume of air inhaled or exhaled to or from the lungs. A participant breathes into the instrument while wearing a face mask; the device records the air volumes that have been displaced into or out of the lungs. The spirogram shows inhalation going up on the graph and exhalation going down. The x-axis of the graph is time, and there are two y-axes: the one on the left is labeled “Percent Vital Capacity” and the one on the right is labeled “Volume in Liters.” These y-axes are two different ways to express lung volume. The spirogram shows three cycles of inhalationexhalation that begin and end at the same height on the y-axis, indicated by the lower, horizontal dashed line. These are rest breathing cycles, which have three noteworthy characteristics. First, they repeat over time; second, the inhalation and exhalation phases are symmetrical, both in time (it takes the same amount of time to inhale as it does to exhale) and volume (the same

Figure 10–2. Spirogram showing lung volume events as a function of time (y-axes: percent vital capacity and volume in liters). Three rest breaths are followed by a vital capacity maneuver, and then by two more rest breaths. See text for additional detail.



amount of volume is inhaled and exhaled); and third, the volume exchanged is relatively small. In healthy individuals, the volume exchanged during rest breathing is no more than half a liter (500 milliliters), which is about 1/8th the volume that can be exhaled after taking a maximally deep breath. The volume inhaled and exhaled during rest breathing is called tidal volume. Now imagine that the person who has generated these rest breathing cycles is asked to inhale as deeply as possible and then exhale as much air as he can from this maximum inhalation. The maximum inhalation is shown on the spirogram as a large, upward trace following the rest breathing cycles. The subsequent long, downward trace ending well below the volume of the rest breathing cycles is the total volume of air that can be exhaled from the maximum inhalation. The act of inhaling maximally and then exhaling maximally is called a vital capacity maneuver. Vital capacity (VC) is defined as the volume of air that can be exhaled from the lungs following a maximal inhalation. VC is marked on Figure 10–2 (see Footnote 1). The value of VC varies among individuals, depending on sex, body size, age, health history, and other factors. The y-axis on the left of Figure 10–2 expresses the volume of air within the lungs in a way that allows comparisons across individuals, even though their actual VC volumes may be quite different. This axis expresses all lung volumes as percentages of an individual’s VC.
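Because Figure 10–2 carries both a liters axis and a percent-VC axis, it can help to see the conversion between them worked out. The sketch below is only an illustration; the VC of 4.6 L is the average adult male value given in Footnote 1, and any individual’s measured VC could be substituted.

```python
# Converting between absolute lung volume (liters) and percent vital capacity,
# the two y-axes of the spirogram in Figure 10-2.

VC_LITERS = 4.6  # assumed vital capacity (average adult male value, Footnote 1)

def liters_to_percent_vc(volume_liters: float, vc_liters: float = VC_LITERS) -> float:
    """Express an exhalable lung volume as a percentage of the vital capacity."""
    return 100.0 * volume_liters / vc_liters

def percent_vc_to_liters(percent_vc: float, vc_liters: float = VC_LITERS) -> float:
    """Convert a percent-VC value back to liters."""
    return percent_vc / 100.0 * vc_liters

# A 0.5 L tidal breath is roughly 11% of this assumed VC, in the same ballpark
# as the text's statement that rest breathing exchanges about 1/8 of the VC.
print(round(liters_to_percent_vc(0.5), 1))   # 10.9
print(round(percent_vc_to_liters(10.9), 2))  # 0.5
```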

Speech Breathing

The term “speech breathing” is used to differentiate breathing for speech from vegetative breathing, breathing during exercise, breathing to sing, and other behaviors in which the respiratory system plays an important role. Speech breathing depends on increasing or decreasing lung pressure to cause air to flow out of or into the lungs. Flow from the lungs to the atmosphere is used to produce words, sentences, and paragraphs; flow from the atmosphere is used to refill the lungs between utterances. What range of lung volumes do speakers use to generate the airflows and pressures required to produce audible speech? In theory, speaking can take place throughout the vital capacity — over the entire lung volume range. As it turns out, most speakers use only the lung volumes in the middle of the VC.

They begin speaking at around 60% of the VC, exhale air during speaking to about 35% VC, refill the lungs (without speaking) to about 60% VC, and continue speaking. The pattern of lung volume usage for speech is shown in Figure 10–4. Note the rest breathing cycles immediately prior to the rapid inhalation, and the long, slow decrease in lung volume for each phrase of the utterances, “You wish to know all about my grandfather, well, he is nearly ninety-three years old, and he still thinks as swiftly as ever.” The speech breathing inhalation-exhalation pattern is different from the pattern for vegetative breathing.
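The 60%-to-35% range becomes more tangible when converted to liters. A small illustrative calculation follows, assuming the average adult male VC of 4.6 L from Footnote 1; the exact numbers will differ across speakers.

```python
# Air expended per breath group when speaking from ~60% VC down to ~35% VC.
# The VC value is an assumption taken from Footnote 1.

VC_LITERS = 4.6
START_PERCENT_VC = 60.0   # lung volume at the start of a speech breath group
END_PERCENT_VC = 35.0     # lung volume at which the speaker pauses to refill

liters_per_breath_group = (START_PERCENT_VC - END_PERCENT_VC) / 100.0 * VC_LITERS
print(f"Air used per breath group: {liters_per_breath_group:.2f} L")   # 1.15 L
# Compare with the ~0.5 L exchanged in a single rest (tidal) breath.
```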

The Goal of Speech Breathing

The goal of speech breathing is to maintain a roughly constant lung pressure during an utterance. This is illustrated in Figure 10–3, which shows lung pressure measured before, during, and after the utterance, “This is the best textbook on Introduction to Communication Disorders.” The word “best” is emphasized by the speaker. Time is on the x-axis, and lung pressure is on the y-axis, the latter measured in units of centimeters of water (cm H2O, a unit of air pressure). The “zero” value on the pressure scale, indicated by a horizontal dashed line extending across the graph, corresponds to atmospheric pressure. Positive pressures indicate lung pressures greater than atmospheric pressure. Negative lung pressures are less than atmospheric pressure. The lung pressure during the entire utterance, which has a duration slightly less than 2 s, is more or less constant at a value of around +6.5 cm H2O. There is a small, temporary increase in pressure to about 7 cm H2O (upward-pointing arrow) for the emphasized word “best.” Note the small, negative pressures immediately before and after the utterance. These negative pressures are the result of expansion of the lungs to allow inhalations before and after the utterance. The momentary pressure increase for the word “best” makes it louder, consistent with the emphasis on this specific word in the utterance.

Figure 10–4 shows percentage VC on the y-axis and time on the x-axis for the first three utterances of a famous reading passage in speech-language pathology, the Grandfather Passage (Darley, Aronson, & Brown, 1975). This lung-volume-by-time graph illustrates the small range of lung volumes used for each of these three utterances. Two cycles of rest breathing are shown before a rapid inhalation to about 60% VC.

Footnote 1: Typical values of VC are 4.6 liters (L) for adult males and 4.0 L for adult females; values will be lower for children. VC values vary with body size, sex, and age, so the values given here are averages only. When a participant has exhaled all the air he or she possibly can, there is still a volume of air remaining in the lungs, shown in Figure 10–2 as the shaded region at the bottom of the graph, but this is typically not measured and is not included in the VC measures.

Figure 10–3. Graph showing lung pressure (cm H2O) over time for the utterance, “This is the best textbook on communication disorders,” with emphasis on the word “best.” Note how the positive pressure developed in the lungs is at a nearly constant value for the entire utterance, except for a brief and small increase for the emphasized word “best.”

Figure 10–4. Graph showing lung volume, expressed as percent vital capacity (VC), over time for the three utterances, “You wish to know all about my grandfather/well he is nearly ninety-three years old/and he still thinks as swiftly as ever.” For each utterance, the talker inhales to about 60% VC and talks on exhalation down to a lung volume of about 35% VC. Speech is produced within a limited range of lung volumes, even across three consecutive utterances.




The inhalation is followed by a slow, long decrease of lung volume for the first phrase (“You wish to know all about my grandfather”). When the lung volume decreases to roughly 35% VC, there is another rapid inhalation to approximately 60% VC, followed by a similar slow, decreasing lung volume for the second phrase (“well, he is nearly ninety-three years old”). This sequence is repeated for the third phrase (“and he still thinks as swiftly as ever”). The breathing cycles for speech use much less air volume than the VC. The breathing cycles for speech also differ from the rest breathing cycles before the first phrase in Figure 10–4. The lung volumes used for rest breathing cycles are less than the volumes used for speech. In addition, the relative durations of the inhalation and exhalation phases of the two types of breathing — rest breathing and speech breathing — are different. The inhalation and exhalation phases are equivalent in duration for rest breathing, but in speech breathing, the inhalation phase occurs very quickly, and the exhalation phase is much longer. The long exhalation phase in speech breathing is largely a result of interruptions of the outgoing airflow at the vocal folds and at locations in the airway between the vocal folds and the lips. These interruptions are like valves opening and closing as air flows through a tube.

When Less Is More

That speech is produced using a small range of lung volumes may seem uneconomical: use of the entire VC for speech seems to suggest the possibility of more message per unit time. But, in fact, studies of speech breathing (Hixon, Goldman, & Mead, 1973) have shown that speech produced between about 60% and 35% VC (about ¼ of all the air you can exhale following a maximal inhalation) requires less muscular activity in the thorax and abdomen, compared with speech produced at very high lung volumes (near 100% VC) or very low lung volumes (near the point in lung volume where you cannot exhale any more air). In the midrange of lung volume — between 60% and 35% VC — the least muscular activity is required to maintain a constant lung pressure (see Figure 10–3). What we have here is biological efficiency — minimal exertion with more payoff.

Speech Breathing and Abdominal Muscles

Muscles of the thorax and abdomen contribute to lung compression and therefore the raised lung pressures that cause air to flow from lungs to atmosphere. Contraction of the abdominal muscles is especially important for efficient speech breathing. A balloon model of these muscular events is shown in Figure 10–5. The balloon is inflated with the open end pinched closed.

Figure 10–5. Balloon model of how muscular effort in the respiratory system can be applied under inefficient (middle) and efficient (right) conditions to maintain the constant, positive lung pressures required for speech production. Panels, left to right: hands relaxed; thoracic wall expiration; thoracic wall expiration and abdominal wall expiration. The figure also labels the larynx, the thoracic wall (top hand), and the abdominal wall (bottom hand).


The hands around the balloon represent the compressive muscular effects of the thorax (top hand) and abdomen (bottom hand). In the left-hand image, the hands are relaxed, as if the muscles of the thorax and abdomen are relaxed, exerting no force on the lungs. The middle image shows a squeeze by the upper hand, as if the muscles of the thorax exert compression of the lungs; the bottom hand is relaxed. The upper squeeze momentarily raises the pressure in the balloon, but because the pressure is transmitted throughout the air in the closed volume, the balloon’s flexible bottom half bulges outward. The pressure increase of the air inside the balloon is “lost” in the expansion of the bottom half — this is not what you want to maintain a constant, positive pressure during a speech utterance as shown in Figure 10–3. How can the pressure be maintained at a constant level when the expiratory muscles of the thorax are contracted? The answer is in the contraction of the abdominal muscles. A constant squeeze of the balloon by the bottom hand (right image, Figure 10–5) allows squeezes at the top of the balloon to maintain the constant pressure required for speech utterances, with relative ease. The thoracic squeezes do not have to be excessive to maintain the pressure. In addition, small squeezes of the top hand during speech can raise the pressure with relative ease (e.g., emphasis on “best” in Figure 10–3).

The Balloon Model Provides a Clinical Hint The balloon model of muscle activity of the thorax and abdomen, and its effect on pressure in the lungs, is more than a simple way to explain speech breathing. Individuals who have weak or paralyzed abdominal muscles and normal or near-normal functioning of the thoracic muscles can compress the lungs with their thoracic muscles, but the increased lung pressure is partially or completely reduced or lost when weak or paralyzed muscles of the abdomen cannot hold in the abdominal wall. These speakers have to work much harder to produce acceptable levels of lung pressure for speech, which can have significant effects on their ability to communicate. A low-tech clinical approach to the speech breathing problem of ineffective abdominal muscles is to use an adaptive device (like a thick belt) to compress the abdomen during speech. The belt takes over the role of the abdominal muscles, holding in the abdomen to prevent it from being pushed outward.


The muscular actions in the respiratory system for the generation of utterance pressure are as shown in the balloon model. The expiratory muscles of the thorax do the primary job of lung compression, while the abdominal muscles do the primary job of holding in the abdominal wall, preventing it from bulging out and causing the lungs to lose the pressure raised by contraction of thoracic muscles. The constant contraction of the abdominal muscles during speech is an efficient solution to generating a constant lung pressure for speech utterances.

Speech Breathing and Voice Loudness

The lung pressure value of 6.5 cm H2O in Figure 10–3 does not mean much to someone who has not worked with air pressures in the speech mechanism, so a real-world reference is offered here. If you are speaking with someone in a quiet room and the two of you are standing about 1 meter apart, speech with a lung pressure of 6.5 cm H2O sounds comfortably loud — not too loud, not too soft. It is a more or less typical value of lung pressure used by people speaking to each other at fairly close range. A lung pressure of 9 cm H2O makes speech seem loud at this distance, and lung pressures approaching 12 cm H2O produce a speech loudness that seems like shouting.
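The three reference pressures in this paragraph can be summarized as a rough lookup. The mapping below is only an illustration built from the values just quoted (6.5, 9, and 12 cm H2O, for a listener about 1 meter away); the category boundaries are assumptions, not a clinical scale.

```python
# Rough, illustrative mapping from lung pressure to the loudness impressions
# described in the text for conversation at about 1 meter.

def loudness_impression(lung_pressure_cmh2o: float) -> str:
    """Map a lung pressure (cm H2O) onto the approximate categories given in the text."""
    if lung_pressure_cmh2o < 6.5:
        return "softer than typical conversational loudness"
    if lung_pressure_cmh2o < 9.0:
        return "comfortably loud conversational speech"
    if lung_pressure_cmh2o < 12.0:
        return "noticeably loud speech"
    return "loudness that seems like shouting"

for pressure in (5.0, 6.5, 9.0, 12.0):
    print(pressure, "->", loudness_impression(pressure))
```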

Clinical Applications:  An Example Lung pressure is the primary determinant of speech loudness. Greater lung pressure typically results in louder speech (assuming a constant distance between a speaker and listener). Lung pressure is therefore an important factor in the intelligibility of speech; softer speech is more likely to suffer from intelligibility problems than louder speech. Speech-language clinics see patients whose primary complaint is an inability to be understood because they cannot generate an adequate amount of lung pressure. Speech intelligibility may not be the only problem associated with a speech breathing problem. A person who has speech breathing problems and difficulty producing changes in lung pressure and therefore speech loudness may experience a reduced ability to convey emotional states. We use the loudness of our voices to express a variety of emotions, and a patient who does not have good control over lung pressures may suffer in this area. This paralinguistic function of speech breathing (“paralinguistic” meaning the use of nonverbal cues such as loudness and pitch to convey mood, emotion, and so forth) is an important aspect of communication, especially in social situations (see Chapter 3).



The Larynx (Sound Source for Speech) As shown in Figure 10–6, the larynx is a structure that is composed of cartilage, muscle, ligaments, and membranes. The vocal folds are the component of the larynx that generates the sound source for all vocalic sounds (vowels, semivowels, diphthongs, and nasals) as well as a subset of consonants. This sound source comes from the vibration of the vocal folds, as discussed below.

Laryngeal Cartilages

The laryngeal cartilages form a strong but flexible framework to support a collection of soft tissues (muscles, ligaments, and membranes). The major cartilages of the larynx, as well as the hyoid bone, which is attached to the larynx, are shown in Figure 10–6. Figure 10–6 also shows the position of the larynx within the neck. Immediately above the larynx is the bottom of the throat, and immediately below the larynx is the upper edge of the windpipe (trachea). The two red bands between the arytenoid cartilages and the front of the thyroid cartilage represent the vocal folds — the tissue whose vibrations create the sound source. The larynx is very small — from top to bottom about 45 mm (1.75 inches) in men, 36 mm (1.40 inches) in women, and much smaller in children. The length of the muscular part of the human vocal folds is equally tiny — about 14 mm (0.55 inches) in men, 11.1 mm (0.44 inches) in women, and smaller in children (for adults, see Su et al., 2002).

Figure 10–6. The position of the larynx in the neck, with four cartilages (cricoid, arytenoid [paired], thyroid, and epiglottis) and one bone (the hyoid bone) labeled. Note the top of the trachea immediately below the cricoid cartilage.



Laryngeal Muscles and Membranes

Laryngeal muscles are classified broadly into one of two categories according to their points of attachment. Anatomists refer to muscle attachment points as origins and insertions (see Footnote 2). Intrinsic muscles of the larynx have both points of attachment within the larynx (e.g., from the arytenoid cartilages to the thyroid cartilage). Extrinsic muscles have one point of attachment within the larynx and one on a structure outside the larynx (e.g., from the breastbone to the thyroid cartilage). Extrinsic muscles are primarily responsible for positioning the larynx within the neck; they are the ones that cause the Adam’s apple to bob up and down during speech. The intrinsic muscles of the larynx open and close the vocal folds, stretch them, and adjust muscular tension to create different types of vocal fold vibration and, therefore, different voice qualities. Among the five intrinsic muscles of the larynx, three can close the vocal folds, one can open the vocal folds, and one can stretch the vocal folds. Several (if not all) of these muscles serve double duty. For example, one of the muscles can both close and tense the vocal folds, and another “closer” can also change the configuration of the vocal folds, which affects the quality of the voice.
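The intrinsic/extrinsic distinction reduces to a rule about where a muscle’s two attachment points lie, and that rule is easy to state as a check. The function and the example attachment descriptions below are hypothetical illustrations; only the classification rule itself comes from the paragraph above.

```python
# Classify a laryngeal muscle from its two attachment points, following the
# rule stated in the text: both attachments within the larynx -> intrinsic;
# one inside and one outside -> extrinsic.

def classify_laryngeal_muscle(attachment_a_in_larynx: bool,
                              attachment_b_in_larynx: bool) -> str:
    """Apply the attachment-point rule for laryngeal muscles."""
    if attachment_a_in_larynx and attachment_b_in_larynx:
        return "intrinsic (e.g., opens, closes, stretches, or tenses the vocal folds)"
    if attachment_a_in_larynx or attachment_b_in_larynx:
        return "extrinsic (e.g., positions the larynx within the neck)"
    return "not a laryngeal muscle by this rule"

# Example from the text: arytenoid-to-thyroid attachments are both within the larynx.
print(classify_laryngeal_muscle(True, True))    # intrinsic
# Example from the text: breastbone-to-thyroid has one attachment outside the larynx.
print(classify_laryngeal_muscle(False, True))   # extrinsic
```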

The Vocal Folds The vocal folds are bands of muscular and nonmuscular (e.g., membranes and ligaments) tissue that run from their forward point of attachment on the inside of the thyroid cartilage to the posterior point of attachment on the vocal process of the arytenoid cartilage (red lines in Figure 10–6). The muscular part of the vocal folds is one of the five intrinsic muscles of the larynx. Figure 10–7 illustrates how an examiner obtains a “live” view of the vocal folds. A laryngeal mirror is inserted into the mouth and positioned close to the back of the throat, just under the flap of tissue called the soft palate. The mirror is tilted at an angle relative to the stem of the instrument. The tongue is pulled gently forward to move structures such as the tongue and epiglottis forward and out of the way for a clear view of the vocal folds. The vocal folds are reflected in the mirror, which is illuminated by a bright light fastened to a band around the examiner’s head, much like a miner’s light. In Figure 10–7, a black oval surrounds the vocal 2 

Figure 10–7.  Insertion of a laryngeal mirror to the throat to view the vocal folds (enclosed in the black oval ). The tongue is gently pulled away from the mouth to move structures forward that are likely to prevent a clear view of the vocal folds.

folds, the target of the examiner’s view. A photograph can be taken of the image in the mirror. Devices are also available for recording successive images of the vocal folds over time, during speech. Two images of the vocal folds are shown in Figure 10–8. The image on the left shows open vocal folds. The bands of vocal fold tissue, pearly gray with a touch of pale pink, extend from the back to the front of the larynx. One end of the bands of vocal fold tissue form the point of a “V,” where the two vocal folds come together at the front of the larynx, on the inner surface of the thyroid cartilage. The arms of the “V” diverge as they move to the back of the larynx. Follow the bands from the front (at the point of the “V”) to the back and along the edge of each vocal fold, and you see a change of color from pearly gray/pinkish white to pale white. The pale white is the posterior attachment of the vocal folds. These posterior points of attachment are to a part of the arytenoid cartilage (called the vocal process) on the same side as the vocal fold (see Figure 10–7).

2 Traditionally in anatomical descriptions, when a muscle contracts, it pulls from the point of insertion toward its origin, and hence the distinction between two points of attachment. Usually, the origin is thought of as the more “fixed” point of attachment, whereas the point of insertion moves the structure to which it is attached. This is convenient anatomical terminology but, in reality (and especially for many muscles of the speech mechanism), it is not quite so simple.



Figure 10–8.  Two views of the vocal folds from above. In both images, the front of the neck is at the bottom of the image, where the vocal folds come together on the inner surface of the thyroid cartilage. The left image shows the vocal folds open; the glottis is wide. The vocal folds form the point of a “V” where they meet at the thyroid cartilage. The right image shows the vocal folds closed. Technically, when the vocal folds are closed there is no glottis. However, the term “glottis” is widely used to refer to the vocal folds, whether closed or not.

The opening between the vocal folds is called the glottis (see Figure 10-8). With the vocal folds open, air can flow from the lungs and through the larynx to the airways of the throat, nose, and oral cavities. Think of the vocal folds shown in the left image of Figure 10–8 as an open valve between the lungs on the one hand and the upper airways on the other hand. The open vocal folds allow a view of rings of tissue beneath the glottis. The structure containing these rings is the tube-like windpipe (trachea). This is the large air tube that gives off increasingly smaller tubes that reach deep into the lungs where air can be absorbed by alveoli to carry oxygen to cells of the body, and that allows CO2 to flow from the alveoli to the atmosphere. The right side of Figure 10–8 shows a photograph of closed vocal folds. Vocal fold closure in the absence of phonation, the term used for the sound made by the vibrating vocal folds, occurs during exertion (such as lifting a heavy object, or going to the bathroom), and importantly, as part of the swallowing process. Vocal fold closure during swallowing is critical to protecting the airway during the passage of fluids and solid food from the mouth and pharynx into the esophagus (Chapter 20). The structure of the vocal folds is complex and includes both muscular and nonmuscular tissues. The nonmuscular parts include a membranous casing that covers the main bulk, or muscular body of the vocal

fold. Very importantly, even though the membranous cover and muscular body are part of the same structure ​ — ​the vocal fold — the two parts can move somewhat independently of each other. Moreover, the degree of their independent movement can be adjusted in fine increments, depending on the contraction pattern of laryngeal muscles. The fine structure of vocal fold tissues, as viewed under a microscope, has been studied in a fair amount of detail with some surprising results. The eminent Japanese physician Minoru Hirano (1932–2017) devoted much energy to histological study of vocal folds in humans and various animals (“histology” is the study of the microscopic characteristics of biological tissue). Hirano discovered that the adult, human vocal fold has a complicated tissue structure unlike that of any other species. This complicated tissue structure is also not seen in human infants but develops as children mature into adulthood. The difference between human and animal vocal folds was surprising because the primary function of the vocal folds is often regarded as protection of the airway, as noted earlier. Because protection of the airway is so important for health and even life in humans and animals, the vibratory function of the vocal folds — their sound producing capabilities — had sometimes been regarded as a secondary capability of the mammalian larynx, as if the vocal


folds were adapted to a secondary purpose (phonation) while maintaining their primary role of protecting the lungs. Hirano’s research suggested otherwise. His discovery of the elaborate tissue structure of the human vocal folds suggested strongly that they were specialized for human voice. This specialization explained the remarkable range of voice qualities produced by humans and the role played by phonation in the fine nuances of human communication.

Phonation

Phonation is the production of sound by the nearly periodic (repeating over time) vibration of the vocal folds. The repetitive opening and closing of the vocal folds during phonation is controlled by air pressures, air flows, and elastic characteristics of vocal fold tissue. While it is true that laryngeal muscles can open and close the vocal folds, as described earlier, the opening and closing movements of vocal fold vibration are not produced by repetitive muscle contractions. To initiate phonation, a speaker brings her vocal folds together through the action of laryngeal muscles. Voice scientists refer to this action as adduction of the vocal folds. At the same time, the speaker develops a positive lung pressure, using the muscles and elastic properties of the respiratory system. Because the trachea contains air continuous with that of the lungs, pressure in the lungs and in the trachea, immediately below the closed vocal folds, is essentially the same. The positive tracheal (lung) pressure acts as a force against the closed vocal folds. This pressure overcomes the force of the muscles that adducted them and blows the vocal folds apart. Through a complex interaction of mechanical (e.g., elasticity of vocal fold tissue) and aeromechanic (e.g., air pressures and flows) forces, the vocal folds are displaced outward and then return to the midline, closing rapidly and firmly. With the vocal folds again in the closed position, this cycle of events is repeated, because the tracheal (lung) pressure is still positive and blows the vocal folds apart each time they close. Recall from the first section of the chapter that the goal of speech breathing is to maintain a constant lung pressure during speech. In the present case, we can say the constant lung pressure is maintained as long as phonation is produced. Figure 10–9 shows a series of images from a high-speed digital recording of one cycle of vocal fold vibration. The instrument used to record these images is a sophisticated digital camera that records sequences of images in very rapid succession. The images are


photographed from the reflection of the vocal folds in a laryngeal mirror, as previously described (see Figure 10–7). In each image, the front of the vocal folds is at the bottom, and the back of the vocal folds is at the top. The successive images are recorded so rapidly because a cycle of vocal fold vibration lasts only a fraction of a second (e.g., in women about 0.005 seconds). One cycle of vocal fold vibration is defined as movement from the closed position (frame 1 in Figure 10–9) to the most open position (frame 5), and back to the closed position (frames 9 and 10). When the series of still images in Figure 10–9 is watched as a video, the highly complex motions of the vocal fold tissues are revealed (Google “vocal fold motion,” select “video,” and many options for viewing vocal fold motion are available; see Box, “Waving in the Wind”). The membrane that encases the muscular bulk of the vocal fold vibrates somewhat independently of the muscle, giving the motion of the vocal folds during phonation a rippling appearance across their top surface. In other words, during typical phonation the vocal folds do not vibrate like rigid pistons moving back and forth. Rather, the motions appear complex, almost wavy. This complex motion is what gives the human voice its unique sound. The pattern of vocal fold movement varies depending on factors such as the pitch of the voice and the smoothness or roughness of the voice quality.
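A quick back-of-the-envelope calculation shows why a specialized camera is needed. The short Python sketch below is not from the text; the target of 10 images per cycle is simply borrowed from the number of frames shown in Figure 10–9, and the cycle duration is the 0.005 s value mentioned above.

# Rough estimate of the recording speed needed to capture one cycle of vocal fold vibration.
cycle_duration_s = 0.005    # about 0.005 s per cycle in adult women (from the text)
frames_per_cycle = 10       # assumed target, matching the 10 frames shown in Figure 10-9

required_fps = frames_per_cycle / cycle_duration_s
print(f"Required recording speed: {required_fps:.0f} images per second")   # 2000

# Ordinary video records roughly 30 images per second, far too slow to capture
# the separate opening and closing phases of a single cycle.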

Waving in the Wind The opening and closing motions of the vocal folds during vocal fold vibration are controlled by aeromechanical forces. The opening and closing motions are not produced by rapid muscular “pulls and pushes”; in fact, the muscles cannot respond quickly enough to produce the extremely rapid motions of the vocal folds. The fact that aeromechanical forces control vocal fold vibration raises an interesting question about the motions (or lack thereof) of a paralyzed vocal fold. Vocal fold paralysis usually occurs only on one side (one paralyzed vocal fold, the other healthy). The paralyzed vocal fold still moves during phonation, because it is moved to the middle of the glottis and away from the middle by aeromechanical forces. However, the vibration is “floppy” and ineffective due to the loss of muscular control for vocal fold tension. Vocal fold paralysis is discussed at greater length in Chapter 18.


Figure 10–9.  Sequential photographs of one cycle of vocal fold vibration as imaged via a laryngeal mirror. The cycle begins in the upper left-hand frame, continues along the top row of images from left to right, and then along the bottom row of images, from left to right. The cycle is complete at the right-hand image in the bottom row. Frame 6 is labeled to show the upper and lower edges of the vocal fold; this illustrates that the vocal folds do not vibrate like a “solid” piston but have complex, wave-like motion. Adapted from Hixon, T. J., Weismer G., and Hoit, J. D. (2020). Preclinical Speech Science: Anatomy, Physiology, Acoustics, Perception (3rd ed.). San Diego, CA: Plural Publishing.

Characteristics of Phonation

Three variables are useful in a description of phonation: fundamental frequency (F0), voice intensity, and voice quality. These variables are not necessarily independent of each other and are not the only ways to describe phonation. They are convenient for the purposes of this introductory discussion.

Fundamental Frequency (F0)

Fundamental frequency, abbreviated as F0 (spoken as “F sub-zero” or “F-oh”), is the rate of vibration of the vocal folds, expressed in cycles per second. One complete cycle is defined as the motion of the vocal folds from the closed position to the open position and back to the closed position (see Figure 10–9). When 100 of these cycles occur in 1 s, then F0 = 100 cycles per second.
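Because F0 is the number of cycles completed in one second, it is the reciprocal of the duration of a single cycle (the period). The minimal Python sketch below makes this arithmetic concrete; the period values are simply worked backward from the typical F0s discussed in this section.

def f0_from_period(period_s):
    """Return the fundamental frequency (Hz) for a cycle lasting period_s seconds."""
    return 1.0 / period_s

print(round(f0_from_period(0.005)))    # 200 -- a cycle of about 0.005 s (young adult women)
print(round(f0_from_period(0.008)))    # 125 -- the longer cycle typical of young adult men
print(round(f0_from_period(1 / 350)))  # 350 -- the much shorter cycle of a 5-year-old child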

The typical F0 for young adult females is roughly 190 to 200 Hz (Hz, short for hertz, is the abbreviation for cycles per second, named after Heinrich Rudolf Hertz, a 19th-century German physicist). Each of the 190 to 200 cycles per second is similar to every other cycle, but every cycle is not exactly of the same duration. The description of vocal fold vibration as nearly periodic recognizes these slight differences (less than 1/1,000th of a second) in the durations of successive cycles. F0 varies with age and sex (as well as some other factors) and is the primary determinant of the pitch of a talker’s voice. Pitch is the perceptual correlate of the physical measurement, F0. The higher the rate of vibration of the vocal folds, the higher is the perceived pitch of the voice. This can be illustrated by comparing typical F0s and pitches of children’s, women’s, and men’s voices. Five-year-old children typically have F0s around 350 Hz (for either sex), young adult women have F0s of 190 to 200 Hz, and young adult men have


F0s around 125 Hz. Thus, children at age 5 years typically have higher-pitched voices than adult women, who have higher-pitched voices than adult men. Why does the F0 differ so much across these three speaker groups? What causes the vocal folds of a 5-year-old child to vibrate at such a faster rate than those of adult women, or the vocal folds of women to vibrate faster than those of men? A simplified answer to this complicated question is that, across speakers, the length of the vocal folds is a primary factor in the typical F0 of the voice (Titze, 2011). For example, children have shorter vocal folds than adult women, who in turn have shorter vocal folds than adult men.3 Figure 10–10 illustrates the difference in length of the vocal folds for adult males, adult females, and 5-year-old children (Rogers, Setlur, Raol, Maurer, & Hartnick, 2014). Notice that we have said that across speakers the length of the vocal folds is a primary determinant of F0. Within a given speaker, however, F0 is increased by contraction of a paired muscle that stretches the vocal fold. This does not contradict the claim that vocal fold length is the primary anatomical reason for different F0s in children, women, and men. When a given speaker stretches his or her vocal folds, the tissue is not only longer but also thinner and more tense. These factors cause the vocal folds to vibrate at a faster rate (i.e., with a higher F0). The comparison across


children, women, and men assumes that the vocal folds in the respective age and sex groups are in the unstretched state. F0 is more than a marker of a person’s age and sex. A single individual uses a wide range of F0s for normal speech communication. The variation of F0 during normal speech is an important component of prosody, a term that refers to the melody and rhythm of speech. When you listen to a speaker who seems particularly expressive, you are probably reacting to (among other factors) relatively large variations in F0 throughout the speaker’s utterances. When you listen to the voice of someone who is exceptionally sad, her voice may sound as if it is produced on a single pitch, or nearly so. Speakers use F0 to convey emotion, to subtly change the meaning of the same words, to be sarcastic, and to be playful.
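The inverse relation between vocal fold length and typical F0 can be seen simply by lining up the example values from this section and from Figure 10–10. The Python sketch below only tabulates those values; it is not a predictive formula.

# Approximate (unstretched) vocal fold lengths from Figure 10-10 and typical F0s from the text.
speakers = [
    ("adult male",       13.0, 125),   # vocal fold length in mm, typical F0 in Hz
    ("adult female",     11.0, 200),
    ("5-year-old child",  7.5, 350),
]

for group, length_mm, f0_hz in speakers:
    print(f"{group:17s} length {length_mm:4.1f} mm   typical F0 {f0_hz:3d} Hz")
# Across these groups, shorter (unstretched) vocal folds go with a higher F0 and a higher pitch.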

Intensity

Intensity is a term used to describe the amount of energy in a sound wave. Intensity is a physical measure that reflects the amount of acoustic energy generated by the vibrating vocal folds. Unlike F0, it is difficult to offer “typical” values of voice intensity because they depend so much on characteristics of the communication situation, such as how much noise there is


Figure 10–10.  Vocal folds viewed from above, showing the different lengths of the adult male (approximately 13 mm), adult female (approximately 11 mm), and child (5 years old; approximately 7.5 mm) vocal folds, as well as the sizes of the laryngeal cartilages.

3 These age- and sex-related differences in the length of the vocal folds follow differences in overall size of the larynx. Children have smaller larynges (plural of larynx) than female adults, who have smaller larynges than male adults.


when speaking and the distance between the speaker and listener. The distance between a speaker and listener has an effect on the voice intensity reaching the listener’s ear. The sound waves produced by a speaker saying a sustained “ahhhh” have a sound intensity that can be measured at the lips. As shown in Figure 10–11, top

cartoon, the sound waves travel to the listener’s ear where they also can be measured for intensity. These measurements reveal that intensity measured at the listener’s ear is less than the intensity measured at the speaker’s lips. The decreasing intensities of sound waves are shown as the decreasing sizes of the arcs from the speaker to the listener. The fundamental prin-

Figure 10–11.  Cartoon illustrating the effect of distance on voice intensity. The person producing the vowel “ahhh” does so with the same speech intensity measured at his lips, in both the top and bottom cartoons. The intensity decreases steadily as the sound waves move away from him. At the listener’s ear, the intensity is less when she is at a greater distance from the speaker.


ciple of acoustics that applies to the measurements is that sound intensity decreases over distance, from the source of the sound (in this case the speaker’s lips) to the receiver of the sound energy (in this case the listener’s ear). Figure 10–11, bottom cartoon, shows what happens when the distance between speaker and listener is doubled compared to the distance shown in the top cartoon. The speaker in the lower cartoon produces the same sound intensity as in the top cartoon. At this doubled distance, the sound intensity reaching the listener’s ear is even less than in the top cartoon. The greater the distance between speaker and listener, the greater is the loss of sound energy from source to receiver. Loudness is proportional to sound intensity, so the listener hears the “ahhh” much less well in the bottom cartoon than in the top cartoon, even though the speaker’s sound intensity at the lips, in both cases, is the same.
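How quickly does the intensity fall off? Under the common free-field assumption of spherical spreading (an assumption added here for illustration; the chapter states only that intensity decreases with distance), intensity follows an inverse-square law, which amounts to a drop of about 6 dB for every doubling of the distance between speaker and listener.

import math

def level_drop_db(distance_ratio):
    # Drop in sound level (dB) when the speaker-to-listener distance is multiplied
    # by distance_ratio, assuming free-field spherical spreading (inverse-square law).
    return 10 * math.log10(distance_ratio ** 2)

print(round(level_drop_db(2), 1))   # 6.0 dB quieter at twice the distance
print(round(level_drop_db(4), 1))   # 12.0 dB quieter at four times the distance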

Commonsense Speech and Listener Therapy Parkinson’s disease (PD) is a degenerative disease of the nervous system in which movement functions, including articulatory and laryngeal movements, are affected. A prominent characteristic of the speech and voice disorder in PD is reduced speech intensity. Persons with PD have very stiff respiratory muscles and therefore cannot generate the lung pressure to produce speech with normal intensity. To compound the problem, vocal fold vibration is inefficient. This results in air leaking through the glottis, further affecting the buildup of pressure below the vocal folds that is so essential to generating adequate speech intensity. Speech-language pathologists often focus on training the person with PD to produce greater intensity so that listeners will hear them more easily. Especially with family members, SLPs can also suggest reducing the distance between the relative with PD and the listener. It sounds obvious, but listeners do not always get it unless they are told about distance and intensity. Plus, this kind of speech therapy is simple to implement, very effective, and best of all, free.

Quality

Quality is a perceptual term, like pitch and loudness. The terms breathy, harsh, rough, and metallic voice are some of the perceptual descriptions of voice quality. The physical basis of differences in voice quality includes variations in frequency, intensity, and the


presence of “noise” in the sound wave. “Noise” is an acoustic term indicating energy that is not periodic but has random frequency and intensity variations. To demonstrate the difference between the tonal quality of nearly periodic vibrations and the noisy quality of nonperiodic vibrations, say “ah” for a few seconds in a normal voice, followed by a long whispered “ah” sound. The “ah” said with a normal voice consists primarily of periodic energy — in fact, it is the periodic nature of the energy that makes the “ah” sound tonal. The whispered “ah,” conversely, does not sound tonal but rather has a hissing quality of uncertain pitch. The whispered “ah” sound is an example of noise energy.
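The difference between nearly periodic (tonal) energy and noise can also be shown numerically. The sketch below builds two synthetic signals, one containing a 200 Hz fundamental plus a few harmonics and one containing random samples, and tests whether each repeats after one period. The sampling rate and F0 are illustrative choices, not values from the text.

import numpy as np

fs = 16000                        # sampling rate in Hz (illustrative choice)
t = np.arange(0, 0.05, 1 / fs)    # 50 ms of signal

# "Tonal" signal: energy only at 200 Hz and its harmonics (like a normally voiced "ah").
tonal = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))

# "Noise" signal: random samples with no repeating pattern (like a whispered "ah").
noise = np.random.default_rng(0).normal(size=t.size)

period = fs // 200                # number of samples in one 200-Hz cycle
print(np.allclose(tonal[:period], tonal[period:2 * period]))   # True: the pattern repeats
print(np.allclose(noise[:period], noise[period:2 * period]))   # False: no repetition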

Clinical Applications: An Example

When the vocal folds vibrate for phonation, they open and close at a very fast rate, as discussed earlier. Look again at Figure 10–9 and notice that at the beginning (frame 1) and end (frame 10) of this single cycle, the vocal folds are tightly closed. This is characteristic of healthy phonation, although there are variations in the population in which the closure is not so tight but the phonation remains perceptually “normal.” There are clinical cases, however, in which the voice is too breathy on the one hand, or too “tight” on the other hand. The underlying cause of these voices is often a failure to close the vocal folds adequately on each cycle of vibration (breathy), or a closure that is too fast and forceful (“tight”). An SLP who treats either one of these voice problems must know the structure and function of the vocal folds and determine a way to modify vocal fold closure. In the case of a chronically breathy voice, the SLP may teach the client to exert more effort when phonating, which often results in a better approximation of the vocal folds during vibration. The person who closes the vocal folds too forcefully and has the “tight” voice may be shown how to relax neck muscles, or even be a candidate for an external (on the neck) laryngeal massage to reduce overall muscle tension that may be contributing to the excessive phonatory effort. Knowledge of the anatomy and physiology of the larynx is essential to considering the best options for voice therapy.

Upper Airway (Consonants and Vowels)

Figure 10–12 shows an artist’s rendition of upper airway and nasal cavity structures. The view is in the sagittal plane, as if you are looking toward the side of the head.



Figure 10–12.  Sagittal view of the upper and nasal airways, with important landmarks labeled.

The upper airway can be thought of as two columns of air, one with variable shape, the other with relatively fixed shape. The air tube with variable shape is called the vocal tract and extends from the vocal folds to the lips. In Figure 10–12, this airway is filled with a dark blue color. The boundaries of the vocal tract tube include the walls of the throat, the soft and hard palates (the latter typically called the roof of the mouth), the lips, and the tongue. The length and shape of the vocal tract can be changed by the action of most of these movable structures (such as the throat, soft palate, lips, and tongue; the hard palate is not movable) plus the movements of the lower jaw (mandible) which is connected anatomically to the lower lip and tongue. Thus movements of the jaw can affect movements of the lips and tongue. The column of air with relatively fixed shape is called the nasal tract (see Figure 10–12, light blue). The shape of the nasal tract does not change much during speech because the structures forming its boundaries do not move, at least not to a significant degree. As described in greater detail in section “Velopharyngeal Mechanism,” during speech the nasal tract is intermittently connected to and disconnected from the vocal tract. Figure 10–12 shows the nasal tract connected to the vocal tract airways by the lowered soft palate (lighter red) and disconnected from the vocal tract airways (darker red). The mechanism of this connection/ disconnection is of the utmost importance for under-

standing normal speech production and certain speech pathologies.

Muscles of the Vocal Tract

Approximately 30 muscles contribute to movements of the structures that change the length and shape of the vocal tract (Hixon, Weismer, & Hoit, 2020). Muscles of the vocal tract include about a half-dozen in the throat, three or four in the soft palate, about eight that can move and shape the tongue, and a dozen or so that can open and close the lips and shape their configuration. Combinations of these muscles, working at the same time and with intricate timing, create many different shapes of the vocal tract. Several muscles control the outlets of the nasal passages. These outlets, called the nares, can be flared and constricted during speech. The size of the nares may also be adjusted during breathing for such behaviors as singing and exercise.

Vocal Tract Shape and Vocalic Production

The vocal tract can be considered the “shaper of speech sounds” because the shape of the column of air between the vocal folds and lips determines the acous-


tic characteristics of a speech sound. Different vocal tract shapes result in different acoustic sound waves emerging from the lips, which in turn result in the different sounds we call speech sounds. In some cases, a connection between the vocal tract and nasal cavities is made for a class of sounds called nasals (such as /m/ and /n/ in English). Recall that the source (phonation) for voiced speech is the sound produced by the vibrating vocal folds. This sound source is basically the same for all voiced, vocalic speech sounds such as vowels, diphthongs, nasals, liquids, rhotics, and glides; the phonated sound source is also used for voiced stops and fricatives. Different shapes of the vocal tract, and connections to the nasal cavities, modify the acoustic characteristics of the vibrating vocal folds to form different speech sounds. More specific information on individual speech sounds is provided in Chapter 12. Figure 10–13 shows vocal tract shapes for five vowels. These images were obtained from magnetic resonance (MR) scans published by Zhou, Woo, Stone, Prince, and Espy-Wilson (2013). The black parts in each image are airways. The vocal tract airway is seen extending from the level of the vocal folds (shown in the left-most image) to the lips, and the nasal airways are shown above the soft and hard palates. The airway below the vocal folds is the windpipe (trachea). Note the different shapes of the vocal tract among these five vowels. For example, compare the vocal tract shapes of /i/ and /u/. In the case of /i/, the front part


of the vocal tract, roughly from the middle of the hard palate to the teeth, is tightly constricted by the tongue; the slim black channel between the tongue and palate at the constriction shows the narrow, open airway. Behind the constriction, in the back of the oral cavity and in the pharynx, the large black area shows that the vocal tract opens wide into a large airway. The vowel /u/ has a tight constriction further back in the vocal tract, in the region of the soft palate and the upper pharynx. In front of this constriction, the vocal tract opens until a point at the lips where the vocal tract is again narrowed down. The difference between vocal tract shapes for /i/ and /u/ is easy to see, and consistent with the phonetic description of /i/ as a high-front vowel and /u/ as a high-back vowel (Chapter 12). Vocal tract shape differences between other vowels may be more subtle (compare /ɑ/ to /u/ in Figure 10–13) but are sufficient to result in different speech sounds.

Velopharyngeal Mechanism

The velopharyngeal mechanism includes the soft palate and surrounding pharyngeal walls. The soft palate is a soft-tissue, movable structure of the speech mechanism with critical importance to speech production. Figure 10–12 shows the soft palate in two positions, one hanging down as it does at rest (lightly shaded, solid line following the contour of the structure) and


Figure 10–13.  Magnetic resonance images of an individual producing five different vowels: /ɑ/ (“ah” as in “hot”), /e/ (“ay” as in “hay”), /i/ (“ee” as in “heat”), /o/ (“oh” as in “hope”), and /u/ (“oo” as in “hoop”). The black areas above the vocal folds are airways. The varying shapes of the airways from vocal folds to lips — excluding the airways in the nasal cavities — show how the vocal tract is shaped by the articulators for the different vowels. The openness of the velopharyngeal port also varies across the vowels. Adapted from Zhou, X., Woo, J., Stone, M., Prince, J. L., and Espy-Wilson, C. Y. (2013). Improved vocal tract reconstruction and modeling using an image super-resolution technique. Journal of the Acoustical Society of America, 133, EL439–445.


one raised and pushed against the back of the pharynx (colored red). The hanging-down position allows air to flow from the lower pharynx into the nasal cavity. The lifted position in which the velum is pushed against the back of the pharynx seals off the air in the pharynx from the air in the nasal cavities. The region around the soft palate and posterior pharynx is called the velopharyngeal port. The velopharyngeal port is a valve that can be opened or shut by action of a group of muscles that control the movement of the soft palate and pharynx. The velopharyngeal port is a critical component of the speech mechanism because some sounds require the port to be open (e.g., nasal sounds such as /m/ and /n/), some sounds require it to be completely closed (e.g., stop and fricative consonants), and some sounds are produced with the port partially open (e.g., certain vowels, see later in chapter). (Movements of the soft palate, which are frequent and rapid during speech production, can be seen at https://www.youtube.com/watch?v=T4KRbENmFDk; if that video is not available, search on “MRI speech” for additional relevant videos). The velopharyngeal port also plays a role in rest breathing and swallowing. The velopharyngeal port is typically open in rest breathing, during which most people exchange air through the nasal passageways. The velopharyngeal port must be closed


during swallowing to prevent movement of food and liquid into the nasal passageways (see Chapter 20). And, when you blow your birthday candles out, the velopharyngeal port is closed to prevent the escape of air through the nasal cavities which might reduce the flow of air through your lips and therefore reduce the likelihood of extinguishing all the candles (and having your wish come true). The closure of the velopharyngeal port for stops, fricatives, and affricates (collectively called obstruents, because in varying degrees they obstruct airflow) is necessary to build up air pressure in the vocal tract. The positive air pressure creates the conditions that are unique to the sound of these consonants (see Chapter 12). Stop consonants are produced when there is a complete blockage of the airstream flowing in the vocal tract. During this blockage, air pressure will build up, but only if the vocal tract is sealed at the velopharyngeal port to prevent leaks of air through the nasal cavity. This is illustrated in the left image of Figure 10–14, which shows closed lips (red pointer) and a closed velopharyngeal port (green pointer). This bilabial stop consonant (/p/ or /b/ in English) has a closed volume of air behind the blockage: one seal at the lips and the other at the velopharyngeal port. Pressure can build up behind the labial blockage for the unique sound


Figure 10–14.  Left, closure of the vocal tract at the lips (red pointer) and velopharyngeal port (green pointer) for the stop consonant /b/. Right, tight constriction of the vocal tract in the palatal-alveolar region (red pointer) and closure of the velopharyngeal port for the fricative /ʃ/. Note the tight velopharyngeal port closure for both sounds, necessary for the buildup of air pressure behind the constrictions.


characteristics of a stop consonant — a “popping” noise when the block is broken (in this case when the lips are separated). The right image of Figure 10–14 shows a constriction for the fricative /ʃ/ (as in the first sound of “shoe”). Fricative constrictions, or blockages, are not complete as in the case of stops but are sufficiently narrow to allow pressure to build up behind them. The /ʃ/ constriction is shown at one location by the red pointer to one part of the tongue, but as the image shows, the constriction is long and narrow. Pressure built up behind this constriction pushes air through it and creates a hissing noise (try producing a long “sh” and listen to the hissing noise). As in the case of stop consonants, a closed velopharyngeal port is required for efficient pressure buildup. The closed velopharyngeal port is shown clearly in the upper right of the /ʃ/ image. The lips, however, are open to allow the passage of air that emerges from the tight but not complete constriction.
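The valving requirements just described can be collected into a small lookup table. The sketch below simply restates the pairing of sound classes with velopharyngeal port states given in this section; the example sounds are illustrative, not a complete inventory of English.

# Velopharyngeal port state by sound class, as described in the text.
port_state = {
    "nasal (/m/, /n/)":          "open",    # sound energy is routed through the nasal tract
    "stop (/p/, /b/, /t/ ...)":  "closed",  # pressure must build behind a complete blockage
    "fricative (/s/, /sh/ ...)": "closed",  # pressure pushes air through a narrow constriction
    "affricate":                 "closed",  # obstruents as a group need a sealed port
    "vowel":                     "closed or partly open",  # some vowels use a partly open port
}

for sound_class, state in port_state.items():
    print(f"{sound_class:26s} -> velopharyngeal port {state}")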

Valving in the Vocal Tract and the Formation of Speech Sounds

“Articulatory behavior for speech” refers to the positions and motions of the jaw, tongue, lips, pharynx, and velum, as well as the contacts between these movable structures and the rigid boundaries of the vocal tract (such as the hard palate). As might be imagined, obtaining information on positions and motions of certain articulators, especially the tongue, velum, and pharynx, is exceedingly difficult because they are largely hidden within the oral and pharyngeal cavities. Speech scientists have developed several different techniques over the years to study motions of these hidden articulators. Many of these techniques involve x-ray and magnetic field technologies. In recent years, magnetic resonance imaging (MRI) techniques have been used to study speech movements. Regardless of the technique, one fact remains startlingly obvious: During speech, each of the articulators is in constant motion, making it nearly impossible to connect specific movements with specific sounds. A speech scientist cannot look at the motion of a particular articulator and identify where it begins its contribution to a sound (such as a vowel) and where it ends its contribution. For example, when a speaker says the word electricity, we can represent its 11 component sounds — ee, l, eh, k, t, r, ih, s, ih, t, ee — as a series of discrete symbols (/əlɛktrɪsɪɾi/, in phonetic transcription, Chapter 12), but the articulatory motions used to produce the word appear as a rapid, sometimes jerky blur of undifferentiated gestures. Good examples of speech


movements are available at https://www.youtube.com/watch?v=Nvvn-ZVdeqQ and https://www.youtube.com/watch?v=ezOwCf835YA (enter “speech movement” as a key phrase for more examples). Knowledge of how articulatory movements relate to speech sounds is likely to be beneficial when attempting to treat a speech disorder in which the underlying problem is one of motor control (as in, for example, Parkinson’s disease or cerebral palsy; see Chapter 14). For the time being, our knowledge of the relationship of articulatory movements to specific speech sounds is very limited. It is a complex issue.

Coarticulation

Coarticulation is the term used to describe the influence of one speech sound on another speech sound. For example, even though the words “sad” (/sæd/) and “sag” (/sæg/) share the vowel “ae” as in “hat,” tongue movements throughout the vowel (i.e., across time) differ because of the difference in the final stop consonant. The “d” sound affects the “ae” in one way, the “g” sound in another way. Now consider the word pair “sheik” (/ʃik/) and “shock” (/ʃɑk/), in which the final consonants are the same and the vowel is either “ee” or “ah.” Here the different vowels affect the articulation of the /k/; the movements and location of the complete constriction for /k/ differ depending on the vowel. In this case, the articulation of the consonant is affected by the preceding vowel; in the first case, the articulation of the vowel is affected by the following consonant. Coarticulation is everywhere in speech production. The articulation of any speech sound is simultaneously affected by the sounds preceding it and the sounds following it. Notice the word “sounds” in the preceding sentence. The influence of coarticulation is present not only for adjacent sounds, as in the examples of “sad” versus “sag” and “sheik” versus “shock,” but it can extend across multiple sounds. A good example of this is the word combination “least soon,” in which lip rounding for /u/ may be observed on the “st” consonant sequence in “least.” When a single sound — what is often called a sound segment — is affected by so many variables, it is no wonder that it is nearly impossible to connect specific articulatory movements with specific sound segments.

Clinical Applications: An Example

Knowledge of the location and function of valves in the speech mechanism is essential to accurately diagnose


The Weedy Garden of Speech So . . . articulatory movements for any specific speech sound, and their acoustic results, depend on the sounds preceding and following the specific speech sound. This is variability of a specific speech sound due to its phonetic context — the identity of the surrounding speech sounds. But wait: phonetic context is hardly the only variable that affects the movements and acoustic characteristics of a speech sound. The list of other variables that can cause variability of the articulatory and acoustic characteristics of a speech

and properly treat speech disorders. A good example is the case of a client who is perceived to speak with excessive nasality. In many cases, excessive nasality can be traced to a problem with control of the velopharyngeal port. Structural problems, as in the case of a repaired cleft palate, or muscle weakness in the absence of an obvious structural issue, can prevent the velopharyngeal port from closing sufficiently during the production of vowels. This may result in hypernasality. Specialized techniques for diagnosis and therapy can be used by the person who understands both the anatomical structure of the speech mechanism valves — in the example, the velopharyngeal port — and the physiology (function) of the valves.

Chapter Summary

The speech mechanism can be thought of as consisting of three major components. The respiratory system is the power supply, generating pressures and flows. These pressures and flows initiate and maintain vibration of the vocal folds, which generates the sound source for speech. The acoustic signal generated by the vibrating vocal folds is shaped into different speech sounds by the moving structures and fixed boundaries of the upper airway. The respiratory system, larynx, and upper airways are composed of many muscles, membranes, ligaments, bones, and cartilages, all of which are covered in detail in speech anatomy and physiology coursework taken by students majoring in Communication Sciences and Disorders.

sound includes speaking rate (how fast a person talks), dialect, speaking style (e.g., casual versus formal), the level of emphasis placed on the speech sound, and the age and sex of the speaker. Speech scientists have spent years researching coarticulation and the variability that results from it, due to all of these variables. This research has produced many answers, but along the scientific way, new questions have popped up like weeds in a well-fertilized garden.

Knowledge of this basic anatomy and physiology is important to understanding the many disorders and diseases that affect the speech mechanism. The respiratory system functions to support life, exchanging O2 and CO2 at the alveoli; air is brought into the lungs by expanding them (inhalation) and transported out of the lungs by compressing them (exhalation). The respiratory system is composed of muscles in the thorax and abdomen, plus a large muscle (the diaphragm) that separates the thoracic and abdominal cavities; many nonmuscular tissues (membranes, bone, cartilage) play an important role in the respiratory system. Rest (vegetative) breathing involves inhalations and exhalations of small volumes and roughly equal durations. Speech breathing involves very rapid inhalations in preparation for an utterance and relatively long exhalations during which speech is produced. The main goal of the respiratory system is to maintain a constant, positive pressure in the lungs and trachea during speech utterances; this pressure is critical to vibrating the vocal folds for phonation. The pressure generated in the lungs (and trachea) for speech is related to the loudness of speech: the higher the pressure, the greater is the loudness. During speech, the lungs are compressed by the muscular actions of both the thorax and abdomen; abdominal muscles are critical to maintaining an increased lung pressure for speech at a constant level. The larynx is composed of a framework of cartilage, membranes, ligaments, and muscle; the bottom part of the larynx sits on top of the trachea, and its top part can be considered the lower end of the throat.


The vocal folds are two bands of tissue that run from the front to the back of the larynx, between the thyroid cartilage (front point of attachment) and arytenoid cartilages (back point of attachment). There are five intrinsic muscles of the larynx. Three of these muscles can close the vocal folds, one can open the vocal folds, and one can stretch them. One of the muscles that closes the vocal folds also tenses the main muscular bulk of the vocal folds. The vocal folds open and close. When they open, the space between them is called the glottis, which is the opening into the trachea. The vocal folds open to allow air into and out of the lungs, via the trachea, for breathing purposes. The protective function of the larynx is to shut down the airway (to close the vocal folds rapidly and forcefully) when food, liquid, or other material enters the top part of the larynx. The vocal folds also open and close to vibrate for the purposes of phonation; phonation provides the sound source for speech, sometimes referred to more specifically as voice production. The opening and closing motions of the vocal folds for phonation are controlled by aerodynamic (pressures and flows) forces, rather than directly by muscle contraction. The rate at which the vocal folds vibrate depends primarily on the length of the vocal fold. The tissue of the human vocal fold is specialized for purposes of phonation. Important characteristics of vocal fold vibration include the fundamental frequency (F0 = rate of vibration), intensity of the sound generated by the vibration, and quality of the voice resulting from the vibration. The upper airways include the vocal tract, which is the tube of air between the vocal folds and the lips, and the nasal tract, the airway between the velopharyngeal port and the nostrils.


The articulators shape the vocal tract for vocalic sounds, in which the vocal tract is open and air can flow freely to the atmosphere, and a class of consonants called obstruents in which there is a constriction that completely or partially blocks the airflow for a short time interval. An important valve between the vocal and nasal tracts is the velopharyngeal port, where air coming through the throat can be blocked from entering the nasal cavities. The velopharyngeal port opens and closes during speech production, according to the requirements of the speech sound being produced.

References

Darley, F. L., Aronson, A. E., & Brown, J. R. (1975). Motor speech disorders. Philadelphia, PA: Saunders.

Hixon, T. J., Goldman, M. D., & Mead, J. (1973). Kinematics of the chest wall during speech production: Volume displacements of the rib cage, abdomen, and lung. Journal of Speech and Hearing Research, 16, 78–115.

Hixon, T. J., Weismer, G., & Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.

Rogers, D. J., Setlur, J., Raol, N., Maurer, R., & Hartnick, C. J. (2014). Evaluation of true vocal fold growth as a function of age. Otolaryngology–Head and Neck Surgery, 10, 681–686.

Su, M. C., Yeh, T. H., Tan, C. T., Lin, C. D., Linne, O. C., & Lee, S. Y. (2002). Measurement of adult vocal fold length. Journal of Laryngology and Otology, 116, 447–449.

Titze, I. R. (2011). Vocal fold mass is not a useful quantity for describing F0 in vocalization. Journal of Speech, Language, and Hearing Research, 54, 520–522.

Zemlin, W. R. (1997). Speech and hearing science: Anatomy and physiology (4th ed.). Boston, MA: Pearson.

Zhou, X., Woo, J., Stone, M., Prince, J. L., & Espy-Wilson, C. Y. (2013). Improved vocal tract reconstruction and modeling using an image super-resolution technique. Journal of the Acoustical Society of America, 133, EL439–445.

11  Speech Science II

Introduction

In the previous chapter, reference was made to the acoustic signal emerging from the vocal tract (or nasal tract, or both at the same time). For the remainder of this chapter, we refer to this as the speech acoustic signal. The speech acoustic signal is the product of the respiratory, laryngeal, and upper airway behaviors described in the previous chapter. The current chapter presents a brief introduction to the speech acoustic signal. Speech production can be thought of as the concerted action of all structures of the speech mechanism to produce an acoustic signal that can be recognized by a listener as an intelligible message. The connection between the speech acoustic signal and speech intelligibility suggests the importance of knowing about speech perception as well. In the absence of a listener who performs perceptual analysis of the speech acoustic signal, the signal is a little like the proverbial tree that falls in a forest empty of hearing organisms. The human forest is full of hearing organisms, lots of them human, so speech perception is also considered in the current chapter. To avoid confusion between orthographic and phonetic representations of sounds, phonetic symbols (discussed in Chapter 12) are used here, usually with the orthographic form followed by the phonetic representation: “beat” /bit/, “hat” /hæt/. Orthographic representations are enclosed by quotation marks; phonetic

symbols are enclosed in forward slashes. For quick reference, the phonetic symbols used in this chapter are listed in Table 11–1.

Table 11–1.  Phonetic Symbols Used in This Chapter

/ɑ/  “ah” in “hot”
/i/  “ee” in “beat”
/u/  “oo” in “boot”
/æ/  “ae” in “hat”
/ɝ/  “urr” in “bird”
/ɚ/  short “rr” in last syllable of “longer”
/ɾ/  tapped “d” in “butter”
/dʒ/  “j” as in “judge”
/z/  “zee” as in “zebra”
/ð/  “th” sound in “the”
/ɔ/  in “bought”*

*The vowel /ɔ/ is pronounced as /ɑ/ in many dialects of American English, such as the dialect heard in many parts of California. Words such as “caught” and “cot” are pronounced with the same vowel (“caught” /kɑt/, “cot” /kɑt/). This is in contrast to the “caught” /kɔt/ and “cot” /kɑt/ heard in eastern Pennsylvania and eastern Maryland, among other regions of the United States.
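The same notational convention can be captured in a small data structure. The sketch below merely restates Table 11–1 as a Python dictionary that maps each phonetic symbol to its orthographic key word.

# Table 11-1 restated as a dictionary: phonetic symbol -> orthographic key word.
phonetic_key_words = {
    "ɑ": "hot", "i": "beat", "u": "boot", "æ": "hat",
    "ɝ": "bird", "ɚ": "longer", "ɾ": "butter",
    "dʒ": "judge", "z": "zebra", "ð": "the", "ɔ": "bought",
}

print('/æ/ as in "' + phonetic_key_words["æ"] + '"')   # /æ/ as in "hat"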


The Theory of Speech Acoustics

Theories are common in all academic disciplines. It is relatively uncommon, however, for the majority of scientists in one discipline to agree on a particular theory. An exception to this is the acoustic theory of speech production, developed in the 1940s and 1950s by Dr. Gunnar Fant (1919–2009), an eminent speech scientist who for many years was the director of a well-known speech research laboratory in Stockholm, Sweden (the Speech Transmission Laboratory). In 1960, Fant published a textbook called (not surprisingly) Acoustic Theory of Speech Production. The book provided a detailed account of how the speech mechanism generated the speech acoustic signal. In most respects, this theory is accepted as correct by speech scientists. The theory has been refined and elaborated by other speech scientists, most notably James Flanagan (1972) and Kenneth Stevens (1998). Fant’s theory of speech acoustics can be summarized in a single sentence: The output of the vocal tract (that is, the speech acoustic signal) is the product of the acoustic characteristics of a sound source combined with a sound filter. Fant’s theory was developed using elegant mathematics that were most precise for the case of vowel production. Fortunately, explanation of the theory does not require expertise in mathematics.

The Sound Source

The sound source in the acoustic theory of vowel production is the signal generated by the vibrating vocal folds. Vibration of the vocal folds generates an acoustic signal that has a fundamental frequency (F0) and a series of harmonics (sometimes called “overtones”). F0 is defined in Chapter 10 as the number of full cycles of vocal fold vibration per second. The speech acoustic signal for vowels includes higher-frequency components as well, called harmonics. The harmonic frequencies of F0 are located at whole number multiples of the F0, a fact best explained with a simple example. If an adult male phonates the vowel “ah” with his vocal folds vibrating with an F0 of 100 Hz, acoustic energy is generated at the F0 as well as at frequencies that are whole number multiples (2, 3, 4, 5, 6, . . . n) of the F0. This acoustic signal has energy at 100 (F0), 200, 300, 400, 500, 600, . . . n × 100 Hz. A periodic signal with multiple frequency components is called a complex periodic signal. Vocal fold vibration produces a complex periodic signal, and it is this signal that serves as “input” to the vocal tract — it is the source for the speech acoustic signal. It is also the signal that people refer to as “voice.”
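The arithmetic of the harmonic series is simple enough to write out directly. The short Python sketch below lists the first several harmonic frequencies for two values of F0; the 200 Hz case is an added illustration of a higher-pitched source, not an example from the text.

# Harmonic frequencies of the glottal source: whole number multiples of F0.
def harmonic_frequencies(f0_hz, n_components):
    # Return F0 and its harmonics: f0, 2*f0, 3*f0, ... (n_components values in all).
    return [k * f0_hz for k in range(1, n_components + 1)]

print(harmonic_frequencies(100, 6))   # [100, 200, 300, 400, 500, 600]
print(harmonic_frequencies(200, 6))   # a higher F0 spaces the harmonics farther apart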

The sound source is the same for all vowels; it is not adjusted for different vowels. This is an important aspect of the theory and is returned to following a description of the sound filter.

The Sound Filter

In the acoustic theory of vowel production, the sound filter is the vocal tract. Fant showed that the vocal tract was like a resonating tube. Tube resonators have more than one resonant frequency, or frequencies at which they vibrate with maximal amplitudes. The air within a tube is an example of a multiple-frequency resonator. When the air within a tube is set into vibration, the acoustic result is a signal with multiple resonant frequencies. A good analogy to the acoustics of the vocal tract filter is the acoustics of organ pipes. Fant showed that the precise frequencies at which these resonances occur depend on the shape of the tube. Different shapes of the vocal tract tube, created by different positions of the jaw, tongue, lips, and throat (pharynx), result in different resonant frequencies. This was illustrated in Chapter 10 by the MRI images in Figure 10–13. Selected vocal tract shapes for three different vowels are shown in Figure 11–1 in a more schematic way. The shape of the vocal tract — the air column between the vocal folds and the lips — is shown in dark blue for the English vowels /ɑ/ “ah” (as in “hot”), /i/ “ee” (as in “beat”), and /u/ “oo” (as in “boot”). The shapes are different for the three vowels. The different shapes change the resonant frequencies of the vocal tract, even though the source signal remains the same for all three vowels. The different shapes of the vocal tract produce different sounds — they create different vowels. The acoustic result of different vocal tract shapes is shown in Figure 11–1 by the three graphs in the right column. These are vowel spectra, showing energy peaks (the y-axis, labeled amplitude) as a function of frequency (the x-axis, with frequency ranging from 0.0 to 5.0 kHz, that is, 0 to 5000 Hz). The vowel spectra are ordered from top to bottom just like the vocal tract shapes: /ɑ/ at the top, /i/ in the middle, and /u/ at the bottom. The peaks in each spectrum show the first three resonant frequencies of each vowel; these peaks are indicated by pointers from the labels “F1,” “F2,” and “F3.” Notice how the peaks are at different frequency locations for the three vowels. The frequency of these peaks can be estimated by dropping a vertical line from a peak to the x-axis and noting where the line intersects the frequency scale. The use of the “F” to identify the peaks is explained below.
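For the idealized case of a uniform tube closed at the vocal folds and open at the lips, the resonant frequencies can be computed from the formula f_n = (2n - 1)c / 4L, where c is the speed of sound and L is the length of the tube. This uniform-tube idealization is a standard textbook simplification added here for illustration; it is not developed in this chapter, and real vocal tract shapes are not uniform tubes.

# Resonances of a uniform tube closed at one end (the glottis) and open at the
# other (the lips): f_n = (2n - 1) * c / (4 * L). Idealization for illustration only.
SPEED_OF_SOUND_CM_S = 35000.0   # approximate speed of sound in warm, moist air (cm/s)

def tube_resonances(length_cm, n_resonances=3):
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for k in range(1, n_resonances + 1)]

print([round(f) for f in tube_resonances(17.5)])   # [500, 1500, 2500] for a roughly 17.5 cm tract
print([round(f) for f in tube_resonances(15.0)])   # a shorter tract raises every resonance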


Figure 11–1.  Vocal tract shapes (left), shown in dark blue, for three vowels (/ɑ/, /i/, /u/). On the right side are spectra associated with the vowels. F1, F2, F3 indicate peaks in the spectrum, called “formants.” Note how the peaks in these spectra occur at different frequencies. Those differences are related directly to the differences in vocal tract shapes for the three vowels.

Vowel Sounds Result From the Combination of Source and Filter Acoustics

Now that the acoustic characteristics of the source and filter have been identified, a more specific statement of Fant’s theory can be made. The speech acoustic signal results from a source whose frequencies and

amplitudes are “shaped” by the resonant frequencies of the vocal tract filter (vocal tract tube). The source has energy at a series of frequencies, and the different shapes of the vocal tract “pick out” different frequencies to emphasize or reject. The signal coming from the vocal tract consists of frequencies in the source that are emphasized by the resonant frequencies of the vocal tract tube. This brings us back to the single-sentence


statement of Fant’s theory given previously: The output of the vocal tract (that is, the speech acoustic signal) is the product of the acoustic characteristics of a sound source combined with a sound filter.
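The single-sentence statement of the theory can also be turned into a toy calculation: list the harmonic frequencies of the source, then weight each one by how close it falls to a resonance of the filter. The resonance frequencies and the bell-shaped gain function below are illustrative assumptions, not Fant's actual mathematics.

import math

F0 = 100.0                                   # source fundamental frequency (Hz)
harmonics = [k * F0 for k in range(1, 31)]   # source energy at F0, 2*F0, 3*F0, ...

resonances = [500.0, 1500.0, 2500.0]         # illustrative resonances of a neutral vocal tract
bandwidth = 100.0                            # arbitrary width of each resonance peak

def filter_gain(freq_hz):
    # Toy vocal tract filter: a bump of gain centered on each resonance frequency.
    return sum(math.exp(-((freq_hz - r) / bandwidth) ** 2) for r in resonances)

# Output spectrum = each source harmonic weighted by the filter gain at that frequency.
output = {f: filter_gain(f) for f in harmonics}
strongest = sorted(output, key=output.get, reverse=True)[:3]
print(sorted(strongest))   # [500.0, 1500.0, 2500.0]: harmonics nearest the resonances dominate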

Form and Function Students typically do not encounter the word “formant” until they have lectures on speech acoustics. The word is used to describe speech spectra, and specifically, the peaks in the spectrum (frequencies at which sound intensity is the greatest). Resonators such as organ pipes have spectra with multiple peaks too, but these are not called “formants,” they are called — well — peaks. “Formant” has a Latin root that means “forming.” The idea behind the word “formant” is something being formed, in this case, a resonant frequency resulting from the formation of a vocal tract shape. The word was coined to designate a resonance that was formed by the vocal tract — it is a speech-specific term.

Resonant Frequencies of Vowels Are Called Formants: Spectrograms

The difference in vowels is primarily a result of their different resonant frequencies. This can be demonstrated in a simple way. Figure 11–2 shows a spectrogram, a type of visual record of the speech signal. The sentence “The dude that dotted the deed was Dad” was spoken by the author into a microphone connected to a computer. A speech analysis program called TF32 (Milenkovic, 2001) generated the spectrogram and was used to edit and analyze this speech signal. Spectrograms, which are related (but not identical) to the voiceprints used in legal cases of speaker identification, show time on the x-axis and frequency on the y-axis. In Figure 11–2, frequency is marked in steps of 1.0 kilohertz (1000 Hz) from 0 to 4.0 kHz (4000 Hz). On the x-axis, time is marked by a series of hash marks, each separated from the next by 1/10th s (100 ms). The sentence shown in this spectrogram contains examples of the “corner” vowels of English. Vowels are classified by phoneticians according to three categories, two of which describe the position of the tongue in the vocal tract (Chapter 12). The vowels “oo” /u/, “ah”




Figure 11–2.  Spectrogram showing the first three formant frequencies for the corner vowels of American English, in the sentence, “The dude that dotted the deed was Dad.” The approximate formant frequencies for /æ/ are marked by a short red horizontal bar halfway between the beginning and end of the vowel. Formant frequencies can be estimated for the other vowels in the same way. Starting from the bottom of the frequency (y-axis) scale, the first dark band is F1, the next one F2, and the next one F3 (see labeled formants for “oo” /u/). The red contours for F1, F2, and F3 of /ɑ/ show formant movement (change of formant frequencies over time) throughout the vowel.


Speech Scientists Vow to Study Vowels Many speech scientists have studied the acoustic characteristics of vowels. Vowels are fascinating for several reasons. Some languages use just a few vowels, others use relatively many. For example, Greek and Spanish are languages with five vowels, in contrast to the 12-vowel system of English. Differences between the vowel systems of two languages have a significant influence on the ability of a native speaker of one language (say, Spanish) to produce “native-sounding” vowels of a second language (English, for example). Spanish has the vowel /i/ as in the English word “beat,” but does not have the English vowel “ih” /I/ as in the word “bit.” These two English vowels have very similar vocal tract shapes and, as expected, similar formant frequencies. The vowel /I/, however, is quite rare in languages other than English. So, when a native speaker of Spanish (or of Greek, or Korean, or Japanese, as well as many other languages) is learning English and attempts to say a word such as “bit,” he or she is likely to produce the vowel in an /i/-like way (i.e., to say something closer to “beat” than “bit”). It is as if

/ɑ/, “ee” /i/, and “ae” (as in “hat”) /æ/, marked in Figure 11–2, represent the extremes of English vowel articulation — they define the “corners” of the space within which all vowels are articulated. The highest and most back vowel is /u/, the lowest and most back vowel /ɑ/, the highest and most front vowel /i/, and the lowest and most front vowel /æ/. Formant frequencies distinguish different vowels, so the distinctions should be obvious in an acoustic record of these vowels. In Figure 11–2, parts of the spectrographic display appear as dark bars. An example of these bars is found above the vowel /u/ in the word “dude.” Arrows labeled “F1,” “F2,” and “F3” point to the first, second, and third dark bars above the baseline of the spectrogram (bottom of the spectrogram). These bars indicate the regions where there is maximum energy in the signal — that is, they are the formant frequencies of the vowel. The first three formant frequencies for each vowel can be estimated by making an eyeball measurement halfway between the beginning and end of each vowel, as marked for the formant frequencies of the vowel /æ/ in “Dad.” The upward pointing arrows show the beginning and end of the vowel. The red line is placed

the second-language learner absorbs an unknown vowel type like /I/ to a vowel type in her native language (in this case, /i/). Another interesting example of an English vowel that is unusual in other languages is the /æ/ “ae” sound in words such as “cat,” “bat,” and “and.” This English vowel may not be as challenging for native speakers learning English as a second language, possibly because their native languages do not have vowels close to /æ/, eliminating the competition, so to speak. But /æ/ is also interesting in American English because it is produced in so many different ways depending on a talker’s dialect group. Some talkers — such as in Wisconsin — produce this vowel in a very distinctive way, almost as if they are saying “kyat” for “cat.” Other talkers, especially young adults from the east and west coasts (and in parts of the country’s interior), produce this vowel as something between the vowel of “cat” and of “cot.” Vowel changes such as this are common in languages of the world, adding yet another reason why speech scientists tend to go ga-ga over vowels and their acoustics.

in the middle of the vowel’s duration: halfway between its beginning and end. To measure a formant frequency, a single point in time must be chosen for the measurement, because formant frequencies change throughout a vowel (see later in text). The short horizontal lines intersecting the vertical line show where the formant measurements are made at the halfway point of the vowel. For example, the red cross for the F1 frequency of /æ/ is below 1000 Hz (1.0 kHz) a little less than half of the difference between 1000 Hz (1.0 kHz) and the baseline. Referring to the frequency lines for the eyeball estimate, this places the F1 around 600 Hz. The F2 frequency is about one third of the way down from 2000 Hz to 1000 Hz; its frequency appears to be about 1700 Hz. The F3 cross is halfway between 3000 Hz and 2000 Hz; its frequency can be estimated by eye as 2500 Hz. How accurate are the formant frequency estimates for the vowel /æ/? Comparison data from Hillenbrand, Getty, Clark, and Wheeler (1995) for a large group of adult males who produced the vowel /æ/ show an average F1 = 588 Hz, F2 = 1952 Hz, and F3 = 2601 Hz — not bad for our eyeball estimates. The first three formant frequencies for /æ/ can be used as reference points for comparison with the

158

Introduction to Communication Sciences and Disorders:  The Scientific Basis of Clinical Practice

first three formant frequencies of the other corner vowels. For example, compared to /æ/, /u/ clearly has a lower F1 frequency, a lower F2, and a lower F3. The vowel /i/ is even more different than /æ/, with a very low frequency F1 and the highest F2 of all the corner vowels. The precise values of the formant frequencies are not of concern to the present discussion. Rather, Figure 11–2 demonstrates that vowels are distinguished from each other by the frequency values of their formants, even by crude (eyeball) estimates of the frequencies.
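Formant frequencies can also be estimated by software rather than by eye. One common approach, linear predictive coding (LPC), fits a filter model to a short stretch of the waveform and reads candidate resonances from the fitted model. The sketch below is a minimal illustration of that idea, assuming the librosa library and a hypothetical mono recording ("dad_ae_vowel.wav") of the /æ/ vowel excised from "Dad"; a practical implementation would also discard candidates with very wide bandwidths before calling them formants.

import numpy as np
import librosa

# Hypothetical recording of the /ae/ vowel excised from "Dad" (file name assumed).
y, sr = librosa.load("dad_ae_vowel.wav", sr=10000)

# Analyze a short frame at the vowel's midpoint, as in the eyeball method.
mid = len(y) // 2
frame = y[mid - 256:mid + 256] * np.hamming(512)

# Fit a linear predictive coding (LPC) model; the angles of the polynomial's
# complex roots correspond to candidate resonant (formant) frequencies.
coeffs = librosa.lpc(frame, order=12)
roots = [r for r in np.roots(coeffs) if np.imag(r) > 0.01]
candidate_freqs = sorted(np.angle(roots) * sr / (2 * np.pi))

print("Candidate formant frequencies (Hz):", [round(f) for f in candidate_freqs])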

The Tube Model of the Human Vocal Tract Makes Interesting Predictions and Suggests Interesting Problems

The part of Fant's theory that considers the vocal tract a resonating tube, with multiple resonant frequencies, makes an interesting prediction. Tubes that are exactly the same in every way except for their length differ in their resonant frequencies. Casual familiarity with the construction of a pipe organ, with pipes (tubes) of many different lengths, suggests that the shorter pipes produce higher-pitched tones, the longer pipes lower-pitched tones. The higher-pitched tones of short as compared with long organ pipes result from the higher resonant frequencies of shorter pipes. The vocal tract tube is like a resonating pipe and is subject to the same acoustical principles. A shorter vocal tract has higher resonant frequencies than a longer vocal tract. Five-year-old children have shorter vocal tracts than adult women, who in turn have shorter vocal tracts than adult men. Fant's theory predicts that when the same vowel is spoken by children, women, and men, the formant frequencies are highest for children, next highest for women, and lowest for men. This prediction has been confirmed many times in the research literature (Hillenbrand et al., 1995; Peterson & Barney, 1952).

Earlier it was noted that perceptual differences between vowel sounds can be explained by their different formant frequencies. If vowel formant frequencies for a given vowel depend on who is speaking, there is the interesting problem of how listeners perceive the same vowel spoken with such different acoustic characteristics. For example, Table 11–2 shows the average F1, F2, and F3 frequencies reported by Hillenbrand, Getty, Clark, and Wheeler (1995) for the vowel "eh" (as in "head") spoken by relatively large groups of adult males, adult females, and children aged 10 to 12 years. The values of the first three resonances, or formants, for this vowel are clearly different depending on the speaker group.

Table 11–2.  Average Values of the First Three Formant Frequencies (F1, F2, F3, in Hz) for the Vowel /ɛ/ "eh" (as in "head") as Spoken by 45 Adult Males, 48 Adult Females, and 46 Ten- to Twelve-Year-Olds

                Decreasing Vocal Tract Length (left to right)
        Adult Men       Adult Women       Children
F1          580             731              749
F2         1799            2058             2267
F3         2605            2979             3310

Note. The data show how the formant frequencies for the same vowel vary depending on who is producing the vowel. Values reported in "Acoustic Characteristics of American English Vowels," by J. M. Hillenbrand, L. A. Getty, M. J. Clark, and K. Wheeler, 1995, Journal of the Acoustical Society of America, 97, pp. 3099–3111.

More specifically, as the vocal tract shortens, formant frequencies increase, just as predicted from Fant's theory. No one has quite figured out how these differences are heard as the same vowel, even though it is a central problem in speech perception research (see later in this chapter).
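The tube prediction can be written down directly. For a uniform tube closed at one end (the glottis) and open at the other (the lips), a standard simplification used in introductory acoustics rather than a claim made in this chapter, the resonances fall at odd multiples of c/4L, where c is the speed of sound and L is the tube length. The sketch below uses illustrative vocal tract lengths for a man, a woman, and a child; the computed values will not match Table 11–2 exactly (real vocal tracts are not uniform tubes), but they reproduce the trend that shorter vocal tracts have higher formant frequencies.

# Quarter-wave resonances of a uniform tube closed at the glottis and open at the
# lips: Fn = (2n - 1) * c / (4 * L). Tube lengths below are illustrative estimates.
SPEED_OF_SOUND_CM_S = 35000.0   # approximate speed of sound in warm, moist air

def tube_resonances(length_cm, n_resonances=3):
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, n_resonances + 1)]

for speaker, length_cm in [("adult man", 17.5), ("adult woman", 15.0), ("child", 12.0)]:
    f1, f2, f3 = tube_resonances(length_cm)
    print(f"{speaker:12s} L = {length_cm:4.1f} cm: "
          f"F1 = {f1:4.0f} Hz, F2 = {f2:4.0f} Hz, F3 = {f3:4.0f} Hz")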

A Spectrogram Shows Formant Frequencies and Much More

The continuous red lines in Figure 11–2 show contours of the first three formant frequencies for the vowel /ɑ/ from its beginning to end. The red contours show that the formants move up and down the frequency scale throughout the vowel. For example, F1 for /ɑ/ starts at a relatively low frequency, rises to its highest frequency close to the end of the vowel, and then decreases. Similarly, the F2 contour falls in frequency from its beginning and then rises a small amount. As noted in Chapter 10, during speech the articulators are in constant motion, which creates vocal tract shapes — and therefore formant frequencies — that are constantly changing over time. Formant transitions are the acoustic result of the constantly changing vocal tract shapes. The speech acoustic signal reflects everything taking place in the vocal tract as a speaker produces speech. Courses in acoustic phonetics cover information on the acoustic characteristics of different speech sounds, voice qualities, and prosody (melody of voice). This information is critical to understanding how listeners perceive speech and its various meanings. It is also critical to designing computer programs for the synthesis and recognition of speech.
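A display like Figure 11–2 can be produced with a few lines of standard signal-processing code. The sketch below assumes a hypothetical mono recording of the sentence ("dude_sentence.wav") sampled at 16 kHz or higher and uses SciPy and Matplotlib; the short (roughly 5-ms) analysis window is what makes formants show up as broad dark bands rather than as individual harmonics.

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# Hypothetical mono recording of the sentence shown in Figure 11-2.
sr, y = wavfile.read("dude_sentence.wav")
y = y.astype(float)

# A short (~5 ms) analysis window gives a "wideband" spectrogram, in which
# formants appear as broad dark bands rather than as individual harmonics.
freqs, times, power = signal.spectrogram(y, fs=sr, window="hamming",
                                         nperseg=int(0.005 * sr),
                                         noverlap=int(0.004 * sr))

plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12), shading="auto", cmap="gray_r")
plt.ylim(0, 4000)                 # match the 0-4 kHz range of Figure 11-2
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()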


Letters and Sounds Are Not the Same

Phonetic transcription symbols underscore the difference between the speech sounds that are actually produced and the letters in an orthographic representation of a word. The spelling of "dotted" has a double "t" at the end of the first syllable, which may lead to the expectation of the speech sound /t/. Most speakers of American English (as compared with English spoken in the United Kingdom) do not say /dɑtəd/, however, or even /dɑdəd/. Instead, they produce the middle consonant as a flap (Chapter 12), which is a very short (~25 ms or 0.025 sec) /d/-like sound, much like the one in "butter" or "sitter." This sound, whose phonetic symbol is /ɾ/, is a good example of the difference between a letter (orthographic) and phonetic representation of words.

Speech Synthesis

Scientists have been trying to make machines talk for many years, perhaps even centuries. Speech synthesis is the term used to describe the production of speech-like sounds, words, and sentences by machines. In the early, modern days of speech synthesis, these machines were large, awkward electronic devices. Computers now generate high-quality synthetic speech using sophisticated software. If you are interested in the history of speech synthesis, a good place to start is the historical account published by Dennis Klatt (1987), a pioneer in the field. Enter the phrase, "history of speech synthesis," in a search engine to get a listing of many sites devoted to the topic, some with audio examples of synthesized speech from 1939 to the current time. Story (2019) has published a more recent account of the history of speech synthesis.

The quality of speech synthesis improved dramatically in the 1960s and 1970s. Much of this improvement can be attributed to studies of the acoustic characteristics of speech sounds, as produced by real talkers. These studies were often completed using spectrographic analysis, as in Figure 11–2, or with more sophisticated acoustic analysis tools. As knowledge of the speech acoustic characteristics of human speech became more detailed, programs for the synthesis of speech were improved, and a better "product" emerged from these talking machines. Increases in computing power and capacity allowed more complicated speech synthesis programs to synthesize higher-quality speech. Modern speech synthesis is so good as to be nearly indistinguishable from genuine human speech. Acoustic phonetics research made this possible.

Relatively cheap, high-quality speech synthesis is more than an academic exercise for geeks who like speech research. Speech synthesizers are an important communication option for persons who cannot communicate orally because of neurological disease. Speech synthesis by computer offers a wonderful option for persons who, despite a neurological status that prevents speech production, have sufficient control of their hands or another part of the body to issue commands for the rapid synthesis of a message. For example, some children with cerebral palsy cannot speak intelligibly but can control a joystick or a head pointer, which in turn can be used to control a speech synthesizer. Speech synthesis is truly a case where basic research has been translated to clinical application.
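The source-filter idea described in this chapter is also the basis of classic formant synthesis. The Python sketch below is a bare-bones illustration, not a description of any particular synthesizer: an impulse-train "voice source" is passed through three second-order resonators whose frequencies and bandwidths are set to illustrative values in the neighborhood of an adult-male /æ/.

import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

SR = 16000      # sampling rate, Hz
F0 = 120        # fundamental frequency of the source, Hz
DUR = 0.5       # duration, seconds

# Source: an impulse train at F0, a crude stand-in for the vocal fold pulses.
n_samples = int(SR * DUR)
source = np.zeros(n_samples)
source[::SR // F0] = 1.0

def resonator(x, freq_hz, bandwidth_hz, sr=SR):
    # A second-order (two-pole) digital resonator, the basic building block of
    # formant synthesis; its poles sit at the desired formant frequency.
    r = np.exp(-np.pi * bandwidth_hz / sr)
    theta = 2.0 * np.pi * freq_hz / sr
    a = [1.0, -2.0 * r * np.cos(theta), r * r]
    b = [1.0 - 2.0 * r * np.cos(theta) + r * r]
    return lfilter(b, a, x)

# Filter the source through three resonators; the formant frequencies and
# bandwidths are illustrative, roughly in the region of an adult-male "ae".
vowel = source
for freq_hz, bandwidth_hz in [(600, 90), (1900, 110), (2600, 170)]:
    vowel = resonator(vowel, freq_hz, bandwidth_hz)

vowel = 0.9 * vowel / np.max(np.abs(vowel))
wavfile.write("synthetic_ae.wav", SR, (vowel * 32767).astype(np.int16))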

Speech Recognition

The flip side of speech synthesis is called automatic speech recognition. This is the process of converting human speech into text, or into an action (e.g., operating a door) based on speech commands. The use of automatic speech recognition for simple decision-making is familiar to everyone who has called a business (such as an airline) and been led through a maze of speech-guided options to achieve a goal (like talking to a human, for example). These speech recognizers operate in a relatively simple way, storing a limited number of acoustic phonetic "templates" for words such as "yes" and "no," numbers from one to ten, or the letters of the alphabet ("em," "dee," "ex"). Not surprisingly, the greater the number of options to be distinguished, the more complex the speech recognizer must be for recognition success. One component of modern speech recognition devices is an acoustic analysis program that determines the sounds that make up words. This capability is based on the speech acoustic work described above, in which the acoustic characteristics are determined for each speech sound. Although modern speech recognizers use much more information than just the acoustic characteristics of speech sounds to "hear" speech accurately, all recognizers must incorporate information on these characteristics for successful performance.

Speech Acoustics and Assistive Listening Devices

Speech acoustic characteristics are important because they play a critical role in the design and programming of assistive listening devices such as hearing aids and cochlear implants. These devices, described in Chapter 24, function as amplifiers and filters. They contain a microphone to sense the speech signal, which is "processed" by electronic components contained within the device. The processing may include selection of certain frequencies to be amplified, and suppression of other frequencies. What is the rationale for amplification of certain frequencies and suppression of others? The most obvious answer is that the device is configured to amplify frequencies most crucial to the understanding of speech. Much of the acoustic phonetics research has been devoted to determining the important frequency characteristics of human speech sounds. Acoustic phonetics research informs the engineer who designs hearing aids or cochlear implants about optimal processing characteristics to achieve the best speech perception performance.
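As a toy illustration of "amplify some frequencies, leave others alone," the sketch below boosts a single band of a recorded signal. The file name, band edges, and gain are illustrative assumptions, and the single fixed boost is a deliberate simplification: real hearing aids use multiband, level-dependent (compression) processing fitted to an individual listener's hearing loss.

import numpy as np
from scipy.signal import butter, sosfilt
from scipy.io import wavfile

# Hypothetical mono recording (sampled well above 8 kHz) from the device microphone.
sr, x = wavfile.read("microphone_input.wav")
x = x.astype(float)

# Isolate a band that carries much consonant information (band edges illustrative).
sos = butter(4, [1000, 4000], btype="bandpass", fs=sr, output="sos")
band = sosfilt(sos, x)

# Crude band emphasis: add an amplified copy of the band back to the signal.
boost = 10 ** (15.0 / 20.0) - 1.0        # roughly a 15-dB boost within the band
processed = x + boost * band

processed = processed / np.max(np.abs(processed))
wavfile.write("processed.wav", sr, (processed * 32767).astype(np.int16))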

Speech Perception

Many speech scientists, experimental psychologists, and linguists throughout the world devote their research careers to speech perception. As often happens when many scientists work on a single problem, different camps have developed, each defending one of several theories that seek to explain how humans perceive speech. All theories of speech perception require that the acoustic characteristics of speech sounds and perhaps of larger "units" (such as syllables, or perhaps even whole words) be processed by the human auditory system. Children presumably learn a great deal about the acoustic characteristics of speech as they are developing language skills. At some point, children learn to use their knowledge of the acoustic characteristics of speech sounds and prosody (the melody and rhythm of speech) to recognize words, emotional states of talkers, and more subtle aspects of communication such as anger or joking. Knowledge of speech acoustic information is also likely to play a role in the development of speech production skills. The best evidence for this is the connection between "prelingual," severe hearing loss (loss suffered before the age of about 5 years) and speech production that is significantly unintelligible. Persons with severe hearing loss are likely to produce speech that is difficult to understand. Clearly, the speech acoustic signal plays an important role in normal and disordered communication.

Many issues have been investigated to understand the perception of speech. The question may seem trivial: after all, as infants we are exposed to speech and language, and, like many other skills we master, speech perception is learned by connecting heard speech with the people, characteristics, and actions to which the speech refers. But this account is not an answer to "how." Rather, it is a statement of what happens, not an explanation. A deeper question, the question of how speech perception takes place, is what the listener's brain does with the acoustic signal to hear words and determine the meaning of a speaker's utterances. A brief consideration of some major issues in speech perception research demonstrates that understanding the "how" of the process is challenging. In the discussion that follows, the term "objects of speech perception" is used. This refers to the possible goal of the brain in perceiving speech. What is the final product of the brain's processing of the incoming speech acoustic signal?

The Perception of Speech: Special Mechanisms?

A very old but enduring theory of speech perception is that part of the brain is devoted exclusively to speech perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). This mechanism is thought to be speech- and species-specific. The term speech-specific means that the special brain mechanism is "turned on" when a speech acoustic signal enters the ear. The special mechanism is not used for general auditory perception, such as music, the sound of breaking glass, or the rustling of leaves that suggests the approach of a summer storm. The term species-specific means that only humans are endowed with this special mechanism, and that the special part of the brain for speech perception evolved over thousands of years to create a communication link with speech production.

The theory is called the motor theory of speech perception. In its simplest form, the theory states that speech is perceived by reference to articulation. The idea is that speakers produce a speech acoustic signal that encodes (like encryption) the articulator movements that produced the signal. When the speech acoustic signal enters the ear and the brain of the listener, the code must be broken to perceive the sounds produced by the speaker. The special mechanism in the brain performs this decoding. In the motor theory, the role of the speech acoustic signal is to carry the code that allows the brain to unlock the underlying articulation that produced the signal.

Why did the motor theorists reject the speech acoustic signal as a reliable source of the perceptual identification of speech sounds? In the view of the motor theorists, the speech acoustic signal for any given sound was too variable and therefore not reliable as a cue to the identity of the sound. The acoustic characteristics of /b/, for example, vary depending on such factors as the phonetic context in which the consonant is produced (e.g., /bit/ "beat" versus /bæt/ "bat"), who is producing the sound (e.g., men, women, children), how quickly the sound was produced (e.g., the speaking rate), as well as other variables (see the earlier section, "The Theory of Speech Acoustics"). The variable of the speaker is especially instructive. As discussed earlier, the vocal tract lengths of men, women, and children are different, so the acoustic characteristics of all sounds depend to a significant degree on who is doing the speaking. The motor theorists asked, "How does a human store all the many acoustic characteristics of a single sound?" Their answer was, "The speech acoustic signal for a speech sound may be highly variable, but the articulatory movements are not. All speakers — men, women, children — produce (for example) a /b/ by closing the lips and the velopharyngeal port so that pressure can be built up in the vocal tract." Hence, the idea was formed of focusing on the articulatory characteristics of speech sounds as the "objects" of speech perception.

The coupling between speech production and speech perception is critical to this theory. Over the course of evolution, the special encoding/decoding process served the needs of human communication, which depends equally on the performances of a speaker and a listener. The specialized coupling is relevant to the ability to produce and hear speech sounds. In other words, the theory did not have much to say about how meaning is obtained from speech perception.

The implications of the motor theory are interesting. First, because the special brain mechanism for speech perception is an outcome of human evolution, it must be present in the brains of human infants. Of course, even specialized mechanisms that are part of the human brain endowment are likely to develop throughout childhood, but nonetheless, the neural hardware is there and presumably ready for operation. In fact, there is a good deal of research (as reviewed in Galle & McMurray, 2014) that shows infant speech perception for sound category distinctions such as /p/ versus /b/ to be similar to the same distinctions perceived by adults. This finding seems to be consistent with the idea that the speech perception mechanism is a special part of human brains, whether infant or adult.

Another implication is that animals should not be able to make speech sound category distinctions similar to those observed in humans. Animals do not talk, or at least they do not articulate sequences of speech sounds in a human-like way. According to the motor theory of speech perception, animals should not hear phonetic distinctions because they do not produce phonetic distinctions. This makes sense from the evolutionary perspective as previously discussed. Because (according to the motor theory) speech production and speech perception co-evolved in humans, the absence of speech production skills in animals predicts the absence of human-like phonetic perception.

If only it were that simple. Experiments have shown that animals such as Japanese quail (Kluender, Diehl, & Killeen, 1987) and chinchillas (Kuhl & Miller, 1975, 1978) perceive phonetic distinctions in much the same way as humans. This is a difficulty for the motor theory of speech perception, for the reasons stated previously. Why should animals perceive these distinctions in the same way as humans if they, animals, do not have the special speech perception mechanism in the brain — how could they have this mechanism when they lack speech production capabilities? These kinds of concern about the motor theory of speech perception (and there are other concerns: see Hixon, Weismer, & Hoit, 2020) convinced a group of researchers to develop theory and data that support the use of general auditory mechanisms in the perception of speech. Auditory theories of speech perception are built around the idea that the very sophisticated auditory and general computing capabilities of the human brain are well suited to make phonetic distinctions: no special, species-specific mechanism is required.

Alex's Magic Trick

The hypothesized match of the brain mechanisms for speech perception and speech production is complicated by talking parrots (as well as other talking birds like mynahs and parakeets [budgerigars]). Some of you may be familiar with Alex (1977–2007), the African grey parrot written about extensively by the MIT scientist Irene Pepperberg and her colleagues (Patterson & Pepperberg, 1998). Alex talked a lot, had a huge vocabulary, and had a remarkable ability to imitate a wide range of sounds, including speech. Alex was not the only African grey to produce words and sentences; in fact, they have the reputation of being chatterboxes and of continuously learning utterances over the course of their long lifetimes. African greys and other parrots may "articulate" speech, partially by moving their tongues, but the evidence for this is scant or based on rather artificial experiments (Beckers, Nelson, & Suthers, 2004; Patterson & Pepperberg, 1994). Parrots do not have a sound source (like the vibrating vocal folds) that is modified by moving articulators to create different acoustic signals. Watching an African grey's beak and tongue as he says a word or sentence, you will not see much movement, certainly not the kind required to produce consonants. You cannot see their lips move because, well, they do not have lips. Where, then, does the magic happen? Like many birds, African greys have a syrinx, a complex anatomical structure deep in their chests (Figure 11–3). This is where the magic happens, but exactly how it occurs is not clear. The anatomy of the syrinx is understood well (Habib, 2019). It is located in the chest and composed of muscularly controlled valves that are "powered" by an air sac, somewhat like the human lungs. African greys present a problem to the motor theory of speech perception because they do not have the speech production mechanism that, over the course of evolution, would evoke a special speech perception mechanism. Hmm. Learn more about Alex at https://en.wikipedia.org/wiki/Alex_(parrot)

Figure 11–3.  Photo of an African grey parrot with an artist's image of the syrinx superimposed on the parrot's chest. The syrinx is located deep in the parrot's chest. It consists of muscles and membranes, as well as cartilage, that form valves that can be opened and closed in complex ways to produce human-sounding speech as well as a wide variety of other sounds. This 29-year-old's name is Friday, and he lives with the author's brother and his spouse. Friday says a lot, even producing utterances that are not mimicked but are contextually "correct." These utterances include saying "goodnight" when the lights go out, calling out the dog's name when the dog is present, and saying "hello" when the phone rings. Friday also has lots of other sound productions — he does a microwave beep, he whistles, he barks, he does bodily noises really well, and, of course, does bird whistles and bird chatters. All this with an open beak and no lips.

The Perception of Speech: Auditory Theories

Auditory theories of speech perception are based on a simple idea. The auditory mechanisms used for the perception of any acoustic event are used to perceive speech. There is no special mechanism required. Auditory regions of the brain, as well as areas associated with these regions, are sufficiently sophisticated to process and identify phonetic events.

Auditory theories are not free of problems. An auditory theory is only as good as the speech acoustic data presented for auditory processing. For example, an auditory analysis that distinguishes between a /p/ and a /b/ must have some brain representation of the acoustics of the two sounds, and especially how the acoustics differ between the two. The only way the brain can have such an acoustic representation is to learn what makes a /p/ and what makes a /b/. This learning, presumably in the earliest stages of speech and language acquisition, must be based on a consistent speech acoustic characteristic that is always produced when a person — any person — says a /p/ or /b/ (or any speech sound). And, if the speech acoustic signal demonstrates this consistency for each speech sound in a language, where in the brain are these representations stored, and what is the form of the stored data? Speech scientists have entertained the idea that the brain stores templates (also called prototypes) of the acoustic characteristics of speech sounds (sometimes, of the acoustic characteristics of syllables). Templates can be thought of as "ideal" representations of these speech sound acoustic characteristics. When a speaker produces a /b/, for example, as the first sound in the word "badgers" (/bædʒɚz/), the speech acoustic signal entering the auditory system is compared with all the stored templates to determine a match or mismatch. A match to the /b/ template is equivalent to perception of the /b/ sound. In auditory theories of speech perception, the "objects" of perception are the acoustic characteristics of speech sounds, perhaps represented in the brain as acoustic templates.
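A toy version of the template idea can be written in a few lines. The stored "templates" below are F1/F2 values (in Hz) loosely based on classic published adult-male averages and should be treated as illustrative; an incoming vowel is assigned to whichever stored template its measured formant frequencies lie closest to.

import math

# Toy acoustic "templates": illustrative F1/F2 values (Hz), loosely based on
# classic published adult-male averages. A real system would store far richer detail.
TEMPLATES = {
    "i": (270, 2290),    # "ee"
    "ae": (660, 1720),   # "ae" as in "bat"
    "a": (730, 1090),    # "ah"
    "u": (300, 870),     # "oo"
}

def match_vowel(f1, f2):
    # Return the template whose stored F1/F2 values are closest to the input.
    def distance(symbol):
        t1, t2 = TEMPLATES[symbol]
        return math.hypot(f1 - t1, f2 - t2)
    return min(TEMPLATES, key=distance)

# The adult-male /ae/ averages cited earlier (F1 = 588 Hz, F2 = 1952 Hz) fall
# closest to the "ae" template, so that is the vowel "perceived."
print(match_vowel(588, 1952))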


Matching Speech Perception and Speech Production Without a Special Mechanism

An influence of speech perception skills on speech production skills does not require a special mechanism to match the two. But speech perception skills may still play a crucial role in speech production skills. After all, the speech acoustic signal you produce is heard not only by a listener, but by the producer as well. The speech perception mechanism is your monitor for the quality of your speech production. This idea has application in the speech and hearing clinic, where children with delayed mastery of speech sounds (see Chapter 15) are sometimes thought to have poor auditory representations (i.e., templates) of the speech sounds they produce incorrectly. The clinical application is called auditory training, or ear training, and is implemented by stimulating the child's auditory system with multiple, correct repetitions of the incorrectly produced sounds. Improvement in speech perception skills is assumed to result in improved auditory representations of the incorrectly produced sounds, which become the basis for improved speech production skills.

Is it true that each of the many speech sounds within a language has unique and consistent speech acoustic characteristics no matter who produces the sounds? Some scientists say "yes" to this question (e.g., Diehl, Lotto, & Holt, 2004), and others say "no" (e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Other scientists place themselves somewhere between these two positions, arguing that speech perception is based on articulatory gestures but assisted by speech acoustic data (Fowler, Shankweiler, & Studdert-Kennedy, 2016).

Motor Theory and Auditory Theory: A Summary

Figure 11–4 presents a schematic summary of the primary difference between the motor theory of speech perception and an auditory theory of speech perception. Part A shows the speech acoustic signal as the input to the auditory system. The speech signal is the sentence, "The blue dot is a normal dot." There are 19 speech sounds in this sentence — in the phonetic transcription of the sentence (/ðəbludɑtɪzeɪnɔrmldɑt/),¹ each symbol is assumed to be a separate speech sound. The middle image is a view of the surface of the left hemisphere of the brain, widely considered to be the hemisphere that contains the tissue used to produce and perceive speech. The oval encloses the perisylvian speech and language areas, including Broca's and Wernicke's areas as well as auditory cortex (see Chapter 2). The right side of Figure 11–4A presents a "black box"² containing the special mechanism that decodes the encoded speech acoustic signal. The decoded signals for each speech sound are the articulatory movements that produced the speech sounds. The objects of speech perception in the motor theory are these articulatory movements.

A similar summary diagram is shown in Figure 11–4B for an auditory theory of speech perception. The left and middle images are the same as in Figure 11–4A; the input is the speech acoustic signal, and the relevant brain areas are indicated by the tissue within the oval. On the right side of the figure is a single sound segment, the vowel /ɑ/ (as in "spot" /spɑt/), which is analyzed by auditory mechanisms that are not specialized for speech perception. The auditory cortex analyzes each sound, which it delivers to a long-term memory bank of templates, each one representing a different sound segment. Using /ɑ/ as an example, the auditory analysis is delivered to a mechanism that compares the analysis to various vowel templates. The best match — in this case the acoustic template for /ɑ/ — is perceived. In an auditory theory of speech perception, the objects of perception are these best matches to acoustic templates that contain the ideal acoustic characteristics for each sound segment.

¹ /eɪ/ is italicized to indicate that this diphthong (as in the word "take" /teɪk/) is considered a single sound even though it is represented by two symbols.

² "Black box" is a term used to designate an unknown mechanism for a hypothesized or known process. In this case, the black box is the special mechanism for perceiving speech.

Figure 11–4.  Schematic summary of the primary difference between the motor theory of speech perception and an auditory theory of speech perception. A. Motor theory, showing the speech signal input (left), the left hemisphere speech and language areas (middle), and the "black box" special mechanism for converting the acoustic signal into an articulatory representation (right). The conversion is a speech- and species-specific mechanism. In the motor theory, the objects of speech perception are the articulatory events that produced the acoustic input. B. Auditory theory, left and middle images same as in (A), right image showing a vowel identified by its acoustic properties. General auditory mechanisms perform the analysis. The objects of speech perception are the acoustic characteristics of each speech sound.

Top-Down Influences: It Is Not All About Speech Sounds

Perhaps the most obvious way to imagine how we perceive speech is to assume that each sound — whatever the object of speech perception may be — is analyzed as it comes into the auditory system. When these sequential analyses (one analysis per speech sound) are completed, the sounds are put together to identify the spoken word. For example, recognition of the word "badgers," which consists of the five sounds /b/, /æ/, /dʒ/, /ɚ/, and /z/, takes place by analysis of the sequence of sounds, followed by a process that groups the sequenced sounds to determine if they form a word. A process such as this requires very large storage in long-term memory of all the acoustic sound sequences that can form words. Each time an auditory (acoustic) analysis is performed on a sequence of sounds, the grouping of a specific sequence can be matched to a word pattern in memory, if it exists.

There is good evidence that the process just described is not the way word recognition happens. Listeners are not passive perceptual beings — they are active in the process of speech perception. Listeners use the acoustic analysis of the first or second sound of a word and then quickly begin to search all the words known to them that may be good "word recognition candidates." This process is called "searching the lexicon." When this search produces a likely word match, the listener moves on to the acoustic analysis of the next word and another active word search as previously described. An important aspect of this process is that listeners make word identification choices before they have completed the acoustic analyses of all the sounds in the word (an excellent review of these findings is found in Gelfand, Christie, & Gelfand, 2014). The word choices can be made this way because other aspects of the communication setting, including the words already recognized, the conversational setting, and even the very general topic being discussed, help a listener to find the word being analyzed — to predict it, in a sense — before all the component sounds have been analyzed. When you think about it, this is a much more efficient way to perceive speech compared with an analysis of all sounds in a word before a word decision can be made.

The use of various kinds of processes and knowledge, such as searching a specific part of the lexicon (e.g., all words beginning with /b/), using situational context, the topic under discussion, and other sources of information to guide word-choice decisions, is called top-down processing. It is "smart" perceptual processing, much more efficient than a passive process of analyzing incoming data in steps and building up a perception. This latter approach is called bottom-up processing, and most scientists believe that as a primary psychological process it is a poor model for any form of perception, including speech perception. In the example previously given, in which the acoustic analysis of the first one or two speech sounds of a spoken word is used to initiate a focused, active lexical search, both bottom-up and top-down processes are used. The bottom-up part of the process is the initial acoustic analysis, and the top-down part is the lexical search supplemented by other sources of knowledge that contribute to word identification.
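The interplay of bottom-up and top-down information can be sketched in a few lines of Python. In the toy example below, ordinary spelling stands in for the sound sequence, and the lexicon and "topic" scores are made up for the illustration: the first sounds analyzed narrow the set of lexical candidates (bottom-up), and contextual knowledge picks the winner before every sound has been analyzed (top-down).

# Toy illustration of combining bottom-up and top-down information. Ordinary
# spelling stands in for sound sequences; the lexicon and topic scores are made up.
LEXICON = ["badger", "badge", "battle", "bagel", "cat", "dad", "deed"]

# Hypothetical plausibility of each word given the conversation topic ("animals").
TOPIC_SCORES = {"badger": 0.9, "badge": 0.2, "battle": 0.1,
                "bagel": 0.1, "cat": 0.8, "dad": 0.3, "deed": 0.1}

def recognize(first_sounds, topic_scores):
    # Bottom-up step: keep only lexical candidates consistent with the sounds heard so far.
    candidates = [word for word in LEXICON if word.startswith(first_sounds)]
    if not candidates:
        return None
    # Top-down step: among those candidates, prefer the contextually likely word.
    return max(candidates, key=lambda word: topic_scores.get(word, 0.0))

# After only the first two "sounds," the listener can already settle on "badger."
print(recognize("ba", TOPIC_SCORES))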

Speech Intelligibility

"Speech intelligibility" describes the degree to which speech is understandable. The idea of a degree of intelligibility is appropriate. Speech can be intelligible or unintelligible by degrees; it is not an either-or phenomenon. Everyone has had the experience of not quite understanding what someone has said or catching just a few words of an utterance that is for the most part unintelligible. Speech intelligibility is an outcome of the processes of speech perception regardless of the theory of speech perception to which you pledge allegiance.

The degree to which an utterance is intelligible depends on many variables. Let's first assume that an utterance is perfectly intelligible when spoken in a quiet environment. In a noisy room, the intelligibility of the same utterance may decrease to a degree depending on the level of the noise. Speak that same utterance over a wireless transmission system (e.g., a cellphone) with limited quality, and the intelligibility may be affected (especially if the listener has a hearing loss — see later in this chapter). These examples are based on a hypothetical reference utterance with perfect intelligibility, heard in an optimal listening environment. Other variables may affect speech intelligibility as well, as is the case when the person speaking the utterance has a speech impairment or the listener has a hearing loss.

Speech intelligibility tests have been developed to measure the degree of intelligibility loss among speakers or listeners with communication impairments. For example, the speech of individuals with cleft palate, neurological disorders, and congenital hearing impairment is unintelligible to various degrees. Speech intelligibility tests, in which the listeners have normal hearing, provide an index of the individual's intelligibility deficit. This index can provide objective data to track progress due to surgery (as in the case of cleft palate), speech therapy (as in the case of neurological disease), and amplification (as in the case of speakers with hearing loss who are fitted with hearing aids).

Speech intelligibility can be measured using scaling techniques, word lists, and sentences. In addition, intelligibility for specific speech sounds has been measured using phonetic or orthographic transcription. Scaling techniques use a number scale, such as a 7-point scale with 1 = least impaired and 7 = most impaired. Another version of scaling, called Visual Analog Scaling, shows the listener a continuous line with numbers ranging from 1 to 100. One end of the scale is defined as completely unintelligible and the other end as completely intelligible. The listener hears an utterance and operates a slider to place a pointer on the scale that corresponds to the perceived degree of intelligibility.

Word and sentence tests are often-used measures to index speech intelligibility. Listeners hear a list of words or sentences and write down (or enter into a computer) what they heard. As one example, when 50 words are presented and the listener writes the correct word for 40 of them, speech intelligibility is indexed as 80%. Sentence intelligibility works the same way. In a sentence test having an overall number of 100 words, correct orthographic transcription of 50 words is indexed as 50% speech intelligibility.

In some cases, an index of the percentage of speech sound intelligibility is desired. Long passages of speech such as reading or conversation are transcribed orthographically or using phonetic transcription to obtain a count of the number of sounds that are heard correctly for the entire passage. For example, a measure called Percentage of Consonants Correct (PCC) (Shriberg, Austin, Lewis, McSweeney, & Wilson, 1997) is the ratio of the number of consonants heard correctly to the total number of consonants in the passage, expressed as a percentage. PCC is used frequently to document phonetic development and disorders in children. Overall, speech intelligibility measures are useful for indexing the degree to which a person's speech or hearing loss affects the transmission of information between speaker and listener. Speech-language pathologists and audiologists value these measures due to their straightforward clinical application.
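The word-list and PCC calculations described above amount to simple percentages, as the short sketch below shows. The word lists and consonant counts in the example are made up for illustration; they are not data from any study.

def percent_words_correct(target_words, transcribed_words):
    # Word-level intelligibility: percentage of target words the listener wrote correctly.
    correct = sum(1 for target, heard in zip(target_words, transcribed_words)
                  if target == heard)
    return 100.0 * correct / len(target_words)

def percentage_consonants_correct(n_consonants_correct, n_consonants_total):
    # PCC: consonants heard correctly divided by total consonants, as a percentage.
    return 100.0 * n_consonants_correct / n_consonants_total

# Made-up example: a 5-word list on which the listener identified 4 words correctly.
print(percent_words_correct(["dude", "dot", "deed", "dad", "bat"],
                            ["dude", "dot", "deed", "dad", "pat"]))   # 80.0

# Made-up example: a passage containing 120 consonants, 90 of them transcribed as correct.
print(percentage_consonants_correct(90, 120))                         # 75.0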

Chapter Summary

The theory of speech acoustics, formulated by Gunnar Fant, states that the output of the vocal tract (that is, the speech acoustic signal) is the product of the acoustic characteristics of a sound source (the vibrating vocal folds) combined with a sound filter (the vocal tract). The sound source consists of energy at the F0 (the rate of vibration of the vocal folds) plus energy at harmonic frequencies, which are whole-number multiples of the F0. The sound filter can be described as the resonant frequencies of the vocal tract, which change depending on the shape of the tract. The vocal tract, like any tube (pipe) resonator, has multiple resonant frequencies.

The shape of the vocal tract is changed by motions of the jaw, tongue, lips, and pharynx. Vowels can be described acoustically by the first three resonant frequencies of the vocal tract tube. Speech scientists call these resonances formants. Different vowels are heard because different shapes of the vocal tract produce different formant frequencies. Because the vocal tract resonates like a tube, or pipe, shorter vocal tracts have higher formant frequencies; longer vocal tracts have lower formant frequencies. This explains why children have higher formant frequencies than adult women, who have higher formant frequencies than adult men.

Speech acoustics is important to assistive listening devices, theories of speech perception, speech synthesis and recognition, and language development. The motor theory of speech perception states that speech is perceived by a special, species-specific mechanism in the human brain; the objects of speech perception are the articulatory movements that generated the acoustic signal. Auditory theories of speech perception state that the speech acoustic signal for any speech sound is sufficiently stable to be analyzed reliably by general auditory mechanisms; the objects of speech perception are the acoustic characteristics of speech sounds.

Many speech perception theorists believe that the sound-by-sound analysis of the speech acoustic signal entering the auditory system is supplemented by top-down processes. Top-down processes are essential to an efficient speech perception process; the listener's knowledge and expectations allow her to identify words before the completion of the sound-by-sound analysis. Speech intelligibility measures use scaling techniques or word and sentence lists to estimate a person's ability to hear speech or the effect of a speech disorder on the ability of others to perceive their speech. These tests are applied frequently in clinical settings.

References

Beckers, G. J. L., Nelson, B. S., & Suthers, R. A. (2004). Vocal-tract filtering by lingual articulation in a parrot. Current Biology, 14, 1592–1597.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.
Fant, G. (1960). Acoustic theory of speech production. The Hague, the Netherlands: Mouton.
Flanagan, J. L. (1972). Speech analysis, synthesis, and perception (2nd ed.). Berlin, Germany: Springer-Verlag.
Fowler, C. A., Shankweiler, D. P., & Studdert-Kennedy, M. (2016). "Perception of the speech code" revisited: Speech is alphabetic after all. Psychological Review, 123, 125–150.
Galle, M. E., & McMurray, B. (2014). The development of voicing categories: A quantitative review of over 40 years of infant speech perception research. Psychonomic Bulletin and Review, 21, 884–906.
Gelfand, J. T., Christie, R. E., & Gelfand, S. A. (2014). Large-corpus phoneme and word recognition and the generality of lexical context in CVC word perception. Journal of Speech, Language, and Hearing Research, 57, 297–307.
Habib, M. B. (2019). New perspectives on the origins of the unique vocal tract in birds. PLoS Biology, 17, e3000184. https://doi.org/10.1371/journal.pbio.3000184
Hillenbrand, J. M., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099–3111.
Hixon, T. J., Weismer, G., & Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.
Klatt, D. H. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82, 737–793.
Kluender, K. R., Diehl, R. L., & Killeen, P. R. (1987). Japanese quail can learn phonetic categories. Science, 237, 1195–1197.
Kuhl, P. K., & Miller, J. D. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosives. Science, 190, 69–72.
Kuhl, P. K., & Miller, J. D. (1978). Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli. Journal of the Acoustical Society of America, 63, 905–917.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Milenkovic, P. (2001). TF32 [Computer software]. Madison, WI: Author.
Patterson, D. K., & Pepperberg, I. M. (1994). A comparative study of human and parrot phonation: Acoustic and articulatory correlates of vowels. Journal of the Acoustical Society of America, 96, 634–648.
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24, 175–184.
Shriberg, L. D., Austin, D., Lewis, B. A., McSweeney, J. L., & Wilson, D. L. (1997). The percentage of consonants correct (PCC) metric: Extensions and reliability data. Journal of Speech, Language, and Hearing Research, 40, 708–722.
Stevens, K. N. (1998). Acoustic phonetics. Cambridge, MA: MIT Press.
Story, B. H. (2019). History of speech synthesis. In W. F. Katz & P. F. Assmann (Eds.), The Routledge handbook of phonetics (pp. 31–55). London, UK: Routledge.

12  Phonetics

Introduction

Speech sounds are the phonetic components of language (Chapter 3). Speech sounds are often referred to as speech sound segments to indicate that words can be broken down into their individual, component sounds. In American English, examples of speech sounds include the vowels in words such as "bead" and "dot" ("ee" and "ah," respectively), the nasals in words such as "Mom" and "never," and the "f" sound in "rough." In these examples, the speech sounds do not always match the orthographic representations of the words. The "ah" in "dot" is represented orthographically by an "o," and the "f" in "rough" is represented by "gh." The mismatch between orthography and sound does not apply to all languages; in languages such as Finnish and Japanese, the match between the written and spoken form of words is very good (but not always perfect).

Phonetic transcription is a tool for representing speech sounds by means of a special set of symbols. A trained transcriber uses the symbols, drawn from the International Phonetic Alphabet (IPA), to record the sounds of speech independently of the words they form. High-quality phonetic transcription requires extensive training. Speech-language clinicians and researchers who have this training make extensive use of phonetic transcription to generate a record of produced or perceived speech sounds. For example, a speech-language pathologist (SLP) who generates a record of a child's phonetic inventory — all the sounds the child produces — uses phonetic transcription to generate the record. Or, a researcher who studies dialect variation within a language uses phonetic transcription to record all the sound variants in different dialects of a specific language. A good example of this is the many ways the "eh" sound in words such as "bed," "head," and "lead" is spoken in American, British, Scottish, and Irish English, as well as among dialect variations within any of these languages.

The term "phonetics" is often broken down into three subareas. These areas are articulatory phonetics, acoustic phonetics, and perceptual phonetics. Articulatory phonetics is the study of speech movements associated with speech sounds. For example, a speech scientist may be interested in documenting tongue movements for the vowel /u/ in words (as in "boot") and how the movements are modified by variables such as speaking rate, voice loudness, and speech style (casual or formal). Acoustic phonetics is the study of the acoustic characteristics of a vowel like /u/ in words spoken in different speaking conditions (Chapter 11). Articulatory and acoustic phonetics are not the same thing, because knowledge of the articulatory characteristics of a speech sound does not guarantee precise knowledge of the acoustic characteristics of that same sound. Finally, perceptual phonetics is the study of how listeners hear articulatory and acoustic characteristics of speech sounds. For example, a vowel such as /u/ may be heard differently when spoken in a word at a fast versus slow speaking rate.


This chapter deals with all three types of phonetics, but its focus is on perceptual phonetics, because the IPA is a tool for recording heard sounds. Nevertheless, an IPA symbol that represents a heard sound implies something about the articulatory and acoustic phonetics of the sound, as discussed in the material of this chapter. The IPA is a universal tool for all people who study languages and are interested in phonetic descriptions. No matter the dialect, the language, the potential speech and/or language disorder, even the sounds produced by babbling infants, the IPA is meant to be universally applicable and usable for the transcription of speech sounds produced by any speaker.

The purpose of this chapter is not to make the reader a skilled transcriber. Rather, it presents the concepts that support use of the IPA for transcription of speech sounds and provides examples of transcriptions. A "convention" (a rule we can all agree on) is followed in this chapter when IPA symbols are used. When the symbol is not the same as its English orthography counterpart, the symbol is accompanied by a word example that includes the sound. For example, /ʃ/ "shave", /ɛ/ "bet", /dʒ/ "jazz." Cases in which the orthography and phonetic symbol match, such as the /p/ in "pack," are not followed by a word example.

International Phonetic Alphabet

The history of the IPA extends back to the late 19th century, when it was developed for precisely the reasons noted before — to create a universal symbol system for the speech sounds of the world's languages. Over the years, the IPA has been revised several times for better accuracy as well as addition of sounds that were not included in the original version. In the early 1990s, the IPA was adapted by the International Clinical Phonetics and Linguistics Association (ICPLA) for specific use in clinical settings (see Shriberg, Kent, McAllister, & Preston, 2019).

Vowels and Their Phonetic Symbols

An inventory of vowels in languages of the world is shown in Figure 12–1.¹ The four sides of this diagram form a vowel quadrilateral. The vowel quadrilateral shows (theoretically) where vowels can be articulated using tongue heights and tongue forward-backward positions within the oral cavity. This is illustrated in Figure 12–2, where the quadrilateral is superimposed on the oral cavity, and the "corner vowels" are indicated by IPA symbols (see later in chapter). Different vowels can also be made by adjusting the shape and length of the lips. In Figure 12–1, phonetic symbols for American English vowels are circled in red. Table 12–1 lists these phonetic symbols, each of which is paired with a word containing the vowel sound. Vowels are typically categorized using three descriptors: tongue height, tongue advancement, and lip rounding. The goal of the following discussion is not to promote learning of the IPA symbol system, but rather to use the phonetic symbols to make broader points about phonetics and its application to language studies and speech disorders.

Tongue Height (High Versus Low Vowels) Tongue height is a description of the height of the tongue relative to the fixed boundary of the hard palate (roof of the mouth). High vowels have a tongue position very close to the hard palate, and low vowels have a tongue position relatively far from the hard palate. The upper left and right panels of Figure 12–3 show the height of the tongue for two superimposed vowel pairs — /i/ versus /æ/ (left) and /u/ versus /ɑ/ (right). The tongue heights for /i/ and /u/ are very close to the hard palate and are called high vowels; in fact, these

 echnically, there are more vowels than those shown in Figure 12–1. Some languages like Japanese use differences in duration for the same T vowel sound as different phoneme categories. For example, Japanese has a short /i/ “ee” that contrasts with a long /i/.

12 Phonetics

Upward movements of the mandible carry the tongue to the hard palate, in the direction of high vowels. In many cases, the movements of the tongue and mandible are in the same direction, and much of the difference in vowel height is due to mandible movements with smaller contributions of tongue movement. The IPA description of vowels is based on tongue positions, so mandible positions are not considered further.

u

i æ

171

a

Vowel Phonetics Trivia

Figure 12–2.  The human vocal tract (the airway from the vocal folds to the lips) with the vowel quadrilateral superimposed on the oral cavity. The quadrilateral shows the area in the oral cavity in which the tongue moves for different vowels.

Table 12–1.  List of Phonetic Symbols for Each American English Vowel, Together With a Word Containing the Vowel Sound

Front Vowels

Central Vowels

Back Vowels

/i/ (beet)

/ə/ (agree)

/u/ (boot)

/I/ (bit)

/2/ (brother)*

/U/ (book)

/e/ (bait)

// (bird)*

/o/ (boat)

/E/ (bet)

/ / (buck)

/æ/ (bat)

/ɔ / (bought) /ɑ / (bog)

Note. The “r” colored vowels (identified by asterisks) are not included in Figure 12–1.


Tongue Advancement (Front Versus Back Vowels) Tongue advancement is a description of the extent to which the tongue is forward or back in the vocal tract. As shown in the two bottom panels of Figure 12–3, the tongue blade and tongue dorsum — the parts of the tongue extending approximately 20 to 25 mm behind the tongue tip — can be placed as far forward as the front of the hard palate. For the most back position, the tongue blade/dorsum is pulled back about 15 mm from the most front position.



Figure 12–3.  Tongue positions for /i/ versus /æ/ and /u/ versus /ɑ/ (left and right upper panels) showing differences in tongue height between front (/i/-/æ/) and back (/u/-/ɑ/) vowels. Tongue positions for /i/ versus /u/ and /æ/ versus /ɑ/ (left and right lower panels) showing differences in tongue advancement between high (/i/-/u/) and low (/æ/-/ɑ/) vowels.

Vowels with a tongue blade/dorsum forward in the vocal tract are called front vowels; vowels with a tongue blade/dorsum pulled back are called back vowels. These forward-backward positions of the tongue blade/dorsum are illustrated in the lower two panels of Figure 12–3 for /i/ versus /u/

(left), and /ɑ/ versus /æ/ (right). The tongue position for the vowel /i/ is more advanced (more front) than the tongue position for /u/, as is the tongue position for /æ/ compared with the tongue position for /ɑ/. The difference for the latter pair of vowels is subtle.

Small Movements, Big Effects Vowels are produced in a mini-world of tongue position differences. The same point on the tongue — such as on the tongue blade — may differ in position by no more than 20 mm for the most front high vowel (/i/) and most back vowel (/u/). Measure 20 mm, and you will see what we mean;

these differences in position are surprisingly small for the big effect of easily hearing the difference between an /i/ and /u/, or between any other pair of vowels in which the position differences are even smaller. For readers more comfortable in the world of inches, 20 mm is about 0.8 inch.


Lip Rounding Lip rounding is a description of the configuration of the lips for vowel production. In many languages, some vowels are described as rounded, in contrast to other vowels described as unrounded. Rounded vowels are produced with a narrow opening between the lips, and sometimes with the lips protruded, to create a narrow air channel. For example, most (if not all) textbooks on the phonetics of American English describe /u/ as a rounded vowel. Ask a friend — preferably someone in your grandparents’ generation — to say word pairs like “boot”-“beat” or “food”-“feed.” Observe the contrast between the two vowels in the formation of the speaker’s lips. Make the observations from the front (to see the narrow opening between the lips for /u/) and from the side (to see the protruded lips). Other vowels in American English, most notably /ʊ/ (“book”) and /o/ (“boat”), are also described as rounded, although not to the same degree as /u/. The remaining vowels of American English are considered to be unrounded.
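To make the three vowel descriptors concrete, here is a small Python sketch (an illustration added for this text version, not part of the original chapter; the feature labels are simplified). It stores a few American English vowels as (height, advancement, rounding) triples and prints a plain-language description of each.

    # Illustrative sketch: a simplified feature table for a few American
    # English vowels, using the three descriptors discussed in this chapter.
    VOWEL_FEATURES = {
        "i": ("high", "front", "unrounded"),  # "beet"
        "æ": ("low", "front", "unrounded"),   # "bat"
        "u": ("high", "back", "rounded"),     # "boot"
        "ʊ": ("high", "back", "rounded"),     # "book" (less rounded than /u/)
        "o": ("mid", "back", "rounded"),      # "boat"
        "ɑ": ("low", "back", "unrounded"),    # "bog"
    }

    def describe_vowel(symbol):
        """Return a plain-language description of a vowel's three descriptors."""
        height, advancement, rounding = VOWEL_FEATURES[symbol]
        return f"/{symbol}/ is a {height}, {advancement}, {rounding} vowel"

    for v in VOWEL_FEATURES:
        print(describe_vowel(v))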

Vowels of American English Compared With Vowels of Other Languages Figure 12–1 shows a total of 28 vowel symbols, of which 12 are used to describe the vowels of American English. American English has a relatively “dense” vowel system, meaning it has a lot of vowel categories relative to many other languages.
This Is Not Your Grandparent’s Vowel Why ask a grandparent to speak the words? And why not include /æ/ in the corner vowels? Let’s take the grandparent question first. Across generations, there are always changes in the way sounds are produced. Take /u/, for example. College-age speakers especially, and even people approaching age 40 years, have done two things to this vowel. One, they have moved the tongue forward to produce it, in the direction of /i/. Two, they have stopped rounding their lips for the vowel. So, if you ask a friend to do the “boot”-“beet” and “food”-“feed” thing, they may not show much lip rounding for /u/. Second, let’s consider /æ/. It may be a corner vowel in English, but it is so odd — pronounced so differently in different dialects, and absent from many vowel inventories in languages of the world — that it may be better to leave open the question of the lowest, most forward vowel position in American English. Take a look at the /æ/-/ɑ/ contrast in Figure 12–3; it is not easy to see much of a difference between the tongue positions for these two vowels.


Greek, for example, has only five vowels, as does Spanish. These languages, like many others, have “sparse” vowel systems. The relative density or sparseness of a vowel system has no effect on communication efficiency within a native language. American English has several vowels that are relatively rare among languages of the world. These include /ɪ/ “bit,” /ɛ/ “bet,” /ʊ/ “book,” and /æ/ “hat.” These vowels often present pronunciation problems for adult speakers learning English whose native language (including such languages as Greek, Spanish, French, Korean, and Mandarin Chinese) lacks one or more of these vowels. Finally, in virtually all languages of the world, the most extreme vowels — the highest and most front, the highest and most back, and the lowest and most back — define the limits of vowel articulation. These vowels are /i/, /u/, and /ɑ/, respectively (see Figure 12–2). In theory, these vowels enclose the remainder of the vowels in any vowel system.

Tricky Vowels Vowels are tricky. Uniform agreement among phoneticians on the sound associated with a particular phonetic symbol may be difficult to obtain. However, whatever disagreements exist are probably few in number. Organize a group of 50 phoneticians in a room, show them a vowel symbol, and ask each phonetician to say the sound. Your author believes 90% of the symbols in Figure 12–1 will elicit the same vowel from each phonetician. The trick is, what does “same” mean in this context? Most American English vowels are produced in many different dialects — does the vowel /ɛ/ (“bet”) sound the same in Northern, Southern, and Western dialects? Almost certainly not. But listeners can assign the spoken vowel to the same category — they are all examples of the /ɛ/ in “bet” — even when hearing the dialect-bound difference between them. Vowels are tricky for additional reasons as well. Take the vowels /ɑ/ and /ɔ/, for example. These are separate vowel categories in dialects spoken in places such as Philadelphia and Baltimore (e.g., /dɑk/ “dock” versus /dɔg/ “dog”) but not so much in places such as Southern California (/dɑk/ “dock” versus /dɑg/ “dog”). So, in SoCal, are the vowels in “cot” and “caught” the same or different? And what about the Bostonian or Pittsburgher who says /hɔt/ “hot” when almost everyone else in the country says /hɑt/ “hot”? Vowels are tricky.


Consonants and Their Phonetic Symbols An inventory of selected consonants in languages of the world is shown in Figure 12–4. Some phonetic symbols for consonant sounds in languages other than American English have been excluded from this image to simplify the accompanying discussion. Selected non-American English consonants are included to make a specific point. Consonants are categorized by the IPA using three major descriptors: place of articulation, manner of articulation, and voicing. In Figure 12–4, American English consonant symbols are shown in red. Selected consonants from other languages that are not found in American English are shown inside a green box.

American English consonants are made with constrictions at the lips (/p/, /b/, /m/, and /w/²); between the upper teeth and lower lip (/f/, /v/); at the teeth or between the tongue and the teeth, called dentals (/θ/ “think,” /ð/ “these”); just in back of the teeth, called alveolars (/t/, /d/, /s/, /z/, /n/, /l/, and /ɹ/ [like the “r” sound in “rose”])³; along the hard palate behind the alveolars — hence postalveolars (/ʃ/ as in “shave,” /ʒ/ as in “beige” or “azure”); in the velar region (/k/ and /g/, with the constriction close to the location where the hard palate joins with the velum); and at the level of the vocal folds (glottis) (/ʔ/, like the first sound in “ever” with emphasis on the first syllable, and the sound heard at the end of words such as “right” in Cockney English; and /h/). American English consonants are not produced at the retroflex, uvular, or pharyngeal places of articulation.

Place of Articulation

Consonants are made by forming a constriction somewhere in the vocal tract, between the lips and the vocal folds. The location of the constriction is the consonant’s place of articulation. Place of articulation is shown on the horizontal axis of Figure 12–4, from left to right, with “bilabial” at the left-most extreme and “glottal” (at the vocal folds) at the right-most extreme.

Manner of Articulation

If place of articulation is the “where” of consonants, manner of articulation is the “how.” Manners of articulation in American English include stops, fricatives, affricates, nasals, approximants, and flaps (taps). Consonants can have different manners of production, even at the same place of articulation.


Figure 12–4.  Chart showing selected phonetic symbols for consonants in languages of the world, adapted from the most recent revision from the International Phonetic Association. Place of articulation is on the horizontal axis, from the bilabial place (left-most column) to the glottal place (right-most column). Manner of articulation is shown on the vertical axis, from top to bottom in the order stops, fricatives, affricates, nasals, approximants, and flaps (taps). Consonants in American English are shown by red symbols. Selected consonants from other languages are enclosed by a green box. When two consonants have the same place and manner of articulation (e.g., /t/ /d/), the first consonant is voiceless, and the second is voiced.

²/w/ is a special case because it technically has two places of articulation, one at the lips and the other similar to the high-back vowel /u/.

³/ɹ/ and /l/ are also special cases because they can have places of articulation different from “alveolar” yet be heard as correct versions of these sounds.


Examples in American English include the different manners of articulation produced at the alveolar place of articulation. Two stop consonants (/t/, /d/) are alveolars. Stops, also called plosives, are produced by a brief, complete constriction, blocking the airstream from lungs to atmosphere for a brief time interval. Brief, in this case, means about 0.06 to 0.1 s (60–100 ms). Because the velopharyngeal port (the passageway between the throat and nasal cavities, which can be open or shut) is completely closed during this constriction, air pressure is built up and released suddenly, creating the signature stop “pop” when the constriction is released. Put your hand close to your lips, say /ɑpɑ/ “ahpah,” and feel the puff of air when the /p/ is released into the second /ɑ/. Two fricatives (/s/, /z/) also have an alveolar place of articulation. Fricatives have a tight constriction, but not the airtight constriction characteristic of stops. Air from the lungs flows to the fricative constriction and results in a pressure buildup behind it, which forces air to flow through the narrow constriction passageway. As air is forced through the constriction, it makes a hissing noise. This hissing noise is a signature characteristic of fricatives. Produce an /s/ for a few seconds, and you will hear the hissing noise. Try the same exercise with other American English fricatives (e.g., /f/, /θ/ “think,” /ʃ/ “shave”). Affricates have a manner of articulation that is like a stop followed by a fricative. Affricates are not just stops followed by fricatives, but a unique manner of articulation. In American English, /tʃ/ “chair” and /dʒ/ “judge” are made slightly posterior to (in back of) the alveolar place of articulation. In fact, these postalveolar affricates are the only affricates in American English. Many other languages have affricates at different places of articulation. There is one nasal consonant, /n/, produced at the alveolar place of articulation. Nasals are produced by creating a complete constriction in the vocal tract but opening the velopharyngeal port so air can flow through the nasal passageways and to the atmosphere. Other nasals include the bilabial /m/ and the velar /ŋ/ “running.” Approximant is a manner of articulation in the vicinity of the alveolar place — this type of speech sound is categorized as an alveolar in Figure 12–4. Approximants have a constriction that is not as tight as in fricatives but tighter than in vowels. The hissing noise associated with fricatives is not produced in most approximants because their constriction is not sufficiently narrow. Approximants produced at the alveolar place of articulation include /l/ “long” and /ɹ/ “right.” Note that /w/ is an approximant at the labial place of articulation, and /j/ “yes” is an approximant at the


palatal place of articulation. (The palatal place is behind the postalveolar place; see Figure 12–4.) Finally, flaps, also known as taps, are a type of stop consonant but are usually given status as a separate manner of articulation. One way to think of taps is as a very brief /d/, produced not by placing the tongue tip just behind the alveolar ridge and blocking the airstream for 100 ms, but rather by “flicking” the tongue tip against the alveolar ridge in a quick touch-and-release gesture. A good way to describe taps is by example. Words like /bʌɾɚ/ “butter” and /lɛɾɚ/ “letter,” in which the first syllable is stressed and the second syllable unstressed, are produced in American English with a tap separating the two syllables. If the stress pattern is reversed, the middle consonant sounds like a /t/.⁴

Consonant Voicing The question, “What is the voicing status of a specific consonant?” asks whether the consonant is voiceless or voiced. Speech sounds categorized as “voiceless” are produced without vibration of the vocal folds. Speech sounds categorized as “voiced” are produced with vibration of the vocal folds. In American English, the speech sounds of eight pairs of consonants are differentiated by their voicing status. Stop consonants are in pairs at each of the three places of articulation; one is voiced, and the other is voiceless (bilabial, /p/ /b/; alveolar, /t/ /d/; velar, /k/ /g/). Fricative pairs are differentiated by voicing at four of the five places at which fricatives are produced (labiodental, /f/ /v/; dental, /θ/ “think,” /ð/ “then”; alveolar, /s/ /z/; postalveolar, /ʃ/ “shave,” /ʒ/ “beige”). The single affricate pair /tʃ/ “chair” and /dʒ/ “judge” is differentiated by voicing (/tʃ/ voiceless, and /dʒ/ voiced). The fricative /h/, produced at the glottal place of articulation, is voiceless but does not have a voiced counterpart in American English. In American English, all nasals and approximants are voiced.
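As an illustration of how place, manner, and voicing combine to describe a consonant, the following Python sketch (added for this text version; the inventory is deliberately partial) stores a few American English consonants as (place, manner, voicing) triples and finds the voiceless–voiced pairs that share place and manner, as described above.

    # Illustrative sketch: a partial inventory of American English consonants,
    # each described by place, manner, and voicing.
    CONSONANTS = {
        "p": ("bilabial", "stop", "voiceless"),
        "b": ("bilabial", "stop", "voiced"),
        "t": ("alveolar", "stop", "voiceless"),
        "d": ("alveolar", "stop", "voiced"),
        "k": ("velar", "stop", "voiceless"),
        "g": ("velar", "stop", "voiced"),
        "s": ("alveolar", "fricative", "voiceless"),
        "z": ("alveolar", "fricative", "voiced"),
        "ʃ": ("postalveolar", "fricative", "voiceless"),
        "ʒ": ("postalveolar", "fricative", "voiced"),
        "h": ("glottal", "fricative", "voiceless"),  # no voiced partner
        "m": ("bilabial", "nasal", "voiced"),
        "n": ("alveolar", "nasal", "voiced"),
    }

    def voicing_pairs(inventory):
        """Return (voiceless, voiced) pairs that share place and manner."""
        pairs = []
        for a, (place_a, manner_a, voicing_a) in inventory.items():
            for b, (place_b, manner_b, voicing_b) in inventory.items():
                if (place_a, manner_a) == (place_b, manner_b) and \
                        voicing_a == "voiceless" and voicing_b == "voiced":
                    pairs.append((a, b))
        return pairs

    print(voicing_pairs(CONSONANTS))  # [('p', 'b'), ('t', 'd'), ('k', 'g'), ...]

In this partial inventory, /h/ comes back without a partner, matching the statement above that it has no voiced counterpart in American English.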

Consonants of American English Compared With Consonants of Other Languages Figure 12–4 shows 61 consonant symbols available to represent consonants used in languages of the world. American English uses 26 of these. The 26 consonants used in American English are an average number among languages of the world, similar to the number of consonants in German, Italian, Norwegian, and Turkish. Languages such as Estonian and Bulgarian have consonants that number in the upper 30s to low 40s.

⁴Taps are much less frequent for words like “butter” and “letter” in British English, in which the consonant that separates the two syllables is likely to be a full /t/ (/bʌtə/).


That’s a Lot of Consonants As with vowels, the 61 consonant symbols underestimate the total number of consonants used in languages of the world. There are two reasons for this, one obvious and the other less so. First, the consonant symbols shown in Figure 12–4 are selected, as noted in the text. This decision was made to simplify the discussion of consonants; this is the obvious reason. The less obvious reason is that there are certainly consonant sounds in other languages that have not yet been identified, and there are variants of consonant symbols shown in Figure 12–4 that are the same phoneme category but have slightly different sounds. Such variants are called “allophones” of the phonemic category. For example, the stop consonant /t/, a phoneme category in American English, is produced in (at least) two ways. One is called aspirated (symbolized /tʰ/; see the section “Broad Phonetic Transcription”), and the other unaspirated (usually symbolized as /t/). The aspirated /tʰ/ occurs when the /t/ begins a word, as in “tough”; the unaspirated /t/ is usually found at the end of words such as “cat.” These allophones of the /t/ phoneme are all consonant sounds in American English. If you are tallying up the total number of consonant sounds in languages of the world, the number is certainly greater than the 61 symbols shown in Figure 12–4. Peter Ladefoged, the great British-American phonetician, thought there may be as many as 800 consonant sounds produced throughout languages of the world.

The green boxes in Figure 12–4 enclose consonants from other languages. Clearly, there are several places of articulation that are not used in the phonetic inventory of American English. Good examples are the uvular (at the back end of the soft palate) and pharyngeal stops and fricatives that are prominent in Arabic languages. Also, places used in English but restricted to one or two manners of articulation are used in other languages for an additional manner of articulation. German and Hebrew, for example, have velar fricatives; American English does not. Other examples are found in Figure 12–4. The shaded boxes in Figure 12–4 have an interesting story. These are articulations that are considered “impossible” for consonant production. For example, a pharyngeal (place) nasal (manner) is considered impossible. This is because the necessary physiological condition for a nasal — airflow through the velopharyngeal port to the nasal cavities — cannot be met if the place of articulation is in back of the velopharyngeal port. In this case, the airflow is blocked before it can reach the velopharyngeal port; it is therefore not possible to produce a pharyngeal nasal.

Clinical Implications of Phonetic Transcription The relative density or sparseness of a vowel system may affect a speaker’s ability to master the vowel system of a second language. American English, for example, has a high-front vowel pair /i/-/ɪ/ (“beat”-“bit”) and a high-back vowel pair /u/-/ʊ/ (“kook”-“cook”) that are notoriously difficult to learn for speakers whose native language does not have the vowels /ɪ/ and /ʊ/. American SLPs can offer services to people who are not native speakers of English and who want to improve their English pronunciation. Knowledge of different vowel systems and the IPA symbol system for vowel transcription is important for this “accent reduction” therapy. A component of the therapy is likely to be improvement of the client’s ability to both perceive and produce the difference between vowels, such as the /i/-/ɪ/ distinction, when only /i/ is part of the vowel system of the client’s native language. Let’s turn the issue around and consider a native speaker of American English and her attempt to learn Swedish. Swedish has a denser vowel system than American English; there are between 15 and 17 vowels in Swedish. Swedish has four vowel pairs in which the contrast between them is based primarily on lip rounding versus no lip rounding. Figure 12–1 shows the American English vowel symbol /i/ to be very close to the symbol /y/, which is a rounded version of /i/ (and, in many Swedish dialects, a little lower and more forward than /i/). Similarly, Swedish /e/, also a vowel in American English, contrasts with its Swedish rounded version /ø/. (Try prolonging the vowels /i/ and /e/, then round your lips without moving your tongue — you will hear something like the Swedish /y/ and /ø/, respectively.) A native speaker of American English who is learning to speak Swedish might benefit from a Swedish-speaking SLP who is well trained in phonetic transcription and its application to vowels. An understanding of the American vowel system may also be useful to an SLP who records vowel errors in children for the purpose of making a diagnosis between a speech delay of unknown origin and a speech disorder called childhood apraxia of speech (Chapter 15). Speech delay of unknown origin rarely has vowel errors as a prominent problem; vowels are mastered relatively early in the course of speech sound


development, even among children with delayed mastery of consonants. In childhood apraxia of speech, vowel errors may be a key diagnostic sign of the disorder. An accurate understanding and application of IPA transcription for vowels are required to describe a child’s vowel inventory and to compare that inventory to age expectations for vowel development. Knowledge of consonant phonetics is directly relevant to the clinical practice of SLPs. SLPs must provide a transcription record of the sound patterns produced by children who are evaluated for possible speech delay. “Speech delay” is a term used in the evaluation of children’s speech to describe speech sound development that significantly lags the expected sound skills at a given age. For example, studies of typical speech sound development have established the sounds that should be mastered by age 5 years. The speech of a 5-year-old child who is evaluated for speech delay requires an accurate phonetic transcription of his correctly and incorrectly produced consonants. The inventory of correct and incorrect consonants can then be compared to the speech sound mastery of a typically developing child of the same age. Not only is a skilled transcription of consonants required for this comparison, but the universal nature of the IPA symbols allows the phonetic transcription of a child’s vowels and consonants to be understood by any SLP.

Broad Phonetic Transcription The phonetic transcription described to this point in the chapter is called broad transcription. The phonetic symbols represent categories of sounds that often function as phonemes. Narrow phonetic transcription is a kind of fine-tuned transcription of these broad symbols. Narrow transcription can be especially useful in clinical settings. For example, in broad transcription, a speech sound is either an /s/ or an /ʃ/ “shave.” In clinical phonetics, however, children and adults often produce sounds that are neither /s/ nor /ʃ/ but something in between the two; or, a sound may be recognized as an /s/ or /ʃ/ but not as a “good” version. How does the IPA handle such occurrences? The basic symbol system of the IPA is supplemented by a series of symbols called diacritics, which allow narrow transcription. These symbols are meant to designate subtle changes in articulation that make a speech sound “different” from a “good” version of the sound, but not to the extent that the sound belongs to a different category. Diacritic symbols are almost always “attached” to the phonetic symbols described earlier, to indicate the subtle articulatory changes heard by a transcriber. The current chapter is a first introduction to phonetic transcription and cannot delve into the


intricacies of diacritics. But one example can make the transcription process clear. Suppose you are a trained phonetician and you hear a speech sound that seems like an /s/, but made with the tongue tip placed very close to the upper teeth. The speech sound is not interdental like /θ/ “thin” but seems close to it (see Figure 12–4). How do you transcribe this sound? One way is to use a diacritic symbol. Diacritic symbols are appended to an IPA vowel or consonant symbol to indicate a subtle modification of the speech sound. An /s/ that sounds too close to the teeth is said to be dentalized, or more forward than the expected alveolar place of articulation. The diacritic symbol for “dentalized” is a small bridge-shaped mark placed beneath the symbol, and the sound described here is transcribed as /s̪/ (dentalized /s/). The “dentalized” diacritic can be used with (in theory) any sound but is especially relevant to other alveolars such as /t/, /d/, and /n/. A dentalized /d/, for example, is transcribed as /d̪/. In-depth presentations of phonetic transcription and diacritics are available in Bauman-Waengler (2016) and Shriberg et al. (2019).
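Because a diacritic is a small mark attached to a base symbol, computer-based transcription usually encodes it as a Unicode combining character. The short Python sketch below (an added illustration, not from the original text) attaches the IPA dental diacritic — U+032A, COMBINING BRIDGE BELOW — to a base symbol to mark a dentalized production.

    # Illustrative sketch: mark a dentalized production by attaching the IPA
    # dental diacritic (U+032A, COMBINING BRIDGE BELOW) to a base symbol.
    DENTAL_DIACRITIC = "\u032a"

    def dentalize(symbol):
        """Return the base symbol with the dental diacritic attached."""
        return symbol + DENTAL_DIACRITIC

    print(dentalize("s"))  # s̪  (dentalized /s/)
    print(dentalize("d"))  # d̪  (dentalized /d/)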

Chapter Summary The IPA is a transcription tool that allows listeners to use a universal set of symbols to represent heard speech sounds. The tool is applicable to any language, and it attempts to represent all the sounds known to occur in languages of the world. Phonetic transcription is a highly developed skill. Vowels are described in phonetic terms along a high-low dimension (the distance between the tongue surface and the roof of the mouth, or hard palate), a front-back dimension (the advancement of the tongue toward the front of the mouth versus the retraction of the tongue toward the back of the mouth), and a lip rounding dimension (rounded versus unrounded lips). Vowels vary among different dialects in the United States, and in the dialects of other countries as well. Listeners can hear differences in the pronunciation of a vowel even when they assign the different pronunciations to the same vowel category. The vowel system of American English is different from the vowel systems of many other languages, including Greek, Spanish, and Swedish. Consonants are represented by IPA symbols that vary along the dimensions of place of articulation, manner of articulation, and voicing status. At a given place of articulation (such as alveolar), consonants are produced with different manners of articulation (such as stops, fricatives, affricates, nasals, and approximants).


American English consonants represent only a small proportion of consonants in the languages of the world. IPA transcription is a useful clinical tool for SLPs so they can use a universal symbol system to document correct and incorrect speech sounds in individuals with speech and hearing disorders. Diacritic symbols are used to show subtle modifications in the main phonetic symbols of the IPA.

References Bauman-Waengler, J. (2016). Articulation and phonology in speech sound disorders: A clinical focus (5th ed.). Boston, MA: Pearson Education. Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Oxford, UK: Blackwell. Shriberg, L. D., Kent, R. D., McAllister, T., & Preston, J. L. (2019). Clinical phonetics (5th ed.). Boston, MA: Pearson Education.

13 Typical Phonological Development Introduction This chapter presents information on typical phonetic and phonological development. As defined in Chapter 3, phonology is the study of the sound systems of languages. Casual observation of small children plus years of scientific research point to a common conclusion — children do not master all speech sounds at the same time. There is no mystery that some speech sounds are easy for the developing child, and some are very difficult. There is continuing mystery about why this is so. The continuum of speech sound difficulty in the typically developing child is not the whole story of speech sound development. Phonology is the study of the sound system of a language. A speech sound in a particular language is linguistically effective because it plays a role relative to other speech sounds in the language. The speech sound is part of a system. A phoneme is defined as a speech sound that changes the meaning of a word when it replaces another speech sound in the same position in the word. For example, in English, the speech sounds “b,” “p,” and “f” are phonemes because word examples can be identified in which meaning is changed when the sounds replace each other at the beginning of the word

(“beer” versus “peer” versus “fear”). More precisely, a phoneme includes a class of sounds, all of which are treated as belonging to the same category (i.e., the same phoneme). For example, in English, the “b” sound is sometimes produced with the vocal folds vibrating throughout the entire sound, but at other times, vocal fold vibration may be delayed until just after the lip contact is released. These two ways to produce a “b” are phonetic variants or allophones of the phoneme “b.” Other allophones of English “b” can be defined as well. In saying the word “beer,” for example, a speaker of American English may produce the “b” as if he or she is swallowing the sound (the way some country-western singers might produce the sound for extra emphasis in a song, or to identify their dialect as “real country”). No matter, in English, the swallowed “b” is still an allophone of the “b” phoneme. All three of the “b” allophones just described — the vocal folds vibrating throughout the entire sound, vocal fold vibration delayed until just after the lip contact is released, and the country-western swallow — are members of the same phoneme. It does not matter which one starts the word “beer”; the meaning of the word will be clear to a native speaker of American English. Allophones are an important part of the “system” of sounds in a language. The question of how children


learn this aspect of phonology is made more complicated, and more interesting, by different languages having very different phoneme/allophone relationships. For example, in Uduk, a language spoken in parts of Ethiopia and Sudan, the country-western-type “b” and the “b” with continuous vocal fold vibration are members of different phoneme categories (Ladefoged & Maddieson, 1996). Swapping these sounds before the same vowel will create different words in Uduk but not in English. Somehow a child must learn not only the phonemes of her language but also the various allophones included within each phoneme category. The child must also learn the phonotactics of her language. Phonotactics concern the sequences of sounds that form words. In a particular language, some sequences are allowed to form words; others are disallowed. These word-form rules are called phonotactic constraints. In English, a word can start with an “l,” “r,” “s,” and so on, but cannot start with the “ng” sound (the sound at the end of the word “rang”). The “ng” sound can occur, however, at the beginning of words in several languages (such as Swahili, a language spoken in a good portion of East Africa, and Mandarin Chinese). The “ng” sound is a phoneme category in both English and Swahili, but the phonotactic constraints on the sound are different in the two languages. Finally, the child must learn the prosodic characteristics of the language, often considered as a component of phonology. Prosody includes the melodic and rhythmic features of spoken language. Of particular interest for the present chapter is the role of prosody in languages such as English, where multisyllabic words — words with more than one syllable — have a lexical stress pattern. The lexical stress pattern of a word indicates which of the multiple syllables receive linguistic stress and which are unstressed. In the English word “cinnamon,” for example, the first syllable is stressed, and the next two are unstressed (dictionary entry: sin-uh-muhn, with the first syllable, “sin,” stressed). Say the word to yourself several times, and you will see that the first syllable is more “prominent” as compared to the second and third syllables. Speakers stress syllables by producing them with slightly longer duration, higher pitch, greater loudness, and more precise articulation as compared to unstressed syllables. These factors combine to make a stressed syllable “stand out” for a listener — this is what is meant by the syllable being “more prominent.” Now say the word with stress on the second syllable — “sin-nah-muhn.” Strange, isn’t it? Children must learn the stress characteristics of multisyllabic words. Lexical stress patterns vary across languages, and this crosslinguistic variation often contributes heavily to what

we hear as an “accent” when a nonnative speaker of English produces multisyllabic English words. Additional information on the development of lexical stress is not included in this chapter. The bulk of this chapter is devoted to the pattern of sound learning in typically developing children. Some material on the learning of phonotactic constraints and phonological processes is also presented. It is important to keep in mind that the “typical” pattern presented in this chapter is an idealized description of normal phonological development. Not every typically developing child follows this pattern — there is a lot of individual variation in “normal” phonological development (Vihman, 2004). Finally, comments are made at the end of the chapter on the interaction between vocabulary growth and phonological development. Recent scientific developments suggest that development of the sound system is not a foundation for the development of words. Rather, word learning may be the foundation for the development of phonology.

Phonetic and Phonological Development: General Considerations The development of the speech sound system goes by different names. It is variously called speech sound development, articulatory (phonetic) development, and phonological development. Two of the terms, “speech sound development” and “articulatory (phonetic) development,” can be used interchangeably; “phonetic development” is used in this chapter to refer to these two terms. “Phonological development” refers to something different from phonetic development, as described below. The use of phonetic versus phonological development is not an academic exercise — it can have important implications for the diagnosis and treatment of developmental speech sound disorders (Chapter 15).

Phonetic and Phonological Development What is the difference between speech sound, articulatory (phonetic), and phonological development? The terms “speech sound development” and “articulatory (phonetic) development” share a similar meaning, as discussed later. We use the term “phonetic development” as the cover term for this aspect of speech sound development. The primary distinction for this discussion is between phonetic and phonological development.


Phonetic Development Phonetic development is the sequence of mastering the articulatory movements, positions, and shapes required to produce speech sounds. Phonetic development reflects the maturation of speech motor skills, also called articulatory skills. Can a child position and shape the tongue to produce an /s/? Can he move the tongue sufficiently forward to produce an /i/? Questions like these can be asked about any speech sound in a language.

Phonological Development Phonological development refers to the role of speech sounds in the sound system of a language. Rather than asking if a child has the speech motor capability to produce an /s/, the question is, does the child understand the role of /s/ in the sound contrasts that can distinguish words, the role of /s/ in the morphophonemic aspects of a language, and the allowable sound sequences for word formation? Examples of the three aspects of phonological development mentioned in the previous paragraph and their differences from phonetic development are as follows:

• A young child produces a good /s/ for words

such as “sip,” “Sue,” and “see,” but produces the same /s/ for words such as “ship,” “shoe,” and “she.” The use of /s/ for /ʃ/ may not indicate an absence of speech motor control skills (i.e., a phonetic issue) for /ʃ/, but rather the absence of recognition of the /s/-/ʃ/ contrast as phonemic. In this view, the /s/-/ʃ/ contrast has not yet become one of the phonemic contrasts in the phonological system of a child’s native language.

• A young child produces a good /s/ and a good /z/ at the end of words but does not use the sounds appropriately when marking plurals for words such as “tacks” and “tags.” The child produces these words as /tæks/ and /tægs/, when they should be /tæks/ and /tægz/. The morphophonemic rule (that is, the interaction between morphemes and phonemes) in the phonological system of English is that plural -s is phonetically /s/ when it follows a voiceless stop (the “t” in “tack”) and /z/ when it follows a voiced stop (the “g” in “tag”). The error of applying the /s/ for the plural in both words is not due to a speech motor control problem, as evidenced by the child’s ability to produce


words like “fuss” and “fuzz” with a good /s/ and /z/, respectively. The morphophonemic rule is phonological, not phonetic.

• A child produces word forms with word-initial “ng” /ŋ/. The word forms are protowords (phonetically consistent forms) discussed in Chapter 5. These are phonetic sequences that are used consistently to identify objects, people, or possibly actions, but are not “real” words in the child’s native language. The child’s use of word-initial “ng” violates the phonotactic rules of English. In English, “ng” cannot initiate a word but can appear in word-medial and word-final positions (as in “penguin” and “sing”).

Is it meaningful to consider a speech sound difference from the “correct” adult form as phonetic versus phonological? Many clinicians and scientists believe so. Typical speech development includes errors in individual sounds as well as patterns of sound errors, an example of the latter being the deletion of word-final consonants. When a speech sound error is regarded as phonetic in origin, there is a tendency to attribute the error to immature speech motor skills. In this view, speech motor skills for correctly produced speech sounds are sufficiently developed for some sounds, but not for others. Patterns of speech sound errors in typical speech development suggest an alternative view for the origin of the errors. Take the case of a typically developing child who says “da” for “dog,” “ka” for “cat,” and “doh” for “those.” The phonetic view considers the child’s errors as articulatory problems (immature speech motor skills) in the production of the individual sounds /g/, /t/, and /z/ in the word-final position. In the phonological view of these errors, the sounds are omitted as a result of the phonological process of word-final consonant deletion. Phonological processes are regarded as cognitively based language rules. The child probably has the speech motor skill to produce the sounds but produces the intended consonant-vowel-consonant (CVC) words as CV syllables to simplify the task of learning to produce the words. “Da” may not be correct phonetically, but the child’s use of this CV form clearly means “dog.”
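The rule-like character of these patterns can be made concrete with a small sketch. The Python code below is an added illustration (the transcriptions are simplified strings, not full IPA): one function applies the simplified English plural rule described in the second example above (/s/ after a voiceless final sound, /z/ after a voiced one, ignoring the extra-syllable form used after sibilants), and the other applies the phonological process of final consonant deletion, turning CVC target forms into CV forms.

    # Illustrative sketches of two patterns discussed above. The transcriptions
    # are simplified strings rather than full IPA.
    VOICELESS_FINALS = set("ptkf")  # simplified set of voiceless final sounds

    def plural_form(stem):
        """Attach /s/ after a voiceless final sound and /z/ after a voiced one
        (simplified; ignores the form used after sibilants, as in "buses")."""
        return stem + ("s" if stem[-1] in VOICELESS_FINALS else "z")

    def delete_final_consonant(cvc):
        """Apply the phonological process of final consonant deletion:
        a CVC target form is produced as CV."""
        vowels = set("aeiou")
        return cvc[:-1] if cvc and cvc[-1] not in vowels else cvc

    print(plural_form("tæk"))             # tæks  ("tacks")
    print(plural_form("tæg"))             # tægz  ("tags")
    print(delete_final_consonant("dag"))  # da    (the "da" for "dog" example)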

Typical Speech Sound Development Figure 13–1 is a representation of the ages at which children learn the speech sounds of English. The process of speech sound development in a typically developing child is not predictable in its details. From child



Figure 13–1.  A schematic summary of speech sound learning in typically developing children. The left-hand edge of each bar indicates the age at which about half of typically developing children produce the indicated sounds correctly, and the right-hand edge is the age at which most children have learned correct production of the sound (90% to 95% of tested children). The right-hand arrows extending past 48 months indicate that mastery of some sounds (laterals, rhotics, fricatives, affricates, consonant clusters) may extend, on average, well past 4 years of age.

to child, there is a fair degree of variation in the order in which sounds are learned, and the ages at which the sounds are “mastered.” For a large group of typically developing children, however, general trends in speech sound development can be identified. In fact, the steps in speech sound mastery that are regarded as “typical” enjoy broad agreement across studies (see summary in Smit, Hand, Freilinger, Bernthal, & Bird, 1990; and more recent summaries in Stein-Rubin & Fabus, 2012; and Bauman-Waengler, 2015). Figure 13–1 shows these trends. Studies on the developmental course of speech sound learning have included as few as 90 children (Bricker, 1967) and as many as 997 (Smit et al., 1990). The ages shown in Figure 13–1 reflect the fact that the bulk of speech sound learning takes place between the ages of 2 and 4 years, even though learning begins before age 2 years and continues past age 4 years. The left-hand edge of each bar in Figure 13–1 indicates the age at which about half of typically developing children produce the indicated sounds correctly, and the right-hand edge is the age at which most children have learned correct production of the sound. For example, roughly half of typically developing children produce

stop consonants correctly a little before 24 months, and nearly all children produce all stops correctly just after 48 months. Vowels, diphthongs, nasals, and glides are learned early and probably mastered no later than shortly after the third birthday (36 months). Stops are also learned early but may have a lengthier period of development than vowels, nasals, and glides. The mastery of liquids ​ — the “l” and “r” sounds — lags vowels, diphthongs, nasals, glides, and stops in two ways. First, the age at which half of typically developing children produce liquids correctly is nearly a year later than the age for the earlier-mastered sounds (compare in Figure 13–1 the starting age for liquids to that for the three sound categories shown above the liquids). Second, the development of liquids may extend well beyond 4 years of age, as indicated in Figure 13–1 by the arrow pointing past the 48-month landmark. Mastery of fricatives and affricates lags that of liquids slightly and extends well past 48 months. Finally, correct production of consonant clusters, such as the “sp” sounds in “spot,” the “skr” sounds in “scratch,” the “pl” sounds in “play,” and the “rst” sounds in


“first,” begins to be mastered at around 3 years of age and may not be fully mastered until well past 4 years of age. The last speech sounds acquired throughout the course of sound development are often referred to as “The Late Eight” (Bleile, 2018). These sounds include voiceless and voiced “th” (/θ/, /ð/), voiceless “s” and voiced “z” (/s/, /z/), “l” (/l/), “r” (/r/), “sh” (/ʃ/), and “ch” (/tʃ/). As Bleile notes, the majority of children with speech sound disorders have errors on one or more of these sounds. Some of the late-eight errors may be present past 8 or 9 years of age, the absolute upper limit for typical development of the sound system. Many children with these errors correct them spontaneously — the errors are “normalized” without therapy. A small number of younger and older teenagers and even some adults continue to make these errors, most likely on /s/, /z/, /l/, and /r/. Late-eight errors that persist into the teenage and adult years are called persistent or residual errors (Flipsen, 2016).
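As a toy illustration (added here; not a clinical tool and not part of the original text), the late eight can be represented as a set and used to flag which of a child’s error sounds belong to this late-acquired group.

    # Illustrative sketch: the "late eight" listed in the text, and a helper
    # that splits a set of error sounds into late-acquired versus other sounds.
    LATE_EIGHT = {"θ", "ð", "s", "z", "l", "r", "ʃ", "tʃ"}

    def sort_errors(error_sounds):
        """Separate error sounds into those within and outside the late eight."""
        late = sorted(s for s in error_sounds if s in LATE_EIGHT)
        other = sorted(s for s in error_sounds if s not in LATE_EIGHT)
        return late, other

    late, other = sort_errors({"s", "r", "k"})
    print("late-eight errors:", late)  # ['r', 's']
    print("other errors:", other)      # ['k']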

Determination of Speech Sound Mastery in Typically Developing Children The arrangement of speech sounds in Figure 13–1, from top to bottom, is from the earliest- to latest-appearing sounds in the typical child’s development of English speech sounds. This pattern of sound development can be considered a kind of “average” developmental pattern; as noted earlier, many departures from this “average” pattern are normal and not cause for concern. With this in mind, it is useful to consider how these average patterns were determined, and why it is important to be familiar with an average developmental sequence, even if it does not represent every typically developing child. The basic research strategy for obtaining the information summarized in Table 13–1 is to select a set of words requiring a child to produce a target sound (e.g., the stop “b” or fricative “s”) in two or three positions-in-word. The term “position-in-word” is typically reserved for consonant production and refers to the location of a consonant as word-initial, word-medial, or word-final. For example, the word “baby” contains the “b” sound in the word-initial and word-medial positions, and the word “tub” has the “b” sound in the word-final position. The decision to test the production of a given sound in different positions-in-word emerged from clinical experience. This experience suggested that a child’s ability to produce a sound correctly in any one position did not necessarily mean that he could produce the sound correctly in other positions.


Table 13–1.  Example of a Transcription Analysis of a Speech Sound Production Task

                   Word-Initial    Word-Medial    Word-Final
Target /b/
  “baby”           beɪpi           beɪpi
  “tub”                                           tʌp
Target /s/
  “sun”            θʌn
  “whistle”                        wɪfə
  “rice”                                          wɑɪf

Possible Explanations for the Typical Sequence of Speech Sound Mastery Two major explanations for the sequence of sound mastery shown in Figure 13–1 have been considered in the research and clinical literature. These explanations are not mutually exclusive; they may both explain, in small or large ways, the early learning of sounds such as vowels, stops, and nasals as compared to the later learning of sounds such as liquids and fricatives and of sound sequences such as consonant clusters. One explanation concerns the maturation of speech motor control capabilities. The other explanation concerns the maturation of auditory mechanisms for speech sound identification.


Maturation of Speech Motor Control Children do not control their posture, the use of their hands, or any other aspect of movement or body positioning in an adult-like way. The movements of an 18-month-old child are immature, becoming more refined and accurate over many years. As noted many years ago by the famous speech scientist R. H. Stetson (1872–1950), speech production is a collection of articulatory movements made audible (Stetson, 1951). Speech movements result in positions and shapes of articulatory structures. For example, an adult-like /s/ ​ — a “good” /s/— requires not only movement of the tongue to the correct location within the vocal tract, but also an overall positioning and shaping of the tongue. For the remainder of this chapter, the term “speech movement” is used to denote these aspects of motor control. Like other movements of body structures, movements of the speech mechanism develop and become increasingly adult-like as the child matures. The term “speech motor control” denotes the concept of nervous system mechanisms (and the muscles they control) required for the execution of speech movements. Maturation of speech motor control refers to the way in which it changes over time from infant to adult-like control capabilities. Maturation of speech motor control is necessary for an individual to transition from infant- and toddler-like sound production to the fully intelligible speech of the typical adult. For the child with less mature speech motor control, some speech movements are likely to be challenging, whereas others may be less so and perhaps even simple. Scientists have argued that fricatives and liquids require rather precise and difficult movements and positions and are therefore mastered later in development, when the child’s speech motor control is “up to the task.” Vowels, nasals, and stops are believed to require relatively simple speech motor control capabilities and can be produced accurately by younger children. According to some studies, very young children who are just beginning their speech sound development may avoid producing words composed of “difficult” sounds such as fricatives and liquids (Vihman, 2004). For example, at 2 years of age, a child may produce words such as “dog,” “cat,” “poppa,” and “no,” but not words such as “sun,” “shell,” and “rag.” Does the child lack the latter three words in her lexicon (have they not yet been learned?), or does she have the words but avoid saying them because the component sounds (such as /s/, /ʃ/, and /r/) require speech motor control abilities that are too advanced for her level of development? Many toddlers who do not say certain words

know them as demonstrated in comprehension tasks. This knowledge is not yet useful for expressive (production) language. What is the assumed difference between early versus advanced speech motor control skill? Figure 13–2 shows the difference in tongue shape for a sound mastered early (/t/, green surface) and one mastered late (/s/, black surface). The view is from the front looking back into the vocal tract, as if the mouth and teeth are transparent to reveal the contrasting tongue shapes, both of which are made at the alveolar place of articulation. Only the surface of the tongue at a specific location (a cross-section, along the width of the tongue) is shown. The shape for /t/ is more or less flat, with the tongue pressed against the alveolar ridge (the front of the hard palate) to form a complete blockage of the airstream for about 1/10th of a second. The shape for /s/ is more complex as shown by the groove in the central part of the tongue width. The groove is narrow and tight. When air flows from the lungs to the upper airways, pressure builds up behind the groove and forces air through it, creating the hissing noise typical of fricative consonants. The grooved tongue shape for fricatives is assumed to require more skilled speech

Figure 13–2. Tongue shapes for /t/ (green) and /s/ (black ). The view is from the front, looking into the mouth; the image is drawn as if the mouth and teeth are transparent to show the side-to-side shape of the tongue behind the teeth. The tongue is more or less flat for /t/ and grooved for /s/.


motor control than the flat-tongued shape for stop consonants. This may explain why /t/ is mastered earlier in speech sound development than /s/ (and in general, stop consonants are mastered earlier than fricatives).

Maturation of Perceptual Mechanisms for Processing Speech One of the great controversies in the field of speech development concerns the relationship of what the child hears to what the child produces. At a general level, the ability to hear speech sounds, and more precisely to distinguish between different speech sounds, must be related to how a child learns to produce speech. An obvious example is the effect of significant hearing loss on a child’s speech production abilities. Babbling in babies with significant hearing loss at birth appears much later than babbling in hearing babies (Eilers & Oller, 1994). To the extent that auditory sensitivity predicts the complexity of babbling (von Hapsburg & Davis, 2006), the child’s auditory capability is one of the foundations of speech sound development. Speech-language pathologists are interested in a more specific question concerning the role of hearing in speech development: Is there a close match between how the child perceives speech sounds and how he produces them? For example, do children develop the ability to produce the distinction between “w” and “r” (as in words such as “right” versus “white”) only when they can hear the difference between “r” and “w”? More generally, do sounds such as liquids and fricatives appear later in speech development because children who are just beginning to learn speech have difficulty hearing and distinguishing contrasts such as /s/ versus /ʃ/ or /r/ versus /l/? Are early-appearing sounds such as vowels, nasals, and stops explained by an early ability to hear the difference between them? Is there evidence that the perception of fricatives, or liquids, is more challenging for younger, as compared to older, children? In a more general sense, is there evidence for the development of speech sound perception skills as children are learning language? Or, is there evidence that little humans are endowed at birth with the ability to hear all distinctions between speech sounds? The answer to the major question, of whether the perception of speech sounds is a developmental process (rather than being present in adult form, at birth),


seems to be “yes.” The perception of speech sounds becomes more skilled as a child matures. Infants begin with the ability to discriminate between sounds not only of their own language but of other languages as well (see Chapter 5). At the end of the first year and into the second year of life, children’s ability to hear the difference between sounds that have a contrastive function in other languages, but not in English, diminishes and eventually disappears. At the same time, babies become especially sensitive to the important sound discriminations in their own language (Werker & Yeung, 2005).¹ This not only shows a developing speech perception ability in childhood but also the influence on perceptual skill of the specific language being spoken in the child’s environment. As reviewed by Nittrouer (2002), children’s speech perception abilities continue to evolve past the first year of life. Humans at birth are not endowed with a “fixed” set of speech-sound perception abilities. Perceptual skills change and develop. The more focused question of whether perceptual skill for specific sounds must be in place for the correct production of the sounds remains open. This is a complex issue, and the answer (at this point in time) is no more than a “maybe.” Late-developing speech sounds such as liquids, fricatives, and affricates are apparently not difficult to perceive for the child who is early in speech sound development and who has not mastered the production of these sounds (Vihman, 2004). Some very young children may have difficulty perceiving fricatives or liquids, but this is not a general finding; perception of these sounds is often intact even when production has not been mastered. This weakens the case for a close relationship between a typically developing child’s sound perception and sound production capabilities. A study relevant to this issue is provided by Dugan, Silbert, McAllister, Preston, Sotto, and Boyce (2018), and an excellent review is provided by Preston, Irwin, and Turcio (2015). This question is not only relevant to the typical development of speech but also has an important influence on explanations and treatment plans for developmental speech sound disorders. Speech sound disorders are conditions in which a child’s development of speech sounds does not occur within a typically normal age range. Children diagnosed with speech sound disorders learn sounds in the typical order (see Figure 13–1) but at a slower rate. As reviewed by Bankson and Bernthal (2004a), there is a history of research attempting

1 Werker and Yeung (2005) have reviewed evidence that in the second half of the first year of life, babies’ evolving ability to associate words with objects or actions is a “trigger” to organizing their greater sensitivity to the phonemes in their own language. In other words, early aspects of word learning prepare the baby to perceive sound categories that have importance to making distinctions between words.



to show a link between sounds misarticulated by children with speech sound disorders and their ability to perceive those specific sounds. The practical, clinical implication of this issue is that perceptual training may be a significant aspect of speech therapy for accurate production of a speech sound. Perceptual training may even occur before training to produce the sounds correctly. Some clinical research has demonstrated, in fact, that perceptual training of speech contributes to the elimination of speech sound production errors (Bankson & Bernthal, 2004a).

The Jury Is Sometimes Out and Sometimes In

Many speech-language pathologists and audiologists use auditory training as an integral part of speech therapy in children with speech sound errors. Even though the relevant research foundation for auditory training is not firm, we do not have a completely hung jury on the issue. As with the wide variability among children in learning a sound system, some children with speech sound disorders show improvement in speech production skills following perceptual training. The training may be “auditory bombardment,” in which the child hears her error sounds over and over. The theory is to “shape up” the sound category in perception so that it can be produced with reference to this stabilized perceptual category. Alternatively, therapy may consist of hearing perceptual contrasts between closely related sounds, such as the contrast between /s/ and /ʃ/, both voiceless fricatives that differ only by place of articulation. When a child is trained on the perception of this contrast and shows improvement in /s/ versus /ʃ/ production, two explanations are possible. One is that the auditory training has a general influence on auditory skills, which includes the skill required for the perception of any phonetic contrast. In this case, improvement in the /s/-/ʃ/ contrast and the resulting influence on good /s/ and /ʃ/ production is a consequence of an upgrade of overall auditory skill. A second explanation is that the auditory training is specific to the /s/-/ʃ/ contrast and does not transfer to other problem contrasts such as /w/-/r/. In either case (or maybe a little bit of both), the possibility that auditory training may contribute to improved articulation is worth the therapeutic effort.

Phonological Processes and Speech Sound Development

Phonological learning is not limited to the mastery of individual speech sounds. Children must also learn how words are formed from sequences of sounds (the phonotactic characteristics of a language), and for some languages, the proper prosodic pattern for words with multiple syllables (i.e., lexical stress patterns). In the course of learning these aspects of phonology, children make errors when producing word forms that presumably are intended to “match” the form produced by adults. For example, a child around the age of 2 years may say “da” when he sees the family pooch. The child is attempting to produce the word “dog,” which has a consonant-vowel-consonant (CVC) form. “Da” fails to match the adult form because it lacks the word-final consonant. Stated somewhat differently, the child uses a CV word form for a “target” CVC word form. It is as if, as mentioned earlier, the child simplifies the articulatory task of producing a CVC by eliminating one of the component sounds. Part of phonological development is learning the correct matches to adult word forms by eliminating these simplifying processes. Notice how the simplification of CVC to CV syllables introduces a phonotactic constraint on the child’s speech production: the adult phonology allows words to be formed by CVC sequences, but the child restricts his own word forms to CV syllables. The CV for CVC mismatch can be described as a phonological error. Within certain age limits, such errors are expected as a typical part of phonological development. An interesting characteristic of phonological development is found in the nature of these errors: they are not random but systematic. Some examples make this clear. For many children learning the phonology of their language, an error such as “da” for “dog” is not an isolated case of leaving off the final consonant of a specific word. Many typically developing children go through a phase of producing “target” CVC word forms by always (or nearly always) deleting the final consonant. “Dog” is produced as “da,” “cat” as “kah,” “bus” as “buh,” and so forth. Scientists who study normal and delayed/disordered acquisition of phonology in children claim that groups of similar errors are the result of phonological processes. A phonological process is a rule that changes the expected word form (what an adult produces) to a different, simpler form. In the current example, the phonological process is one of final consonant deletion, in which all (or nearly all) CVC word forms are changed to CV forms. The rule is not


“delete the ‘g’ at the end of the word ‘dog,’” but applies broadly across final consonants in any CVC word. Presumably, this phonological process simplifies the articulation of single-syllable, CVC word forms because CV forms are easier to produce. The phonological process of cluster reduction changes words with a CCV(C) or CVCC form (where “CC” = consonant cluster) to CVC forms. For example, the CCVC form for the word “stop,” in which the consonant cluster is the word-initial “st,” is changed to “top” (a CVC form). Or, the CVCC form for the word “best” is changed to “bes” or “bet” (again, a CVC form). In both cases, the child simplifies the articulatory requirements for the adult CCVC or CVCC forms by “reducing” two successive consonants — a consonant cluster — to a single consonant. Another phonological process is called stopping of fricatives. This process changes fricatives to stops, as when “sip” is produced as “tip.” Stopping of fricatives simplifies articulation by changing a sound thought to require advanced speech motor skill (the fricative “s”) to one of relatively simple speech motor skill (the stop consonant “t”). English words with multiple syllables have varying stress on the syllables. Some syllables are heard as “prominent,” and others are heard as weak. In the word “banana,” for example, the first syllable “ba” is produced with very little stress and therefore does not sound prominent. In contrast, the second syllable “na” receives primary stress in the word and is heard as prominent. The third syllable is also produced with little stress and is not very prominent. The stress pattern of this three-syllable word can be described as weak-strong-weak. The phonological process of unstressed syllable deletion reduces the number of syllables in a multisyllabic word by eliminating an unstressed syllable, typically the first “weak” syllable of the word. “Banana” is produced as “nana” (strong-weak), “elephant” as “elphant” (strong-strong), and “incredible” as “creble” (strong-weak).2 Presumably, the articulation of multisyllabic words can be simplified by eliminating one (or more) of the syllables from the production. Are phonological processes merely descriptions of the way children modify word forms during typical phonological acquisition, or are they the result of biological tendencies unique to the human ability to communicate by speech? The difference between these two possibilities is important. Behavioral regularities can always be described and stated as a formula or “rule”


such as “CVC becomes CV,” as in the phonological process of final consonant deletion. This description does not prove, however, the presence of a mechanism in the heads of little humans that takes a CVC word form and changes it to a CV form as a biologically directed part of speech sound development. Nevertheless, some scientists regard phonological processes as biologically based, cognitive mechanisms that guide, at least in part, the natural course of phonological acquisition. What kind of scientific evidence supports the biological view of phonological processes? One observation, that the same phonological processes are seen among children learning very different languages (Vihman, 2004), has been used to support the biological perspective. Even when different languages have different phonetic inventories and use them in different ways (i.e., have different phonemes and allophones), the same phonological processes tend to change adult word forms to simpler forms. This suggests something “universal” about phonological processes, something applying to all languages even when other components such as phonemes vary from language to language. The concept of a “universal” language characteristic is almost always an important piece of a belief in a biological basis of speech and language. Like the mastery of individual speech sounds, different phonological processes seem to have a schedule of appearance (and disappearance) during the course of phonological acquisition. For example, the process of final consonant deletion is typically seen in the early stages of acquisition and disappears before the age of 3 years (Stoel-Gammon & Dunn, 1985). The concept of disappearance of a phonological process is important: the mastery of the phonological system of a language is characterized not only by learning speech sounds but also by the elimination of processes that create mismatches between child and adult word forms. There are gains and losses along the child’s pathway to “correct” phonological behavior. Based on available data, certain processes seem to disappear early in typical speech sound development; others persist to later ages. The process of final consonant deletion has already been noted as disappearing relatively early in the course of typical phonological development. Cluster reduction may not disappear until later in phonological development, around age 3 and a half years (Cohen & Anderson, 2011). Stated otherwise, cluster reduction observed in a child’s speech past the age of 3 years is not considered atypical, but

2 When my son Ben was learning the sound system of English, his multisyllabic word productions were changed by the phonological process of unstressed syllable deletion. For a long time, he referred to “The Incredible Hulk,” one of his favorite destructive superheroes, as “creblhulk” (three syllables). I combine “hulk” with the first part of the word because he obviously treated the two words as one. Ben had no idea, at this point in his phonological development, that “incredible” was a word that could be separated from “hulk.”



deletion of word-final consonants past the age of 3 years suggests a phonological delay. Bankson and Bernthal (2004b, pp. 245–249) have an excellent review of phonological processes during the typical course of phonological acquisition.
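Because each phonological process described above can be stated as a rule that maps an adult “target” form onto a simpler child form, the idea can be illustrated with a short sketch. The following is purely an illustration, not a clinical tool: real analyses work from phonetic transcription rather than spelling, and the function names, the crude vowel test, and the handling of the example words are assumptions made for this sketch.

```python
# Minimal sketch of three simplification processes described above,
# applied to rough orthographic stand-ins for the target words.
# Not a clinical tool: real analyses use phonetic transcription.

VOWELS = set("aeiou")  # crude stand-in for a real vowel inventory

def final_consonant_deletion(word):
    """CVC -> CV: drop a word-final consonant (e.g., 'dog' -> 'do')."""
    if word and word[-1] not in VOWELS:
        return word[:-1]
    return word

def cluster_reduction(word):
    """CCVC -> CVC: reduce a word-initial consonant cluster ('stop' -> 'top')."""
    if len(word) > 2 and word[0] not in VOWELS and word[1] not in VOWELS:
        return word[1:]
    return word

def stopping_of_fricatives(word):
    """Replace a word-initial fricative with a stop ('sip' -> 'tip')."""
    fricative_to_stop = {"s": "t", "z": "d", "f": "p", "v": "b"}
    if word and word[0] in fricative_to_stop:
        return fricative_to_stop[word[0]] + word[1:]
    return word

if __name__ == "__main__":
    for target, process in [("dog", final_consonant_deletion),
                            ("stop", cluster_reduction),
                            ("sip", stopping_of_fricatives)]:
        print(f"{process.__name__}: {target} -> {process(target)}")
```

Note that the sketch turns “dog” into “do” rather than the child’s “da”; spelling cannot capture the vowel quality of real child productions, which is one reason clinical analyses rely on phonetic transcription (see Chapter 12).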

Phonological Development and Word Learning

This chapter has focused on speech sound development. Speech sounds have been treated as independent units to be learned, such as the typical age of mastery of /g/ or /s/. In addition, mastery of the sound system has been shown to be part of word learning, as in the case of the development of word forms with “allowable” sound sequences (phonotactics) and simplification of word forms (phonological processes). This description of phonological development implies a directionality between sound learning and word learning: sounds are learned, and words are built from them. In different terms, vocabulary growth is dependent on sound mastery. In recent years, this logic has been reversed. The results of several studies suggest that word learning leads sound learning. The growth of vocabulary requires finer and finer articulatory distinctions between sounds to distinguish the new lexical entries. For example, the child may add words such as “sign” and “shine” to the lexicon at about the same time, perhaps before the child is making a sharp articulatory distinction between “s” /s/ and “sh” /ʃ/. The distinction between these two sounds is one of place of articulation (alveolar for /s/, palato-alveolar for /ʃ/). These late-mastered sounds may both be produced as distortions, with imprecise places of articulation. The acquisition of these new words is thought to promote greater articulatory distinction between the sounds — to get their places of articulation correct and thus match the adult form of the words. In this sense, vocabulary growth may “shape” sound system growth. This idea has implications for speech-language therapy, as described in Chapter 15. The way in which phonological development is thought of — words built from sounds versus sounds built from words — may have important implications for treatment of developmental speech sound disorders. Gierut (2016) provides an excellent review of the relationship between vocabulary growth and mastery of the speech sound system.

Chapter Summary

Phonetic and phonological development in typically developing children includes the development of both

speech motor skills (phonetic skills) and the sound system (phonology) of a language. Phonemic contrasts, allophones of the phoneme categories, morpho-phonemic rules, phonotactic constraints, and prosody are components of phonological development. The development of speech motor control determines the development of phonetic skills, which are sometimes referred to as articulatory skills. Examples are provided of the potential independence of speech motor and phonological development; both of these contribute to the development of speech sounds. The order in which speech sounds are mastered is based on research using large numbers of typically developing children; the age ranges over which specific sounds are mastered are averages, and many typically developing children do not follow a fixed pattern of sound development. Speech sound development begins around 1 year of age and is often complete by 5 or 6 years of age; for some children, speech sound development continues until 8 or 9 years of age. Speech sounds that are mastered early in development include vowels, diphthongs, nasals, glides, and stops; laterals, rhotics, fricatives, and affricates are learned later in speech sound development. Consonant clusters are mastered later in the course of speech sound development. The order of mastery of specific sounds may be explained by the development of speech motor skill, perceptual skill, and cognitive skills. Phonological processes are important in the development of speech sounds; the processes often result in mismatches between a child’s production of a word and the “target” adult form. Many phonological processes produce “errors” early in speech sound development, but the processes actually simplify the child’s task of producing words; such processes are referred to as simplification processes. Throughout the course of speech sound development, simplification processes disappear, which allows the child to produce a word that is a good match to the adult form of the word.

References

Bankson, N. W., & Bernthal, J. E. (2004a). Etiology/factors related to phonologic disorders. In J. E. Bernthal & N. W. Bankson (Eds.), Articulation and phonological disorders (5th ed., pp. 139–200). Boston, MA: Pearson Education.
Bankson, N. W., & Bernthal, J. E. (2004b). Phonological assessment procedures. In J. E. Bernthal & N. W. Bankson (Eds.), Articulation and phonological disorders (5th ed., pp. 201–267). Boston, MA: Pearson Education.
Bauman-Waengler, J. (2015). Articulation and phonology in speech sound disorders: A clinical focus (6th ed.). Boston, MA: Pearson Education.
Bleile, K. M. (2018). The late eight (3rd ed.). San Diego, CA: Plural Publishing.
Bricker, W. A. (1967). Errors in the echoic behavior of preschool children. Journal of Speech and Hearing Research, 10, 67–76.
Cohen, W., & Anderson, C. (2011). Identification of phonological processes in preschool children’s single-word productions. International Journal of Language and Communication Disorders, 46, 461–488.
Dugan, S. H., Silbert, N., McAllister, T., Preston, J. L., Sotto, C., & Boyce, S. E. (2018). Modelling category goodness judgments in children with residual sound errors. Clinical Linguistics and Phonetics, 24, 1–21.
Eilers, R. E., & Oller, D. K. (1994). Infant vocalizations and the early diagnosis of severe hearing impairment. Journal of Pediatrics, 124, 199–203.
Flipsen Jr., P. (2016). Emergence and prevalence of persistent and residual speech errors. Seminars in Speech and Language, 36, 217–223.
Gierut, J. (2016). Nexus to lexis: Phonological disorders in children. Seminars in Speech and Language, 37, 280–290.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Oxford, UK: Blackwell.
Nittrouer, S. (2002). From ear to cortex: A perspective on what clinicians need to understand about speech perception and language processing. Language, Speech, and Hearing Services in the Schools, 33, 237–252.
Preston, J. L., Irwin, J. R., & Turcio, J. (2015). Perception of speech sounds in school-aged children with speech sound disorders. Seminars in Speech and Language, 36, 224–233.
Smit, A. B., Hand, L., Freilinger, J. J., Bernthal, J. E., & Bird, A. (1990). The Iowa articulation norms project and its Nebraska replication. Journal of Speech and Hearing Disorders, 55, 779–798.
Stein-Rubin, C., & Fabus, R. (2012). A guide to clinical assessment and professional report writing. Clifton Park, NY: Delmar.
Stetson, R. H. (1951). Motor phonetics: A study of speech movements in action (2nd ed.). Amsterdam, the Netherlands: North Holland Publishing.
Stoel-Gammon, C., & Dunn, C. (1985). Normal and disordered phonology in children. Baltimore, MD: University Park Press.
Vihman, M. M. (2004). Later phonological development. In J. E. Bernthal & N. W. Bankson (Eds.), Articulation and phonological disorders (5th ed., pp. 105–138). Boston, MA: Pearson Education.
von Hapsburgh, D., & Davis, D. L. (2006). Auditory sensitivity and the prelinguistic vocalizations of early-amplified infants. Journal of Speech, Language, and Hearing Research, 49, 809–822.
Werker, J. F., & Yeung, H. H. (2005). Infant speech perception bootstraps word learning. Trends in Cognitive Sciences, 9, 519–527.

14 Motor Speech Disorders in Adults

Introduction

Motor speech disorders are a group of speech disorders resulting from damage to the central nervous system (the cerebral hemispheres and their contents, the cerebellum, brainstem, and spinal cord) or the peripheral nervous system (nerves leading from the brainstem or spinal cord to and from muscles). The causes of this damage range across neurological diseases: degenerative diseases (such as Parkinson’s disease or multiple sclerosis), strokes, tumors, inflammatory conditions, and others. The clinical and research history of motor speech disorders is unique in the field of communication sciences and disorders. The history is unique because there is a well-accepted classification system for different types of motor speech disorders. Many other speech and language disorders do not enjoy the benefit of agreement on their classification. The classification of motor speech disorders assumes that damage to different parts of the brain produces different — and unique — speech symptoms. Table 14–1 summarizes terms from Chapter 2 that are relevant to the classification of motor speech disorders.

Classification of Motor Speech Disorders

Figure 14–1 presents a simple classification scheme for motor speech disorders in adults. Motor speech disorders include two major subcategories, one being dysarthria and the other apraxia of speech. Dysarthria is a motor speech disorder in which neurological disease results in weakness, paralysis, or incoordination among the muscles of the speech mechanism (Darley, Aronson, & Brown, 1975; Duffy, 2013). These muscle problems result in poor control of movements of the lips, tongue, jaw, velum, larynx, and respiratory structures. The poor speech movement control results in speech production problems. SLPs are able to identify most cases of dysarthria by listening to a short sample of speech. Occasionally, the speech of people with hearing impairment or of adults with apraxia of speech (see later in this chapter) may be confused with dysarthria. Apraxia of speech in adults (AAS, for “adult apraxia of speech”)1 is thought to be a planning (also called programming) disorder resulting from neurological damage within the cortex and subcortical nuclei such as the basal ganglia. Muscle paralysis, weakness,

1 In much of the clinical and research literature on apraxia of speech in adults, the acronym “AOS” (apraxia of speech) is used. However, “AAS” is used in the textbook to be consistent with more recent literature in which the adult version of the disorder is compared and contrasted with childhood apraxia of speech (CAS; see Chapter 15).


Table 14–1.  Terms From Chapter 2 That Are Relevant to the Current Chapter on Motor Speech Disorders in Adults

Central nervous system: Cerebral hemispheres and their contents, cerebellum, brainstem, spinal cord.

Peripheral nervous system: Nerves attached to the brainstem and spinal cord that carry information to and from the central nervous system; cranial nerves serve structures of the head and neck, spinal nerves serve structures of the limbs and torso.

Gray matter: Clusters of neuron cell bodies. The cerebral cortex and the cortex of the cerebellum are composed of cell bodies; clusters of gray matter in the basal ganglia and thalamus, in the brainstem, below the cerebellar cortex, and in the spinal cord are called nuclei.

White matter: Bundles of myelinated axons connecting one or more areas of gray matter.

Substantia nigra: A nucleus (group of neuron cell bodies) in the midbrain (top part of the brainstem) that manufactures dopamine, a neurotransmitter critical to control of movement.

Motor neuron: The cell body of a neuron whose axon carries information to muscles to control their contraction time, force, and coordination with other muscles. Groups of motor neurons are found in the primary motor cortex and in other regions of the central nervous system such as the basal ganglia, brainstem, and spinal cord.

Upper motor neuron: The pathways from the cortical motor neurons to motor nuclei in the brainstem or spinal cord.

Lower motor neuron: The pathways from the motor nuclei in the brainstem or spinal cord to muscles of the head and neck (via cranial nerves) and limbs and torso (via spinal nerves).

Figure 14–1.  A simple classification scheme for motor speech disorders in adults. The figure divides motor speech disorders into dysarthria (subtypes: flaccid, spastic, ataxic, hypokinetic, hyperkinetic, mixed, and unilateral upper motor neuron; associated with weakness, paralysis, or incoordination) and apraxia of speech (a planning disorder with no muscle deficit).


and/or incoordination are not thought to be present in AAS. Rather, the patient has difficulty with the plan for production of the utterance. The planning deficit may include problems with the order of consecutive syllables in an utterance and problems with instructions from the cortex for the correct timing of muscle contractions throughout an utterance.

Dysarthria

The problem in dysarthria is thought to be limited to neuromuscular execution. The speech disorder is not thought to result, in whole or in part, from problems with the symbolic component of language, as in aphasia (see Chapter 9). In dysarthria, the patient knows what she wants to say and plans the utterance in a normal way, but fails to produce it normally because of the muscle problems noted earlier.

Subtypes of Dysarthria

The subtypes of dysarthria in Figure 14–1 constitute the Mayo Clinic system for classification of dysarthria. Darley, Aronson, and Brown (1975), the clinician-scientists who developed the classification system while working at the Mayo Clinic, believed that each subtype had a unique “sound” that could be related directly to the location of damage within the nervous system. In fact, Darley et al. (1975) claimed that in dysarthria, the sound of a patient’s speech had “localizing” value. In this view, skilled clinicians can identify the location of neurological damage by listening to a patient’s speech. In the 1960s, when Darley et al. developed the classification system, imaging techniques such as computerized axial tomography (CAT) scans and magnetic resonance imaging (MRI) were not available to identify the location of brain damage. Localization of neurological damage by listening to a patient’s speech was a major contribution to medical diagnosis.

The Mayo Clinic Classification System for Motor Speech Disorders

Throughout the 1960s, a monumental study of motor speech disorders took place at the Mayo Clinic in Rochester, Minnesota. Darley, Aronson, and Brown (1975) listened to tapes of over 200 patients with different


neurological diseases, and based on what they heard made precise estimates of the severity of different speech characteristics. For example, Darley, Aronson, and Brown (hereafter, DAB) knew that a very common speech characteristic in motor speech disorders was imprecise consonants. The term was meant to indicate speech in which the consonant sounds appeared to be articulated in a noncrisp, imperfect way.2 DAB also knew, from listening to many patients with motor speech disorders, that the loss of crisp consonant articulation may range from a very subtle consonant problem to an obvious loss of articulatory ability. They therefore used a seven-point, equal-appearing interval scale to record their impressions of the severity of each patient’s consonant articulation ability. The two ends of the scale are given labels — scientists call this “anchoring” the scale — with the number 1 indicating normal consonant articulation and the number 7 a very severe deviation from normal consonant articulation — severely imprecise consonants. The numbers in between these two extremes indicate different degrees of imprecise consonants. DAB made interval-scale estimates for a total of 38 characteristics of speech. The 38 perceptual dimensions were selected to represent the different components of the speech production process (speech breathing, phonation, velopharyngeal function, articulation). The selected dimensions were based on DAB’s extensive experience with motor speech disorders and the authors’ knowledge of aspects of speech most likely to be impaired as a result of neurological disease. A few of the dimensions scaled by DAB, along with their definitions (as given by Darley, Aronson, & Brown, 1969) are listed in Table 14–2. The more than 200 patients studied by DAB were not a random sample of people with motor speech disorders seen at the Mayo Clinic but were selected to represent six major disease types. These types were brainstem disease, stroke, Parkinson disease, cerebellar disease, Huntington’s disease, and amyotrophic lateral sclerosis. For each disease type, DAB summarized their perceptual analysis of the 38 dimensions with a statistical procedure designed to detect patterns among the perceptual dimensions. According to DAB’s hypothesis, each disease was expected to have unique patterns among the 38 perceptual dimensions. Because each of the diseases they studied has specific and unique locations of brain damage, DAB’s hypothesis of a unique “sound” (in the broad sense, not in the sense of a

2 Note that many of these speech sound errors were not substitutions of one sound for another, such as a clear /ʃ/ (“shave”) for /s/ (“save”) error. Documentation of these kinds of errors requires narrow phonetic transcription (Chapter 12). Sound substitutions such as /ʃ/ for /s/ are also heard in dysarthria.



Table 14–2.  Selected Perceptual Dimensions and Their Definitions

Imprecise consonants: Consonant sounds lack precision. They show slurring, inadequate sharpness, distortions, and lack of crispness. There is clumsiness in going from one consonant sound to another.

Strained-strangled voice: Voice (phonation) sounds strained or strangled (an apparently effortful squeezing of voice through the glottis).

Harsh voice: Voice is harsh, rough, and raspy.

Breathy voice: Continuously breathy, weak, and thin.

Distorted vowels: Vowel sounds are distorted throughout their total duration.

Prolonged intervals: Prolongation of interword or intersyllabic intervals.

Hypernasality: Voice sounds excessively nasal. Excessive amount of air is resonated by nasal cavities.

Monopitch: Voice is characterized by a monopitch or monotone. Voice lacks normal pitch and inflectional changes. It tends to stay at one pitch level.

Monoloudness: Voice shows monotony of loudness. It lacks normal variations in loudness.

Excess and equal stress: Excess stress on usually unstressed parts of speech, e.g., (a) monosyllabic words and (b) unstressed syllables of polysyllabic words.

Note.  These perceptual dimensions were prominent in the Mayo Clinic analysis of patients with motor speech disorders. The definitions are reproduced verbatim from Darley, Aronson, and Brown (1969). Some dimension names have been reordered (e.g., “distorted vowels” was “vowels distorted” in Darley et al. [1969]).

unique phoneme sound “problem”; see immediately above) for each of these diseases was equivalent to a hypothesis of unique-sounding speech for damage to different parts of the brain.
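DAB’s procedure, rating each of 38 dimensions on a 1-to-7 equal-appearing interval scale and then looking for patterns by disease group, can be pictured with a small, hypothetical sketch. The dimension names below are taken from Table 14–2, but the ratings, the disease groupings, and the function name are invented for illustration; DAB’s actual statistical pattern-detection procedure is not reproduced here.

```python
# Hypothetical 1-7 ratings (1 = normal, 7 = severely deviant) on a few of the
# 38 perceptual dimensions; patients and ratings are invented for illustration.
from collections import defaultdict
from statistics import mean

ratings = [
    # (disease group, perceptual dimension, rating)
    ("Parkinson's disease", "monopitch", 6),
    ("Parkinson's disease", "monopitch", 5),
    ("Parkinson's disease", "imprecise consonants", 4),
    ("cerebellar disease", "excess and equal stress", 6),
    ("cerebellar disease", "excess and equal stress", 5),
    ("cerebellar disease", "imprecise consonants", 5),
]

def profile_by_disease(data):
    """Average the ratings for each (disease, dimension) pair."""
    grouped = defaultdict(list)
    for disease, dimension, rating in data:
        grouped[(disease, dimension)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}

for (disease, dimension), avg in sorted(profile_by_disease(ratings).items()):
    print(f"{disease:22s}  {dimension:25s}  mean rating = {avg:.1f}")
```

A profile in which, say, “excess and equal stress” stands out for one disease group but not another is the kind of pattern DAB interpreted as a disease-specific (and therefore lesion-specific) “sound.”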

Based on their analysis of the perceptual data, DAB confirmed their hypothesis by identifying six unique dysarthrias. These include the first six listed in Figure 14–1. The unique subtypes were flaccid,

The Nervous System Is More Complicated Than That

Darley, Aronson, and Brown stated their hypothesis of a strong link between the location of nervous system damage (neurologists call this “site of lesion”) and speech symptoms, knowing that a given neurological disease does not have a single site of lesion. What Darley, Aronson, and Brown meant was primary site of lesion. For example, in Parkinson’s disease, the primary site of lesion is in the midbrain (mesencephalon) where cells in the substantia nigra deteriorate and die. The death of these cells deprives the central nervous system of

dopamine, a neurotransmitter critical to control of movement. Other lesion sites, however, have also been identified in Parkinson’s disease, including the cerebellum, basal ganglia, and brainstem (Joutsa, Horn, Hsu, & Fox, 2018). The lack of simple lesion-location/disease combinations applies to other neurological diseases as well. Keep this in mind when considering the idea that listening to the speech of someone with a neurological disease is a straightforward way to know where the lesion is.


spastic, ataxic, hypokinetic, hyperkinetic, and mixed dysarthria. These six dysarthrias can be described by their most severely affected perceptual dimensions, the location of the relevant brain damage, and the muscle control problems thought to produce the abnormal and distinguishing speech symptoms. The reader should refer back to Table 14–2 for explanations of the perceptual dimensions discussed later.

Flaccid Dysarthria

The distinguishing perceptual dimensions in flaccid dysarthria included breathy voice, hypernasality, and imprecise consonants. An overall impression of the typical speaker with flaccid dysarthria is one of a weak, somewhat nasal voice with weak (noncrisp) articulation.


The group of patients with flaccid dysarthria studied by DAB had damage in the brainstem (blue-shaded areas in Figure 14–2) or in the cranial nerves exiting the brainstem to innervate muscles of the speech mechanism (yellow arrows pointing from the brainstem in the direction of the muscles). Damage to the brainstem motor neurons that innervate speech muscles results in paralysis or weakness of the muscles, as well as atrophy (often called “wasting”) of muscle tissue. Similar problems occur with damage to the nerves that carry the motor neuron commands to the muscles. For example, weakness of the laryngeal muscles will prevent firm closure of the vocal folds during each cycle of phonation, producing the breathy voice noted earlier. Similarly, weakness of muscles of the velopharyngeal port will result in hypernasality; weakness of the jaw, tongue, and lips will cause imprecise consonants.

Figure 14–2.  Sagittal (from the side) view of the inner wall of the right hemisphere. The brainstem is shaded blue; yellow-orangish arrows directed away from the pons and medulla represent cranial nerves that control the muscles of the speech mechanism (muscles of the jaw, lips, tongue, soft palate, pharynx, and larynx). An arrow below the brainstem represents spinal nerves that control muscles of the limbs and torso. The corpus callosum, in red, is shown as a landmark.



Flaccid dysarthria is the result of lower motor neuron disease. “Lower motor neuron” refers to the motor nuclei in the brainstem and the cranial nerves that carry information from these nuclei to muscles. The term “lower motor neuron” contrasts with “upper motor neuron,” which is discussed later in the section “Spastic Dysarthria.”

Spastic Dysarthria

Imprecise consonants, monopitch, and reduced stress were the three most impaired perceptual dimensions in spastic dysarthria. Many of these patients also had a slow speaking rate and a strained-strangled voice quality. The extremely slow speaking rates of patients with spastic dysarthria distinguished them from patients with flaccid dysarthria (most patients with flaccid dysarthria had normal speaking rates), and the strangled, harsh voice quality was very different from the breathy voice quality of flaccid dysarthria.

Patients with spastic dysarthria typically had damage somewhere along the fiber tracts connecting motor cells in the cortex to motor neuron cells in the brainstem. Recall from Chapter 2 that the cortex consists of massive amounts of gray matter — clusters of neuron cell bodies. These cell bodies are connected to other groups of cell bodies in the brain by white matter, formed from bundles of axons (fiber tracts). In Figure 14–3, cortical motor cells for muscles of the speech mechanism such as the jaw, lips, tongue, velum, and pharynx are represented by the two upper, brown dots. The two brown dots located in the brainstem represent the motor neurons for these muscles. The yellow-orange arrows originating in the cortical cells and ending on these brainstem motor neurons represent a fiber tract (white matter) called the corticobulbar tract. Part of this fiber tract terminates in motor nuclei of the pons, another part of the tract terminates in motor nuclei of the medulla.3 Patients with spastic

Figure 14–3.  Sagittal (from the side) view of the inner wall of the right hemisphere. Two brown circles in the cortex indicate motor cells controlling articulators such as the jaw and tongue, and two brown circles in the pons and medulla indicate motor neurons that are connected directly via cranial nerves to muscles of the speech mechanism. The yellow-orangish lines connecting the cortical motor neurons and brainstem motor neurons represent the corticobulbar tract.

3 Part of the corticobulbar tract also terminates in the midbrain, where motor nuclei are located that are not associated with speech.


dysarthria studied by DAB had lesions on both sides of the brain, that is, in the left and right corticobulbar tracts. The corticobulbar tract transmits motor commands from the cortex to cells in the brainstem. Because the fiber tract is above the motor neurons in the brainstem, damage to it is referred to as upper motor neuron disease. According to DAB, damage to this fiber tract — but not to the motor neuron cells in the brainstem — results in spastic dysarthria.4 Upper motor neuron disease usually results in very specific changes to the affected muscles, such as hypertonic (excessive) muscle tone, which makes them stiff. The muscles also have overly sensitive reflexes (hyperreflexia), causing them to contract with unusual force when stretched even a small amount. Hypertonic muscles have difficulty causing movement of structures (such as the jaw), and hyperreflexive muscles result in unstable muscle contraction and movement. Thus, movement is impaired in many muscles of the speech mechanism, such as muscles of the tongue, lips, soft palate, larynx, and muscles that control jaw opening and closing. DAB believed these excessively contracted, stiff muscles were responsible for many of the primary speech symptoms of spastic dysarthria. For example, the “monopitch” characteristic of spastic dysarthria was caused by excessive stiffness of the laryngeal muscle responsible for voice pitch changes. Similarly, the “strained-strangled” voice quality was caused by excess tone of laryngeal muscles, resulting in overly tight vocal fold closure during phonation. Imprecise consonants were the result of difficulty in moving structures, such as the tongue, into the proper positions for the articulation of speech sounds. The perception of reduced stress was due to the patient’s inability to produce stress distinctions between unstressed and stressed syllables, as in words like “about,” in which the first syllable is unstressed. This inability was thought to be related to the difficulty of adjusting the aspects of speech production (e.g., pitch, loudness, and duration) that are used to create stress differences.

Ataxic Dysarthria

The primary perceptual features of ataxic dysarthria were imprecise consonants, excess and equal stress, and irregular articulatory breakdown. In English speech production, syllables alternate between ones with relatively long duration and ones with relatively short duration. For example, in the sentence, “The


party is off the hook,” there are two relatively long syllables (“par” and “hook”) and five relatively short syllables (the two “the’s”; the “ee” in “party,” “is,” and “off”). Long syllables are typically stressed and short syllables unstressed. Speakers with ataxic dysarthria tend to equalize all syllable durations in an utterance by making each syllable long and fairly loud. This is what is meant by “excess and equal stress.” “Irregular articulatory breakdown” is a term coined by DAB to capture the fluctuating nature of the speech problem in ataxic dysarthria. Speakers with ataxic dysarthria may sound fairly normal for short stretches of speech and suddenly produce a clearly dysarthric string of syllables. Listeners often describe ataxic dysarthria as drunk-sounding speech. This impression is due partially to the excess and equal stress on the speaker’s syllables, which sounds to listeners as if each syllable is being “metered out” on a strict time schedule rather than following the normal speech rhythm of alternating long and short syllables. Speech-language pathologists refer to this perceptual impression as “scanning speech” (each syllable being “scanned” carefully and then produced as if disconnected from the next syllable). The impression of drunken-sounding speech is probably also promoted by the tendency of speakers with ataxic dysarthria to produce a speech melody (intonation) with markedly exaggerated pitch changes, giving their speech an “out of control” quality. Ataxic dysarthria results from damage to the cerebellum or its connecting fiber tracts. In Figure 14–4, cell bodies within the cerebellum are represented by brown dots, and the connecting fiber tracts are shown as yellow-orange lines ending in arrowheads. These arrowheads point away from the cerebellum to other parts of the brain or are directed from other parts of the brain into the cerebellum. Damage to the cerebellum causes a number of well-known, general neurological symptoms. For example, patients with cerebellar damage have difficulty maintaining a steady rhythm, even when asked to open and close their forefinger and thumb in a repetitive, simple way. Patients also have difficulty controlling the force of their muscle contractions, which results in actions performed with either excessive or insufficient force. It is as if the patient has difficulty scaling muscle contraction to the needs of the task. This may explain the tendency of people with ataxic dysarthria to produce sequences of syllables with excess and equal stress (see Box, “Spanish and Ataxic Dysarthria”).

4 Upper motor neuron disease also refers to damage to the corticospinal tract connecting the cortex to motor neurons in the spinal cord. The damaged corticospinal tract and its possible relationship to dysarthria are not discussed further in this chapter.



Figure 14–4.  Sagittal (from the side) view of the inner wall of the right hemisphere. The arrows show the interconnections between the cerebellum and structures of the central nervous system. The arrows show that the cerebellum is connected to the spinal cord, the brainstem, and the cortex, sending and receiving information from all three major components of the CNS.

Spanish and Ataxic Dysarthria

Darley, Aronson, and Brown developed their classification system based on speech production of American English speakers with neurological disease. The reasoning for a prominent “excess and equal stress” dimension in ataxic dysarthria is solid, but only for languages (such as English) with syllable sequences that vary in a long-short-long-short pattern. Many languages have syllable sequences in which each syllable has roughly the same duration; syllables do not vary in a long-short-long-short pattern. Spanish is such a language, as are Finnish and several Asian languages (Korean, Mandarin Chinese, Japanese). Is “excess and equal stress” relevant to the dysarthria classification of a Spanish speaker with cerebellar disease? The same question can be asked of many of the speech dimensions that contribute to the Mayo classification system but do not apply in the same way across different languages. The answers to these questions are not known with any certainty; cross-linguistic studies of dysarthria are in their infancy (Kim & Choi, 2017; Liss, Utianski, & Lansford, 2013).

Hypokinetic Dysarthria

Monopitch, reduced stress, and monoloudness were the three prominent perceptual dimensions for patients with hypokinetic dysarthria. In addition, many speakers with Parkinson’s disease (PD) had imprecise consonants and “short rushes of speech,” a tendency for sudden, very rapid, and “mumbly” sequences of syllables. Speakers with hypokinetic dysarthria are also perceived as having an extremely weak, soft voice.


Hypokinetic dysarthria is associated almost exclusively with Parkinson’s disease.5 Parkinson’s disease involves cell death in a midbrain nucleus called the substantia nigra (see the blue oval in Figure 14–5 for its approximate location). Cells in the substantia nigra are of critical importance because they manufacture the chemical dopamine, which is used as a neurotransmitter in parts of the brain responsible for movement control (as well as other parts of the brain involved in memory and the experience of pleasure). Dopamine manufactured in the substantia nigra is delivered to the basal ganglia, the group of cells above the brainstem but deep within the cerebral hemispheres. The dopamine pathway from the substantia nigra to the basal ganglia is indicated in Figure 14–5 by the upward-pointing arrows. The basal ganglia play an important role in movement control; to perform that role, they need an adequate supply of dopamine. Loss of dopamine due to cell death in the substantia nigra is responsible for the movement problems in Parkinson’s disease.


The patient with Parkinson’s disease often has a resting tremor, usually in the hand but sometimes in other body structures. A resting tremor occurs when the hand is not moving; the tremor is not seen when the hand is moving to accomplish a goal, such as twisting off a beer cap. Body structures such as the arms and legs, as well as structures of the speech mechanism, have a rigid quality — they are stiff and resist movement when displaced (as when an examiner pulls or pushes on the arm or leg). Movements, when produced, are slow and small. At times, the patient has difficulty initiating movement, as if he or she is “frozen” in place. DAB thought that the top-ranked perceptual dimensions in hypokinetic dysarthria of monopitch, reduced stress, and monoloudness, as well as overall reduced loudness and imprecise consonants, could be explained by slow and small respiratory, laryngeal, and articulatory movements resulting from the loss of dopamine in the basal ganglia.

Figure 14–5.  Sagittal (from the side) view of the inner wall of the right hemisphere. The blue oval shows the location of the substantia nigra in the midbrain (mesencephalon), and the arrows show the direction of dopamine delivery to the basal ganglia, where it is used as a critical neurotransmitter in the control of movement.

5 Certain neurological diseases produce symptoms like those of Parkinson’s disease yet do not qualify for a specific diagnosis of the disease. Patients with these diseases are often referred to as having “parkinsonism.” Patients with parkinsonism may also have hypokinetic dysarthria.



Hyperkinetic Dysarthria

Hyperkinetic dysarthria is a result of several different diseases of the basal ganglia. DAB studied hyperkinetic dysarthria in two of these diseases, Huntington’s disease and dystonia, both described later in this chapter. The basal ganglia are a complex group of cells, composed of several separate but interconnected nuclei. Figure 14–6 shows a midsagittal view of the right hemisphere in which an oval indicates the approximate location of the nuclei of the basal ganglia. The region outlined by the oval is deep within the cerebral hemispheres, above the brainstem (Chapter 2, Figures 2–6 and 2–7). Damage to any one of these nuclei may produce somewhat unique neurological symptoms, far too many to cover in this chapter. Here the general characteristics are presented for the two basal ganglia diseases studied by DAB. Huntington’s disease is a genetic disorder in which movement difficulties are among the first symptoms. Later in the course of the disease, patients experience severe cognitive and psychiatric disturbances. The movement symptoms are dominated by chorea, in which a series of twitches, jerks, and sudden move-

ments give the patients the appearance of being in constant motion. Muscle tone may be continuously variable in Huntington’s disease, sometimes being normal but often ranging from excessively stiff (hypertonic) to floppy (hypotonic) in a short period of time. The movement symptoms are sometimes described as having an ataxic component — inability to control the range and force of movements, and difficulty in producing a steady rhythm. Dystonia is a basal ganglia disease in which muscle contraction builds up and is sustained with excessive force for unusually long intervals. Muscle contractions in dystonia are overly strong (hypercontraction) for the task at hand, often resulting in an unintended, sustained posture of the trunk, arm, hand, eyelids, or jaw. Hypercontraction in dystonia often occurs as an exaggeration of a purposeful movement to accomplish a task, rather than occurring at random times or at rest. For example, in oromandibular dystonia, a hypercontraction of the jaw muscles occurs when the patient begins to speak, or perhaps when he chews, but typically does not occur when the patient is not using the jaw for a specific purpose. Similarly, spasmodic dysphonia is a dystonia that affects the vocal folds by closing them

Figure 14–6.  Sagittal (from the side) view of the inner wall of the right hemisphere. The large blue oval encompasses the regions above the midbrain and below the cortex where the nuclei of the basal ganglia are located. The basal ganglia include several interconnected nuclei.


forcefully when the patient attempts to produce voice (i.e., to phonate). The forceful closing is the result of overly strong and sustained contractions produced by the muscles that close the vocal folds (see Chapter 10 and Chapter 18). This closing spasm of the vocal folds interrupts vocal fold vibration, essentially preventing the patient from producing voice. Like oromandibular dystonia, the laryngeal spasms occur when a person phonates, not at random times. The most affected perceptual dimensions in these two forms of hyperkinetic dysarthria were imprecise consonants, prolonged intervals, and variable rate in patients with Huntington’s disease, and imprecise consonants, distorted vowels, and harsh voice quality in dystonia. Both groups of patients had irregular articulatory breakdowns, as in ataxic dysarthria. The hyperkinetic dysarthria in Huntington’s disease and dystonia is thought to be the result of overly strong and sustained contractions of speech muscles, which make it difficult to move from one speech sound to the next. The sustained contractions in dystonia “hold” articulators in one position when they should be moving smoothly and quickly to the next speech sound. The constant and variable movements in Huntington’s disease are thought to prevent the muscles of the speech mechanism from making precise, consistent movements. This loss of control causes speech to sound inconsistent, with fluctuating voice and articulation characteristics.

Mixed Dysarthria

In each of the five dysarthria types summarized previously, the lesion causing the problem was thought to be in one major region of the brain. Flaccid dysarthria was a result of brainstem or cranial nerve damage (lower motor neuron disease), spastic dysarthria from damage in the cortex or the tract that carries information from the cortex to the brainstem motor nuclei (upper motor neuron disease), ataxic dysarthria from cerebellar disease, hypokinetic dysarthria from substantia nigra lesions (usually Parkinson’s disease), and hyperkinetic dysarthria from basal ganglia lesions (Huntington’s disease). Some neurological diseases are known to have damage to two or more of these major brain regions. For example, patients with multiple sclerosis (MS) often have upper motor neuron and cerebellar lesions. For these patients, a dysarthria with both spastic (upper motor neuron disease) and ataxic (cerebellar) characteristics might be expected. DAB believed their perceptual analyses of speakers with MS were consistent with a mixed, spastic-ataxic dysarthria. DAB regarded amyotrophic lateral sclerosis (ALS) as another neurological disease associated with a


mixed dysarthria. In ALS, death of motor neurons in the brainstem (lower motor neuron disease) is combined with lesions in the fiber tract connecting cortical motor cells with brainstem motor neurons (upper motor neuron disease). In this case, the mixed dysarthria is of the flaccid-spastic type. In theory, a mixed dysarthria may consist of any combination of the five major categories described previously. Whether or not patients with any combination of a mixed dysarthria (such as a flaccid-hyperkinetic dysarthria) have lesions consistent with these perceptual impressions is unknown. Surprisingly, there are no brain imaging studies linking dysarthria categories with documented site of lesion.

Unilateral Upper Motor Neuron Dysarthria

As described earlier, spastic dysarthria is thought to be the result of bilateral (both sides) lesions to the corticobulbar tract. Most of these lesions are the result of stroke, where a loss of blood flow to the fibers connecting cortical cells to brainstem motor neurons results in damage to or destruction of the fibers. There are cases, however, in which a stroke affects only one side of the brain (unilateral damage), leaving the corticobulbar tract on the other side intact and functional. In particular, some strokes may produce a loss of blood flow to a very small part of the corticobulbar tract, resulting in a typically mild and often transient motor speech disorder called unilateral upper motor neuron (UUMN) dysarthria. UUMN dysarthria was not part of the original DAB classification system for motor speech disorders. As pointed out by Duffy (2015), UUMN dysarthria was probably left out of the classification system because the speech characteristics were so mild. In addition, the mild dysarthria often resolved over a short period of time following a stroke. The mild characteristics of UUMN dysarthria include imprecise consonants, irregular articulatory breakdowns, harsh voice, and slow speaking rate (Duffy, 2015). The impression of imprecise consonants dominates the speech of persons with UUMN dysarthria.

The Dysarthrias:  A Summary

Table 14–3 lists the categories of dysarthria in the Mayo Clinic classification system, along with the prominent perceptual dimensions heard by DAB for each type. Some of the perceptual dimensions were prominent for several dysarthria types (e.g., imprecise consonants) and some were uniquely prominent in only one type (e.g., distorted vowels). Table 14–3 also includes prom-



Table 14–3.  Mayo Clinic Classification of Dysarthria With Prominent Perceptual Impressions for Each Category

Flaccid: Breathy voice, imprecise consonants, hypernasality

Spastic: Imprecise consonants, monopitch, reduced stress, slow rate

Ataxic: Imprecise consonants, excess and equal stress, irregular articulatory breakdowns

Hypokinetic: Monopitch, reduced stress, breathy voice

Hyperkinetic: Imprecise consonants, prolonged intervals, variable rate (Huntington’s disease); imprecise consonants, distorted vowels, harsh voice (dystonia)

Mixed: Any combination of the above, e.g., spastic-flaccid dysarthria (as in ALS), spastic-ataxic dysarthria (as in MS)

Unilateral upper motor neuron dysarthria: Imprecise consonants, slow speaking rate, harsh voice, irregular articulatory breakdowns (all mild)
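For readers who find it helpful, part of the table can also be pictured as a simple lookup, together with a naive matching step that mimics the “localizing” reasoning described in the surrounding text. This is only an illustration of the logic, not a clinical procedure, and as the box later in this chapter notes, listener agreement on these dimensions is limited. The dictionary restates a subset of Table 14–3; the scoring function is an invention for this sketch.

```python
# A subset of Table 14-3 restated as a lookup; the matching step is only an
# illustration of the "localizing" logic, not a clinical procedure.

PROMINENT_IMPRESSIONS = {
    "flaccid": {"breathy voice", "imprecise consonants", "hypernasality"},
    "spastic": {"imprecise consonants", "monopitch", "reduced stress", "slow rate"},
    "ataxic": {"imprecise consonants", "excess and equal stress",
               "irregular articulatory breakdowns"},
    "hypokinetic": {"monopitch", "reduced stress", "breathy voice"},
}

def rank_candidates(heard):
    """Rank dysarthria types by overlap with the impressions a listener reports."""
    heard = set(heard)
    scores = {dysarthria: len(heard & impressions)
              for dysarthria, impressions in PROMINENT_IMPRESSIONS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank_candidates(["imprecise consonants", "excess and equal stress"]))
# "ataxic" scores highest, consistent with a cerebellar site of lesion
```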

inent perceptual characteristics of unilateral upper motor neuron dysarthria. When the speech of a person with dysarthria is heard, listeners do not separate their perceptual impressions into the individual perceptual dimensions used by DAB to scale their prominence. Listeners hear a “whole” (integrated) percept. The point made by DAB, following analysis of all the perceptual dimensions and how they clustered differently in each of the dysarthrias listed in Figure 14–1 and Table 14–3, was

that listening to the speech of a person with dysarthria provided a good clue to the location of damage within the nervous system. This was the localizing value of careful listening. Some dysarthrias may have natural recovery, such as the improvement of speech during recovery from stroke. Some dysarthrias may become increasingly worse, as in degenerative neurological diseases such as multiple sclerosis, Parkinson’s disease, and ALS. In most cases of dysarthria, SLPs are effective in

Identification of Dysarthria Type by Listening: A Brief Natural History

In the mid- to late-1960s, when Darley, Aronson, and Brown were developing their classification system for dysarthria, the idea of identifying site of lesion (location of damage) by listening to a patient’s speech was a big deal. Brain imaging techniques were in their infancy, making the perceptual expertise of SLPs a significant contribution to diagnosis of site of lesion. Two developments have lessened the role of SLPs in the medical diagnosis of site of lesion, and by extension, the disease in persons with dysarthria. First, in many cases, the contemporary use of highly sophisticated imaging techniques (CAT,

MRI, positron-emission tomography [PET]) allows for detailed visualization of brain structures and accurate location of site of lesion. Second, identification of disease type from listening to speech, and the inference to site of lesion, are not very reliable. Several studies have reported a relatively low level of agreement among professionals for both individual perceptual dimensions and dysarthria type (see review in Bunton, Kent, Duffy, Rosenbek, & Kent, 2007). Nevertheless, the Mayo Clinic classification system is in contemporary use by both clinicians and scientists.


Apraxia of Speech

The other major subcategory of motor speech disorders is adult apraxia of speech (AAS). AAS is a controversial disorder, mostly because professionals have not agreed on its defining characteristics (Ballard et al., 2016). AAS has been related to lesions in various parts of the central nervous system, most often in the left cerebral hemisphere (Graff-Radford et al., 2013). It is possible that there are several types of AAS, but in this chapter the possible subtypes are not discussed. In their original description of the Mayo Clinic classification system, DAB claimed that AAS was different from dysarthria. Unlike dysarthria, the speech characteristics of AAS were not the result of paralysis, weakness, or incoordination of the speech muscles. Despite the absence of muscle problems in AAS, the patients had articulatory errors as well as other types of speech abnormalities (Darley, Aronson, & Brown, 1975). These speech abnormalities typically appeared following a stroke or surgery affecting the left hemisphere of the brain. Because muscle problems did not seem to explain AAS, the speech problems had to be explained on a different basis. The perceptual impressions of apraxia of speech included very slow speaking rate (see earlier, description of spastic dysarthria), a tendency to produce speech as if the component syllables were “pulled apart” from each other (see earlier, description of ataxic dysarthria), and imprecise consonants and vowels. In addition, when asked to produce a word or sentence, people diagnosed with AAS appeared to search for the right articulatory configuration to begin the utterance, as if they were unsure of the correct way to produce the initial speech sound(s). Patients configured their lips and tongue in a certain position, hesitated, and tried another configuration, as if searching through several attempts to “get it right” before beginning the utterance. DAB described this searching as articulatory groping. The patients also had greater articulatory difficulty when trying to produce a multisyllabic word (such as “statistical”) as compared with the single syllable at the beginning of the word (e.g., “statistical” versus “stat”). The multisyllabic word was more likely to elicit sound errors and articulatory groping (the word length effect). DAB proposed AAS as a motor speech programming problem. In this kind of disorder, the speech problems are not the result of deficits in the direct control of the speech muscles but rather in the plan for their control.


Evidence from studies of speech motor control supports the separation of neural planning (programming) processes from execution of an act (Maas & Mailend, 2012). A programming problem is something like a problem with computer code. A computer in which the hardware is in perfectly good shape does not perform its tasks correctly when the software code contains errors (see Box, “The Code: Can It Be Fixed?”). Apraxia of speech is also diagnosed in children, as discussed in Chapter 15.

The Code: Can It Be Fixed?

The analogy of a well-functioning computer running defective software to the speech-motor planning problem in AAS is based on the meaning of the term “praxis.” Praxis, a word from Ancient Greek, means the process of producing a skilled action. DAB categorized the disorder as apraxia of speech to represent the presumed deficits in that process — that is, in the plan for an articulatory sequence, just as lines of computer code are the plan for the computer’s action. Other kinds of apraxia occur in persons who have had strokes and are recovering. For example, given a toothbrush and asked to show how to use it, a patient may raise the toothbrush to his face and hesitate as if he is not sure how to continue the act of brushing teeth. Then, the patient may act but in the wrong way: he may use the toothbrush to perform a hair-brushing gesture. How does a clinician go about fixing (or reducing) the programming problem in AAS? A lot of discussion surrounds this question (e.g., McNeil, Ballard, Duffy, & Wambaugh, 2016), but at the current time the evidence for effective clinical approaches is not convincing.

Chapter Summary

Motor speech disorders in adults are a group of speech disorders caused by damage to the central and/or peripheral nervous system. Many neurological diseases are associated with a motor speech disorder. Classification of motor speech disorders was formalized by Mayo Clinic clinician-scientists in the late 1960s and early 1970s; the classification system is widely used and accepted by speech-language pathologists and researchers.


The classification for motor speech disorders was based primarily on perceptual impressions of the patient’s speech. The classification system for motor speech disorders included a group of dysarthrias, associated with problems in the control of speech muscles, and a planning disorder in which muscle control was more or less intact, but the ability to plan (program) articulatory sequences was impaired. The classification categories for dysarthria included flaccid, spastic, ataxic, hypokinetic, hyperkinetic, and mixed types. The classification category for the planning disorder was apraxia of speech. The classification is based on groups of perceptual impressions, called “perceptual dimensions,” which were scaled from normal to most severe by the Mayo Clinic clinician-scientists as they listened to audiotape recordings of a paragraph-level passage. The patients who read the passage included persons with known neurological diseases, including brainstem disease, damage to the corticobulbar and corticospinal tracts, damage to the cerebellum, damage to dopamine-producing cells in the midbrain, and damage to the basal ganglia. Patients with multiple sclerosis (damage to both the corticobulbar/corticospinal tracts, and to the cerebellum) and amyotrophic lateral sclerosis (damage to both the corticobulbar/corticospinal tracts, and to the brainstem) were also studied. The general theory of the Mayo Clinic classification system is based on the idea that by listening to a patient’s speech, a trained clinician-scientist can make a likely estimate of the location of neurological damage, and by extension the patient’s neurological disease.

References

Ballard, K. J., Azizi, L., Duffy, J. R., McNeil, M. R., Halaki, M., O’Dwyer, N., . . . Robin, D. A. (2016). A predictive model for diagnosing stroke-related apraxia of speech. Neuropsychologia, 81, 129–139.

Bunton, K., Kent, R. D., Duffy, J. R., Rosenbek, J. C., & Kent, J. F. (2007). Listener agreement for auditory-perceptual ratings of dysarthria. Journal of Speech, Language, and Hearing Research, 50, 1481–1495.

Darley, F. L., Aronson, A. E., & Brown, J. R. (1969). Differential diagnostic patterns of dysarthria. Journal of Speech and Hearing Research, 12, 246–269.

Darley, F., Aronson, A., & Brown, J. (1975). Motor speech disorders. Philadelphia, PA: Saunders.

Duffy, J. R. (2015). Motor speech disorders: Substrates, differential diagnosis, and management (4th ed.). St. Louis, MO: Mosby Elsevier.

Graff-Radford, J., Jones, D. T., Strand, E. A., Rabinstein, A. A., Duffy, J. R., & Josephs, K. A. (2013). The neuroanatomy of pure apraxia of speech. Brain and Language, 129, 43–46.

Joutsa, J., Horn, A., Hsu, J., & Fox, M. D. (2018). Localizing parkinsonism based on focal brain lesions. Brain, 141, 2445–2456.

Kim, Y., & Choi, Y. (2017). A cross-linguistic study of acoustic predictors of speech intelligibility in individuals with Parkinson’s disease. Journal of Speech, Language, and Hearing Research, 60, 2506–2518.

Liss, J. M., Utianski, R., & Lansford, K. (2013). Cross-linguistic application of English-centric rhythm descriptors in motor speech disorders. Folia Phoniatrica et Logopaedica, 65, 3–19.

Maas, E. L., & Mailend, M. L. (2012). Speech planning happens before speech execution: Online reaction time methods in the study of apraxia of speech. Journal of Speech, Language, and Hearing Research, 55, S1523–S1534.

McNeil, M. R., Ballard, K. J., Duffy, J. R., & Wambaugh, J. (2016). Apraxia of speech: Theory, assessment, differential diagnosis, and treatment: Past, present, and future. In P. H. H. M. van Lieshout, B. Maassen, & H. Terband (Eds.), Speech motor control in normal and disordered speech: Future developments in theory and methodology (pp. 195–221). Rockville, MD: ASHA Press.

15. Pediatric Speech Disorders I

Introduction

Many writers have discussed the history of how the field of Communication Sciences and Disorders has viewed speech sound disorders in children (see Bankson, Bernthal, & Flipsen, 2017). The history is interesting because the vast majority of children seen in speech and hearing clinics for delayed or different speech sound development do not present other symptoms that clearly point to the cause of the problem. The explanation for delayed or different speech sound development in these children has therefore been the cause of much speculation and debate; speculation and spirited debate always make for an interesting scientific and clinical history. What kinds of conditions would clearly explain a developmental speech sound disorder? Three conditions are immediately suggested: hearing impairment, structural (anatomical) problems with the speech mechanism, and neuromuscular problems associated with a known form of disease of the central and/or peripheral nervous system. A significant hearing impairment has an effect on a child’s speech sound development. The details of the hearing impairment may not account for the details of the speech sound problems, but in a general sense, the child with hearing impairment has a strong likelihood of delayed or different (as compared to the typically developing child) development of the speech sounds of her language.

Structural problems in the speech mechanism can also result in developmental articulation disorders. For example, a child born with a cleft palate may have problems closing the velopharyngeal port even after surgery is performed to close the palate and reattach the relevant muscles in the correct configuration. This child can be expected to have difficulty with obstruents, the speech sounds requiring a positive oral pressure for correct production. When the child attempts to produce these sounds, air leaks through the ineffective VP port and the resulting speech sounds are incorrect. The same child may try to compensate for this structural problem in a way that introduces yet another error into her developmental speech sound profile (see Chapter 19). Finally, a child born with a neurological disease such as cerebral palsy, or who suffers some other form of brain insult (as a result of surgery, traumatic brain injury, or other diseases), may have difficulty moving the articulators, laryngeal muscles, and/or muscles of the respiratory system, any or all of which may contribute to delayed or different speech sound development. This chapter presents information on two pediatric speech sound disorders. By definition, pediatric speech sound disorders arise in childhood. The present chapter considers two such disorders whose cause has yet to be identified: speech delay (SD) and childhood apraxia of speech (CAS). Stuttering is also a developmental speech disorder of currently unknown origin.


There are well-founded suspicions, however, that stuttering is best classified as a developmental motor speech disorder — that a speech motor control problem is the underlying basis of stuttering even if it is not the only factor that determines stuttering behavior (Smith & Weber, 2017). We have chosen to present material on stuttering in a separate chapter (Chapter 17). CAS is also considered a motor speech disorder by many clinicians and scientists, largely because some of its speech symptoms are said to resemble those in adults who have known brain lesions and have been diagnosed with adult apraxia of speech (AAS) (Ad Hoc Committee on Childhood Apraxia of Speech, 2007). There is some preliminary evidence for central nervous system dysfunction in CAS (e.g., Fiori et al., 2016). This evidence, however, requires extensive, speculative inferences from the tentative results of a few brain imaging studies to the speech behavior in CAS. Some scientists have argued that current evidence does not support a brain basis for CAS (Liégeois & Morgan, 2012; Morgan & Webster, 2018). The evidence for a neural basis in CAS is also not nearly as strong as it is in stuttering, and there is much uncertainty and controversy about the diagnosis of CAS (Ad Hoc Committee on Childhood Apraxia of Speech, 2007). In the current chapter, CAS is considered a speech sound disorder of unknown origin. Speech sound disorders with unknown causes constitute the majority of all childhood speech disorders. The term “speech delay” refers to speech sound development that lags typical development (see Chapter 13) without a clear explanation for the delay. “Speech delay” is not the only term used to designate this category of childhood speech sound disorders. Some authors use the terms “phonological delay” or “articulatory disorders” to refer to delayed mastery of speech sounds with unknown cause (e.g., Eecen, Eadie, Morgan, & Reilly, 2018). In this chapter, we use “speech delay” to represent all of these terms. Some comments are made, however, concerning potential implications of the differences among the terms “speech delay,” “articulatory delay,” and “phonological delay.” CAS refers to delayed and disrupted speech sound development that includes speech sound patterns, prosodic characteristics, and a continued (throughout development) severity not seen in children with speech delay. A child may have speech characteristics that do not clearly suggest a diagnosis of either speech delay or CAS. Clinicians and scientists do not always agree on the specific speech characteristics that fit speech delay versus CAS. As described in the section “Childhood Apraxia of Speech,” the disorder may be due to a deficit in planning speech sound sequences, in much the same way as hypothesized for the adult version of the disorder (AAS).

An understanding of pediatric speech sound disorders can benefit from a classification system. The classification system may include the cause of the disorder (known or presumed), the severity of the disorder, the natural history of the disorder (if and how it changes over time), and subtypes within a single named disorder (e.g., subtypes of motor speech disorders). A detailed classification system for pediatric speech sound disorders, supported by some data as well as reasonable speculation, has been published by Shriberg and colleagues (2010). Some of the ideas from this classification system are used in this chapter. A final introductory point is the role of speech intelligibility in a speech disorder. A basic question in almost any speech disorder is, to what extent does it affect a speaker’s intelligibility? How difficult is it to understand what the speaker is saying? This is a central concern in pediatric speech disorders, regardless of the underlying cause.

Speech Delay

Estimates of the prevalence of speech delay are as high as 15.6% in children aged 3 years. By age 6 years, many of these children “catch up” to typical development norms. Prevalence estimates of speech delay drop to about 4% at age 6 years, reflecting a positive outcome for many children who were diagnosed with speech delay at age 3 years. This still leaves a significant number of children who enter first grade with speech characteristics that are noticeably delayed relative to expectations for their age. (See Vick, Campbell, Shriberg, Green, Truemper, Rusiewicz, & Moore [2014] for a review of the prevalence data and Flipsen [2016] for a summary of data showing a decrease in the prevalence of speech sound disorders throughout grade, middle, and high school.) Typically developing children also mispronounce speech sounds and may be partially or largely unintelligible at certain times during their speech development. The expectation for children with typically developing sound systems is a decreasing number of speech sound errors and increasing speech intelligibility as the child gets older. The child with speech delay also mispronounces individual speech sounds, but at ages when these sounds have been mastered by the majority of children. Children with speech delay are therefore more unintelligible than they should be at a specific age. For example, a 4-year-old child who is diagnosed with speech delay has more speech sound errors and lower speech intelligibility than a typically developing child of the same age.


When a significant degree of unintelligible speech seems unusual for a child’s age, parents or teachers may refer the child to an SLP for formal evaluation. A typical age of referral is between 4 and 4½ years (Shriberg & Kwiatkowski, 1994). Of course, many children with speech delay correct their speech sound errors over time, without therapy, and become fully intelligible. Very generally, speech delay can be defined as a childhood disorder in which speech sound errors and phonological processes reflect immaturity of speech development for a child of a given age. This impression of age-inappropriate speech skills by parents or teachers is not very precise, especially given the absence of an obvious cause for the delay and the significant variability in speech development among typically developing children. What seems to be excessively unintelligible speech for a typically developing 5-year-old may reflect nothing more than a different path to full intelligibility; in a year or two, the child may have no speech sound errors. This is one reason for a formal evaluation of a child’s speech when a delay is suspected. A decision to obtain a formal evaluation may be prompted by parents, or by a teacher who has the impression that a child is not learning speech sounds in a typical way.

Diagnosis of Speech Delay

A speech-language pathologist is most likely to initiate the evaluation by conducting a standardized articulation test. Several formal articulation tests are available to speech-language pathologists. Most such tests are based on the speech sound production of a large sample of children whose ages range from 2 years 0 months (hereafter, 2;0) to as high as nearly 22 years. These data are called “norms,” because they reflect typical development as defined by the test. Most often, the norms are in the form of ages at which a high percentage of tested children produce a sound correctly. For example, one often-used test of articulation reports the average age of mastery for /s/ in the word-initial position as 5 years (Goldman & Fristoe, 2015). “Average age” means that a criterion percentage (e.g., 85%) of all children tested produced the sound correctly. Some children master /s/ by age 3 years; others may not master it until age 8 years. But a percentage criterion is a useful way to answer the question, “At what age does the typically developing child master the /s/ sound?” The same question can be asked and answered about any speech sound or combination of speech sounds.


Slightly different norms are given for males and females to recognize the typically faster mastery of the sound system by female children as compared with male children. The tested speech sounds are classified in one of four ways: correct, substitution, omission, or distortion. A correct speech sound is self-explanatory (e.g., a “g” in the word “dog”). A substitution is the replacement of the correct sound with another phoneme (e.g., a “d” for “g” substitution, resulting in “dod”). An omission is the absence of a sound that is present in the correctly produced word (e.g., an omitted “g” in “dog,” producing “da” or “daw,” depending on dialect). Finally, a distortion is a speech sound having the characteristics of the target but produced unclearly, like a poor version of the sound (e.g., a stop consonant like a “g” produced slightly in front of the place of articulation for “g”). The results of formal articulation tests are compared to the norms; the scores for a child are totaled across all tested sounds. This total score is expressed as a deviation from the average total score for the typically developing norms. For example, a 4-year-old child evaluated for speech delay may have a total sound production score of 85, which is compared to the total score norms for typically developing 4-year-old children. In this example, we assume the 4-year-old norm is 100. Depending on the score criterion recommended in a specific test of articulation, the total score of 85 may deviate sufficiently from the norm to merit a diagnosis of speech delay. Another approach to estimating a child’s articulatory mastery is to compute a measure of percentage of consonants correct (PCC). Imagine a child engaged in a conversation, a recording of which is made for analysis. The recording of the child’s speech (the “speech sample”) is transcribed using the IPA (Chapter 12) to obtain a count of all consonants within the sample, including correctly and incorrectly produced consonants. The correctly produced consonants are expressed as a percentage of the total number of consonants in the sample. The PCC measure was originally described by Shriberg and Kwiatkowski (1982) and has been updated and refined several times (Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997). At 5 years of age, typically developing children have PCC scores of 90% to 95%. PCC varies with several factors, including age and the type of speech material used to extract the measure. As described earlier, the measure as originally developed was taken from conversational speech samples, but it has also been used in the analysis of single words (e.g., Fabiano-Smith & Hoffman, 2018).
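The totaling-and-comparison logic described above can be illustrated with a minimal sketch in Python. The scoring points, the set of tested sounds, the age norm, and the deviation criterion used here are all hypothetical, invented only for illustration; they are not the scoring rules of the Goldman-Fristoe or any other published test.

# Hypothetical scoring sketch: total a child's articulation-test judgments and
# compare the total to an assumed age norm. All numbers here are invented.

POINTS = {"correct": 2, "distortion": 1, "substitution": 0, "omission": 0}

def total_score(judgments):
    """Sum points over all tested sounds; judgments maps sound -> category."""
    return sum(POINTS[category] for category in judgments.values())

# Hypothetical judgments for a 4-year-old on a small set of word-initial sounds.
child = {"p": "correct", "b": "correct", "m": "correct", "t": "correct",
         "k": "substitution", "g": "correct", "s": "distortion", "w": "correct"}

ASSUMED_NORM_4_YEARS = 16   # hypothetical average total score for 4-year-olds
ASSUMED_CRITERION = 0.85    # hypothetical cutoff: flag totals below 85% of the norm

score = total_score(child)
print(f"Total score: {score} (assumed 4-year-old norm: {ASSUMED_NORM_4_YEARS})")
if score < ASSUMED_CRITERION * ASSUMED_NORM_4_YEARS:
    print("Score falls below the assumed criterion; speech delay would be considered.")

In practice, each published test specifies its own norms and decision rules; the sketch only shows how a single total is reduced to a yes/no judgment relative to an age expectation.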


PCC can also be used to diagnose speech delay. As with the score from a standardized articulation test, there is no firm PCC cutoff that separates children with typically developing speech sound mastery from those with speech delay. The clinician who uses the PCC score to confirm a diagnosis of speech delay must adopt a criterion percentage to make the decision. For example, the original data reported by Shriberg and Kwiatkowski suggested a cutoff PCC of around 85%: children with a PCC score below 85% were candidates for a diagnosis of speech delay.
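A minimal sketch of the PCC computation described above follows. It assumes hand-aligned target and produced segments and treats any substitution, omission, or distortion as incorrect; the consonant inventory, the example transcriptions, and the way the 85% cutoff is applied are simplifications for illustration, not the full Shriberg and Kwiatkowski (1982) procedure.

# Percentage of consonants correct (PCC) from aligned target/produced segments.
# Any produced segment that does not match the target consonant exactly counts
# as incorrect (covering substitutions, omissions, and transcribed distortions).

CONSONANTS = set("p b t d k g m n ŋ f v θ ð s z ʃ ʒ h l r w j".split())

def pcc(aligned_segments):
    """aligned_segments: list of (target, produced) tuples; produced is None
    when the target sound was omitted. Vowel targets are ignored."""
    total = correct = 0
    for target, produced in aligned_segments:
        if target not in CONSONANTS:
            continue
        total += 1
        if produced == target:
            correct += 1
    return 100.0 * correct / total if total else 0.0

# Hypothetical sample: "dog" produced as "dod", "sun" produced with a /θ/ for /s/,
# and "cat" produced correctly.
sample = [("d", "d"), ("ɔ", "ɔ"), ("g", "d"),
          ("s", "θ"), ("ʌ", "ʌ"), ("n", "n"),
          ("k", "k"), ("æ", "æ"), ("t", "t")]

score = pcc(sample)
print(f"PCC = {score:.1f}%")
if score < 85.0:  # assumed cutoff, per the discussion above
    print("Below the 85% criterion: candidate for a diagnosis of speech delay.")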

Quantitative Measures of Speech Delay and Speech Intelligibility

Articulation tests such as the Goldman-Fristoe Test of Articulation (GFTA) and PCC are quantitative because they use numbers to estimate the quality of articulatory skills. This is in contrast to qualitative estimates of a child’s speech such as “excellent,” “good,” “poor,” and so forth. Earlier in the chapter, speech intelligibility was cited as a primary issue for children with speech delay. When quantitative measures of articulation (a standardized articulation test or PCC) and speech intelligibility (e.g., the number of words heard correctly) are available for the same children, is there a relationship between them that allows a reliable estimate of speech intelligibility from the articulation score? This is a basic statistical question: can y (the speech intelligibility score) be predicted from x (the articulation score)?

Among children diagnosed with speech delay, there is, at best, a modest correlation between PCC and a measure of speech intelligibility (Shriberg & Kwiatkowski, 1982). More recent studies of children with speech sound disorders fail to show a convincing relationship between PCC and speech intelligibility: “Improvements in severity (as measured by PCC) were noted in some of the children, but these improvements did not translate into improvements in intelligibility” (Lousada, Jesus, Hall, & Joffe, 2014, p. 593). Knowing the PCC for an individual child is therefore, at best, a weak predictor of that child’s speech intelligibility.
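The prediction question can be made concrete with a short analysis sketch. The paired scores below are invented for illustration (they are not data from Shriberg and Kwiatkowski or Lousada et al.); the point is only to show how the strength of prediction of y from x would be estimated.

# Illustrative sketch: how strongly does an articulation score (x) predict a
# speech intelligibility score (y)? All data values are hypothetical.
import numpy as np
from scipy import stats

pcc_scores = np.array([62.0, 70.0, 74.0, 78.0, 81.0, 84.0, 88.0, 90.0])       # x
intelligibility = np.array([55.0, 48.0, 70.0, 52.0, 75.0, 60.0, 82.0, 68.0])  # y, % words understood

fit = stats.linregress(pcc_scores, intelligibility)
print(f"slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}, "
      f"r^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.3f}")
# A modest r^2 means that most of the variation in intelligibility is left
# unexplained by PCC, which is the pattern reported in the studies cited above.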

Speech Delay and Individual Speech Sounds

Children diagnosed with speech delay often have the most pronounced delay for consonants mastered late in the course of typical speech sound development (Shriberg et al., 1997). In children with speech delay, the “late eight” (/s/, /z/, /r/, /l/, /θ/ [“thin”], /ð/ [“those”], /ʃ/ [“shine”], and /tʃ/ [“chop”]) may show more delay than sounds mastered early in development, such as /b/, /d/, /g/, /p/, /t/, /k/, /m/, /w/, and /n/. Speech delay for the late eight, being more likely and lasting longer than speech delay for early-mastered sounds, may have a disproportionate effect on speech intelligibility.1 A therapy plan for a child with speech delay may include focused work on sounds thought to have a large effect on speech intelligibility.

It Just Doesn’t Add Up

Speech intelligibility tests were developed many years ago, in the dark ages of landlines, to evaluate the quality of telephone transmission. The idea was to have tests that “added up” the quality of each speech sound in a list of words to obtain a percentage of the words recognized by a crew of listeners (Weismer, 2008). This seems to make sense, but as it turns out it does not work very well when speech intelligibility tests are used to estimate the severity of a person’s speech disorder. Speech intelligibility tests are used routinely to estimate the severity of a speech problem in persons with cleft palate, hearing impairment, and motor speech disorders (dysarthria and apraxia of speech), as well as in children with speech sound disorders. Study after study has shown that counts of “phonemes correct” do not match overall intelligibility scores (reviewed in Weismer, 2008). This is consistent with the more recent findings of Lousada et al. (2014) for children, and with the results of a study conducted by Ertmer (2010) on the speech intelligibility of children with hearing loss. Why doesn’t it add up? It is not yet entirely clear, but a good guess is that the connections between sounds — how you get from one sound to another within a word — are just as important as the quality of individual sounds. In addition, speech intelligibility is not simply the “sum” of individual sounds but, as discussed in Chapter 11, makes use of top-down processes in which words are often identified before all the sounds have been analyzed.

1. If speech delay were the same for all speech sounds, it might seem reasonable to expect each sound to make an equal contribution to speech intelligibility problems. This hypothesis is not correct, however, because the frequency of occurrence varies across the speech sounds of a language. For example, /t/, /d/, and /n/ are frequently occurring sounds in American English, whereas /θ/, /ʃ/, and /h/ occur relatively infrequently (Mines, Hanson, & Shoup, 1978). /θ/ and /ʃ/, two of the infrequently occurring sounds in this example, are also among the late-eight sounds. Perhaps the late-eight difficulties are not so important to speech intelligibility problems in children with speech delay? Of course, it is never that simple: /s/, a late-eight sound, is among the most frequently occurring speech sounds.


Speech Delay: Phonetic, Phonological, or Both?

Earlier, it was noted that some professionals use the diagnostic term “phonological disorders” for developmental speech sound disorders with no known cause. The view of speech sound errors as phonetic (articulatory) or phonological may have important implications for clinical practice. A child diagnosed with multiple speech sound errors may be treated with articulatory practice (also called traditional articulation therapy; see Hegarty, Titterington, McLeod, & Taggart, 2018) as part of his speech therapy. A central component of this approach is repetition of each error sound as a way to establish and refine the speech motor control required for its correct production. An assumption of this approach is that extensive movement and placement practice is likely to result in mastery of an incorrectly produced speech sound (Powell, Elbert, Miccio, Strike-Roussos, & Brasseur, 1998; Lousada et al., 2013). The effect of articulatory practice, in this view, is similar to the expected effect of practice on any skill (e.g., throwing a football; keyboarding).2 When speech sound errors are considered phonological, the emphasis in therapy is on the sound system, rather than on individual sounds and phonetic practice (Brumbaugh & Smit, 2013). Children who receive phonological therapy may be trained to recognize and produce minimal pair contrasts. A minimal pair is formed by words that differ by a single feature, such as consonant voicing, place of articulation, or manner of articulation. Examples of minimal pairs are “pack-back” (word-initial voicing), “pack-tack” (word-initial place of articulation), and “sack-tack” (word-initial manner of articulation). Minimal pair therapy addresses the component of the sound system that makes phoneme contrasts — the sounds that change the word meaning when exchanged in the same word position.
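The minimal pair idea can be illustrated with a short sketch that checks whether two transcribed words differ in exactly one segment and, if so, names the feature carrying the contrast. The small feature table is an assumption included only to make the example self-contained; it covers just a few English consonants with simplified feature values.

# Sketch of a minimal-pair check using a small, assumed consonant feature table.
FEATURES = {
    # segment: (voicing, place, manner); values simplified for illustration
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
}
FEATURE_NAMES = ("voicing", "place of articulation", "manner of articulation")

def single_segment_difference(word1, word2):
    """Words are tuples of segment symbols; return (position, seg1, seg2) if the
    words differ in exactly one segment, otherwise None."""
    if len(word1) != len(word2):
        return None
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(word1, word2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None

def contrasting_features(seg1, seg2):
    """List the feature names on which two segments differ."""
    return [name for name, f1, f2
            in zip(FEATURE_NAMES, FEATURES[seg1], FEATURES[seg2]) if f1 != f2]

# "pack" vs. "back": a word-initial voicing contrast
diff = single_segment_difference(("p", "æ", "k"), ("b", "æ", "k"))
if diff:
    position, seg1, seg2 = diff
    print(f"Minimal pair: /{seg1}/ vs. /{seg2}/ at position {position}; "
          f"contrast in {', '.join(contrasting_features(seg1, seg2))}")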


Treatment of phonological processes that have not disappeared from the child’s sound system is another example of phonological therapy. For example, when a child deletes consonants from the word-final position, the specific “missing” sounds are not treated one by one. Rather, the child is exposed to groups of CVC words in which the word-final C varies across several consonant types (e.g., /t/, /g/, /s/, /n/). The object of the therapy is the child’s mastery of the CVC word form, which includes many different word-final consonants. The assumption is that exposure to the CVC words generalizes across consonant types, eliminating the phonological process of final consonant deletion in these word forms.

Residual and Persistent Speech Sound Errors

In the majority of typically developing children, mastery of the speech sound system is complete around 8 years of age. A small number of children have speech sound errors that extend past this age and into the teenage years, and possibly into and beyond young adulthood. The terms residual speech sound errors and persistent speech sound errors have been used to describe articulatory errors lasting past the age of complete speech sound mastery. As pointed out by Flipsen (2016), clinicians and researchers have tended to use the terms “residual” and “persistent” speech sound errors interchangeably to classify children who have these long-lasting errors. Although the terms are now thought to classify children with partially different histories of speech sound errors, the two groups share a characteristic — the speech sounds in error are transcribed as distortions, rather than substitutions or omissions. As discussed earlier, a distortion is recognizable as the intended but poorly produced speech sound. For example, the word-initial [s] in [sɪn] “sin” is produced with the tongue too far forward in the mouth (like a lisp) or in a “slushy” way. The perceived word “sin” is recognized as such, not as “thin” or “shin.” On the other hand, a substitution is heard as a different sound than the one intended, as when the speaker says “shin” for the intended “sin” (a /ʃ/ for /s/ substitution). An omission is the absence of a sound in the intended word. For example, omission of the /l/ in “slow” results in a production transcribed as [soʊ], and omission of the /ŋ/ in “sing” is produced as [sɪ] (possibly with a nasalized vowel, hence [sɪ̃]).

2. There are cognitive contributions to establishing and refining any motor skill. The analogy between articulatory practice and (for example) throwing a football is not meant to exclude cognitive skill, such as knowing the relationship between articulatory movement and placement and the resulting acoustics, or between arm motion and grip on the football and the location of the ball when it is thrown.


If speakers with residual errors are different from speakers with persistent errors even though both produce distortions of speech sounds, how are they distinguished? Residual errors are usually distortions of /r/ and /s/, with [w]-like errors for /r/ and [θ]-like (or “slushy” [s]-like) errors for /s/. The “-like” part of these error patterns is important: casual listening to a distorted /r/ may at first suggest a [w] for /r/ substitution, but closer listening reveals a [w]-like sound with rhotic (/r/-like) qualities. The /r/ distortion often sounds as if it is between an /r/ and a /w/. /s/ distortions have the same “in-between” quality. Many clinicians and researchers believe that /r/ and /s/ residual errors reflect an incomplete process of speech motor learning and are usually unresolved articulatory imperfections from a previously diagnosed and treated speech delay (Flipsen, 2016). The diagnosis of speech delay earlier in the child’s developmental history was made because of multiple speech sound errors, not just /r/ and /s/ errors. Persistent errors are thought to be /r/ and /s/ distortions that were present earlier in a child’s speech sound development but were not treated because the remainder of the speech sounds were learned in a typical way. In other words, the child was not diagnosed with speech delay because the errors were limited to only one or two speech sounds, both typically mastered late in the speech-sound learning process. There is, however, some evidence that children who are thought to have persistent errors may have other speech sound errors in addition to /r/ and/or /s/. This may also distinguish these children from those who are described as having residual errors. Children with residual or persistent errors may recover spontaneously to produce distortion-free /r/ and /s/. However, up to 25% of children who have residual or persistent errors around 9 years of age may not recover spontaneously and may require services to correct these errors. The distinction between residual and persistent speech sound errors may be important when therapy is undertaken to eliminate distortions of these sounds. Another potential explanation for residual and persistent speech sound errors is that a child has subtle perceptual problems specific to the /r/-/w/ and /s/-/θ/ (or /s/-/ʃ/) distinctions. In a recent study, children aged 9 to 14 years with residual /r/ errors were able to hear the difference between /r/ and /w/ with the same proficiency as children without residual errors (Preston, Irwin, & Turcios, 2015). Other studies, reviewed by Preston et al., have found some evidence of subtle perceptual problems in children with speech sound disorders.

The possibility of perceptual difficulties in children with residual errors is relevant to treatment options. If residual /r/ and /s/ errors have a basis in poor perception of phoneme-specific contrasts (e.g., the /r/-/w/ contrast), “ear training” makes sense as part of a therapy program to correct the production errors. Ear training — listening to many examples of /r/-/w/ pairs — may establish a better representation of the sounds as different phoneme categories. Once the categories are established by improvements in perceptual skill for specific contrasts, the child uses the categories to produce the sounds correctly.

Additional Considerations in Speech Delay and Residual and Persistent Speech Sound Errors

Speech sound disorders, including both speech delay and residual errors, have been associated with causes and effects other than those presented above. For example, some authors (review in Eaton, 2015) have suggested that subtle deficits in cognitive abilities may be a factor in residual errors. “Cognitive abilities” include the ability to process, represent, store, and retrieve information, and to focus on relevant stimuli and exclude irrelevant stimuli. Of special interest for residual speech sound errors, these cognitive abilities must be employed to compare speech sound output, such as a residual error, to stored representations of the sound. This is called self-monitoring. The success of self-monitoring depends on the ability to focus on the speech output, to use memory to access the representation of the sounds in the brain, and to make a proper comparison between the output and the stored representation. In theory this makes sense, but research to date has not produced clear results on deficiencies in cognitive abilities in children with residual errors, or on the effectiveness of therapy techniques to correct residual errors based on training of self-monitoring skills. Still, cognitive skills as a partial explanation of residual errors and as a potential target in therapy deserve further research effort. Children with residual errors may also experience challenges in social settings and academic performance. Hitchcock, Harel, and Byun (2015) conducted a survey filled out by parents whose children had /r/ errors. The survey focused on the effect of the /r/ errors on the child’s social and academic life. The ratings suggested that the greatest impact of the residual errors was on social interactions, especially for children over 8 years of age. The findings of this survey are consistent with research suggesting that children with residual errors are judged more negatively than children with typically developing articulation (Crowe Hall, 1991).


In another survey, adults who as children had a history of speech sound disorders received lower grades in high school and had fewer years of post–high school education as compared to peers who had typically developing speech sounds (Felsenfeld, Broen, & McGue, 1994). Children diagnosed with speech delay as they enter grade school may also have delayed literacy skills (reading, writing, and spelling) relative to children with typically developing speech sound skills (Hayiou-Thomas, Carroll, Leavett, Hulme, & Snowling, 2017). Delayed literacy skills can have profound effects on academic success. Taken together, these studies suggest a connection between childhood speech sound disorders and the quality of social and academic aspects of life. A clear cause-and-effect relationship cannot be established from these studies, and certainly not all children with speech sound disorders have, or as adults will have, social and academic problems. The trends in these studies, however, point to the potential value of speech therapy to correct speech sound disorders and minimize their effect on quality of life.

Speech Delay and Genetics

Finally, there is interest in the possibility of a genetic basis for speech delay. A simplified explanation for this interest is the possibility of inheritance of a predisposition for speech delay. Researchers acknowledge that a single gene is unlikely to “explain” speech delay. Rather, a larger group of genes under the influence of variables in the environment is thought to contribute to speech and language development (Lewis, Shriberg, Freebairn, Hansen, Stein, Taylor, & Iyengar, 2006; Peterson, McGrath, Smith, & Pennington, 2007); a disruption in these speech/language genes may result in speech delay. Variation in the environment (e.g., extensive language stimulation versus minimal language stimulation) almost certainly modifies if, and how, the speech/language genes affect speech sound development. An interaction between multiple genes and environmental influences on speech/language development makes the identification of specific speech-language genes very challenging. Given the complexity of human genetic material and its interaction with environmental factors, why do researchers explore a genetic predisposition for speech delay? As reviewed by Felsenfeld (2002), children with speech sound disorders (including speech delay) are more likely to have family members with a history of speech sound disorders when compared to children whose speech sound development is typical.


The reasoning is that, if there were no hereditable (genetic) susceptibility to speech sound disorders, the likelihood of children with speech delay having family members with speech sound disorders should be the same as that for children with typically developing speech sounds. This line of reasoning is complicated by the effect of environment on speech sound development. As previously noted, the familial pattern of speech sound disorders may reflect a language stimulation environment that is similar across multiple generations of a family. Perhaps parents in a family do not direct a good deal of spoken language to their babies, much in the same way that the parents’ parents did with them. Across generations, this style of language stimulation may be an environmental influence on speech-language genes that results in a high probability of speech sound disorders. Evidence in support of a hereditable component for speech sound disorders has also been gathered from studies in which the tendency for speech sound disorders among adopted children was tied to a history of speech sound disorders in a biological parent, as compared with an adoptive parent (Felsenfeld & Plomin, 1997). This finding is consistent with a genetic component in childhood speech sound disorders, but it does not rule out an environmental effect as well.

Childhood Apraxia of Speech

CAS is a diagnostic term for children with a speech sound disorder that shares some characteristics with speech delay and has unique characteristics as well. “Praxis” is a Greek word meaning “doing” or “action”; “apraxia” is “not doing.” Most patients diagnosed with CAS are able to “do” speech behaviors, albeit with difficulty, which is why some clinicians and researchers prefer the term “dyspraxia” for the disorder. In this chapter we use the acronym CAS to designate childhood apraxia of speech. Apraxia is a diagnostic term that has been in use for many years to describe an inability to perform certain motor behaviors in the absence of muscle weakness or other muscle problems:

Praxis is defined as the ability to perform . . . skilled or learned movements. Apraxia refers to the inability to carry out . . . praxis movements in the absence of elementary motor, sensory, or coordination deficits that could serve as the primary cause. (Park, 2017, p. 317)

A nearly identical definition for apraxia is provided by Zadikoff and Lang (2005, p. 1480), who also point out that apraxia may coexist with “elementary” (e.g., weakness, loss of sensation) muscle disorders.


For example, a limb may be weak but still capable of producing an appropriate gesture, even if the gesture is slow and reduced in magnitude. A patient with apraxia and limb weakness is likely not to produce the appropriate gesture, even a slow and small one. The 2007 American Speech-Language-Hearing Association (ASHA) Committee report on CAS proposed the following definition of the disorder:

Childhood apraxia of speech (CAS) is a neurological childhood (pediatric) speech sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits (e.g., abnormal reflexes, abnormal tone). CAS may occur as a result of known neurological impairment, in association with complex neurobehavioral disorders of known or unknown origin, or as an idiopathic neurogenic speech sound disorder. The core impairment in planning and/or programming spatiotemporal parameters of movement sequences results in errors in speech sound production and prosody.

Apraxia as a sign of neurological disease has been recognized for many years. Long before the term “apraxia” was used to describe speech deficits, it referred to a neurologically based disorder of limb and orofacial movement in which patients are unable to produce voluntary actions that may nevertheless be produced spontaneously. The apraxic patient, asked to demonstrate how to brush his hair (either miming the action or asked to use a brush), may have difficulty initiating the proper gestures (or use an actual brush properly), make stop-and-go movements around his head that are not hair-brushing movements, and show a good deal of frustration with his inability to respond appropriately to the request. Similar difficulties may be observed when asking the patient to open and close the jaw, stick out his tongue, or purse his lips. These latter movement problems are called orofacial nonverbal apraxias, or simply oral apraxias. Another example of orofacial nonverbal apraxia is the patient who, when asked to “Show me how you whistle,” hesitates before attempting to respond to the request, and once movement begins may have difficulty narrowing the lips, forming them into the shape required for a whistle. The patient seems to “grope” for the required lip configuration and make several attempts to get it right. Interestingly, the diagnosis of oral nonverbal apraxia does not mean the patient is also diagnosed with apraxia of speech, and vice versa: a patient diagnosed with apraxia of speech does not necessarily have oral nonverbal apraxia (Whiteside, Dyson, Cowell, & Varley, 2015).

In adults, oral nonverbal apraxia, and/or apraxia of speech, are almost always the result of a known brain lesion or known neurological disease. Studies such as that by Whiteside et al. (2015) included 50 patients who had suffered a stroke and had documented brain lesions. Adult stroke survivors are the most frequent participants in studies of apraxia of speech.

CAS Compared With Adult Apraxia of Speech (AAS)

AAS is covered in greater detail in Chapters 9 and 14, but a sketch of the speech problem is required to introduce the speech characteristics of CAS. The characteristics of AAS include hesitation in the initiation of speech; groping for articulatory postures, as if searching for the correct articulatory position and shape for a specific sound; production of multisyllabic words as if each syllable is “pulled apart,” with a robotic-sounding effect due to equal stress on each syllable; vowels and consonants that have unusually long durations; and inconsistent sound errors such as substitutions and distortions. Many clinicians and scientists have commented on the inconsistency of these sound errors: when asked to repeat the same word many times, the patient does not make the same sound errors on each repetition, and a sound error on one repetition may be produced correctly on the next repetition (Bislick, McNeil, Spencer, Yorkston, & Kendall, 2017). Adults with apraxia of speech are also likely to make more sound errors, and to show more initiation problems and articulatory groping, as a sound sequence increases in complexity. For example, the three words “please,” “pleasing,” and “pleasingly” increase in complexity by virtue of the number of syllables in the words: “pleasing” is more complex than “please,” and “pleasingly” is more complex than “pleasing.” The key to understanding this phenomenon is to focus on errors made on, or increased hesitation before, the initial sounds (/p/ and /l/) of each word. Adults with apraxia of speech tend to make more word-initial errors as the number of syllables following the word-initial syllable increases. Articulation of the word-initial /p/ and /l/ in “please-pleasing-pleasingly” is affected by the number of syllables following it, suggesting that the beginning of a multisyllabic word is not produced independently of the middle and end of the word. Another example of a difference in phonetic complexity is the word pair “sit” and “split.” The /spl/ consonant cluster is considered more complex than the singleton /s/; otherwise, the two words share the same phonetic segments (/s/ as the word-initial sound, /ɪ/ and /t/ as the final two sounds).


Adults with apraxia of speech typically have greater difficulty with the word-initial /s/ in words such as “split” as compared with “sit.” Selected speech characteristics in AAS are listed in Table 15–1 (a complete review of these characteristics is found in McNeil, Robin, & Schmidt, 2009). These characteristics are not observed in every patient diagnosed with apraxia of speech and may even appear and disappear in the speech of a single patient.

Speech Motor Programs

One explanation for the phonetic complexity effect depends on the concept of a motor program. Motor programs are thought to be the organizational processes in the brain that prepare the execution of an action. The speech motor program is assumed to contain plans for the placement and configuration of the articulators, as well as the timing of muscle contractions to produce these articulatory goals.


A program is not the same as the action; the program (sometimes called the “plan”) is the representation of the action in the brain, and it may exist without being executed. The idea of speech motor programs is controversial, and even the nature of what is programmed (phonemes? syllables? whole multisyllabic words?) is debated. Even so, the concept of speech motor programs enjoys widespread acceptance among clinicians and scientists who study apraxia of speech. Motor programs are thought to take time to assemble, even if that time is on the scale of micro- or milliseconds (brain processes are typically fast). The programming time is assumed to be longer for complex actions than for simpler actions.

Table 15–1.  Selected Speech Characteristics of AAS

Initiation difficulties: Patient hesitates before the onset of speech, as if he cannot get started; he appears to “grope” for the correct articulatory configuration for the first sound.

Loss of stress contrasts*: Syllables in multisyllabic words have roughly equal duration and seem “pulled apart” from surrounding syllables, making speech sound robotic.

Vowels and consonants have long durations: Adults with apraxia of speech have slow speaking rates, which means that the durations of speech sounds are longer than normal.

Sound errors are substitutions and distortions: Speech sound errors are replacements of one sound with another (like phonemic errors) or poorly produced sounds (phonetic errors).

Sound errors are inconsistent: Multiple repetitions of one word by a patient do not always have sound errors that are the same. The sound errors are variable in type. A word such as “save” may have a substitution for /s/ in one repetition (“shave” for “save”) and a distortion in another repetition (a sound between an “s” and “sh” for “save”). Another repetition may have a correct “s.”

Increased hesitation, groping, and articulatory errors with increased phonetic complexity: Multisyllabic words “bring out” apraxic errors more than single-syllable words; more complex syllables (“spl” versus single-consonant “s”) also bring out more apraxic errors.

*This speech characteristic in adult apraxia of speech is specific to languages such as English (and, e.g., Dutch, French, and Russian) in which there are “long” and “short” syllables within multisyllabic words. In the English word “elephant,” the first syllable is stressed, and the second and third are unstressed (the first syllable is longer than the following two syllables). Try saying “elephant” with equal duration for each syllable to get an idea of the “robotic speech” characteristic of adult apraxia of speech. There are languages (e.g., Spanish, Korean, Mandarin Chinese), however, in which successive syllables are not long or short but have roughly equal duration. The characteristics of apraxia of speech have not been defined well (or at all) in these languages.


For example, the motor program for the production of “pleasingly” takes more time to assemble than the programming for “please,” as does the programming of “split” compared with “sit.” One hypothesized demonstration of motor programming and the complexity effect is that reaction times to say the word “split” are greater than reaction times to say “sit.” Similarly, reaction times are longer for “pleasingly” than for “please.” Ballard, Tourville, and Robin (2014) provide a review of the programming hypothesis in adult apraxia of speech.
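The reaction-time logic can be sketched in a few lines of analysis code. The speech-onset reaction times below are invented for illustration (they are not data from Ballard, Tourville, and Robin, 2014); the sketch only shows how the complexity effect would be tested once such measurements were in hand.

# Illustrative analysis sketch: do speech-onset reaction times (RTs) for a
# phonetically complex word ("split") exceed those for a simpler word ("sit")?
# All RT values are hypothetical.
import statistics
from scipy import stats

rt_sit = [412, 430, 398, 441, 420, 405, 433, 418]      # ms, hypothetical
rt_split = [455, 472, 448, 490, 461, 450, 478, 466]    # ms, hypothetical

t_stat, p_value = stats.ttest_ind(rt_split, rt_sit)
print(f"mean RT 'sit' = {statistics.mean(rt_sit):.0f} ms, "
      f"mean RT 'split' = {statistics.mean(rt_split):.0f} ms")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A reliably longer RT for "split" is taken as evidence that its motor program
# takes more time to assemble than the program for "sit."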

CAS: Prevalence and General Characteristics

In a 2007 report posted at the ASHA website (https://www.asha.org/practice-portal/clinical-topics/childhood-apraxia-of-speech/), CAS was estimated to have a prevalence of between 0.1% and 0.2%, or between 1 and 2 per 1,000 children. Boys are diagnosed with CAS more often than girls. The prevalence estimate is an educated guess, because fixed criteria for the diagnosis have not been determined, or at least have not been widely agreed upon in clinical and research settings. Excessive diagnosis of CAS has been discussed in the literature, although it is hard to know how to identify an incorrect (“false-positive”) diagnosis when the criteria for diagnosing the disorder have not been firmly established. CAS is thought to occur as part of the developmental delays and disorders in several genetic conditions. CAS can be said to have a known origin when it is part of a known disease or syndrome (such as fragile X syndrome, discussed in Chapter 8). Because the majority of diagnosed CAS cases are not associated with known genetic disorders, this chapter includes CAS as a developmental speech sound disorder of unknown origin.

CAS: Speech Characteristics

Children diagnosed with CAS often have a severe articulatory disorder. These children are likely to have many speech sound errors, possibly including frequent vowel errors. The vowel errors are notable because vowels are mastered early in the course of speech sound learning and are typically not observed as errors in children diagnosed with speech delay.

As in AAS, speech sound errors are inconsistent in CAS. A specific sound segment may be misarticulated in different ways as a child repeats the same word multiple times. The inconsistent errors may be omissions, substitutions, or distortions; for some repetitions of a word, the speech sound may be produced correctly. Children diagnosed with CAS are likely to be quite unintelligible as a result of the severe articulatory disorder.3 Additional speech characteristics in CAS include lengthened connections between speech sounds (that is, longer transitioning movements between successive sounds, similar to the “pulled apart” syllables in AAS), disturbed prosody, articulatory groping, increased articulatory errors with increased phonetic complexity, and “unusual” phonetic errors such as the omission of word-initial consonants (/æt/ for “cat”). Table 15–2 lists selected speech characteristics in CAS. This list of speech characteristics in children with CAS is in many cases similar or identical to the list of characteristics described earlier for AAS. This seems to validate the use of the term “apraxia” for both the adult and child versions of this speech sound disorder. In fact, the speech sound disorder in CAS is thought to be the result of a motor programming disorder, as it is in the adult form of apraxia of speech (ASHA, 2007). There are interesting differences between the adult and child versions of apraxia of speech. With very few exceptions, the adult version is associated with known neurological diseases such as stroke, Parkinson’s disease (review in Presotto, Rosenfeld Olchik, Schumacher Shuh, & Reider, 2015), and dementia (e.g., Alzheimer’s disease; review in Cera, Ortiz, Bertolucci, & Minett, 2013). In most cases of stroke, a region of damaged brain tissue (a lesion) is seen in brain images. In Parkinson’s disease, lesion locations in the midbrain and frontal lobe have been demonstrated at autopsy. Lesion locations throughout the central nervous system have also been demonstrated in Alzheimer’s disease and other types of dementia. Brain lesions in CAS have been difficult or impossible to locate. Imaging studies have suggested size differences in certain locations of the cerebral hemispheres of children with CAS as compared with typically developing brain areas (ASHA, 2007). In at least one study, children with CAS appear to have different connections between brain areas compared with children who do not have speech sound errors (Fiori et al., 2016).

3  An example of the poor speech intelligibility of children diagnosed with CAS is found in Namasivayam et al. (2015, Figure 2, p. 539). Speech intelligibility scores for word or sentence lists (number of correctly heard words/sentences by a group of listeners) that are less than 50% are typically considered to indicate a severe intelligibility deficit. In Namasivayam et al., the average intelligibility scores for children with CAS are no greater than 30% and as low as 7%.
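
For readers who want to see the arithmetic behind intelligibility scores like those reported by Namasivayam et al. (2015), the short sketch below (in Python) shows how a word-level intelligibility percentage is computed and compared against the 50% severity criterion described in the footnote above. The function name and the word counts are illustrative assumptions, not values taken from that study:

    # Illustrative sketch only: how a word-level intelligibility score of the
    # kind reported in the CAS literature is computed. Word counts are hypothetical.
    def intelligibility_percent(words_correct: int, words_total: int) -> float:
        """Percentage of target words correctly identified by a group of listeners."""
        return 100.0 * words_correct / words_total

    score = intelligibility_percent(words_correct=14, words_total=100)
    print(f"Intelligibility: {score:.0f}%")  # -> Intelligibility: 14%

    # The 50% cutoff is the severity criterion mentioned in the footnote above.
    print("severe intelligibility deficit" if score < 50 else "above the 50% criterion")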


Table 15–2.  Selected Speech Characteristics in Childhood Apraxia of Speech

Severe articulatory disorder: Many speech sounds in error, including vowels; children often have severe intelligibility deficits.
Sound errors are inconsistent: Multiple repetitions of one word by a child do not always have sound errors that are the same. The sound errors are variable in type.
Lengthened durations between adjacent syllables or individual sounds: Speech perceived as slow and lacking smoothness.
Disturbed prosody: Speech characteristics that distinguish stressed from unstressed syllables, which include pitch, loudness, and duration, are atypical; melody of whole utterances may seem atypical.
Increased hesitation, groping, and speech sound errors with increased phonetic complexity: Multisyllabic words "bring out" apraxic errors more than single-syllable words; more complex syllables ("spl" versus "sit") also bring out more errors than phonetically simple syllables.
Unusual speech sound errors (unusual in the process of normal speech sound development): Vowel errors may be common; word-initial consonants may be omitted.

CAS and Overlap With Other Developmental Delays

Some children with speech delay, as discussed previously, also have language and reading delays. In children with CAS, language learning delays tend to be more common and more severe than in children with speech delay. Problems in language comprehension and language expression, as well as literacy skills, have been linked with CAS. The clinical opinion of many SLPs is that the speech and language deficits associated with CAS are likely to persist into the teenage years and even into adulthood. The implications of these co-occurring speech and language disorders for academic and work-life success are clear and point to the need for effective speech and language therapies for children with CAS. Reviews of the co-occurring speech and language deficits in CAS are available in ASHA (2007), Peterson et al. (2007), Gillon and Moriarty (2007), and Zaretsky, Velleman, and Curro (2010).

There are many treatments available for the child who is diagnosed with CAS. Almost all treatment strategies include a focus on the reduction of articulatory errors for individual sound segments, in the same way as treatment for speech delay. The focus on correction of articulatory errors does not seem to reflect the view of CAS as a programming disorder — that is, there is no attempt to modify or correct the program for an articulatory sequence. It is not clear how a motor program can be modified or corrected, although an argument can be made that practice of multisyllabic words can achieve this goal. A therapy like this for children with CAS has been developed by Murray, McCabe, and Ballard (2012) and has shown some preliminary, promising results. Morgan, Murray, and Liégeois (2018) provide a comprehensive review of therapies for CAS.

CAS and Genetics

As in speech delay, children with CAS are more likely than typically developing children to have family members with a history of CAS (or other speech sound disorders). The interest in the genetic basis of CAS was significantly motivated by study of a multigenerational family in Great Britain (Lai, Fisher, Hurst, Vargha-Khadem, & Monaco, 2001). Severe speech problems with characteristics resembling apraxia of speech were identified in roughly 50% of the family members, across generations and in both children and adults. Many of the family members with apraxia of speech also had intellectual disability (Chapter 8). Study of the genetic profile of each of the family members revealed that those with apraxia of speech shared an abnormality of a single gene on the seventh chromosome. The common gene abnormality among the family members with apraxia of speech suggested the presence of a "speech and language gene." During embryonic development, this gene abnormality was hypothesized to disturb the regulation of other genes that are important to the proper development of brain structures thought to be active in speech and language development. Since the original publication of the Lai et al. (2001) report, several other gene abnormalities have been proposed to be important in developmental speech and language disorders. Some of these speech and language disorders occur as part of other diseases in which there are many additional problems; in these diseases, the speech and language problems do not stand alone, as they do in a "pure" version of CAS. At this point in time, it seems clear that multiple genes play a role in "pure" cases of CAS (see Box, "What Is 'Pure' CAS?"). A review of genetic factors in CAS is found in Worthey et al. (2013).

What Is "Pure" CAS?

In theory, "pure" CAS is a speech sound disorder with a neurological basis. Pure CAS, if it exists, is a speech motor control disorder. The control of the nervous system over movements of the articulators, larynx, and respiratory system is compromised, resulting in the selected speech characteristics listed in Table 15–2. Other speech characteristics may also be observed in pure CAS, but they are all the result of a speech motor control problem. More often, as described in the text, children with CAS have other developmental delays and problems, including language and literacy delays. If most children with CAS have these additional problems that cannot be explained by a pure disorder of speech motor control, why are scientists interested in the identification of the small number of children with the pure variety? The reason is that pure cases may allow the identification of the genetic basis of the speech motor control component of CAS. Stated in a different way, perhaps the pure cases have a more focused genetic basis — one or two gene abnormalities. The genetic basis of the more frequent "CAS plus language disorders" may be harder to pin down because of the presumed larger set of genes at work in the disorder (Morgan, Fisher, Scheffer, & Hildebrand, 2017).

Chapter Summary

Speech delay and CAS are presented in this chapter as two speech sound disorders of unknown origin, meaning that factors such as abnormal structural characteristics of the speech mechanism, documented neurological disease, intellectual disability, and/or hearing loss are ruled out and therefore do not explain the speech disorder.
Speech delay is a childhood speech sound disorder with high prevalence and unknown origin; the characteristics of the disorder include speech sound mastery that is delayed relative to age and sex norms, documented as part of standardized tests of articulation or by other measures.
Children with speech delay are most likely to have speech sound errors for later-learned sounds, as compared to earlier-learned sounds, which presumably reflects the more complex speech motor control skills required for the later-mastered sounds.
Possible explanations for speech delay include (but may not be limited to) immature speech motor control, delayed learning of phonological rules, and delayed speech perception skills, or some combination of the three.
The speech intelligibility problem in speech delay is partly, but not completely, a result of speech sound errors; the speech intelligibility deficit that is due to the speech sound errors is greater than expected for a child's age.
Speech delay may be associated with problems in socialization and academic performance in grade school and possibly into middle and high school.
Some children have /r/ and/or /s/ errors, called residual errors, past the age (around 8 or 9 years) at which speech sound mastery is typically completed; a smaller number of children have persistent speech sound errors for several speech sounds in addition to /r/ and /s/ errors.
Speech delay is frequently a communication component of a more general language delay, which may include language comprehension and/or expression delay; children with speech delay are more likely to have delays in reading ability compared with children who have typically developing sound production.
Research evidence points to a genetic component in speech sound disorders, including speech delay; the genetic component is likely to involve multiple genes that interact with the environment to make a child more susceptible to speech delay, compared with children who do not have these genes.


CAS is a relatively rare, severe speech sound disorder in which inconsistent, multiple speech sound errors and prosodic disturbances are thought to be core features; vowel errors, hesitation before initiating speech, and groping for articulatory positions and shapes are also speech characteristics observed in CAS.
The speech characteristics of CAS are similar (but not identical) to the speech characteristics seen in AAS; as in AAS, the core and other speech characteristics in CAS are hypothesized to be the result of a planning/programming problem in the brain mechanisms that control speech production.
The diagnosis of CAS does not have good-to-excellent reliability; there is disagreement about the precise characteristics that point to a diagnosis of CAS, and apparently many diagnoses of the disorder turn out to be incorrect ("false positives").
Children with CAS are frequently diagnosed with expressive and/or receptive language delay and with poor literacy skills; children with CAS and language and literacy delays are at risk for poor academic and career achievement.
As with speech delay, there is interest in a possible genetic basis for CAS; research publications over the last 20 years have demonstrated the likelihood of a group of genes that are associated with typical speech and language development and that, when mutated (when a gene is changed from its typical properties), predispose a child to CAS and to language and literacy delays as well.
Speech-language treatments for CAS show promise but are still in the early stages of development and evaluation.

References

Ad Hoc Committee on Childhood Apraxia of Speech. (2007). Childhood apraxia of speech [Technical report]. Rockville, MD: American Speech-Language-Hearing Association.
Ballard, K. A., Tourville, J. A., & Robin, D. A. (2014). Behavioral, computational, and neuroimaging of acquired apraxia of speech. Frontiers in Human Neuroscience, 8, 1–9.
Bankson, J. E., Bernthal, N. W., & Flipsen, Jr., P. (2017). Articulation and phonological disorders (8th ed.). New York, NY: Pearson.
Bislick, L., McNeil, M., Spencer, K. A., Yorkston, K., & Kendall, D. L. (2017). The nature of error consistency in individuals with acquired apraxia of speech and aphasia. American Journal of Speech-Language Pathology, 26, 611–630.
Brumbaugh, K. M., & Smit, A. B. (2013). Treating children ages 3–6 who have speech sound disorders: A survey. Language, Speech, and Hearing Services in Schools, 44, 306–319.
Cera, M. L., Ortiz, K. Z., Bertolucci, P. H. F., & Minett, T. S. C. (2013). Speech and orofacial apraxias in Alzheimer's disease. International Psychogeriatrics, 25, 1679–1685.
Crowe Hall, B. J. (1991). Attitudes of fourth and sixth graders toward peers with mild articulation disorders. Language, Speech, and Hearing Services in Schools, 22, 334–340.
Eaton, C. T. (2015). Cognitive factors and residual speech errors: Basic science, translational research, and some clinical frameworks. Seminars in Speech and Language, 36, 247–256.
Eecen, K. T., Eadie, P., Morgan, A. T., & Reilly, S. (2018). Validation of Dodd's model for differential diagnosis of childhood speech sound disorders: A longitudinal community cohort study. Developmental Medicine and Child Neurology, 61, 689–696.
Ertmer, D. J. (2010). Relationships between speech intelligibility and word articulation scores in children with hearing loss. Journal of Speech, Language, and Hearing Research, 53, 1075–1086.
Fabiano-Smith, L., & Hoffman, K. (2018). Diagnostic accuracy of traditional measures of phonological ability for bilingual preschoolers and kindergarteners. Language, Speech, and Hearing Services in Schools, 49, 121–134.
Felsenfeld, S. (2002). Finding susceptibility genes for developmental disorders of speech: The long and winding road. Journal of Communication Disorders, 35, 329–345.
Felsenfeld, S., Broen, P. A., & McGue, M. (1994). A 28-year follow-up of adults with a history of moderate phonological disorder: Educational and occupational results. Journal of Speech and Hearing Research, 37, 1341–1353.
Felsenfeld, S., & Plomin, R. (1997). Epidemiological and offspring analyses of developmental speech disorders using data from the Colorado adoption project. Journal of Speech, Language, and Hearing Research, 40, 778–791.
Fiori, S., Guzzetta, A., Mitra, J., Pannek, K., Pasqualliero, R., Cipriani, P., . . . Chilosi, A. (2016). Neuroanatomical correlates of childhood apraxia of speech. Neuroimage Clinical, 12, 894–910.
Flipsen, Jr., P. (2016). Emergence and prevalence of persistent and residual speech errors. Seminars in Speech and Language, 36, 217–223.
Gillon, G. T., & Moriarty, B. C. (2007). Childhood apraxia of speech: Children at risk for persistent reading and spelling disorder. Seminars in Speech and Language, 28, 48–57.
Goldman, R., & Fristoe, M. (2015). Goldman-Fristoe Test of Articulation — Third Edition (GFTA-3). Circle Pines, MN: American Guidance Service.
Haylou-Thomas, M. E., Carroll, J. M., Leavett, R., Hulme, C., & Snowling, M. J. (2017). When does speech sound disorder matter for literacy? The role of disordered speech errors, co-occurring language impairment and family risk of dyslexia. Journal of Child Psychology and Psychiatry, 58, 197–205.
Hegarty, N., Titterington, J., McLeod, S., & Taggart, L. (2018). Intervention for children with phonological impairment: Knowledge, practices, and intervention intensity in the UK. International Journal of Language & Communication Disorders, 53, 995–1006.
Hitchcock, E. R., Harel, D., & McAllister Byun, T. (2015). Residual speech errors in school-aged children: A survey study. Seminars in Speech and Language, 36, 283–294.
Lai, C. S. L., Fisher, S. E., Hurst, J. F., Vargha-Khadem, F., & Monaco, A. P. (2001). A forkhead domain gene is mutated in a severe speech and language disorder. Nature, 413, 519–523.
Lewis, B. A., Shriberg, L. D., Freebairn, L. A., Hansen, A. J., Stein, C. M., Taylor, H. G., & Iyengar, S. K. (2006). The genetic bases of speech sound disorders: Evidence from spoken and written language. Journal of Speech, Language, and Hearing Research, 49, 1294–1312.
Liégeois, F. J., & Morgan, A. T. (2012). Neural bases of childhood speech disorders: Lateralization and plasticity for speech functions during development. Biobehavioral Review, 36, 439–458.
Lousada, M., Jesus, L. M. T., Capelas, S., Margaça, C., Simões, D., Valente, . . . Joffe, V. (2013). Phonological and articulation treatment approaches in Portuguese children with speech and language impairments: A randomized controlled intervention study. International Journal of Language & Communication Disorders, 48, 172–187.
Lousada, M., Jesus, L. M. T., Hall, A., & Joffe, V. (2014). Intelligibility as a clinical outcome measure following intervention with children with phonologically based speech-sound disorders. International Journal of Language & Communication Disorders, 49, 584–601.
McNeil, M. R., Robin, D. A., & Schmidt, R. A. (2009). Apraxia of speech: Definition and differential diagnosis. In M. R. McNeil (Ed.), Clinical management of sensorimotor speech disorders (2nd ed., pp. 249–268). New York, NY: Thieme.
Mines, M. A., Hanson, B. F., & Shoup, J. E. (1978). Frequency of occurrence of phonemes in conversational English. Language and Speech, 21, 221–241.
Morgan, A., Fisher, S. E., Scheffer, I., & Hildebrand, M. (2017). FOXP2-related speech and language disorders (2016 Jun 23 [Updated 2017 Feb 2]). In M. P. Adam, H. H. Ardinger, R. A. Pagon, et al. (Eds.), GeneReviews [Internet]. Seattle, WA: University of Washington, Seattle (1993–2018). Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK368474/
Morgan, A. T., Murray, E., & Liégeois, F. J. (2018). Interventions for childhood apraxia of speech. Cochrane Database of Systematic Reviews, 2018(5), CD006278. https://doi.org/10.1002/14651858.CD006278.pub3
Morgan, A. T., & Webster, R. (2018). Aetiology of childhood apraxia of speech: A clinical practice update for paediatricians. Journal of Paediatrics and Child Health, 54, 1090–1095.
Murray, E., McCabe, P., & Ballard, K. J. (2012). A comparison of two treatments for childhood apraxia of speech: Methods and treatment protocol for a parallel group randomized control trial. BMC Pediatrics, 12, 112.
Namasivayam, A. K., Pukonen, M., Goshulak, D., Hard, J., Rudzicz, F., Rietveld, T., . . . van Lieshout, P. (2015). Treatment intensity and childhood apraxia of speech. International Journal of Language & Communication Disorders, 50, 529–546.
Park, J. E. (2017). Apraxia: Review and update. Journal of Clinical Neurology, 13, 317–324.
Peterson, R. L., McGrath, L. M., Smith, S. D., & Pennington, B. F. (2007). Neuropsychology and genetics of speech, language, and literacy disorders. Pediatric Clinics of North America, 54, 543–561.
Powell, T. W., Elbert, M., Miccio, A. W., Strike-Roussos, C., & Brasseur, J. (1998). Facilitating s production in young children: An experimental evaluation of motoric and conceptual treatment approaches. Clinical Linguistics & Phonetics, 12, 127–146.
Presotto, M., Rosenfeld Olchik, M., Schumacher Shuh, A. F., & Reider, C. R. M. (2015). Assessment of nonverbal and verbal apraxia in patients with Parkinson's disease. Parkinson's Disease, Article ID 840327. https://doi.org/10.1155/2015/840327
Preston, J. L., Irwin, J. R., & Turcios, J. (2015). Perception of speech sounds: I. School-age children with speech sound disorders. Seminars in Speech and Language, 36, 224–233.
Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., & Wilson, D. L. (1997). The percentage of consonants correct (PCC) metric: Extensions and reliability data. Journal of Speech, Language, and Hearing Research, 40, 708–722.
Shriberg, L. D., & Kwiatkowski, J. (1982). Phonological disorders III: A procedure for assessing severity of involvement. Journal of Speech and Hearing Disorders, 47, 256–270.
Shriberg, L. D., & Kwiatkowski, J. (1994). Developmental phonological profiles: I. A clinical profile. Journal of Speech, Language, and Hearing Research, 37, 1100–1126.
Smith, A., & Weber, C. (2017). How stuttering develops: The multifactorial dynamic pathways theory. Journal of Speech, Language, and Hearing Research, 60, 2483–2505.
Vick, J. C., Campbell, T. F., Shriberg, L. D., Green, J. R., Truemper, K., Rusiewicz, H. L., & Moore, C. A. (2014). Data-driven subclassification of speech sound disorders in preschool children. Journal of Speech, Language, and Hearing Research, 57, 2033–2050.
Weismer, G. (2008). Speech intelligibility. In M. J. Ball, M. R. Perkins, N. Müller, & S. Howard (Eds.), Handbook of clinical linguistics (pp. 568–582). Oxford, UK: Blackwell.
Whiteside, S. P., Dyson, L., Cowell, P. E., & Varley, R. A. (2015). The relationship between apraxia of speech and oral apraxia: Association or dissociation? Archives of Clinical Neuropsychology, 30, 670–682.
Worthey, E. A., Raca, G., Laffin, J. J., Wilk, B. M., Harris, J. M., Jakielski, K. J., . . . Shriberg, L. D. (2013). Whole-exome sequencing supports genetic heterogeneity in childhood apraxia of speech. Journal of Neurodevelopmental Disorders, 5, 29.
Zadikoff, C., & Lang, A. E. (2005). Apraxia in movement disorders. Brain, 128, 1480–1497.
Zaretsky, E., Velleman, S. L., & Curro, K. (2010). Through the magnifying glass: Underlying literacy deficits and remediation potential in Childhood Apraxia of Speech. International Journal of Speech-Language Pathology, 12, 58–68.

16  Pediatric Speech Disorders II

Introduction

Motor speech disorders in children are often discussed separately from motor speech disorders in adults. Several likely reasons for this separate discussion can be offered. First, motor speech disorders in adults, as discussed in the professional literature and reviewed in Chapter 14, are almost always the result of acquired neurological disease. In these cases, a previously healthy adult has a stroke or other acute condition affecting the brain. Dysarthria and apraxia of speech may also result from neurological deficits resulting from degenerative diseases such as Parkinson's disease, amyotrophic lateral sclerosis, and multiple sclerosis. Dysarthria in adults may also be an outcome of a traumatic brain injury; traumatic brain injury and motor speech disorders in children are discussed later in the chapter. When these conditions, and other neurological diseases in adults, affect the neurological substrate of the speech mechanism, the damage is to a fully mature system, one in which speech motor control has been established. Presumably, the speech motor control skills developed and maintained over a lifetime may be used by adults, to some degree, to compensate for the loss of control associated with the acquired neurological damage.

In contrast, motor speech disorders in children are associated with known or suspected neurological damage present at birth or throughout childhood. The neurological damage may be acquired in early childhood, as in the case of a child who has a brain tumor removed surgically or experiences a penetrating or closed head injury. It is safe to say that in the case of childhood neurological disease, the development of speech motor control occurs within the context of brain mechanisms different from those of healthy adults or of adults who acquire a neurological disease after typical development of speech motor control. We therefore expect speech behavior, developed within the context of atypical brain mechanisms, to look different from speech behavior of someone whose mechanisms are mature and then damaged by an acquired disease process. "Expect" is an important word in the previous sentence: the difference between speech motor control in children born with neurological disease (or acquiring it very early in life) versus speech motor control in previously healthy adults who acquire neurological disease is a hypothesis. Firm data to support this hypothesis are not yet available.

This chapter describes motor speech disorders in children with cerebral palsy, in children with traumatic brain injury, and in children who have had surgical removal of brain tumors. The reader is encouraged to review Table 14–1 in Chapter 14 as preparation for the current chapter. Speech disorders in hearing impairment, many cases of which are the result of neurological disease, are discussed in Chapter 23.


Childhood Motor Speech Disorders:  Cerebral Palsy

Cerebral palsy is the most common motor disability in childhood, with an estimated prevalence of 1.5 to 4 cases per 1,000 live births (Stavsky, Mor, Mastrolia, Greenbaum, Than, & Erez, 2017). The estimated prevalence varies by race, country, and other factors beyond the scope of this chapter; the range of 1.5 to 4 is thought to encompass all of these varying estimates (see Stavsky et al., 2017). Motor speech disorders are found in a minimum of 20% and possibly as many as 50% of children born with cerebral palsy (Nordberg, Miniscalco, Lohmander, & Himmelmann, 2013). The motor speech disorder in cerebral palsy is almost always dysarthria, which results in speech intelligibility deficits. In a small number of cases, apraxia of speech may co-occur with dysarthria.

Cerebral palsy is a childhood disorder resulting from brain damage incurred before, during, or shortly following birth. The term "cerebral palsy" includes several different types. A unifying feature of cerebral palsy across types is the impairment of posture and movement. When the impairment of movement is also observed in the speech mechanism, dysarthria is a likely result. Roughly 40% to 45% of children with cerebral palsy have intellectual impairment, which may range from mild to severe (Reid, Meehan, Arnup, & Reddihough, 2018). In addition, about 40% of children with cerebral palsy have some degree of hearing loss (Foo, Guppy, & Johnston, 2013; Weir, Hatch, McRackan, Wallace, & Meyer, 2018). Intellectual disability and hearing loss are both associated with speech production problems. Speech breathing, voice, prosodic, and sound errors in cerebral palsy may therefore have more complicated causes than a purely speech motor control problem.
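
To make the prevalence figures above concrete, the short sketch below (in Python) combines the cited ranges: 1.5 to 4 cases of cerebral palsy per 1,000 live births, and motor speech disorders in 20% to 50% of children with cerebral palsy. The birth cohort size is an arbitrary illustration, and the resulting counts are rough bounds implied by those ranges, not new data:

    # Rough bounds implied by the ranges cited in the paragraph above.
    # The cohort size of 100,000 live births is an arbitrary illustration.
    live_births = 100_000

    cp_rate_low, cp_rate_high = 1.5 / 1000, 4 / 1000    # CP cases per live birth
    msd_share_low, msd_share_high = 0.20, 0.50          # share of CP with a motor speech disorder

    cp_low, cp_high = live_births * cp_rate_low, live_births * cp_rate_high
    msd_low, msd_high = cp_low * msd_share_low, cp_high * msd_share_high

    print(f"CP cases per {live_births:,} births: {cp_low:.0f} to {cp_high:.0f}")       # 150 to 400
    print(f"Of these, with a motor speech disorder: {msd_low:.0f} to {msd_high:.0f}")  # 30 to 200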

Subtypes of Cerebral Palsy

The types of cerebral palsy have been classified in many different ways since the disease was first described in the mid-19th century (Morris, 2007). In this chapter, we adopt a classification system that is in common clinical use and that fits with a discussion of speech disorders in cerebral palsy. The main types of cerebral palsy are spastic, athetoid/dyskinetic, ataxic, and mixed.1 Recent estimates of the occurrence of these types are 85% to 90% for the spastic type, 7% for the dyskinetic type, and 4% for the ataxic type (Wimalasundera & Stevenson, 2016).2 The most common mixed type is a combination of spasticity and dyskinesia. Other subtypes have been discussed in the literature, but they are rare and not discussed here.

A brief review of neuroanatomical structures provides a framework for the following description of cerebral palsy types. The corticobulbar and corticospinal tracts (fiber bundles) connect cells in the primary motor cortex to motor cells in the brainstem (corticobulbar) and spinal cord (corticospinal). The basal ganglia, a group of subcortical nuclei with connections to cortical cells, play a major role in the planning, initiation, and execution of movements; the basal ganglia also inhibit movement when it is not appropriate for an action. Finally, the cerebellum is located beneath the cerebral hemispheres, just behind the brainstem. The cerebellum plays an important role in the coordination of movements and the smoothness with which they are executed.

The Spastic Type of Cerebral Palsy

Spastic cerebral palsy is the outcome of damage to the corticobulbar and/or corticospinal tracts. The damage is likely to occur early in fetal development. Damage to these tracts produces excessive muscle tone, which results in stiffness and weakness in affected structures (such as a limb or the jaw). The excessive muscle tone may be chronic. The constant contraction of wrist, arm, and leg muscles may result in distorted postures of these structures, even when they are not being used. Damage to the corticobulbar and corticospinal tracts also results in hypersensitive reflexes, which may be triggered by a purposeful movement. These reflexes can interfere with the movement and its goal.

The Dyskinetic Type of Cerebral Palsy

Dyskinetic cerebral palsy is the result of damage to the basal ganglia. Dyskinesias are uncontrolled body movements, either at rest or during purposeful movements. Dyskinesias are sometimes described as writhing, uncontrolled movements and have the potential to interfere with proper movement control to achieve a goal. A simple example of a disturbance in achieving such a goal is the involuntary, uncontrolled movement of the arm that interferes with the act of raising a fork to the mouth.

1  Some scientists and clinicians add to this classification the number of limbs involved. For example, "spastic diplegia" indicates the spastic type with symptoms observed in two limbs, "spastic quadriplegia" symptoms observed in all four limbs, and "spastic hemiplegia" symptoms observed on just one side of the body (usually in both limbs).

2  Percentages vary in different surveys published in the literature. In all surveys, however, the spastic type of cerebral palsy clearly occurs with the greatest frequency. As reviewed by Bugler, Gaston, and Robb (2019), the occurrence of the spastic type of cerebral palsy among all cases in six major studies (including their own) ranged from 81% to 100% (average across all six studies = 91% spastic type, based on a total of 4,385 children diagnosed with cerebral palsy).

The Ataxic Type of Cerebral Palsy

This relatively rare type of cerebral palsy results from damage to the cerebellum. The function of the cerebellum is to control posture and balance, coordination of muscle contraction during voluntary movement, and control of the force of muscle contraction to guarantee accuracy and scale of movement (how big or small the movement is depends on the goal of a movement). Cerebellar damage often results in disturbance of these functions. A child may be diagnosed with the ataxic type of cerebral palsy when his feet are widely planted when walking and he is unsteady. The diagnosis is reinforced if accuracy and force of movement are poorly controlled. Walking may be delayed in babies with suspected ataxia, and other milestones (e.g., sitting up) may also be delayed.

The Mixed Type of Cerebral Palsy

Children diagnosed with the mixed type of cerebral palsy are assumed (or known, as revealed by imaging studies) to have damage in multiple parts of the brain. As reviewed earlier, the mixed type of cerebral palsy is infrequent, the most likely type being a spastic-dyskinetic form of the disease.

Dysarthria in Cerebral Palsy

The dysarthria in cerebral palsy is often described by the Mayo Clinic classification system, described in Chapter 14. Children diagnosed with the spastic type of cerebral palsy are typically judged to have spastic dysarthria, children with the dyskinetic type of cerebral palsy to have hyperkinetic dysarthria (dysarthria resulting from involuntary movements during attempts to move articulators), and so forth. As in adult dysarthria, the diagnosis of dysarthria type is made by perceptual analysis. Also as in adults, the dysarthria diagnosis is not necessarily matched to the diagnosis of cerebral palsy type. Thus, a child diagnosed with the spastic type of cerebral palsy may be perceived as having hyperkinetic dysarthria (or vice versa). An important question is the extent to which the dysarthria of a specific type in children has the same speech characteristics as the adult dysarthria of the same type. For example, does spastic dysarthria in children with cerebral palsy sound the same, and on careful analysis have the same characteristics, as spastic dysarthria in adults who have suffered a stroke?

Keep in mind an important caution throughout the description of speech characteristics in different types of childhood dysarthria. For a particular type of dysarthria, not every characteristic applies to each child diagnosed with that type. There is a good deal of variation from child to child in the speech characteristics of any single dysarthria type. This is similar to the case of adult dysarthrias.

Spastic Dysarthria

According to Workinger (2005), children with the spastic type of cerebral palsy have speech breathing and phonation problems, with less-affected articulation. The speech breathing and laryngeal problems result in a weak voice and a strained voice quality (Solomon & Charron, 1998). The strained voice quality, the result of unusually strong muscle force in the larynx during phonation, may be a compensation for the muscular problems in the respiratory system and larynx. If the natural result of spasticity in the larynx — excessive muscle tone, for example — prevents the larynx from closing during phonation, laryngeal muscles may work too forcefully to overcome excessive "leaks" of airflow through the vibrating vocal folds. Similarly, the stiff muscles of the respiratory system that compress the lungs may have difficulty generating the pressures required for phonation. The low lung pressures and consequently low airflows are tightly metered through the larynx to conserve the limited air supply, resulting in the strained voice quality.

More recent research (Lee, Hustad, & Weismer, 2014; Levy, Chang, Ancelle, & McAuliffe, 2017; Schölderle, Staiger, Lampe, Strecker, & Ziegler, 2016) points to a more prominent role of disordered articulation in children with the spastic type of cerebral palsy and dysarthria. The outcome measure of interest in these studies is speech intelligibility; increases in the articulatory deficit result in decreases in speech intelligibility. Another aspect of articulatory problems noted by Lee et al. and Levy et al. is an abnormally slow speaking rate in children with spastic dysarthria. Slow speaking rates stretch out the duration of speech sounds, which may contribute to problems with speech intelligibility. The results of Lee et al. (2014) and Levy et al. (2017) are consistent with results reported by Platt, Andrews, Young, and Quinn (1980), who performed detailed analysis of sound errors and speech intelligibility in young and older adults with the spastic type of cerebral palsy. On average, approximately 20% of all sounds in the words, including consonants and vowels, were misarticulated. The average speech intelligibility for these speakers was 59%. Platt et al.'s results suggest a substantial role for speech sound errors in speech intelligibility.

Are the data on speech sound errors reported for adults with the spastic type of cerebral palsy applicable to children with spastic dysarthria? A convincing answer is not available. No one has done a study of children with spastic dysarthria, similar to the study of Platt et al. (1980), in which detailed phonetic transcription was performed to identify speech sound errors. However, Platt et al. compared the types of phonetic errors they obtained to the same kind of analysis reported by Byrne (1959) many years ago for children with cerebral palsy. The pattern of errors in both studies was the same, suggesting (tentatively) that the adult data may provide a partial model for speech sound errors in children with cerebral palsy. The reason for the exaggerated caution in the previous sentence is the strong possibility of change in a child's dysarthria characteristics as he grows older, which may change the similarity between the adult and child forms of spastic dysarthria in cerebral palsy (Schölderle et al., 2016).

Prosody is also affected in spastic dysarthria. The effect is a reduced ability to change the pitch of the voice to produce the melody (intonation) of speech, and the pitch contrasts for syllables in multisyllabic words. Pitch contrasts at the sentence level are used to convey meaning (as in the difference between statements and questions) and emotion. Pitch contrasts for syllables in multisyllabic words are an important component of listener word recognition. A word such as "copycat" has two stressed syllables, "cop" and "cat," surrounding an unstressed syllable. The unstressed "ee" in "copy" is very short, with a lower pitch compared with the two stressed syllables. The pitch differences are part of lexical stress as indicated in dictionaries.

Intelligibility Is More Than Understanding Words

Prosody is tricky. Melodies of statements versus questions are different, but not always. In general, "who," "what," "where," "when," and "why" questions have falling pitch at the end of an utterance. Exceptions to this for "wh-" questions occur when a speaker does not quite get the question, or is in disbelief at the question: "When do I get my allowance?" (typical falling pitch) versus "When do I get my allowance?" (rising pitch at the end, parent response in disbelief that the question was asked). The inability to control the melody of utterances can interfere with a listener's understanding of subtle shades of meaning between these utterances that share the same speech sounds. This potential for communication failure extends to mood, which is conveyed both subtly and in dramatic ways by pitch change across utterances. Happy, sad, bored, angry, and hopeful moods are all communicated by variations in the melody of utterances. The lesson is: Intelligibility is more than understanding words.

Compensation in the Speech Mechanism

Like limb motor control strategies, adjustment of the speech mechanism for changing or changed conditions is not unusual. For example, when speakers are asked to increase speaking rate, they make articulatory movements that are smaller than movements at "habitual" (normal) speaking rates. This strategy allows the production of more syllables per unit time — smaller movements require less time. Speech therapy that utilizes variations in speaking rate is based on the idea that movements at a slowed rate are larger than at habitual rates. A slowed rate gives clients a better chance to produce the articulatory movements required to position articulatory structures (the tongue, lips, soft palate) for a correct speech sound. Another example of compensation in persons with speech motor control problems is a tightening of the larynx to restrict the limited airflow coming from the lungs. Individuals adjust one part of the speech mechanism — in this case, the tightness of closure during vocal fold vibration — to compensate for another part (the respiratory system) that is not functioning well. In some cases, the adjustment can be excessive — like the strain-strangled voice in spastic dysarthria that attempts to compensate for weak airflow from the lungs.

Dyskinetic Dysarthria

Damage to the basal ganglia occurs in 10% to 20% of children with cerebral palsy. The damage occurs later in fetal development as compared with the earlier damage thought to occur in the corticobulbar and corticospinal tracts. As reviewed earlier, damage to the basal ganglia results in loss of control of speech structures. Articulatory movements are directed to unintentional locations within the vocal tract, and sudden, unintentional movements disrupt these movements. Also, slow, unplanned changes occur in structures that are intended to maintain articulatory and phonatory positions and shapes for a limited amount of time. The writhing, changing configuration of the lips for vowel sounds is one example of this motor control problem. Another example is the background muscular tone in the larynx that allows the vocal folds to vibrate for phonation with controlled pitch, loudness, and quality. The inability to stabilize these background forces results in a lack of phonatory control. These motor control problems — really, a group of motor control problems — are called dyskinesias (impairments of voluntary movement). The older literature on types of cerebral palsy uses the term "athetoid cerebral palsy" to describe these movement problems. Currently, dyskinetic cerebral palsy is the preferred term.

The involuntary, often constant movements in the dyskinetic form of cerebral palsy are frequently observed in structures of the head and neck. Many children with dyskinetic cerebral palsy have random, involuntary movements of the mouth and tongue. The involuntary movements of the head and neck, and especially of speech mechanism structures such as the lips and tongue, are thought by many scientists and clinicians to play a major role in the dysarthria observed among many children with the dyskinetic type of cerebral palsy.

Dysarthria in the dyskinetic type of cerebral palsy shares many characteristics with spastic dysarthria. Both types have many speech sound errors and similar patterns of errors across different speech sounds (Platt et al., 1980). The lack of voice stability in dyskinetic dysarthria is different from the strain-strangled voice quality in spastic dysarthria. Dyskinetic dysarthria may have a more intermittent hypernasality (sometimes too nasal, other times not) compared with spastic dysarthria, in which the hypernasality may be more or less constant. Selected characteristics of spastic and dyskinetic dysarthria in children, observed directly or expected from analyses of adult data, are summarized in Table 16–1. As stated above, not all characteristics are seen in each child, and the severity of the characteristics varies across children even when they share a common speech diagnosis such as spastic dysarthria.

Table 16–1.  Summary of Speech Characteristics of Children With the Spastic and Dyskinetic Types of Cerebral Palsy and Dysarthria

Spastic type:
Speech breathing problems (difficulty generating positive lung pressure and adequate airflows)
Weak voice; strain-strangled quality
Speech sound errors (consonants and vowels) contribute heavily to speech intelligibility deficits
Slow speaking rate (lengthened sound durations)
Chronic hypernasality
Pauses (dysfluency)

Dyskinetic type:
Instability of phonatory (voice) pitch, loudness, and quality
Speech sound errors (consonants and vowels)
Intermittent hypernasality
Speech breathing problems
Pauses (dysfluency)
Prosody problems

Note.  Data from adults with the spastic type of cerebral palsy and dysarthria contribute to this summary, mostly for the component of speech sound errors. Data from "Dysarthria of Adult Cerebral Palsy: I. Intelligibility and Articulatory Impairment," by L. G. Platt, G. Andrews, M. Young, and P. T. Quinn, 1980, Journal of Speech and Hearing Research, 23, pp. 28–40; "Dysarthria in Adults With Cerebral Palsy: Clinical Presentation and Impacts on Communication," by T. Schölderle, A. Staiger, R. Lampe, K. Strecker, and W. Ziegler, 2016, Journal of Speech, Language, and Hearing Research, 59, pp. 216–229; Platt, Andrews, & Howie, 1980; and review of the adult literature in "Acoustic-Phonetic Contrasts and Intelligibility in the Dysarthria Associated with Mixed Cerebral Palsy," by B. M. Ansel and R. D. Kent, 1992, Journal of Speech and Hearing Research, 35, pp. 296–308.


Ataxic Dysarthria

The ataxic type of cerebral palsy reflects damage to the cerebellum, or to the fiber tracts that connect the cerebellum with other parts of the central nervous system. Although ataxic disorders in children who have a variety of diseases are not exceedingly rare, the ataxic type of cerebral palsy is. An estimate of the prevalence of the disorder is 0.1 cases per 1,000 live births (Musselman et al., 2014). The data on the dysarthria of the ataxic type of cerebral palsy are limited, largely due to the rarity of the disorder. In the small number of ataxic children studied by Workinger (2005), the dysarthria was judged to be mild. However, among dysarthric children with the spastic, dyskinetic, and ataxic types of cerebral palsy, Nordberg, Miniscalco, and Lohmander (2014) reported that the most severe speech symptoms were observed for the ataxic children. The ataxic children had more severe consonant errors compared with the spastic and dyskinetic children. The number of children in the Workinger (2005) and Nordberg et al. (2014) studies is too small to draw firm conclusions concerning the typical characteristics of the dysarthria in the ataxic type of cerebral palsy. It is possible that these characteristics match the speech characteristics that are well documented in the adult literature on ataxic dysarthria (Chapter 14). It is also possible that descriptions of ataxic dysarthria in children with cerebral palsy can be estimated from ataxic dysarthria in children who have had tumors removed from their brainstem or cerebellum. These cases are reviewed in the next sections.

Childhood Motor Speech Disorders:  Traumatic Brain Injury and Tumors

Traumatic Brain Injury

The diagnostic category traumatic brain injury (TBI) is separated into closed head injuries and penetrating head injuries. Closed head injuries include blows to the head that leave the skull more or less intact but result in brain damage. In penetrating head injuries, an object (such as a bullet or bomb fragment) pierces the skull and enters the brain. Here we limit our discussion to closed head injuries. In the United States, the prevalence of childhood TBI has been estimated at 2.5%; approximately 18% of children with lifelong effects from a TBI have dysarthria and language deficits (Haarbauer-Krupa, Lee, Bitsko, Zhang, & Kresnow-Seddaca, 2018). Note the phrase "lifelong effects" in the preceding sentence.

This is an important consideration in the likelihood of dysarthria in TBI. Morgan, Mageandran, and Mei (2009) reported on an 8-year series of 1,895 children with a recent TBI and identified only 22 diagnosed with dysarthria. However, Morgan et al. point out that children with more severe head injuries — the ones likely to have significant lifelong deficits — were also more likely to have dysarthria. TBI occurs as a result of motor vehicle accidents, falls, sports-related accidents, and other mishaps. Most children with a TBI show improvement following the accident, and many show great gains over time and regain nearly normal function. This recovery includes speech and language function, and nearly full speech function may be recovered even when a child is mute shortly after the accident (Campbell & Dollaghan, 1995). The brain damage in TBI is often diffuse, meaning it includes widespread parts of the brain. Recall that the brain is protected from the bony casing of the skull by the fluid surrounding it and a thick, hide-like covering which is the outer layer of the meninges. In TBI, the brain may be twisted and rotated at high accelerations within this protective environment, resulting in injury to the fiber tracts connecting various cell groups (nuclei). The axons — the parts of brain tissue that make up the fiber tracts — are stretched and in some cases “sheared” by the twisting and rotating motion. There also can be focused regions of damage in the brain, as when the brain tissue beneath the point of impact on the skull is damaged. Children who suffer TBI are likely to have speech, language, and cognitive symptoms that interfere with their academic and social lives. One clinic’s experience over a 5-year period suggests that approximately 15% of the children seen for TBI-related services will have some form of dysarthria (Hodge & Wellman, 1999).

Prevalence and Incidence — Part I

The term "prevalence" refers to the existing number of cases of a disorder or condition at a given time; "incidence" refers to the number of new cases over some time period, such as at birth or over the course of a year. Analyses of prevalence and incidence are ways to estimate the "true" numbers in the whole population — it is not practical, for example, to do a study that samples the entire population of the United States for the prevalence of fourth ventricle tumors, so estimates are based on samples. Thus, percentages vary across investigators because they are estimates, not true values.


Dysarthria in TBI

Speech breathing, voice, velopharyngeal function, and articulation are likely to be affected in children with TBI and dysarthria (Morgan et al., 2009). The Mayo Clinic system categories (Flaccid, Spastic, Ataxic, Hyperkinetic, Hypokinetic, Mixed) have been used to describe the speech of these children, but in all likelihood, the "mixed" type is the most frequent. This reflects the diffuse brain injury in TBI. Chronic hypernasality, slow speaking rate, voice quality abnormalities, prosodic abnormalities (including inappropriate pauses), and imprecise consonants and vowels are heard frequently in this clinical group. Results of a brain imaging study suggest that the corticobulbar tract is frequently damaged in children with TBI and dysarthria (Liégeois, Tournier, Pigdon, Connelly, & Morgan, 2013). As discussed earlier, the spastic type of cerebral palsy is associated with damage to the corticobulbar and corticospinal tracts.

Prevalence and Incidence — Part II

In the series of children analyzed by Morgan et al. (2009), 1.2% were diagnosed with dysarthria. This series of children was a consecutive one — the children were not selected for the study based on any characteristic other than being diagnosed with a TBI. This means that the sample of 1,895 children included mild, moderate, and severe cases. When cases are chosen only for severe TBI, a greater percentage of children are diagnosed with dysarthria. A different series of patients with TBI, over a 5-year period, estimated a 15% occurrence of dysarthria (Hodge & Wellman, 1999). Children in Hodge and Wellman came to a speech-language clinic seeking services, whereas children in the Morgan et al. (2009) study were taken from a social medicine registry in Australia, which no doubt included many cases in which services were not pursued. Thus, the Morgan et al. sample probably included many more mild cases (hence, not seeking services for speech and language problems) compared with Hodge and Wellman. When Morgan et al. extracted from the total sample only those children who had been referred for services to a speech-language pathologist, 14% of the children were diagnosed with dysarthria, a number very similar to the one reported by Hodge and Wellman. These percentages are estimates of prevalence, not incidence, and many variables affect the estimates.
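
The prevalence percentages discussed in this box are simple proportions. The sketch below (in Python) reproduces the arithmetic for the consecutive series described in the main text (22 children with dysarthria among 1,895 children with a recent TBI); the function name is an illustrative choice, not something taken from any of the cited studies:

    # Prevalence expressed as a percentage of a sample.
    def prevalence_percent(cases: int, sample_size: int) -> float:
        return 100.0 * cases / sample_size

    # 22 children diagnosed with dysarthria among 1,895 consecutive TBI cases
    # (the Morgan et al., 2009, series discussed above).
    print(f"{prevalence_percent(22, 1895):.1f}%")  # -> 1.2%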


Based on the imaging data of Liégeois et al., spastic dysarthria, alone or in combination with another dysarthria type, might be expected to occur frequently among children with TBI and dysarthria. When diffuse injury affects both cerebral hemispheres or the tracts between cortical motor cells and motor cells in the brainstem and/or spinal cord, the dysarthria is severe. In these cases, there is a poor outlook for improvement of the dysarthria.

Brain Tumors

Tumors in the region of the fourth ventricle are called posterior fossa tumors; they are relatively rare. The posterior fossa is a cradle within the skull base in which the posterior structures of the brain — the brainstem, fourth ventricle, and cerebellum — are contained. These structures are immediately below the occipital lobes of the cerebral hemispheres. The occipital lobes are the most posterior of the four lobes of the cerebral hemispheres. Posterior fossa tumors are typically cancerous and are removed surgically. Often the surgery is followed by radiation and/or chemotherapy.

Dysarthria in Brain Tumors

Approximately 50% of childhood brain tumors are located in and around the fourth ventricle. The fourth ventricle is the diamond-shaped cavity located between the posterior wall of the brainstem and the anterior wall of the cerebellum (see Chapter 2, Figure 2–8). The fourth ventricle is one of the cavities in the brain through which cerebrospinal fluid flows. A tumor in this area exerts pressure on brainstem and cerebellar structures and causes disruptions of the functions served by those structures. Two of those important functions are speech and swallowing.

The cerebellum plays a major role in the coordination of muscles of the respiratory system, the larynx, and the articulators to produce speech smoothly and with syllable sequences that sound "seamless." A disease process that interferes with cerebellar function disrupts coordination and makes speech sound choppy, as if consecutive syllables, and even sounds within a syllable, are pulled apart. The cranial nerves are connected directly to muscles of the larynx and articulators and issue the final commands to control contraction characteristics such as force and timing. Muscle contraction strength is weak and ineffective when cranial nerves that serve head and neck muscles are affected by a disease process.


Many children with posterior fossa tumors have dysarthria. Shortly after surgical removal of the tumor, approximately 10% to 33% of the children have a condition called cerebellar mutism (compare Morgan et al., 2011, to Tamburrini, Frassanito, Chieffo, Massimi, Caldarelli, & Di Rocco, 2015). In this stage of postsurgical recovery, children are mute (unable to speak) for a variable period of time. The duration of mutism can be 2 weeks or more, although some children with cerebellar mutism begin to speak within hours or a few days after the surgery. Children who are mute and then regain an ability to speak usually have dysarthria, which typically improves but not necessarily to the point of normal speech. The disruption of cerebellar and cranial nerve speech functions by a posterior fossa tumor may be expected to result in a flaccid dysarthria, or a cerebellar (ataxic) dysarthria, or a mixed flaccid-ataxic dysarthria (Chapter 14). Some authors endorse this expectation, based on their data (Morgan et al., 2011), whereas others do not (De Smet et al., 2012). Children who are not mute following surgery often regain normal speech abilities even when dysarthria is present during the early days of their recovery.

Treatment Options and Considerations

The end goal of speech-language pathologists when treating children with neurologically based speech disorders is to improve speech intelligibility. Publications in the speech-language pathology literature provide preliminary evidence that speech-language therapy can result in positive gains in this outcome.

In one approach, speech-language pathologists focus initially on simple strength and coordination of structures of the speech mechanism. The idea behind this first step is to provide a foundation for the more complex motor control requirements of speech production. These therapy activities are called oro-motor nonspeech exercises, because they are done in the absence of phonation (voice) or articulatory movements. For example, the lips and tongue can be exercised by having a child compress his lips together as forcefully as possible or push his tongue against a resistance (e.g., using tongue protrusion to move a barrier placed in front of the lips). The analogy to increasing strength in an arm or leg by exerting high limb force and pressing against resistances is direct. Nonspeech exercise is also extended to muscles of the respiratory system that support speech breathing (Chapter 10). These muscles can be strengthened by a program of forceful exhalation through an airflow resistance. The airflow resistance is provided by a tube containing a float that is displaced by incoming air, or a wire mesh screen held in place by a face mask. Forceful exhalation exercises rib cage muscles used to raise lung pressures required to vibrate the vocal folds and achieve an effective voice loudness.

When a child succeeds in completing multiple trials of the nonspeech exercises, a layer of motor complexity is added to the task. One version of this is to train the child to coordinate the initial part of expiratory airflow with phonation of a sustained "ah." Although this seems to be a very simple task, many children with either cerebral palsy or a TBI are challenged by it. When the training of coordinating speech breathing with phonation results in good performance, a child may be trained to open her jaw wide during phonation to produce a louder sound. These exercises are hierarchical, in the sense that separate training tasks are sequenced to build on simple skills and make them increasingly complex.

In the sequence just described, there is no mention of direct training of articulatory skills. Some therapeutic programs may, in fact, not have a major goal of training articulatory skills. Rather, the foundation of stronger speech muscles, coordination of speech breathing, and increased vocal loudness is thought to have a "spreading effect" on articulatory skill. Even in the absence of training to produce (for example) good fricative, vowel, rhotic, and lateral sounds, the improved foundation of speech motor skills leads to improved articulatory behavior. The overall effect of the therapy is better speech intelligibility — the long-term outcome goal of speech therapy.

A small amount of research evidence is available to support the connection between a stronger foundation of speech motor skills and improved speech intelligibility. In both young (5 to 11 years old) and older children (12 to 18 years) with cerebral palsy, speech intelligibility for single words improved between 9% and 14% following the therapy previously described (Pennington, Miller, Robson, & Steen, 2010; Pennington, Roelant, Thompson, Robson, Steen, & Miller, 2013). Not every child in these studies had improved intelligibility following the therapy, but the overall improvement for the groups of children is encouraging. Whether or not the examples of speech therapy in children with cerebral palsy can be generalized to children with tumors of the fourth ventricle or with a TBI is unknown; the appropriate studies are not available. Clark (2003) provides a review of treatment in neurological speech disorders.

When a child has severe dysarthria and is either completely unintelligible or barely intelligible, therapy approaches exist to provide the child with communication options. Augmentative and alternative communication (AAC) technology can supplement (augmentative) or substitute (alternative) for the severely impaired speech skills. AAC can be low-tech, such as an alphabet board that allows users to spell words or a picture board to convey simple ideas, or high-tech, such as speech synthesizers controlled by hand, a light pointer, or eye movements, like the one used by the famous physicist Stephen Hawking. AAC options are adapted to each user's needs and capabilities, under the guidance of a specialist, to make communication available to those children who cannot speak.

Chapter Summary

Motor speech disorders in children are discussed separately from motor speech disorders in adults because the effects of neurological disease on a child’s speech are in the context of a developing brain, while in adults the effects are in the context of a previously developed and mature brain.

Cerebral palsy is the most common motor disability in childhood, and it has been estimated that dysarthria is present in 20% to 50% of the diagnosed cases.

The major types of cerebral palsy are spastic, dyskinetic, and ataxic, with the majority of children with cerebral palsy having the spastic type. The spastic type of cerebral palsy results from damage to the corticobulbar and corticospinal tracts, the dyskinetic type from damage to the basal ganglia, and the ataxic type from damage to the cerebellum and its tracts connecting it to other parts of the central nervous system.

The types of dysarthria in children with cerebral palsy are often matched to those in adults with dysarthria: spastic dysarthria in the spastic type, hyperkinetic dysarthria in the dyskinetic type, and ataxic dysarthria in the ataxic type. The perceptual analysis used to diagnose the dysarthria type may not match the diagnosed type of cerebral palsy (e.g., spastic dysarthria may be diagnosed in a child diagnosed with the dyskinetic type of cerebral palsy).

Spastic dysarthria in cerebral palsy is thought to result from stiff muscles, which are the result of excessive tone in speech structures such as the larynx and tongue. The speech characteristics in spastic dysarthria include sound errors (consonants and vowels) and problems with prosody; these characteristics result in speech intelligibility deficits.


Hyperkinetic dysarthria in cerebral palsy is thought to result from uncontrolled movements of the speech structures, and muscle tone that is constantly changing. The speech characteristics in dyskinetic dysarthria are similar to those in spastic dysarthria, including problems with speech intelligibility.

Ataxic dysarthria in cerebral palsy is thought to result from coordination problems among the articulators, larynx, and respiratory system. The speech characteristics in ataxic dysarthria are not well understood because the ataxic type of cerebral palsy is rare and relevant studies are lacking.

Traumatic brain injury (TBI) associated with a closed head injury results in diffuse damage to brain tissue, as well as focal (one part of the brain) injuries. Children with TBI often have cognitive, social, and speech and language problems, many of which improve over time; approximately 15% of children with TBI have dysarthria. The dysarthria in TBI is likely to be a mix of the Mayo categories and to include slow speaking rate, hypernasality, abnormal voice quality and prosodic deficits, and speech sound errors.

Posterior fossa tumors are located in and around the fourth ventricle and may result in muteness immediately after surgery, followed by a period of recovery of speech skills; many children regain nearly normal speech following surgery, but some children have dysarthria as a long-term communication disorder.

In preliminary studies, speech therapy has been shown to be effective in children with dysarthria, especially when the focus of the therapy is improvement of speech intelligibility by working on basic strength and coordination skills as the foundation for more complex speech motor tasks.

References

Ansel, B. M., & Kent, R. D. (1992). Acoustic-phonetic contrasts and intelligibility in the dysarthria associated with mixed cerebral palsy. Journal of Speech and Hearing Research, 35, 296–308.
Bugler, K. E., Gaston, M. S., & Robb, J. E. (2019). Distribution and motor ability of children with cerebral palsy in Scotland: A registry analysis. Scottish Medical Journal, 64, 16–21.
Byrne, M. (1959). Speech and language development of athetoid and spastic children. Journal of Speech and Hearing Disorders, 24, 231–240.
Campbell, T. F., & Dollaghan, C. A. (1995). Speaking rate, articulatory speed, and linguistic processing in children and adolescents with severe traumatic brain injury. Journal of Speech and Hearing Research, 38, 864–875.


Clark, H. M. (2003). Neuromuscular treatments for speech and swallowing: A tutorial. American Journal of Speech-Language Pathology, 12, 400–415.
De Smet, H. J., Catsman-Berrevoets, C., Aarsen, F., Verhoeven, J., Mariën, J., & Paquier, P. F. (2012). Auditory-perceptual speech analysis in children with cerebellar tumours: A long-term follow-up study. European Journal of Pediatric Neurology, 16, 434–442.
Foo, Y., Guppy, M., & Johnston, L. M. (2013). Intelligence assessments for children with cerebral palsy: A systematic review. Developmental Medicine and Child Neurology, 55, 911–918.
Haarbauer-Krupa, J., Lee, A. H., Bitsko, R. H., Zhang, X., & Kresnow-Seddaca, M. J. (2018). Prevalence of parent-reported traumatic brain injury in children and associated health conditions. JAMA Pediatrics, 172, 1078–1086.
Hodge, M. M., & Wellman, L. (1999). Management of children with dysarthria. In A. J. Caruso & E. A. Strand (Eds.), Clinical management of motor speech disorders in children (pp. 209–280). New York, NY: Thieme.
Lee, J., Hustad, K. C., & Weismer, G. (2014). Predicting speech intelligibility with a multiple speech subsystems approach in children with cerebral palsy. Journal of Speech, Language, and Hearing Research, 57, 1666–1678.
Levy, S. E., Chang, Y. M., Ancelle, J. A., & McAuliffe, M. J. (2017). Acoustic and perceptual consequences of speech cues for children with dysarthria. Journal of Speech, Language, and Hearing Research, 60, 1766–1779.
Liégeois, F., Tournier, J-D., Pigdon, L., Connelly, A., & Morgan, A. T. (2013). Corticobulbar tract changes as predictors of dysarthria in childhood brain injury. Neurology, 80, 926–932.
Morgan, A. T., Liégeois, F., Liederkerke, C., Vogel, A. P., Hayward, R., Harkness, W., . . . Vargha-Khadem, F. (2011). Role of cerebellum in fine speech control in childhood: Persistent dysarthria after surgical treatment for posterior fossa tumour. Brain and Language, 117, 69–76.
Morgan, A. T., Mageandran, S-D., & Mei, C. (2009). Incidence and clinical presentation of dysarthria and dysphagia in the acute setting following pediatric traumatic brain injury. Child: Care, Health, and Development, 36, 44–53.
Morris, C. (2007). Definition and classification of cerebral palsy: A historical perspective. Developmental Medicine and Child Neurology, 49, 3–7.
Musselman, K. E., Stoyanov, C. T., Marasigan, R., Jenkins, M. E., Konczak, J., Morton, S. M., & Bastian, A. J. (2014). Prevalence of ataxia in children. Neurology, 82, 80–89.
Nordberg, A., Miniscalco, C., & Lohmander, A. (2014). Consonant production and overall speech characteristics in school-aged children with cerebral palsy and speech impairment. International Journal of Speech-Language Pathology, 16, 386–395.
Nordberg, A., Miniscalco, C., Lohmander, A., & Himmelmann, K. (2013). Speech problems affect more than one in two children with cerebral palsy: Swedish population-based study. Acta Pædiatrica, 102, 161–166.
Pennington, L., Miller, N., Robson, S., & Steen, N. (2010). Intensive speech and language therapy for older children with cerebral palsy: A systems approach. Developmental Medicine and Child Neurology, 52, 337–344.
Pennington, L., Roelant, E., Thompson, V., Robson, S., Steen, N., & Miller, N. (2013). Intensive dysarthria therapy for younger children with cerebral palsy. Developmental Medicine and Child Neurology, 55, 464–471.
Platt, L. G., Andrews, G., Young, M., & Quinn, P. T. (1980). Dysarthria of adult cerebral palsy: I. Intelligibility and articulatory impairment. Journal of Speech and Hearing Research, 23, 28–40.
Reid, S. M., Meehan, E. M., Arnup, S. J., & Reddihough, D. S. (2018). Intellectual disability in cerebral palsy: A population-based retrospective study. Developmental Medicine and Child Neurology, 60, 687–694.
Schölderle, T., Staiger, A., Lampe, R., Strecker, K., & Ziegler, W. (2016). Dysarthria in adults with cerebral palsy: Clinical presentation and impacts on communication. Journal of Speech, Language, and Hearing Research, 59, 216–229.
Solomon, N. P., & Charron, S. (1998). Speech breathing in able-bodied children and children with cerebral palsy: A review of the literature and implications for clinical intervention. American Journal of Speech-Language Pathology, 7, 61–78.
Stavsky, M., Mor, O., Mastrolia, S. A., Greenbaum, S., Than, N. G., & Erez, O. (2017). Cerebral Palsy — Trends in epidemiology and recent development in prenatal mechanisms of disease, treatment, and prevention. Frontiers in Pediatrics, 5, 21. doi:10.3389/fped.2017.00021
Tamburrini, G., Frassanito, P., Chieffo, D., Massimi, L., Caldarelli, M., & Di Rocco, C. (2015). Cerebellar mutism. Child’s Nervous System, 31, 1841–1851.
Weir, F. W., Hatch, J. L., McRacken, T. R., Wallace, S. A., & Meyer, T. A. (2018). Hearing loss in pediatric patients with cerebral palsy. Otology and Neurotology, 39, 59–64.
Wimalasundera, N., & Stevenson, V. L. (2016). Cerebral palsy. Practical Neurology, 16, 184–194.
Workinger, M. S. (2005). Cerebral palsy resource guide for speech-language pathologists. Clifton, NY: Thomson Delmar Learning.

17  Fluency Disorders

Introduction

A diagnosis of stuttering is made when a child or adult cannot maintain the normal flow of speech, even though there is no firm evidence of a disease process or structural problem affecting the peripheral speech mechanism (respiratory system, larynx, upper articulators) or the brain. Typically, the diagnosis is made by listening to the child (or adult) and counting dysfluencies. Historically, stuttering has been mysterious and intriguing because it is a speech problem of unknown cause. In keeping with contemporary scientific and clinical literature on fluency disorders, we use the term “people who stutter” (PWS), among whom are children who stutter (CWS) and adults who stutter (AWS).

Stuttering behavior is variable. PWS may have severe episodes of stuttering in some situations but not others. PWS may enjoy extended periods of apparent fluency, preceded and followed by long periods of severe stuttering behavior. Within a matter of minutes, certain types of speech may be produced with severe stuttering, whereas other types (e.g., singing, perhaps talking to a pet) may be completely fluent. Stuttering is therefore not a fixed set of symptoms, but rather a variable set of behaviors. For children who are diagnosed with stuttering around age 3 years and who continue to stutter past the age of 5 or 6 years and into the teenage years and adulthood, nonspeech behaviors may be associated with stuttering (Table 17–1, discussed later).

Incidence and Prevalence of Stuttering

A great deal of research has been devoted to the incidence and prevalence of “childhood-onset fluency disorder” (Yairi & Ambrose, 2013). Childhood-onset fluency disorder (hereafter, developmental stuttering), classified in the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association [APA], 2013) as a neurodevelopmental disorder, begins in very early childhood, perhaps as early as 2 years of age.

“Incidence” is the percentage of new diagnoses of a disorder within the population, over a given time period (such as a year). “Prevalence” is an estimate of the percentage of the population who stutter at a particular point in time. The contrast between incidence and prevalence of all cases of stuttering (including CWS and AWS) is particularly important to understanding the developmental nature of the disorder. The focus in this chapter is on developmental stuttering because the great majority of cases are diagnosed well before the age of 6 years. One estimate of the average onset age of stuttering is 33 months.

A conservative estimate of the incidence of stuttering is 5% to 8% of the population (Yairi & Ambrose, 2013). As previously noted, this estimate includes all children and adults newly diagnosed within a fixed period of time.


Table 17–1.  A Brief, Natural History of Stuttering

Stage: Typical dysfluencies
Age Range (years): 1.5–6
Core Behaviors: Low number of dysfluencies; part-word or single-syllable repetitions; possible sound prolongations and “tense” pauses; revisions more frequent over time
Secondary Behaviors: None

Stage: Borderline stuttering
Age Range (years): 1.5–6
Core Behaviors: High number of dysfluencies; multiple syllable repetitions
Secondary Behaviors: Rare

Stage: Beginning stuttering
Age Range (years): 2–8
Core Behaviors: Rapid, multiple repetitions with tension; blocks; difficulty initiating words
Secondary Behaviors: Escape behaviors (e.g., eyeblinks, “um’s”); frustration (awareness)

Stage: Intermediate stuttering
Age Range (years): 6–13
Core Behaviors: Blocks; repetitions, prolongations
Secondary Behaviors: Escape behaviors to terminate blocks; avoidance behaviors (sounds, words, situations)

Stage: Advanced stuttering
Age Range (years): >14
Core Behaviors: Long, tense blocks; repetitions, prolongations; lip, jaw tremors
Secondary Behaviors: Sophisticated escape and avoidance behaviors

Note. It has been argued that a group of children, perhaps about one-third of all children who are diagnosed with childhood stuttering, do not begin stuttering behavior with easy repetitions, as summarized in the text and this table. Rather, these children show initial symptoms that look more like intermediate or advanced stuttering, with tense blocks and struggle behavior. This sudden onset of more severe symptoms, rather than the gradual onset and progression summarized in this table, does not seem to predict which children will have persistent stuttering and which children will have natural recovery. See Guitar (2005) and Yairi & Ambrose (2005) for more details. Adapted from Stuttering: An Integrated Approach to Its Nature and Treatment, by T. J. Peters and B. Guitar, 1991, Baltimore, MD: Williams & Wilkins; Stuttering: An Integrated Approach to Its Nature and Treatment (3rd ed.), by B. Guitar, 2005, Baltimore, MD: Lippincott Williams & Wilkins.

In contrast, the prevalence of stuttering — all CWS and AWS — is 1% of the population, or even a bit lower (Yairi & Ambrose, 2013). The lower prevalence, compared with the incidence, reveals a critical characteristic of stuttering: roughly 80% of children who are diagnosed with stuttering as toddlers recover without therapy. This “natural recovery,” which currently is not well understood, results in a higher incidence than prevalence. Many children included in the incidence estimate do not contribute to the prevalence estimate — they become fluent and are not included in the estimate of “percentage of people who currently have the disorder.” Children who do not recover fluency in childhood are said to have persistent stuttering.

Diagnoses of stuttering in toddlerhood are made about equally for boys and girls. Girls, however, are much more likely than boys to recover naturally. Roughly 80% of children who recover fluency after the early diagnosis are girls. Because of the difference in recovery rates between girls and boys, the ratio of males to females who stutter in teenage years and adulthood is about 4:1.

There is little evidence that the incidence and prevalence of stuttering vary in a significant way across countries, cultures, races, and socioeconomic groups. Thus, the population of any of these groups (countries, cultures, and so forth) can be multiplied by 1% to arrive at the number of people who currently stutter in that group (see http://www.nsastutter.org/ and https://www.stutteringhelp.org/).
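To make the arithmetic behind these estimates concrete, the short Python sketch below applies the roughly 1% prevalence figure to a hypothetical community, and shows in simplified form how a 5% cumulative incidence combined with about 80% natural recovery leaves a prevalence near 1%. This is only an illustration of the reasoning in this section: the function names, the single fixed recovery rate, and the example community size are assumptions for the sketch, not values from the cited studies.

```python
def estimated_pws(population: int, prevalence: float = 0.01) -> int:
    """Estimate how many people in a group currently stutter, using the
    roughly 1% prevalence figure discussed in this section."""
    return round(population * prevalence)


def prevalence_from_incidence(cumulative_incidence: float, recovery_rate: float = 0.80) -> float:
    """Rough link between cumulative incidence and later prevalence:
    only the fraction of ever-diagnosed cases that does not recover persists.
    This simplification ignores timing and other real-world complications."""
    return cumulative_incidence * (1.0 - recovery_rate)


# A hypothetical community of 250,000 people: about 2,500 current stutterers at 1% prevalence.
print(estimated_pws(250_000))           # 2500

# A 5% cumulative incidence with ~80% natural recovery leaves about 1% who persist.
print(prevalence_from_incidence(0.05))  # approximately 0.01 (about 1%)
```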


Genetic Studies

Research since the 1960s and extending to the present time leaves little doubt that stuttering, or a predisposition to stuttering, is transmitted from parent to child (Etchell, Civier, Ballard, & Sowman, 2018; Smith & Weber, 2017). As reviewed by Peters and Guitar (1991), “family tree” studies from the 1960s and 1970s showed that PWS were far more likely to have a first-degree relative who stuttered, when compared with a control group of fluent speakers.

Twin studies (see Andrews, Morris-Yates, Howie, & Martin, 1991; Felsenfeld et al., 2000) added to the evidence for a genetic component in stuttering. The results of these studies were striking. If one member of an identical twin pair stuttered, the likelihood of stuttering in the other member of the pair was about 70%. In contrast, if one member of a fraternal (nonidentical) twin pair stuttered, there was only a 25% to 30% likelihood of stuttering in the other member of the pair. Identical twins are genetically the same; fraternal twins are genetically no more similar than any siblings from the same family. The much higher likelihood of stuttering in both members of identical twins, as compared to fraternal twins, provides strong evidence of a genetic component in stuttering. Frigerio-Domingues and Drayna (2017) provide an up-to-date review of twin studies and stuttering.

The specific nature of the genetic component in developmental stuttering remains unknown. Candidate genes for stuttering have been suggested but for the time being are best regarded as hypotheses rather than settled fact (Frigerio-Domingues & Drayna, 2017; Yairi & Ambrose, 2013). Whatever genetic component exists in developmental stuttering is probably more a predisposition for stuttering, rather than a guarantee of stuttering (see Chapters 7 and 15 for similar observations on a genetic component in pediatric language disorders and speech sound disorders, respectively). This is consistent with the high but not perfect occurrence of stuttering in identical twins. As noted by Smith and Weber (2017), the “expression” of genes is shaped by the environment. If there is a stuttering gene (or genes), it does not guarantee stuttering in a child who has received the gene(s) from a parent.


Diagnosis of Developmental Stuttering

Dysfluencies are a normal part of speech and language learning in toddlers. How, then, is stuttering diagnosed in a young child? How are normal dysfluencies distinguished from dysfluencies that suggest a diagnosis of developmental stuttering?

Ambrose and Yairi (1999) used the term “stuttering-like dysfluencies” (SLDs) to identify instances of dysfluency that are typically not seen in children who develop fluent speech. More specifically, children perceived to be stuttering, by parents or other caretakers, made multiple repetitions of single syllables (in multisyllabic words) or of single-syllable words. “Multiple repetitions” means at least three to five syllable repetitions (e.g., “buh-buh-buh-buh . . . ”). These repetitions were produced easily, with no apparent struggle, and rapidly. Children developing fluent speech often produced only a single repetition of a single syllable or a single-syllable word.

Ambrose and Yairi (1999) computed the number of SLDs per 100 syllables spoken by fluent children and by children with suspected stuttering. Children with typically developing fluency produced, on average, less than one SLD per 100 syllables spoken. Children with suspected stuttering onset produced at least three such dysfluencies per 100 syllables. Although there were other types of SLDs, multiple syllable repetitions were the dominant sign of the onset of developmental stuttering. Types of SLDs are listed in Table 17–2.

Table 17–2.  Types of Stuttering-Like Dysfluencies (SLDs) Observed in Children Who Stutter (CWS)

SLD: Sound, syllable, and/or single-syllable repetition
Example: “b-b-b-but . . . ”; “buh-buh-buh . . . ” (for “but”)

SLD: Whole-word repetition
Example: “but, but, but . . . ”

SLD: Sound prolongations
Example: “Ffffffffffffine”

SLD: Blocks
Example: Inaudible stoppage of speech with mouth open or closed; inability to initiate sounds

Note.  Adapted from https://www.asha.org/practice-portal/clinical-topics/childhood-fluency-disorders/
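As a rough illustration of the counting logic behind the Ambrose and Yairi (1999) criterion just described, the short Python sketch below converts a raw SLD count into a rate per 100 syllables and flags samples at or above three SLDs per 100 syllables. The function names and the use of a single fixed threshold are assumptions made for this sketch; an actual diagnosis rests on a clinician's judgment of the whole speech sample, not on a single count.

```python
def sld_rate_per_100_syllables(sld_count: int, syllables_spoken: int) -> float:
    """Convert a raw count of stuttering-like dysfluencies (SLDs) into
    a rate per 100 syllables of a speech sample."""
    if syllables_spoken <= 0:
        raise ValueError("syllables_spoken must be a positive number")
    return 100.0 * sld_count / syllables_spoken


def flag_for_evaluation(sld_count: int, syllables_spoken: int, threshold: float = 3.0) -> bool:
    """Flag a sample for further evaluation when the SLD rate meets or exceeds
    the threshold (default: 3 SLDs per 100 syllables, the level reported for
    children with suspected stuttering onset by Ambrose & Yairi, 1999)."""
    return sld_rate_per_100_syllables(sld_count, syllables_spoken) >= threshold


# Example: 12 SLDs counted in a 300-syllable sample is 4.0 SLDs per 100 syllables.
print(sld_rate_per_100_syllables(12, 300))  # 4.0
print(flag_for_evaluation(12, 300))         # True
print(flag_for_evaluation(2, 300))          # False (about 0.67 per 100 syllables)
```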

The Natural History of Developmental Stuttering

Developmental stuttering can have a sudden onset in toddlers, with symptoms that develop slowly or rapidly. Because the development of stuttering occurs throughout the same time period as speech-sound and language development, a potential interaction between fluency and speech and language skills has been discussed in the literature (see Smith & Weber, 2017, for a summary). These interactions are discussed later in this chapter.

The progression from the early symptoms of developmental stuttering, such as the easy repetitions described earlier, to advanced stuttering behavior can be described as a sequence of behaviors. We call this the natural history of persistent stuttering. The abbreviated description given here owes much to information published by Peters and Guitar (1991), Guitar (2005), Yairi and Ambrose (1999), and Yairi and Seery (2015). Table 17–1 provides a summary of this history. As in all cases of normal and disordered speech and language development, child-to-child variability is to be expected. The age ranges and characteristic symptoms listed in Table 17–1 are general guidelines to the development of stuttering, rather than fixed milestones.

Table 17–1 contains columns labeled Core and Secondary Behaviors. The distinction is important in the natural history of developmental stuttering. Core behaviors are actual speech behaviors produced by a person who stutters. These include syllable and word repetitions, “blocks” (stoppages) in the stream of speech, and prolongations of speech sounds. Secondary behaviors are learned behaviors in PWS. These are reactions to the difficulty of producing a fluent stream of speech. Secondary behaviors include almost anything with the purpose of avoiding or escaping an ongoing or anticipated SLD (Peters & Guitar, 1991; Guitar, 2005). Secondary behaviors include “um”s, hesitations, and body part movements such as blinking or turning the head (as well as others).

Stage I:  Typical Dysfluencies

As noted earlier, toddlers have dysfluencies that are typically single repetitions of syllables or words. Examples of these typical dysfluencies are part-word (“mi-milk”) or whole-word (“milk-milk”) repetitions and single repetitions of syllables in multisyllabic words (as in “do-doggie eat”). Sound prolongation (“sssssssee”) is another type of typical dysfluency. Many scholars interested in the development of stuttering have suggested that these early dysfluencies make sense because children are learning complex aspects of language such as syntax, vocabulary, and grammatical morphology. In addition, children are trying to output this complicated new stuff of language through a system — the speech mechanism — that is undergoing growth and maturation of motor control. Some hesitations, repetitions, and movement prolongations should be expected, as with the initial learning of any complicated skill.

The distinction between typical dysfluencies and SLDs is important — children who are diagnosed with developmental stuttering are best served when they receive clinical services at the earliest possible age (Conture, 2001). Table 17–1 contrasts “typical” dysfluencies like the “mi-milk” or “do-doggie” examples given, with SLDs such as “Mi-mi-mi-milk” or “Do-do-do-doggie.” Even when a young (preschool) child produces multiple repetitions and is considered for a diagnosis of developmental stuttering, the repetitions are likely to sound relaxed and without struggle. As the fluency problem develops, frequent signs of struggle with the flow of speech are important in the diagnosis of stuttering as a potentially persistent problem.

Research suggests that typical dysfluencies are fairly rare when actually counted. In Table 17–1, the entry “low number of dysfluencies” for Stage I indicates the relative rareness of dysfluent events in typical speech-language development.1 Note in Table 17–1 the fairly wide age range for typical dysfluencies during speech and language development. Peters and Guitar (1991) summarize research suggesting that, as typically developing children master speech production skills, part-word and whole-word repetitions become less frequent, and revisions of the repetitions more frequent. Children age 2 years and 5½ years may both have “normal” dysfluencies, but the younger child repeats more, whereas the older child has few repetitions but more revisions (e.g., “I was going . . . when I left I went . . . and before I left . . . ”).

1 The actual number of dysfluencies per 100 words that distinguishes typically dysfluent children from those who are diagnosed later with stuttering varies somewhat in different studies. Peters and Guitar (1991) summarized research to the date of publication of their text and suggested that typically developing children had fewer than 10 dysfluencies per 100 words spoken, whereas children who stutter (or who would eventually be diagnosed with stuttering) had 16 to 20 dysfluencies per 100 words. Yairi and Ambrose (1991) used a much more stringent criterion, separating beginning stutterers from those with “typical” dysfluencies with a criterion of 3 dysfluencies per 100 words spoken — more than 3/100, and the child was considered to be atypically dysfluent.

Stage II:  Borderline Stuttering

Borderline stuttering covers the same age range as the period of “typical” dysfluencies (Table 17–1). Borderline stuttering is distinguished from typical dysfluency in the following two ways: (a) dysfluency becomes more frequent in children with borderline stuttering, compared with typically developing children, and (b) part- or whole-word repetitions include three or more consecutive repetitions in borderline stuttering. For the most part, the child who meets criteria for borderline stuttering shows no secondary behaviors. Occasionally, the child with borderline stuttering may begin to show some frustration with the repetitions.

As noted by Guitar (2005), there is substantial gray area between the stages of typical dysfluency and borderline stuttering. The behaviors outlined here for the two stages may blend together and even shift over time. Sometimes the child may seem to have typical dysfluencies; other times, SLDs suggest borderline stuttering. The child who fits criteria for borderline stuttering may, in fact, become fluent in the near future; a smaller number of children in this category continue to show stuttering behavior.

Stage III:  Beginning Stuttering

Beginning stuttering, which may be diagnosed across a wide age range of 2 to 8 years but rarely after about 6 years of age, differs from borderline stuttering in several ways. First, the multiple, part-, or whole-word repetitions are produced very rapidly. These repetitions sound less controlled than the easy, relaxed repetitions heard in borderline stuttering. The repetitions of beginning stuttering also appear to be produced with tension, indicating signs of struggle with the flow of speech. Another sign of tension in beginning stuttering is the presence of blocks — the complete stoppage of speech. The child seems to get “stuck” on a speech sound. Blocks may occur with the child’s mouth open, or closed, but it is clear that he or she is trying to produce speech but cannot release the air required to maintain the flow of syllables. Blocks may last for a second or two, or much longer.2

Beginning stuttering is often accompanied by the onset of secondary behaviors. Now the child who stutters incorporates escape behaviors into his or her attempt to speak. These behaviors are meant to assist the child in reestablishing the flow of speech when it is interrupted by a series of repetitions or blocks or sound prolongations (Peters & Guitar, 1991). Eye blinks, head nods, or even slapping a hand against the body, are used to “release” a stuttering episode, to trigger the resumption of the smooth flow of speech. The child who meets the criteria for the beginning stage of stuttering shows awareness of his or her fluency problems and may feel a good deal of frustration with the inability to produce fluent speech.

Stage IV:  Intermediate Stuttering

Intermediate stuttering is an elaboration of behaviors seen in beginning stuttering. Blocks, repetitions, and sound prolongations are heard in the child’s speech with increasing frequency and severity. In the intermediate stage of stuttering, the child uses escape behaviors to release blocks and other dysfluencies, and begins to use and refine avoidance behaviors. Avoidance behaviors may be associated with specific speech sounds and/or words, communication partners, and communication situations. For example, the child in the intermediate stage of developmental stuttering may have sufficient experience with dysfluency to know that the “m” sound is particularly difficult and may begin to avoid words beginning with this sound. Or, the child may have a history of dysfluency with a person’s name, and avoid saying it. Avoidance of specific sounds and words reflects anticipation of upcoming dysfluencies, and often leads to a search for a different way to communicate the same thought. Such behavior may involve revisions, hesitations, and insertions of “um’s” to give the child time to find an “easier” way to produce the message. Children in the intermediate stage of stuttering begin to connect their stuttering with negative consequences, and, like all of us, look for ways to avoid the negative feelings.

Stage V:  Advanced Stuttering

The age range of 14 years and older for advanced stuttering (see Table 17–1) includes individuals who have been dysfluent for many years and have substantial experience with the entire spectrum of stuttering behaviors. The core behaviors include long, tense blocks, part- and whole-word repetitions, and visible lip and jaw tremors (Smith & Weber, 2017).

A distinguishing characteristic of advanced stuttering is the sophistication of learned and practiced escape and avoidance behaviors. For example, one of the authors of the Peters and Guitar (1991) textbook recounts a behavior he used, as a young man, to avoid placing food orders in restaurants. He knew that the pressure to speak his order quickly would cause him to be dysfluent. He waited until he saw the server approaching the table, let his friends know what he wanted to order, and then excused himself to go to the rest room. Personal accounts from adults who stutter reveal many of these types of avoidance behaviors (https://www.mnsu.edu/comdis/kuster/PWSspeak/PWSspeak.html).

2 When I was an undergraduate student, I had a clinical assignment in a summer residential program for AWS. The day I was introduced to one of the young men assigned to work with me, the client began to introduce himself and when he came to his name, he had an open-mouth block that lasted well over 30 seconds.

Recovery of Fluency

An important component of the natural history of stuttering is the trajectory of the stages outlined in Table 17–1. For example, when a child appears to be in Stage III of this natural history, with multiple repetitions, blocks, and evidence of frustration in attempts to deal with dysfluency, does he necessarily move through the final two stages and into advanced stuttering? This question has occupied a good number of scientists whose findings agree in a general sense, but perhaps not specifically.

Yairi and Ambrose (1999) concluded that many children diagnosed with stuttering around the age of 3 years stop stuttering within a year or two (or even sooner) and become fluent speakers. This recovery from the diagnosis of stuttering in young children seems to be a natural process, one that does not require (although may be helped by) clinical treatment. Recall that about four of five children who are diagnosed with stuttering recover and become fluent speakers (Yairi & Ambrose, 1999).

The question concerning the natural, or spontaneous, recovery of stuttering has focused on the factors that may explain why children recover fluency after an initial diagnosis of developmental stuttering. One factor is clearly the sex of the child: females are much more likely to recover fluency compared with males (see the section “Incidence and Prevalence of Stuttering”). Recovery of fluency may also be predicted by family history (Walsh et al., 2018; Yairi, 2007). Children with a documented family history of persistent stuttering are less likely to recover fluency than children whose family history of persistent stuttering is less certain. These facts suggest that persistent stuttering has a genetic component (see the sections “Genetic Studies” and “Possible Causes of Stuttering”).

Recovery of fluency may also be linked to language skills and temperament at the time of diagnosis, but the research findings on these factors are much less convincing than the sex and family history factors. Other factors, such as age at onset of stuttering symptoms, type of dysfluency (e.g., repetitions versus tense blocks), the child’s phonological development (development of the sound system), and a child’s speech motor skills may also be important factors in persistent stuttering versus recovery from stuttering. Factors that appear to contribute to recovery of fluency versus persistence of stuttering are summarized in Figure 17–1 (see Ambrose & Yairi, 1999; Conture, 1999; Nippold, 2001, 2002; Walsh et al., 2018; Watkins, Yairi, & Ambrose, 1999; and Yairi & Ambrose, 2005 for additional information on the controversy surrounding factors that account for persistence versus recovery from stuttering).

Possible Causes of Stuttering

Three theories of stuttering are considered here: psychogenic, learning, and biological theories. The theories are labeled this way for convenience of presentation; they are not necessarily mutually exclusive in their proposed explanations. The first two theories are presented only briefly. They provide historical perspective but are not widely accepted among contemporary researchers and clinicians.

Psychogenic Theories

Psychogenic theories explain stuttering as a neurosis. The neurotic basis of stuttering may include hostility, repression of unwanted feelings, phobias, as well as other psychological constructs often associated with Freudian psychopathology (see summaries of psychogenic theories of stuttering in Owens, Metz, & Haas, 2003, and Ramig & Shames, 2006). In contemporary thinking about stuttering, a psychogenic basis for the disorder is largely discounted because evidence of common personality traits among PWS is not compelling, and psychotherapy does not seem to lessen stuttering symptoms. PWS seem to be just like everyone else, except that they stutter.

This conclusion may need to be qualified by considering the differences between underlying causes of a disorder, and psychological outcomes of having a disorder. Tran, Blumgart, and Craig (2011), in a review of the literature on psychological issues among PWS, failed to find a persuasive case for psychological problems as the cause of stuttering. Results of their own study, however, in which AWS filled out a questionnaire concerning their levels of anxiety, mood, and other psychological states, suggested differences from a control group. PWS were more likely (by self-report) than control participants to have anxiety and negative mood states. Tran et al. regarded these findings as indicators of the potential effect of stuttering on psychological well-being, rather than the cause of stuttering.


Figure 17–1.  Variables likely to be associated with persistent versus recovered stuttering in children. [The figure is a branching diagram in which “Beginning stuttering” splits into “Persistent (~20%)” and “Recovered (~80%).” Variables on the persistent side: male; family members with a history of stuttering; and, with less certainty, age at onset, type of SLDs, developmental speech sound and/or language disorders, and temperament. Variables on the recovered side: female; lack of family history; and, with less certainty, age at onset, type of SLDs, and typical speech and language development.]

Learning Theories

Learning theories of stuttering propose that some children learn to be dysfluent during a period of typical dysfluencies. A child develops an association between normal dysfluencies and fear of speaking; when the dysfluencies are “released,” an immediate reduction of fear reinforces dysfluency, making it more likely to occur in the future. Over time, this pattern of dysfluency, fear, release of the dysfluency with its reduction in anxiety and fear, leads to chronic stuttering as a learned habit.

Learning theories of stuttering are controversial. There is little doubt that as stuttering develops, emotions such as fear, anxiety, and shame increase as the child finds it more difficult to produce fluent speech and be an effective communicator. Moreover, it makes sense that chronic, speaking-related emotions complicate and possibly undermine the child’s attempts to produce fluent speech. As a child who stutters matures through adolescence and young adulthood, and has lengthy experience with advanced stuttering, a deeply rooted association of frustration, anxiety, fear, and shame with the act of speaking may be established. It is easy to see how this association, together with the reduction or elimination of the negative emotions when fluency follows repetitions or blocks, is viewed as a likely setting for stuttering as learned behavior. To add to this logic, reviews of the efficacy (effectiveness) of speech therapy in CWS and AWS indicate that therapies using rather simple principles of learning can reduce stuttering to a significant degree (Bothe, Davidow, Bramlett, & Ingham, 2006). If we follow the logic used earlier to discount neurotic theories of stuttering, the relative success of many of these “learning” therapies could be taken as “proof” for stuttering as learned behavior.

Biological Theories

A biological theory of stuttering implies a physical or physiological basis for the disorder (or both). Biological explanations for stuttering focus on brain differences between PWS and fluent speakers, because there is no evidence that articulators such as the tongue, lips, and jaw are different in PWS than in fluent speakers. In most cases, these hypothesized brain differences are thought to be present at birth.

Over the past several decades, three types of evidence have been used to support a brain basis for childhood-onset fluency disorder. Differences in brain anatomy and brain physiology between CWS and children who are fluent constitute two of the three types of evidence. The third type of evidence, which may be the basis for the first two, suggests that stuttering is a genetic disorder. The three types of evidence are not independent. A genetic basis for stuttering may include a difference in the development of speech motor control mechanisms in the central nervous system.

Anatomical Differences

Sophisticated brain imaging techniques are available to address the question of anatomical differences between PWS and fluent speakers. Studies have examined the relative size of gray matter in brain locations known to be part of the speech production network, as well as the connections (white matter) between these locations. A related question has been the possibility of different patterns of anatomical asymmetry in the cerebral hemispheres of PWS compared with fluent speakers. Summaries of this work are found in Etchell, Civier, Ballard, and Sowman (2018) and Ingham, Ingham, Euler, and Neumann (2018).

Recall from Chapter 2 the size differences between structures in the left and right hemispheres: in a fluent speaker, left-side structures that are part of the speech and language network are often larger than corresponding right-side structures, apparently consistent with the left-side dominance for speech and language. The findings of several relevant studies of AWS are not always consistent, but Etchell et al. (2018) conclude that compared to fluent adults, two structures in the right hemisphere of AWS have more gray matter. These structures are the right-hemisphere areas in the frontal lobe corresponding more or less with Broca’s area in the left hemisphere, and the planum temporale, an auditory area important for speech perception. In other words, the asymmetries observed in fluent speakers may be reduced or not observed in PWS.

At first glance, these claims for the brain structures in PWS contradict the typical lateralization of speech structures to the left hemisphere. As argued by Etchell et al. (2018), though, the conclusion makes sense if poorly functioning speech motor control structures in the left hemisphere, which presumably are one cause of stuttering, are compensated for by activity in the analogous structures of the right hemisphere. It is as if the right hemisphere structures take over part or all of the job of fluent speech production, and the assumption of greater speech activity requires more gray matter (see Foundas, Bollich, Corey, Hurley, & Heilman, 2001; and Foundas et al., 2003).

In another example, the size difference in many humans between the left- and right-hemisphere planum temporale — larger in the left hemisphere, consistent with left lateralization of speech and language — has been shown to be eliminated or even reversed in AWS, but not in CWS. This makes it unclear if the loss of asymmetry in AWS is due to the origin of stuttering (e.g., is part of “programmed brain growth” in PWS) or is the result of chronic stuttering into adulthood, which results in changes in brain anatomy (Chang, Erickson, Ambrose, Hasegawa-Johnson, & Ludlow, 2008).

White matter differences between PWS and fluent adults (or the comparison of CWS and fluent children) include differences in the corpus callosum (the fiber tract that connects the two hemispheres). Specifically, the corpus callosum has been reported to have greater volume in CWS compared with fluent children (Etchell et al., 2018). This may be consistent with increased gray matter on the right side of the brain of PWS. Compensation by the right side of the brain for dysfunctional left-side speech structures may require that more information be transferred to the right hemisphere during speech. To accommodate this increased flow of information, a larger interhemispheric (between hemispheres) fiber tract may be needed. There is also evidence that white matter connections between brain structures for speech and language, such as between Wernicke’s and Broca’s areas or between cortical and subcortical areas for speech motor control, are less well developed in CWS compared with fluent children (Chang, Zhu, Choo, & Angstadt, 2015). Poor development of white matter tracts that are essential to the speech motor control brain network may play a role in childhood stuttering.

Physiological (Functional) Differences

Functional neuroimaging studies (using, for example, the functional magnetic resonance imaging [fMRI] technique described in Chapter 2) have revealed differences in brain activity patterns for speech production in PWS, compared with fluent people. Brown, Ingham, Ingham, Laird, and Fox (2005) examined brain regions for fluent speakers as they produced speech and observed a small “core” set of active regions. These regions included Wernicke’s and Broca’s areas, among others known to be involved in the motor control of structures such as the lips, tongue, and larynx. PWS showed activation during speech of the same “core” regions but had either stronger- or weaker-than-normal activity in those areas. In addition, regions of the brain associated with auditory processing, typically active in fluent speakers, showed little activity in PWS. This last finding is provocative in light of the planum temporale asymmetries discussed previously. Both the anatomical and physiological findings therefore suggest a role of the brain’s auditory processing centers in stuttering, and perhaps specifically for the perception of speech and language and its integration with speech motor control.

Speech Motor Control and Stuttering

The possibility of developmental stuttering as a consequence of immature or dysfunctional speech motor control has been gaining traction over the past few decades. Execution of speech movements, such as tongue speed and coordination of two or more articulators, is believed to be affected in toddlers who begin to stutter. As stated by Smith and Weber (2017), “Disfluencies arise when the motor commands to the muscles are disrupted, and normal patterns of muscle activity required for fluent speech are not generated” (p. 2487). Deficits in the planning of speech sound sequences are also thought to be a component of the speech motor control deficit.

As discussed previously for apraxia of speech in adults (Chapter 14) and in children (Chapter 15), the distinction between the planning and execution components of speech motor control is important. Execution is the direct control over movements of the speech mechanism (such as the tongue) by cells in the primary motor cortex. Planning is the preparation of a program for the execution of movements, which includes (at least) the selection of speech sounds, their ordering, and the commands for intended movements. A speech motor control plan can be assembled without executing the plan — it is like a mental representation of what is intended. In contrast, execution of the plan is the result of the commands from the primary motor cortex that have direct control over the muscles.

Smith and Weber (2017) say, “It has been hypothesized that the underlying speech motor deficit in adults with persistent stuttering is a failure to form stable underlying motor programs for speech” (p. 2487). Smith and Weber support the idea of immature speech motor programs by citing their own work on articulatory stability (variability of articulator movements over multiple repetitions of a short phrase). Fluent adults did not improve their stability over many trials of the repetitions because their speech was already planned at a mature level. Conversely, AWS and fluent children showed improvement over trials, presumably in the stability of the program. When the planned speech sound sequence is stable, so are the executed movements. AWS improved because they had immature speech motor programs and therefore had room to improve. Fluent children improved because their speech motor control is still maturing; they too had room to improve.


A speech motor control perspective on developmental stuttering is compelling for several reasons. First, it explains why articulatory movements of PWS are different from the articulatory movements of a fluent speaker — even during fluent utterances (Zimmerman, 1980).

Second, it is consistent with different types of stuttering, most notably clonic versus tonic SLDs. “Clonic” is a term that describes rhythmic, repetitive movements of a body part. Multiple, consecutive sound or syllable repetitions can be considered a type of speech clonus (noun form of “clonic”). Similarly, “tonic” is a term that describes a muscle contraction that is sustained for a relatively long period of time. The long “blocks” seen in advanced stuttering can be regarded as a form of speech tonus. The terms “clonus” and “tonus” are, in fact, used to describe signs of certain neurological diseases, but a caution is in order: although the terms have been used to describe the SLDs of multiple repetitions and blocks, respectively, they are not used to link stuttering with specific neurological diseases (Schwartz & Conture, 1988). However, these two SLD types fit with unintended and uncontrolled aspects of speech production that seem explained better by a speech motor control deficit than by (for example) a learning theory.

A third piece of evidence concerns the hypothesized speech motor planning component of a speech motor control deficit. Stuttering is not a linguistic, equal-opportunity speech disorder. As reviewed by Anderson, Pellowski, and Conture (2005), stuttering is most likely to occur on low-frequency words, one of the first three words of an utterance, function words in young CWS versus content words in older CWS and AWS, and longer, grammatically complex utterances (for grammatical complexity, see Melnick & Conture, 2000). An argument can be made that each of these four linguistic conditions requires more speech motor planning skills than its opposite. For example, lipstick and rabbit are low- and high-frequency words, respectively, likely to be known by a 4-year-old. Frequently used words such as rabbit are produced so many times by children, and therefore planned so many times, that the plan becomes more or less automatic. In contrast, a low-frequency word like lipstick that is said fewer times may require more active planning, and thus be subject to programming demands that challenge speech motor control maturity in CWS. Similarly, longer and more grammatically complex utterances such as Where does he go when he is hungry? are likely to require more sophisticated planning skills than the shorter and less complex, Where does he go?

Finally, an explanation of developmental stuttering as a speech motor control deficit may seem
incompatible with the possibility that CWS and AWS are more likely to stutter when they are anxious (e.g., Davis, Shisca, & Howell, 2007). How does a speech motor control perspective on stuttering accommodate the effect of anxiety on speech motor planning and/or execution? Although this question has no current answers, two observations from the research literature are relevant to an increased understanding of why motor planning and execution might be expected to be sensitive to fluctuating states of anxiety.

The quality of any motor control task, including speech production, is likely to be sensitive to a person’s anxiety level at the time the task is performed. For example, the fine motor control and precise coordination required for finger movements during skilled piano performance deteriorate to various degrees when the person playing the piano is anxious (Kotani & Furuya, 2018). The cognitive resources associated with planning and execution of finger movements may be compromised by anxiety. This suggestion has been made directly for the case of speech motor planning and execution by Hennessy, Dourado, and Beilby (2014), who showed in an experimental task that severity of stuttering in a verbal task was related to the current anxiety state in the person who stutters. Nonverbal responses (pressing a button to indicate an answer to the same question posed in a verbal-response task) did not show this relationship between anxiety and button-press errors. Hennessy et al. concluded that the relationship between variations in stuttering severity and variations in anxiety was specific to speech, and probably was due to anxiety competing for response resources with speech motor planning and execution.

Speech Motor Control and Developmental Stuttering:  A Summary

Smith and Weber (2017) suggest the idea that the onset and progression of stuttering in children can be explained by the brain anatomy and physiology differences previously described, and (importantly) by the child’s ability to find a compensation for these differences. Two young children who are diagnosed with stuttering at age 3 years, for example, may have equal levels of immaturity in brain structures and physiology for speech. One child may not recover fluency and have persistent stuttering into adulthood; the other may become fluent within months.

This hypothetical comparison illustrates how genetic predispositions for stuttering do not guarantee persistent stuttering. Both children may have the same, genetically determined, immature brain structures and physiology for speech. The child who becomes fluent does so because an environmental influence “shapes” the genetic predisposition away from persistent stuttering and toward recovery of fluency (Smith & Weber, 2017).

Acquired (Neurogenic) Stuttering

Adults who have been fluent throughout their lives (with some exceptions) and who acquire brain damage from (for example) a stroke (Theys, van Wieringen, Sunaert, Thijs, & De Nil, 2011) or traumatic brain injury (Penttilä, Korpijaakko-Huuhka, & Kent, 2019) may have stuttering as a speech-language problem. It is called “neurogenic” stuttering to recognize that its cause is a documented brain injury or brain damage. Neurogenic stuttering is rare and is not well understood.

Two questions are often asked about neurogenic stuttering. First, are the symptoms of neurogenic stuttering like those seen in developmental stuttering? Second, which neurological diseases have neurogenic stuttering as a possible symptom, and when neurogenic stuttering occurs in one of these diseases, which parts of the brain are likely to be damaged? The summary that follows is based on the reviews cited previously, as well as articles cited in those reviews.

Symptoms of Neurogenic Stuttering Compared With Developmental Stuttering

A significant amount of attention has been devoted to the similarity (or dissimilarity) of stuttering symptoms in acquired (neurogenic) as compared to developmental stuttering. The reason for making this comparison is, in a broad sense, to determine if the two kinds of stuttering are the “same” thing. Think of the comparison as an experimental hypothesis: If symptoms in acquired and developmental stuttering are clearly different, the disorders may share a name (stuttering) but are different types of communication disorders. This has implications for both the clinical management of the disorders and their underlying theories. But if the symptoms in the two disorders are the same, stuttering may be viewed as the same phenomenon in both children and adults. This potential outcome, in which stuttering behavior is essentially the same in developmental and acquired versions of the disorder, may be regarded as consistent with a biological view of all stuttering behavior.

As noted earlier, stuttering episodes in developmental stuttering are not found equally at word beginnings and endings, or on content versus function words. Early case reports of acquired (neurogenic) stuttering suggested that patients may not follow this pattern, but rather stutter frequently at word endings and on function words. More recent data, however, suggest as much similarity as difference for the actual types and locations of dysfluencies observed for developmental and acquired stuttering (Theys, van Wieringen, & De Nil, 2008). Types and locations of dysfluencies do not seem to reveal a clear-cut distinction between developmental and acquired stuttering.

Another well-known feature of developmental stuttering, at least in the intermediate and advanced stages (see Table 17–1), is the presence of secondary characteristics that include struggle and release behaviors associated with repetitions, prolongations, and blocks. Some case reports (articles written to describe a single patient’s behavior, common in the medical literature) indicated that secondary characteristics did not occur in acquired stuttering. Other cases show struggle, avoidance, and release behaviors similar to those seen in the later stages of developmental stuttering and in AWS. Theys et al. (2008) concluded that the presence versus the absence of secondary behaviors does not seem to provide a reliable distinction between acquired and developmental stuttering.

In summary, evidence collected so far has not shown a clear distinction between the core or secondary characteristics of acquired and developmental stuttering. A future analysis may reveal a clear distinction, but for the present it seems best to regard the symptoms of the two forms of the disorder as overlapping.

Treatment Considerations

Many different behavioral techniques exist for the treatment of stuttering, in both children and adults. In children past the age of 9 or 10 years, and in adults, several of these techniques seem to work for many individuals, as evaluated by strict scientific criteria (Baxter et al., 2016; Bothe, Davidow, Bramlett, & Ingham, 2006). The techniques work not only in the therapy session, where a person who stutters can reduce or eliminate stuttering episodes under controlled circumstances and with the help of a clinician, but also in real-world talking situations. PWS who have been treated by an SLP have a real hope that stuttering, and the various feelings and behaviors associated with it, can be brought under some degree of control. Evaluation of treatment effectiveness in preschool children is complicated by the 80% spontaneous recovery rate among children who have an initial diagnosis of beginning stuttering. Treatment may be the reason a child recovers fluency, or the recovery may simply reflect the natural course of the disorder (Salturklaroglu & Kalinowski, 2005). Of course, treatment may also hasten recovery in children who would at a later date have recovered spontaneously. Shenker and Santayana (2018) provide a review of techniques used to treat developmental stuttering in preschool children.
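To see why the high rate of natural recovery complicates treatment evaluation, consider a simple illustrative calculation. The 80% figure follows the approximate recovery rate cited above; the cohort size and the observed outcome are hypothetical numbers chosen only for illustration.

```python
# Illustrative only: shows why spontaneous recovery makes it hard to credit
# treatment for fluency gains in preschool children who stutter.
natural_recovery_rate = 0.80     # ~80% recover with or without therapy (per text)
treated_children = 100           # hypothetical treated cohort
observed_recoveries = 85         # hypothetical outcome after therapy

expected_without_treatment = natural_recovery_rate * treated_children
extra_recoveries = observed_recoveries - expected_without_treatment

print(f"Expected to recover anyway: {expected_without_treatment:.0f}")
print(f"Recoveries beyond natural recovery: {extra_recoveries:.0f}")
# Without an untreated comparison group, the small number of "extra"
# recoveries cannot be attributed to treatment with any confidence.
```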

Chapter Summary

Stuttering is a speech-language disorder in which the smooth, fluent stream of speech is interrupted by repetitions, blocks (complete stoppages of speech), and revisions; its cause is unknown.

Developmental stuttering begins in early childhood and usually involves a slow progression of symptoms in which the "typical" dysfluencies of early speech and language development increase in frequency and severity as the child matures. A "natural history" of developmental stuttering describes how these symptoms change from the early, typical dysfluencies to the later blocks, repetitions, and especially struggle behaviors that are characteristic of advanced stuttering.

As many as 80% of children suspected of having a stuttering disorder recover with or without therapy. The likelihood of recovery from childhood stuttering is greater for girls, as compared to boys, and for children who do not have a first-degree relative who has or had a stuttering problem.

There are several different theories of stuttering. Psychological theories, which regard stuttering as an expression of a neurosis, were popular at one time but do not have much scientific support. Learning theories are based on the assumption that the normal dysfluencies of early childhood become a chronic pattern as the child learns to associate stuttering episodes with negative outcomes, and the "release" of the stuttering episodes with positive outcomes; in these theories, the child becomes conditioned to stutter. Biological theories hold that there is some brain difference or dysfunction that explains stuttering; in contemporary scientific discussions, biological theories are very much intertwined with the idea of a genetic basis for stuttering.

The genetic basis for stuttering is supported by (a) the greater likelihood of stuttering among relatives of PWS, as compared to relatives of fluent speakers; and (b) the greater occurrence of stuttering in monozygotic as compared to dizygotic twins. Dysfluencies are also sensitive to the linguistic structure of an utterance; these linguistic factors can be integrated with biological theories of stuttering.

Neurogenic stuttering is the term used to describe the onset of stuttering in adulthood, usually as a result of neurological disease such as stroke or head injury.


The symptoms of neurogenic stuttering are in some ways like those of developmental stuttering, but in some cases, there may be differences (such as the absence of struggle behaviors in neurogenic stuttering, at least according to some reports). Evidence exists in the scientific literature that developmental stuttering can be treated successfully.

References

Ambrose, N., & Yairi, E. (1999). Normative dysfluency data for early childhood stuttering. Journal of Speech, Language, and Hearing Research, 42, 895–909.
Anderson, J. D., Pellowski, M. W., & Conture, E. G. (2005). Childhood stuttering and dissociations across linguistic domains. Journal of Fluency Disorders, 30, 219–253.
Andrews, G., Morris-Yates, A., Howie, P., & Martin, N. (1991). Genetic factors in stuttering confirmed (Letter). Archives of General Psychiatry, 48, 1034–1035.
Baxter, S., Johnson, M., Blank, L., Cantrell, A., Brumfitt, S., Enderby, P., & Goyder, E. (2016). The state of the art in non-pharmacological interventions for developmental stuttering. Part 1: A systematic review of effectiveness. International Journal of Language and Communication Disorders, 50, 676–718.
Bothe, A. K., Davidow, J. H., Bramlett, R. E., & Ingham, R. J. (2006a). Stuttering treatment research 1970–2005: I. Systematic review incorporating trial quality assessment of behavioral, cognitive, and related approaches. American Journal of Speech-Language Pathology, 15, 321–341.
Bothe, A. K., Davidow, J. H., Bramlett, R. E., Franic, D. M., & Ingham, R. J. (2006b). Stuttering treatment research 1970–2005: II. Systematic review incorporating trial quality assessment of pharmacological approaches. American Journal of Speech-Language Pathology, 15, 342–352.
Brown, S., Ingham, R. J., Ingham, J. C., Laird, A. R., & Fox, P. T. (2005). Stuttered and fluent speech production: An ALE meta-analysis of functional neuroimaging studies. Human Brain Mapping, 25, 105–117.
Chang, S.-E., Erickson, K. I., Ambrose, N. G., Hasegawa-Johnson, M. A., & Ludlow, C. L. (2008). Brain anatomy differences in childhood stuttering. NeuroImage, 39, 1333–1344.
Chang, S. E., Zhu, D. C., Choo, A. L., & Angstadt, M. (2015). White matter neuroanatomical differences in young children who stutter. Brain, 138, 694–711.
Conture, E. G. (2001). Stuttering: Its nature, diagnosis, & treatment. Needham Heights, MA: Allyn & Bacon.
Davis, S., Shisca, D., & Howell, P. (2007). Anxiety in speakers who persist and recover from stuttering. Journal of Fluency Disorders, 40, 398–417.
Etchell, A. C., Civier, O., Ballard, K. J., & Sowman, P. F. (2018). A systematic literature review of neuroimaging research on developmental stuttering between 1995 and 2016. Journal of Fluency Disorders, 55, 6–45.
Felsenfeld, S., Kirk, K. M., Zhu, G., Statham, D. J., Neale, M. C., & Martin, N. G. (2000). A study of the genetic and environmental etiology of stuttering in a selected twin sample. Behavior Genetics, 30, 359–366.
Foundas, A. L., Bollich, A. M., Corey, D. M., Hurley, M., & Heilman, K. M. (2001). Anomalous anatomy of speech-language areas in adults with persistent developmental stuttering. Neurology, 57, 207–215.
Foundas, A. L., Corey, D. M., Angeles, V., Bollich, A. M., Crabtree-Hartman, E., & Heilman, K. M. (2003). Atypical cerebral laterality in adults with persistent developmental stuttering. Neurology, 63, 1640–1646.
Frigerio-Domingues, C., & Drayna, D. (2017). Genetic contributions to stuttering: The current evidence. Molecular Genetics and Genomic Medicine, 5, 95–102.
Guitar, B. (2005). Stuttering: An integrated approach to its nature and treatment (3rd ed.). Baltimore, MD: Lippincott Williams & Wilkins.
Hennessy, N. W., Dourado, E., & Beilby, J. M. (2014). Anxiety and speaking in people who stutter: An investigation using the emotional Stroop task. Journal of Fluency Disorders, 40, 44–57.
Ingham, R. J., Ingham, J. C., Euler, H. A., & Neumann, K. (2018). Stuttering treatment and brain research in adults: A still unfolding relationship. Journal of Fluency Disorders, 55, 106–119.
Kotani, S., & Furuya, S. (2018). State anxiety disorganizes finger movements during musical performance. Journal of Neurophysiology, 120, 439–451.
Melnick, K. S., & Conture, E. G. (2000). Relationship of length and grammatical complexity to the systematic and nonsystematic speech errors and stuttering of children who stutter. Journal of Fluency Disorders, 25, 21–45.
Nippold, M. A. (2001). Phonological disorders and stuttering in children: What is the frequency of co-occurrence? Clinical Linguistics and Phonetics, 15, 219–228.
Nippold, M. A. (2002). Stuttering and phonology: Is there an interaction? American Journal of Speech-Language Pathology, 11, 99–110.
Owens, R. E., Metz, D. E., & Haas, A. (2003). Introduction to communication disorders: A life span approach (2nd ed.). Boston, MA: Allyn & Bacon.
Penttilä, N., Korpijaakko-Huuhka, A. M., & Kent, R. D. (2019). Disfluency clusters in speakers with and without neurogenic stuttering following traumatic brain injury. Journal of Fluency Disorders, 59, 33–51.
Peters, T. J., & Guitar, B. (1991). Stuttering: An integrated approach to its nature and treatment. Baltimore, MD: Williams & Wilkins.
Ramig, P. R., & Shames, G. H. (2006). Stuttering and other disorders of fluency. In N. B. Anderson & G. H. Shames (Eds.), Human communication disorders: An introduction (7th ed., pp. 183–221). Boston, MA: Pearson Education.
Salturklaroglu, T., & Kalinowski, J. (2005). How effective is therapy for childhood stuttering? Dissecting and reinterpreting the evidence in light of spontaneous recovery rates. International Journal of Language and Communication Disorders, 40, 359–374.
Schwartz, H., & Conture, E. (1988). Subgrouping young stutterers. Journal of Speech and Hearing Research, 31, 62–71.
Shenker, R. C., & Santayana, G. (2018). What are the options for the treatment of stuttering in preschool children? Seminars in Speech and Language, 39, 313–323.
Smith, A., & Weber, C. (2017). How stuttering develops: The multifactorial dynamic pathways theory. Journal of Speech, Language, and Hearing Research, 60, 2483–2505.
Theys, C., van Wieringen, A., & De Nil, L. (2008). A clinician survey of speech and non-speech characteristics of neurogenic stuttering. Journal of Fluency Disorders, 33, 1–23.
Theys, C., van Wieringen, A., Sunaert, S., Thijs, V., & De Nil, L. F. (2011). A one year prospective study of neurogenic stuttering following stroke: Incidence and co-occurring disorders. Journal of Fluency Disorders, 44, 678–687.
Tran, Y., Blumgart, E., & Craig, A. (2011). Subjective distress associated with chronic stuttering. Journal of Fluency Disorders, 36, 17–26.
Walsh, B., Usler, E., Bostian, A., Mohan, R., Gerwin, K. L., Brown, B., . . . Smith, A. (2018). What are predictors for persistence of childhood stuttering? Seminars in Speech and Language, 39, 299–312.
Watkins, R., Yairi, E., & Ambrose, N. (1999). Early childhood stuttering. III: Initial status of expressive language abilities. Journal of Speech, Language, and Hearing Research, 42, 1125–1136.
Yairi, E. (2007). Subtyping stuttering. I: A review. Journal of Fluency Disorders, 32, 165–196.
Yairi, E., & Ambrose, N. G. (1999). Early childhood stuttering. I: Persistency and recovery rates. Journal of Speech, Language, and Hearing Research, 42, 1097–1112.
Yairi, E., & Ambrose, N. G. (2005). Early childhood stuttering. Austin, TX: Pro-Ed.
Yairi, E., & Ambrose, N. G. (2013). Epidemiology of stuttering: 21st century advances. Journal of Fluency Disorders, 38, 66–87.
Yairi, E. H., & Seery, C. H. (2015). Stuttering: Foundations and clinical applications (2nd ed.). New York, NY: Pearson.
Zimmerman, G. (1980). Articulatory dynamics of fluent utterances of stutterers and nonstutterers. Journal of Speech and Hearing Research, 23, 95–107.

18 Voice Disorders

Introduction

Chapter 10 describes the role of the vibrating vocal folds as the primary sound source for speech. The vocal folds, contained within the cartilage framework of the larynx, vibrate periodically and generate a tone whose pitch rises with the rate of vibration. The production of tone by the vibrating vocal folds is called phonation. Perceptual impressions of voice production include pitch, loudness, and quality. The physical (acoustic) bases of these perceptual impressions are fundamental frequency (F0), intensity (amount of sound energy), and spectrum (mix of periodic and noise characteristics produced by the vibrating vocal folds). The first sign of a voice disorder is often a perception of voice abnormality, either by the person producing the voice or by listeners. The pitch may seem too low for the speaker's gender and/or age, the loudness too soft, or the quality too rough or breathy. Alternatively, a first sign of a voice disorder may be a sense of pain, unusual effort, or tightness in the neck when producing phonation.

Voice disorders are a concern not only for their social implications — listeners often do not "like" abnormal-sounding voices — but also for their potential to affect careers. Teachers with serious voice problems cannot teach, or can teach but with reduced effectiveness. Actors, tour guides, singers, athletic coaches, and other professionals are greatly affected by a voice disorder. These effects include lost workdays, so voice disorders also have an economic impact.

Voice disorders have many different causes. A selected group of voice disorders is presented in this chapter. The term dysphonia is used to indicate voice characteristics that sound abnormal. More specifically, dysphonia is the auditory impression of abnormality in the pitch, loudness, and/or quality of the voice. Dysphonia may also be used to describe a voice perceived as being produced with unusual effort. Dysphonia may or may not have an obvious cause. The present chapter focuses on adult voice disorders; pediatric voice disorders are discussed briefly at the end of the chapter. (See https://www.asha.org/PRPSpecificTopic.aspx?folderid=8589942600&section=Overview for a comprehensive overview of voice disorders.)

Epidemiology of Voice Disorders

How prevalent are voice disorders in the general population, and are there specific factors that make people more or less likely to have had (or have) a voice disorder? Among people who have been diagnosed with a voice disorder, which voice disorders are most common? Roy, Merrill, Gray, and Smith (2005) conducted a survey in which approximately 6.5% of the respondents claimed they were experiencing a voice disorder. In a survey of 14,794 young adults, Bainbridge, Roy, Losonczy, Hoffman, and Cohen (2017) found that 6% reported having a voice problem over the previous 12 months. A prevalence estimate of 6% to 7% for voice disorders among the general population is startling. In a state such as Wisconsin, which has a population of approximately 5,800,000 people, Roy et al.'s (2005) work suggests that roughly 350,000 people have had a voice problem over a 12-month period. Voice disorders can be short term (such as laryngitis) or longer term, as described later in this chapter. The prevalence of voice disorders reported by surveyed individuals increases with age, is greater with a family history of voice disorders, is greater for women compared with men, and is higher among professional voice users (e.g., teachers; Roy et al., 2005) than in the general population.
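The Wisconsin estimate above is simple arithmetic: prevalence multiplied by population. A minimal sketch of the calculation follows; the population figure and prevalence come from the text, and the rounding is mine.

```python
# Point-prevalence arithmetic for the Wisconsin example in the text.
population = 5_800_000   # approximate state population, from the text
prevalence = 0.06        # ~6% reporting a voice problem in the past year

estimated_cases = population * prevalence
print(f"Estimated people with a voice problem in a 12-month period: {estimated_cases:,.0f}")
# ~348,000, which the text rounds to 350,000
```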

Initial Steps in the Diagnosis of Voice Disorders

Patients are referred by primary care physicians, or refer themselves, to a voice clinic because of a perceived change in voice production and/or a sense of extreme effort during phonation. The voice therapist goes through a series of steps to diagnose the presence and type of voice disorder. These steps include a case history, perceptual evaluation of the client's voice, viewing of the vocal folds via a laryngeal mirror (Chapter 10), recording of vocal fold motion during phonation by videostroboscopy, and measurement of basic voice parameters. Information gathered in these assessments contributes to a diagnosis, which in turn is the basis of a treatment plan. The treatment plan may include behavioral therapy, which involves direct modification of voice, and/or indirect therapy, such as counseling, which addresses psychological aspects of the voice disorder. The treatment plan may also include surgery to correct a structural problem of the vocal folds. Behavioral and surgical treatments are combined for certain diagnoses.

Case History

As in any type of medical setting, a case history is critical to accurate differential diagnosis of a disease or condition. Differential diagnosis is the systematic identification of the cause or causes of a symptom (or symptoms) by ruling out likely candidates. For example, when a patient reports hoarseness for 2 weeks, many possible causes may explain the voice disorder. A careful case history can rule out some or most of these candidates prior to more in-depth testing.

The voice therapist wants to know whether the patient is a professional voice user, whether there is a recent or chronic history of unusual voice usage, such as overuse of the voice (e.g., shouting, screaming), whether there is extended use of the voice every day, as in teachers (Martins, Pereira, Hidalgo, & Tavares, 2014), and whether the voice is used in an unusual way for long periods of time. Professional voice users may develop inflammation of the vocal folds or vocal fold fatigue from excessive, high-effort phonation. Excessive, high-effort phonation can also occur when the voice is pushed to its operating limits, as in the parent who screams at a child, or a child who screams when he or she plays. The voice therapist also wants to know about the patient's history of tobacco and alcohol use, current or previous use of therapeutic and/or recreational drugs, and whether there is a history of frequent laryngopharyngeal reflux (LPR), the backflow of stomach acid into the throat and larynx. Smoking and drinking alcohol (especially in combination), certain drugs, and LPR are all known causes of inflammation and, in some cases, permanent tissue change in the vocal folds.

Perceptual Evaluation of the Voice

Voice therapists are trained to listen carefully for perceptual signs (voice characteristics) associated with specific disorders. The terms used most often to describe disordered voice characteristics include, but are not limited to, rough, breathy, weak, strain, hoarse, spasm, pitch, and loudness; overall severity is also an important perceptual evaluation of a voice disorder (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009). Perceptual impressions, even those of carefully trained voice therapists, do not provide specific information concerning the underlying cause of a voice disorder. For example, the impression of a breathy voice may be the result of vocal fold paralysis, the aging process, or vocal nodules. Or, a breathy voice quality may not be associated with an underlying disease — it may be simply one of the variants of voice quality heard in the population. Perceptual impressions of voice quality are an important step in the diagnostic process, especially when they serve the purpose of narrowing down hypotheses about the underlying cause of a voice disorder.

Viewing the Vocal Folds

Indirect laryngoscopy is a technique used to view the vocal folds (Chapter 10, Figure 10–1). The examiner places the mirror close to the back of the throat while holding the tongue with a gauze pad. A strong light is aimed at the mirror and reflected to illuminate the vocal folds. The laryngeal mirror examination is performed first as the patient is breathing normally, rather than during phonation. This allows the examiner to see obvious lesions on the vocal folds, such as nodules, polyps, or other growths, and may also provide evidence of vocal fold paralysis, if it exists. The patient is then asked to phonate while the examiner views the vocal folds. The individual cycles of vocal fold vibration are not visible to the examiner because they are faster than the time-resolving ability of the human eye. If there is a lesion on one or both vocal folds, however, the examiner may be able to see whether or not it interferes with vocal fold closure during phonation. Videostroboscopy provides a slow-motion view of the vibrating vocal folds that allows an examiner a more detailed evaluation of possible causes of voice disorders. (Search Google for "laryngeal videostroboscopy" to find video clips of vibrating vocal folds.)

Measurement of Basic Voice Parameters

The present discussion focuses on three acoustic measurements that are used often to describe voice characteristics: F0 (perceptual correlate = pitch), intensity (perceptual correlate = loudness), and spectrum (perceptual correlate = voice quality).

F0 (Pitch)

F0 is the number of cycles of vocal fold vibration completed in 1 second. The perceptual correlate of F0 is pitch. All other things being equal, as F0 increases so does voice pitch. F0 has been widely studied in the normal population. A graphic summary of F0 data by age, for both males (curve with blue points) and females (curve with pink points), is shown in Figure 18–1. Age is shown on the x-axis and F0 (in Hz) on the y-axis.

Figure 18–1.  Plot of fundamental frequency (F0) across age for males (blue points) and females (red points). Age is on the x-axis, and F0 is on the y-axis. Adapted from Hixon, T. J., Weismer, G., and Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.


The curves reflect data from many sources in the literature, including Baken and Orlikoff (2000), Kent (1976), Nishio and Niimi (2008), and Lee, Potamianos, and Narayanan (1999). F0 is related to the size of a person's larynx — in general, the larger the larynx, the lower the F0. Prior to puberty, the F0 values of males and females are similar, because sex-specific anatomical characteristics of the larynx do not appear before puberty.1 Around puberty, the larynx grows in both males and females, with more dramatic growth in males. This growth is reflected in Figure 18–1, in which male F0 drops substantially around puberty relative to female F0. Postpuberty F0 values are close to adult values, which remain relatively constant until old age. Female adults between ages 20 and 70 years have average F0 values between 190 and 210 Hz. Over this same age range, average F0 for males ranges between 115 and 135 Hz. This sex-related difference in F0 explains why males are typically perceived to have lower-pitched voices than females. In old age, the F0 of females decreases (probably as a result of hormone changes), and the F0 of males increases (probably as a result of increased stiffness of the vocal folds and the cartilages of the larynx). Diagnosis of a voice disorder in which pitch seems abnormal can make use of the average data shown in Figure 18–1. These data are a rough guide to "normal." However, people with healthy voices, at any given age, have a range of F0 values. The F0 values in voice problems with unusual pitch are usually very different from average F0 values such as those shown in Figure 18–1.
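Because F0 is simply the number of vibratory cycles per second, it can be estimated from a recorded vowel with standard signal-processing tools. The following is a minimal teaching sketch, not a clinical-grade pitch tracker; it assumes a steady sustained vowel stored in a NumPy array (here a synthetic 120 Hz waveform stands in for a recording) and uses a basic autocorrelation peak to find the period.

```python
import numpy as np

def estimate_f0(signal, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of a sustained vowel via autocorrelation.
    A teaching sketch: real pitch trackers add voicing detection,
    windowing, and octave-error checks."""
    signal = signal - np.mean(signal)
    autocorr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Search only lags corresponding to plausible F0 values.
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag = min_lag + np.argmax(autocorr[min_lag:max_lag])
    return sample_rate / best_lag

# Synthetic "vowel": a 120 Hz tone with a few harmonics, 1 second long.
sr = 16000
t = np.arange(sr) / sr
vowel = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 4))

print(f"Estimated F0: {estimate_f0(vowel, sr):.1f} Hz")  # ~120 Hz
```

In practice, a clinician would compute this kind of measure over many short frames of connected speech and report an average, which is what the age curves in Figure 18–1 summarize.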

Intensity (Loudness)

Voice intensity is not precisely the same as voice loudness. Voice intensity refers to the amount of acoustic energy generated by the vibrating vocal folds and modified as the energy passes through the vocal tract and exits the lips. Voice loudness refers to a listener's perception of the voice. Greater voice intensity typically results in greater voice loudness. Voice intensity is likely to change with the loudness of background noise, as well as with the distance between the speaker and the listener. For example, at a party where many people are talking, voice intensity must be greater than usual to be heard above the loudness of the background noise. Voice intensity is also likely to change with the distance between a speaker and the listener. To maintain a constant loudness for the listener as the distance between speaker and listener increases, voice intensity must increase as well. This is because sound energy decreases as it travels over distance. An increased intensity of the voice is a normal adjustment when speaking to someone located at a distance.2

How do SLPs judge a person's voice loudness as typical or atypical? As with F0, there is a great deal of speaker-to-speaker variation in voice intensity (and hence, perceived voice loudness) that is accepted by listeners as "normal." In most cases, however, voice loudness that calls attention to itself is not subtle. An unusually soft voice, or the less frequent case of a chronically loud voice, is usually noticeable enough that precise measures of voice intensity are not required before further diagnostic tests are performed. The diagnostic evaluation of voice intensity is further complicated by factors that are internal to the speaker. An individual may speak with what appears to be normal loudness but complain about the effort required to make herself heard, even when the communication setting is quiet. Or, a speaker may feel as if he is exerting normal effort for voice intensity but in fact sound too soft. The case history includes questions to determine the individual's judgment of the effort required to produce loudness appropriate for the communication situation.
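Footnote 2 (below) gives the inverse square law in formula form. A small worked sketch of what it implies for a talker and a listener follows; the distances are illustrative choices, not measured values.

```python
import math

def intensity_change_db(d_near, d_far):
    """Change in sound intensity (in dB) when listener distance grows from
    d_near to d_far, assuming the inverse square law (intensity ~ 1/d^2)."""
    ratio = (d_near / d_far) ** 2
    return 10 * math.log10(ratio)

# Illustrative distances: listener moves from 1 m to 2 m, then to 4 m away.
for d in (2, 4):
    print(f"1 m -> {d} m: {intensity_change_db(1, d):.1f} dB")
# Doubling the distance costs about 6 dB; quadrupling it costs about 12 dB,
# which is why speakers raise vocal intensity to keep loudness constant.
```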

Voice Spectrum (Voice Quality)

An acoustic spectrum is defined as the relative amplitudes of the many frequency components that make up a sound. In the sound generated by the vibrating vocal folds, the voice spectrum has energy at the F0, as well as at a series of harmonics (sometimes called overtones). These harmonic components are found at frequencies 2 times, 3 times, 4 times, . . . n times the F0.3 The precise details of the voice spectrum are related to voice quality. For example, breathy voices have fewer frequency components (harmonics) compared with normal voices, as well as a substantial degree of noise (aperiodic energy), which is not typical of the normal voice. Strained voice qualities (often called "pressed" qualities) have many harmonics with excessive intensity.

1 Subtle signs of sexual dimorphism in the human larynx may begin to appear several years before puberty, and these may account for the slightly lower F0 values seen for males, as compared to females, around 8 to 9 years of age.

2 The relationship between sound intensity and distance from the sound source is described by the inverse square law. This law states that sound intensity decreases from its source at a rate proportional to the inverse of the square of the distance from that source (in formulese, sound intensity ~ 1/d², where d = distance from the source). Also, the relationship between sound intensity and loudness, like that between frequency and pitch, is not one-to-one.

3 For example, a voice spectrum for a speaker whose F0 = 100 Hz has harmonics at 200, 300, 400, 500, . . . n × F0. The amplitude of the harmonics decreases with increasing frequency.


Voice spectra are obtained with speech analysis computer programs and are used clinically as an objective, acoustic measure of voice quality. What is the value of objective measures of voice production? Scientists and clinicians may prefer acoustic measures of voice production to perceptual measures, which are notoriously unreliable (Kreiman, Gerratt, Kempster, Erman, & Berke, 1993). Acoustic measures of voice production may be better suited than perceptual measures as metrics of the effectiveness of voice therapy.
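Footnote 3 describes the harmonic structure of the voice spectrum. The sketch below shows one way such a spectrum can be computed with a fast Fourier transform; the synthetic waveform (an F0 of 100 Hz with progressively weaker harmonics) is an assumption standing in for a recorded voice, not clinical data or a particular commercial analysis program.

```python
import numpy as np

sr = 16000                      # sample rate (Hz)
t = np.arange(sr) / sr          # 1 second of samples
f0 = 100                        # fundamental frequency, as in footnote 3

# Synthetic voice-like source: harmonics at n x F0 with falling amplitude.
signal = sum(np.sin(2 * np.pi * f0 * n * t) / n for n in range(1, 11))

# Magnitude spectrum via FFT; frequency resolution is 1 Hz for a 1 s signal.
spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)

# Report the relative amplitude of the first five harmonics.
for n in range(1, 6):
    idx = np.argmin(np.abs(freqs - n * f0))
    print(f"harmonic {n} at {freqs[idx]:.0f} Hz, relative amplitude {spectrum[idx]:.3f}")
```

A breathy voice would show weaker high harmonics plus broadband noise between them; a pressed voice would show unusually strong high harmonics, which is the acoustic pattern the perceptual labels in this section describe.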


Classification/Types of Voice Disorders

Voice disorders can be classified in several different ways but do not always fit neatly into one classification or another. The classifications presented here are not mutually exclusive, as explained later. Specific examples of voice conditions and pathologies are presented and discussed within the framework of different classification systems. Table 18–1 provides a summary of ways to classify voice disorders; each of these classification approaches is discussed in the next sections.

Table 18–1.  A Summary of Alternative Classifications of Voice Production and Voice Disorders

Hypo-hyperfunctional: A description of voice types based on the levels of contraction of laryngeal muscles and how those levels affect the closing phase and closed phase of vocal fold vibration. The hypo-hyperfunctional continuum includes many normal voice types; voice types that are chronically close to or at the ends of the continuum are often diagnosed as dysphonia.

Phonotrauma: A classification based on voice disorders resulting from excessive phonatory behaviors such as constant and effortful talking (e.g., teachers, actors), singing (e.g., professional vocalists), and overdriving the phonatory mechanism (e.g., yelling, screaming). Such phonatory behaviors may result in benign mass lesions on the vocal folds, such as nodules and polyps. The dysphonia resulting from phonotrauma may be of the hypofunctional or hyperfunctional type. Chronic hyperfunction (e.g., excessive screaming) may lead to nodules, which can result in a hypofunctional-sounding voice; the individual then exerts excessive muscular force (hyperfunction) to overcome the difficulty in closing the vocal folds caused by the benign, bilateral masses.

Organic voice disorders: A classification of voice disorders that are the result of benign vocal fold masses. These include (but are not limited to) nodules, polyps, cysts, and granulomas. These benign masses have the potential to interfere with vocal fold vibration by preventing adequate closure and stiffening the cover of the vocal folds. The classification of organic voice disorders may include disorders classified as resulting from phonotrauma.

Functional voice disorders: Dysphonia in the absence of observable pathology in the larynx or known neurological disease. Functional voice disorders are in some cases called psychogenic voice disorders, meaning that psychological issues are partly or largely responsible for the voice disorder. Muscular tension dysphonia (MTD) is an example of a functional voice disorder in which some cases are regarded as psychogenic voice disorders. Puberphonia is another functional disorder that is considered to be psychogenic.

Neurological voice disorders: Voice disorders in which a known or suspected disease/condition of the peripheral or central nervous system is the cause of dysphonia. A neurological voice disorder may be found in diseases of the central nervous system (e.g., stroke, and degenerative diseases such as Parkinson's disease and multiple sclerosis) or with damage to a peripheral nerve that supplies the muscles of the larynx, as in unilateral vocal fold paralysis. Spasmodic dysphonia is a voice disorder that has a suspected neurological cause.

Cancer of the larynx: Cancer of the larynx, in which malignant tumors grow within the larynx, often on or in the vocal folds, causes dysphonia that worsens as the cancer spreads.


The Hypo-Hyperfunctional Continuum

The hypo-hyperfunctional continuum of vocal fold vibration is best understood by a simple illustration of how the opening, closing, and closed phases of vocal fold cycles can be changed by muscular activity within the larynx. Figure 18–2 is a schematic drawing of three consecutive cycles of vocal fold vibration. The trace shows the space between the two vocal folds (that is, the glottis) as they vibrate over time. The glottal space increases as the vocal folds move apart and decreases as they come together. As the trace moves up on the graph, the vocal folds are separating (opening); as it moves down, they are coming together (closing). The horizontal line at the bottom of the trace shows the portion of the cycle during which the vocal folds are closed. One cycle of vocal fold vibration is marked. Two parts of a single vocal fold cycle, the closing phase and the closed phase, are critical to understanding the functional variations of vocal fold vibration. The closing phase is the time from the maximum opening of the glottis (at the top of the trace for each cycle) to the instant of vocal fold closure (the left-hand edge of the horizontal lines at the bottom; Figure 18–2). The closed phase is the portion of the cycle during which the vocal folds are completely closed, as described earlier. The closing phase and closed phase intervals are the basis of the hypo-hyperfunctional continuum (Figure 18–3). During normal vocal fold vibration, the closing phase occurs quickly. Once the vocal folds close, they remain so for a significant portion of the cycle — nearly 40% to 50% of the entire cycle time, which is called the period of vocal fold vibration.

Figure 18–2.  A schematic drawing of three cycles of vocal fold vibration, drawn as the opening and closing of the vocal folds over time (y-axis from closed to open, x-axis in seconds, with one cycle marked). The horizontal lines show the portion of each cycle during which the vocal folds are closed.

Figure 18–3.  The hypo-hyperfunctional voice continuum. The continuum runs from hypofunction (insufficient muscular tension, slow closing phase, short closed phase, "breathy" or weak voice quality) through a range of "normal" voice qualities centered on average "normal" phonation, to hyperfunction (excessive muscular tension, overly fast closing phase, too-long closed phase, "pressed" voice quality). See text for additional detail.

In certain voice disorders, an excess of muscular tension in the larynx may result in an excessively fast closing phase and an overly long closed phase. The excess muscular tension is called hyperfunction and results in a voice quality that sounds overly effortful and strained. A hyperfunctional voice quality is often called "pressed," as if the speaker is pressing the vocal folds together too tightly. In contrast, insufficient muscular tension during vocal fold vibration results in a very slow closing phase and an overly short closed phase. This is called laryngeal hypofunction and is associated with a breathy, weak voice quality. The hypo-hyperfunctional continuum shown in Figure 18–3 reflects the concept of continuous variation in voice qualities between two extreme endpoints. One endpoint (hypo) reflects too little muscular tension in the larynx, the other endpoint (hyper) too much muscular effort. Any combination of closing phase speed and closed phase duration may occur due to different amounts of muscular effort, resulting in many different voice qualities between breathy/weak and strained/pressed. Because the hypo-hyperfunctional voice continuum includes normal voice production, it is not so much a classification of voice disorders as a range of phonation styles. The more extreme parts of the range are often associated with voice disorders.
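Because the continuum is defined by the relative durations of the closing and closed phases, a common quantitative summary in voice science is the closed quotient: the fraction of each cycle during which the vocal folds are closed (roughly 40% to 50% in typical phonation, per the text). The sketch below computes a closed quotient from a simplified, made-up glottal area trace; the waveform, threshold, and sample-counting approach are illustrative assumptions, not a standard clinical algorithm.

```python
import numpy as np

def closed_quotient(glottal_area, closed_threshold=0.0):
    """Fraction of samples during which the glottal area is at or below the
    threshold, i.e., the vocal folds are treated as closed.
    A teaching sketch; clinical measures are computed cycle by cycle from
    electroglottography or high-speed imaging."""
    closed = glottal_area <= closed_threshold
    return np.count_nonzero(closed) / len(glottal_area)

# Made-up glottal area trace: each 10 ms cycle (100 Hz) is open for the
# first 55% of the cycle (a half-sine opening/closing gesture) and fully
# closed for the remaining 45% of the cycle.
sr = 10000
t = (np.arange(sr) + 0.5) / sr            # half-sample offset avoids a
phase = (t * 100) % 1.0                   # spurious zero at each cycle start
area = np.where(phase < 0.55, np.sin(np.pi * phase / 0.55), 0.0)

print(f"Closed quotient: {closed_quotient(area):.2f}")    # 0.45
```

On this kind of measure, a hypofunctional (breathy) voice would show a small closed quotient and a hyperfunctional (pressed) voice a large one.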

Direct Control of the Closing and Closed Durations?

When you change voice quality from breathy to pressed, the adjustments do not reflect conscious control of specific laryngeal muscles. If you make the strange decision to approach someone on the street and ask, "Can you slow down the closing phase of your vocal fold vibration so we can see how it affects your voice?" in all likelihood a puzzled look will follow. But ask someone to gradually change voice quality from breathy, through "normal," to increasingly tense (the layperson will not understand the term "pressed"), and most people understand and can do it. How do they do it? Most likely, they imitate a series of voice images, drawn from auditory memories, connected with the terms "breathy," "normal," and "tense." Like a continuous movement between smiling and frowning, people can do it easily but cannot state conscious muscular strategies for how it is done.


As shown in Figure 18–3, the hypo-hyperfunctional continuum of voice qualities includes a wide range of "normal" voices. An ideal voice does not exist, either for the person producing voice or for the person listening to voice. A range of acceptable voice qualities is produced among different people. An individual uses different voice qualities for different circumstances. Many of these voice qualities vary within the normal range, but some may be outside the normal range, temporarily, to fit a situation, to make a point, or to convey a message that supplements words. For example, speakers may use extreme hypofunction to produce a breathy voice outside the normal range to comfort someone or express tender emotion. Anger may be expressed with a pressed voice resulting from extreme hyperfunction. Extremely breathy or pressed voice is therefore not unusual in certain situations. These voice qualities become a clinical issue when they are used chronically — when they are a person's typical voice quality.

Phonotrauma

The vocal folds consist of delicate tissues that may be damaged by certain phonation and lifestyle behaviors. Damaged vocal fold tissues interfere with the vibratory motions that produce normal voice qualities. The term "phonotrauma" is used to describe damaged vocal fold tissues (lesions) and the resulting dysphonia due to excessive phonatory behaviors and/or other causes. The lesions caused by phonotrauma are referred to as benign vocal fold lesions, to differentiate them from precancerous or cancerous lesions. Phonotrauma results from behaviors such as overuse of high-intensity voice (as in some kinds of singing or acting), chronic screaming (or sometimes a single episode of very intense screaming), chronic throat clearing, and chronic use of a hyperfunctional voice quality. Vocal fold tissues may also be damaged when they are chronically exposed to environmental agents such as tobacco and alcohol, in individuals who experience chronic reflux, and in persons who have chronic cough and/or throat clearing. The phonation behaviors that result in phonotrauma are usually associated with hyperfunctional voice disorders — far to the right in the continuum of voice use (see Figure 18–3). For the discussion that follows, Figure 18–4 can be used as a reference for the appearance of healthy vocal folds. The photo on the left shows the vocal folds open, during inhalation. The point of the "V" at the bottom of the frame is the most anterior part of the vocal folds, where they are attached to the inside of the thyroid cartilage. The posterior attachment of the vocal folds is to the arytenoid cartilages.

Figure 18–4.  Normal vocal folds as seen through an endoscope; the point of the "V" is the front attachment of the vocal folds to the inside of the thyroid cartilage. Left, vocal folds open for inhalation; note the straight edge of both vocal folds at their medial boundary (next to the glottis). The space between the vocal folds is the glottis. Right, vocal fold closure for the closed phase during phonation; the incomplete rectangle shows firm closure front to back, with just a small opening at the back. Photos courtesy of Professor Susan Thibeault, Department of Surgery, University of Wisconsin Clinical Sciences Center.

The glottis is the space between the vocal folds. Notice the smooth, straight edge of each vocal fold as it extends from anterior to posterior along its border with the glottis. Each vocal fold is a mirror image of the other — they give the appearance of symmetry. The photo on the right shows the vocal folds during the closed phase of phonation. The vocal folds are pressed together firmly, from front to back, with only a slight opening at the very back of the glottis.

Vocal Nodules

Chronic overuse of the voice, whether in singing, screaming, or cheerleading, may result in growths on the vocal folds called vocal nodules (sometimes called singers' nodules). Vocal nodules are callus-like lesions resulting from chronic slamming together of the vocal folds during phonation. The nodules develop in much the same way as calluses develop on a gymnast's or baseball player's hands, or a guitarist's fingertips. In the early stages, the growths are soft and blister-like. As phonotrauma continues, the growths develop the hard, fibrous texture of a callus.

The location and appearance of vocal nodules are shown in Figure 18–5. The photo on the left shows bilateral nodules on the open vocal folds. The nodules are the small "bumps" along the edge of each vocal fold, about one third of the distance between the front and back of the glottis. The bumps interrupt the smooth edges of the vocal folds and occur at the same point on the two folds — they are symmetrical. Nodules occur one third of the distance between the front and back of the vocal folds because the highest collision forces occur at this location when the vocal folds come together for each cycle of vibration. When these forces are chronically excessive, as in behaviors such as yelling, screaming, and constant talking, nodules may develop and interfere with normal phonation. Vocal nodules are likely to change voice quality due to incomplete closure during vocal fold vibration. In Figure 18–5, the folds appear to be closed at the location of the nodules, but not in front (forward toward the thyroid cartilage) or in back (toward the arytenoid cartilages). Voice quality may be breathy and noisy due to the escape of air through these openings in the vocal folds. Attempts to overcome the effect of vocal nodules on vocal fold closure with additional, excessive muscular effort may add strain to the voice quality (Leonard, 2009).

Figure 18–5.  Vocal fold nodules: Two endoscopic views of the vocal folds. Left, vocal folds open for inhalation, with bilateral nodules indicated by pointers; right, vocal folds during the closed phase of vocal fold vibration; the nodules interfere with complete closure, as seen in front of and in back of the point of contact between the nodules. Photos courtesy of Professor Susan Thibeault, Department of Surgery, University of Wisconsin Clinical Sciences Center.

The phonotrauma that resulted in the nodules thus induces more phonotrauma, in an attempt to overcome the poor closure caused by the presence of the nodules. This may become a vicious cycle of voice behavior that increases the size of the nodules and their effect on vocal fold vibration. The nodules also stiffen the outer layer of the vocal folds, interfering with its motion and affecting voice quality. Because nodules grow in the outer layer of the vocal folds, they restrict the independent motion of the different tissue layers. Partial loss of the wave-like motion of the outer layer of vocal fold tissue relative to inner layers has a significant effect on voice quality.

Vocal Fold Polyps

Vocal fold polyps are the result of phonotrauma but have a tissue structure unlike that of the callus-like nodules. Polyps are softer and often larger than nodules, and may occur as a result of long-term phonotrauma or even from a single instance of extreme vocal use, such as a particularly intense scream or cheer. Polyps are often unilateral, unlike the typically bilateral nodules. Polyps interfere with phonation in much the same way as nodules, preventing firm vocal fold closure and interfering with the wave-like motion of the vocal fold cover.

Other Benign Vocal Fold Lesions

Benign vocal fold lesions are not limited to nodules and polyps. Other damage to vocal fold tissue, including the temporary inflammation of viral laryngitis or the long-term inflammation due to chronic LPR, can also result in dysphonia. Cysts (fluid-filled sacs) may also occur on or in the vocal fold. These benign lesions may interfere with vocal fold closure and/or stiffen the top layer of the vocal folds. Excellent sources for information on benign vocal fold lesions are Altman (2007), Naunheim and Carroll (2017), and Sapienza and Ruddy (2017).

Treatment of Phonotrauma

Vocal nodules are often treated with vocal rest — if the patient does not talk for a period of time, nodules may disappear, much as calluses disappear when the irritating cause is removed. Polyps are more likely to be treated surgically, although they may also be treated with behavioral voice therapy. Even when a benign vocal fold lesion is treated surgically, usually by removal of the mass as in the case of polyps, voice therapists play an important role after the surgery. Information on vocal hygiene — using the voice properly, avoiding overuse, restricting talking time, maintaining proper hydration — can be structured for the patient to achieve voice recovery. Vocal hygiene programs are relevant to the behavioral treatment of vocal fold nodules and are often successful (Hosoya et al., 2018).

Organic Voice Disorders

Dysphonias that are classified as organic voice disorders are often the result of phonotrauma that leads to benign mass lesions — growths on the vocal folds such as nodules, polyps, and cysts (Carding et al., 2017). The presence of a benign mass lesion is a reason to classify a voice disorder as organic; misuse of the voice is a reason to classify the cause of a voice disorder as phonotrauma. This is an example of overlap between classification categories for voice disorders.

Functional Voice Disorders

Dysphonia can exist in the absence of observable pathology on or around the vocal folds. The phonation problem may include an inability to produce voice, an extremely weak, whispery voice, or a voice interrupted by apparent spasms. When neurological disease and mass lesions are ruled out as explanations for dysphonia, a functional voice disorder may be diagnosed — one not explained by organic pathology. Individuals diagnosed with functional dysphonia are often professional voice users. This group includes, but is not limited to, teachers, actors, clergy, singers, and tour guides. Extremely talkative and unusually loud individuals may also be at risk for a functional voice disorder (Bastian & Thomas, 2016). The term "psychogenic voice disorder" may be used to classify a functional voice disorder with roots in a psychiatric disorder. As such, psychogenic voice disorders may be considered a subtype of functional voice disorders. According to the famous speech-language pathologist Arnold Aronson (1990), a psychogenic voice disorder "is a manifestation of one or more types of psychological disequilibrium, such as anxiety, depression, conversion reaction, or personality disorder, that interferes with normal volitional control over phonation" (p. 131).4 The diagnosis of a functional versus psychogenic voice disorder may have important implications for treatment. A patient diagnosed with a functional voice disorder may be treated by a voice therapist who has experience in training patients to regain a normal voice; psychiatric concerns are not significant, and the therapeutic focus is on voice behaviors.

Functional Voice Disorders Dysphonia can exist in the absence of observable pathology on or around the vocal folds. The phonation problem may include an inability to produce voice, an extremely weak, whispery voice, or a voice interrupted by apparent spasms. When neurological disease and mass lesions are ruled out as explanations for dysphonia, a functional voice disorder may be diagnosed — one not explained by organic pathology. Individuals diagnosed with functional dysphonia are often professional voice users. This includes, but is not limited to, teachers, actors, clergy, singers, and tour guides. Extremely talkative and unusually loud individuals may also be at risk for a functional voice disorder (Bastian & Thomas, 2016). The term “psychogenic voice disorder” may be used to classify a functional voice disorder with roots in a psychiatric disorder. As such, psychogenic voice disorders may be considered as a subtype of functional voice disorders. According to the famous speech-language pathologist Arnold Aronson (1990), a psychogenic voice disorder “is a manifestation of one or more types of psychological disequilibrium, such as anxiety, depression, conversion reaction, or personality disorder, that interferes with normal volitional control over phonation” (p. 131).4 The diagnosis of a functional versus psychogenic voice disorder may have important implications for treatment. A patient diagnosed with a functional voice disorder may be treated by a voice therapist who has experience in training patients to regain a normal voice; psychiatric concerns are not significant, and the thera4 

Dr. Aronson lived from 1928 to 2018.

A patient diagnosed with a psychogenic voice disorder may be best treated with combined psychiatric and voice therapy. There is no clear-cut distinction between functional and psychogenic voice disorders. Patients with functional voice disorders are likely to have varying psychological issues such as anxiety and depression (Andrea, Dias, Andrea, & Fugeira, 2017; Rosen, Heuer, Levy, & Sataloff, 2003). A voice disorder called muscular tension dysphonia (MTD) illustrates the potential role of psychological issues in a well-known functional voice disorder.

Muscular Tension Dysphonia

MTD is a voice disorder in which vocal fold vibration is disturbed by excessive tension in head and neck muscles. Many of these muscles attach to laryngeal cartilages. Hypercontraction of the muscles during phonation may distort the position and shape of the cartilages, the effect of which is to squeeze the vocal folds front to back and side to side, preventing normal vibration. Figure 18–6 shows the closed phase of vocal fold vibration for phonation in a speaker with healthy vocal folds (left) and in a speaker with MTD (right). This view shows how the laryngeal cartilages and the vocal folds are squeezed together in MTD (the incomplete rectangle shows the comparative lengths of the vocal folds during phonation by the two individuals). Individuals with MTD often experience voice fatigue and neck pain when phonating. Although a large-scale study of the prevalence and demographics of MTD has not been published, an estimate of gender and age distribution can be made from various publications. da Cun Pereira, de Oliveira Lemos, Gadenz, and Cassol (2018), in a review of treatment success in MTD, found that 68% of 252 individuals diagnosed with MTD were female. The age range of these individuals was 18 to 84 years. Dietrich, Verdolini Abbott, Gartner-Schmidt, and Rosen (2008) reported similar results among 68 individuals with MTD: 82% were female, within an age range of 18 to 68 years. Similar gender and age data were reported by Eastwood, Madill, and McCabe (2014). Taken together, these reports suggest that MTD is diagnosed largely in women and in people across young to elderly adulthood. Voice disorders are due to a wide variety of causes, ranging from the clearly organic, where there is a known lesion on the vocal folds or a neurological disease, to functional disorders in which no underlying physical cause is present and the disorder is likely to respond to behavioral voice therapy.

Figure 18–6.  Left, closed phase during normal vocal fold vibration (vocal folds together for phonation, no excessive contraction). Right, closed phase during vocal fold vibration in an individual with muscular tension dysphonia (hypercontraction; vocal folds compressed side to side and front to back). Photos courtesy of Professor Susan Thibeault, Department of Surgery, University of Wisconsin Clinical Sciences Center.

MTD seems to have characteristics of both organic and functional voice disorders. Hypercontraction of neck muscles may be the result of a speaker trying to compensate for an organic problem in the larynx (e.g., a mass lesion resulting from phonotrauma), and/or a symptom of underlying psychological issues (such as anxiety and depression, which may be associated with personality traits). As stated by Van Houtte, Van Lierde, and Claeys (2011), MTD is "the bridge between the purely functional voice disorders . . . and prominent organic disorders" (p. 205). Speakers with MTD have strained voice qualities that are perceived to be produced with excessive effort. Hoarseness and breathiness may also be heard. In some cases, individuals with MTD may be aphonic — unable to produce phonation even when attempting to use the voice. There is controversy surrounding the diagnosis of MTD. Most often, MTD is confused with a disorder called spasmodic dysphonia, even when experienced voice therapists, voice scientists, and otolaryngologists make the diagnoses (Ludlow et al., 2018). Spasmodic dysphonia is described later in the section "Neurological Voice Disorders."

Treatment of MTD

The following information on treatment of MTD is based on Andreassen, Litts, and Randall (2017), da Cun Pereira et al. (2018), Ramig and Verdolini (1998), and the ASHA website on voice therapy (https://www.asha.org/PRPSpecificTopic.aspx?folderid=8589942600&section=Treatment). Voice therapy for MTD can be direct or indirect. In a direct approach, the therapist works on the individual's ability to produce a better voice. Indirect approaches may involve education about voice production and vocal hygiene (taking good care of the voice mechanism), and counseling when anxiety and depression are believed to play a significant role in MTD. The general consensus is that voice therapy for MTD is often successful, but a specific technique among the several available has not emerged as a clear choice for maximally effective treatment (Andreassen et al., 2017). Psychological counseling may be a component of MTD treatment if the health care team believes a mental health issue is one of the causes of the disorder (Dietrich, Verdolini Abbott, Gartner-Schmidt, & Rosen, 2008).

Voice Therapy by Straw

A therapy technique called "semi-occluded vocal tract" has shown promise in the treatment of MTD. Recall from Chapter 10 that phonation is initiated when the vocal folds are brought together and then blown apart when the air pressure below them (in the trachea) is sufficiently greater than the air pressure above them (in the vocal tract). As long as this pressure difference is maintained, the vocal folds vibrate, each cycle of vibration defined by an opening, closing, and closed phase. To a large degree, the force with which the vocal folds close against each other increases as the pressure difference across the vocal folds increases. In MTD, part of the phonation problem is that the excessive tension in head and neck muscles results in overly forceful closure of the vocal folds. Using the semi-occluded vocal tract technique, individuals learn to reduce the excessive closing force by phonating into a straw. Phonation into the straw (usually submerged in water) creates a partial "block" of the air coming through the vocal tract, which raises the vocal tract pressure to a value closer to the pressure below the vocal folds. This reduces the pressure across the vocal folds and therefore reduces the force of vocal fold closing. The reduced force is thought to lead to a less tense voice, which helps the patient learn a more relaxed voice quality. Straw phonation is gradually eliminated during therapy, with the goal of transferring the reduced-force vocal fold closing to more natural speech.
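The logic of the technique can be expressed as a simple pressure difference. In the sketch below, the pressure values (in cm H2O) are illustrative assumptions, not measurements from the studies cited in this chapter; the point is only that raising the pressure above the vocal folds shrinks the pressure difference that drives forceful closure.

```python
def transglottal_pressure(subglottal, supraglottal):
    """Pressure difference (cm H2O) across the vocal folds: the quantity the
    semi-occluded vocal tract technique is intended to reduce."""
    return subglottal - supraglottal

# Illustrative values only (not clinical measurements).
subglottal = 8.0        # tracheal pressure during phonation
open_tract = 0.5        # near-atmospheric vocal tract pressure, open mouth
straw_in_water = 4.0    # back pressure created by phonating into a straw

print("Open vocal tract:", transglottal_pressure(subglottal, open_tract), "cm H2O")
print("Straw phonation: ", transglottal_pressure(subglottal, straw_in_water), "cm H2O")
# The smaller transglottal pressure during straw phonation is thought to
# reduce the force of vocal fold closure, encouraging a less pressed voice.
```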

Puberphonia

Another functional voice disorder that responds well to treatment is puberphonia (sometimes called mutational falsetto). This is a disorder of voice pitch in which a postpubescent male with a normal-sized larynx produces phonation with a very high F0, well outside the range typical for adult males. Examination of the larynx fails to show any abnormality or underdevelopment of laryngeal structures that would be consistent with the high-pitched voice.

Treatment of Puberphonia

Puberphonia, a functional voice disorder considered to have psychogenic origins, is treatable. Once patients are shown that they can generate a male-appropriate voice pitch, there seems to be little problem in maintaining a gender-appropriate pitch (Roy, Peterson, Pierce, Smith, & Houtz, 2017).

Neurological Voice Disorders

Neurologic disorders that affect phonation are seen frequently in voice clinics (De Bodt et al., 2015). Degenerative neurologic diseases, such as Parkinson's disease, multiple sclerosis, and amyotrophic lateral sclerosis, often have a voice problem as a prominent symptom. The same can be said of strokes in adults, head injuries in children and adults, and congenital neurologic diseases such as cerebral palsy. The focus in this chapter is on dysphonia in two neurological conditions: vocal fold paralysis and spasmodic dysphonia.

Unilateral Vocal Fold Paralysis

Vocal fold paralysis has many causes, including inflammatory conditions, neck or chest trauma, neck or chest tumors, and surgical procedures. A diagnosis of vocal fold paralysis implies an absence of innervation by the nervous system to one or both folds. There are also cases of vocal fold paresis ("paresis" means "weak"), in which the affected fold is not completely paralyzed but is weakened to varying degrees by dysfunction of the nerves supplying the larynx (Syamal & Benninger, 2016). Many of the symptoms of vocal fold paralysis and vocal fold paresis are similar; therefore, the following discussion focuses on paralysis. The nerves that control contraction of laryngeal muscles exit the brainstem and run down the neck and chest, where they are susceptible to compression trauma (e.g., from upper chest injuries) and accidental surgical damage (e.g., during removal of part or all of the thyroid gland). Nerves supplying the larynx are present on both sides of the neck; surgical or traumatic injuries may affect only one side, resulting in unilateral vocal fold paralysis. Figure 18–7 shows two images of unilateral vocal fold paralysis, one during inhalation (left) and the other when the speaker was attempting to phonate (right). The paralysis of the left vocal fold (right, in the image) is due to injury to the nerve that innervates the main muscle of the vocal folds as well as muscles that open, close, and compress them.5

5 The description of the specific nerve paralysis responsible for the vocal fold appearance applies to one branch of the nerves that supply the larynx. There is a nerve that supplies a single muscle of the larynx — the muscle that stretches the vocal fold. This nerve can also be paralyzed, but it is not considered further in this chapter.

Figure 18–7.  Unilateral vocal fold paralysis. Left, paralysis of the left vocal fold during inhalation; right, the paralyzed vocal fold during the closed phase of phonation. Note the inability of the paralyzed vocal fold to reach the midline for closure. Photos courtesy of Professor Susan Thibeault, Department of Surgery, University of Wisconsin Clinical Sciences Center.

The image on the left shows the vocal folds open for inhalation. Compared with the healthy right vocal fold, the paralyzed fold is shorter; in many cases, a paralyzed vocal fold also appears to have less mass than a healthy fold. The image on the right shows the position of the paralyzed vocal fold during the closed interval of vocal fold vibration. Note the lack of contact between the two vocal folds and the slightly curved (“bowed”) appearance of the paralyzed fold.

A paralyzed vocal fold can vibrate for phonation. This is because the rapid opening-and-closing motions of the vocal folds for phonation are controlled by air pressures and air flows, not by direct muscular contractions (see Chapter 10 and earlier discussion). However, the loss of muscular control of the paralyzed vocal fold makes it difficult to achieve adequate vocal fold closure for each cycle of vibration. The paralyzed vocal fold vibrates, but weakly. The loss of muscular control also affects voice quality. As reviewed by Samlan and Story (2017), a paralyzed vocal fold is likely to result in a breathy, weak, and strained voice quality. The strained component of the voice quality reflects an individual’s attempt to overcome the inability to achieve good closure of the folds by means of excessive effort. People with a paralyzed vocal fold may experience fatigue when phonating for extended periods of time; it is hard work to produce voice when the vocal folds cannot close effectively.

Treatment of Unilateral Vocal Fold Paralysis

Treatment of unilateral vocal fold paralysis combines direct voice treatment with surgical techniques. Direct voice therapy focuses on improving speech breathing to manage the airflow problems of a “leaky” glottis, and on voice exercises to achieve better vocal fold closure for the closed phase of vibration. Effort exercises, such as pushing against a fixed surface (e.g., a wall) or pulling up on the underside of a chair while sitting in it, may be used to evoke closure of the glottis during phonation. One surgical technique to improve phonation in unilateral vocal fold paralysis is the injection of biomaterials into the paralyzed fold to “plump it up.” This provides a larger mass against which the healthy fold can make contact for phonation. Another popular surgical technique is to reposition the paralyzed fold closer to the midline, so it is easier for the healthy fold to contact it during vibration. The measure of success of these surgical techniques is an improved voice quality, as well as better swallowing function (see Chapter 20).

Bilateral vocal fold paralysis, a rare disorder in which the nerves serving the muscles of the larynx are damaged on both sides of the neck, is a life-threatening condition. Because these nerves control the single muscle of the larynx that separates (abducts) the vocal folds, paralysis on both sides prevents separation of the folds for inhalation. A tracheostomy (an opening in the neck, below the vocal folds) is a common way to provide a patient with an alternate airway to sustain life. Another surgical approach is a permanent repositioning of one or both vocal folds away from the midline to restore the natural airway (Naunheim, Song, Franco, Alkire, & Shrime, 2017).

Spasmodic Dysphonia

Spasmodic dysphonia (SD) is a rare voice disorder regarded by a majority of voice specialists as a neurological disease. The estimated prevalence of SD is 1 in every 100,000 people. This estimate is complicated by disagreement among professionals about the nature, and even the existence, of SD as a neurologically based voice disorder (Hintze, Ludlow, Bansberg, Adler, & Lott, 2017). Patients diagnosed with SD typically have intermittent, irregularly occurring voice spasms when they attempt to phonate; they may also have a voice tremor (shaking voice). The spasms are related to massive and sustained hypercontraction of laryngeal muscles. In some cases, hypercontraction is observed in muscles above the larynx, such as those of the tongue. Patients with SD appear to be exerting tremendous effort to initiate and maintain phonation.

SD is a controversial disorder. The reader may notice a similarity between the symptoms of SD and those of MTD. The two diagnoses — SD implying a neurologic cause, MTD a psychological cause — are often in dispute. Voice specialists who are asked to make a diagnosis of one or the other disorder, based on audio recordings of speech or views of the vocal folds during phonation, often disagree. Even the case histories and other characteristics of individuals with the two disorders can be very similar. For example, in both disorders, a large percentage of patients are women, typically professional and/or frequent voice users in the middle years of life. Many patients with either of the two diagnoses report a traumatic event preceding the onset of voice symptoms.

Differences between SD and MTD include the following: (a) SD does not seem to respond to voice therapy in the way described previously for MTD; (b) the symptoms of SD seem to be sensitive to specific speech sounds, whereas the symptoms of MTD are more constant regardless of which speech sounds are produced; and (c) a significant number of patients with SD have a voice tremor, and patients with MTD do not.

Treatment of Spasmodic Dysphonia

SD, as mentioned earlier, does not respond well to behavioral voice treatment. Instead, SD is treated by injecting the neurotoxin botulinum toxin into the vocal folds or other muscles of the larynx. Botulinum toxin inhibits the release of the neurotransmitter acetylcholine at the junction of motor nerves and muscle fibers. Acetylcholine is required for the contraction of muscles. Because the injections reduce the release of acetylcholine to laryngeal muscles, laryngeal spasms are less likely to occur, allowing patients to phonate more or less normally until the effect of the drug wears off.6

6. Botox injections in laryngeal muscles are effective for about 3 to 4 months. Patients return for periodic injections to maintain the ability to phonate without spasms.

Cancer of the Larynx

The vocal folds are affected in about 50% of laryngeal cancers, but cancer can occur at any site in the larynx. Cancer of the larynx is almost always a disease of adulthood and is substantially more frequent in males than in females. At age 60 years, the 2005 incidence in males and females was approximately 35 in 100,000 and 5 in 100,000, respectively (Schultz, 2011). In most cases, cancer of the larynx is a result of long-term tobacco and/or alcohol use. Other environmental factors (such as exposure to certain chemicals) also contribute to the risk of laryngeal cancer. Often, the initial symptoms of laryngeal cancer are voice changes that become increasingly severe over time. The dysphonia in laryngeal cancer has been described as hoarse, rough, and irregular. Cancerous lesions on the vocal folds interfere with the motions essential to normal phonation. The dysphonia worsens as the lesions increase in size.
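The incidence figures above can be turned into expected case counts for a given population. The short Python sketch below is only an illustration of that arithmetic; the population sizes are assumed round figures chosen for the example and are not taken from Schultz (2011).

    # Illustrative arithmetic only: converting the reported incidence rates
    # (about 35 per 100,000 males and 5 per 100,000 females at age 60)
    # into expected yearly case counts for an assumed population size.

    male_rate = 35 / 100_000      # cases per person per year, males at age 60
    female_rate = 5 / 100_000     # cases per person per year, females at age 60

    # Hypothetical numbers of 60-year-olds; assumed for the example only.
    n_males = 2_000_000
    n_females = 2_000_000

    print(f"Expected male cases: {male_rate * n_males:.0f}")        # 700
    print(f"Expected female cases: {female_rate * n_females:.0f}")  # 100
    print(f"Male-to-female incidence ratio: {male_rate / female_rate:.0f}:1")  # 7:1

The arithmetic makes the sex difference concrete: at these rates, roughly seven men are diagnosed for every woman of the same age.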

Treatment of Cancer of the Larynx

An initial bout of dysphonia may not be regarded as unusual by an individual. Voice changes such as those accompanying laryngitis may interfere with phonation for several days or even longer but may not seem to warrant a visit to an otolaryngologist. Even persistent voice changes coupled with swallowing problems may not seem like compelling reasons to see a physician.

Laryngeal cancers in early stages may be treated by surgeries to remove the lesion while preserving sufficient tissue for voice production and swallowing. More advanced stages of laryngeal cancer require more extensive surgeries, ranging from removal of large parts of the larynx to removal of the entire larynx. This latter procedure, called a laryngectomy, requires the creation of a new airway because the path through the larynx is eliminated. The trachea is attached to a small opening in the front of the neck, which provides the airway needed to sustain life.

Surgeries for the removal of laryngeal cancers are performed with the objective of restoring the ability to produce speech with a sound source. Most often, a valve connecting the trachea with the esophagus is inserted during surgery. When pressure in the trachea is raised by the respiratory system, air is forced through the valve into the esophagus. The esophageal pressure causes sound-producing vibration of the ring of muscles at the top of the esophageal tube, resulting in a sound source for the articulation of speech sounds. The sound source does not have the quality of vibrating vocal folds, but it is sufficient for intelligible speech. Other techniques for a substitute sound source following removal of the larynx are also available (Sapienza & Ruddy, 2017).

Pediatric Voice Disorders

Many of the vocal fold conditions and diseases that result in dysphonia in children are similar to those described previously for adults. The summary that follows is based on a selected group of relevant publications (Lee, Roy, & Dietrich, 2018; Possamai & Hartley, 2013; Smillie, McManus, Cohen, Lawson, & Wynne, 2014; Smith, 2013; Verdolini Abbott, 2013). A comprehensive textbook on voice disorders in children has been written by Kelchner, Brehm, and Weinrich (2014).

Prevalence of Childhood Voice Disorders

The prevalence of voice disorders in children has been estimated to be 4% to 6%. Some authors believe the prevalence may be even greater because a childhood dysphonia may not be regarded as a health concern. The prevalence estimate of 4% to 6% is similar to the prevalence estimate of voice disorders in the adult population (Bainbridge et al., 2017).


The child’s larynx is not simply a scaled-down version of the adult larynx (Chapter 10). Most importantly, the layered structure of the adult vocal folds is not fully developed in children. Benign vocal fold masses are, in adults, typically located within the cover of the vocal folds — the outer two layers. In adults, it is relatively easy to remove a mass such as a polyp and reattach the cover to preserve near-normal phonation. In children, the absence of well-defined outer layers makes it more difficult for a surgeon to remove a mass without damaging the developing layers.

Types of Childhood Voice Disorders

Vocal nodules are the most frequent cause of dysphonia in childhood. The cause of vocal nodules in children is similar to their cause in adults — phonotrauma, resulting from excessive closing forces during vocal fold vibration and subsequent damage to the outer edges of vocal fold tissue. The dysphonia heard in cases of childhood vocal nodules is often described as hoarseness. Other benign vocal fold masses, such as polyps and cysts, are also found in children. Like nodules, these masses have the potential to interfere with adequate vocal fold closure. And as with nodules, polyps, and cysts in adults, the interference with closure may lead to a compensatory use of excessive muscular effort to achieve closure. The original problem — poor closure of the vocal folds due to a phonotrauma-related mass — is magnified by attempts to overcome the dysphonia with increased phonotraumatic behaviors.

For children under the age of about 12 years, vocal nodules are more common among boys than girls. There is evidence that children — and perhaps especially boys — with extraverted, talkative, and immature behavior styles are more likely to develop vocal nodules. Similar evidence of a link between personality type and vocal nodules exists for adults (Roy, Bless, & Heisey, 2000).

Treatment of Childhood Voice Disorders

The treatments described earlier for adult voice disorders are, in many cases, also applied to childhood voice disorders. Vocal fold nodules are first treated with behavioral therapy to minimize talking for a period of time and to teach and practice the use of a “normal” voice. When nodules do not respond to voice therapy, the masses may be removed surgically. Polyps and cysts are more likely than nodules to be treated surgically and followed up with voice therapy.


Chapter Summary

There are many different types of voice disorders, with many different causes. Dysphonia is the term used to describe a voice that seems abnormal in pitch, loudness, and/or quality. Based on survey data, it is estimated that approximately 6% to 7% of the adult population experiences dysphonia over any 12-month period. Dysphonia is more common among professional voice users (teachers, singers, actors) than in the general population and can have profound effects on social, emotional, and employment aspects of life.

A case history, perceptual evaluation, laryngoscopic study of the vocal folds, and measurement of acoustic and aerodynamic parameters are important to the accurate diagnosis of a voice disorder. Important acoustic measures of the voice include fundamental frequency (F0), intensity, and the voice spectrum; the perceptual correlates of these measures are pitch, loudness, and quality, respectively.

Voice disorders are classified in several ways that overlap; the different classifications are not independent. Voice disorders can be classified along a hypo-hyperfunctional continuum that ranges from very weak to overly forceful closure of the vocal folds during their vibration. Phonotrauma describes a group of voice disorders resulting from excessive use of the voice that damages vocal fold tissue, which in some cases may lead to nodules, polyps, and other lesions on the vocal folds. Organic voice disorders are dysphonias caused by benign mass lesions on the vocal folds, such as nodules and polyps, which typically are the result of phonotrauma. Functional voice disorders are dysphonias in which there is no laryngeal pathology that explains the disorder, as in MTD; a subset of functional voice disorders, called psychogenic voice disorders, are thought to have their roots in psychological disorders. Neurological voice disorders are those in which damage to peripheral nerves or structures of the central nervous system affects vocal fold vibration as a result of weakness, paralysis, or dyscoordination of laryngeal muscles; laryngeal spasms during phonation may also reflect a central nervous system disorder. Laryngeal cancer may occur anywhere within the larynx; lesions of the vocal folds occur in nearly half of all laryngeal cancers, interfere with vocal fold vibration, and therefore result in dysphonia.

Treatment for voice disorders includes direct therapy to reduce or eliminate dysphonia (such as voice exercises and laryngeal manipulation), indirect therapy to counsel patients on issues that may be associated with dysphonia, and surgical techniques to remove benign masses and cancerous lesions. Pediatric voice disorders such as benign vocal fold masses result in a dysphonia often called “hoarseness” and may not be recognized by parents or teachers as a disorder requiring professional attention; voice therapy and/or surgery may be used for pediatric voice disorders, depending on the type of pathology causing the dysphonia.

References

Altman, K. W. (2007). Vocal fold masses. Otolaryngologic Clinics of North America, 40, 1091–1108.
Andrea, M., Dias, Ó., Andrea, M., & Fugeira, M. L. (2017). Functional voice disorders: The importance of the psychologist in clinical voice assessment. Journal of Voice, 31, 507.e13–507.e22.
Andreassen, M., Litts, J. K., & Randall, D. R. (2017). Emerging techniques in assessment and treatment of muscle tension dysphonia. Current Opinion in Otolaryngology and Head and Neck Surgery, 25, 447–452.
Aronson, A. (1990). Clinical voice disorders (3rd ed., p. 131). New York, NY: Thieme Medical.
Bainbridge, K. E., Roy, N., Losonczy, K. G., Hoffman, H. J., & Cohen, S. M. (2017). Voice disorders and associated risk markers among young adults in the United States. Laryngoscope, 127, 2093–2099.
Baken, R. J., & Orlikoff, R. F. (2000). Clinical measurement of speech and voice. San Diego, CA: Singular Publishing.
Bastian, R. W., & Thomas, J. P. (2016). Do talkativeness and vocal loudness correlate with laryngeal pathology? A study of the vocal overdoer/underdoer continuum. Journal of Voice, 30, 557–562.
Carding, P., Bos-Clark, M., Fu, S., Gillivan-Murphy, P., Jones, S. M., & Walton, C. (2017). Evaluating the efficacy of voice therapy for functional, organic and neurological voice disorders. Clinical Otolaryngology, 42, 201–217.
da Cun Pereira, G., de Oliveira Lemos, I., Gadenz, C., & Cassol, M. (2018). Effects of voice on muscle tension dysphonia: A systematic literature review. Journal of Voice, 32, 546–552.
De Bodt, M., Van den Steen, L., Mertens, F., Raes, J., Van Bel, L., Heylen, L., . . . van de Heyning, P. (2015). Characteristics of a dysphonic population referred for voice assessment and/or voice therapy. Folia Phoniatrica et Logopaedica, 67, 178–186.
Dietrich, M., Verdolini Abbott, K., Gartner-Schmidt, J., & Rosen, C. A. (2008). The frequency of perceived stress, anxiety, and depression in patients with common pathology affecting voice. Journal of Voice, 22, 472–488.


Eastwood, C., Madill, C., & McCabe, P. (2014). The behavioural treatment of muscle tension voice disorders: A systematic review. International Journal of Speech-Language Pathology, 17, 287–303.
Hintze, J. M., Ludlow, C. L., Bansberg, S. F., Adler, C. H., & Lott, D. G. (2017). Spasmodic dysphonia: A review. Part 1: Pathogenic factors. Otolaryngology–Head and Neck Surgery, 157, 551–557.
Hixon, T. J., Weismer, G., & Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.
Hosoya, M., Kobayashi, R., Ishii, T., Senarita, M., Kuroda, H., Misawa, H., . . . Tsunoda, K. (2018). Vocal hygiene education program reduces surgical interventions for benign vocal fold lesions: A randomized controlled trial. Laryngoscope, 128, 2593–2599.
Kelchner, L. N., Brehm, S. B., & Weinrich, B. D. (2014). Pediatric voice: A modern, collaborative approach to care. San Diego, CA: Plural Publishing.
Kempster, G. B., Gerratt, B. R., Verdolini Abbott, K., Barkmeier-Kraemer, J., & Hillman, R. E. (2009). Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18, 124–132.
Kent, R. D. (1976). Anatomical and neuromuscular maturation of the speech mechanism: Evidence from acoustic studies. Journal of Speech and Hearing Research, 19, 421–445.
Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36, 21–40.
Lee, J. M., Roy, N., & Dietrich, M. (2018). Personality, psychological factors, and behavioral tendencies in children with vocal nodules: A systematic review. Journal of Voice. https://doi.org/10.1016/j.jvoice.2018.07.016
Lee, S., Potamianos, A., & Narayanan, S. (1999). Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. Journal of the Acoustical Society of America, 105, 1455–1468.
Leonard, R. (2009). Voice therapy and vocal nodules in adults. Current Opinion in Otolaryngology and Head and Neck Surgery, 17, 453–457.
Ludlow, C. L., Domangue, R., Sharma, D., Jinnah, H. A., Perlmutter, J. S., Berke, G., . . . Stebbins, G. (2018). Consensus-based attributes for identifying patients with spasmodic dysphonia and other voice disorders. JAMA Otolaryngology–Head and Neck Surgery, 144, 657–665.
Martins, R. H., Pereira, E. R., Hidalgo, C. B., & Tavares, E. L. (2014). Voice disorders in teachers: A review. Journal of Voice, 28, 716–724.


Naunheim, M. R., & Carroll, T. C. (2017). Benign vocal fold lesions: Update on nomenclature, cause, diagnosis, and treatment. Current Opinion in Otolaryngology and Head and Neck Surgery, 25, 453–458.
Naunheim, M. R., Song, P. C., Franco, R. A., Alkire, B. C., & Shrime, M. G. (2017). Surgical management of bilateral vocal fold paralysis: A cost-effectiveness comparison of two treatments. Laryngoscope, 127, 691–697.
Nishio, M., & Niimi, S. (2008). Changes in speaking fundamental frequency characteristics with aging. Folia Phoniatrica et Logopaedica, 60, 120–127.
Possamai, V., & Hartley, B. (2013). Voice disorders in children. Pediatric Clinics of North America, 60, 879–892.
Ramig, L. O., & Verdolini, K. (1998). Treatment efficacy: Voice disorders. Journal of Speech, Language, and Hearing Research, 41, S101–S116.
Rosen, D. C., Heuer, R. J., Levy, S. H., & Sataloff, R. T. (2003). Psychologic aspects of voice disorders. In J. S. Rubin, R. T. Sataloff, & G. S. Korovin (Eds.), Diagnosis and treatment of voice disorders (2nd ed., pp. 479–506). Clifton Park, NY: Delmar Learning.
Roy, N., Bless, D. M., & Heisey, D. (2000). Personality and voice disorders: A superfactor analysis. Journal of Speech, Language, and Hearing Research, 43, 749–768.
Roy, N., Merrill, R. M., Gray, S. D., & Smith, E. (2005). Voice disorders in the general population: Prevalence, risk factors, and occupational impact. Laryngoscope, 115, 1988–1995.
Roy, N., Peterson, E. A., Pierce, J. L., Smith, M. E., & Houtz, D. R. (2017). Manual laryngeal reposturing as a primary approach for mutational falsetto. Laryngoscope, 127, 645–650.
Samlan, R. A., & Story, B. H. (2017). Influence of left-right asymmetries on voice quality in simulated paramedian vocal fold paralysis. Journal of Speech, Language, and Hearing Research, 60, 306–321.
Sapienza, C., & Ruddy, B. H. (2017). Voice disorders (3rd ed.). San Diego, CA: Plural Publishing.
Schultz, P. (2011). Vocal fold cancer. European Annals of Otorhinolaryngology, Head and Neck Diseases, 128, 301–308.
Smillie, I., McManus, K., Cohen, W., Lawson, E., & Wynne, D. M. (2014). The paediatric voice clinic. Archives of Disease in Childhood, 99, 912–915.
Smith, M. E. (2013). Care of the child’s voice: A pediatric otolaryngologist’s perspective. Seminars in Speech and Language, 34, 63–70.
Syamal, N. M., & Benninger, M. S. (2016). Vocal fold paresis: A review of clinical presentation, differential diagnosis, and prognostic indicators. Current Opinion in Otolaryngology and Head and Neck Surgery, 24, 197–202.
Van Houtte, E., Van Lierde, K., & Claeys, S. (2011). Pathophysiology and treatment of muscle tension dysphonia: A review of the current knowledge. Journal of Voice, 25, 202–207.
Verdolini Abbott, K. (2013). Some guiding principles in emerging models of voice therapy for children. Seminars in Speech and Language, 34, 80–93.

19  Craniofacial Anomalies

Introduction

Craniofacial anomalies include a wide range of disorders with effects on speech production that are, in a general sense, predictable from the structural (anatomical) problems in the head and neck area. This chapter describes the origin and nature of craniofacial anomalies and relates the structural problems to speech production problems. In addition, the concept of syndromes is introduced. A craniofacial anomaly is often one component of a syndrome, in which there are multiple anomalies. The knowledge that craniofacial anomalies are often a syndrome component is important in health care settings, where the totality of a child’s needs must be considered and integrated across various specialists who treat the different problems.

Definition and Origins of Craniofacial Anomalies

A craniofacial anomaly is defined as any deviation from normal structure, form, or function in the head and neck area of an individual. The craniofacial anomalies that receive the most attention in this chapter are cleft lip and cleft palate. Cleft lip and cleft palate can occur independently of each other but often occur together. To appreciate the potential independence of cleft lip and cleft palate, basic aspects of the embryological development of head and neck structures must be considered. For a more in-depth presentation of general embryology, with excellent chapters on head and neck embryology, see Schoenwolf, Bleyl, Brauer, and Francis-West (2014) and Moore, Persaud, and Torchia (2019).

Most cases of cleft lip and/or cleft palate result from errors in embryological development that produce incomplete or incorrectly formed structures. People who study such embryological errors are called dysmorphologists; dysmorphogenesis is the process of abnormal tissue formation during embryological development.

Embryological Development of the Upper Lip and Associated Structures

Figure 19–1 shows a photo of an embryo 26 to 28 days after fertilization; an artist’s drawing of the embryo is at the right. The view is from the side. The structures labeled “pharyngeal arches” are duplicated on the other side of the embryo (that is, the structures of the embryo are bilaterally symmetrical). Each of the pharyngeal arches, the four ridges in sequence along the side of the embryo, is separated from the adjacent arch by a deep groove. The pharyngeal arches are numbered from one (closest to the head) to four, as labeled in Figure 19–1. A fifth and sixth pharyngeal arch are located behind the fourth, but they lack the prominent ridge-like bulge of the first four and are not easily seen.


Figure 19–1.  Left, photo of embryo 26 to 28 days after fertilization; right, artist’s drawing of the embryo with first four arches labeled.

The pharyngeal arches are the source of embryological tissue for the development of the majority of head and neck structures. The first pharyngeal arch is the source of tissue for the development of the lower lip, jaw, upper lip, and hard palate, as well as many structures of the ear. A frontal view of the embryonic head tissue that develops into head and neck structures is shown at 5, 6, 8, and 10 weeks post-fertilization in Figure 19–2. The structures labeled “mandibular prominence” and “maxillary prominence” are generated from the first pharyngeal arch. In the center of the embryo, the small space just above the mandibular prominence (henceforth, “mandibular arch”) is the primitive mouth, called the stomodeum. Embryological development of the mandibular arch results in the lower jaw (the mandible). Development of the maxillary prominence results in the fully formed upper lip, parts of the nose, a small wedge of bone that will eventually contain the upper four front teeth, and the bones that form the roof of the mouth. Most of the muscular soft palate (velum) is formed from the fourth and sixth pharyngeal arches.

The formation of the upper lip and its associated structures proceeds as follows. Roughly 6 weeks after fertilization, the two maxillary prominences begin to “push” toward the center of the face. The nearly circular structures right next to the maxillary prominences (Figure 19–2, upper right panel), which could be mistaken for the primitive eyes but are actually the early version of the nose, are moved toward the center of the face by this push. As the maxillary prominences and circular structures move toward the center of the face, they begin to fuse, meaning that the tissues of the two structures knit together. Figure 19–2 shows this process as a sequence of drawings over time. The fusion is complete around 10 weeks and, when successful, creates an upper lip that is continuous from the right to the left corner of the mouth, a well-formed nose, and a small wedge of bone in the front of the mouth. Figure 19–3 (discussed further in the next paragraphs) shows on the right another view, looking up at the roof of the mouth at the completed upper lip and wedge of bone. Keep in mind that the wedge of bone that will contain the upper four teeth is formed with the upper lip and soft tissue of the nose. It is not considered part of the hard palate, even though it becomes bone.


Figure 19–2.  Embryological development of the upper lip and associated structures (nose, philtrum, and intermaxillary segment, the wedge of bone with the upper four front teeth; see Figure 19–3) at 5, 6, 8, and 10 weeks post-gestation.

Figure 19–3 shows the formation of the hard palate in two views, one looking directly into the mouth (left column of images) and the other looking up at the roof of the mouth (right column of images). At 7 weeks, the two shelves of embryological tissue that become the bony hard palate (called the palatine shelves; see Figure 19–3) are widely separated; there is an open space between them. The top image in the left column shows why this is so. The palatine shelves are oriented at a downward angle and, in fact, lie beneath the tongue, which sits high in the mouth because the mandible (lower jaw) is still very small. The embryological tongue prevents the palatine shelves from “snapping up” into the horizontal position, which is required for the shelves to meet and fuse, thus forming the hard palate. As indicated by the downward-pointing arrow (left column, top image), the tongue must be lowered to get out of the way of the palatine shelves. The right-hand image at 7 weeks, a view from above the tongue looking directly up at what becomes the roof of the mouth, shows the wide separation between the palatine shelves.

At 9 weeks, the tongue has lowered, and the palatine shelves have snapped up into the horizontal position (seen in both the left and right images of the middle row). The lowering of the tongue is made possible by growth of the mandible, which allows the tongue to drop down in the mouth and free the palatine shelves to move. The shelves are not yet touching each other as shown in both of these middle row images, but the opening between the shelves is smaller at 9 weeks than it was at 7 weeks. The tissue of both shelves grows toward the midline to fuse and form a continuous hard palate that separates the oral cavity from the nasal cavity. This growth is indicated in the image in the right column by the short horizontal arrows pointing toward each other.


Figure 19–3.  Embryological development of hard and soft palates at 7, 9, and 12 weeks postgestation. Note at 7 weeks palatine shelves trapped under tongue; at 9 weeks mandible has grown substantially, allowing tongue to drop in oral cavity so palatine shelves can snap up into horizontal position and grow to midline where they fuse together, front to back. Left column of images, looking directly into the mouth; right column of images, looking upward to the roof of the mouth.

The bottom row of images shows the fused palatine shelves forming a continuous hard palate at 12 weeks post-gestation. The closure of the hard palate is seen in the frontal view (left image) and looking up at the roof of the mouth (right image). The image in the right column contains a vertical arrow pointing from the front of the mouth to the back. This is the direction of fusion of the hard and soft palates. Fusion occurs first at the front, immediately behind the yellow wedge of bone described previously. Fusion continues systematically from front to back, like a closing zipper. This fusion pattern is important for understanding partial clefts of the palates, which are discussed later.

Embryological Errors and Clefting: Clefts of the Lip

Cleft lips are the result of an error (dysmorphogenesis) in the embryological process of upper lip development. The errors are failures of tissue moving from the two sides of the developing lip to meet and “knit” together to form a continuous lip. These embryological errors occur in varying degrees, and on one or both sides of the lip midline. A cleft on one side only is a unilateral cleft lip; a cleft on both sides is a bilateral cleft lip. Unilateral cleft lips are more common than bilateral cleft lips. Unilateral cleft lips may be partial, sometimes no more than a “notch” in the upper lip, or a cleft extending halfway between the upper lip and the floor of the nostril. A complete unilateral cleft lip extends into the base of the nasal cavity (Figure 19–4, left). Note how the cleft is to the left of the center of the mouth (see Box, “More Embryology”).


Figure 19–4, right, shows bilateral (both sides), complete clefts of the upper lip. The mass of tissue in the center of the lip is the intermaxillary segment (the wedge of bone shown in yellow in Figure 19–3), which, like the lip, is cleft on both sides and therefore not attached to the maxillary arch (shown in pink in Figure 19–3).

Embryological Errors and Clefting: Clefts of the Palate

Clefts of the palate are the result of an error in development of the palatal shelves, or of the mandible, or both.

More Embryology

Two questions that are often asked by students about orofacial embryology are as follows: (a) why are clefts of the lip off the midline (to the left or right of the center of the mouth), and (b) what is the meaning of a unilateral versus a bilateral cleft of the palate? Both questions can be answered with reference to Figure 19–3.

First, note in the right-hand column of images the boundaries between the yellow wedge of bone (called the intermaxillary segment) and the front part of the maxillary arch (the pink part). Remember that the embryological formation of the intermaxillary segment is part of the formation of the upper lip. The boundaries between the intermaxillary segment and the maxillary arches are off center; they are also the locations where embryological tissue fuses to make a complete upper lip. An embryological error of failure to fuse tissue occurs at these boundary points — hence, the lip clefts are to the side of the center of the mouth.

Second, note in the left-hand column of images the nasal septum, which during embryological development of the palate descends and fuses with the nasal surface of the palate. What happens to the nasal septum when there is a cleft palate? Either the nasal septum is attached to one maxillary (palatine) shelf, or it is attached to neither. In a unilateral cleft palate, the septum is attached to one shelf; in a bilateral cleft palate, it is attached to neither shelf.


Figure 19–4.  Left, complete, unilateral cleft of the lip; right, complete bilateral clefts of the lip.


Like clefts of the upper lip, clefts of the palate occur in varying degrees, from a minimal, notch-like split at the back of the soft palate to a full split extending from the back of the soft palate to the wedge of bone formed with the development of the upper lip. Figure 19–5 shows drawings of three degrees of cleft palate severity: a notch-like defect in the back of the soft palate (left), a partial cleft of the soft and hard palates (middle), and a complete cleft of the hard and soft palates extending to the wedge of bone formed with the upper lip (right). The middle image shows a partial cleft of the soft and hard palates; the dashed line is the approximate boundary between the hard and soft palates.

Clefts of the palate also result when the mandible fails to grow sufficiently to allow the tongue to lower in the oral cavity so that the palatine shelves are free to snap up into the horizontal position. As previously described, early in the embryological development of the hard palate, the palatine shelves are trapped beneath the developing tongue tissue. When the mandible fails to grow at a specific time during the embryological schedule of development, the two shelves remain trapped and lose their ability to fuse together. The cleft palate in this case is a result not of a primary embryological error of palatal development, but of a primary error of mandible development that prevents the palatine shelves from lifting and fusing. The mandible may eventually grow sufficiently to allow the tongue to drop in the oral cavity, and the palatine shelves may then elevate to the horizontal position. The problem is that the tissue growth required for the two palatine shelves to extend to the midline and fuse is, like much of the rest of embryological development, on a schedule. When that schedule is not met, as when the high tongue prevents the shelves from elevating during their window for growth, the opportunity for the growth is lost.

Photos of partial and complete clefts of the palate are shown in Figure 19–6. A partial cleft is on the left and a complete cleft is on the right. The partial cleft on the left is called a submucous cleft. The mucous membrane that covers the muscles of the soft palate is intact, but the underlying muscle is cleft. Note the small “notch” (arrow) in the soft palate. A complete cleft of the hard and soft palates is seen on the right of Figure 19–6. Both photographs show clefts of the palate in the absence of clefts of the lips.

Cleft Lip With or Without a Cleft Palate; Cleft Palate Only (Isolated Cleft Palate)

As noted, cleft lips and cleft palates occur as independent dysmorphologies. Children may be born with a cleft lip but a fully formed palate, or with an isolated cleft palate and a fully formed upper lip. Some scientists and clinicians believe, however, that a cleft lip with a cleft palate is a more severe form of cleft lip alone. In other words, clefting of the lip and palate together is viewed as one category of clefting, with cleft lip alone being a less severe form of the defect. This category is referred to as cleft lip with or without a cleft palate (abbreviated CL/P). Isolated cleft palate (abbreviated CPO, for “cleft palate only”) is regarded as a category separate from CL/P.

Figure 19–5.  Left, a partial cleft of the soft palate; center, partial cleft of soft and hard palates, extending from the back of the soft palate to the middle of the hard palate; right, complete cleft of the soft and hard palates. The dashed, horizontal line is the approximate boundary between the hard and soft palates.


Figure 19–6.  Left, partial cleft of the soft palate, submucous cleft with a small notch indicated by arrow; right, complete cleft of hard and soft palates.

The distinction between the two categories is made because (a) sex ratios computed for babies born with clefts favor boys for CL/P but favor girls for isolated cleft palate; (b) the incidence (frequency of occurrence) of the two categories is different — approximately 1 in 700 babies are born with CL/P, but the incidence of isolated cleft palate is about 1 in 2,000; and (c) CL/P occurs more frequently in certain racial groups as compared to others (e.g., more frequently in Asian as compared to African American populations), but isolated cleft palate does not seem to vary with racial group (Shkoukani, Lawrence, Liebertz, & Svider, 2014).

Epidemiology of Clefting

The epidemiology of clefting is complicated because it depends on where the data were collected. Prevalence of clefting at birth varies by country. European birth registries for the years 1993 to 1998 showed a substantially higher rate of CL/P in Sweden and Norway compared with Spain. The relative rates of the two categories can also vary by country; in France, for example, the rate of CPO for the same years was higher than the rate of CL/P. An average global estimate of the prevalence of orofacial clefting (both CL/P and CPO) is about 1 in every 700 live births (Mossey, Little, Monger, Dixon, & Shaw, 2009). In the United States, the prevalence of live-birth clefting is lower, estimated at 1 in every 940 live births for CL/P and 1 in every 1,574 for CPO (Parker et al., 2010). In a Norwegian study covering nearly 2,700 live births of children with clefting between the years 1967 and 1998, CL/P was roughly 1.5 times more likely than CPO. The same study reported that 42% of these children had cleft lip only, and 58% had a cleft lip and cleft palate (Harville, Wilcox, Lie, Vindenes, & Abyholm, 2005).

Prevalence of CL/P in different countries should be kept in mind when thinking about the place of clefting in the big picture of medical care. In the United States, CL/P is among the most common birth defects. For more information, see https://www.asha.org/PRPSpecificTopic.aspx?folderid=8589942918&section=Incidence_and_Prevalence
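These prevalence figures become more concrete when converted to expected counts of affected births. The short Python sketch below is only an illustration of that arithmetic; the annual number of births is an assumed round figure chosen for the example, not a value taken from the studies cited above.

    # Illustrative arithmetic only: converting reported birth prevalence of
    # clefting into expected counts for an assumed number of annual live births.

    us_clp = 1 / 940        # CL/P, United States (Parker et al., 2010)
    us_cpo = 1 / 1_574      # CPO, United States (Parker et al., 2010)
    global_cleft = 1 / 700  # all orofacial clefting, global estimate

    annual_births = 3_700_000  # assumed round figure, for illustration only

    print(f"Expected CL/P births per year: {us_clp * annual_births:.0f}")  # ~3,936
    print(f"Expected CPO births per year: {us_cpo * annual_births:.0f}")   # ~2,351
    print(f"Implied CL/P-to-CPO ratio: {us_clp / us_cpo:.2f}")             # ~1.67
    print(f"Expected clefts per 1,000,000 births worldwide: {global_cleft * 1_000_000:.0f}")

The CL/P-to-CPO ratio implied by the U.S. estimates (about 1.7) is in the same range as the roughly 1.5-fold difference reported in the Norwegian study cited above.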

Speech Production in CL/P and CPO

As presented in Chapter 10, the velopharyngeal port (hereafter, VP port) is the passageway between the oral and nasal cavities. The VP port is opened and closed by muscular forces. These forces lift the soft palate to the posterior pharyngeal wall, as shown in Figure 19–7. The salmon-colored soft palate is shown in the open position, which allows airflow coming from the lungs and through the larynx to pass through the pharynx, into the nasal cavities, and through the nostrils to the atmosphere. In contrast, when the soft palate is lifted by muscular forces, it is pushed against the posterior pharyngeal wall (the soft palate shown as the red structure with a dashed outline).


Figure 19–7.  Vocal tract with lips closed for bilabial stop consonant /b/. With the VP port open (salmon-colored soft palate) the airflow leak into the nasal cavities prevents the pressure buildup in the vocal tract required for correct production of /b/. With the VP port closed (red soft palate) the vocal tract is completely sealed and pressure can be developed for /b/.

The muscular lift of the soft palate is accompanied by muscular activity of the pharyngeal walls. Pharyngeal muscles squeeze the sides of the soft palate. These muscular actions — lifting the soft palate and squeezing its sides when it makes contact with the back (posterior) wall of the pharynx — close off communication between the nasal cavities and the oropharyngeal cavities. The effect of squeezing the sides of the soft palate with contraction of the pharyngeal muscles is as relevant to VP closure as the contact of the soft palate with the posterior pharyngeal wall. Opening of the velopharyngeal port from the closed position is largely due to relaxation of the lifting and squeezing muscles, which allows gravity to pull down on the soft palate. A small degree of muscular contraction may also contribute to opening the VP port. Overall, the VP port is closed and opened like the tightening and loosening of a sphincter.

The VP port is constantly opening and closing during speech production, opening for nasal sounds such as “m,” “n,” and the “ng” sound in words like “ring,” and closing for vowels and obstruent consonants. In English, the VP port closes for vowels and other vocalic-like sounds (such as “r,” “l,” “w,” and diphthongs like the sounds in “eye,” “ay,” and “oy”) to prevent them from sounding too nasal.

These vowel and vowel-like sounds can be produced with an open VP port, but they will sound displeasing because of excessive nasality resulting from the transmission of sound through the nasal cavities.

Closure of the VP port for obstruent consonants — stops, fricatives, and affricates — is required for the buildup of oral pressure in the vocal tract. Obstruents are speech sounds that have a tight or complete constriction in the vocal tract. When airflow encounters these constrictions, pressure behind them rises above atmospheric pressure, provided there is no leak to the atmosphere. These positive pressures are necessary for the correct production of obstruents, as discussed in Chapters 10 and 12. A potential leak through the VP port is prevented by its closure, so that the buildup of air pressure behind the obstruent constriction is not compromised by the escape of air through the nasal passageways. The requirement that the VP port be closed for production of obstruents is illustrated in Figure 19–7 by closure of the lips for the bilabial stop consonant /b/. The lip closure seals the vocal tract at its front end, and closure of the VP port seals the vocal tract toward the back.


As air flows into the closed volume of the vocal tract, pressure builds up, just as it should for a stop consonant. A completely sealed vocal tract cavity is necessary for all stops, fricatives, and affricates.

This review suggests two major problems when an individual does not have good control over the opening and closing of the VP port during speech production. First, the individual is likely to have intermittent or chronic hypernasality. Hypernasality denotes excessive nasality during speech, especially during vowels. Hypernasality is regarded by most listeners as aesthetically displeasing and has the potential to produce a muffled-sounding sequence of sounds, making speech relatively difficult to understand. Second, a speaker who has difficulty closing the VP port cannot produce obstruents correctly because of a partial or complete inability to develop a positive pressure in the vocal tract. An open VP port during an attempt to produce a stop consonant results in air leaking through the nasal cavities and into the atmosphere. This makes it nearly impossible to build up and maintain an oral pressure for the time interval during which the oral cavity is sealed by either the lips (as in “b” and “p”) or by contact between the tongue and part of the palate (“t,” “d,” “k,” “g”). Failure to develop a positive oral pressure and produce obstruents correctly is likely to have a major effect on speech intelligibility.

Patients who have problems with control of the VP port have velopharyngeal insufficiency (typically abbreviated VPI). VPI is a matter of degree; some patients have no control over the opening and closing of the VP port, whereas others have some, but not complete, control. It is not unusual for a person with a repaired cleft palate to have some remaining VPI, and therefore hints of hypernasality and perhaps audible escape of air through the nose when producing obstruent consonants. An individual with an unrepaired cleft palate (an almost nonexistent circumstance in developed countries for children who are old enough to have speech and language skills) has VPI of the most severe kind, because the structural basis for closing the VP port does not exist. The concept of VPI is more applicable to patients born with a craniofacial anomaly who have had surgery to restore the structural basis for opening and closing the VP port. The surgeries are designed not only to close the palate, thus eliminating the “hole” in the roof of the mouth, but also to reattach muscles in their proper orientation so the VP port can be closed for speech production (and, it should be added, for efficient swallowing; see Chapter 20). As noted previously, surgery does not always eliminate VPI completely.


Diagnosis of VPI

Perceptual analysis is one approach to the diagnosis of VPI. Hypernasality may be judged by listening and making a qualitative judgment (e.g., mild hypernasality versus severe hypernasality) or by using simple scales, such as a 5-point scale with one end defined as “no hypernasality” and the other end defined as “severe hypernasality.” In addition, listening for consonant errors made by a child or adult under evaluation for a speech disorder can be a valuable source of diagnostic information. Most speech-language pathologists, including those who do not regularly provide services for children or adults with cleft palate, are able to listen to a speech sample from an individual and, with a high degree of certainty, identify the presence of VPI due to cleft palate.

Perceptual analysis, however, cannot give reliable information on the magnitude of, or specific reasons for, the VPI problem. For example, two patients may have the same-sounding hypernasality and consonant errors due to VPI but have different underlying reasons for the VPI. Recall the sphincter-like closure of the VP port, produced by several different muscles. Different muscles may be responsible for VPI in different speakers. Precise knowledge of which muscles are contributing to VPI may have direct implications for surgical and behavioral treatment of the problem.
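As a simple illustration of how such scale judgments might be recorded and pooled, the short Python sketch below summarizes ratings from several listeners on a 5-point hypernasality scale. The listener labels and ratings are invented for the example; no particular clinical protocol is implied.

    # Minimal sketch: tallying hypothetical 5-point hypernasality ratings
    # (1 = no hypernasality, 5 = severe hypernasality) from several listeners.
    from statistics import mean, median

    ratings = {"Listener A": 3, "Listener B": 4, "Listener C": 3, "Listener D": 3}

    values = list(ratings.values())
    print(f"Median rating: {median(values)}")    # 3.0
    print(f"Mean rating: {mean(values):.2f}")    # 3.25
    share_at_median = values.count(median(values)) / len(values)
    print(f"Share of listeners at the median rating: {share_at_median:.0%}")  # 75%

Clinical protocols are of course more elaborate, but the basic idea of pooling scale judgments from more than one listener is the same.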

Fortunately, speech-language pathologists and surgeons have techniques for direct visualization of the VP port. Two of these techniques are described briefly here; the interested reader can consult Peterson-Falzone, Trost-Cardamone, Karnell, and Hardin-Jones (2017) and Kummer (2018, 2020) for further information on these and other techniques.

Most patients with craniofacial anomalies and VPI have x-ray studies performed to gain knowledge of the function of the VP port. These studies focus the x-rays at the VP port from several different angles. A technique called videofluoroscopy generates x-ray motion pictures to evaluate movement of VP port structures during speech. An example of the benefit of these studies is the ability to determine whether VPI is due to inadequate movement of the soft palate, of the pharyngeal walls (due to contraction of the pharyngeal muscles, as previously described), or of both. This information, which cannot be derived from perceptual evaluation, is very informative for the combined efforts of an SLP and a craniofacial surgeon in determining future management of an ongoing problem with VPI.

Another way to visualize the VP port is by means of a nasoendoscope inserted through the nose and positioned just above the port. Advantages of this kind of examination are the ability to see the entire, circular perimeter of the port and to see which parts of the circle are not moving in cases of VPI. Nasoendoscopy is usually performed using a topical anesthetic applied within the nasal cavities, which lessens the discomfort of the procedure. A skilled endoscopist can collect valuable information from this procedure without exposing the patient to radiation.

VPI and Hypernasality

As previously noted, the VP port opens and closes during speech. The three sounds of English called nasals (“m,” “n,” “ng”) are produced with an open VP port because they are meant to sound nasal. An open VP port during the production of vowels and other vocalic sounds, however, causes speech to sound hypernasal. This is because sound waves move through the nasal cavities, where they produce characteristic nasal resonances. The vowels of the person with VPI therefore have the characteristic resonances of the vocal tract — those usually associated with vowels and vocalic sounds — mixed with the nasal resonances. It is as if the vowels are “colored” by nasal acoustic energy, a coloring not present when the VP port is closed for vowel production.

The idea of closure of the VP port for obstruents and opening for nasals is a simplification of the behavior of the port during connected speech, such as occurs during conversation. The size of the VP port is constantly changing during connected speech, from completely closed to completely open, as well as many sizes in between these two extremes. The complexity of VP port behavior during speech produced by a healthy adult can be observed at https://www.youtube.com/watch?v=-kHtGlhPs3Y

Children who are born with a cleft palate and who have had surgery to repair the cleft around 12 months of age may still, in later childhood, be perceived as excessively hypernasal. Among recently studied 10-year-olds with repaired clefts and variable severity of hypernasality, children with greater severity had other developmental difficulties (such as delayed language or reading difficulties). Most children with repaired clefts and “normal” nasality did not have such problems (Feragen, Auckner, Særvold, & Hide, 2017). When compared to children who are not hypernasal, children who have a surgically repaired hard palate but remain hypernasal are perceived as less intelligent, less likely to make friends, and more likely to be teased (Watterson, Mancini, Brancamp, & Lewis, 2013). These are negative social consequences of hypernasality in cleft palate speech.

VPI, Consonant Articulation, and Speech Intelligibility

Speech-language pathologists are interested in the kinds of consonant errors made by individuals who seek therapeutic services for speech intelligibility problems. The thought is that if a set of errors is identified by means of careful testing, the nature of specific consonant errors, or a pattern revealed by several different consonant errors, can guide a therapy plan (Kummer, 2011). Clinical observation and formal research studies have established the kinds of obstruent consonant errors made by individuals with clefts and VPI. These errors fall into one of two categories: obligatory errors and compensatory errors (Kummer, 2011).

Obligatory Errors

Obligatory obstruent errors are produced in the correct way except for the required closure of the VP port. For example, a child with a repaired cleft palate and VPI may produce the correct articulatory placement for a “d,” with the tongue tip properly placed at the front of the palate, immediately behind the upper front teeth. However, the VPI results in air leakage through the VP port, which prevents the proper buildup of oral pressure within the vocal tract. The resulting obstruent speech sound is weak, possibly sounding somewhat like a nasal at the same place of articulation (e.g., an “n” instead of the intended “d”). The sound may also be perceived as a more complex error because there is audible nasal emission of air during the attempt to produce the stop. In this case, the sound has stop-like characteristics but is not heard as correct because of the nasal emission. Audible nasal emission is the result of air rushing through the nasal cavities as a result of VPI. Speech sound errors such as these are called obligatory because, even with correct positioning of the articulators other than the soft palate and pharyngeal walls, the sound is obligated to be produced in error because of the VPI.

Compensatory Errors Unlike obligatory errors, compensatory errors are not produced with correct placement of the articulators. Compensatory errors, as the category label suggests, are a response to VPI, specifically as a way speakers “solve” the problem of air leakage through the VP port. The lips are the front end of the vocal tract tube and the larynx is the back end; the VP port is like a valve midway between these two locations. The valve, when open, allows air coming through the glottis to

19  Craniofacial Anomalies

flow through the VP port and into the nasal cavities. Some speakers with VPI compensate for the leak at the VP port by forming an articulatory constriction for an obstruent consonant before (i.e., in back of) the leak. This is shown in Figure 19–8, where a compensatory articulation error (right) is contrasted with an obligatory articulation error (left). In both cases, the “target” sound is /d/, a voiced stop consonant. The left image shows the constriction in the vocal tract made by the tongue tip pressing against the palate just in back of the upper teeth; this is the correct place of articulation for this sound. The VP port is open, however, which prevents the required buildup of pressure inside the vocal tract. The image on the right shows a compensatory solution to this problem. The constriction for the stop is made in the pharynx, before the airflow reaches the open VP port. The place of articulation for all English obstruents is in front of the VP port. Attempts to produce any of these sounds with tongue placement in the “correct” way (obligatory errors) cannot avoid a leak caused by VPI. Production of obstruents behind the VP port avoids a leak, allowing a buildup of air pressure behind the constriction to generate the desired popping noise


of stop consonants and hissing noise of fricatives (see Chapters 10 and 12). The compensatory errors produced by children (or adults) with VPI due to cleft palate are errors of place of articulation; manner of articulation is retained. The compensatory stop consonant shown in Figure 19–8 is produced with the characteristics of a stop (complete blockage of airflow in the vocal tract for a brief interval, released suddenly to make the popping noise), but at the wrong place of articulation. Many children with repaired cleft palates who continue to have VPI make these kinds of errors. The error sounds are called pharyngeal stops, pharyngeal fricatives, and glottal stops. In pharyngeal stops, the airstream is completely blocked by placing the back of the tongue against the posterior pharyngeal wall and then releasing it suddenly. The articulation of pharyngeal fricatives is similar, except that the constriction formed between the back of the tongue and the posterior pharyngeal wall is not quite complete, allowing air to be forced through the narrow passageway to generate a hissing noise. Pharyngeal stops are used to replace the “correct” stops of English (most often “k” and “g”) and pharyngeal fricatives the “correct”


Figure 19–8.  Left, an obligatory error for /d/, tongue makes the required constriction at the front of the hard palate to block airflow at the alveolar place of articulation (immediately behind the teeth) but VP port is open due to VPI, preventing the buildup of pressure within the vocal tract; right, a compensatory error for /d/, tongue blocks air at a place of articulation behind the leak, in the pharynx, with the resulting speech sound retaining the manner of articulation (stop) but produced at the wrong place of articulation.



fricatives (“f,” “v,” “th,” “s,” “z,” “sh,” and “zh”). Note that the manner of articulation is preserved in these error patterns.1 Glottal stops are frequent compensatory errors in speakers with VPI. They are made by bringing the vocal folds together forcefully and, after a short interval, blowing them apart with a high tracheal pressure. The two vocal folds are like two articulators that form a complete constriction, and the pressure immediately below them, in the trachea, is like the positive oral pressure typical of stop consonants. Speakers with VPI typically substitute glottal stops for the “correct” stops of English, especially the voiceless stops /p/, /t/, and /k/ in words such as “puppy” and “light.”

Many speakers with VPI produce a mixture of obligatory and compensatory errors. When children begin producing compensatory errors early in their developmental history, the errors may become habitual and resistant to therapeutic modification. This is an argument for early surgery to correct a cleft palate, to establish a normal or near-normal VP mechanism, and to prevent compensatory errors before they begin and over time become fixed as part of the child’s sound system. Obligatory and compensatory speech sound errors, or a mixture of both kinds, affect a speaker’s intelligibility. When there is substantial hypernasality in addition to these errors, the child’s ability to be intelligible, and thus to communicate, is further impaired (Kummer, 2011).

Clefting and Syndromes

A syndrome is a group of anomalies, including anatomical, physiological, and/or behavioral components, that are observed in an individual and have been observed previously in other individuals. Some of the components of a syndrome may seem unrelated, such as heart defects and specific psychiatric problems, but their presence in a group of individuals suggests that they are part of a pattern, perhaps with a single, underlying cause (Shprintzen & Bardach, 1995). Syndromes may also be defined as a collection of anomalies that occur together in individuals but without a single, underlying known cause. The syndromes discussed here are all conditions present at birth or known to be present at birth. Many syndromes have a genetic cause, such as velocardiofacial syndrome and Treacher-Collins syndrome (see later discussion in this chapter). Environmental factors may also result in syndromes present at birth, such as fetal alcohol syndrome and syndromes associated with vitamin deficiencies. More than 275 syndromes include some form of craniofacial anomaly (Leslie & Marazita, 2013). Many of these syndromes are rare; the following summaries are of more frequently occurring syndromes with CL/P or CPO as a component. The reader is encouraged to search the Internet for photographs of individuals with the syndromes described in the following sections.

Palatoplasty and Speech Sound Errors

Palatoplasty is the surgery performed to close a cleft palate. It is more than simply closing the hole, however. Muscles that lift and shape the soft palate for contact with the posterior pharynx are not attached correctly in a cleft palate; a critical part of the surgery for speech outcomes is the reconfiguration and correct attachment of these muscles to the tissue of the soft palate. Obligatory speech sound errors are likely to be eliminated by successful surgery. This is because the problem in obligatory errors is the inability to close the VP port — the rest of the articulatory characteristics are correct for the target sound. Compensatory errors are not “fixed” by successful palatoplasty, because the errors are due to more than the functioning of the VP port. The child with compensatory errors has, in a sense, learned a new system for producing obstruents — move the place of articulation behind the leak. It is almost as if the child has invented a new sound system to deal with the available speech structures. This is why palatoplasty is usually recommended as early as possible — perhaps 9 to 12 months of age, before the child is developing the speech sound system in the context of first words. Compensatory errors are less likely when VPI is significantly reduced or eliminated by surgery prior to first words.

1. Pharyngeal stops and fricatives are not part of the sound inventory of English but are present in the sound inventory of Arabic and Hebrew. It may be the case that these sounds are not made with a constriction between the tongue and pharynx, but rather between the epiglottis and pharynx (Ladefoged & Maddieson, 1996). It is equally possible that an epiglottal-pharyngeal articulation is the basis for some of the compensatory errors heard in speakers with VPI.


22q11.2 Deletion Syndrome (Velocardiofacial Syndrome)

Children with velocardiofacial syndrome (VCFS), also known as 22q11.2 deletion syndrome, often have clefts of the palate. The anomalies appearing most often in VCFS include an isolated cleft palate (hence, the “velo” part of the syndrome name), heart defects (hence, “cardio”), and a characteristic appearance of the face (“facial”), which has been described as pear shaped with a long nose having a broad root, a small jaw, and small ears. Many children with VCFS have learning disabilities, and approximately 10% may develop severe psychiatric disturbance around the time of puberty. VCFS is a genetic disorder, involving a deletion of genetic material on the 22nd chromosome pair. VCFS occurs in 1/4,000 births, and usually involves speech and language disorders (McDonald-McGinn & Sullivan, 2011). Like many of the genetic disorders discussed in previous chapters, the phenotype of VCFS varies widely, even with the same genotype. People with VCFS have characteristics of the syndrome that range from very mild to very severe.

Treacher-Collins Syndrome (Mandibulofacial Dysostosis or First Pharyngeal Arch Syndrome)

Treacher-Collins syndrome is also called first pharyngeal arch syndrome because its main characteristics reflect problems in the development of structures originating from this arch. As described earlier, the first pharyngeal arch generates all structures of the face, as well as the hard palate, the external ear, and parts of the middle ear. Typical facial differences in children born with Treacher-Collins syndrome include downward-slanting eyes, underdeveloped or missing cheek bones, a small jaw, underdeveloped or malformed ears, and an unusually shaped mouth. An individual with Treacher-Collins syndrome may have an unusually shaped palate or a cleft palate. In addition, the small bones in the middle ear that conduct sound (see Chapter 22) may be affected, producing some hearing loss. Treacher-Collins syndrome is believed to result from a problem with chromosome 5, which is responsible for the embryological development of the first pharyngeal arch. The syndrome is fairly rare, being found in roughly one in every 10,000 births. Treacher-Collins syndrome is distinctive, however, for having its symptom set confined to the structural malformations described previously. In most cases, persons with Treacher-Collins have normal cognitive abilities and no other anomalies in distant structures or functions.


Stickler Syndrome

Stickler syndrome includes a range of anomalies, all of which — including cleft palate — can be traced to problems with genes that control the production of collagen. Collagen is a protein important for the creation of cartilage, connective tissue, and some bony structures that have cartilaginous origins during embryological development. Specifically, an individual with Stickler syndrome is likely to have a round, flattened face, a variety of eye problems including detachment of the retina and degeneration of the fluid inside the eyeball, problems with joints, poor growth of long bones resulting in short stature, hearing loss resulting from structural problems in the end organ of hearing (called the cochlea, analogous to the retina in the eye; see Chapter 22), an underdeveloped mandible, and cleft palate. When a cleft palate occurs in Stickler syndrome, it is probably a result of the underdeveloped jaw that prevented the palatine shelves from moving into the proper position for fusion (see earlier). Like VCFS and Treacher-Collins syndrome, Stickler syndrome has a genetic basis. The genes that fail to produce a normal pattern of collagen production are apparently located on the 1st, 6th, and 12th chromosomes. An estimate of the prevalence of Stickler syndrome is 1 in 7,500 to 1 in 9,000 births (Printzlau & Andersen, 2004).

Craniosynostosis (Apert and Crouzon Syndromes)

Craniosynostosis is a condition in which the bones of the skull fuse together too early. Craniosynostosis includes Apert and Crouzon syndromes (as well as nonsyndromic variants), which have slightly different characteristics (not discussed here). In a study conducted in Atlanta, craniosynostosis occurred in 4.3 in every 10,000 births between the years 1989 and 2003 (Boulet, Rasmussen, & Honein, 2008). A frequent consequence of the early fusion of skull bones is cleft palate. Craniosynostosis has a genetic basis, often associated with anomalies of the 10th chromosome.
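The prevalence figures quoted for clefting and for these syndromes are easier to compare when converted to expected counts. The short sketch below is purely illustrative: the rates come from the figures cited in this chapter, while the annual-births number is an assumed round figure (roughly the recent U.S. total), not a value from the text.

```python
# Back-of-the-envelope sketch: expected annual number of births affected by
# each condition, using the prevalence figures quoted in this chapter.
# ANNUAL_BIRTHS is an assumption for illustration; substitute any population.

ANNUAL_BIRTHS = 3_600_000  # assumed; not a figure from the text

PREVALENCE_PER_BIRTH = {
    "cleft (CL/P or CPO), worldwide rate": 1 / 700,
    "22q11.2 deletion (VCFS)": 1 / 4_000,
    "Treacher-Collins syndrome": 1 / 10_000,
    "Stickler syndrome (using 1 in 7,500)": 1 / 7_500,
    "craniosynostosis (Atlanta study)": 4.3 / 10_000,
}

for condition, rate in PREVALENCE_PER_BIRTH.items():
    expected = ANNUAL_BIRTHS * rate
    print(f"{condition}: about {expected:,.0f} expected births per year")
```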

Cleft Palate: Other Considerations

CL/P and CPO are complex health care problems. The focus in this chapter has been on speech production in children with these craniofacial anomalies, but many children with CL/P or CPO have additional issues that must be dealt with by a health care team. Children with CL/P and CPO may have problems with dentition, socialization, language development, hearing, and nutrition (especially as young children), and typically undergo multiple surgeries. The



health care team is therefore likely to include dentists, maxillofacial and plastic surgeons, psychologists/social workers, speech-language pathologists, audiologists, and nutritionists.

Of particular interest are the possibilities of delayed language development and hearing loss in children with CL/P and CPO. In a review of language development in CL/P and CPO, Morgan and colleagues (2017) note that language disorders are not unusual in this group of children and are likely to be more severe in children with CPO. The greater severity of language disorders in children with CPO may be related to the greater likelihood of CPO being associated with a syndrome, compared with CL/P. Many syndromes such as VCFS in which CPO is a component may be characterized by language learning and cognitive delays (Hardin-Jones & Chapman, 2011). This may explain why children with CPO have more severe language disorders compared with children who have CL/P.

Hearing problems in children with CL/P and CPO are largely due to frequent middle ear infections (technically, otitis media with effusion [OME]; see Chapter 23). Frequent OMEs are a problem for nearly all children with clefts. The hearing loss is conductive, meaning that it is caused by interference with the middle ear mechanism, not the sensory organs of the inner ear. The mild-to-moderate hearing loss associated with OME is present during the infection but not when it resolves. The high rate of occurrence and recurrence of OME in children with clefts (75% or greater), compared to children with normal head and neck structures (around 20%), is not well understood but is thought to be related to the orofacial defects that are part of clefting (Flynn, Möller, Jönsson, & Lohmander, 2009). Not all children with recurrent OMEs have hearing loss when the middle ear is infected. As in the general population, the frequency of OMEs in children with clefts decreases with age.

OMEs are a health problem for children regardless of their other concerns. A specific concern with the high rate of OMEs in children with clefts is that the frequent conductive hearing loss, even though not severe, may interfere with language development and have lasting effects on children’s literacy skills. Happily, there is little evidence that this is the case (Roberts et al., 2004).

Chapter Summary

A craniofacial anomaly is defined as any deviation from normal structure, form, or function in the head and neck area of an individual; this chapter focuses on cleft lip and cleft palate.

Knowledge of embryological development of the head and neck is important to understanding how and where clefts of the upper lip and palate occur. The embryological errors that result in cleft lip and cleft palate are independent; a child can be born with a cleft lip and cleft palate, a cleft lip only, or a cleft palate only.

Clefts of the upper lip almost always occur to the side of the center of the mouth; the clefts may be unilateral (one side only) or bilateral (both sides) and vary in severity from a very small notch in the upper lip to a complete cleft extending from the lip through the floor of the nasal cavity.

Clefts of the palate may arise from two types of embryological error: one in which the palatine shelves snap up into a horizontal position but fail to generate sufficient tissue to meet in the middle and fuse, the other in which the palatine shelves cannot snap up into the horizontal position because they are trapped beneath the tongue, which has not been lowered due to incomplete growth of the mandible. Clefts of the palate range in severity from a small notch at the back of the soft palate to a complete cleft that splits the palate from back to front.

Two general categories of clefts are recognized: cleft lip with or without a cleft palate (CL/P) and cleft palate only (CPO). The incidence of clefting varies depending on country/region and other factors; worldwide the incidence is about 1 in every 700 births.

The velopharyngeal port (VP port), the passageway between the pharyngeal and nasal cavities, is opened by gravity and perhaps some muscular forces, and closed by complex muscular forces of the soft palate and pharynx. When the VP port is open, air can flow from the pharyngeal cavity to the nasal cavities; during speech, an open VP port is consistent with the production of nasal sounds (such as /m/, /n/, /ŋ/ [final sound in “ring”]). When the VP port is closed, air is prevented from flowing into the nasal cavities; if there is another “seal” in the vocal tract, such as the closure of the lips for a /b/, the vocal tract is a closed volume and the flow of air into it from the lungs and through the vibrating vocal folds causes the pressure in the vocal tract to rise above atmospheric pressure.

Children or adults who have difficulty producing closure of the VP port even after surgical repair of palatal clefts have some degree of velopharyngeal insufficiency (VPI), which is likely to result in hypernasality and obstruent errors (stops, fricatives, and affricates, speech sounds that require a buildup of pressure inside the vocal tract).


Obstruent speech sound errors that result from VPI are categorized as obligatory errors and compensatory errors. Obligatory errors are those in which the speaker has the correct articulatory placement for the sound, but the leak through the VP port prevents the buildup of air pressure required for correct production of the sound. Compensatory errors are those in which the speaker changes the place of articulation for the “target” sound to avoid the leak, and thus produces the correct manner of articulation (e.g., a stop manner or a fricative manner); compensatory errors are often made with a pharyngeal or glottal place of articulation.

Many syndromes (a collection of symptoms and/or anomalies that occur together and are seen repeatedly in a number of children) have cleft palate as one of their characteristics; it is more typical for these clefts to be isolated clefts of the palate, rather than cleft lip with or without a cleft palate.

Diagnosis of VPI is done by perceptual and instrumental methods, the latter including x-ray and endoscopic techniques that provide direct visualization of the VP port to determine why it is not closing correctly.

Clefting is a complex health care problem, requiring a team that may include speech-language pathologists, audiologists, surgeons, nutritionists, and psychologists.

References

Boulet, S. L., Rasmussen, S. A., & Honein, M. A. (2008). A population-based study of craniosynostosis in metropolitan Atlanta, 1989–2003. American Journal of Medical Genetics, 146, 984–991.
Feragen, K. B., Auckner, R., Særvold, T. K., & Hide, Ø. (2017). Speech, language, and reading skills in 10-year-old children with palatal clefts: The impact of additional conditions. Journal of Communication Disorders, 66, 1–12.
Flynn, T., Möller, C., Jönsson, R., & Lohmander, A. (2009). The high prevalence of otitis media with effusion in children with cleft lip and palate as compared to children without clefts. International Journal of Pediatric Otorhinolaryngology, 73, 1441–1446.
Hardin-Jones, M., & Chapman, K. L. (2011). Cognitive and language issues associated with cleft lip and palate. Seminars in Speech and Language, 32, 127–140.
Harville, E. W., Wilcox, A. J., Lie, R. T., Vindenes, H., & Abyholm, F. (2005). Cleft lip and palate versus cleft lip only: Are they distinct defects? American Journal of Epidemiology, 162, 448–453.
Kummer, A. K. (2011). Speech therapy for errors secondary to cleft palate and velopharyngeal dysfunction. Seminars in Speech and Language, 32, 191–198.
Kummer, A. K. (2018). A pediatrician’s guide to communication disorders secondary to cleft lip/palate. Pediatric Clinics of North America, 65, 31–46.
Kummer, A. K. (2020). Cleft palate and craniofacial conditions (4th ed.). Burlington, MA: Jones and Bartlett Learning.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Oxford, UK: Blackwell.
Leslie, E. J., & Marazita, M. L. (2013). Genetics of cleft lip and cleft palate. American Journal of Medical Genetics Part C, 163, 246–258.
McDonald-McGinn, D. M., & Sullivan, K. E. (2011). Chromosome 22q11.2 deletion syndrome (DiGeorge syndrome/velocardiofacial syndrome). Medicine, 90, 1–18.
Moore, K. L., Persaud, T. V. N., & Torchia, M. G. (2019). The developing human (11th ed.). Philadelphia, PA: Saunders.
Morgan, A. R., Belluci, C. C., Coppersmith, J., Linde, S. B., Curtis, A., Albert, M., . . . Kapp-Simon, K. (2017). Language development in children with cleft palate with or without cleft lip adopted from non–English-speaking countries. American Journal of Speech-Language Pathology, 26, 342–354.
Mossey, P. A., Little, J., Monger, R. G., Dixon, M. J., & Shaw, W. C. (2009). Cleft lip and palate. The Lancet, 374, 21–27.
Parker, S. E., Mai, C. T., Canfield, M. A., Rickard, R., Wang, Y., Meyer, R. E., . . . Correa, A. (2010). Updated national birth prevalence estimates for selected birth defects in the United States, 2004–2006. Birth Defects Research Part A: Clinical and Molecular Teratology, 88, 1008–1016.
Peterson-Falzone, S. J., Trost-Cardamone, J. E., Karnell, M. P., & Hardin-Jones, M. A. (2017). The clinician’s guide to treating cleft palate speech (2nd ed.). St. Louis, MO: Elsevier.
Printzlau, A., & Andersen, M. (2004). Pierre Robin sequence in Denmark: A retrospective population-based epidemiological study. Cleft Palate and Craniofacial Journal, 41, 47–52.
Roberts, J., Hunter, L., Gravel, J., Rosenfeld, R., Berman, S., Haggard, M., . . . Wallace, I. (2004). Otitis media, hearing loss, and language learning: Controversies and current research. Journal of Developmental and Behavioral Pediatrics, 25, 110–122.
Schoenwolf, G. C., Bleyl, S. B., Brauer, P. R., & Francis-West, P. H. (2014). Larsen’s human embryology (5th ed.). New York, NY: Elsevier.
Shkoukani, M. A., Lawrence, L. A., Liebertz, D. J., & Svider, P. F. (2014). Cleft palate: A clinical review. Birth Defects Research (Part C), 102, 333–342.
Shprintzen, R. J., & Bardach, J. (1995). Cleft palate speech management. St. Louis, MO: Mosby.
Watterson, T., Mancini, M., Brancamp, T. U., & Lewis, K. E. (2013). Relationship between the perception of hypernasality and social judgments in school-aged children. Cleft Palate-Craniofacial Journal, 50, 498–502.

20  Swallowing

Introduction

The ease of eating and drinking is deceptive. These are complicated activities that require coordinated actions of the lips, mandible, tongue, velum, pharynx, larynx, esophagus, and other structures. Because eating and drinking engage many of the same structures and much of the same airway as those used for speaking and breathing, it is not uncommon for there to be competition between these activities or for tradeoffs to occur when trying to do them simultaneously. For example, chewing must stop to be able to speak clearly, and breathing must stop to be able to swallow safely. The entire act of placing liquid or solid substance in the oral cavity, moving it backward to the pharynx, propelling it into the esophagus, and allowing it to make its way to the stomach is called deglutition. Although the word swallowing is sometimes used as a synonym for deglutition, swallowing actually includes only certain phases of deglutition. Nevertheless, to simplify the explanations that follow, the term “swallowing” is used in place of deglutition and is meant to include all phases of deglutition. Swallowing disorders are common in hospital settings, and in fact account for a significant number of hospital deaths. The deaths are usually a result of pneumonia due to food and/or drink going into the lungs instead of the stomach. Food and drink in the lungs cause bacterial infections — serious cases of pneumonia.

Anatomy of Swallowing

Figure 20–1 shows the structures that participate in swallowing. These structures extend from the lips to the stomach. Structures such as lips, jaw, tongue, soft palate, pharynx, and larynx are used in speech production as well as swallowing (see Chapter 10). The esophagus and stomach, critical structures for the process of swallowing, are not used in the speech production process.

Esophagus

The esophagus is a flexible tube, about 20 to 25 cm long in adults, which extends from the lower part of the pharynx to the stomach. The esophagus begins below the base of the larynx and runs behind the trachea and lungs. It runs through the diaphragm (see Figure 20–1) and enters the abdominal cavity, where it connects to the stomach. It is composed of a combination of skeletal (voluntary) muscle (the top part of the tube) and smooth muscle (the bottom part of the tube). Smooth muscle is not under “willful” control (unlike, for example, the muscles of the arm), but contracts in reaction to various stimuli. The upper end of the esophagus is normally closed. During swallows, it is opened by muscular action (see later in this chapter).





Figure 20–1.  Structures of the swallowing mechanism.

Stomach

The stomach is a large, saclike structure made up of smooth muscle, mucosa, and other tissue. It is on the left side of the abdominal cavity, against the undersurface of the diaphragm. The stomach is connected to the lower esophagus on one end, and to the small intestine at the other end. After a typical meal, the stomach holds about a liter of solid and/or liquid substance. Gastric juices in the stomach break up ingested substances so that they can be absorbed into the body through the stomach lining.

The Act of Swallowing

Although many of the structures that participate in swallowing are the same as those that are used for speaking, the forces and movements for the two activities are very different. In general, the forces are greater and many of the movements are slower during swallowing than during speech production. There are four phases of swallowing. These phases, illustrated in Figure 20–2, are the oral preparatory phase, oral transport phase, pharyngeal phase, and esophageal phase. The phases are used to describe


the movement of a bolus through the oral, pharyngeal, and esophageal regions of the anatomical structures of the swallowing process. Bolus is the word used to refer to the volume of liquid or the mass of solid substance


being swallowed. The physiological events associated with each of these phases are described in the following text and summarized in Table 20–1. In Figure 20–2, the green areas show the location and shape of the


Figure 20–2.  Images of the oral preparatory, oral transport, pharyngeal, and esophageal phases of swallowing. Table 20–1 provides a summary of these phases.

Table 20–1.  Summary of the Actions Associated With the Four Phases of Swallowing

Oral preparatory:  This phase begins as the liquid or solid substance comes in contact with the oral opening and ends with the bolus held in the oral cavity with the back of the tongue elevated to contact the velum and create an impenetrable wall. This phase can be as short as 1 second when ingesting liquid and as long as 20 seconds when chewing (preparing) a solid food.

Oral transport:  During this phase the bolus is transported back through the oral cavity to the pharynx. To do so, the tongue elevates in progressively more posterior regions to push the bolus back toward the pharynx, the velum begins to elevate, and the upper esophageal sphincter begins to relax. This phase lasts less than 1 second.

Pharyngeal:  During this phase, the bolus usually divides to run through the right and left sides of the bottom of the tongue and is transported through the pharynx to the upper esophageal sphincter. This phase is “triggered” automatically once the bolus passes the back of the oral cavity and is associated with numerous and rapid events: the velopharynx closes, the tongue pushes the bolus backward, the pharynx constricts segmentally, the hyoid bone and larynx move upward and forward, the larynx is closed, and the upper esophageal sphincter opens. This phase lasts less than 1 second.

Esophageal:  This phase begins when the bolus enters the upper esophageal sphincter. At the same time, the lower esophageal sphincter relaxes. The bolus is moved through the esophagus by peristaltic contractions. This phase ends when the bolus enters the stomach and can last from 8 to 20 seconds.



bolus for each phase; the time sequence of the phases is from left to right.
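Because Table 20–1 attaches typical durations to each phase, those ranges can be expressed directly as a small data structure and used to flag measured timings that fall outside them. The sketch below is only an illustration of that idea, assuming Python and hypothetical function and variable names; the duration ranges are the ones given in Table 20–1, and the flagging logic is not a clinical criterion.

```python
# Illustrative sketch: compare measured swallow-phase durations (in seconds)
# against the typical ranges summarized in Table 20-1. The ranges come from
# the table; everything else (names, structure) is hypothetical.

TYPICAL_RANGES_S = {
    "oral_preparatory": (1.0, 20.0),  # ~1 s (liquids) to ~20 s (chewing solids)
    "oral_transport":   (0.0, 1.0),   # less than 1 s
    "pharyngeal":       (0.0, 1.0),   # less than 1 s
    "esophageal":       (8.0, 20.0),  # about 8 to 20 s
}

def flag_atypical_phases(measured: dict) -> list:
    """Return (phase, duration, typical_range) tuples for phases whose
    measured duration falls outside the range listed in Table 20-1."""
    flags = []
    for phase, duration in measured.items():
        low, high = TYPICAL_RANGES_S[phase]
        if not (low <= duration <= high):
            flags.append((phase, duration, (low, high)))
    return flags

# Example: a hypothetical liquid swallow with a slow pharyngeal phase.
measured = {
    "oral_preparatory": 1.2,
    "oral_transport": 0.6,
    "pharyngeal": 1.8,   # longer than the typical < 1 s
    "esophageal": 10.0,
}
for phase, duration, (low, high) in flag_atypical_phases(measured):
    print(f"{phase}: {duration:.1f} s (typical {low:.0f}-{high:.0f} s)")
```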

Oral Preparatory Phase

The oral preparatory phase is depicted in the first panel of Figure 20–2. This phase begins as a solid or liquid substance makes contact with the structures at the front of the oral cavity. The jaw is lowered and the lips part in anticipation of the swallow (Shune, Moon, & Goodman, 2016). What happens next depends on the nature of the substance to be swallowed.

Liquid Swallows

If the substance is liquid, the jaw elevates and the lips close. This creates a tight closure at the front of the mouth to contain the bolus. The bolus is contained in the front region of the oral cavity by actions of the tongue and other structures, and held there momentarily (for about 1 second). The front of the tongue depresses and the sides of the tongue elevate to form a cup for the bolus (Dodds et al., 1989). The back of the tongue elevates to make contact with the soft palate to form a back wall that separates the oral from the pharyngeal cavities and helps ensure that no substance leaks through into the throat, and possibly into the pulmonary airways. During the oral preparatory phase, the velopharyngeal port is open so that breathing can continue with air flowing to and from the lungs through the nasal passageways. Many people stop breathing momentarily at this point in the swallow (this is called the apneic interval) or even before the glass or straw reaches the lips (Martin, Logemann, Shaker, & Dodds, 1994; Martin-Harris, Brodsky, Price, Michel, & Walters, 2003; Martin-Harris, Michel, & Castell, 2005). The stoppage of breathing reduces the risk of aspiration, which is the invasion of food or drink below the vocal folds and into the lungs.

Solid Swallows

The events of the oral preparatory phase are different when the substance to be swallowed is solid rather than liquid, primarily because solid substances need to be chewed into smaller pieces and mixed with saliva before being transported toward the esophagus. Saliva is an important ingredient in this process because it moistens the solid substance to facilitate its transport. Saliva also introduces enzymes that begin to break down the substance for digestion. Actions of the mandible (and teeth), lips, tongue, and cheeks grind and manipulate the solid substance

into a well-formed bolus and position it on the front surface of the tongue. The lips may close (though this is not necessary) while the mandible moves to grind the bolus. During chewing, the mandible moves up and down, forward and backward, and side to side. The soft palate makes contact with the back part of the tongue to seal off the oral from the pharyngeal cavity and prevent the bolus from moving into the pharynx and larynx. The velopharyngeal port is open during preparation of the bolus, and breathing may either continue or stop momentarily (McFarland & Lund, 1995; Palmer & Hiiemae, 2003). The oral preparatory phase may last from as little as 3 seconds, when chewing a soft cookie, to as long as 20 seconds, when chewing a tough piece of steak. At the end of the oral preparatory phase, the substance in the oral cavity is ready to be consumed. Usually the bolus is immediately transported back toward the pharynx (oral transport phase, see the next section).

Oral Transport Phase

The oral transport phase is shown in the second panel of Figure 20–2. From the ready position (the oral preparatory phase), the bolus is transported back through the oral cavity. This is done by using the tongue tip to squeeze the bolus against the hard palate; then progressively more posterior regions of the tongue elevate and squeeze the bolus against the palate, moving the bolus back toward the pharynx. At the same time as the bolus is being moved in the direction of the palate, the velopharyngeal port begins to close as the top of the esophagus begins to open. The oral transport phase is short, lasting less than a second (Cook et al., 1994; Tracy et al., 1989).

Pharyngeal Phase

The pharyngeal phase of the swallow is “triggered” when the bolus approaches the boundary between the back of the oral cavity and the pharynx. During this phase, depicted in the third panel of Figure 20–2, several events occur rapidly and nearly simultaneously to move the bolus quickly through the pharynx while protecting the airway. “Protecting the airway” means not allowing any part of the bolus to travel through the vocal folds and into the trachea, and then deeper into the lungs. This pharyngeal phase is under “automatic” neural control, so that once triggered, it proceeds as a relatively fixed set of events that cannot be altered voluntarily (except in certain respects that are not covered


here). These events occur within about half a second (Cook et al., 1994; Tracy et al., 1989) and include velopharyngeal closure, elevation of the hyoid bone and larynx, laryngeal closure, pharyngeal constriction, and opening of the upper entrance to the esophagus. During the pharyngeal phase, the velopharynx closes like a sphincter valve by elevation of the velum and constriction of the pharyngeal walls. This closure is forceful (more forceful than for speech production) so as to prohibit substances from passing through the nasopharynx into the nose. The hyoid bone and larynx (Figure 20–3) move upward and forward as a result of contraction of muscles that attach to the hyoid bone. These muscles have their origins on the jaw, and in the floor of the mouth. As the hyoid bone is pulled upward and forward, the larynx is pulled along with it by its muscular and nonmuscular connections to the hyoid

bone. Elevation of the larynx also causes the pharynx to shorten. Closure of the larynx for swallowing forms a seal at the entrance to the trachea to protect the pulmonary airways. Closure occurs at multiple levels, which include the vocal folds, the false vocal folds (located immediately above the true vocal folds), and the epiglottis. Upward and forward movement of the larynx contributes to airway protection by tucking the larynx against the root of the tongue and moving the trachea away from the pathway of the bolus to the esophagus. As the tongue propels the bolus into the pharynx, the muscles of the pharynx contract from top to bottom. The tongue root moves backward and the pharyngeal walls constrict to squeeze the bolus toward the esophagus. The top-to-bottom contraction of pharyngeal muscles is like a directional “squeeze” toward the


Figure 20–3.  Skeletal framework of the larynx and associated structures; note especially the hyoid bone and cartilages of the larynx.



esophagus. This is conceptually similar to the front-to-back squeezing by the tongue against the palate in the oral transport phase, described previously. As all of these events are taking place, the upper border of the esophagus is opening to allow the bolus “in.” Two sets of actions appear to contribute to its opening: (a) stretching of the upper esophageal sphincter by forward and upward movement of the hyoid bone and larynx, respectively, and (b) relaxing of a muscle that forms a ring around the top of the esophagus (Omari et al., 2016).1 As suggested by the concept of a muscle relaxing to accomplish a task (in this case, swallowing), this muscle is usually in a state of contraction, holding the top of the esophagus closed.

Esophageal Phase

The esophageal phase, the initial part of which is illustrated in the right-most panel of Figure 20–2, begins when the bolus enters the upper esophagus and ends when it passes into the stomach through the lower opening of the esophagus (seen in Figure 20–1). This phase may last anywhere from 8 to 20 seconds (Dodds, Hogan, Reid, Stewart, & Arndorfer, 1973). The bolus is propelled through the esophagus by peristaltic actions (alternating waves of contraction and relaxation) of the esophageal walls. Peristaltic contraction raises pressure behind the bolus and relaxation lowers pressure in front of the bolus, creating the pressure differential needed to propel it toward the stomach. The esophageal phase of swallowing, like the pharyngeal phase, is “automatic” in the sense of not being under voluntary control.

Overlap of Phases

Although the phases of swallowing are described as though they are discrete and occur one after the other, in fact, they overlap substantially. When a person is eating a solid substance, for example, preparation of part of the bolus in the oral cavity may continue while another part of the bolus moves into the pharyngeal area, as illustrated in Figure 20–4. The two boluses, originally a single bolus, are shown in Figure 20–4 as the brown-speckled material. The partial bolus in the lower pharynx may remain above the esophagus as long as 10 seconds before it merges with the rest of the bolus and the pharyngeal transport phase of the swallow is triggered (Hiiemae & Palmer, 1999).

Figure 20–4.  Eating, in which one part of the bolus has been moved into the lower pharynx, while another part is still in the oral preparatory phase of swallowing.

Breathing and Swallowing

Protection of the pulmonary airways during swallowing depends, in large part, on the coordination of breathing and swallowing. Without such coordination, inspiration might occur at the same time a substance is being transported through the pharynx, and that substance might be “sucked” through the larynx into the pulmonary airways (aspiration). This is avoided, as reviewed earlier, by closing the larynx, an action that stops breathing for a brief period during the swallow. The risk of aspiration appears to be further reduced by timing the swallow to occur during the expiratory phase of the breathing cycle. During single swallows (swallowing a single bolus), the most common pattern is expiration-swallow-expiration; that is,

1. This ring of muscle is the same tissue that can be used as an alternate sound source when a patient has had his larynx removed because of cancer, as described in Chapter 18.


expiration begins, the swallow occurs (accompanied by breath-holding), and then expiration continues (Martin et al., 1994; Martin-Harris, 2006; Nishino, Yonezawa, & Honda, 1985; Perlman, Ettema, & Barkmeier, 2000; Selley, Flack, Ellis, & Brooks, 1989; Smith, Wolkove, Colacone, & Kreisman, 1989).
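The expiration-swallow-expiration pattern can be made concrete with a small signal-processing sketch: given a sampled airflow signal and the approximate time of a swallow, find the surrounding apneic interval and check whether expiratory flow flanks it. Everything in the sketch below is an assumption for illustration (the sampling rate, the near-zero-flow threshold, and the sign convention that positive values mean expiration); it is not a validated analysis method.

```python
# Minimal sketch (not a clinical tool): locate the apneic interval around a
# swallow in a sampled airflow signal and check whether the swallow is
# flanked by expiration, i.e., the expiration-swallow-expiration pattern
# described above. Sampling rate, threshold, and sign convention are assumed.

import numpy as np

FS = 100                 # assumed sampling rate, Hz
FLOW_THRESHOLD = 0.05    # assumed "near-zero flow" threshold, arbitrary units

def apneic_interval(flow: np.ndarray, swallow_idx: int):
    """Expand outward from the swallow sample while flow stays near zero;
    return (start_idx, end_idx) of the apneic interval."""
    start = end = swallow_idx
    while start > 0 and abs(flow[start - 1]) < FLOW_THRESHOLD:
        start -= 1
    while end < len(flow) - 1 and abs(flow[end + 1]) < FLOW_THRESHOLD:
        end += 1
    return start, end

def flanked_by_expiration(flow: np.ndarray, start: int, end: int) -> bool:
    """True if flow just before and just after the apneic interval is
    expiratory (positive, under the assumed sign convention)."""
    before = flow[max(start - 1, 0)]
    after = flow[min(end + 1, len(flow) - 1)]
    return before > 0 and after > 0

# Synthetic example: expiration, a ~0.6-s breath-hold, then expiration resumes.
t = np.arange(0, 3, 1 / FS)
flow = 0.4 * np.ones_like(t)          # expiratory flow
flow[(t > 1.2) & (t < 1.8)] = 0.0     # apneic interval around the swallow
start, end = apneic_interval(flow, int(1.5 * FS))
print("apneic interval (s):", start / FS, "to", end / FS)
print("expiration-swallow-expiration:", flanked_by_expiration(flow, start, end))
```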

Nervous System Control of Swallowing

A person with swallowing problems is said to have dysphagia. Dysphagia has many causes, including neurological disease. People with Parkinson’s disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS), as well as people recovering from stroke, may have dysphagia as a prominent component of their challenges. Thus, it is important to have some idea of how the nervous system controls the swallowing process. The neural control of swallowing is complex and not completely understood. Nevertheless, studies of humans and animals have offered important insights into how swallowing is controlled by the nervous system. Some of the important features of that control are discussed in the next sections.

Role of the Peripheral Nervous System

Nearly all the structures involved in swallowing are the same as those involved in speech production (the most notable exceptions being the esophagus and stomach). Those structures that participate in both swallowing and speech production are innervated by the spinal nerves and cranial nerves as described in Chapter 2. The cranial nerves are involved in swallowing through their innervation of the lips, mandible, tongue, velum, pharynx, and larynx; spinal nerves are primarily involved in breathing and its cessation as they relate to swallowing. Disease or trauma that affects the cranial nerves serving these structures, such as damage to a laryngeal nerve that results in unilateral vocal fold paralysis (Chapter 18) or damage to the nerves that control the contraction of pharyngeal muscles, is likely to affect swallowing ability. The voluntary and smooth muscles of the esophagus are also innervated by nerve branches that arise from a single cranial nerve. This nerve is called cranial nerve X, or the vagus nerve. It is noteworthy that the vagus nerve and its branches also innervate muscles of the larynx. Because the larynx plays a critical role


in protection of the airway in swallowing, and the esophagus has a muscle that relaxes to open its top to allow entry of food and drink, and the muscles of the esophagus push the bolus down into the stomach, damage to this one nerve has the potential to create major swallowing problems. The coordinated phases of swallowing, described previously, can be significantly disrupted by damage to this single cranial nerve (Corbin-Lewis & Liss, 2015).

Role of the Central Nervous System

Although swallowing and speech production are executed using many of the same peripheral nerves, central nervous system control of these two activities is quite different. This means that a given structure, such as the tongue, is under one form of neural control during swallowing and under another form of neural control during speech production. Because of this, it is possible to have central nervous system damage that impairs the function of a neural structure for speech production but not swallowing, and vice versa. There are two major regions within the central nervous system that are responsible for the control of swallowing. One is in the brainstem, and the other is in cortical and subcortical areas. The brainstem center is located primarily in the medulla, the part of the brainstem that is contiguous with the uppermost part of the spinal cord. The brainstem center has primary control over the more automatic phases of swallowing (pharyngeal and esophageal phases). Many cortical and subcortical regions contribute to the generation and shaping of swallowing behaviors. Activity in cortical and subcortical areas of the cerebral hemispheres has a strong influence over the control of the more voluntary phases of swallowing (oral preparatory phase, including chewing, and oral transport phase). Cortical damage due to stroke (and possibly other diseases that affect structures in the cerebral hemisphere), however, may also affect the automatic phases of swallowing (pharyngeal phase and esophageal phase). Sensory input that travels from structures of the head and neck to the central nervous system, via cranial nerves, is also an important component of swallowing. Many forms of sensory input play a role in swallowing, including (among others) bolus size, texture, temperature, taste, and unpleasant stimuli (e.g., something that tastes bad, has a texture that is sufficiently different from the range of familiar bolus textures, and so forth).



Sphenopalatineganglioneuralgia

This sounds like something you wouldn’t want to meet in the dark. But it comes from something really good. As a child (or even as an adult) you may have said the phrase, “I scream, you scream, we all scream for ice cream.” Scream has a meaning of anticipation in this context, but it can also have a meaning of hurting. You know the feeling. You take a bite of ice cream and momentarily hold it against the roof of your mouth before you swallow it. Then suddenly you get an intense, stabbing pain in your forehead. What’s up? The pain is caused as your hard palate warms up after you made it cold. Cold causes vasoconstriction (reduction in blood vessel diameter) in the region, which is followed by rapid vasodilation (increase in blood vessel diameter). It is the rapid vasodilation that hurts and gets your attention. The technical term for this pain is “sphenopalatineganglioneuralgia.” The common term (and the one more easily pronounced) is “brain freeze.” Fortunately, the pain lasts only a few seconds.

Variables That Influence Swallowing

A number of variables influence swallowing, including characteristics of the bolus, the swallowing mode, and developmental and aging effects. The sections below discuss each of these.

Bolus Characteristics

Although the act of swallowing occurs generally as described near the beginning of this chapter, the precise nature of the swallow is determined, in part, by what exactly is being swallowed. Bolus consistency and texture, volume, and taste are variables that have been found to influence the act of swallowing.

Consistency and Texture

One of the most important contrasts that determines swallowing behavior is the difference between liquids and solids. Whereas a liquid bolus is usually held briefly in the front of the oral cavity before being propelled to the pharynx, a solid bolus may be moved to the pharynx

and left there for several seconds while the remainder of the bolus continues to be chewed (Hiiemae & Palmer, 1999; Palmer, Rudin, Lara, & Crompton, 1992; see Figure 20–5, later in the chapter). Although something similar can also happen with liquids (Linden, Tippett, Johnston, Siebens, & French, 1989), it is much less common, except in cases where a combined liquid-and-solid bolus is chewed and swallowed (Saitoh et al., 2007), such as what might occur during mealtime eating (Dua, Ren, Bardan, Xie, & Shaker, 1997). Even when not combined, the consistency of liquids and the textures of solid food influence swallowing behavior. Liquid substances can be characterized according to consistency, ranging from as thin as water to as thick as pudding. Differences in consistency have been shown to influence swallowing. Thick liquids or puree consistencies tend to take longer to swallow than thin liquids (Chi-Fishman & Sonies, 2002; Im, Kim, Oommen, Kim, & Ko, 2012). Tongue forces in the oral preparatory and oral transport phases are higher when swallowing thick substances compared to thin liquids (Chi-Fishman & Sonies, 2002; Steele & van Lieshout, 2004). As might be predicted, it is more difficult to maintain a cohesive (single) bolus when swallowing thinner liquids as compared to thicker liquids. As a result, laryngeal penetration (where part of the bolus moves into the opening to the larynx but remains above the vocal folds; Robbins, Hamilton, Lof, & Kempster, 1992) is more common when swallowing thin liquids than when swallowing thicker substances (Daggett, Logemann, Rademaker, & Pauloski, 2006; Steele et al., 2015). The textures of solid substances can also influence the swallow (Steele et al., 2015). For example, the harder and drier the substance, the greater the number of chewing cycles (Engelen, Fontijn-Tekamp, & van der Bilt, 2005), the longer the duration of the initial transport of the bolus from the anterior oral cavity to the region where the oral cavity meets the pharynx (Mikushi, Seki, Brodsky, Matsuo, & Palmer, 2014), and the greater the number of times the tongue squeezes the bolus back toward the pharynx (Hiraoka et al., 2017).

Volume

It seems intuitive that the volume (size) of the bolus might affect the swallow, and most studies indicate that, in fact, it does (Chi-Fishman & Sonies, 2002; Cook et al., 1989; Kahrilas & Logemann, 1993; Logemann et al., 2000; Logemann, Pauloski, Rademaker, & Kahrilas, 2002; Perlman, Palmer, McCulloch, & VanDaele, 1999; Perlman, Schultz, & VanDaele, 1993; Tasko, Kent, & Westbury, 2002). When a person is swallowing a larger bolus compared to a smaller bolus, tongue


movements are generally larger and faster, hyoid bone movements begin earlier and are more extensive, pharyngeal wall movements and laryngeal movements are larger, and the upper esophageal sphincter relaxes and opens earlier and stays open longer (Cock, Jones, Hammer, Omari, & McCulloch, 2017; Kahrilas & Logemann, 1993). Despite the success of the adjustments made to accommodate a larger bolus, there tends to be a greater frequency of laryngeal penetration as bolus size increases, at least for liquid boluses. For example, part of the bolus penetrates the opening into the larynx more than twice as often when swallowing a 10-mL bolus than when swallowing a 1-mL bolus (Daggett et al., 2006). Nevertheless, when laryngeal penetration occurs in healthy individuals, the substance is almost always pushed away from the larynx and transported to the esophagus without being aspirated (going below the vocal folds).

Sword Throats

Sword swallowing is an ancient art that continues to be practiced. There is even a Sword Swallowers Association International with both professional and amateur members from all over the world. The practice and ill effects of sword swallowing were discussed in an article in the prestigious British Medical Journal (Witcombe & Meyer, 2006). Major complications from sword swallowing are more likely when the swallower is distracted or when swallowing unusual swords. Complications can include — little wonder — perforation of the pharynx or esophagus, gastrointestinal bleeding, pneumothorax (collapsed lung), and chest pain. Novice sword swallowers must learn to desensitize the gag reflex, align the upper esophageal sphincter with the neck hyperextended, open the upper esophageal sphincter, and control retching as the blade is moved down the pipe. All in all, it does not sound like fun; it also makes for a very long bolus.

Development

Many important anatomical changes influence swallowing during the period from infancy through childhood. Among these changes are the following (Arvedson & Brodsky, 2002): (a) the infant’s tongue goes from nearly filling the oral cavity to filling only


the floor of the oral cavity; (b) the infant’s oral cavity goes from being edentulous (lacking teeth) to having a full set of “baby” teeth; (c) the cheeks of the infant have fatty pads (sometimes called sucking pads) that eventually disappear, to be replaced with muscle; (d) the infant goes from having essentially no oropharynx to having a distinct one as the larynx descends; and (e) the infant’s larynx goes from being one-third adult size, with relatively large arytenoid cartilages and a high position within the neck, to the adult configuration and position.

Swallowing (of amniotic fluid) begins well before birth, as early as 12.5 weeks’ gestation (Humphrey, 1970). Interestingly, although many of the components of swallowing are in place before birth, velopharyngeal closure during swallowing is not (Miller, Sonies, & Macedonia, 2003). Immediately after birth, the velopharynx closes for swallowing and the infant exhibits a suckling pattern characterized by forward and backward (horizontal) movements of the tongue (Bosma, 1986; Bosma, Truby, & Lind, 1965). These tongue movements are accompanied by large vertical movements of the mandible and serve to draw liquid into the oral cavity. Around the age of 6 months, this suckling pattern converts to a sucking pattern that is characterized by raising and lowering (vertical) movements of the tongue, firm approximation of the lips, and less pronounced vertical movements of the mandible. Sucking is stronger than suckling and allows the infant to pull thicker substances into the oral cavity and to begin the ingestion of soft food (Arvedson & Brodsky, 2002).

During the first few months of life, the infant relies on breastfeeding (or nipple feeding from a bottle) for all nutritional intake. This form of feeding consists of suck-swallow or suck-swallow-breathe sequences, typically repeated 8 to 12 times and followed by a rest period of several seconds. During this period, several oral reflexes that aid in early feeding are active. These disappear around 6 months of age, with the exception of the gag reflex, which remains active throughout childhood and adulthood. Knowledge of the neural interactions among feeding, swallowing, and airway protection is essential for providing quality care for infants with impairments in any of these functions (Jadcherla, 2017).

By about 6 months of age, infants are ready to begin eating solid foods and being fed by spoon. Foods such as crackers and soft fruits and vegetables are introduced during the next few months. The basic patterns for chewing are in place by 9 months and continue to develop over the next few years of life (Green et al., 1997; Steeve, Moore, Green, Reilly, & Ruark McMurtrey, 2008). By 2 to 3 years of age, the child is able to eat regular table food.



Age

As with most physiological functions, swallowing changes with age across adulthood. The most prominent age-related change is that swallowing becomes slower, particularly after age 60 years (Leonard & McKenzie, 2006; Logemann et al., 2002; Robbins et al., 1992; Sonies, Parent, Morrish, & Baum, 1988). An outcome of the age-related slowing of the swallow (combined with age-related reductions in sensory function; for example, see Malandraki, Perlman, Karampinos, & Sutton, 2011) is that the frequency of laryngeal penetration increases with age (Daniels et al., 2004; Robbins et al., 1992). Laryngeal penetration occurs in people over 50 years about twice as often as it occurs in adults under 50 years, and more frequently when swallowing liquids than when swallowing solids (Daggett et al., 2006). Although this appears to be a dangerous situation and a possible precursor to aspiration, in healthy individuals the substance is moved out of the space immediately above the vocal folds to be rejoined with the rest of the bolus (Daggett et al., 2006). Nevertheless, the risk of aspiration may be higher in older adults, compared to younger adults, because of their greater tendency to inspire immediately after swallowing (Martin-Harris, Brodsky, et al., 2005).

Measurement and Analysis of Swallowing

Measurement and analysis of swallowing are not only critical to research endeavors but have also become essential to clinical practice and to the diagnosis and management of dysphagia (meaning swallowing disorders and pronounced dis-FAY-juh). Measurement of swallowing using instruments is especially important considering that as many as half of the clients who aspirate do so “silently,” without coughing or other signs of visible or audible struggle (Logemann, 1998). In such cases, aspiration can only be detected through instrumental examination (as previously defined, aspiration is the invasion of food or drink below the vocal folds and into the lungs). There are many approaches to measuring and analyzing swallowing. Two are highlighted here — videofluoroscopy and endoscopy.

Videofluoroscopy

Videofluoroscopy is an x-ray technique used to image the movements of speech production. Videofluoroscopy can also be used to image movements associated

with the opening and closing of the velopharyngeal port (Chapter 18). Videofluoroscopy is also used routinely in clinical settings to evaluate swallowing in clients with suspected dysphagia. When used for this purpose, the substance to be swallowed is mixed with barium sulfate. Barium is a contrast material that allows the bolus to be tracked visually as it travels through the oral, pharyngeal, and esophageal regions. The videofluoroscopic swallow examination, sometimes called a modified barium swallow (MBS) study, was first described by Logemann, Boshes, Blonsky, and Fisher (1977). The adjective “modified” is used to differentiate this study from a barium swallow study, which is conducted by a gastroenterologist to evaluate esophageal structure and function. A videofluoroscopic examination is usually conducted with the client seated in a specially designed chair. The examination is performed in a radiology laboratory, with a radiologist (or radiology technician) running the x-ray equipment and a speech-language pathologist directing the swallowing protocol. The examination protocol typically consists of the swallowing of a series of liquid and solid substances (mixed with barium or accompanied by ingestion of a barium capsule to provide contrast) that vary in volume and consistency or texture. Figure 20–5 is an example of a videofluoroscopic image of the oral preparatory phase of swallowing. The single frame was extracted from a moving image (i.e., a movie) recorded during a swallow. Both timing and spatial measurements can be made from the videofluoroscopic images. Temporal (timing) measurements can be made to determine the time from the beginning of bolus movement from the oral cavity to its arrival at the upper esophageal sphincter. Spatial measurements may be used to determine the extent of velar elevation or extent of the upward and forward movement of the hyoid bone. There are also measures of the amount of penetration (entry of food or liquid to the laryngeal area) and/or aspiration (Rosenbek, Robbins, Roecker, Coyle, & Wood, 1996). It is generally agreed that videofluoroscopy provides the most comprehensive evaluation of swallowing, and for many it is considered the “gold standard” of measurement. It has several advantages over other measurement approaches, including the following: (a) it provides relatively clear views of nearly all the important structures involved in swallowing and their movements, with the exception of the vocal folds; (b) it is possible to visualize barium-laced substances through the oral preparatory, oral transport, pharyngeal, and esophageal phases; (c) it is possible to view swallowing events from at least two different perspectives (from the side, and from the front); and (d) it is


Figure 20–5. Videofluoroscopic image showing the oral preparatory phase of a swallow. The large, dark area in the oral region is the bolus. The thin, dark line that runs along the tongue to the epiglottis indicates that there may be some trace bolus residue from a previous swallow or that there has been some premature spillage during the oral preparatory phase. Modified and reproduced with permission from “Dynamic swallow studies: measurement techniques,” by R. Leonard and S. McKenzie in Dysphagia assessment and treatment planning: A team approach (2nd ed., p. 273). Edited by R. Leonard and K. Kendall, 2008, San Diego, CA: Plural Publishing, Inc. Copyright 2008 by Plural Publishing, Inc.

The major disadvantages of videofluoroscopy are that it requires exposure to radiation, it must be coordinated with radiology, and it cannot be conducted at bedside.
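The arithmetic behind such temporal measures is simple once swallow events have been marked on the recording. The following Python sketch is purely illustrative and is not a clinical tool; the frame rate, event names, and frame numbers are assumptions for the example rather than values from any standard protocol.

    # Illustrative sketch (not a clinical tool): deriving a temporal swallow measure
    # from annotated video frames. The 30 frames-per-second rate and the event names
    # are assumptions for this example, not values from the text.

    FRAME_RATE = 30.0  # frames per second assumed for the recording

    # Hypothetical frame indices marked by an examiner while reviewing the study.
    events = {
        "bolus_leaves_oral_cavity": 142,   # first frame of rearward bolus movement
        "bolus_reaches_ues": 171,          # bolus head arrives at the upper esophageal sphincter
    }

    def transit_time_sec(events, start_event, end_event, frame_rate=FRAME_RATE):
        """Elapsed time between two annotated events, in seconds."""
        return (events[end_event] - events[start_event]) / frame_rate

    oral_to_ues = transit_time_sec(events, "bolus_leaves_oral_cavity", "bolus_reaches_ues")
    print(f"Oral-to-UES transit time: {oral_to_ues:.2f} s")  # (171 - 142) / 30 is about 0.97 s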

Endoscopy

Another way to visualize swallowing is with endoscopy (Langmore, Schatz, & Olson, 1988), also mentioned in Chapter 19. This approach requires the use of a flexible fiberoptic endoscope, like the one used for visualizing the larynx and velopharynx. To view the swallowing apparatus, the endoscope is inserted through one of the nares (following the administration of topical anesthesia and decongestant), routed through the velopharyngeal port, and guided until its tip is positioned in the laryngopharynx (the bottom part of the pharynx, just above the opening into the larynx). This approach is called flexible endoscopic evaluation of swallowing (FEES).


No x-rays are used in endoscopy, and therefore no barium-infused boluses are required for evaluation of swallowing disorders. A FEES station is shown in Figure 20–6. The examination usually includes a preliminary viewing of the structures involved in swallowing, such as (among others) the velopharyngeal region, pharyngeal walls, back part of the tongue, entrance to the larynx, and the vocal folds. Abnormalities in structure or color are noted and are used to help interpret abnormal swallow behaviors. The evaluation protocol is similar to that used for videofluoroscopic examination, using liquid and solid substances of different consistencies, textures, and volumes.

Endoscopy offers certain advantages over other approaches to evaluating swallowing. For example, the equipment is portable, so the examination can be done at bedside in a hospital; there is no exposure to x-rays and no need to use barium products; and it is possible to see structural and color abnormalities. In addition, the procedure can often be performed by a speech-language pathologist without the direct oversight of a physician or the aid of other health care professionals. The speech-language pathologist can also observe the client eat an entire meal at the client’s usual pace. A disadvantage of endoscopy is that some clients cannot tolerate the procedure, including those with structural abnormalities such as a deviated nasal septum, certain movement disorders, bleeding disorders, and certain cardiac conditions. Furthermore, it is sometimes difficult to detect penetration and aspiration with endoscopy, although techniques are available to improve their detection.

Client Self-Report

An important form of measurement, especially in clinical settings, is the client self-report. The client self-report can reveal symptoms (e.g., pain during swallowing, a lump in the throat, difficulty swallowing certain foods) that indicate the need to perform instrumental evaluation of swallowing using measurement procedures such as those just described. One way to glean information about the client’s perspective on the swallowing problem is an unstructured interview. An alternative or complementary approach is to use a more formal, symptom-specific assessment tool. There are several to choose from, one of which is the Eating Assessment Tool-10 (EAT-10; Belafsky et al., 2008). The EAT-10 contains several statements, such as “My swallowing problem has caused me to lose weight” and “I become short of breath when I eat,” that the client rates on a scale ranging from “No problem” to “Severe problem.”



Figure 20–6.  Fiberoptic endoscope used for evaluation of swallowing. Reproduced with permission provided courtesy of KayPENTAX, Montvale, NJ.

Research has shown a good correspondence between the symptoms reported on the EAT-10 and the dysphagia-related signs identified with instrumental measures in patients with head and neck cancer (Arrese, Carrau, & Plowman, 2017) and in patients with amyotrophic lateral sclerosis (Plowman et al., 2015).
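As a rough illustration of how a symptom-specific tool of this kind is scored, the sketch below assumes that each of the 10 EAT-10 statements is rated from 0 (“No problem”) to 4 (“Severe problem”) and that the item ratings are summed; the actual items, scoring rules, and interpretive cutoff should be taken from Belafsky et al. (2008).

    # Illustrative sketch of summing a symptom questionnaire such as the EAT-10.
    # Assumption: 10 items, each rated 0 ("No problem") to 4 ("Severe problem");
    # consult Belafsky et al. (2008) for the instrument's actual items, scoring,
    # and interpretive cutoff.

    item_ratings = [0, 1, 0, 2, 0, 0, 3, 1, 0, 2]  # hypothetical client responses

    assert len(item_ratings) == 10 and all(0 <= r <= 4 for r in item_ratings)

    total = sum(item_ratings)
    print(f"Total symptom score: {total} / 40")
    # A higher total indicates more self-reported swallowing difficulty and may
    # prompt instrumental evaluation (videofluoroscopy or FEES).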

Health Care Team for Individuals With Swallowing Disorders

Evaluation and management of dysphagia are a large part of the clinical practice of speech-language pathology. Although there are cases of functional dysphagia, in which there is no known physical reason for the swallowing disorder (Baumann & Katz, 2016), dysphagia usually has an identifiable structural, neurogenic, and/or systemic cause. Structural causes include (among others) tumors, diverticula (abnormal pouches in the wall of a structure), deformation caused by surgical removal of tissue or by trauma (e.g., surgical removal of tissue in the treatment of head and neck cancers), congenital malformations, and tracheostomy (a surgically created opening at the front of the neck). Dysphagia can also have neurogenic causes such as stroke, degenerative diseases (such as Parkinson’s disease), and traumatic brain injury.

Evaluation and management of swallowing disorders often require a team of health care professionals including a speech-language pathologist, radiologist, gastroenterologist, otolaryngologist, dietitian, occupational therapist, and others, depending on the nature of the swallowing disorder. The speech-language pathologist is responsible for the evaluation and behavioral management of oropharyngeal dysphagia (swallowing disorders involving the oral preparatory, oral transport, and pharyngeal phases). Usually the speech-language pathologist is asked by a physician to evaluate a client with a potential swallowing disorder.


GERD and LPR

Your stomach is rich with chemicals that have about the same acidity as the battery acid in your car. That’s right, the same battery acid that will burn a hole in your clothes if you splash some of it on you. GERD, an acronym for gastroesophageal reflux disease, is a chronic condition in which acid from the stomach backs up into the esophagus when the lower esophageal sphincter (the valve that separates the esophagus and stomach) fails to do its job properly. Although a certain amount of reflux (backflow) from the stomach into the esophagus is considered normal, too much can cause heartburn and the need to see a gastroenterologist (GI doctor). When stomach acid travels all the way through the esophagus and spills onto the larynx, it is called laryngopharyngeal reflux, or LPR. LPR can irritate and erode laryngeal tissue. It can cause a hoarse voice, chronic cough, frequent throat clearing, and other problems that may lead to the need to seek help from an otolaryngologist (ENT doctor). Some helpful hints for avoiding GERD and LPR: do not stuff yourself before you go to bed, lay off foods that make it worse, and sleep with your body inclined so that your head is higher than your feet.

The speech-language pathologist may begin by performing a bedside swallowing evaluation, which includes a case history interview, a physical examination of the swallowing structures, and visual, auditory, and tactile observation of the client during swallowing of water and possibly other substances. If a problem is suspected, the speech-language pathologist will perform a videofluoroscopic swallowing examination in collaboration with a radiologist. During the swallow study, the speech-language pathologist may screen for esophageal problems, and if any are noted, a gastroenterologist is notified. Alternatively, a fiberoptic endoscopic evaluation of swallowing may be conducted, a procedure that can usually be performed by the speech-language pathologist independently. Behavioral management might include the teaching of postural strategies to improve swallowing, diet (consistency) recommendations, therapeutic exercises (to improve strength and coordination of swallow-related structures), and counseling regarding the swallowing disorder.

The radiologist has a limited, but critical, role in the evaluation of swallowing.


Specifically, the radiologist is responsible for the instrumental aspects of videofluoroscopic swallow (modified barium swallow) studies and barium swallow studies and, in some instances, may help in their interpretation.

Chapter Summary

Eating and drinking involve intricately coordinated actions of the lips, mandible, tongue, velum, pharynx, larynx, esophagus, and other structures. The term swallowing is used as a synonym for deglutition in this chapter, although swallowing technically involves only part of the deglutition process. Many of the important anatomical and physiological components of the speech production apparatus, discussed in Chapter 10 of this text, are also important anatomical and physiological components of the swallowing apparatus. The stomach is a liter-sized sac whose upper end connects to the esophagus and whose lower end connects to the small intestine through the pyloric sphincter.

The forces and movements associated with the act of swallowing can be categorized in four phases: an oral preparatory phase, oral transport phase, pharyngeal phase, and esophageal phase. The oral preparatory phase involves taking liquid or solid substances in through the oral vestibule and manipulating them within the oral cavity to prepare the bolus (liquid volume or lump of solid) for passage. The oral transport phase involves moving the bolus (or a part of it) through the oral cavity toward the pharynx by rearward propulsion, due largely to a squeezing action of the tongue against the palate that moves from the front to the back of the oral cavity. The pharyngeal phase is usually “triggered” when the bolus passes the junction of the oral cavity and pharynx, and consists of a combination of compressive actions that force the bolus downward toward the esophagus while at the same time protecting the lower airways. The esophageal phase begins when the bolus enters the esophagus and continues as the bolus is moved toward the stomach by a series of peristaltic waves of muscular contraction and relaxation that progress down the muscular tube. Although it is convenient to describe swallowing as four discrete phases, the reality is that there is substantial overlap among the phases.

Structures within the brainstem exert neural control over the automatic aspects of swallowing (i.e., the pharyngeal and esophageal phases), and structures within the cerebral hemispheres exert neural control over the voluntary aspects of swallowing (i.e., the oral preparatory and oral transport phases).



Characteristics of the bolus, including its consistency and texture, volume, and taste, can influence the swallowing pattern. The development of swallowing from infancy is rapid and complex; it moves through different sucking and chewing patterns toward adult-like eating and drinking behaviors and carries with it important processes related to social and emotional development. The effect of aging on the swallow is an overall slowing and a subtle deterioration in the spatial and temporal coordination among certain structures of the swallowing apparatus.

Several ways to measure and analyze swallowing are available, including videofluoroscopy (which uses x-rays to image all phases of swallowing) and endoscopy (which uses a flexible endoscope to image the pharyngeal and laryngeal regions); client self-report (which provides insight into the client’s experiences with swallowing) is also a useful measure that helps determine whether further instrumental testing is warranted. Some of the more important health care professionals who work with clients with dysphagia include speech-language pathologists, radiologists, gastroenterologists, otolaryngologists, dietitians, and occupational therapists.

References Arrese, L., Carrau, R., & Plowman, E. (2017). Relationship between the Eating Assessment Tool-10 and objective clinical ratings of swallowing function in individuals with head and neck cancer. Dysphagia, 32, 83–89. Arvedson, J., & Brodsky, L. (2002). Pediatric swallowing and feeding: Assessment and management (2nd ed.). Clifton Park, NY: Thomson Learning (Singular Publishing Group). Baumann, A., & Katz, P. (2016). Functional disorders of swallowing. Handbook of Clinical Neurology, 139, 483–488. Belafsky, P., Mouadeb, D., Rees, C., Pryor, J., Postma, G., Allen, J., & Leonard, R., (2008). Validity and reliability of the Eating Assessment Tool (EAT-10). Annals of Otology, Rhinology and Laryngology, 117, 919–924. Bosma, J. (1986). Development of feeding. Clinical Nutrition, 5, 210–218. Bosma, J., Truby, H., & Lind, J. (1965). Cry motions of the newborn infant. Acta Paediatrica Scandinavica, 163, 63–91. Chi-Fishman, G., & Sonies, B. (2002). Effects of systematic bolus viscosity and volume changes on hyoid movement kinematics. Dysphagia, 17, 278–287. Cock, C., Jones, C., Hammer, M., Omari, T., & McCulloch, T. (2017). Modulation of upper esophageal sphincter (UES)

relaxation and opening during volume swallowing. Dysphagia, 32, 216–224. Cook, I., Dodds, W., Dantas, R., Kern, M., Massey, B., Shaker, R., & Hogan, W. (1989). Timing of videofluoroscopic, manometric events, and bolus transit during the oral and pharyngeal phases of swallowing. Dysphagia, 4, 8–15. Cook, I., Weltman, M., Wallace, K., Shaw, D., McKay, E., Smart, R., & Butler, S. (1994). Influence of aging on oralpharyngeal bolus transit and clearance during swallowing: Scintigraphic study. American Journal of Physiology, 266, G972–G977. Corbin-Lewis, K., & Liss, J. (2015). Clinical anatomy and physiology of the swallow mechanism. Independence, KY: Cengage Learning. Daggett, A., Logemann, J., Rademaker, A., & Pauloski, B. (2006). Laryngeal penetration during deglutition in normal subjects of various ages. Dysphagia, 21, 270–274. Daniels, S., Corey, D., Hadskey, L., Legendre, C., Priestly, D., Rosenbek, J., & Foundas, A. (2004). Mechanism of sequential swallowing during straw drinking in healthy young and older adults. Journal of Speech, Language, and Hearing Research, 47, 33–45. Dodds, W., Hogan, W., Reid, D., Stewart, E., & Arndorfer, R. (1973). A comparison between primary esophageal peristalsis following wet and dry swallows. Journal of Applied Physiology, 35, 851–857. Dodds, W., Taylor, A., Stewart, E., Kern, M., Logemann, J., & Cook, I. (1989). Tipper and dipper types of oral swallows. American Journal of Roentgenology, 153, 1197–1199. Dua, K., Ren, J., Bardan, E., Xie, P., & Shaker, R. (1997). Coordination of deglutitive glottal function and pharyngeal bolus transit during normal eating. Gastroenterology, 112, 73–83. Engelen, L., Fontijn-Tekamp, A., & van der Bilt, A. (2005). The influence of product and oral characteristics on swallowing. Archives of Oral Biology, 50, 739–746. Green, J., Moore, C., Ruark, J., Rodda, P., Morvee, W., & VanWitzenburg, M. (1997). Development of chewing in children from 12 to 48 months: Longitudinal study of EMG patterns. Journal of Neurophysiology, 77, 2704–2716. Hiiemae, K., & Palmer, J. (1999). Food transport and bolus formation during complete feeding sequences on foods of different initial consistency. Dysphagia, 14, 31–42. Hiraoka, T., Palmer, J., Brodsky, M., Yoda, M., Inokuchi, H., & Tsubahara, A. (2017). Food transit duration is associated with the number of stage II transport cycles when eating solid food. Archives of Oral Biology, 81, 186–191. Humphrey, T. (1970). Reflex activity in the oral and facial area of the human fetus. In J. Bosma (Ed.), Second symposium on oral sensation and perception (pp. 195–233). Springfield, IL: Charles C. Thomas. Im, I., Kim, Y., Oommen, E., Kim, H., & Ko, M. (2012). The effects of bolus consistency in pharyngeal transit duration during normal swallowing. Annals of Rehabilitation Medicine, 36, 220–225. Inamoto, Y., Saitoh, E., Okada, S., Kagaya, H., Shibata, S., Ota, K., . . . Palmer, J. (2013). The effect of bolus viscosity on laryngeal closure in swallowing: Kinematic analysis using 320-row area detector CT. Dysphagia, 28, 33–42.


Jadcherla, S. (2017). Advances with neonatal aerodigestive science in the pursuit of safe swallowing in infants: Invited review. Dysphagia, 32, 15–26. Kahrilas, P., & Logemann, J. (1993). Volume accommodation during swallowing. Dysphagia, 8, 259–265. Langmore, S., Schatz, K., & Olson, N. (1988). Fiberoptic endoscopic evaluation of swallowing safety: A new procedure. Dysphagia, 2, 216–219. Leonard, R., & McKenzie, S. (2006). Hyoid-bolus transit latencies in normal swallow. Dysphagia, 21, 183-190. Linden, P., Tippett, D., Johnston, J., Siebens, A., & French, J. (1989). Bolus position at swallow onset in normal adults: Preliminary observations. Dysphagia, 4, 146–150. Logemann, J. (1998). Evaluation and treatment of swallowing disorders (2nd ed.). Austin, TX: Pro-Ed. Logemann, J., Boshes, B., Blonsky, E., & Fisher, H. (1977). Speech and swallowing evaluation in the differential diagnosis of neurologic disease. Neurologia, Neurocirugia, and Psiquiatria, 18(2–3 Suppl.), 71–78. Logemann, J., Pauloski, B., Rademaker, A., Colangelo, L., Kahrilas, P., & Smith, C. (2000). Temporal and biomechanical characteristics of oropharyngeal swallow in younger and older men. Journal of Speech, Language, and Hearing Research, 43, 1264–1274. Logemann, J., Pauloski, B., Rademaker, A., & Kahrilas, P. (2002). Oropharyngeal swallow in younger and older women: Videofluoroscopic analysis. Journal of Speech, Language, and Hearing Research, 45, 434–445. Malandraki, G., Perlman, A., Karampinos, D., & Sutton, B. (2011). Reduced somatosensory activations in swallowing with age. Human Brain Mapping, 32, 730–743. Martin, B., Logemann, J., Shaker, R., & Dodds, W. (1994). Coordination between respiration and swallowing: Respiratory phase relationships and temporal integration. Journal of Applied Physiology, 76, 714–723. Martin-Harris, B. (May 16, 2006). Coordination of respiration and swallowing. GI Motility Online. doi:10.1038/gimo10 Martin-Harris, B., Brodsky, M., Michel, Y., Ford, C., Walters, B., & Heffner, J. (2005). Breathing and swallowing dynamics across the adult lifespan. Archives of OtolaryngologyHead and Neck Surgery, 131, 762–770. Martin-Harris, B., Brodsky, M., Price, C., Michel, Y., & Walters, B. (2003). Temporal coordination of pharyngeal and laryngeal dynamics with breathing during swallowing: Single liquid swallows. Journal of Applied Physiology, 94, 1735–1743. Martin-Harris, B., Michel, Y., & Castell, D. (2005). Physiologic model of oropharyngeal swallowing revisited. Otolaryngology-Head and Neck Surgery, 133, 234–240. McFarland, D., & Lund, J. (1995). Modification of mastication and respiration during swallowing in the adult human. Journal of Neurophysiology, 74, 1509–1517. Mendell, D., & Logemann, J. (2007). Temporal sequence of swallow events during the oropharyngeal swallow. Journal of Speech, Language, and Hearing Research, 50, 1256– 1271. Mikushi, S., Seki, S., Brodsky, M., Matsuo, K., & Palmer, J. (2014). Stage I intraoral food transport: Effects of food con-


sistency and initial bolus size. Archives of Oral Biology, 59, 379–385. Miller, J., Sonies, B., & Macedonia, C. (2003). Emergence of oropharyngeal, laryngeal and swallowing activity in the developing fetal upper aerodigestive tract: An ultrasound evaluation. Early Human Development, 71, 61–87. Nishikubo, K., Mise, K., Ameya, M., Hirose, K., Kobayashi, T., & Hyodo, M. (2015). Quantitative evaluation of agerelated alteration of swallowing function: Videofluoroscopic and manometric studies. Auris Nasus Larynx, 42, 134–138. Nishino, T., Yonezawa, T., & Honda, Y. (1985). Effects of swallowing on the pattern of continuous respiration in human adults. American Review of Respiratory Disease, 132, 1219–1222. Omari, T., Jones, C., Hammer, M., Cock, C., Dinning, P., Wiklendt, L., . . . McCulloch (2016). Predicting the activation states of the muscles governing upper esophageal sphincter relaxation and opening. American Journal of Physiology — Gastrointestinal and Liver Physiology, 310, G359–G366. Palmer, J., & Hiiemae, K. (2003). Eating and breathing: Interactions between respiration and feeding on solid food. Dysphagia, 18, 169–178. Palmer, J., Rudin, N., Lara, G., & Crompton, A. (1992). Coordination of mastication and swallowing. Dysphagia, 7, 187–200. Perlman, A., Ettema, S., & Barkmeier, J. (2000). Respiratory and acoustic signals associated with bolus passage during swallowing. Dysphagia, 15, 89–94. Perlman, A., Palmer, P., McCulloch, T., & VanDaele, D. (1999). Electromyographic activity from human laryngeal, pharyngeal, and submental muscles during swallowing. Journal of Applied Physiology, 86, 1663–1669. Perlman, A., Schultz, J., & VanDaele, D. (1993). Effects of age, gender, bolus volume, and bolus viscosity on oropharyngeal pressure during swallowing. Journal of Applied Physiology, 75, 33–37. Plowman, E., Tabor, L., Robison, R., Gaziano, J., Dion, C., Watts, S., . . . Gooch, C. (2015). Discriminant ability of the Eating Assessment Tool-10 to detect aspiration in individuals with amyotrophic lateral sclerosis. Neurogastroenterology and Motility, 28, 85–90. Robbins, J., Hamilton, J., Lof, G., & Kempster, G. (1992). Oropharyngeal swallowing in normal adults of different ages. Gastroenterology, 103, 823–829. Rosenbek, J., Robbins, J., Roecker, E., Coyle, J., & Wood, J. (1996). A penetration-aspiration scale. Dysphagia, 11, 93–98. Saitoh, E., Shibata, S., Matsuo, K., Baba, M., Fujii, W., & Palmer, J. (2007). Chewing and food consistency: Effects on bolus transport and swallow initiation. Dysphagia, 22, 100–107. Selley, W., Flack, F., Ellis, R., & Brooks, W. (1989). Respiratory patterns associated with swallowing: Part I. The normal adult pattern and changes with age. Age and Ageing, 18, 168–172. Shune, S., Moon, J., & Goodman, S. (2016). The effects of age and preoral sensorimotor cues on anticipatory mouth



movement during swallowing. Journal of Speech, Language, and Hearing Research, 59, 195–205. Smith, J., Wolkove, N., Colacone, A., & Kreisman, H. (1989). Coordination of eating, drinking, and breathing in adults. Chest, 96, 578–582. Sonies, B., Parent, L., Morrish, K., & Baum, B. (1988). Durational aspects of the oral-pharyngeal phase of swallow in normal adults. Dysphagia, 3, 1–10. Steele, C., Alsanei, W., Ayanikalath, S., Barbon, C., Chen, J., Cichero, J., . . . Wang, H. (2015). The influence of food texture and liquid consistency modification on swallowing physiology and function: A systematic review. Dysphagia, 30, 2–26. Steele, C., & van Lieshout, P. (2004). Influence of bolus consistency on lingual behaviors in sequential swallowing. Dysphagia, 19, 192–206.

Steeve, R., Moore, C., Green, J., Reilly, K., & Ruark McMurtrey, J. (2008). Babbling, chewing, and sucking: Oromandibular coordination at nine months. Journal of Speech, Language, and Hearing Research, 51, 1390–1404. Tasko, S., Kent, R., & Westbury, J. (2002). Variability in tongue movement kinematics during normal liquid swallow. Dysphagia, 17, 126–138. Tracy, J., Logemann, J., Kahrilas, P., Jacob, P., Kobara, M., & Krugler, C. (1989). Preliminary observations on the effects of age on oropharyngeal deglutition. Dysphagia, 4, 90–94. Uysal, H., Kizilay, F., Ünal, A., Güngör, H., & Ertekin, C. (2013). The interaction between breathing and swallowing in healthy individuals. Journal of Electromyography and Kinesiology, 23, 659–663. Witcombe, B., & Meyer, D. (2006). Sword swallowing and its side effects. British Medical Journal, 333, 1285–1287.

21 Hearing Science I: Acoustics and Psychoacoustics

Introduction

Acoustics is the science of sound. The study of sound is relevant to Communication Sciences and Disorders because speech is produced as an acoustic signal, and hearing uses acoustic signals as “data.” The term “acoustic signal” in this text refers to a disturbance in air pressure, created by a vibrating source. We are specifically interested in acoustic signals that fall within the frequency range of human hearing. Hixon, Weismer, and Hoit (2020) and Kramer and Brown (2019) present more detailed information on basic acoustics, and the Internet is an endless source of outstanding websites on acoustics, many of which include animations of sound wave events.

This chapter covers the transmission of sound in air. Sound is a disturbance in air pressure, but it can also be a disturbance in the molecules of a liquid or a solid. This disturbance is the bunching up and spreading apart of molecules in response to an external source. The changing density of air molecules results in pressure variations. Sounds, in fact, are pressure waves.

When the author was a child, his family made regular summer trips to a lake in New Jersey, where he and his brothers would amuse themselves by each taking two rocks, submerging underwater at opposite ends of the lake, and sending Morse code signals by banging the rocks together. The impact of the rocks created a pressure disturbance in the water, a sound wave, that was transmitted rapidly and heard clearly underwater, at a substantial distance across the lake. A more dramatic example is putting an ear to a railroad track and hearing an approaching train even though it is more than a mile away. The train wheels vibrate the steel track, setting up a pressure disturbance in the molecules of the steel. This disturbance, or pressure wave, is transmitted down the rails to the person whose ear is on the track.

The speed of sound transmission differs across media and is determined by the properties of the medium, especially its stiffness (elasticity) and its density. Among the three sound-conducting media just mentioned, steel conducts sound waves at the greatest speed and over the greatest distances, water is intermediate, and air is the slowest.
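A rough calculation makes the point concrete. The speeds below are commonly cited approximate values (about 343 m/s for room-temperature air, about 1,480 m/s for water, and roughly 5,900 m/s for steel); they are illustrative assumptions rather than values given in this chapter.

    # Rough arithmetic for the train-track example. The speeds are commonly cited
    # approximate values (room-temperature air ~343 m/s, fresh water ~1,480 m/s,
    # steel ~5,900 m/s); they are illustrative assumptions, not values from the text.

    DISTANCE_M = 1609.0  # about one mile

    speeds_m_per_s = {"air": 343.0, "water": 1480.0, "steel": 5900.0}

    for medium, speed in speeds_m_per_s.items():
        delay = DISTANCE_M / speed
        print(f"Sound travels one mile through {medium:>5} in about {delay:.2f} s")
    # Steel delivers the pressure wave in roughly a quarter of a second, long before
    # the airborne sound of the distant train would reach the listener.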




Inherent Forces, Constant Motion

Air molecules can, in theory, sustain indefinitely an oscillation like the one shown in Figure 21–1. Air molecules, like springs, are elastic. They have a rest position (Figure 21–1, time 1), and when they are stretched away from that position (Figure 21–1, time 2), they generate a recoil force to get back to the rest position. Air molecules also have mass, which means they demonstrate the force of inertia. When the stretched molecule recoils back toward the rest position, it is moving quickly and cannot “stop on a dime” at the rest position (“A body in motion tends to stay in motion,” courtesy of Sir Isaac Newton); the molecule moves through the rest position (Figure 21–1, time 3, same as the initial rest position) and is again stretched away from it (Figure 21–1, time 4), building up recoil force to spring back to the rest position (Figure 21–1, time 5). The forces of elasticity and inertia are intrinsic to the air molecule, meaning that the molecule itself, or more precisely its motions, produce the forces and keep the molecule moving without the assistance of any “external” forces other than the initial stretch. Start the molecule moving, and in theory, it will oscillate forever under its own power.

In the real world, motions of air molecules are opposed by external forces, the most common one being friction. Friction is a force that opposes the movement of molecules and generates heat as a product of this opposition. When molecules in motion rub against each other or against surfaces of containers, walls, or human tissue, they generate heat. The intrinsic forces of elasticity and inertia are degraded, or run down, by the generation of heat; when the forces of friction overcome the forces of elasticity and inertia, the motion stops.

Oscillation

The motion depicted in Figure 21–1 is called an oscillation. The back-and-forth motion repeats itself over time. The repetition of this motion over time allows it to be described according to its period (and its inverse, frequency) and amplitude. In fact, such oscillations are referred to as periodic vibrations because of their repetitive nature. Imagine that we timed the motion of the molecule in Figure 21–1 from the original position shown at time 1 through one complete back-and-forth oscillation to the position shown at time 5. The time taken for one complete cycle of this back-and-forth motion is called the period of vibration and is symbolized by the letter T.

Time 1 Time 2 Time 3 Time 4 Time 5 Figure 21–1.  Motions of a single air molecule at five consecutive instants in time after the molecule is “bumped” by some unknown force. The arrow at time 1 indicates the force moving the molecule to the right, and all subsequent arrows indicate the direction of the resulting motion of the molecule. The motions at different times are shown on separate rows for clarity, but the path of the motion really occurs in a single dimension, back and forth along the same path.


T is expressed in seconds (sec) or milliseconds (msec; 1 msec = 0.001 sec). An oscillation with T = 0.001 sec means that the motion from time 1 to time 5 is completed in 1/1,000th of a second.

Periodic motion can also be described in terms of frequency. Frequency (symbolized as F) is the number of complete cycles of oscillation that occur in 1 sec. Frequency and period are the inverse of each other; if you know one, you know the other. For example, if the complete cycle of molecule movement shown in Figure 21–1 has a duration of 0.001 sec (T = 1 msec), then F = 1/0.001 = 1,000 cycles per second (cps), or 1000 hertz (Hz). The human ear responds to a large range of frequencies, from a low frequency of about 20 Hz (T = 1/F = 1/20 = 0.050 sec, or 50 msec) to a high frequency of about 20,000 Hz (T = 1/20,000 = 0.00005 sec, or 0.05 msec). Even in the case of the lowest frequency, the time taken for one complete cycle, 0.05 sec, is merely a fraction of a full second. Acoustic oscillations occur very rapidly.
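The reciprocal relationship between period and frequency can be checked with a few lines of Python, using the values given above.

    # Worked check of the reciprocal relationship F = 1/T, using the values in the text.

    def frequency_hz(period_sec):
        """Frequency in hertz from period in seconds."""
        return 1.0 / period_sec

    def period_sec(frequency_hz):
        """Period in seconds from frequency in hertz."""
        return 1.0 / frequency_hz

    print(frequency_hz(0.001))    # 1000.0 Hz for a 1-msec period
    print(period_sec(20))         # 0.05 s (50 msec) at the low end of human hearing
    print(period_sec(20_000))     # 5e-05 s (0.05 msec) at the high end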

Waveform

A waveform is the amplitude of vibration as a function of time, as shown on the left side of Figure 21–2; amplitude is on the y-axis, and time is on the x-axis.

The waveform in Figure 21–2 is called a sine wave. A sine wave is a sound with a single, repeating period and, therefore, only a single frequency. In Figure 21–2, left, the period is marked as T, and the value of the period is 0.008 sec, or 8 msec. T is the inverse of frequency, so the frequency of this sine wave can be calculated: F = 1/0.008 = 125 Hz.

Spectrum

The spectrum is the amplitude of vibration plotted as a function of frequency; amplitude is on the y-axis, and frequency is on the x-axis (Figure 21–2, right). This spectrum has a single frequency component at 125 Hz, as expected from a sine wave with a period of 0.008 sec.

Waveform and Spectrum

The waveform and spectrum are two different ways to represent the same event. A waveform is the time representation (“time domain” in Figure 21–2) of an acoustic event; a spectrum is the frequency representation (“frequency domain” in Figure 21–2) of the event. The difference between the two is conveyed by the x-axis of the two representations: time for the waveform, and frequency for the spectrum. Both representations have amplitude on the y-axis.


Figure 21–2.  Left, molecule motion from Figure 21–1 replotted as a sine wave. The graph is a waveform, with amplitude (displacement) on the y-axis and time on the x-axis; right, spectrum of the waveform in the left side of the figure. In the spectrum, amplitude is on the y-axis, and frequency is on the x-axis.
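For readers who like to experiment, the short Python sketch below builds the 125-Hz sine wave of Figure 21–2 and computes its spectrum; NumPy and the chosen sampling rate are assumptions of the example, not part of the figure.

    # Sketch of the waveform/spectrum relationship for the 125-Hz sine wave of
    # Figure 21-2. NumPy is assumed to be available; the sampling rate is an
    # arbitrary choice for the example.

    import numpy as np

    fs = 8000                                # samples per second (assumed)
    t = np.arange(0, 1.0, 1 / fs)            # 1 second of time points
    waveform = np.sin(2 * np.pi * 125 * t)   # time-domain representation (T = 0.008 s)

    spectrum = np.abs(np.fft.rfft(waveform))          # frequency-domain representation
    freqs = np.fft.rfftfreq(len(waveform), d=1 / fs)  # frequency axis in Hz

    print(freqs[np.argmax(spectrum)])   # 125.0 -> all of the energy sits at 125 Hz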




Complex Periodic Acoustic Events

A sound whose spectrum contains a single frequency component is called a pure tone. Chapter 23 describes how pure tones are used as one way to evaluate hearing. Most sounds in nature, however, including speech, consist of many different frequencies having many different amplitudes. Sound waves that are made up of many different frequencies are called complex acoustic events. They are like a collection of single frequencies that are all added together. Even when an acoustic event consists of many different frequencies, its waveform can still repeat itself over time. That is to say, complex acoustic events, like sine waves, can be periodic. Figure 21–3, left, shows a waveform (top) and spectrum (bottom) for the vowel “ah.” The frequency scale for this spectrum is marked off in kilohertz (kHz), meaning “1” is 1000 Hz, “2” is 2000 Hz, and so forth.

Note first the shape of the waveform: it is much more complicated compared with the shape of the sine wave in Figure 21–2. That is because this is a complex periodic waveform, reflecting a sound with many different frequencies and amplitudes. Second, the waveform has a clearly repeating pattern; it is periodic. The period (T) for one of the cycles is marked on the waveform. Third, the spectrum contains many sharply defined frequency components (the series of closely spaced amplitude peaks along the frequency scale, or x-axis). These multiple frequency components vary greatly in their amplitude (the height of the peaks on the y-axis).

Complex Aperiodic Acoustic Events

Complex acoustic signals can also be aperiodic. Like complex periodic acoustic events, complex aperiodic signals contain more than one frequency. Unlike complex periodic events, however, the waveforms of complex aperiodic events do not repeat over time. That is why they are called aperiodic acoustic events.


Figure 21–3. Left, a waveform (top) of an “ah” vowel and its spectrum (bottom). Note the repeating pattern in the waveform, which allows measurement of a period, as marked; note also the difference between the appearance of this waveform and the appearance of the sine wave in Figure 21–2; the more complicated-looking waveform in this figure reflects multiple frequency components, as shown in the spectrum. This waveform and spectrum reflect a complex, periodic acoustic event. Right, a waveform of a noise like the sound of the ocean; there is no repeating pattern in this waveform, thus no period can be measured. The spectrum shows energy as a function of frequency, but not at specific “peaks” of frequency as in the spectrum on the left side of this figure. The right-hand waveform and spectrum reflect a complex, aperiodic acoustic event.


The right side of Figure 21–3 shows a waveform and spectrum for a sound like the ocean. The waveform does not contain a repeating pattern; therefore, a period cannot be measured. The spectrum does not contain sharply defined peaks at specific frequencies. The spectrum appears to be made up of continuous energy, rather than a series of frequency components as seen in the spectrum of the complex periodic event.
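The contrast between complex periodic and complex aperiodic events can also be demonstrated numerically. The sketch below (again assuming NumPy; the 125-Hz fundamental and the amplitudes are arbitrary choices) builds a harmonic-rich periodic signal, loosely analogous to a vowel, and a noise signal, loosely analogous to the ocean sound, and checks whether each repeats after one period.

    # Sketch contrasting a complex periodic signal (a sum of harmonics, loosely like
    # a vowel) with a complex aperiodic signal (random noise, loosely like ocean
    # sound). NumPy is assumed; the amplitudes and the 125-Hz fundamental are arbitrary.

    import numpy as np

    fs = 8000
    t = np.arange(0, 1.0, 1 / fs)

    # Complex periodic: harmonics at 125, 250, 375, 500 Hz with different amplitudes.
    periodic = sum(a * np.sin(2 * np.pi * f * t)
                   for a, f in [(1.0, 125), (0.6, 250), (0.4, 375), (0.2, 500)])

    # Complex aperiodic: no repeating pattern, energy spread across many frequencies.
    rng = np.random.default_rng(0)
    aperiodic = rng.normal(size=len(t))

    # The periodic signal repeats every 1/125 s = 64 samples; the noise does not.
    period_samples = fs // 125
    print(np.allclose(periodic[:period_samples], periodic[period_samples:2 * period_samples]))   # True
    print(np.allclose(aperiodic[:period_samples], aperiodic[period_samples:2 * period_samples]))  # False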

Resonance

Every vibrating object has a natural frequency of vibration, also called a resonant frequency. In some cases, an object will have multiple resonant frequencies. A resonant, or natural, frequency is the frequency at which an object vibrates with maximal amplitude. The phenomenon of resonance has central importance to an understanding of both speech (see Chapter 11) and hearing (Chapter 22). For example, the resonant frequencies of the vocal tract are changed when the shape of the vocal tract is changed; this is the basis for the acoustic difference between different vowels (Chapter 11). As described in Chapter 22, the resonant frequency of the external ear canal, which carries sound energy from the external world to the eardrum, contributes to the sensitivity of hearing for certain frequencies that are important for understanding speech.

Resonance has interesting applications to other events that may be more familiar in the popular imagination. For example, when a glass shatters in response to a singer’s high-pitched, strong note, you are seeing a particularly dramatic example of resonance. Physiological events in the singer’s speech mechanism (see Chapter 10) produce powerful sound waves, some of which have air molecule oscillations at the resonant frequency of the glass. These oscillating air molecules apply force to the glass, which responds with forceful vibration of its own because its resonant frequency has been “excited” by the same frequency of the air vibrations. The forceful vibration of the glass eventually exceeds the elastic limit¹ of the material, and the glass shatters. Bridges can also be “excited” at their resonant frequency by marching soldiers. Marching is a periodic event, and if the marching frequency matches a bridge’s resonant frequency, the bridge may be set into forceful vibration and eventually collapse. This explains why marching soldiers (at least in the past) break ranks when crossing a bridge.


Tacoma Narrows Bridge

In 1940, the Tacoma Narrows Bridge collapsed in response to the forces of nature. This suspension bridge, a mile-long span over Puget Sound, connected Tacoma, Washington, with Gig Harbor, Washington. The bridge was set into motion by strong winds. (It had a history of vibrating before it collapsed; travelers likened crossing the bridge in a car to a roller coaster ride.) Eventually, on the fateful day, the bridge began to twist rhythmically, reaching such violent, periodic amplitudes that it fell apart completely. Fortunately, no one was injured, as the rhythmic twisting grew over a period of hours, allowing people to get off the bridge. Some have claimed that the collapse occurred because the wind “excited” the bridge at its resonant frequency; the explanation is probably more complicated than that, but the resonant frequency of the bridge played a role in the collapse. Google “Tacoma Narrows” and enjoy the still photos and film clips of the bridge rolling and twisting in the wind like a child’s toy. Note the periodic twisting motions of the bridge.

The ability to “excite” objects at their resonant frequencies and cause them to shatter or disintegrate also has positive applications. One therapy for kidney stones (hard masses that form somewhere in the urinary tract, which includes the kidneys) pulverizes the stones with very high-frequency sound waves that match the resonant frequencies of the stones. The stones shatter because they vibrate violently, much as the wine glass shatters when set into vibration at its resonant frequency.

Psychoacoustics

Psychoacoustics is the science of the psychological response to sound. Important psychological terms for an understanding of speech and hearing are pitch, loudness, and quality. Localization, another psychoacoustic phenomenon, is not discussed in this chapter. The following discussion simplifies the relationships between the physical characteristics of sound (frequency, amplitude, and complex acoustic events) and the psychological responses to them.

¹ “Elastic limit” means the degree to which an object can be stretched before there are permanent changes in the object’s shape. In the current example, the shape of the glass material that makes up the wine glass is “changed” permanently — the glass shatters — when the vibration amplitude exceeds the elastic limit of the glass. Excessive stretch of a rubber band is another good example.



More detailed information on psychoacoustics is presented in Hixon, Weismer, and Hoit (2020) and Kramer and Brown (2019).

Pitch


Pitch is a term generally understood as the perceived “height” of a tone. In lay conversation, pitch often refers to musical notes. The left-hand keys of a piano or organ produce lower pitches than the right-hand keys. Striking keys from left to right on a keyboard produces increasingly higher-pitched notes. Not surprisingly, the piano strings struck when keys to the left of the keyboard are depressed have lower frequencies of vibration; to the right of the keyboard, the struck strings have higher frequencies of vibration. Look inside a piano and note how the strings become increasingly thinner and shorter as you move from left to right. Thicker, longer strings have a lower frequency of vibration as compared to thinner, shorter strings. Thus, a relationship exists between frequency of vibration and perceived pitch.

A general statement of the relationship between frequency of vibration and pitch perception is that pitch increases with frequency. The relationship is not simple, because equal changes in frequency do not result in equal changes in pitch. This relationship is illustrated in Figure 21–4, where a piano keyboard standing on its side with the low notes at the bottom is pictured next to a graph showing the relationship between changes in frequency and changes in perceived pitch.


Figure 21–4.  Relationship between the perceptual variable pitch (y-axis) and the physical variable frequency (x-axis). The pitch axis is represented as a sequence of “A” notes from the lowest to highest on the piano keyboard. The graph shows that the perception of the pitch of adjacent octaves (e.g., A4–A5 and A5–A6), or of more separated octaves (e.g., A2–A3, and A6–A7), which sound like equivalent pitch ranges, is not matched by equivalent frequency ranges. The frequency range for octaves increases with increasingly higher octaves on the piano keyboard.


All “A” notes on the piano keyboard are highlighted, ranging from the lowest-pitched A0 (frequency = 27.5 Hz) to the highest-pitched A7 (frequency = 3520 Hz). The frequency of each “A” note can be verified by the vertical, dashed lines dropped from each note to the x-axis. Most people, musicians and nonmusicians alike, know that consecutive “A”s (or any other consecutive notes such as “B”s, “C”s, and so forth) cover a range called an octave. Octaves are both frequency ranges and perceptual ranges: for a listener, the perceptual distance between A3 and A4 is the same as between A4 and A5. These two pitch distances are psychologically equivalent.

The less-than-simple relationship between frequency and pitch is well illustrated by this example of consecutive musical notes of the same letter (i.e., consecutive octaves) and the frequencies of those notes. In Figure 21–4, the frequency difference between A3 and A4 is 220 Hz (A3 = 220 Hz, A4 = 440 Hz), and the frequency difference between A4 and A5 is 440 Hz (A4 = 440 Hz, A5 = 880 Hz). The frequency ranges of these consecutive A’s are different, but the pitch change associated with the two ranges is the same. Even when the small frequency difference between A2 and A3 (110 Hz) is compared with the large frequency difference between A6 and A7 (1760 Hz), the pitch change for the two frequency differences is the same (the two differences are associated with equivalent pitch changes). Frequency and pitch do not change in a one-to-one relationship.

Sine waves (pure tones), complex periodic events (like the acoustic result of piano string vibration), and complex aperiodic events all have pitches, although the pitches of aperiodic events may be fuzzy. The pitch of complex acoustic events is much more complicated than the pitch of sinusoids and is not pursued further in this chapter.
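The octave arithmetic is easy to verify: starting from the 27.5-Hz A0 shown in Figure 21–4, each octave step doubles the frequency, so equal pitch steps ride on ever-larger frequency steps.

    # The "A" notes of Figure 21-4: each octave step doubles the frequency, so equal
    # pitch distances correspond to increasingly large frequency distances.

    A0 = 27.5  # Hz, lowest "A" on the piano keyboard

    a_notes = {f"A{n}": A0 * 2 ** n for n in range(8)}   # A0 ... A7
    print(a_notes)   # {'A0': 27.5, 'A1': 55.0, ..., 'A4': 440.0, ..., 'A7': 3520.0}

    # One octave up from A3 adds 220 Hz; one octave up from A6 adds 1760 Hz,
    # yet both steps are heard as the same pitch interval.
    print(a_notes["A4"] - a_notes["A3"])   # 220.0
    print(a_notes["A7"] - a_notes["A6"])   # 1760.0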

Loudness

“Loudness” refers to the perceived volume of a sound. Sounds can be perceived as very soft, comfortably loud, and uncomfortably loud, as well as all degrees of loudness between these three examples. The perceptual phenomenon of loudness has a very complex relationship to the physical intensity of the sound. Sound intensity can be measured with a device called a sound level meter,² which gives the user a decibel (dB) value for the sound’s intensity.


This is a measure of the physical energy of a sound. For the purposes of this chapter, the dB scale is considered the standard measurement scale for sound intensity, and higher values are associated with greater perceived loudness. As in the case of pitch, however, there is not a one-to-one relationship between decibel values and perceived loudness.

The following decibel values provide general standards for the meaning of numbers on the decibel scale. A sound intensity of roughly 30 dB is measured when, in the dead of night, you place a sound level meter in the middle of a desert, far from any city. Thirty dB is obviously very quiet. When speaking in a normal voice to a friend separated from you by 1 m, the intensity of your voice measured at your listener’s head is somewhere between 60 and 70 dB. Sound intensity at a concert involving a band with gigantic amplifiers — the kind of concert where you cannot hear what your friend is saying, even though she is standing next to you — is anywhere from 100 to 120 dB, measured in the middle of the room.

A simple example of the lack of a one-to-one relationship between a physical measure of sound energy and a perceptual measure of loudness is as follows. Assume you were doing an experiment in which a participant turns a dial to adjust the loudness of a sound presented at 50 dB. The dial controls the intensity (the physical measure) of the sound, and the participant has access only to the dial; there is no access to numbers labeled on the dial with decibel values. You, as the experimenter, have that access. You ask the participant to turn the dial to double the loudness of the sound. Most participants turn the dial for a doubling of loudness to about 60 dB; they do not turn the dial to 100 dB. The loudness of a sound presented at 100 dB is many times the loudness of a 50-dB sound. As with frequency and pitch, the physical quantities (the decibel values) are not related directly to the perceptual values.
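The decibel scale itself makes the mismatch easy to see. The formula below is the standard relationship between an intensity ratio and a decibel difference (it is standard acoustics rather than something introduced in this chapter), and it shows how large the physical change is when loudness merely doubles.

    # Decibel differences versus intensity ratios. The relationship
    # ratio = 10 ** (dB_difference / 10) is the standard intensity-based decibel
    # formula; it is used here to unpack the loudness-doubling example in the text.

    def intensity_ratio(db_difference):
        """How many times more intense the louder sound is."""
        return 10 ** (db_difference / 10)

    print(intensity_ratio(10))   # 10.0 -> 60 dB carries 10x the intensity of 50 dB
    print(intensity_ratio(50))   # 100000.0 -> 100 dB carries 100,000x the intensity of 50 dB
    # Yet listeners judge the 10-dB step (50 dB -> 60 dB), not the 50-dB step,
    # as roughly a doubling of loudness: intensity and loudness are not one-to-one.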

Sound Quality

When someone says, “That’s an interesting sound, listen to it and tell me what you think,” they are asking for a subjective description. Psychoacousticians and speech scientists often refer to the subjective impression of an acoustic event as the quality of the sound. Quality, like pitch and loudness, can be scaled perceptually with numbers.

² To be precise, a sound level meter measures a quantity called sound pressure level (SPL), which varies in response to sound energy much like the term we are using, sound intensity. The difference between the two measures is not relevant to the point being made in the text. Almost any textbook on hearing science and audiology describes the difference between sound intensity and SPL and how one measure can be derived from the other.



The nature of the scaling task, however, must be specified more precisely than simply, “Scale the quality of this sound.” A sound may be described with any number of quality terms, such as rough, weak, shrill, boomy, piercing, thin, fat, slappy, light, and smooth, to name a few of the many, many terms people use to describe how an acoustic event “sounds.” Perceptual scaling of sound quality may involve assigning numbers to just one of these descriptions — for example, “Scale the boominess of the sounds you will hear.” Perceptual scaling of sound qualities tends to be less reliable than scaling of pitch or loudness, but it can be done and has been the object of a great deal of research in communication sciences and disorders.

In the earlier discussion of pitch and loudness, these perceptual terms were connected to the physical acoustic characteristics of frequency and intensity, respectively. What is the physical basis of quality judgments? What causes a person to label one acoustic event “shrill” and another “smooth”? This is a complicated question, but we can offer a general answer as well as some more specific hints about the relationship between sound quality and physical acoustics. The general answer is that quality perception depends to a large degree on the frequency composition of an acoustic event — that is, the frequency components of the sound and their amplitudes. Sounds with a highly tonal quality, such as a piano note or a note sung by a trained singer, have mostly periodic frequency components and little aperiodic energy. Sounds with a noisy quality, like the hissing of a boiling tea kettle or a sustained “sh” sound as in the word “shop,” are composed mostly or completely of aperiodic energy. An extensive, probably unlimited number of sound qualities are possible, depending on the frequency components, their amplitudes, and the mix of periodic and aperiodic energy in the sound. In addition, the physical aspects of sound (frequency, amplitude, periodic and aperiodic energy) can vary over time, lending another dimension to sound quality.

Take another look at the two spectra in Figure 21–3. They have obvious differences in frequency components and in the mix of periodic and aperiodic energy, and these differences account for the vowel quality of the left-hand spectrum and the ocean-noise quality of the right-hand spectrum. A fascinating aspect of the relationship between the physical and perceptual aspects of complex sounds like the ones in Figure 21–3 is that a frequency component, or its amplitude, needs to be changed only a very small amount for listeners to detect a quality change. Humans are very sensitive to shades of sound quality differences.

Chapter Summary

Acoustics is the science of sound; sound waves are pressure waves, initiated by a vibrating source. The periodic motion of air molecules can be used as a model for sine waves, in which the molecules oscillate back and forth. Sine waves are the most basic type of sound wave; a sine wave has a single frequency. Periodic motions of sine waves can be quantified in terms of the period of vibration (the time taken to complete one cycle of vibration), the inverse of which is frequency (the number of complete cycles per second), and in terms of amplitude, the extent of displacement of the molecule during the vibratory motion.

Acoustic events can be represented by a waveform, which is a plot of amplitude (sound energy) as a function of time, or by a spectrum, which is a plot of amplitude as a function of frequency; a waveform and a spectrum are two ways to represent the same acoustic event. Complex acoustic events are composed of many sine waves having different frequencies; complex acoustic events can be periodic or aperiodic, or a mix of periodic and aperiodic energy. Resonance is the phenomenon in which an object vibrates with maximal amplitude at a specific frequency, or at multiple frequencies.

Psychoacoustics is the science of the psychological reaction to acoustic events. Terms such as pitch, loudness, and quality are perceptual terms. Frequency is the primary physical (acoustic) measure related to pitch, and intensity is the primary acoustic measure related to loudness. The relationships between the physical event (e.g., frequency) and its perceptual correlate (e.g., pitch) are not one-to-one; equal changes in frequency are not associated with equal changes in pitch, and equal changes in intensity are not associated with equal changes in loudness.

References

Hixon, T. J., Weismer, G., & Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.
Kramer, S., & Brown, D. K. (2019). Audiology: Science to practice (3rd ed.). San Diego, CA: Plural Publishing.

22 Hearing Science II: Anatomy and Physiology

Introduction

This chapter presents the anatomy and physiology of the auditory mechanism. Structures covered include the external ear canal, eardrum (tympanic membrane), ossicles (middle ear bones), cochlea (end organ of hearing), auditory nerve, and central auditory pathways. Abele and Wiggins (2015); Barin (2009); Goutman, Elgoyhen, and Gomez-Casati (2015); Hudspeth (2014); Lemmerling, Stambuk, Mancuso, Antonelli, and Kubilis (1997); Luers and Hüttenbrink (2016); and Olson, Duifhuis, and Steele (2012) have been used as sources for the information presented in this chapter.

A solid understanding of the auditory mechanism is critical to anyone interested in pursuing a career in audiology, speech-language pathology, hearing or speech science, or any other career related to communication. Selected reasons for gaining this knowledge are the following: (a) the great majority of children learn speech and language via the auditory system, (b) an understanding of the normal structures and functions of the auditory mechanism allows an appreciation of the ways in which various diseases and conditions affect hearing, and (c) knowledge of the anatomy and physiology of the auditory mechanism is essential to understanding the design and purpose of formal tests of hearing. The knowledge is also essential to understanding devices such as cochlear implants and hearing aids, which are used to treat hearing loss. Information on diseases of the auditory system, the audiological tests used to evaluate their effects on hearing, and auditory devices used to treat hearing loss is presented in Chapters 23 and 24.

Structures of peripheral auditory anatomy can be classified as belonging to one of three major divisions: the outer ear, the middle ear (tympanic cavity), and the inner ear. Anatomical structures of the auditory system are the same for both ears; descriptions of the structures of one ear apply equally to the structures of the other ear. “Peripheral auditory system” refers to those structures outside the central nervous system.

Temporal Bone

Much of the peripheral auditory mechanism is encased in the temporal bone of the skull. Figure 22–1 shows the complex shape of the temporal bone. Figure 22–1, top, is a view of the skull from the left side of the head. The perimeter of the temporal bone is outlined in red. The temporal bone includes the bony part of the ear canal — the opening inside the pinna (the structure often referred to in lay terms as “the ear”; see below).




Figure 22–1. Top, view of the temporal bone of the skull from the left side, outlined in red; bottom, view of the temporal bone from above, as part of the base of the skull, shaded pink.

Figure 22–1, bottom, shows the interior base of the skull as if the top half has been removed, with the view from above. The temporal bone extends toward the middle of the skull, forming part of the base of the skull. The three major structures of the peripheral auditory system — the bony ear canal, the middle ear, and the inner ear — are encased within the temporal bone.


Peripheral Anatomy of the Ear


Figure 22–2 is an artist’s rendition of the peripheral anatomy of the ear. The structures are shown as if the head has been cut into front and back halves, with the front half removed. The peripheral auditory mechanism is separated into three major parts: the outer ear, the middle ear, and the inner ear. The outer ear plus the middle ear are components of the conductive part of the auditory mechanism. The inner ear is the sensorineural part of the mechanism. Figure 22–3 is a schematic chart of the divisions of auditory anatomy. Both Figures 22–2 and 22–3 should be referred to frequently throughout the following sections.

Figure 22–2. Coronal-plane view (head cut into front and back halves, view from the front) showing structures of the peripheral auditory system.

Outer Ear (Conductive Mechanism)

The outer ear includes the pinna (auricle) and the external auditory canal, also called the external auditory meatus. Part of the tympanic membrane (eardrum), the sheet of tissue that terminates the external auditory meatus, is also considered a structure of the outer ear. The tympanic membrane is the boundary between the outer and middle ear.

Pinna (Auricle)

The pinna is composed of cartilage and fat tissue. In humans, the pinna collects and directs sound energy into the external auditory meatus and toward the eardrum. Careful examination of a human’s pinna shows many creases, folds, and cavities. Anatomical characteristics of the pinna vary a good deal among individuals.

External Auditory Meatus (EAM; also called External Auditory Canal)

The entrance to the external auditory meatus is a small opening easily seen inside the pinna. The external auditory meatus (external ear canal) is a tube extending from this opening to the tympanic membrane; meatus is a Latin term meaning “opening” or “canal.” In adults, the external auditory meatus is roughly 2.5 cm in length and 0.7 cm in diameter. These dimensions vary quite a bit across individuals. The external auditory meatus is not a straight, level tube. Rather, it runs slightly “uphill” (see Figure 22–2) and, between the opening to the canal and the tympanic membrane, has a small bend or kink in the direction of the back of the head. (The bend cannot be seen in Figure 22–2 because the frontal view does not provide depth perception.) You may have noticed that your medical practitioner, when inserting an otoscope (ear scope) into your ear canal to examine the canal and view your tympanic membrane, gently pulls the pinna up and toward the back of the head. The practitioner does this to straighten out the natural “kink” in the tube for a more direct view of the tympanic membrane.

The outer part of the external auditory meatus is surrounded by cartilage. Near the bend in the external auditory meatus, the surrounding walls change from cartilage to bone. The remaining length of the ear canal, from bend to eardrum, is encased by part of the temporal bone. The external auditory meatus ends as a closed tube at the tympanic membrane.

The primary auditory role of the external auditory meatus is to conduct sound energy through the ear canal to the tympanic membrane. The sound energy is in the form of molecule-sized pressure waves (Hixon, Weismer, & Hoit, 2020). As it conducts sound energy, the external auditory meatus acts like a resonator, causing energy at certain frequencies to vibrate with greater amplitude as compared to energy at other frequencies.

Figure 22–3. The anatomical components of the outer, middle, and inner ear, and their relationship to the functional distinction between conductive versus sensorineural auditory components.

Getting a Boost

The resonant frequency of the external auditory meatus is approximately 3300 Hz. What does this mean? When sound waves travel through the canal, sound energy at 3300 Hz and nearby frequencies is amplified by the canal’s resonance. Frequencies much lower or higher than 3300 Hz are amplified less. The resonant frequency of the ear canal contributes in a significant way to hearing in humans. First, the human ear is most sensitive to sound energy at and near 3300 Hz. Second, human speech contains important sound energy over a range of frequencies around 3300 Hz, making the resonant frequency of the ear canal particularly useful for accurate perception of speech sounds.

The external auditory meatus also serves a protective function. Glands in the cartilaginous walls of the external auditory meatus secrete cerumen (earwax) which presents a barrier to foreign objects (insects, for example) and may also block the movement of bacteria or fungal agents toward the tympanic membrane. The kinked tube of the external auditory meatus is also a barrier to foreign objects moving easily from the outer ear to the delicate tissues of the tympanic membrane.
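One way to see where the approximately 3300-Hz figure comes from is to treat the ear canal as a simple tube that is open at the pinna and closed at the tympanic membrane. This quarter-wave resonator model and the speed-of-sound value are assumptions added here for illustration; only the canal length and diameter come from the chapter.

```python
import math

# Illustrative sketch: the external auditory meatus modeled as a quarter-wave
# resonator (open at the pinna, closed at the tympanic membrane). The tube model
# and the speed-of-sound value are assumptions, not values given in the chapter.
SPEED_OF_SOUND_M_PER_S = 343.0   # assumed speed of sound in warm air
canal_length_m = 0.025           # ~2.5 cm, from the chapter
canal_diameter_m = 0.007         # ~0.7 cm, from the chapter

resonant_frequency_hz = SPEED_OF_SOUND_M_PER_S / (4 * canal_length_m)
canal_volume_cm3 = math.pi * (canal_diameter_m * 100 / 2) ** 2 * (canal_length_m * 100)

print(f"Estimated resonant frequency: {resonant_frequency_hz:.0f} Hz")  # ~3430 Hz
print(f"Estimated canal volume: {canal_volume_cm3:.2f} cm^3")           # ~0.96 cm^3
```

The simple tube estimate of roughly 3400 Hz lands close to the 3300-Hz value cited above; a real ear canal is neither straight nor uniform, so the measured resonance differs somewhat from the idealized prediction.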

Tympanic Membrane (Eardrum)

The tympanic membrane, or eardrum (Figure 22–4), is the boundary between the outer and middle ear. The circular perimeter of the tympanic membrane is linked to a bony foundation via a cartilage-ligamentous ring called the annulus. The annulus fits into a small circular, bony depression at the boundary between the outer and middle ear to fix the tympanic membrane in place. Figure 22–4 is a photograph of the right tympanic membrane as seen through an otoscope. The otoscope has a viewing lens and a light source to illuminate the ear canal and the tympanic membrane. As a result of the shallow conical shape of the tympanic membrane and its tilt, with the lower half further from the scope than the upper half, a "light reflex" (also called a "cone of light") reflects off the normal tympanic membrane. The light reflex is directed forward, downward, and to the right. The light reflex is directed to the left when viewing the left tympanic membrane. Middle ear bones (ossicles) can be seen through the translucent membrane. In the otoscopic view, the tympanic membrane looks more or less flat — not conical, not tilted along its top-to-bottom axis relative to the observer. The cone shape and tilt are seen in the artist's cross-sectional rendition of the tympanic membrane in Figure 22–2. The tympanic membrane has the shape of a flattened bowl with a conical base that points into the middle ear cavity.

Figure 22–4.  View of the right tympanic membrane as seen through an otoscope.

The tympanic membrane is tiny but tough. The membrane has a diameter of roughly 8 to 10 mm and a surface area of about 55 mm². The thickness of the membrane is little more than one-tenth of a millimeter (0.0001 meters). The tympanic membrane is composed of three tissue layers. The middle layer is the one most sensitive to sound waves and, because of its tissue makeup, which is different from the two layers that sandwich it, is exceptionally strong.

Middle Ear

The middle ear is an air-filled cavity surrounded by bone. Its complexly shaped volume contains tiny, movable bones (ossicles), ligaments, two muscles, nerves, and a bony opening to a tube that leads to the top part of the throat.

Ossicles

The ossicles are the three smallest bones in the human body. They extend across the middle ear cavity from the tympanic membrane to the oval window of the cochlea. These bones are so tiny they can be placed on a penny and occupy no more than the bottom half of the coin. The malleus (hammer) is the ossicle attached to the tympanic membrane; the stapes (stirrup) is the ossicle attached to the cochlea (part of the inner ear); and the incus (anvil) is the middle bone, linking the malleus to the stapes. The three connected ossicles are together called the ossicular chain. The malleus has a handle-like part called the manubrium, which is attached at its lower end to the tympanic membrane. The lowest attachment point of the manubrium to the tympanic membrane is called the umbo and can be seen through the translucent tissue of the tympanic membrane when viewed through an otoscope (see Figure 22–4). The top of the malleus has a knobby bump that fits into a cup-like depression at the top of the incus. This is where the connection between the two bones is made. A long, bony limb descends from the top of the incus and forms a hook at its bottom. This hook fits into the neck of the stapes (see Figure 22–2, pointer to the stapes, at its neck), connecting the two bones. The stapes, the smallest of the ossicles, has a neck (mentioned in the previous paragraph) and two arches that project from it. The base of each arch is attached to the footplate of the stapes, which is oval shaped. The footplate of the stapes fits into an oval window cut into the bony casing of the inner ear. The footplate is held in that window by a fibrous ligament.

The ossicles function to transmit sound energy from the tympanic membrane to the footplate of the stapes, and eventually to the fluid-filled cochlea. This function can be explained as a series of events: sound wave energy in the air (a pressure wave) is converted to mechanical energy in the form of vibration of the eardrum and ossicles, with the final mechanical energy taking the form of vibratory movement of the footplate of the stapes, in and out of the oval window. The vibratory movement of the stapes displaces fluid in the cochlea; thus, the mechanical energy is converted to fluid energy.

1. Sound vibrations in the air enter the ear canal and initiate vibration of the tympanic membrane.
2. Vibration of the tympanic membrane is transferred to the malleus, to which it is attached.
3. The sound vibration is transmitted via the connected ossicles to the footplate of the stapes.
4. The footplate of the stapes moves in and out of the oval window; behind the oval window is the fluid-filled cochlea, so the vibratory movement of the footplate of the stapes displaces cochlear fluid (see later in this chapter).

Ligaments and Muscles of the Middle Ear

The ossicles are anchored to the walls of the middle ear cavity by several ligaments and two muscles. Figure 22–5, a "zoom" view of the middle ear cavity, shows three of the ligaments (short, pinkish bands of tissue), each attached to an ossicle with their other ends attached to one of the middle ear walls. In this chapter, we note the importance of these attachments as anchors for the ossicles and their role in limiting the movement of the ossicles when they vibrate in response to sound energy. Additional details about the middle ear ligaments are available in the publications cited in the first paragraph of the chapter.

The two muscles of the middle ear cavity are the tensor tympani muscle and the stapedius muscle (Figure 22–5). The tensor tympani muscle enters the middle ear cavity and attaches to the malleus by means of a short tendon. The stapedius muscle is buried in a small bony canal in the back wall of the middle ear. It issues a tendon that enters the middle ear cavity and attaches to the neck of the stapes (Figure 22–5). Contraction of the tensor tympani muscle pulls on the malleus, retracting the tympanic membrane into the middle ear cavity. Contraction of the stapedius muscle pulls on the footplate of the stapes (by means of the stapedius tendon), away from (but not out of) its "fit" into the oval window. Contraction of either muscle stiffens the ossicles, which, as discussed later, reduces the efficiency of sound energy transfer from the tympanic membrane to the footplate of the stapes. There is some debate about the role of the tensor tympani muscle in hearing, which is not presented here. The stapedius muscle, however, is known to be a key component of the acoustic reflex. The acoustic reflex is the contraction of the stapedius muscle in response to very high levels (intensities) of sound energy (see Chapter 23 for diagnostic testing for the acoustic reflex).

Figure 22–5.  Close-up ("zoom") view of the middle ear cavity, showing ossicles, muscles, and ligaments.

High-intensity sound vibration has the potential to drive the footplate of the stapes too forcefully into the cochlear fluid, leading to excessive fluid displacements that can damage the delicate sensory organs of hearing. Recall that contraction of the stapedius muscle pulls the footplate of the stapes away from the oval window and in so doing stiffens the ossicular chain. A stiffer ossicular chain reduces sound transmission and prevents the footplate of the stapes from too-forceful displacement into the cochlear fluid. The acoustic reflex therefore protects the cochlea from extremely high sound levels. The acoustic reflex is "wired" by a loop made up of the auditory nerve, structures in the brainstem, the facial nerve (cranial nerve VII; see Chapter 2), and the stapedius muscle. The reflex is fast: about a tenth of a second (0.1 sec) passes between the introduction into the ear of extremely high sound levels, transmission of the sound energy (in mechanical, fluid, and electrochemical forms) through the cochlea and auditory nerve to the brainstem, and a signal from the brainstem via the facial nerve to contract the stapedius muscle and exert its pull on the stapes.

Auditory (Eustachian) Tube

The auditory tube (also called the Eustachian tube, after the 16th-century Italian anatomist Bartolomeo Eustachi) is shown in Figure 22–2 as a bone-encased, open tube in the lower part of the middle ear cavity; the bony tube opening near the bottom of the middle ear cavity is also seen in the "zoom" image of Figure 22–5. The tube becomes cartilaginous as it extends downward and terminates in the upper part of the pharynx (throat). The auditory tube is normally closed at its pharyngeal ending; the tube is opened briefly during swallowing, chewing, and yawning. When the tube opens, it connects air in the middle ear cavity to air in the nasopharynx. Thus, intermittent opening of the pharyngeal part of the auditory tube serves to maintain middle ear pressure at normal values (i.e., at atmospheric pressure, the same pressure in the pharynx when the mouth and/or nostrils are open to atmosphere). Normal values of air pressure within the middle ear cavity are important to the health of middle-ear structures.

No Ossicles??

What would happen if we did not have ossicles? We would not be able to hear as well. Your ossicle-less ear would have an eardrum and a cochlea, separated by an air-filled middle ear cavity. Vibrations of air molecules in your outer ear would be transmitted to the air in your middle ear and strike the membrane of the oval window, behind which resides the fluid (liquid) in your cochlea. Would the fluid in your cochlea be displaced by these sound waves? The answer is "yes," but very ineffectively, because sound energy in air does not push against fluid very effectively. In fact, in addition to transmitting sound energy from the tympanic membrane to the oval window, the ossicles amplify the energy by means of their anatomy. We all need our ossicles.
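The "boost" mentioned in the sidebar can be put into rough numbers. The sketch below uses the tympanic membrane area given earlier in the chapter (about 55 mm²) together with a commonly cited approximate stapes footplate area and ossicular lever ratio; the footplate area (about 3.2 mm²) and the lever ratio (about 1.3) are assumptions added for illustration, not values from this chapter.

```python
import math

# Illustrative middle-ear pressure-gain estimate (approximate values only).
tympanic_membrane_area_mm2 = 55.0   # given earlier in this chapter
stapes_footplate_area_mm2 = 3.2     # assumed, commonly cited approximate value
ossicular_lever_ratio = 1.3         # assumed malleus-to-incus lever advantage

# Funnelling the same force from a large membrane onto a small footplate
# raises the pressure delivered to the cochlear fluid.
area_ratio = tympanic_membrane_area_mm2 / stapes_footplate_area_mm2
pressure_gain = area_ratio * ossicular_lever_ratio
gain_db = 20 * math.log10(pressure_gain)

print(f"Area ratio: {area_ratio:.1f}")                                   # ~17
print(f"Overall pressure gain: {pressure_gain:.1f}x ({gain_db:.0f} dB)")  # ~22x, ~27 dB
```

The rough 25- to 30-dB pressure gain is consistent with the sidebar's point that, without the ossicles, airborne sound would displace the cochlear fluid very ineffectively.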

Inner Ear (Sensorineural Mechanism)

The inner ear is encased in a complex, bony structure that is itself encased in the temporal bone. This structure is called the bony labyrinth, shown for the right ear in Figure 22–6. In this view, the semicircular canals are to the left, the vestibule in the middle, and the cochlea at the front. The vestibule joins the semicircular canals and cochlea and has the oval window as a notable landmark. All structures of the bony labyrinth are filled with fluid. The bony labyrinth of the left ear is a mirror image of the one shown here. There are two openings into the bony labyrinth. The upper opening is called the oval window, where the footplate of the stapes is attached. The lower opening is the round window, which is covered by a membrane similar in structure (although not identical) to the tympanic membrane. As described earlier, inward movement of the footplate of the stapes displaces fluid in the cochlea. The fluid travels like a wave through cochlear channels to the round window, which bulges when the wave arrives. The displacement of the cochlear fluid and its importance to auditory sensation are described later in this chapter.

Semicircular Canals

Three fluid-filled semicircular canals comprise the leftmost structure of the bony labyrinth in the right ear (see Figure 22–6). One canal is oriented vertically, one horizontally, and one from front to back. Any one of the canals is oriented at right angles to the other two. The semicircular canals control balance. The fluid within them contains sensory organs called hair cells. Movement of the head displaces the fluid, which bends the hair cells and initiates a signal to the nervous system. The fluid displacement in the semicircular canals and the resulting bending of the hair cells send signals to the brain regarding the precise location of the head in space. Balance is an important outcome of these signals.

Figure 22–6.  The bony labyrinth (otic capsule), as if viewed from the middle ear cavity of the right ear. The semicircular canals are to the left, the vestibule in the middle, and the cochlea is to the right.

Vestibule

The vestibule contains the oval window in which the footplate of the stapes is fixed. Two structures within the vestibule, also fluid filled and with hair cells, detect motion of the head, which supplements the position detection signaled by the semicircular canals.

Cochlea

The cochlea (meaning snail) is made up of many smaller structures. Among the most important of these are the scalae (plural of scala, another name for duct), the basilar membrane, and the organ of Corti, which sits atop the membrane and includes hair cells. These structures are the basis for the transformation of sound waves into neural signals. The neural signals code properties of the sound and deliver this information from the cochlea via the auditory nerve (cranial nerve VIII) to the central nervous system. The cochlea consists of three membranous, fluid-filled ducts that are coiled in a snail-shell spiral. The spiral of the membranes, like its bony casing, includes two-and-one-half turns from its base to its tip (apex). At the tip of the cochlea, the two outside ducts are connected by a small opening. This connection explains

why the fluid displacement at the oval window (pushing into the scala vestibuli) is transmitted to the scala tympani and results, after a very brief delay, in a bulging of the round window (the termination of the scala tympani). Figure 22–7 shows the coiled cochlea cut in several cross sections; the three ducts are visible at each cut. The footplate of the stapes fits into the oval window at the duct called the scala vestibuli. Movement of the footplate pushes into the fluid in the scala vestibuli, displacing it in the direction of the tip of the spiral. The direction of this fluid displacement is indicated by the blue arrow in Figure 22–7. The fluid displacement travels through the opening at the tip of the cochlea, from the scala vestibuli to the scala tympani (red arrow, Figure 22–7). The fluid wave travels through the scala tympani to its termination at the round window, which bulges when it is pushed by the arriving fluid wave. Each of the cross-sectional cuts in Figure 22–7 shows a membrane separating the scala tympani from the third duct, called the cochlear duct (also called the scala media). The membrane is called the basilar membrane; sitting on top of the membrane is the organ of Corti. The organ of Corti contains hair cells similar to those in the semicircular canals of the vestibular system. The hair cells in the organ of Corti are bent when the fluid displacement in the scala vestibuli and scala tympani creates a sound-induced fluid wave pattern that displaces the basilar membrane in precise ways.


Figure 22–7.  The cochlear spiral cut in several cross-sections. The cross-section cuts through multiple turns of the cochlea. Each section of the cochlea contains three scalae, the scala vestibuli, the scala media (cochlear duct), and scala tympani.

Basilar Membrane and Organ of Corti (Sensory Mechanism)

Figure 22–8.  The basilar membrane and the organ of Corti, at one cut along the cochlear spiral.

An artist's rendition of the basilar membrane and organ of Corti, at one "slice" through a turn in the cochlea, is shown in Figure 22–8. The organ of Corti sits atop the basilar membrane. The basilar membrane, the organ of Corti, and a membrane above the hair cells (labeled "tectorial membrane") form the sensory end organ of hearing. To get an idea of how much this image has been magnified relative to the size of the actual organs, consider that the nearly vertical structures labeled "inner hair cells" and "outer hair cells" are roughly 30 micrometers (0.000030 meters, around 1/1000th of an inch) in length and 10 micrometers in diameter. Note the single hair cell (inner hair cell) to the left of the image and the row of three hair cells (outer hair cells) to the right. These hair cells run the length of the cochlea, from base to tip. Note also the nerve fibers (shown in yellow) connected to the inner and outer

hair cells. Each hair cell in the organ of Corti, from base to tip of the basilar membrane, is attached to a nerve fiber that becomes part of the auditory nerve. The auditory nerve carries information from the cochlea to the brain, and in some cases from the brain to the cochlea (see discussion later in this chapter). Coding of Frequency Along the Basilar Membrane.  The hair cells within the organ of Corti are critical to auditory sensation, much like the rods and cones of the retina, the end organ of vision, are critical to visual sensation. Here we focus on the role of the basilar membrane and inner hair cells in coding frequency of incoming sound waves. The outer hair cells are critical to hearing as well. They control the sensitivity of the inner hair cells, allowing the detection of very soft sounds and increasing the precision of frequency analysis by the inner hair cells. The outer hair cells are not discussed further in this chapter. The basilar membrane is displaced (“deformed”) by the fluid wave within the cochlea. The precise location


of maximum displacement along the basilar membrane depends on the frequency (or frequencies) of the incoming sound wave. This is because frequency analysis is arranged systematically along the basilar membrane and the hair cells in the organ of Corti. The systematic arrangement of frequency along the basilar membrane is called tonotopic representation. Tonotopic representation is illustrated in Figures 22–9 and 22–10. In Figure 22–9, the snail shell–like cochlea (top) is "unrolled" (bottom) to better explain tonotopic representation. The basilar membrane is shown in Figure 22–9, bottom, by the flat pink strip running from the base to the tip of the cochlea. The arrows show the path of fluid displacement from the scala vestibuli, through the narrow opening at the tip, and then through the scala tympani in the direction of the round window. The basilar membrane is narrow at its base (the left end of the membrane in Figure 22–9) and becomes increasingly wider as it extends to the tip of the cochlea. The narrow base of the basilar membrane is very stiff, and the wide tip end of the membrane is relatively floppy. The hair cells at the base (narrow part) of the membrane are sensitive to the highest frequencies humans can hear (about 20,000 Hz). Moving from the base toward the tip of the basilar membrane, the hair cells are sensitive to increasingly lower frequencies until, at the tip, they are sensitive to the lowest frequencies humans can hear (about 20 Hz). Now we can make a more precise statement about tonotopic representation: frequency is represented tonotopically along the basilar membrane such that the location of a hair cell along the membrane determines its frequency sensitivity. Hair cells at the base of the basilar membrane are sensitive to the highest frequencies, and hair cells at the tip of the basilar membrane are sensitive to the lowest frequencies.

Figure 22–9.  Top, the cochlea; bottom, the cochlea as if rolled out to form a straight structure. In the bottom image, the scala vestibuli is the top duct, the scala tympani the bottom duct, and the basilar membrane is shown as the pinkish partition in the middle, narrow and stiff at the base and wide and floppy at the apex.

Displacement of the Basilar Membrane and Frequency Analysis.  Tonotopic arrangement of the hair cells along the basilar membrane raises the question of how different locations along the membrane are stimulated when vibratory motions of the conductive mechanism are transferred to the cochlear fluid. We owe the understanding of frequency analysis in the cochlea to experiments performed by Georg von Békésy (1899–1972), a Hungarian physicist and engineer who in 1961 won the Nobel Prize for this work. Von Békésy observed that when the footplate of the stapes vibrated in response to sound energy, it pushed into the cochlea and created a fluid wave that traveled through the scala vestibuli and scala tympani. The fluid wave displaced the basilar membrane, and the location of its maximum displacement depended on the frequency of the incoming sound wave. Low frequencies resulted in a basilar membrane displacement that built up gradually and reached its peak near the tip of the membrane. In contrast, high-frequency sound energy produced displacement of the basilar membrane that built to a peak at a short distance from the oval window (that is, near the base). In his world-famous 1928 paper, von Békésy described this fluid movement in the cochlea as a traveling wave.

Figure 22–10 is a schematic summary of how different frequencies of incoming sound waves result in different locations of maximum displacement along the basilar membrane. Three unrolled cochleae (plural of cochlea) are shown, with three schematic "blips" representing maximum displacement of the basilar membrane for high (top), mid (middle), and low (bottom) frequencies. These are the wave patterns expected for single frequencies. Traveling wave patterns for sound waves made up of many different frequencies result in more complex patterns of displacement along the basilar membrane.

Figure 22–10.  Three unrolled cochleae, showing the location of maximum displacement along the basilar membrane for high (top), mid (middle), and low (bottom) frequencies.

How does displacement of the basilar membrane result in a frequency signal that is sent to the brain? The traveling wave that has maximum displacement of the basilar membrane at a specific location causes bending of the hair cells at the same location. When bent, the membranes of the hair cells change their sensitivity to certain molecules, which in turn causes their attached nerve fibers (see Figure 22–8) to "fire" and send a signal to the brain via the auditory nerve (see Figure 22–2 for the auditory nerve emerging from the cochlea). The nerve fibers attached to the individual hair cells have the same tonotopic arrangement as the basilar membrane: the fluid displacement that causes a traveling wave to crest at a specific location, which depends on the frequency of the incoming sound wave, results in the firing of a nerve fiber that is maximally sensitive to that frequency.
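The tonotopic map just described is often summarized with the Greenwood function, which relates a position along the basilar membrane to the frequency that best stimulates the hair cells there. The function and its constants are not presented in this chapter; the sketch below uses commonly cited human values and is meant only to illustrate the base-to-apex, high-to-low frequency gradient.

```python
def greenwood_best_frequency(relative_distance_from_apex: float) -> float:
    """Approximate best frequency (Hz) at a point on the human basilar membrane.

    relative_distance_from_apex: 0.0 at the apex (tip) to 1.0 at the base.
    The constants are commonly cited human values (an assumption, not from the text).
    """
    a_hz, alpha, k = 165.4, 2.1, 0.88
    return a_hz * (10 ** (alpha * relative_distance_from_apex) - k)

# Apex responds to the lowest frequencies, base to the highest, matching the
# tonotopic arrangement described in the chapter.
for label, x in [("apex", 0.0), ("middle", 0.5), ("base", 1.0)]:
    print(f"{label}: ~{greenwood_best_frequency(x):,.0f} Hz")
# apex: ~20 Hz, middle: ~1,700 Hz, base: ~20,700 Hz
```

The endpoints of roughly 20 Hz at the apex and 20,000 Hz at the base match the limits of human hearing given above.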

Auditory Nerve and Auditory Pathways (Neural Mechanism)

The auditory nerve in the peripheral auditory system, and the auditory pathways within the central nervous system, comprise the neural component of the hearing mechanism.

Auditory Nerve.  Individual nerve fibers emerging from the base of the inner hair cells are gathered together and form a significant part of the auditory nerve, which is part of cranial nerve VIII. The auditory nerve travels through the internal auditory meatus, a narrow, short tunnel in the temporal bone. The nerve emerges from the tunnel and enters the central nervous system at the lower levels of the brainstem. The internal auditory meatus also contains the fibers of the other part of cranial nerve VIII — the vestibular nerve — as well as fibers of cranial nerve VII (the facial nerve). Figure 22–2 shows the cochlear and vestibular components of cranial nerve VIII and the fibers of cranial nerve VII. The close proximity of the facial nerve to the auditory nerve is significant because the facial nerve may be affected by a disease of the auditory nerve, and the combination of auditory problems and facial weakness may have diagnostic significance.

Auditory Pathways.  The auditory pathways are structures in the nervous system that carry auditory impulses from the auditory nerve to the cortex, the highest level of the central nervous system. The auditory nerve also includes fibers carrying information from the central nervous system to the cochlea; these fibers innervate the outer hair cells. The focus in this section is on the pathways from the auditory nerve to the cortex of the cerebral hemispheres. When electrical impulses are transmitted in the auditory nervous system, they travel along nerves (or tracts, as they are called in the central nervous system) and make connections (synapses) in clusters of cell bodies. These cell bodies issue another tract aimed at a different cluster of cell bodies along the pathway to the auditory cortex. The auditory pathways are more or less dedicated to transmitting information from the auditory nerve all the way to the auditory cortex. The pathway terminates in the primary auditory cortex, located on the upper lip of the temporal lobe. Like the hair cells and auditory nerve, cells within the auditory cortex are tonotopically arranged.

Tests of Hearing and Auditory Anatomy and Physiology

Tests of hearing are designed based on knowledge of the auditory system. For example, in pure-tone audiometry, tones of a single frequency are used to estimate the response of the hair cells at different locations along the basilar membrane. Because of the tonotopic arrangement of the hair cells from base to apex of the cochlea, single-frequency tones allow a tester to assess the health of the hair cells very precisely at different locations throughout the cochlea. There are also techniques for assessing the stiffness of the conductive mechanism (outer and middle ear). These stiffness evaluations, all performed at the entrance to the ear canal with minimal discomfort to the person being tested, are used to diagnose many auditory disorders, ranging from middle ear infections, which are very common in childhood, to possible diseases of the auditory nerve, which may be reflected in poorly functioning acoustic reflexes. One more example is the use of electrodes placed on the scalp to measure the amplitudes and timing of electrical activity of cell groups within the auditory pathways of the central nervous system, as brain analysis of an acoustic signal makes its way from the auditory nerve to the auditory cortex. Clearly, audiological tests reflect an intimate knowledge of the structure and function of the auditory system. Chapter 23 presents details of these tests and their interpretation.


Chapter Summary

Knowledge of the structure and function of the auditory mechanism is critical to those who plan a career in communication sciences and disorders, whether the career goal is to understand (a) how children learn language, (b) how diseases affect the normal mechanism, or (c) how formal evaluations of hearing are designed and interpreted.

Most of the auditory mechanism is housed within the temporal bone, a complex bone of the skull. The peripheral auditory mechanism can be subdivided into the conductive mechanism, comprising the outer and middle ear, and the sensorineural mechanism, comprising the inner ear and auditory nerve.

The external auditory meatus (or external auditory canal) is a canal approximately 2.5 cm long and 0.7 cm in diameter that extends from an opening in the auricle to the tympanic membrane, in which cerumen (earwax) is produced, and through which sound waves are directed to the tympanic membrane. The external auditory meatus has a resonant frequency of roughly 3300 Hz, which explains in part the very acute sensitivity of the human auditory mechanism in this frequency region.

The tympanic membrane (or eardrum) is a small (about 55 mm² in area), three-layered structure located at the internal end of the outer ear; the middle tissue layer is sensitive to the very small pressure variations associated with sound waves.

The middle ear is an air-filled cavity located between the outer ear and inner ear, and contains three small ossicles (bones), several ligaments, and two muscles. The ossicles are stabilized by ligaments that attach to the walls of the middle ear and by two muscles, the tensor tympani muscle, which attaches to the malleus, and the stapedius muscle, which attaches to the stapes; contraction of either muscle stiffens the ossicular chain. The stapedius muscle is an important component of the acoustic reflex; the muscle contracts in response to high-level sound energy and in so doing prevents the footplate of the stapes from being displaced too forcefully into the cochlear fluid.

The auditory tube (or Eustachian tube), a 3.8-cm (1.5-inch) tube that runs from the middle ear to the nasopharynx, is bony and open at the middle ear and cartilaginous and flexible toward the top part of the pharynx (the nasopharynx), where it is usually closed but opens occasionally to equalize the pressure in the middle ear.

The inner ear is housed within the bony labyrinth of the temporal bone and contains the semicircular canals, vestibule, and cochlea, all structures that communicate with the central nervous system via cranial nerve VIII (auditory-vestibular nerve). Three semicircular canals, each oriented at right angles to the other two, contain hair cells that bend when the head moves and cause the vestibular part of cranial nerve VIII to fire and send information about head position and orientation to the brain. The vestibule contains the oval window as well as structures that contain hair cells that send signals to the brain about the relative position and acceleration of the head.

The cochlea is the spiral-shaped end organ of hearing that converts sound into neural signals and contains many important structures, including the scalae, the basilar membrane and organ of Corti, and the hair cells. Within the cochlea are three membranous, fluid-filled ducts called the scala vestibuli, scala media (or cochlear duct), and scala tympani, the first and last of which are connected at the top of the cochlear spiral. The scala media is separated from the scala tympani by the basilar membrane. On top of the basilar membrane sits the organ of Corti, which contains a row of inner hair cells and three rows of outer hair cells. Movement of fluid in the cochlea (caused by sound waves transmitted through the outer and middle ear, and the movement of the stapes into the oval window) deforms inner hair cells and causes them to send a signal to an attached nerve fiber, which makes the nerve fiber "fire." Hair cells are arranged tonotopically along the basilar membrane, ranging from those that respond best to the highest frequency at its narrow base (20,000 Hz) to those that respond best to the lowest frequency at its wide apex (20 Hz). As discovered by Georg von Békésy, hair cells are stimulated by traveling waves transmitted through the cochlear fluid, the location of the wave's highest amplitude being frequency dependent, with high-frequency sounds creating waves that peak near the base of the basilar membrane and low-frequency sounds creating waves that peak near the apex of the basilar membrane.

The auditory pathways consist of the auditory nerve and the tracts and clusters of cell bodies that carry auditory signals from the brainstem to the cortex.

References

Abele, T. A., & Wiggins, R. H., III (2015). Imaging of the temporal bone. Radiologic Clinics of North America, 53, 15–36.


Barin, K. (2009). Clinical neurophysiology of the vestibular system. In J. Katz, L. Medwetzky, R. Burkard, & L. Hood (Eds.), Handbook of clinical audiology (6th ed., pp. 431–466). Baltimore, MD: Lippincott, Williams, & Wilkins.

Békésy, G. (1928). Zur Theorie des Hörens; die Schwingungsform der Basilarmembran. Physik. Zeits., 29, 793–810.

Goutman, J. D., Elgoyhen, A. B., & Gomez-Casati, M. E. (2015). Cochlear hair cells: The sound-sensing machines. FEBS Letters, 589, 3354–3361.

Hixon, T. J., Weismer, G., & Hoit, J. D. (2020). Preclinical speech science: Anatomy, physiology, acoustics, perception (3rd ed.). San Diego, CA: Plural Publishing.


Hudspeth, A. J. (2014). Integrating the active process of hair cells with cochlear function. Nature Reviews Neuroscience, 15, 600–614.

Lemmerling, M. J., Stambuk, H. E., Mancuso, A. A., Antonelli, P. J., & Kubilis, P. S. (1997). CT of the normal suspensory ligaments of the ossicles in the middle ear. AJNR American Journal of Neuroradiology, 18, 471–477.

Luers, J. C., & Hüttenbrink, K.-B. (2016). Surgical anatomy and pathology of the middle ear. Journal of Anatomy, 228, 338–353.

Olson, E. S., Duifhuis, H., & Steele, C. R. (2012). Von Békésy and cochlear mechanics. Hearing Research, 293, 31–43.

23  Diseases of the Auditory System and Diagnostic Audiology

Introduction

This chapter presents an overview of the diseases of the auditory and vestibular systems and how they are diagnosed. The chapter has been written specifically to discuss the various tests used to evaluate hearing and balance disorders; subsequent chapters discuss what can be done to rehabilitate auditory disorders. With your new knowledge of the anatomy and physiology of the auditory and vestibular systems from the last chapter, we can start to understand the tests needed to determine which parts of the auditory system are functional. We then evaluate the different tests and their outcomes for each of the main types of hearing disorders — conductive, sensorineural, and mixed losses. Further information about the senses of hearing and balance can be gathered from the outstanding texts by Kramer and Brown (2019) and Jacobson and Shepard (2016).

With a current population in the United States of approximately 327 million people, the National Institute on Deafness and Other Communication Disorders (NIDCD) reports that approximately 15%, or 37.5 million, American adults over the age of 18 have some trouble hearing, which is about the same as the total population of the state of California. It is estimated that 90% to 95% of these individuals can be helped

with hearing aids, which indicates that there are just under 30 million people for whom hearing aids would be of some benefit (NIDCD, 2016). As you can see in Table 23–1, hearing loss is a widespread problem in the general population. Individuals with hearing loss will have problems understanding speech, but in addition to this obvious problem, hearing loss is associated with other serious negative consequences such as academic difficulties, problems in the workplace, and psychosocial issues such as social isolation, depression, anxiety, loneliness, and lessened self-efficacy (Mueller, Ricketts, & Bentler, 2014). As audiologists, we need to identify these individuals and determine the type, degree, and configuration of their hearing loss so that we can assist them in developing effective communication.

Hearing Evaluation

According to the American Academy of Audiology, an audiologist provides services in the audiologic identification, assessment, diagnosis, and treatment of persons with impairment of auditory and vestibular function while in their roles as clinician, therapist, teacher, consultant, researcher, and administrator. They identify, assess, diagnose, and treat individuals with impairment of either peripheral or central auditory and/or vestibular function (AAA, 2004).

Table 23–1.  Quick Statistics About Hearing

Children
• Two to three out of every 1,000 children are born with a hearing loss in one or both ears
• More than 90% of deaf children are born to hearing parents
• Five out of six children have an ear infection by the time they are 3 years old
• At least 1.4 million children (18 or younger) have hearing problems

Adults
• Three in 10 people over age 60 years have hearing loss
• One in six baby boomers (ages 41–59 years), or 14.6%, have a hearing problem
• One in 14 Generation Xers (ages 29–40 years), or 7.4%, already have hearing loss
• 10% of adults (about 25 million) have experienced tinnitus

Treatment
• Only about 16% of adults with hearing loss use hearing aids
• 58,000 cochlear implants have been implanted in adults and 38,000 in children

Source:  From "Quick Statistics About Hearing," National Institute on Deafness and Other Communication Disorders (NIDCD), 2016. Retrieved from https://www.nidcd.nih.gov

Audiologists (or clinicians, as they are often called) are trained as both diagnosticians and habilitation/rehabilitation experts for the auditory system. They spend much of their time determining the type, degree, and configuration of any hearing loss they detect and determining what can be done to rehabilitate communication problems associated with the loss. In most cases, this makes audiologists heavily reliant on technology. Therefore, audiologists need to be technologically savvy to be competent in their job.

The assessment of an individual's hearing includes the administration and interpretation of behavioral, psychoacoustic, and electrophysiologic measures of the peripheral and central auditory systems. The assessment of an individual's vestibular system includes the administration and interpretation of behavioral and electrophysiologic tests of equilibrium. Both of these types of assessments are accomplished using standardized testing procedures and instrumentation in order to diagnose any abnormality in their hearing and/or vestibular systems (AAA, 2004).

Case History

It is important that, before you start your assessment, you obtain some information about the patient. This information, referred to as a case history, provides important clinical insight into the patient's primary complaints and symptoms. Through the case history, you will obtain answers to your questions about the extent of any hearing and communication problems, when the problem began, whether it has worsened, if it came on suddenly or gradually, and if the patient has associated dizziness and/or tinnitus (ringing in the ears). Additional topics you may explore in the case history include how family members perceive the patient's problem, any associated circumstances or activities that brought on the conditions, what medications the patient is taking, a family history of hearing problems, results of previous hearing tests, and any previous use of hearing aids. Based on the patient's answers to these questions and/or information from other sources, additional questions may be appropriate. With this information, you can begin to develop a clinical impression of the patient and his or her problem, which will help guide you to the next steps.

Otoscopy

After completing a case history, it is important to visually inspect the ear canal and tympanic membrane before attempting any audiologic assessment, especially ones that require placing an ear insert or ear probe into the canal. This technique is called otoscopy and requires you to place the speculum of the otoscope into the patient's ear canal. Figure 23–1 shows a photo of a standard otoscope with its handle, neck, head, and specula. The otoscope has a light source and magnifies the view down the ear canal. This ability to peer into a patient's ear canal can determine the status of the outer and middle ear by assessing the color, shape, and general appearance of the structures to see if they are normal. As shown in Figure 23–2, the clinician holds the otoscope with a pencil grip and uses proper bracing technique (i.e., other fingers are placed against the head) to support the insertion of the specula into the patient's ear. The opposite hand is used to grip the auricle (pinna), gently but firmly pulling up and back, which will straighten out the canal and provide better visualization.

Figure 23–1.  A standard otoscope used in visualizing the ear canal and tympanic membrane. Disposable specula are used to prevent the spread of germs from one patient to another.

Figure 23–2.  When viewing the tympanic membrane through a standard otoscope, it is important to use proper bracing technique where one hand pulls up and back on the auricle and the other holds the otoscope pencil-style using the other fingers to brace the otoscope against the patient. Courtesy of AudProf.com.

Using this bracing technique to hold the otoscope will avoid injury to the patient's ear canal, as the otoscope will move with the patient should he or she suddenly move the head. As you insert the speculum (plural, specula) into the ear canal, you can look through the otoscope to view the tympanic membrane at the far end of the canal (Figure 23–3). As you examine the ear canal and tympanic membrane, you are looking not only for excess cerumen (earwax) and foreign objects but also for diseases and disorders of the outer and middle ear.

Figure 23–3.  The view of a normal tympanic membrane through an otoscope.

Immittance

Immittance audiometry describes the sound energy that is transferred through the outer and middle ear systems.¹ If we apply a known sound (Chapter 21) to the ear, the acoustic and mechanical properties of the outer and middle ears (Chapter 22) provide opposition to the energy flow, which is referred to as impedance. A high impedance system (i.e., a middle ear that is filled with fluid, as with an ear infection) will have a greater opposition to the flow of energy than a low impedance system (i.e., a normal, healthy middle ear). The reciprocal of impedance is called admittance, which is the measure of how much of the applied energy flows through the middle ear system, so that a high admittance system (i.e., a normal, healthy middle ear) has a greater flow of energy. These concepts can be applied to the evaluation of the conductive part of the hearing mechanism (Chapter 22) by making different types of measurements, including tympanometry and acoustic equivalent volume of the ear canal. Figure 23–4 shows the basic components of an admittance instrument. To obtain a measure of admittance, the probe must include an air pressure pump, which allows the system to vary the pressure within the ear canal. The system also has a speaker, which produces an 85 dB sound pressure level (SPL) pure tone (usually at 226 Hz), called the probe tone. The tone is presented to the ear through a probe assembly placed at the entrance to the ear canal. A microphone, also a part of the probe assembly, is used to monitor the level of the probe tone in the ear canal. For infants younger than 6 months, conventional tympanometry with a 226 Hz probe tone is not a valid measure, and a higher probe-tone frequency (1000 Hz) is recommended.

¹Immittance is the overall term that includes both admittance and impedance. Admittance is the amount of energy that moves through the middle ear system, while impedance is the reciprocal of admittance. In other words, admittance values (Y) and impedance values (Z) are related: Y = 1/Z, or Z = 1/Y. Audiologists commonly use admittance when measuring middle ear function (Kramer & Brown, 2019).

Figure 23–4.  The key components of an admittance instrument or tympanometer. The air pressure pump is used to apply air pressure during tympanometry, the speaker sends the probe tone down the ear canal toward the tympanic membrane, and the microphone measures the intensity of the tone as it is reflected back from the tympanic membrane.

Tympanometry

Tympanometry is one of the most commonly used tests in the basic audiometric test battery. This test is used to measure how the admittance changes as a function of applied air pressure and how this function is affected by different conditions of the middle ear. The results of this test are displayed on a graph (Figure 23–5) called a tympanogram, where the admittance is on the y-axis (mmhos, a unit of admittance) and the pressure range is displayed on the x-axis (decaPascals or daPa, units of pressure). In conducting this test, the first step is to place the probe in the ear and obtain an airtight seal with an appropriate-sized rubber probe tip. This allows the air pressure to be manipulated by the air pressure pump. The canal is pressurized to +200 daPa while the probe tone is presented to the ear, as shown in Figure 23–4. The pressurization of the ear canal forces the tympanic membrane in the direction of the middle ear cavity (positive pressure), reducing the tympanic membrane's ability to vibrate (low admittance). The admittance is recorded at +200 daPa and plotted on the graph (see Figure 23–5). The pressure is then swept continuously from +200 to −200 daPa, and the admittance is recorded along the way. In a normal ear, maximum admittance is found at 0 daPa, where the air pressure is equal on either side of the tympanic membrane, allowing it to vibrate most effectively (high admittance). As the applied air pressure becomes negative, the admittance again decreases because the tympanic membrane does not vibrate as efficiently when the eardrum is pulled out with negative pressure. The pressure at which admittance is maximum is referred to as the tympanometric peak pressure (TPP).

Figure 23–5.  A tympanogram where the admittance is plotted across ear canal pressure. Note the effect of pressure on the tympanic membrane, where the positive pressure is drawn in red and the negative pressure in blue. Compare the position of the TM (on the left side) with the movement of the TM on the tympanogram (on the right side). In this example, the TPP is 0 daPa.
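To make the tympanometric peak concrete, the following sketch takes a hypothetical pressure sweep and picks out the TPP and the peak static admittance. The numbers, the simple subtraction of the +200 daPa admittance, and the variable names are illustrative assumptions rather than clinical norms from this chapter; the impedance conversion at the end simply applies the Y = 1/Z relationship from the footnote.

```python
# Illustrative sketch: estimating the tympanometric peak pressure (TPP) and peak
# static admittance from a recorded pressure sweep. All values are hypothetical.

pressures_daPa = [200, 150, 100, 50, 0, -50, -100, -150, -200]
admittance_mmho = [0.9, 1.0, 1.2, 1.6, 2.1, 1.7, 1.3, 1.1, 1.0]  # raw (uncompensated)

# The admittance at +200 daPa mostly reflects the ear canal itself (see the
# acoustic equivalent volume discussion that follows), so it is subtracted here
# to approximate the admittance of the middle ear system alone.
ear_canal_admittance = admittance_mmho[0]
compensated = [y - ear_canal_admittance for y in admittance_mmho]

peak_index = max(range(len(compensated)), key=lambda i: compensated[i])
tpp_daPa = pressures_daPa[peak_index]
peak_static_admittance = compensated[peak_index]
impedance_at_peak_ohms = 1 / (peak_static_admittance / 1000)  # Z = 1/Y, per the footnote

print(f"TPP: {tpp_daPa} daPa")                                        # 0 daPa
print(f"Peak static admittance: {peak_static_admittance:.1f} mmho")   # 1.2 mmho
print(f"Equivalent impedance at the peak: {impedance_at_peak_ohms:.0f} acoustic ohms")
```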

Acoustic Equivalent Volume of the Ear Canal

Ear canal volume is an important measure, as it will provide information about the outer and middle ear. The acoustic equivalent volume (Vea) is a measure of the physical volume of the ear canal, as estimated from the admittance obtained at +200 daPa. It can provide diagnostic information about the condition of the tympanic membrane and/or ear canal. For a normal ear canal and tympanic membrane, the admittance at +200 daPa should be within the normal range of ear canal volumes. If the tympanic membrane has a perforation or pressure equalization (PE) tube inserted, then the Vea will be larger than the normal range. This is because the volume estimate includes not only the volume of the ear canal but also the volume of the middle ear (and potentially the eustachian tube). If the Vea is smaller than the expected normal range, it may be an indication that the external ear canal is obstructed. For either of these abnormal Vea conditions, the tympanogram will not show any changes in admittance as the applied air pressure is varied and appears as a flat line.

Types of Tympanograms.  Tympanograms can be categorized based on the five types described by Jerger (1970). Although Jerger's classification scheme is widely used, it is a more useful approach to describe the actual characteristics of the tympanogram, such as a "flat tympanogram" or a "normal-shaped tympanogram with the peak admittance occurring at −150 daPa." The description of a normal tympanogram varies with a number of factors, including age. The different types of tympanograms are briefly described next. An abnormal tympanogram is a good indication of some middle ear involvement that affects the admittance characteristics of the middle ear. However, the tympanogram is not a predictor of the amount (if any) of conductive hearing loss.

Normal admittance (Type A) tympanogram has a characteristic peak shape, with normal compliance and the tympanometric peak pressure within the normal range (Figure 23–6). A normal Type A tympanogram occurs in normally functioning middle ears, and this patient will not have a conductive hearing loss.

Figure 23–6.  Type A or normal admittance tympanogram.

Flat (Type B) tympanogram does not have the characteristic peak shape (i.e., no TPP) seen for Type A, but instead appears to be relatively flat across the pressure range (Figure 23–7B). A flat tympanogram occurs with either an ear infection (fluid in the middle ear), a hole (perforation) in the tympanic membrane (i.e., larger than normal Vea), impacted cerumen, or when the probe is pushed against the ear canal wall (reduced Vea). Patients with a flat tympanogram will most likely have a conductive hearing loss.

Figure 23–7.  A. Type C negative pressure tympanogram. B. Type B or flat tympanogram.

A negative pressure (Type C) tympanogram has a characteristic peak with the same shape as a Type A; however, the TPP is shifted to a more negative pressure (Figure 23–7A). A negative pressure tympanogram indicates that the pressure in the middle ear space is not equal to atmospheric pressure. When the TPP is outside the normal range and a negative pressure tympanogram persists for an extended period of time, fluid can build up in the middle ear, at which point the tympanogram will change to flat (Type B). A patient with a negative pressure tympanogram usually does not have a conductive hearing loss.

A reduced admittance (Type As) tympanogram has a characteristic peak shape with the TPP in the normal range, as for Type A; however, the admittance is lower than the lower end of the normal range (Figure 23–8). This type of tympanogram is sometimes referred to as "shallow." A reduced tympanogram suggests reduced movement of the tympanic membrane, and the patient will have a conductive loss on the audiogram.

A high admittance (Type Ad) tympanogram has a characteristic peak shape with the TPP in the normal range, as for Type A; however, the admittance is higher than the upper end of the normal range (Figure 23–8). A high admittance tympanogram suggests a highly mobile tympanic membrane, which may be seen in some cases of disarticulation of the ossicular chain or in cases of a tympanic membrane thinned by previous middle ear infections. These high admittance (Type Ad) tympanograms are suggestive of a disarticulation (break) of the ossicular chain, and the patient will usually have a conductive hearing loss.
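Taken together, the five tympanogram types amount to a small set of decision rules applied to the measured values: whether there is a peak, the peak admittance, the TPP, and the Vea. The Python sketch below illustrates one way those rules could be written down. It is only an illustration: the numeric cutoffs are hypothetical placeholders rather than clinical norms (which vary with age and probe tone), and the class and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative cutoffs only; these are NOT clinical norms. Actual normative
# ranges vary with age, probe tone, and the published source being followed.
PEAK_ADMITTANCE_NORMAL = (0.3, 1.4)   # assumed peak admittance range (mmho)
TPP_NORMAL_LOW = -100.0               # assumed lower limit of normal TPP (daPa)
VEA_NORMAL = (0.6, 1.5)               # assumed ear canal volume range (cm^3)

@dataclass
class Tympanogram:
    peak_admittance: Optional[float]  # mmho; None if the tracing has no peak
    tpp: Optional[float]              # daPa at the peak; None if the tracing is flat
    vea: float                        # equivalent ear canal volume (cm^3)

def jerger_type(t: Tympanogram) -> str:
    """Label a tympanogram with a Jerger-style type, mirroring the text's rules."""
    if t.peak_admittance is None or t.tpp is None:
        # Flat (Type B): no peak; the Vea helps separate the likely causes.
        if t.vea > VEA_NORMAL[1]:
            return "Type B (flat, large Vea): consistent with a TM perforation"
        if t.vea < VEA_NORMAL[0]:
            return "Type B (flat, small Vea): occluded canal or probe against the canal wall"
        return "Type B (flat, normal Vea): consistent with middle ear fluid"
    if t.tpp < TPP_NORMAL_LOW:
        return "Type C: negative middle ear pressure"
    if t.peak_admittance < PEAK_ADMITTANCE_NORMAL[0]:
        return "Type As: reduced admittance (stiff middle ear system)"
    if t.peak_admittance > PEAK_ADMITTANCE_NORMAL[1]:
        return "Type Ad: high admittance (hyperflaccid TM or ossicular disarticulation)"
    return "Type A: normal admittance and TPP"

print(jerger_type(Tympanogram(0.7, -10.0, 1.2)))    # Type A
print(jerger_type(Tympanogram(0.6, -220.0, 1.1)))   # Type C
print(jerger_type(Tympanogram(None, None, 3.5)))    # Type B, large Vea
```

In practice, as noted above, clinicians describe the tracing itself (peak height, TPP, Vea) rather than relying on the type label alone.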

Acoustic Reflex Threshold

This section concerns the ear's involuntary middle ear reflex, in which the stapedius muscles (Chapter 22) contract in response to a loud sound. The acoustic reflex threshold (ART) test, as it is known, uses the same immittance instrument as tympanometry and is usually performed immediately after obtaining a tympanogram. The acoustic reflex is a bilateral response: when a loud tone is delivered to one ear, the stapedius muscle contracts in both ears as a result of firing of the seventh cranial (facial) nerve. This contraction of the stapedius muscle changes the transmission efficiency of the sound energy as it travels through the ossicular chain, decreasing the admittance of the probe tone. Figure 23–9 provides a simplified diagram of the acoustic reflex pathway, illustrating the main ART pathways and key structures. The ART is defined as the lowest intensity level (in 5 dB steps) of a reflex-eliciting tone that produces a repeatable acoustic reflex. The test also takes into account pathologies of the outer and middle ear as well as abnormalities of the cochlea, eighth cranial nerve, lower brainstem, and/or seventh cranial nerve, as these can influence the ability to record an acoustic reflex. The goal is to monitor any change in the admittance of the probe tone that occurs when the stapedius muscle contracts in response to a loud tone presented to the ear.

Figure 23–8.  Type Ad or hypermobile tympanogram, shown with a normal ear (Type A) and a stiff ear (Type As) for comparison (admittance versus pressure in daPa).


Figure 23–9.  A simplified diagram of the acoustic reflex pathway showing the ipsilateral reflex arcs for the right ear in red and for the left ear in blue. Labeled structures include the middle ear, inner ear, CN VIII nerve, SOC and CN VII nucleus, and CN VII nerve.

The probe tone is a hum-like sound (226 Hz) that plays constantly in the ear; the clinician then presents a second tone to elicit the reflex. If the acoustic reflex is triggered, the stapedius muscle contracts and there is an abrupt reduction in the admittance of the probe tone. In an individual with normal hearing, the ART typically occurs between 75 and 95 decibels hearing level (dB HL) (Wiley, Oviatt, & Block, 1987), as shown in Figure 23–10. In this example, you can see that at the beginning, the level of the reflex-eliciting tone is below the stapedius reflex threshold, and there is no measurable change in admittance. As the reflex-eliciting tone is increased in intensity (level), it eventually becomes loud enough to cause the stapedius to contract and the admittance to decrease. As the intensity of the reflex-eliciting tone increases above the ART, there is a range in which the stapedius contraction strengthens and the size (amplitude) of the downward deflection of the acoustic reflex increases with increasing dB HL.
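A minimal sketch of how the ART might be read off a series of presentations follows, assuming the 5 dB steps described above and the ≥0.02 mL deflection criterion stated in the caption of Figure 23–10. The data values and function name are illustrative only, and the sketch ignores the clinical requirement that the reflex be repeatable.

```python
from typing import Dict, Optional

DEFLECTION_CRITERION_ML = 0.02  # minimum admittance decrease counted as a reflex

def acoustic_reflex_threshold(deflections: Dict[int, float]) -> Optional[int]:
    """Return the lowest presentation level (dB HL) whose downward admittance
    deflection meets the criterion, or None if no reflex is observed."""
    for level in sorted(deflections):   # levels tested in ascending 5 dB steps
        if deflections[level] >= DEFLECTION_CRITERION_ML:
            return level
    return None

# Hypothetical deflection sizes (mL) patterned after the example in Figure 23-10.
measures = {80: 0.00, 85: 0.01, 90: 0.04, 95: 0.08}
print(acoustic_reflex_threshold(measures))  # -> 90
```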

Audiometric Testing

Audiometric testing has been the mainstay of the audiologist's armamentarium of diagnostic tests since audiology's modern beginnings in the mid-1900s (Jerger, 2009). The standard battery of audiometric testing includes both pure-tone audiometry and speech audiometry.

Figure 23–10.  An acoustic reflex measure illustrating different levels of the reflex-eliciting tone. The acoustic reflex threshold is defined as the lowest level of the reflex-eliciting tone that produces a downward deflection (reduced admittance) ≥0.02 mL. In this example, the acoustic reflex threshold is 90 dB HL. Reproduced with permission from Audiology: Science to Practice (3rd ed., p. 232) by S. Kramer and D. K. Brown, 2019, San Diego, CA: Plural Publishing, Inc. Copyright 2019 by Plural Publishing, Inc.


When assessing a cooperative patient for a hearing concern, pure-tone audiometry is part of the basic audiologic assessment. It should be pointed out that audiologists use a test battery approach rather than relying on a single test to determine a person's ability to hear, because the auditory system is complex and many of its parts can be assessed independently. Using the cross-check principle2 along with the battery of tests, we are able to assess a person's hearing and determine the type, degree, and configuration of the hearing loss with confidence.

Pure-Tone Audiometry

Pure-tone audiometry is the heart of the standard test battery and involves finding the lowest intensity across the frequency range that a person is just able to hear. The lowest intensity for a particular tone that a person can reliably respond to at least 50% of the time is called his or her threshold for that frequency. In pure-tone audiometry, thresholds are obtained in a quiet environment for a range of frequencies between 250 and 8000 Hz. This range is important because it contains the frequencies that are most relevant for speech sounds. The pattern of thresholds across the frequency spectrum is often characteristic of certain types of hearing loss. For example, high frequencies are more affected than low frequencies in persons with hearing loss due to noise exposure. Pure-tone thresholds can also be used to explain how a patient's hearing loss relates to his or her ability to hear different speech sounds. Consonants such as "f," "s," and "th" have higher-frequency components than vowels such as "a" or "o," which means that a person with a high-frequency hearing loss will have more trouble hearing these consonants (and others as well) than hearing the vowels.

This simple test of determining a person's threshold may be quite easy to accomplish with a cooperative adult, but it may require considerable skill and experience to recognize and adapt to different response abilities and patterns when testing an 8-month-old child or an elderly patient with dementia. It is the audiologist's responsibility to integrate the pure-tone results with other test findings and make appropriate interpretations, impressions, and recommendations for management of the hearing loss.
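As a worked illustration of the threshold definition above (the lowest level with at least a 50% response rate), the following sketch applies that rule to hypothetical yes/no responses. It is not the clinical search procedure an audiologist actually follows; the data and names are invented for the example.

```python
from typing import Dict, List, Optional

def pure_tone_threshold(responses: Dict[int, List[bool]]) -> Optional[int]:
    """Return the lowest level (dB HL) with a response rate of at least 50%."""
    for level in sorted(responses):                      # ascending levels
        trials = responses[level]
        if trials and sum(trials) / len(trials) >= 0.5:  # >= 50% of presentations
            return level
    return None  # no reliable response at any tested level

# Hypothetical responses to a 1000 Hz tone: 0/3 at 20 dB HL, 1/3 at 25, 2/3 at 30.
example = {20: [False, False, False], 25: [True, False, False], 30: [True, True, False]}
print(pure_tone_threshold(example))  # -> 30
```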


Pure-tone audiometric thresholds are used by audiologists to (a) describe the amount of the patient's hearing loss, (b) determine which parts of the auditory system are involved, (c) determine whether a medical referral is needed, and (d) predict how the patient's hearing loss may relate to his or her ability to listen and communicate.

To measure a pure-tone threshold, the audiologist uses an instrument called an audiometer to produce the variety of stimuli needed for the test. The audiometer creates pure tones (Chapter 21) from 125 to 8000 Hz (in octave or half-octave steps) at a range of intensities, delivered through transducers such as an earphone, an ear insert, a bone vibrator, or speakers. It can also produce noises such as speech noise or narrow-band noise, which are used as maskers to keep the nontest ear busy while determining the threshold for the test ear. Pure-tone audiometry is first completed using earphones or ear inserts, meaning that the sound is transmitted down the ear canal and through the middle ear. This pathway is referred to as air conduction, because the initial sound travels through air before it is converted to mechanical energy by the tympanic membrane and passed along to the cochlea. The other pathway is referred to as bone conduction and requires the use of a bone oscillator, which vibrates the skull to transmit the sound directly to the cochlea, bypassing the outer and middle ear. A difference in thresholds between air and bone conduction results in an air-bone gap, and gaps greater than 20 dB are considered to indicate a conductive hearing loss (discussed later in this chapter).

The next step is to plot each threshold on an audiogram to record the results. The audiogram, as shown in Figure 23–11, is a graphical description of a person's hearing, with frequency (in Hz) on the x-axis and threshold (in hearing level, dB HL)3 on the y-axis. The legend, or audiogram key, indicates the various symbols used to record the results. Once you have determined the patient's threshold for a particular frequency, you plot it on the audiogram according to the frequency and intensity of the stimulus; the degree or amount of loss is based on the person's thresholds. Once the thresholds are plotted on the audiogram, we can calculate the pure-tone average (PTA), which is simply the average of the thresholds for 500, 1000, and 2000 Hz.
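The two calculations mentioned in this passage, the PTA and the air-bone gap, can be illustrated with a short sketch. The thresholds below are hypothetical example values, the 20 dB cutoff for flagging a conductive component follows the criterion stated above, and the function names are invented for the example.

```python
from typing import Dict

PTA_FREQUENCIES = (500, 1000, 2000)  # Hz

def pure_tone_average(air_thresholds: Dict[int, int]) -> float:
    """Average of the air-conduction thresholds (dB HL) at 500, 1000, and 2000 Hz."""
    return sum(air_thresholds[f] for f in PTA_FREQUENCIES) / len(PTA_FREQUENCIES)

def air_bone_gaps(air: Dict[int, int], bone: Dict[int, int]) -> Dict[int, int]:
    """Air-bone gap (in dB) at each frequency where both thresholds were measured."""
    return {f: air[f] - bone[f] for f in air if f in bone}

# Hypothetical right-ear thresholds (dB HL).
air = {250: 40, 500: 45, 1000: 45, 2000: 50, 4000: 55, 8000: 60}
bone = {500: 15, 1000: 20, 2000: 25}

print(f"PTA: {pure_tone_average(air):.1f} dB HL")  # -> 46.7 dB HL
for freq, gap in sorted(air_bone_gaps(air, bone).items()):
    note = "gap > 20 dB (conductive component)" if gap > 20 else "gap within 20 dB"
    print(f"{freq} Hz: air-bone gap {gap} dB ({note})")
```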

2 The cross-check principle, first suggested by Dr. James Jerger (1976), is the checking of the results of a single test against the results of another independent test. With this principle, we can compare results from a number of tests to determine whether the outcome is supported. For example, we can compare the results of the pure-tone audiogram with the results of tympanometry.

3 The dB hearing level (HL) scale is used on the audiogram; 0 dB HL at any frequency represents the lowest level for normal hearing.


Figure 23–11.  An audiogram, with frequency (Hz, 250 to 8000) on the x-axis and hearing level (dB HL) on the y-axis. The audiogram key lists symbols for AC unmasked (O for the right ear, X for the left ear), AC masked, BC unmasked, BC masked, no response, and sound-field.