The Oxford Handbook of Music and the Brain 0198804121, 9780198804123

The study of music and the brain can be traced back to the work of Gall in the 18th century, continuing with John Hughlings Jackson.


English | 848 pages | 2019


Table of contents :
Cover
The Oxford Handbook of Music and the Brain
Copyright
Table of Contents
List of Contributors
Section I: INTRODUCTION
Chapter 1: The Neuroscientific Study of Music: A Burgeoning Discipline
Introduction
Vignettes from a Burgeoning Discipline
Chapter Overviews
II. Music, the Brain, and Cultural Contexts
III. Music Processing in the Human Brain
IV. Neural Responses to Music
V. Musicianship and Brain Function
VI. Developmental Issues in Music and the Brain
VII. Music, the Brain, and Health
VIII. The Future
References
Section II: MUSIC, THE BRAIN, AND CULTURAL CONTEXTS
Chapter 2: Music through the Lens of Cultural Neuroscience
Introduction to Cultural Neuroscience
Cultural Practices Adapt to Neural Constraints
The Brain Adapts to Cultural Practice
Genetic Influences on Musical Behavior
Neural Plasticity
Neural Pruning
Myelination
Cultural Influences on Innate Infant Responses to Music
The Search for Music Universals
Cross-Cultural Music Research
Conclusion
References
Chapter 3: Cultural Distance: A Computational Approach to Exploring Cultural Influences on Music Cognition
Introduction
Related Literature
Cross-Cultural Explorations of Emotion
Cross-Cultural Explorations of Music Preference
Cross-Cultural Explorations of Musical Structure
Scale and Key Perception
Rhythm and Meter Perception
Phrasing and Form
Cross-Cultural Explorations of Music Memory
Cultural Distance
Limitations
Conclusion
References
Chapter 4: When Extravagance Impresses: Recasting Esthetics in Evolutionary Terms
Introduction
Squandering as Asset, I: The Evolutionary Logic
Squandering as Asset, II: A Bulwark against Bluff
Squandering as Asset, III: Surrendering to Mastery
Transposing to the Human Case
The Psychological Impact of Music: Neither “Meaning” nor “Emotion”
Conclusion
References
Section III: MUSIC PROCESSING IN THE HUMAN BRAIN
Chapter 5: Cerebral Organization of Music Processing
Introduction
Neural Basis of Music Processing in the Healthy Brain
The Ascending Auditory Pathways
Auditory-Frontal Networks
Auditory-Motor Networks
Cortico-Cerebellar Network
Basal Ganglia-Thalamo-Cortical Network
Auditory-Limbic Networks
Brain Network Interactions
Summary
References
Chapter 6: Network Neuroscience: An Introduction to Graph Theory Network-Based Techniques for Music and Brain Imaging Research
Introduction
Overview of Network Science
Introduction to Network Metrics
Generating Brain Networks: Steps for Network-Based Neuroimaging Analysis
Implications for Music and Brain Research
References
Chapter 7: Acoustic Structure and Musical Function: Musical Notes Informing Auditory Research
Introduction and Overview
Grouping Notes: Deconstructing Chords and Harmonies
Grouping Harmonics: Deconstructing Individual Notes
Acoustic Structure and Musical Timbre
The Use of Temporally Varying Sounds in Music Perception Research
On the Methodological Convenience of Simplified Sounds
Conclusions
Acknowledgments
References
Chapter 8: Neural Basis of Rhythm Perception
Introduction
Feeling the Beat
Oscillatory Mechanisms
Language and Music
Development of Rhythm
Comparative Psychology and Evolution of Rhythm Perception
Cross-Modal Investigations of Rhythm Perception
Individual Differences and Musical Training
Mirroring and Joint Action
Conclusion
References
Chapter 9: Neural Basis of Music Perception: Melody, Harmony, and Timbre
Introduction
We Do Not Only Hear with Our Cochlea
Auditory Feature Extraction in Brainstem and Thalamus
Acoustical Equivalency of “Timbre” and “Phoneme”
Auditory Feature Extraction in the Auditory Cortex
Echoic Memory and Gestalt Formation
Musical Expectancy Formation: Processing of Local Dependencies
Musical Structure Building: Processing of Nonlocal Dependencies
Concluding Remark
References
Chapter 10: Multisensory Processing in Music
Introduction
Multisensory Processing
Auditory Processing
Pathways
Multisensory Perception of Pitch
Visuomotor Influences
Somatosensory Influences
Multisensory Perception of Timbre
Visuomotor Influences
Somatosensory Influences
Multisensory Perception of Rhythm
Visuomotor Influences
Somatosensory Influences
Movement-Based Influences
Summary and Conclusions
Acknowledgments
References
Section IV: NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE
Chapter 11: Music and Memory
Introduction
Human Memory in General
Memory Processes during Music Listening
Tone Memory
Tone Interval Memory
Tonal Working Memory
Behavioral Findings of Tonal Working Memory
Neuroanatomical Correlates of Working Memory
Memory for Music
Intrinsic Features of Musical Pieces
Emotion and Arousal Induced by Music
Individual Schemas and Music Structure
Brain Activation during Encoding and Retrieval of Music
Music as Memory Enhancer
Musical Proficiency and Memory
Influence of Background Music on Learning and Recall
Music as Memory Modulator in Healthy Subjects
Music as Memory Modulator in Neurological Patients
A Model for Music Memory
References
Chapter 12: Music and Attention, Executive Function, and Creativity
Introduction
Music and Attention
Theories of Attention
Selection and Filtering
Attending to Musical Pitch and Harmonicity
Temporal Attention, Prediction, and Entrainment of Musical Stimuli
Music and Executive Function
Association Studies Suggesting Near Transfer
Far Transfer
Longitudinal Studies on Far Transfer
Behavioral Changes and Neural Mechanisms
Negative Findings
Conclusions and Implications
Music and Creativity
Musical Improvisation as a Model of Creativity
Neuroimaging Studies of Music and Creativity
Data-Driven Correlates of Creativity
Personality and Cognitive Profiles of Creative Musicians
Conclusions
References
Chapter 13: Neural Correlates of Music and Emotion
Introduction
Musical Affect: Definitions and Distinctions
Psychological Mechanisms: A Theoretical Framework
Emotion Perception
Emotion Induction
Review of Empirical Studies
General Overview
Empirical Approaches
Summary of Brain Imaging Data
Perception of Emotions
Induction of Emotions
Towards a More Principled Approach
Concluding Remarks: A Field in Need of an Agenda?
References
Chapter 14: Neurochemical Responses to Music
Introduction
Dopamine Systems
Endogenous Opioid Systems
Serotonin Systems
Neuroendocrine Systems I (Posterior Pituitary)
Oxytocin
Arginine Vasopressin
Neuroendocrine Systems II (Anterior Pituitary)
Norepinephrine Systems
Peripheral Immune System
Immune Cells
Cytokines
Immunoglobulin A
Cholinergic Systems
Discussion and Future Directions
References
Chapter 15: The Neuroaesthetics of Music: A Research Agenda Coming of Age
Introduction
The Need to Study the Neural Determinants of Art
Neuroaesthetics: A Research Agenda also for Music
Existing Models of the Musical Aesthetic Experience
Main Brain Structures Related to Aesthetic Responses to Music
Present Advances: Neural Interactions for Music Aesthetic Responses
Future Challenges and Promises
References
Chapter 16: Music and Language
Introduction
On Modularity of Music and Language
Common Functions and Operations in Music and Language
Overlap and Resource Sharing
Music Training and Language Skills
Bridging Music and Language
A Temporal Focus into Music and Language
References
Section V: MUSICIANSHIP AND BRAIN FUNCTION
Chapter 17: Musical Expertise and Brain Structure: The Causes and Consequences of Training
Introduction
Structural Brain Differences in Adult Musicians
Developmental Impacts on Training-Related Plasticity
The Interaction between Development and Training
Aptitude and Short-Term Training
Bringing It All Together
Why Is Music Such an Effective Driver of Brain Plasticity?
Where Do We Go From Here?
References
Chapter 18: Genomics Approaches for Studying Musical Aptitude and Related Traits
Genomic Approaches to Study Human Traits
Musical Aptitude as a Biological Trait
Evolution of Musical Aptitude
Genome-Wide Linkage and Association Analyses of Musical Traits
The Effect of Music Perception and Performance on Human Transcriptome
Convergent Analysis
Biological Background of Creative Activities in Music
Conclusion
References
Chapter 19: Brain Research in Music Performance
Introduction
Performing Music as a Driver of Brain Plasticity
Brain Regions Involved in Performing Music: A Quick Overview
The Effects of Musical Training on Brain Function
The Effects of Musical Training on Brain Structure
De-Expertise: Musician’s Dystonia as a Syndrome of Maladaptive Plasticity
Brain Changes Associated with Loss of Sensorimotor Control
Brain Plasticity as Prerequisite and Result of Expert Performance in Musicians
Acknowledgments
References
Chapter 20: Brain Research in Music Improvisation
Introduction
Functional Magnetic Resonance Imaging (fMRI)
Cortical Regions Involved in the Generation of Musical Structures during Improvisation in Pianists (Bengtsson, Csikszentmihalyi, & Ullén, 2007)
Design
Results
Conclusions/Highlighted Discussion
Generation of Novel Motor Sequences: The Neural Correlates of Musical Improvisation (Berkowitz & Ansari, 2008)
Design
Results
Conclusions/Highlighted Discussion
Neural Substrates of Spontaneous Musical Performance: An fMRI Study of Jazz Improvisation (Limb & Braun, 2008)
Design
Results
Conclusions/Highlighted Discussion
Expertise-Related Deactivation of the Right Temporoparietal Junction during Musical Improvisation (Berkowitz & Ansari, 2010)
Design
Results
Conclusions/Highlighted Discussion
Goal-Independent Mechanisms for Free Response Generation: Creative and Pseudo-Random Performance Share Neural Substrates (de Manzano & Ullén, 2012b)
Design
Results
Conclusions/Highlighted Discussion
Activation and Connectivity Patterns of the Presupplementary and Dorsal Premotor Areas during Free Improvisation of Melodies and Rhythms (de Manzano & Ullén, 2012a)
Design
Results
Conclusions/Highlighted Discussion
Neural Correlates of Lyrical Improvisation: An fMRI Study of Freestyle Rap (Liu et al., 2012)
Design
Results
Conclusions/Highlighted Discussion
Neural Correlates of Musical Creativity: Differences between High and Low Creative Subjects (Villarreal et al., 2013)
Design
Results
Conclusions/Highlighted Discussion
Connecting to Create: Expertise in Musical Improvisation Is Associated with Increased Functional Connectivity between Premotor and Prefrontal Areas (Pinho, de Manzano, Fransson, Eriksson, & Ullén, 2014)
Design
Results
Conclusions/Highlighted Discussion
Addressing a Paradox: Dual Strategies for Creative Performance in Introspective and Extrospective Networks (Pinho, Ullén, Castelo-Branco, Fransson, & de Manzano, 2016)
Design
Results
Conclusions/Highlighted Discussion
Neural Substrates of Interactive Musical Improvisation: An fMRI Study of “Trading Fours” in Jazz (Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014)
Design
Results
Conclusions/Highlighted Discussion
Emotional Intent Modulates the Neural Substrates of Creativity: An fMRI Study of Emotionally Targeted Improvisation in Jazz Musicians (McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016)
Design
Results
Conclusions/Highlighted Discussion
Positron Emission Tomography (PET)
Music and Language Side by Side in the Brain: A PET Study of the Generation of Melodies and Sentences (Brown, Martinez, & Parsons, 2006)
Design
Results
Conclusions/Highlighted Discussion
Transcranial Direct Current Stimulation (tDCS)
Anodal tDCS to Right Dorsolateral Prefrontal Cortex Facilitates Performance for Novice Jazz Improvisers but Hinders Experts (Rosen et al., 2016)
Design
Results
Conclusion/Highlighted Discussion
Electroencephalography (EEG)
The Brain Network Underpinning Novel Melody Creation (Adhikari et al., 2016)
Design
Results
Conclusions/Highlighted Discussion
Creativity as a Distinct Trainable Mental State: An EEG Study of Musical Improvisation (Lopata, Nowicki, & Joanisse, 2017)
Design
Results
Conclusions/Highlighted Discussion
Summary and Discussion
Attentional Networks and the Prefrontal Cortex
Motor Regions
Limbic/Affective Processing
Language Areas
Sensory Processing
Heteromodal Sensory Processing and the Parietal Lobes
Conclusion
References
Chapter 21: Neural Mechanisms of Musical Imagery
Introduction
Imagery and Perception of Music
Behavioral and Psychophysical
Brain Damage
Physiological Measures
Electrophysiology
Brain Imaging
Involuntary Musical Imagery
Anticipatory Musical Imagery
Musical Hallucinations
Schizophrenia
Earworms
Synesthesia
Embodied Musical Imagery
Spatial and Force Metaphors
Mimicry
Inner Ear and Inner Voice
Mental Practice and Performance
Dance
Musical Affect
Summary and Conclusions
References
Chapter 22: Neuroplasticity in Music Learning
Introduction
Electrophysiological Markers of Enhanced Cortical Sound Processing in Musicians
Sound Encoding along the Auditory Pathway
Musical Training and the Developing Brain
Transfer beyond Sound Processing: Does Music Improve Executive Functions?
Predispositions in Musical Abilities and their Neural Correlates
Conclusions and Future Directions
References
Section VI: DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN
Chapter 23: The Role of Musical Development in Early Language Acquisition
Introduction
The Music of Speech
Music and Language Perception Abilities at Birth
Co-development of Music and Language Perception from 6 to 12 Months of Age
Co-development of Music and Language in Childhood
Is the Newborn Brain Specialized for Language Perception?
Linked Developmental Disorders
Entanglement of Music and Language in Adults
Conclusion
References
Chapter 24: Rhythm, Meter, and Timing: The Heartbeat of Musical Development
Introduction
Early Perception of Beat, Meter, and Rhythm
Timing and Rhythm in Infant-Directed Singing
Rhythm, Prediction, Neural Oscillations, and Deviations from Regularity
Timing and Musical Emotion
Development of Auditory–Motor Entrainment and Social Effects of Synchronous Movement
Applications to Developmental Disorders
Conclusion
Acknowledgments
References
Chapter 25: Music and the Aging Brain
Introduction
Normal Aging: Music for Cognitive Preservation and Well-Being
Cognition
The Effect of Musical Expertise
The Effect of Short-Term Musical Training
The Effect of Passive Musical Exposure
Emotions and Well-Being
The Underpinning Brain Mechanisms
Pathological Aging: Neuroscience-Informed Training and Rehabilitation with Music
Memory
Language
Motor Functions
Emotions and Well-Being
Conclusion and Future Directions
References
Chapter 26: Music Training and Cognitive Abilities: Associations, Causes, and Consequences
Introduction
A Review of Existing Evidence
Music Training and General Cognitive Abilities
Associations with Specific Cognitive Abilities
Associations with Visuospatial Skills
Associations with Language Abilities
Music Training and Cognitive Performance in Real-World Contexts
Music Training and Academic Achievement
Music Training and Healthy Aging
One Way Forward: Measuring Music Aptitude and Music Training
Conclusion
Acknowledgments
References
Chapter 27: The Neuroscience of Children on the Autism Spectrum with Exceptional Musical Abilities
Introduction
Gaver’s Ecological Theory of Listening and Autism
Zygonic Theory
Towards a New Developmental Model of Auditory Perception
The Development of Auditory Perception in Some Children on the Autism Spectrum
Absolute Pitch
The Impact of AP on Musical Engagement of Children and Young People Who Are on the Autism Spectrum
Derek
Romy
Freddie
Conclusion: The Neuroscience of Autistic Children with Exceptional Musical Abilities
References
Section VII: MUSIC, THE BRAIN, AND HEALTH
Chapter 28: Neurologic Music Therapy in Sensorimotor Rehabilitation
Introduction
Neurologic Music Therapy
Acquired Motor Dysfunction
Degenerative Movement Disorders
Parkinsonian Syndromes
Huntington’s Disease
Parkinsonism
Multiple Sclerosis
Healthy Elderly
Developmental Disorders
Autism Spectrum Disorder
Cerebral Palsy
Summary and Conclusion
References
Chapter 29: Neurologic Music Therapy for Speech and Language Rehabilitation
Introduction
Dysarthria
Apraxia of Speech
Aphasia
Fluency
Sensory Deficits
Voice Disorders
Dyslexia
Conclusion
References
Chapter 30: Neurologic Music Therapy Targeting Cognitive and Affective Functions
Cognitive and Affective Functions: Central to Functional Recovery
Cognitive Remediation: Treatment Method to Improve Cognitive Functions
Music and Brain Plasticity
Music Therapy: From a Social Science Model to a Neuroscience Model
An Overview of Neurologic Music Therapy and its Two Explanatory Models
Rational Scientific Mediating Model (RSMM)
The Transformational Design Model (TDM)
Techniques of NMT Targeting Cognition and Emotion
Scientific Basis of NMT Techniques Targeting Cognition, Emotion, and Psychosocial Functioning
Summary and Future Directions
References
Chapter 31: Musical Disorders
Introduction
Pitch-Based Amusia
Prevalence and Behavioral Markers
Identification of Congenital Amusia
Neurological Markers
Contribution of the Right Frontotemporal Network
Beat Finding Disorder
Acquired Amusia
Acquired Pitch Perception Deficits
Acquired Time Perception Deficits
Acquired Amusia: Larger Cohort Studies
Acquired Amusia and its Comorbidities
Musical Anhedonia
What Have We Learned from Studying Amusia?
What Does Congenital Amusia Have in Common with Other Developmental Disorders?
Different Phenotypes of Amusia
Improving Perceptual Outcomes in Amusia
References
Chapter 32: When Blue Turns to Gray: The Enigma of Musician’s Dystonia
Introduction
Dystonia
Musician’s Dystonia
Essential Characteristics of Musician’s Dystonia
Assessment
The Importance of Rating Scales
Rating Scales for Musician’s Dystonia
Rating Scale Deficiencies
Treatment
Treating Musician’s Dystonia
Physical Medicine and Rehabilitation
Pathophysiology
Pathophysiology versus Pathogenesis
Classic Concepts on Focal Dystonia Pathophysiology
Dystonia and Disordered Timing
Plasticity
Pathogenic Theory
Biological Predisposition
Use Patterns
Future Directions
Better Assessment
New Treatments
Research on Pathophysiology
Research on Pathogenic Mechanisms
Acknowledgments
References
Section VIII: THE FUTURE
Chapter 33: New Horizons for Brain Research in Music
Introduction: The Flaws of Predictions
Ten New Horizons for Future Research
Coda
References
Index


OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

The Oxford Handbook of

MUSIC AND THE BRAIN


The Oxford Handbook of
MUSIC AND THE BRAIN

Edited by
MICHAEL H. THAUT and DONALD A. HODGES


Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© Oxford University Press 2019

The moral rights of the authors have been asserted

First Edition published in 2019
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2019943710

ISBN 978–0–19–880412–3

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.


Table of Contents

List of Contributors  ix

SECTION I: INTRODUCTION
1. The Neuroscientific Study of Music: A Burgeoning Discipline  3
   Donald A. Hodges and Michael H. Thaut

SECTION II: MUSIC, THE BRAIN, AND CULTURAL CONTEXTS
2. Music Through the Lens of Cultural Neuroscience  19
   Donald A. Hodges
3. Cultural Distance: A Computational Approach to Exploring Cultural Influences on Music Cognition  42
   Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce
4. When Extravagance Impresses: Recasting Esthetics in Evolutionary Terms  66
   Bjorn Merker

SECTION III: MUSIC PROCESSING IN THE HUMAN BRAIN
5. Cerebral Organization of Music Processing  89
   Thenille Braun Janzen and Michael H. Thaut
6. Network Neuroscience: An Introduction to Graph Theory Network-Based Techniques for Music and Brain Imaging Research  122
   Robin W. Wilkins
7. Acoustic Structure and Musical Function: Musical Notes Informing Auditory Research  145
   Michael Schutz
8. Neural Basis of Rhythm Perception  165
   Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn
9. Neural Basis of Music Perception: Melody, Harmony, and Timbre  187
   Stefan Koelsch
10. Multisensory Processing in Music  212
   Frank Russo

SECTION IV: NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE
11. Music and Memory  237
   Lutz Jäncke
12. Music and Attention, Executive Function, and Creativity  263
   Psyche Loui and Rachel E. Guetta
13. Neural Correlates of Music and Emotion  285
   Patrik N. Juslin and Laura S. Sakka
14. Neurochemical Responses to Music  333
   Yuko Koshimori
15. The Neuroaesthetics of Music: A Research Agenda Coming of Age  364
   Elvira Brattico
16. Music and Language  391
   Daniele Schön and Benjamin Morillon

SECTION V: MUSICIANSHIP AND BRAIN FUNCTION
17. Musical Expertise and Brain Structure: The Causes and Consequences of Training  419
   Virginia B. Penhune
18. Genomics Approaches for Studying Musical Aptitude and Related Traits  439
   Irma Järvelä
19. Brain Research in Music Performance  459
   Eckart Altenmüller, Shinichi Furuya, Daniel S. Scholz, and Christos I. Ioannou
20. Brain Research in Music Improvisation  487
   Michael G. Erkkinen and Aaron L. Berkowitz
21. Neural Mechanisms of Musical Imagery  521
   Timothy L. Hubbard
22. Neuroplasticity in Music Learning  546
   Vesa Putkinen and Mari Tervaniemi

SECTION VI: DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN
23. The Role of Musical Development in Early Language Acquisition  567
   Anthony Brandt, Molly Gebrian, and L. Robert Slevc
24. Rhythm, Meter, and Timing: The Heartbeat of Musical Development  592
   Laurel J. Trainor and Susan Marsh-Rollo
25. Music and the Aging Brain  623
   Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann
26. Music Training and Cognitive Abilities: Associations, Causes, and Consequences  645
   Swathi Swaminathan and E. Glenn Schellenberg
27. The Neuroscience of Children on the Autism Spectrum with Exceptional Musical Abilities  671
   Adam Ockelford

SECTION VII: MUSIC, THE BRAIN, AND HEALTH
28. Neurologic Music Therapy in Sensorimotor Rehabilitation  695
   Corene Thaut and Klaus Martin Stephan
29. Neurologic Music Therapy for Speech and Language Rehabilitation  715
   Yune S. Lee, Corene Thaut, and Charlene Santoni
30. Neurologic Music Therapy Targeting Cognitive and Affective Functions  738
   Shantala Hegde
31. Musical Disorders  760
   Isabelle Royal, Sébastien Paquette, and Pauline Tranchant
32. When Blue Turns to Gray: The Enigma of Musician’s Dystonia  776
   David Peterson and Eckart Altenmüller

SECTION VIII: THE FUTURE
33. New Horizons for Brain Research in Music  805
   Michael H. Thaut and Donald A. Hodges

Index  813


List of Contributors

Eckart Altenmüller, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Aaron L. Berkowitz, Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, USA
Emmanuel Bigand, CNRS, UMR5022, Laboratoire d’Etude de l’Apprentissage et du Développement, Université de Bourgogne, France and Institut Universitaire de France, France
Anthony Brandt, The Shepherd School of Music, USA
Elvira Brattico, Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus University, Denmark and The Royal Academy of Music, Aarhus/Aalborg, Denmark
Thenille Braun Janzen, Music and Health Science Research Collaboratory (MaHRC), University of Toronto, Canada
Steven M. Demorest, Northwestern University, USA
Michael G. Erkkinen, Department of Neurology, Brigham and Women’s Hospital, USA
Laura Ferreri, Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute, Hospitalet de Llobregat, Barcelona and Department of Cognition, Development and Educational Psychology, University of Barcelona, Spain; Laboratoire d’Etude des Mécanismes Cognitifs, Université Lumière Lyon 2, 69676 Lyon, France
Shinichi Furuya, Sony Computer Science Laboratories Inc., Japan
Molly Gebrian, University of Wisconsin-Eau Claire, Department of Music and Theatre Arts, USA
Jessica A. Grahn, Brain and Mind Institute, Western University, Canada
Rachel E. Guetta, The National Center for PTSD, VA Boston Healthcare System, USA
Shantala Hegde, Clinical Neuropsychology and Cognitive Neuroscience Center and Music Cognition Laboratory, Department of Clinical Psychology, National Institute of Mental Health and Neurosciences, Bengaluru, India
Donald A. Hodges, University of North Carolina at Greensboro, USA
Timothy L. Hubbard, Arizona State University, USA and Grand Canyon University, USA
Christos I. Ioannou, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Lutz Jäncke, Division of Neuropsychology, Institute of Psychology, University of Zurich, Switzerland
Irma Järvelä, Department of Medical Genetics, University of Helsinki, Finland
Patrik N. Juslin, Department of Psychology, Uppsala University, Sweden
Stefan Koelsch, Department for Biological and Medical Psychology, University of Bergen, Norway
Yuko Koshimori, Music and Health Research Collaboratory (MaHRC), University of Toronto, Canada
Yune S. Lee, Department of Speech and Hearing Science, The Ohio State University, USA
Psyche Loui, Northeastern University, USA
Susan Marsh-Rollo, Auditory Development Lab, McMaster University, Canada
Bjorn Merker, Independent Scholar, Kristianstad, Sweden
Benjamin Morillon, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM, Marseille, France
Steven J. Morrison, University of Washington, USA
Aline Moussard, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal (CRIUGM), Canada
Adam Ockelford, School of Education, University of Roehampton, London, UK
Sébastien Paquette, International Laboratory for Brain, Music and Sound Research (BRAMS), Université de Montréal, Québec, Canada
Marcus T. Pearce, Queen Mary University of London, UK and Aarhus University, Denmark
Virginia B. Penhune, Department of Psychology, Concordia University, Canada
David Peterson, Institute for Neural Computation, University of California San Diego, USA
Vesa Putkinen, Turku PET Centre, University of Turku, Turku, Finland
Isabelle Royal, Département de psychologie, Université de Montréal, Québec, Canada
Frank Russo, Ryerson University, Canada
Laura S. Sakka, Department of Psychology, Uppsala University, Sweden
Charlene Santoni, Faculty of Music, University of Toronto, Canada
E. Glenn Schellenberg, Department of Psychology, University of Toronto Mississauga, Canada
Daniel S. Scholz, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Daniele Schön, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM, France
Michael Schutz, Institute for Music and the Mind, McMaster University, Canada
L. Robert Slevc, Department of Psychology, University of Maryland, USA
Klaus Martin Stephan, SRH Gesundheitszentrum Bad Wimpfen, Germany
Swathi Swaminathan, Rotman Research Institute, Baycrest Health Sciences, Canada
J. Eric T. Taylor, Brain and Mind Institute, Western University, Canada
Mari Tervaniemi, Cognitive Brain Research Unit, Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland and Cicero Learning, Faculty of Educational Sciences, University of Helsinki, Helsinki, Finland
Corene Thaut, Faculty of Music, University of Toronto, Canada
Michael H. Thaut, Music and Health Science Research Collaboratory (MaHRC), University of Toronto, Canada
Barbara Tillmann, CNRS, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics team, France and University of Lyon, France
Laurel J. Trainor, Department of Psychology, Neuroscience & Behavior, McMaster University, Canada
Pauline Tranchant, Département de psychologie, Université de Montréal, Canada
Christina M. Vanden Bosch der Nederlanden, Brain and Mind Institute, Western University, Canada
Robin W. Wilkins, University of North Carolina at Greensboro, USA


SECTION I

INTRODUCTION


Chapter 1

The Neuroscientific Study of Music: A Burgeoning Discipline

Donald A. Hodges and Michael H. Thaut

Introduction

This book is the result of a considerable amount of effort by fifty-four authors from thirteen countries. Beyond that, it represents the work of hundreds of researchers over the past fifty years or so. The neuroscientific study of music, or neuromusical research as it may be called, has grown significantly over several decades. The purpose of this chapter is twofold. The first portion provides a brief historical perspective on music and neuroscience. The second presents an overview of the eight sections and thirty-three chapters of this book.

Vignettes from a Burgeoning Discipline

Space limitations do not permit a detailed historical overview of neuromusical research. Rather, the intent is to provide glimpses of early, pioneering efforts. In 1977, R. A. Henson included historical notes on neuromusical research in the ground-breaking book on music and the brain he edited along with Macdonald Critchley (Critchley & Henson, 1977). John Brust (2003) also provided a historical perspective. More recently, Eckart Altenmüller, Stanley Finger, and François Boller edited a two-volume set on music, neurology, and neuroscience (2015a, 2015b) that provides far greater depth and detail. The first volume focuses on historical connections and perspectives and the second on evolution, the musical brain, and medical conditions and therapies. From these and other sources, here are a few glimpses into the growing field of music–brain research.

• Franz Joseph Gall (1758–1828), the founder of phrenology, identified music as one of the twenty-seven faculties of the mind (Elling, Finger, & Whitaker, 2015); in Fig. 1, you can see the music faculty, listed as Tune, just above the eye. Among many others who pursued this notion, Madam Luise Cappiani (1901) gave an address at the American Institute of Phrenology in which she discussed phrenology, physiology, and psychology in connection with music and singing.

• In the 1860s and 1870s, British neurologist John Hughlings Jackson (1835–1911) made cogent observations about children who could not speak but who could sing (Lorch & Greenblatt, 2015). Speaking of one speechless child, Jackson said, “It is worthy of remark that when he sings he can utter certain words . . . but he can only do so while singing” (Jackson, 1871, p. 430). By 1888, German neurologist August

Figure 1.  A phrenological map of the brain. Music is listed as “Tune” and appears just above the eye. Source: By William Walker Atkinson, 1862–1932 [No restrictions], via Wikimedia Commons. https://upload.wikimedia.org/wikipedia/commons/7/71/How_to_know_human_nature-_its_inner_states_and_outer_forms_%281919%29_%2814784651435%29.jpg











Knoblauch (1863–1919) had coined the term “amusia” (Graziano & Johnson, 2015) and created a model with five music centers: an auditory center for the perception of musical tones, a motor center for musical production, an idea center for the analysis and comprehension of music, a visual system for reading musical notation, and a motor system for writing musical notation (Johnson & Graziano, 2003). Damage to any of these five centers could lead to nine disorders, grouped into perception or production impairments. Richard Wallaschek (1860–1917), John Edgren (1849–1929), and others also investigated the loss of musical abilities in relation to brain function (Henson, 1977). • The first electroencephalographic (EEG) recording in humans was made by Hans Berger in 1924 (Haas, 2003). Less than twenty-five years later, researchers were studying musicogenic epilepsy by means of EEG (Shaw & Hill, 1947). By the mid-1970s, investigators were utilizing event-related potentials (ERPs) in relation to music (Schwent, Snyder, & Hillyard, 1976). They found N100 responses (negative waves peaking between 80 and 120 ms after the onset of a stimulus) reflecting pre-attentive perception of pitch changes. • In 1981, Roland, Skinhøj, and Lassen asked participants to make same-different judgments on tone-rhythm patterns taken from the Seashore Tests of Musical Talent while undergoing positron emission tomography (PET) scans. They found widespread activations, including differences between left and right hemispheric processing. • Roland Beisteiner reported on three experiments conducted in Vienna in 1995 in which he used functional magnetic resonance imaging (fMRI), along with direct current EEG (DC-EEG) and magnetoencephalography (MEG), to demonstrate the viability of these methods in the study of music. Finger and hand movements, approximating those used in playing the piano, elicited strong activations in primary and supplementary motor cortices.
Since that time, fMRI has become a predominant methodology in neuromusical research. • Recent years have seen the development of several additional methodologies, including transcranial magnetic stimulation (TMS), voxel-based morphometry (VBM), tensor-based morphometry (TBM), diffusion tensor imaging (DTI), and genomics approaches. Also, new data analysis techniques are being developed, such as network science (described by Wilkins, this volume).

From these earliest explorations into music and the brain, neuromusical research has exploded in recent decades, as indicated in Fig. 2. What began as fledgling, pioneering efforts from the 1940s to the 1960s has burgeoned into a relative flood of publications in the 2000s. Given their variety and ubiquity, human musical experiences are complex and mysterious. Philosophers, ethnomusicologists, music theorists, and many others have spilled countless barrels of ink trying to explicate the phenomenon of music. Why do we respond to music so powerfully? What does it mean? Why do we have it at all? Explaining how music “works” in the human brain is no less daunting. Of necessity,


Figure 2.  The number of published articles obtained from a simple “music and brain” search in PubMed (https://www.ncbi.nlm.nih.gov/pubmed/). [Bar chart of articles per decade: 1940s, 1; 1950s, 5; 1960s, 15; 1970s, 72; 1980s, 136; 1990s, 263; 2000s, 1,183; 2010s, 2,628.]

neuroscientists frequently take a reductionist approach (Bickle, 2003; Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). Findings from work at one level (e.g., networks) are not necessarily integrated into work at another level (e.g., genomics). Furthermore, results are often parsed according to methodology (e.g., fMRI and ERP). As stated, some of this is of necessity; after all, notions derived from activations generated across 30 minutes of music listening and monitored by fMRI are not immediately compatible with results from an experimental design with musical stimuli of just a few seconds as recorded by MEG. To avoid a crazy-quilt, scattershot view of music, broad overviews attempting to blend disparate findings have appeared from time to time in the literature. Whether in articles (e.g., Peretz & Zatorre, 2005; Warren, 2008), chapters (e.g., Marin & Perry, 1999; Schlaug, 2003), or books (e.g., Critchley & Henson, 1977; Koelsch, 2012), these reviews are critically important in moving us toward a more coherent, unified understanding of music in the brain. There are certain advantages to the singular view of one or two authors, or to a discussion focused within a limited word count. The present volume, on the other hand, has strengths in the diversity and expertise of fifty-four authors who have written approximately 350,000 words on music and neuroscience. In the next portion of this chapter, we provide an overview of their thirty-three chapters.

Chapter Overviews As this introductory chapter comprises the first section, these overviews will concentrate on sections II through VIII.



II.  Music, the Brain, and Cultural Contexts 2. Music through the lens of cultural neuroscience, Donald A. Hodges. 3. Cultural distance: A computational approach to exploring cultural influences on music cognition, Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce. 4. When extravagance impresses: Recasting esthetics in evolutionary terms, Bjorn Merker. The three chapters in Section II aim to put the neuroscientific study of music into a larger cultural context. First, Donald Hodges revisits a long-standing notion that musical experiences have both biological and cultural underpinnings. Biology and culture are so intertwined that there is no clear way to separate the two, and no need to, either. Rather, the new field of cultural neuroscience provides increased understanding of how biological and cultural aspects constrain and enhance each other. Next, Steven Morrison, Steven Demorest, and Marcus Pearce present a model of cultural distance, a computational means of determining how closely the music of disparate cultures relates. Unfamiliar music whose statistical patterns of pitch and rhythm closely approximate one’s own may be easier to process than music with widely divergent patterns. Such a model may be useful in future neuroimaging studies of cross-cultural music processing. In the final chapter in this section, Bjorn Merker presents a persuasive argument that our human aesthetic responses to music arise from elements at play in the development of large and complex birdsong repertoires. Responses among birds may range from boredom to interest/curiosity. In humans, a hedonic reversal leads to being impressed, being moved, or to awe and sublimity at the extreme. Taken together, these three chapters remind us that findings from the neuroscientific study of music must always be placed into broader cultural contexts in order to reach a full and complete understanding.

III.  Music Processing in the Human Brain 5. Cerebral organization of music processing, Thenille Braun Janzen and Michael H. Thaut. 6. Network neuroscience: An introduction to graph theory network-based techniques for music and brain imaging research, Robin W. Wilkins. 7. Acoustic structure and musical function: Musical notes informing auditory research, Michael Schutz. 8. Neural basis of rhythm perception, Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn. 9. Neural basis of music perception: Melody, harmony, and timbre, Stefan Koelsch. 10. Multisensory processing in music, Frank Russo.


Authors in Section III explore what we know about how music is processed in the human brain. Thenille Braun Janzen and Michael Thaut present an organizational scheme based upon ascending auditory pathways, auditory-frontal networks, auditory-motor networks, and auditory-limbic networks. The most advanced research has moved beyond what parts of the brain are involved at specific points in the processing stream and is beginning to look increasingly at how these various brain regions interact in real time. The complexity of music processing, involving aspects such as preference, sociocultural contexts, musical expertise, and so on, poses a daunting challenge, but substantial progress is being made. One advancement, according to Robin Wilkins, is network science, which utilizes graph theory techniques and analysis as a means of understanding structural and functional connectivity in the brain. Network science moves us closer to learning how the brain communicates with itself in the dynamic process of responding to music. A further advantage may be that it allows for monitoring task performance during much longer music listening conditions than brief excerpts. Michael Schutz continues the discussion in the next chapter with a more fine-grained examination of how micro-timing changes in musical stimuli are processed in the brain as music unfolds over time. Constant, rapid fluctuations in overtone spectra require sophisticated neural tracking mechanisms. Indeed, one of the deficiencies of early synthesized music, and to some extent some auditory perception research, is a lack of ecological validity owing to temporally invariant musical stimuli. In the next chapter, Christina Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica Grahn provide an overview of the research on how the brain processes and produces musical rhythms. Auditory-motor networks are particularly important in beat finding and other rhythmic processes.
Our brain’s ability to perceive and produce rhythms has wide-ranging implications for many aspects of human behavior. Stefan Koelsch expands the discussion into an examination of the neural underpinnings of melodic, harmonic, and timbral perception. Numerous and widespread brain regions are involved in processing music. Because infants and individuals without formal music training can process melody, harmony, and timbre successfully, musicality is clearly a natural ability of the human brain. Although much of the extant research focuses on particular sensory modalities, ultimately a more ecologically valid understanding arises from the integration of multiple sensory inputs, and this topic is taken up by Frank Russo. An integrated, multisensory view of music processing involves auditory, visual, somatosensory, vestibular, and motor systems. This necessarily involves extensive, widely distributed but locally specialized neural networks (Sergent, Zuck, Terriah, & MacDonald, 1992). Overall, the six chapters of Section III remind us that music is a whole-brain experience, with numerous intertwining and interacting neural networks. Enormous progress has been made in ferreting out all the disparate components and their entangled interrelationships, especially with the advent of rapidly evolving technologies, but there are still puzzles left to solve.



IV.  Neural Responses to Music 11. Music and memory, Lutz Jäncke. 12. Music and attention, executive function, and creativity, Psyche Loui and Rachel Guetta. 13. Neural correlates of music and emotion, Patrik Juslin and Laura Sakka. 14. Neurochemical responses to music, Yuko Koshimori. 15. The neuroaesthetics of music: A research agenda coming of age, Elvira Brattico. 16. Music and language, Daniele Schön and Benjamin Morillon. The six chapters comprising Section IV delve into the ways the brain responds to music. Once again, we see multiple overlapping and mutually reinforcing domains. All meaningful musical experiences involve memory in one way or another. Lutz Jäncke explores discrete, music-only, and shared memory systems that involve auditory processing, episodic, autobiographic, semantic, and implicit memories, as well as motor programs, emotion, and motivation. Each of these components has neural correlates designed for encoding, storing, and retrieving musical memories. Such a diffuse and distributed network may help explain commonly reported musical influences on nonmusical memory formation. Psyche Loui and Rachel Guetta tackle relationships between music and attention, executive function, and creativity. The topic of attention in music can be informed by general theories of attention, as well as those specifically applied to musical stimuli. Passive music listening experiences are less likely to affect executive functions, but research is ongoing concerning whether and to what extent active musicing affects executive functions in terms of near and far transfers and in terms of relevant neural mechanisms. Attention and executive functions, along with their attendant brain networks, are both connected to musical creativity. Patrik Juslin and Laura Sakka provide a thorough and detailed review of neuroimaging studies related to music and emotion.
Although certain brain regions have been more or less consistently implicated in the processing of musical emotions, much is still unclear. For example, it is not always certain in some experimental designs whether participants are “merely” perceiving or actually experiencing musical emotions. Juslin and Sakka provide methodological recommendations for moving the field forward. Neurochemical responses are the basis for musical emotions, and Yuko Koshimori reviews recent work in this emerging field. Musical experiences induce the release of neurotransmitters (e.g., dopamine, serotonin, and acetylcholine), neuropeptides (e.g., beta-endorphin, oxytocin, and arginine vasopressin), steroid hormones (e.g., cortisol), and peripheral immune biomarkers. In addition to the main area of research concerning neurochemical responses in music listening and music performance experiences, another primary line of investigation involves the intentional manipulation of neurochemicals via music in a variety of health and wellness issues (e.g., Parkinson’s disease, chronic pain, and stress).


Elvira Brattico’s discussion of neuroaesthetics combines but also moves beyond the previous chapters in this section; this is another emerging field that demonstrates the maturing of neuromusical research. Building on decades of previous work in music perception, cognition, and more recently emotion, neuroaesthetics investigates matters such as brain areas involved in liking, preference, and aesthetic judgments. While this undoubtedly introduces more subjectivity into the discussion, it also moves us closer to a core human experience that lies at the root of music’s importance. Music and language are both ubiquitous aspects of the human experience and questions about the nature of and relationships between the two have been asked and the answers debated for centuries. Now neuroscientists are posing new questions, such as “to what extent are music and language processed in distinct, shared, or homologous networks?” Daniele Schön and Benjamin Morillon give answers to this and related questions based on current evidence. They also discuss the effects of musical experiences on language acquisition and skills. As was the case with Section III, Section IV demonstrates the tremendous complexity of human musical experiences from a neuroscientific standpoint. Steadily, patiently, over a period of time and with new technologies and methodologies, a clearer picture is emerging.

V.  Musicianship and Brain Function 17. Musical expertise and brain structure: The causes and consequences of training, Virginia Penhune. 18. Genomics approaches for studying musical aptitude and related traits, Irma Järvelä. 19. Brain research in music performance, Eckart Altenmüller, Shinichi Furuya, Daniel Scholz, and Christos Ioannou. 20. Brain research in music improvisation, Michael Erkkinen and Aaron Berkowitz. 21. Neural mechanisms of musical imagery, Timothy Hubbard. 22. Neuroplasticity in music learning, Vesa Putkinen and Mari Tervaniemi. Authors of the six chapters comprising Section V are all concerned with unraveling knotty issues surrounding the ways musicianship and brain function interact with each other. Virginia Penhune begins with the notion that musical training affects numerous brain structures, including gray and white matter, auditory cortex and association areas, motor regions, frontal regions, and parietal cortex. Some variances between adult musicians and non-musicians may be due to pre-existing differences, but sufficient research exists to support the contention that long-term musical training produces many of these changes. Penhune also discusses reasons why music has such strong effects on brain plasticity. Irma Järvelä takes us on a tour of genomics, specifically the role of genetics in human musicality. Genes influencing inner ear development, auditory pathways, and cognition


are all linked to musical aptitude. In addition, genomics research suggests that music and language have a common evolutionary heritage and that genes play a role in the effects music has on the body. Eckart Altenmüller, Shinichi Furuya, Daniel Scholz, and Christos Ioannou examine the contributions that prolonged extensive goal-directed practice, multisensory-motor integration, high arousal, and emotional and social rewards make toward inducing brain plasticity. They discuss motor planning and control, and finally musician’s dystonia, that is, plasticity-induced loss of skills or what they call de-expertise. Michael Erkkinen and Aaron Berkowitz review neuroimaging studies of music improvisation. Using PET, fMRI, tDCS (transcranial direct current stimulation), and EEG, researchers have implicated numerous brain regions involved in the spontaneous creation of music. Overall, improvisation activates a broad network of brain regions involving cognitive control and monitoring, motor planning and execution, multimodal sensation, motivation, emotional/limbic processing, and language regions. Timothy Hubbard describes and discusses auditory and motor neural mechanisms supporting musical imagery. Involuntary musical imagery includes anticipatory musical imagery, musical hallucinations, schizophrenia, earworms, and synesthesia. Embodied musical imagery is covered in such examples as spatial and force metaphors, the role of mimicry, the distinction between the inner ear and inner voice, the effects of mental practice on performance, musical imagery and dance, and musical affect. Vesa Putkinen and Mari Tervaniemi are concerned with neural plasticity in music learning. Focusing primarily on studies employing ERPs derived from EEG and MEG, they found evidence to support the contention that musical training enhances domain-general auditory processing skills, though far transfer to executive functions is less certain.
They also contend that training alone does not account for all the differences between musicians and non-musicians, as self-selection is a confound in terms of predisposing factors. These six chapters push beyond the nature of passive music listening situations into the realm of active musicing experiences. While we cannot pretend that we fully understand what is transpiring in the brain of Daniel Barenboim as he conducts a Mahler symphony, by fits and starts, patient marching, and occasional leaping, we are moving forward.

VI.  Developmental Issues in Music and the Brain 23. The role of musical development in early language acquisition, Anthony Brandt, Molly Gebrian, and Robert Slevc. 24. Rhythm, meter, and timing: The heartbeat of musical development, Laurel J. Trainor and Susan Marsh-Rollo. 25. Music and the aging brain, Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann.


26. Music training and cognitive abilities: Associations, causes, and consequences, Swathi Swaminathan and E. Glenn Schellenberg. 27. The neuroscience of children on the autism spectrum with exceptional musical abilities, Adam Ockelford. Throughout the lifespan, musical experiences have consequences for brain development. Anthony Brandt, Molly Gebrian, and Robert Slevc examine the role of early musical experiences on language acquisition. Evidence suggests that speech is initially processed by infants as a type of music. Initially entangled in the child’s brain, speech and music gradually develop into independent modalities. Though speech and music differ starkly in many respects, timbral aspects of phonemes and prosodic elements of melodic and rhythmic inflection provide a common bridge. Laurel Trainor and Susan Marsh-Rollo focus on the special role that rhythmic elements play in musical development. Initially, infants use timing cues to perceive and respond to emotional information. As they become enculturated to their surroundings, they develop oscillatory brain rhythms that link auditory and motor aspects of entrainment. Eventually, perceptual awareness of the synchronicity of movements among people enables them to make reliable judgments of trust and friendship. Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann report on the role music can play in improving cognition and promoting well-being and social connection at the other end of the lifespan. Divided into two major sections, the first concentrates on music’s contributions to healthy aging, including underlying brain regions. The second examines the role of music-based therapeutic approaches dealing with age-related issues such as memory, language, motor functions, and emotions and well-being. Swathi Swaminathan and E. Glenn Schellenberg review relationships between music training and cognitive abilities.
Positive associations are reported for measures of general cognitive, visuospatial, and language abilities, as well as academic achievement and healthy aging. However, with the exception of some linkages between musical training and specific language skills, causal evidence is lacking, inconsistent, or weak. In the final chapter in this section, Adam Ockelford presents a neuroscientific model accounting for exceptional musicianship among some children on the autism spectrum. In these special cases, children process language and everyday sounds as if they were music. For these individuals, then, music takes precedence over language and other everyday sounds. From birth to death, and in all cognitive conditions, music plays an important role in the human experience. We have known this anecdotally and now we are beginning to understand requisite brain processes.

VII.  Music, the Brain, and Health 28. Neurologic Music Therapy in sensorimotor rehabilitation, Corene Thaut and Klaus Martin Stephan. 29. Neurologic Music Therapy for speech and language rehabilitation, Yune Lee, Corene Thaut, and Charlene Santoni.


30. Neurologic Music Therapy targeting cognitive and affective functions, Shantala Hegde. 31. Musical disorders, Isabelle Royal, Sébastien Paquette, and Pauline Tranchant. 32. When blue turns to gray: The enigma of musician’s dystonia, David Peterson and Eckart Altenmüller. The great preponderance of neuromusical research is basic research, an attempt to understand how music is processed in the brain. To date, the strongest forays into applied research come in the area of health. The five chapters in Section VII demonstrate the tremendous strides that have been taken in utilizing the power of music for more healthy living. Music is important in the development, rehabilitation, and maintenance of sensorimotor function, especially as it relates to neurologic disorders. Corene Thaut and Klaus Martin Stephan discuss the role of neurologic music therapy (NMT) in the facilitation of motor function in such populations as those with Parkinson’s disease, stroke, traumatic brain injury (TBI), multiple sclerosis, cerebral palsy, autism, and the healthy elderly. They cover acquired movement disorders, degenerative diseases, and developmental disorders. Yune Lee, Corene Thaut, and Charlene Santoni explore the efficacy of using NMT interventions for the treatment of dysarthria, apraxia of speech, aphasia, fluency, sensory deficits, voice disorders, and dyslexia. Eight standardized clinical techniques in the speech and language domain include Melodic Intonation Therapy (MIT), Musical Speech Stimulation (MUSTIM), Rhythmic Speech Cueing (RSC), Vocal Intonation Therapy (VIT), Oral Motor and Respiratory Exercises (OMREX), Therapeutic Singing (TS), Developmental Speech and Language Training Through Music (DSLM), and Symbolic Communication Training Through Music (SYCOM).
Built-in temporal processes for both rhythm and speech are mediated by corticostriatal circuitries comprising the basal ganglia, the supplementary motor area (SMA), the premotor cortex, and the frontal operculum. Shantala Hegde discusses the use of NMT to improve cognitive and affective ­functioning in such neurological conditions as TBI, stroke/cerebrovascular accident, dementia, other degenerative conditions like Parkinson’s disease, and in major psychiatric conditions such as schizophrenia, bipolar affective disorders, as well as common psychiatric conditions such as anxiety and depression. Music can play an important role in cognitive rehabilitation as it engages auditory, motor, language, cognitive, and emotional functions across cortical and subcortical brain regions. Although early results are promising, considerably more research using standardized NMT techniques is needed. Isabelle Royal, Sébastien Paquette, and Pauline Tranchant focus their attention on musical deficiencies due to congenital or acquired amusia and musical anhedonia. Some individuals are born with an inability to process pitch or rhythm; others acquire such deficits as a result of brain trauma or stroke. Musical anhedonia may affect approximately 2 percent of the population; even though these individuals are able to interpret music’s emotional content, they derive no pleasure from it. Collectively, the study of


amusia provides a unique opportunity to study neural structures underlying music processing. David Peterson and Eckart Altenmüller investigate musician’s dystonia (MD), the enigmatic disorder that selectively interferes with the voluntary motor control necessary for musical performance. MD includes such pathological features as abnormalities in inhibition, sensorimotor integration, and plasticity at many levels of the central nervous system. Increasing understanding of the underlying neurological processes may lead to improved management and possibly prevention of MD. Centuries of music therapy in a broad sense of the term (e.g., as in the role of the shaman or medicine man in many societies worldwide) and decades of “modern” music therapy have clearly demonstrated the healing powers of music. We are just now, however, at the cusp of explaining these effects from a neuroscientific standpoint. Ensuing years will undoubtedly see tremendous progress in these applications.

VIII.  The Future 33. New horizons for brain research in music, Michael Thaut and Donald Hodges. In the final chapter, our aim is to identify noteworthy developments in music-brain research and to highlight a few key areas for future research. As demonstrated throughout this book, significant strides are being made in a wide variety of important areas, including network modeling and connectivity analyses, genomics and neurotransmitter imaging, and clinical neuroscience research. Somewhat lagging is neuroimaging work in musicians’ health, music education, and collaborative efforts with music philosophers. A few final comments concerning the content of this book: anyone reading multiple chapters is likely to discover some overlaps in coverage. That is, subtopics may be discussed in more than one chapter. We chose not to delete most of these places during the editorial process for two main reasons: (1) subtopics frequently need to be reintroduced in various chapters to provide context for the main topic at hand; (2) in using slightly different wording or citing different sources, various authors provide a richer understanding. Conversely, a few topics are not covered in this volume; to do so would require expanded coverage beyond what is possible at this point. Furthermore, it should be noted that research in certain areas is moving so quickly that new findings are changing our understanding on a very short timescale. Rapid release of individual chapters online counteracts this problem to a certain extent. We are extremely pleased with the contributions these authors have made to the literature on music and the brain.

References Altenmüller, E., Finger, S., & Boller, F. (Eds.). (2015a). Music, neurology, and neuroscience: Historical connections and perspectives. Progress in Brain Research, Vol. 216. Amsterdam: Elsevier.


Altenmüller, E., Finger, S., & Boller, F. (Eds.). (2015b). Music, neurology, and neuroscience: Evolution, the musical brain, and medical conditions and therapies. Progress in Brain Research, Vol. 217. Amsterdam: Elsevier. Beisteiner, R. (1995). DC-EEG, MEG and FMRI as investigational tools for music processing. In R.  Steinberg (Ed.), Music and the mind machine: The psychophysiology and psychopathology of the sense of music (pp. 243–249). Berlin: Springer Verlag. Bickle, J. (2003). Philosophy and neuroscience: A ruthlessly reductive account. Dordrecht: Kluwer Academic Publishers. Brust, J. (2003). Music and the neurologist: A historical perspective. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 181–191). Oxford: Oxford University Press. Cappiani, L. (1901). Phrenology, physiology, and psychology in connection with music and singing. The Phrenological Journal and Science of Health (1870–1911) 3(2), 58–60. Critchley, M., & Henson, R. (Eds.). (1977). Music and the brain: Studies in the neurology of music. Springfield, IL: Charles C. Thomas. Elling, P., Finger, S., & Whitaker, H. (2015). Franz Joseph Gall and music: The faculty and the bump. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience: Historical connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 3–32). Amsterdam: Elsevier. Graziano, A., & Johnson, J. (2015). Music, neurology, and psychology in the nineteenth century. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience: Historical connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 33–49). Amsterdam: Elsevier. Haas, L. (2003). Hans Berger (1873–1941), Richard Caton (1842–1926), and electroencephalography. Journal of Neurology, Neurosurgery, and Psychiatry 74(1), 9. Henson, R. (1977). Neurological aspects of musical experience. In M. Critchley & R. 
Henson (Eds.), Music and the brain (pp. 3–21). Springfield, IL: Charles C. Thomas. Jackson, J. (1871). National hospital for the paralysed and epileptic: Singing by speechless (aphasic) children. The Lancet 2, 430–431. Johnson, J., & Graziano, A. (2003). August Knoblauch and amusia: A nineteenth-century ­cognitive model of music. Brain and Cognition 51(1), 102–114. Koelsch, S. (2012). Brain and music. Oxford: Wiley-Blackwell. Krakauer, J., Ghazanfar, A., Gomez-Marin, A., MacIver, M., & Poeppel, D. (2017). Neuroscience needs behavior: Correcting a reductionist bias. Neuron 93(3), 480–490. Lorch, M., & Greenblatt, S. (2015). Singing by speechless (aphasic) children: Victorian medical observations. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience: Historical connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 53–72). Amsterdam: Elsevier. Marin, O., & Perry, D. (1999). Neurological aspects of music perception and performance. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 653–724). San Diego: Academic Press. Peretz, I., & Zatorre, R. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114. Roland, P. E., Skinhøj, E., & Lassen, N. A. (1981). Focal activations of human cerebral cortex during auditory discrimination. Journal of Neurophysiology 45(6), 1139–1151. Schlaug, G. (2003). The brain of musicians. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 366–381). Oxford: Oxford University Press. Schwent, V. L., Snyder, E., & Hillyard, S. A. (1976). Auditory evoked potentials during multichannel selective listening: Role of pitch and localization cues. Journal of Experimental Psychology: Human Perception and Performance 2(3), 313–325.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

16   donald a. hodges and michael h. thaut Sergent, J., Zuck, E., Terriah, S. & MacDonald, B. (1992). Distributed neural network underlying musical sight-reading and keyboard performance. Science 257(3), 106–109. Shaw, D., & Hill, D. (1947). A case of musicogenic epilepsy. Journal of Neurology, Neurosurgery, and Psychiatry 10(3), 107. Warren, J. (2008). How does the brain process music? Clinical Medicine 8(1), 32–36.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

Section II

MUSIC, THE BRAIN, AND CULTURAL CONTEXTS


Chapter 2

Music through the Lens of Cultural Neuroscience

Donald A. Hodges

Introduction to Cultural Neuroscience

Scholars have long recognized the co-equal roles biology and culture play in the phenomenon we call music (e.g., Blacking, 1973). Fifty years ago, Gaston (1968), quoting Dobzhansky, expressed the idea clearly and succinctly. In asking how we developed the characteristics of humanness, he wrote:

To begin to answer this question, it is not necessary to separate the biology from the culture of man [italics in the original]. They go hand in hand. "The fact which must be stressed, because it has frequently been missed or misrepresented, is that the biological and cultural evolutions are parts of the same process" (Dobzhansky, 1962, p. 22). This means that the part of man's culture we call music has a biological as well as a cultural basis.  (p. 11)

To be certain, the pendulum of our understanding has sometimes swung toward one and away from the other, favoring nature over nurture and vice versa. However, for the moment, let us take it as axiomatic that both are necessary. Even so, "the problem of reconciling 'cultural' and 'biological' approaches to music, and indeed to the nature of mind itself, remains" (Cross & Morley, 2009, p. 61). The purpose of this chapter, then, is not to debate whether biology and culture are both necessary components of human musical experiences, nor to determine the extent of the contribution from each, but rather to examine some of the recent evidence that supports this contention. One reason to take another look at an old, and perhaps well-established, concept is to add newer understandings from the field of cultural neuroscience.


Figure 1.  Illustration of the CBB loop model of human development. Cultural environments contextualize human behaviors. Learning novel cultural beliefs and the practice of different behavioral scripts in turn modify the functional organization of the brain. The modified brain then guides individual behavior to voluntarily fit into a cultural context and meanwhile to modify current cultural environments. Direct interactions also occur between culture and brain without overt behavior. Abbreviations: CBB, culture–behavior–brain; CC-Behavior, culturally contextualized behavior; CV-Behavior, culturally voluntary behavior. Reprinted from Trends in Cognitive Sciences 19(11), Shihui Han and Yina Ma, A culture-behavior-brain loop model of human development, pp. 666–676, Figure 1, doi.org/10.1016/j.tics.2015.08.010, Copyright © 2015 Elsevier Ltd. All rights reserved.

Cultural neuroscience is an emerging field of study that has arisen as a means of investigating relationships between culture and brain (Chiao, Li, Seligman, & Turner, 2016; Han et al., 2013). Chiao (2009) sees three components of the cultural neuroscience toolbox:

• Cultural psychologists investigate what cultural values, beliefs, and practices influence human behavior and how they do so.
• Neuroscientists use a variety of approaches to determine the role of the brain.
• Neurogeneticists investigate genetic regulation of brain mechanisms that support cognitive, emotional, and social behaviors.

Using these three components, Han and Ma (2015) proposed a culture–behavior–brain (CBB) loop model of human development (Fig. 1). Culturally contextualized behaviors (CC-Behavior) occur within a specific cultural context but may not occur outside that culture. Culturally voluntary behaviors (CV-Behavior) are guided by specific cultural mores that become embedded in the brain. Genes moderate culture–brain interactions by affecting brain anatomy and some behavioral and cognitive characteristics; likewise, there are mutual gene–culture influences. Some of these genetic influences take place over thousands of years and some occur within a given lifespan. Because a full explication of cultural neuroscience would require an extended discussion beyond this chapter, a more straightforward way to approach it is to examine the implications of the following: "Cultural practices adapt to neural constraints,

and the brain adapts to cultural practice" (Ambady & Bharucha, 2009, p. 342). Let us examine both of these in turn, specifically as they relate to music.

Cultural Practices Adapt to Neural Constraints

Although it is difficult to predict precise biological limits for human performance, a reasonable assumption is that biological factors place restrictions on human musicality. We can hear musical pitches only within a delimited frequency range, typically 20 Hz–20,000 Hz at the extremes. We can sing only so high; for example, Mozart stretched the limits when he wrote an F above high C in the Queen of the Night's aria "Der Hölle Rache kocht in meinem Herzen" from The Magic Flute (Die Zauberflöte). Even so, musicians are capable of amazing feats. Smith (1953) reported that one pianist performed the 6,266 notes of Schumann's Toccata in C Major, Op. 7, in 4'20", a rate of 24.1 notes per second. Toscanini was credited with a phenomenal memory, reportedly having memorized 250 symphonic works and 100 operas (Marek, 1975). Although it is certainly possible for someone to play these pieces faster or memorize more scores, surely there must be some limits. Perceptually and cognitively, Wagner operas and Chinese operas push many listeners to the extreme. Going beyond human limits altogether, Cage's Organ2/ASLSP (As SLow aS Possible) is currently being performed in a church in Halberstadt, Germany, in a performance projected to take 639 years (Wakin, 2006). At that tempo, any one person can hear only a fraction of the entire performance.

The Brain Adapts to Cultural Practice

Just as the brain shapes what we do, what we do shapes the brain. Neurologist Frank Wilson (1998) wrote a compelling account of how the brain and the hand co-evolved. Over time, developmental changes allowed us to use our hands for an increasingly wide variety of tasks, such as grasping, throwing, pounding, and manipulating tools, and these newly acquired skills, in turn, spurred further brain development. Of course, it is not just the hand in isolation. In chipping stone tools, for example, listening carefully to the sound of the stone being shaped is critical to a successful result, as one extra strike may cause the rock to break. Creating bone flutes (Conard, Malina, & Münzel, 2009) or lithophones, rock percussion instruments made from flint blades (Cross, Zubrow, & Cowan, 2002), would require similar interactions of hand, ear, and brain. In the case of flutes, tinkering with where to place finger holes and how to direct the air (i.e., whether as a notched, block, or transverse flute) requires considerable ingenuity (Kunej & Turk, 2000). Wilson encapsulates these ideas in speaking about the co-evolution of the brain and the musical hand:

What we are left with when we seek to explain musical talent on a biological basis seems best characterized as an assembly of neurologic and behavioral potentials that arise from within and are uniquely defined by specific cultures.  (1998, p. 224)

Another example of cultural practice influencing brain development can be seen in the organization of the hearing mechanism. Tonotopic organization means that the frequency map laid out along the basilar membrane in the inner ear is maintained throughout the auditory pathway all the way to the auditory cortex. Pantev and colleagues (1998) demonstrated that for trained musicians a pitch map overlays this frequency map: responses were 25 percent larger to piano tones than to pure tones, which was not true for controls who had never learned to play a musical instrument. Similarly, violinists and trumpeters showed more robust responses to tones from their own instrument than to pure tones (Pantev, Roberts, Schultz, Engelien, & Ross, 2001). Since musical tones from Western instruments (i.e., piano, violin, trumpet) are cultural artifacts, it is difficult to account for these results unless the brain has adapted itself to environmental experiences. Rather than occupying either end of a continuum (i.e., either nature or nurture), human behavior generally, and musical behavior specifically, are a combination of the two. In the following sections, we will briefly examine genetic influences on musical behavior, neural plasticity, cultural influences on innate infant responses to music, the search for music universals, and cross-cultural music research.

Genetic Influences on Musical Behavior

Genetic instructions provide another example of biological restrictions that can be modified by environmental experiences. Although genes provide instructions that influence nearly everything about us, including both physical features (e.g., hair and eye color) and behavior, genetic instructions are not inviolable; rather, daily living and life's experiences influence gene expression, including the expression of genes associated with learning and memory (Rampon et al., 2000). However, interpreting gene–environment interactions is not without difficulty. What makes the situation so problematic is that some environmental circumstances that might influence genetic expression are themselves open to genetic influence. In reviewing the status of current understanding, Manuck and McCaffery state that ". . . it seems reasonable to assume that most dimensions of measured experience will have both environmental and genetic determinants . . ." (2014, p. 63), even if there is no clear way of disentangling the two. Ullén, Hambrick, and Mosing (2016) discussed interactions between environment and genetic instructions in the development of expertise. In contrast to a focus on deliberate practice as the sole determiner of expert performance, they proposed a multifactorial gene–environment interaction model (MGIM) of expert performance (Fig. 2). According to this model, expertise results from an array of factors that work in tandem. High-level expertise (e.g., musical performance) cannot simply be a matter of enough hours of deliberate practice. Genetic and non-genetic factors, along

Figure 2.  Schematic summary of the main elements of the multifactorial gene–environment interaction model (MGIM). At the phenotypic level (upper part), the MGIM assumes that psychological traits such as abilities, personality, interests, and motivation are associated with the domain and intensity of practice. Specific examples of variables that have been shown to be involved in various forms of expertise are provided in italics under each general heading. Practice will cause adaptations of neural mechanisms involved in expertise and can also influence relevant physical body properties. Furthermore, neural mechanisms related to trait differences may impact expertise independently of practice. Both genetic and non-genetic factors (lower part) influence the various variables that are involved in expertise at the phenotypic level. These influences are likely to be complex and involve both gene–environment interaction effects and covariation between genes and environment (G–E covariation). Reprinted from Psychological Bulletin 142(4), Fredrik Ullén, David Zachary Hambrick, and Miriam Anna Mosing, Rethinking expertise: A multifactorial gene–environment interaction model of expert performance, pp. 427–446, doi.org/10.1037/bul0000033, Copyright © 2016 American Psychological Association.

with their interactions, are necessary. For example, in a large study of twins (N = 10,500), genetic influences accounted for a substantial portion of the variance in practice time (69 percent in males and 41 percent in females) (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). Contemporary research is providing increasingly refined understandings of interactions between genes and musical behavior. For example, gene expression is differentially upregulated or downregulated for music listening and for music performance (Kanduri et al., 2015a, b). Excellent reviews of the role of genetics in music, documenting the interaction between genes and environment, are found in Mosing, Peretz, & Ullén (2018), Yi, McPherson, Peretz, Berkovic, & Wilson (2014), and Yi, McPherson, & Wilson (2018). See also Chapter 18.


Neural Plasticity

Musicians are models of neural plasticity (Münte, Altenmüller, & Jäncke, 2002). That is, many changes have been documented in the brains of musicians as a result of training. Table 1 is not intended to be an exhaustive list, either of neural adaptations or of relevant sources, but rather shows a few of the ways that adult musicians' brains have been modified by music learning experiences. Several investigators have concluded that these changes are more likely a result of intense music learning experiences than evidence that these musicians were born with "different" brains (Hyde et al., 2009; Norton et al., 2005; Schlaug, Norton, Overy, & Winner, 2005; Schlaug et al., 2009). In a confirming study, identical twins, with one member of each pair having piano lessons and the

Table 1.  Changes in musicians' brains

Anatomical changes
  Cerebellum: Greater volume in males, but not females (Hutchinson et al., 2003)
  Corpus callosum: Area 3 of the CC enlarged (Schlaug et al., 2009)
  Gray matter: Greater volume in motor, auditory, and visuospatial areas (Bermudez & Zatorre, 2005; Gaser & Schlaug, 2003)
  Sensorimotor cortex: Identifying markers in precentral cortex for string players (RH) and pianists (LH) (Bangert & Schlaug, 2006)
  White matter: Positive correlations between amount of practice time and white matter organization (Bengtsson et al., 2005)

Functional changes
  Auditory cortex: Increased cortical representation for musical tones over pure tones (Pantev et al., 1998, 2001)
  Multimodal integration areas: Increased activity in convergence zones (Hodges et al., 2005)
  RH motor cortex: Increased cortical representation for string players (Elbert et al., 1995)
  Secondary auditory cortex: Superior sound localization in conductors (Münte et al., 2001)
  Temporal and frontal lobes: Enhanced MMN for chord alterations (Koelsch et al., 1999; Tervaniemi et al., 1999)
  Visual cortex: Minimal deactivation of visual cortex during difficult auditory tasks (Hodges et al., 2010)

Note: RH = right hemisphere; LH = left hemisphere; CC = corpus callosum; MMN = mismatch negativity, a component of event-related potentials elicited by a violation of an expected rule (e.g., a wrong note in a tonal musical passage).


other one not, showed significant differences in brain anatomy attributed to musical training (Manzano & Ullén, 2018). In fact, formal study (in musical parlance, practice) is not necessary for musical experiences to elicit changes in the brain. With the possible exception of those with congenital amusia (Peretz, Brattico, Järvenpää, & Tervaniemi, 2009), nearly everyone learns the music of the surrounding culture, even in the absence of formal training. For example, people generally have no trouble successfully processing the accompanying musical track while watching movies and television. This was confirmed in a study in which scores on combined music aptitude tests were normally distributed in a population, suggesting that "moderate musical aptitude is common and does not need formal training" (Oikkonen & Järvelä, 2014, p. 1104). One of the critical challenges infants face is to make sense of what initially appears to be a chaotic world. Fortunately, they come into the world remarkably able to detect patterns and structures in the environment based on the frequency with which they are encountered. Moreover, they are often able to do this in the absence of explicit feedback. Statistical learning, as it is called, is foundational for understanding how we process both auditory (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1998) and visual stimuli (Kirkham, Slemmer, & Johnson, 2002; Turk-Browne, Jungé, & Scholl, 2005). Music and language are the two primary auditory inputs that have been studied. Regarding music, statistical learning plays a role in the perception of melody (Creel, Newport, & Aslin, 2004), harmony (Jonaitis & Saffran, 2009), and timbre (Tillmann & McAdams, 2004), and in the acquisition of absolute pitch (Saffran, 2003; Saffran & Griepentrog, 2001). Gestalt organizing principles appear to be important in the statistical learning process (Creel, Newport, & Aslin, 2004).
Work on the neural structures involved in statistical learning is just beginning (e.g., Karuza et al., 2013); however, there is every reason to believe that advancements in this area will continue to be made. In the meantime, additional support for innate neural structures subserving music came with the discovery that congenital amusics (persons with music processing deficits) can learn unfamiliar words as easily as controls, but not musical patterns (Peretz, Saffran, Schön, & Gosselin, 2012); in other words, mere exposure is not sufficient without the requisite intact neural mechanisms. Experience-expectant processes (e.g., language and music) are largely driven by genes; the brain prepares itself, primarily through genetic processes, to learn any language(s) that the person might encounter (Kuhl & Rivera-Gaxiola, 2008). Experience-dependent processes (e.g., English or Spanish; jazz or Chinese opera) rely more on learning experiences. Thus, infants have the capability of processing any musical style they might encounter (Hannon & Trehub, 2005; Winkler, Háden, Ladinig, Sziller, & Honing, 2009), but the particular musical style or styles they master depends upon the environment in which they are raised. Galván (2010) created a model whereby neural plasticity is a result of both development and learning (Fig. 3). Rather than being independent, autonomous processes, development and learning are part of a continuum. Genetic instructions and learning experiences work together to shape the brain. Experience-expectant mechanisms rely more on development, while experience-dependent mechanisms rely more on learning.


Figure 3.  This working model illustrates that development and learning exist on a continuum, as each independently and simultaneously influences neural plasticity. While development is largely guided by experience-expectant mechanisms, it also receives input from experience-dependent mechanisms. Similarly, learning is mostly guided by experience-dependent mechanisms, but also receives experience-expectant input. Reprinted from Human Brain Mapping 31(6), Adriana Galván, Neural plasticity of development and learning, pp. 879–890, Figure 1, doi.org/10.1002/hbm.21029, Copyright © 2010, John Wiley and Sons.

Looking for explanations of how these changes occur in the brain leads us to two basic brain development processes, neural pruning and myelination, both of which have been implicated in musical studies. Each process is driven by both genetic instructions and lived experiences.

Neural Pruning

Early in development, the brain overproduces synapses, the connections between neurons (Berk, 2017). Different brain regions peak at different times, but by age 2 there may be as many as 50 percent more synapses in a given area than will be present during adulthood (Stiles, Reilly, Levine, Trauner, & Nass, 2012). Following the peak of this rapid proliferation of synapses, a protracted period of decline extends throughout childhood and into early adulthood. Operating on a "use it or lose it" basis, unused synapses are selectively pruned, leaving a sculpted brain (Gogate, Giedd, Janson, & Rapoport, 2001). The number of possible connections, roughly 100 trillion (10^14) synapses in the cerebral cortex, is far too great to be determined by genetics alone. Rather, the general outlines are genetically programmed, with selective pruning guided by sensory and motor experience, psychoactive drugs, gonadal hormones, parent–child relationships, peer relationships, stress, intestinal flora, and diet (Kolb & Gibb, 2011). Changes in cortical thickness as a result of pruning are associated with behavior. Sculpting the brain is not simply a matter of deleting unused cells and synapses. At the same time this is happening, new synapses are being formed, a process that continues throughout the lifetime. Synapses formed early in life are "expecting" certain life experiences that will prune

them into optimal networks. Later-forming synapses are more localized and specific to particular learning experiences. "Thus, experiences are changing neural networks by both adding and pruning synapses" (Kolb & Gibb, 2011, p. 268).

Myelination

Neurons communicate among themselves by forming neural networks; each neuron has numerous dendrites for input but only one axon for output. Over time, axons are covered in a fatty sheath called myelin that enhances transmission speed up to 100 times and improves efficiency (Zull, 2002). Genetic instructions drive myelination in a process that moves through the brain from bottom to top and from back to front. Thus, it is only in one's early to mid-20s that the frontal lobes are fully myelinated, and increasing myelination is related to enhanced cognitive functioning (Webb, Monk, & Nelson, 2001). Because myelin is white in appearance, the core of the brain is called white matter; here, billions of fibers connect different regions of gray matter into neural networks (Filley, 2005). Although genetic instructions are essential, myelination is also responsive to learning experiences, as "neurons that wire together fire together" and "neurons that fire apart wire apart—or neurons out of sync fail to link" (Doidge, 2007, pp. 63–64). In other words, when we engage repeatedly in a thought or action (e.g., practicing scales), the neural networks supporting those processes become stronger with repetitive stimulation (Fields, 2009). Specifically, learning experiences elicit more wrappings of the axon, making message transmission increasingly efficient. Thus, for example, Bengtsson et al. (2005) found that practicing the piano induced changes in white matter plasticity, with greater changes during childhood than during adolescence or adulthood. Improved efficiency comes at a cost, as myelination decreases flexibility in neural responses. That is, the brain places restrictions on itself such that what is learned limits what can be learned (Quartz, 2003). The more attuned children become to surrounding cultural expressions (e.g., language, music, etc.), the less responsive they are to other cultural expressions (Pons, Lewkowicz, Soto-Faraco, & Sebastián-Gallés, 2009).
Responding appropriately to unfamiliar tonal and rhythmic structures becomes more difficult once one has learned the music of the surrounding culture (Patel, Meltzoff, & Kuhl, 2004).

Cultural Influences on Innate Infant Responses to Music

There is significant evidence that the fetus responds to sounds during the last trimester before birth; for example, activations in the primary auditory cortex were recorded using fMRI in the left hemisphere of fetuses at 33 weeks gestation (Jardri et al., 2008). Newborns as early


as 1–3 days old responded to music with activations in both hemispheres (Perani et al., 2010). Excerpts of Western tonal music registered primarily in the right hemisphere (RH), while altered or dissonant versions reduced RH responses and activated left inferior frontal cortex and limbic structures. "These results demonstrate that the infant brain shows a hemispheric specialization in processing music as early as the first postnatal hours" (Perani et al., 2010, p. 4758). Similarly, using near-infrared spectroscopy (NIRS), researchers found that neonates registered speech and music sounds in both hemispheres, with more coherent responses to speech in the left hemisphere (LH) (Kotilahti et al., 2010). Regarding music specifically, researchers used event-related potentials (ERPs) to determine that newborns can process musical pitch intervals (Stefanics et al., 2009), distinguish pitch from timbre (Háden et al., 2009), detect the beat in music (Winkler et al., 2009), and create expectations for tonal patterns (Carral et al., 2005). While one cannot rule out the effects of learning entirely, it seems clear that we come into the world prepared to process musical sounds. The foregoing suggests inborn proclivities for musical processing, but not predetermined responses to specific styles of music. After reviewing the literature, Hannon and Trainor (2007) concluded that neural networks "become increasingly specialized for encoding the musical structure of a particular culture" (p. 470). As an example, Shahin, Roberts, & Trainor (2004) found that auditory evoked potentials in 4- and 5-year-olds were larger in those who received Suzuki music lessons compared to controls who did not; even larger responses were generated by tones from the instrument studied (i.e., piano or violin). To conclude this section, we look at two studies in which researchers examined the effects of enculturation more closely.
Mehr and colleagues conducted several experiments designed to explore the ways in which infants imbue music with social meanings. Five-month-old infants heard one of two novel songs presented by a parent, by a toy, or by a friendly but unfamiliar adult, first in person and subsequently via video (Mehr, Song, & Spelke, 2016). Later, these infants heard two novel individuals sing the familiar and then the unfamiliar song. Those infants who had previously heard a parent sing the familiar song preferred (i.e., looked longer at) the new person singing it rather than the new person singing the unfamiliar song. The amount of exposure to the song received at home was correlated with the length of selective attention. These effects were not found in the infants who initially heard the familiar song emanating from a toy or a socially unrelated person. Thus, songs sung by caretakers embody social meanings for five-month-old infants. In an extension, eleven-month-old infants were randomly assigned to one of two groups; one group listened to one of two novel songs sung by a parent, while the others heard a song that emanated from a toy activated by a parent (Mehr & Spelke, 2017). Subsequently, they viewed a video of two new people, each singing one of the songs. In a following silent condition, two people appeared next to each other, each presenting and endorsing an object, such as a small stuffed toy or models of an apple or pear, and the infant was allowed to reach toward the objects. Preference was indicated by eye gaze and touching. Infants in both groups chose the object presented by the singer of


the familiar song. Clearly, infants preferred familiar songs regardless of whether they were learned by hearing the parent sing or by playing with a musical toy. Even though both groups chose the familiar song, infants who heard parents sing the song gazed longer at the object than those who heard it coming from a toy. Again, music was imbued with social meanings.

The Search for Music Universals

René Dubos (1981) coined the term invariants, by which he meant characteristics of human culture that are universal in a general sense but particularized in each culture. Language, clothing, and shelter are some examples, as are art and music. A common way to approach an understanding of the ubiquity of music around the world is to separate universal from culture-specific features. Ethnomusicologists have taken on this challenge in many articles (e.g., Boiles, 1984; List, 1984; Merriam, 1964; Nettl, 1977, 1983, 2000, 2005; Nketia, 1984). By universal, they mean "more common than not" or "typical," and certainly not that every culture employs a particular feature; there will nearly always be exceptions. Nevertheless, there is abundant evidence to support the contention that all human societies engage in what may be called or recognized as music (Cross, 2007, 2009–2010; Cross & Morley, 2009; Nettl, 2005). Such universal behavior is likely supported by underlying biological mechanisms (Turner & Ioannides, 2009), such as genetic influences (see Chapter 18). One line of support for music's long-standing role in human development comes from archaeological findings. Although the earliest evidence of art is shrouded in the mists of time, there are tantalizing hints such as the Venus of Tan-Tan, a quartzite sculpture dated to between 300,000 and 500,000 years ago (DeFelipe, 2011), or cave paintings from 64,800 years ago (Hoffmann et al., 2018). Granted, these earliest findings are controversial, having been created by Homo heidelbergensis and Homo neanderthalensis, respectively, and they are not direct evidence of music. However, there is reason to believe that music was also part of the early human behavioral repertoire (e.g., Mithen, 2006).
Here are just a few examples of supporting evidence:

• 70,000 years ago: Cave paintings depict a bow, which anthropologists contend was used as a musical instrument as well as a weapon (Kendig & Levitt, 1982); musical bows have been found worldwide (Mason, 1897).

• 60,000 years ago: Artifacts in a cave in Lebanon indicate ceremonies involving singing and dancing (Constable, 1973). This interpretation was made more plausible when a contemporary Australian Aborigine was filmed executing a cave painting in the presence of singing and dancing as part of a religious ritual (Mumford, 1967). Acoustically, the best places for singing and chanting are those caves with the most art, and rooms with poor acoustics rarely have paintings (Allman, 1994; Cross & Morley, 2009; Morley, 2006).

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

30   donald a. hodges

• 40,000–20,000 years ago: Cave paintings of musicians and dancers (Prideaux, 1973) are found along with whistles, pipes, flutes, and bone and rock percussion instruments (Blake & Cross, 2008; Cross et al., 2002; Dams, 1985; Kunej & Turk, 2000).

Of course, a more extensive treatment of this topic would provide many more details, but even these few points should suffice to show that humans have always and everywhere been musical. Singing is common among all cultures (Lomax, 1968), as is the singing of lullabies and dancing to music (McDermott, 2008). Lullabies appear to possess common features (Trehub, Unyk, & Trainor, 1993). The use of musical instruments is so common as to be nearly universal, if not completely so (Wade, 2009). Instruments are often classified into idiophones (struck instruments such as gongs and rattles), membranophones (drums), aerophones (flutes and other wind instruments), chordophones (stringed instruments), corpophones (body percussion and hand clapping), and electrophones (mechanical and electrical instruments) (Hornbostel & Sachs, 1992; Wade, 2009). Drake and Bertrand (2001) proposed five candidates for universals in temporal processing: segmentation and grouping, a predisposition toward regularity, an active search for regularity, a temporal zone for optimal processing, and a predisposition toward simple duration ratios. Even some basic emotions appear to be recognized in music cross-culturally (Adachi, Trehub, & Abe, 2004; Balkwill & Thompson, 1999; Balkwill, Thompson, & Matsunaga, 2004; Fritz et al., 2009), although subtle emotions are strongly affected by culture (Davies, 2010; Gregory & Varney, 1996). Given the enormous variety of music and musicing around the world, and given that some cultures do not have a specific word for music in their language (Cross & Morley, 2009; Dissanayake, 2009), it should be no surprise that there is scant agreement on universal features.
However, Brown and Jordania (2011) proposed four types of universals:

• Type 1: Conserved Universals occur in all musical utterances and include the use of discrete pitches, octave equivalence, phrase structures, and so on.

• Type 2: Predominant Patterns occur in all musical systems or styles and include musical scales with seven or fewer pitches per octave, a predominance of precise rhythms, the use of idiophones and drums, and so on.

• Type 3: Common Patterns are musical patterns that, while not universal, are widespread. Examples might include the unity of Jewish musical traditions following the diaspora, with Ashkenazic styles in Russia and northern Europe and Sephardic styles in Persia, India, Spain, and the Mediterranean basin (Bahat, 1980). Another example would be religious music, such as Buddhist or Christian musical practices in many different countries.

• Type 4: Range Universals are particular categories of music or musical behavior expressed across a wide range of possibilities. For example, all music could be placed into a classification of multipart textures, ranging across monophony, heterophony, homophony, and polyphony.


The first three categories are based strongly on Nettl's (2000) gradient-of-universality approach. The authors then provide a list of seventy items related to music's sound structures (i.e., pitch, rhythm, melodic structure and texture, form, vocal style, expressive devices, and instruments) and extra-musical features (i.e., contexts, contents, and behavior). Just as the twelve tones of Western music's chromatic scale allow an infinite number of realizations, so might these putative universals provide the structure of human music within which the cultural variations are also infinite. Continued research, however, is critically needed. Some candidates for relatively universal functions and roles of music in worldwide cultures have also been offered (Table 2). On balance, evidence supports the notion that biological and cultural aspects combine and interact to create whatever may be universal about music. Ongoing cross-cultural music research is critical to advancing our understanding.

Table 2. Functions and roles of music

Music provides the function of:
  Emotional expression (M)
  Aesthetic enjoyment (M)
  Entertainment (M & G)
  Communication (M & G)
  Symbolic representation (M, C, & G)
  Physical response/coordination of action (M & C)
  Enforcing conformity to social norms (M)
  Validation of social institutions and religious rituals (M & G)
  Contribution to the continuity and stability of culture (M)
  Contribution to the integration of society (M)
  Regulation of an individual's emotional, cognitive, or physiological state (C)
  Mediation between self and other (C)

Traditional roles of music include:
  Lullabies (G)
  Games (G)
  Work music (G)
  Dancing (G)
  Storytelling (G)
  Ceremonies and festivals (G)
  Battle (G)
  Ethnic or group identity (G)
  Salesmanship (G)
  Healing (G)
  Trance (G)
  Court music (G)

C = Clayton (2009), G = Gregory (1997), M = Merriam (1964).



Cross-Cultural Music Research

In thinking about cross-cultural music research, it should be noted that one of the difficulties in our current understanding of neurocognition is that 94 percent of the participants in psychological experiments come from only 12 percent of the world's population (Arnett, 2008), and 90 percent of published neuroimaging studies come from Western countries (Chiao, 2009). This is likely even more true for music cognition. An alarming exacerbation is the rapid Westernization of the globe: it will soon be much more difficult to find listeners who have not been exposed to Western music, and time is running out for access to indigenous, authentic musical performers and listeners. A few tentative conclusions can be drawn from the relatively small number of cross-cultural music research studies published:

1. The most general finding is that enculturation strongly affects how one interprets and understands music from within and without the home culture (Curtis & Bharucha, 2009; Demorest, Morrison, Beken, & Jungbluth, 2008; Demorest, Morrison, Nguyen, & Bodnar, 2016; Kessler, Hansen, & Shepard, 1984). The cultural distance hypothesis suggests that musical processing is more efficient and accurate when unfamiliar music is similar to one's own cultural music, and less so the farther removed the unfamiliar music becomes (Demorest & Morrison, 2016; see also Chapter 3).

2. Given the caveats for music universals from the previous section, there are probable cognitive and emotional processes that support all musical experiences, but these can be highly modified by enculturation (e.g., Krumhansl et al., 2000; Laukka, Eerola, Thingujam, & Yamasaki, 2013; Neuhaus, 2003).

3. Certain basic emotions (e.g., happiness, sadness, anger) may be identifiable in unfamiliar music, but less so emotions that are more culture specific (Balkwill & Thompson, 1999; Balkwill et al., 2004; Fritz et al., 2009; Laukka et al., 2013). There is some evidence that psychophysical variables or acoustic cues (e.g., tempo, loudness, complexity) play a role in determining emotional expressions (Balkwill & Thompson, 1999; Balkwill et al., 2004).

4. Music enculturation begins early in infancy (Morrison, Demorest, & Stambaugh, 2008; Soley & Hannon, 2010; Trainor, Marie, Gerry, Whiskin, & Unrau, 2012); bimusicalism, similar to bilingualism, can result from sufficient early exposure (Wong, Roy, & Margulis, 2009). Once well established, however, enculturated processes may be somewhat resistant to change through training (Morrison, Demorest, Campbell, Bartolome, & Roberts, 2013).

5. Active musicing is more efficient than passive exposure in establishing enculturated music processes (Trainor et al., 2012); however, passive exposure, in the form of statistical learning, is sufficient for inculcating a basic understanding of one's own cultural music (Drake & Ben El Heni, 2003).


Regarding brain responses in cross-cultural research, a few additional points can be made. Familiar (i.e., from one's own culture) and unfamiliar music may elicit responses in similar (Demorest & Morrison, 2003) or nearby brain regions (Matsunaga, Yokosawa, & Abe, 2012). Also, differences in brain activations to culturally familiar and unfamiliar music may be more a matter of degree than of substance (Demorest & Osterhout, 2012; Morrison & Demorest, 2009; Morrison, Demorest, Aylward, Cramer, & Maravilla, 2003; Nan, Knösche, & Friederici, 2006). However, different brain regions may also be activated in response to familiar and unfamiliar music, depending on the specific tasks required of participants (Nan, Knösche, Zysset, & Friederici, 2008). Finally, cultural experiences influence both the perception and memory of music at behavioral and neurological levels (Demorest et al., 2010).

Taken as a whole, cross-cultural research supports the main contention of this chapter, namely that musical experiences are an intricate and complicated combination of biological and cultural processes. Because biological mechanisms may influence how enculturation proceeds, and enculturation may impose biological constraints, the two are highly interrelated. As stated at the outset, our purpose is not to attempt to separate the two, as that is not only impossible but also an artificial schism; rather, it is to recognize that one informs the other.

Conclusion

Viewed through the lens of cultural neuroscience, the central thesis of this chapter is that biological and cultural aspects of musical experiences are inextricably intertwined. Virtually nothing about musical experiences is purely biological or purely cultural. We might consider a tree with its root system as a visual analogy (Fig. 4). Let the trunk represent musicality as a universal aspect of being human. Let the branches represent major cultural traditions, and the smaller twigs and leaves stand for particular musical genres and styles.[1] The cultural distance hypothesis (Demorest & Morrison, 2016) suggests that leaves on the same branch (i.e., nearby musical styles) are more understandable to listeners than leaves on the opposite side of the tree. Supporting the visible part of the tree is a dense, deep-seated root system. These roots represent the supporting biological and cultural underpinnings of music. Each root is an amalgam of biological and cultural aspects, such that it is impossible to disentangle the Gordian knot.

Figure 4. A visual analogy for human musicality. The roots represent biological and cultural underpinnings. The trunk represents musicality as a universal aspect of humankind. The branches represent different cultural traditions, and the twigs and leaves represent particular musical genres and styles.

It is important to remember that the object of study in neuromusical research is not a brain that sits in a jar on a shelf in some lab; it is inside a living person with a personality, with all manner of proclivities, potentialities, and internal and external motivations and influences. Being mindful of these biocultural interactions does not mean that it is possible to separate biology from culture, but rather that research findings must be interpreted with an awareness of these mutual influences. Let us hope that research within a cultural neuroscience perspective will proceed at an ever-increasing pace so that we can learn as much as possible about the biocultural aspects of music before it is too late and there are no more indigenous, authentic musicians and music listeners to study.

[1] To be more accurate, each leaf should have a different shape to represent the individuality of various musical styles.

References

Adachi, M., Trehub, S., & Abe, J.-I. (2004). Perceiving emotion in children’s songs across age and culture. Japanese Psychological Research 46(4), 322–336.
Allman, W. (1994). The stone age present. New York: Simon & Schuster.
Ambady, N., & Bharucha, J. (2009). Culture and the brain. Current Directions in Psychological Science 18(6), 342–345.
Arnett, J. (2008). The neglected 95 percent: Why American psychology needs to become less American. American Psychologist 63(7), 602–614.


Bahat, A. (1980). The musical traditions of the oriental Jews. The World of Music 22(2), 46–55.
Balkwill, L.-L., & Thompson, W. (1999). A cross-cultural investigation of the perception of emotions in music: Psychophysical and cultural cues. Music Perception 17(1), 43–64.
Balkwill, L.-L., Thompson, W., & Matsunaga, R. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4), 337–349.
Bangert, M., & Schlaug, G. (2006). Specialization of the specialized in features of external human brain morphology. European Journal of Neuroscience 24(6), 1832–1834.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9), 1148–1150.
Berk, L. (2017). Development through the lifespan. New York: Pearson Education.
Bermudez, P., & Zatorre, R. (2005). Differences in gray matter between musicians and nonmusicians. The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 395–399.
Blacking, J. (1973). How musical is man? Seattle: University of Washington Press.
Blake, E., & Cross, I. (2008). Flint tools as portable sound-producing objects in the upper paleolithic context: An experimental study. In P. Cunningham, J. Heeb, & R. Paardekooper (Eds.), Experiencing archaeology by experiment (pp. 1–19). Oxford: Oxbow Books.
Boiles, C. (1984). Universals of musical behavior: A taxonomic approach. The World of Music 26(2), 50–64.
Brown, S., & Jordania, J. (2011). Universals in the world’s musics. Psychology of Music 41(2), 229–248.
Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R., & Escera, C. (2005). A kind of auditory “primitive intelligence” already present at birth. European Journal of Neuroscience 21(11), 3201–3204.
Chiao, J. (2009). Cultural neuroscience: A once and future discipline. Progress in Brain Research 178, 287–304.
Chiao, J., Li, S., Seligman, R., & Turner, R. (Eds.). (2016). The Oxford handbook of cultural neuroscience. Oxford: Oxford University Press.
Clayton, M. (2009). The social and personal functions of music in cross-cultural perspective. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 35–44). Oxford: Oxford University Press.
Conard, N., Malina, M., & Münzel, C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature 460(7256), 737–740.
Constable, G. (1973). The Neanderthals. New York: Time-Life Books.
Creel, S., Newport, E., & Aslin, R. (2004). Distant melodies: Statistical learning of nonadjacent dependencies in tone sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 30(5), 1119–1130.
Cross, I. (2007). Music and cognitive evolution. In L. Barrett & R. Dunbar (Eds.), The Oxford handbook of evolutionary psychology (pp. 649–667). Oxford: Oxford University Press.
Cross, I. (2009–2010). The evolutionary nature of musical meaning. Musicæ Scientiæ, Special Issue 2009–2010, 179–200.
Cross, I., & Morley, I. (2009). The evolution of music: Theories, definitions and the nature of the evidence. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 61–81). Oxford: Oxford University Press.
Cross, I., Zubrow, E., & Cowan, F. (2002). Musical behaviours and the archaeological record: A preliminary study. In J. Mathieu (Ed.), Experimental archaeology. British Archaeological Reports International Series 1035 (pp. 25–34). Oxford: BAR Publishing.


Curtis, M. E., & Bharucha, J. J. (2009). Memory and musical expectation for tones in cultural context. Music Perception 26(4), 365–375.
Dams, L. (1985). Paleolithic lithophones: Descriptions and comparisons. Oxford Journal of Archaeology 4(1), 31–46.
Davies, S. (2010). Emotions expressed and aroused by music: Philosophical perspectives. In P. Juslin & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 15–43). Oxford: Oxford University Press.
DeFelipe, J. (2011). The evolution of the brain, the human nature of cortical circuits, and intellectual creativity. Frontiers in Neuroanatomy 5(29), 1–17.
Demorest, S., & Morrison, S. (2003). Exploring the influence of cultural familiarity and expertise on neurological responses to music. Annals of the New York Academy of Sciences 999, 112–117.
Demorest, S., & Morrison, S. (2016). Quantifying culture: The cultural distance hypothesis of melodic expectancy. In J. Chiao, S.-C. Li, R. Seligman, & R. Turner (Eds.), The Oxford handbook of cultural neuroscience (pp. 183–196). Oxford: Oxford University Press.
Demorest, S., Morrison, S., Beken, M., & Jungbluth, D. (2008). Lost in translation: An enculturation effect in music memory performance. Music Perception 25(3), 213–223.
Demorest, S., Morrison, S., Nguyen, V., & Bodnar, E. (2016). The influence of contextual cues on cultural bias in music memory. Music Perception 33(5), 590–600.
Demorest, S., Morrison, S., Stambaugh, L., Beken, M., Richards, T., & Johnson, C. (2010). An fMRI investigation of the cultural specificity of music memory. Social Cognitive and Affective Neuroscience 5(2–3), 282–291.
Demorest, S., & Osterhout, L. (2012). ERP responses to cross-cultural melodic expectancy violations. Annals of the New York Academy of Sciences 1252, 152–157.
Dissanayake, E. (2009). Root, leaf, blossom, or bole: Concerning the origin and adaptive function of music. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality: Exploring the basis of human companionship (pp. 17–30). Oxford: Oxford University Press.
Dobzhansky, T. (1962). Mankind evolving. New Haven, CT: Yale University Press.
Doidge, N. (2007). The brain that changes itself. New York: Penguin.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of the New York Academy of Sciences 999, 429–437.
Drake, C., & Bertrand, D. (2001). The quest for universals in temporal processing in music. Annals of the New York Academy of Sciences 930, 17–27.
Dubos, R. (1981). Celebrations of life. New York: McGraw-Hill.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Fields, D. (2009). The other brain. New York: Simon & Schuster.
Filley, C. (2005). White matter and behavioral neurology. In J. Ulmer, L. Parsons, M. Moseley, & J. Gabrieli (Eds.), White matter in cognitive neuroscience. Annals of the New York Academy of Sciences 1064, 162–183.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., . . . Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Galvan, A. (2010). Neural plasticity of development and learning. Human Brain Mapping 31(6), 879–890.
Gaser, C., & Schlaug, G. (2003). Gray matter differences between musicians and nonmusicians. Annals of the New York Academy of Sciences 999, 514–517.


Gaston, E. (1968). Man and music. In E. Gaston (Ed.), Music in therapy (pp. 7–29). New York: Macmillan.
Gogate, N., Giedd, J., Janson, K., & Rapoport, J. (2001). Brain imaging in normal and abnormal brain development: New perspectives for child psychiatry. Clinical Neuroscience Research 1(4), 283–290.
Gregory, A. (1997). The roles of music in society: The ethnomusicological perspective. In D. Hargreaves & A. North (Eds.), The social psychology of music (pp. 123–140). Oxford: Oxford University Press.
Gregory, A., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music. Psychology of Music 24(1), 47–52.
Háden, G., Stefanics, G., Vestergaard, M., Denham, S., Sziller, I., & Winkler, I. (2009). Timbre-independent extraction of pitch in newborn infants. Psychophysiology 46(1), 69–74.
Han, S., & Ma, Y. (2015). A culture–behavior–brain loop model of human development. Trends in Cognitive Sciences 19(11), 666–676.
Han, S., Northoff, G., Vogeley, K., Wexler, B., Kitayama, S., & Varnum, M. (2013). A cultural neuroscience approach to the biosocial nature of the human brain. Annual Review of Psychology 64, 335–359.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hannon, E. E., & Trehub, S. E. (2005). Metrical categories in infancy and adulthood. Psychological Science 16(1), 48–55.
Hodges, D., Burdette, J., & Hairston, D. (2005). Aspects of multisensory perception: The integration of visual and auditory information processing in musical experiences. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (Eds.), The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 175–185.
Hodges, D., Hairston, W., Maldjian, J., & Burdette, J. (2010). Keeping an open mind’s eye: Mediation of cross-modal inhibition in music conductors. In S. M. Demorest, S. J. Morrison, & P. S. Campbell (Eds.), Proceedings of the 11th International Conference on Music Perception and Cognition (ICMPC 11) (pp. 415–416). Seattle, Washington.
Hoffmann, D., Standish, C., Garcia-Diez, M., Pettitt, P., Milton, J., Zilhão, J., . . . Pike, A. (2018). U-Th dating of carbonate crusts reveals Neandertal origin of Iberian cave art. Science 359(6378), 912–915.
Hornbostel, E., & Sachs, C. (1992). Classification of musical instruments. In H. Meyers (Ed.), Ethnomusicology: An introduction (pp. 444–461). New York: W. W. Norton.
Hutchinson, S., Lee, L., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral Cortex 13(9), 943–949.
Hyde, K., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A., & Schlaug, G. (2009). Musical training shapes brain development. Journal of Neuroscience 29(10), 3019–3025.
Jardri, R., Pins, D., Houfflin-Debarge, V., Chaffiotte, C., Rocourt, N., Pruvo, J.-P., . . . Thomas, P. (2008). Fetal cortical activation to sound at 33 weeks of gestation: A functional MRI study. NeuroImage 42(1), 10–18.
Jonaitis, E., & Saffran, J. (2009). Learning harmony: The role of serial statistics. Cognitive Science 33(5), 951–968.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A., Lähdesmäki, H., & Järvelä, I. (2015a). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 9506. doi:10.1038/srep09506


Kanduri, C., Raijas, P., Ahvenainen, M., Phillips, A., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015b). The effect of listening to music on human transcriptome. PeerJ 3, e830. doi:10.7717/peerj.830
Karuza, E., Newport, E., Aslin, R., Starling, S., Tivarus, M., & Bavelier, D. (2013). The neural correlates of statistical learning in a word segmentation task: An fMRI study. Brain and Language 127(1), 46–54.
Kendig, F., & Levitt, G. (1982). Overture: Sex, math and music. Science Digest 90(1), 72–73.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in Bali and the West. Music Perception 2(2), 131–165.
Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition 83(2), B35–B42.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. NeuroReport 10(6), 1309–1313.
Kolb, B., & Gibb, R. (2011). Brain plasticity and behaviour in the developing brain. Journal of the Canadian Academy of Child and Adolescent Psychiatry 20(4), 265–276.
Kotilahti, K., Nissilä, I., Näsi, T., Lipiäinen, L., Noponen, T., Meriläinen, P., . . . Fellman, V. (2010). Hemodynamic responses to speech and music in newborn infants. Human Brain Mapping 31(4), 595–603.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000). Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks. Cognition 76(1), 13–58.
Kuhl, P., & Rivera-Gaxiola, M. (2008). Neural substrates of language acquisition. Annual Review of Neuroscience 31, 511–534.
Kunej, D., & Turk, I. (2000). New perspectives on the beginnings of music: Archaeological and musicological analysis of a middle Paleolithic bone “flute.” In N. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 235–268). Cambridge, MA: MIT Press.
Laukka, P., Eerola, T., Thingujam, N., & Yamasaki, T. (2013). Universal and culture-specific factors in the recognition and performance of musical affect expressions. Emotion 13(3), 434–449.
List, G. (1984). Concerning the concept of the universal and music. The World of Music 26(2), 40–47.
Lomax, A. (1968). Folk song style and culture. New Brunswick, NJ: Transaction Books.
McDermott, J. (2008). The evolution of music. Nature 453(7193), 287–288.
Manuck, S., & McCaffery, J. (2014). Gene–environment interaction. Annual Review of Psychology 65, 41–70.
Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
Marek, G. (1975). Toscanini. London: Vision Press.
Mason, O. (1897). Geographical description of the musical bow. American Anthropologist 10(11), 377–380.
Matsunaga, R., Yokosawa, K., & Abe, J. (2012). Magnetoencephalography evidence for different brain subregions serving two musical cultures. Neuropsychologia 50(14), 3218–3227.
Mehr, S., Song, L., & Spelke, E. (2016). For 5-month-old infants, melodies are social. Psychological Science 27(4), 486–501.
Mehr, S., & Spelke, E. (2017). Shared musical knowledge in 11-month-old infants. Developmental Science 21(2), e12542. doi:10.1111/desc.12542


Merriam, A. (1964). The anthropology of music. Chicago, IL: Northwestern University Press.
Mithen, S. (2006). The singing Neanderthals: The origins of music, language, mind, and society. Cambridge, MA: Harvard University Press.
Morley, I. (2006). The evolutionary origins and archaeology of music: An investigation into the prehistory of human musical capacities and behaviors (Doctoral dissertation). University of Cambridge, Cambridge. Darwin College Research Reports, DCRR-002. Retrieved from https://www.darwin.cam.ac.uk/drupal7/sites/default/files/Documents/publications/dcrr002.pdf
Morrison, S., & Demorest, S. (2009). Cultural constraints on music perception and cognition. In J. Y. Chiao (Ed.), Progress in brain research, Vol. 178: Cultural neuroscience: Cultural influences on brain function (pp. 67–77). Amsterdam: Elsevier.
Morrison, S., Demorest, S., Aylward, E., Cramer, S., & Maravilla, K. (2003). fMRI investigation of cross-cultural music comprehension. NeuroImage 20(1), 378–384.
Morrison, S., Demorest, S., Campbell, P., Bartolome, S., & Roberts, J. (2013). Effect of intensive instruction on elementary students’ memory for culturally unfamiliar music. Journal of Research in Music Education 60(4), 363–374.
Morrison, S., Demorest, S., & Stambaugh, L. (2008). Enculturation effects in music cognition: The role of age and music complexity. Journal of Research in Music Education 56(2), 118–129.
Mosing, M., Madison, G., Pedersen, N., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803.
Mosing, M., Peretz, I., & Ullén, F. (2018). Genetic influences on music expertise. In D. Hambrick, G. Campitelli, & B. Macnamara (Eds.), The science of expertise: Behavioral, neural, and genetic approaches to complex skill (pp. 272–282). New York: Routledge.
Mumford, L. (1967). The myth of the machine. New York: Harcourt Brace Jovanovich.
Münte, T., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity. Nature Reviews Neuroscience 3(6), 473–478.
Münte, T., Kohlmetz, C., Nager, W., & Altenmüller, E. (2001). Superior auditory spatial tuning in conductors. Nature 409(6820), 580.
Nan, Y., Knösche, T., & Friederici, A. (2006). The perception of musical phrase structure: A cross-cultural ERP study. Brain Research 1094(1), 179–191.
Nan, Y., Knösche, T., Zysset, S., & Friederici, A. (2008). Cross-cultural music phrase processing: An fMRI study. Human Brain Mapping 29(3), 312–328.
Nettl, B. (1977). On the question of universals. The World of Music 19, 2–13.
Nettl, B. (1983). The study of ethnomusicology. Urbana, IL: University of Illinois Press.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and culture. In N. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 463–472). Cambridge, MA: MIT Press.
Nettl, B. (2005). The study of ethnomusicology: Thirty-one issues and concepts. Champaign, IL: University of Illinois Press.
Neuhaus, C. (2003). Perceiving musical scale structures: A cross-cultural event-related brain potentials study. Annals of the New York Academy of Sciences 999, 184–188.
Nketia, J. (1984). Universal perspectives in ethnomusicology. The World of Music 26(2), 3–20.
Norton, A., Winner, E., Cronin, K., Overy, K., Lee, D., & Schlaug, G. (2005). Are there preexisting neural, cognitive, or motoric markers for musical ability? Brain and Cognition 59(2), 124–134.


Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11), 1102–1108.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature 392(6678), 811–814.
Pantev, C., Roberts, L., Schultz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. NeuroReport 12(1), 169–174.
Patel, A., Meltzoff, A., & Kuhl, P. (2004). Cultural differences in rhythm perception: What is the influence of native language? In S. Lipscomb, R. Ashley, R. Gjerdingen, & P. Webster (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition. Evanston, IL: Northwestern University. CD-ROM.
Perani, D., Saccuman, M., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., . . . Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763.
Peretz, I., Brattico, E., Järvenpää, M., & Tervaniemi, M. (2009). The amusic brain: In tune, out of key, and unaware. Brain 132(5), 1277–1286.
Peretz, I., Saffran, J., Schön, D., & Gosselin, N. (2012). Statistical learning of speech, not music, in congenital amusia. Annals of the New York Academy of Sciences 1252, 361–367.
Pons, F., Lewkowicz, D., Soto-Faraco, S., & Sebastián-Gallés, N. (2009). Narrowing of intersensory speech perception in infancy. Proceedings of the National Academy of Sciences 106(26), 10598–10602.
Prideaux, T. (1973). Cro-Magnon man. New York: Time-Life Books.
Quartz, S. (2003). Learning and brain development: A neural constructivist perspective. In P. Quinlan (Ed.), Connectionist models of development (pp. 279–309). New York: Psychology Press.
Rampon, C., Jiang, C., Dong, H., Tang, Y.-P., Lockhart, D., Schultz, P., . . . Hu, Y. (2000). Effects of environmental enrichment on gene expression in the brain. Proceedings of the National Academy of Sciences 97(23), 12880–12884.
Saffran, J. (2003). Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science 6(1), 35–47.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928.
Saffran, J., & Griepentrog, G. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology 37(1), 74–85.
Saffran, J., Johnson, E., Aslin, R., & Newport, E. (1998). Statistical learning of tone sequences by human infants and adults. Cognition 70(1), 27–52.
Schlaug, G., Forgeard, M., Zhu, L., Norton, A., Norton, A., & Winner, E. (2009). Training-induced neuroplasticity in young children. The neurosciences and music III. Annals of the New York Academy of Sciences 1169, 205–208.
Schlaug, G., Norton, A., Overy, K., & Winner, E. (2005). Effects of music training on the child’s brain and cognitive development. The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 219–230.
Shahin, A., Roberts, L., & Trainor, L. (2004). Enhancement of auditory cortical development by musical experience in children. NeuroReport 15(12), 1917–1921.
Smith, H. (1953). From fish to philosopher. Garden City, NY: Doubleday Anchor.
Soley, G., & Hannon, E. (2010). Infants prefer the musical meter of their own culture: A cross-cultural comparison. Developmental Psychology 46(1), 286–292.


music through the lens of cultural neuroscience   41

Stefanics, G., Háden, G., Sziller, I., Balázs, L., Beke, A., & Winkler, I. (2009). Newborn infants process pitch intervals. Clinical Neurophysiology 120(2), 304–308.
Stiles, J., Reilly, J., Levine, S., Trauner, D., & Nass, R. (2012). Neural plasticity and cognitive development: Insights from children with perinatal brain injury. Oxford: Oxford University Press.
Tervaniemi, M., Kujala, A., Alho, K., Virtanen, J., Ilmoniemi, R., & Näätänen, R. (1999). Functional specialization of the human auditory cortex in processing phonetic and musical sounds: A magnetoencephalographic (MEG) study. NeuroImage 9(3), 330–336.
Tillmann, B., & McAdams, S. (2004). Implicit learning of musical timbre sequences: Statistical regularities confronted with acoustical (dis)similarities. Journal of Experimental Psychology: Learning, Memory, and Cognition 30(5), 1131–1142.
Trainor, L., Marie, C., Gerry, D., Whiskin, E., & Unrau, A. (2012). Becoming musically enculturated: Effects of music classes for infants on brain and behavior. Annals of the New York Academy of Sciences 1252, 129–138.
Trehub, S., Unyk, A., & Trainor, L. (1993). Adults identify infant-directed music across cultures. Infant Behavior and Development 16(2), 193–211.
Turk-Browne, N., Jungé, J., & Scholl, B. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General 134(4), 552–564.
Turner, R., & Ioannides, A. (2009). Brain, music and musicality: Inferences from neuroimaging. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 147–181). Oxford: Oxford University Press.
Ullén, F., Hambrick, D., & Mosing, M. (2016). Rethinking expertise: A multifactorial gene–environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Wade, B. (2009). Thinking musically: Experiencing music, expressing culture (2nd ed.). New York: Oxford University Press.
Wakin, D. (2006). John Cage's long music composition in Germany changes a note. New York Times, May 6. Retrieved September 26, 2017 from http://www.nytimes.com/2006/05/06/arts/music/06chor.html
Webb, S., Monk, C., & Nelson, C. (2001). Mechanisms of postnatal neurobiological development: Implications for human development. Developmental Neuropsychology 19(2), 147–171.
Wilson, F. (1998). The hand: How its use shapes the brain, language, and human culture. New York: Vintage Books.
Winkler, I., Háden, G., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Wong, P., Roy, A., & Margulis, E. (2009). Bimusicalism: The implicit dual enculturation of cognitive and affective systems. Music Perception 27(2), 81–88.
Yi, T., McPherson, G., Peretz, I., Berkovic, S., & Wilson, S. (2014). The genetic basis of music ability. Frontiers in Psychology 5(658), 1–19.
Yi, T., McPherson, G., & Wilson, S. (2018). The molecular genetic basis of music ability and music-related phenotypes. In D. Hambrick, G. Campitelli, & B. Macnamara (Eds.), The science of expertise: Behavioral, neural, and genetic approaches to complex skill (pp. 283–303). New York: Routledge.
Zull, J. (2002). The art of changing the brain: Enriching the practice of teaching by exploring the biology of learning. Sterling, VA: Stylus Publishing.


chapter 3

Cultural Distance: A Computational Approach to Exploring Cultural Influences on Music Cognition

Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce

Introduction

As with many psychological constructs, much of what has been reported in research on the cognitive processing of music is limited to data collected from individuals from a small subset of cultural contexts (Henrich, Heine, & Norenzayan, 2010). Further, the music that is typically employed for the purposes of testing and exploration tends to be drawn from a similarly small set of music practices and mostly consists of music constructed within the Western diatonic framework. This includes Western classical music as well as many North American and Western European folk and popular genres. This is striking given that music is often regarded as a particularly prominent and powerful manifestation of culture. Music is a common way for individuals to assert cultural identity (Frith, 1996) and, as such, its value arguably lies as much in its cultural and stylistic distinctiveness as in any universal qualities it may possess.

Musical systems are somewhat closed in that each describes a set of practices and conventions within which performances, pieces, or whatever might be the appropriate musical "unit" are understood and evaluated. These same practices and conventions can also serve as touchstones against which individuals push in the spirit of creativity and innovation. People come to inhabit a musical system due to various combinations of formal learning—conservatory training, for example, as a means of gaining knowledge of avant-garde art music—and informal learning—becoming steeped in Cajun music as a result of growing up in the southern region of the US state of Louisiana.

In this chapter, our purpose is to emphasize music as an intercultural phenomenon. As such, we will not focus on the particularities of any specific music cultural tradition, nor will we examine the concept of musical universals or the structural or acoustical candidates for such a distinction. Rather, we will dedicate our attention to interactions between music cultures, to what happens when music moves across cultural boundaries. From a sociological perspective, it has been useful to view the construct of culture from a somewhat dichotomous perspective in which the notion of the cultural insider can be contrasted with that of the cultural outsider (Merton, 1972). Contemporary scholarship has drawn attention to the complexity of this comparison and the considerable subjectivity that lies at the heart of such an often oversimplified bifurcation (for an examination of this issue in the field of music research, see Trulsson & Burnard, 2016). Although music is often associated with cultural identity and therefore susceptible to insider/outsider categorization, the ease with which an individual interacts with any given culture's music may be more nuanced. Culture-based differences in the way listeners and performers interact with and respond to music are often delineated by ethnic identity or geographical location, which are, in turn, generally treated as categorical constructs.
As such, they tend to oversimplify complex relationships, obscure considerable within-group variability, and, most critically for the present purpose, do not hold up well when considering a brain-based understanding of music processing. The cultural dimension of music provides context for critical tests of music as a neurological phenomenon. The conclusion that particular brain regions or neurological pathways are associated with human music processing can be tested by examining whether such relationships are evident across musical and cultural contexts. Likewise, the strength or extent of neural activity may offer insights into the ways in which particular music parameters function within musical systems. Cultural roots of music practice also offer a critical test of principles of formal musical learning. Teaching and learning practices often vary from culture to culture and, given that they are often directed toward within-culture music, likely interact with the idiosyncratic elements of the music being taught. The prospect of learning—even at a fundamental level—an unfamiliar music tradition as a performer or as a listener provides a context in which culture-general learning strategies or pathways might be tested. Similarly, it provides a framework in which “from the ground up” skill or schema development can be observed, particularly through more informal learning pathways in which exposure and self-directed discovery feature prominently. At the neurological level, learning within a culturally unfamiliar context might provide further evidence of experience-based neural plasticity as well as potential interactions with already-learned music conventions.


Given the incremental nature of music learning (formal or informal) and the imprecision of insider/outsider classifications, cross-cultural studies of both music perception and music learning would benefit from a more nuanced view of cross-cultural differences in musical traditions, one that is more continuous than categorical. Below we will explore the construct of cultural distance as one potential approach. Cultural distance has been examined at a societal level (Hofstede, 1983) through the development of a suite of measures found to effectively account for culture-based variability among workers. Since its publication, this construct has been used primarily in the fields of business and economics; however, it has also been employed in a number of cross-cultural designs including, occasionally, those related to music (Baek, 2015). The principle of cultural distance—as a way to conceptualize a culture-specific phenomenon in relation to its manifestation in other cultures—is evident in research on more specific cultural practices as well. Kuhl (1991), for example, posited a "perceptual magnet effect" to explain early language learning processes and the manner in which infants' speech perception quickly gravitates toward commonly used phonemic prototypes. Similarly, individuals demonstrate better memory (Golby, Gabrieli, Chiao, & Eberhardt, 2001) as well as better recognition of emotional expression (Chiao et al., 2008) for same-race faces. In both instances, more differentiated face recognition correlated with increased neural activity in the fusiform areas and the amygdala, respectively.

In this chapter, we will provide a brief overview of cross-cultural research in music cognition. We will consider studies that have compared individuals' interactions with culturally familiar and unfamiliar music, those that have compared responses by participants from different cultural backgrounds, and those that have employed fully comparative designs in which participants of different cultural backgrounds interact with each other's music tradition. In reviewing this previous research, we will summarize some of our own recent work that has focused on identifying musical parameters—specifically pitch and rhythm—that appear to make a particularly strong contribution to the differences arising from cross-cultural music interactions. Based on this work, we will then describe the construct of cultural distance as a conceptual and analytical means of interpreting and perhaps predicting cross-cultural responses to music.

Related Literature

The purpose of this review is to provide a brief overview of topics in music cognition that have been explored through a cross-cultural lens. (For more thorough treatment of this topic see reviews by Morrison & Demorest, 2009 and Patel & Demorest, 2013.) Researchers have explored the cross-cultural perception of music emotion, preference, musical structures of scale and key, rhythm and meter, and larger formal elements, as well as musical memory. Participants in these studies have spanned the gamut from infancy to adulthood, offering a picture of how culture influences music cognition and how that influence changes with age and experience.



Cross-Cultural Explorations of Emotion

The single largest body of cross-cultural research in music cognition has to do with the recognition of emotion in music. With the exception of a very small number of studies (e.g., Egermann, Fernando, Chuen, & McAdams, 2015), the research has focused not on emotion induction, or how music makes you feel, but on the ability to recognize emotional states present in music stimuli. On the surface, this seems a curious choice given the somewhat flexible nature of emotion recognition even within a cultural group. However, emotion proves to be an excellent choice for exploring cultural universality versus particularity in music cognition because emotions refer not just to cognitive categories, but to physical states that can be mimicked acoustically (Juslin, 2000; Juslin & Laukka, 2003). Cross-cultural studies have explored Western listeners' perceived emotion in music of India (Balkwill, 2006; Balkwill & Thompson, 1999; Balkwill, Thompson, & Matsunaga, 2004; Deva & Vermani, 1975; Gregory & Varney, 1996; Keil & Keil, 1966), perception of Western music by non-Western listeners, including Congolese pygmies (Egermann et al., 2015) and the Mafa people of northern Cameroon (Fritz et al., 2009), Western listeners' perception of Congolese pygmy music (Egermann et al., 2015), and the cross-cultural communication of emotion involving performers and listeners from Swedish, Indian, and Japanese music cultures (Laukka, Eerola, Thingujam, Yamasaki, & Beller, 2013). The findings can be summarized briefly as follows: A limited set of emotions can be recognized in music regardless of cultural familiarity. The emotions most consistently recognized (happy, sad, angry) vary in arousal in ways that mimic physiological states. Other emotion recognition judgments show influences of cultural familiarity.

There are several theories of emotion recognition that attempt to model this combination of psychophysical and cultural cues in emotion recognition judgments. One of the first was the Cue Redundancy Model (CRM) proposed by Balkwill and Thompson (1999). According to this model, emotions in music are decoded by attending to cues in the musical stimulus consisting of psychophysical cues (sound intensity, tempo, melodic complexity, pitch range, etc.) and culture-specific cues, such as the use of a certain instrument or tonality to communicate a particular emotional state. This allows in-culture listeners to use more information in their emotion recognition judgments, but it also allows out-of-culture listeners to access basic emotional information regardless of familiarity. The authors later proposed a more refined model called Fractionating Emotional Systems, or FES (Thompson & Balkwill, 2010). FES attempts to explain how the culture-specific and culture-general cues proposed in CRM function in development. They propose that all emotion communication is built on a phylogenetic base of cues shared by all humans. As we age, we incorporate ontogenetic cues for both music and language prosody into our emotional vocabulary in a more culturally specific way. Fritz (2013) has proposed a "dock-in" model of emotion recognition that is consistent with previous models in stating that "all music cultures contain both universal and culture-specific features" (p. 514). It differs from previous models in that it proposes that different cultures may "dock in" to only a subset of universal music codes and that cross-cultural understanding can be explained in part by the overlap in universal features employed. This notion of overlap between cultures is similar to the cultural distance hypothesis discussed below, though in the cultural distance hypothesis cultural systems are compared via a simulation of the cognitive processing of musical structure rather than via a comparison of stimulus features.

When evaluating the findings of cross-cultural research in emotion perception it is important to keep in mind that, of all of the studies listed, only three (Egermann et al., 2015; Gregory & Varney, 1996; Laukka et al., 2013) were fully comparative, that is, featuring both listeners and musical stimuli from all cultures involved (Patel & Demorest, 2013). It may be difficult to generalize these findings to other non-Western listeners or musics. While the experience of emotions is a human universal, the notion that music contains an emotional message, rather than a functional or social one, may itself be somewhat culturally specific. Given that most of the studies cited here asked listeners from Western or Western-influenced cultures to identify the emotions in non-Western music, and that much of that music came from a single non-Western culture (India), it is difficult to determine the cultural appropriateness of emotion judgments in music. As Fritz (2013) observed in relation to one specific comparison involving members of a society indigenous to a remote region of Cameroon, "the musical expression of a variety of emotions like fearfulness and sadness, while recognized in the Western stimuli by the Mafa participants, are—according to interviews with Mafa individuals—never represented in the traditional music of the Mafa people" (p. 512).

Cross-Cultural Explorations of Music Preference

Music preference research also explores affective responses to music, not in terms of how music codes affect and emotion, but rather by examining the conditions under which listeners experience pleasure when hearing music. As LeBlanc proposed in his theoretical model, "Music preference decisions are based upon the interaction of input information and the characteristics of the listener, with input information consisting of the musical stimulus and the listener's cultural environment" (1982, p. 29). Music educators have long been interested in music preference as a cross-cultural phenomenon, in part due to their commitment to providing a culturally diverse music education. Researchers in music education have looked at how children's preference for music of other cultures develops and its relationship to familiarity and other musical features. Researchers have explored the musical qualities that might influence preference judgments across cultures (Demorest & Schultz, 2004; Flowers, 1980; Fung, 1994; Morrison & Yeh, 1999; Shehan, 1981) and whether instruction in a culture's music can influence preference (Heingartner & Hall, 1974; Shehan, 1985). As with the research on emotion, the bulk of studies explore how Western listeners respond to non-Western music and are not fully comparative. Findings show that preference for culturally unfamiliar music can be increased with exposure—most of these studies were conducted in formal educational settings among school-age and college populations—but the increase does not extend to novel pieces from the culture. Also, students prefer music that has properties of their own culture, such as Westernized arrangements of non-Western music (Demorest & Schultz, 2004). To summarize the findings: the more culturally familiar something sounds, the more likely listeners are to like it. However, while exposure can increase preference for out-of-culture music, it does so only for learned pieces and does not generalize to the style as a whole (Shehan, 1985).

Cross-Cultural Explorations of Musical Structure

One of the debates surrounding music and culture is the extent to which there are deep structures in music that are relatively invariant across cultures (cf. Brown & Jordania, 2013). Given humans' shared biology and the apparent human need to engage in musical behavior, it is plausible that certain structural features would be present in most, if not all, musics. Through cross-cultural explorations of musical structure, researchers have sought to identify some of the structural features or perceptual processes that work across cultures as well as the points at which music cognition becomes more culturally bound.

Scale and Key Perception

Some of the earliest cross-cultural work done on scale perception included infants (Lynch & Eilers, 1991, 1992; Lynch, Eilers, Oller, & Urbano, 1990; Lynch, Eilers, Oller, Urbano, & Wilson, 1991; Lynch, Short, & Chua, 1995). In a series of studies the authors tested whether pitch deviations could be detected when presented in the context of familiar (major/minor) versus unfamiliar (pelog) scales. They found that deviations were better detected in familiar scale contexts by both adults and children, with the exception of infants aged 6–12 months, who performed similarly in familiar and unfamiliar contexts. While these studies represent an important early attempt to examine scale perception, they were hampered by methodological issues pertaining to the way in which stimuli were created and the possible interference of absolute pitch strategies. There has been a significant amount of work examining whether tonal relationships or tonal hierarchies (Krumhansl & Shepard, 1979) can be perceived by out-of-culture listeners (Castellano, Bharucha, & Krumhansl, 1984; Kessler, Hansen, & Shepard, 1984; Krumhansl, 1995; Krumhansl, Louhivuori, Toiviainen, Järvinen, & Eerola, 1999; Krumhansl et al., 2000). The research has included music and participants from a variety of cultures in its designs and the findings have been mixed. The general sense is that out-of-culture listeners can employ more global strategies involving tone proximity and frequency of occurrence within the stimulus materials to mimic insider tonality judgments, but only up to a point. When judgments become more complex (Krumhansl et al., 1999, 2000) or require specific cultural knowledge (Curtis & Bharucha, 2009), cultural influences on tonal cognition become more pronounced. This suggests that tonality perception, like emotion perception, provides both general and specific cues for listeners depending on their cultural background.
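The "frequency of occurrence" strategy described above is simple enough to simulate. The sketch below is purely illustrative and not the procedure of any study cited here: the melody is invented, and the probe-tone ratings are stand-in values merely shaped like a Krumhansl-style C major key profile. It correlates a melody's pitch-class distribution with those ratings, mimicking how a listener tracking only how often each tone occurs could approximate insider tonality judgments without any culture-specific knowledge.

```python
from collections import Counter

def pc_distribution(melody_midi):
    """Zeroth-order distribution: how often each of the 12 pitch classes
    occurs in a melody (a 'global' cue available without cultural knowledge)."""
    counts = Counter(note % 12 for note in melody_midi)
    total = len(melody_midi)
    return [counts.get(pc, 0) / total for pc in range(12)]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical C major melody (MIDI note numbers) and stand-in probe-tone
# ratings, one per pitch class from C to B (illustrative values only).
melody = [60, 62, 64, 65, 67, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]
probe_ratings = [6.4, 2.2, 3.5, 2.3, 4.4, 4.1, 2.5, 5.2, 2.4, 3.7, 2.3, 2.9]

r = pearson(pc_distribution(melody), probe_ratings)  # strongly positive here
```

A high correlation falls out simply because the melody dwells on the tones the profile rates highly; no knowledge of Western tonality is required, which is exactly why such global strategies succeed "only up to a point."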


Two recent fully comparative studies (Raman & Dowling, 2016, 2017) demonstrate the relative influence of global versus cultural factors in tonality judgments. In a series of four experiments across two studies the authors explored the sensitivity of Western- and Carnātic-trained musicians to two types of modulations in Carnātic melodies. The rāgamālikā modulation is more typical in Carnātic music and corresponds to the less frequent parallel minor modulation (C major to C minor) in Western music. The grahabēdham modulation is less common in Carnātic music, but more common in Western music as it corresponds to a modulation to the relative minor (C major to A minor). They tested modulation identification (both accuracy and speed), tonal profiles, and active probe-tone response during modulation. While results varied somewhat across the different experiments, they found, in general, that cultural background influenced speed and accuracy in modulation detection, with Indian listeners more accurate overall. Response time varied by the cultural familiarity of the modulation, with Indians faster for rāgamālikās and Westerners faster for grahabēdhams. They also found that Western musicians' tone profile responses, while relying on global information about frequency and distribution of tones, were sometimes influenced by a misapplication of Western major/minor judgments to Carnātic tone profiles. The authors reference the Cue Redundancy Model reviewed above as a possible explanation for the mix of global and cultural cues employed by both groups of musicians.

Other approaches to cross-cultural tonal cognition have included event-related potential (ERP) responses to tasks involving out-of-culture scale violations (Neuhaus, 2003; Renninger, Wilson, & Donchin, 2006) and melodic expectancy violations (Demorest & Osterhout, 2012). In general, listeners were less sensitive to out-of-culture scale deviations unless they could detect the deviations using a culture-specific strategy. Another area of research has addressed whether linguistic background shapes musical ability. Researchers have found that tonal-language speakers are generally better at pitch discrimination (Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Pfordresher & Brown, 2009; Wong et al., 2012) and even at pitch accuracy in singing (Pfordresher & Brown, 2009) than those from non-tonal linguistic backgrounds. The authors suggest that fine-grained pitch processing is central to the acquisition of a tonal language and therefore better developed among these individuals (Pfordresher & Brown, 2009).

Rhythm and Meter Perception

Rhythm and meter perception has received much more attention in music cognition over the last ten to fifteen years, and with that attention has come a commensurate increase in cross-cultural exploration. Researchers have examined when infants' responses to meter become culturally biased (Hannon & Trehub, 2005a, 2005b; Soley & Hannon, 2010), the influence of linguistic rhythm on rhythm perception (Hannon, 2009; Iversen, Patel, & Ohgushi, 2008; Patel & Daniele, 2003; Yoshida et al., 2010), and cultural influences on rhythmic perception and performance (Cameron, Bentley, & Grahn, 2015; Drake & Ben El Heni, 2003; Polak, London, & Jacoby, 2016; Stobart & Cross, 2000).


In all of these investigations researchers have found varying degrees of cultural influence on rhythm processing in adults and infants, with infants demonstrating a preference for the meters of their home culture as early as 4–8 months (Soley & Hannon, 2010), even when those meters were more complex. Unlike adults, monocultural infants were equally responsive to metric violations within both familiar and unfamiliar meters (Hannon & Trehub, 2005a), and infants as old as 12 months demonstrated enough flexibility to "reset" their perceptual responses with sufficient exposure to an unfamiliar meter (Hannon & Trehub, 2005b). Just as language acquisition has often been a focus in tonal cognition, several studies have found relationships between the rhythmic qualities of a culture's language and the musical rhythms (Hannon, 2009; Patel & Daniele, 2003) and rhythm grouping (Iversen et al., 2008; Yoshida et al., 2010) of its instrumental music. In a recent fully comparative study, Cameron and colleagues (2015) tested Western-born and East African musicians' performance on three rhythmic tasks: discriminating between two patterns, reproducing rhythm patterns, and tapping a steady beat to rhythmic patterns. Patterns were drawn from East African and Western music and the authors predicted that musicians would show a cultural advantage on all three tasks. As with previous cross-cultural work, however, they found that while the two performance tasks (rhythm reproduction and beat tapping) showed an in-culture advantage, the groups were equally adept at rhythm discrimination. This study was particularly noteworthy for including both perception and performance measures, as many studies feature only one or the other.

Phrasing and Form

Researchers have explored the influence of enculturation on phrase boundary perception (Nan, Knösche, & Friederici, 2006; Nan, Knösche, Zysset, & Friederici, 2008) and musical tension (Wong, Chan, Roy, & Margulis, 2011) through neuroscientific measures. Two fully comparative ERP studies (Nan, Knösche, & Friederici, 2009; Nan et al., 2006) tested Chinese and German musicians' and non-musicians' ability to detect phrase boundaries cross-culturally in unfamiliar excerpts. Results showed a clear in-culture advantage on the behavioral task, and early positive ERP components (100–450 ms) distinguished the two groups of participants for Chinese music (familiar only to the Chinese participants). Both groups exhibited a Closure Positive Shift, suggesting that they were neurologically sensitive to phrase boundaries in both cultures. A follow-up study with only German participants used an fMRI paradigm (Nan et al., 2008) to scan participants while they heard phrased and unphrased examples of Western and Chinese melodies that they were asked to classify by culture. All participants were better at recognizing in-culture examples, and participants exhibited generally higher activation in regions associated with attention and auditory processing when listening to the Chinese melodies, suggesting that out-of-culture music is more demanding for those processes.

In most of the studies reviewed thus far, there are differences between in-culture and out-of-culture responses to a variety of musical tasks, from emotion and preference to basic musical structures. However, the results are almost always tempered by an awareness that some aspects of music processing can be accomplished without relying on culturally specific strategies, using more global cues and responding to familiar-sounding aspects of unfamiliar cultures. In the next section, we review a series of studies on cross-cultural music memory that have led us to propose a possible explanatory framework for musical enculturation.

Cross-Cultural Explorations of Music Memory

In a series of experiments over the last decade or so we have used recognition memory as a way of assessing how effectively in-culture and out-of-culture music is processed. The studies have explored both behavioral (Demorest, Morrison, Beken, & Jungbluth, 2008; Morrison, Demorest, Campbell, Bartolome, & Roberts, 2012; Morrison, Demorest, & Stambaugh, 2008) and neurological (Demorest et al., 2010; Morrison, Demorest, Aylward, Cramer, & Maravilla, 2003) responses to culturally familiar (Western or Turkish) and culturally unfamiliar (Turkish or Chinese) music. In addition, we explored whether memory performance was influenced by training (Demorest et al., 2008; Morrison et al., 2012) or complexity (Morrison et al., 2008). The primary finding of this research has been that there is an "enculturation effect," or cultural bias, in listening: culturally unfamiliar music is consistently processed less effectively, even when accounting for age, training, and complexity. Further, this effect appears in both Western- and non-Western-born listeners. This finding was strengthened by the work of another group that tested memory and tension judgments in monomusical and bimusical participants in the United States and India (Wong, Roy, & Margulis, 2009) and found a similar recognition memory effect for monomusical, but not for bimusical, participants. It should be noted that in most cases out-of-culture recognition memory was above chance and demonstrated improvement with repeated testing (Morrison et al., 2012); however, the observed difference between in- and out-of-culture memory performance remained. Despite the consistency of the enculturation effect, we did not have a good explanation for its cause: that is, what aspect of out-of-culture music was interfering with listeners' ability to hear and remember it? What was so unfamiliar about culturally unfamiliar music?
Was it timbre, tonality, rhythm, melody, or some combination? In a recent study (Demorest, Morrison, Nguyen, & Bodnar, 2016), we sought to strip away contextual variables in an attempt to attenuate or eliminate the effects of enculturation on memory performance. We also explored the possible influence of music preference as a variable influencing attention and memory. Western-born participants (N = 128) were randomly assigned to conditions in which they heard the same music excerpts presented in one of three contexts: full instrumental ensemble (the original version), a single-voice melody on piano, or a single-voice isochronous pitch sequence also on piano. In each condition participants heard a block of three longer Western art music excerpts and a block of three longer Turkish art music excerpts in a counterbalanced order. After each example, they were asked to rate their preference for the excerpt. After each set of three examples they completed a twelve-item recognition memory test with six targets (taken from the

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

excerpts heard previously) and six foils (taken from a musically different and previously unheard part of the same pieces). Regardless of the listening condition, participants demonstrated superior memory for in-culture examples, suggesting that none of the contextual changes mitigated the deficit in memory performance for out-of-culture music. In-culture memory performance was influenced by context, but out-of-culture memory performance was not. Preference was higher overall for in-culture music, but there was no significant correlation between preference scores and memory performance across cultures. This suggested that the process of enculturation involves a kind of informal learning of deeper structure involving commonly heard sequences of pitch relationships. Based on these findings we concluded, “If our understandings of out-of-culture music are filtered through in-culture expectations, then a comparison of the statistical properties of a listener’s home culture with that of an unfamiliar culture might yield predictive information about subsequent memory performance” (Demorest et al., 2016, p. 597). We labeled the notion of a statistical comparison between music cultures across one or more selected parameters cultural distance (Demorest & Morrison, 2016), in an effort to convey the potentially continuous rather than dichotomous relationship among music cultural practices. In the next section, we discuss the construct of cultural distance as an explanatory framework and present illustrative work in cross-cultural corpus analysis that lends support to its central premise.

Cultural Distance

Throughout the body of research that examines cross-cultural cognitive processes associated with music, the underlying design typically sets individuals and/or music examples from one cultural background in contrast with individuals and/or music from another cultural background. Such designs impose a dichotomous relationship between that which is culturally familiar or similar and that which is unfamiliar or dissimilar. On one level, this might be seen as reflecting an in-group and out-group dynamic. However, such bifurcation obscures the fluidity that characterizes musical interactions (Cross, 2008). That is, from the point of view of an individual enculturated in a particular music tradition, the music of a culturally unfamiliar tradition may seem surprisingly accessible in one case and virtually impenetrable in another. It is this distinction—and the continuum of increasing or decreasing similarity to one’s own music—that we propose can be productively explored using the concept of cultural distance (Demorest & Morrison, 2016). The way in which an individual interacts with music is mediated by the properties common to the prevailing music of that individual’s culture. The music on which one was “brought up” provides the framework by which subsequent music experiences are judged as typical or atypical. Put another way, the statistical likelihood of events that characterize the music of one’s home culture governs not only the way in which one interacts with novel pieces from within that same cultural tradition, but also with music


from culturally unfamiliar music traditions. One scans for common and familiar patterns both where they are likely to be found and where they may not be likely at all. This situation suggests a way in which an individual’s responses to and facility with culturally unfamiliar music may be interpreted or, indeed, predicted. Specifically,

we have hypothesized that the degree to which the musics of any two cultures differ in the statistical patterns of pitch and rhythm will predict how well a person from one of the cultures can process the music of the other. (Demorest & Morrison, 2016, p. 189)

Based on this cultural distance hypothesis, music cultures with considerable overlap of patterns would likely allow for more efficient and effective processing, observable through such responses as recognition memory, error detection, phrase parsing, or metric identification, to name a few. In order to test this proposition, we first need a way to ascertain the statistical properties of structural parameters considered typical of a given culture’s music. IDyOM (Information Dynamics of Music; Pearce, 2005) is a computational model of auditory expectation that uses statistical learning and probabilistic prediction to acquire and process internal representations of the structure of a musical style. Using the intervallic content of melody as an illustration, IDyOM generates a probability distribution over the set of possible intervals leading to each note in the melody, conditioned upon the preceding musical context and the prior musical experience of the model. The probability of each note can be log-transformed to yield its information content according to the model (MacKay, 2003), which reflects how unexpected the model finds a note in a particular context. IDyOM is a variable-order Markov model (Begleiter, El-Yaniv, & Yona, 2004; Bell, Cleary, & Witten, 1990; Bunton, 1997; Cleary & Teahan, 1997) which uses a multiple-viewpoint framework (Conklin & Witten, 1995) to represent music.
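To make the core statistic concrete, the following is a miniature stand-in for this computation: a fixed first-order model over a single viewpoint (pitch interval) with add-one smoothing, from which the mean information content of a melody (the average of −log2 P over its notes) can be computed. This is an illustrative sketch only, not the IDyOM implementation (IDyOM is variable-order and multi-viewpoint, with more sophisticated smoothing); the function names and the toy corpus are invented for the example.

```python
import math
from collections import defaultdict

def train_interval_model(melodies):
    """Count first-order transitions between successive pitch intervals.

    A drastic simplification of IDyOM: a single fixed order, a single
    viewpoint (pitch interval), and add-one smoothing over the interval
    alphabet observed in the training corpus.
    """
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = set()
    for melody in melodies:
        intervals = [b - a for a, b in zip(melody, melody[1:])]
        alphabet.update(intervals)
        for ctx, nxt in zip(intervals, intervals[1:]):
            counts[ctx][nxt] += 1
    return counts, alphabet

def mean_information_content(melody, counts, alphabet):
    """Mean information content, -log2 P(interval | previous interval):
    the average unpredictability of the melody under the trained model."""
    intervals = [b - a for a, b in zip(melody, melody[1:])]
    ics = []
    for ctx, nxt in zip(intervals, intervals[1:]):
        total = sum(counts[ctx].values()) + len(alphabet)  # add-one smoothing
        p = (counts[ctx].get(nxt, 0) + 1) / total
        ics.append(-math.log2(p))
    return sum(ics) / len(ics)

# Toy "corpus": two melodies given as MIDI pitch numbers.
corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [67, 65, 64, 62, 60, 62, 64, 65, 67]]
counts, alphabet = train_interval_model(corpus)
print(mean_information_content([60, 62, 64, 62, 60], counts, alphabet))
```

Training two such models on different corpora and comparing a melody's mean information content under each is, in miniature, the corpus comparison described in the text.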
This means that IDyOM has several features that go beyond the capabilities of standard Markov (or n-gram) models: first, it combines predictions from models of different order (using contexts of different lengths for prediction); second, it adapts the maximum order used depending on the context; third, it combines predictions from a long-term model (intended to reflect effects of long-term exposure to a musical style) and a short-term model (reflecting dynamic learning of repeated structure within a given piece of music); and fourth, it is able to combine models of different representations of the musical surface (e.g., chromatic pitch, pitch contour, pitch interval, and scale degree for predicting pitch; duration, duration ratio, and duration contour for predicting rhythm). IDyOM has been shown to accurately predict Western listeners’ pitch expectations in behavioral, physiological, and EEG studies (e.g., Egermann, Pearce, Wiggins, & McAdams, 2013; Hansen & Pearce, 2014; Omigie, Pearce, & Stewart, 2012; Omigie, Pearce, Williamson, & Stewart, 2013; Pearce, 2005; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010). In many circumstances, IDyOM provides a more accurate model of listeners’ pitch expectations than static rule-based models (e.g., Narmour, 1990; Schellenberg, 1997). Rule-based models consist of fixed rules (e.g., a small interval is


expected to be followed by another small interval in the same direction) which cannot be modified by experience and therefore do not predict any differences in perception between music cultures. Although such models may describe the perception of listeners from a given culture, they do not constitute accurate models of cognition, since they cannot account for the observed effects of enculturation reviewed above, and they often prove less accurate than IDyOM in accounting for within-culture perception (Hansen & Pearce, 2014; Pearce, 2005; Pearce, Ruiz, et al., 2010). Furthermore, IDyOM accounts well for other psychological processes in music perception, including similarity perception (Pearce & Müllensiefen, 2017), recognition memory performance (Agres, Abdallah, & Pearce, 2018), phrase boundary perception (Pearce, Müllensiefen, & Wiggins, 2010), and aspects of emotional experience (Egermann et al., 2013; Gingras et al., 2015; Sauvé, Sayad, Dean, & Pearce, 2017). To illustrate the construct of cultural distance, we trained three IDyOM models to simulate listeners enculturated in three different musical styles: first, a Western model trained on a corpus of European folk songs to simulate the perception of a Western listener enculturated in Western tonal music; second, a Chinese model trained on a corpus of Chinese folk songs to simulate the perception of a Chinese listener enculturated in Chinese traditional music; and third, a Turkish model trained on a corpus of Turkish Makam melodies to simulate the perception of a Turkish listener enculturated in Turkish Makam music. The corpus of Western tonal music consists of 769 German folk songs from the Essen Folk Song Collection (Schaffrath, 1992, 1994, 1995), extracted from the datasets fink and erk. The corpus of Chinese music consists of 858 Chinese folk songs from the Essen Folk Song Collection, extracted from the datasets han and natmin.
The corpus of Turkish Makam music consists of 805 Makam melodies extracted from the SymbTr database (Karaosmanoğlu, 2012).1 See Table 1 for further details of the corpora used to train the model simulations. Empty and non-monophonic compositions were first removed from all corpora. Furthermore, we removed duplicate compositions using a conservative procedure that considers two compositions duplicates if they share the same opening four melodic pitch intervals, regardless of rhythm. The pitch system used in Turkish Makam music is microtonal and does not precisely map onto the Western (approximately) twelve-fold equal division of the octave (Bozkurt, Ayangil, & Holzapfel, 2014). Since IDyOM’s pitch matching is exact, this would cause the Western and Chinese models to assign zero probabilities to every pitch in the Turkish corpus. A simple (though not unproblematic) way of addressing this issue is to round each pitch in the Turkish corpus to the nearest semitone, which enables comparisons to be made between the corpora. For studies with Western participants, this corresponds to the assumption that listeners perceive microtonal pitches categorically, aggregating microtonal pitches to the nearest semitone category. There is some evidence that listeners do in fact perceive pitch

1 The Essen Folk Song Collection was retrieved from: http://kern.humdrum.org/cgi-bin/browse?l=/essen. The SymbTR database was retrieved from: https://github.com/MTG/SymbTr.
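As an illustration of this rounding step: Turkish theory conventionally divides the octave into 53 equal Holdrian commas. Assuming, purely for illustration, a pitch encoded as a comma count above a reference tone (the function name and this input convention are ours, not the actual SymbTr format), the “Westernizing” map is a single rounding:

```python
def comma53_to_semitone(commas_from_tonic: int) -> int:
    """'Pitch-Westernize' a microtonal pitch: map a pitch expressed in
    53-TET Holdrian commas above a reference tone to the nearest 12-TET
    semitone (pitch class) above that tone, discarding microtonal detail."""
    return round(commas_from_tonic * 12 / 53) % 12

# A 9-comma whole tone maps to 2 semitones; a 4-comma step rounds to 1.
print(comma53_to_semitone(9), comma53_to_semitone(4))
```

Note that distinct microtonal degrees (e.g., 4- and 5-comma steps) collapse onto the same semitone, which is exactly the loss of detail the text flags as “not unproblematic.”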


Table 1.  Corpora used in modeling cultural distance and stimulus selection

Corpus    Source                                     Number of melodies           Number of notes              Mean number of
                                                     (before duplicates removed)  (before duplicates removed)  notes per melody
Western   Essen Folk Song Collection (fink, erk)     769 (2,240)                  37,340 (112,042)             48
Turkish   SymbTr Makam melodies                      805 (1,935)                  307,041 (718,380)            381
Chinese   Essen Folk Song Collection (han, natmin)   858 (1,994)                  57,677 (126,321)             67
categorically in this way, at least in certain circumstances (Burns & Campbell, 1994; Perlman & Krumhansl, 1996). In this example, any responses among Western listeners that demonstrated differences between Western melodies and these “pitch-Westernized” Turkish melodies would underestimate the dissimilarity experienced between the two corpora, conservatively producing type II errors (false negatives) rather than type I errors (false positives).

Each model was used to make both within-culture and between-culture predictions. For the within-culture predictions, IDyOM estimates the information content of every event in every composition in the corpus, using ten-fold cross-validation (Kohavi, 1995) to create training and test sets from the same corpus. For between-culture predictions, IDyOM is first trained on the within-culture corpus (e.g., the Western corpus for the Western model) and then estimates the information content of every note in every composition in a different corpus representing the comparison culture (e.g., the Chinese or Turkish corpus for the Western model). IDyOM was configured to use only its long-term model (or LTM, simulating long-term exposure to a musical style) trained on the appropriate corpus; the short-term model (simulating dynamic learning of repeated patterns within a piece of music) was not used. Other than these differences regarding training corpora, all models were configured identically using the default parameters described in Pearce (2005). In all cases, information content was averaged across notes for each composition, yielding a value representing the mean unpredictability of that composition for a given model. For each comparison between cultures (Western vs. Turkish, Western vs. Chinese, Turkish vs. Chinese), we then plot the data for each composition in the two corresponding corpora: information content for one model is plotted on the abscissa while information content for the second model is plotted on the ordinate.
The line of equality (x = y) indicates equivalence between the two models. Compositions lying on this line do not distinguish the two cultures, being equally predictable for each model; in other words, they should be equally familiar and predictable to listeners enculturated in either of


the two cultures. Positions near the origin represent compositions that are simple within both cultures—that is, they are highly predictable insofar as most incidences of a selected feature are quite common—while positions far from the origin represent compositions that are complex—unpredictable, uncommon—within both cultures. Positions further away from the line of equality represent compositions that are predictable for the simulated model of one culture but unpredictable for the simulated model of the other culture. Distance from the line of equality, therefore, provides a quantitative measure of cultural distance based on information-theoretic modeling of enculturation in musical styles. Fig. 1A illustrates how cultural distance is computed for a comparison between IDyOM models trained on the Western corpus and the Chinese corpus using a pitch interval representation. By rotating the data points through 45°, Fig. 1B shows the same data with Cultural Distance on the ordinate and culture-neutral complexity on the abscissa. In this example, IDyOM correctly classifies 98 percent of the folk songs by culture (Chinese vs. Western). As mentioned above, IDyOM is capable of modeling different attributes of the musical surface and combining the predictions made by those models. For each comparison between cultures, cultural distance is computed for models predicting pitch structure alone (using a representation of pitch interval), rhythmic structure alone (using a representation of inter-onset interval), and for models using a combined representation of pitch and rhythmic structure (for which a melodic event is represented as a pair of values, one for the preceding pitch interval and one for the preceding inter-onset interval). For each cultural comparison and each of the three representations, the ten compositions with the highest Cultural Distance were selected for each of the two cultures compared. These compositions are highlighted in Fig. 1 for the pitch interval representation. Table 2 shows the mean Cultural Distance values for each combination of cultural comparison and model representation, for the corpus as a whole and for the ten selected compositions. Note that this Cultural Distance measure reflects both corpora included in the comparison. Thus, there is only partial overlap between the different comparisons (e.g., five of the ten Chinese songs selected in the German comparison are the same as those selected in the Turkish comparison; five for the two Turkish comparisons and two for the two German comparisons). Note also that this Cultural Distance measure may be asymmetrical, such that one culture is on average more distant from the second than the second is from the first (e.g., in the case of the Western and Chinese comparison, see Table 2). For all three cultural comparisons, as shown in Table 2, the IDyOM simulations produce positive correlations between the cultures for rhythm predictions, whereas pitch predictions yield no correlation (Western/Chinese), a small positive correlation (Western/Turkish), or a moderate negative correlation (Turkish/Chinese). This suggests that pitch is a more important indicator of cultural distance between these styles than rhythm. For each of the three representations used in each of the three comparisons, one-sample t-tests indicate that the mean cultural distance is significantly different from zero (p < 0.01) for both corpora involved in the comparison.
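The 45° rotation used in Fig. 1B amounts to a simple change of coordinates: for a composition with a mean information content under each of the two models, the signed distance from the line of equality becomes the ordinate and the position along that line the abscissa. A minimal sketch of that computation (the function name is ours):

```python
import math

def rotate_45(ic_model_a, ic_model_b):
    """Rotate a point (mean IC under model A, mean IC under model B)
    through 45 degrees.

    Returns (cultural_distance, culture_neutral_complexity):
    cultural_distance is the signed distance from the line of equality
    x = y (positive when the piece is more surprising to model B);
    culture_neutral_complexity is the position along that line, i.e.,
    how unpredictable the piece is for both models jointly.
    """
    cultural_distance = (ic_model_b - ic_model_a) / math.sqrt(2)
    complexity = (ic_model_a + ic_model_b) / math.sqrt(2)
    return cultural_distance, complexity

# A composition equally predictable under both models lies on the line
# of equality, so its cultural distance is zero:
d, c = rotate_45(3.0, 3.0)
print(d)  # 0.0
```

Dividing by √2 makes the rotated coordinates true Euclidean distances in the original plane rather than raw differences and sums.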


Figure 1.  Modeling cultural distance between the Western and Chinese corpora using a pitch interval representation. A: The information content of the Western model plotted against that of the Chinese model with the x = y line shown. B: A 45° rotation of A such that the ordinate represents cultural distance and the abscissa culture-neutral complexity. For each style, the ten compositions with most extreme cultural distance are highlighted.

Limitations

The analysis of two or more types of music along any given musical parameter (for example, pitch, as in the illustration above) or combination of parameters imposes the assumption that such an analysis is valid within each music type. While a music tradition such as Western art music (at least that from or deriving from the common practice period of approximately the mid-seventeenth to early twentieth centuries) has a well-established history of analysis and interpretation based, in part, on both sequential and concurrent pitch interval relationships, the same may not be said of other traditions. Tools such as IDyOM offer the flexibility to examine cultural distance according to a variety of musical parameters, individually or in combination. Nevertheless, any specific configuration runs the risk of privileging one parametric hierarchy over another. Thus, in terms of cross-cultural research, such statistical models will virtually always impose the perspective of a particular music tradition, at least to some degree. This limitation has ramifications for fully comparative studies in that the degree to which a parameter holds primacy for one set of participants may not hold true for the other. Much as emotion recognition, so familiar to the experience of westernized listeners, did not figure meaningfully in the music tradition of the Mafa (Fritz, 2013), the statistical likelihood of patterns of pitch may contribute less to musical thinking among Rwandans than among North Americans (as in Cameron et al., 2015), relative to the complexity of patterns of rhythm. In this way, cultural distance is a tool through which one can isolate norms for one or more musical parameters as well as provide a particular perspective on musical meaning-making.

Table 2.  Mean cultural distance values for the entire corpus and selected stimuli for the two styles involved in each comparison. Data are reported for models predicting pitch alone, rhythm alone, and both pitch and rhythm

                                                          Mean Cultural Distance
                                                       Corpus                  Selected Stimuli
Culture 1  Culture 2  Representation   IC Correlation     Culture 1  Culture 2   Culture 1  Culture 2
Western    Turkish    Pitch            0.20, p < 0.01     −0.39      0.41        −1.27      1.39
                      Rhythm           0.52, p < 0.01     −0.18      0.68        −0.77      5.9
                      Pitch + Rhythm   0.08, p < 0.01     −0.57      1.08        −1.52      6.44
Western    Chinese    Pitch            −0.04, p = 0.11    −0.46      0.51        −1.03      1.18
                      Rhythm           0.44, p < 0.01     −0.4       0.11        −1.92      1.71
                      Pitch + Rhythm   0.00, p = 0.99     −0.86      0.62        −2.35      2.23
Turkish    Chinese    Pitch            −0.52, p < 0.01    −0.91      1.06        −1.96      2.11
                      Rhythm           0.68, p < 0.01     −0.4       0.07        −4.47      0.36
                      Pitch + Rhythm   −0.26, p < 0.01    −1.32      1.13        −5.54      2.2


A related limitation is that IDyOM currently requires symbolic, score-like input in which notes are represented as discrete events with discrete properties (e.g., onset time, pitch). This does not readily accommodate musical cultures that depend heavily on timbral, dynamic, or textural changes. The same is true of musical cultures that have no written tradition, where the distinction between composition and performance is blurred or nonexistent, or where music is inextricably combined with other modes of communication (Cross, 2014). Despite the emphasis here on the advantageous aspects of familiarity, novelty is without question an attractive characteristic of music. Models of musical expectancy (e.g., Huron, 2006; Meyer, 1956) describe the interest inherent in, and stimulation derived from, that which is unfamiliar and surprising in music. The constant curiosity for new musical ideas suggests an ongoing willingness to explore less “predictable” musical scenarios. With much of the world’s music readily—and in many cases instantly—accessible, such willingness leads as easily to unfamiliar music traditions as to the remoter corners of one’s own. We have used cultural distance as a means of explaining processing difficulties (as operationalized by recognition memory); however, it is equally viable as a tool to examine such positive aspects of music experience as interest and surprise. Although Cook (2008) was referring specifically to musicologists, his description can arguably be construed more broadly: “Practically all of us are at least to some degree musically multilingual . . . as a result one understands even the tradition(s) in which one is most ‘at home’ as options amongst other options, understands them in relation to other traditions rather than as absolutes” (p. 63).

Conclusion

Research on cross-cultural music interactions has demonstrated that responses to culturally familiar and unfamiliar music, as well as responses by individuals enculturated in different music traditions, can be either remarkably similar or strikingly different depending on the task and the music presented. Theoretical models such as Cue Redundancy (Balkwill & Thompson, 1999) or Fritz’s (2013) dock-in model have framed cross-cultural music interactions as consisting of culture-general and culture-specific components. The manner in which these models account for areas of overlap between music cultures and distinctions unique to each music culture fits well with recent research findings as well as with the concept of cultural distance. However, absent from their construal of shared and unique features is a middle ground of “culturally specific but similar” components that, while mutually proprietary and uniquely meaningful to each culture, may be somewhat accommodating to strategies for listening, performing, and meaning-making deployed by individuals from outside the culture. This similar-but-not-shared aspect of the cultural distance construct can help account for memory responses, reported above, to out-of-culture music that were less successful than for in-culture music but still above chance (e.g., Demorest et al., 2008).


Likewise, it also provides an explanation in cases where listeners have applied familiar listening strategies to culturally unfamiliar music only to encounter ultimate confusion (e.g., Curtis & Bharucha, 2009). Eventually, the trajectory of complexity within a culturally unfamiliar system takes a listener or performer past the point where learned patterns can accommodate. On the whole, responses to musics that demonstrate considerable overlap may show greater consistency than those to musics with very few points of commonality. Thus, one can make a distinction between the apparent “ease” with which an individual can move between music cultures and the more likely case of greater opportunities afforded by some unfamiliar music cultures to successfully deploy familiar strategies. This is potentially useful for neurological investigations of music processing. Responses to culturally unfamiliar music have generally been reported to differ more by degree than by presence or location. That is, music appears to recruit similar neural systems regardless of its cultural familiarity, though the strength or extent of that activity may differ according to the music encountered (e.g., Nan et al., 2008; Demorest et al., 2010). The model of cultural distance is a tool that provides a continuous rather than categorical conceptualization of cross-cultural music research designs. Such a correlational approach may lend itself well to the fine-grained, incremental, and plastic manner in which neurological processes and pathways develop and are deployed. We are not suggesting that through the learning of an unfamiliar array of patterns one can gain access to the full, rich experience of culturally situated musical contexts. Music represents a broad range of activities and relationships that may have only tenuous connections to structural parameters like melodic or rhythmic intervals.
Much of music’s meaning is derived from where, when, and how it occurs quite apart from how it is put together (Small, 1998). Rather, we suggest that cultural distance may be a useful lens through which specific aspects of the cognitive processing of music—particularly musical structure—may be predicted, investigated, analyzed, and interpreted. Much of the research on cross-cultural musical interactions has involved measurement of such things as memory, affective response, detection of differences, verbal or written description, and preference. In virtually all cases these outcomes were prompted through listening tasks, a way of experiencing music that, while ecologically valid and obviating any need for previous training, is covert and arguably accommodating of varied interpretations and strategies. In contrast, investigations of cross-cultural performance contexts may yield new insights into the ways in which individuals navigate unfamiliar musical terrain. More directly observable performance-based interactions may shed additional light on the processes by which one grapples with, accommodates, or eventually gains facility with musics that are differently organized. Earlier we posed the question of what happens when music crosses cultural boundaries. The construct of cultural distance provides a more graduated, incremental way of conceptualizing the relationship between the familiar and the unfamiliar. It allows for the fluidity characteristic of musical interactions, recognizes the porous nature of music categorization, and accounts for the variability found within any music tradition. For research purposes, cultural distance offers a way by which dichotomous models of music—insider/outsider, familiar/unfamiliar, own/other—can be refined to test a


more nuanced picture of musical meaning-making. In this way, cross-cultural music interactions might be viewed less as the crossing of a boundary and more as the undertaking of a trip.

References Agres, K., Abdallah, S., & Pearce, M. T. (2018). Information-theoretic properties of auditory sequences dynamically influence expectation and memory. Cognitive Science 42(1), 43–76. Baek, Y.  M. (2015). Relationship between cultural distance and cross-cultural music video consumption on YouTube. Social Science Computer Review 33(6), 730–748. Balkwill, L. L. (2006). Perceptions of emotion in music across cultures. Paper presented at Emotional Geographies: The Second International & Interdisciplinary Conference, May, Queen’s University, Kingston, Canada. Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues. Music Perception 17(1), 43–64. Balkwill, L. L., Thompson, W. F., & Matsunaga, R. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4), 337–349. Begleiter, R., El-Yaniv, R., & Yona, G. (2004). On prediction using variable order Markov models. Journal of Artificial Intelligence Research 22, 385–421. Bell, T. C., Cleary, J. G., & Witten, I. H. (1990). Text compression. Englewood Cliffs, NJ: Prentice Hall. Bozkurt, B., Ayangil, R., & Holzapfel, A. (2014). Computational analysis of Makam music in Turkey: Review of state-of-the-art and challenges. Journal of New Music Research 43(1), 3–23. Brown, S., & Jordania, J. (2013). Universals in the world’s musics. Psychology of Music 41(2), 229–248. Bunton, S. (1997). Semantically motivated improvements for PPM variants. The Computer Journal 40(2–3), 76–93. Burns, E. M., & Campbell, S. L. (1994). Frequency and frequency-ratio resolution by possessors of absolute and relative pitch: Examples of categorical perception. Journal of the Acoustical Society of America 96(5), 2704–2719. Cameron, D. J., Bentley, J., & Grahn, J. A. (2015). Cross-cultural influences on rhythm processing: Reproduction, discrimination, and beat tapping. 
Frontiers in Psychology 6, 366. Retrieved from https://doi.org/10.3389/fpsyg.2015.00366 Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music of north India. Journal of Experimental Psychology: General 113(3), 394–412. Chiao, J. Y., Iidaka, T., Gordon, H. L., Nogawa, J., Bar, M., Aminoff, E., . . . Ambady, N. (2008). Cultural specificity in amygdala response to fear faces. Journal of Cognitive Neuroscience 20(12), 2167–2174. Cleary, J.  G., & Teahan, W.  J. (1997). Unbounded length contexts for PPM. The Computer Journal 40(2–3), 67–75. Conklin, D., & Witten, I. H. (1995). Multiple viewpoint systems for music prediction. Journal of New Music Research 24(1), 51–73. Cook, N. (2008). We are all (ethno)musicologists now. In H. Stobart (Ed.), The new (ethno)musicologies (pp. 48–70). Lanham, MD: Scarecrow Press.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

cultural distance   61

Cross, I. (2008). Musicality and the human capacity for culture. Musicae Scientiae 12(1 Suppl.), 147–167.
Cross, I. (2014). Music and communication in music psychology. Psychology of Music 42(6), 809–819.
Curtis, M. E., & Bharucha, J. J. (2009). Memory and musical expectation for tones in cultural context. Music Perception 26(4), 365–375.
Demorest, S. M., & Morrison, S. J. (2016). Quantifying culture: The cultural distance hypothesis of melodic expectancy. In J. Y. Chiao, S.-C. Li, R. Seligman, & R. Turner (Eds.), The Oxford handbook of cultural neuroscience (pp. 183–194). Oxford: Oxford University Press.
Demorest, S. M., Morrison, S. J., Beken, M. N., & Jungbluth, D. (2008). Lost in translation: An enculturation effect in music memory performance. Music Perception 25(3), 213–223.
Demorest, S. M., Morrison, S. J., Beken, M. N., Stambaugh, L. A., Richards, T. L., & Johnson, C. (2010). Music comprehension among Western and Turkish listeners: fMRI investigation of an enculturation effect. Social Cognitive and Affective Neuroscience 5, 282–291.
Demorest, S. M., Morrison, S. J., Nguyen, V. Q., & Bodnar, E. N. (2016). The influence of contextual cues on cultural bias in music memory. Music Perception 33(5), 590–600.
Demorest, S. M., & Osterhout, L. (2012). ERP responses to cross-cultural melodic expectancy violations. Annals of the New York Academy of Sciences 1252, 152–157.
Demorest, S. M., & Schultz, S. J. (2004). Children's preference for authentic versus arranged versions of world music recordings. Journal of Research in Music Education 52(4), 300–313.
Deva, B. C., & Virmani, K. G. (1975). A study in the psychological response to ragas (Research Report II of Sangeet Natak Akademi). New Delhi, India: Indian Musicological Society.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of the New York Academy of Sciences 999, 429–437.
Egermann, H., Fernando, N., Chuen, L., & McAdams, S. (2015). Music induces universal emotion-related psychophysiological responses: Comparing Canadian listeners to Congolese Pygmies. Frontiers in Psychology 5, 1341. Retrieved from https://doi.org/10.3389/fpsyg.2014.01341
Egermann, H., Pearce, M. T., Wiggins, G. A., & McAdams, S. (2013). Probabilistic models of expectation violation predict psychophysiological emotional responses to live concert music. Cognitive, Affective & Behavioral Neuroscience 13(3), 533–553.
Flowers, P. J. (1980). Relationship between two measures of music preference. Contributions to Music Education 8, 47–54.
Frith, S. (1996). Music and identity. In S. Hall & P. Du Gay (Eds.), Questions of cultural identity (pp. 108–127). London: Sage Publications.
Fritz, T. (2013). The dock-in model of music culture and cross-cultural perception. Music Perception: An Interdisciplinary Journal 30(5), 511–516.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., . . . Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Fung, C. V. (1994). Undergraduate nonmusic majors' world music preference and multicultural attitudes. Journal of Research in Music Education 42(1), 45–57.
Gingras, B., Pearce, M. T., Goodchild, M., Dean, R. T., Wiggins, G., & McAdams, S. (2015). Linking melodic expectation to expressive performance timing and perceived musical tension. Journal of Experimental Psychology: Human Perception & Performance 42(4), 594–609.
Giuliano, R. J., Pfordresher, P. Q., Stanley, E. M., Narayana, S., & Wicha, N. Y. (2011). Native experience with a tone language enhances pitch discrimination and the timing of neural responses to pitch change. Frontiers in Psychology 2, 146. Retrieved from https://doi.org/10.3389/fpsyg.2011.00146

62    steven j. morrison, steven m. demorest, and marcus t. pearce

Golby, A. J., Gabrieli, J. D., Chiao, J. Y., & Eberhardt, J. L. (2001). Differential responses in the fusiform region to same-race and other-race faces. Nature Neuroscience 4, 845–850.
Gregory, A. H., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music. Psychology of Music 24(1), 47–52.
Hannon, E. E. (2009). Perceiving speech rhythm in music: Listeners classify instrumental songs according to language of origin. Cognition 111(3), 403–409.
Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological Science 16(1), 48–55.
Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643.
Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing. Frontiers in Psychology 5, 1–17. Retrieved from https://doi.org/10.3389/fpsyg.2014.01052
Heingartner, A., & Hall, J. V. (1974). Affective consequences in adults and children of repeated exposure to auditory stimuli. Journal of Personality and Social Psychology 29(6), 719–723.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences 33(2–3), 61–83.
Hofstede, G. (1983). National cultures in four dimensions: A research-based theory of cultural differences among nations. International Studies of Management & Organization 13(1–2), 46–74.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. Journal of the Acoustical Society of America 124, 2263–2271.
Juslin, P. N. (2000). Cue utilization in communication of emotion in music performance: Relating performance to perception. Journal of Experimental Psychology: Human Perception and Performance 26(6), 1797–1812.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal emotion and music performance: Different channels, same code? Psychological Bulletin 129(5), 770–814.
Karaosmanoğlu, M. K. (2012). A Turkish Makam music symbolic database for music information retrieval: SymbTr. In Proceedings of the 13th ISMIR Conference, Porto, Portugal, 223–228.
Keil, A., & Keil, C. (1966). A preliminary report: The perception of Indian, Western, and Afro-American musical moods by American students. Ethnomusicology 10(2), 153–173.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in Bali and the West. Music Perception 2(2), 131–165.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (Vol. 2, pp. 1137–1145). San Mateo, CA: Morgan Kaufmann.
Krumhansl, C. L. (1995). Music psychology and music theory: Problems and prospects. Music Theory Spectrum 17(1), 53–80.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Järvinen, T., & Eerola, T. (1999). Melodic expectation in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception 17(2), 151–195.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance 5(4), 579–594.


Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000). Cross-cultural music cognition: Cognitive methodology applied to North Sami Yoiks. Cognition 76(1), 13–58.
Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories, monkeys do not. Attention, Perception, & Psychophysics 50(2), 93–107.
Laukka, P., Eerola, T., Thingujam, N. S., Yamasaki, T., & Beller, G. (2013). Universal and culture-specific factors in the recognition and performance of musical affect expressions. Emotion 13(3), 434–449.
LeBlanc, A. (1982). An interactive theory of music preference. Journal of Music Therapy 19(1), 28–45.
Lynch, M. P., & Eilers, R. E. (1991). Children's perception of native and nonnative musical scales. Music Perception 9(1), 121–131.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning. Perception & Psychophysics 52(6), 599–608.
Lynch, M. P., Eilers, R. E., Oller, D. K., & Urbano, R. C. (1990). Innateness, experience, and music perception. Psychological Science 1(4), 272–276.
Lynch, M. P., Eilers, R. E., Oller, K. D., Urbano, R. C., & Wilson, P. (1991). Influences of acculturation and musical sophistication on perception of musical interval patterns. Journal of Experimental Psychology: Human Perception and Performance 17(4), 967–975.
Lynch, M. P., Short, L. B., & Chua, R. (1995). Contributions of experience to the development of musical processing in infancy. Developmental Psychobiology 28(7), 377–398.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Merton, R. K. (1972). Insiders and outsiders: A chapter in the sociology of knowledge. American Journal of Sociology 78(1), 9–47.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Morrison, S. J., & Demorest, S. M. (2009). Cultural constraints on music perception and cognition. Progress in Brain Research 178, 67–77.
Morrison, S. J., Demorest, S. M., Aylward, E. H., Cramer, S. C., & Maravilla, K. R. (2003). fMRI investigation of cross-cultural music comprehension. NeuroImage 20(1), 378–384.
Morrison, S. J., Demorest, S. M., Campbell, P. S., Bartolome, S. J., & Roberts, J. C. (2012). Effect of intensive instruction on elementary students' memory for culturally unfamiliar music. Journal of Research in Music Education 60(4), 363–374.
Morrison, S. J., Demorest, S. M., & Stambaugh, L. A. (2008). Enculturation effects in music cognition: The role of age and music complexity. Journal of Research in Music Education 56(2), 118–129.
Morrison, S. J., & Yeh, C. S. (1999). Preference responses and use of written descriptors among music and nonmusic majors in the United States, Hong Kong, and the People's Republic of China. Journal of Research in Music Education 47(1), 5–17.
Nan, Y., Knösche, T. R., & Friederici, A. D. (2006). The perception of musical phrase structure: A cross-cultural ERP study. Brain Research 1094(1), 179–191.
Nan, Y., Knösche, T. R., & Friederici, A. D. (2009). Non-musicians' perception of phrase boundaries in music: A cross-cultural ERP study. Biological Psychology 82(1), 70–81.
Nan, Y., Knösche, T. R., Zysset, S., & Friederici, A. D. (2008). Cross-cultural music phrase processing: An fMRI study. Human Brain Mapping 29(3), 312–328.


Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication-realization model. Chicago, IL: University of Chicago Press.
Neuhaus, C. (2003). Perceiving musical scale structures: A cross-cultural event-related brain potentials study. Annals of the New York Academy of Sciences 999, 184–188.
Omigie, D., Pearce, M. T., & Stewart, L. (2012). Tracking of pitch probabilities in congenital amusia. Neuropsychologia 50(7), 1483–1493.
Omigie, D., Pearce, M. T., Williamson, V. J., & Stewart, L. (2013). Electrophysiological correlates of melodic processing in congenital amusia. Neuropsychologia 51(9), 1749–1762.
Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition 87(1), B35–B45.
Patel, A. D., & Demorest, S. M. (2013). Comparative music cognition: Cross-species and cross-cultural studies. In D. Deutsch (Ed.), The psychology of music (3rd ed., pp. 647–681). London: Academic Press.
Pearce, M. T. (2005). The construction and evaluation of statistical models of melodic structure in music perception and composition (Doctoral dissertation). Department of Computing, City University, London.
Pearce, M. T., & Müllensiefen, D. (2017). Compression-based modelling of musical similarity perception. Journal of New Music Research 46(2), 135–155.
Pearce, M. T., Müllensiefen, D., & Wiggins, G. A. (2010). Melodic grouping in music information retrieval: New methods and applications. In Z. W. Ras & A. Wieczorkowska (Eds.), Advances in music information retrieval (pp. 364–388). Berlin: Springer.
Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., & Bhattacharya, J. (2010). Unsupervised statistical learning underpins computational, behavioural and neural manifestations of musical expectation. NeuroImage 50(1), 302–313.
Perlman, M., & Krumhansl, C. L. (1996). An experimental study of internal interval standards in Javanese and Western musicians. Music Perception 14(2), 95–116.
Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone language speakers. Attention, Perception, & Psychophysics 71(6), 1385–1398.
Polak, R., London, J., & Jacoby, N. (2016). Both isochronous and non-isochronous metrical subdivision afford precise and stable ensemble entrainment: A corpus study of Malian djembe drumming. Frontiers in Neuroscience 10, 285. Retrieved from https://doi.org/10.3389/fnins.2016.00285
Raman, R., & Dowling, W. J. (2016). Real-time probing of modulations in South Indian classical (Carnatic) music by Indian and Western musicians. Music Perception 33(3), 367–393.
Raman, R., & Dowling, W. J. (2017). Perception of modulations in South Indian classical (Carnatic) music by student and teacher musicians: A cross-cultural study. Music Perception 34(4), 424–437.
Renninger, L. B., Wilson, M. P., & Donchin, E. (2006). The processing of pitch and scale: An ERP study of musicians trained outside of the Western musical system. Empirical Musicology Review 1(4), 185–197.
Sauvé, S., Sayad, A., Dean, R. T., & Pearce, M. T. (2017). Effects of pitch and timing expectancy on musical emotion. arXiv preprint, 1708.03687.
Schaffrath, H. (1992). The ESAC databases and MAPPET software. Computing in Musicology 8, 66.
Schaffrath, H. (1994). The ESAC electronic songbooks. Computing in Musicology 9, 78.
Schaffrath, H. (1995). The Essen folksong collection. In D. Huron (Ed.), Database containing 6,255 folksong transcriptions in the Kern format and a 34-page research guide [computer database]. Menlo Park, CA: CCARH.


Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. Music Perception 14(3), 295–318.
Shehan, P. K. (1981). Student preferences for ethnic music styles. Contributions to Music Education 9, 21–28.
Shehan, P. K. (1985). Transfer of preference from taught to untaught pieces of non-Western music genres. Journal of Research in Music Education 33(3), 149–158.
Small, C. (1998). Musicking: The meanings of performing and listening. Middletown, CT: Wesleyan University Press.
Soley, G., & Hannon, E. E. (2010). Infants prefer the musical meter of their own culture: A cross-cultural comparison. Developmental Psychology 46(1), 286–292.
Stobart, H., & Cross, I. (2000). The Andean anacrusis? Rhythmic structure and perception in Easter songs of northern Potosi, Bolivia. British Journal of Ethnomusicology 9(2), 63–92.
Thompson, W. F., & Balkwill, L. L. (2010). Cross-cultural similarities and differences. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 755–790). New York: Oxford University Press.
Trulsson, Y. H., & Burnard, P. (2016). Insider, outsider or cultures in-between. In P. Burnard, E. Mackinlay, & K. Powell (Eds.), The Routledge international handbook of intercultural arts research (pp. 115–125). New York: Routledge.
Wong, P. C. M., Chan, A. H. D., Roy, A., & Margulis, E. H. (2011). The bimusical brain is not two monomusical brains in one: Evidence from musical affective processing. Journal of Cognitive Neuroscience 23(12), 4082–4093.
Wong, P. C., Ciocca, V., Chan, A. H., Ha, L. Y., Tan, L. H., & Peretz, I. (2012). Effects of culture on musical pitch perception. PLoS ONE 7(4), e33424.
Wong, P. C. M., Roy, A. K., & Margulis, E. H. (2009). Bimusicalism: The implicit dual enculturation of cognitive and affective systems. Music Perception 27(2), 81–88.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition 115(2), 356–361.


Chapter 4

When Extravagance Impresses: Recasting Esthetics in Evolutionary Terms

Bjorn Merker

Introduction

When we constrain language by meter and rhyme in poetry, or when we adorn mundane earthenware pottery with decorative markings, we are making matters more complicated than utility or instrumental purposes dictate. Whole art forms, such as music, lay claim to human resources without yielding obvious returns in survival benefits. A candidate benefit such as the promotion of group cohesion through music-mediated bonding (Huron, 2001) begs the question of why humans need music to bond when non-human animals bond perfectly well without it (Lim & Young, 2006). We share with them the system of socially contingent circulation of "hormones of affiliation" (oxytocin and vasopressin, Heinrichs, von Dawans, & Domes, 2009), so enhanced expression of relevant receptors in nuclei of the basal forebrain (Kelly & Goodson, 2014) would seem to provide a less cumbersome way to increase our bonding propensities. Even assuming music does play a role in the human case (Pearce, Launay, & Dunbar, 2015), the evolutionary question of how and why music acquired a capacity to facilitate bonding remains (Pinker, 1997, p. 528). The paucity of well-supported utilitarian accounts of the function of music has led some to regard music as a by-product of other mechanisms of the mind (Pinker, 1997, p. 534) or as a culturally invented "technology" (Patel, 2008, p. 400). The approach to be detailed in what follows traces the human propensity to expend resources on the arts to the same selection pressures that have compelled a number of species of non-human animals to maintain cultural traditions featuring large repertoires of elaborate song on a learned basis.


In tracing that analogy we will uncover a psychological/neural mechanism specifically involved in esthetic judgments that frames the question of the emotional impact of music in a new way. To do so, we need to approach the relevant animal displays from first principles onwards to arrive at the recent elaboration of Zahavi's handicap principle (Zahavi, 1975) into the "developmental stress hypothesis" for the function of large and complex song repertoires in birds with vocal learning (Hasselquist, Bensch, & von Schantz, 1996; MacDougall-Shackleton & Spencer, 2012; Nowicki, Searcy, & Peters, 2002a).

Squandering as Asset, I: The Evolutionary Logic

Among songbirds with vocal production learning (Janik & Slater, 1997) there is an association between a high duty cycle for song (i.e., a large amount of continuous singing per day), large song repertoires, and high pattern variety among songs (Baylis, 1982). This correlation is presumably driven by the fact that protracted production of monotonous singing loses the attention of its audience by the ubiquitous mechanism of habituation (Hartshorne, 1956; Kroodsma, 1978; Sachs, 1967; Sokolov, 1963). Persistent singing is energetically costly and takes place at the expense of useful activities such as foraging. Why, then, prolong the song display beyond the boredom threshold, thus incurring the additional cost of acquiring the means to produce elaborate song? Whence the waste and frivolity of virtuoso performance? Because the costs of signaling are paid for by the same metabolic engine that foots the bill for survival, the very fact of surviving with the added burden of exaggerated signaling is in itself informative. It supplies proof positive that the signaler is capable of sustaining that additional burden. The capacity is therefore necessarily an aspect of signaler quality, a circumstance Amotz Zahavi codified in what he named the "handicap principle" (Zahavi, 1975), a principle that completes the Darwinian theory of sexual selection (Darwin, 1871). The logic of the handicap principle is quite general, and is by no means limited to promoting potential genetic benefits to offspring. A male in an agonistic interaction with a conspecific needs to assess the actual fighting ability of the rival, and not his genetic potential. Similarly, a female in a species with obligate bi-parental care must assess the extent of a prospective mate's capacity to invest in the care of offspring. For a variety of reasons that capacity can and does vary independently of the genes he contributes to those offspring.
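The habituation dynamic invoked above — monotonous repetition dulls the listener's response, while a novel pattern restores it — can be sketched in a few lines. Everything here (the exponential decay form, the `decay` constant, the toy song labels) is an illustrative assumption for exposition, not a model drawn from the birdsong literature:

```python
# Minimal sketch of habituation: response to a stimulus falls off with the
# number of times it has already been presented; a novel stimulus elicits a
# full-strength response again (dishabituation to novelty).

import math

def habituated_response(history, stimulus, decay=0.5):
    """Response declines exponentially with prior repetitions of `stimulus`
    (a standard textbook idealization; `decay` is an arbitrary constant)."""
    repeats = history.count(stimulus)
    return math.exp(-decay * repeats)

history, responses = [], []
for song in ["A", "A", "A", "A", "B"]:   # monotony, then a novel song
    responses.append(habituated_response(history, song))
    history.append(song)

# Response to the fourth "A" is far weaker than to the first,
# while the novel "B" recovers full strength.
print([round(r, 2) for r in responses])  # → [1.0, 0.61, 0.37, 0.22, 1.0]
```

This is the audience-side pressure the text describes: a singer who keeps repeating one song is, on this sketch, performing to an ever-flatter response curve, which rewards repertoire variety.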
In these cases and others, a display of excess capacity in the form of elaborate signaling can indicate capacity in the relevant behavioral dimension, provided such signaling actually relates, directly or indirectly, to abilities and resources employed in the behavioral dimension of interest to the receiver. An example involving a direct relationship between signaling and a desired or feared quality in the signaler is physical fitness itself. Loud singing for many hours on a daily


basis proves that the singer has the energy reserves, predator vigilance, stamina, and foraging ability to sustain such behavior without succumbing. For a receiver this means that those same resources are available for other uses should the animal's circumstances or needs require it. Indirect relationships between signaling and signaler qualities can be quite remote, as illustrated by numerous laboratory and field studies inspired by the "developmental stress hypothesis" over the past two decades (Hasselquist et al., 1996; Nowicki et al., 2002a). The learned acquisition of elaborate song is a protracted and demanding sequence of intertwined perceptual, attentional, memory, and motor challenges that unfolds after hatching in a still developing brain. The sequence of passive song memorization, followed by stage-wise practice of vocal skill spanning over weeks and months, interacts with and feeds back upon the development and neural maturation of an elaborate system of interconnected forebrain nuclei dedicated to song learning and production (Iwaniuk & Nelson, 2003; reviewed in Nowicki et al., 2002a). Thus, the size of the song control nuclei of the mature songbird forebrain correlates not only with average song repertoire size across species, but with the repertoire size and song proficiency achieved by individuals within a species (Gahr, 2000; Garamszegi & Eens, 2004). The latter circumstance is of central biological significance, because repertoire size and song proficiency are factors used by females in choosing a mate (Nowicki, Searcy, & Peters, 2002b). Each sequential stage of this delicately tuned two-way interaction between neural development and behavior is susceptible to perturbation by a variety of external stressors and disturbances (hence the name "developmental stress hypothesis").
They include, but are not limited to, immune challenges, disease and parasites, nutritional status dependent on parental provisioning and later the bird's own foraging ability, environmental pollutants, and disruptions at the nest (reviewed by MacDougall-Shackleton & Spencer, 2012). The management of such encumbrances consumes developmental resources which otherwise would have been available for the practice-dependent growth of the song system. A large repertoire and proficient song performance accordingly can only be acquired by an individual who as a nestling was cared for by well-functioning parents, who grew up in a secure nest, was subsequently unencumbered by disease and parasites, and—in possession of sharp faculties, memory capacity, foraging ability, and predator vigilance—engaged in hundreds of hours of successful singing practice. Whatever impairs the post-hatching growth of a bird's system of song nuclei, and whatever keeps the bird from attending to and practicing song, is later evident as deficits in the size and perfection of its mature song repertoire. This makes a large repertoire of complex song a direct causal reflection of an individual's successful passage through a demanding and varied developmental obstacle course. The more demanding the performance to be acquired, the more comprehensive a measure of an individual's personal history and qualities lies implicit in the perfected, mature songbout. In effect, then, an individual's level of song proficiency sums up, in a single performance, the entire developmental history of the singer, and as such provides an all-round certificate of competence, of all-round individual phenotypic quality.


It tells its audience, in a way impossible to counterfeit, that the singer comes from, as it were, "a good background." Potential mates and rivals thus do well to take a singer displaying mastery and virtuosity seriously. Though none of this is likely to be accomplished without an adequate genetic background, it is the finished phenotype and not the genotype that fights with rivals and helps a bonded female provision her offspring and defend the nest. Hence the importance of markers for phenotypic quality when decisions in these regards have to be made on the spot during a brief breeding season. That is what the expert songbout provides, conferring on the singer high priority as mate or rival. Provided, that is, that there are ears competent to assess the quality of the songbout, and to discriminate an outstanding performance from a middling one. This in turn leads us to the crux of cultural esthetics, namely the means by which receivers judge performance quality and the critical dependence of those means on the cultural song tradition within which the performance takes place.

Squandering as Asset, II: A Bulwark against Bluff The circumstances outlined in the previous section help us understand why a brown thrasher accumulates a song repertoire estimated to contain over 1800 separate melodies (Kroodsma & Parker, 1977), or how the sounds of as many as 76 different species of birds from two continents can be identified in the song repertoire of a single marsh warbler individual (Dowsett-Lemaire, 1979). As we have seen, the sole reason to take a performance seriously is the protracted and demanding process of its acquisition. It is only the lengthy and exacting course of pattern acquisition and vocal skill practice that makes a songbout an all-round index of phenotypic quality. Accordingly, 1800 melodies produced by impromptu invention on the spot ought to impress less than the same number acquired by meticulous copying from the local song tradition. Why should this be so? Only the local song tradition (or other local sounds in the case of bird mimics) provides the intended audience with a standard or norm by which to judge the extent of a singing individual’s proficiency and repertoire coverage. The listeners or judges grew up in the same general neighborhood as the singer. They were therefore exposed to the same song tradition and other ambient sounds, and committed them to memory even if, as is the case for the females of some species, they do not themselves sing (the females of many species do in fact sing, see Riebel, 2003). Females are sensitive both to how much has been learned by a male and to how well his performance matches the shared ­standard, and they make their mating decisions accordingly (Nowicki et al., 2002b). Only against the background of the intimate knowledge of the local lore shared between performer and audience is it possible to tell the extent to which a given performer has achieved mastery. By the same token, potential usurpers without apprenticeship


in that lore but attempting to fake it need not apply. Even in species that copy the sounds of other species or environmental sounds into their repertoire (perhaps generations ago: Baylis, 1982), and thus lack species-specific constraints on the patterns that may be acquired, the repertoire is typically acquired, first and foremost, from the local song tradition carried by conspecifics. Because only exact duplication of the received pattern proves that the bird actually attended to and discriminated the perceptual details of its model and then practiced its articulatory complexities to perfection, copying fidelity is part of the standard. A whole circuit in the song control system of birds—the so-called anterior forebrain pathway—is dedicated to using auditory feedback of the bird's own singing voice to gradually shape its vocal output to match the model stored in memory (see Konishi, 2004 for review). Complexity itself is therefore not the point of the performance. Random strings abound in complexity—in fact, by one measure they are ultimately complex (see, for example, Grassberger, 1986, Fig. 1)—but their complexity is of a kind that does not lend itself to comparative assessment. If individuals are to be compared one with another, the extent of their acquisition of content from a common, shared pool of content, namely the local song tradition or soundscape, is essential for ranking their performances. If that tradition and soundscape is richer than what any single individual can easily master, then the extent of an individual's repertoire "coverage" of that material, that is, the size of an individual's tradition-based repertoire, is a veridical measure of his or her song learning capacity.
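The contrast drawn here between random complexity and tradition-referenced structure can be illustrated with an off-the-shelf compressor (a stand-in measure for the purposes of this sketch, not one used in the chapter): patterned, tradition-like material is redundant and compresses well, whereas a random string over the same alphabet and of the same length barely compresses, despite being "maximally complex" by that very token.

```python
# Compression as a rough proxy for pattern vs. randomness: a repeated motif
# (tradition-like material) compresses far better than an equally long
# random string, whose complexity is incompressible and offers no shared
# standard for comparison. The "note" alphabet and lengths are arbitrary.

import random
import zlib

random.seed(0)
alphabet = "abcdefg"                       # toy "note" inventory

motif = "abcdefg" * 40                     # patterned, tradition-like string
noise = "".join(random.choice(alphabet) for _ in range(len(motif)))  # random, same length

def compressed_ratio(s):
    """Compressed size divided by raw size; lower means more pattern."""
    return len(zlib.compress(s.encode())) / len(s)

print(compressed_ratio(motif) < compressed_ratio(noise))  # → True
```

The point of the illustration matches the text: sheer incompressible complexity is easy to produce on the spot, so it is the recoverable, shared pattern (the tradition) that makes performances comparable.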
We can conjecture that only a performance that draws on a sufficiently broad sample of the listener's recognition memory for the local song tradition—a sample large enough to challenge and even tax the listener's powers of apprehension—will be taken seriously as a proof of competence. The better each song string reproduces the traditional model, the better will their aggregate fill this function. A judgment of competence might have to be upgraded to one of mastery if, in addition to featuring extensive coverage of the traditional lore and high-fidelity reproduction, the performance starts pushing the limits of the listener's recognition memory. This would happen when the songbout includes material that in fact forms part of the tradition but was "missed" by the listener's own ontogenetic acquisition process, or when it features patterns introduced by the singer as virtuoso embellishments. Under such circumstances the listener has good reason to take the performance seriously indeed. In either case the performer is giving proof of a capacity beyond that possessed by the listener, given the proviso that in either case such "excess" material (excess from the standpoint of the listener) fits seamlessly into the framework of the received form. It is copying fidelity (supported by what I have called a "conformal motive," Merker, 2005) that gives the resulting cultural song tradition the temporal inertia needed for it to serve as a standard of judgment. It in effect stabilizes the tradition against too rapid accumulation of inevitable copying errors. Thus stabilized, it provides individual learners with a vehicle by means of which to advertise the quality of their developmental history through the quality and scope of their command of the local tradition,


and their audience with a standard by which to judge the same performance. We turn now to the inner workings of the listener's responsiveness in this regard.

Squandering as Asset, III: Surrendering to Mastery

Judgments of a singer's performance are fraught with consequences for listeners, be they potential rivals or mates. In the case of the opposite sex it determines the partner with whom one or more breeding seasons—even a lifetime—will be spent, and in the case of same-sex rivalry it determines matters as important as the quality of the territory on which foraging and the rearing of offspring will take place. Much therefore hinges on the assessment of the songbout that serves as a proxy for the phenotypic qualities it underwrites, as covered in the previous two sections. How then to compare and judge the streams of intricately patterned sound emanating from the throats of singers (perhaps not even visible to their judges)? Something must intervene in the psychology of the receiver/judge/listener between apprehension of the songbout and the real-life choice the receiver makes on its basis. That something can hardly be formal analysis of the contents of the songbout, but ought to be some form of intuitive summary measure of the extent to which the songbout taps and taxes the listener's knowledge of the local song tradition. Some form of global emotional summary thus lies close at hand. As we have seen, the repertoire size, model fidelity, pattern complexity, and ease or elegance of delivery of a songbout must be measured against the local song tradition as its standard. It is assessable, therefore, only against a background of prior familiarity with that local song tradition (or soundscape, in the case of mimics). A principal function of the bulging forebrain system of warm-blooded animals (i.e., birds and mammals, which are large-brained compared to the rest of the animal kingdom) is to determine the extent to which current sensory afference pushes or exceeds the boundaries of prior stimulus familiarity.
This quantity has variously been called surprisal (Tribus, 1961), novelty (Berlyne, 1960; Bindra, 1959; Sachs, 1967; Sokolov, 1963), surprisingness (Kamin, 1969), prediction error (not named as such: Rescorla & Wagner, 1972), and expectancy violation (Meyer, 1956). It has been related specifically to esthetics by Berlyne (1971). Though there are differences in emphasis and detail behind these names, they all have a shared functional principle at their core, readily interpretable in informal Bayesian terms (Rohrmeier & Koelsch, 2012). The operation of this principle is captured by the free energy formulation of the logistics of bi-directional learning networks pioneered by Geoffrey Hinton and colleagues (Hinton & Zemel, 1994) and subsequently popularized by Karl Friston and others (Clark, 2013; Friston, 2002). Implemented through an elaborate neural system which, besides its neocortical parts, involves the hippocampus, amygdala, and diencephalic and midbrain way stations (see

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

72   bjorn merker

Merker, 2007a, Fig. 3 and Merker, 2013a, Fig. 2), this function converts the informational content of sensory experience into a running emotional summary, in real time, of the extent to which the pattern of current afference exceeds the bounds of prior experience. When those bounds are exceeded, this system signals caution, apprehension, fear, and even terror, along a dimension that represents the magnitude of novelty, expectancy violation, or prediction error. For present purposes, it suffices to conceive of movement along this dimension as signaled by increasing levels of central activation. Central activation is reflected cortically in gamma oscillations (Merker, 2013b), and peripherally in the specifically cholinergic aspect of sympathetic activity reflected in skin conductance changes (Shields, MacDowell, Fairchild, & Campbell, 1987), which vary linearly with the intensity of psychological (emotional) activation (Bradley & Lang, 2000).

Long before the recent formal treatments of this system were inaugurated, its behavioral and psychological aspects had been studied by the psychologists and physiologists, already cited, who were interested in the learning dynamics of habituation to novelty. Their results can be summarized in terms of a pattern of graded responsiveness along a single psychological/emotional dimension. Cognitively it spans a spectrum from total certainty to total uncertainty, behaviorally from sleep to freezing, and emotionally from boredom to terror. Between the latter two extremes lies a gradient of emotional states ranging from mild interest, to active curiosity, caution, and fear, as depicted in Fig. 1.
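These terminological variants (surprisal, novelty, prediction error, expectancy violation) share a simple information-theoretic core: the less probable a stimulus is under a model learned from prior exposure, the greater its surprisal. As a toy sketch only (the song-element labels, the four-element vocabulary, and the Laplace smoothing are illustrative assumptions of mine, not anything drawn from the works cited above), a frequency model of an exposure history assigns low surprisal to familiar elements and high surprisal to novel ones:

```python
from collections import Counter
from math import log2

def surprisal(event, counts, vocab_size, total):
    # Shannon surprisal -log2 p(event); Laplace smoothing gives
    # never-encountered events a finite but large value.
    p = (counts[event] + 1) / (total + vocab_size)
    return -log2(p)

# Hypothetical exposure history of song elements heard by a listener.
history = ["A", "B", "A", "C", "A", "B", "A", "B", "A", "C"]
counts = Counter(history)
vocab_size, total = 4, len(history)  # assumed vocabulary: A, B, C, X

familiar = surprisal("A", counts, vocab_size, total)  # often heard
novel = surprisal("X", counts, vocab_size, total)     # never heard

print(round(familiar, 2), round(novel, 2))  # → 1.22 3.81
```

On this reading, a songbout dense in patterns near the edge of the listener's recognition memory keeps this quantity, and with it central activation, high.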
When the prior stimulus familiarity of such a system, stored as recognition memory, includes a massive repertoire of local song, acquired during the intensive song-learning stage of ontogeny, the normal operation of this system renders it a sensitive detector of the extent to which a currently experienced songbout pushes or exceeds the boundaries of the listener's recognition memory. To the extent that it does, the system will deliver the selfsame real-time emotional summary along the central activation dimension for

Figure 1.  Schematic depiction of the "information dimension" outlined in the text. It is a composite of concepts and findings of three authors studying responses to novelty in the 1960s (Bindra, 1959; Sachs, 1967; Sokolov, 1963). The dimension spans from maximal certainty (minimal prediction error) at the bottom, to maximal uncertainty (maximal prediction error) at the top. Behavioral reactions are presented in the left-hand column (rest/sleep, idling, exploration, orienting/fixation, freezing and escape, with habituation of the orienting reflex at the certainty end), and inferred psychological states in the right-hand column (boredom, indifference, interest/curiosity, surprise, caution, fear, terror).


that songbout as for any other sensory experience. An impoverished sample of the local song tradition will be experienced as "boring." An adequate sample rendered with confidence or flair will be experienced as "interesting." Finally, a bout whose pattern richness taxes the limits of the listener's recognition memory would be experienced as apprehension or fear, were it not for the fact that it is set apart from other activities by its character of performance, or "display" in behavioral biology terms, and is recognized as such by all concerned. Thus framed and contextually constrained, the superior performance induces not outright apprehension or fear in the listener, but a "tamed" version of the same in the form of being "touched," "moved," "impressed," and—at the high end of the informational dimension—even "awed" by what is heard (cf. Konečni, 2005, 2015). This hedonic shift instantiates, in other words, the principle that in a context of safety, negative emotions may undergo a "hedonic reversal" to be experienced as positive (Apter, 1982; Bloom, 2010; Strohminger, 2013).

The principal proposal of this chapter, then, is this: the biological roots of the esthetic emotions, animal and human, are to be found in this informational dimension of telencephalic operations in large-brained species. So far these esthetic emotions have been discussed primarily in the context of human responses to art (Berlyne, 1971; Konečni, 2005, 2015; Konečni, Brown, & Wanic, 2008; Kuehnast, Wagner, Wassiliwizky, Jacobsen, & Menninghaus, 2014; Scherer & Zentner, 2001; Scherer, Zentner, & Schacht, 2001–2002).
For brevity, I propose to use the expressions "moving" and "being moved" (Konečni, 2005, 2015), and at times the equivalent "impressing," "impressive," and "being impressed," as shorthand for phenomena associated with the mid-range of emotional responsiveness to esthetic stimuli, flanked by "interest" at the less intense, and by "awe" at the more intense, end of the range, as depicted in Fig. 2.

The reason the heart of a Bengalese finch female starts beating faster on hearing a tape recording of an accomplished male singer (Okanoya, 2004) would accordingly be "because she is moved or impressed" by what she hears. And if we are indeed on the grounds of emotion, we should be able to specify an action tendency or behavioral bias promoted by that emotion (Ekman, 1999; Fontaine & Scherer, 2013; Frijda, 1987;

Real life: boredom, indifference, interest/curiosity, surprise, caution, fear, terror (the upper reaches marked "danger zone")
Esthetics: boredom, indifference, curiosity/interest, being moved or impressed, awe

Figure 2.  The same spectrum of psychological states as in the right-hand column of Fig. 1, paired with their counterparts in the domain of esthetics. The context of existential safety that frames esthetic experience occasions a "hedonic reversal" of valence in the upper reaches of the esthetic information dimension, here designated "danger zone."


Izard, 2007). In view of what has gone before, the answer is not far to seek: the action tendency promoted by the more intense levels of being impressed is that of "yielding," "surrender," "submission," or "capitulation" to the source of the impressive performance, be the performer a potential mate or a rival. In a sense, the hedonic reversal from "fear" to "being moved" or "impressed" is reflected in a replacement of the behavioral tendency to escape by a tendency to yield or surrender. And if, finally, we ask for the eliciting stimulus or antecedent that evokes the emotion of being impressed, the answer can only be "an outstanding performance."

We can accordingly sum up this excursion into the biology of "squandering as asset" by saying that an "outstanding performance" before listeners conversant with the tradition to which the performance belongs will move or impress those listeners, and that their emotional response of being impressed is realized in an action tendency toward "surrender," directed at the performer exhibiting mastery through the performance.

This ascription of the effect of esthetic stimuli to the operation of the information-related emotional dimension sets them clearly apart both from motivational systems in general (their hedonic aspects included, for which see Bloom, 2010) and from the domain of basic emotions as a whole (Ekman, 1999). The boredom-to-awe spectrum is but the full unpacking of one of the basic emotions, variously named "interest" (Izard, 2007) or "surprise" (Ekman, 1999). In keeping with the "cerebral" nature of this emotional spectrum, the neural system for learning, producing, apprehending, and judging song in birds with vocal learning is concentrated in the telencephalon of their forebrain (Jarvis, 2007).
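The contrast between the two columns of Fig. 2 can be restated as a context-dependent mapping from a single activation level to an emotion label, with the "danger zone" relabeled when the framing is esthetic rather than real-life. A minimal sketch, in which the numeric thresholds are illustrative choices of mine rather than values given in the chapter:

```python
def emotion_label(activation, esthetic_context):
    # activation: 0.0 (maximal certainty) .. 1.0 (maximal uncertainty).
    # In an esthetic frame the upper "danger zone" undergoes hedonic
    # reversal: fear-range activation reads as being moved, or awe.
    real_life = [(0.2, "boredom"), (0.4, "indifference"),
                 (0.6, "interest/curiosity"), (0.8, "fear"),
                 (1.1, "terror")]
    esthetics = [(0.2, "boredom"), (0.4, "indifference"),
                 (0.6, "curiosity/interest"),
                 (0.8, "being moved or impressed"),
                 (1.1, "awe")]
    for upper, label in (esthetics if esthetic_context else real_life):
        if activation < upper:
            return label

# The same high activation, differently framed:
print(emotion_label(0.9, esthetic_context=False))  # → terror
print(emotion_label(0.9, esthetic_context=True))   # → awe
```

The design point is that only the labels in the upper range differ between the two frames; the underlying activation dimension is one and the same.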
Finally, to counter possible misunderstanding of the role assigned to emotion in the present proposal: the fact that the process of assessing the merits of a songbout is mediated by an emotional variable ("being moved or impressed") by no means implies that the patterns of the song somehow "portray emotion," are about emotion, or are a vehicle for communicating emotions. They portray nothing outside of themselves. What they communicate is command of repertoire, complexity, and mastery of execution, not anything encoded, language-like, in those patterns (more on this in the section "The Psychological Impact of Music"). When a song is performed by an accomplished singer, a listener attuned to the relevant song tradition registers appreciation of the performance in the form of being interested, moved, or awed, according to the degree of command of tradition and virtuosity displayed therein. The emotion is about the pattern, and not the other way around. That the pattern in turn reflects the all-round phenotypic qualities of the performer is what allows esthetics to be cast in evolutionary terms, if the argument developed in this chapter has any merit.

Transposing to the Human Case

We are now ready to make a swift transition to human arts and esthetics, and we do so via human music on the plausible assumption that the first form of human music proper was song. In fact, song may have preceded speech in our evolutionary history, perhaps


in the form of song and dance in a group setting as a first form of the human arts (Merker, 2005, 2008). A strong reason to make these assumptions is provided by the fact that humans, unlike our closest relatives among the apes, indeed unlike any other primate, are vocal learners, and more specifically, vocal production learners (Janik & Slater, 1997; see also Doupe & Kuhl, 1999). Among non-human animals, this capacity for learning to reproduce by voice novel sound patterns originally received by ear has most commonly evolved to serve learned song. The default assumption regarding the function for which our own capacity for vocal learning originated would therefore be song as well. If so, learned song preceded speech in our evolutionary ancestry (see Merker, 2012, 2015 for details), and we have landed squarely in the constellation of factors outlined in previous sections as critical for the origin and maintenance of complex cultural traditions of ritual lore in animals with learned song. As a biological trait, this would include the motivational mechanism of a conformal motive ensuring fidelity to tradition (Merker, 2005), the role of prior familiarity in appreciation (cf. Madison & Schiölde, 2017), as well as the ultimate purpose for taking on the burden of acquisition, namely to impress a competent audience with one's command of the shared lore.

As we saw in the section "Squandering as Asset, II," fidelity to tradition coupled with a shared exposure history furnishes a standard of judgment, short of which the tradition eventually collapses into idiosyncratic caprice without grounds for comparing one performance with another. Trends in Western art over the past century have tended to obscure the fundamental nature of this connection. In fact, the connection has been actively combatted as a fetter on the exercise of untrammeled creativity.
In good agreement with the present perspective, the history of contemporary art abounds in examples of idiosyncratic caprice exercised in the absence of shared criteria for comparing one performance or creation with another. Proof of this assertion surfaces from time to time in the form of adventitious revelations that expose the arbitrary nature of the judgments involved (Cheston, 2015; Jordan-Smith, 1960; Museum of Hoaxes, 2005; see also the Wikipedia entries for Disumbrationism and Pierre Brassau). Each step toward such a state of affairs typically meets with opposition when it first occurs. Presumably this trend in the serious arts of Western culture (poetry, painting, and music first and foremost, though not limited to these) would not have proceeded as far as it has were it not for a more general cultural ambience in the West emphasizing the inherent value of novelty and the sanctity of artistic freedom, buttressed by the Romantic myth of artistic genius. That myth emphasizes the role of rare artistic endowment over that of diligent mastery of a tradition in the genesis of great art (Smith, 1924; Waterhouse, 1926). This cultural ambience eventually ripened into outright celebration of iconoclasm in the course of the twentieth century.

Yet even then, with each advance of idiosyncratic license, voices were raised in protest, sometimes trenchantly so. One illustrative example occurred when a faction of musicians in the modern jazz genre abandoned all reliance on traditional form in what they styled "free form jazz." The jazz bassist and band leader Charles Mingus, a creative musician by no means a stranger to innovation, witnessed a key event in this development. It was Ornette


Coleman's controversial 1960 performances at the New York City "Five Spot" jazz club. Mingus commented: ". . . if the free-form guys could play the same tune twice, then I would say they were playing something . . . Most of the time they use their fingers on the saxophone and they don't even know what's going to come out. They're experimenting" (Wikipedia entry "Charles Mingus"). On another occasion he noted, "They don't even know their Parker" (B. Merker, personal observation).

Note that Mingus' comments are by no means directed against creativity or innovation as such. They remind us, rather, of the necessity, under circumstances where freedom is in fact possible because the means of artistic expression are learned, of a shared exposure history to ground substantive assessment of artistic merit. It is that shared background that supplies the crucial "common currency" by which alone the informational emotion of "being impressed" serves as an index of comparative value across different performances. Without that anchoring in a shared tradition, the emotional reaction of being impressed becomes as arbitrary and idiosyncratic as the performances themselves. The bulwark against bluff has been broken.

The reaction of being impressed by outstanding artistic creations for which one has been prepared by an appropriate exposure history is ubiquitous across the arts. It must not be confounded with the kind of emotional responses that originate in personal associations forged in the course of significant life events. A tune that figured prominently in a teenage romantic infatuation may, when encountered years or even decades later, compel strong feelings on an associative basis without reflecting on the tune's artistic merits (Konečni, 2005; Rauhe, 2003; Scherer & Zentner, 2001).
It is otherwise when we encounter a piece of music, perhaps for the first time, for which our listening history of the genre to which it belongs has equipped us to appreciate its masterfully patterned content, and we groan and even weep in admiration (Gabrielsson, 2011; Scherer et al., 2001–2002; see also Konečni, 2005). We may even feel our skin covered in goose-bumps, and a shiver or chill traverse our spine (reviewed in Hunter & Schellenberg, 2010; see further Gabrielsson, 2011; Scherer & Zentner, 2001; Silvia & Nusbaum, 2011; Vickhoff, Åström, & Theorell, 2012). But what a peculiar way to express our admiration, by sighing, groaning, chills, and even tears!

Our analysis of animal cultural esthetics allows us to make sense of these peculiar behavioral and physiological tokens of being impressed. An ordinary trigger for bodily reactions such as shivers, goose-bumps (piloerection), or chills is genuine fear (Marks, 1969, pp. 2, 39). They are the peripheral expressions of the central fear state, as it engages the autonomic (sympathetic) nervous system on an automatic, involuntary basis. These low-level autonomic (involuntary) reactions apparently remain patent even under circumstances where an esthetic stimulus taps the fear range of the information dimension but, on contextual grounds, undergoes a hedonic reversal, as already covered. Thus the shivers, chills, and goose-bumps betray the origin of the emotional impact of strong esthetic experiences in the fear range of the informational dimension depicted in Figs. 1 and 2, in good agreement with the present interpretive framework (cf. Benedek & Kaernbach, 2011).


Similarly for the sighing, groaning, and weeping elicited by strong esthetic experiences. Tears appear to be the most common bodily response to strong experiences of music (Gabrielsson, 2011; Scherer et al., 2001–2002; see also Konečni, 2005). A prominent ordinary setting for such reactions is the experience of personal loss, for which they serve a largely involuntary expressive function (e.g., Averill, 1979; Frijda, 1988).

As we saw in the previous section, the action tendency contingent on being esthetically moved should be a readiness, indeed an urge, to yield, submit, surrender, or capitulate to the source of an impressive performance, be it a rival who has bested us by a masterly performance, or a suitor who has penetrated our defenses by the same. In either case, loss is implicit in the act of surrender. Being bested by a rival is attended by a direct loss of status and its perquisites. What hovers in the evolutionary background, as we have seen, is the potential for physical attack from an agonist whose masterful performance, according to the developmental stress hypothesis, advertises his all-round superior phenotypic characteristics. In surrendering to a suitor, one loses freedom of choice in matters as important as the parentage of one's offspring, along with loss of personal independence for the considerable stretch of time that the partnership will last.

More abstractly conceived, a certain giving up (loss) of self is implicit in every act of submission. Arthur Schopenhauer emphasized "forgetfulness of self" in discussing esthetics, and its special relation to experiences of the sublime, which he illustrated by way of landscape painting (Schopenhauer, 1844/1966, vol. I, pp. 200ff., vol. II, pp. 369ff.). For a recent discussion of this important (and once celebrated) topic in esthetics, see Konečni (2005, 2011).
Absorption in the pattern-stream of a musical performance promotes forgetfulness of anything extraneous, including one's sense of self. Such absorption, given the requisite level of background familiarity, will be all the more complete and compelling in the case of outstanding performances, not only because their masterly patterning invites it, even compels it, but because they tax our powers of apprehension. Then self-surrender and forgetfulness of self may reach a peak, a circumstance that may bear on the psychology of the transcendental and religious experiences that are a prominent aspect of strong experiences of music (Gabrielsson, 2011).

What drives tears to our eyes even though we are not actually sad or grieving, then, is the tacit sense of loss coupled to the action tendency of surrender promoted by an outstanding performance. The phenomenon is not even strictly confined to arts and esthetics: similar responses can occur on witnessing an outstanding performance in, say, sports.

To prevent misunderstanding, note that none of this is to be taken to mean that the connection between such reactions and surrender is directly present to the minds of those experiencing them. The listening mind is typically absorbed in the pattern of the performance, far from the sadness of loss or the cold hand of fear. In keeping with the hedonic reversal invoked here, happiness and joy are typical of these intense experiences (Gabrielsson, 2011). These caveats regarding what might be present to the mind of the listener/beholder do not mean, however, that the evolutionary logic of capitulating to the originator of a


masterful display is necessarily a matter of our distant ancestry only. There is no dearth of examples of strangers soliciting casual amorous liaisons with famous creators of art on the basis of encountering their creations alone (Lipsius, 1919; Miller, 2000, p. 331; see also Nettle & Clegg, 2006).

In sum, only where artistry is embedded in a tradition that poses a challenge of acquisition for its practitioners, and that has also shaped the sensibilities of the intended audience, does the latter's emotional response of being impressed provide a measure of artistic merit. It is only when both conditions are met that a causal connection between intuitive response and artistic merit is in fact patent, according to the psychological account given in the section "Squandering as Asset, III." Such was typically the state of affairs throughout human cultures until the advent of modernity in the West, and even there it still holds for popular culture within any given subculture. By the same token, where anything can be art, nothing in fact is.

The Psychological Impact of Music: Neither "Meaning" nor "Emotion"

The thesis that art generally, and music specifically, exerts its effect via the informational dimension defined in the section "Squandering as Asset, III" has obvious consequences for the much discussed issue of "music and meaning" and its subdomain "music and emotion" (Davies, 1994; Juslin & Sloboda, 2001; Meyer, 1956; Robinson, 1997). It does so by allowing us to draw a principled distinction between the undoubted psychological impact of music on the one hand, and questions of its carrying meaning, or its portraying or inducing emotions, on the other.

Strictly speaking, only the sentences of language "mean" at all, as most trenchantly argued by Staal (1989). The multilevel combinatorics of phonemes and morphemes by which language performs its arbitrary (in the sense of conventional) mapping between the form of utterances and their meaning (compare "bord," "Tisch," and "table" for the selfsame type of object in Swedish, German, and English, respectively) constitutes a bona fide code for representing and conveying meaning. This code is so detailed and comprehensive that virtually every difference in the strings of phonemes that make up sentences makes a difference in the information conveyed by those sentences. This lexically semanticized syntactic code turns sequential patterns of vocally produced sounds into statements about things that bear not the slightest resemblance to those sound sequences themselves (see the "table" example above). Thus we think and communicate about objects, events, matters of fact, states of the world, ideas, intentions, beliefs, and desires, without limit, using the same few dozen phonemes to do so. This is what it means to "mean," namely that something encodes something other than itself, which is its meaning.


Much of what compels our interest, carries significance, and recruits our psychological engagement—in short, has psychological impact—does so without the detour of meaning in this sense. Our non-linguistic perception and cognition quite generally operates on patterns of sensory input by discriminating, segmenting, grouping, classifying, and generalizing within and across them, and not by using them as vehicles for encoding matters other than themselves. Something need not, in other words, mean in order to be meaningful: witness the experience of a magnificent sunset as but one of an infinitude of cases in point.

Viewed in this light, the patterns of music define themselves as perceptual objects that engage the informational dimension of our perceptual/cognitive capacities in the manner of auditory analogs of visually presented arabesques or the shifting patterns of a turning kaleidoscope, to use Hanslick's felicitous metaphors (Hanslick, 1854). As such they need to "sound good" (and "better," and "best"), not to refer to circumstances other than themselves. In so doing they exploit the limitless pattern-generativity music conquers for itself by a combinatorics of "particulate" elements drawn from the discretized continua of pitch and duration (for which see Abler, 1989; Merker, 2002; Merker, Morley, & Zuidema, 2015). This limitless generativity fits ill with a conception of music as a device for portraying or evoking the limited set of subjective states that make up our emotions. Not only is the empirical evidence supporting that conception weak (Konečni, 2003, 2008; Konečni et al., 2008; Scherer, 2003), but weighty arguments have been leveled against it, arguments for which Hanslick's 1854 essay is still the unsurpassed locus classicus (Hanslick, 1854; see also Davies, 1994; Zangwill, 2004).
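The "particulate" combinatorics invoked here can be made concrete with a back-of-the-envelope count. With illustrative numbers of my own choosing (a twelve-tone pitch alphabet and four discretized note durations), the space of even short melodies is astronomically large:

```python
# Combinatorial generativity of particulate elements (illustrative numbers).
pitches = 12     # e.g., a chromatic pitch alphabet
durations = 4    # e.g., four discretized note values
elements = pitches * durations   # 48 distinct pitch-duration "particles"

melodies = elements ** 16        # distinct 16-note sequences
print(melodies > 10 ** 26)       # → True: on the order of 8e26
```

Exponential growth of this kind is what outruns any fixed inventory of emotional states a code of that size might be thought to label.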
It is the common experience of having been moved by music, even to the point of tears or chills in some instances, that has lent credence to the notion that music, somehow, is "about" emotions, or exerts its effects by inducing them. This "being moved" or being impressed by music is indeed an emotional response. But as we saw in the section "Squandering as Asset, III," that response moves up and down the intensity dimension of a single one of the basic emotions, rather than across them. The fact that music has an emotional impact in this sense must accordingly be sharply distinguished from the claim that music is about emotions in the plural, or aims at evoking emotions, again in the plural, either of which it would have to do to fit the metaphor of being a "language of emotions" (Spencer, 1911).

To the extent that the patterns of music refer to circumstances other than themselves (e.g., storms, battles, or dancing peasants in programmatic music), they tend to do so by dynamically or otherwise mimicking, resembling, or caricaturing the things to be evoked (Hanslick, 1854). That is not how language carries meaning, except in the special cases of onomatopoeia and some of the uses of prosody, both of which lie outside of the central coding device that gives the sentences of language their unique and unbounded capacity to mean. So even when music is intended to mean—which is far from always the case—it does not mean, it mimics.

In song and music without lyrics it is the vocal or instrumental patterns themselves that are the information conveyed. As detailed in previous sections, our emotional


response to these patterns concerns the inducing patterns themselves as they interact with the background of our prior musical familiarity. When that background relates to the genre to which the patterns belong, the specificity, scope, and intricacy of the interaction are commensurately enhanced. It is here that the infinite pattern generativity of music comes into its own. It furnishes the makings of the untold structural devices (variation and repetition, various symmetries and asymmetries, etc.) needed to create temporal trajectories capable of sustaining our interest in the face of the ubiquitous habituability of our cognitive equipment, a habituability that converts every novelty to "old hat" in the course of a few encounters.

It is not to our emotions that this content is addressed in the first place, but to our imagination, as Eduard Hanslick, following Arthur Schopenhauer, insisted (Hanslick, 1854; Schopenhauer, 1844/1966, vol. II, pp. 447ff.).1 Engaging the contours of our recognition memory, the ever-varied patterns of music trigger a variety of familiarity-based expectancies which their temporally unfolding melodic, rhythmic, and harmonic patterns confirm, violate, or complement, generating tensions, their resolution, and new expectancies in ever-shifting peregrinations across the sensibility landscape sculpted by the listener's history of prior exposure (Meyer, 1956; Narmour, 1977; Schopenhauer, 1844/1966, vol. II, p. 455). For a given musical listening experience, it is the cumulative effect—presumably in "leaky integrator" fashion—of the particular sequence of twists and turns along the temporal trajectory of this interaction that determines how far up the information dimension toward "awe" a given experience of music takes us, and thus the extent to which it moves or impresses us. In this sense "being moved" or "impressed" is a specifically esthetic emotion.
It may even be deemed the esthetic emotion (Konečni, 2005, 2011, 2015), generated by hedonic reversal in the "danger zone" of the information dimension.

In both cultural history and the listening history of individuals, the infinite space of musical combinatorics differentiates into occupied subregions according to genre (cf. Merker, 2002, pp. 11–12). Individual specimens that make up any given region of this space will exhibit greater or lesser elegance, greater or lesser degrees of well-formedness (see, e.g., Lerdahl & Jackendoff, 1983), and greater or lesser efficacy in stirring a given listener's imagination on encounter. And just as in other perceptual and cognitive systems, pattern invariants are bound to be extracted across the sampled space in accordance with a variety of shared structural characteristics, clustering musical impressions under high-level descriptors. Witness the categories "nostalgia" (sentimental, dreamy, melancholic), "power" (energetic, triumphant, heroic), and "tension" (agitated, nervous, impatient, irritated) extracted by Zentner and colleagues from responses to a diverse sample of European classical music (Zentner, Grandjean, & Scherer, 2008). The authors interpret their results in terms of a model of music-specific emotions. They might also be construed in terms of high-level intuitive (statistical)

1. For Schopenhauer's influence on Hanslick, see Merker (2007b), to which can be added the fact that the "smoking gun" of that influence in Hanslick's final paragraph was eliminated from all but the first edition of Hanslick's famous essay.


pattern-classification, ranging across the vast and multiform world of musical patterns that have accumulated in a given musical culture and whose uneven sampling has shaped the musical familiarity and sensibilities of any given listener.

The perspective on the psychological impact of music presented here by no means relegates music to an abstract domain of formalist connoisseurship. Some of its patterns—say of rhythmic music meant to support dancing—access a presumably species-specific predisposition for bodily entrainment to isochrony-based auditory patterns, and help optimize such entrainment (Merker, 2014; Merker, Madison, & Eckerdal, 2009). The central role of music in youth and popular culture suffices to dispel any overly formalist notion of its nature, and fits well with the evolutionary perspective on esthetics presented here.
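The "leaky integrator" fashion invoked earlier for the cumulative effect of a listening experience's twists and turns can be sketched in a few lines. The leak constant and the surprise sequences below are invented for illustration only; the point is that the same total amount of expectancy violation, differently distributed in time, drives the integrator to different peaks:

```python
def peak_activation(surprises, leak=0.8):
    # Running activation: each moment's surprise is added while earlier
    # contributions decay geometrically (the "leak").
    level, peak = 0.0, 0.0
    for s in surprises:
        level = leak * level + s
        peak = max(peak, level)
    return peak

spread = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]     # surprises spaced out
clustered = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # same total, bunched together

print(peak_activation(clustered) > peak_activation(spread))  # → True
```

On this toy model, how far up the dimension toward "awe" a performance carries a listener depends on the temporal trajectory of its expectancy violations, not merely on their sum.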

Conclusion

To sum up: the emotional impact of music is best understood neither by analogy to the meaning encoded in language, nor by assimilation to the biology of basic emotions, but through the behavioral biology of the Zahavian handicap principle (Zahavi, 1975) and its psychological ramifications. Where handicaps take the form of esthetic displays—from the peacock's tail to the vocal artistry of pied butcherbirds (Taylor, 2009)—mechanisms for judging their quality must be in place, typically in the medium of an emotional dimension spanning from boredom, via interest/curiosity, to being impressed, with awe and a sense of sublimity at its high end. As I have been at pains to make credible, the elaboration of Zahavi's handicap principle in the developmental stress hypothesis for the size and complexity of birdsong repertoires provides an eminently plausible interpretive framework for the nature and function of human song and music as well. It dispels the appearance of frivolity encumbering our expenditure of effort and resources on acquiring and producing the pattern richness of human song and music. By exact analogy to the case of learned birdsong, it gives us a means to display command and mastery of a trove of culturally patterned and transmitted lore. Such command and mastery serves not only as a badge of competence in the culture, but as a certificate of the phenotypic traits needed to achieve that competence. In our case today, music is not alone in providing such a shorthand certificate of phenotypic competence. It was eventually supplemented by language, which performs that same function among others. The two domains share not only the pattern generativity of a combinatorics of discrete elements, but also the mechanism of vocal learning and the cerebral equipment for pattern-assessment. It is even possible that language grew out of song in a glacial movement of contextual semanticization of song repertoires, as detailed in Merker (2012).
For music, in the setting of a cultural tradition of pattern familiarity shared between performer and listener, the circumstances reviewed here allow a given performance to


be appreciated, and even to be assessed, on occasion, as an outstanding one. And that, I submit, is when extravagance impresses, and what is more, when it rightfully should impress, according to the recasting of esthetics in evolutionary terms that has been the burden of this chapter.

References

Abler, W. L. (1989). On the particulate principle of self-diversifying systems. Journal of Social and Biological Structures 12(1), 1–13.
Apter, M. J. (1982). The experience of motivation: The theory of psychological reversals. New York: Academic Press.
Averill, J. R. (1979). The functions of grief. In C. Izard (Ed.), Emotions in personality and psychopathology (pp. 339–368). New York: Plenum Press.
Baylis, J. R. (1982). Avian vocal mimicry: Its function and evolution. In D. E. Kroodsma & E. H. Miller (Eds.), Acoustic communication in birds (pp. 51–83). New York: Academic Press.
Benedek, M., & Kaernbach, C. (2011). Physiological correlates and emotional specificity of human piloerection. Biological Psychology 86(3), 320–329.
Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century.
Bindra, D. (1959). Stimulus change, reactions to novelty, and response decrement. Psychological Review 66(2), 96–103.
Bloom, P. (2010). How pleasure works. New York: W. W. Norton.
Bradley, M. M., & Lang, P. J. (2000). Measuring emotion: Behavior, feeling and physiology. In R. D. Lane, L. Nadel, & G. Ahern (Eds.), Cognitive neuroscience of emotion (pp. 242–276). New York: Oxford University Press.
Cheston, P. (2015). Artist in legal row claims "former workshop sold her paint-spattered carpet as genuine works." Retrieved from http://www.standard.co.uk/news/london/artist-in-legalrow-claims-former-workshop-sold-her-paint-spattered-carpet-as-genuineworks-a2947666.html
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences 36(3), 181–204.
Darwin, C. (1871). The descent of man and selection in relation to sex. New York: D. Appleton & Company.
Davies, S. (1994). Musical meaning and expression. Ithaca, NY: Cornell University Press.
Doupé, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience 22, 567–631.
Dowsett-Lemaire, F. (1979). The imitative range of the song of the Marsh Warbler, Acrocephalus palustris, with special reference to imitations of African birds. Ibis 121(4), 453–468.
Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 45–60). Chichester: John Wiley and Sons.
Fontaine, J. J. R., & Scherer, K. R. (2013). Emotion is for doing: The action tendency component. In J. J. R. Fontaine, K. R. Scherer, & C. Soriano (Eds.), Components of emotional meaning: A sourcebook (Chapter 11). Oxford Scholarship Online. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199592746.001.0001
Frijda, N. H. (1987). Emotion, cognitive structure, and action tendency. Cognition and Emotion 1(2), 115–143.


Frijda, N. H. (1988). Laws of emotion. American Psychologist 43(5), 349–358.
Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology 68(2), 113–143.
Gabrielsson, A. (2011). Strong experiences with music. Oxford: Oxford University Press.
Gahr, M. (2000). Neural song control system of hummingbirds: Comparison to swifts, vocal learning (songbirds) and nonlearning (suboscines) passerines, and vocal learning (budgerigars) and nonlearning (dove, owl, gull, quail, chicken) nonpasserines. Journal of Comparative Neurology 426(2), 182–196.
Garamszegi, L. Z., & Eens, M. (2004). Brain space for a learned task: Strong intraspecific evidence for neural correlates of singing behavior in songbirds. Brain Research Reviews 44(2–3), 187–193.
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics 25(9), 907–938.
Hanslick, E. (1854). Vom musikalisch Schönen. Beiträge zur Revision der Ästhetik der Tonkunst. Leipzig: Weigel.
Hartshorne, C. (1956). The monotony threshold in singing birds. Auk 73, 176–192.
Hasselquist, D., Bensch, S., & von Schantz, T. (1996). Correlation between song repertoire, extra-pair paternity and offspring survival in the great reed warbler. Nature 381(6579), 229–232.
Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social behavior. Frontiers in Neuroendocrinology 30(4), 548–557.
Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length, and Helmholtz free energy. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems 6 (pp. 3–10). San Mateo, CA: Morgan Kaufmann.
Hunter, P. G., & Schellenberg, E. G. (2010). Music and emotion. In M. R. Jones, R. R. Fay, & A. N. Popper (Eds.), Music perception (pp. 129–164). New York: Springer.
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences 930, 43–61.
Iwaniuk, A. N., & Nelson, J. E. (2003). Developmental differences are correlated with relative brain size in birds: A comparative analysis. Canadian Journal of Zoology 81(12), 1913–1928.
Izard, C. E. (2007). Basic emotions, natural kinds, emotion schemas, and a new paradigm. Perspectives on Psychological Science 2(3), 260–280.
Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior 26, 59–99.
Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: A synopsis. Journal of Ornithology 148, 35–44.
Jordan-Smith, P. (1960). The road I came: Some recollections and reflections concerning changes in American life and manners since 1890. Caldwell, ID: Caxton Printers.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: Oxford University Press.
Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In R. Church & B. Campbell (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Kelly, A. M., & Goodson, J. L. (2014). Social functions of individual vasopressin–oxytocin cell groups in vertebrates: What do we really know? Frontiers in Neuroendocrinology 35(4), 512–529.
Konečni, V. J. (2003). Review of P. N. Juslin and J. A. Sloboda (Eds.), Music and emotion: Theory and research. Music Perception 20, 332–341.


Konečni, V. J. (2005). The aesthetic trinity: Awe, being moved, thrills. Bulletin of Psychology and the Arts 5(2), 27–44.
Konečni, V. J. (2008). Does music induce emotion? A theoretical and methodological analysis. Psychology of Aesthetics, Creativity, and the Arts 2(2), 115–129.
Konečni, V. J. (2011). Aesthetic trinity theory and the sublime. Philosophy Today 55, 64–73.
Konečni, V. J. (2015). Being moved as one of the major aesthetic emotional states: A commentary on "Being moved: linguistic representation and conceptual structure." Frontiers in Psychology 6, 343.
Konečni, V. J., Brown, A., & Wanic, R. (2008). Comparative effects of music and recalled life-events on emotional state. Psychology of Music 36(3), 289–308.
Konishi, M. (2004). The role of auditory feedback in birdsong. In H. P. Ziegler & P. Marler (Eds.), The behavioral neurobiology of birdsong. Annals of the New York Academy of Sciences 1016, 463–475.
Kroodsma, D. E. (1978). Continuity and versatility in birdsong: Support for the monotony threshold hypothesis. Nature 274(5672), 681–683.
Kroodsma, D. E., & Parker, L. D. (1977). Vocal virtuosity in the brown thrasher. Auk 94, 783–785.
Kuehnast, M., Wagner, V., Wassiliwizky, E., Jacobsen, T., & Mennighaus, W. (2014). Being moved: Linguistic representation and conceptual structure. Frontiers in Psychology: Emotion Science 5, 1242.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lim, M. M., & Young, L. J. (2006). Neuropeptidergic regulation of affiliative behavior and social bonding in animals. Hormones and Behavior 50(4), 506–517.
Lipsius, I. M. (1919). Liszt und die Frauen. Leipzig: Breitkopf & Härtel.
MacDougall-Shackleton, S. A., & Spencer, K. A. (2012). Developmental stress and birdsong: Current evidence and future directions. Journal of Ornithology 153(Suppl. 1), S105–S117.
Madison, G., & Schiölde, G. (2017). Repeated listening increases the liking for music regardless of its complexity: Implications for the appreciation and aesthetics of music. Frontiers in Neuroscience 11, 147.
Marks, I. M. (1969). Fears and phobias. New York: Academic Press.
Merker, B. (2002). Music: The missing Humboldt system. Musicae Scientiae 6(1), 3–21.
Merker, B. (2005). The conformal motive in birdsong, music and language: An introduction. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (Eds.), The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 17–28.
Merker, B. (2007a). Consciousness without a cerebral cortex: A challenge for neuroscience and medicine. Behavioral and Brain Sciences 30(1), 63–134.
Merker, B. (2007b). Music at the limits of the mind. In G. Kugiumutzakis (Ed.), Sympantiki Armonia, Musike kai Epistimi. Ston Miki Theodoraki [Universal harmony, music and science. In honour of Mikis Theodorakis]. Heraklion: Crete University Press.
Merker, B. (2008). Ritual foundations of human uniqueness. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 45–59). Oxford: Oxford University Press.
Merker, B. (2012). The vocal learning constellation: Imitation, ritual culture, encephalization. In N. Bannan & S. Mithen (Eds.), Music, language and human evolution (pp. 215–260). Oxford: Oxford University Press.
Merker, B. (2013a). The efference cascade, consciousness, and its self: Naturalizing the first person pivot of action control. Frontiers in Psychology 4, article 501, 1–20.


Merker, B. (2013b). Cortical gamma oscillations: The functional key is activation, not cognition. Neuroscience & Biobehavioral Reviews 37(3), 401–417.
Merker, B. (2014). Groove or swing as distributed rhythmic consonance: Introducing the groove matrix. Frontiers in Human Neuroscience 8, article 454, 1–4.
Merker, B. (2015). Seven theses on the biology of music and language. Signata 6, 195–213.
Merker, B., Madison, G., & Eckerdal, P. (2009). On the role and origin of isochrony in human rhythmic entrainment. Cortex 45(1), 4–17.
Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the origins of music. Philosophical Transactions of the Royal Society of London: Biology 370(1664), 20140095. doi:10.1098/rstb.2014.0095
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miller, G. F. (2000). The mating mind: How sexual choice shaped the evolution of human nature. New York: Doubleday.
Museum of Hoaxes (2005). Monkey art fools expert. Retrieved from http://hoaxes.org/weblog/comments/monkey_art_fools_expert
Narmour, E. (1977). Beyond Schenkerism: The need for alternatives in music analysis. Chicago, IL: University of Chicago Press.
Nettle, D., & Clegg, H. (2006). Schizotypy, creativity and mating success in humans. Proceedings of the Royal Society of London B: Biological Sciences 273, 611–615. doi:10.1098/rspb.2005.3349
Nowicki, S., Searcy, W. A., & Peters, S. (2002a). Brain development, song learning and mate choice in birds: A review and experimental test of the "nutritional stress hypothesis." Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 188, 1003–1004.
Nowicki, S., Searcy, W. A., & Peters, S. (2002b). Quality of song learning affects female response to male bird song. Proceedings of the Royal Society of London B: Biological Sciences 269, 1949–1954.
Okanoya, K. (2004). Song syntax in Bengalese finches: Proximate and ultimate analyses. Advances in the Study of Behavior 34, 297–345.
Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Pearce, E., Launay, J., & Dunbar, R. I. M. (2015). The ice-breaker effect: Singing mediates fast social bonding. Royal Society Open Science 2, 150221. Retrieved from http://dx.doi.org/10.1098/rsos.150221
Pinker, S. (1997). How the mind works. New York: Penguin Putnam.
Rauhe, H. (2003). Musik heilt und befreit. In H. G. Bastian & G. Kreutz (Eds.), Musik und Humanität. Interdiziplinäre Grundlagen für (musikalische) Erzhiehung und Bildung (pp. 182–191). Mainz: Schott.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Riebel, K. (2003). The "mute" sex revisited: Vocal production and perception learning in female songbirds. Advances in the Study of Behavior 33, 49–86.
Robinson, J. (Ed.). (1997). Music and meaning. Ithaca, NY: Cornell University Press.
Rohrmeier, M. A., & Koelsch, S. (2012). Predictive information processing in music cognition: A critical review. International Journal of Psychophysiology 83, 164–175.


Sachs, E. (1967). Dissociation of learning in rats and its similarities to dissociative states in man. In J. Zubin & H. Hunt (Eds.), Comparative psychopathology: Animal and human (pp. 249–304). New York: Grune and Stratton.
Scherer, K. R. (2003). Why music does not produce basic emotions. In R. Breslin (Ed.), Proceedings of the Stockholm Music Acoustic Conference, 2 vols., Vol. 1 (pp. 25–28). Retrieved from http://www.speech.kth.se/smac03
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford University Press.
Scherer, K. R., Zentner, M. R., & Schacht, A. (2001–2002). Emotional states generated by music: An exploratory study of music experts. Musicae Scientiae, Special Issue: Current trends in the study of music and emotion, 149–171.
Schopenhauer, A. (1844/1966). The world as will and representation (2nd ed.; orig. ed. 1819). Trans. E. F. J. Payne, 2 vols. New York: Dover.
Shields, S. A., MacDowell, K. A., Fairchild, S. B., & Campbell, M. L. (1987). Is mediation of sweating cholinergic, adrenergic, or both? A comment on the literature. Psychophysiology 24(3), 312–319.
Silvia, P. J., & Nusbaum, E. C. (2011). On personality and piloerection: Individual differences in aesthetic chills and other unusual aesthetic experiences. Psychology of Aesthetics, Creativity, and the Arts 5(3), 208–214.
Smith, L. P. (1924). Four words: Romantic, originality, creative, genius. Oxford: Clarendon Press.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology 25, 545–580.
Spencer, H. (1911). On the origin and function of music. In Essays on education and kindred subjects (pp. 312–330). London: J. M. Dent & Sons.
Staal, F. (1989). Rules without meaning. New York: Peter Lang.
Strohminger, N. S. (2013). The hedonics of disgust (Doctoral dissertation). University of Michigan. Retrieved from https://deepblue.lib.umich.edu/handle/2027.42/97960
Taylor, H. (2009). Towards a species songbook: Illuminating the vocalisations of the Australian pied butcherbird (Cracticus nigrogularis) (Doctoral dissertation). University of Western Sydney.
Tribus, M. (1961). Thermodynamics and thermostatics: An introduction to energy, information and states of matter, with engineering applications. New York: Van Nostrand.
Vickhoff, B., Åström, R., & Theorell, T. (2012). Musical piloerection. Music and Medicine 4, 82–89.
Waterhouse, F. A. (1926). Romantic "originality." The Sewanee Review 34, 40–49.
Zahavi, A. (1975). Mate selection: A selection for a handicap. Journal of Theoretical Biology 53(1), 205–214.
Zangwill, N. (2004). Against emotion: Hanslick was right about music. British Journal of Aesthetics 44(1), 29–43.
Zentner, M., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked by the sound of music: Characterization, classification, and measurement. Emotion 8(4), 494–521.


Section III

MUSIC PROCESSING IN THE HUMAN BRAIN


Chapter 5

Cerebral Organization of Music Processing

Thenille Braun Janzen and Michael H. Thaut

Introduction

Uncovering the neural underpinnings of music processing is a central theme in cognitive neuroscience, as evidenced by the growing body of literature on this topic. Neuroimaging research over the past 20 years has successfully mapped several cortical and subcortical brain regions that support music processing. This chapter provides a broad panorama of current knowledge concerning the anatomical and functional basis of music processing in the healthy brain. To that end, we focus on the core brain networks implicated in music processing, emphasizing the anatomical and functional interactions between cortical and subcortical areas within auditory-frontal, auditory-motor, and auditory-limbic networks. Finally, we review recent studies investigating how brain networks organize themselves in a naturalistic music-listening context. The term network here implies a collection of regions that are activated to support a particular function, referencing the structural and functional connections between these regions. With that, we move beyond the "where" and "when" of task-related activity to begin understanding how different brain networks interact to support cognitive, perceptual, and motor functions.


Neural Basis of Music Processing in the Healthy Brain

The Ascending Auditory Pathways

Music perception begins with the decoding of acoustic information. Acoustic signals such as voices and music enter the human ear and trigger a cascade of signal transpositions along the auditory pathways (Fig. 1). Incoming auditory signals are transmitted by the outer and middle ear to the cochlea of the inner ear, where acoustic information is translated into neural activity. Acoustic properties such as sound frequency are represented tonotopically in the basilar membrane of the cochlea; tonotopy refers to the systematic topographical arrangement of neurons as a function of their response to tones of different frequencies. This tonotopic organization is found throughout the auditory neuraxis (Humphries, Liebenthal, & Binder, 2010; Zatorre, 2002). Outside of the cochlea, dendrites of the spiral ganglion cells synapse with the base of the hair cells located in the organ of Corti on the basilar membrane. Triggered by the movement of the hair cells on the basilar membrane, the spiral ganglion cells are the first


Figure 1.  The neural auditory pathway consists of an interconnecting cascade of processing nodes from the cochlear nucleus (CN) up to primary auditory cortex (AC) and higher-level auditory regions in superior temporal cortex (STC). Abbreviations: CN, cochlear nucleus; SOC, superior olivary complex; IC, inferior colliculus; HC, hippocampus; MGB, medial geniculate body; AC, auditory cortex; STC, superior temporal cortex. Reprinted from Progress in Neurobiology 123(1), Sascha Frühholz, Wiebke Trost, and Didier Grandjean, The role of the medial temporal limbic system in processing emotions in voice and music, pp. 1–17, https://doi.org/10.1016/J.PNEUROBIO.2014.09.003, Copyright © 2014 Elsevier Ltd. All rights reserved.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

neurons to fire an action potential in the auditory pathway, and they transmit all of the brain's auditory input via their axons, which synapse with the dendrites of the cochlear nuclei (Amunts, Morosan, Hilbig, & Zilles, 2012; Froud et al., 2015; Nayagam, Muniak, & Ryugo, 2011). The majority of the fibers (70 percent) cross over to the opposite hemisphere starting at the level of the cochlear nuclei (contralateral pathway), while some remain on the same incoming side (ipsilateral pathway). The acoustic information is highly preprocessed by a series of brainstem nuclei before reaching the cortex. Basic acoustic features such as sound intensity, signal onsets, periodicity, and signal location are extracted in the cochlear nucleus, lateral lemniscus, and superior olivary complex. A secondary pathway originates in the ventral cochlear nucleus, from which some fibers project to the reticular formation, a general arousal system in the lower brainstem. Descending (efferent) fiber tracts from the reticular formation form the audio-spinal pathway by connecting with the motor neurons in the spinal cord to innervate reflexive motor responses to sound and to prime motor neural excitability (Horn, 2006; Huffman & Henson, 1990; Rossignol & Melvill Jones, 1976). The secondary ascending (afferent) pathway inhibits lower auditory centers to elevate hearing thresholds and alert the cortex to incoming auditory signals. In the primary ascending pathway, the superior olivary complex is the first relay station of the brainstem where cochlear inputs from the left and right sides converge, providing the anatomical basis for the processing of sound location: timing and intensity differences between the incoming left and right signals are measured to determine sound angles (Grothe, 2000; Tollin, 2003). More complex spectral and temporal decoding of the acoustic signals occurs in the inferior colliculus.
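The timing cue just attributed to the superior olivary complex can be made concrete with a back-of-the-envelope model. The sketch below is an illustration, not part of the chapter: it uses Woodworth's spherical-head approximation, ITD = (r/c)(sin θ + θ), where r is an assumed head radius, c the speed of sound, and θ the azimuth of a distant source.

```python
import math

def interaural_time_difference(azimuth_deg, head_radius_m=0.0875,
                               speed_of_sound_m_s=343.0):
    """Woodworth's spherical-head approximation of the interaural time
    difference (ITD), in seconds, for a distant source at the given azimuth.
    The head radius and speed of sound are illustrative assumptions."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_m_s) * (math.sin(theta) + theta)

# A source straight ahead reaches both ears simultaneously (ITD = 0);
# a source directly to one side (90 degrees) yields the maximum delay,
# on the order of a few hundred microseconds for a human-sized head.
for angle in (0, 45, 90):
    print(f"{angle:3d} deg -> {interaural_time_difference(angle) * 1e6:6.1f} us")
```

Delays of this magnitude, well below a millisecond, are what the coincidence-detecting circuitry of the superior olivary complex must resolve, which is why sound localization is often cited as one of the brain's most temporally precise computations.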
Functional magnetic resonance imaging research with animals has shown that the spectral and temporal dimensions of the acoustic signals are distinctly mapped in the inferior colliculus, indicating that, in addition to the tonotopic maps, the temporal envelope of the acoustic signals is also topographically represented there (Baumann et al., 2011). The last cross-lateral projections occur at the level of the inferior colliculus. The last subcortical node in the primary ascending pathway is the medial geniculate body, which comprises multiple subdivisions. The ventral nucleus of the medial geniculate body is tonotopically organized and is the main ascending route to the primary auditory cortex, while its other subdivisions project widely to both primary and non-primary auditory cortex. Importantly, the auditory pathway does not consist only of ascending projections; it also has rich top-down projections that are critical for the modulation of neural responses in the subcortical auditory centers and for learning-induced plasticity (Bajo, Nodal, Moore, & King, 2010; Suga & Ma, 2003). In general, conduction in the auditory pathway is faster and stronger for the contralateral pathway. The human auditory cortex is located in the posterior part of the superior temporal lobe, covering Heschl's gyrus and parts of the planum temporale and the posterior superior temporal gyrus. More specifically, the primary auditory cortex is largely located in the medial part of Heschl's gyrus (corresponding to Brodmann's area BA41), and its core auditory region is tonotopically organized such that different subregions of the cortex are sensitive to different frequency bands (Langers, 2014;


Norman-Haignere, Kanwisher, & McDermott, 2013). The primary auditory cortex performs fine-grained and specific analysis of acoustic features, such as frequency (Da Costa et al., 2011; Humphries et al., 2010; Warren, Uppenkamp, Patterson, & Griffiths, 2003) and spectro-temporal modulation (Schonwiesner & Zatorre, 2009), playing a key role in the transformation of acoustic features into auditory percepts (e.g., from sound frequency into pitch percept) (Griffiths & Warren, 2004). Several lesion studies and functional imaging research have identified the lateral Heschl's gyrus as a pitch-sensitive area, suggesting that pitch percepts are represented in this particular cortical region of the auditory cortex (for review, see Zatorre & Zarate, 2012). After the initial decoding of acoustic information in the primary auditory cortex, the information is transmitted to the secondary auditory cortex (located in the planum temporale and the planum polare) and to higher-level associative cortex in the superior temporal cortex and superior temporal sulcus. Areas of the non-primary auditory cortex are involved in a number of functions crucial for establishing a cognitive representation of the acoustic environment, including the representation of auditory objects (auditory Gestalt formation), which entails processes such as the analysis of the contour of a melody, spatial grouping, extraction of inter-sound relationships, and stream segregation (Griffiths & Warren, 2002, 2004; for review, see Koelsch, 2011). Within the non-primary auditory cortex, there are multiple differentiated networks that have distinct functional roles (Cammoun et al., 2015). There is consistent evidence indicating that the superior temporal gyrus—both anterior and posterior to Heschl's gyrus—plays an important role in melodic processing (for review, see Janata, 2015; Peretz & Zatorre, 2005; Zatorre & Zarate, 2012).
For instance, the superior temporal lobe (including both the superior temporal gyrus and the superior temporal sulcus) has been identified in studies examining melodic contour processing (Lee,  Janata, Frost, Hanke, & Granger,  2011; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Schindler, Herdener, & Bartels, 2013; Tramo, Shah, & Braida, 2002), perception of melodic intervals (Klein & Zatorre,  2015), sound spectral envelope (Warren, Jennings, & Griffiths, 2005), and categorical perception of major and minor chords (Klein & Zatorre, 2011). Interestingly, studies have shown that the posterior region of the auditory cortex is more sensitive to decoding changes in pitch height (which refers to the spectral weighting of a sound), whereas more anterior areas are more sensitive to changes in pitch chroma (which is a feature related to the relative position of a pitch within a scale), indicating that pitch dimensions may have distinct representations in the human auditory cortex (Warren et al., 2003). Recently emerging evidence suggests that the parietal cortex and posterior regions of the superior temporal sulcus are key brain areas for multisensory integration, where information from auditory, visual, tactile, and multisensory stimuli converge via a patchy distribution of inputs, followed by integration in the intervening cortex (Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Beauchamp, Nath, & Pasalar, 2010; Beauchamp, Yasar, Frye, & Ro, 2008). Functional differences have also been reported between the left and right auditory cortices, whereby the left auditory cortical areas have a higher degree of temporal


sensitivity, whereas corresponding areas on the right auditory cortex have a greater spectral resolution (Andoh & Zatorre, 2011; Cha, Zatorre, & Schönwiesner, 2016; Perani, 2012; Santoro et al., 2014; Stewart, Overath, Warren, Foxton, & Griffiths, 2008; Tervaniemi et al., 2000; Warrier et al., 2009). Notably, research has repeatedly shown a right-hemisphere bias in fine-grained spectral processing and a preferential response in the left hemisphere for temporal features of sounds, which supports the hypothesis that these functional asymmetries at early stages of auditory processing may be related to the intrinsic properties of each cortical hemisphere (Zatorre & Zarate, 2012). However, the pattern of activation between hemispheres can be modulated by stimulus complexity and/or task demands (Brechmann & Scheich, 2005; Hyde, Peretz, & Zatorre, 2008; Schön, Gordon, & Besson, 2005; Stewart et al., 2008) or music training (Ohnishi et al., 2001; Proverbio, Orlandi, & Pisanu, 2016). With respect to music perception, the findings outlined thus far reveal a hierarchical organization of auditory processing (Stewart et al., 2008; Wessinger et al., 2001; see also de Heer, Huth, Griffiths, Gallant, & Theunissen, 2017). The primary auditory cortex plays a crucial role in extracting individual pitches and pitch changes within the melody, whereas non-primary auditory areas are involved in determining relationships between pitches to define the melody contour. More abstract processes required to establish syntactic relationships and meaning occur largely in regions outside of the auditory cortex, including the frontal cortex.

Auditory-Frontal Networks

The transformation of the auditory information into a musically meaningful tonal context involves several areas of the frontal cortex. Studies of music syntax, utilizing primarily expectancy violation paradigms, have demonstrated that regions of the inferior frontal gyrus respond to harmonic expectancy violations (Bianco et al., 2016; Janata, Birk, et al., 2002; Koelsch et al., 2002; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Maess, Koelsch, Gunter, & Friederici, 2001; Seger et al., 2013; Tillmann, Janata, & Bharucha, 2003). Reports have repeatedly indicated that the cortical network comprising the inferior frontolateral cortex (corresponding to BA44), inferior frontal gyrus, the anterior portion of the superior temporal gyrus, and the ventral premotor cortex is involved in the processing of musical structure (for review, see Koelsch, 2006, 2011). This network appears to be specialized in establishing syntactic relationships by evaluating the harmonic relationship between incoming tonal information and a preceding harmonic sequence, thus detecting musical-structural irregularities and organizing fast short-term predictions of upcoming musical events (Koelsch, 2006). Recent imaging research has also suggested that rhythmic and melodic deviations in musical sequences may recruit different cortical areas—pitch deviations engage a neural network comprising auditory cortices, inferior frontal and prefrontal areas, whereas rhythmic deviations of a musical sequence recruit neural networks involving the posterior parts of the auditory cortices and parietal areas (Lappe, Lappe, & Pantev, 2016; Lappe, Steinsträter, & Pantev, 2013).

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

94    thenille braun janzen and michael h. thaut

These findings are in accordance with the dual-pathway model of auditory processing, which hypothesizes that two auditory processing pathways originate from the primary auditory cortex, each contributing to processing different higher-order aspects of auditory stimuli (Belin & Zatorre, 2000; Bizley & Cohen, 2013; Hickok & Poeppel, 2007; Rauschecker & Scott, 2009). The anterior-ventral auditory pathway—which projects from anterior superior temporal gyrus to anterior inferior frontal gyrus and prefrontal areas—is predominantly involved in perceiving auditory objects and processing auditory spectral features. For instance, it has been shown that the inferior frontal gyrus and related areas of the ventrolateral prefrontal cortex are activated during phonological and semantic processing, non-verbal auditory sound detection (Kiehl, Laurens, Duty, Forster, & Liddle, 2001), discrimination and auditory feature detection (Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003; Zatorre, Bouffard, & Belin, 2004), and auditory working memory (Kaiser, Ripper, Birbaumer, & Lutzenberger, 2003), which reinforces the assumption that these areas play a fundamental role in auditory processing. On the other hand, the posterior-dorsal stream—which connects posterior superior temporal gyrus with posterior inferior frontal gyrus, posterior parietal cortex, and premotor cortex—has been implicated in extracting spectral motion and temporal components of an auditory stimulus, thus processing how frequencies change over time (review: Plakke & Romanski, 2014; Zatorre & Zarate, 2012).
Recent evidence indicates that the dorsal pathway of auditory processing also plays an important role in calculating and comparing pitch or temporal manipulations within a context and in using this auditory information to select and prepare appropriate motor responses (Belin & Zatorre, 2000; Chen, Rae, & Watkins, 2012; Foster, Halpern, & Zatorre, 2013; Hickok & Poeppel, 2007; Loui, 2015; Saur et al., 2008; Warren, Wise, & Warren, 2005). Frontal cortex activity has also been associated with the cognitive demands or stimulus properties of a task. Tasks that require maintenance and rehearsal of musical information activate the working memory functional network, comprising the ventrolateral premotor cortex (encroaching on Broca’s area), dorsal premotor cortex, the planum temporale, inferior parietal lobe, the anterior insula, and subcortical structures (Koelsch et al., 2009; Royal et al., 2016; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011). The medial prefrontal cortex (primarily the medial orbitofrontal region) appears to be particularly engaged in tasks requiring self-referential judgments (Alluri et al., 2013; Zysset, Huber, Ferstl, & von Cramon, 2002), musical semantic memory (Groussard et al., 2010; Platel, Baron, Desgranges, Bernard, & Eustache, 2003), and music-evoked autobiographical memory (Janata, 2009; Von Der Heide, Skipper, Klobusicky, & Olson, 2013). Frontal and parietal areas, such as ventrolateral prefrontal and posterior parietal regions, are differentially activated depending on the relative attentional demands of the task (Alho, Rinne, Herron, & Woods, 2014; Janata, Tillmann, & Bharucha, 2002; Maidhof & Koelsch, 2011; Satoh, Takeda, Nagata, Hatazawa, & Kuzuhara, 2001).
Involuntary musical imagery—that is, the spontaneous experience of having music looping in one’s head—is associated with cortical thickness in regions of the right frontal and temporal cortices as well as the anterior cingulate and left angular gyrus (Farrugia, Jakubowski, Cusack, & Stewart, 2015). On the other hand, voluntary musical
imagery—the generation of a mental representation of music or musical attributes in the absence of real sound input—engages secondary auditory cortices, the parietal cortex, inferior frontal regions, the supplementary motor area (SMA), and pre-SMA (Brown & Martinez, 2007; Halpern, Zatorre, Bouffard, & Johnson, 2004; Harris & De Jong, 2014; Peretz et al., 2009; Zatorre, Halpern, Perry, Meyer, & Evans, 1996). Neural activity in motor areas during the perception or mental imagery of sounds has been repeatedly reported when musicians listen to a well-rehearsed musical sequence (Bangert et al., 2006; D’Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Harris & De Jong, 2014; Haueisen & Knösche, 2001) or when pianists watch silent video recordings of hands playing a keyboard (Baumann et al., 2007; Bianco et al., 2016; Hasegawa et al., 2004). Activation of the fronto-parietal motor-related network (comprising Broca’s area, the premotor region, intraparietal sulcus, and inferior parietal region) was also found when non-musicians listened to a piano piece they had learned to play (Lahav, Saltzman, & Schlaug, 2007). These studies collectively show that the mere perception or mental imagery of sounds (which would normally be associated with a specific action) can automatically trigger representations of the movements necessary to produce those sounds, providing strong evidence that perception and action are intrinsically coupled in the human brain and in cognition (for review, see Keller, 2012; Maes, Leman, Palmer, & Wanderley, 2014; Novembre & Keller, 2014).

Auditory-Motor Networks

Projections from motor cortex to the auditory cortex are an architectural feature common to many animal species (Schneider, Nelson, & Mooney, 2014). Animal models have indeed proven important for investigating the synaptic and circuit mechanisms by which the motor cortex interacts with auditory cortical activity (e.g., Merchant, Perez, Zarco, & Gamez, 2013; Nelson et al., 2013; Roberts et al., 2017; Schneider & Mooney, 2015). For instance, a recent study in mice found that axons from the secondary motor cortex make synapses onto both excitatory and inhibitory neurons in deep and superficial layers of the auditory cortex, and that a subset of these neurons extends axons to various subcortical areas important for auditory processing (Nelson et al., 2013). The analysis of local field potentials in behaving macaques has also provided valuable insight into the neural underpinnings of beat synchronization, showing, for example, that beta-band oscillations may enable communication between distributed circuits involving the striato-thalamo-cortical network during rhythm perception and production (for a review, see Merchant & Bartolo, 2018; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; see also Chapter 8). Recent research has also identified fiber projections transmitting auditory signals into motor regions in the human brain (Fernández-Miranda et al., 2015). Fernández-Miranda and colleagues demonstrated that the left and right arcuate fascicle, a white matter fiber tract that links lateral temporal cortex with frontal areas, is segmented into subtracts with distinct fiber terminations (Fig. 2). One set of fibers terminates at the

Figure 2.  Subtracts of the left arcuate fascicle with terminations on primary motor cortex and premotor cortex, corresponding to Brodmann areas BA6 and BA4 (ventral precentral and caudal middle frontal gyri). Reprinted by permission from Brain Structure and Function 220 (3), Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain, Juan C. Fernández-Miranda, Yibao Wang, Sudhir Pathak, Lucia Stefaneau, Timothy Verstynen, and Fang-Cheng Yeh, pp. 1665–1680, https://doi.org/10.1007/s00429-014-0751-7 © Springer-Verlag Berlin Heidelberg, 2014.

ventral precentral and caudal middle frontal gyri (BA4, BA6), providing direct projections from auditory cortex to motor areas (primary motor cortex, premotor cortex). Further evidence of functional coordination between auditory and motor cortices has been provided by a robust body of neuroimaging research. Studies have shown that listening to and internally encoding auditory rhythms increases auditory-motor brain connectivity (Chen, Penhune, & Zatorre, 2008a; Chen, Zatorre, & Penhune, 2006; Fujioka, Trainor, Large, & Ross, 2012; Grahn & Brett, 2007), and that the coupling among cortical motor and auditory areas is strengthened with musical training (Chen, Penhune, & Zatorre, 2008b; Grahn & Rowe, 2009; Palomar-García, Zatorre, Ventura-Campos, Bueichekú, & Ávila, 2017). Studies have also found that corticospinal excitability is modulated by music with a strong beat (“groove”), which suggests that merely listening to musical rhythm elicits activity in motor-output pathways from the primary motor cortex to the spinal cord (Giovannelli et al., 2013; Michaelis, Wiener, & Thompson, 2014; Stupacher, Hove, Novembre, Schütz-Bosbach, & Keller, 2013). Further evidence of
auditory-motor coupling at the spinal cord level is provided by research showing that delivering transcranial magnetic stimulation in time with music facilitates corticospinal excitability in muscles involved in foot tapping (i.e., tibialis anterior, gastrocnemius) (Wilson & Davey, 2002; see also Thaut, McIntosh, Prassas, & Rice, 1992), and that the degree of corticospinal excitability depends on musical training, being greater in trained musicians (D’Ausilio et al., 2006; Stupacher et al., 2013). Finally, extensive neurophysiological evidence indicates that auditory and motor regions communicate through oscillatory activity and that the cortical loop between these areas generates temporal predictions that are crucial for auditory perceptual learning and for the perception of, and entrainment to, musical rhythms (Fujioka et al., 2012; Large, Herrera, & Velasco, 2015; Large & Snyder, 2009; Ross, Barat, & Fujioka, 2017; for review: Merchant et al., 2015; Morillon & Baillet, 2017; Ross, Iversen, & Balasubramaniam, 2016). Evidence at multiple levels of inquiry therefore suggests a strong functional and anatomical link between auditory and motor-related areas, with many components of the motor system deeply involved in auditory perceptual learning, the generation of predictions, and the perception of, and entrainment to, musical rhythms. This interconnectivity between auditory and motor-related areas is crucial for time perception and for the production of timed movements.
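The error-correcting temporal predictions attributed to auditory-motor loops can be illustrated with a minimal phase-correction model of sensorimotor synchronization, in the spirit of linear error-correction models of paced tapping. This is an illustrative sketch only; the function name and all parameter values are invented for the example and do not come from this chapter:

```python
# Minimal phase-correction sketch of sensorimotor synchronization
# (illustrative; not a model proposed in this chapter). Each tap is
# scheduled from an internal period estimate, and a fraction (alpha) of
# the observed tap-beat asynchrony is corrected on the next cycle.

def simulate_tapping(beat_period=0.5, n_beats=20, alpha=0.5, initial_offset=0.1):
    """Return tap-beat asynchronies across n_beats metronome clicks."""
    beat_times = [i * beat_period for i in range(n_beats)]
    tap_time = beat_times[0] + initial_offset  # start out of phase
    asynchronies = []
    for beat in beat_times:
        asynchrony = tap_time - beat
        asynchronies.append(asynchrony)
        # temporal prediction of the next beat, partially correcting the error
        tap_time = tap_time + beat_period - alpha * asynchrony
    return asynchronies

asyncs = simulate_tapping()
# with 0 < alpha < 1 the asynchrony shrinks toward zero across cycles,
# mirroring the predictive error correction attributed to auditory-motor loops
```

With any correction gain between 0 and 1, the asynchrony decays geometrically, which is the core behavioral signature of entrainment to an isochronous beat.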
Temporal processing and sensorimotor synchronization involve complex functional networks comprising several distant cortical and subcortical brain areas, including the cerebellum, the basal ganglia (predominantly the putamen), thalamus, the SMA and pre-SMA, premotor cortex (PMC), and the auditory cortex (for review: Chauvigné, Gitau, & Brown, 2014; Iversen & Balasubramaniam, 2016; Leow & Grahn, 2014; Merchant et al., 2015; Teki, Grube, & Griffiths, 2012). Although the specific role of each area is still emerging, recent studies converge on at least two distinct timing networks—one centered on the role of the cerebellum in the processing of sensory prediction errors, motor adaptation, and duration-based timing, and the other on the role of the basal ganglia and the SMA in beat-based timing and internally driven rhythmic movements.

Cortico-Cerebellar Network

The cerebellum receives segregated projections from prefrontal, frontal, parietal, and superior temporal regions via the pontine nuclei in the brainstem (Fig. 3). Output projections are then sent from the cerebellar cortex to specialized deep cerebellar nuclei, which in turn project back, via the thalamus, to the region of the cerebral cortex from which the initial projection originated (Koziol, Budding, & Chidekel, 2011; Schmahmann & Pandya, 1997). These parallel cortico-cerebellar loops place the cerebellum in a unique position to use the information it receives from the neocortex to build, through a learning process, an internal “model” of the dynamic processes required to perform a specific movement or behavior. This feedforward information (or efference copy) is used to generate a representation of the expected sensory consequences of that command, and to compute error signals that can produce online changes to adjust its execution and/or to improve future predictions (for review,
Figure 3.  Diagram of cortico-cerebellar and basal ganglia-thalamo-cortical networks and the intricate connectivity between these circuits. The basal ganglia-thalamo-cortical timing network normally involves the SMA, PFC, Striatum, PPC, GPe, Th, STN, VTA, and SN. The cerebellar network involves the Cb Cortex, PN, DN, and IO. Note that the cerebellum is also connected to multiple cortical and subcortical regions, and that reciprocal connections between the basal ganglia and the cerebellum are not illustrated. Abbreviations: PFC, prefrontal cortex; SMA, supplementary motor area; PPC, posterior parietal cortex; Th, thalamus; GPe, globus pallidus (external segment); STN, subthalamic nucleus; SN, substantia nigra; VTA, ventral tegmental area; PN, pontine nuclei; DN, dentate nucleus; Cb, cerebellar cortex; IO, inferior olive. Reproduced with permission from Petter et al. (2016).

see Sokolov, Miall, & Ivry, 2017; Wolpert, Miall, & Kawato, 1998). Indeed, research has demonstrated that the cerebellum is key in establishing sensory prediction errors by processing discrepancies between the expected sensory consequences of a stimulus or movement and the actual sensory input (Baumann et al., 2015; Koziol et al., 2014; Manto et al., 2012; Tseng, Diedrichsen, Krakauer, Shadmehr, & Bastian, 2007). These error signals are essential for sensorimotor control, motor adaptation, and learning because they allow rapid adjustments in the motor output and refinement of future sensory predictions in order to reduce the variability of subsequent actions (Doyon, Penhune, & Ungerleider, 2003; Petter, Lusk, Hesslow, & Meck, 2016; Shadmehr, Smith, & Krakauer, 2010; Sokolov et al., 2017). A growing body of research evidence indicates that cortico-cerebellar networks are predominantly engaged in movement synchronization to externally cued stimuli, but less involved in self-paced or internally guided motor behaviors (Brown, Martinez, & Parsons, 2006; Buhusi & Meck, 2005; Chauvigné et al., 2014; Del Olmo, Cheeran, Koch, & Rothwell, 2007; Grahn & Rowe, 2013; Manto et al., 2012; Thaut et al., 2009; Witt, Laird, & Meyerand, 2008). These findings concur with the cerebellum’s role in
the integration of sensory and motor information, basic sensory prediction related to motor timing, and temporal adaptation during sensorimotor synchronization (Diedrichsen, Criscimagna-Hemminger, & Shadmehr, 2007; Gao et al., 1996; Manto et al., 2012; Mayville, Jantzen, Fuchs, Steinberg, & Kelso, 2002; Rao et al., 1997; Schwartze, Keller, & Kotz, 2016; Shadmehr et al., 2010; Thaut, Demartin, & Sanes, 2008; Tseng et al., 2007). The premotor cortex is also known to play a role in movements guided by external sensory stimuli and is thought to be particularly involved in aspects of prediction related to motor timing and temporal adaptation, and in integrating higher-order features of sound with an appropriately timed and organized motor response (Chapin et al., 2010; Chen et al., 2008b; Jahanshahi et al., 1995; Jäncke, Loose, Lutz, Specht, & Shah, 2000; Kornysheva & Schubotz, 2011; Pecenka, Engel, & Keller, 2013; Schubotz, 2007). Studies have indeed identified fronto-olivocerebellar pathways that connect the dorsal portions of the dentate nucleus in the cerebellum to motor areas such as the primary motor cortex and the premotor cortex (Dum, 2002; Middleton & Strick, 2001; Schmahmann & Pandya, 1997). The olivocerebellar network is thought to be an important neural loop in the cerebellar adaptation of sensorimotor forward models due to its capacity to directly modulate the output signals sent from the cerebellum back to sensorimotor cortical areas (Koziol et al., 2011; Sokolov et al., 2017). The inferior olive is a brainstem nucleus that receives significant projections from the sensorimotor cortex and is one of the main sources of input to the cerebellar cortex. Excitatory neurons originating in the inferior olive, known as climbing fibers, project to Purkinje cells in the cerebellar sensorimotor cortex and the deep cerebellar nuclei.
This microcircuit is completed by Purkinje cells in the cerebellar cortex sending inhibitory projections to the deep cerebellar nuclei (including the dentate nucleus), which in turn send projections back to the inferior olive and to the cerebral cortex via the thalamus (Fig. 3). Some models suggest that this cortico-cerebellar network is involved in detecting sequences of cortical input activity and generating precisely timed output activity in response, hence contributing to the optimization and coordination of neocortical network activity involved in cognitive and motor processes (Durstewitz, 2003; Fatemi et al., 2012, p. 792; Mauk & Buonomano, 2004; Medina & Mauk, 2000; Molinari et al., 2005; Molinari, Leggio, & Thaut, 2007; Thaut et al., 2009). Alternatively, other theories hypothesize that the olivocerebellar circuit has the electrophysiological characteristics of a neural clock capable of generating accurate absolute timing signals, suggesting that the cerebellum is specialized for providing an explicit temporal representation (Allman, Teki, Griffiths, & Meck, 2014; Ashe & Bushara, 2014; Ivry, Spencer, Zelaznik, & Diedrichsen, 2002; Spencer, Ivry, & Zelaznik, 2005; Teki et al., 2012). Recently, converging evidence has indicated that the cerebellum is also implicated in measuring and storing the absolute duration of sub-second time intervals of discrete perceptual events (for review: Allman et al., 2014; Petter et al., 2016; Teki et al., 2012). Several studies have demonstrated that the cerebellum is crucial for perceptual tasks requiring temporal discrimination, processing of target duration, detecting the timing onset of discrete perceptual events, detecting violations of temporal expectancies, and
processing complex temporal events such as polyrhythmic stimuli and non-metric rhythms (Grahn & Rowe, 2009; Grube, Cooper, Chinnery, & Griffiths, 2010; Kotz, Stockert, & Schwartze, 2014; O’Reilly, Mesulam, & Nobre, 2008; Paquette, Fujii, Li, & Schlaug, 2017; Schwartze, Rothermich, Schmidt-Kassow, & Kotz, 2011; Teki, Grube, Kumar, & Griffiths, 2011; Tesche & Karhu, 2000; Thaut et al., 2008). Recent functional imaging and transcranial stimulation research has demonstrated that cerebellar lobules VI and VIIA in the vermis are especially active in perceptual tasks involving duration-based timing (Grube, Cooper, et al., 2010; Grube, Lee, Griffiths, Barker, & Woodruff, 2010; Keren-Happuch, Chen, Ho, & Desmond, 2014; Lee et al., 2007; O’Reilly et al., 2008). The notion that distinct cerebellar regions are activated depending on the context and the different aspects of timing is supported by neuroimaging studies demonstrating that the cerebellum is topographically organized, so that different regions of the cerebellum manage information from different domains (Kelly & Strick, 2003; Keren-Happuch et al., 2014; Koziol et al., 2011; Stoodley & Schmahmann, 2009, 2010). Although the cerebellum has long been known for its importance in motor behavior and timing, current research has firmly established the cerebellum’s critical role in modulating cognitive functions including attention, emotion, executive function, language, working memory, and music perception (for review, see Baumann et al., 2015; Buckner, 2013; Koziol et al., 2014; Sokolov et al., 2017). Recent studies indeed suggest that the cerebellum plays a role in processing pitch and timbre (Alluri et al., 2012; Parsons, 2012; Parsons, Petacchi, Schmahmann, & Bower, 2009; Pfordresher, Mantell, Brown, Zivadinov, & Cox, 2014; Thaut, Trimarchi, & Parsons, 2014; Toiviainen, Alluri, Brattico, Wallentin, & Vuust, 2014).
For instance, Thaut and colleagues (2014) described common and distinct neural substrates underlying the processing of the different components of rhythmic structure (i.e., pattern, meter, tempo), and also showed that melody processing induced activity in different regions compared to rhythm (e.g., right anterior insula and various cerebellar areas). Another study showed that alterations of auditory feedback during piano performance, particularly pitch disruptions, increased activity in the cerebellum (Pfordresher et al., 2014), which is aligned with the understanding that the cerebellum is involved in monitoring sensory prediction errors, including pitch information. The cerebellum has also been implicated in the processing of affective sounds (Alluri et al., 2015; Pallesen et al., 2005; for review: Frühholz, Trost, & Kotz, 2016), and in working memory tasks such as recognizing musical motifs (e.g., Burunat, Alluri, Toiviainen, Numminen, & Brattico, 2014; see also Ito, 2008; Marvel & Desmond, 2010), supporting the idea that the cerebellum is a multipurpose neural mechanism capable of influencing a wide range of functional processes.
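The forward-model logic attributed to cortico-cerebellar loops in this section (an efference copy generates a sensory prediction, and the prediction error both signals a mismatch and drives adaptation) can be sketched in a few lines. This is an illustrative toy model; the class name, gain variable, and learning rate are invented for the example and carry no physiological meaning:

```python
# Toy sketch of a cerebellar-style forward model (illustrative only):
# an efference copy of each motor command is used to predict its sensory
# consequence; the prediction error (actual - predicted) drives adaptation
# of the internal model, so future predictions improve.

class ForwardModel:
    def __init__(self, predicted_gain=1.0, learning_rate=0.2):
        self.gain = predicted_gain      # internal estimate: command -> sensation
        self.learning_rate = learning_rate

    def predict(self, efference_copy):
        """Expected sensory consequence of the motor command."""
        return self.gain * efference_copy

    def update(self, efference_copy, actual_sensation):
        """Compute the prediction error and adapt the internal model."""
        error = actual_sensation - self.predict(efference_copy)
        # error-driven adaptation, loosely analogous to climbing-fiber teaching signals
        self.gain += self.learning_rate * error * efference_copy
        return error

model = ForwardModel(predicted_gain=1.0)
true_gain = 1.5  # the body/environment actually amplifies the command
errors = [model.update(1.0, true_gain * 1.0) for _ in range(30)]
# prediction errors shrink as the internal model adapts to the true mapping
```

Repeating the same command while the environment behaves consistently drives the prediction error toward zero, which is the sense in which the loop "reduces the variability of subsequent actions."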

Basal Ganglia-Thalamo-Cortical Network

Mounting evidence suggests that a distributed network comprising the basal ganglia (particularly the putamen), thalamus, and cortical areas such as the SMA and pre-SMA, premotor cortex, and auditory cortex is engaged in beat perception (for review: Leow & Grahn, 2014; Merchant et al., 2015; Petter et al., 2016; Teki et al., 2012). The basal ganglia are thought to play a key role in predicting upcoming events based on a relative timing
mechanism, that is, one in which temporal intervals are coded relative to a periodic beat interval (Grahn & Brett, 2007; Grahn, Henry, & McAuley, 2011; Grahn & Rowe, 2013; Grube, Cooper, et al., 2010; Grube, Lee, et al., 2010; Kotz, Brown, & Schwartze, 2016; Nozaradan, Schwartze, Obermeier, & Kotz, 2017; Teki et al., 2011). These findings are consistent with studies showing the involvement of the basal ganglia in reward prediction, associative learning, and harmonic processing (e.g., Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011; Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015; Seger et al., 2013). Functional connectivity between the basal ganglia (putamen), cortical motor areas (premotor cortex and SMA), and auditory cortex increases significantly when listening to rhythms with a clear beat, suggesting that the basal ganglia and the SMA are important for the representation of pulse and rhythm even in the absence of movement (Chen et al., 2008a; Grahn & Brett, 2007; Grahn & Rowe, 2009; Stupacher et al., 2013). Neural pathways connecting the basal ganglia and the SMA have been identified in studies using in vivo imaging tractography (Akkal, Dum, & Strick, 2007; Lehéricy et al., 2004), showing that corticostriatal connections are part of a distributed network that supports different aspects of timing (Fig. 3). There are strong indications that the basal ganglia (putamen) and SMA are predominantly involved in maintaining the internal representation of beat intervals in sensorimotor tasks (beat continuation). This notion is supported by studies showing greater activation of the putamen and SMA during the continuation phase of synchronization-continuation tasks, that is, when the external reference cues are no longer available (Cunnington, Bradshaw, & Iansek, 1996; Grahn & Rowe, 2013; Halsband, Ito, Tanji, & Freund, 1993; Rao et al., 1997).
These findings concur with research describing the role of the SMA in timed movements performed in the absence of any pacing stimulus (i.e., self-paced or internally guided motor behaviors) (Coull, Vidal, & Burle, 2016; Harrington & Jahanshahi, 2016; Lima, Krishnan, & Scott, 2016; Nachev, Kennard, & Husain, 2008; Witt et al., 2008). Activity in the SMA and basal ganglia during internally generated movements has also been investigated in non-human primates (for review: Merchant & Bartolo, 2018; Merchant et al., 2015). The analysis of local field potentials in behaving macaques has demonstrated, for instance, greater beta-band (15–30 Hz) activity in the putamen during the continuation phase of synchronization-continuation tasks, suggesting that beta-band oscillations may enable communication between a distributed set of circuits including the motor cortico-basal ganglia-thalamo-cortical circuit (Bartolo, Prado, & Merchant, 2014). Interestingly, the study also found gamma-band activity (30–50 Hz) in some local fields in the putamen during the synchronization phase of the task, suggesting that the putamen may also be involved in local computations associated with sensorimotor processing during beat synchronization. The physiological mechanism underlying the processing of temporal information in the basal ganglia-thalamo-cortical circuit is likely mediated by dopamine receptors located on corticostriatal neurons in the nigrostriatal pathway (for review, see Agostino & Cheng, 2016; Allman et al., 2014; Buhusi & Meck, 2005; Petter et al., 2016). Evidence suggests that striatal medium spiny neurons in the dorsal striatum (comprising
the putamen and caudate nucleus) are crucial to duration discrimination in the seconds-to-minutes range due to their role in large-scale oscillatory networks connecting mesolimbic, nigrostriatal, and mesocortical dopaminergic systems (Buhusi & Meck, 2005; Merchant, Harrington, & Meck, 2013). The striatal beat-frequency model suggests that the neural mechanisms of interval timing are based on the entrainment of the oscillatory activity of striatal neurons and cortical neural oscillators (Matell & Meck, 2004). The role of dopamine in interval timing accuracy and precision is supported by studies showing that patients with disorders that involve dopaminergic pathways, such as Parkinson’s disease, Huntington’s disease, and schizophrenia, have difficulties in timing-related tasks, and that dopaminergic medication can ameliorate these deficits (Harrington et al., 2011; Jahanshahi et al., 2010; see review in Allman & Meck, 2012; Coull, Cheng, & Meck, 2011). A recent study also showed that dopamine depletion in healthy individuals attenuated activity in the putamen and SMA and directly interfered with the processing of temporal information (Coull, Hwang, Leyton, & Dagher, 2012). Pharmacological studies have also made significant advances in understanding how dopamine affects the activity of corticostriatal circuits and what roles the different dopaminergic receptors play in timing behavior (for review, see Agostino & Cheng, 2016; Narayanan, Land, Solder, Deisseroth, & DiLeone, 2012). Taken together, it is clear that cortico-cerebellar and basal ganglia-thalamo-cortical networks have complementary roles in temporal perception and motor timing, and the challenge for future studies is to further understand how these networks interact in both motor and non-motor functions.
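The coincidence-detection intuition behind the striatal beat-frequency model can be sketched as follows. This is only an illustrative caricature of the idea: the oscillator periods, the half-cycle activity criterion, and the stored-pattern matching scheme are arbitrary choices for the example, not parameters from Matell and Meck (2004):

```python
# Illustrative sketch of coincidence detection over cortical oscillators,
# in the spirit of the striatal beat-frequency model: oscillators with
# different periods are reset at trial onset, the pattern of which ones are
# "active" at the target duration is stored, and a striatal-style detector
# responds when that coincidence pattern recurs. All parameters are arbitrary.

PERIODS = [0.3, 0.45, 0.7, 1.1, 1.7]  # seconds; hypothetical oscillator periods

def active_pattern(t, periods=PERIODS):
    """1 if an oscillator is in the first half of its cycle at time t, else 0."""
    return [1 if (t % p) < p / 2 else 0 for p in periods]

def train(target_duration):
    """Store the oscillator state at the target time (the learned 'weights')."""
    return active_pattern(target_duration)

def coincidence(t, stored):
    """Fraction of oscillators whose current state matches the stored pattern."""
    now = active_pattern(t)
    return sum(a == b for a, b in zip(now, stored)) / len(stored)

stored = train(2.0)
# the detector matches perfectly at the trained duration and less well elsewhere,
# so peak coincidence marks the elapsed target interval
```

Because each oscillator has a different period, the joint activity pattern is quasi-unique to a given elapsed time, which is how a downstream coincidence detector can read out duration from oscillatory phase alone.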
Recently emerging evidence from neuroanatomical studies using transneuronal virus tracers demonstrates that the cerebellum and the basal ganglia are reciprocally connected and that these subcortical structures are indeed part of an integrated network (Bostan, Dum, & Strick, 2013; Caligiore et al., 2017; Chen, Fremont, Arteaga-Bracho, & Khodakhah, 2014; Kotz et al., 2016; Pelzer, Melzer, Timmermann, von Cramon, & Tittgemeyer, 2017). Models of how the cortico-cerebellar and striato-thalamo-cortical networks may integrate to support time perception and sensorimotor synchronization have recently been proposed, motivating further investigation (Lusk, Petter, Macdonald, & Meck, 2016; Petter et al., 2016; Teki et al., 2012).

Auditory-Limbic Networks

The limbic and auditory systems are highly interconnected and form an important part of the core neural network involved in affective sound processing (Frühholz et al., 2016). Direct and indirect pathways between the auditory system and limbic areas have been described in the literature (Fig. 4B) (Frühholz, Trost, & Grandjean, 2014; Janak & Tye, 2015). For instance, the amygdala (specifically, the lateral part of the basolateral complex) receives direct projections from the superior temporal cortex (LeDoux, 2007), and animal research suggests that there may also be a direct connection with the primary auditory cortex (Reser, Burman, Richardson, Spitzer, & Rosa, 2009). The amygdala is

Figure 4. (A) The neural auditory ascending pathway. (B) Amygdala and hippocampal connections to the auditory system. The amygdala receives direct input from the MGB of the thalamus (line 1) and from higher-level auditory cortex in the STC (line 2), which both project to the lateral nucleus (l) of the basolateral complex of the amygdala. Tracing studies in animals also report connections between the AC and the amygdala (dashed line 2). The basal nucleus (b) of the basolateral complex has an efferent connection to the IC (line 3). The accessory nucleus (ac), the medial nucleus (m), and the central nucleus (c) are not directly connected to the auditory system. The hippocampus (hc) shows direct (line 2) and indirect (line 1) connections to the auditory cortex. A direct connection exists from the CA1 region to the higher-level auditory cortex (line 2). Indirect connections mainly provide input to the hippocampal formation via projections from the STC to the parahippocampal gyrus (phg), the perirhinal cortex (prc), and the entorhinal cortex (erc) (all line 1), which serve as input relays to the hippocampus. Abbreviations: MGB, medial geniculate body; STC, superior temporal cortex; IC, inferior colliculus; CN, cochlear nucleus; SOC, superior olivary complex; AC, auditory cortex; SUB, subiculum; DG, dentate gyrus. Reprinted with permission from Frühholz et al. (2014).


also interconnected with subcortical nodes of the ascending auditory pathway, receiving direct projections from the medial geniculate body and sending projections to the inferior colliculus, supporting the notion that less complex sounds (i.e., short high-intensity sounds or aversive sounds) may be transmitted to the amygdala through a fast subcortical circuit (Fig. 4B) (Frühholz et al., 2016; Pannese, Grandjean, & Frühholz, 2016). Recent theories suggest that this direct link between the auditory thalamus and the amygdala plays an important role in fast responses to sound, whereas a “slow” network projecting from thalamus to primary auditory cortex to association cortex to amygdala may govern interpretive labeling/understanding responses during music processing and music-evoked emotions (Huron, 2006; Juslin & Västfjäll, 2008). Mounting evidence from functional neuroimaging research shows that music can modulate activity in several brain areas of the limbic system, such as the amygdala, the hippocampal formation, the right ventral striatum (including the nucleus accumbens) extending into the ventral pallidum, caudate nucleus, insula, the cingulate cortex, and the orbitofrontal cortex (for review: Koelsch, 2014; Zatorre, 2015). Studies have demonstrated that music perceived as joyful elicits a strong response in the superficial nuclei group of the amygdala, an area that seems to be particularly involved in extracting the social significance of signals that convey basic socio-affective information (Koelsch et al., 2013; Koelsch & Skouras, 2014; Lehne, Rohrmeier, & Koelsch, 2013).
Activity changes in response to joyful, unpleasant, or sad music were also found in the (right) laterobasal amygdala, an area that has been implicated in the acquisition, encoding, and retrieval of both positive and negative associations, and in processing cues that predict either positive or negative reinforcement (Brattico et al., 2011; Koelsch et al., 2013; Koelsch, Fritz, v. Cramon, Müller, & Friederici, 2006; Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007; Pallesen et al., 2005). The laterobasal amygdala is involved in the regulation of neural input into the hippocampal formation, another area that responds to music-evoked emotions such as tenderness, peacefulness, nostalgia, or wonder (Burunat et al., 2014; Choppin et al., 2016; Koelsch et al., 2013; Mitterschiffthaler et al., 2007; Trost, Ethofer, Zentner, & Vuilleumier, 2012; for review: Koelsch, 2014). The hippocampus, in turn, receives projections from the auditory system; however, these are mediated by the parahippocampal gyrus, the perirhinal cortex, and the entorhinal cortex (Fig. 4B) (for review, see Frühholz et al., 2014; Koelsch, 2014). Changes in the ventral striatum (including the nucleus accumbens) have also been found in response to pleasant music (Blood & Zatorre, 2001; Koelsch et al., 2006; Menon & Levitin, 2005; Mueller et al., 2015; Salimpoor et al., 2013; Zatorre & Salimpoor, 2013). In particular, the nucleus accumbens has been shown to respond to intense feelings of music-evoked pleasure and reward (Blood & Zatorre, 2001; Salimpoor et al., 2011, 2013), suggesting that functional connectivity between the auditory cortex and ventral striatum (including the nucleus accumbens) is crucial for experiencing pleasure in music (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016; Sachs, Ellis, Schlaug, & Loui, 2016; Salimpoor et al., 2013).
Music-evoked pleasure can lead to dopamine release in distinct anatomical areas; an increase in dopamine availability in the dorsal striatum is associated with the anticipation
of reward, whereas an increase in dopamine in the ventral striatum occurs during the rewarding experience (Blood & Zatorre, 2001; Menon & Levitin, 2005; Salimpoor et al., 2011, 2015; Zatorre & Salimpoor, 2013). Aesthetic pleasure results from the integration between subcortical dopaminergic regions and higher-order cortical areas (for review, see Salimpoor et al., 2015). It has been shown, for instance, that functional connectivity between the nucleus accumbens and the auditory cortex, as well as the fronto-striatal circuit (involving ventral and dorsal subdivisions of the striatum and frontal areas such as the inferior frontal gyri, prefrontal cortex, and orbitofrontal cortex), predicts whether individuals will decide to purchase a song (Salimpoor et al., 2013). Recently emerging data from transcranial magnetic stimulation research further support the direct role of the fronto-striatal circuit in both the affective responses and motivational aspects of music-induced reward (Mas-Herrero, Dagher, & Zatorre, 2017). The ventromedial prefrontal cortex and adjacent orbitofrontal cortex are involved in high-level emotional processing, such as reward detection and valuation, and are the main cortical inputs to the nucleus accumbens, again reinforcing the notion that fronto-striatal circuits are highly involved in the integration and evaluation of, and decision-making about, reward-related stimuli (for review: Haber & Knutson, 2010; Salimpoor et al., 2015; see also Chapter 14). Recent findings suggest that the auditory cortex also plays a crucial role in the emotional processing of sounds, beyond mere acoustical analysis (Frühholz et al., 2016; Koelsch, Skouras, & Lohmann, 2018). Koelsch et al.
(2018) found that fear stimuli (compared with joy stimuli) evoked higher network centrality in both anterior and posterior auditory association cortex, suggesting that the auditory cortex may play a central role in the affective processing of auditory information. Moreover, the findings also indicated that the auditory cortex is functionally connected with a widespread network involved in emotion processing, which includes limbic/paralimbic structures (cingulate, insular, parahippocampal, and orbitofrontal cortex, as well as the ventral striatum) and extra-auditory neocortical areas (visual, somatosensory, and motor-related areas, and attentional structures). These results challenge the traditional view that sensory cortices serve merely perceptual functions and highlight the importance of investigating functional connectivity between brain regions.
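Network centrality measures of this kind rank each region by how strongly it is connected to other well-connected regions. The sketch below illustrates one common variant, eigenvector centrality, computed by power iteration over a toy connectivity matrix; the region labels and weights are purely illustrative and are not taken from the cited studies.

```python
import numpy as np

# Toy symmetric connectivity matrix (e.g., absolute correlations) among
# four hypothetical regions; region 0 is strongly coupled to all others.
labels = ["sensorimotor", "auditory", "cerebellum", "visual"]
W = np.array([
    [0.0, 0.8, 0.7, 0.6],
    [0.8, 0.0, 0.5, 0.2],
    [0.7, 0.5, 0.0, 0.1],
    [0.6, 0.2, 0.1, 0.0],
])

def eigenvector_centrality(W, iters=200):
    """Power iteration: a node is central if its neighbors are central."""
    c = np.ones(W.shape[0])
    for _ in range(iters):
        c = W @ c                # propagate centrality along connections
        c /= np.linalg.norm(c)   # renormalize at each step
    return c

c = eigenvector_centrality(W)
print(labels[int(np.argmax(c))])  # region with the highest centrality (the "hub")
```

In whole-brain analyses the same computation runs over thousands of voxels or hundreds of parcels rather than four labels, but the principle is identical: the dominant eigenvector of the connectivity matrix assigns each node a centrality score.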

Brain Network Interactions

Recent advances in neuroimaging analysis methods have allowed researchers to address questions of functional connectivity, interregional coupling, and networked computations that go beyond the "where" and "when" of task-related activity, providing new insights about how different brain networks interact to support cognitive, perceptual, and motor functions (Friston, 2011). Among the topics recently explored in music neuroscience is how brain networks organize themselves in a naturalistic music listening situation, wherein data acquisition takes place while participants listen to entire songs in an uninterrupted fashion, thus emulating real-life listening
experiences (Alluri et al., 2012, 2013, 2015; Burunat et al., 2014; Koelsch & Skouras, 2014; Koelsch et al., 2018; Lehne et al., 2013; Sachs et al., 2016; Toiviainen et al., 2014). Studies using novel data-driven methods to investigate the neural correlates of musical feature processing in fMRI data have found, for instance, that timbral feature processing during naturalistic listening conditions engages sensory and default mode network cerebrocortical areas as well as cognitive areas of the cerebellum, whereas musical pulse and tonality processing recruit cortical and subcortical cognitive, motor, and emotion-related circuits (Alluri et al., 2012; Toiviainen et al., 2014). The orbitofrontal cortex and the anterior cingulate cortex, which are associated with aesthetic judgments and self-referential appraisal, are also recruited while listening to full musical pieces (Alluri et al., 2013; Reybrouck & Brattico, 2015; Sachs et al., 2016). Moreover, music containing lyrics seems to particularly increase activity in the left auditory cortex, corroborating the hypothesis of hemispheric lateralization (Alluri et al., 2013; Brattico et al., 2011). Collectively, these findings confirm the notion that music processing requires the timely coordination of large-scale cognitive, motor, and limbic brain circuitry. Research has also demonstrated that music preference and music expertise can modulate functional brain connectivity during passive music listening. A recent study found that the default mode network—a network of interacting brain regions that is important for internally focused thought—was more functionally connected when people listened to unfamiliar music they liked compared to music they disliked, and that listening to one's favorite music increased connectivity between auditory brain areas and the hippocampus (Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014).
These findings were recently expanded by a study showing that musicians and non-musicians use different neural networks during music listening (Alluri et al., 2017). Whole-brain network analysis revealed that, while the dominant hubs during passive music listening in non-musicians encompassed regions related to the default mode network, in musicians the primary neural hubs engaged during music listening comprised cerebral and cerebellar sensorimotor regions. Moreover, the study also showed that musicians have enhanced connectivity in the motor and sensory homunculus representing the upper limbs and torso during the listening task, suggesting that experts tend to process music using an action-based approach whereas non-musicians use a perception-based approach (Alluri et al., 2017; see also Moore, Schaefer, Bastin, Roberts, & Overy, 2014). Evidence for the reconfiguration of human brain functional networks during music listening has also been provided by electroencephalography (EEG) studies (Adamos, Laskaris, & Micheloyannis, 2018; Klein, Liem, Hänggi, Elmer, & Jäncke, 2016; Rogenmoser, Zollinger, Elmer, & Jäncke, 2016; Sänger, Müller, & Lindenberger, 2012; Wu et al., 2012; Wu, Zhang, Ding, Li, & Zhou, 2013). Overall, findings concur that music processing induces changes in the functional organization of neural synchronies by increasing intraregional and interregional oscillatory synchronizations. These findings support the notion that music, like other higher cognitive tasks, requires the activation of different cortical and subcortical regions in an organized and cooperative manner (Bhattacharya & Petsche, 2005).
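In analyses like those above, functional connectivity is commonly operationalized as the correlation between regional activity time courses. The following sketch uses synthetic data; the region names and the signal model are hypothetical and do not reproduce any cited study's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints = 200  # e.g., fMRI volumes acquired during listening

# Synthetic region time courses: a shared signal drives three regions of a
# putative network, while a fourth region carries independent noise.
shared = rng.standard_normal(n_timepoints)
regions = {
    "auditory_cortex":  shared + 0.5 * rng.standard_normal(n_timepoints),
    "insula":           shared + 0.5 * rng.standard_normal(n_timepoints),
    "ventral_striatum": shared + 0.5 * rng.standard_normal(n_timepoints),
    "visual_cortex":    rng.standard_normal(n_timepoints),
}

# Functional connectivity: Pearson correlation between every pair of regions.
names = list(regions)
fc = np.corrcoef(np.vstack([regions[n] for n in names]))

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} - {names[j]}: r = {fc[i, j]:+.2f}")
```

Regions driven by the shared signal correlate strongly with one another, while the independent region does not; thresholding such a matrix yields the graphs on which hub and network analyses operate.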


cerebral organization of music processing    107

Summary

Uncovering the neural underpinnings of music processing is a central theme in cognitive neuroscience, as evidenced by the robust body of literature on this topic. Neuroimaging research over the past twenty years has successfully identified several brain regions involved in the complex set of cognitive processes underlying music perception, memory, emotion, and performance, providing the foundation upon which research has started to explore how these different brain regions interact to support music processing. This chapter provides a broad panorama of current knowledge concerning the anatomical and functional basis of music processing from a network perspective. Starting with the trajectory of auditory stimuli through the ascending auditory pathway, we described how interactions between auditory and frontal cortical areas are crucial for transforming acoustic information into a musically meaningful tonal context and for integrating sound events over time in working memory, as well as the role of frontal areas in autobiographical memory, attention, and musical imagery. Anatomical and functional coordination between auditory and motor-related areas was also discussed in order to understand how cortical and subcortical areas are involved in sensorimotor synchronization and temporal processing, focusing more specifically on the roles of cortico-cerebellar and basal ganglia-thalamo-cortical networks. Auditory and limbic interactions were also discussed in relation to affective sound processing and music-evoked emotions, pointing also to the importance of the integration between subcortical dopaminergic regions and higher-order cortical areas for aesthetic pleasure. Finally, we reviewed recent studies investigating how brain networks organize themselves in a naturalistic music listening context.
Collectively, this robust body of literature suggests that music processing requires timely coordination of large-scale cognitive, motor, and limbic brain networks, setting the stage for a new generation of music neuroscience research on the dynamic organization of brain networks underlying music processing.

References

Adamos, D. A., Laskaris, N., & Micheloyannis, S. (2018). Harnessing functional segregation across brain rhythms as a means to detect EEG oscillatory multiplexing during music listening. Journal of Neural Engineering 15, 036012.
Agostino, P. V., & Cheng, R. K. (2016). Contributions of dopaminergic signaling to timing accuracy and precision. Current Opinion in Behavioral Sciences 8, 153–160.
Akkal, D., Dum, R. P., & Strick, P. L. (2007). Supplementary motor area and presupplementary motor area: Targets of basal ganglia and cerebellar output. Journal of Neuroscience 27(40), 10659–10673.
Alho, K., Rinne, T., Herron, T. J., & Woods, D. L. (2014). Stimulus-dependent activations and attention-related modulations in the auditory cortex: A meta-analysis of fMRI studies. Hearing Research 307, 29–41.
Allman, M. J., & Meck, W. H. (2012). Pathophysiological distortions in time perception and timed performance. Brain 135(3), 656–677.
Allman, M. J., Teki, S., Griffiths, T. D., & Meck, W. H. (2014). Properties of the internal clock: First- and second-order principles of subjective time. Annual Review of Psychology 65, 743–771.
Alluri, V., Brattico, E., Toiviainen, P., Burunat, I., Bogert, B., Numminen, J., & Kliuchko, M. (2015). Musical expertise modulates functional connectivity of limbic regions during continuous music listening. Psychomusicology 25(4), 443–454.
Alluri, V., Toiviainen, P., Burunat, I., Kliuchko, M., Vuust, P., & Brattico, E. (2017). Connectivity patterns during music listening: Evidence for action-based processing in musicians. Human Brain Mapping 38(6), 2955–2970.
Alluri, V., Toiviainen, P., Jääskeläinen, I. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage 59(4), 3677–3689.
Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., . . . Brattico, E. (2013). From Vivaldi to Beatles and back: Predicting lateralized brain responses to music. NeuroImage 83, 627–636.
Amunts, K., Morosan, P., Hilbig, H., & Zilles, K. (2012). Auditory system. In J. K. Mai & G. Paxinos (Eds.), The human nervous system (3rd ed., pp. 1270–1300). London: Elsevier.
Andoh, J., & Zatorre, R. J. (2011). Interhemispheric connectivity influences the degree of modulation of TMS-induced effects during auditory processing. Frontiers in Psychology 2, 161.
Ashe, J., & Bushara, K. (2014). The olivo-cerebellar system as a neural clock. In H. Merchant & V. de Lafuente (Eds.), Neurobiology of interval timing: Advances in experimental medicine and biology (pp. 155–166). New York: Springer.
Bajo, V. M., Nodal, F. R., Moore, D. R., & King, A. J. (2010). The descending corticocollicular pathway mediates learning-induced auditory plasticity. Nature Neuroscience 13(2), 253–260.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., . . . Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage 30(3), 917–926.
Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11), 3910–3923.
Baumann, O., Borra, R. J., Bower, J. M., Cullen, K. E., Habas, C., Ivry, R. B., . . . Sokolov, A. A. (2015). Consensus paper: The role of the cerebellum in perceptual processes. Cerebellum 14(2), 197–220.
Baumann, S., Griffiths, T. D., Sun, L., Petkov, C. I., Thiele, A., & Rees, A. (2011). Orthogonal representation of sound dimensions in the primate midbrain. Nature Neuroscience 14(4), 423–425.
Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A network for audio-motor coordination in skilled pianists and non-musicians. Brain Research 1161(1), 65–78.
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7(11), 1190–1192.
Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience 30(7), 2414–2417.
Beauchamp, M. S., Yasar, N. E., Frye, R. E., & Ro, T. (2008). Touch, sound and vision in human superior temporal sulcus. NeuroImage 41(3), 1011–1020.
Belin, P., & Zatorre, R. J. (2000). "What," "where" and "how" in auditory cortex. Nature Neuroscience 3(10), 965–966.
Bhattacharya, J., & Petsche, H. (2005). Phase synchrony analysis of EEG during music perception reveals changes in functional connectivity due to musical expertise. Signal Processing 85(11), 2161–2177.
Bianco, R., Novembre, G., Keller, P. E., Kim, S.-G., Scharf, F., Friederici, A. D., . . . Sammler, D. (2016). Neural networks for harmonic structure in music perception and action. NeuroImage 142, 454–464.
Bizley, J. K., & Cohen, Y. E. (2013). The what, where and how of auditory-object perception. Nature Reviews Neuroscience 14(10), 693–707.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.
Bostan, A. C., Dum, R. P., & Strick, P. L. (2013). Cerebellar networks with the cerebral cortex and basal ganglia. Trends in Cognitive Sciences 17(5), 241–254.
Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M. (2011). A functional MRI study of happy and sad emotions in music with and without lyrics. Frontiers in Psychology 2, 308.
Brechmann, A., & Scheich, H. (2005). Hemispheric shifts of sound representation in auditory cortex with conceptual listening. Cerebral Cortex 15(5), 578–587.
Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical discrimination. Brain and Cognition 63(1), 59–69.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). The neural basis of human dance. Cerebral Cortex 16(8), 1157–1167.
Buckner, R. L. (2013). The cerebellum and cognitive function: 25 years of insight from anatomy and neuroimaging. Neuron 80(3), 807–815.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience 6(10), 755–765.
Burunat, I., Alluri, V., Toiviainen, P., Numminen, J., & Brattico, E. (2014). Dynamics of brain activity underlying working memory for music in a naturalistic condition. Cortex 57, 254–269.
Caligiore, D., Pezzulo, G., Baldassarre, G., Bostan, A. C., Strick, P. L., Doya, K., . . . Herreros, I. (2017). Consensus paper: Towards a systems-level view of cerebellar function: The interplay between cerebellum, basal ganglia, and cortex. Cerebellum 16(1), 203–229.
Cammoun, L., Thiran, J. P., Griffa, A., Meuli, R., Hagmann, P., & Clarke, S. (2015). Intrahemispheric cortico-cortical connections of the human auditory cortex. Brain Structure & Function 220(6), 3537–3553.
Cha, K., Zatorre, R. J., & Schönwiesner, M. (2016). Frequency selectivity of voxel-by-voxel functional connectivity in human auditory cortex. Cerebral Cortex 26(1), 211–224.
Chapin, H. L., Zanto, T., Jantzen, K. J., Kelso, S. J. A., Steinberg, F., & Large, E. W. (2010). Neural responses to complex auditory rhythms: The role of attending. Frontiers in Psychology 1, 547–558.
Chauvigné, L. A. S., Gitau, K. M., & Brown, S. (2014). The neural basis of audiomotor entrainment: An ALE meta-analysis. Frontiers in Human Neuroscience 8, 776.
Chen, C. H., Fremont, R., Arteaga-Bracho, E. E., & Khodakhah, K. (2014). Short latency cerebellar modulation of the basal ganglia. Nature Neuroscience 17(12), 1767–1775.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781.
Choppin, S., Trost, W., Dondaine, T., Millet, B., Drapier, D., Vérin, M., . . . Grandjean, D. (2016). Alteration of complex negative emotions induced by music in euthymic patients with bipolar disorder. Journal of Affective Disorders 191, 15–23.
Coull, J. T., Cheng, R. K., & Meck, W. H. (2011). Neuroanatomical and neurochemical substrates of timing. Neuropsychopharmacology 36(1), 3–25.
Coull, J. T., Hwang, H. J., Leyton, M., & Dagher, A. (2012). Dopamine precursor depletion impairs timing in healthy volunteers by attenuating activity in putamen and supplementary motor area. Journal of Neuroscience 32(47), 16704–16715.
Coull, J. T., Vidal, F., & Burle, B. (2016). When to act, or not to act: That's the SMA's question. Current Opinion in Behavioral Sciences 8, 14–21.
Cunnington, R., Bradshaw, J. L., & Iansek, R. (1996). The role of the supplementary motor area in the control of voluntary movement. Human Movement Science 15(5), 627–647.
D'Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958.
Da Costa, S., van der Zwaag, W., Marques, J. P., Frackowiak, R. S. J., Clarke, S., & Saenz, M. (2011). Human primary auditory cortex follows the shape of Heschl's gyrus. Journal of Neuroscience 31(40), 14067–14075.
de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. (2017). The hierarchical cortical organization of human speech processing. Journal of Neuroscience 37(27), 6539–6557.
Del Olmo, M. F., Cheeran, B., Koch, G., & Rothwell, J. C. (2007). Role of the cerebellum in externally paced rhythmic finger movements. Journal of Neurophysiology 98(1), 145–152.
Diedrichsen, J., Criscimagna-Hemminger, S. E., & Shadmehr, R. (2007). Dissociating timing and coordination as functions of the cerebellum. Journal of Neuroscience 27(23), 6291–6301.
Doyon, J., Penhune, V., & Ungerleider, L. G. (2003). Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning. Neuropsychologia 41(3), 252–262.
Dum, R. P. (2002). An unfolded map of the cerebellar dentate nucleus and its projections to the cerebral cortex. Journal of Neurophysiology 89(1), 634–639.
Durstewitz, D. (2003). Self-organizing neural integrator predicts interval times through climbing activity. Journal of Neuroscience 23(12), 5342–5353.
Farrugia, N., Jakubowski, K., Cusack, R., & Stewart, L. (2015). Tunes stuck in your brain: The frequency and affective evaluation of involuntary musical imagery correlate with cortical structure. Consciousness and Cognition 35, 66–77.
Fatemi, S. H., Aldinger, K. A., Ashwood, P., Bauman, M. L., Blaha, C. D., Blatt, G. J., . . . Welsh, J. P. (2012). Consensus paper: Pathological role of the cerebellum in autism. Cerebellum 11(3), 777–807.
Fernández-Miranda, J. C., Wang, Y., Pathak, S., Stefaneau, L., Verstynen, T., & Yeh, F. C. (2015). Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain. Brain Structure & Function 220(3), 1665–1680.
Foster, N. E. V., Halpern, A. R., & Zatorre, R. J. (2013). Common parietal activation in musical mental transformations across pitch and time. NeuroImage 75, 27–35.
Friston, K. J. (2011). Functional and effective connectivity: A review. Brain Connectivity 1(1), 13–36.
Froud, K. E., Wong, A. C. Y., Cederholm, J. M. E., Klugmann, M., Sandow, S. L., Julien, J.-P., . . . Housley, G. D. (2015). Type II spiral ganglion afferent neurons drive medial olivocochlear reflex suppression of the cochlear amplifier. Nature Communications 6(1), 7115.
Frühholz, S., Trost, W., & Grandjean, D. (2014). The role of the medial temporal limbic system in processing emotions in voice and music. Progress in Neurobiology 123, 1–17.
Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68, 96–110.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426.
Gao, J. H., Parsons, L. M., Bower, J. M., Xiong, J., Li, J., & Fox, P. T. (1996). Cerebellum implicated in sensory acquisition and discrimination rather than motor control. Science 272(5261), 545–547.
Giovannelli, F., Banfi, C., Borgheresi, A., Fiori, E., Innocenti, I., Rossi, S., . . . Cincotta, M. (2013). The effect of music on corticospinal excitability is related to the perceived emotion: A transcranial magnetic stimulation study. Cortex 49(3), 702–710.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Rowe, J. B. (2013). Finding and feeling the musical beat: Striatal dissociations between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921.
Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in Neurosciences 25(7), 348–353.
Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience 5(11), 887–892.
Grothe, B. (2000). The evolution of temporal processing in the medial superior olive, an auditory brainstem structure. Progress in Neurobiology 61(6), 581–610.
Groussard, M., Viader, F., Hubert, V., Landeau, B., Abbas, A., Desgranges, B., . . . Platel, H. (2010). Musical and verbal semantic memory: Two distinct neural networks? NeuroImage 49(3), 2764–2773.
Grube, M., Cooper, F. E., Chinnery, P. F., & Griffiths, T. D. (2010). Dissociation of duration-based and beat-based auditory timing in cerebellar degeneration. Proceedings of the National Academy of Sciences 107(25), 11597–11601.
Grube, M., Lee, K. H., Griffiths, T. D., Barker, A. T., & Woodruff, P. W. (2010). Transcranial magnetic theta-burst stimulation of the human cerebellum distinguishes absolute, duration-based from relative, beat-based perception of subsecond time intervals. Frontiers in Psychology 1, 171.
Haber, S. N., & Knutson, B. (2010). The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology 35(1), 4–26.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex 9(7), 697–704.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia 42(9), 1281–1292.
Halsband, U., Ito, N., Tanji, J., & Freund, H. J. (1993). The role of premotor cortex and the supplementary motor area in the temporal control of movement in man. Brain 116(1), 243–266.
Harrington, D. L., Castillo, G. N., Greenberg, P. A., Song, D. D., Lessig, S., Lee, R. R., & Rao, S. M. (2011). Neurobehavioral mechanisms of temporal processing deficits in Parkinson's disease. PLoS ONE 6(2), e17461.
Harrington, D. L., & Jahanshahi, M. (2016). Reconfiguration of striatal connectivity for timing and action. Current Opinion in Behavioral Sciences 8, 78–84.
Harris, R., & De Jong, B. M. (2014). Cerebral activations related to audition-driven performance imagery in professional musicians. PLoS ONE 9(4), e93681.
Hasegawa, T., Matsuki, K. I., Ueno, T., Maeda, Y., Matsue, Y., Konishi, Y., & Sadato, N. (2004). Learned audio-visual cross-modal associations in observed piano playing activate the left planum temporale: An fMRI study. Cognitive Brain Research 20(3), 510–518.
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8(5), 393–402.
Horn, A. K. E. (2006). The reticular formation. Progress in Brain Research 151, 127–155.
Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323.
Humphries, C., Liebenthal, E., & Binder, J. R. (2010). Tonotopic organization of human auditory cortex. NeuroImage 50(3), 1202–1211.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639.
Ito, M. (2008). Control of mental activities by internal models in the cerebellum. Nature Reviews Neuroscience 9(4), 304–313.
Iversen, J. R., & Balasubramaniam, R. (2016). Synchronization and temporal processing. Current Opinion in Behavioral Sciences 8, 175–180.
Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences 978, 302–317.
Jahanshahi, M., Jenkins, I. H., Brown, R. G., Marsden, C. D., Passingham, R. E., & Brooks, D. J. (1995). Self-initiated versus externally triggered movements: I. An investigation using measurement of regional cerebral blood flow with PET and movement-related potentials in normal and Parkinson's disease subjects. Brain 118(4), 913–933.
Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., . . . Lees, A. J. (2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in Parkinson's disease. Brain 133(3), 727–745.
Janak, P. H., & Tye, K. M. (2015). From circuits to behaviour in the amygdala. Nature 517(7534), 284–292.
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19(11), 2579–2594.
Janata, P. (2015). Neural basis of music perception. In G. G. Celesia & G. Hickok (Eds.), Handbook of clinical neurology: The human auditory system (Vol. 129, pp. 187–205). Amsterdam: Elsevier.
Janata, P., Birk, J., Van Horn, J., Leman, M., Tillmann, B., & Bharucha, J. J. (2002). The cortical topography of tonal structures underlying Western music. Science 293(5539), 2425–2430.
Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective & Behavioral Neuroscience 2(2), 121–140.
Jäncke, L., Loose, R., Lutz, K., Specht, K., & Shah, N. (2000). Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli. Cognitive Brain Research 10(1–2), 51–66.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575.
Kaiser, J., Ripper, B., Birbaumer, N., & Lutzenberger, W. (2003). Dynamics of gamma-band activity in human magnetoencephalogram during auditory pattern working memory. NeuroImage 20(2), 816–827.
Keller, P. E. (2012). Mental imagery in music performance: Underlying mechanisms and potential benefits. Annals of the New York Academy of Sciences 1252(1), 206–213.
Kelly, R. M., & Strick, P. L. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a nonhuman primate. Journal of Neuroscience 23(23), 8432–8444.
Keren-Happuch, E., Chen, S. H. A., Ho, M. H. R., & Desmond, J. E. (2014). A meta-analysis of cerebellar contributions to higher cognition from PET and fMRI studies. Human Brain Mapping 35(2), 593–615.
Kiehl, K. A., Laurens, K. R., Duty, T. L., Forster, B. B., & Liddle, P. F. (2001). Neural sources involved in auditory target detection and novelty processing: An event-related fMRI study. Psychophysiology 38(1), 133–142.
Klein, C., Liem, F., Hänggi, J., Elmer, S., & Jäncke, L. (2016). The "silent" imprint of musical training. Human Brain Mapping 37(2), 536–546.
Klein, M. E., & Zatorre, R. J. (2011). A role for the right superior temporal sulcus in categorical perception of musical chords. Neuropsychologia 49(5), 878–887.
Klein, M. E., & Zatorre, R. J. (2015). Representations of invariant musical categories are decodable by pattern analysis of locally distributed BOLD responses in superior temporal and intraparietal sulci. Cerebral Cortex 25(7), 1947–1957.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

114    thenille braun janzen and michael h. thaut Koelsch, S. (2006). Significance of Broca’s area and ventral premotor cortex for music-syntactic processing. Cortex 42(4), 518–520. Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology 2, 110. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180. Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing music: An fMRI study. NeuroImage 25(4), 1068–1076. Koelsch, S., Fritz, T., v. Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping 27(3), 239–250. Koelsch, S., Gunter, T. C., v. Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966. Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Müller, K., & Gruber, O. (2009). Functional architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping 30(3), 859–873. Koelsch, S., & Skouras, S. (2014). Functional centrality of amygdala, striatum and hypothalamus in a “small-world” network underlying joy: An fMRI study with music. Human Brain Mapping 35(7), 3485–3498. Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M. (2013). The roles of superficial amygdala and auditory cortex in music-evoked fear and joy. NeuroImage 81, 49–60. Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1), e0190057. Kornysheva, K., & Schubotz, R. I. (2011). Impairment of auditory-motor timing and compensatory reorganization after ventral premotor cortex stimulation. PLoS ONE 6(6), e21421. Kotz, S. A., Brown, R. M., & Schwartze, M. (2016). 
Cortico-striatal circuits and the timing of action and perception. Current Opinion in Behavioral Sciences 8, 42–45. Kotz, S. A., Stockert, A., & Schwartze, M. (2014). Cerebellum, temporal predictability and the updating of a mental model. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130403. Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., . . .Yamazaki, T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1), 151–177. Koziol, L. F., Budding, D. E., & Chidekel, D. (2011). Sensory integration, sensory processing, and sensory modulation disorders: Putative functional neuroanatomic underpinnings. Cerebellum 10(4), 770–792. Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2), 308–314. Langers, D. R. M. (2014). Assessment of tonotopically organised subdivisions in human auditory cortex using volumetric and surface-based cortical alignments. Human Brain Mapping 35(4), 1544–1561. Lappe, C., Lappe, M., & Pantev, C. (2016). Differential processing of melodic, rhythmic and simple tone deviations in musicians: An MEG study. NeuroImage 124, 898–905.


Lappe, C., Steinsträter, O., & Pantev, C. (2013). Rhythmic and melodic deviations in musical sequences recruit different cortical areas for mismatch detection. Frontiers in Human Neuroscience 7, 260. Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Frontiers in Systems Neuroscience 9, 159. Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57. LeDoux, J. (2007). The amygdala. Current Biology 17(20), R868–R874. Lee, K.-H., Egleston, P. N., Brown, W. H., Gregory, A. N., Barker, A. T., & Woodruff, P. W. R. (2007). The role of the cerebellum in subsecond time perception: Evidence from repetitive transcranial magnetic stimulation. Journal of Cognitive Neuroscience 19(1), 147–157. Lee, Y. S., Janata, P., Frost, C., Hanke, M., & Granger, R. (2011). Investigation of melodic contour processing in the brain using multivariate pattern-based fMRI. NeuroImage 57(1), 293–300. Lehéricy, S., Ducros, M., Krainik, A., Francois, C., Van De Moortele, P. F., Ugurbil, K., & Kim, D. S. (2004). 3-D diffusion tensor axonal tracking shows distinct SMA and pre-SMA projections to the human striatum. Cerebral Cortex 14(12), 1302–1309. Lehne, M., Rohrmeier, M., & Koelsch, S. (2013). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10), 1515–1523. Leow, L. A., & Grahn, J. A. (2014). Neural mechanisms of rhythm perception: Present findings and future directions. Advances in Experimental Medicine and Biology 829, 325–338. Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory processing and auditory imagery. Trends in Neurosciences 39(8), 527–542. Loui, P. (2015). A dual-stream neuroanatomy of singing. Music Perception 32(3), 232–241. Lusk, N. A., Petter, E. A., Macdonald, C.
J., & Meck, W. H. (2016). Cerebellar, hippocampal, and striatal time cells. Current Opinion in Behavioral Sciences 8, 186–192. Maes, P.-J., Leman, M., Palmer, C., & Wanderley, M. M. (2014). Action-based effects on music perception. Frontiers in Psychology 4, 1008. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545. Maidhof, C., & Koelsch, S. (2011). Effects of selective attention on syntax processing in music and language. Journal of Cognitive Neuroscience 23(9), 2252–2267. Manto, M., Bower, J. M., Conforto, A. B., Delgado-García, J. M., Da Guarda, S. N. F., Gerwig, M., . . . Timmann, D. (2012). Consensus paper: Roles of the cerebellum in motor control: The diversity of ideas on cerebellar involvement in movement. Cerebellum 11, 457–487. Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113(46), E7337–E7345. Marvel, C. L., & Desmond, J. E. (2010). Functional topography of the cerebellum in verbal working memory. Neuropsychology Review 20(3), 271–279. Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2017). Modulating musical reward sensitivity up and down with transcranial magnetic stimulation. Nature Human Behaviour 2(1), 27–32. Matell, M. S., & Meck, W. H. (2004). Cortico-striatal circuits and interval timing: Coincidence detection of oscillatory processes. Cognitive Brain Research 21(2), 139–170.


Mauk, M. D., & Buonomano, D. V. (2004). The neural basis of temporal processing. Annual Review of Neuroscience 27, 307–340. Mayville, J. M., Jantzen, K. J., Fuchs, A., Steinberg, F. L., & Kelso, J. A. S. (2002). Cortical and subcortical networks underlying syncopated and synchronized coordination revealed using fMRI. Human Brain Mapping 17(4), 214–229. Medina, J. F., & Mauk, M. D. (2000). Computer simulation of cerebellar information processing. Nature Neuroscience 3(Suppl.), 1205–1211. Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184. Merchant, H., & Bartolo, R. (2018). Primate beta oscillations and rhythmic behaviors. Journal of Neural Transmission 125, 461–470. Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. Merchant, H., Harrington, D. L., & Meck, W. H. (2013). Neural basis of the perception and estimation of time. Annual Review of Neuroscience 36, 313–336. Merchant, H., Perez, O., Zarco, W., & Gamez, J. (2013). Interval tuning in the primate medial premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096. Michaelis, K., Wiener, M., & Thompson, J. C. (2014). Passive listening to preferred motor tempo modulates corticospinal excitability. Frontiers in Human Neuroscience 8, 252. Middleton, F. A., & Strick, P. L. (2001). Cerebellar projections to the prefrontal cortex of the primate. Journal of Neuroscience 21(2), 700–712. Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping 28(11), 1150–1162. Molinari, M., Leggio, M.
G., Filippini, V., Gioia, M. C., Cerasa, A., & Thaut, M. H. (2005). Sensorimotor transduction of time information is preserved in subjects with cerebellar damage. Brain Research Bulletin 67, 448–458. Molinari, M., Leggio, M. G., & Thaut, M. H. (2007). The cerebellum and neural networks for rhythmic sensorimotor synchronization in the human brain. Cerebellum 6(1), 18–23. Moore, E., Schaefer, R. S., Bastin, M. E., Roberts, N., & Overy, K. (2014). Can musical training influence brain connectivity? Evidence from diffusion tensor MRI. Brain Sciences 4(2), 405–427. Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences 114(42), E8913–E8921. Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J. J., . . . Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79. Nachev, P., Kennard, C., & Husain, M. (2008). Functional role of the supplementary and presupplementary motor areas. Nature Reviews Neuroscience 9, 856–869. Narayanan, N. S., Land, B. B., Solder, J. E., Deisseroth, K., & DiLeone, R. J. (2012). Prefrontal D1 dopamine signaling is required for temporal control. Proceedings of the National Academy of Sciences 109(50), 20726–20731. Nayagam, B. A., Muniak, M. A., & Ryugo, D. K. (2011). The spiral ganglion: Connecting the peripheral and central auditory systems. Hearing Research 278(1–2), 2–20.


Nelson, A., Schneider, D. M., Takatoh, J., Sakurai, K., Wang, F., & Mooney, R. (2013). A circuit for motor cortical modulation of auditory cortical activity. Journal of Neuroscience 33(36), 14342–14353. Norman-Haignere, S., Kanwisher, N., & McDermott, J. H. (2013). Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. Journal of Neuroscience 33(50), 19451–19469. Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603. Nozaradan, S., Schwartze, M., Obermeier, C., & Kotz, S. A. (2017). Specific contributions of basal ganglia and cerebellum to the neural tracking of rhythm. Cortex 95, 156–168. O’Reilly, J. X., Mesulam, M. M., & Nobre, A. C. (2008). The cerebellum predicts the timing of perceptual events. Journal of Neuroscience 28(9), 2252–2260. Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., . . . Imabayashi, E. (2001). Functional anatomy of musical perception in musicians. Cerebral Cortex 11(8), 754–760. Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453. Palomar-García, M. Á., Zatorre, R. J., Ventura-Campos, N., Bueichekú, E., & Ávila, C. (2017). Modulation of functional connectivity in auditory-motor networks in musicians compared with nonmusicians. Cerebral Cortex 27(5), 2768–2778. Pannese, A., Grandjean, D., & Frühholz, S. (2016). Amygdala and auditory cortex exhibit distinct sensitivity to relevant acoustic features of auditory emotions. Cortex 85, 116–125. Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017).
The cerebellum’s contribution to beat interval discrimination. NeuroImage 163, 177–182. Parsons, L. M. (2012). Exploring the functional neuroanatomy of music performance, perception, and comprehension. The Cognitive Neuroscience of Music 930(1), 211–231. Parsons, L. M., Petacchi, A., Schmahmann, J. D., & Bower, J. M. (2009). Pitch discrimination in cerebellar patients: Evidence for a sensory deficit. Brain Research 1303, 84–96. Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776. Pecenka, N., Engel, A., & Keller, P. E. (2013). Neural correlates of auditory temporal predictions during sensorimotor synchronization. Frontiers in Human Neuroscience 7, 380. Pelzer, E. A., Melzer, C., Timmermann, L., von Cramon, D. Y., & Tittgemeyer, M. (2017). Basal ganglia and cerebellar interconnectivity within the human thalamus. Brain Structure and Function 222(1), 381–392. Perani, D. (2012). Functional and structural connectivity for language and music processing at birth. Rendiconti Lincei 23(3), 305–314. Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical networks: The cortical organization of music recognition. Annals of the New York Academy of Sciences 1169, 256–265. Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114. Petter, E. A., Lusk, N. A., Hesslow, G., & Meck, W. H. (2016). Interactive roles of the cerebellum and striatum in sub-second and supra-second timing: Support for an initiation,


continuation, adjustment, and termination (ICAT) model of temporal processing. Neuroscience & Biobehavioral Reviews 71, 739–755. Pfordresher, P. Q., Mantell, J. T., Brown, S., Zivadinov, R., & Cox, J. L. (2014). Brain responses to altered auditory feedback during musical keyboard production: An fMRI study. Brain Research 1556, 28–37. Plakke, B., & Romanski, L. M. (2014). Auditory connections and functions of prefrontal cortex. Frontiers in Neuroscience 8, 199. Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and episodic memory of music are subserved by distinct neural networks. NeuroImage 20(1), 244–256. Proverbio, A. M., Orlandi, A., & Pisanu, F. (2016). Brain processing of consonance/dissonance in musicians and controls: A hemispheric asymmetry revisited. European Journal of Neuroscience 44(6), 2340–2356. Rao, S. M., Harrington, D. L., Haaland, K. Y., Bobholz, J. A., Cox, R. W., & Binder, J. R. (1997). Distributed neural systems underlying the timing of movements. Journal of Neuroscience 17(14), 5528–5535. Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12(6), 718–724. Reser, D. H., Burman, K. J., Richardson, K. E., Spitzer, M. W., & Rosa, M. G. P. (2009). Connections of the marmoset rostrotemporal auditory area: Express pathways for analysis of affective content in hearing. European Journal of Neuroscience 30(4), 578–592. Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91. Roberts, T. F., Hisey, E., Tanaka, M., Kearney, M. G., Chattree, G., Yang, C. F., . . . Mooney, R. (2017). Identification of a motor-to-auditory pathway important for vocal learning. Nature Neuroscience 20(7), 978–986. Rogenmoser, L., Zollinger, N., Elmer, S., & Jäncke, L.
(2016). Independent component processes underlying emotions during natural music listening. Social Cognitive and Affective Neuroscience 11(9), 1428–1439. Ross, B., Barat, M., & Fujioka, T. (2017). Sound-making actions lead to immediate plastic changes of neuromagnetic evoked responses and induced β-band oscillations during perception. Journal of Neuroscience 37(24), 5948–5959. Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat perception. Neurocase 22(6), 558–565. Rossignol, S., & Melvill Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and its possible role on rhythmic movements synchronized to sound. Electroencephalography and Clinical Neurophysiology 41(1), 83–92. Royal, I., Vuvan, D.  T., Zendel, B.  R., Robitaille, N., Schönwiesner, M., & Peretz, I. (2016). Activation in the right inferior parietal lobule reflects the representation of musical structure beyond simple pitch discrimination. PLoS ONE 11(5), e0155291. Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–264. Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R.  J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.


Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91. Sänger, J., Müller, V., & Lindenberger, U. (2012). Intra- and interbrain synchronization and network properties when playing guitar in duets. Frontiers in Human Neuroscience 6, 312. Santoro, R., Moerel, M., De Martino, F., Goebel, R., Ugurbil, K., Yacoub, E., & Formisano, E. (2014). Encoding of natural sounds at multiple spectral and temporal resolutions in the human auditory cortex. PLoS Computational Biology 10(1), e1003412. Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in musicians during an ensemble: A PET study. Cognitive Brain Research 12(1), 101–108. Saur, D., Kreher, B. W., Schnell, S., Kummerer, D., Kellmeyer, P., Vry, M.-S., . . . Weiller, C. (2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences 105(46), 18035–18040. Schindler, A., Herdener, M., & Bartels, A. (2013). Coding of melodic gestalt in human auditory cortex. Cerebral Cortex 23(12), 2987–2993. Schmahmann, J. D., & Pandya, D. N. (1997). The cerebrocerebellar system. International Review of Neurobiology 41, 31–38, 38a, 39–60. Schneider, D. M., & Mooney, R. (2015). Motor-related signals in the auditory system for listening and learning. Current Opinion in Neurobiology 33, 78–84. Schneider, D. M., Nelson, A., & Mooney, R. (2014). A synaptic and circuit basis for corollary discharge in the auditory cortex. Nature 513(7517), 189–194. Schön, D., Gordon, R. L., & Besson, M. (2005). Musical and linguistic processing in song perception. Annals of the New York Academy of Sciences 1060(1), 71–81. Schönwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI.
Proceedings of the National Academy of Sciences 106(34), 14611–14616. Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences 11(5), 211–218. Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32(5), 771–783. Schwartze, M., Keller, P. E., & Kotz, S. A. (2016). Spontaneous, synchronized, and corrective timing behavior in cerebellar lesion patients. Behavioural Brain Research 312, 285–293. Schwartze, M., Rothermich, K., Schmidt-Kassow, M., & Kotz, S. A. (2011). Temporal regularity effects on pre-attentive and attentive processing of deviance. Biological Psychology 87(1), 146–151. Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H. (2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive Neuroscience 25(7), 1062–1077. Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). Error correction, sensory prediction, and adaptation in motor control. Annual Review of Neuroscience 33, 89–108. Sokolov, A.  A., Miall, R.  C., & Ivry, R.  B. (2017). The cerebellum: Adaptive prediction for movement and cognition. Trends in Cognitive Sciences 21(5), 313–332. Spencer, R. M. C., Ivry, R. B., & Zelaznik, H. N. (2005). Role of the cerebellum in movements: Control of timing or movement transitions? Experimental Brain Research 161(3), 383–396. Stewart, L., Overath, T., Warren, J. D., Foxton, J. M., & Griffiths, T. D. (2008). fMRI evidence for a cortical hierarchy of pitch pattern processing. PLoS ONE 3(1), e1470.


Stoodley, C. J., & Schmahmann, J. D. (2009). Functional topography in the human cerebellum: A meta-analysis of neuroimaging studies. NeuroImage 44(2), 489–501. Stoodley, C. J., & Schmahmann, J. D. (2010). Evidence for topographic organization in the cerebellum of motor control versus cognitive and affective processing. Cortex 46(7), 831–844. Stupacher, J., Hove, M. J., Novembre, G., Schütz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2), 127–136. Suga, N., & Ma, X. (2003). Multiparametric corticofugal modulation and plasticity in the auditory system. Nature Reviews Neuroscience 4(10), 783–794. Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience 5, 90. Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812. Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., Van Zuijen, T. L., & Näätänen, R. (2000). Lateralized automatic auditory processing of phonetic versus musical information: A PET study. Human Brain Mapping 10(2), 74–79. Tesche, C. D., & Karhu, J. J. T. (2000). Anticipatory cerebellar responses during somatosensory omission in man. Human Brain Mapping 9(3), 119–142. Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation. PLoS ONE 3(5), e2312. Thaut, M. H., McIntosh, G. C., Prassas, S. G., & Rice, R. R. (1992). Effect of rhythmic auditory cuing on temporal stride parameters and EMG patterns in normal gait. Neurorehabilitation and Neural Repair 6(4), 185–190. Thaut, M. H., Stephan, K. M., Wunderlich, G., Schicks, W., Tellmann, L., Herzog, H., . . . Hömberg, V. (2009).
Distinct cortico-cerebellar activations in rhythmic auditory motor synchronization. Cortex 45(1), 44–53. Thaut, M. H., Trimarchi, P., & Parsons, L. (2014). Human brain basis of musical rhythm perception: Common and distinct neural substrates for meter, tempo, and pattern. Brain Sciences 4(2), 428–452. Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Annals of the New York Academy of Sciences 999, 209–211. Toiviainen, P., Alluri, V., Brattico, E., Wallentin, M., & Vuust, P. (2014). Capturing the musical brain with Lasso: Dynamic decoding of musical features from fMRI data. NeuroImage 88, 170–180. Tollin, D. J. (2003). The lateral superior olive: A functional role in sound source localization. Neuroscientist 9(2), 127–143. Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology 87(1), 122–139. Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22(12), 2769–2783. Tseng, Y., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neurophysiology 98(1), 54–62. Von Der Heide, R. J., Skipper, L. M., Klobusicky, E., & Olson, I. R. (2013). Dissecting the uncinate fasciculus: Disorders, controversies and a hypothesis. Brain 136(6), 1692–1707. Warren, J. D., Jennings, A. R., & Griffiths, T. D. (2005). Analysis of the spectral envelope of sounds by the human brain. NeuroImage 24(4), 1052–1057.


Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17), 10038–10042. Warren, J. E., Wise, R. J. S., & Warren, J. D. (2005). Sounds do-able: Auditory-motor transformations and the posterior temporal plane. Trends in Neurosciences 28(12), 636–643. Warrier, C., Wong, P., Penhune, V., Zatorre, R., Parrish, T., Abrams, D., & Kraus, N. (2009). Relating structure to function: Heschl’s gyrus and acoustic processing. Journal of Neuroscience 29(1), 61–69. Wessinger, C. M., VanMeter, J., Tian, B., Van Lare, J., Pekar, J., & Rauschecker, J. P. (2001). Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. Journal of Cognitive Neuroscience 13(1), 1–7. Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4(1), 6130. Wilson, E. M. F., & Davey, N. J. (2002). Musical beat influences corticospinal drive to ankle flexor and extensor muscles in man. International Journal of Psychophysiology 44(2), 177–184. Witt, S. T., Laird, A. R., & Meyerand, M. E. (2008). Functional neuroimaging correlates of finger-tapping task variations: An ALE meta-analysis. NeuroImage 42(1), 343–356. Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences 2(9), 338–347. Wu, J., Zhang, J., Ding, X., Li, R., & Zhou, C. (2013). The effects of music on brain functional networks: A network analysis. Neuroscience 250, 49–59. Wu, J., Zhang, J., Liu, C., Liu, D., Ding, X., & Zhou, C. (2012). Graph theoretical analysis of EEG functional connectivity during music perception. Brain Research 1483, 71–81. Zatorre, R. J. (2002). Auditory cortex. In V. S.
Ramachandran (Ed.), Encyclopedia of the Human Brain (pp. 289–301). Amsterdam: Elsevier. Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New York Academy of Sciences 1337(1), 202–211. Zatorre, R. J., Bouffard, M., & Belin, P. (2004). Sensitivity to auditory object features in human temporal neocortex. Journal of Neuroscience 24(14), 3637–3642. Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience 8(1), 29–46. Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437. Zatorre, R.  J., & Zarate, J. (2012). Cortical processing of music. In D.  Poeppel, T.  Overath, A. N. Popper, & R. R. Fay (Eds.), The human auditory cortex: Springer handbook of auditory research (Vol. 43, pp. 261–294). New York: Springer. Zysset, S., Huber, O., Ferstl, E., & von Cramon, D. Y. (2002). The anterior frontomedian cortex and evaluative judgment: An fMRI study. NeuroImage 15(4), 983–991.


Chapter 6

Network Neuroscience: An Introduction to Graph Theory Network-Based Techniques for Music and Brain Imaging Research

Robin W. Wilkins

Introduction

In this chapter, I provide an introduction to network neuroscience techniques and methods that may be successfully applied to neuroimaging data for brain-based music research. The chapter includes background on the field of network science more broadly, as an approach to the study of complex systems, together with the currently accepted graph theory techniques and applied analysis methods. The chapter focuses on two main components. First, it offers an introductory overview of some of the specific network-based techniques that may be applied to neuroimaging data for understanding structural and functional brain connectivity. For those interested in pursuing the effects of music on brain connectivity, it is important to understand that there is a difference between network-based brain connectivity analyses and conventional correlation measures of connectivity. This is particularly true within the most prominent area of resting-state connectivity research (Biswal, Van Kylen, & Hyde, 1997; Biswal, Yetkin, Haughton, & Hyde, 1995; Fox et al., 2005; Greicius, Krasnow, Reiss, & Menon, 2003) as well as the default mode network (Broyd et al., 2009; Buckner, Andrews-Hanna, & Schacter, 2008; Raichle et al., 2001). At present, terms such as “brain networks,” “functional connectivity,” or “brain connectivity” frequently appear in the brain


imaging literature. Nonetheless, readers are cautioned that these connectivity terms are not scientifically interchangeable or mathematically equivalent in their approach. Non-trivial statistical differences arise depending on whether network-based or correlational statistical methods are used to analyze and describe structural and functional brain connectivity (Bassett & Sporns, 2017; Bullmore & Sporns, 2009; Greicius et al., 2003; Stam, 2014). In addition, the field of network neuroscience is currently highly active, and readers will find more refined network measures being generated and reported regularly. Accordingly, the second section of this chapter presents some of the more promising implications of applying these network neuroscience techniques to advance our understanding of the effects of music on structural and functional brain network connectivity. Ultimately, the supportive evidence yielded by these techniques may prove useful for a host of neurological questions and neurorehabilitation avenues surrounding musical experiences and the brain.

Overview of Network Science

Network-based approaches to the study of complex systems have become ubiquitous in a wide variety of research areas (Barabási & Albert, 1999; Newman, 2003; Watts & Strogatz, 1998). Steeped in the mathematical foundation of graph theory (Euler, 1736), network methods have led to a greater understanding of the interactions between components in systems as disparate as social networks, biological systems, communication arrays, and transportation networks (Barabási, 2002; Newman, 2003; Watts, 2003; Watts & Strogatz, 1998). The fields of neuroscience and neuroimaging, too, have benefited greatly from a network science approach (Bassett & Sporns, 2017; Stam & Reijneveld, 2007). Studying the brain as a complex system presents an opportunity to understand how structural and functional features contribute to the brain’s dynamic mental phenomena. Importantly, network-based methods advance experimental design and progress beyond correlation analyses of neuroimaging data by providing more sophisticated statistical measures for evaluating whole-brain connectivity (Bassett & Bullmore, 2006; Bullmore & Sporns, 2009; Fox, Zhang, Snyder, & Raichle, 2009; Sporns, Chialvo, Kaiser, & Hilgetag, 2004; Sporns, Tononi, & Kötter, 2005). Here, the brain is subdivided into regions (represented as network nodes) and interregional interactions (represented as network edges) estimated from structural or functional neuroimaging modalities, including functional magnetic resonance imaging (fMRI), electroencephalography (EEG), diffusion tensor imaging (DTI), and magnetoencephalography (MEG) (Friston, Frith, Turner, & Frackowiak, 1995; Logothetis, 2008; Stam & Reijneveld, 2007; Tuch, Reese, Wiegell, & Wedeen, 2003; Wedeen, Hagmann, Tseng, Reese, & Weisskoff, 2005).
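Concretely, the nodes-and-edges representation described above can be sketched in a few lines of Python. The example below is a hypothetical illustration, not a neuroimaging pipeline: the “regional” time series are synthetic, and the 0.3 correlation threshold is an arbitrary assumption chosen only for demonstration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def functional_network(timeseries, threshold=0.3):
    """Binary adjacency matrix: edge i-j if |r(i, j)| exceeds the threshold."""
    n = len(timeseries)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(timeseries[i], timeseries[j])) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj

# Synthetic "regional" signals: two regions driven by a shared source, one independent.
random.seed(0)
base = [random.gauss(0, 1) for _ in range(200)]
regions = [
    [b + random.gauss(0, 0.3) for b in base],   # region 0: follows base
    [b + random.gauss(0, 0.3) for b in base],   # region 1: follows base
    [random.gauss(0, 1) for _ in range(200)],   # region 2: independent noise
]
adj = functional_network(regions)
# adj[0][1] == 1: the two base-driven regions are strongly correlated and hence linked.
```

In practice, how edges are defined (the correlation measure, the threshold, weighted versus binary links) strongly affects the resulting network, which is one reason the connectivity terms above are not interchangeable.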
The network-based understanding of the brain represents a dramatic departure from the conventional approaches of traditional brain-activation focused experiments and former statistical analysis methods for neuroimaging data (Savoy, 2005; Shirer, Ryali, Rykhlevskaia, Menon, & Greicius, 2012). Now, rather than trying to understand brain function through isolated areas of brain response activation, researchers are

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

able to explore neurological responses throughout the entire brain as an interconnected system. This recognition that the brain is a complex system is transforming our more traditional understanding of the brain (Bassett & Sporns, 2017; Betzel et al., 2012). Approaching the brain as a system presents an opportunity to uncover patterns of interregional interaction that are not apparent with conventional approaches to neuroimaging experimental design and analysis (Bassett, Khambhati, & Grafton, 2017; Bassett & Sporns, 2017; He & Evans, 2010; Sporns et al., 2005). This is specifically advantageous for questions surrounding music and brain imaging research. Unlike conventional neuroimaging analyses, a focal impetus behind network-based analyses is the hypothesis that a network approach provides a more accurate representation of the brain as an interconnected system, an organizational property often overlooked in more conventional neuroscientific approaches (Telesford, Simpson, Burdette, Hayasaka, & Laurienti, 2011). Perhaps more importantly, network methods allow for a statistically principled investigation of different brain states and neurological disorders under a common representational framework (Bassett & Bullmore, 2009; Moussa et al., 2011; Sporns et al., 2004). Network-based methods not only refine the outcomes of existing techniques but also typify a paradigm shift in representing the brain’s structural and functional connectivity dynamics. This approach offers quantitatively different maps, in which networks consisting of nodes (e.g., voxels of neurons or brain regions) and links (e.g., anatomical or functional connections) are endowed with topological properties. Studying the brain at these various levels has led to the emergence of substantial evidence from the newer field of network neuroscience, a now firmly established brain-based scientific frontier (Bassett & Sporns, 2017).
Within the brain, music affects an intricate set of complex neural processing systems (Alluri et al., 2012, 2013; Koelsch, 2009; Schlaug, 2001, 2009a; Thaut, Demartin, & Sanes, 2008; Wilkins, 2015; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2012, 2014; Zatorre, Evans, Meyer, & Gjedde, 1992). These include structural components associated with sensory processing as well as functional elements implicated in memory, cognition, and mood fluctuation. Because music affects such diverse systems in the brain, it is an ideal candidate for analysis using a network-based approach (Guye, Bettus, Bartolomei, & Cozzone, 2010; Wilkins, 2015). A network approach represents a conceptual revolution beyond standard statistical approaches, bringing together researchers from a variety of disciplines to work on complex problems that defy understanding within the confines of any single discipline (West, 2011). With recent technological and analytical advances, we are witnessing an explosion in the quantity of network data and in the comprehensiveness of the information gleaned by generating network-based maps of complex systems at each spatiotemporal scale. Importantly, network-based methods offer a natural mathematical framework that not only refines the outcomes of existing statistical analysis techniques but also typifies a paradigm shift in representing complex systems’ structure and dynamics. New and highly detailed information may now be extracted from the intricacies of complex systems (Mitchell, 2009; Strogatz, 2001). Consequently, new and rewarding solutions are being found to problems important to society


(Wang, González, Hidalgo, & Barabási, 2009; West, 2011). For readers interested in learning more about the emerging area of networks, the book Linked gives a user-friendly account of developments in the study of networks (Barabási, 2002), while Six Degrees offers a sociologist’s view of historical discoveries, both old and new (Watts, 2003).

Introduction to Network Metrics

As a field of interdisciplinary statistical physics, network science provides a host of robust statistical techniques and methods for investigating the structure and function of complex systems whose behaviors defy explanation through the study of their elements in isolation (Barabási & Albert, 1999; Girvan & Newman, 2002). Network science is based on the branch of mathematics called graph theory (Euler, 1736; Newman, 2003). A graph is simply a mathematical representation of any real-world network made up of interconnected elements. In its most basic form, a graph is a collection of points, referred to as vertices or nodes, connected by lines, referred to as links or edges (see Fig. 1). Nodes represent the fundamental elements of the system, such as people, and the edges represent the connections between pairs of nodes, such as friendships between pairs of people. It is important to note that networks can be either directed or undirected, depending on the type of network and the data provided. In undirected networks, information passes to and from any given node with no particular flow pattern; directed networks, on the other hand, imply that information flows in one direction only. Finally, networks can be weighted or unweighted, depending on the choice of network type. For a more detailed discussion, see Newman (2006). The most basic network metric is degree. Within a network, the degree of a node is simply the number of connections the node has to other nodes within the rest of the network (Bullmore & Sporns, 2009; Strogatz, 2001). The degrees of all the nodes within

Figure 1.  Demonstration of a network. This network comprises 13 nodes, shown as numbered circles. Nodes are connected to other nodes within the network by edges or links (shown as connecting lines).


Figure 2.  Demonstration of the Network Statistic Degree. This figure depicts the degree of a network node. In this network, Node 9 has edge connections to four other nodes (Nodes 6, 7, 8, and 11) and thus has a degree of four. Note that Node 9 does not connect to Node 10 or Node 12.
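The degree computation just described can be sketched in a few lines of plain Python (no graph library). The edges of Node 9 below follow the Fig. 2 description; the remaining edges are illustrative assumptions, not the figure’s actual layout:

```python
# Toy undirected graph stored as an adjacency structure (dict of sets).
# Node 9's edges follow the Fig. 2 description (links to 6, 7, 8, and 11);
# the other edges are made up for illustration.
edges = [(9, 6), (9, 7), (9, 8), (9, 11), (6, 4), (11, 12), (10, 12)]

adjacency = {}
for a, b in edges:              # undirected: record both directions
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

def degree(node):
    """Degree: the number of edges a node has to other nodes."""
    return len(adjacency.get(node, set()))

print(degree(9))                # Node 9 links to 6, 7, 8, and 11 -> 4
print(degree(10))               # Node 10 links only to 12 -> 1
```

The degree distribution of the whole network is then simply the collection of these per-node counts.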

the network form a degree distribution (Amaral, Scala, Barthelemy, & Stanley, 2000). In random networks, where all connections are equally possible, the degree distribution is approximately Gaussian (i.e., normal), with a symmetrically centered distribution. Complex networks, on the other hand, generally have a non-Gaussian degree distribution with a long tail toward high degree nodes. In the network shown in Fig. 2, Node 9 has edges or links connecting it to four other nodes, and thus has a degree of four. Path length is calculated by measuring the minimum number of edges information must pass through when traveling from one node to another within the network. This measurement can be compared with that of a similar network having the same number of nodes but a randomly generated set of connections. Thus, in any collection of nodes, the degree of the collection can be compared with the degree that might occur in a randomly connected network of the same size (i.e., total number of nodes) and density (i.e., proportion of possible connections present). In Fig. 3, we can see that nodes within a network can have an equal probability of connecting to every other node. If all nodes in the network connect to all possible neighboring nodes, the network is said to be regular (i.e., completely connected). If, on the other hand, we examined the connections of a node within a random network, we would see a different result: in random networks, all degree connections are equally probable, resulting in a Gaussian degree distribution. As shown in Fig. 3, each node in the regular network is connected to every neighboring node but has no long-range connections to nodes across the network.
The regular network is considered completely connected, whereas in a random network node connections are arbitrary. In contrast to both, the small-world network depicted in the center of Fig. 3 shows most nodes connecting to neighboring nodes, with a few nodes making long-range connections to other network nodes. Thus, while the regular network has many node-to-nearest-neighbor connections, the small-world



Figure 3.  Depiction of three networks: regular, small-world, and random. This figure demonstrates differences in connections within three networks that have the same number of nodes. The regular network has connections with all neighboring nodes but no long-range connections. The random network has haphazard connections throughout the network. In contrast, the small-world network has primarily nearest neighbor connections but also some long-range connections across the network. This is referred to as the “small-world” effect. Small-world organization has been shown to be a property of brain networks.

network also has a few distinct long-range nodal connections that, in turn, generate close proximity through direct connectivity (Amaral et al., 2000). These direct connections are found regardless of node location (i.e., regional proximity). This “small-world” effect is a widely recognized characteristic of complex brain networks (Bassett & Bullmore, 2006; Watts & Strogatz, 1998). In random networks, all node degree connections are equally possible. In most complex systems, however, high degree nodes tend to connect to other high degree nodes; in other words, the network does not scale regularly. A scale-free network is one whose degree distribution follows a power law. Thus, in complex systems, rather than connecting at random, high degree nodes tend to self-select by connecting to other high degree nodes, generating a non-Gaussian distribution that is scale-free. Intuitively, as a framework for understanding the brain, this makes sense: the brain selectively utilizes its high degree connections as resources in an efficient fashion in order to coordinate a host of widely distributed system-level functions. To recap, nodes in complex systems, such as the brain, generally have a non-Gaussian degree distribution, often with a long tail toward high degree. Complex brain networks exhibit characteristics of small-world networks, in which nodes tend to connect to other nodes in disparate regions of the network (Bullmore & Sporns, 2012). Finally, the degree distributions of nodes in complex networks are scale-free and follow a power law (Barabási & Albert, 1999). If the nearest neighbors of a node are also directly connected to each other, they form a cluster (Watts & Strogatz, 1998). High degree nodes around which others cluster are considered hubs (see Fig. 4).
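The regular/random/small-world contrast described above can be made concrete with a small simulation. The sketch below (plain Python; the parameters are illustrative, not from the chapter) builds a regular ring lattice and then randomly rewires a fraction of its edges, Watts–Strogatz style. The rewired network keeps most of the lattice’s clustering, while a few long-range shortcuts sharply reduce its average path length:

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring: each node linked to its k nearest neighbors (k even)."""
    return {i: {(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0}
            for i in range(n)}

def rewire(adj, p, seed=1):
    """Watts-Strogatz-style rewiring: each edge moved with probability p."""
    rng = random.Random(seed)
    nodes = list(adj)
    for a in nodes:
        for b in sorted(adj[a]):
            if a < b and rng.random() < p:
                c = rng.choice(nodes)
                if c != a and c not in adj[a]:
                    adj[a].discard(b); adj[b].discard(a)
                    adj[a].add(c); adj[c].add(a)
    return adj

def avg_clustering(adj):
    """Mean fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Mean shortest-path length over reachable node pairs (BFS per node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        total += sum(dist.values()); pairs += len(dist) - 1
    return total / pairs

regular = ring_lattice(100, 6)
small_world = rewire(ring_lattice(100, 6), p=0.1)

# Clustering stays high after mild rewiring; path length drops sharply.
print(avg_clustering(regular), avg_path_length(regular))
print(avg_clustering(small_world), avg_path_length(small_world))
```

In practice, libraries such as NetworkX (the Hagberg, Schult, & Swart, 2008 reference cited later in this chapter) implement these measures directly.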
As the term implies, hubs function as connection “interchanges” within the network. The clustering coefficient quantifies the number of connections that exist between the nearest neighbors of a node as a proportion of the maximum number of possible connections. Random networks have a low average clustering whereas complex networks


Figure 4.  Demonstration of a hub. Node 7, shown as a darker circle, is central to all the other nodes in the network and is therefore a hub. Note that Node 7 has a degree of five, but due to its high centrality, Node 7 is also considered a hub within the entire network.

typically have high clustering. Nodes with high degree, as hubs, are central to the network and important to its overall functioning. This matters when considering applications to the brain: understanding brain function, and how it may be structurally or functionally altered or remediated via musical experiences, has important implications for understanding the effects of music and musical training as well as for treating a variety of neurological conditions and disorders (El Haj, Fasotti, & Allain, 2012; Hodges & Wilkins, 2015; Hyde et al., 2009; Schlaug, 2009a; Wilkins, 2015; Wilkins et al., 2012, 2014; Wilkins et al., 2018; Wong, Skoe, Russo, Dees, & Kraus, 2007). Hubs belong to a class of network measurements termed centrality. Centrality analysis measures how many of the shortest paths between pairs of nodes pass through a given node on the way to their final destinations within the network (Zuo et al., 2011; see Fig. 4). Centrality measures are an active and ongoing area of research, and there are several specific mathematical approaches to calculating unique characteristics of centrality metrics in the brain, including betweenness, eigenvector, and leverage centrality, among others (Borgatti, 2005; Joyce, Laurienti, Burdette, & Hayasaka, 2010; Newman, 2005). In concept, centrality identifies those nodes that, like highway interchanges or subway transfer stops, play an important functional role as central hubs in the network. A node with high centrality, as a hub, is considered crucial to the network. As one can envision in Fig. 4, if the central hub is damaged or removed, the network becomes fragmented and communication across the network is affected accordingly. Conversely, and perhaps equally enticingly, if a hub could be restored or trained, there would be functional implications as well.
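Betweenness centrality, one of the measures named above, can be sketched with Brandes’ shortest-path counting algorithm. The toy graph below is my own illustration, loosely echoing Fig. 4: one node bridges two clusters, so every cross-cluster shortest path must pass through it.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm: how many shortest paths run through each node.

    For undirected graphs the raw scores are doubled, which leaves the
    ranking of nodes unchanged."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:                                     # BFS from source s
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                    # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Node 7 bridges two triangles {1,2,3} and {4,5,6}: all cross-cluster
# shortest paths pass through it, so it is the network's hub.
edges = [(1, 2), (2, 3), (1, 3), (1, 7), (7, 4), (4, 5), (5, 6), (4, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = betweenness(adj)
hub = max(scores, key=scores.get)
print(hub)          # Node 7 has the highest betweenness
```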
Evidence indicates that the function of a complex network requires the maintenance of specific hubs that have high degree connections as node clusters. These hubs, importantly, are not necessarily adjacent and may be located in widely distributed brain regions (Bullmore & Sporns, 2012). Hubs may be classified as provincial hubs, which have high within-module degree and a low participation coefficient, or connector hubs, which have a high participation coefficient. However, the most widely accepted metric currently substantiated in the brain imaging literature is the “rich club”: those regions with densely interconnected connector hubs (Bullmore & Sporns, 2012). The selection and removal of a few


critical hub nodes can wreak havoc and potentially dismantle the entire functional or structural network (Albert, Jeong, & Barabási, 2000). Again, this has implications for the brain. Evidence from network neuroscience has revealed how the brain’s network resilience to attack helps protect against its fragility and potential vulnerabilities. Damage within brain regions, or specific trauma to particular brain network hubs, would likely affect the brain’s functional network. Conversely, if external stimuli such as music or experiences of musical training can re-route connections to specific hubs in brain regions important for healthy brain function, or even temporarily restore hub connections within traumatized regions, research suggests the brain may experience enhanced or therapeutic functional results (see Fig. 7) (Raglio et al., 2015; Sachs, Ellis, Schlaug, & Loui, 2016; Shirer et al., 2012; Sihvonen et al., 2017; Thaut et al., 2009; Wilkins et al., 2012, 2014). Related concepts in the neuroimaging literature include neuroplasticity, neurorestoration, and neurorehabilitation (Herholz & Zatorre, 2012; Kraus & Chandrasekaran, 2010; Schlaug, 2009a, 2009b; Zatorre & Samson, 1991). Assortativity is the correlation between the degrees of connected nodes. Positive assortativity indicates that high degree nodes tend preferentially to connect with other high degree nodes; these degree distributions, where high degree nodes connect to other high degree nodes, contribute to the “small-world phenomenon” (Barabási & Albert, 1999; Watts & Strogatz, 1998). A negatively assortative network, on the other hand, is one in which high degree nodes tend to connect to low degree nodes. Community structure is a network metric for measuring the interconnectedness of nodes within a network (Newman & Girvan, 2004).
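The assortativity coefficient just described is simply a Pearson correlation computed over the degrees at either end of every edge. A minimal sketch (plain Python, with an illustrative toy graph of my own construction) shows the negative case, where hubs attach to low degree leaves:

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of each edge."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    # each undirected edge contributes both (deg_a, deg_b) and (deg_b, deg_a)
    xs, ys = [], []
    for a, b in edges:
        xs += [deg[a], deg[b]]
        ys += [deg[b], deg[a]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / var

# Two hubs ("a" and "b"), each surrounded by degree-1 leaves: hubs connect
# mostly to low degree nodes, so the network is disassortative.
hub_leaf = [("a", "b"), ("a", 1), ("a", 2), ("a", 3),
            ("b", 4), ("b", 5), ("b", 6)]
print(degree_assortativity(hub_leaf))   # -0.75 for this toy graph
```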
Somewhat as similar types of houses can be grouped into nearby geographic sections or neighborhoods, community structure measures the topological configuration of the network by partitioning it to identify those nodes that share more connections with one another than with outside nodes (see Fig. 5). Community structure analysis is performed by creating non-overlapping collections of highly interconnected nodes, or “modules,” that are statistically more connected to each other than to other nodes within the overall network (Girvan & Newman, 2002; Newman & Girvan, 2004). Modules are subsets of strongly connected nodes within the brain network. Modularity is defined as the quality of a particular partition of the network into modules (Newman & Girvan, 2004). Computationally, modularity (often referred to as Q) reflects the number of links between nodes within a module minus the number expected given a random distribution of links between all nodes regardless of modules. This value ranges up to 1, with higher values reflecting stronger community structure. In brief, in order to calculate the consistency of modular organization across time, the networks are first partitioned into distinct modules (i.e., separate communities) using a choice of algorithms such as those found in Blondel, Guillaume, Lambiotte, and Lefebvre (2008), among others. These methods include optimization algorithms for modularity analysis that operate by iteratively identifying partitions of the network into subsets of highly connected nodes. In community structure detection


Figure 5.  Community Structure. This figure depicts how a network (left panel) can be analyzed into separate communities. Community structure is a statistical detection procedure that identifies groups of nodes that are more highly interconnected with each other than with the rest of the network. This network has three sub-graphed communities (shown as green, red, and blue circles, middle and right panels). Notice that each community remains sparsely connected, through connector hubs, to nodes in other communities. Communities can be highly connected regardless of their spatial or regional proximity within the brain. Community structure is a statistic also referred to as “modularity.”

procedures, the brain network is partitioned through multiple iterations to detect which subdivisions of the entire network yield modules with the maximum number of within-group edges and the minimum number of between-group edges (Newman & Girvan, 2004). Community detection procedures are computationally intensive and are affected by the choice of node parcellation scheme. In addition, an atlas or region-of-interest (ROI) based network will necessarily differ from a voxel-based network, due to the size of the network and the node selection. Robust partitioning requires dividing the individual network into modules across multiple iterations in order to capture the most representative modular structure (Blondel et al., 2008; Fortunato, 2010; Newman, 2006). Module comparisons across groups of people or experimental conditions can be strengthened through an additional statistical procedure termed Scaled Inclusivity (Steen, Hayasaka, Joyce, & Laurienti, 2011). Scaled Inclusivity takes each subject’s modules and cross-compares them with every other person’s modules to determine which subject’s modules are most representative of the group (Stanley et al., 2013; Steen et al., 2011; Wilkins et al., 2014). Importantly, Scaled Inclusivity also accounts for the absence of a node from a person’s module and “scales” the calculation accordingly. Again, there are several different community detection procedures that divide the functional subsets within the network across the brain topology, measured through several different


optimization procedures (Blondel et al., 2008; Fortunato, 2010; Mucha, Richardson, Macon, Porter, & Onnela, 2010). Community structure analysis, which groups nodes that share connections into non-overlapping communities, is also referred to as modularity analysis (Newman, 2006). In closing this introductory section on network methods, there are a host of robust graph theory approaches for describing networks that are beyond the scope of this chapter, including, but not limited to: multiplex, multilayer, multislice, multitype, hierarchical, multiweighted, interacting, interdependent, and coupled networks. For a complete review of fundamental brain network measurements, see Rubinov and Sporns (2010). In summary, there are numerous network-based metrics that can be applied to brain imaging data. In any network, there can be different, yet potentially equally informative, measurements of the components of the network. These graph theory techniques account for characteristics of the network by measuring specific components and their unique interactions (Telesford et al., 2011). The choice of nodes for network generation frequently varies from study to study, and it is important to stress that the choice of node parcellation scheme and procedure is key to the robustness of a particular network and its subsequent results. Research has substantiated that voxel-based brain imaging networks differ substantially from region- or atlas-based networks (Cohen et al., 2008; Craddock, James, Holtzheimer, Hu, & Mayberg, 2012; Hayasaka & Laurienti, 2010; Mumford et al., 2010; Stanley et al., 2013). Depending on the imaging modality (e.g., fMRI, EEG, DSI, DTI, or MEG), the choice of node parcellation scheme and the approach to node selection will necessarily differ.
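The modularity index Q described above can be written down directly from its definition: the fraction of edges falling within modules, minus the fraction expected if edges were placed at random with the same node degrees. A minimal sketch (plain Python; the seven-edge graph is illustrative):

```python
def modularity(edges, partition):
    """Newman's Q: intra-module edge fraction minus its random expectation."""
    m = len(edges)
    deg, community = {}, {}
    for label, members in enumerate(partition):
        for v in members:
            community[v] = label
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    q = 0.0
    for label, members in enumerate(partition):
        intra = sum(1 for a, b in edges
                    if community[a] == community[b] == label)
        deg_sum = sum(deg[v] for v in members)
        q += intra / m - (deg_sum / (2 * m)) ** 2
    return q

# Two tightly knit triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
good = modularity(edges, [{1, 2, 3}, {4, 5, 6}])   # natural split: Q = 5/14
poor = modularity(edges, [{1, 4, 5}, {2, 3, 6}])   # arbitrary split: Q < 0
print(good, poor)
```

Community detection algorithms such as the Louvain method (Blondel et al., 2008) search over partitions to maximize exactly this quantity.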
There is currently no fully agreed upon approach to node selection, and studies range from single-neuron to voxel-based nodes, as well as regions of interest determined primarily by brain atlases from the neuroimaging literature (Craddock et al., 2012; Power et al., 2011; Stanley et al., 2013; Wang, Zuo, & He, 2010). This inherently alters how connectivity results and analyses are interpreted: a 90-node atlas-based network is obviously going to differ from a 21,000-node voxel-based network. Indeed, the means of node selection largely determines the subsequent neurobiological interpretation of the network results (Butts, 2009). Readers are again encouraged to determine whether a research report selected its nodes a priori from previous neuroimaging findings, somewhat like a predefined seek-and-search, which may eliminate important information before the analyses are performed, or whether the brain network and statistics were generated without a priori biases toward any particular region or specific area of the brain. Neither is necessarily “better” than the other, but the distinction is worth making as the field of music and brain connectivity research moves forward. In closing, this section has highlighted the fundamental graph theory metrics from network science. Each network-based statistic provides a different layer of information, leading to a fuller understanding of brain connectivity.


Generating Brain Networks: Steps for Network-Based Neuroimaging Analysis

Generating a brain network requires multiple processing steps. In brief, functional magnetic resonance imaging (fMRI) or other neuroimaging data (EEG, MEG) are acquired. Once the data are acquired, several statistical procedures are applied to prepare the data for network analysis. These procedures are typically performed as data preprocessing steps but are frequently reported under the data processing section of peer-reviewed research reports. The preprocessing of fMRI data involves skull stripping the acquired images (i.e., isolating the brain) and applying several imperative statistical procedures, including motion correction, slice timing correction, realignment, co-registration of structural and functional images, normalization, and smoothing. An excellent explanation of the statistical techniques used on fMRI data may be found in Lindquist et al. (2018). Network-based processing of fMRI data is performed only after the preprocessing and correlation procedures are complete, typically through a series of command-line statistical steps. Several fMRI data processing applications are available online, such as the FMRIB Software Library (FSL), AFNI, FreeSurfer (with its diffusion analysis tool TRACULA), and Statistical Parametric Mapping (SPM). Brain network generation and analysis is currently an active area of research; network-based analyses therefore include emerging procedures and new statistical methods that are being created and applied, with new results published regularly. Rather than performing more conventional connectivity analyses, generating brain networks (i.e., graph theory based networks) requires several more advanced statistical procedures subsequent to the data processing phase.
Due to the high computational load, most state-of-the-art network processing and analysis is still managed through in-house data processing scripts, typically written in UNIX/Linux shell, MATLAB, and/or Python. However, several useful network toolkits and software applications are freely available, including the Brain Connectivity Toolbox, the CONN functional connectivity toolbox, and GraphVar (Kruschwitz, List, Waller, Rubinov, & Walter, 2015; Rubinov & Sporns, 2010; Whitfield-Gabrieli & Nieto-Castanon, 2012). Because network neuroscience is an emerging field, there is also the option of developing new statistical network measurements and approaches, including more advanced computer scripts for specific procedures or analyses. At present, most of these are created to capture a new network property or to compare different properties. This process will continue as the field grows and will certainly further advance our understanding of both structural and functional brain networks in terms of cognition and perception, as well as neurological health and disease. These newer network statistics and algorithms are frequently reported as “in-house” processing scripts, often in the supplemental methods section of a peer-reviewed publication. It is quite


common for new network statistics and in-house processing scripts to be employed when working with fMRI data for network analysis. Thus, apart from the network statistics described above, the field has yet to establish which newer network methods are sufficiently robust to serve as “gold standards.” Again, researchers are cautioned that this is particularly true for node parcellation and node choice selection (Stanley et al., 2013). For any network analysis, once the fMRI data have been processed, a connectivity matrix must be generated. In brief, for connectivity analysis (often referred to as “functional connectivity”), a cross-correlation procedure is applied between each node and every other node. Current neuroimaging technology limits functional brain network analysis to nodes above the millimeter scale, meaning that many potentially interacting neurons and synapses will be represented within individual nodes in human brain networks. Once the cross-correlation (i.e., connectivity) matrix is generated, a thresholding statistic is applied to the data. A set of statistical thresholding procedures is performed on the correlations so that the resulting matrix can be binarized to reveal the strongest connections in the network. Thresholding is currently another active area of network research (Van den Heuvel et al., 2017). By design, thresholding eliminates at least some of the brain network connections. Correlation matrices can be thresholded iteratively across the full range of possible values, from 0.01 to 1, and thresholding procedures have been examined across this range for their robustness. For example, too lenient a threshold (e.g., one retaining 95 or 100 percent of connections) necessarily includes both exceedingly strong and very weak correlations, and the resulting thresholded matrix is not informative.
However, results reveal that similarly sized networks show less inter-subject network fragmentation with thresholds set at 0.2, 0.25, or 0.3. There are several statistical approaches to thresholding, including proportional, relative, and absolute, among others, and the differences between them are not inconsequential (Van den Heuvel et al., 2017). Readers interested in the consequences of varying threshold approaches will find more detailed information in Van Wijk, Stam, and Daffertshofer (2010) and Van den Heuvel et al. (2017). Again, the goal of thresholding the correlation matrix is to preserve the strongest connections and the density of the network. Thresholding procedures are also implemented to prevent excessive fragmentation and the inadvertent insertion of randomness into the data, while simultaneously eliminating the weaker connections. All thresholding is performed on the connectivity matrices before any graph theory statistics are applied; it is a widely accepted and fundamental step prior to any network-based analysis. The result of the thresholding procedure is the adjacency matrix (Aij). It is important to note that, unlike the correlation analyses of resting-state data with music often reported in the brain imaging literature as functional connectivity analyses (e.g., intrinsic connectivity, radial connectivity), all advanced network-based statistics and analyses are performed on the adjacency matrix. Thus, the choices of parcellation scheme in terms

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

of node selection and thresholding procedures are critical for examining brain networks. A current lack of an agreed-upon approach to node selection has led to the analysis of functional brain networks across an extensive range of scales. While individual neurons may be considered as nodes, this has only been successful for simpler networks, such as that of C. elegans (Sporns & Kötter, 2004; Towlson, Vertes, Ahnert, Schafer, & Bullmore, 2013). It is not currently possible to non-invasively image or computationally analyze the brain's estimated 100 billion neurons, each with ∼7,000 synapses (Stanley et al., 2013). Presently, a comprehensive and unanimously agreed-upon nodal definition is still outstanding, making node selection one of the more central challenges in network analyses of neuroimaging data (Stanley et al., 2013). Again, readers are encouraged to note that not all connectivity approaches reported in the neuroimaging literature are mathematically or statistically interchangeable. While the prevalent brain connectivity literature employs correlation

[Figure 6 panels: fMRI Time Series (signal vs. time) → Correlation Matrix → Adjacency Matrix → Functional Network → Modularity Analysis]

Figure 6.  Processing stream for brain network analysis. Functional time series are correlated and then binarized through thresholding procedures to create an adjacency matrix, representing the strongest connections between every possible pair of nodes. The adjacency matrix is subsequently mapped onto brain space following network-based statistical analyses. For network analysis, functional magnetic resonance imaging (fMRI) data are processed in multiple steps through what is typically referred to as a pipeline. Reproduced from Wilkins (2015).
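The pipeline summarized in the caption above (node time series → cross-correlation matrix → proportional threshold → binarized adjacency matrix) can be sketched in a few lines of code. This is an illustrative pure-Python stand-in, not the author's toolbox: the node count, time-series length, and `density` value are invented for the example.

```python
# Illustrative sketch of the connectivity pipeline described in the text:
# node time series -> cross-correlation matrix -> proportional threshold
# -> binarized adjacency matrix. All data here are synthetic.
import math, random

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency_from_timeseries(series, density=0.25):
    """Binarize the correlation matrix with a proportional threshold,
    keeping the strongest `density` fraction of off-diagonal links
    (one of the thresholding schemes discussed in the text)."""
    n = len(series)
    corr = [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]
    # Rank unique node pairs by correlation strength.
    pairs = [(corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(reverse=True)
    keep = pairs[:max(1, int(density * len(pairs)))]
    adj = [[0] * n for _ in range(n)]
    for _, i, j in keep:
        adj[i][j] = adj[j][i] = 1   # adjacency matrix Aij is symmetric
    return adj

random.seed(0)
ts = [[random.gauss(0, 1) for _ in range(50)] for _ in range(8)]  # 8 toy "nodes"
A = adjacency_from_timeseries(ts, density=0.25)
n_edges = sum(A[i][j] for i in range(8) for j in range(i + 1, 8))
print(n_edges)  # 25% of the 28 possible edges -> 7
```

In practice these steps operate on thousands of voxel- or region-level time series, but the logic (correlate, rank, keep the densest fraction, binarize) is the same.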


procedures, network-based (graph theory) connectivity methods stem from the field of network science. A full explanation of the technical and statistical steps used in brain imaging is found in the wider fMRI literature, although several articles highlight components of these techniques and network region-of-interest or voxel-based network comparisons (Hagberg, Schult, & Swart, 2008; Hayasaka & Laurienti, 2010). A complete review of statistics for fMRI data can be found in Lindquist et al. (2018). Fig. 6 is a pictorial description of a typical data processing stream and network generation pipeline; the pipeline depicted here is for fMRI data. Each of these steps must be performed before any network-based statistics can be applied to individual datasets and before any network-based statistical comparisons can be made across groups of people. In summary, in terms of the broader categories of network statistical properties and their role in the analyses of the overall brain network (Rubinov & Sporns, 2010), there are particular metrics useful for assessing brain segregation, integration, and influence. Measures of segregation in brain networks include clustering, motifs, and community structure or modularity. Measures of integration include distance, path length, and efficiency, among others, while measures of influence include degree, participation, and betweenness (Bassett & Sporns, 2017; Bullmore & Sporns, 2009). Thus, neuroimaging investigators conducting network-based brain imaging studies with music are cautioned to select the most robust node measures and network statistics possible for each imaging modality, so as to avoid spurious results.
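The three metric families just named (segregation, integration, influence) can be illustrated on a toy adjacency matrix. The formulas below are the standard graph-theory definitions (cf. Rubinov & Sporns, 2010); the five-node network is invented for the example and is far smaller than any real brain parcellation.

```python
# Toy illustrations of segregation (clustering), integration (path
# length), and influence (degree) on a hand-made binary adjacency
# matrix: a triangle (nodes 0,1,2) with a tail 2-3-4.
from collections import deque

A = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
n = len(A)
neigh = [[j for j in range(n) if A[i][j]] for i in range(n)]

# Influence: degree = number of connections per node.
degree = [len(neigh[i]) for i in range(n)]

# Segregation: clustering = fraction of a node's neighbor pairs
# that are themselves connected.
def clustering(i):
    k = len(neigh[i])
    if k < 2:
        return 0.0
    links = sum(A[u][v] for u in neigh[i] for v in neigh[i] if u < v)
    return 2 * links / (k * (k - 1))

# Integration: shortest path lengths via breadth-first search.
def path_length(src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in neigh[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

print(degree)             # [2, 2, 3, 2, 1] -> node 2 is the hub
print(clustering(0))      # 1.0 : node 0's neighbors (1 and 2) are linked
print(path_length(0)[4])  # 3 : shortest path 0 -> 2 -> 3 -> 4
```

In published work these quantities are computed per node across the whole thresholded network and then compared statistically across conditions or groups; the arithmetic, however, is exactly this simple.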

Implications for Music and Brain Research

Since the original network-based investigation into the effects of music on the brain, "Network Science: A New Method for Investigating the Complexity of Musical Experiences in the Brain" (Wilkins et al., 2012), that paper and those that followed have generated new insight into how and why music affects network-based functional and structural brain connectivity, using EEG, DTI, DSI, and fMRI data (Fauvel et al., 2014; Hodges & Wilkins, 2015; Karmonik et al., 2016; Koelsch, Skouras, & Lohmann, 2018; Liu, Abu-Jamous, et al., 2017; Liu, Brattico, et al., 2017; Wilkins, 2015; Wilkins et al., 2014; Wu et al., 2012; Wu, Zhang, Ding, Liu, & Zhou, 2013). The evidence resulting from a network-based approach to the brain (Bassett & Bullmore, 2006; Bassett & Sporns, 2017; Bullmore & Sporns, 2009) provides substantial confirmation that network neuroscience not only advances our understanding of the brain, but also holds promise for new understandings of the effects of music and musical training on structural and functional brain networks in neurological health and disease, as well as in various compromised and functional brain states (Bigand et al., 2015; Blum et al., 2017;


Fauvel et al., 2014; Gaser & Schlaug, 2003; Greicius, 2008; Gusnard, Akbudak, Shulman, & Raichle, 2001a, 2001b; Karmonik et al., 2016; Koelsch et al., 2018; Magee, Clark, Tamplin, & Bradt, 2017; Moussa et al., 2011; Raglio et al., 2015; Sihvonen et al., 2017; Wilkins et al., 2018; Wu et al., 2013). Network neuroscience presents opportunities for experimental designs previously beyond the scope of classic neuroimaging analyses (i.e., "one region-one behavior"). While conventional activation-style designs for traditional experimental neuroimaging research are still valid, being able to pursue questions about the brain's entire system in a statistically principled manner presents an opportunity to advance our understanding of music and the brain. Newer evidence suggests that music may provide a means to affect information flow in the brain network (Karmonik et al., 2016), as well as changes in functional measures that accompany gray matter volume changes resulting from musical expertise (Fauvel et al., 2014). Results reveal that the brain's functional network responds to preferred music listening by creating communities within pivotal regions of the default mode network, a network widely accepted as important for self-reflective and mind-wandering processes central to brain function (Wilkins et al., 2014), and that a favorite song can spontaneously separate the functional network into distinct communities between the auditory cortex and the hippocampus, a region recognized for memory encoding. Dynamic functional connectivity analyses of data collected while people listened to continuous music previously suggested to influence anxiety and anger show significant intrinsic connectivity within the salience network (Lindquist et al., 2018).
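The community structure referred to in these findings is typically quantified by the modularity score Q (Newman, 2006), which compares within-community connections against a degree-matched random network. The following is a minimal sketch of that calculation on an invented two-module toy network, not code from any of the studies cited.

```python
# Minimal sketch of the Newman modularity score Q that community
# analyses optimize. Toy network: two 3-node cliques joined by a
# single bridge edge between nodes 2 and 3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
community = [0, 0, 0, 1, 1, 1]    # assumed two-module partition

def modularity(A, community):
    """Q = (1/2m) * sum over same-community pairs of
    (A_ij - k_i * k_j / 2m), where k_i is node degree and m the
    number of edges. Q near 0 means no more within-community
    structure than chance; larger Q means stronger modules."""
    n = len(A)
    k = [sum(row) for row in A]   # node degrees
    two_m = sum(k)                # 2 * number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m

print(round(modularity(A, community), 3))  # 0.357 for this partition
```

Community-detection algorithms search over partitions for the one that maximizes Q; assigning every node to a single community yields Q = 0, the chance baseline.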
More recent evidence suggests that whole brain responses to naturalistic music listening spontaneously alter the resting brain, stimulating significant hubs within attentional control regions of the anterior cingulate and highlighting how the network system may potentially optimize or restore aspects of neurological function by resourcing attentional circuit-breaker mechanisms (Wilkins et al., 2018). Compared to the brain at rest, network analyses also indicate a significant reduction during naturalistic music in betweenness centrality within the amygdala, a region implicated in emotional responses linked to anxiety and avoidance behaviors, suggesting a systems-level decrease in these affective responses while listening to ambient background music (Wilkins et al., 2018). Recent evidence also reveals that different auditory regions exhibit significant functional network characteristics during music-evoked emotional experiences of fear and joy (Koelsch et al., 2018). The substantial questions and promising potential surrounding brain responses to musical experiences have been, in many ways, outside the scope of previously available tools and the more conventional brain activation-based experimental approaches and analysis techniques. It is easy to understand how music and brain imaging investigators at all levels may occasionally feel a sense of unease in dealing comprehensively with a network-based approach to music and the brain. Under such circumstances, it is tempting for neuroimaging scientists who are pursuing questions about music to remain within the confines of conventional activation analyses. A similar historical response can be found when neuroimaging scientists were first considering the connectivity of the Default Mode Network: "The suggested link between the


processing taking place at rest and its physiology is one that can have no direct relevance for neuroimaging" (Morcom & Fletcher, 2007, p. 1075; for a complete update on this commentary see also Raichle, 2001). This type of statement is arguably true only if one's experimental horizons for music and the brain are limited to previous techniques and analyses in functional neuroimaging science. This chapter suggests, however, that such a finite agenda will eventually be depleted if not nourished by the broader implications and understanding of brain function that these emerging network science techniques may serve. In closing, the main objective of this chapter is to highlight the graph theory methods and network science evidence that persuade us towards complex systems thinking and the field of network neuroscience. While conventional approaches provide evidence of brain activation to music, a network-based approach takes a different perspective: a full understanding of brain activity, including brain responses to musical experiences, critically depends on studying the brain as a complex system (Bassett & Sporns, 2017; Bullmore & Sporns, 2009; Wilkins, 2015) through the application of network (graph theory) techniques (Bassett & Bullmore, 2009). A network-based analysis provides the statistical rigor to study detailed patterns of neural connections throughout the entire system of the brain. This approach can be applied to data collected while people are listening to continuous music, as well as to comparisons of brain responses to different types of music and of the brains of people with musical training (Fig. 7) (Wilkins et al., 2012).
These complex connections, or brain networks, help reveal the architectural and functional scaffolding that ultimately illuminates the brain's dynamic behaviors as robust statistical connectivity patterns, including the brain's intrinsic (i.e., resting-state) activity, such as that present in the default mode network regions, which may be affected while listening to music (Broyd et al., 2009; Raichle, 2001; Wilkins et al., 2014).

[Figure 7 panels: high-degree hubs while listening to Country, Rap, Classical, Rock, and Unfamiliar music; color scale shows degree from 0.3 to 0.9]

Figure 7.  Depiction of high degree hubs based on musical genre. Note the consistency of high degree hubs in the auditory regions while people (N = 21) listened to continuous classical music, in this case Beethoven's 1st Symphony, Mvt. 1, performed by the London Symphony Orchestra. A 21,000 × 21,000 voxel-based matrix was used for the network-based statistical analyses. Reproduced from Wilkins et al. (2012, pp. 282–283). © 2012 by the International Society for the Arts, Sciences and Technology, published by MIT Press.


Although there are still fundamental questions about music and the brain that remain unresolved, network science offers key tools that hold promise for providing answers about complex systems in new ways. As the field continues to advance, network neuroscience and the study of brain connectivity, through network-based statistics, will open new experimental and theoretical avenues for understanding how structural brain connectivity gives rise to dynamic brain function. The discussion in this chapter, in particular, illustrates how network-based approaches may advance fundamental questions surrounding the promising effects of music in neurological research and rehabilitation (Hodges & Wilkins, 2015; Kotchoubey, Pavlov, & Kleber, 2015; Thaut et al., 2008). As a computationally robust field, network neuroscience provides a new mathematical framework for investigating complex systems that goes beyond previously conventional approaches to experimental design and neuroimaging research. Methods from graph theory provide a robust, well-established framework for assessing brain connectivity, both locally and globally, offering a rigorous opportunity to expansively and non-invasively explore the entire human brain during whole-brain experiences (Bullmore & Sporns, 2009; Rubinov & Sporns, 2010). Analyses can reveal patterns of both structural and functional brain connectivity. A network neuroscience approach provides unprecedented opportunities for examining the effects of musical experiences on the human brain. The methods and techniques presented here provide an opportunity for researchers to pursue questions that may further advance the field of music and brain research, deepening our scientific understanding of the effects of music on the brain.

References

Albert, R., Jeong, H., & Barabási, A.-L. (2000). Error and attack tolerance of complex networks. Nature 406(6794), 378–382. Alluri, V., Toiviainen, P., Jääskeläinen, I. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage 59(4), 3677–3689. Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., . . . Brattico, E. (2013). From Vivaldi to Beatles and back: Predicting lateralized brain responses to music. NeuroImage 83, 627–636. Amaral, L. A., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks. Proceedings of the National Academy of Sciences 97(21), 11149–11152. Barabási, A.-L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus Publishing. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science 286(5439), 509–512. Bassett, D. S., & Bullmore, E. (2006). Small-world brain networks. Neuroscientist 12(6), 512–523. Bassett, D. S., & Bullmore, E. (2009). Human brain networks in health and disease. Current Opinion in Neurology 22(4), 340–347. Bassett, D. S., Khambhati, A. N., & Grafton, S. T. (2017). Emerging frontiers of neuroengineering: A network science of brain connectivity. Annual Review of Biomedical Engineering 19, 327–352.


Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364. Betzel, R. F., Erickson, M. A., Abell, M., O'Donnell, B. F., Hetrick, W. P., & Sporns, O. (2012). Synchronization dynamics and evidence for a repertoire of network states in resting EEG. Frontiers in Computational Neuroscience 6. Retrieved from https://doi.org/10.3389/fncom.2012.00074 Bigand, E., Tillmann, B., Peretz, I., Zatorre, R. J., Lopez, L., & Majno, M. (Eds.). (2015). The neurosciences and music V: Cognitive stimulation and rehabilitation. Annals of the New York Academy of Sciences 1337. Biswal, B. B., Kylen, J. V., & Hyde, J. S. (1997). Simultaneous assessment of flow and BOLD signals in resting-state functional connectivity maps. NMR in Biomedicine 10(4–5), 165–170. Biswal, B. B., Yetkin, F. Z., Haughton, V. M., & Hyde, J. S. (1995). Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine 34(4), 537–541. Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics 2008. Retrieved from https://doi.org/10.1088/1742-5468/2008/10/P10008 Blum, K., Simpatico, T., Febo, M., Rodriquez, C., Dushaj, K., Li, M., . . . Badgaiyan, R. D. (2017). Hypothesizing music intervention enhances brain functional connectivity involving dopaminergic recruitment: Common neuro-correlates to abusable drugs. Molecular Neurobiology 54(5), 3753–3758. Borgatti, S. (2005). Centrality and network flow. Social Networks 27(1), 55–71. Broyd, S. J., Demanuele, C., Debener, S., Helps, S. K., James, C. J., & Sonuga-Barke, E. J. S. (2009). Default-mode brain dysfunction in mental disorders: A systematic review. Neuroscience & Biobehavioral Reviews 33(3), 279–296. Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain's default mode network: Anatomy, function, and relevance to disease.
Annals of the New York Academy of Sciences 1124, 1–38. Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10(3), 186–198. Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Nature Reviews Neuroscience 13(5), 336–349. Butts, C. T. (2009). Revisiting the foundations of network analysis. Science 325(5939), 414–416. Cohen, A. L., Fair, D. A., Dosenbach, N. U. F., Miezin, F. M., Dierker, D., & Van Essen, D. C. (2008). Defining functional areas in individual human brains using resting functional connectivity MRI. NeuroImage 41, 45–57. Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2012). A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping 33(8), 1914–1928. El Haj, M., Fasotti, L., & Allain, P. (2012). The involuntary nature of music-evoked autobiographical memories in Alzheimer’s disease. Consciousness and Cognition 21(1), 238–246. Euler, L. (1736). Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae 8, 128–140. Reprinted and translated in N. L. Biggs, E. K. Lloyd, & R. J. Wilson, Graph Theory 1736–1936 (pp. 3–8). Oxford: Oxford University Press, 1976. Fauvel, B., Groussard, M., Chetelat, G., Fouquet, M., Landeau, B., Eustache, F., . . . Platel, H. (2014). Morphological brain plasticity induced by musical expertise is accompanied by modulation of functional connectivity at rest. NeuroImage 90, 179–188.


Fortunato, S. (2010). Community detection in graphs. Physics Reports 486(3–5), 75–174. Fox, M. D., Snyder, A. Z., Vincent, J. L., Corbetta, M., Van Essen, D. C., & Raichle, M. E. (2005). The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences 102(27), 9673–9678. Fox, M. D., Zhang, D., Snyder, A. Z., & Raichle, M. E. (2009). The global signal and observed anticorrelated resting state brain networks. Journal of Neurophysiology 101(6), 3270–3283. Friston, K. J., Frith, C. D., Turner, R., & Frackowiak, R. S. (1995). Characterizing evoked hemodynamics with fMRI. NeuroImage 2(2), 157–165. Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245. Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821–7826. Greicius, M. (2008). Resting-state functional connectivity in neuropsychiatric disorders. Current Opinion in Neurology 21(4), 424–430. Greicius, M., Krasnow, B., Reiss, A. L., & Menon, V. (2003). Functional connectivity in the resting brain: A network analysis of the default mode hypothesis. Proceedings of the National Academy of Sciences 100(1), 253–258. Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001a). Medial prefrontal cortex and self-referential mental activity: Relation to a default mode of brain function. Proceedings of the National Academy of Sciences 98(7), 4259–4264. Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001b). Role of medial prefrontal cortex in a default mode of brain function. NeuroImage 13(6), S414. Guye, M., Bettus, G., Bartolomei, F., & Cozzone, P. (2010). Graph theoretical analysis of structural and functional connectivity MRI in normal and pathological brain networks.
Magnetic Resonance Materials in Physics, Biology and Medicine 23(5–6), 409–421. Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. In G. Varoquaux, T. Vaught, & J. Millman (Eds.), Proceedings of the 7th Python in Science Conference (SciPy2008) (pp. 11–15). Pasadena, CA. Hayasaka, S., & Laurienti, P. J. (2010). Comparison of characteristics between region- and voxel-based network analyses in resting-state fMRI data. NeuroImage 50(2), 499–508. He, Y., & Evans, A. (2010). A review of structural and functional brain connectivity. Current Opinion in Neurology 23(4), 341–350. Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Hodges, D. A., & Wilkins, R. W. (2015). How and why does music move us? Answers from psychology and neuroscience. Music Educators Journal 101(4), 41–47. Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025. Joyce, K. E., Laurienti, P. J., Burdette, J. H., & Hayasaka, S. (2010). A new measure of centrality for brain networks. PLoS ONE 5(8), e12200. Karmonik, C., Brandt, A., Anderson, J. R., Brooks, F., Lytle, J., Silverman, E., & Frazier, J. T. (2016). Music listening modulates functional connectivity and information flow in the human brain. Brain Connectivity 6(8), 632–641. Koelsch, S. (2009). A neuroscientific perspective on music therapy. Annals of the New York Academy of Sciences 1169, 374–384.


Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1), e0190057. Kotchoubey, B., Pavlov, Y. G., & Kleber, B. (2015). Music in research and rehabilitation of disorders of consciousness: Psychological and neurophysiological foundations. Frontiers in Psychology 6, 1763. Retrieved from https://doi.org/10.3389/fpsyg.2015.01763 Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Kruschwitz, J. D., List, D., Waller, L., Rubinov, M., & Walter, H. (2015). GraphVar: A user-friendly toolbox for comprehensive graph analyses of functional brain connectivity. Journal of Neuroscience Methods 245, 107–115. Lindquist, K. A., Pendl, S., Brooks, J. A., Wilkins, R. W., Kraft, R. A., & Gao, W. (2018). Dynamic functional connectivity of intrinsic networks during emotions. NeuroImage. Under review. Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering for studying functional brain connectivity during affective processing. International Journal of Neural Systems 27(2), 1650042. doi:10.1142/S0129065716500428 Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (2017). Effect of explicit evaluation on neural connectivity related to listening to unfamiliar music. Frontiers in Human Neuroscience 11, 611. Retrieved from https://doi.org/10.3389/fnhum.2017.00611 Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197), 869–878. Magee, W. L., Clark, I., Tamplin, J., & Bradt, J. (2017). Music interventions for acquired brain injury. Cochrane Database of Systematic Reviews 1, CD006787. doi:10.1002/14651858.CD006787.pub3 Mitchell, M. (2009). Complexity: A guided tour. Oxford: Oxford University Press. Morcom, A. M., & Fletcher, P. C. (2007).
Does the brain have a baseline? Why we should be resisting a rest. NeuroImage 37(4), 1073–1082. Moussa, M. N., Vechlekar, C. D., Burdett, J. H., Steen, M. R., Hugenschmidt, C. E., & Laurienti, P. J. (2011). Changes in cognitive state alter human functional brain networks. Frontiers in Human Neuroscience 5, 1–15. Retrieved from https://doi.org/10.3389/fnhum.2011.00083 Mucha, P.  J., Richardson, T., Macon, K., Porter, M.  A., & Onnela, J.  P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980), 876–878. Mumford, J. A., Horvath, S., Oldham, M. C., Langfelder, P., Geschwind, D. H., & Poldrack, R. A. (2010). Detecting network modules in fMRI time series: A weighted network analysis approach. NeuroImage 52(4), 1465–1476. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review 45, 167–256. Newman, M.  E.  J. (2005). Power laws, Pareto distributions and Zipf ’s law. Contemporary Physics 46(5), 323–351. Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103(23), 8577–8582. Newman, M.  E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E 69(2 Pt. 2), 026113.


Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., . . . Petersen, S. E. (2011). Functional network organization of the human brain. Neuron 72(4), 665–678. Raglio, A., Attardo, L., Gontero, G., Rollino, S., Groppo, E., & Granieri, E. (2015). Effects of music and music therapy on mood in neurological patients. World Journal of Psychiatry 5(1), 68–78. Raichle, M. E. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences 98(2), 676–682. Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52(3), 1059–1069. Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891. Savoy, R. A. (2005). Experimental design in brain activation MRI: Cautionary tales. Brain Research Bulletin 67, 361–365. Schlaug, G. (2001). The brain of musicians: A model for functional and structural adaptation. Annals of the New York Academy of Sciences 930, 281–299. Schlaug, G. (2009a). Listening to music facilitates brain recovery processes. Annals of the New York Academy of Sciences 1169, 372–373. Schlaug, G. (2009b). Music, musicians, and brain plasticity. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 197–207). Oxford: Oxford University Press. Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white matter tracts of chronic aphasic patients undergoing intense intonation-based speech therapy. Annals of the New York Academy of Sciences 1169, 385–394. Shirer, W. R., Ryali, S., Rykhlevskaia, E., Menon, V., & Greicius, M. D. (2012). Decoding subject-driven cognitive states with whole-brain connectivity patterns. Cerebral Cortex 22(1), 158–165. Sihvonen, A. J., Sarkamo, T., Leo, V., Tervaniemi, M., Altenmuller, E., & Soinila, S. (2017).
Music-based interventions in neurological rehabilitation. The Lancet Neurology 16(8), 648–660. Sporns, O., Chialvo, D., Kaiser, M., & Hilgetag, C. (2004). Organization, development and function of complex brain networks. Trends in Cognitive Sciences 8(9), 418–425. Sporns, O., & Kotter, R. (2004). Motifs in brain networks. PLoS Biology 2(11), e369. Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural description of the human brain. PLoS Computational Biology 1(4), e42. Stam, C.  J. (2014). Modern network science of neurological disorders. Nature Reviews Neuroscience 15, 683–695. Stam, C. J., & Reijneveld, J. P. (2007). Graph theoretical analysis of complex networks in the brain. Nonlinear Biomedical Physics 1, 3. doi:10.1186/1753-4631-1-3 Stanley, M. L., Moussa, M. N., Paolini, B. M., Lyday, R., Burdette, J. H., & Laurienti, P. J. (2013). Defining nodes in complex networks. Frontiers in Computational Neuroscience 7, 169. Retrieved from https://doi.org/10.3389/fncom.2013.00169 Steen, M., Hayasaka, S., Joyce, K., & Laurienti, P. (2011). Assessing the consistency of community structure in complex networks. Physical Review E 84(1–2), 016111. Strogatz, S. H. (2001). Exploring complex networks. Nature 410(6825), 268–276. Telesford, Q. K., Simpson, S. L., Burdette, J. H., Hayasaka, S., & Laurienti, P. J. (2011). The brain as a complex system: Using network science as a tool for understanding the brain. Brain Connectivity 1(4), 295–308.


Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation. PLoS ONE 3, e2312. Thaut, M. H., Gardiner, J. C., Holmberg, D., Horwitz, J., Kent, L., Andrews, G., . . . McIntosh, G. R. (2009). Neurologic music therapy improves executive function and emotional adjustment in traumatic brain injury rehabilitation. Annals of the New York Academy of Sciences 1169, 406–416. Towlson, E., Vertes, P. E., Ahnert, S., Schafer, W. R., & Bullmore, E. T. (2013). The rich club of the C. elegans neuronal connectome. Journal of Neuroscience 33(15), 6380–6387. Tuch, D. S., Reese, T. G., Wiegell, M. R., & Wedeen, V. J. (2003). Diffusion MRI of complex neural architecture. Neuron 40(5), 885–895. Van den Heuvel, M. P., de Lange, S. C., Zalesky, A., Seguin, C., Yeo, B. T. T., & Schmidt, R. (2017). Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: Issues and recommendations. NeuroImage 152, 437–449. Van Wijk, B. C., Stam, C. J., & Daffertshofer, A. (2010). Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE 5, e13701. Wang, J., Zuo, X., & He, Y. (2010). Graph-based network analysis of resting-state functional MRI. Frontiers in Systems Neuroscience 4, 16. Retrieved from https://doi.org/10.3389/fnsys.2010.00016 Wang, P., González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2009). Understanding the spreading patterns of mobile phone viruses. Science 324(5930), 1071–1076. Watts, D. J. (2003). Six degrees: The science of a connected age. New York: W. W. Norton. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of "small-world" networks. Nature 393(6684), 440–442. Wedeen, V. J., Hagmann, P., Tseng, W. Y., Reese, T. G., & Weisskoff, R. M. (2005). Mapping complex tissue architecture with diffusion spectrum magnetic resonance imaging. Magnetic Resonance in Medicine 54(6), 1377–1386. West, B. J.
(2011). Overview 2010 of ARL program on network science for human decision making. Frontiers in Physiology 2, 76. Retrieved from https://doi.org/10.3389/fphys.2011.00076 Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity 2(3). doi:10.1089/brain.2012.0073 Wilkins, R. W. (2015). Network science and the effects of music on the human brain (Doctoral dissertation). University of North Carolina at Greensboro. Wilkins, R. W., Giridharan, S., Johnston, M., Brooks, J. A., Lindquist, K. A., & Kraft, R. A. (2018). Changes in resting-state functional brain networks during naturalistic music listening. In preparation. Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2012). Network science: A new method for investigating the complexity of musical experiences in the brain. Leonardo 45(3), 282–283. Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4, 6130. doi:10.1038/srep06130 Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi



chapter 7

Acoustic Structure and Musical Function: Musical Notes Informing Auditory Research

Michael Schutz

Introduction and Overview

Beethoven's Fifth Symphony has intrigued audiences for generations. In opening with a succinct statement of its four-note motive, Beethoven deftly lays the groundwork for hundreds of measures of musical development, manipulation, and exploration. Analyses of this symphony are legion (Schenker, 1971; Tovey, 1971), informing our understanding of the piece's structure and historical context, not to mention the human mind's fascination with repetition. In his intriguing book The First Four Notes, Matthew Guerrieri (2012) deconstructs the implications of this brief motive, illustrating that great insight can be derived from an ostensibly limited grouping of just four notes. Extending that approach, this chapter takes an even more targeted focus, exploring how groupings related to the harmonic structure of individual notes lend insight into the acoustical and perceptual basis of music listening. Extensive overviews of auditory perception and basic acoustical principles are readily available (Moore, 1997; Rossing, Moore, & Wheeler, 2013; Warren, 2013), discussing the structure of many sounds, including those important to music. Additionally, several texts now focus specifically on music perception and cognition (Dowling & Harwood, 1986; Tan, Pfordresher, & Harré, 2007; Thompson, 2009). Therefore this chapter focuses


on a previously under-discussed topic within the subject of musical sounds—the importance of temporal changes in their perception. This aspect is easy to overlook, as the perceptual fusion of overtones makes it difficult to consciously recognize their individual contributions. Yet the amplitudes of the overtones excited by musical instruments, and the changes in their relative strengths over time, play a crucial role in musical timbre. Western music has traditionally focused on properties such as pitch and rhythm, yet contemporary composers are increasingly interested in timbre, to the point where it can on occasion even serve as a composition's primary focus (Boulez, 1987; Hamberger, 2012). And although much previous scientific research on the neuroscience of music, as well as on music perception, has focused on temporally invariant tones, there has been increasing recognition in the past decade that broadening our toolbox of stimuli is important to elucidating music's psychological and neurological basis. Consequently, understanding the role of temporal changes in musical notes holds important implications for psychologists, musicians, and neuroscientists alike. Traditional musical scores give precise information regarding the intensity of each instrument throughout a composition in the form of dynamic markings. But for obvious practical reasons, scores never specify the rapid intensity changes found in each overtone of an individual note. At most, composers hint at their preferences through descriptive terms ("sharper/duller"), vague instructions ("as if off in the distance"), or performers rely on stylistic considerations to make such decisions—e.g., by following period-specific performance practice. And to a large extent, both the harmonic structure of a note and changes in that structure over time are natural consequences of an instrument's physical structure.
For example, the rapid decay of energy in the upper harmonics shortly after the onset of a vibraphone note contrasts with the long sustain of its fundamental—contributing to its characteristic sound. Musical notation clearly reflects changes in the intensity of collections of notes (e.g., crescendos, sfz) but never the changes within notes themselves. While understandable, this decision mirrors the lack of attention to changes in overtone intensity in many psychophysical descriptions of sound—as well as in perceptual experiments with auditory stimuli. This is unfortunate, as these intensity changes play an important role in efforts to synthesize "realistic" sounding musical notes—an issue of great relevance to composers creating electronic music. They also play an important role in discussions of tone quality so crucial to music educators training young ears, not to mention sound editors/engineers exploring which dynamic changes are important to capture and preserve when recording/mixing/compressing high quality audio. This chapter summarizes research on both the perceptual grouping of overtones and their rapid temporal changes, placing it in a broader context by highlighting connections to another important topic—how individual notes are perceptually grouped into chords. Finally, it concludes with a discussion of mounting evidence that auditory stimuli devoid of complex temporal changes may lead to experimental outcomes that fail to generalize to real-world listening—and on occasion can suggest errant theoretical frameworks and basic principles.
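The vibraphone example above can be sketched in code. The following is a minimal, illustrative synthesis (Python/NumPy; the amplitude and decay values are invented for demonstration, not measured from a real instrument) in which each harmonic carries its own exponential decay, so upper partials fade quickly over a sustained fundamental:

```python
import numpy as np

SR = 44100  # sample rate (Hz)

def synth_note(f0, harmonic_amps, decay_rates, dur=2.0, sr=SR):
    """Sum harmonics of f0, each with its own exponential decay.

    harmonic_amps[k] and decay_rates[k] describe harmonic k+1.
    Faster decay in the upper harmonics mimics the vibraphone's
    quickly fading overtones over a sustained fundamental.
    """
    t = np.arange(int(dur * sr)) / sr
    note = np.zeros_like(t)
    for k, (amp, decay) in enumerate(zip(harmonic_amps, decay_rates), start=1):
        note += amp * np.exp(-decay * t) * np.sin(2 * np.pi * k * f0 * t)
    return note / np.max(np.abs(note))  # normalize to [-1, 1]

# Illustrative (not measured) values: slow-decaying fundamental,
# rapidly decaying upper partials.
tone = synth_note(220.0, harmonic_amps=[1.0, 0.5, 0.3],
                  decay_rates=[0.8, 6.0, 12.0])
```

Rendering `tone` to audio makes the contrast audible: setting all decay rates equal produces a noticeably flatter, more artificial sound.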

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi


Grouping Notes: Deconstructing Chords and Harmonies

The vertical alignment of notes gives rise to musical harmonies ranging from lush to biting—from soothing to scary. Consequently, composers carefully design complex groupings whose musical effects hinge on small changes in their arrangement. For example, major and minor chords differ significantly in their neural processing (Pallesen et al., 2005; Suzuki et al., 2008) and evoke distinct affective responses (Eerola, Friberg, & Bresin, 2013; Heinlein, 1928; Hevner, 1935). Yet from the standpoint of acoustic structure this change is small—a half-step in the third (i.e., "middle note") of a musical chord (Aldwell, Schachter, & Cadwallader, 2002). In absolute terms, this represents a relatively small shift in the raw acoustic information—moving one of three notes the smallest permissible musical distance. From a raw acoustic perspective, this is particularly unremarkable in a richly orchestrated passage, yet the shift from major to minor can lead to significant changes in a passage's character. Individuals with cochlear implants—which offer relatively coarse pitch discrimination—are often unable to hear these distinctions and consequently find music listening problematic (Wang et al., 2012). Fortunately, most listeners hear these changes quite readily, as evidenced by a literature on the detection of "out of key" notes shifted by a mere semitone (Koelsch & Friederici, 2003; Pallesen et al., 2005). Although musical acculturation occurring at a relatively young age (Corrigall & Trainor, 2010, 2014) aids this process, even musically untrained individuals are capable of detecting small changes (Schellenberg, 2002). Notes of different pitch are often grouped together into a single musical object—a chord. Typically consisting of three or more individual notes, chords function as a "unit" and together lay out the harmonic framework or backbone of a musical passage.
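The half-step distinction above is easy to make concrete. In equal temperament each semitone multiplies frequency by 2^(1/12), so lowering a chord's third by one semitone converts a major triad into a minor one. A small illustrative sketch (Python; middle C taken as 261.63 Hz, matching the figures in this chapter):

```python
# Equal-tempered triads: the major/minor contrast hinges on a
# one-semitone shift of the chord's third.
SEMITONE = 2 ** (1 / 12)

def note_freq(semitones_above_c4, c4=261.63):
    """Frequency of a pitch a given number of semitones above middle C."""
    return c4 * SEMITONE ** semitones_above_c4

c_major = [note_freq(n) for n in (0, 4, 7)]   # C4, E4, G4
c_minor = [note_freq(n) for n in (0, 3, 7)]   # C4, E-flat4, G4
```

Note that the root and fifth are untouched; the entire major/minor contrast rests on roughly a 6 percent change in a single frequency.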
The specific selection of simultaneous notes (i.e., harmonically building chords) has profound effects on the listening experience of audiences, forming one of the key building blocks of strong physiological responses to music (Lowis, 2002; Sloboda, 1991). The masterful selection of notes, rhythms, and instruments requires both intuition and craft, and basic principles are articulated in numerous treatises on composition (Clough & Conley, 1984) and guidelines to orchestration (Alexander & Broughton, 2008; Rimsky-Korsakov, 1964). Yet another aspect of musical sound's vertical structure plays a crucial role in the listening experience, even if it is under less direct control by composers—the "vertical structure" (i.e., harmonic content) of individual notes, as well as the time-varying changes to these components. This topic forms the primary focus of this chapter, for much as the study of individual notes can lend insight into our perception of musical passages, studying the rich, time-varying structure of concurrent harmonics can lend insight into the perception of the notes themselves.



Grouping Harmonics: Deconstructing Individual Notes

The complexities in composers' grouping of individual notes into chords are well known (Aldwell et al., 2002), yet the musical importance of individual harmonics is less transparent, even though single notes produced by musical instruments contain incredible sophistication and nuance (Hjortkjaer, 2013). Musical instruments produce sounds rich in overtones, which for pitched instruments generally consist of harmonics at integer multiples of the fundamental (Dowling & Harwood, 1986; Tan et al., 2010), as well as other non-harmonic energy (particularly during a sound's onset). The lawful structure of these overtones serves as an important binding cue, triggering a decision by the perceptual system to blend overtones such that "the listener is not usually directly aware of the separate harmonics" (Dowling & Harwood, 1986, p. 24). Although some musicians develop the ability to "hear out" individual components of their instruments' sounds (Jourdain, 1997, p. 35), in general this collection of frequencies fuses into a single musical unit. Consequently, for practical purposes the complex structure of individual notes is of less musical interest than the composer's complex selection of structural cues (Broze & Huron, 2013; Huron & Ollen, 2003; Patel & Daniele, 2003; Poon & Schutz, 2015) or the performer's interpretation of those cues (Chapin, Jantzen, Kelso, Steinberg, & Large, 2010). Although the musical importance of small note-to-note variations in amplitude with respect to phrasing and expressivity (Bhatara, Tirovolas, Duan, Levy, & Levitin, 2011; Repp, 1995) is widely recognized, the small moment-to-moment amplitude variations in individual overtones have received less research attention. Musical sounds contain overtones shifting in their relative strength over time (Jourdain, 1997, p. 35), and some textbooks explicitly note the importance of these dynamic changes (Thompson, 2009, p. 59).
Yet the role of spectra is often presented as time-invariant, described through summaries of spectral content that disregard temporal changes within a note. Musical instruments produce notes rich in temporal variation—not only in their overall amplitudes, but even with respect to the envelopes of individual harmonics. For example, Fig. 1 visualizes a musical note performed on the trumpet (left panel) and clarinet (right panel), based on instrument sounds provided by the University of Iowa Electronic Music Studios (Fritts, 1997). The intensity (z axis) of energy extracted from each harmonic (x axis) is graphed over time (y axis). These 3D visualizations illustrate the temporal complexity of the harmonics bound into the percept of a single note. In fact, even divorced from its context in a melody—expressive timing, a performer's intentions regarding phrasing, and numerous other considerations—the analysis of isolated notes affords invaluable insight. Small temporal variations in each overtone play a key role in the degree to which synthesized notes sound "real" rather than "artificial." Highly trained musicians can routinely produce different variations on


[Figure 1: 3D plots—Trumpet analysis, H1 = 261.63 Hz (left) and Clarinet analysis, H1 = 261.63 Hz (right); axes: Harmonic × Time (s) × Intensity.]

Figure 1.  Visualization of single notes produced by a trumpet (left) and clarinet (right), illustrating their complex temporal structure. Although the trumpet spectrum changes more dynamically than the clarinet's, each partial is in constant flux. The goal of these 3D figures is to illustrate the dynamic nature of the harmonic structure of musical tones. Consequently they are not complete acoustical analyses (which are readily available elsewhere), but serve to highlight information lost in temporally invariant power spectra.
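Per-harmonic intensity trajectories like those plotted in Fig. 1 can be estimated by sliding a window along a recording and measuring the magnitude at each integer multiple of the fundamental. The sketch below is a generic illustration (Python/NumPy); the window length and the toy two-harmonic test signal are arbitrary choices, not the chapter's actual analysis procedure:

```python
import numpy as np

def harmonic_envelopes(signal, f0, sr, n_harmonics=8, win=0.05):
    """Track each harmonic's amplitude over time (cf. Fig. 1).

    Slides a Hann window across the signal and measures, per frame,
    the magnitude at each integer multiple of f0 via projection onto
    complex exponentials (a Goertzel-style single-bin measurement).
    """
    n = int(win * sr)
    hop = n // 2
    t = np.arange(n) / sr
    k = np.arange(1, n_harmonics + 1)
    probes = np.exp(-2j * np.pi * f0 * np.outer(k, t))  # shape (harmonics, n)
    window = np.hanning(n)
    env = np.array([np.abs(probes @ (window * signal[i:i + n])) / n
                    for i in range(0, len(signal) - n, hop)])
    return env  # shape: (n_frames, n_harmonics)

# Toy input: harmonic 1 sustains while harmonic 2 decays quickly.
sr = 44100
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 220 * t) + np.exp(-10 * t) * np.sin(2 * np.pi * 440 * t)
env = harmonic_envelopes(sig, 220.0, sr)
```

Plotting `env` column by column reproduces, in miniature, the kind of per-harmonic surfaces shown in the figure.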

a single note ("brighter," "more legato," "shimmery," etc.)—variations that involve intentionally manipulating both the balance of a note's overtones and their changes over time. As tones synthesized without adequate temporal changes often sound uninteresting or "fake," composers of electronic music, producers, instrument manufacturers, and other musical professionals pay top dollar for high quality audio samples of the instruments needed for their artistic purposes. Some creators of electronic music prefer samples of real musical sounds over efforts to synthesize those sounds (Risset & Wessel, 1999), in part due to the difficulty of accurately realizing the temporal changes in individual musical notes, as well as our sensitivity to small changes (or the lack thereof) in electronically generated tones. From a psychological perspective, what is so crucial about the structure of individual notes? What are the acoustic differences between life-like and dull renditions of individual instruments? The importance of dynamic changes in an individual note's harmonics can be most usefully understood within the context of musical timbre—a complex, multidimensional property that has proven incredibly challenging to even define, let alone explain. Unfortunately for timbre enthusiasts, this property is often treated as a "miscellaneous category" (Dowling & Harwood, 1986, p. 63) accounting for the perceptual experience of "everything about a sound which is neither loudness nor pitch" (ANSI, 1994; Erickson, 1975). In other words, timbre is often defined less by what it is than by what it is not (Risset & Wessel, 1999). This oppositional approach is sensible given the multitude of acoustic factors known to play a role in its perception (Caclin, McAdams, Smith, & Winsberg, 2005; McAdams, Winsberg, Donnadieu, de Soete, & Krimphoff, 1995).



Acoustic Structure and Musical Timbre


One particularly useful technique for studying musical timbre is multidimensional scaling (MDS), which allows for exploration absent assumptions about which acoustic properties are most important. Studies using this approach typically present a variety of individual notes matched for pitch and intensity, asking participants to rate their similarity (or, more often, dissimilarity). Analysis of dissimilarity ratings affords construction of a multidimensional space allowing for visualization of the "perceptual distance" between different pairs of notes. Early studies found that spectral properties play a crucial role (Miller & Carterette, 1975), and subsequent work has refined our understanding of their role on both the neural (Tervaniemi, Schröger, Saher, & Näätänen, 2000) and perceptual (Grey & Gordon, 1978; Trehub, Endman, & Thorpe, 1990) levels. Consequently, the role of spectra in timbre is well explained in numerous textbooks on auditory perception and music cognition (Dowling & Harwood, 1986; Tan et al., 2010; Thompson, 2009, p. 48), typically through visualizations of power spectra similar to Fig. 2. Power spectra provide a useful, time-invariant summary of relative harmonic strengths. By collapsing along the temporal dimension shown in Fig. 1, Fig. 2 summarizes one of the characteristic distinctions between brass and woodwind instruments—trumpets produce energy at all harmonics, whereas clarinets primarily emphasize odd-numbered harmonics. Yet power spectra fail to capture the dynamic changes prominent in natural musical instruments, and the perceptual difference between synthesizing the information represented in Fig. 1 and that in Fig. 2 is striking. For interactive demonstrations of these differences, pedagogical tools useful for both teaching and research purposes are freely available from www.maplelab.net/pedagogy.
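The core of the MDS analyses described above—recovering a spatial configuration whose pairwise distances match rated dissimilarities—can be sketched with classical (Torgerson) scaling. This is a generic illustration in Python/NumPy, not the specific algorithm used in the cited studies; the four-item dissimilarity matrix is built from planar points so that a two-dimensional solution can reproduce it exactly:

```python
import numpy as np

def classical_mds(d, dims=2):
    """Classical (Torgerson) MDS: embed items so that pairwise
    Euclidean distances approximate the dissimilarities in d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dims]      # keep the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# Toy dissimilarities for four "timbres," generated from 2-D points so
# the configuration is exactly recoverable (up to rotation/reflection).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords = classical_mds(d)
```

With real ratings the matrix is only approximately Euclidean, so recovered distances match the data imperfectly and the chosen dimensionality becomes an interpretive decision.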

[Figure 2: bar plots of intensity across harmonics 1–16 for trumpet (left) and clarinet (right).]

Figure 2.  Power spectra of trumpet and clarinet. These plots accurately convey the trumpet's energy at many harmonics, in contrast to the clarinet's energy primarily at odd-numbered harmonics. However, power spectra fail to convey any information about the temporal changes in harmonic amplitude so crucial to a sound's timbre.


The shortcomings of power spectra are clear in cases where temporal cues play key roles not only in the realism of a musical sound, but in the distinction between different musical timbres. For example, the top row of Fig. 3 shows power spectra for notes produced on the trombone vs. the cello.1 The visual similarity of these power spectra is somewhat surprising, given the markedly different methods of sound production in these instruments—a brass tube driven by lips on a mouthpiece vs. a bow drawn across a string. Additionally, cellos and trombones function differently in most musical compositions, suggesting their perception is distinct. Although this distinction is not apparent from their power spectra, it is clear in the middle row of Fig. 3, showing changes in harmonic strength over time. The bottom row provides a visualization of tones synthesized using the power spectra in the first row—illustrating what is retained and what is lost in time-invariant visualizations of musical sounds. Certain aspects of temporal dynamics are recognized as playing an important role in musical timbre. For example, both the rise time (initial onset) of notes (Grey, 1977; Krimphoff, McAdams, & Winsberg, 1994) and gross temporal structure—the amplitude envelope—have been shown to be important (Iverson & Krumhansl, 1993). As an extreme example, reversing the temporal structure of a note qualitatively changes its timbre, such that a piano note played "backwards" sounds more like a reed organ than a piano (Houtsma, Rossing, & Wagenaars, 1987). It is important to note that in this case the power spectra for piano notes played either forwards or backwards are identical—yet the experience of listening to these renditions differs markedly. Even beyond dramatic manipulations such as backwards listening, temporal changes are known to play an important role in sounds from natural instruments.
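The forwards/backwards piano observation is easy to verify numerically: reversing a signal in time leaves its magnitude spectrum mathematically unchanged, even though the percept differs dramatically. A minimal check using a toy decaying tone (Python/NumPy; an illustrative stand-in, not actual piano data):

```python
import numpy as np

# A note's magnitude spectrum is identical forwards and backwards,
# yet the two directions sound very different: power spectra alone
# cannot capture timbre.
sr = 44100
t = np.arange(sr) / sr
note = np.exp(-4 * t) * np.sin(2 * np.pi * 440 * t)   # decaying sine
reversed_note = note[::-1]                            # "played backwards"

spec_fwd = np.abs(np.fft.rfft(note))
spec_rev = np.abs(np.fft.rfft(reversed_note))
assert np.allclose(spec_fwd, spec_rev)  # spectra indistinguishable
```

The equality holds exactly for any real signal, which is precisely why a time-invariant power spectrum cannot distinguish the two percepts.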
However, interest in the connection between temporal dynamics and timbre has largely focused on a sound's onset (Gordon, 1987; Strong & Clark, 1967) rather than on changes throughout its sustain period. For example, past studies have shown that insensitivity to a tone's onset correlates with reading deficits (Goswami, 2011). Tone onsets are also crucial to distinguishing between musical timbres (Skarratt, Cole, & Gellatly, 2009), and their removal leads to confusion between instruments that are otherwise easily differentiated (Saldanha & Corso, 1964).2

1. All analyses of notes in this chapter are based on additional samples from the University of Iowa Electronic Music Studios (Fritts, 1997).
2. However, presenting notes without transients as part of a melodic sequence (rather than as isolated tones) may mitigate this confusion (Kendall, 1986).

The Use of Temporally Varying Sounds in Music Perception Research

Although temporal changes in the strengths of individual harmonics clearly play an important role in musical sounds, these changes are rightly recognized by experimental psychologists as potentially confounding (or at least introducing noise into) perceptual


[Figure 3: six panels—Trombone analysis and Cello analysis, H1 = 261.63 Hz; power spectra (top row), time-varying harmonic analyses (middle row), and tones resynthesized from static power spectra (bottom row).]
Figure 3.  Visualizations of trombone (left) and cello (right). Panels in top row illustrate similarity in these instruments’ power spectra, despite the clear acoustical differences shown in the middle panels. Bottom panels visualize tones synthesized using static power spectra (i.e., ignoring temporal changes in the strength of individual harmonics).
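The contrast between Fig. 3's middle and bottom rows can be emulated directly: resynthesize a note from its average per-harmonic levels and the temporal envelope disappears, while the power spectrum is approximately preserved. An illustrative sketch (Python/NumPy; a toy two-harmonic "note," not the chapter's trombone or cello data):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr  # one second

# Toy "natural" note: two harmonics with different decay rates.
natural = (np.exp(-2 * t) * np.sin(2 * np.pi * 220 * t)
           + 0.5 * np.exp(-8 * t) * np.sin(2 * np.pi * 440 * t))

def mean_amp(signal, freq, sr):
    """Average amplitude of the component at `freq` (FFT bin magnitude;
    exact only when freq falls on an integer bin, as it does here)."""
    n = len(signal)
    return 2 * np.abs(np.fft.rfft(signal))[round(freq * n / sr)] / n

# Static resynthesis: keep each harmonic's average level, but discard
# how that level changes over time (cf. Fig. 3, bottom row).
static = sum(mean_amp(natural, f, sr) * np.sin(2 * np.pi * f * t)
             for f in (220.0, 440.0))
```

Played back, `natural` decays like a struck or plucked tone while `static` drones at a constant level—two signals with similar average spectra but very different temporal character.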


experiments. Not only do different instruments (along with variations in mouthpieces, mallets, bows, etc.) make consistency challenging when using natural musical tones, but the complexity of changes in recordings of nominally steady-state notes also runs contrary to the level of control desirable for scientific experimentation. If an experimenter's goal is to explore the role of pitch difference in auditory stream segregation, short pure tones with minimal amplitude variation offer clear benefits for drawing strong, replicable conclusions elucidating some aspects of our auditory perceptual organization. Consequently, the high degree of emphasis placed upon tightly constrained, easily reproducible stimuli incentivizes the use of simplified tones lacking temporal variation beyond simplistic onsets and offsets. This raises important questions about what kinds of stimuli are used to assess auditory perception. Although simplified sounds aid researchers in avoiding problematic confounds, their over-use could lead to challenges in generalizing findings to natural sounds with the kinds of temporal variations shown in Fig. 1. In order to explore the kinds of sounds used in research on music perception, my team surveyed 118 empirical papers published in the journal Music Perception, dating back to its inception in 1983, based on a previous comprehensive bibliometric survey (Tirovolas & Levitin, 2011). Primarily interested in determining the amount of amplitude variation found in the temporal structures of auditory stimuli, we classified every stimulus used in each of the 212 surveyed experiments as either "flat" (i.e., lacking temporal variation), "percussive" (decaying notes such as those produced by the piano, cowbell, or marimba), or "other"—sounds such as those produced by sustained instruments like the French horn or the human voice. Fig. 4 illustrates examples of each stimulus class.
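A crude version of this three-way coding can be expressed as an envelope heuristic. The thresholds below are invented for illustration—this is not the survey's actual coding procedure:

```python
import numpy as np

def classify_envelope(signal, sr, frame=0.02):
    """Crude three-way split echoing the survey's categories.

    Frames the rectified signal, then labels the envelope:
    - "flat": the sustain portion stays near peak level
    - "percussive": an early peak followed by substantial decay
    - "other": anything else (e.g., slow swells)
    Thresholds are illustrative only.
    """
    n = int(frame * sr)
    env = np.array([np.abs(signal[i:i + n]).max()
                    for i in range(0, len(signal) - n, n)])
    env = env / env.max()
    peak = int(np.argmax(env))
    tail = env[len(env) // 2:]          # second half of the sound
    if tail.min() > 0.8:                # still near full level late on
        return "flat"
    if peak < len(env) // 4 and tail.max() < 0.5:
        return "percussive"             # early peak, substantial decay
    return "other"

sr = 44100
t = np.arange(sr) / sr
flat = np.sin(2 * np.pi * 440 * t)
percussive = np.exp(-6 * t) * np.sin(2 * np.pi * 440 * t)
```

Here a steady sine is labeled "flat" and an exponentially decaying one "percussive"; sounds such as slow bowed swells would likely fall into "other."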
The most surprising outcome from this survey was that although most articles included a wealth of technical information on spectral structure, duration, and the exact model of headphones or speakers used to present the stimuli, about 35 percent failed to specify the stimuli's temporal structure. This finding is not unique to Music Perception—my team found similar problems with under-specification in the journal Attention, Perception, & Psychophysics (Gillard & Schutz, 2013). More important than under-specification, both surveys revealed a strong bias against sounds with the kinds of temporal variations common to musical instruments. Although flat tones lend themselves well to tight experimental control and consistent replication amongst different labs, they fail to capture the richness of the sounds forming the backbone of the musical listening experience. Yet they remain prominent in a wide range of research on auditory perception, on tasks purportedly designed to illuminate generalizable principles of auditory perception. Prominent researchers have noted that the world is "[not] replete with examples of naturally occurring auditory pedestals [i.e., flat amplitude envelopes]" (Phillips, Hall, & Boehnke, 2002, p. 199). Yet flat tones appear to be the normative choice in research on auditory perception, despite being far removed from the complexity of natural musical sounds—as shown in Fig. 5. Note that each of the three musical instruments visualized not only exhibits constant temporal changes overall, but temporal changes in the


[Figure 4: waveforms (Amplitude × Time (s)) of "Flat" tones, "Percussive" tones, and "Other" tones.]
Figure 4.  Wave forms of different sounds found in the survey of stimuli used in Music Perception (Schutz & Vaisberg, 2014). Reproduced from Music Perception: An Interdisciplinary Journal 31(3), Michael Schutz and Jonathan M. Vaisberg, Surveying the temporal structure of sounds used in music perception, pp. 288–296, doi:10.1525/mp.2014.31.3.288, Copyright © 2014, The Regents of the University of California.

amplitudes of each individual harmonic. This dynamic fluctuation contrasts starkly with the flat tones favored in auditory perception research, shown in the bottom right panel. This over-fixation on sounds lacking meaningful amplitude variation is not confined to behavioral work; a large-scale review of auditory neuroscience research concluded with a note of caution that important properties and functions of the auditory system will only be fully understood when researchers begin employing envelopes that "involve modulation in ways that are closer to real-world tasks faced by the auditory system" (Joris, Schreiner, & Rees, 2004, p. 570). The acoustic distance between temporally dynamic musical sounds and the temporally constrained flat tones common in auditory perception and neuroscience research raises important questions about the degree to which theories and models derived from these experiments generalize to musical listening. The complexities of balancing the competing needs for experimental control and ecological relevance are significant, and will serve as the focus of the following section.


[Figure 5: 3D harmonic-intensity-over-time plots (Harmonic × Time (s) × Intensity) for four sounds; see caption.]
Figure 5.  Single notes produced by an oboe (upper left), French horn (upper right), and viola (lower left) illustrate their temporal complexity. Although their specific mix of harmonics varies, these instruments all exhibit constant changes in the strength of each harmonic over the tone’s duration. This temporal complexity contrasts strongly with the temporal simplicity of the flat tone depicted in the lower right panel, which lacks temporal variation beyond abrupt onsets/ offsets, and no change in relative strength of harmonics.

On the Methodological Convenience of Simplified Sounds

This focus on tightly constrained stimuli is not necessarily problematic; control of extraneous variables is essential to researchers' ability to draw strong conclusions from individual experiments. Consistency in the synthesis of stimuli amongst different labs holds many advantages with respect to replication, an issue of increasing importance to the field as a whole. And in some circumstances the real-world associations inherent in temporally complex sounds can pose obstacles to answering key questions. For example, researchers exploring the acoustic attributes of unpleasant sounds have shown that frequency range (Kumar, Forster, Bailey, & Griffiths, 2008), spectral "roughness" (Terhardt, 1974), and the relative mix of harmonics-to-noise (Ferrand, 2002) are key factors—issues


important for engineers designing human–computer auditory interfaces. Yet a direct ranking of sounds shows that vomiting is regarded as one of the most unpleasant (Cox, 2008), an outcome related less to its specific acoustic properties than to its obvious real-world associations (McDermott, 2012). In some cases these real-world associations may be regarded as confounds obfuscating the general principles at hand. Therefore, in some inquiries aimed at understanding the relationship between acoustic structure and perceptual response, it is not only reasonable but actually necessary to use sounds devoid of referents. This issue of disentangling the effects attributable to associations vs. acoustic features is of particular importance in the perception of music, given the rich and complex relationship between music, memory, and emotion. Familiar compositions can evoke memories as a result of past associations—for example, from a history of personal listening/performance (Schulkind, Hennis, & Rubin, 1999) or use in film soundtracks (e.g., those used by Vuoskoski and Eerola, 2012). Indeed, songs from popular television shows are so familiar they have even been used to assess the pervasiveness of absolute pitch amongst the general population (Schellenberg & Trehub, 2003). Consequently, synthesized tones lacking real-world associations serve a useful purpose in advancing our understanding of auditory perception. However, although artificial sounds devoid of real-world associations afford precise control and replication, their simplicity can pose barriers to fully understanding music perception. In fact, auditory psychophysics' focus on "control" (Neuhoff, 2004) and the study of isolated parameters absent their natural context (Gaver, 1993) is an issue of long-standing concern in some corners of the auditory perception community.
This is of particular importance to understanding music, as composers, performers, conductors, and recording engineers focus great attention to slight nuances of musical timbre. Yet the same differences so useful in artistic creation often serve as confounds within the realm of auditory psychophysics. This raises important questions about the types of stimuli that should be used in experiments designed to address questions related to music listening. Can artificial sounds abstracted from our day-to-day musical experiences lead to experimental outcomes that generalize to listening outside the laboratory? Perceptual experiments exploring audio-visual integration in musical contexts offer a useful case study in the consequences of ignoring the role of musical sounds’ dynamic temporal structures. A large body of audio-visual integration research using temporally simplistic sounds has concluded that vision rarely influences auditory evaluations of duration3 (Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren, 1980). However, a musical experiment exploring ongoing debate amongst percussionists led to a surprising break with widely accepted theory. In that series of studies an internationally acclaimed musician attempted to create long and short notes on the marimba—a tuned, wooden bar instrument similar to the xylophone. Notes on the marimba are percussive (Fig. 4, middle panel)—with continuous temporal variation in their structure 3  Provided that the acoustic information is of sufficient quality (Alais & Burr,  2004; Ernst & Banks, 2002).

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

as the energy transferred into the bar (by striking) gradually dissipates as a result of friction, air resistance, etc. Whether or not the duration of these notes can be intentionally varied has long been debated in the percussion community (Schutz & Manning, 2012). However, an assessment of an expert percussionist's ability to control note duration demonstrated that the gestures involved are in fact acoustically inconsequential, but trigger an illusion in which the longer physical gesture used to strike the instrument affects perception of the resulting note's duration (Schutz & Lipscomb, 2007). Musical implications (Schutz, 2008) aside, this finding represents a clear break from previously accepted views on the integration of sight and sound (Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren, 1980).

The surprising ability of percussionists to shape perceived note duration despite previous experimental work to the contrary stems in large part from a bias in the temporal structure of stimuli used in auditory research. Subsequent experiments illustrate that movements derived from the percussionists' gesture (Schutz & Kubovy, 2009b) integrate with sounds exhibiting decaying envelopes (e.g., piano notes, produced by the impact of a hammer on string), but not with the sustained tones produced by the clarinet or French horn (Schutz & Kubovy, 2009a). As the clarinet differs in many properties from the marimba and piano, a direct test of temporal structure used pure tones (i.e., sine waves) shaped with decaying vs. time-invariant amplitude envelopes; visual information integrated with the temporally dynamic percussive tones, but not with the temporally invariant flat tones previously used in audio-visual integration experiments (Schutz, 2009). This distinction between the outcomes of experiments with tones using temporally dynamic vs.
static amplitude envelopes is important in assessing the degree to which lab-based tasks inform our understanding of listening in the real world. For example, temporal structure can play a key role in the well-known audio-visual bounce effect (ABE), in which two circles approach each other, overlap, and then move back to their original starting points. Although this ambiguous display can be perceived as depicting circles either "bouncing off" or "passing through" one another, a brief tone coincident with the moment of overlap enhances the likelihood of seeing a bounce (Sekuler, Sekuler, & Lau, 1997). However, not all sounds affect this integrated percept in the same way. Sounds synthesized with decaying envelopes mimicking impact events trigger significantly more bounce percepts than their mirror images (Grassi & Casco, 2009).

The temporal structure of individual tones also plays a role in a variety of "general" perceptual tasks assessed primarily using tones lacking dynamic temporal changes, leading to different experimental outcomes in tasks ranging from learning associations (Schutz, Stefanucci, Baum, & Roth, 2017) to perceiving pitches (Neuhoff & McBeath, 1996), assessing event duration (Vallet, Shore, & Schutz, 2014), and segmenting auditory streams (Iverson, 1995). Overlooking the importance of temporal structure in auditory perception can even lead to misguided theoretical claims that inform ongoing research programs. For example, as discussed previously, a great deal of audio-visual integration research involves temporally simplified tones that ensure experimental control. However, interest


in the natural connection between sight and sound has motivated discussions of the "unity assumption" (Welch, 1999) and/or the "identity decision" (Bedford, 2004). That research explores the idea that event unity between sight and sound plays an important role in the binding decision, such that stimuli perceived as "going together" are more likely to bind. For example, in the well-known "ventriloquist effect" the sound of a ventriloquist's voice is perceptually bound with the concurrent lip movements of their puppet (Abry, Cathiard, Robert-Ribes, & Schwartz, 1994; Bonath et al., 2007). Unfortunately, the natural real-world relationships between sights and sounds often pose challenges for the controlled manipulations so important to experimental research. For example, tightly controlled, psychophysically inspired studies of multimodal speech help clarify the importance of event unity in multisensory integration. Gender-matched faces and voices bind more strongly than gender-mismatched faces and voices—for example, the sound of a male producing a syllable binds more strongly with the lip movements of a male than of a female articulating that syllable (Vatakis & Spence, 2007). This finding offers strong evidence for the unity assumption, raising important questions about the degree to which it applies to auditory stimuli beyond speech.

A series of experiments assessing the role of the unity assumption with musical stimuli paired the sounds of a piano note and a plucked guitar string with video recordings of the movements used to produce these sounds. Following the earlier procedures, this approach found no evidence of the unity assumption playing a role in this non-speech musical task (or with other stimuli, such as a hammer striking ice vs. a bouncing ball). This outcome contributed to the conclusion that the unity assumption applies only to speech stimuli (Vatakis, Ghazanfar, & Spence, 2008).
However, as summarized below, subsequent research found strong evidence for the unity assumption in non-speech tasks once the importance of auditory temporal structure was taken into account. The piano and guitar sounds used by Vatakis et al. (2008) exhibited similar amplitude envelopes—a property defining the gross temporal structure of a sound (i.e., the summation of changes in the amplitudes of its spectral components). Building upon earlier approaches to assessing binding using musical notes produced by the marimba and cello, my team found evidence for the unity assumption when assessing sounds with clearly differentiable amplitude envelopes (Chuen & Schutz, 2016). In hindsight, the traditional focus on flat tones in auditory psychophysics research helped obfuscate the similarity in temporal structure of the guitar and piano notes used by Vatakis et al. (2008). Given the relatively small proportion of auditory perception studies using natural sounds, this oversight is understandable; indeed, the use of natural sounds in psychophysics experiments is laudable, given a general focus on temporally invariant stimuli that "often seems to have limited direct relevance for understanding the ability to recognize the nature of complex natural acoustic source events" (Pastore, Flint, Gaston, & Solomon, 2008, p. 13). From these examples, it is clear that the time-varying structure of natural sounds (or lack thereof) can meaningfully influence the outcomes of psychological experiments.
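The amplitude envelope concept discussed here is easy to demonstrate in code. The following Python sketch (assuming NumPy is available; the frequency, duration, and 80 ms decay constant are illustrative choices, not parameters from the cited studies) synthesizes a "flat" tone and a "percussive" tone from the same sine wave, so the two differ only in amplitude envelope:

```python
import numpy as np

def make_tone(freq_hz=440.0, dur_s=0.5, sr=44100, envelope="flat"):
    """Synthesize a sine tone with either a time-invariant ('flat') or an
    exponentially decaying ('percussive') amplitude envelope."""
    t = np.arange(int(dur_s * sr)) / sr
    carrier = np.sin(2 * np.pi * freq_hz * t)
    if envelope == "flat":
        env = np.ones_like(t)        # constant amplitude throughout
    elif envelope == "percussive":
        env = np.exp(-t / 0.08)      # 80 ms decay constant (illustrative)
    else:
        raise ValueError(f"unknown envelope: {envelope}")
    return env * carrier

flat = make_tone(envelope="flat")
perc = make_tone(envelope="percussive")
# Same frequency, duration, and onset amplitude; only the envelope differs.
```

Plotting or listening to the two outputs makes the contrast vivid: the flat tone ends as abruptly as it begins, while the percussive tone dies away like a struck marimba bar.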


This is true whether researchers aim to explore natural listening or to better understand the theoretical structure and function of the auditory system. This issue holds important implications even for experiments aimed at elucidating generalized principles of perceptual processing rather than explicitly assessing the role of dynamic temporal changes. Together, these concerns are consistent with those raised previously by proponents of ecological acoustics such as John Neuhoff, who argue that "the perception of dynamic, ecologically valid stimuli is not predicted well by the results of many traditional experiments using static stimuli" (2004, p. 5).

Conclusions

Traditional studies of specific sequences of notes, such as the four-note opening of Beethoven's Fifth Symphony, provide useful insight into both the theoretical structure of musical passages and their larger cultural relevance. Much as the constant movement of pitches and rhythms gives rise to lively melodies, continual variations in temporal structure (across multiple simultaneous harmonics) play an important role in musical listening. However, as this information is not notated in musical scores and is often under-emphasized in scientific discourse, the importance of these dynamic changes is not always fully recognized. This insight is well understood amongst those involved in sound synthesis and virtual modeling of musical instruments. However, the need for tight experimental control of stimuli used in experimental work on auditory perception and auditory neuroscience has incentivized the use of simple, time-invariant flat tones. Although they offer important methodological benefits, their distance from musical sounds can limit their ability to inform our understanding of natural listening. With modern recording and sound synthesis approaches we now have the ability to generate auditory stimuli exhibiting the rich temporal variation of natural musical sounds while also affording the precise control so crucial for avoiding confounds—raising exciting new possibilities for future innovation and discovery. Looking toward the future, research assessing core questions of auditory perception using temporally complex sounds will help clarify the degree to which existing theories and models apply to our perception of natural sounds such as those produced by musical instruments.

Acknowledgments

Funding supporting this research was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Ontario Early Researcher Award (ERA). I would like to thank Maxwell Ng for his assistance in creating the visualizations of the instrument sounds used throughout this chapter.



References

Abry, C., Cathiard, M. A., Robert-Ribes, J., & Schwartz, J. L. (1994). The coherence of speech in audio-visual integration. Current Psychology of Cognition 13, 52–59.
Acoustical Society of America Standards Secretariat (1994). Acoustical Terminology ANSI S1.1–1994 (ASA 111-1994). American National Standard. ANSI/Acoustical Society of America.
Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14(3), 257–262.
Aldwell, E., Schachter, C., & Cadwallader, A. (2002). Harmony & voice leading (3rd ed.). Boston, MA: Schirmer.
Alexander, P. L., & Broughton, B. (2008). Professional orchestration: The first key. Solo instruments & instrumentation note, volume 1 (3rd ed.). Petersburg, VA: Alexander Publishing.
Bedford, F. L. (2004). Analysis of a constraint on perception, cognition, and development: One object, one place, one time. Journal of Experimental Psychology: Human Perception and Performance 30(5), 907–912.
Bhatara, A., Tirovolas, A. K., Duan, L. M., Levy, B., & Levitin, D. J. (2011). Perception of emotional expression in musical performance. Journal of Experimental Psychology: Human Perception and Performance 37(3), 921–934.
Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.-J., & Hillyard, S. A. (2007). Neural basis of the ventriloquist illusion. Current Biology 17(19), 1697–1703.
Boulez, P. (1987). Timbre and composition—timbre and language. Contemporary Music Review 2(1), 161–171.
Broze, Y., & Huron, D. (2013). Is higher music faster? Pitch–speed relationships in Western compositions. Music Perception: An Interdisciplinary Journal 31(1), 19–31.
Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of America 118(1), 471–482.
Chapin, H., Jantzen, K., Kelso, J. A. S., Steinberg, F., & Large, E. W. (2010).
Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5, 1–14.
Chuen, L., & Schutz, M. (2016). The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude cues. Attention, Perception, & Psychophysics 78(5), 1512–1528.
Clough, J., & Conley, J. (1984). Basic harmonic progressions. New York: W. W. Norton.
Corrigall, K. A., & Trainor, L. J. (2010). Musical enculturation in preschool children: Acquisition of key and harmonic knowledge. Music Perception: An Interdisciplinary Journal 28(2), 195–200.
Corrigall, K. A., & Trainor, L. J. (2014). Enculturation to musical pitch structure in young children: Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142–158.
Cox, T. J. (2008). Scraping sounds and disgusting noises. Applied Acoustics 69(12), 1195–1204.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. Orlando, FL: Academic Press.
Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity, and additivity of primary musical cues. Frontiers in Psychology 4, 1–12. Retrieved from https://doi.org/10.3389/fpsyg.2013.00487
Erickson, R. (1975). Sound structure in music. Berkeley, CA: University of California Press.


Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870), 429–433.
Fendrich, R., & Corballis, P. M. (2001). The temporal cross-capture of audition and vision. Perception & Psychophysics 63(4), 719–725.
Ferrand, C. T. (2002). Harmonics-to-noise ratio: An index of vocal aging. Journal of Voice 16(4), 480–487.
Fritts, L. (1997). University of Iowa Electronic Music Studios. University of Iowa. Retrieved from http://theremin.music.uiowa.edu/MIS.html
Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology 5(1), 1–29.
Gillard, J., & Schutz, M. (2013). The importance of amplitude envelope: Surveying the temporal structure of sounds in perceptual research. In Proceedings of the Sound and Music Computing Conference (pp. 62–68). Stockholm, Sweden.
Gordon, J. W. (1987). The perceptual attack time of musical tones. Journal of the Acoustical Society of America 82(1), 88–105.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in Cognitive Sciences 15(1), 3–10.
Grassi, M., & Casco, C. (2009). Audiovisual bounce-inducing effect: Attention alone does not explain why the discs are bouncing. Journal of Experimental Psychology: Human Perception and Performance 35(1), 235–243.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America 61(5), 1270–1277.
Grey, J. M., & Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres. Journal of the Acoustical Society of America 63(5), 1493–1500.
Guerrieri, M. (2012). The first four notes: Beethoven's Fifth and the human imagination. New York: Alfred A. Knopf.
Hamberger, C. L. (2012). The evolution of Schoenberg's Klangfarbenmelodie: The importance of timbre in modern music. The Pennsylvania State University.
Retrieved from https://etda.libraries.psu.edu/files/final_submissions/8130
Heinlein, C. P. (1928). The affective characters of the major and minor modes in music. Journal of Comparative Psychology 8, 101–142.
Hevner, K. (1935). The affective character of the major and minor modes in music. American Journal of Psychology 47(1), 103–118.
Hjortkjaer, J. (2013). The musical brain. In J. O. Lauring (Ed.), An introduction to neuroaesthetics: The neuroscientific approach to aesthetic experience, artistic creativity, and arts appreciation (pp. 211–244). Copenhagen: Museum Tusculanum Press.
Houtsma, A. J. M., Rossing, T. D., & Wagennars, W. M. (1987). Auditory demonstrations on compact disc. Journal of the Acoustical Society of America. New York: Acoustical Society of America/Eindhoven: Institute for Perception Research.
Huron, D., & Ollen, J. (2003). Agogic contrast in French and English themes: Further support for Patel and Daniele (2003). Music Perception: An Interdisciplinary Journal 21(2), 267–271.
Iverson, P. (1995). Auditory stream segregation by musical timbre: Effects of static and dynamic acoustic attributes. Journal of Experimental Psychology: Human Perception and Performance 21, 751–763.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America 94, 2594–2603.


Joris, P. X., Schreiner, C. E., & Rees, A. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews 84, 541–577.
Jourdain, R. (1997). Music, the brain, and ecstasy: How music captures our imagination. New York: William Morrow and Company.
Kendall, R. A. (1986). The role of acoustic signal partitions in listener categorization of musical phrases. Music Perception 4(2), 185–213.
Koelsch, S., & Friederici, A. D. (2003). Toward the neural basis of processing structure in music. Annals of the New York Academy of Sciences 999, 15–28.
Krimphoff, J., McAdams, S., & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II. Analyses acoustiques et quantification psychophysique [Characterization of the timbre of complex sounds. II. Acoustic analyses and psychophysical quantification]. Journal de Physique IV Colloque 4, 625–628.
Kumar, S., Forster, H. M., Bailey, P., & Griffiths, T. D. (2008). Mapping unpleasantness of sounds to their auditory representation. Journal of the Acoustical Society of America 124(6), 3810–3817.
Lowis, M. J. (2002). Music as a trigger for peak experiences among a college staff population. Creativity Research Journal 14(3–4), 351–359.
McAdams, S., Winsberg, S., Donnadieu, S., de Soete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research 58(3), 177–192.
McDermott, J. (2012). Auditory preferences and aesthetics: Music, voices, and everyday sounds. In R. J. Dolan & T. Sharot (Eds.), Neuroscience of preference and choice: Cognitive and neural mechanisms (pp. 227–257). London: Academic Press.
Miller, J. R., & Carterette, E. C. (1975). Perceptual space for musical structures. Journal of the Acoustical Society of America 58(3), 711–720.
Moore, B. C. J. (1997). An introduction to the psychology of hearing (4th ed.). London: Academic Press.
Neuhoff, J. G. (2004). Ecological psychoacoustics (J. G. Neuhoff, Ed.). Amsterdam: Elsevier/Academic Press.
Neuhoff, J. G., & McBeath, M. K. (1996).
The Doppler illusion: The influence of dynamic intensity change on perceived pitch. Journal of Experimental Psychology: Human Perception and Performance 22(4), 970–985.
Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Pastore, R. E., Flint, J., Gaston, J. R., & Solomon, M. J. (2008). Auditory event perception: The source–perception loop for posture in human gait. Perception & Psychophysics 70(1), 13–29.
Patel, A. D., & Daniele, J. R. (2003). Stress-timed vs. syllable-timed music? A comment on Huron and Ollen (2003). Music Perception: An Interdisciplinary Journal 21(2), 273–276.
Phillips, D. P., Hall, S. E., & Boehnke, S. E. (2002). Central auditory onset responses, and temporal asymmetries in auditory perception. Hearing Research 167(1–2), 192–205.
Poon, M., & Schutz, M. (2015). Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech. Frontiers in Psychology 6, 1–13. Retrieved from https://doi.org/10.3389/fpsyg.2015.01419
Repp, B. H. (1995). Quantitative effects of global tempo on expressive timing in music performance: Some perceptual evidence. Music Perception: An Interdisciplinary Journal 13(1), 39–57.
Rimsky-Korsakov, N. (1964). Principles of orchestration (M. Steinberg, Ed.). New York: Dover.


Risset, J.-C., & Wessel, D. L. (1999). Exploration of timbre by analysis and synthesis. In D. Deutsch (Ed.), The psychology of music (pp. 113–169). San Diego, CA: Gulf Professional Publishing.
Rossing, T. D., Moore, R. F., & Wheeler, P. A. (2013). The science of sound (3rd ed.). London: Pearson Education.
Saldanha, E. L., & Corso, J. F. (1964). Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of America 36(11), 2021–2026.
Schellenberg, E. G. (2002). Asymmetries in the discrimination of musical intervals: Going out-of-tune is more noticeable than going in-tune. Music Perception: An Interdisciplinary Journal 19(2), 223–248.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological Science 14(3), 262–266.
Schenker, H. (1971). Analysis of the first movement. In E. Forbes (Ed.), Beethoven Symphony No. 5 in C minor (pp. 164–182). New York: W. W. Norton.
Schulkind, M. D., Hennis, L. K., & Rubin, D. C. (1999). Music, emotion, and autobiographical memory: They're playing your song. Memory & Cognition 27(6), 948–955.
Schutz, M. (2008). Seeing music? What musicians need to know about vision. Empirical Musicology Review 3(3), 83–108.
Schutz, M. (2009). Crossmodal integration: The search for unity (Dissertation). University of Virginia.
Schutz, M., & Kubovy, M. (2009a). Causality and cross-modal integration. Journal of Experimental Psychology: Human Perception and Performance 35(6), 1791–1810.
Schutz, M., & Kubovy, M. (2009b). Deconstructing a musical illusion: Point-light representations capture salient properties of impact motions. Canadian Acoustics 37(1), 23–28.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone duration. Perception 36(6), 888–897.
Schutz, M., & Manning, F. (2012). Looking beyond the score: The musical role of percussionists' ancillary gestures.
Music Theory Online 18, 1–14.
Schutz, M., Stefanucci, J., Baum, S. H., & Roth, A. (2017). Name that percussive tune: Associative memory and amplitude envelope. Quarterly Journal of Experimental Psychology 70(7), 1323–1343.
Schutz, M., & Vaisberg, J. M. (2014). Surveying the temporal structure of sounds used in music perception. Music Perception: An Interdisciplinary Journal 31(3), 288–296.
Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception. Nature 385(6614), 308.
Skarratt, P. A., Cole, G. G., & Gellatly, A. R. H. (2009). Prioritization of looming and receding objects: Equal slopes, different intercepts. Attention, Perception, & Psychophysics 71(4), 964–970.
Sloboda, J. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music 19(2), 110–120.
Strong, W., & Clark, M. (1967). Perturbations of synthetic orchestral wind-instrument tones. Journal of the Acoustical Society of America 41(2), 277–285.
Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., . . . Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8(2), 126–131.
Tan, S.-L., Pfordresher, P. Q., & Harré, R. (2007). Psychology of music: From sound to significance. New York: Psychology Press.


Terhardt, E. (1974). On the perception of periodic sound fluctuations (roughness). Acta Acustica United with Acustica 30, 201–213.
Tervaniemi, M., Schröger, E., Saher, M., & Näätänen, R. (2000). Effects of spectral complexity and sound duration on automatic complex-sound pitch processing in humans: A mismatch negativity study. Neuroscience Letters 290, 66–70.
Thompson, W. F. (2009). Music, thought, and feeling: Understanding the psychology of music. New York: Oxford University Press.
Tirovolas, A. K., & Levitin, D. J. (2011). Music perception and cognition research from 1983 to 2010: A categorical and bibliometric analysis of empirical articles in Music Perception. Music Perception: An Interdisciplinary Journal 29(1), 23–36.
Tovey, D. F. (1971). The Fifth Symphony. In E. Forbes (Ed.), Beethoven Symphony No. 5 in C minor (pp. 143–150). New York: W. W. Norton.
Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants' perception of timbre: Classification of complex tones by spectral structure. Journal of Experimental Child Psychology 49(2), 300–313.
Vallet, G., Shore, D. I., & Schutz, M. (2014). Exploring the role of amplitude envelope in duration estimation. Perception 43(7), 616–630.
Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration by the "unity effect" reveals that speech is special. Journal of Vision 8(9), 1–11.
Vatakis, A., & Spence, C. (2007). Crossmodal binding: Evaluating the "unity assumption" using audiovisual speech stimuli. Perception & Psychophysics 69(5), 744–756.
Vuoskoski, J. K., & Eerola, T. (2012). Can sad music really make you sad? Indirect measures of affective states induced by music and autobiographical memories. Psychology of Aesthetics, Creativity, and the Arts 6, 1–10.
Walker, J. T., & Scott, K. J. (1981). Auditory-visual conflicts in the perceived duration of lights, tones and gaps. Journal of Experimental Psychology: Human Perception and Performance 7(6), 1327–1339.
Wang, S., Liu, B., Dong, R., Zhou, Y., Li, J., Qi, B., . . . Zhang, L. (2012). Music and lexical tone perception in Chinese adult cochlear implant users. The Laryngoscope 122, 1353–1360.
Warren, R. M. (2013). Auditory perception: A new synthesis. Amsterdam: Elsevier.
Welch, R. B. (1999). Meaning, attention, and the "unity assumption" in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371–387). Amsterdam: Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88(3), 638–667.


Chapter 8

Neural Basis of Rhythm Perception

Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn

Introduction

To experience music, listeners must be able to pick up on the temporal relationships among events as they unfold. These temporal relationships are characterized by the rhythm, or the pattern of time intervals between the onsets of events in music (see Fig. 1). Unlike the appreciation of sculpture or painting, perceiving and producing music and dance requires us to comprehend rhythmic structure. One of the most intriguing phenomena in music is that when we listen to rhythm, we perceive a regular, recurring pulse or beat (Cooper & Meyer, 1960; Large, 2008), which allows us to bob our heads and clap our hands in time to the music. This psychologically generated beat does not always have to align with the note onsets in a rhythm, as evidenced by the fact that we mentally continue the beat through gaps in the music (see Fig. 1). We further organize the musical beat into alternations of strong and weak beats at multiple hierarchical timescales, called meter (Epstein, 1995; Lerdahl & Jackendoff, 1983). Meter helps us distinguish between, for example, a waltz (i.e., triple meter) and a march (i.e., duple meter), depending on whether we hear the strong beat fall on every third or every second beat, respectively.

Despite the ease with which humans pick up on the beat and synchronize their movements to music, it is not trivial to understand how human brains perceive and process rhythm. Musical rhythms are beat-based, which means that the pattern of onsets gives rise to the feeling of an underlying pulse or framework. Perceiving a beat can make it easier to predict and act on upcoming events in a rhythmic sequence. However, many other naturally occurring rhythms in our environment do not have a regular pulse or beat, such as walking, talking, or a car engine turning over. Such rhythms are called non-beat-based. Different mechanisms have been proposed to account for the way that


[Figure 1 image: three rhythm notations, labeled (a) Simple, (b) Complex, and (c) Syncopated]

Figure 1.  Rhythm is represented by the black dots on each line, whereas the beat is represented by the bold black lines occurring on every other beat (duple meter). (a) A simple metrical pattern, in which events fall on the beat more often than not. (b) A complex metrical pattern, with some events occurring on the beat while many others do not. (c) A syncopated rhythm, in which note events always occur off the beat.

humans encode beat- and non-beat-based rhythms. Absolute timing mechanisms encode the exact durations of all intervals in a sequence, whereas relative timing mechanisms encode when intervals start and stop in relation to the beat. If there is no regular beat, then absolute timing is likely necessary to encode the rhythm; with a beat, however, relative timing may be used. There is evidence for distinct neural networks associated with absolute and relative timing (Teki, Grube, Kumar, & Griffiths, 2011), with participants relying on either mechanism depending on the nature of the rhythm and task demands.

A number of approaches are used to understand rhythm processing, incorporating methodologies based on behavior, neuroimaging, and patient studies. Measuring behavior is fundamental to our understanding of rhythm because it can provide a direct measure of how we move to music. However, much behavioral research is correlational—that is, distinct measures of stimulus characteristics and tapping variability may be related to one another, but the stimulus characteristics may not cause the tapping variability, as a third, unmeasured variable may be the true driver of performance. Although neuroimaging approaches are ideal for discovering more about when and where in the brain rhythm is processed, some neuroimaging studies fail to include (or are unable to include for methodological reasons) behavioral measures of rhythm processing, which makes it difficult to determine exactly how differences in neural activation relate to real-world outcomes. That is, simply because there are differences in activation for two different rhythms does not necessarily mean that participants will also perceive them differently. Finally, studies of patients with brain damage or dysfunction provide significant insights, but rely on natural accidents, which do not lead to the same amount or location of damage in each individual.
This makes it difficult to determine which lesioned areas are truly necessary for rhythm processing, or whether a particular combination of areas is required to perceive rhythm. Of course, a combination of these approaches, together with methods that focally disrupt ongoing neural processing, such as transcranial magnetic stimulation (TMS), has given rise to a rich literature on human rhythm perception. This chapter will review the current literature on the neural


basis of rhythm perception, which highlights important brain areas for perceiving a beat, and how the human brain entrains to rhythms in music.
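The contrast between absolute and relative timing mechanisms described in this introduction can be made concrete with a toy encoding. The following hypothetical Python sketch (an illustration, not a computational model from the cited literature) encodes the same onset sequence either as exact inter-onset intervals or as positions relative to an assumed beat period:

```python
def absolute_encoding(onsets_ms):
    """Absolute timing: store the exact duration of every inter-onset interval."""
    return [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]

def relative_encoding(onsets_ms, beat_ms):
    """Relative timing: store each onset as (beat index, phase within the beat)."""
    return [(t // beat_ms, round((t % beat_ms) / beat_ms, 2)) for t in onsets_ms]

onsets = [0, 500, 1000, 1250, 1500, 2000]   # ms; assume a beat every 500 ms
print(absolute_encoding(onsets))            # [500, 500, 250, 250, 500]
print(relative_encoding(onsets, 500))
# [(0, 0.0), (1, 0.0), (2, 0.0), (2, 0.5), (3, 0.0), (4, 0.0)]
```

Note that the relative code is compact and regular when onsets fall on or halfway between beats, but it presupposes a beat period; without one, only the absolute code is available, consistent with the idea that non-beat-based rhythms require absolute timing.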

Feeling the Beat

To understand how the brain processes rhythm, especially rhythms that have a beat (as is the case in most music), it is first necessary to create rhythmic stimuli that are capable of inducing the percept of a strong beat, along with similar stimuli that either do not induce a beat percept at all, or do so only weakly. This allows researchers to compare scenarios in which participants feel the beat more or less strongly (or not at all), but other aspects of the task are equivalent (e.g., the presence of acoustically similar sounds, having to listen to or reproduce rhythms). The stimuli must be as physically similar as possible so that, when activation for strong and weak beat rhythms is compared, activation differences cannot arise from stimulus differences other than the percept of a beat. To solve this problem, researchers take advantage of the fact that the strength of a beat percept can be driven by perceptual accents, or perceived emphases on certain tones in a rhythm that generally mark the beat. These perceptual accents differ from physical accents (e.g., changes in loudness or pitch on certain notes) because they arise from the timing of the tones, even though the physical properties of all the tones are identical (Brochard, Abecasis, Potter, Ragot, & Drake, 2003; Povel & Okkerman, 1981). For instance, people perceive accents on every other event in an evenly spaced sequence (e.g., the tick-tock of a clock), or perceive accents on notes that are preceded or followed by long silent intervals, even if those events are the same duration, loudness, and pitch as events not surrounded by silence. By creating sequences that differ in their pattern of temporal onsets but are identical in all other respects (e.g., number of tones, duration, loudness, pitch), researchers can create rhythms with varying degrees of a beat percept that are matched in other ways. For example, in metric simple rhythms (see Fig.
1A), the timing of the tones is selected to create regularly occurring perceptual accents, which induce a clear and steady beat. In these cases, the intervals between tones would be comprised of whole-integer ratios (e.g., 2, 2, 1, 1, 2, in which numbers represent multiples of an arbitrary time interval (e.g., 1 = 250 ms) between tone onsets). These can be compared to metric complex rhythms (see Fig. 1B), in which tone onsets are syncopated: the perceptual accents occur irregularly—they do not always coincide with the beat, and thus do not induce a strong beat percept (Povel & Essens, 1985). The irregular perceptual accents of complex rhythms also make it more difficult to synchronize with them (Patel, Iversen, Chen, & Repp, 2005). A third category of rhythmic sequences is non-metric, in which onsets occur in non-integer ratio sequences (e.g., onsets occurring at 1, 1.3, 0.4, 1.7 intervals apart) rather than the integer ratio sequences common to metric simple and complex rhythms. These stimuli sound almost random, so much so that listeners have a strong inclination to reproduce them by incorrectly tapping back integer ratio sequences (Collier & Wright, 1995; Essens, 1986; Ravignani, Delgado, & Kirby, 2017). Importantly,

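The interval notation above can be made concrete with a short sketch. The following Python snippet is our illustration, not from the chapter; the 250 ms unit follows the text's example, and the interval patterns are those given in the text. It converts interval patterns into tone-onset times:

```python
# Illustrative sketch (ours, not the chapter's): converting interval patterns
# into tone-onset times, using the text's example unit of 1 = 250 ms.

def onsets_from_intervals(intervals, unit_ms=250):
    """Convert inter-onset intervals (in abstract units) to onset times in ms."""
    times, t = [0.0], 0.0
    for iv in intervals:
        t += iv * unit_ms
        times.append(t)
    return times

metric_simple = [2, 2, 1, 1, 2]      # integer ratios; regular perceptual accents
non_metric = [1, 1.3, 0.4, 1.7]      # non-integer ratios, from the text's example

print(onsets_from_intervals(metric_simple))
# → [0.0, 500.0, 1000.0, 1250.0, 1500.0, 2000.0]
```

A metric complex rhythm would use the same integer units but reorder them so that perceptual accents fall irregularly; the physical properties of the tones themselves are identical across all three stimulus types.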
Importantly, the perceptual difference between simple and complex rhythms is a product of whether perceptual accents coincide with tone onsets, as both types of rhythms are composed of integer-ratio intervals.

Now that we have covered the notion of metricality and the distinction between simple and complex rhythms, we can turn to recent findings from cognitive neuroscience that describe how neural processes give rise to rhythm perception. The basic logic of fMRI studies is to measure the brain's online metabolic activity in at least two different experimental conditions and compare the pattern of differences. These differences indicate which neural structures respond during a behavior or cognitive function. For example, comparing brain responses to simple, complex, and non-metric rhythms that differ in the strength of beat perception but are otherwise perceptually similar might pinpoint the neural structures involved in perceiving the beat. Using this approach, greater activity has been observed in certain motor structures, namely the basal ganglia (BG) and supplementary motor area (SMA), while hearing metric simple rhythms (which have a clear beat) compared to complex or non-metric stimuli (which have little or no beat; Grahn & Brett, 2007). This greater activity occurred regardless of participants' musical training, suggesting a fundamental role for these areas in rhythm perception. Responses in other motor areas, namely the premotor cortex (PMC) and cerebellum, were observed for all sequences and did not vary with the presence or absence of a beat. All of these motor areas have been implicated in more general timing and sequence processing as well (Chen, Penhune, & Zatorre, 2008), but the basal ganglia and SMA appear to be particularly responsive to beat processing.
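The subtraction logic can be sketched in a few lines. This toy Python example illustrates only the logic of a condition contrast; the voxel values and labels are invented, and nothing here resembles a real fMRI analysis pipeline:

```python
# Toy illustration of the fMRI subtraction logic (not a real pipeline):
# compare mean activity per "voxel" across two conditions.

def contrast(activity_a, activity_b):
    """Mean per-voxel difference between two lists of per-trial activity maps."""
    n_vox = len(activity_a[0])
    mean = lambda trials, v: sum(t[v] for t in trials) / len(trials)
    return [mean(activity_a, v) - mean(activity_b, v) for v in range(n_vox)]

# Hypothetical activity in three voxels (imagine BG, SMA, auditory cortex)
simple_rhythm = [[1.2, 1.1, 0.9], [1.3, 1.0, 1.0]]
complex_rhythm = [[0.6, 0.5, 0.9], [0.7, 0.6, 1.0]]

# Positive values mark voxels where the strong-beat condition exceeds the other
print(contrast(simple_rhythm, complex_rhythm))
```

Here the first two "voxels" show a positive difference (more activity for the beat-inducing rhythms) while the third does not, mirroring the kind of contrast map that would implicate the basal ganglia and SMA.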
A similar study revealed a distributed network that predicts individual differences in beat perception: strong beat perceivers display greater SMA, ventrolateral prefrontal cortex (PFC), and medial PFC activity than weak beat perceivers. The activated network in strong beat perceivers extended through frontal and motor areas, whereas the activated network in weak beat perceivers was largely limited to auditory areas (Chen, Zatorre, & Penhune, 2006). While the degree of motor response in these networks differs depending on the task and the stimuli, one conclusion is clear: rhythm perception recruits a network of motor areas, even when movements are not required.

The recruitment of the basal ganglia (specifically, the putamen) during beat perception was further confirmed by a later fMRI study that examined neural responses when the beat was induced by different types of accents, or emphases (Grahn & Rowe, 2009). The beat was either emphasized by changes in loudness on beat tones (strong external accents), marked by perceptual accents created by the timing of the tones without loudness changes, as in the metric simple stimuli described above (weak external accents), or left unaccented, so that any beat imposed by listeners was generated internally (internal accents). Regardless of the accent type used to induce beat perception, greater putamen activity was observed relative to control non-beat rhythms. Moreover, greater connectivity (taken as a measure of communication between brain areas) was observed between the putamen and the SMA, PMC, and auditory cortex. In musicians, external versus internal beats activated different subcomponents of the motor network, modulating connectivity between PMC and auditory cortex.


fMRI studies provide insights into the metabolic state of the brain under different conditions, but other physiological markers can also be used to decipher the processes involved in rhythm perception. TMS can be used to briefly activate the connections between the brain, spinal cord, and distal musculature, giving a snapshot of the body's corticospinal excitability at any given moment. Simply put, a strong magnetic field is induced at the surface of a participant's head using a powerful electric current, safely contained within an insulated coil. This field passes a few centimeters through the head and induces a small, localized electric field inside the brain that causes neurons within it to fire. This means experimenters can directly and non-invasively stimulate neuronal firing. The neuronal firing triggered by TMS delivered to primary motor cortex (M1) results in involuntary muscle twitches. Measuring at the muscle the amplitude of the electrical signal that causes it to contract, called a motor evoked potential (MEP), gives a reliable index of corticospinal excitability, or the motor system's readiness for action.

For example, stimulating M1 in pianists elicits greater-amplitude MEPs from hand muscles when they listen to a piece they have played compared to an unfamiliar piece (D'Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006), suggesting that their motor system automatically responds to pieces they have learned. Rhythm researchers have used this TMS-MEP logic to measure the motor system's excitability during beat perception. For example, MEP amplitudes measured from the ankle were greater when TMS pulses were delivered in time with the beats of strong beat compared to weak beat rhythms (Cameron, Stewart, Pearce, Grube, & Muggleton, 2012).
Increased MEP amplitude in response to the beat is in line with the aforementioned fMRI findings of greater basal ganglia and SMA activation during perception of simple compared to complex sequences. Increases in excitability also occur in response to music. In one study, musicians listened to "high-groove" or "low-groove" music while receiving TMS, where groove is "a musical quality that makes us want to move with the rhythm or beat" (Stupacher, Hove, Novembre, Schutz-Bosbach, & Keller, 2013, p. 127). MEP amplitudes were greater for high-groove than for low-groove music, and this effect was more pronounced for MEPs elicited by on-the-beat than by off-the-beat pulses, indicating that motor system readiness was greatest on the beat. Although these studies do not directly implicate the basal ganglia in rhythm perception (these structures lie too deep within the brain for the transcranial magnetic field to reach), the MEPs are measured downstream of the central nervous system, suggesting modulation of the entire motor system.

The last source of evidence for the role of the motor system in rhythm perception comes from neuropsychological cases: patients whose rhythm perception has been altered by a disease or disorder, or by lesions due to stroke. For example, Parkinson's disease (PD) is characterized by rigidity of movement, tremor, and slowness, and is caused by progressive deterioration of the dopaminergic pathway in the basal ganglia. Given the basal ganglia's role in rhythm perception reviewed above, and PD patients' documented difficulty with perception and production of isochronous rhythms (Harrington, Haaland, & Hermanowitz, 1998), Grahn and Brett (2009) surmised that this patient population would also display difficulties with beat perception.


Indeed, healthy older participants found it easier to discriminate simple, beat-based rhythms than complex rhythms, whereas PD patients did not display this advantage, indicating that they were less able to use the simple rhythms' beat-based structure to perform the discrimination task. The authors concluded that healthy basal ganglia appear to be necessary for processing rhythms with a strong beat. In a follow-up to this study, PD patients discriminated simple rhythms better (but complex rhythms worse) when on versus off L-DOPA, a medication that increases dopamine availability (Cameron, Pickett, Earhart, & Grahn, 2016). Rhythm discrimination performance was also correlated with the severity of the disease. Taken together, the results indicate that healthy dopaminergic function supports beat-based timing. Finally, the ability to adapt to changes in tempo is severely hampered by focal basal ganglia lesions due to stroke (Schwartze, Keller, Patel, & Kotze, 2011). Overall, then, these neuropsychological studies point to an essential role of the basal ganglia in normal rhythm perception.

Oscillatory Mechanisms

An alternative way of examining beat perception is through the neural dynamics of excitation and inhibition, which lead to cyclical activity changes in populations of neurons. These cyclical changes are called oscillations. Neuronal activity oscillates spontaneously in the brain, but when listeners receive rhythmic input, the phase and period of ongoing neural oscillations can be influenced to match, or phase-lock to, the incoming signal (e.g., Picton, John, Dimitrijevic, & Purcell, 2003; Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008). Indeed, rhythmic stimuli like music or language can act as a pacing signal that allows listeners to attend more accurately to relevant information in a continuous signal (Henry & Obleser, 2012). This finding is directly consistent with behavioral work in the dynamic attending literature showing fluctuations in attention over time (Jones & Boltz, 1989; Large & Jones, 1999). That is, attention fluctuates periodically, with peaks of concentrated attention emerging more strongly as a stimulus becomes more periodic. Oscillatory attentional and neural dynamics help explain a large body of literature in music cognition showing better performance for events (e.g., pitch or interval discrimination) that occur on a strong beat—when attention may be at the peak of its oscillatory phase—compared to a weak beat (for a recent review, see Henry & Herrmann, 2014).

The ongoing oscillatory neural dynamics of the brain also yield behavioral predictions for musical rhythm and beat perception. Neural resonance theory (Large, 2008; Large & Snyder, 2009) shows that the interaction of rhythmic input with a bank of ongoing neural oscillators can give rise to several key facets of human rhythm and beat perception described throughout this chapter. As described above, humans experience musical rhythm as a stable, regular pattern in time. However, the surface rhythm is often not periodic.
A rhythm may initially have several events that fall on the beat, but events will also fall on both strong and weak metrical positions (Figs. 1A and 1B) and may even begin to fall consistently on the off-beat, as is the case in syncopated rhythms (Fig. 1C). The mathematical model of neural resonance theory predicts that perception of the beat would nevertheless remain stable, because the initial rhythmic input from rhythms such as those in Figs. 1A and 1B resets the phase and period of ongoing neural oscillators, leading to a beat percept that is quite persistent, even in the face of conflicting evidence. Further, the physics of ongoing oscillators allows listeners to maintain a rhythmic pulse even in the absence of environmental input (e.g., through a silent gap in a song). Finally, a key prediction of neural resonance theory is that neural oscillations resonate with rhythmic input, producing peaks of activation at harmonics (e.g., 3:1 or 2:1) and subharmonics (e.g., 1:3 or 1:2) of the input rhythm. These harmonics and subharmonics are related to the way listeners hear perceptual accents on alternating events (see "Feeling the Beat"), placing stronger perceptual emphasis on certain tones, such as downbeats, in metrical groupings. Taken together, neural resonance theory explains how humans (a) perceive a regular pulse from irregular rhythmic input (i.e., in the absence of strictly periodic input, such as a metronome), (b) maintain the feeling of a pulse or beat that persists when sound ceases, and (c) experience alternations of strong and weak beats and organize music into a hierarchical metrical framework. That is, listeners "hear musical events in relation to these patterns because they are intrinsic to the physics of the neural systems involved in perceiving, attending, and responding to auditory stimuli" (Large & Snyder, 2009, p. 52).
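The persistence and phase-resetting behavior described above can be illustrated with a deliberately simplified sketch. This Python toy is our own simplification, not Large's mathematical model: the period, coupling constant, and correction rule are all assumptions. An internal "oscillator" nudges its predicted beat times toward nearby tone onsets and keeps ticking when the input stops:

```python
# Toy sketch of oscillator entrainment (an assumption-laden simplification,
# not neural resonance theory's actual equations): predicted beats are
# phase-corrected toward nearby onsets and persist through silence.

def entrain(onsets_ms, period_ms=500.0, coupling=0.5, n_beats=8):
    """Return predicted beat times (ms) for a list of tone onsets."""
    beat = onsets_ms[0] if onsets_ms else 0.0
    beats = []
    for _ in range(n_beats):
        beats.append(beat)
        nxt = beat + period_ms
        # pull the next predicted beat toward the nearest onset, if one is close
        near = [o for o in onsets_ms if abs(o - nxt) < period_ms / 2]
        if near:
            closest = min(near, key=lambda o: abs(o - nxt))
            nxt += coupling * (closest - nxt)
        beat = nxt
    return beats

# Tones every 500 ms, then silence: the predicted beats persist past 1500 ms.
print(entrain([0, 500, 1000, 1500]))
# → [0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]

# A late onset (1020 ms) is only partially absorbed by the phase correction.
print(entrain([0, 500, 1020, 1500]))
```

The second call shows the stability the theory predicts: a single deviant onset shifts the beat prediction only partway, rather than derailing the pulse.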
Neural oscillatory perspectives on rhythm perception have been fruitful because they capture the temporal dynamics of rhythm processing, rather than indexing rhythm processing at a single moment in time or as an average of brain activity over time. One popular way to examine beat perception has been the frequency-tagging approach. This methodology allows researchers to characterize the rates at which listeners hear strong events, or have heightened attention, in musical rhythms (e.g., events occurring at a rate of 2 or 3 Hz). A landmark study demonstrated that when participants heard a metronome (i.e., an evenly spaced, unaccented sequence of tones) but were asked to perceive it in groupings of two or three tones, there was a peak in the power of the EEG spectrum at the particular frequency they imagined. Thus, when participants heard the rhythm in groupings of two, there was greater power at the frequency corresponding to a binary grouping than at the slower frequency corresponding to a ternary grouping, and vice versa (Nozaradan, Peretz, Missal, & Mouraux, 2011). Similarly, when participants were trained to move to an ambiguous rhythm in either a duple or a triple meter, there was subsequently greater power in the EEG spectrum at the frequency they had moved to, even when they were simply listening and no longer moving (Chemin, Mouraux, & Nozaradan, 2014). Such findings demonstrate that beat perception is not simply stimulus-driven: listeners can and do impose a beat on a sequence, and this imposed beat can be observed in neural activity. Oscillatory activity in particular frequency bands—unrelated to the particular frequency of the input stimulus—is also important for characterizing rhythm processing.
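The logic of frequency tagging can be mimicked with synthetic data. In this Python sketch (our illustration; the 2.4 Hz event rate echoes rates used in this literature, but the accent model, sampling rate, and one-bin DFT are simplifying assumptions, and nothing here is the pipeline of Nozaradan and colleagues), a response that is larger on every second versus every third event produces a spectral peak at one half versus one third of the event rate:

```python
# Sketch of the frequency-tagging logic (ours, not the published analyses):
# accent every 2nd vs. every 3rd event and measure power at the grouping rate.
import cmath

FS = 200.0        # sampling rate in Hz (arbitrary choice)
EVENT_HZ = 2.4    # tone rate, similar to rates used in this literature

def accented_response(grouping, seconds=30.0):
    """Unit impulses at the event rate, doubled on every `grouping`-th event."""
    n = int(seconds * FS)
    x = [0.0] * n
    step = FS / EVENT_HZ
    for k in range(int(seconds * EVENT_HZ)):
        i = int(round(k * step))
        if i < n:
            x[i] = 2.0 if k % grouping == 0 else 1.0
    return x

def power_at(x, freq):
    """Power of signal x at a single frequency (a one-bin DFT)."""
    acc = sum(v * cmath.exp(-2j * cmath.pi * freq * i / FS)
              for i, v in enumerate(x))
    return abs(acc) ** 2

binary = accented_response(grouping=2)
ternary = accented_response(grouping=3)

# Binary accenting boosts power at EVENT_HZ/2; ternary at EVENT_HZ/3.
print(power_at(binary, EVENT_HZ / 2) > power_at(ternary, EVENT_HZ / 2))   # True
print(power_at(ternary, EVENT_HZ / 3) > power_at(binary, EVENT_HZ / 3))   # True
```

The point of the sketch is the analysis logic, not the signal model: a subjective duple or triple grouping adds energy at the grouping frequency, which a spectral peak then reveals.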


When researchers examined induced activity (i.e., not phase-locked to event onsets) rather than evoked activity (i.e., phase-locked to stimulus onsets) using electroencephalography, they found that high-frequency oscillations from 20 to 60 Hz followed the pattern observed in many behavioral studies of rhythm processing: listeners do not simply react to note onsets after they occur. Instead, they anticipate a beat and even "feel" that beat when a note is occasionally omitted. There was a peak in induced high-frequency oscillations in anticipation of tone onsets, even when the tone was omitted (Snyder & Large, 2005). Further studies found separate functional roles for the beta (15–30 Hz) and gamma (30–80 Hz) frequency bands (Fujioka, Trainor, Large, & Ross, 2009). Activity in the beta band reflected motor processing and was important for coordinating auditory-motor interactions when processing the beat in music, whereas gamma-band activity was associated with the same endogenous, anticipatory processing of the beat found in previous studies. In a follow-up study, Fujioka and colleagues (Fujioka, Trainor, Large, & Ross, 2012) found that induced beta-band activity increased in anticipation of the beat and varied with tempo, further suggesting an endogenous generator. In contrast, a sharp decrease in induced beta occurred immediately after the onset of the beat, but this decrease followed the same pattern regardless of stimulus presentation characteristics, suggesting that beta desynchronization was simply a response to hearing a tone and did not reflect anticipation. Importantly, this activity originates both from auditory cortex and from sensorimotor cortex, again highlighting the role beta-band activity plays in coordinating auditory-motor interactions (Fujioka et al., 2012).
These auditory-motor interactions in the beta band could have important consequences for the preparation of movements in more ecologically valid musical experiences, such as when a listener grooves along with the music. Again, these studies highlight that rhythm processing is not just a faithful tracking of acoustic input, but involves the perception of beat- and meter-related periodicities that are not necessarily part of the stimulus. They also highlight the integral role motor processing plays in the perception of the beat in music, even when listeners are not moving or tapping along. There is growing interest in neuroscientific investigations of rhythm processing that characterize the way humans entrain to music by looking at oscillatory dynamics at the beat frequency and by considering auditory-motor dynamics in other frequency bands. While these approaches have shed considerable light on the ways listeners perceive rhythm, it is important to consider carefully whether differences in peak power of the EEG spectrum reflect beat perception or stimulus characteristics (Henry, Herrmann, & Grahn, 2017). Neural resonance theory has shown that many facets of beat perception in humans emerge naturally from the physical interactions of multiple internal oscillators with rhythmic input, but it does not explain all aspects of beat perception, such as how children learn to become better perceivers of the beat in music, or how beat perception changes with culture or musical experience. Future research is needed to understand how oscillatory activity in the brain interacts with musical experience, and how such experiences are maintained or weighted in such a dynamical system.



Language and Music

Music and language are both important forms of human communication. Although there are many similarities between the two domains, a key similarity for rhythm processing is that music and language both unfold sequentially in time and are hierarchically structured (Patel, 2003). Yet the temporal characteristics of how they unfold differ (Ding et al., 2017). As is clear from the discussion thus far, listeners perceive a musical beat that, despite surface irregularities, leads to the perception of beat events unfolding at regular intervals, with alternating strong and weak beats according to the meter of the music. In language, there is a long history of debate over whether speech rhythm contains isochronous units, either between successive syllable onsets or between successive stressed-syllable onsets. However, after careful annotation of speech intervals at consonant and vowel onsets, little evidence has been found for regularity between syllables or stressed syllables in the acoustic signal, although other patterns of more or less vocalic variability did emerge (Grabe & Low, 2002; but see Brown, Pfordresher, & Chow, 2017). Despite the lack of evidence for isochronous intervals in speech, spoken utterances still contain rhythmic peaks in the acoustic signal, albeit less regular than in music, that are important for helping listeners form expectations. There is growing evidence that better neural tracking of rhythmic syllable onsets in speech supports language comprehension (e.g., Peelle, Gross, & Davis, 2013). Given the importance of rhythmic relationships in language, there is growing interest in how musical training or musical ability relates to language abilities in a wide range of listeners. A large body of literature has focused on the relationship between reading and rhythm processing.
For instance, compared to age- and reading-level-matched peers, individuals with dyslexia are worse at neurally tracking low-frequency (e.g., delta and theta band) temporal information in the speech signal (Power, Colling, Mead, Barnes, & Goswami, 2016). Delta (0–4 Hz) and theta (4–8 Hz) frequencies roughly correspond to the rates at which phrases and syllables, respectively, unfold in the speech stream. Some researchers have even posited that the deficits in phonological awareness and analysis seen in developmental dyslexia are actually caused by temporal processing deficits. In particular, some evidence suggests that adults with dyslexia oversample the speech stream in high-frequency oscillatory bands that may be related to phonological onsets, leading to greater power at frequencies that may be irrelevant for processing phonetic information in speech (Lehongre, Ramus, Villiermet, Schwartz, & Giraud, 2011). Further evidence of a relationship between language and temporal processing abilities comes from individuals with specific language impairments, including dyslexia, for whom there are positive correlations between musical training and language outcomes, with unique predictive power coming from rhythm perception skills (Flaugnacco et al., 2015; Habib et al., 2016; Zuk et al., 2017). Enhanced language processing as a result of musical ability may be particularly related to beat-based processing abilities, and not to better encoding of rhythmic intervals in general, as studies have shown that regularity detection is particularly related to language and literacy in adults from a wide range of language backgrounds (Bekius, Cope, & Grube, 2016; Grube, Cooper, & Griffiths, 2013).

While the studies outlined above show that neurally following rhythmic input in speech is important for developing normal language and reading skills, and that music training seems to be related to behavioral language outcomes, very few studies have measured beat perception or production abilities and related them directly to neural tracking of rhythmic input. One study has shown an association between rhythm production abilities in preschoolers and encoding of the fundamental frequency of a single utterance (i.e., "da") through the auditory brainstem response (Woodruff-Carr, White-Schwoch, Tierney, Strait, & Kraus, 2014). Further research is necessary to establish a link between rhythm perception or production and neural tracking of low-frequency information in speech. Tracking the rhythmic fluctuations in the amplitude envelope of speech may also differ from the types of rhythmic entrainment discussed above. It is therefore important to determine whether the language and reading findings rely on mechanisms similar to musical rhythm entrainment, such that neural activity can remain entrained to the syllable rate even when the utterance is removed, or whether they reflect a more stimulus-dependent form of rhythmic processing than musical rhythm.

Development of Rhythm

Rhythm perception is an important skill for myriad domains, including music, language, and movement. The ubiquity of rhythmic information in our everyday environment highlights the importance of developing rhythm processing skills early in development. Indeed, rhythm processing also seems to be important for social-emotional development. Children who are able to synchronize to music show not only better parsing of events unfolding in time, but also better social-emotional processing as a result of synchronization. After playing musical instruments together, 4-year-olds showed higher rates of spontaneous helping compared to children who were not encouraged to synchronize their actions with a partner (Kirschner & Tomasello, 2010). A similar pattern can be found as young as 14 months of age in a paradigm that induces synchrony between a child and an adult by having the experimenter bounce the child in an infant carrier in synchrony with another adult (Cirelli, Einarson, & Trainor, 2014). Infants bounced synchronously showed more prosocial helping behaviors than children bounced out of synchrony.

It is clear that beat processing is advantageous for normal development, but indexing beat perception in infancy is difficult given young infants' limited ability to make overt behavioral responses. This is where neural measures are particularly useful for examining beat perception at the earliest stages of development. Newborns listening to musical sequences show larger neural mismatch responses when an omission occurs on a strong beat compared to a weak beat (Winkler, Haden, Ladinig, Sziller, & Honing, 2009), providing evidence for beat processing in humans from birth. Further, using the frequency-tagging approach described above, infants who heard an ambiguous rhythm that could be perceived in either a duple or a triple meter had spectral peaks corresponding to the beat and to both metrical frequencies (Cirelli, Spinelli, Nozaradan, & Trainor, 2016). However, infants with either more experience in music classes or more musically engaged parents showed greater peaks in the EEG spectrum related to duple compared to triple meter perception (Brochard et al., 2003). This finding is in line with culture-specific patterns suggesting that Western listeners prefer simple integer ratios, with a bias toward duple meter groupings.

Later in childhood, even when children are capable of making behavioral or motor responses, the immaturity of their motor system makes it unclear whether a lack of beat perception abilities or motor immaturity is at the root of differences between the way children and adults process rhythm and beat. For instance, although children move when they hear music, there is little evidence that they actually synchronize their movements to the beat, which makes it unclear whether children are poor beat perceivers or simply poor dancers. As described above, beta-band activity reflects auditory-motor interactions, and researchers have used this approach to show that beat processing may not become mature until after age 7.
Seven-year-olds' beta-band activity during beat processing showed the adult-like pattern of desynchronization and subsequent anticipatory rebound only for slow rhythms, not fast rhythms (Cirelli, Bosnyak, et al., 2014). These beta-band findings align well with behavioral evidence that sensorimotor synchronization with music does not reach adult-like accuracy until 8 or 9 years of age (McAuley, Jones, Holub, Johnston, & Miller, 2006). Together, these findings suggest that beat perception is intact from birth, but that the auditory-motor processing capabilities required to synchronize movements to music continue to develop well into childhood.

Comparative Psychology and Evolution of Rhythm Perception

Clues to the neural mechanisms of rhythm perception can be gleaned from comparative studies between humans and other species that have similar abilities but different brains. Some of the best examples of rhythmic entrainment come from various bird species, such as cockatiels (Patel, Iversen, Bregman, & Schulz, 2009), certain parrots (Schachner, Brady, Pepperberg, & Hauser, 2009), and budgerigars (Hasegawa, Okanoya, Hasegawa, & Seki, 2011), all of which can bob their heads in time with a simple rhythm.


Although none of these animals appears to match human sophistication in rhythmic entrainment—for example, they have difficulty with complex rhythms or with adapting to novel tempos—their ability to synchronize with simple rhythms has led some researchers to theorize that beat perception is a corollary of the capacity for vocal learning (Patel, 2006). Although this idea is further supported by the presence of simple synchronization abilities in other vocal, non-bird species such as bonobos, chimpanzees, and possibly elephants (Hattori, Tomonaga, & Matsuzawa, 2013; Large & Gray, 2015; Poole, Tyack, Stoeger-Horwath, & Watwood, 2005), recent demonstrations of rhythmic entrainment in a sea lion—a species not known for vocal learning—pose complications for the theory (Cook, Rouse, Wilson, & Reichmuth, 2013). The sea lion not only synchronizes to simple rhythms; she also satisfies more stringent tests of rhythmic entrainment previously observed only in humans, such as adapting to changes in tempo.

Most cross-species research is done on monkeys rather than on better vocal learners (like the aforementioned bird species) for reasons of convenience (e.g., established monkey neurophysiology labs, similar brains) and closer evolutionary ancestry. Monkey rhythmic entrainment is impoverished by comparison to humans': macaques can time intervals very accurately (Zarco, Merchant, Prado, & Mendez, 2009) and can synchronize with simple isochronous sequences, but their actions are more reactionary than anticipatory. Online measures of neural activity (i.e., local field potentials, LFPs) during synchronization tasks indicate that monkeys' putaminal cells are interval-sensitive, with different populations representing different durations through bursts of gamma- or beta-band oscillations (Bartolo, Prado, & Merchant, 2014).
Thus, monkeys are good at timing the individual intervals that make up a rhythm, but they do not appear to synchronize as accurately as humans do when multiple intervals are presented in sequence. Monkeys also do not appear to process non-isochronous rhythms the way humans do. EEG studies in monkeys show no event-related potentials (ERPs) corresponding to unexpected events (as indexed by the mismatch negativity, or MMN, to unexpected beat omissions); the MMN reflects the detection of something out of place, and if the monkeys do not perceive the beat in the rhythm, then the omissions are not out of place (Honing, Merchant, Háden, Prado, & Bartolo, 2012). Simple rhythmic deviants do, however, elicit changes in gaze and expression, and in auditory cortex LFPs (Selezneva et al., 2013). Structurally, sequential timing tasks in humans rely on the motor cortico-basal-ganglia-thalamo-cortical (mCBGT) circuit. The monkey analogue of this network also appears to be heavily involved in motor timing and sequencing (Merchant, Pérez, Zarco, & Gámez, 2013). However, the reciprocal connections between auditory cortex and the mCBGT circuit that exist in humans are not matched in monkeys. Instead, the monkey mCBGT appears to be more strongly connected to visual cortex, which may explain why monkeys lack strong rhythmic entrainment and perform better on visual than auditory synchrony tasks (for a review, see Merchant & Honing, 2014). This structural discrepancy, along with the behavioral differences between humans and monkeys, may explain why strong rhythmic entrainment appears to be a decidedly human ability.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

neural basis of rhythm perception   177

Cross-Modal Investigations of Rhythm Perception

Thus far, this chapter has focused on rhythm perception in the auditory modality. However, rhythms include temporally patterned stimuli in any modality. For example, the isochronous blinking of a car’s turn signal is a visual rhythm, and your phone’s vibrating notification is a tactile rhythm. In this section, we will discuss how rhythm is perceived in non-auditory modalities, focusing on vision. Predictably, the neural correlates of rhythm perception differ between modalities, but some are also shared. These shared substrates might be a clue to the neural representation of rhythm in a pure, temporal sense, uncontaminated by modality-specific processing: the sine qua non of rhythm perception. Like audition, vision is sensitive to temporal regularities in the environment. For example, visual-spatial attention is biased toward reliably repeating patterns (Zhao, Al-Aidroos, & Turk-Browne, 2013). Unlike auditory rhythms, rhythmic visual stimuli, such as a blinking dot, do not give rise to a strong sense of beat: Auditory rhythms are reproduced and remembered better than visual ones (Glenberg, Mann, Altman, Forman, & Procise, 1989, and Collier & Logan, 2000, respectively). While it is true that audition generally has better temporal sensitivity than vision (e.g., Goldstone & Lhamon, 1972), this does not explain why auditory rhythms give rise to a sense of beat and visual ones do not. Recently, researchers have instantiated visual rhythms with more dynamic stimuli in an attempt to capitalize on the visual system’s sensitivity to motion and acceleration. A blinking stimulus isn’t visually natural, but a moving one is. Concordantly, rotating bars and bouncing balls can give rise to a sense of beat in a manner similar to auditory stimuli (Grahn, 2012; Hove, Iversen, Zhang, & Repp, 2013; Iversen, Patel, Nicodemus, & Emmorey, 2015).
Even more naturalistic stimuli, like watching a dancer or following a conductor’s baton, give rise to timing advantages illustrative of beat perception (Luck & Sloboda, 2009; Su & Salazar-López, 2016). The message from this new literature is that although audition wins over other modalities for temporal processing superiority, rhythm processing is possible in other modalities when the stimuli are crafted to follow that sense’s priorities.

Given that visual rhythm processing is possible, how does the brain do it? One possibility is that visual rhythm processing piggy-backs on the rhythmically superior auditory and motor resources. According to this view, visual rhythm perception involves the creation of an internal auditory rhythm to accompany visual stimuli. Evidence for this perspective was demonstrated in an fMRI task where participants watched or heard rhythmic stimuli in counterbalanced blocks of a tempo adaptation task; visual sequences produced a stronger sense of beat and stronger bilateral putamen activity when preceded by the auditory task block versus with no prior auditory experience with the task (Grahn, Henry, & McAuley, 2011). This change in brain response during the visual task following the auditory block resembled the activation observed in auditory tasks alone (Grahn & Brett, 2007). When the visual task preceded the auditory block, there was no enhancement to rhythm perception or brain response in the basal ganglia, indicating that the effect was not simply due to practice. This study used blinking visual rhythms that do not readily elicit rhythm perception, so the authors suggested that the observed behavior and brain responses reflected the co-opting of typical auditory rhythm perception to achieve the perception of a visual rhythm. In a later fMRI study with discrete and moving visual and auditory stimuli, this putamen activity was shown to reflect a supra-modal rhythm perception response: Activity in the putamen corresponded to the strength of synchrony with an ongoing rhythm, regardless of the modality and without prior auditory experience with the stimuli (Hove, Fairhurst, Kotz, & Keller, 2013). This idea of a supra-modal, or modality-general, process underpinning rhythm perception received further support from a study measuring ERPs in response to temporal expectancy violations in an adaptive tempo task with auditory and visual stimuli (Pasinski, McAuley, & Snyder, 2016). ERP amplitudes were larger in the auditory task, but the pattern of responses was similar across modalities, again suggesting a modality-general rhythm perception network, likely rooted in the basal ganglia and the motor system.

Individual Differences and Musical Training

Rhythm processing abilities vary widely in the general population. It is not difficult to run into someone who proclaims that she has two left feet, and there is evidence of individuals who are actually “beat deaf.” These individuals cannot align their movements to the beat of a musical piece despite being able to synchronize to a metronome (Phillips-Silver et al., 2011). Differences in experience, such as music training, can enhance rhythm perception and production, but individual differences in abilities associated with beat perception can also lead people to encode, store, and act on auditory information in different ways. For instance, individual differences in auditory short-term memory (STM) and regularity detection are associated with better rhythm abilities, especially when reproducing longer rhythms (Grahn & Schuit, 2012). Music training also accounts for unique variance in rhythm reproduction abilities compared to auditory STM and regularity detection, although musical training may only influence rhythm perception abilities in certain tasks (Bauer, Kreutz, & Herrmann, 2015; Grahn & Brett, 2007; Geiser, Ziegler, Jancke, & Meyer, 2009). Regularity detection is also correlated with activation in auditory-motor areas, including left SMA and left dorsal and ventral premotor areas, which may indicate that people who are better at detecting the beat in music also rely more heavily on transforming rhythms into auditory-motor representations instead of relying purely on auditory cues (Grahn & Schuit, 2012). These findings are similar to previous work showing that strong beat perceivers showed greater activation in SMA than weak beat perceivers when listening to an ambiguous rhythm (Grahn & McAuley, 2009). Individual motor abilities may also be important for predicting individual differences in preferred tempo (typically around 120 bpm, or 2 Hz), which is the rate at which listeners feel most comfortable tapping to music or a metronome (McAuley et al., 2006). An individual’s specific peak frequency in the beta range, assessed during a motor tapping task, predicts preferred tempo (Bauer et al., 2015), providing additional evidence that auditory-motor interactions can lead to differences in the way that people prefer to entrain to music.

Although much of the literature on rhythm and beat perception makes claims about commonalities across individuals in the neural processing of rhythms, there is considerable variation in the way humans respond to rhythms. These individual differences are particularly important to consider when trying to use rhythm as a therapeutic tool, as in patients with Parkinson’s disease (PD). Although rhythmic stimulation may have seemingly miraculous effects for some individuals, there are many others for whom rhythmic stimulation may have no effect, or perhaps even a negative effect on gait (Leow, Parrott, & Grahn, 2014; Nombela et al., 2013). Further research is necessary to characterize the factors underlying these individual differences in the neural processing of rhythm, including auditory-motor interactions, musical background, and biological variation, so that interventions can be better targeted to the individual.
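The equivalence between musical tempo and oscillation frequency used here (120 bpm = 2 Hz) is a plain unit conversion; a minimal sketch (the function names are ours, for illustration only) makes the mapping explicit:

```python
def bpm_to_hz(bpm):
    """Beats per minute -> beat frequency in cycles per second (Hz)."""
    return bpm / 60.0

def bpm_to_ioi_ms(bpm):
    """Beats per minute -> inter-onset interval in milliseconds."""
    return 60000.0 / bpm

# The commonly reported preferred tempo:
print(bpm_to_hz(120))      # 2.0 (Hz)
print(bpm_to_ioi_ms(120))  # 500.0 (ms between beats)
```

This conversion is what lets a peak EEG frequency (in Hz) be compared directly with a preferred tapping tempo (in bpm).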

Mirroring and Joint Action

So far, we have examined rhythm from the perspective of the perceiver and his or her brain. Realistically, rhythms must also have creators, making rhythm perception an inherently social topic: It depends upon the perception of others’ actions. One of the themes of this chapter has been the contribution of the motor system to the perception of rhythm; unsurprisingly, it may be through the shared architecture of our motor systems that we perceive music and rhythm so fluently when expressed by other people.

The idea of motor system involvement in rhythm perception follows from the discovery of the mirror neuron system in monkeys and analogous systems in humans (see Rizzolatti & Craighero, 2004, for a review): a network that responds not only to one’s own movements, but also to seeing or hearing the movements of others. This discovery was rapidly adapted to explain motor simulation: the idea that we unconsciously mimic, or concurrently represent, others’ movements within our own motor system (Gallese & Goldman, 1998). It is useful to think of motor simulation as a way to represent observed actions by the same motoric structures that execute them. Later, motor simulation was employed to explain the empathic nature of movement in art (Freedberg & Gallese, 2007). Evidence for the shared representation of action in art observation has been demonstrated in dance (Cross, Hamilton, & Grafton, 2006) and painting (Leder, Bär, & Topolinski, 2012; Taylor, Witt, & Grimaldi, 2012), but it is most prominently espoused in music, explaining findings such as the automatic activation of hand-controlling motor areas in pianists while listening to piano performance (Haueisen & Knösche, 2001), the co-activation of auditory areas in violinists when they mimic violin actions (Lotze, Scheler, Tan, Braun, & Birbaumer, 2003), and various effects describing interference between music listening and musical performance, which occurs because both processes depend upon activation of the motor system (e.g., Drost, Rieger, Brass, Gunter, & Prinz, 2005; Drost, Rieger, & Prinz, 2007; Taylor & Witt, 2015).

As we have seen, rhythm perception involves the motor system: Feeling the beat is an inherently motoric phenomenon. We can apply the logic of motor simulation to the challenging demands of timing and joint action in music. Perceiving rhythm in a social setting may also require the concurrent representation of others’ actions in the listener’s motor system. In an inventive TMS study, pianists were required to play the right-hand part of a duet whose left-hand part they had either rehearsed at an earlier time or not. This left-hand part would undergo regular changes in timing that the subject had to adapt to. Right-hemisphere (read: left-hand) TMS interfered with tempo adaptation only for duets in which the subject had previously rehearsed the left-hand accompanying part. This indicates that keeping time with a duet partner involves the online co-representation of that partner’s part, which was disrupted by the TMS (Novembre, Ticini, Schütz-Bosbach, & Keller, 2013). This is evidence for the role of motor simulation in the perception of rhythm and timing during joint musical action. Such flexible adaptation is not surprising given the motor system’s ability to represent the temporal dynamics of observed actions (Press, Cook, Blakemore, & Kilner, 2011). This co-representation of observed and executed actions is important for any kind of rhythmic cooperation.
To study rhythmic cooperation, a group of researchers created a virtual partner in an adaptive timing task so that the degree of timing cooperation could be tightly controlled. Subjects tapping along with the virtual partner’s changing rhythm exhibited increased activity in premotor areas when the partner was cooperative versus difficult to follow along with, suggesting that this kind of rhythmic co-action depends on simulated internal representations of the co-actor (Fairhurst, Janata, & Keller, 2012).

Conclusion

As we have seen, the neuroscience of rhythm is a vibrant field of study that can be approached from many angles. Despite this variety, there has been a common thread throughout this chapter: the involvement of motor processes during the perception of rhythm and beat. Given the role of the motor system in fundamental timing processes, it is not surprising that similar networks should become involved in the perception of rhythm. What interests us is the reliability and the variety of motor system participation in rhythm processing, whether it is the recruitment of the basal ganglia during the perception of strong beats as revealed by fMRI (see section on “Feeling the Beat”), the heightened corticospinal excitability of toe-tapping as revealed by TMS (see section on “Feeling the Beat”), the auditory-motor coordination of beta-band patterns recorded from EEG (see section on “Oscillatory Mechanisms”), the co-development of movement and rhythm production in children (see section on “Development of Rhythm”), or the use of others’ actions to guide joint action (see section on “Mirroring and Joint Action”). These are just a few of the many ways in which the auditory and motor systems interact to produce the rich experience of rhythm perception and production. Promising new research offers these auditory-motor interactions as the basis for therapies to help patients with neurodegenerative diseases of the basal ganglia (e.g., Spaulding et al., 2013), or patients with developmental language disorders, such as dyslexia (e.g., Flaugnacco et al., 2015). Our ability to neurally process rhythms is not only important for clapping along to our favorite song, but also for examining fundamental psychological questions, ranging from individual differences in perception and production to what distinguishes humans from other species. Future work on the neural bases of rhythm perception has the potential to inform a wide range of domains, including aesthetics, evolution, and human perception and production.

References

Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11), 3910–3923.
Bauer, A. K., Kreutz, G., & Herrmann, C. S. (2015). Individual musical tempo preference correlates with EEG beta rhythm. Psychophysiology 52(4), 600–604.
Bekius, A., Cope, T., & Grube, M. (2016). The beat to read: A cross-lingual link between rhythmic regularity perception and reading skill. Frontiers in Human Neuroscience 10, 425. Retrieved from https://doi.org/10.3389/fnhum.2016.00425
Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological Science 14(4), 362–366.
Brown, S., Pfordresher, P. Q., & Chow, I. (2017). A musical model of speech rhythm. Psychomusicology: Music, Mind, and Brain 27(2), 95–112.
Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic medication on beat-based auditory timing in Parkinson’s disease. Frontiers in Neurology 7, 19. Retrieved from https://doi.org/10.3389/fneur.2016.00019
Cameron, D. J., Stewart, L., Pearce, M. T., Grube, M., & Muggleton, N. G. (2012). Modulation of motor excitability by metricality of tone sequences. Psychomusicology: Music, Mind, and Brain 22(2), 122–128.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural representation of musical rhythms. Psychological Science 25(12), 2147–2159.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781.
Cirelli, L. K., Bosnyak, D., Manning, F. C., Spinelli, C., Marie, C., Fujioka, T., . . . Trainor, L. J. (2014). Beat-induced fluctuations in auditory cortical beta-band activity: Using EEG to measure age-related changes. Frontiers in Psychology 5, 742. Retrieved from https://doi.org/10.3389/fpsyg.2014.00742


Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science 17(6), 1003–1011.
Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring neural entrainment to beat and meter in infants: Effects of music background. Frontiers in Neuroscience 10, 229. Retrieved from https://doi.org/10.3389/fnins.2016.00229
Collier, G. L., & Logan, G. (2000). Modality differences in short-term memory for rhythms. Memory & Cognition 28(4), 529–538.
Collier, G. L., & Wright, C. E. (1995). Temporal rescaling of simple and complex ratios in rhythmic tapping. Journal of Experimental Psychology: Human Perception and Performance 21(3), 602–627.
Cook, P., Rouse, A., Wilson, M., & Reichmuth, C. (2013). A California sea lion (Zalophus californianus) can keep the beat: Motor entrainment to rhythmic auditory stimuli in a non-vocal mimic. Journal of Comparative Psychology 127(4), 412–427.
Cooper, G., & Meyer, L. B. (1960). The rhythmic structure of music. Chicago, IL: University of Chicago Press.
Cross, E. S., Hamilton, A. F. D. C., & Grafton, S. T. (2006). Building a motor simulation de novo: Observation of dance by dancers. NeuroImage 31(3), 1257–1267.
D’Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Drost, U. C., Rieger, M., Brass, M., Gunter, T. C., & Prinz, W. (2005). Action-effect coupling in pianists. Psychological Research 69(4), 233–241.
Drost, U. C., Rieger, M., & Prinz, W. (2007). Instrument specificity in experienced musicians. Quarterly Journal of Experimental Psychology 60(4), 527–533.
Epstein, D. (1995). Shaping time: Music, the brain, and performance. New York: Macmillan.
Essens, P. J. (1986). Hierarchical organization of temporal patterns. Perception & Psychophysics 40(2), 69–73.
Fairhurst, M. T., Janata, P., & Keller, P. E. (2012). Being and feeling in sync with an adaptive virtual partner: Brain mechanisms underlying dynamic cooperativity. Cerebral Cortex 23(11), 2592–2600.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in Cognitive Sciences 11(5), 197–203.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2009). Beta and gamma rhythms in human auditory cortex during musical beat processing. Annals of the New York Academy of Sciences 1169, 89–92.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences 2(12), 493–501.
Geiser, E., Ziegler, E., Jancke, L., & Meyer, M. (2009). Early electrophysiological correlates of meter and rhythm processing in music perception. Cortex 45(1), 93–102.


Glenberg, A. M., Mann, S., Altman, L., Forman, T., & Procise, S. (1989). Modality effects in the coding reproduction of rhythms. Memory & Cognition 17(4), 373–383.
Goldstone, S., & Lhamon, W. T. (1972). Auditory-visual differences in human temporal judgment. Perceptual and Motor Skills 34(2), 623–633.
Grabe, E., & Low, L. (2002). Durational variability in speech and the rhythm class hypothesis. In N. Warner & C. Gussenhoven (Eds.), Papers in laboratory phonology 7 (pp. 515–546). Berlin: Mouton de Gruyter.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Grahn, J. A., & Brett, M. (2009). Impairment of beat-based rhythm discrimination in Parkinson’s disease. Cortex 45(1), 54–61.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243.
Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual difference in beat perception. NeuroImage 47(4), 1894–1903.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Schuit, D. (2012). Individual difference in rhythmic ability: Behavioral and neuroimaging investigations. Psychomusicology: Music, Mind, and Brain 22(2), 105–121.
Grube, M., Cooper, F. E., & Griffiths, T. D. (2013). Auditory temporal-regularity processing correlates with language and literacy skill in early adulthood. Cognitive Neuroscience 3(3–4), 225–230.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and dyslexia: A new musical training method to improve reading and related disorders. Frontiers in Psychology 7, 26. Retrieved from https://doi.org/10.3389/fpsyg.2016.00026
Harrington, D. L., Haaland, K. Y., & Hermanowitz, N. (1998). Temporal processing in the basal ganglia. Neuropsychology 12(1), 3–12.
Hasegawa, A., Okanoya, K., Hasegawa, T., & Seki, Y. (2011). Rhythmic synchronization tapping to an audio-visual metronome in budgerigars. Scientific Reports 1, 120. doi:10.1038/srep00120
Hattori, Y., Tomonaga, M., & Matsuzawa, T. (2013). Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Scientific Reports 3, 1566. doi:10.1038/srep01566
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Henry, M. J., & Herrmann, B. (2014). Low-frequency neural oscillations support dynamic attending in temporal context. Timing and Time Perception 2(1), 62–86.
Henry, M. J., Herrmann, B., & Grahn, J. A. (2017). What can we learn about beat perception by comparing brain signals and stimulus envelopes? PLoS ONE 12(2), e0172454.
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences 109(49), 20095–20100.
Honing, H., Merchant, H., Háden, G. P., Prado, L., & Bartolo, R. (2012). Rhesus monkeys (Macaca mulatta) detect rhythmic groups in music, but not the beat. PLoS ONE 7(12), e51369.


Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013). Synchronizing with auditory and visual rhythms: An fMRI assessment of modality differences and modality appropriateness. NeuroImage 67, 313–321.
Hove, M. J., Iversen, J. R., Zhang, A., & Repp, B. H. (2013). Synchronization with competing visual and auditory rhythms: Bouncing ball meets metronome. Psychological Research 77(4), 388–398.
Iversen, J. R., Patel, A. D., Nicodemus, B., & Emmorey, K. (2015). Synchronization to auditory and visual rhythms in hearing and deaf individuals. Cognition 134, 232–244.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-year-old children. Evolution and Human Behavior 31(5), 354–364.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), Psychology of time (pp. 189–231). Bingley: Emerald.
Large, E. W., & Gray, P. M. (2015). Spontaneous tempo and rhythmic entrainment in a bonobo (Pan paniscus). Journal of Comparative Psychology 129(4), 317–328.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57.
Leder, H., Bär, S., & Topolinski, S. (2012). Covert painting simulations influence aesthetic appreciation of artworks. Psychological Science 23(12), 1479–1481.
Lehongre, K., Ramus, F., Villiermet, N., Schwartz, D., & Giraud, A.-L. (2011). Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72(6), 1080–1090.
Leow, L.-A., Parrott, T., & Grahn, J. A. (2014). Individual differences in beat perception affect gait responses to low- and high-groove music. Frontiers in Human Neuroscience 8, 1–12. Retrieved from https://doi.org/10.3389/fnhum.2014.00811
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lotze, M., Scheler, G., Tan, H. R., Braun, C., & Birbaumer, N. (2003). The musician’s brain: Functional imaging of amateurs and professionals during performance and imagery. NeuroImage 20(3), 1817–1829.
Luck, G., & Sloboda, J. A. (2009). Spatio-temporal cues for visually mediated synchronization. Music Perception: An Interdisciplinary Journal 26(5), 465–473.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General 135(3), 348–367.
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274. Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merchant, H., Pérez, O., Zarco, W., & Gámez, J. (2013). Interval tuning in the primate medial premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096.
Nombela, C., Rae, C. L., Grahn, J. A., Barker, R. A., Owen, A. M., & Rowe, J. B. (2013). How often does music and rhythm improve patients’ perception of motor symptoms in Parkinson’s disease? Journal of Neurology 260(5), 1404–1405.


Novembre, G., Ticini, L. F., Schütz-Bosbach, S., & Keller, P. E. (2013). Motor simulation and the coordination of self and other in real-time joint action. Social Cognitive and Affective Neuroscience 9(8), 1062–1068.
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience 31(28), 10234–10240.
Pasinski, A. C., McAuley, J. D., & Snyder, J. S. (2016). How modality specific is processing of auditory and visual rhythms? Psychophysiology 53(2), 198–208.
Patel, A. D. (2003). Rhythm in language and music. Annals of the New York Academy of Sciences 999, 140–143.
Patel, A. D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception: An Interdisciplinary Journal 24(1), 99–104.
Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology 19(10), 827–830.
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and modality on synchronization with a beat. Experimental Brain Research 163(2), 226–238.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex 23(6), 1378–1387.
Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piche, O., Nozaradan, S., Palmer, C., & Peretz, I. (2011). Born to dance but beat deaf: A new form of congenital amusia. Neuropsychologia 49(5), 961–969.
Picton, T. W., John, M. S., Dimitrijevic, A., & Purcell, D. (2003). Human auditory steady-state responses. International Journal of Audiology 42, 177–219.
Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Animal behaviour: Elephants are capable of vocal learning. Nature 434(7032), 455–456.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception: An Interdisciplinary Journal 2(4), 411–440.
Povel, D. J., & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics 30(6), 565–572.
Power, A. J., Colling, L. J., Mead, N., Barnes, L., & Goswami, U. (2016). Neural encoding of the speech envelope by children with developmental dyslexia. Brain and Language 160, 1–10.
Press, C., Cook, J., Blakemore, S. J., & Kilner, J. (2011). Dynamic modulation of human motor activity when observing actions. Journal of Neuroscience 31(8), 2792–2800.
Ravignani, A., Delgado, T., & Kirby, S. (2017). Musical evolution in the lab exhibits rhythmic universals. Nature Human Behaviour 1(1), 0007. doi:10.1038/s41562-016-0007
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience 27, 169–192.
Schachner, A., Brady, T. F., Pepperberg, I. M., & Hauser, M. D. (2009). Spontaneous motor entrainment to music in multiple vocal mimicking species. Current Biology 19(10), 831–836.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences 12(3), 106–113.
Schwartze, M., Keller, P. E., Patel, A. D., & Kotz, S. A. (2011). The impact of basal ganglia lesions on sensorimotor synchronization, spontaneous motor tempo, and the detection of tempo changes. Behavioural Brain Research 216(2), 685–691.
Selezneva, E., Deike, S., Knyazeva, S., Scheich, H., Brechmann, A., & Brosch, M. (2013). Rhythm sensitivity in macaque monkeys. Frontiers in Systems Neuroscience 7, 49. Retrieved from https://doi.org/10.3389/fnsys.2013.00049


Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research 24(1), 117–126.
Spaulding, S. J., Barber, B., Colby, M., Cormack, B., Mick, T., & Jenkins, M. E. (2013). Cueing and gait improvement among people with Parkinson’s disease: A meta-analysis. Archives of Physical Medicine and Rehabilitation 94(3), 562–570.
Stupacher, J., Hove, M. J., Novembre, G., Schütz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2), 127–136.
Su, Y. H., & Salazar-López, E. (2016). Visual timing of structured dance movements resembles auditory rhythm perception. Neural Plasticity 2016, 1678390. doi:10.1155/2016/1678390
Taylor, J. E. T., & Witt, J. K. (2015). Listening to music primes space: Pianists, but not novices, simulate heard actions. Psychological Research 79(2), 175–182.
Taylor, J. E. T., Witt, J. K., & Grimaldi, P. J. (2012). Uncovering the connection between artist and audience: Viewing painted brushstrokes evokes corresponding action representations in the observer. Cognition 125(1), 26–36.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Winkler, I., Haden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Woodruff Carr, K., White-Schwoch, T., Tierney, A. T., Strait, D. L., & Kraus, N. (2014). Beat synchronization predicts neural speech encoding and reading readiness in preschoolers. Proceedings of the National Academy of Sciences 111(40), 14559–14564.
Zarco, W., Merchant, H., Prado, L., & Mendez, J. C. (2009). Subsecond timing in primates: Comparison of interval production between human subjects and rhesus monkeys. Journal of Neurophysiology 102(6), 3191–3202.
Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is spontaneously biased toward regularities. Psychological Science 24(5), 667–677.
Zuk, J., Bishop-Liebler, P., Ozernov-Palchik, O., Moore, E., Overy, K., Welch, G., & Gaab, N. (2017). Revisiting the “enigma” of musicians with dyslexia: Auditory sequencing and speech abilities. Journal of Experimental Psychology: General 146(4), 495–511.

Chapter 9

Neural Basis of Music Perception: Melody, Harmony, and Timbre

Stefan Koelsch

Introduction

“Music” is a special case of sound. As opposed to animal song and drumming (e.g., birdsong, ape drumming), music is produced by humans. As opposed to noise and noise textures (e.g., wind, fire crackling, rain, water bubbling), musical sounds have a structural organization. In the time domain, the most fundamental principle of musical structure is the temporal organization of sounds based on an isochronous grid (the tactus, or “beat”), although there are notable exceptions (such as some kinds of meditation music, or some pieces of modern art music, such as the famous Atmosphères by Ligeti). In the frequency (pitch) domain, the most fundamental principle of musical structure is an organization of pitches according to the overtone series, resulting in simple (e.g., pentatonic) scales. Note that the production of overtone-based scales is, in turn, rooted in the perceptual properties of the auditory system, especially in octave and “fifth equivalence” (Terhardt, 1991), and that inharmonic spectra (e.g., of inharmonic metallophones) give rise to different scales, such as the pelog and slendro scales (Sethares, 2005). Thus, for a vast number of musical traditions around the globe and throughout human history, these two principles (isochronous beat and scale-pitch) build the nucleus of a universal musical grammar. Out of this nucleus, a seemingly infinite number of musical systems, styles, and compositions evolved, and this evolution appears to have followed computational principles described, for example, in the Chomsky hierarchy and its extensions (Rohrmeier, Zuidema, Wiggins, & Scharff, 2015), that is, local relationships between sounds based on a finite state grammar, nonlocal relationships between sounds based on a context-free grammar, and possibly a context-sensitive grammar (Rohrmeier et al., 2015). Note that the term “language” also

refers to structured sounds that are produced by humans. Similar to music, spoken language has melody, rhythm, accents, and timbre. However, in language, normally only one individual speaks at a time (otherwise the language cannot be understood, and the sound is unpleasant). By contrast, music, by virtue of its fundamental structural principles, immediately affords that several individuals produce sounds together (while the music still makes sense and sounds good). In this sense, language is the music of the individual, and music is the language of the group. The fact that music can only be produced by humans is afforded by the uniquely human ability to flexibly synchronize movements (including vocalizations) in a group to an external pulse (see also Merchant & Honing, 2014; Merker, Morley, & Zuidema, 2015). Finally, several scholars have noted that “language,” in turn, is a special case of music. For example, Ulrich (personal communication) once noted that language is music distorted by (propositional) semantics. In this regard, the terms “music” and “language” both refer to structured sounds that are produced by humans as a means of social interaction, expression, diversion, or evocation of emotion, with language, in addition, affording the property of propositional semantics. The following sections review neuroscientific research on the perception of musical sounds, in particular with regard to the structural processing of melodies and harmonies.
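The overtone-based derivation of simple scales described in the Introduction can be illustrated with a small computation: stacking the 3:2 frequency ratio (the first non-octave interval of the overtone series) and folding the results back into a single octave yields a pentatonic scale. The following Python sketch is purely illustrative and not part of the chapter:

```python
def stacked_fifths_scale(n_notes=5):
    """Derive a scale by stacking perfect fifths (frequency ratio 3:2,
    the first non-octave interval of the overtone series) and folding
    each pitch into a single octave (octave equivalence)."""
    ratios = []
    r = 1.0
    for _ in range(n_notes):
        ratios.append(r)
        r *= 3 / 2                 # up a perfect fifth
        while r >= 2:              # fold back into the octave [1, 2)
            r /= 2
    return sorted(ratios)

pentatonic = stacked_fifths_scale(5)
# -> [1.0, 1.125, 1.265625, 1.5, 1.6875], i.e., 1, 9/8, 81/64, 3/2, 27/16
```

The resulting ratios form a Pythagorean pentatonic scale, reflecting the octave and fifth equivalence referred to above.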

We Do Not Only Hear with Our Cochlea

The auditory system evolved phylogenetically from the vestibular system. Interestingly, the vestibular nerve contains a substantial number of acoustically responsive fibers. The otolith organs (saccule and utricle) are sensitive to sounds and vibrations (Todd, Paillard, Kluk, Whittle, & Colebatch, 2014), and the vestibular nuclear complex in the brainstem exerts a major influence on spinal (and ocular) motoneurons in response to loud sounds with low frequencies or sudden onsets (Todd et al., 2014; Todd & Cody, 2000). Moreover, both the vestibular nuclei and the auditory cochlear nuclei in the brainstem project to the reticular formation (also in the brainstem), and the vestibular nucleus also projects to the parabrachial nucleus, a convergence site for vestibular, visceral, and autonomic processing in the brainstem (Balaban & Thayer, 2001; Kandler & Herbert, 1991). Such projections initiate and support movements and contribute to the arousing effects of music. Thus, subcortical processing of sounds gives rise not only to auditory sensations, but also to muscular and autonomic responses, and the stimulation of motoneurons and autonomic neurons by low-frequency beats might contribute to the human impetus to “move to the beat” (Grahn & Rowe, 2009; Todd & Cody, 2000). In addition to vibrations of the vestibular apparatus and cochlea, sounds also evoke resonances in vibration receptors, that is, in the Pacinian corpuscles (which

are sensitive from 10 Hz to a few kHz, and located mainly in the skin, the retroperitoneal space in the belly, the periosteum of the bones, and the sex organs), and maybe even responses in mechanoreceptors of the skin that detect pressure. The famous international concert percussionist Dame Evelyn Glennie is profoundly deaf and hears mainly through vibrations felt in the skin (personal communication with Dame Glennie), and probably in the vestibular organ. Thus, we do not only hear with our cochlea, but also with the vestibular apparatus and mechanoreceptors distributed throughout our body.

Auditory Feature Extraction in Brainstem and Thalamus

Neural activity originating in the auditory nerve is progressively transformed in the auditory brainstem, as indicated by different neural response properties for the periodicity of sounds, timbre (including roughness, or consonance/dissonance), sound intensity, and interaural disparities in the superior olivary complex and the inferior colliculus (Geisler, 1998; Langner & Ochse, 2006; Pickles, 2008; Sinex, Guzik, Li, & Henderson Sabes, 2003). The inferior colliculi can already initiate flight and defensive behavior in response to threatening stimuli (even before the acoustic information reaches the auditory cortex; Cardoso, Coimbra, & Brandão, 1994; Lamprea et al., 2002), providing evidence of relatively elaborate auditory processing already in the brainstem. This stands in contrast to the visual system: Philip Bard (1934) observed that decortication (removing the neocortex) led to blindness in cats and dogs, but not to deafness. Although the hearing thresholds appeared to be elevated, the animals were capable of differentiating sounds. From the thalamus (particularly via the medial geniculate body), neural impulses are mainly projected into the auditory cortex (but note that the thalamus also projects auditory impulses into the amygdala and the medial orbitofrontal cortex; Kaas, Hackett, & Tramo, 1999; LeDoux, 2000; Öngür & Price, 2000). The exact mechanisms underlying pitch perception are not known (and will not be discussed here), but it is clear that both place information (originating from the tonotopic organization of the cochlea) and time information (originating from the integer time intervals of neural spiking in the auditory nerve) contribute to pitch perception (Moore, 2008).
Importantly, the auditory pathway does not only consist of bottom-up, but also of top-down projections; nuclei such as the dorsal nucleus of the inferior colliculus presumably receive even more descending projections from diverse auditory cortical fields than ascending ones (Huffman & Henson, 1990). Given the massive top-down projections within the auditory pathway, it also becomes increasingly obvious that top-down predictions play an important role in pitch perception (Malmierca, Anderson, & Antunes, 2015). Within the predictive coding framework (currently one of

the dominant theories of sensory perception), such top-down projections are thought to pass on backward predictions, while forward sensory information is passed bottom-up, signaling prediction errors, that is, sensory information that does not match a prediction (Friston, 2010). Numerous studies have investigated the decoding of frequency information in the auditory brainstem using the frequency-following response (FFR; Kraus & Chandrasekaran, 2010). The FFR can be elicited pre-attentively and is thought to originate mainly from the inferior colliculus (but note that the auditory cortex is likely at least partly involved in shaping the FFRs, e.g., by virtue of the top-down projections to the inferior colliculus referred to above). Using FFRs, Wong and colleagues (Wong, Skoe, Russo, Dees, & Kraus, 2007) measured brainstem responses to three Mandarin tones that differed only in their (F0) pitch contours. Participants were amateur musicians and non-musicians, and results revealed that musicians had more accurate encoding of the pitch contour of the phonemes (as reflected in the FFRs) than non-musicians. This finding indicates that the auditory brainstem is involved in the encoding of the pitch contours of speech information (vowels), and that the correlation between the FFRs and the properties of the acoustic information is modulated by musical training. Similar training effects on FFRs elicited by syllables with a dipping pitch contour have also been observed in native English speakers (non-musicians) after a training period of 14 days (with eight 30-minute sessions; Song, Skoe, Wong, & Kraus, 2008). The latter results show the contribution of the brainstem to language learning, and its neural plasticity in adulthood.
A study by Strait and colleagues (Strait, Kraus, Skoe, & Ashley, 2009) also reported musical training effects on the decoding of the acoustic features of an affective vocalization (an infant’s unhappy cry), as reflected in auditory brainstem potentials. This suggests (a) that the auditory brainstem is involved in the auditory processing of communicated states of emotion (which substantially contributes to the decoding and understanding of affective prosody), and (b) that musical training can lead to a finer tuning of such (subcortical) processing.

Acoustical Equivalency of “Timbre” and “Phoneme”

With regard to a comparison between music and speech, it is worth mentioning that, in terms of acoustics, there is no difference between a phoneme and the timbre of a musical sound (it is only a matter of convention that some phoneticians prefer terms such as “vowel quality” or “vowel color” instead of “timbre”). Both are characterized by the two physical correlates of timbre: spectrum envelope (i.e., differences in the relative amplitudes of the individual “harmonics,” or “overtones”) and amplitude envelope (also sometimes called the amplitude contour or energy contour of the sound wave, i.e., the way that the loudness of a sound changes over time, particularly with regard to the onset and offset of a sound). Aperiodic sounds can also

differ in spectrum envelope (see, e.g., the difference between /ʃ/ and /s/), and timbre differences related to amplitude envelope play a role in speech, e.g., in the shape of the attack for /b/ vs. /w/ and /ʃ/ vs. /tʃ/.
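The two physical correlates of timbre described above can be made concrete with a small additive-synthesis sketch. The Python code below (all names and parameter values are invented for illustration) produces two tones with the same fundamental frequency but different spectrum envelopes (relative partial amplitudes) and different amplitude envelopes (attack times), i.e., the same pitch with different timbres:

```python
import math

SR = 8000  # sample rate in Hz (an assumption for this sketch)

def harmonic_tone(f0, partial_amps, attack_s, dur_s=0.5, sr=SR):
    """Additive synthesis: harmonics of f0 weighted by partial_amps (the
    spectrum envelope), shaped by a linear attack ramp (a crude amplitude
    envelope)."""
    n = int(dur_s * sr)
    attack_n = max(1, int(attack_s * sr))
    samples = []
    for i in range(n):
        t = i / sr
        s = sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t)
                for k, a in enumerate(partial_amps))
        samples.append(min(1.0, i / attack_n) * s)  # apply attack ramp
    return samples

# Same pitch (200 Hz fundamental), different timbres:
bright = harmonic_tone(200, [1.0, 0.8, 0.6, 0.4], attack_s=0.005)  # sharp attack, strong upper partials
dull = harmonic_tone(200, [1.0, 0.2, 0.05], attack_s=0.080)        # slow attack, weak upper partials
```

Both signals would be perceived at the same pitch; only their timbres differ, analogous to two vowels sung on the same note.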

Auditory Feature Extraction in the Auditory Cortex

As mentioned earlier, auditory information is projected mainly via the subdivisions of the medial geniculate body into the primary auditory cortex (PAC, corresponding to Brodmann’s area 41) and adjacent secondary auditory fields (corresponding to Brodmann’s areas 42 and 52; for a detailed description of the primary auditory “core” and secondary auditory “belt” fields, as well as their connectivity, see Kaas & Hackett, 2000). With regard to the functional properties of primary and secondary auditory fields, a study by Petkov and colleagues (Petkov, Kayser, Augath, & Logothetis, 2006) showed that, in the macaque monkey, all of the PAC core areas, and most of the surrounding belt areas, show a tonotopic organization (the tonotopic organization is clearest in field A1, and some belt areas seem to show only weak, or no, tonotopic organization). These auditory areas perform a more fine-grained, and more specific, analysis of acoustic features than the auditory brainstem.
For example, Tramo and colleagues (Tramo, Shah, & Braida, 2002) reported that a patient with bilateral lesions of the PAC (a) had normal detection thresholds for sounds (i.e., the patient could say whether there was a tone or not), but (b) had elevated thresholds for determining whether two tones had the same pitch or not (i.e., the patient had difficulty detecting fine-grained frequency differences between two subsequent tones), and (c) had markedly increased thresholds for determining the direction of pitch change (i.e., the patient had great difficulty saying whether the second tone was higher or lower in pitch than the first tone, even though he could tell that the two tones differed).1 Note that the auditory cortex is also involved in a number of other functions, such as auditory sensory memory, extraction of inter-sound relationships, discrimination and organization of sounds as well as sound patterns, stream segregation, automatic change detection, and multisensory integration (for reviews see Hackett & Kaas, 2004; Winkler, 2007; some of these functions are also mentioned further in the following). Moreover, the (primary) auditory cortex is involved in the transformation of acoustic features (such as frequency information) into percepts (such as pitch height and pitch chroma). For example, a sound with the frequencies 200 Hz, 300 Hz, and 400 Hz is transformed into the pitch percept of 100 Hz. Lesions of the (right) PAC result in a loss of the ability to perceive residue pitch (or “virtual pitch”) in both animals (Whitfield, 1980) and humans (Zatorre, 1988), and neurons in the anterolateral region of the PAC show responses to a missing fundamental frequency (Bendor & Wang, 2005). Moreover, magnetoencephalographic (MEG) data indicate that

1  For similar results obtained from patients with (right) PAC lesions see Johnsrude, Penhune, and Zatorre (2000) and Zatorre (2001).

response properties in the PAC depend on whether or not a missing fundamental of a complex tone is perceived (Patel & Balaban, 2001; data were obtained from humans). Note, however, that combination tones already emerge in the cochlea, and that the periodicity of complex tones is coded in the spike pattern of auditory brainstem neurons; therefore, different mechanisms contribute to the perception of residue pitch on at least three different levels (basilar membrane, brainstem, and auditory cortex). However, the studies by Zatorre (1988) and Whitfield (1980) suggest that, compared to the brainstem or the basilar membrane, the auditory cortex plays a more prominent role in the transformation of acoustic features into auditory percepts (such as the transformation of information about the frequencies of a complex sound, as well as about the periodicity of a sound, into a pitch percept). Warren and colleagues (Warren, Uppenkamp, Patterson, & Griffiths, 2003) reported that changes in pitch chroma involve auditory regions anterior to the PAC (covering parts of the planum polare) more strongly than changes in pitch height. Conversely, changes in pitch height appear to involve auditory regions posterior to the PAC (covering parts of the planum temporale) more strongly than changes in pitch chroma (Warren et al., 2003). Moreover, with regard to functional differences between the left and the right PAC, as well as the neighboring auditory association cortex, several studies suggest that the left auditory cortex (AC) has a higher resolution of temporal information than the right AC, and that the right AC has a higher spectral resolution than the left AC (Hyde, Peretz, & Zatorre, 2008; Perani et al., 2010; Zatorre, Belin, & Penhune, 2002). Finally, the auditory cortex also prepares acoustic information for further conceptual and conscious processing.
For example, with regard to the meaning of sounds, even a short single tone can sound “bright,” “rough,” or “dull.” That is, the timbre of a single sound is already capable of conveying meaning. Operations related to auditory feature analysis within the (primary and adjacent) auditory cortex are reflected in electrophysiological recordings as brain-electric responses with latencies of about 10 to 100 ms, particularly middle-latency responses, including the auditory P1 (a response with positive polarity and a latency of around 50 ms) and the later auditory N1 (a response with negative polarity and a latency of around 100 ms). Such brain-electric responses are also referred to as “event-related potentials” (ERPs) or “evoked potentials.”
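The residue-pitch transformation mentioned above (200, 300, and 400 Hz partials perceived as a 100 Hz pitch) can be captured by a toy model: for exactly harmonic partials, the perceived fundamental corresponds to the greatest common divisor of the partial frequencies. This is a simplification; real pitch perception also tolerates inharmonicity and mistuned partials. A minimal Python sketch:

```python
from functools import reduce
from math import gcd

def residue_pitch(partial_freqs_hz):
    """Toy estimate of the 'missing fundamental': the greatest common
    divisor of exactly harmonic partial frequencies (in Hz)."""
    return reduce(gcd, partial_freqs_hz)

residue_pitch([200, 300, 400])  # -> 100, although no 100 Hz partial is physically present
```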

Echoic Memory and Gestalt Formation

While auditory features are extracted, the acoustic information enters the auditory sensory memory (or “echoic memory”), and representations of auditory Gestalten (Griffiths & Warren, 2004) or “auditory objects” are formed. The auditory sensory memory (ASM) retains information only for a few seconds, and information stored in the ASM fades quickly. The ASM is thought to store physical features of sounds (such as

pitch, intensity, duration, location, timbre, etc.), sound patterns, and even abstract features of sound patterns (e.g., Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001). Operations of the ASM are at least partly reflected electrically in the mismatch negativity (MMN; e.g., Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). The MMN is an ERP with negative polarity and a peak latency of about 100–200 ms, and appears to receive its main contributions from neural sources located in the PAC and adjacent auditory (belt) fields, with additional (but smaller) contributions from frontal cortical areas (for reviews, see Deouell, 2007; Schönwiesner et al., 2007). Auditory sensory memory operations are indispensable for music perception; therefore, practically all MMN studies are inherently related to, and relevant for, the understanding of the neural correlates of music processing. As will be outlined below, numerous MMN studies have contributed to this issue (a) by investigating different response properties of the ASM to musical and speech stimuli, (b) by using melodic and rhythmic patterns to investigate auditory Gestalt formation, and/or (c) by studying effects of long- and short-term musical training on processes underlying ASM operations. The latter studies in particular have contributed substantially to our understanding of neuroplasticity (i.e., of changes in neuronal structure and function due to experience), and thus to our understanding of the neural basis of learning (for a review see Tervaniemi, 2009).
Here, suffice it to say that MMN studies showed effects of long-term musical training on the processing of sound localization, pitch, melody, rhythm, musical key, timbre, tuning, and timing (e.g., Koelsch, Schröger, & Tervaniemi, 1999; Putkinen, Tervaniemi, Saarikivi, de Vent, & Huotilainen, 2014; Rammsayer & Altenmüller, 2006; Tervaniemi, Castaneda, Knoll, & Uther, 2006; Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2016). Auditory oddball paradigms were also used to investigate processes of melodic and rhythmic grouping of tones occurring in tone patterns (such grouping is essential for auditory Gestalt formation, see also Sussman,  2007), as well as effects of musical long-term training on these processes. These studies showed effects of musical training (a) on the processing of melodic patterns (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Tervaniemi, Ilvonen, Karma, Alho, & Näätänen, 1997; Tervaniemi, Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001; Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2004; in these studies, patterns consisted of four or five tones), (b) on the encoding of the number of elements in a tone pattern (Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2005), and (c) on the processing of patterns consisting of two voices (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2005). The formation of auditory Gestalten entails processes of perceptual separation, as well as processes of melodic, rhythmic, timbral, and spatial grouping. Such processes have been summarized under the concepts of auditory scene analysis and auditory stream segregation (Bregman, 1994). Grouping of acoustic events follows Gestalt principles such as similarity, proximity, and continuity (for acoustic cues used for perceptual separation and auditory grouping see Darwin, 1997, 2008). 
In everyday life, such operations are not only important for music processing, but also, for instance, for separating a speaker’s voice during a conversation from other sound sources in the environment. That is, these operations are important because their function is to recognize and to

follow acoustic objects, and to establish a cognitive representation of the acoustic environment. It appears that the planum temporale (which is part of the auditory association cortex) is a crucial structure for auditory scene analysis and stream segregation, particularly due to its role in the processing of pitch intervals and sound sequences (Griffiths & Warren, 2002; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Snyder & Elhilali, 2017).

Musical Expectancy Formation: Processing of Local Dependencies

Processing the regularities of successive sounds can be based on two different principles. First, it can be based on the regularities inherent in the acoustical properties of the sounds, for example, pitch (after a sequence of several sounds with the same pitch, a sound with a different pitch sounds irregular). This type of processing is assumed to be performed by the auditory sensory memory, and the processing of irregular sounds is reflected in the MMN (discussed earlier). Note that the extraction of the regularity underlying such sequences does not require memory capabilities beyond the auditory sensory memory (i.e., the regularity is extracted in real time, on a moment-to-moment basis). I have previously referred to such syntactic processes as “knowledge-free structuring” (Koelsch, 2012). Second, the local arrangement of elements in language and music includes numerous regularities that cannot simply be extracted on a moment-to-moment basis but have to be learned over an extended period of time (“local” refers here to the arrangement of adjacent, or directly succeeding, elements). For example, it usually takes months, or even years, to learn the syntax of a language, and it takes a considerable amount of exposure and learning to establish (implicit) knowledge of the statistical regularities of a certain type of music. I have previously referred to such syntactic processes as “musical expectancy formation” (Koelsch, 2012). An example of local dependencies in music captured by “musical expectancy formation” is the bigram table of chord transition probabilities extracted from a corpus of Bach chorales in a study by Rohrmeier and Cross (2008). That table showed, for example, that after a dominant seventh chord, the most likely chord to follow is the tonic. It also showed that a supertonic is nine times more likely to follow a tonic than a tonic is to follow a supertonic.
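A bigram table of this kind can be estimated by counting chord-to-chord transitions in a corpus. The sketch below uses a toy corpus of Roman-numeral chord sequences (invented for illustration; it is not the Bach-chorale corpus analyzed by Rohrmeier and Cross, 2008):

```python
from collections import Counter, defaultdict

# Toy corpus of chord sequences in Roman-numeral notation
# (hypothetical; chosen so that V7 -> I dominates and II -> I never occurs).
corpus = [
    ["I", "IV", "V7", "I"],
    ["I", "II", "V7", "I"],
    ["I", "V7", "VI", "II", "V7", "I"],
    ["I", "IV", "II", "V7", "I"],
]

def bigram_probs(sequences):
    """Estimate first-order (bigram) chord transition probabilities."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

probs = bigram_probs(corpus)
# In this toy corpus, as in the Bach chorales, the tonic is the most
# probable continuation of a dominant seventh chord, and the transition
# I -> II occurs whereas II -> I does not.
```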
This is important because the acoustic similarity of tonic and supertonic is the same in both cases, and it is therefore very difficult to explain this statistical regularity on the basis of acoustic similarity alone. Rather, this regularity is specific to this kind of major–minor tonal music, and thus has to be learned (over an extended period of time) to be represented accurately in the brain of a listener. Notably, even non-musicians are sensitive to such statistical regularities and pick up statistical structures without explicit intent. This ability is explored within the frameworks

neural basis of music perception   195 of statistical learning (Saffran, Aslin, & Newport, 1996) and implicit learning (Cleeremans, Destrebecqz, & Boyer, 1998), both of which have been argued to investigate the same underlying learning phenomenon (Dienes, 2012; Perruchet & Pacton, 2006). Although statistical learning appears to be domain-general (Conway & Christiansen, 2005), it has most prominently been investigated in the context of language acquisition, especially word learning (for a review see Romberg & Saffran, 2010), as well as music (for reviews see Ettlinger, Margulis, & Wong, 2011; François & Schön, 2014; Rohrmeier & Rebuschat, 2012). With regard to statistical learning paradigms, word learning has been argued to be grounded, at least in part, in sequence prediction: in a continuous stream of syllables, sequences of events linked with high statistical conditional probability likely correspond to words, whereas syllable transitions with low predictability may likely be indicative of word-boundaries (François & Schön, 2014; Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Newport, & Aslin, 1996). Thus, tracking conditional probability relations between syllables has been regarded as highly relevant for the extraction of candidate word forms (Hay, Pelucchi, Estes, & Saffran,  2011; Saffran,  2001). In music, representations of musical regularities guiding local dependencies serve the formation of a musical expectancy (“musical” is italicized here to clearly differentiate this type of expectancy formation from the formation of expectancies based on simply acoustical regularities). In addition, integrating information across the extracted units eventually reveals distributional properties (Hunt & Aslin, 2010; Thiessen, Kronstein, & Hufnagle, 2013). Extracted statistical properties provide an important basis for predictions which guide the processing of sensory information (Friston, 2010; Friston & Kiebel, 2009; Thiessen et al., 2013). 
Stimuli that are hard to predict (e.g., the syllable after a word boundary) have been hypothesized to increase processing load (Friston, 2010; Friston & Kiebel, 2009). Such an increase in processing load has been found to be reflected neurophysiologically in ERP components such as the N100 and the N400: during successful stream segmentation, word onsets evoke larger N100 and N400 ERPs in adults than more predictable positions within the word (e.g., Abla, Katahira, & Okanoya, 2008; François, Chobert, Besson, & Schön, 2013; François & Schön, 2011, 2014; Schön & François, 2011; Teinonen & Huotilainen, 2012), and similar ERP responses have been observed even in newborns (Teinonen, Fellman, Näätänen, Alku, & Huotilainen, 2009). When participants learn local dependencies (i.e., statistical regularities underlying the succession of sounds), irregular sounds elicit a statistical MMN (or sMMN; Koelsch, Busch, Jentschke, & Rohrmeier, 2016), which is maximal between around 130 and 220 ms and has a frontal distribution (Daikoku, Yatomi, & Yumoto, 2014; Furl et al., 2011; Koelsch et al., 2016; Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012). So far, this has been investigated in statistical learning paradigms in which participants are presented, over a period of several dozen minutes, with streams of “triplets” (i.e., sounds arranged in threes), with the triplets being designed such that the succession of tones within and between triplets follows exactly specified statistical regularities.
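The statistical logic of such triplet paradigms can be sketched computationally: within a triplet, the transition probability between tones is (near) 1, whereas across triplet boundaries it drops to roughly 1/3. The tone names and stream length below are invented for illustration:

```python
import random
from collections import Counter

# Hypothetical triplet stream in the style of statistical learning paradigms:
# three fixed tone-triplets concatenated in random order.
random.seed(1)
TRIPLETS = [("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I")]
stream = [tone for _ in range(200) for tone in random.choice(TRIPLETS)]

def transition_probs(seq):
    """Estimate P(next tone | current tone) from bigram counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

tp = transition_probs(stream)
# Within-triplet transitions (e.g., A -> B) have probability 1.0, while
# transitions across triplet boundaries (e.g., C -> D) hover around 1/3,
# which is the cue both for word segmentation and for the sMMN paradigms
# described above.
```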

It is important to understand that, within the Chomsky hierarchy, only a finite state automaton is required to process both the regularities underlying the generation of the physical MMN (phMMN) and the abstract-feature MMN (afMMN) on the one hand (i.e., “knowledge-free structuring”), and those underlying the sMMN on the other (i.e., “musical expectancy formation”). In other words, a finite state grammar is sufficient to process these two types of regularities. However, they are represented psychologically and neurophysiologically in fundamentally different ways (because the processing of regularities that do not require long-term memory, i.e., “knowledge-free structuring,” differs neurocognitively from the processing of regularities stored in long-term memory, i.e., “musical expectancy formation”). The local transition probabilities underlying the generation of the phMMN and afMMN are stored in auditory sensory memory (and if the probabilities change, the sensory representations of the new transition probabilities are dynamically updated). By contrast, deviants in statistical learning paradigms, like those employed in the MEG studies described above (Daikoku et al., 2014; Daikoku, Yatomi, & Yumoto, 2015; Furl et al., 2011; Koelsch et al., 2016; Paraskevopoulos et al., 2012), require an extended period of learning, and the mismatch response associated with statistical learning reflects the processing of local dependencies based on (implicit) knowledge about statistical regularities. That is, the mismatch response associated with statistical learning is based on memory representations beyond the capabilities of sensory memory. With regard to music, this also means that fundamentally different neurocognitive systems process different types of local syntactic dependencies in music, even though they can be captured by the same (finite state) automaton within the Chomsky hierarchy.

Musical Structure Building: Processing of Nonlocal Dependencies

As described in the previous section, tonal music involves representations of single events and of local relationships on short timescales. However, many composers have designed nested hierarchical syntactic structures spanning longer timescales, potentially up to entire movements of symphonies and sonatas (Salzer, 1962; Schenker, 1956). Hierarchical syntactic structure (involving the potential for nested nonlocal dependencies) is a key component of the human language capacity (Chomsky, 1995; Fitch & Hauser, 2004; Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006; Hauser, Chomsky, & Fitch, 2002; Nevins, Pesetsky, & Rodrigues, 2009), and is frequently produced and perceived in everyday life. For example, in the sentence “the boy who helped Peter kissed Mary,” the subject relative clause “who helped Peter” is nested into the main clause “the boy kissed Mary,” creating a nonlocal hierarchical dependency between “the boy” and “kissed Mary.”2 Music theorists have described analogous hierarchical structures for music.

2  Note that a finite state automaton will only (mis)understand that “Peter kissed Mary”!

Schenker (1956) was the first to describe musical structures as organized hierarchically, in such a way that musical events are elaborated (or prolonged) by other events in a recursive fashion. According to this principle, for example, a phrase (or set of phrases) can be conceived of as an elaboration of a basic underlying tonic–dominant–tonic progression. Schenker further argued that this principle can be expanded to even larger musical sequences, up to entire musical movements. In addition, Hofstadter (1979) was one of the first to argue that a change of key embedded in a superordinate key (such as a tonal modulation away from and returning to an initial key) constitutes a prime example of recursion in music. Based on similar ideas, several theorists have developed formal descriptions of the analysis of hierarchical structures in music (Lerdahl & Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984), including the Generative Theory of Tonal Music (GTTM) by Lerdahl and Jackendoff (1983) and the Generative Syntax Model (GSM) by Rohrmeier (2011). Humans are capable of processing hierarchically organized structures, including nonlocal dependencies, in music (Dibben, 1994; Koelsch, Rohrmeier, Torrecuso, & Jentschke, 2013; Lerdahl & Krumhansl, 2007; Serafine, Glassman, & Overbeeke, 1989), driven by the human capacity to perceive and produce hierarchical, potentially recursive structures (Chomsky, 1995; Hauser et al., 2002; Jackendoff & Lerdahl, 2006). Using chorales by J. S. Bach (see Figure 1), a recent study (Koelsch et al., 2013) showed that hierarchically incorrect final chords of a musical period (violating the nonlocal prolongation of the beginning of the period) elicit a negative brain-electric potential that is maximal between 150 and 300 ms and has a frontal preponderance.
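The recursive elaboration principle can be sketched with a miniature rewrite grammar. The rules below are hypothetical simplifications inspired by, but not taken from, the GSM: a tonic region may be prolonged via a dominant, and a dominant may be prepared by a subdominant, with rules applied recursively:

```python
import random

# Hypothetical rewrite rules (illustrative only): a functional category
# either terminates in a scale degree or is recursively elaborated.
RULES = {
    "T": [["I"], ["T", "D", "T"]],  # tonic region prolonged via a dominant
    "D": [["V"], ["S", "D"]],       # dominant prepared by a subdominant
    "S": [["IV"], ["II"]],
}

def expand(symbol, depth=0, max_depth=3):
    """Recursively expand a functional symbol into a chord sequence,
    forcing termination once max_depth is reached."""
    options = RULES.get(symbol)
    if options is None:                       # already a scale degree
        return [symbol]
    rule = options[0] if depth >= max_depth else random.choice(options)
    out = []
    for s in rule:
        out.extend(expand(s, depth + 1, max_depth))
    return out

random.seed(0)
sequence = expand("T")  # a chord sequence that begins and ends on I
```

Because every expansion of T ultimately begins and ends with a tonic chord, the grammar generates sequences whose final chord closes the prolongation opened at the beginning, which is precisely the nonlocal dependency violated by the modified endings in Koelsch et al. (2013).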
Note that the term “hierarchical” is used here to refer to a syntactic organizational principle of musical sequences by which elements are organized in terms of subordination and dominance relationships (Lerdahl & Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984). Such hierarchical structures can be established through the recursive application of rules, analogous to the establishment of hierarchical structures in language (Chomsky, 1995). In both linguistics and music theory, such hierarchical dependency structures are commonly represented using tree graphs. The term “hierarchical” is sometimes also used in a different sense, namely to indicate that certain pitches, chords, or keys within pieces occur more frequently than others and thus establish a frequency-based ranking of structural importance (Krumhansl & Cuddy, 2010). That is not the sense intended here.

Numerous other studies using EEG, MEG, and fMRI have previously investigated the processing of musical syntax using melodies (with regular and irregular tones) or chord sequences (with regular and irregular harmonies; for reviews see Koelsch, 2009, 2012; Patel, 2008). In all of these studies, the processes of “musical expectancy formation” (involving processing of local dependencies) and “musical structure building” (involving processing of hierarchically organized nonlocal dependencies) were confounded (as is usually the case in “real” music). For example, in the sequences shown in Figure 2b, the final chord of the upper sequence is a tonic (I), which is the most likely chord to follow a dominant (V). The final chord of the lower sequence is a supertonic (II), which is less likely to follow a dominant. Thus, the local transition probability from V to II is lower


198   stefan koelsch

[Scores of J. S. Bach’s chorale BWV 373 (Liebster Jesu, wir sind hier) in (a) original and (b) modified versions appear here, each with a tree diagram of its harmonic dependencies (tonal functions T, S, D over scale degrees I–VII).]

Figure 1.  Nonlocal dependencies in music. (a) Original version of J. S. Bach’s chorale Liebster Jesu, wir sind hier. The first phrase ends on an open dominant (see chord with fermata) and the second phrase ends on a tonic (dotted rectangle). The tree structure above the scores represents a schematic diagram of the harmonic dependencies. The two thick vertical lines (separating the first and the second phrase) visualize that the local dominant (V, rectangle above the fermata) is not immediately followed by a resolving tonic chord, but implies its resolution with the final tonic (indicated by the dotted arrow). The same dependency exists between the initial and final tonic (indicated by the solid arrow), illustrating the nonlocal (long-distance) dependency between the initial and final tonic regions and tonic chords, respectively.


than from V to I (in other words, the local dependency of I on V is stronger, i.e., more regular, than that of II on V). At the same time, the final tonic “prolongs” the initial tonic, whereas the final supertonic does not. Therefore, the nonlocal dependency between initial and final chord is fulfilled in the upper sequence and violated in the bottom sequence. Figure 2c shows brain-electric responses to the final chords of the sequences shown in Figure 2b: the irregular supertonics elicit an ERAN (early right anterior negativity, indicated by the arrow) compared to the regular tonic chords. Importantly, as described earlier, the ERAN elicited here is a conglomerate of the sMMN (due to processing of the local dependency violation) and the “hierarchical ERAN” (due to processing of the nonlocal dependency violation). A study by Zhang and colleagues (Zhang, Zhou, Chang, & Yang, 2018), however, nicely showed effects of nonlocal context on local harmonic processing using the ERAN. The ERAN has a larger amplitude in individuals with musical training and is reduced by strong attentional demands, but it can be elicited even if participants ignore the musical stimulus (for a review see Koelsch, 2012). Most studies reporting an ERAN used harmonies as stimuli, but the ERAN can also be elicited by melodies (e.g., Carrus, Pearce, & Bhattacharya, 2013; Fiveash, Thompson, Badcock, & McArthur, 2018; Miranda & Ullman, 2007; Zendel, Lagrois, Robitaille, & Peretz, 2015). Moreover, a study by Sun and colleagues (Sun, Liu, Zhou, & Jiang, 2018) reported that the ERAN can also be elicited by rhythmic syntactic violations. Interestingly, a study by Przysinda and colleagues (Przysinda, Zeng, Maves, Arkin, & Loui, 2017) showed differential ERAN responses in classical and jazz musicians depending on their preferences for irregular or unusual harmonies.
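The finite-state ("local") component of these expectancies can be illustrated with simple bigram statistics. The toy corpus of Roman-numeral sequences below is invented for illustration (it is not data from the studies reviewed here); it merely shows how maximum-likelihood transition probabilities make V followed by I far more expected than V followed by II.

```python
from collections import Counter

# Toy corpus of chord sequences (invented for illustration)
corpus = [
    ["I", "IV", "II", "V", "I"],
    ["I", "II", "V", "I"],
    ["I", "IV", "V", "I"],
    ["I", "V", "VI", "IV", "V", "I"],
    ["I", "IV", "V", "II", "V", "I"],  # contains a rare V -> II transition
]

bigrams = Counter()   # counts of (previous, next) chord pairs
context = Counter()   # counts of each chord as a context

for seq in corpus:
    for a, b in zip(seq, seq[1:]):
        bigrams[(a, b)] += 1
        context[a] += 1

def p(next_chord, prev_chord):
    """Maximum-likelihood estimate of P(next | prev) from the bigram counts."""
    return bigrams[(prev_chord, next_chord)] / context[prev_chord]

print(p("I", "V"), p("II", "V"))  # V -> I is far more probable than V -> II
```

A model of this kind captures only local expectancy formation; it has no representation of whether a final tonic prolongs the tonic that opened the sequence, which is precisely why local and hierarchical processing are confounded in paradigms like the one in Figure 2b.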
The ERAN is relatively immune to predictions: the ERAN latency, but not its amplitude, is influenced by veridical expectations (Guo & Koelsch, 2016). However, Vuvan and colleagues (Vuvan, Zendel, & Peretz, 2018) reported that random feedback (including false feedback) on participants’ detection of out-of-key tones in melodies modulated the ERAN amplitude, possibly suggesting that attention-driven changes in the confidence in predictions (i.e., changes in the precision of predictions) might alter the ERAN amplitude. Recent studies also report that the ERAN is absent in individuals

Figure 1 (continued).  The chords belonging to a key other than the initial key (see function symbols in square brackets) represent one level of embedding. (b) Modified version (the first phrase, i.e., notes up to the fermata, was transposed downwards by the pitch interval of a fourth, see light gray scores). The tree structure above the scores illustrates that the second phrase is not compatible with an expected tonic region (indicated by the dotted line), and that the last chord (a tonic of a local cadence, dotted rectangle) neither prolongs the initial tonic nor closes the open dominant (see solid and dotted lines followed by question marks). In both (a) and (b), Roman numerals indicate scale degrees. T, S, and D indicate the main tonal functions (tonic, subdominant, dominant) of the respective part of the sequence. Square brackets indicate scale degrees relative to the local key (in the original version, the function symbols in square brackets indicate that the local key of C major is a subdominant region of the initial key of G major).


[Figure 2 panels appear here: (a) chord functions built on the scale degrees I–VII (tonic, supertonic, dominant); (b) the chord sequences I–IV–II–V–I and I–IV–II–V–II; (c) ERPs at electrode F4 (regular, irregular, and difference waveforms; scale –3.0 μV; 0.5–1.0 s); (d) activation foci at x = 52, y = 12, z = 10 and x = –48, y = 11, z = 7.]
Figure 2.  (a) Examples of chord functions: The chord built on the first scale tone is denoted as the tonic, the chord on the second tone as the supertonic, and the chord on the fifth tone as the dominant. (b) The dominant–tonic progression represents a regular ending of a harmonic sequence (top); the dominant–supertonic progression is less regular and unacceptable as a marker of the end of a harmonic progression (bottom sequence; the arrow indicates the less regular chord). (c) ERPs elicited in a passive listening condition by the final chords of the two sequence types shown in (b). Both sequence types were presented in pseudorandom order, equiprobably in all twelve major keys. Brain responses to irregular chords clearly differ from those to regular chords (best seen in the black difference wave, regular subtracted from irregular chords). The first difference between the two waveforms is maximal around 200 ms after the onset of the fifth chord (ERAN, indicated by the long arrow) and is taken to reflect processes of music-syntactic analysis. The ERAN is followed by an N5, taken to reflect processes of harmonic integration (short arrow). (d) Activation foci (small spheres) reported by functional imaging studies on music-syntactic processing using chord sequence paradigms (Koelsch, Fritz, et al., 2005; Maess et al., 2001; Koelsch et al., 2002; Tillmann et al., 2003) and melodies (Janata et al., 2002). Large gray disks show the mean coordinates of foci (averaged for each hemisphere across studies; coordinates refer to standard stereotaxic space). Reprinted from Trends in Cognitive Sciences, 9(12), Stefan Koelsch and Walter A. Siebel, Towards a neural basis of music perception, pp. 578–584, Copyright © 2005 Elsevier Ltd. All rights reserved.

with “amusia” (Sun, Lu, et al., 2018), or that pitch-judgment tasks can eliminate the ERAN in amusics (Zendel et al., 2015). In children, the ERAN becomes visible around the age of 30 months (Jentschke, Friederici, & Koelsch, 2014), and several studies have reported ERAN responses in preschool children (Corrigall & Trainor, 2014; Jentschke, Koelsch, Sallat, & Friederici, 2008; Koelsch, Grossmann, Gunter, Hahne, & Friederici, 2003). Children with specific language impairment show a reduced (or absent) ERAN (Jentschke et al., 2008), whereas neurophysiological correlates of language-syntactic processing are developed earlier, and more strongly, in children with musical training (Jentschke & Koelsch, 2009).
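The ERAN effects reviewed above are measured as difference waves of the kind shown in Figure 2c: per-condition averages are subtracted (regular from irregular) and the latency of the negative peak is determined. A minimal sketch with fabricated numbers (not data from any of the studies cited here):

```python
# Toy sketch of deriving a difference wave: subtract the mean ERP to
# regular chords from the mean ERP to irregular chords, then locate the
# negative peak. All amplitude values are fabricated for illustration.

times_ms  = [0, 50, 100, 150, 200, 250, 300]           # time after chord onset
regular   = [0.0, -0.1, -0.2, -0.3, -0.2, -0.1, 0.0]   # mean ERP, regular chords (microvolts)
irregular = [0.0, -0.2, -0.6, -1.1, -1.5, -0.9, -0.3]  # mean ERP, irregular chords

# "Regular subtracted from irregular chords", as in the Figure 2 caption
difference = [irr - reg for irr, reg in zip(irregular, regular)]

# The ERAN is a negativity, so its peak is the minimum of the difference wave
peak_index = min(range(len(difference)), key=lambda i: difference[i])
peak_latency_ms = times_ms[peak_index]

print(peak_latency_ms)  # 200, i.e., maximal around 200 ms in this toy example
```

In real EEG analyses the same subtraction is performed on averages over many trials and participants, per electrode; the toy series above simply makes the arithmetic behind "regular subtracted from irregular" explicit.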


Functional neuroimaging studies using chord sequences (similar to those shown in Figure 2b; e.g., Koelsch et al., 2002; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Maess, Koelsch, Gunter, & Friederici, 2001; Tillmann, Janata, & Bharucha, 2003; Villarreal, Brattico, Leino, Østergaard, & Vuust, 2011) or melodies (Janata, Tillmann, & Bharucha, 2002) suggest that music-syntactic processing involves the pars opercularis of the inferior frontal gyrus (corresponding to BA 44v; Amunts et al., 2010) bilaterally, but with right-hemispheric weighting (see the spheres in Figure 2d). It seems likely that the involvement of BA 44v in music-syntactic processing is mainly due to the hierarchical processing of (syntactic) information: this part of Broca’s area is involved in the hierarchical processing of syntax in language (e.g., Friederici et al., 2006; Makuuchi, Bahlmann, Anwander, & Friederici, 2009), in the hierarchical processing of action sequences (e.g., Fazio et al., 2009; Koechlin & Jubault, 2006), and possibly also in the processing of hierarchically organized mathematical formulas and terms (Friedrich & Friederici, 2009; although activation in the latter study cannot clearly be assigned to BA 44 or BA 45). Finally, using an artificial musical grammar, a recent study by Cheung and colleagues (Cheung, Meyer, Friederici, & Koelsch, 2018) reported activation of BA 44v associated with the processing of nonlocal (nested) dependencies (note, however, that the dependencies in that study were not hierarchically organized). Inferior BA 44 is not the only structure involved in music-syntactic processing: additional structures include the superior part of the pars opercularis (Koelsch et al., 2002), the ventral premotor cortex (PMCv; Janata et al., 2002; Koelsch, Fritz, et al., 2005; Parsons, 2001), and the anterior portion of the STG (Koelsch, Fritz, et al., 2005).
The PMCv possibly contributes to the processing of local music-syntactic dependencies (i.e., information based on a finite state grammar): activations of the PMCv have been reported in a variety of functional imaging studies on auditory processing using musical stimuli, linguistic stimuli, auditory oddball paradigms, pitch discrimination tasks, and serial prediction tasks, underlining the importance of this structure for the sequencing of structural information, the recognition of structure, and the prediction of sequential information (Janata & Grafton, 2003). With regard to language, Friederici (2004) reported that activation foci of functional neuroimaging studies on the processing of hierarchically organized long-distance dependencies and transformations are located in the posterior IFG (with the mean of the coordinates reported in that article being located in the inferior pars opercularis), whereas activation foci of functional neuroimaging studies on the processing of local dependency violations are located in the PMCv (see also Friederici et al., 2006; Makuuchi et al., 2009; Opitz & Kotz, 2011). Moreover, patients with lesions in the PMCv show disrupted processing of finite state, but not phrase-structure, grammar (Opitz & Kotz, 2011). That is, in the abovementioned experiments that used chord sequence paradigms to investigate the processing of harmonic structure, the music-syntactic processing of the chord functions probably involved processing of both finite state grammar (local dependencies) and phrase-structure (or “context-free”) grammar (hierarchically organized nonlocal dependencies). The music-syntactic analysis involved a computation of the harmonic relation between a chord function and the context of preceding


chord functions (phrase-structure grammar). Such a computation is more difficult (and less common) for irregular than for regular chord functions, and this increased difficulty is presumably reflected in a stronger activation of (inferior) BA 44 in response to irregular chords. In addition, the local transition probability from the penultimate to the final chord is lower for the dominant–supertonic progression than for the dominant–tonic progression (finite state grammar), and the computation of the (less predicted) lower-probability progression is presumably reflected in a stronger activation of the PMCv in response to irregular chords. The stronger activation of both BA 44 and the PMCv appears to correlate with the perception of a music-syntactically irregular chord as “unexpected” (although emotional effects of irregular chords probably originate from BA 47, discussed below). Note that the ability to process context-free grammar is available to humans, whereas non-human primates are apparently not able to master such grammars (Fitch & Hauser, 2004). Thus, it is highly likely that only humans can adequately process music-syntactic information at the phrase-structure level. It is also worth noting that numerous studies have shown that even “non-musicians” (i.e., individuals who have not received formal musical training) have highly sophisticated (implicit) knowledge of musical syntax (e.g., Tillmann, Bharucha, & Bigand, 2000). Such knowledge is presumably acquired through listening experiences in everyday life. Finally, it is important to note that violations of musical expectancies also have emotional effects, such as surprise or tension (Huron, 2006; Koelsch, 2014; Lehne & Koelsch, 2015; Meyer, 1956). Consequently, musical irregularity confounds emotion-eliciting effects, and it is difficult to disentangle cognitive and emotional effects of music-syntactic irregularities in neuroscientific experiments.
For example, a study by Koelsch and colleagues (Koelsch, Fritz, et al., 2005) reported activation foci in both BA 44 and BA 47 (among other structures) in response to musical expectancy violations, and a study by Levitin and Menon (2005) reported activation of BA 47 (without BA 44) in response to scrambled (unpleasant) vs. normal music. BA 47 is paralimbic, five-layered palaeocortex (not neocortex), and activation of this region by musical irregularities is most likely due to emotional effects (this is also consistent with an fMRI study reporting that musical tension correlates with neural activity in BA 47; Lehne, Rohrmeier, & Koelsch, 2014). Note that, because BA 47 is not neocortex, it is problematic to consider this region a “language area.” Moreover, BA 47 is adjacent to BA 44/45/46, and thus activation foci originating in Broca’s area can easily be misplaced in BA 47. Based on receptor-architectonic (and cytoarchitectonic) data, a study by Amunts et al. (2010) showed that BA 47 clusters neither with BA 44/45/46 (Broca’s area in the wider sense) nor with BA 6 (PMC). As mentioned earlier, hierarchical processing of syntactic information from different domains (such as music and language) requires contributions from neural populations located in BA 44. However, it is still possible that, although such neural populations are located in the same brain area, entirely different (non-overlapping) neural populations serve the syntactic processing of music and language within that area. That is, perhaps the neural populations mediating language-syntactic processing in BA 44 are


different from the neural populations mediating music-syntactic processing in the same area. Therefore, the strongest evidence for shared neural resources for the syntactic processing of music and language stems from experiments that revealed interactions between music-syntactic and language-syntactic processing (Carrus et al., 2013; Fedorenko, Patel, Casasanto, Winawer, & Gibson, 2009; Koelsch, Gunter, Wittfoth, & Sammler, 2005; Patel, Iversen, Wassenaar, & Hagoort, 2008; Slevc, Rosenberg, & Patel, 2009; Steinbeis & Koelsch, 2008). In these studies, chord sequences or melodies were played simultaneously with (visually presented) sentences, and it was shown, for example, that the ERAN elicited by irregular chords interacted with the left anterior negativity (LAN) elicited by linguistic (morpho-syntactic) violations (Koelsch, Gunter, et al., 2005; Steinbeis & Koelsch, 2008). Thus, music-syntactic processes can interfere with language-syntactic processes. In summary, neurophysiological studies show that music- and language-syntactic processes engage overlapping resources (presumably located in the inferior frontolateral cortex), and evidence that these resources underlie both music- and language-syntactic processing is provided by experiments showing interactions between the ERP components reflecting the two (in particular the LAN and the ERAN). Importantly, such interactions are observed in the absence of interactions between the LAN and the MMN (that is, in the absence of interactions between language-syntactic processing and acoustic deviance processing, the latter reflected in the MMN), and in the absence of interactions between the ERAN and the N400 (i.e., in the absence of interactions between music-syntactic and language-semantic processing). Therefore, the reported interactions between LAN and ERAN are syntax-specific and are not elicited by just any kind of irregularity.

Concluding Remark

As a concluding remark, I would like to emphasize that even individuals without formal musical training show sophisticated abilities with regard to the decoding of musical information, the acquisition of knowledge about musical syntax, the processing of musical information according to that knowledge, and the understanding of music. This finding supports the notion that musicality is a natural ability of the human brain. Such musical abilities are important for making music together in groups, and thus for the beneficial social effects promoted by musical group activities (such as cooperation and social cohesion; e.g., Koelsch, 2014; Tarr, Launay, & Dunbar, 2014). The natural musical abilities of humans are also important for the acquisition and processing of language. For example, differentiating vowels, consonants, and lexical tones is a highly sophisticated capability of the human auditory system. Tonal languages rely on a meticulous decoding of pitch information, and both tonal and non-tonal languages require an accurate analysis of speech prosody to decode the structure and meaning of speech. Infants use such prosodic cues to acquire information about word and phrase boundaries (possibly even about word meaning). The assumption of an intimate connection between


music and speech is corroborated by the reviewed findings of overlapping and shared neural resources for music and language processing in both adults and children. These findings suggest that the human brain, particularly at an early age, does not treat language and music as separate domains, but rather treats language as a special case of music, and music as a special case of sound.

References Abla, D., Katahira, K., & Okanoya, K. (2008). On-line assessment of statistical learning by event-related potentials. Journal of Cognitive Neuroscience 20(6), 952–964. Amunts, K., Lenzen, M., Friederici, A. D., Schleicher, A., Morosan, P., Palomero-Gallagher, N., & Zilles, K. (2010). Broca’s region: Novel organizational principles and multiple receptor mapping. PLoS Biology 8(9), e1000489. Balaban, C. D., & Thayer, J. F. (2001). Neurological bases for balance–anxiety links. Journal of Anxiety Disorders 15(1), 53–79. Bard, P. (1934). On emotional expression after decortication with some remarks on certain theoretical views: Part II. Psychological Review 41(5), 424. Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature 436(7054), 1161–1165. Bregman, A.  S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press. Cardoso, S. H., Coimbra, N. C., & Brandão, M. L. (1994). Defensive reactions evoked by activation of NMDA receptors in distinct sites of the inferior colliculus. Behavioural Brain Research 63(1), 17–24. Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200. Cheung, V., Meyer, L., Friederici, A.  D., & Koelsch, S. (2018). The right inferior frontal gyrus processes hierarchical non-local dependencies in music. Scientific Reports 8, 3822. doi:10.1038/s41598-018-22144-9 Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press. Cleeremans, A., Destrebecqz, A., & Boyer, M. (1998). Implicit learning: News from the front. Trends in Cognitive Sciences 2(10), 406–416. Conway, C. M., & Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile, visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 31(1), 24–39. Corrigall, K. A., & Trainor, L. J. (2014). 
Enculturation to musical pitch structure in young children: Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142–158. Daikoku, T., Yatomi, Y., & Yumoto, M. (2014). Implicit and explicit statistical learning of tone sequences across spectral shifts. Neuropsychologia 63, 194–204. Daikoku, T., Yatomi, Y., & Yumoto, M. (2015). Statistical learning of music-and language-like sequences and tolerance for spectral shifts. Neurobiology of Learning and Memory 118, 8–19. Darwin, C. J. (1997). Auditory grouping. Trends in Cognitive Sciences 1(9), 327–333. Darwin, C.  J. (2008). Listening to speech in the presence of other sounds. Philosophical Transactions of the Royal Society B: Biological Sciences 363(1493), 1011–1021. Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of Psychophysiology 21(3/4), 188–203.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

neural basis of music perception   205 Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music Perception 12(1), 1–25. Dienes, Z. (2012). Conscious versus unconscious learning of structure. In P.  Rebuschat & J. Williams (Eds.), Statistical learning and language acquisition (pp. 337–364). Berlin: Walter de Gruyter. Ettlinger, M., Margulis, E. H., & Wong, P. C. (2011). Implicit memory in music and language. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00211 Fazio, P., Cantagallo, A., Craighero, L., D’Ausilio, A., Roy, A. C., Pozzo, T., . . . Fadiga, L. (2009). Encoding of human action in Broca’s area. Brain 132(7), 1980–1988. Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–19. Fitch, W. T., & Hauser, M. D. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science 303(5656), 377–380. Fiveash, A., Thompson, W. F., Badcock, N. A., & McArthur, G. (2018). Syntactic processing in music and language: Effects of interrupting auditory streams with alternating timbres. International Journal of Psychophysiology 129(1), 31–40. François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043. Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and linguistic structures. Cerebral Cortex 21(10), 2357–2365. François, C., & Schön, D. (2014). Neural sensitivity to statistical regularities as a fundamental biological process that underlies auditory learning: The role of musical practice. Hearing Research 308, 122–128. Friederici, A. D. (2004). Processing local transitions versus long-distance syntactic hierarchies. Trends in Cognitive Sciences 8(6), 245–247. Friederici, A. D., Bahlmann, J., Heim, S., Schubotz, R. 
I., & Anwander, A. (2006). The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proceedings of the National Academy of Sciences 103(7), 2458–2463. Friedrich, R., & Friederici, A. D. (2009). Mathematical logic in the human brain: Syntax. PLoS ONE 4(5), e5599. Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11(2), 127–138. Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1521), 1211–1221. Fujioka, T., Trainor, L.  J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience 16(6), 1010–1021. Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of polyphonic melodies in musicians and nonmusicians. Journal of Cognitive Neuroscience 17(10), 1578–1592. Furl, N., Kumar, S., Alter, K., Durrant, S., Shawe-Taylor, J., & Griffiths, T. D. (2011). Neural prediction of higher-order auditory sequence statistics. NeuroImage 54(3), 2267–2277. Geisler, C.  D. (1998). From sound to synapse: Physiology of the mammalian ear. New York: Oxford University Press. Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548. Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in Neurosciences 25(7), 348–353.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

206   stefan koelsch Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience 5(11), 887–892. Guo, S., & Koelsch, S. (2016). Effects of veridical expectations on syntax processing in music: Event-related potential evidence. Scientific Reports 6, 19064. doi:10.1038/srep19064 Hackett, T. A., & Kaas, J. (2004). Auditory cortex in primates: Functional subdivisions and processing streams. In M.  S.  Gazzaniga (Ed.), The cognitive neurosciences (pp. 215–232). Cambridge, MA: MIT Press. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science 298(5598), 1569–1579. Hay, J. F., Pelucchi, B., Estes, K. G., & Saffran, J. R. (2011). Linking sounds to meanings: Infant statistical learning in a natural language. Cognitive Psychology 63(2), 93–106. Hofstadter, D. R. (1979). Gödel, Escher, Bach. New York: Basic Books. Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323. Hunt, R. H., & Aslin, R. N. (2010). Category induction via distributional analysis: Evidence from a serial reaction time task. Journal of Memory and Language 62(2), 98–112. Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press. Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639. Jackendoff, R., & Lerdahl, F. (2006). The capacity for music: What is it, and what’s special about it? Cognition 100(1), 33–72. Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience 6(7), 682–687. Janata, P., Tillmann, B., & Bharucha, J.  J. (2002). Listening to polyphonic music recruits domain-general attention and working memory circuits. 
Cognitive, Affective, & Behavioral Neuroscience 2(2), 121–140. Jentschke, S., Friederici, A.  D., & Koelsch, S. (2014). Neural correlates of music-syntactic processing in two-year old children. Developmental Cognitive Neuroscience 9, 200–208. Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. NeuroImage 47(2), 735–744. Jentschke, S., Koelsch, S., Sallat, S., & Friederici, A. D. (2008). Children with specific language impairment also show impairment of music-syntactic processing. Journal of Cognitive Neuroscience 20(11), 1940–1951. Johnsrude, I.  S., Penhune, V.  B., & Zatorre, R.  J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123(1), 155–163. Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences 97(22), 11793–11799. Kaas, J. H., Hackett, T. A., & Tramo, M. J. (1999). Auditory processing in primate cerebral cortex. Current Opinion in Neurobiology 9(2), 164–170. Kandler, K., & Herbert, H. (1991). Auditory projections from the cochlear nucleus to pontine and mesen-cephalic reticular nuclei in the rat. Brain Research 562(2), 230–242. Koechlin, E., & Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior. Neuron 50(6), 963–974. Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences between ERAN and MMN. Psychophysiology 46(1), 179–190. Koelsch, S. (2012). Brain and music. Chichester: Wiley-Blackwell.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

neural basis of music perception   207

Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180.
Koelsch, S., Busch, T., Jentschke, S., & Rohrmeier, M. (2016). Under the hood of statistical learning: A statistical MMN reflects the magnitude of transitional probabilities in auditory sequences. Scientific Reports 6, 19741. doi:10.1038/srep19741
Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing music: An fMRI study. NeuroImage 25(4), 1068–1076.
Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., & Friederici, A. D. (2003). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience 15(5), 683–693.
Koelsch, S., Gunter, T. C., Cramon, D. Y. von, Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966.
Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10), 1565–1577.
Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013). Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences 110(38), 15443–15448.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport 10(6), 1309–1313.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences 9(12), 578–584.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605.
Krumhansl, C. L., & Cuddy, L. L. (2010). A theory of tonal hierarchies in music. Music Perception 36, 51–87.
Lamprea, M. R., Cardenas, F. P., Vianna, D. M., Castilho, V. M., Cruz-Morales, S. E., & Brandão, M. L. (2002). The distribution of Fos immunoreactivity in rat brain following freezing and escape responses elicited by electrical stimulation of the inferior colliculus. Brain Research 950(1–2), 186–194.
Langner, G., & Ochse, M. (2006). The neural basis of pitch and harmony in the auditory system. Musicae Scientiae 10(1), 185.
LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience 23, 155–184.
Lehne, M., & Koelsch, S. (2015). Toward a general psychological model of tension and suspense. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00079
Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10), 1515–1523.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception 24(4), 329–366.
Levitin, D. J., & Menon, V. (2005). The neural locus of temporal structure and expectancies in music: Evidence from functional neuroimaging at 3 tesla. Music Perception: An Interdisciplinary Journal 22(3), 563–575.


208   stefan koelsch

Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545.
Makuuchi, M., Bahlmann, J., Anwander, A., & Friederici, A. D. (2009). Segregating the core computational faculty of human language from working memory. Proceedings of the National Academy of Sciences 106(20), 8362–8367.
Malmierca, M. S., Anderson, L. A., & Antunes, F. M. (2015). The cortical modulation of stimulus-specific adaptation in the auditory midbrain and thalamus: A potential neuronal correlate for predictive coding. Frontiers in Systems Neuroscience 9, 19. Retrieved from https://doi.org/10.3389/fnsys.2015.00019
Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science 283(5398), 77–80.
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274. Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the origins of music. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140095.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music: An event-related potential study. NeuroImage 38(2), 331–345.
Moore, B. C. J. (2008). An introduction to the psychology of hearing (5th ed.). Bingley: Emerald.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive intelligence” in the auditory cortex. Trends in Neurosciences 24(5), 283–288.
Nevins, A., Pesetsky, D., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language 85(2), 355–404.
Öngür, D., & Price, J. L. (2000). The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cerebral Cortex 10(3), 206–219.
Opitz, B., & Kotz, S. A. (2011). Ventral premotor cortex lesions disrupt learning of sequential grammatical structures. Cortex 48(6), 664–673.
Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology 38(2), 359–365.
Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012). Statistical learning effects in musicians and non-musicians: An MEG study. Neuropsychologia 50(2), 341–349.
Parsons, L. (2001). Exploring the functional neuroanatomy of music performance, perception, and comprehension. Annals of the New York Academy of Sciences 930, 211–231.
Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Patel, A. D., & Balaban, E. (2001). Human pitch perception is reflected in the timing of stimulus-related cortical activity. Nature Neuroscience 4(8), 839–844.
Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology 22(7), 776–789.
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776.
Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., . . . Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763.
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences 10(5), 233–238.


Petkov, C. I., Kayser, C., Augath, M., & Logothetis, N. K. (2006). Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biology 4(7), e215.
Pickles, J. O. (2008). An introduction to the physiology of hearing (3rd ed.). Bingley: Emerald.
Przysinda, E., Zeng, T., Maves, K., Arkin, C., & Loui, P. (2017). Jazz musicians reveal role of expectancy in human creativity. Brain and Cognition 119, 45–53.
Putkinen, V., Tervaniemi, M., Saarikivi, K., de Vent, N., & Huotilainen, M. (2014). Investigating the effects of musical training on functional brain development with a novel melodic MMN paradigm. Neurobiology of Learning and Memory 110, 8–15.
Rammsayer, T., & Altenmüller, E. (2006). Temporal information processing in musicians and nonmusicians. Music Perception 24(1), 37–48.
Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5(1), 35–53.
Rohrmeier, M., & Cross, I. (2008). Statistical properties of tonal harmony in Bach’s chorales. In K. Miyazaki, M. Adachi, Y. Hiraga, Y. Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and Cognition (CD-ROM). ICMPC.
Rohrmeier, M., & Rebuschat, P. (2012). Implicit learning and acquisition of music. Topics in Cognitive Science 4(4), 525–553.
Rohrmeier, M., Zuidema, W., Wiggins, G. A., & Scharff, C. (2015). Principles of structure building in music, language and animal song. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140097.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science 1(6), 906–914.
Saffran, J. R. (2001). Words in a sea of sounds: The output of infant statistical learning. Cognition 81(2), 149–169.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language 35(4), 606–621.
Salzer, F. (1962). Structural hearing: Tonal coherence in music (Vol. 1). New York: Dover Publications.
Schenker, H. (1956). Neue musikalische Theorien und Phantasien: Der freie Satz (2nd ed.). Vienna: Universal Edition.
Schön, D., & François, C. (2011). Musical expertise and statistical learning of musical and linguistic structures. Frontiers in Psychology 2, 167. Retrieved from https://doi.org/10.3389/fpsyg.2011.00167
Schönwiesner, M., Novitski, N., Pakarinen, S., Carlson, S., Tervaniemi, M., & Näätänen, R. (2007). Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have different roles in the detection of acoustic changes. Journal of Neurophysiology 97(3), 2075–2082.
Serafine, M. L., Glassman, N., & Overbeeke, C. (1989). The cognitive reality of hierarchic structure in music. Music Perception 6(4), 397–430.
Sethares, W. A. (2005). The gamelan. In W. A. Sethares, Tuning, timbre, spectrum, scale (pp. 165–187). Berlin: Springer.
Sinex, D. G., Guzik, H., Li, H., & Henderson Sabes, J. (2003). Responses of auditory nerve fibers to harmonic and mistuned complex tones. Hearing Research 182(1–2), 130–139.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin & Review 16(2), 374–381.


Snyder, J. S., & Elhilali, M. (2017). Recent advances in exploring the neural underpinnings of auditory scene perception. Annals of the New York Academy of Sciences 1396, 39–55.
Song, J. H., Skoe, E., Wong, P. C. M., & Kraus, N. (2008). Plasticity in the adult human auditory brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10), 1892–1902.
Steedman, M. J. (1984). A generative grammar for jazz chord sequences. Music Perception 2(1), 52–77.
Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex 18(5), 1169–1178.
Strait, D. L., Kraus, N., Skoe, E., & Ashley, R. (2009). Musical experience and neural efficiency: Effects of training on subcortical processing of vocal expressions of emotion. European Journal of Neuroscience 29(3), 661–668.
Sun, L., Liu, F., Zhou, L., & Jiang, C. (2018). Musical training modulates the early but not the late stage of rhythmic syntactic processing. Psychophysiology 55(2), e12983.
Sun, Y., Lu, X., Ho, H. T., Johnson, B. W., Sammler, D., & Thompson, W. F. (2018). Syntactic processing in music and language: Parallel abnormalities observed in congenital amusia. NeuroImage: Clinical 19, 640–651.
Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in processing auditory events. Journal of Psychophysiology 21(3), 164–175.
Tarr, B., Launay, J., & Dunbar, R. I. (2014). Music and social bonding: “Self–other” merging and neurohormonal mechanisms. Frontiers in Psychology 5, 1096. Retrieved from https://doi.org/10.3389/fpsyg.2014.01096
Teinonen, T., Fellman, V., Näätänen, R., Alku, P., & Huotilainen, M. (2009). Statistical language learning in neonates revealed by event-related brain potentials. BMC Neuroscience 10(1), 21.
Teinonen, T., & Huotilainen, M. (2012). Implicit segmentation of a stream of syllables based on transitional probabilities: An MEG study. Journal of Psycholinguistic Research 41(1), 71–82.
Terhardt, E. (1991). Music perception and sensory information acquisition: Relationships and low-level analogies. Music Perception: An Interdisciplinary Journal 8(3), 217–239.
Tervaniemi, M. (2009). Musicians—same or different? Annals of the New York Academy of Sciences 1169, 151–156.
Tervaniemi, M., Castaneda, A., Knoll, M., & Uther, M. (2006). Sound processing in amateur musicians and nonmusicians: Event-related potential and behavioral indices. Neuroreport 17(11), 1225–1228.
Tervaniemi, M., Ilvonen, T., Karma, K., Alho, K., & Näätänen, R. (1997). The musical brain: Brain waves reveal the neurophysiological basis of musicality in human subjects. Neuroscience Letters 226(1), 1–4.
Tervaniemi, M., Janhunen, L., Kruck, S., Putkinen, V., & Huotilainen, M. (2016). Auditory profiles of classical, jazz, and rock musicians: Genre-specific sensitivity to musical sound features. Frontiers in Psychology 6, 1900. Retrieved from https://doi.org/10.3389/fpsyg.2015.01900
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learning & Memory 8(5), 295–300.
Thiessen, E. D., Kronstein, A. T., & Hufnagle, D. G. (2013). The extraction and integration framework: A two-process account of statistical learning. Psychological Bulletin 139(4), 792–814.


Tillmann, B., Bharucha, J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review 107(4), 885–913.
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Cognitive Brain Research 16(2), 145–161.
Todd, N. P. M., & Cody, F. W. (2000). Vestibular responses to loud dance music: A physiological basis of the “rock and roll threshold”? Journal of the Acoustical Society of America 107(1), 496–500.
Todd, N. P. M., Paillard, A., Kluk, K., Whittle, E., & Colebatch, J. (2014). Vestibular receptors contribute to cortical auditory evoked potentials. Hearing Research 309, 63–74.
Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology 87(1), 122–139.
Villarreal, E. A. G., Brattico, E., Leino, S., Østergaard, L., & Vuust, P. (2011). Distinct neural responses to chord violations: A multiple source analysis study. Brain Research 1389, 103–114.
Vuvan, D. T., Zendel, B. R., & Peretz, I. (2018). Random feedback makes listeners tone-deaf. Scientific Reports 8(1), 7283.
Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17), 10038–10042.
Whitfield, I. (1980). Auditory cortex and the pitch of complex tones. Journal of the Acoustical Society of America 67(2), 644–647.
Winkler, I. (2007). Interpreting the mismatch negativity. Journal of Psychophysiology 21(3–4), 147–163.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustical Society of America 84, 566–572.
Zatorre, R. J. (2001). Neural specializations for tonal processing. Annals of the New York Academy of Sciences 930, 193–210.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences 6(1), 37–46.
Zendel, B. R., Lagrois, M.-É., Robitaille, N., & Peretz, I. (2015). Attending to pitch information inhibits processing of pitch information: The curious case of amusia. Journal of Neuroscience 35(9), 3815–3824.
Zhang, J., Zhou, X., Chang, R., & Yang, Y. (2018). Effects of global and local contexts on chord processing: An ERP study. Neuropsychologia 109, 149–154.
Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004). Grouping of sequential sounds: An event-related potential study comparing musicians and nonmusicians. Journal of Cognitive Neuroscience 16(2), 331–338.
Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory organization of sound sequences by a temporal or numerical regularity: A mismatch negativity study comparing musicians and non-musicians. Cognitive Brain Research 23(2–3), 270–276.


Chapter 10

Multisensory Processing in Music

Frank Russo

Introduction

Definitions of music tend to be unimodal in nature, often including some version of the idea that music is organized sound with aesthetic intent. Even philosophical treatises that attempt to define music in broad terms tend to overlook multisensory aspects (Nattiez, 1990; Thomas, 1983). However, multisensory aspects abound. For instance, the facial expressions and body gestures of a performer may be perceived through the visual system, and the mechanical vibrations produced by a musical instrument may be perceived through the somatosensory system. Sensorimotor networks may also give rise to cascade effects. For example, motor activity in response to a beat may give rise to micro-movements of the head and torso, which may in turn lead to vestibular stimulation. When the motor activity becomes entrained, it may serve as its own channel of sensory input. As such, the perception of music is often multisensory, integrating inputs from auditory, visual, somatosensory, vestibular, and motor areas.
This chapter has three main sections. The first provides an overview of theory and evidence regarding multisensory processing. The second considers auditory-only processing with a focus on lateralization, basic modularity, and pathways. This sets the stage for the final section, which considers non-auditory and multisensory processing of pitch, timbre, and rhythm. In each subsection corresponding to a dimension of music, psychophysical evidence is presented before the extant neuroscientific evidence is reviewed. Where no neuroscientific evidence exists, proposals are made about the types of neural processing that may be involved.



Multisensory Processing

It has often been noted that speech is perceived by eye and by ear. This is normally characterized as an opportunity to minimize uncertainty, as it allows the brain to capitalize on convergences. However, it can also represent a sensory-processing challenge in that information from across two channels must somehow be bound together into a common representation. This challenge may be even greater in music given the additional channels of sensory information that are routinely involved and the intentional use of uncertainty as a compositional device. Nevertheless, under most conditions, multisensory information in music is successfully integrated, yielding a coherent and stable multisensory percept.
Information from across the senses may be integrated in a manner that is cognitive or perceptual (Schutz, 2008). Cognitive integration takes place after information from two or more channels has been processed independently (see review and meta-analysis concerning audio-visual music by Platz & Kopiez, 2012). A classic musical example of this type of integration is the influence of performer attractiveness on judgments of performance quality (Wapnick, Mazza, & Darrow, 2000). In this example, information from one channel does not so much alter perception in another as influence how those perceptions are evaluated.
Another musical example that reflects cognitive multisensory integration concerns the “blue note” in live jazz and blues. Blue notes are often accompanied by a visual display that conveys negatively valenced emotion (e.g., wincing of the eyes, shaking, or rolling the head back). Thompson, Graham, and Russo (2005) sought to assess the effect of this practice by using twenty clips of a blues concert performed by B. B. King. Although all of the selected clips possessed some level of dissonance, half were performed with a relatively neutral facial expression. Two groups of participants were asked to provide judgments of dissonance.
One group made judgments in an auditory-only condition and the other made judgments in an auditory-visual condition. Results revealed that visual information influenced judgments of dissonance, such that the difference between dissonant and neutral performances was greater in the audio-visual condition. However, it would be erroneous to conclude that information from the visual and auditory channel had been integrated at the level of perceptual representation. Integration at the perceptual level is said to take place when information from across the senses is integrated in a manner that is automatic and pre-attentive (Arieh & Marks, 2008; Spence, 2011). All of the multisensory examples considered in the rest of this chapter meet these simple criteria. However, the neural mechanisms allowing for perceptual integration are by no means uniform. To foreshadow, there are at least three main types of mechanisms that have been implicated. The mechanisms vary with respect to network size but all involve some form of direct or indirect communication between primary sensory areas of the brain (see Fig. 1).



Figure 1. Schematic diagram of brain circuitry underpinning three mechanisms of multisensory integration (STS = Superior Temporal Sulcus; IFG = Inferior Frontal Gyrus; S = Somatosensory Cortex; A = Auditory Cortex; V = Visual Cortex). The top panel diagrams the first mechanism, involving primary sensory areas only. The second panel diagrams the second mechanism, involving the first mechanism in addition to a known multisensory area, the superior temporal sulcus (STS). The bottom panel diagrams the third mechanism, which may be described as sensorimotor; it builds on the second mechanism, adding feedback connections from a known motor planning area, the inferior frontal gyrus (IFG). Subcortical contributions from the superior colliculus are not diagrammed.

First, a basic form of multisensory integration occurs when unisensory input activates areas of primary sensory cortex that are not normally associated with that input. This phenomenon has been observed following sensory deprivation that is permanent (e.g., blindness) or temporary (e.g., blindfold), suggesting a role for rapid cortical plasticity (Merabet et al., 2008). Complementary evidence has been found using unisensory information. For example, auditory cortex may be activated by lip reading in


the context of silent speech (Calvert et al., 1997) or silent tactile stimulation (Foxe et al., 2002). Although not strictly multisensory, these examples reveal the existence of lateral connections between primary sensory areas and suggest the potential for integration without the involvement of higher-order multisensory areas (Foxe & Schroeder, 2005). Second, evidence has been observed for a “superadditive” neural response to multisensory input that is greater than the sum of the neural responses to the equivalent unisensory inputs. Most of the evidence for superadditivity has been found using single-cell recordings in the superior colliculus of animal models (Stein & Meredith, 1993). However, using non-invasive imaging methods, evidence for superadditivity has also been found in the cerebral cortex. For instance, a superadditive response has been observed in superior temporal sulcus using audio-tactile and audio-visual stimuli (Beauchamp, Yasar, Frye, & Ro, 2008). This body of evidence suggests a mechanism for multisensory integration that relies on hierarchical processing involving the progressive convergence of pathways. Finally, evidence is emerging from electrophysiological, neuroimaging, and brain stimulation studies for the functional role of connectivity across broad expanses of sensorimotor cortex (Frith & Hasson, 2016; Keil, Müller, Ihssen, & Weisz, 2012; Luo, Liu, & Poeppel, 2010; Luo & Poeppel, 2007). Synchronized oscillations across multisensory and motor areas may serve to integrate and select task-relevant information from across the senses. Sensory input may feed forward, leading to a predictive motor code that is informed by priors (empirically based expectations about movement patterns). In turn, this predictive code can feed back to multisensory areas, allowing for comparison with incoming sensory input (Kilner, Friston, & Frith, 2007).
This body of evidence emphasizes the inherent uncertainty that exists in sensory information and the important role that the motor system can have in disambiguating that uncertainty. This sensorimotor mechanism allows for context-sensitive multisensory integration that relies on feedforward and feedback connections (Senkowski, Schneider, Foxe, & Engel, 2008). In addition to investigating the particular mechanisms underpinning multisensory integration, research has attempted to explain the extent to which the different senses will contribute to the perception of a multisensory stimulus. The likelihood of integrating information from across the senses is lawfully related to the extent to which information about a signal appears to overlap in space and time. In other words, when the audio and visual aspects of a signal are delayed in time or separated in space, the likelihood of integration is reduced. In addition, the law of inverse effectiveness states that multisensory integration is inversely proportional to the effectiveness of the strongest unisensory response (Meredith & Stein, 1986; Stein & Meredith, 1993). Hence, if an auditory input is robust enough on its own to support some functional goal, it will be resistant to influence from non-auditory information. If the auditory input is weak due to a compromised sensory system, perceptual ambiguity, or masking from noise, then the likelihood of integrating information from other senses increases. Maximum-likelihood estimation (MLE) methods have been used to model psychophysical as well as neural findings (Alais & Burr, 2004; Ernst & Banks, 2002; Gu, Angelaki, & DeAngelis, 2008; Rohe & Noppeney, 2015). Based on Bayesian probability


theory, MLE models are essentially a weighted linear sum that combines signals from different senses (Angelaki, Gu, & DeAngelis, 2009; Ernst & Bülthoff, 2004). The weight assigned to each signal is determined by stimulus or perceiver characteristics that influence signal reliability. Like the inverse effectiveness rule, the critical assumption in this approach is that inherent uncertainty exists in sensory information.
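To make the weighting concrete, the reliability-weighted sum can be sketched in a few lines of code. This is an illustrative sketch only, not a model from the chapter; the function name `mle_combine` and the numerical values are assumptions chosen for demonstration.

```python
def mle_combine(estimates, sigmas):
    """Reliability-weighted (MLE) cue combination.

    Each cue's weight is its reliability (1 / sigma^2) normalized
    across cues, so the less variable signal dominates the percept.
    """
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * x for w, x in zip(weights, estimates))
    # The fused estimate is more reliable than any single cue.
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma

# Hypothetical example: a noisy auditory location estimate (10 degrees,
# sigma = 2) fused with a more reliable visual estimate (12 degrees, sigma = 1).
location, sigma = mle_combine([10.0, 12.0], [2.0, 1.0])
```

With these illustrative numbers the visual cue receives a weight of 0.8, and the fused estimate (11.6 degrees, sigma roughly 0.89) is less variable than either input, which mirrors the inverse effectiveness logic: the weaker a unisensory signal, the more the other senses contribute.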

Auditory Processing

Despite the extensive involvement of non-auditory areas in music processing, there is no mistaking that the auditory cortex is the central hub for processing music in the neurotypical brain. Rather than an undifferentiated whole, the auditory cortex is best understood as a collection of modules that work together as an “auditory network” enabling the processing of separate dimensions of music. These modules are briefly reviewed here to allow for comparison with processing of the same dimensions as experienced by other senses. More exhaustive reviews of auditory neuroscience may be found elsewhere in this volume; the brief review provided here sets the stage for subsequent discussion of evidence for non-auditory input activating auditory cortex.
The area known as the auditory core exists in both hemispheres, including the superior temporal gyrus of the temporal lobe and extending into the lateral sulcus as well as the transverse temporal gyri that run toward the center of the brain. The latter is often referred to as Heschl’s gyrus, which is the first structure in the cortex that reflects the tonotopic map that originates in the cochlea. Some research has suggested the existence of separate caudal and rostral tonotopic maps with mirror-like orientations (Formisano et al., 2003). Additional tonotopic maps have been found in the belt area surrounding the core (Rauschecker, Tian, & Hauser, 1995; Rauschecker, Tian, Pons, & Mishkin, 1997). Beyond the belt area lies a tertiary area of auditory cortex known as the parabelt. The parabelt is thought to have functionally distinct subdivisions (Kaas & Hackett, 2000). The caudal subdivision abuts and is interconnected with the superior temporal sulcus. Together, this caudal subdivision of the parabelt and the superior temporal sulcus constitute the posterior hub of the auditory-motor pathway (more details on pathways below).
An early PET study by Zatorre & Belin (2001) indicated that in both hemispheres, temporal variation of auditory input engages the core, whereas spectral variation engages the belt. However, responses to temporal features (i.e., with relevance for rhythm) were clearly biased toward the left and responses to spectral features (i.e., with relevance for pitch and timbre) were clearly biased toward the right. This apparent pattern of hemispheric specialization has been further validated by the results of neuropsychological studies involving patients with cortical lesions. In general, patients with lesions in the right hemisphere have more impaired pitch processing than do those with


lesions in the left hemisphere. For example, lesions in the right hemisphere lead to weaker pitch discrimination (Milner, 1962; Johnsrude, Penhune, & Zatorre, 2000), weaker perception of the missing fundamental (Zatorre, 1988), weaker sensitivity to pitch direction (Johnsrude, Penhune, & Zatorre, 2000), and weaker sensitivity to the global pitch contour (Peretz, 1990). On the basis of neurophysiological and psychophysical data, Poeppel (2001) proposed a similar (but speech-specific) hemispheric specialization that focused on the window of temporal integration. He proposed that the left hemisphere had a short integration window (20–50 ms) that supports processing of formant transitions and that the right hemisphere had a long integration window (150–250 ms) that supports processing of intonation contours. This specialization may ultimately be rooted in differences in the volume of white-matter tissue across the two hemispheres. A postmortem study by Anderson, Southern, and Powers (1999) found a higher volume of white-matter tissue in the belt area of the left hemisphere compared to the right due to greater thickness of the myelin sheathing. More recent neuroimaging research has further validated this proposed explanation for hemispheric specialization. For example, Hyde, Peretz, and Zatorre (2008) found that activation in the right hemisphere increased parametrically as a function of the pitch distance between consecutive tones. In contrast, they observed only a coarse-grain differentiation in the left hemisphere. Auditory evoked potentials have also been used to elucidate hemispheric specialization. Neurons in the right hemisphere have been found to possess sharper frequency tuning than those in the left hemisphere (Liégeois-Chauvel, Giraud, Badier, Marquis, & Chauvel, 2012).

Pathways Much like the “what” (ventral) and “where” (dorsal) visual pathways originally proposed to explain functional organization in the visual system (Goodale & Milner, 1992), there are two main auditory pathways leading out of auditory cortex and terminating in frontal areas (Zatorre, Chen, & Penhune, 2007). A ventral auditory pathway is thought to be involved primarily with category-based representations (e.g., phonemes). A dorsal “auditory-motor” pathway is thought to be specialized for sensorimotor translations of time-varying information that is not categorical. This pathway may be particularly important in the context of learning a new piece of music (Lahav, Saltzman, & Schlaug, 2007; Schalles & Pineda,  2015), perceiving emotion in music (McGarry, Pineda, & Russo, 2015; Thompson et al., 2005; Vines, Krumhansl, Wanderley, Dalca, & Levitin, 2011), and in the type of feedback monitoring required for performance, particularly in continuous pitch instruments like voice or violin (Loui, 2015; Zatorre et al., 2007). The auditory-motor pathway involves reciprocal connections between inferior frontal gyrus and posterior subdivisions of the superior temporal gyrus (auditory parabelt) and superior temporal sulcus (multisensory area).



Multisensory Perception of Pitch

Visuomotor Influences

Numerous studies have demonstrated that the size of a sung melodic interval can be judged directly through the visual system. When videos of sung melodic intervals are presented to observers without audio, they are able to accurately scale them according to size (Thompson & Russo, 2007). This ability does not appear to require musical or vocal training, which argues against a cognitive account based on long-term memory associations, and further suggests that some aspects of the visual information provide reliable cues for judging interval size. Video-based tracking has shown that larger intervals possess more head movement, eyebrow raising, and mouth opening. The influence of visual information on perception of size in sung melodic intervals persists even under point-light presentation conditions, in which the dynamic information in the display is retained while static visual cues are eliminated (Abel, Li, Russo, Schlaug, & Loui, 2016). The visual channel continues to influence the perceived size of sung melodic intervals even when audio is present (Russo, Sandstrom, & Maksimowski, 2011; Thompson et al., 2005; Thompson, Russo, & Livingstone, 2010). The mouth area may be particularly important in judging the size of sung melodic intervals, as reducing the level of audibility in an audio-visual presentation (by increasing the level of background noise) causes observers to increase the proportion of gaze directed toward the mouth (Russo et al., 2011). However, the visual influence on auditory judgments has been found to be mitigated for participants with an early onset of musical training (Abel et al., 2016). One interpretation of this finding is that early-trained musicians possess a stronger audio-motor representation of sung melodic intervals. This enhancement in priors may allow them to focus on auditory input or rely less heavily on non-auditory input when presented with multisensory musical stimuli.
This prioritization may be further reinforced through experience playing in groups, where orthogonal streams of audio and visual information may co-exist.

But how can we be sure that vision influences melodic pitch processing at a perceptual (vs. cognitive) level? One behavioral means of assessing whether multisensory integration is perceptual is to utilize a dual-task paradigm. Thompson et al. (2010) presented participants with sung melodic intervals accompanied by facial expressions used to perform a small or large interval (two and nine semitones, respectively). Participants were asked to count the number of 0's and 1's that were superimposed over the singer's face during performance of each interval. The conditions were blocked by digit speed (300 or 700 ms per digit) as well as task demand (single or dual task). Results revealed that the influence of the visual information on auditory judgments of sung melodic interval size was not moderated by cognitive load. These findings suggest that the integration was automatic and pre-attentive.

The cortical underpinnings of this example of multisensory integration in music may originate in motion-selective areas of the dorsal visual pathway, such as the medial temporal and the medial superior temporal areas. Both of these areas are adjacent to the posterior bank of the superior temporal sulcus, a known multisensory area that projects to premotor areas allowing for sensorimotor translations of dynamic sensory input (Kilner, 2011). There are also reciprocal connections from premotor cortex to the superior temporal sulcus, allowing for a predictive coding model of action involving sensory representations (Kilner et al., 2007). This type of predictive coding may be particularly important in shaping auditory judgments of action on the basis of visual input alone, or in situations where auditory input is ambiguous for some reason (e.g., an individual with severe hearing loss, or an individual with normal hearing listening in low signal-to-noise conditions).

The mechanism proposed for visual perception and audio-visual integration of melodic pitch information involves feedforward and feedback connections along the dorsal stream. Feedforward connections provide multisensory input to motor planning areas. Feedback connections provide predictive coding of movement informed by priors that can be compared with incoming sensory information (Kilner et al., 2007; Maes, Leman, Palmer, & Wanderley, 2014). In the case of individuals with severe hearing loss, there may also be an additional contribution owed to visual activation of the auditory cortex in belt areas (Finney, 2001; Röder, Stock, Bien, Neville, & Rösler, 2002). Research in animal models suggests that belt areas undergo profound plastic changes following a period of auditory deprivation, which leads in some cases to enhanced visual processing. Lomber, Meredith, and Kral (2010) showed that deactivation of posterior belt areas selectively eliminates enhancements to visual localization, whereas deactivation of the dorsal belt areas eliminates enhancement of visual motion detection.
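The feedforward-feedback arrangement described above can be caricatured computationally. The sketch below is purely illustrative (the function, weights, and semitone values are hypothetical, not drawn from Kilner et al., 2007, or Maes et al., 2014): a belief about interval size is repeatedly nudged by two precision-weighted prediction errors, one bottom-up (from sensory input) and one top-down (from the prior).

```python
# Illustrative sketch only: a precision-weighted predictive coding loop.
# The function name, weights, and values are hypothetical, not taken from
# the models cited in the text.

def predictive_coding_estimate(observations, prior, prior_precision,
                               sensory_precision, learning_rate=0.1,
                               n_iterations=200):
    """Reconcile a prior belief with noisy sensory input via prediction errors."""
    belief = prior
    for _ in range(n_iterations):
        for obs in observations:
            sensory_error = obs - belief   # bottom-up prediction error
            prior_error = prior - belief   # top-down prediction error
            belief += learning_rate * (sensory_precision * sensory_error
                                       + prior_precision * prior_error)
    return belief

# A prior expecting a small interval (2 semitones) meets sensory evidence
# for a large one (about 9 semitones); weights favor the sensory channel.
belief = predictive_coding_estimate(
    observations=[8.8, 9.0, 9.2], prior=2.0,
    prior_precision=0.2, sensory_precision=0.8)
```

With the weights interpreted as normalized precisions, the belief settles on a compromise that favors the more precise channel; sharpening the prior precision pulls the estimate toward the prior, which offers one way to picture why observers with strong audio-motor priors might lean less on visual input.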
Transcranial magnetic stimulation (TMS) has been used as one means of investigating the assumed involvement of motor areas in processing sung melodic intervals (Royal, Lidji, Théoret, Russo, & Peretz, 2015). Non-musicians were given brief training that enabled them to apply a label to intervals of different size (e.g., unison, octave, etc.). Following training, facilitative TMS was applied over motor cortex while participants observed a pitch interval label that was immediately followed by the audio-visual presentation of a sung interval. Participants were required to make a forced-choice judgment regarding whether the pitch interval label matched the pitch interval contained in the two-note vocal melody. Motor-evoked potentials recorded from the mouth muscles contralateral to the hemisphere receiving stimulation were found to increase relative to baseline for large pitch intervals and decrease for small pitch intervals, suggesting that some type of motor simulation was taking place.

Another line of evidence in support of motor involvement in perception of song may be found in EEG research investigating the sensorimotor (or mu) wave. The oscillatory generators of the sensorimotor wave can be found in the inferior frontal gyrus, and to a lesser extent in the inferior parietal lobe. The sensorimotor wave becomes desynchronized when an individual moves intentionally or when they observe others moving intentionally, and the extent of desynchronization is enhanced under multisensory presentation conditions (Kaplan & Iacoboni, 2007; McGarry, Russo, Schalles, & Pineda, 2012). These data have been interpreted as evidence of an internal simulation involving motor planning and proprioception. While some controversy exists regarding the putative mirror system responsible for the sensorimotor wave (Hickok, 2009), its responsiveness to observation of intentional action is less equivocal. A meta-analysis by Fox et al. (2016), involving eighty-five studies, found significant event-related desynchronization during observation of intentional action (Cohen's d = 0.31, N = 1,508). With regard to music stimuli, evidence has been found for sensorimotor desynchronization in response to audio-only presentations of isolated sung notes (Lévêque & Schön, 2013) and audio-visual presentations of sung melodic intervals (McGarry et al., 2015). Although it seems likely, it remains to be determined whether sensorimotor desynchronization in response to song is greater in multisensory compared to unisensory presentation conditions.

Somatosensory Influences

Because all sound arises from a source of mechanical vibration, it should be no surprise that evidence exists for perception of pitch and other musical dimensions on the basis of vibrotactile input (i.e., mechanical vibration of the skin). Detection thresholds for vibrotactile stimuli show peak sensitivity around 250 Hz, and a sharp decline in sensitivity (i.e., larger thresholds) below 100 Hz (Hopkins, Maté-Cid, Fulford, Seiffert, & Ginsborg, 2016; Morioka & Griffin, 2005; Verrillo, 1992). Thresholds are also smaller in smooth (vs. hairy) skin due to increased mechanoreceptor density (Verrillo & Bolanowski, 1986), and with large (vs. small) contactor areas due to effects of spatial summation (Morioka & Griffin, 2005). Pitch discrimination thresholds obtained with vibrotactile stimuli tend to be about five times greater than those obtained with auditory stimuli (Branje, Maksimowski, Karam, Fels, & Russo, 2010; Verrillo, 1992). In addition to this relatively poor pitch discrimination ability, there is no convincing psychophysical evidence for vibrotactile pitch discrimination beyond about 1,000 Hz.

Single-cell recording in macaques has revealed that low-frequency vibrotactile stimuli can activate belt areas of auditory cortex (Schroeder et al., 2001). Convergent evidence has been found in imaging studies involving adults with normal hearing. Low-frequency vibrotactile stimuli have been shown to activate auditory cortex bilaterally (Levänen, Jousmäki, & Hari, 1998), particularly in posterior belt areas (Schürmann, Caetano, Hlushchuk, Jousmäki, & Hari, 2006). The extent of auditory activations observed in deaf participants is more widespread than that observed in normal-hearing participants (Auer, Bernstein, Sungkarat, & Singh, 2007), likely due to neuroplastic changes following sensory deprivation.
One question resulting from this work is whether activation of auditory areas by vibrotactile stimuli is direct or whether it is the result of projections from somatosensory areas. Using MEG, Caetano and Jousmäki (2006) were able to track the time course of vibrotactile activations. They presented normal-hearing participants with 200 Hz vibrotactile stimuli delivered to the fingertips. An initial response was observed in somatosensory cortex, peaking around 60 ms, followed by transient auditory responses in auditory and secondary somatosensory cortices between 100 and 200 ms. Finally, a sustained response was observed in auditory cortex between 200 and 700 ms.


Figure 2. Schematic sagittal view of the human brain featuring modules and pathways that are involved in the multisensory perception of music. Labeled regions in the original figure:
Somatosensory Cortex: primary cortex for processing vibrotactile information.
Motor Cortex: primary cortex for processing motor activity, including output from predictive coding for motor simulations.
Sensorimotor Pathways: feedforward and feedback connections from sensory areas to motor planning areas; critical for sensorimotor transformations and predictive coding.
Inferior Frontal Gyrus: primary area for motor planning; connections to basal ganglia and motor cortex.
Vestibular Cortex: primary cortex for processing vestibular information.
Superior Colliculus: primary subcortical area (not visible) implicated in multisensory integration.
Auditory Core: primary region of auditory cortex; includes Heschl's gyrus, implicated in pitch perception.
Auditory Belt: secondary region of auditory cortex; increased response to visual and vibrotactile input following sensory deprivation.
Auditory Parabelt: tertiary region of auditory cortex; posterior region interconnected with the superior temporal sulcus and implicated in multisensory integration.
Visual Cortex: primary cortex for processing visual information.
Medial Temporal Area: involved in processing dynamic visual information; posterior region interconnected with the superior temporal sulcus.
Superior Temporal Sulcus: primary cortical area implicated in multisensory integration.

Although these studies all presented unisensory stimuli, taken together the findings suggest a likely mechanism for audio-tactile integration that is hierarchical, involving a progressive convergence of auditory and somatosensory pathways. One of the main areas of sensory convergence in the cortex appears to be the posterior subdivisions of the auditory parabelt and the superior temporal sulcus (see Fig. 2).

Multisensory Perception of Timbre

Visuomotor Influences

Saldaña and Rosenblum (1993) presented participants with audio-visual presentations of cello tones in which bowing and plucking were crossed across the senses. So, for example, observers were presented with a multisensory stimulus in which the audio channel consisted of an unequivocal plucking sound while the visual channel presented an unequivocal bowing movement. Much like the "McGurk effect" upon which this study is based (McGurk & MacDonald, 1976), auditory judgments were influenced by visual information. For instance, plucking sounds were more likely to be heard as bowing when accompanied by a visual bowing movement. The authors interpreted their results with regard to an automatic internal motor simulation that is driven by auditory and visual information. Much like the explanation for sung melodic pitch, an internal motor simulation may have provided a predictive coding model of action involving sensory representations. The output of the predictive coding model may have been integrated with direct auditory input at the level of the superior temporal sulcus. Consistent with this interpretation, fMRI work involving multisensory speech has consistently implicated the superior temporal sulcus and superior temporal gyrus (Callan et al., 2003, 2004; Jones & Callan, 2003). Similar evidence has been found with multisensory tool use, and the extent of activation in the superior temporal sulcus appears to adhere to the law of inverse effectiveness (Stevenson & James, 2009).

Somatosensory Influences

Several studies have investigated the ability to discriminate timbre using vibrotactile stimuli. Russo, Ammirante, and Fels (2012) found that deaf and hearing observers were able to accurately distinguish instrument timbres on the basis of vibrotactile information alone. Deaf and hearing participants were also able to distinguish timbre on the basis of synthetic tones that differed only with regard to spectral envelope (dull vs. bright). This ability persisted even though numerous controls were put in place to ensure that participants received no trace of residual auditory input. Russo et al. (2012) proposed that vibrotactile discrimination involves the cortical integration of spectral information filtered through frequency-tuned mechanoreceptors. There are four known channels that respond to touch (Bolanowski, Gescheider, Verrillo, & Checkosky, 1988), and each is sensitive to a unique range of the frequency spectrum. This allows the mechanoreceptors to collectively code for spectral shape in much the same way that has been proposed for critical bands in the auditory system (Makous, Friedman, & Vierck, 1995). It would take only two such channels to allow for the coding of spectral tilt. A follow-up study revealed that deaf participants are able to discriminate sung vowels, and that the extent of the difference in spectral tilt between vowel pairs strongly predicted their discriminability (Ammirante, Russo, Good, & Fels, 2013).

In addition to the influence of vibrotactile stimulation on passive reception of timbre, it seems likely that such stimulation provides performers with valuable timbre information during active performance (Marshall & Wanderley, 2011). As an example, the string vibrations of a piano are detectable at the level of the key press.
Vibration detection thresholds are reduced under natural playing conditions involving active touch (Papetti, Järveläinen, Giordano, Schiesser, & Fröhlich, 2017) and the co-occurrence of sound at the same frequency (Ro, Hsu, Yasar, Elmore, & Beauchamp, 2009). Perhaps not surprisingly, the perception of sound quality as evaluated by the performer has been shown to be influenced by vibration that is felt through the keys (Fontana, Papetti, Järveläinen, & Avanzini, 2017). To date, there have been no neural studies investigating auditory-tactile perception of timbre. However, it seems likely that this ability would depend on direct projections from somatosensory cortex to posterior belt areas of auditory cortex (see top panel of Fig. 1). These direct projections are likely to be right lateralized because of thinner myelin sheathing in the right auditory cortex (Anderson et al., 1999), which may better support communication across frequency channels, thus enabling spectral analysis.
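The claim that two frequency-tuned channels would suffice to code spectral tilt can be illustrated with a toy simulation. Everything below is an assumption made for illustration (the band edges, sampling rate, harmonic counts, and roll-off values are hypothetical, not taken from the studies above): two synthetic tones share a fundamental but differ in spectral roll-off, and the ratio of energy between a high-frequency and a low-frequency band separates "bright" from "dull".

```python
# Toy model (hypothetical parameters): two band-energy "channels" stand in
# for frequency-tuned mechanoreceptor populations; their ratio codes tilt.
import math

def synth_tone(f0, n_harmonics, tilt_db_per_octave, sr=4000, dur=0.25):
    """Additive tone whose harmonic amplitudes roll off at the given tilt."""
    n = int(sr * dur)
    samples = [0.0] * n
    for h in range(1, n_harmonics + 1):
        freq = f0 * h
        if freq >= sr / 2:
            break
        amp = 10 ** (tilt_db_per_octave * math.log2(h) / 20)
        for i in range(n):
            samples[i] += amp * math.sin(2 * math.pi * freq * i / sr)
    return samples

def channel_energy(samples, sr, f_lo, f_hi):
    """DFT energy in [f_lo, f_hi], standing in for one frequency-tuned channel."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2):
        freq = k * sr / n
        if f_lo <= freq <= f_hi:
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            energy += re * re + im * im
    return energy

def tilt_index(samples, sr=4000):
    """Ratio of high- to low-channel energy: larger means brighter."""
    return channel_energy(samples, sr, 300, 900) / channel_energy(samples, sr, 50, 150)

bright = synth_tone(100, 12, tilt_db_per_octave=-3)    # shallow roll-off
dull = synth_tone(100, 12, tilt_db_per_octave=-12)     # steep roll-off
bright_index = tilt_index(bright)
dull_index = tilt_index(dull)
```

Because only the ratio of the two channel outputs matters, absolute stimulus level can vary without disturbing the classification, consistent with the idea that spectral tilt is recoverable from a coarse comparison of two channels.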



Multisensory Perception of Rhythm

Visuomotor Influences

Rhythm involves the metrical patterning and grouping of tones, which is shaped by intensity and duration. Visual influences have been found to affect the ability to track rhythm, as well as the low-level dimensions that contribute to rhythm (e.g., loudness and duration). Because percussionists do not have the ability to independently control the intensity and duration of the notes that they produce, the use of gestures may be particularly important in shaping the perception of these dimensions (Schutz, 2008). Rosenblum and Fowler (1991) recorded handclaps of varying intensity. They presented participants with audio-visual pairings of the handclaps that were either congruent or incongruent. Although participants were asked to base loudness judgments only on what they heard, the visual information had a systematic influence on loudness judgments.

Schutz and colleagues have shown that expressive gestures are also able to influence the perceived duration of a performed note. Their initial study utilized recordings of notes performed on a marimba with "long" and "short" gestures (Schutz & Lipscomb, 2007). Audio and visual channels were recombined to form congruent and incongruent audio-visual pairings. These pairings were presented to listeners, who were asked to make duration estimations on the basis of sound alone. Although the auditory content of the recordings had no effect on estimations of duration, the visual presentation influenced perceived duration such that long gestures lengthened notes and short gestures shortened notes. This effect persisted even when the visual content was substituted with a point-light display, suggesting that the effect was based on the dynamics of visual movement (Schutz & Kubovy, 2009).
The ability to synchronize to metrical structures created by discrete visual flashes has been found to be inferior to synchronization with discrete auditory tones that have the same temporal characteristics (Patel, Iversen, Chen, & Repp, 2005). However, the auditory advantage is almost eliminated if visual rhythms are presented using continuous stimuli such as a bouncing ball (Grahn, 2012; Hove, Fairhurst, Kotz, & Keller, 2013; Iversen, Patel, Nicodemus, & Emmorey, 2015). Imaging results have shown that activation in the putamen, a key timing area involved in motor planning and beat perception (Grahn & Brett, 2007), parallels results obtained with sensorimotor synchronization tasks. In particular, continuous visual stimuli led to greater activation of the putamen than did visual flashes, approaching activation levels obtained with auditory beeps. This finding suggests that the ability to synchronize to metrical structure is not simply contingent on the channel of sensory input but also on the nature of stimulus presentation (Grahn, 2012; Hove et al., 2013; Ross, Iversen, & Balasubramaniam, 2016). While discrete events are optimal with auditory stimuli, continuous events lead to better outcomes with visual stimuli. Some evidence suggests that deaf individuals possess an advantage in tracking visual rhythms (Iversen et al., 2015). The latter finding may be owed to neuroplastic changes resulting from sensory deprivation and life-long experience with signing (Bavelier et al., 2000, 2001). Referring back to Fig. 1, the strength of direct visual input to auditory-motor pathways is likely enhanced in deaf individuals.

Many studies have used EEG to assess neural entrainment to the beat. When the frequency of the beat is within the range of human movement (e.g., 1 to 4 Hz), large swathes of cortex entrain to that frequency. These neural oscillations will persist even after a rhythmic stimulus has been temporarily paused. Depending on when the rhythmic stimulus is resumed, the entrained neural oscillations will either increase or decrease in power (Simon & Wallace, 2017). Power decreases when the rhythmic stimulus anticipates the beat (too early) and increases when the rhythmic stimulus is resumed on the beat (on time). However, if the beat is resumed as an audio-visual event, there is no modulation of power in the entrained neural oscillations. These findings reveal that multisensory inputs are not equivalent to auditory inputs with respect to entrainment. One interpretation is that multisensory input is "highly reliable or salient" and that resources should be allocated to processing it independently from the oscillations manifesting from the original auditory-only beat. This pattern of neural findings may also help to explain results from sensorimotor synchronization studies revealing superior synchronization with multisensory rhythms compared with auditory-only rhythms (Elliott, Wing, & Welchman, 2010; Varlet, Marin, Issartel, Schmidt, & Bardy, 2012).

Although visual influences on the perception of rhythm can be powerful, it is important to acknowledge that many listeners will choose to listen with their eyes closed under challenging conditions. One interpretation of this phenomenon is that the visual information is somehow distracting.
In a task involving temporal order judgments of varying complexity, researchers found progressively greater deactivation of visual cortical areas as temporal asynchronies approached discrimination thresholds (Hairston et al., 2008). This finding is perhaps best understood from the perspective of the inverse effectiveness rule (Stein & Meredith, 1993), whereby deactivation of the visual cortex protects against integration of potentially aberrant timing information from the visual system in a task that is well handled by audition.
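Neural entrainment of the kind described above is commonly quantified by frequency tagging: comparing spectral power at the beat frequency against power at neighboring frequencies. The sketch below is illustrative only; the sampling rate, beat frequency, duration, and noise level are hypothetical and not taken from the studies cited.

```python
# Minimal frequency-tagging sketch on a synthetic "EEG" signal
# (hypothetical parameters; real studies analyze recorded scalp data).
import math
import random

def dft_power(signal, sr, freq):
    """Power of `signal` at `freq` Hz (freq should align with a DFT bin)."""
    n = len(signal)
    k = round(freq * n / sr)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    return (re * re + im * im) / n

random.seed(0)
sr, dur, beat_hz = 64, 8.0, 2.0           # 8 s of "EEG" sampled at 64 Hz
n = int(sr * dur)
eeg = [math.sin(2 * math.pi * beat_hz * i / sr) + random.gauss(0, 1.0)
       for i in range(n)]

# Entrainment index: power at the beat frequency relative to the mean power
# of neighboring frequency bins.
beat_power = dft_power(eeg, sr, beat_hz)
neighbors = [dft_power(eeg, sr, f) for f in (1.5, 1.75, 2.25, 2.5)]
index = beat_power / (sum(neighbors) / len(neighbors))
```

An index well above 1 indicates power concentrated at the beat frequency beyond the background level; in real EEG work the same comparison would be made on recorded signals rather than a synthetic sinusoid.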

Somatosensory Influences

Some evidence exists for the somatosensory system contributing to the perception of rhythm. Tranchant et al. (2017) asked deaf and hearing participants to synchronize movements to a vibrotactile beat delivered through a vibrating platform. Hearing participants were also asked to synchronize movements to the same beat delivered through audition and without vibrotactile stimulation. Results revealed that most participants were able to synchronize to the vibrotactile beat, with no differences between groups. However, for hearing participants, synchronization performance was better in the auditory condition than in the vibrotactile condition.

Other studies have demonstrated that sensorimotor synchronization to a beat is possible using vibrotactile stimulation applied to the fingertip (Brochard, Touzalin, Després, & Dufour, 2008; Elliott et al., 2010), the toe (Müller et al., 2008), or the back (Ammirante, Patel, & Russo, 2016). Findings have revealed that synchronization to a simple (metronomic) vibrotactile beat can be as accurate as synchronization to an auditory beat, but only under certain conditions. For example, Müller et al. (2008) found equivalence on the fingertip but not the toe, and Ammirante et al. (2016) found equivalence on the back, but only when a large portion of the back was stimulated. Presumably, spatial summation (involving integration of information across receptors) improved the somatosensory response to rhythmic information (Gescheider, Bolanowski, Pope, & Verrillo, 2002). Ammirante et al. (2016) also included an audio-tactile condition to investigate multisensory integration. Results indicated that sensorimotor synchronization to audio alone was consistently equivalent to synchronization to audio-tactile presentations, regardless of contactor size. These results may be interpreted with respect to the maximum likelihood estimation model (Ernst & Banks, 2002), whereby auditory information represents a highly reliable cue that is resistant to integration with information from a somewhat less reliable channel of sensory input (vibrotactile).

The results of Ammirante et al. (2016) may also be considered with respect to sensorimotor models of perception (Fig. 1, Panel 3). The Action Simulation for Auditory Perception (ASAP) model suggests that our ability to find the beat in rhythm is based on an internal simulation of periodic motor activity (Patel & Iversen, 2014). A secondary hypothesis posited in the model is that beat perception evolved from mechanisms required for verbal communication, as both involve periodic timing and the integration of motor and auditory information.
This hypothesis is supported in part by the observation that beat synchronization exists robustly in vocal-learning species that are only distantly related to humans (e.g., parrots and elephants) and not at all in non-human primates (Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015). As vocal communication is primarily based in the auditory modality, it follows that cognitive and neurological timing mechanisms would show a preference for auditory stimuli. Again, this prediction is confirmed by evidence demonstrating that sensorimotor synchronization to auditory stimuli tends to be superior to sensorimotor synchronization to visual or vibrotactile stimuli.

Current research in my lab led by Sean Gilmore is using EEG and source analysis to investigate the extent to which neural entrainment to the beat is possible under audio-only, vibrotactile-only, and audio-vibrotactile stimulation. On the basis of the behavioral results of Ammirante et al. (2016), we expect to find that neural entrainment in motor planning areas will be weakest for vibrotactile stimuli and that no differences will exist between audio and audio-tactile conditions.
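The maximum likelihood estimation account invoked above has a compact formal core: each cue is weighted by its reliability (inverse variance), and the fused estimate has lower variance than either cue alone (Ernst & Banks, 2002). The numbers below are hypothetical, chosen only to illustrate why a precise auditory cue dominates fusion with a noisier vibrotactile one.

```python
# Sketch of the MLE cue-combination rule (Ernst & Banks, 2002).
# Cue values and variances are hypothetical, for illustration only.

def mle_combine(estimates, variances):
    """Fuse cue estimates weighted by reliability (inverse variance)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    fused_variance = 1.0 / total   # fused estimate is more reliable than either cue
    return fused, fused_variance

# Hypothetical beat-onset estimates (in ms) from audition and touch:
audio_ms, tactile_ms = 0.0, 20.0
audio_var, tactile_var = 4.0, 100.0   # audition is far more reliable here

fused, fused_var = mle_combine([audio_ms, tactile_ms], [audio_var, tactile_var])
```

With these variances the fused estimate lands within about 1 ms of the auditory cue, and the fused variance (about 3.8) barely improves on the auditory variance alone, mirroring the reported pattern in which adding tactile input neither helped nor hurt synchronization when audio was present.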

Movement-Based Influences

Both passive and active head movements are capable of stimulating the vestibular system (Cullen & Roy, 2004). Given that people actively move their heads while listening to music, it would seem that vestibular stimulation is commonplace in music listening. Moreover, given that vestibular cortex is extensively connected with other sensory systems, it stands to reason that there are ample opportunities for multisensory integration in music that involve the vestibular system.

Phillips-Silver & Trainor (2005) assessed the contribution of the vestibular system to multisensory rhythm using an ambiguous auditory rhythm. These rhythms can be encoded in duple form (a march) or in triple form (a waltz). The rhythms were presented to infants while they were bounced on every second or every third beat. On the basis of a head-turn preference procedure, the researchers were able to conclude that when infants were bounced on every second beat they coded the ambiguous rhythm in duple form, and when they were bounced on every third beat they coded the rhythm in triple form. A follow-up experiment in the same study showed that blindfolding infants mitigated but did not eliminate the effect, which confirms that this example of multisensory integration in rhythm does not depend on visual perception.

Two other studies by Trainor and colleagues have confirmed that these effects of auditory-vestibular integration in music persist into adulthood. In one study, adults were trained to bounce in duple or triple time while listening to an ambiguous rhythm. A subsequent listening test showed that adults identified an auditory version of the rhythm pattern with accented beats that matched their bouncing experience as more similar than a version whose accents did not match (Phillips-Silver & Trainor, 2007). Because this study involved self-motion, it was not able to separate out the contributions of vestibular and proprioceptive cues. However, a follow-up study involving direct galvanic stimulation of the vestibular system was able to provide evidence that auditory and vestibular information are integrated in rhythm perception in adults even in the absence of movement.
In single-cell recordings involving animal models, the posterior parietal cortex appears to be a likely locus of multisensory integration involving vestibular input (Bremmer, Schlack, Duhamel, Graf, & Fink, 2001). This area happens to be proximal to other cortical areas that have been implicated as contributing to multisensory processing (i.e., posterior superior temporal gyrus, auditory parabelt, and medial temporal areas).

Other researchers have considered the consequences of multisensory integration resulting from moving to the beat. Manning & Schutz (2013) had participants move, or simply listen, to an isochronous beat. A final tone was presented following a brief pause, and participants were asked whether it was consistent with the timing of the preceding sequence. Accuracy in this timing task was superior in the movement condition. In a follow-up study, it was found that the accuracy gains in this timing task are greater in percussionists than in non-percussionists, suggesting a role for experience with moving to the beat (Manning & Schutz, 2016). It seems likely that the multisensory timing cues resulting from moving to the beat would lead to stronger neural entrainment to the beat. Indeed, EEG research involving an ambiguous rhythm has shown that entrainment is stronger after participants have been trained to move to the rhythm in a way that suggests a binary or ternary form (Chemin, Mouraux, & Nozaradan, 2014). In addition, the entrainment gains were detectable at frequencies related to the meter of movement.



Figure 3. Schematic representation of cortical connections supporting multisensory perception of music. [The original figure depicts a multisensory stimulus feeding somatosensory cortex, auditory cortex, visual cortex, and vestibular areas; connections from these sensory areas converge on multisensory areas and motor areas, with each connection labeled by the musical dimensions it carries (pitch, timbre, and/or rhythm), culminating in the percept of multisensory music.]

Summary and Conclusions

This chapter has provided theory and evidence regarding multisensory processing in music. Three mechanisms were proposed and a broad range of evidence was reviewed. Fig. 3 provides a schematic depiction of this review, focusing on brain areas and connections that underpin multimodal processing of pitch, timbre, and rhythm. Solid lines are used to indicate connections that have been validated using multiple lines of evidence. Dashed lines are used to indicate connections that are more theoretical, with only limited validation. Regardless of evidential status, the proposed connection strength is reflected by line thickness.

Due to space considerations, this review has been necessarily selective in the topics considered. A more exhaustive treatment could have broadened the focus to include multisensory perception of lyrics (Quinto, Thompson, Russo, & Trehub, 2010), expressivity (Vuoskoski, Thompson, Clarke, & Spence, 2014), and emotion (Thompson, Russo, & Quinto, 2008; Vines et al., 2011), as well as examples of multisensory integration that are better understood from an associative or cognitive perspective (e.g., North, 2012; North, Hargreaves, & McKendrick, 1999; Wapnick et al., 2000). Nonetheless, this chapter has attempted to make the case that our conceptualization of music should be multisensory. Although the majority of individuals will justifiably focus on sound as the core of music processing, a more inclusive and nuanced consideration of music takes a multisensory perspective, involving the integration of inputs from auditory, visual, somatosensory, vestibular, and motor areas.

Acknowledgments

Funding supporting this research was provided by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). I would like to thank Fran Copelli for assistance with figures and discussion of concepts. Sean Gilmore and Michael Schutz provided valuable feedback on earlier drafts of this chapter.

References Abel, M. K., Li, H. C., Russo, F. A., Schlaug, G., & Loui, P. (2016). Audiovisual interval size estimation is associated with early musical training. PLoS ONE 11(10), 1–12. Alais, D., & Burr, D. (2004). Ventriloquist effect results from near-optimal bimodal integration. Current Biology 14(3), 257–262. Ammirante, P., Patel, A. D., & Russo, F. A. (2016). Synchronizing to auditory and tactile metronomes: A test of the auditory-motor enhancement hypothesis. Psychonomic Bulletin & Review 23(6), 1882–1890. Ammirante, P., Russo, F. A., Good, A., & Fels, D. I. (2013). Feeling voices. PloS ONE 8(1), 1–5. Anderson, B., Southern, B. D., & Powers, R. E. (1999). Anatomic asymmetries of the posterior superior temporal lobes: A postmortem study. Neuropsychiatry Neuropsychology, and Behavioral Neurology 12(4), 247–254. Angelaki, D. E., Gu, Y., & DeAngelis, G. C. (2009). Multisensory integration: Psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology 19(4), 452–458. Arieh, Y., & Marks, L.  E. (2008). Cross-modal interaction between vision and hearing: A speed-accuracy analysis. Perception & Psychophysics 70(3), 412–421. Auer, E. T., Bernstein, L. E., Sungkarat, W., & Singh, M. (2007). Vibrotactile activation of the auditory cortices in deaf versus hearing adults. Neuroreport 18(7), 645–648.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi



Section IV

NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE


Chapter 11

Music and Memory
Lutz Jäncke

Introduction

Music listening, music composing, and music-making are strongly associated with memory processes. For example, when we listen to music we might remember the title, the melody, the singer or musicians, and the circumstances in which we first heard the piece. It is also possible to catch the gist of a particular piece of music without explicitly knowing its details. These are the most obvious memory aspects associated with music. However, some people are even able to remember a single tone or a tone interval without relying on a reference tone. Music might also help to boost our memory and to consolidate what we have learned. These few examples demonstrate that music is associated with memory processes in many ways. In this chapter, I will discuss these associations and provide some examples of how music can support memory processes, along with possible future applications. But before examining the typical music-related memory aspects, I discuss some basic principles of the human memory system.

Human Memory in General

Human memory comprises several parts: (1) sensory memory, (2) short-term memory, (3) working memory, and (4) long-term memory. Sensory memory stores sensory information for a very short period. This memory system is strongly associated with the neural networks processing sensory information; at this stage the information is not yet processed, interpreted, or encoded. The working memory system is a central system not only for memory processes; it is pivotal for many if not all cognitive functions. Its main functions are often described as "maintenance and manipulation," expressing the fact that working memory not only holds but also


manipulates information. Holding information for a short period of time without any cognitive manipulation is a matter of short-term memory. Manipulation, a main pillar of the working memory system, is strongly related to executive functions, pattern recognition, long-term memory, encoding for long-term memory, language and music comprehension, problem solving, and even creativity. All of this is accomplished with the participation of the working memory system, which is therefore pivotal for nearly all musical functions and particularly for music memory. Because of the many functions associated with working memory, the neural networks involved in working memory processes are not focal but are distributed over many brain areas. In long-term memory, encoded material is stored for longer time periods, sometimes extremely long, up to many decades. Long-term memory is divided into an explicit and an implicit memory system. The explicit memory system contains consciously available information and comprises semantic and episodic memory. Semantic memory holds conscious memory of facts, while episodic memory is a system for holding events: memory traces associated with places, times, emotions, and other concept-based knowledge of an experience. Explicit memory (sometimes also called declarative memory) is not a simple store; rather, it is a mechanism that constructs the past on the basis of stored and new information using specific strategies (e.g., retrieval schemas, which will be described later). The neural underpinnings of the explicit memory system are relatively complex and include so-called "bottleneck structures" in mesiotemporal brain areas (including the hippocampus) as well as networks in temporal, parietal, and frontal brain areas. Thus, the explicit memory system is based on a distributed network with a mesiotemporal core system.
The implicit memory system contains information that is not easy to verbalize but can be used without conscious thought. The networks controlling the implicit memory system do not overlap with those for the explicit memory system; they mainly comprise premotor, cerebellar, and basal ganglia structures.

Memory Processes during Music Listening

The psychological processes and the neural underpinnings of music listening have been studied quite intensively. These studies have shown that music is processed in a cascade of steps that begins with segregation within the auditory stream, followed by the extraction and integration of a variety of acoustic features, leading to cognitive, memory-related processes that induce personal, often emotional, experiences. Thus, listening to music can be conceived of as a hierarchical, continuous serial-to-parallel conversion during which the auditory stream (the stream of tones and chords) is integrated into melody chunks, and these melody chunks are then integrated into an entire melody (Fig. 1).


Figure 1.  Schematic description of the serial-to-parallel conversion, which can be conceived of as a form of integration of serial information on different levels. t1–t10 represent different tones presented in serial order; m1 to m3 are the integrated tones combining to form melody fragments. At the next level, these melody fragments are integrated into a larger melody cluster or even into the entire musical piece. (The figure also notes that this conversion involves prediction, interpretation, working memory, and experience, and that it is specific for music and different from speech.)

For this serial-to-parallel conversion, working memory processes are pivotal, since the tonal and/or musical information is stored temporarily and continually manipulated. The sound sequences are woven into a melodic contour of pitch and rhythm. These melodic contours do not appear to arise from bottom-up processes alone, since the listener is not a passive receiver but is actively engaged in processing the music. In this context, the listener draws on acoustic memories, aesthetic judgments, and expectations and combines them to understand and interpret the particular piece of music (the schema concept is discussed later in the chapter). Thus, the listener stores many aspects of the auditory stimuli in memory, such as pitch, pitch interval, timbre, and rhythm, and on the basis of this stored information constructs an integrated memory of the particular melody. In the following sections, I describe memory processes associated with tone, tone interval, and melody processing in more detail.
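The two-level integration sketched in Figure 1 can be made concrete with a small toy program (my own illustration, not from the chapter): tones are grouped into melody fragments, here using large pitch leaps as a stand-in for the Gestalt-like grouping cues a listener might use, and the fragments are then combined into a single melody representation. The `leap` threshold and the MIDI-number encoding are illustrative assumptions.

```python
# Toy sketch of the hierarchical "serial-to-parallel" conversion of Figure 1.
# Level 1: serial tones -> melody fragments (m1, m2, m3).
# Level 2: fragments -> one integrated melody representation.

def chunk_tones(tones, leap=5):
    """Group a serial tone stream (MIDI note numbers) into fragments,
    starting a new fragment whenever the pitch leap exceeds `leap` semitones."""
    fragments = [[tones[0]]]
    for prev, cur in zip(tones, tones[1:]):
        if abs(cur - prev) > leap:
            fragments.append([cur])    # large leap: open a new fragment
        else:
            fragments[-1].append(cur)  # small step: extend current fragment
    return fragments

def integrate(fragments):
    """Second integration level: fragments -> one melody representation."""
    return {"n_fragments": len(fragments), "melody": list(fragments)}

stream = [60, 62, 64, 72, 71, 69, 60, 62]   # t1..t8 as MIDI pitches
melody = integrate(chunk_tones(stream))
print(melody["n_fragments"])                 # 3 fragments (m1, m2, m3)
```

The grouping rule is deliberately simplistic; the point is only the shape of the computation, in which a serial stream is recoded into progressively larger parallel units.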

Tone Memory

Even non-musicians are relatively good at remembering and recognizing single tones or the pitch of a melody. For example, in an experiment conducted by Gaab and colleagues (Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003), non-musicians performed well in pitch memory tasks in which they were asked to decide whether the last or second-to-last tone of a tone sequence was the same as or different from the first tone. The recognition rate for the tones was astonishingly high, with an accuracy of about 66 percent. The authors also conducted fMRI measurements during pitch memory learning.


When relating pitch memory performance to the task-related hemodynamic responses, they found that activity in the supramarginal gyrus and the dorsolateral cerebellum, bilaterally, was significantly correlated with good task performance. The authors suggest that, besides the auditory cortex, the supramarginal gyrus and the dorsolateral cerebellum may play a critical role in short-term storage of pitch information. Absolute pitch listeners are much better at memorizing tones and chords. Absolute pitch (AP) is defined as the ability to identify a note without relying on a reference tone (Levitin & Rogers, 2005; Takeuchi & Hulse, 1993). It is a rare ability, with an incidence of about 1 percent in the general population, although Asian speakers of tonal languages show a higher rate (Deutsch, Henthorn, Marvin, & Xu, 2006). Absolute pitch is thought to originate from an intertwining of genetic factors (Gregersen, Kowalsky, Kohn, & Marvin, 1999), early exposure to music (Gregersen, Kowalsky, Kohn, & Marvin, 2001), and intensity of musical training (Gregersen et al., 2001). Currently, a two-component model is discussed to explain this extraordinary ability. In the context of this model, it is suggested that AP is constituted by one perceptual mechanism ("categorical perception") and two cognitive mechanisms, "pitch memory" (explicit memory) and "pitch labeling" (implicit associative memory), with the latter suggested as constituting the load-bearing skeleton of AP. Several neurophysiological and neuroanatomical studies support this suggestion. One main finding in this context is that frontotemporal areas are strongly activated during tone listening and tone memory tasks in AP listeners and that these regions are specifically and strongly interconnected, both functionally and anatomically (Elmer, Rogenmoser, Kühnis, & Jäncke, 2015; Rogenmoser, Elmer, & Jäncke, 2015; Zatorre, Perry, Beckett, Westbury, & Evans, 1998).
Although these findings are interesting and important for understanding the neural underpinnings of tone perception and tone memory, listening to and remembering single tones are not adequate tasks for understanding music listening and its associated memory processes in their entirety.
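The same/different pitch-memory paradigm described above can be sketched in a few lines of toy code (my own illustration; the pitch range, sequence length, and foil spacing are assumptions, not the stimulus parameters of Gaab et al.). An idealized listener with perfect retention of the first tone scores 100 percent, which puts the reported human accuracy of about 66 percent in perspective.

```python
import random

# Toy sketch of a same/different pitch-memory trial: the listener judges
# whether a probe tone (the last or second-to-last in the sequence)
# matches the first tone of the sequence.

def make_trial(rng, length=6, same=True):
    seq = [rng.choice(range(60, 72)) for _ in range(length)]   # random MIDI tones
    probe_pos = rng.choice([length - 1, length - 2])
    if same:
        seq[probe_pos] = seq[0]                 # probe matches the first tone
    elif seq[probe_pos] == seq[0]:
        seq[probe_pos] += rng.choice([-2, -1, 1, 2])  # force a mismatch
    return seq, probe_pos, same

def respond(seq, probe_pos):
    """An idealized listener with perfect retention of the first tone."""
    return seq[probe_pos] == seq[0]

rng = random.Random(1)
hits = sum(respond(*make_trial(rng, same=s)[:2]) == s
           for s in [True, False] * 50)
print(hits / 100)   # 1.0 for the idealized listener; humans were ~0.66
```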

Tone Interval Memory

More important for understanding music-related memory processes is to understand the psychological and neurophysiological processes operative during tone sequence and melody listening tasks. Even non-musicians are very good at recognizing melodies based largely on the relative sizes of the intervals between successive pitches. This ability is robustly preserved even when the entire frequency range of the music is shifted up or down (i.e., during transposition). This ability, called relative pitch (RP) processing, is strongly influenced or even entirely acquired early during development. For example, Trainor and colleagues (Trainor, McDonald, & Alain, 2002) showed that 5.5- to 6.5-month-old infants preferred to listen to a particular melody they had heard repeatedly (compared to a novel melody). In this experiment, the authors also demonstrated that the AP information was more or less unimportant; what mattered most was the long-term representation of the melody, which is based on the tone intervals. In a further electrophysiological experiment (Plantinga &


Trainor, 2005), it was shown that RP interval processing occurs in a more or less automatic fashion, as demonstrated by mismatch negativities (MMN) to deviations from known pitch intervals. Since the MMN is commonly regarded as a neurophysiological marker of pre-attentive change detection, the authors conclude that pitch interval perception is automatically implemented. Further studies have substantiated these findings by showing that encoding accuracy increases with increasing length of the tone sequences (Lee, Janata, Frost, Martinez, & Granger, 2015). The authors interpret these findings as support for the idea that it is easier for subjects to apply particular Gestalt principles to longer than to shorter tone sequences.
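The transposition invariance underlying relative pitch, described at the start of this section, can be illustrated with a short sketch (my own toy code, not from the chapter): a melody and its transposition differ in every absolute pitch yet share an identical interval sequence, which is the representation that melody recognition appears to rely on.

```python
# Relative pitch as an interval code: transposing a melody changes its
# absolute pitches but leaves the sequence of intervals intact.

def intervals(melody):
    """Successive pitch intervals (in semitones) of a MIDI melody."""
    return [b - a for a, b in zip(melody, melody[1:])]

theme      = [60, 62, 64, 60]           # C D E C
transposed = [p + 7 for p in theme]     # the same tune, a fifth higher

print(intervals(theme))                              # [2, 2, -4]
print(intervals(theme) == intervals(transposed))     # True
```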

Tonal Working Memory

When we listen to music we integrate the incoming sequential auditory information. This makes it necessary to hold some auditory information in memory for a short period of time and to combine it with the next incoming sounds of a melody. Thus, we have to hold auditory information and, based on our knowledge about musical structure, combine the tone sequences into melodies. Without such a mechanism, it would be impossible to follow and understand even the shortest musical piece. From this description, it is clear that short-term memory processes (maintaining auditory information for a short period of time) as well as cognitive processes (manipulation, combination, and prediction) are involved here. This combination of maintenance and manipulation of incoming stimuli has led to the formulation of the working memory (WM) concept. The classical WM model was developed mostly using verbal material (Baddeley & Hitch, 1974). According to this model, verbal information is processed by a phonological loop, which is further subdivided into a passive storage component (the phonological store) and an active rehearsal mechanism (the articulatory rehearsal process). The passive storage component is assumed to store auditory or speech-based information for a few seconds. In addition, an attentional control system (the central executive) controls and supervises the phonological loop. In a later version of the WM model, the mutual interaction between long-term memory (LTM) and WM was recognized by proposing an episodic buffer (Baddeley, 2010). Recent developments have led to a more domain-general model of WM (Cowan, 2011; Oberauer & Lewandowsky, 2011). This newer model proposes polymodal LTM representations of items, which are activated either by incoming sensory input or by volition, thus becoming available for attentional selection.
Based on these theoretical contributions, we now accept that WM is a system with limited capacity binding information from the phonological loop, storing information in a multimodal code, and enabling the interaction between WM and LTM under the supervision of attention and executive control.

Behavioral Findings of Tonal Working Memory

Although the classical WM model is well elaborated, it has long been unclear whether musical information (e.g., tones, chords, and timbre) is processed within the WM system in the same way as verbal information. As mentioned above, the classical WM model has been designed

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

242   lutz jäncke

on the basis of verbal information and does not explicitly specify whether the phonological loop also processes non-verbal information. In behavioral WM studies, one typically disrupts the rehearsal process by introducing interfering stimuli that are similar to the to-be-remembered stimuli (e.g., the phonological similarity effect). Other paradigms manipulate the length of the to-be-remembered items (e.g., the word length or sequence length effect). An important feature of the classical WM model is that verbal information can be maintained in verbal WM by internal articulatory rehearsal (within the phonological loop). But does such internal rehearsal also exist for pitch and timbre information? Few studies have been conducted to date trying to answer this question, and they have come to conflicting conclusions (for an excellent summary see the review by Schulze & Koelsch, 2012). However, as Schulze and Koelsch (2012) correctly point out, the conflicting results stem mainly from the different paradigms and stimulus sets used. Nevertheless, on closer inspection of these findings, a more or less clear picture emerges. There is clear evidence that a tonal WM indeed exists in which tonal information is rehearsed. However, the subjects must be able to rehearse the material. Rehearsal is possible if the subjects are familiar with the tone information and if the to-be-remembered tone information is salient enough (i.e., when tones are used whose frequencies correspond to those of the Western chromatic scale, or when the frequency differences between the tones are no smaller than one semitone). Behavioral studies directly comparing verbal and tonal WM are relatively rare. Early studies (Deutsch, 1970; Salamé & Baddeley, 1989) reported that tones or instrumental music as intervening stimuli interfered more strongly with WM tasks for tones than for phonemes or syllables.
Thus, these studies were taken as support for separate tonal and verbal WM systems. However, Semal and colleagues (Semal, Demany, Ueda, & Hallé, 1996) discovered that the frequency relations between the intervening stimuli and the standard stimuli are most important in explaining the results of behavioral WM experiments. They found that the pitch similarity of the intervening stimuli (words or tones) had a greater effect on performance than the modality (verbal or tonal) of the intervening stimuli. Thus, the pitches of both verbal and tonal stimuli are processed in the same WM system. This auditory WM system comes into play whenever the to-be-remembered information is auditorily coded. For example, in a suppression experiment (during which the subjects had to either sing or speak during the retention period), recognition accuracy for both tone and digit sequences decreased, regardless of whether the suppression material was verbal or non-verbal (Schendel & Palmer, 2007). Thus, this experiment again demonstrates that musical or verbal suppression does not selectively impair verbal or tonal WM. A further experiment uncovered expertise-related influences on the tonal WM system (Williamson, Baddeley, & Hitch, 2010). Its results showed decreased performance when the tone sequences consisted of more proximal (similar) pitches compared to more distal (dissimilar) pitches, an effect resembling the phonological similarity effect in the verbal WM domain.


Thus, one should rather speak of an auditory WM in which auditory information of the same (or similar) pitch determines interference effects. These interference effects are independent of whether the standard or intervening stimuli are tones, vowels, syllables, or words. In other words, similar-sounding material interferes. There is a single auditory-based WM system that is used for all auditory information, regardless of whether it is verbal or non-verbal. This might explain how musical training can improve verbal working memory (discussed later). However, in the context of music listening, one has to keep in mind that there may be acoustic information that cannot be rehearsed, for example specific timbre or pitch information. In this case, the subjects cannot take advantage of the phonological loop (or a general auditory rehearsal mechanism). In such situations, the auditory information is retained for a short period of time in specific feature maps.

Neuroanatomical Correlates of Working Memory

With the advent of modern brain imaging techniques, it is now possible to identify the neural networks involved in controlling WM processes. In the past, several studies have examined the neural underpinnings of auditory WM using verbal material. These studies have shown that Broca's area and premotor areas are the core regions involved in the internal rehearsal of verbal material (for a review of these studies see Schulze & Koelsch, 2012). Besides these core regions, the insular cortex and the cerebellum also seem to be involved in internal rehearsal of verbal information. The neural underpinning of the phonological store has been suggested to rely on parietal areas, including the inferior and superior parietal lobules, and on the posterior perisylvian cortex (particularly the left posterior planum temporale). While parietal brain areas most likely reflect increased engagement of attentional resources (which, incidentally, fits nicely with the pivotal role of attention in WM processes according to the newer domain-general WM models: Oberauer & Lewandowsky, 2011), the left posterior planum temporale is possibly involved in the temporary storage of verbal information during WM tasks. On the basis of these findings, and because posterior perisylvian brain areas also support speech processing, it has been proposed that they act as an auditory–motor interface for WM (Hickok & Poeppel, 2007). These findings suggest a dual-stream model of speech processing, with a ventral stream involved in speech comprehension (supporting lexical access) and a left-dominant dorsal stream comprising the planum temporale that enables sensory–motor integration. Through this stream the perceived speech signals are mapped onto articulatory representations in frontal brain areas (Elmer, Hänggi, Meyer, & Jäncke, 2013). Far fewer neuroimaging studies have directly investigated the neural underpinnings of WM for tones.
However, the few studies that have examined tonal WM revealed that, in non-musicians only, all structures involved in tonal WM were also involved in verbal WM. In summary, consistently across studies (Schulze, Mueller, & Koelsch, 2011, 2013; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011), data obtained from non-musicians indicate a considerable overlap of the neural resources underlying WM for verbal and tonal information. This common, mainly left-lateralized fronto-parietal network includes Broca's area, parietal areas, and the planum temporale.

Memory for Music

When we listen to music, we often recognize the musical piece quite well. Sometimes we remember the title of the piece or even further information such as the lyrics, the composer, and the main instruments. Listeners are sometimes even very accurate in reproducing familiar music by singing and in moving rhythmically to the music (Frieler et al., 2013; Halpern, 1989; Levitin, 1994). As with verbal and non-verbal memory, musical memory can be divided into implicit (unconscious), semantic, and episodic musical memory (the latter two being conscious) (Platel, 2005). Implicit musical memory can best be seen in neurological patients. For example, Johnson and colleagues (Johnson, Kim, & Risse, 1985) exploited the so-called mere exposure effect in the context of music listening experiments. This effect was first demonstrated and described by Zajonc (1968) as a psychological phenomenon whereby subjects tend to develop a preference for items simply because they have been repeatedly exposed to them. In the study by Johnson and colleagues, patients with Korsakoff's syndrome preferred an unfamiliar musical piece after only one previous presentation, compared to new musical pieces. However, these patients did very poorly in a music recognition test. Halpern and O'Connor (2000) found the same dissociation in healthy elderly listeners, who were at chance in recognizing just-presented melodies but nevertheless liked these pieces better than new melodies. A similar distinction between explicit and implicit music memory was drawn by Samson and Peretz (2005).
On the basis of a comprehensive analysis of neurological patients suffering from lesions in either the right or the left temporal lobe, they concluded that right temporal lobe structures play a crucial role in the formation of melody representations that support priming and recognition, which are both more implicit memory processes, whereas left temporal lobe structures are more involved in the explicit retrieval of melodies. Mere exposure effects have also been shown in healthy subjects (Green, Bærentsen, Stødkilde-Jørgensen, Roepstorff, & Vuust, 2012; Honing & Ladinig, 2009). These and similar studies gave rise to the suggestion that there is indeed an implicit musical memory with features different from those of explicit musical memory. Implicit musical memory in healthy subjects appears, for example, during incidental music listening, when we might move or hum along without explicitly knowing which musical piece we are listening to. Nowadays this happens quite often, especially when we use our mobile devices while we stroll through the streets, drive a car, or jog.

Semantic musical memory is defined as memory for music excerpts without an association to the context in which the listener learned the excerpt. Thus, we do not associate and remember the temporal (when) or spatial (where) circumstances under which we encoded and learned the musical piece. Musical semantic memory may represent a form of musical lexicon, separate from the verbal lexicon, even though strong links certainly exist between them. Interestingly, musical pieces can be associated with non-musical semantic memory, as Koelsch and colleagues have shown (Koelsch et al., 2004). They demonstrated that short music excerpts can prime concrete and even abstract words. This priming effect occurred even when the musical pieces were unknown to the subject. Obviously, music can carry meaning. The precise psychological mechanism responsible for this interesting association between music and meaning is not yet entirely understood, but this study shows that musical information is strongly embedded in a distributed memory network. Episodic musical memory, on the other hand, is defined as the capacity to recognize a musical excerpt for which the spatiotemporal context of learning can be recalled (when, where, under which circumstances, and with which people). A particular form of episodic musical memory is autobiographical musical memory. This component is activated when we listen to music that is strongly associated with past experiences of our own life. A further memory concept, similar to autobiographical memory, is the so-called memory for "nostalgia." Nostalgia has been defined as an affective process that sometimes accompanies autobiographical memories (Wildschut, Sedikides, Arndt, & Routledge, 2006), giving rise to mostly positive and sometimes negative emotions (such as sadness). Nostalgia is strongly associated with personality traits, which explains the obvious inter-individual differences in the occurrence of this effect. The different facets of musical memory have been the focus of substantial research in recent years.
Based on this research, we now know that the different musical memory systems mentioned earlier can be modulated by several psychological factors: (1) intrinsic musical features such as timbre or tempo, (2) emotional and arousal components, and (3) individual schemas and musical structure. A further, relatively new issue influencing music memory processes pertains to (4) the particular brain activation pattern during encoding and retrieval of music information. In the following, I will discuss these issues in more detail.

Intrinsic Features of Musical Pieces

Halpern and Müllensiefen (2008) manipulated timbre and tempo in order to examine their influence on implicit and explicit memory for musical pieces. They asked their participants to encode forty unfamiliar short tunes. After that, the participants were asked to give explicit and implicit memory ratings for a list of eighty tunes, forty of which had previously been heard. To measure implicit memory, ratings of the pleasantness of old and new melodies were used. Explicit memory performance was measured by calculating the difference between the recognition confidence ratings for old and new melodies. Half of the forty previously heard tunes differed in timbre or tempo from the first exposure. Changes in timbre and tempo both impaired the explicit memory measures, and the change in tempo also impaired implicit tune recognition. These findings support the hypothesis that an implicit musical memory indeed exists, and furthermore show that implicit music memory is influenced only by tempo variations, whereas both timbre and tempo influence explicit music memory.
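The two memory measures used in this kind of design can be made concrete with a small sketch. The scoring functions and the rating values below are illustrative assumptions, not data or analysis code from Halpern and Müllensiefen (2008); they only show how an implicit score (mere-exposure-style pleasantness advantage for old tunes) and an explicit score (confidence difference for old versus new tunes) might be computed.

```python
# Hypothetical scoring of implicit and explicit memory measures in a
# tune-recognition experiment. All ratings are invented for illustration.

def mean(xs):
    return sum(xs) / len(xs)

def implicit_score(pleasantness_old, pleasantness_new):
    """Implicit memory: old tunes rated more pleasant than new ones
    (mere exposure effect)."""
    return mean(pleasantness_old) - mean(pleasantness_new)

def explicit_score(confidence_old, confidence_new):
    """Explicit memory: difference in recognition confidence between
    old and new tunes."""
    return mean(confidence_old) - mean(confidence_new)

# Illustrative 1-7 ratings for one participant:
old_pleasant, new_pleasant = [5, 6, 5, 4], [4, 4, 5, 3]
old_conf, new_conf = [6, 5, 6, 7], [2, 3, 2, 3]

print(implicit_score(old_pleasant, new_pleasant))  # 1.0
print(explicit_score(old_conf, new_conf))          # 3.5
```

A positive implicit score with a chance-level explicit score would correspond to the dissociation described above for Korsakoff patients and elderly listeners.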

Emotion and Arousal Induced by Music

Several studies have shown that the emotion and arousal evoked by musical pieces influence the retrieval and recognition of music. The main finding is that emotional and arousing musical pieces are remembered better than pieces that are less emotional and arousing (Alonso, Dellacherie, & Samson, 2015; Eschrich, Münte, & Altenmüller, 2005, 2008; Ferreri & Rodriguez-Fornells, 2017; Parks & Clancy Dollinger, 2014; Peretz et al., 2009; Vieillard & Gilet, 2013) (but for contradictory results, see Altenmüller, Siggel, Mohammadi, Samii, & Münte, 2014). This memory-enhancing effect is thought to be based on at least two different and partly interacting mechanisms: (1) activation of the mesolimbic system, and (2) an increase in the number of associations within the semantic associative network. Emotional and rewarding music strongly activates the mesolimbic reward system (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). The mesolimbic system is a relatively small brain system (including the nucleus accumbens and the ventromedial prefrontal cortex) that is important for the control of emotion, reward, and learning and that is mediated mainly by dopamine. Dopamine is also widely recognized as the critical transmitter in addiction, for example in virtually all forms of drug abuse (including heroin, alcohol, cocaine, and nicotine). Even behavioral addictions (e.g., gaming) are associated with particular activations within the dopamine system (Kühn et al., 2011). But other forms of reward, such as positive social interactions, likewise activate dopaminergic neurons and are powerful aids to attention and learning (Keitz, Martin-Soelch, & Leenders, 2003). Dopamine is thought to strengthen synaptic potentiation in the memory networks activated during learning and consolidation of the musical material.
Thus, dopamine also promotes plastic adaptations in brain areas involved in the control of trained and practiced tasks. A further transmitter involved in music listening is serotonin. Serotonin levels are significantly higher when subjects are exposed to music they find pleasing (Evers & Suhr, 2000). Several (mostly animal) studies have suggested a particular role for serotonin in learning and memory processes (Meneses & Liy-Salmeron, 2012). However, it is not entirely clear whether serotonin plays a facilitating or inhibitory role in memory formation; it may actually enhance memory in one brain area and inhibit it in another. Nevertheless, these transmitter systems (together with several others) may support learning and memory processes. However, one has to acknowledge that not only positively evaluated and rewarding music is preferentially stored in musical memory, but also negative or simply arousing music. That such non-rewarding musical pieces are strongly represented in music memory can be better explained by associative memory models, which are described in the next paragraph. It should be kept in mind that these models are also useful in explaining the role of emotion in general, irrespective of valence and arousal.


In the context of the semantic associative network model of memory formation (Bower, 1981) or the Search of Associative Memory (SAM) model (Raaijmakers & Shiffrin, 1981), it has also been proposed that emotions serve as contextual information linked to the to-be-remembered item. These models assume that emotions are represented in a network of nodes together with the musical piece. Thus, stimulation and "activation" of emotion nodes would create a form of spreading activation that lowers the excitation threshold of all associatively linked nodes and thus helps to retrieve the music memory trace. We will return to this model and its extensions later on. The model is particularly suited to explaining why even unpleasant music might be remembered well. This issue has not been studied so far, but from introspection it is known that we sometimes strongly dislike particular musical pieces despite recognizing them relatively accurately.
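The spreading-activation idea behind these models can be illustrated with a minimal sketch. The node names, link weights, and decay parameter below are invented for illustration and are not parameters from Bower (1981) or the SAM model; the sketch only shows how activating an emotion node passes activation along weighted associative links to a linked melody, effectively lowering its retrieval threshold.

```python
# Minimal spreading-activation sketch over a toy associative network.
# Node names and weights are illustrative assumptions.

network = {
    "sadness":  {"melody_A": 0.6, "funeral": 0.8},
    "funeral":  {"melody_A": 0.3},
    "melody_A": {},
}

def spread(source, decay=0.5, depth=2):
    """Propagate activation from a source node through weighted links,
    attenuated by a decay factor at each step."""
    activation = {source: 1.0}
    frontier = [source]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbour, weight in network.get(node, {}).items():
                gain = activation[node] * weight * decay
                # Keep the strongest activation reaching each node.
                if gain > activation.get(neighbour, 0.0):
                    activation[neighbour] = gain
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation

act = spread("sadness")
# The melody linked to the activated emotion node receives activation,
# i.e., its retrieval threshold is effectively lowered.
print(act["melody_A"])  # 0.3
```

In this picture, cueing the emotion retrieves the melody trace even when the piece itself was never deliberately memorized, which is consistent with the claim that disliked but emotionally charged music can still be recognized well.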

Individual Schemas and Music Structure

Different listeners may understand the same musical piece in very different ways. They may have varying degrees of appreciation for its musical structure and may differ in how it fits into their cultural context. In order to describe and understand how we individually perceive and memorize music, I will use the well-known schema concept (Piaget, 1923). Schemas are a form of cognitive heuristic that automatically makes assumptions about the music and, although not completely accurate, enables us to make quick judgments. Schemas are a product of our experiences and can be adjusted or refined throughout our lives. They help us to understand various musical pieces; they can influence our music memories or influence which musical pieces we pay attention to, and thus affect the chunks of information that are available for encoding long-term music memories. Additionally, when we try to recall a musical memory, schemas can help us to piece the memory together. These schemas determine how (and whether) we encode, consolidate, and remember a particular piece of music. A schema can even prevent encoding, for example when we are not interested in or strongly dislike a particular musical piece. In such situations, we will not focus our attention on this piece and will ultimately remember it poorly. On the other hand, a particular musical piece may fit perfectly with a stored schema (which, incidentally, is positively evaluated), in which case we direct our attention to this piece of music and preferentially insert it into our memory system. Incidentally, we know from several neurophysiological studies that focusing attention on a particular auditory stimulus enhances neural activation in the auditory cortex (Jäncke, Mirzazade, & Shah, 1999).
Thus, attention gives rise to focal increases in neural activation in specific brain areas and can thereby influence learning, consolidation, and the retrieval of stored information. While schemas depend on the individual subject and how the subject "organizes" the neural networks and mental structures for processing incoming information, the musical structure itself also plays a pivotal role in learning and remembering musical pieces. There are long and short pieces; some are monotonous while others vary dynamically across the entire piece. Some pieces use several musical themes appearing in different forms while others use only one more or less simple theme. In other words, musical structure is defined by the degree of change at different levels of the musical piece. Many researchers use the terms "information" or "complexity" to describe musical structure (Werbik, 1971). In this context, information is inversely related to redundancy: if the next note in a piece of music is largely determined by the preceding notes, it conveys little new information about the piece. Thus, a musical piece containing complicated changes at many levels of its structure contains more information than a piece that is repetitive and for which the next notes and beats are easily predictable from the preceding ones. In the context of music memory, it is obvious that the complexity of a musical piece affects how it is encoded, consolidated, and recalled. The more complicated (and complex) a musical piece, the more difficult it is to encode and remember. However, whether we can learn and remember complicated and complex music also depends on our mental structure and the schemas we have available for music perception and music memory. Those who have mental structures for complex music will find it easier to learn and retrieve such pieces. Thus, there should be a strong interaction between the mental structures for music and the musical structure itself in forming musical memory. As far as I know, this has not been studied explicitly in the music domain. However, in other domains it has frequently been shown that experts (with specific and optimized mental structures) are sometimes exceptional in discriminating, learning, and recognizing information from their fields of expertise (Gobet, 1998; Rawson & Van Overschelde, 2008). Thus, it is very likely that expertise in music (even low-level expertise) has a substantial influence on music memory.
Nevertheless, it is left to future studies to show that the available mental structures for music indeed influence musical memory.
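The link between predictability and information can be made concrete with a toy measure: the Shannon entropy of note-to-note transitions. This is only a minimal sketch of the general idea, not a measure used by Werbik (1971); the note sequences are invented for illustration. A repetitive melody, whose next note is largely determined by the preceding one, yields lower transition entropy than one with many unpredictable continuations.

```python
# Shannon entropy of note bigrams as a crude proxy for the "information"
# of a melody. The sequences are invented for illustration.
from collections import Counter
from math import log2

def transition_entropy(notes):
    """Entropy (in bits) of the distribution of note-to-note bigrams."""
    bigrams = Counter(zip(notes, notes[1:]))
    total = sum(bigrams.values())
    return -sum((n / total) * log2(n / total) for n in bigrams.values())

repetitive = ["C", "E", "C", "E", "C", "E", "C", "E", "C"]
varied     = ["C", "E", "G", "B", "D", "F", "A", "C", "D"]  # 8 distinct bigrams

print(transition_entropy(repetitive))  # 1.0 (only two bigrams, C->E and E->C)
print(transition_entropy(varied))      # 3.0 (eight equally likely bigrams)
```

On this toy measure, the repetitive piece carries one bit per transition while the varied piece carries three, matching the intuition that easily predictable continuations convey little new information.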

Brain Activation during Encoding and Retrieval of Music

Only a few studies have examined the neural underpinnings of music memory so far. The few fMRI studies have uncovered mostly similar findings (Altenmüller et al., 2014; Ford, Addis, & Giovanello, 2011; Gagnepain et al., 2017; Groussard et al., 2010; Janata, 2009; Margulis, Mlsna, Uppunda, Parrish, & Wong, 2009; Plailly, Tillmann, & Royet, 2007; Platel, 2005; Watanabe, Yagishita, & Kikyo, 2008), although there are also some differences depending on the paradigm used and the particular music memory system studied. All studies identified strong involvement of bilateral temporal brain areas, including the primary and secondary auditory cortex (within the superior temporal gyrus) and temporal areas known to be involved in language and memory processing (the middle and inferior temporal gyri). In addition, all studies reported the involvement of frontal brain areas during music recognition, mostly of the left inferior frontal cortex. For episodic music memory, bilateral frontal cortex activations have also been reported, with a slight right-sided dominance; sometimes the precuneus has also been reported as activated during episodic music memory tasks. When autobiographical music memory is tested, hemodynamic responses in default-mode network (DMN) regions increase, including the lateral parietal, temporal, medial prefrontal, and posterior cingulate cortices (Ford et al., 2011; Janata, 2009).


Although these studies adequately demonstrate that a distributed cortical network is involved in music memory processes, one has to keep in mind that the fMRI environment and the hemodynamic measures obtained are not optimal for studying the neural underpinnings of music processing in general and of music memory processes in particular. The loud and rather annoying measurement environment is suboptimal for music presentation and even for fine-grained cognitive processes. A major drawback of many fMRI studies (including those mentioned earlier) is that mostly very short fragments of musical pieces (10–20 seconds) have been used, which may have precluded the complex cognitive and emotional processes associated with natural music listening. In addition, hemodynamic responses are slow and only partly correlate with the underlying neurophysiological activity (Logothetis, 2008). In future experiments it would be extremely helpful to study the neural underpinnings of the different music memory systems using silent and less annoying neurophysiological measurement techniques, such as EEG, MEG, or NIRS, which allow working with natural music stimuli. Currently, there are no studies using these techniques with the types of experimental paradigms employed in the aforementioned fMRI studies. Thus, it is of utmost importance to study the neurophysiological oscillations, intracortical current densities, and coherences during music memory tasks. This would provide the opportunity to study the neural underpinnings of music memory processes in more natural experimental situations. Until now, many music perception studies have been published using these techniques and more natural music stimuli. Since music perception implicitly makes use of music memory processes, these studies have uncovered findings that are also of interest for music memory research.
For example, listening to natural music is associated with activations in distributed neural systems comprising bilateral temporal and frontal brain areas (Jäncke & Alahmadi, 2016). In addition, particular coherences between adjacent and distant brain areas are evident during music perception (Bhattacharya & Petsche, 2001; Bhattacharya, Petsche, Feldmann, & Rescher, 2001; Bhattacharya, Petsche, & Pereda, 2001; Jäncke, 2012) and other music-related tasks (Bangert & Altenmüller, 2003). Thus, these studies partly correspond with fMRI studies in showing that music perception (and thus partly music memory) is controlled by a distributed neural network binding together brain systems involved in audition, memory, attention, sequence processing, and executive functions. These neurophysiological findings could be used to understand the possible enhancing effects of music on cognitive tasks (which I will summarize in the next section). In his review article, Wolfgang Klimesch summarized his EEG findings on memory research (Klimesch, 1999) and reported that "good" and "bad" memory performers differ substantially in the time courses of event-related desynchronization (ERD) in the upper alpha and theta bands during a semantic judgment task. The results indicate that within the first 1000 ms after presentation of the test stimuli, good memory performance is associated with a significantly larger alpha band desynchronization. The opposite holds true for the theta band, where good memory performance is reflected by a larger synchronization during the first 1000 ms. In this respect, the phasic responses of these frequency bands reflect the quality and performance of memory. In addition, tonic changes in these frequency bands are also related to performance in memory, cognition, and perception. For example, increased tonic alpha band power and decreased theta band power are associated with increased performance in various cognitive and perceptual tasks. It is therefore an obvious step to try to influence the tonic and phasic oscillations in the alpha and theta bands in certain brain areas in such a way that the functions performed by these areas run optimally. This has been done by Klimesch and colleagues (Klimesch, Sauseng, & Gerloff, 2003), who induced increased alpha band oscillations in parietal brain areas using transcranial magnetic stimulation (TMS) prior to the performance of spatial intelligence tasks. By doing this, they increased the tonic alpha band power in parietal areas, and as a result of this manipulation, the subjects substantially improved their cognitive performance. Thus, it is conceivable that music listening might influence brain activation in a similar way, leading to an improvement in several ongoing cognitive processes.
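The band-power quantities discussed above can be sketched computationally. The following is a minimal illustration, not a reconstruction of Klimesch's analysis pipeline: real EEG work would add artifact rejection and Welch-style averaging, and the simulated signal, sampling rate, and band edges (theta 4–8 Hz, alpha 8–12 Hz) follow common convention but are assumptions here.

```python
# Estimating tonic band power in conventional EEG frequency bands from
# a periodogram. Signal and parameters are simulated for illustration.
import numpy as np

def band_power(signal, fs, lo, hi):
    """Mean power spectral density within the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

fs = 250                               # sampling rate in Hz (assumed)
t = np.arange(0, 4, 1.0 / fs)          # 4 s of data
rng = np.random.default_rng(0)
# Simulated EEG: a strong 10 Hz alpha rhythm plus broadband noise.
eeg = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(len(t))

alpha = band_power(eeg, fs, 8, 12)
theta = band_power(eeg, fs, 4, 8)
print(alpha > theta)  # True: the 10 Hz rhythm dominates the alpha band
```

Comparing such band-power estimates before and after stimulus onset (phasic ERD/ERS) or across resting periods (tonic power) is the kind of measurement the studies cited above rely on.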

Music as Memory Enhancer

Can music be used as a memory enhancer? When asking this question one has to distinguish which aspect of memory should benefit from music; in fact, music influences memory performance in several different ways. First, we have to discuss whether musicians, or non-professional but musically trained subjects, benefit from musical training in terms of improved memory performance (e.g., improved verbal working memory or improved long-term memory). Second, does background music exert beneficial or even detrimental effects on cognitive functions? Third, can music be used to enhance memory functions? And fourth, is music beneficial for clinical populations? In the following I will summarize some of the important findings in this field.

Musical Proficiency and Memory

An often-asked question in the context of music research is whether musicians outperform non-musicians in non-musical memory tasks. In other words, is there a kind of transfer from musical proficiency to non-musical abilities? A recent meta-analysis aimed to clarify whether musicians indeed perform better than non-musicians in various memory tasks (Talamini, Altoè, Carretti, & Grassi, 2017). By searching international databases for published work on this topic, the authors collected twenty-nine studies that used fifty-three different memory tasks (e.g., working memory and long-term memory tasks with different materials). For these studies and memory tests, they calculated Hedges' g, a measure of effect size adjusted for small groups. These g values were interpreted according to the criteria suggested by Cohen (1988): small effect = 0.2 to 0.5; medium effect = 0.5 to 0.8; large effect > 0.8. Using this measure, they found that musicians performed better than non-musicians in terms of long-term memory (small effect: g = 0.29), short-term memory (medium effect: g = 0.57), and working memory (medium effect: g = 0.56). They also controlled for the influence of moderator variables (e.g., stimulus material: tonal, verbal, or visuospatial) and identified that the musicians' advantage for short-term and working memory was larger with tonal stimuli, moderate with verbal stimuli, and small or null with visuospatial stimuli. Thus, one can be relatively confident in concluding that musicians really are better, even in non-music-related memory processes. But why are they better? Currently, two explanations are available: (1) a kind of Pygmalion effect, or (2) a consequence of musical training. According to the Pygmalion (or Rosenthal) effect, musicians might perform better than non-musicians because the researchers expected musicians to do better, which might induce an improvement in their performance. However, differences between musicians and non-musicians have not been reported for all cognitive tasks (Schellenberg, 2001); there are only a few tasks (including memory functions) for which musicians show enhanced performance. Another possibility could be that individuals with better memory are more likely to become musicians. This is also not very likely, since individuals with good memory can become very skilled and successful in domains outside the music business: they could become good academics, economists, or philosophers. Thus, this hypothesis is not very helpful in explaining the memory advantage of musicians. On the other hand, a better memory might be a consequence of musical training. Such training might have positively influenced (1) auditory processing, (2) the overlapping neural networks for speech and music functions, and (3) active learning strategies, such as chunking and sensorimotor integration.
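The effect-size measure used in the meta-analysis above, Hedges' g, is Cohen's d multiplied by a small-sample bias correction. The sketch below shows the standard formula together with the Cohen (1988) benchmarks cited in the text; the group scores are invented for illustration and are not data from Talamini et al. (2017).

```python
# Hedges' g: standardized mean difference with a small-sample
# bias correction. Group data are invented for illustration.
from math import sqrt

def hedges_g(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd                            # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)             # small-sample factor
    return d * correction

def classify(g):
    """Cohen's (1988) benchmarks as cited in the text."""
    g = abs(g)
    if g < 0.2:
        return "negligible"
    if g < 0.5:
        return "small"
    if g < 0.8:
        return "medium"
    return "large"

musicians     = [12, 14, 13, 15, 14, 13]   # invented memory-span scores
non_musicians = [11, 12, 13, 12, 11, 13]

g = hedges_g(musicians, non_musicians)
print(classify(g))  # "large" for these invented data
```

The reported meta-analytic values (g = 0.29, 0.57, 0.56) would fall into the "small" and "medium" bands of this classification.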
Improved auditory processing has been demonstrated in many experiments (Kühnis, Elmer, & Jäncke, 2014; Marie, Magne, & Besson, 2011). This improved ability could be helpful in memory tasks, especially when stimuli are presented orally, because better auditory encoding of the item to be remembered could strengthen the trace of the stimulus in the listener's memory. In addition, encoding makes use of the phonological/tonal/auditory loop of the working memory system (described earlier). Thus, musicians might exploit their superior auditory functions to use early auditory encoding more efficiently than non-musicians. Incidentally, two studies (Okhrei, Kutsenko, & Makarchuk, 2017; Talamini, Carretti, & Grassi, 2016) revealed no difference between musicians and non-musicians in short-term memory tasks when verbal stimuli were presented visually, supporting the hypothesis that auditory encoding is the important link here.

A further possible reason for the superior memory performance of musicians could be the strong overlap between the neural networks and psychological functions involved in speech and music processing. For example, phonological awareness, reading ability, and music perception are controlled by overlapping networks (Anvari, Trainor, Woodside, & Levy, 2002; Flaugnacco et al., 2015). Music performance is a multisensory task that involves associating musical notation with the sound of the notes and with the corresponding motor responses. These associations have to be built up while learning to play a musical instrument. This particular type of training


is initially effortful and demands attentional and executive control. Music training might therefore also enhance active learning strategies, such as chunking and attentional control, functions that are essential for developing a good memory.

Influence of Background Music on Learning and Recall

The influence of background music on various tasks and cognitive processes has been studied and discussed for quite a long while. A meta-analysis conducted by Kämpfe and colleagues (Kämpfe, Sedlmeier, & Renkewitz, 2010) revealed that background music does not have a uniform effect on task performance. Based on these findings, one might tentatively conclude that the effect of background music on cognitive function can be attributed to general arousal and mood changes (Schellenberg & Weiss, 2013).

In one of these studies (Jaencke & Sandmann, 2010), EEG activity was recorded during the encoding of verbal material. The authors found no influence of background music on verbal learning. There was, however, a substantially stronger alpha band desynchronization during the first 800–1200 ms after presentation of the stimulus to be learned when background music was playing. Four seconds later this changed to a substantial alpha band synchronization. According to the results presented by Klimesch (1999), this could indicate that background music slightly improves the neural underpinnings of encoding (indicated by the phasic alpha band desynchronization), followed by a more efficient consolidation (indicated by the later and more tonic alpha band synchronization). But these neurophysiological changes do not correlate with memory performance, since the latter did not change during background listening. In a further study, Kussner and colleagues (Kussner, de Groot, Hofman, & Hillen, 2016) reported unstable effects of background music on learning performance: while they found no influence of background music on learning in a first experiment, an exact replication in a second experiment revealed a beneficial effect of background music.
However, they found that beta band power measured at baseline before the learning experiment (which served as an index of trait arousal) correlated with learning performance. Thus, general arousal, as indexed by resting-state beta band activity, could provide a good starting point for subsequent learning.

Whether background music might positively influence learning and memory in clinical populations or in elderly subjects suffering from age-dependent memory decline is still disputed. Some studies have shown that background music can enhance memory performance in the elderly. Using NIRS, Ferreri et al. (2014) reported improved learning during background music listening in elderly subjects. This learning improvement was accompanied by decreased prefrontal cortex blood flow, which the authors interpret as less activation and less “disturbing effort” during encoding. If these findings hold up in replications, they will open new perspectives for countering the decline of episodic memory performance often seen in the elderly.
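Alpha desynchronization of this kind is usually quantified as event-related desynchronization (ERD): the percentage drop in alpha band power relative to a pre-stimulus baseline. A self-contained sketch on simulated data (the 8–12 Hz band, sampling rate, and signal parameters here are illustrative assumptions, not those of the study):

```python
import numpy as np

def alpha_power(segment, fs=250.0, band=(8.0, 12.0)):
    """Mean spectral power of one EEG segment in the alpha band."""
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    power = np.abs(np.fft.rfft(segment)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return power[mask].mean()

def erd_percent(baseline, post):
    """Event-related desynchronization: % alpha power drop vs. baseline."""
    p_base, p_post = alpha_power(baseline), alpha_power(post)
    return 100.0 * (p_base - p_post) / p_base

# Simulated 1-s segments: strong 10 Hz alpha at baseline, attenuated after
# stimulus onset (halved amplitude, i.e., roughly a 75% drop in power)
rng = np.random.default_rng(0)
t = np.arange(0, 1.0, 1.0 / 250.0)
baseline = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
post = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
print(f"ERD: {erd_percent(baseline, post):.0f}%")
```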



Music as Memory Modulator in Healthy Subjects

A recent set of studies revealed that emotional arousal evoked by music can enhance memory consolidation (Judde & Rickard, 2010). The authors of this study presented music excerpts either immediately, 20 minutes, or 45 minutes after encoding of verbal material. During this post-learning period, the subjects relaxed. One week later the same subjects took part in a retention experiment that tested whether they remembered the words they had learned. Retention performance was significantly enhanced, regardless of valence, when music presentation occurred 20 minutes, but not immediately or 45 minutes, after encoding. The authors explain this facilitatory effect of music presentation on long-term memory in the context of what is currently known about the time course of memory consolidation. Memory consolidation is time-dependent, since the biochemical processes modulating synaptic change need some time (at least 25 minutes) to develop and to install the new and altered synaptic contacts in the memory networks, including the release of various hormones into the bloodstream (i.e., epinephrine, norepinephrine, and cortisol) (McGaugh, 2000). Thus, when arousing music (irrespective of valence) is presented 20–25 minutes post-learning, memory consolidation is enhanced. In a subsequent experiment the authors demonstrated that memory for emotional material was attenuated when relaxing music was presented during the post-learning phase (Rickard, Wong, & Velik, 2012). Thus, relaxing music may counter the increased arousal levels that would otherwise enhance the formation of emotional memories, including negative and unwanted ones.

A number of studies have investigated how memory performance changes when the words to be learned are sung (Calvert & Tart, 1993; Kilgour, Jakobson, & Cuddy, 2000; McElhinney & Annett, 1996; Tamminen, Rastle, Darby, Lucas, & Williamson, 2017; Wallace, 1994).
The learning materials have been words, lyrics, or ballads. Although these studies differ in the particular paradigms used, they all came to more or less the same conclusion: sung verbal material is better recalled. However, the benefit of the sung modality increased as familiarity with the melody increased; in some studies the sung benefit was entirely restricted to conditions in which the song was familiar to the participants (Calvert & Tart, 1993; Tamminen et al., 2017). These results are best explained in the context of the SAM theory: when new information is encoded, it is easier and more efficient to “connect” this new information with already stored memory traces, and familiar music constitutes such a trace. In other words, familiar music offers the possibility of attaching new information to it.

Michael Thaut and his colleagues (Peterson & Thaut, 2007) examined the neural underpinnings associated with the presentation and encoding of sung compared to spoken verbal material. For this they measured EEG during the presentation of the material to be learned and calculated what they called “learning-related changes in coherence (LRCC)” to quantify the learning-related changes of brain oscillations between the scalp electrodes. Using this measure, they found increased coherences


within and between left and right frontal areas in several frequency bands during encoding of sung words. These results are interpreted as support for the hypothesis that verbal learning in the context of musical presentation strengthens coherent oscillations in frontal cortical networks, which are known to be involved in the encoding and retrieval of memory information. Although the neurophysiological findings are compelling and consistent with what we know about the neural underpinnings of working memory and other memory processes (i.e., changes of brain synchronization during learning and retrieval), there was no difference in behavioral performance between the sung and spoken material. It might be that the learning material was too easy and thus induced ceiling effects, or that the sung material was not related strongly enough to familiar melodies.
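Coherence measures of the kind underlying LRCC quantify how consistently two electrodes' oscillations are coupled in a given frequency band. A sketch on simulated two-channel data (the exact LRCC computation is not specified here, so this uses standard Welch-averaged magnitude-squared coherence from SciPy as a stand-in):

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0
t = np.arange(0, 4.0, 1.0 / fs)
rng = np.random.default_rng(1)

# Two simulated "frontal electrodes" sharing a 6 Hz rhythm plus independent noise
shared = np.sin(2 * np.pi * 6 * t)
ch1 = shared + 0.5 * rng.standard_normal(t.size)
ch2 = shared + 0.5 * rng.standard_normal(t.size)

# Magnitude-squared coherence, averaged over Welch segments
freqs, coh = coherence(ch1, ch2, fs=fs, nperseg=256)

# Coherence at the bin closest to the shared 6 Hz rhythm should be near 1,
# while bins dominated by the independent noise stay low
peak = coh[np.argmin(np.abs(freqs - 6.0))]
print(f"coherence near 6 Hz: {peak:.2f}")
```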

Music as Memory Modulator in Neurological Patients

There is currently substantial interest in finding non-invasive interventions to rehabilitate the cognitive impairments of neurological patients. In particular, patients suffering from memory impairments have been targeted in recent research in order to identify possible beneficial effects of music. One of the first and most important studies with neurological patients was published by Särkämö et al. (2008). In a single-blind, randomized, controlled experiment with sixty stroke patients, of whom twenty listened daily for one hour to self-selected music, the study revealed substantial improvements in verbal memory and focused attention only for those patients listening to self-selected music. The control subjects (listening to an audiobook or doing nothing in addition) did not show any improvement in these cognitive functions. This study was one of the first to demonstrate beneficial effects of music listening on cognitive recovery in neurological patients.

A number of studies have shown that patients with Alzheimer’s disease (AD) recognize lyrics that they originally heard sung more reliably than lyrics heard in the spoken modality (Simmons-Stern, Budson, & Ally, 2010). Beyond this improvement, they also showed substantial gains in memorizing the semantic content of lyrics learned in the sung modality (Simmons-Stern et al., 2012). Other studies have shown that material learned in the sung condition was relatively robust, since the patients recognized these stimuli relatively well even after longer periods of time (Moussard, Bigand, Belleville, & Peretz, 2012, 2014; Palisson et al., 2015). Incidentally, similar findings have been reported in multiple sclerosis patients (Thaut, Peterson, McIntosh, & Hoemberg, 2014).
Although some of the beneficial effects of sung material were relatively small (e.g., Moussard et al., 2014), they can all be explained within the same theoretical framework. We know that music processing is associated with brain activations in a distributed neural network including many brain areas. This increases the likelihood that even in the case of severe degeneration of some brain areas, other parts of the network remain intact and can be used for encoding and consolidation. A further possibility could be that


musical information serves as “context” information to which the newly learned information can be “attached” (similar to the SAM theory proposed for memory processes in healthy subjects).

A Model for Music Memory

In principle, music memory is not that different from the “classical” memory system. However, there are some fundamental and important differences when it comes to natural music and how it is processed, stored, and retrieved. Music is a dynamic stimulus evolving over time, so listeners have to integrate the incoming sequential auditory information and apply specific memory-based mechanisms (Gestalt perception, chunking, etc.) to form this sequence into a musical piece. Thus, music listening is not only a matter of simple auditory information processing; it is much more, since several psychological functions are involved, from working memory to several aspects of long-term memory. What is special about music perception and music memory, however, is the fact that widely distributed neural networks are involved in perceiving and recognizing musical pieces.

Figure 2 schematically depicts the specific nature of musical memory, partly derived from the SAM model proposed by Raaijmakers and Shiffrin (1981) and from Kalveram’s model of inverse processing (Kalveram & Seyfarth, 2009). As one can see from Fig. 2, auditory information is fed to the memory storage, which can be conceived of as a kind of correlation storage where auditory information (a_t) is associated with many different kinds of information, resulting in a set of efferences (e_t). In this sense, music information can be associated with motor programs, which is particularly important for musicians, who have learned to generate music by manipulating instruments; for this they need specific and highly specialized motor programs to operate their instrument. But non-musicians also have a (perhaps even innate) audio-motor coupling, which becomes obvious when we listen to rhythmic music: in these situations, we tend to move in time with the music’s rhythm.
Besides motor programs, episodic, autobiographical, semantic, and implicit memory information are also associated with music information. Even emotion and motivation can be related to the incoming auditory information. These associations can be conceived of as correlations of varying strength, the strength depending on the frequency of repetition and the salience of the associated information. Some of these correlations give rise to conscious perceptions (explicit memory), while others remain unconscious (implicit memory). Executive functions can enhance the processing of incoming information by directing attention to particular information, ultimately enhancing the neural activation of the areas involved in processing that information. We can also apply executive functions to direct our attention to particular correlations, thus increasing the likelihood that they result in an appropriate efference. This can also lead to a kind of suppression and/or inhibition of other correlations. In this


[Figure 2 shows auditory processing feeding a correlation storage (modulated by attention) in which the efference e_t = f(a_t) and the correlations r_t(e_t, a_t) link auditory input to motor programs and motor program parameters, episodic, autobiographical, semantic, and implicit memories, emotion, and motivation.]

Figure 2.  Schematic description of the memory system associating music information with many non-music aspects.

context, perceptual and memory schemas can be applied, according to which we select or enhance incoming information or the pattern of correlations within the storage.

This model can also be operated in an inverse fashion. For example, when a person wants to change their current mood, they might “activate” the correlations within the storage associated with a particular emotion. This will “activate” images of those musical pieces that evoke the desired emotion. Thus, the goal (to evoke a particular emotion) is now fed into the storage, which activates those efferences yielding the emotion in question.

No generation has listened to music as often as ours. According to a 2016 US survey estimate, more than 90 percent of the population reported listening to music, for an average of 25 hours a week (Nielsen, 2017). It is therefore obvious that music is a frequently applied cue for autobiographical memories, because music is associated with a great deal of everyday information; music can thus serve as an efficient strategy for accessing and stimulating our biographical memory. Music is a complex stimulus carrying much information and evolving over time. This could be the reason why music processing is associated with distributed neural network activations. Obviously, it is relatively easy to link music information to multimodal information: music can carry meaning, emotion, and episodic non-music information, and it can also trigger and control motor behavior. This multiplicity and versatility of the music network offer many possibilities for inserting new information, which could be why several music-related learning strategies improve memory functions. However, future studies are necessary to show whether music interventions can be used to improve memory functions in both healthy and neurologically impaired subjects.
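The correlation storage sketched in this section, including its inverse mode of operation, behaves like a simple Hebbian associative memory. The toy model below is an illustrative assumption (the item names, ±1 vector coding, and outer-product learning rule are mine, not the chapter's): stored auditory patterns (a_t) are linked to efference patterns (e_t), forward recall maps sound to efference, and the inverse direction retrieves an auditory image from a desired efference (e.g., a mood):

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 64  # dimensionality of the toy feature vectors

def random_pattern():
    return rng.choice([-1.0, 1.0], size=dim)

# Hypothetical auditory patterns (a_t) and their associated efferences (e_t)
songs = {"lullaby": random_pattern(), "march": random_pattern()}
assoc = {"lullaby": random_pattern(),   # e.g., a "calm mood" code
         "march": random_pattern()}     # e.g., a marching motor program

# Correlation storage: Hebbian sum of outer products e_t a_t^T
W = sum(np.outer(assoc[name], songs[name]) for name in songs)

def recall(a):
    """Forward direction: auditory input a_t -> efference e_t = f(a_t)."""
    return np.sign(W @ a)

def recall_inverse(e):
    """Inverse direction: desired efference (e.g., a mood) -> auditory image."""
    return np.sign(W.T @ e)

# Hearing the lullaby retrieves its associated pattern, and wanting the
# associated state retrieves the lullaby's auditory image
assert np.array_equal(recall(songs["lullaby"]), assoc["lullaby"])
assert np.array_equal(recall_inverse(assoc["lullaby"]), songs["lullaby"])
print("associative recall works in both directions")
```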



References

Alonso, I., Dellacherie, D., & Samson, S. (2015). Emotional memory for musical excerpts in young and older adults. Frontiers in Aging Neuroscience 23. Retrieved from https://doi.org/10.3389/fnagi.2015.00023
Altenmüller, E., Siggel, S., Mohammadi, B., Samii, A., & Münte, T. F. (2014). Play it again, Sam: Brain correlates of emotional music recognition. Frontiers in Psychology 114. Retrieved from https://doi.org/10.3389/fpsyg.2014.00114
Anvari, S. H., Trainor, L. J., Woodside, J., & Levy, B. A. (2002). Relations among musical skills, phonological processing, and early reading ability in preschool children. Journal of Experimental Child Psychology 83(2), 111–130.
Baddeley, A. (2010). Working memory. Current Biology 20(4), R136–R140.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), Psychology of Learning and Motivation (Vol. 8, pp. 47–89). New York: Academic Press.
Bangert, M., & Altenmüller, E. O. (2003). Mapping perception to action in piano practice: A longitudinal DC-EEG study. BMC Neuroscience 4, 26. Retrieved from https://doi.org/10.1186/1471-2202-4-26
Bhattacharya, J., & Petsche, H. (2001). Enhanced phase synchrony in the electroencephalograph gamma band for musicians while listening to music. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 64(1 Pt. 1), 012902.
Bhattacharya, J., Petsche, H., Feldmann, U., & Rescher, B. (2001). EEG gamma-band phase synchronization between posterior and frontal cortex during mental rotation in humans. Neuroscience Letters 311(1), 29–32.
Bhattacharya, J., Petsche, H., & Pereda, E. (2001). Long-range synchrony in the gamma band: Role in music perception. Journal of Neuroscience 21(16), 6329–6337.
Bower, G. H. (1981). Mood and memory. The American Psychologist 36, 129–148.
Calvert, S. L., & Tart, M. (1993). Song versus verbal forms for very-long-term, long-term, and short-term verbatim recall. Journal of Applied Developmental Psychology 14(2), 245–260.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Lawrence Erlbaum Associates.
Cowan, N. (2011). The focus of attention as observed in visual working memory tasks: Making sense of competing claims. Neuropsychologia 49, 1401–1406.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science 168(3939), 1604–1605.
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical period. Journal of the Acoustical Society of America 119(2), 719–722.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex 49(10), 2812–2821.
Elmer, S., Rogenmoser, L., Kühnis, J., & Jäncke, L. (2015). Bridging the gap between perceptual and cognitive perspectives on absolute pitch. Journal of Neuroscience 35(1), 366–371.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2005). Remember Bach: An investigation in episodic memory for music. Annals of the New York Academy of Sciences 1060, 438–442.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2008). Unforgettable film music: The role of emotion in episodic long-term memory for music. BMC Neuroscience 9, 48. Retrieved from https://doi.org/10.1186/1471-2202-9-48


Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3), 144–147.
Ferreri, L., Bigand, E., Perrey, S., Muthalib, M., Bard, P., & Bugaiska, A. (2014). Less effort, better results: How does music act on prefrontal cortex in older adults during verbal encoding? An fNIRS study. Frontiers in Human Neuroscience 8, 301. Retrieved from https://doi.org/10.3389/fnhum.2014.00301
Ferreri, L., & Rodriguez-Fornells, A. (2017). Music-related reward responses predict episodic memory performance. Experimental Brain Research 235(12), 3721–3731.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schon, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Ford, J. H., Addis, D. R., & Giovanello, K. S. (2011). Differential neural activity during search of specific and general autobiographical memories elicited by musical cues. Neuropsychologia 49(9), 2514–2526.
Frieler, K., Fischinger, T., Schlemmer, K., Lothwesen, K., Jakubowski, K., & Müllensiefen, D. (2013). Absolute memory for pitch: A comparative replication of Levitin’s 1994 study in six European labs. Musicae Scientiae: The Journal of the European Society for the Cognitive Sciences of Music 17(3), 334–349.
Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426.
Gagnepain, P., Fauvel, B., Desgranges, B., Gaubert, M., Viader, F., Eustache, F., . . . Platel, H. (2017). Musical expertise increases top-down modulation over hippocampal activation during familiarity decisions. Frontiers in Human Neuroscience 11, 472. Retrieved from https://doi.org/10.3389/fnhum.2017.00472
Gobet, F. (1998). Expert memory: A comparison of four theories. Cognition 66(2), 115–152.
Green, A. C., Bærentsen, K. B., Stødkilde-Jørgensen, H., Roepstorff, A., & Vuust, P. (2012). Listen, learn, like! Dorsolateral prefrontal cortex involved in the mere exposure effect in music. Neurology Research International 2012, 846270. Retrieved from http://dx.doi.org/10.1155/2012/846270
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: Prevalence, ethnic variation, and estimation of the genetic component. American Journal of Human Genetics 65(3), 911–913.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education and predisposition to absolute pitch: Teasing apart genes and environment. American Journal of Medical Genetics 98(3), 280–282.
Groussard, M., La Joie, R., Rauchs, G., Landeau, B., Chetelat, G., Viader, F., . . . Platel, H. (2010). When music and long-term memory interact: Effects of musical expertise on functional and structural plasticity in the hippocampus. PLoS ONE 5(10), e13225.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition 17(5), 572–581.
Halpern, A. R., & Müllensiefen, D. (2008). Effects of timbre and tempo change on memory for music. Quarterly Journal of Experimental Psychology 61(9), 1371–1384.
Halpern, A. R., & O’Connor, M. G. (2000). Implicit memory for music in Alzheimer’s disease. Neuropsychology 14(3), 391–397.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8(5), 393–402.


Honing, H., & Ladinig, O. (2009). Exposure influences expressive timing judgments in music. Journal of Experimental Psychology: Human Perception and Performance 35(1), 281–288.
Jaencke, L., & Sandmann, P. (2010). Music listening while you learn: No influence of background music on verbal learning. Behavioral and Brain Functions 6, 3. Retrieved from https://doi.org/10.1186/1744-9081-6-3
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19(11), 2579–2594.
Jäncke, L. (2012). The dynamic audio-motor system in pianists. Annals of the New York Academy of Sciences 1252, 246–252.
Jäncke, L., & Alahmadi, N. (2016). Detection of independent functional networks during music listening using electroencephalogram and sLORETA-ICA. Neuroreport 27(6), 455–461.
Jäncke, L., Mirzazade, S., & Shah, N. J. (1999). Attention modulates activity in the primary and the secondary auditory cortex: A functional magnetic resonance imaging study in human subjects. Neuroscience Letters 266(2), 125–128.
Johnson, M. K., Kim, J. K., & Risse, G. (1985). Do alcoholic Korsakoff’s syndrome patients acquire affective reactions? Journal of Experimental Psychology: Learning, Memory, and Cognition 11(1), 22–36.
Judde, S., & Rickard, N. (2010). The effect of post-learning presentation of music on long-term word-list retention. Neurobiology of Learning and Memory 94(1), 13–20.
Kalveram, K. T., & Seyfarth, A. (2009). Inverse biomimetics: How robots can help to verify concepts concerning sensorimotor control of human arm and leg movements. Journal of Physiology 103(3–5), 232–243.
Kämpfe, J., Sedlmeier, P., & Renkewitz, F. (2010). The impact of background music on adult listeners: A meta-analysis. Psychology of Music 39(4), 424–448.
Keitz, M., Martin-Soelch, C., & Leenders, K. L. (2003). Reward processing in the brain: A prerequisite for movement preparation? Neural Plasticity 10(1–2), 121–128.
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music training and rate of presentation as mediators of text and song recall. Memory & Cognition 28(5), 700–710.
Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews 29(2), 169–195.
Klimesch, W., Sauseng, P., & Gerloff, C. (2003). Enhancing cognitive performance with repetitive transcranial magnetic stimulation at human individual alpha frequency. European Journal of Neuroscience 17(5), 1129–1133.
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience 7(3), 302–307.
Kühn, S., Romanowski, A., Schilling, C., Lorenz, R., Mörsen, C., Seiferth, N., . . . IMAGEN Consortium (2011). The neural basis of video gaming. Translational Psychiatry 1(11), e53.
Kühnis, J., Elmer, S., & Jäncke, L. (2014). Auditory evoked responses in musicians during passive vowel listening are modulated by functional connectivity between bilateral auditory-related brain regions. Journal of Cognitive Neuroscience 26(12), 2750–2761.
Kussner, M. B., de Groot, A. M., Hofman, W. F., & Hillen, M. A. (2016). EEG beta power but not background music predicts the recall scores in a foreign-vocabulary learning task. PLoS ONE 11(8), e0161387.
Lee, Y. S., Janata, P., Frost, C., Martinez, Z., & Granger, R. (2015). Melody recognition revisited: Influence of melodic Gestalt on the encoding of relational pitch information. Psychonomic Bulletin & Review 22(1), 163–169.


Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics 56, 414–423.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences 9(1), 26–33.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197), 869–878.
McElhinney, M., & Annett, J. M. (1996). Pattern of efficacy of a musical mnemonic on recall of familiar words over several presentations. Perceptual and Motor Skills 82(2), 395–400.
McGaugh, J. L. (2000). Memory: A century of consolidation. Science 287(5451), 248–251.
Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009). Selective neurophysiologic responses to music in instrumentalists with different listening biographies. Human Brain Mapping 30(1), 267–275.
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of Cognitive Neuroscience 23(2), 294–305.
Meneses, A., & Liy-Salmeron, G. (2012). Serotonin and emotion, learning and memory. Reviews in the Neurosciences 23(5–6), 543–553.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2012). Music as an aid to learn new verbal information in Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 521–531.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014). Learning sung lyrics aids retention in normal ageing and Alzheimer’s disease. Neuropsychological Rehabilitation 24(6), 894–917.
Nielsen (2017). Nielsen Music year-end report 2016. Retrieved from http://www.nielsen.com/us/en/press-room/2017/nielsen-releases-2016-us-year-end-music-report.html
Oberauer, K., & Lewandowsky, S. (2011). Modeling working memory: A computational implementation of the Time-Based Resource-Sharing theory. Psychonomic Bulletin & Review 18(1), 10–45.
Okhrei, A., Kutsenko, T., & Makarchuk, M. (2017). Performance of working memory of musicians and non-musicians in tests with letters, digits, and geometrical shapes. Biologija 62(4), 207–215.
Palisson, J., Roussel-Baclet, C., Maillet, D., Belin, C., Ankri, J., & Narme, P. (2015). Music enhances verbal episodic memory in Alzheimer’s disease. Journal of Clinical and Experimental Neuropsychology 37(5), 503–517.
Parks, S. L., & Clancy Dollinger, S. (2014). The positivity effect and auditory recognition memory for musical excerpts in young, middle-aged, and older adults. Psychomusicology: Music, Mind, and Brain 24(4), 298–308.
Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical networks. Annals of the New York Academy of Sciences 1169, 256–265.
Peterson, D. A., & Thaut, M. H. (2007). Music increases frontal EEG coherence during verbal learning. Neuroscience Letters 412(3), 217–221.
Piaget, J. (1923). Le langage et la pensée chez l’enfant: Études sur la logique de l’enfant. Retrieved from http://pubman.mpdl.mpg.de/pubman/item/escidoc:2375486/component/escidoc:2375485/Piaget_1923_language_pensee_enfant.pdf
Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors: The same neural signature? Cerebral Cortex 17(11), 2650–2658.
Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code. Cognition 98(1), 1–11.


Platel, H. (2005). Functional neuroimaging of semantic and episodic musical memory. Annals of the New York Academy of Sciences 1060, 136–147.
Raaijmakers, J. G., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review 88(2), 93–134.
Rawson, K. A., & Van Overschelde, J. P. (2008). How does knowledge promote memory? The distinctiveness theory of skilled memory. Journal of Memory and Language 58(3), 646–668.
Rickard, N. S., Wong, W. W., & Velik, L. (2012). Relaxing music counters heightened consolidation of emotional memory. Neurobiology of Learning and Memory 97(2), 220–228.
Rogenmoser, L., Elmer, S., & Jäncke, L. (2015). Absolute pitch: Evidence for early cognitive facilitation during passive listening as revealed by reduced P3a amplitudes. Journal of Cognitive Neuroscience 27(3), 623–637.
Salamé, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. Quarterly Journal of Experimental Psychology Section A 41(1), 107–122.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Samson, S., & Peretz, I. (2005). Effects of prior exposure on music liking and recognition in patients with temporal lobe lesions. Annals of the New York Academy of Sciences 1060, 419–428.
Särkämö, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., . . . Hietanen, M. (2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke. Brain: A Journal of Neurology 131, 866–876.
Schellenberg, E. G. (2001). Music and nonmusical abilities. Annals of the New York Academy of Sciences 930, 355–371. Reprinted in G. E. McPherson (Ed.), The child as musician: A handbook of musical development (2nd ed., pp. 149–176). Oxford: Oxford University Press, 2016.
Schellenberg, E. G., & Weiss, M. W. (2013). Music and cognitive abilities. In D. Deutsch (Ed.), The Psychology of Music (3rd ed., pp. 499–550). London: Academic Press.
Schendel, Z. A., & Palmer, C. (2007). Suppression effects on musical and verbal memory. Memory & Cognition 35(4), 640–650.
Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York Academy of Sciences 1252, 229–236.
Schulze, K., Mueller, K., & Koelsch, S. (2011). Neural correlates of strategy use during auditory working memory in musicians and non-musicians. European Journal of Neuroscience 33(1), 189–196.
Schulze, K., Mueller, K., & Koelsch, S. (2013). Auditory stroop and absolute pitch: An fMRI study. Human Brain Mapping 34(7), 1579–1590.
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32, 771–783.
Semal, C., Demany, L., Ueda, K., & Hallé, P. A. (1996). Speech versus nonspeech in pitch memory. Journal of the Acoustical Society of America 100(2 Pt. 1), 1132–1140.
Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167.
Simmons-Stern, N. R., Deason, R. G., Brandler, B. J., Frustace, B. S., O’Connor, M. K., Ally, B. A., & Budson, A. E. (2012). Music-based memory enhancement in Alzheimer’s disease: Promise and limitations. Neuropsychologia 50(14), 3295–3303.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

262   lutz jäncke Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin 113(2), 345–361. Talamini, F., Altoè, G., Carretti, B., & Grassi, M. (2017). Musicians have better memory than nonmusicians: A meta-analysis. PLoS ONE 12(10), e0186773. Talamini, F., Carretti, B., & Grassi, M. (2016). The working memory of musicians and nonmusicians. Music Perception: An Interdisciplinary Journal 34(2), 183–191. Tamminen, J., Rastle, K., Darby, J., Lucas, R., & Williamson, V. J. (2017). The impact of music on learning and consolidation of novel words. Memory 25(1), 107–121. Thaut, M. H., Peterson, D. A., McIntosh, G. C., & Hoemberg, V. (2014). Music mnemonics aid verbal memory and induce learning-related brain plasticity in multiple sclerosis. Frontiers in Human Neuroscience 8, 395. Retrieved from https://doi.org/10.3389/fnhum.2014.00395 Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. Journal of Cognitive Neuroscience 14(3), 430–442. Vieillard, S., & Gilet, A.-L. (2013). Age-related differences in affective responses to and memory for emotions conveyed by music: A cross-sectional study. Frontiers in Psychology 4, 711. Retrieved from https://doi.org/10.3389/fpsyg.2013.00711 Wallace, W.  T. (1994). Memory for music: Effect of melody on recall of text. Journal of Experimental Psychology: Learning, Memory, and Cognition 20(6), 1471–1485. Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hippocampus and left inferior frontal gyrus. NeuroImage 39(1), 483–491. Werbik, H. (1971). Informationsgehalt und emotionale Wirkung von Musik. Mainz: B. Schott. Wildschut, T., Sedikides, C., Arndt, J., & Routledge, C. (2006). Nostalgia: Content, triggers, functions. Journal of Personality and Social Psychology 91(5), 975–993. Williamson, V. J., Baddeley, A. D., & Hitch, G. J. (2010). 
Musicians’ and nonmusicians’ shortterm memory for verbal and musical sequences: Comparing phonological similarity and pitch proximity. Memory & Cognition 38(2), 163–175. Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology 9(2 pt. 2), 1–27. Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proceedings of the National Academy of Sciences 95(6), 3172–3177.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

Chapter 12

Music and Attention, Executive Function, and Creativity

Psyche Loui and Rachel E. Guetta

Introduction

Attention is "the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence" (James, 1890, p. 403). Executive functions are "a family of top-down mental processes needed when you have to concentrate and pay attention . . . three core EFs: inhibition [inhibitory control, including self-control (behavioral inhibition) and interference control (selective attention and cognitive inhibition)], working memory (WM), and cognitive flexibility (also called set shifting, mental flexibility, or mental set shifting and closely linked to creativity)" (Diamond, 2013, pp. 1–2). Creativity is "the ability to produce work that is novel (i.e., original, unexpected), high in quality, and appropriate (i.e., useful, meets task constraints)" (Sternberg, Lubart, Kaufman, & Pretz, 2005, p. 351). How does music, as "organized sound" (Varèse & Wen-Chung, 1966), intersect with these cognitive capacities of the human mind? In this chapter, we provide a general overview of contemporary research at the intersection of music and attention, executive functions, and creativity. On one hand, we see that musical sounds provide an optimal stimulus set with which to understand the fundamental properties of attention, executive functions, and creativity. On the other hand, music also offers a window through which researchers may assess effects of long-term training on more general cognitive function, as well as neurocognitive development throughout the lifespan.


Music and Attention

There are many ways to conceptualize the vast literature on attention. Perhaps as a result, research on the intersection between attention and music has been similarly fragmented. Nevertheless, research on music and attention has followed the trends of psychology and neuroscience more generally, and musical stimuli have served as a useful testbed for teasing apart competing models of attention. Here we provide a general overview of the disparate theories of attention, before turning to their intersection with work on music more specifically.

Theories of Attention

Patel's OPERA hypothesis (Patel, 2011b) posits that one of several reasons why music training benefits the neural encoding of speech is attention: music training engages brain networks shared between music and speech that are associated with focused attention. Attention has been conceptualized in terms of early versus late selection, and in terms of its operation over space and time. Early versus late selection theories differ in when, temporally, along the classic sensory-cognitive pathway attentional selection, enhancement, and cognitive focus are posited to operate, or equivalently where they act along the gradient from primary to association areas in the human cortex. Early selection theories generally locate attention in sensory processing (closer to the sensory periphery) and emphasize exogenous (reflexive) sources of information, whereas late selection theories locate it in feature selection and more cognitive, endogenous operations. Evidence for early selection comes from the dichotic listening paradigm in which event-related brain potentials were recorded. The amplitude of the N1, an event-related brain response generated in response to sounds, is enhanced for sounds in the attended ear relative to the unattended ear (Woldorff & Hillyard, 1991). Magnetoencephalography work subsequently localized the source of this attentional enhancement to the auditory cortex (Woldorff et al., 1993). Since the auditory cortex is among the primary sensory cortices, the finding that attention modulates this early cortical way station as early as 100 ms after sound presentation provides convincing evidence for early selection.
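The early-selection evidence described above rests on averaging many time-locked epochs and comparing the resulting event-related potential across attention conditions. The sketch below illustrates that analysis logic on simulated data; the epoch generator, attentional gain values, noise level, and measurement window are all invented for illustration and are not taken from the studies cited.

```python
import math
import random

def average_epochs(epochs):
    """Average time-locked epochs sample-by-sample to estimate the ERP."""
    n = len(epochs)
    return [sum(e[i] for e in epochs) / n for i in range(len(epochs[0]))]

def n1_amplitude(erp, sample_rate=1000, window_ms=(80, 120)):
    """Peak N1 amplitude: the most negative value in a window around 100 ms."""
    lo = int(window_ms[0] * sample_rate / 1000)
    hi = int(window_ms[1] * sample_rate / 1000)
    return min(erp[lo:hi])

random.seed(0)

def simulate_epoch(gain):
    # Hypothetical single trial: a negative deflection peaking near 100 ms,
    # scaled by an attentional gain, plus trial-by-trial Gaussian noise.
    return [gain * -math.exp(-((t - 100) ** 2) / 200.0) + random.gauss(0, 0.5)
            for t in range(300)]

attended = average_epochs([simulate_epoch(2.0) for _ in range(100)])
unattended = average_epochs([simulate_epoch(1.0) for _ in range(100)])
# The attended-ear ERP shows a larger (more negative) N1 peak.
assert n1_amplitude(attended) < n1_amplitude(unattended)
```

Averaging across trials cancels the noise while preserving the time-locked deflection, which is why the attention effect emerges cleanly even though single trials are dominated by noise.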
Theories that posit relatively late selection conceptualize attention as a feature-based or object-based operation. In particular, the feature integration theory (Treisman & Gelade, 1980) posits that attention operates by combining pre-attentively selected features within a busy scene. Support for this comes from illusory conjunctions, in which unattended features of visual objects, such as color and shape, are sometimes combined


to give rise to an illusory percept of a nonexistent object. While this theory has received considerable interest, the definitions of features in vision may not transfer readily to audition. In the auditory modality, stimulus representation has been described as hierarchical, as shown by psychophysical and modeling studies. At the lowest rung of the hierarchy there are "primitive features" such as acoustic frequency, whereas at higher levels there are more complex, emergent features such as virtual pitch, which combine with other features to form objects. Attention can be enhanced by cueing at the appropriate level, thus reducing uncertainty (Hafter & Saberi, 2001). Object-based attention offers a direct comparison between visual and auditory processing. Much as the visual system combines features to form objects, the auditory system forms objects by grouping together sound elements that share features such as frequency and harmonic structure (Shinn-Cunningham, 2008). The temporal evolution of these features is especially relevant for object formation in the auditory system. At a low-level timescale, the auditory system may group together sounds based on similar fine-grained temporal features such as attack time, while at a higher-level timescale, distinct tones may be grouped together based on temporal proximity to give rise to beat perception. Beat perception has been proposed as an attentional mechanism through which different temporal objects such as tones are combined to form larger units such as rhythms and phrases (De Freitas, Liverence, & Scholl, 2014; Grahn, 2012). The rhythmic effects of attention over time are revisited later in this section. Evidence for late selection in auditory neurophysiology comes from findings of later attention-related enhancements in event-related brain responses such as the P300 (Purves et al., 2008).
In addition, cases of late selection are supported by the neuropsychological literature, in which patients with lesions in the right parietal cortex present with a lack of awareness of their contralesional (usually left) visual field. In these cases, the successful perception of one feature can sometimes reduce the detectability of another, simultaneously presented feature, a condition known as extinction. In the auditory/musical modality, interesting evidence comes from the use of an auditory illusion in a case of auditory extinction (Deouell, Deutsch, Scabini, Soroker, & Knight, 2007). This study took advantage of Deutsch's scale illusion, in which presenting subjects with alternating high-pitched and low-pitched tones to the left and right ears paradoxically leads to the percept of a stream of high tones in the right ear and low tones in the left (Deutsch, 1974). When the patient with auditory neglect was presented with scale illusion stimuli, he reported hearing only the high-pitched stream. The fact that he heard only the right-lateralized stream, rather than the right-ear stimulus, suggests that some forms of perceptual analysis, such as the formation of auditory streams, remain intact prior to attention and its disruption in hemispatial neglect, thus providing support for late selection. A third line of literature supports a combination of early and late selection theories by showing attention-related enhancements of mid-latency brain responses to sound. For example, the mismatch negativity, an event-related potential generated around 200 ms after the onset of unexpected sounds, is both pre-attentively generated and modulated by attention (Woldorff, Hillyard, Gallen, Hampson, & Bloom, 1998). More specific to the


music literature, the Early Right Anterior Negativity (ERAN), an event-related potential elicited by unexpected musical chords such as the Neapolitan chord (Koelsch, Gunter, Friederici, & Schröger, 2000), is also pre-attentively generated but modulated by attention: When subjects directed attention away from auditory stimuli in a visual task, the Neapolitan chord nevertheless elicited an ERAN; however, its amplitude was larger in the attended condition (Loui, Grent-'t-Jong, Torpey, & Woldorff, 2005). Taken together, the best available resolution of the debate on early versus late selection holds that attention acts at multiple levels of the perceptual-cognitive, or primary-association, continuum: by selecting relevant features and processing them more fully at sensory stages, and by combining selected features to form coherent objects, streams, or scenes at later association stages.

Selection and Filtering

While the controversy between the early and late selection camps continues, other work has focused on the roles of attention in selecting and filtering (Hafter, Sarampalis, & Loui, 2008). Perhaps the most common example of attentional filtering is the famous cocktail party effect, our remarkable ability to focus on one speaker amidst a noisy environment (Cherry, 1953). In contrast, Broadbent (1982) noted that peripheral stimuli may also capture attention and processing, as in the "breakthrough of the unattended" phenomenon. Bregman's (1994) theory of auditory scene analysis posits that we stream, or segregate, distinct auditory stimuli by means of top-down knowledge as well as bottom-up perceptual processing based on acoustic features such as frequency and amplitude co-modulation. This auditory stream segregation, the dividing of our world into separate sound-emitting objects, helps us make sense of the sounds around us. Music listening thus entails many aspects of analyzing a busy auditory scene. In Western music, for instance, we are continually separating and fusing the different voices within the musical surface to perceive melody and harmony. This act of auditory scene analysis requires selective and divided attention, and interacts with training (Loui & Wessel, 2007). In music, the objects to which we attend may pertain to horizontal aspects such as melody, vertical aspects such as harmony, and timbral aspects including spectral centroid and amplitude envelope. Attended features or objects may also be music-theoretically defined components such as specific chord changes and harmonies, or may pertain to rhythm, meter, and/or larger-scale musical structure such as form.

Attending to Musical Pitch and Harmonicity

The musical surface is rich with different types of information, all of which can direct our attention as we listen. Frequency, pitch, and harmonicity can act as predictive cues, guiding our attention toward the cued feature. Early psychophysical work showed


that subjects were better at detecting tones presented at an expected frequency as well as an expected pitch, giving rise to the idea that cues can combine hierarchically, as reviewed above (Hafter & Saberi, 2001). However, cues do not have to share perceptual features with the target in order to drive attention: Voluntary attention to a cue frequency heightens sensitivity for a different target frequency; furthermore, a visual cue can direct attention toward an auditory frequency (Hafter, Schlauch, & Tang, 1993). Thus, auditory sensitivity increases not only for what is physically presented, but also for what is attended. Signal detection is easier when the signal shares perceptual features with the attended cue, enabling involuntary (exogenous) cueing, but also whenever the cue provides information that endogenously (voluntarily) reduces uncertainty and increases predictability about the target in an ongoing task. These effects of endogenous cueing also guide expectations in a higher-level musical context. Based on long-term knowledge from encountering the music of our culture, humans develop expectations for commonly co-occurring musical structures in harmony, melody, and musical syntax. Reaction time studies have shown that our knowledge of musical syntax can act as a prime, or cue, that directs attention toward musically expected stimuli, reducing reaction time for harmonically expected musical structures and increasing reaction time for unexpected structures (Bharucha & Stoeckig, 1986; Marmel, Tillmann, & Dowling, 2008). This enhanced attentional processing due to the priming effect of tonality is not tied to tasks that involve reacting to the feature of musical expectation itself; its effects even spread to visual processing (Escoffier & Tillmann, 2008).
The priming effect of tonal expectations has been shown in non-musicians as well as musicians, suggesting that it results from implicitly learned expectations rather than from explicit musical training (Bigand, Poulin, Tillmann, Madurell, & D'Adamo, 2003). However, the effect of tonal expectations does depend on selective attention: when the task was to attend selectively to the melodic contour of a chord progression, musically trained subjects were more affected by unexpected harmonies, showing both reaction time costs and benefits relative to musically untrained subjects, who were slower overall but not differentially affected by unexpected chord progressions (Loui & Wessel, 2007). This again points to the analysis of complex musical materials (such as chord progressions with different voices) as auditory scenes with different streams of information in local as well as global contexts, a view echoed in other cognitive and electrophysiological studies (Justus & List, 2005; List, Justus, Robertson, & Bentin, 2007).
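Cueing effects of the kind discussed above are commonly quantified with signal detection theory, in which a valid cue should raise sensitivity (d′) rather than merely shift response bias. A minimal sketch of the d′ computation follows; the hit and false-alarm counts are invented for illustration and are not data from the studies cited in this chapter.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: d' = z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 are clipped with the common 1/(2N)
    correction so the inverse normal CDF stays finite."""
    z = NormalDist().inv_cdf
    def rate(k, n):
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))
    return z(rate(hits, hits + misses)) - z(rate(false_alarms, false_alarms + correct_rejections))

# Hypothetical counts for detecting a target tone whose frequency was
# validly cued versus uncued (illustrative numbers only).
cued = d_prime(hits=45, misses=5, false_alarms=8, correct_rejections=42)
uncued = d_prime(hits=32, misses=18, false_alarms=12, correct_rejections=38)
assert cued > uncued  # valid cueing yields higher sensitivity
```

Because d′ separates sensitivity from response criterion, it distinguishes a genuine attentional enhancement of detection from a mere tendency to respond "yes" more often in cued blocks.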

Temporal Attention, Prediction, and Entrainment of Musical Stimuli

In addition to operating over different points in frequency, pitch, and harmonicity, attention also operates over time. Perhaps the most recent influential view of how music can contribute to the discussion of attention comes from the idea that music unfolds


over time in the form of rhythm, the pattern of inter-onset intervals that enables the cognitive system to chunk incoming sound stimuli in a hierarchical manner (Longuet-Higgins & Lee, 1982; Povel & Essens, 1985). The idea that attention is temporally based is not incompatible with the object-based views of attention reviewed earlier in this chapter, but more recently there has been a shift of interest specifically toward how attention changes dynamically over time. This is modeled by the Dynamic Attending Theory, which posits that attention fluctuates in rhythmically predictable pulses, giving rise to different levels of detection and identification for stimuli presented at different times relative to the attentional rhythm (Jones, 1976; Jones & Boltz, 1989; Jones, Moynihan, MacKenzie, & Puente, 2002). Compelling evidence for the Dynamic Attending Theory comes from psychophysical studies in which subjects were better at same-different pitch judgments when the pitch to be judged occurred at a rhythmically predictable time (Jones et al., 2002). The study of rhythmic attention has recently become closely tied to the study of rhythmic oscillations in the brain. The idea that there are intrinsic rhythmic fluctuations in the brains of humans and other mammals is not new, going back to the late 1800s and popularized by Hans Berger in the 1920s (Millett, 2001). Berger discovered that by recording electrical signals from the human scalp, he could observe spontaneous electrical fluctuations of the electroencephalogram (EEG) at a rate of ~10 Hz, which he termed the alpha rhythm. The power of alpha-band activity is highest during states of rest and relaxation. In contrast, increases in activity in other frequency bands, such as beta (~20 Hz), gamma (>30 Hz), and delta (2–4 Hz), have been observed during different mental states.
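The Dynamic Attending Theory described above is often formalized as an internal oscillator whose attentional "energy" peaks at expected beat times and falls off between them. The sketch below illustrates that idea; the pulse shape and the `focus` parameter are illustrative choices, not the published model.

```python
import math

def attentional_gain(t, period, phase=0.0, focus=4.0):
    """Dynamic-attending-style pulse: gain is maximal at expected beat
    times (t = phase + k * period) and decays between them. `focus`
    sharpens the pulse (a von Mises-like shaping, chosen for illustration)."""
    angle = 2 * math.pi * (t - phase) / period
    return math.exp(focus * (math.cos(angle) - 1))

period = 0.5  # entrained to a 120-bpm beat (0.5 s inter-onset interval)
on_beat = attentional_gain(2.0, period)    # falls exactly on an expected beat
off_beat = attentional_gain(2.25, period)  # midway between expected beats
# Targets at rhythmically predictable times receive more attentional gain,
# consistent with the same-different pitch-judgment advantage on the beat.
assert on_beat > off_beat
```

On this view, a target arriving on the beat coincides with a gain peak (here exactly 1.0), while an off-beat target falls in a trough, mirroring the detection and discrimination advantages reported by Jones and colleagues.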
These bands of oscillatory activity, and the phase relationships between them, are hypothesized to have functional significance for enabling long-range neuronal communication across the brain. In particular, beta activity has been shown to track the beat during the perception and imagery of rhythmic music (Fujioka, Ross, & Trainor, 2015). Rhythmic synchronization to the beat frequency is strongest over the motor areas (Nozaradan, Zerouali, Peretz, & Mouraux, 2013), suggesting an involvement of the motor system in attending to the beat, consistent with fMRI work (Grahn & Brett, 2007). Furthermore, bursts of activity in the beta band have been found to originate in the left sensorimotor cortex and to influence activity in the auditory cortex, suggesting that the motor system, with its intrinsic oscillatory activity in the beta band, guides rhythmic attention in the auditory system (Morillon & Baillet, 2017). Together, the recent literature shows that musical rhythm drives auditory attention via the entrainment of oscillatory neuronal activity at multiple frequencies, activity which originates in the motor system but is tightly coupled with the auditory system. In addition to being important for understanding attention to musical rhythm, these findings also pertain to speech, which contains multiple temporal modulations at specific frequencies (Ding et al., 2017). Selective attention in the real world likely entails listening at various timescales, which affects different patterns of neural and behavioral entrainment (Henry, Herrmann, & Obleser, 2015). Understanding how the brain organizes these fluctuating rhythms may have implications for designing music targeted toward enhancing attention. New approaches to music composition have inserted rhythmic components (e.g., fast rhythmic amplitude


modulations) into the musical stimulus to target specific neuronal oscillations, with the ultimate goal of improving cognition (James et al., 2017). This approach is promising in that it may offer therapeutic possibilities for music-based training of executive functions, making use of the rhythmic temporal properties of attention to achieve optimal goal-directed behavior.
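Entrainment studies of the kind reviewed in this section often quantify the neural response with "frequency tagging": a steady-state response to the beat appears as a spectral peak at the beat frequency. The sketch below illustrates that measurement on a synthetic signal; the 2.4 Hz "beat" response, the ~10 Hz "alpha" component, and all amplitudes are invented for illustration.

```python
import math

def amplitude_at(signal, freq, sample_rate):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

sample_rate, beat_hz = 250, 2.4  # hypothetical 2.4 Hz beat, 250 Hz sampling
t = [i / sample_rate for i in range(sample_rate * 10)]  # 10 s of synthetic "EEG"
signal = [0.8 * math.sin(2 * math.pi * beat_hz * x)     # entrained response at the beat
          + 1.0 * math.sin(2 * math.pi * 10 * x)        # ongoing ~10 Hz alpha rhythm
          for x in t]
# The spectrum shows a clear peak at the beat frequency relative to a
# neighboring frequency, the signature of entrainment.
assert amplitude_at(signal, beat_hz, sample_rate) > amplitude_at(signal, 3.1, sample_rate)
```

Comparing the amplitude at the beat frequency against neighboring frequencies, rather than against zero, is what lets such analyses separate a genuine entrained response from broadband background activity.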

Music and Executive Function

Executive functions (EFs) include processes related to planning and self-control, as well as attention, working memory, mental inhibition, and cognitive flexibility (Diamond, 2013). This subset of cognitive functions enables us to manipulate and prioritize information, filter out distractors, balance our thoughts, and switch between tasks to optimize cognitive performance. Without these processes, we would not be able to concentrate on important tasks, think before acting, adapt to unexpected challenges, resist temptations, or generally function cognitively in our daily lives. The fundamental EFs, namely inhibition, interference control, working memory, and cognitive flexibility, play important roles in development, intelligence, and social and cognitive health. The question of whether and how EFs are enhanced through either passive music listening or more active long-term musical training has gained increasing attention. The proposition that music and musical training may influence executive functioning has been a topic of debate in recent years, perhaps first widely popularized in the media and public interest by the Mozart Effect (Rauscher, Shaw, & Ky, 1993). The idea that merely listening to music could improve our grades in school, our ability to focus, or even our general IQ was at once exciting and applicable, not to mention marketable. Since the inception of the Mozart Effect, however, research has debunked the idea that passively listening to Mozart transfers to cognitive gains outside of the musical domain. And so the questions remain: Does music training confer non-musical advantages? If so, how? The long-term effect of music training is arguably the most active area of research on music and the brain today.
This section delineates the current theories and literature on the potential effects of music on EFs, the roles of near versus far transfer, and the specific neural mechanisms underlying EFs and transfer. Since the Mozart Effect has largely been discredited, the focus in music cognition research has shifted to the long-term, more effortful effects of musical training. Unlike passive listening, long-term music training engages more of our neural and cognitive circuitry and thus can be expected to induce structural and functional plastic changes in the brain. The importance of discerning whether musical training confers any advantages to EFs relates to the question of the transfer of skills. The transfer and generalization of learning and skills from one area to another can increase general cognitive capacities. Near transfer occurs within a specific modality (e.g., music and speech), whereas far transfer occurs between two less obviously


related domains (e.g., music and IQ, or music and conflict monitoring). While nearer forms of transfer between music and related areas have been demonstrated, far transfer is harder to prove.

Association Studies Suggesting Near Transfer

Approaches to studying near transfer as a means to understand the possible effects of music on related cognitive abilities and EFs include association studies of groups of children and adults, some musically trained and some untrained. From these comparisons between subjects with different levels of musical training, we know that training has measurable effects on the brain, as indicated by auditory evoked responses such as those generated by the brainstem (Kraus & Chandrasekaran, 2010). Patel's OPERA hypothesis postulates that musical training benefits the neural encoding of speech in five ways, the first of which is overlap between neural resources for music and speech (Patel, 2011a). This is supported by many known associations between musical training, speech, and language skills. For instance, musical training improves auditory skills such as pitch discrimination, which is associated with children's reading abilities and phonemic knowledge, providing evidence of an association between musical abilities and the EFs needed for reading and linguistic processing (Lamb & Gregory, 1993). Children with better pitch perception and production abilities also perform better on phonemic awareness tests even after controlling for intelligence and musical training (Loui, Kroog, Zuk, Winner, & Schlaug, 2011), providing additional support for shared neural resources for musical (pitch) and speech (phonemic) awareness. Advantages in pitch discrimination generalize to tasks that involve the perception of pitch in speech, and may be generally helpful in non-musical, cognitive tasks (Lolli, Lewenstein, Basurto, Winnik, & Loui, 2015). Still, association studies of near transfer lack certain clarity due to potential confounds, such as parental income, education, and other indirect consequences of the non-random allocation of participants.
Theoretically, an influential model proposed to underlie near transfer between music and language is Patel's shared syntactic integration resource hypothesis (SSIRH) (Patel, 2003). The SSIRH proposes that syntax in language and music share a common set of processes, executed in temporal and frontal brain regions. A synergistic processing scheme between music and language was demonstrated when both reaction time and reading comprehension were especially taxed by the need to simultaneously integrate syntactically ambiguous grammar and harmonic violations (Slevc, Rosenberg, & Patel, 2009). Supporting the SSIRH, these findings reinforce the theory that music and language draw on a common pool of limited processing resources for integrating incoming elements into syntactic structures. The resolution of perceptual and cognitive conflicts, or cognitive control, has thus been implicated in both musical and linguistic processing. This demonstration of interactive effects between the two modalities suggests the presence of near transfer between syntactic processing of music and language.


Although the SSIRH posits shared resources between music and language, the nature of this resource is unclear. Slevc and Okada (2015) suggest that cognitive control, and the prefrontal cortical mechanisms it implicates, may be one resource shared between the musical and linguistic domains. And while research on the intersection of music and language has not historically focused on EFs, the idea that cognitive control operates over both syntactic domains is worth noting. The points of convergence between processing and filtering in language and music, together with the notion of transfer, may help to explain a possible mechanism by which musical training enhances cognitive functions such as EFs. These findings have generalizable implications for immediate and long-term cognitive transfer from musical training to, say, reading exercises, and vice versa. Slevc and Okada's theory that cognitive control may be a shared resource between the musical and linguistic domains is important for understanding how the detection and resolution of conflict occurs when expectations are violated and interpretations must be reworked, as in the case of grammatical and harmonic violations. By this account, musical training involves not just the incremental processing and integration of musical elements as they occur sequentially, but also the generation of musical predictions and expectations, which must sometimes be prioritized and revised in response to evolving musical input. An additional study investigating the relationship between music and EFs evaluated whether musical experience predicts individual differences in inhibition, updating, and set-switching in both auditory and visual modalities (Slevc, Davey, Buschkuehl, & Jaeggi, 2016).
Indeed, musical ability predicted better performance on both auditory and visual updating tasks, even when controlling for a variety of potential confounds such as age, handedness, bilingualism, and socio-economic status. Musical ability was not, however, clearly related to inhibitory control, and was unrelated to set-switching behavior. Such mixed results show that the extra-musical gains associated with musical ability are not limited to auditory processes, but rather are tied to specific aspects of EFs. This supports a process-specific but modality-general relationship between musical experience and non-musical aspects of cognition, thereby also bolstering the potential for near and far transfer.

Far Transfer

The hypothesis that music training enhances EFs assumes that far transfer of cognitive skills takes place as a result of training; however, far transfer has not been reliably demonstrated across studies (Sala & Gobet, 2017b). On one hand, cross-sectional studies comparing musicians and non-musicians have shown positive effects on EF: Adult musicians perform better on measures of cognitive flexibility, working memory, and verbal fluency, and musically trained children also perform better on behavioral and fMRI indices of verbal fluency, rule representation, and task switching (Zuk, Benjamin, Kenyon, & Gaab, 2014). On the other hand, cross-sectional studies are still limited by the fundamental possibility

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

272    psyche loui and rachel e. guetta that results may be due to similar confounds as the association studies, such as differences in parental education, socio-economic status (although these were mostly controlled for in the previous study), or some aspect of exposure in the home environment that is outside of the experimenter’s control, as well as pre-existing differences before initiating training. Long-term differences in EF performance, only after controlling for these potential confounds, would provide a convincing basis for the possibility of far transfer.

Longitudinal Studies on Far Transfer

Longitudinal studies aim to eliminate these confounds, and the randomized controlled trial remains the gold standard for such experimental designs. In that regard, some longitudinal studies do provide support for music-to-EF transfer. Several longitudinal studies have tested the effects of music lessons on IQ. Preschool children who received weekly music training for six months showed higher gains on performance IQ tests than musically untrained counterparts, with effects observable as early as age 3 (Gromko & Poorman, 1998). Still, some of these extra-musical gains could be attributed to non-musical factors such as time spent with the class and with the instructor, which were not provided to the no-treatment control group. Thus, an active control group is an important improvement to the design of these longitudinal studies. A 2004 longitudinal study tested the relationship between music lessons and general intelligence, here IQ (Schellenberg, 2004). The study assigned 144 children either to music lessons on keyboard or voice, or to control groups with either drama lessons or no lessons. Children in the two music groups exhibited greater increases in full-scale IQ from pre- to post-lessons, as measured by the WISC-III (Wechsler, 1991). Although the effect was fairly small, the demonstrated enhancements generalized across all IQ subtests, index scores, and standardized measures of academic achievement. Further, the drama group exhibited improvements in measures of social behavior that were not evident in the music groups. Here, the presence of active control groups provides more substantial evidence for the possibility of far transfer.

Behavioral Changes and Neural Mechanisms

In addition to a drama lesson control group, other studies have compared music training against sports and visual art training as active control groups. One study compared the effects of two interactive computerized training programs, in music and in visual art, on preschool children (Moreno et al., 2011). Children in the music group showed enhanced performance on verbal intelligence measures after only 20 days of training. Furthermore, this boosted performance was positively correlated with changes in event-related potential (ERP) measures during an executive function task (the go/no-go task, requiring cognitive control and inhibition), demonstrating far transfer. Such longitudinal studies with randomized, active control groups provide the most impressive evidence of the far transfer effects of music to extra-musical gains. In another longitudinal behavioral and ERP study, Habibi and colleagues (2016) compared children in music training, children in sports training, and a no-training matched control group. Children with musical training showed an improvement in their ability to detect auditory changes, as measured by cortical auditory evoked potentials to musical notes after one year of training. Specifically, the P1 amplitude, an ERP measure of auditory cortical activity, decreased significantly for all three groups, with the largest decrease in the music group from baseline to year 2 (Habibi, Cahn, Damasio, & Damasio, 2016). A particularly robust difference between the three groups is the decrease in P1 amplitude and latency elicited by piano tones in the music group during the passive task. As decreased P1 amplitude and latency are observed in adults, these results may suggest accelerated maturation of auditory processing as a result of music training. Combining cross-sectional and longitudinal data in a behavioral and fMRI study of children and adults, Ellis and colleagues showed that musically trained subjects were superior at melodic discrimination, with the number of hours of practice predicting the behavioral improvement. Interestingly, the underlying changes in brain activity involved increased leftward asymmetry in the supramarginal gyrus (SMG). Longitudinal fMRI data showed changes in activity of the left SMG during melodic discrimination that correlated with hours of practice, after controlling for age and previous training (Ellis, Bruijn, Norton, Winner, & Schlaug, 2013).
As the left SMG is a region implicated in short-term auditory working memory, these training-related changes in left SMG activity may suggest improved working memory function over time, achieved by co-opting brain areas in systems not normally engaged by music. It is worth noting that while Moreno et al. showed transfer to a non-auditory task, Habibi et al. and Ellis et al. showed effects of long-term training on neural processing of sounds, which did not involve transfer per se. Nevertheless, the neural mechanisms that changed as a result of training, that is, the left SMG and the neural generators of the P1, may be relatively domain-general, subserving working memory and auditory processing respectively. The combined use of neuroimaging, electrophysiology, and behavioral tasks is fruitful for investigating transfer effects of musical training, as it provides clues to the neural mechanisms underlying transfer. The evolution of functional neural signatures over the course of longitudinal studies may be informative not only of how music training affects the brain, but also of how neural processes develop more generally throughout the lifespan.

Negative Findings

Studies reviewed thus far have reported positive effects for near transfer, and more limited but nevertheless encouraging results for far transfer. However, not all reports have been positive, and the effect sizes of far transfer have been small, as shown by a recent meta-analysis of the far transfer effects of musical training (Sala & Gobet, 2017a, b). Mehr and colleagues found no reliable evidence for non-musical cognitive benefits from brief preschool music lessons (Mehr, Schachner, Katz, & Spelke, 2013). Preschool children were given music classes, arts instruction, or no lessons. After six weeks, the participants were assessed in four distinct cognitive areas in which older arts students have been reported to excel: spatial-navigational reasoning, visual form analysis, numerical discrimination, and receptive vocabulary. At first, music class participants showed greater spatial-navigational ability than those in the visual arts class, while children from the visual arts class showed greater visual form analysis ability than children from the music class. However, the researchers were unable to replicate this trend. In the end, the children who were provided with music classes performed no better overall than those with visual arts or no classes. These findings demand caution in interpreting other positive findings of enhanced executive functioning as a result of music instruction. It may be important to note, however, that the brief training sessions in this study do not readily compare to long-term musical training. Furthermore, the selection of transfer tasks needs to take into account the underlying mechanism that could lead to transfer.

Conclusions and Implications

While the popularized Mozart Effect is highly confounded, the benefits of long-term musical training on EFs seem promising. Music may also have protective effects against age-related hearing loss: for instance, oscillatory neural activity in older adults entrains less flexibly to speech-paced rhythms, especially during focused attention (Henry, Herrmann, Kunke, & Obleser, 2017). While neural entrainment to speech is disrupted in older age, it may be possible that extended music lessons, which bolster speech perception at a younger age, can protect against some of this decline later in life (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013). Thus, further understanding the influences of musical training on executive function is crucial, as the ability to flexibly manipulate mental information is not only necessary for successful functioning in everyday life, but also has implications throughout the lifespan.

Music and Creativity

While executive function pertains to the ability of the cognitive system to work with conflicting constraints, creativity pertains to relatively unconstrained thought processes. Thinking "outside of the box" is a foremost marvel of the human mind. The ability to be creative, or to produce output that is at once novel and unexpected, yet useful and appropriate, requires some domain-specific knowledge (Csikszentmihalyi, 1996; Sternberg & Lubart, 1999; Sternberg et al., 2005). While the exact mechanisms contributing to creative processes are still unknown, there is evidence that creativity relies on real-time contributions of multiple constituent mental processes (Goldenberg, Mazursky, & Solomon, 1999). These mental processes involve selective attention and stream segregation; long-term, autobiographical, and working memory; idea generation and evaluation; and expectation and prediction, as well as the ability to switch between these processes. Creativity, then, incorporates some of the fundamental EFs, such as attention and mental flexibility. Creativity does differ from other components of executive function, however, in its form of thought. While executive function entails the ability to engage in deliberation and strongly constrained thinking, creative thinking has fewer deliberate constraints (Christoff, Irving, Fox, Spreng, & Andrews-Hanna, 2016). It is due to this relatively unconstrained nature that the study of creativity has been more elusive and imprecise. In a creativity task there is no single correct answer, yet there are more and less creative answers. The standard definition of creativity is bipartite: for a work to be considered creative, it has to be both novel and useful/appropriate (Runco & Jaeger, 2012). Historical and empirical musicologists have long been interested in finding novelty in pieces of music relative to their context. This is important both for better understanding existing works and for the possibility of generating novel works (Collins, 2016). In contrast, the usefulness of music is difficult, if not impossible, to define. For artistic domains including music, the concept of usefulness arguably opens up more questions than it answers, and is therefore not a good criterion at all. Appropriateness is easier to define as being within a stylistic or genre-based context, for example, sonata form, variations on a theme, or classical versus jazz versus experimental music improvisation.
To be considered appropriate, a work has to stay primarily within an expected genre or style. In that regard, creativity in music must be considered within its historical and stylistic context. This dependence on the environment applies to creativity more generally, which must be considered relative to the domain, the field, and the creator (Csikszentmihalyi, 1996).

Musical Improvisation as a Model of Creativity

Psychological studies of creativity and music have considered creativity as a set of cognitive functions. The study of musical improvisation offers a window into creativity, which is predicated upon novel combinations of existing skills (Limb & Braun, 2008). A systematic literature review of the neuroscience of musical improvisation shows shared neural networks between musical improvisation and other forms of creativity, such as artistic or scientific creativity. Generally, a network of prefrontal regions is involved in musical improvisation as well as in other forms of creativity (Beaty, 2015). At the same time, there are also some differences between musical, artistic, and scientific creativity (e.g., insight problems). As shown in a meta-analysis of fMRI studies on creativity (Boccia, Piccardi, Palermo, Nori, & Palmiero, 2015), musical creativity often involves auditory-motor networks, such as the supplementary motor areas, in addition to other prefrontal regions that are consistently active in creativity studies.


Improvisation training is fundamentally cognitive training (Biasutti, 2015). Teaching improvisation in the classroom can not only increase creativity among students (Norgaard, 2017), but may also inform cognitive theories of creativity and improvisation (Norgaard, Spencer, & Montiel, 2013). A critical review of PET, fMRI, and EEG studies on creativity showed that although there is some convergence on the importance of the prefrontal cortex, there are nevertheless many gaps in the literature that would benefit from further investigation (Sawyer, 2011). A systematic understanding of musical improvisation, one that combines multiple methods from music information retrieval, psychophysics and psychometrics, and cognitive neuroscience, will be useful for a thorough understanding of what creativity means and how to foster creativity in pedagogy.

Neuroimaging Studies of Music and Creativity

(For a detailed overview of neuroimaging studies on improvisation, see Chapter 20.) With the advent of fMRI and the engineering of MR-compatible musical instruments (Hollinger, Steele, Penhune, Zatorre, & Wanderley, 2007), it became possible to observe functional correlates of human brain activity during jazz improvisation, comparing it to a closely matched non-improvised control condition. The first fMRI study on jazz improvisation compared improvised versus overlearned conditions in novel melodies and musical scales (Limb & Braun, 2008). Results showed many loci of activation, with a general trend of more activity in mesial regions during improvisation, especially in the prefrontal cortex. Another fMRI study looked at piano improvisation as an auditory-motor sequencing problem (Bengtsson, Csikszentmihalyi, & Ullén, 2007). This study also compared the task of improvisation against the task of reproducing a previously created improvisation from memory. The most significant difference in brain activity between the improvisation and reproduction conditions was again found in the pre-supplementary motor area (pre-SMA); however, the improvisation condition also showed higher activity in dorsolateral prefrontal cortex and dorsal premotor cortex. Together, these results are consistent with Limb and Braun (2008) in identifying a network of interacting prefrontal areas active during improvisation. Similarly, another fMRI study on musical improvisation (Berkowitz & Ansari, 2008) tested similar experimental and control conditions of improvisation versus reproduction, but with the additional comparison between rhythmic and melodic improvisation and control conditions. Results showed more activations as well as deactivations for melodic improvisation relative to rhythmic improvisation, with effects centered on motor planning regions in the frontal lobe, specifically the premotor cortex.
Freestyle rap is another form of musical creativity, one that involves heavy use of rhythmic rather than melodic improvisation. One fMRI study compared brain activity during spontaneous freestyle rap to conventional rehearsed performance (Liu et al., 2012). During the freestyle condition, rap artists showed an upregulation of mesial regions (presumably important for idea generation and/or self-referential processes) and a downregulation of lateral regions associated with rule learning. The mesial regions are part of a larger group of regions that are intrinsically correlated in their activity, together known as the Default Network (Fox & Raichle, 2007). In contrast, the lateral regions, such as the dorsolateral prefrontal cortex, are part of a larger network consistently active during executive functions, the Executive Control Network (Shirer, Ryali, Rykhlevskaia, Menon, & Greicius, 2012). Although most studies of musical creativity have shown improvisation-related activity in prefrontal regions (including the medial prefrontal cortex, the dorsolateral prefrontal cortex, the cingulate cortex, and the pre-SMA), other studies have observed activity in the classic language and emotion networks. Studies have shown activity in the inferior frontal gyrus, also known as Broca's area, while jazz musicians were interacting by "trading fours" (Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014) and improvising to communicate a specific positive or negative emotional intent (McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016). Broca's area is also the known neural generator of the ERAN (early right anterior negativity), an electrophysiological marker for the processing of musically unexpected events (Maess, Koelsch, Gunter, & Friederici, 2001), and recent work has shown a larger ERAN in improvising jazz musicians, suggesting increased involvement of Broca's area following improvisation training (Przysinda, Zeng, Maves, Arkin, & Loui, 2017). Functional connectivity results from fMRI also showed that duration of improvisation experience was negatively correlated with functional connectivity among fronto-parietal areas of the executive control network, but positively correlated with functional connectivity between areas within the auditory-motor network (Pinho, De Manzano, Fransson, Eriksson, & Ullén, 2014). Based on these recent studies, it appears that areas important for auditory-motor functions, including the language network, are as intrinsic to musical creativity as the aforementioned default and executive control networks.

Data-Driven Correlates of Creativity

While the literature has generally defined creativity as the tendency to produce novel and appropriate output, the determination of creativity in the output has generally required the consensual assessment of multiple raters (Amabile, 1982), a relatively time-consuming technique that can be sensitive to bias on the part of the raters. With recent advances in music information retrieval, it may be fruitful to relate the definition of creativity to information that can be gleaned from the creative output itself. Since people who are more creative tend to produce more fluent, original, and flexible output (Silvia, Beaty, & Nusbaum, 2013), it may be useful to operationally define creativity as fluent production of high information content. Information theory includes many possible measures, the first of which is entropy, defined by Shannon (1948) and subsequently used in neuroscience (Friston, 2010) and in music cognition (Hansen & Pearce, 2014). Tools such as the MIRtoolbox (Lartillot & Toiviainen, 2007) now offer relatively data-driven measures of musical information content such as entropy, as well as harmonic movement, spectral centroid change, and onset detection. Applying these types of information retrieval techniques to musical performances may yield useful information about the player's creativity.
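As a minimal sketch of the entropy measure mentioned above, the following Python snippet computes first-order Shannon entropy over a sequence of MIDI pitch numbers. The function name and the two toy pitch sequences are invented for illustration; they are not drawn from the cited studies or from the MIRtoolbox API.

```python
from collections import Counter
from math import log2

def shannon_entropy(sequence):
    """First-order Shannon entropy (bits per symbol) of a discrete sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Toy "improvisations" encoded as MIDI pitch numbers (invented examples):
repetitive = [60, 62, 60, 62, 60, 62, 60, 62]  # alternates between two pitches
varied = [60, 62, 64, 67, 65, 63, 58, 61]      # eight distinct pitches

print(shannon_entropy(repetitive))  # 1.0 bit: two equiprobable pitches
print(shannon_entropy(varied))      # 3.0 bits: eight equiprobable pitches
```

On this toy measure, the more varied sequence carries more information per note; actual analyses would compute such measures over richer features (harmony, spectral centroid, note onsets) and relate them to fluency counts or rater judgments.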


A new and potentially fruitful approach comes from relating entropy in musical production to brain structure to reveal brain–behavior correlations, an approach beginning to be adopted in recent studies (Arkin, Przysinda, Pfeifer, Zeng, & Loui, 2019; Zeng, Przysinda, Pfeifer, Arkin, & Loui, 2018). As data-driven approaches become increasingly sophisticated, it becomes even more important to relate studies of music and of the brain through unifying approaches to data that can inform both fields. We can move toward biomarkers of creativity by rigorously defining outcome measures and relating them to data from the brain. In this way, music offers a promising avenue toward a useful conceptualization of creativity.

Personality and Cognitive Profiles of Creative Musicians

Examining the personality and cognitive profiles of creative musicians has also lent interesting insight into the neuropsychological study of creativity. Jazz musicians tend to be more creative, as measured by the Divergent Thinking Test (Benedek, Borovnjak, Neubauer, & Kruse-Weber, 2014). These differences are not limited to the musical domain, but generalize to domain-general indicators of divergent thinking outside the musical realm. Kleinmintz and colleagues (Kleinmintz, Goldstein, Mayseless, Abecasis, & Shamay-Tsoory, 2014) also showed higher divergent thinking scores and better alternative uses task performance in improvising musicians, with idea evaluation mediating the effect. Furthermore, Przysinda and colleagues (2017) showed higher scores on the divergent thinking task among jazz musicians. In terms of personality measures, Benedek and colleagues (2014) showed different personality profiles in jazz and improvisational musicians. Specifically, jazz and improvising musicians are more open to experience, as are jazz listeners (Rentfrow & Gosling, 2003). This is consistent with the creativity literature in general: there is a consistent statistical association between creativity and openness to experience (McCrae, 1987). Although this association is well replicated, the direction of causality is unknown. Perhaps being open to experience makes one more creative; perhaps being creative makes one more open to experience; or perhaps both are due to some other variable(s). Hopefully, neurocognitive knowledge of creativity will inform better music making in performance and in the classroom (Biasutti, 2015), while improving understanding of how musical knowledge might transfer to extra-musical outcomes in other areas of cognition.

Conclusions

Following the definition of music as organized sound adopted at the beginning of this chapter, we have now seen that organized sounds are generated in many situations that are barely musical, if at all. For example, experimental stimuli in an auditory research study are intentionally organized sounds that vary in their musicality. The extent to which these intentional sounds become perceived as music may depend on our attention toward their context and toward the many elements of the musical surface. The literature we reviewed also shows that, fundamentally, the human mind is "an anticipator, an expectation-generator" (Dennett, 2008). As expectation shapes all that we experience, how we perceive music also depends on our expectations. Music interfaces with many aspects of cognition: from attention, which is linked to stimulus processing and selection, to creativity, which involves generating new stimuli as well as reacting to them. At another level, music requires and influences executive function, the collection of the brain's central executive processes that we must deploy to interact with music. Open questions pertain to the intersection of these three sections: Does better executive function give rise to better creativity? Or are the two constructs inversely related? How does attention to specific elements of the musical surface enable or enhance creativity? Understanding these seemingly disparate aspects of cognitive function as interrelated can drive the formulation of new and interesting research questions, which might inform our understanding of music as well as cognitive science more generally.

References

Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment technique. Journal of Personality and Social Psychology 43(5), 997–1013.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2017). Information content predicts creativity in musical improvisation: A behavioral and voxel-based morphometry study. Under review.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2019). Grey matter correlates of creativity in musical improvisation. Under review.
Beaty, R. E. (2015). The neuroscience of musical improvisation. Neuroscience & Biobehavioral Reviews 51, 108–117.
Benedek, M., Borovnjak, B., Neubauer, A. C., & Kruse-Weber, S. (2014). Creativity and personality in classical, jazz and folk musicians. Personality and Individual Differences 63, 117–121.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullén, F. (2007). Cortical regions involved in the generation of musical structures during improvisation in pianists. Journal of Cognitive Neuroscience 19, 830–842.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates of musical improvisation. NeuroImage 41(2), 535–543.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of chords. Journal of Experimental Psychology: Human Perception and Performance 12(4), 403–410.
Biasutti, M. (2015). Pedagogical applications of the cognitive research on music improvisation. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00614
Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D'Adamo, D. A. (2003). Sensory versus cognitive components in harmonic priming. Journal of Experimental Psychology: Human Perception and Performance 29(1), 159–171.
Boccia, M., Piccardi, L., Palermo, L., Nori, R., & Palmiero, M. (2015). Where do bright ideas occur in our brain? Meta-analytic evidence from neuroimaging studies of domain-specific creativity. Frontiers in Psychology 6, 1195. Retrieved from https://doi.org/10.3389/fpsyg.2015.01195


Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica 50(3), 253–290.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America 25(5), 975–979.
Christoff, K., Irving, Z. C., Fox, K. C., Spreng, R. N., & Andrews-Hanna, J. R. (2016). Mind-wandering as spontaneous thought: A dynamic framework. Nature Reviews Neuroscience 17(11), 718–731.
Collins, D. (2016). The act of musical composition: Studies in the creative process. New York: Routledge.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and invention. New York: HarperCollins.
De Freitas, J., Liverence, B. M., & Scholl, B. J. (2014). Attentional rhythm: A temporal analogue of object-based attention. Journal of Experimental Psychology: General 143(1), 71–76.
Dennett, D. C. (2008). Kinds of minds: Toward an understanding of consciousness. New York: Basic Books.
Deouell, L. Y., Deutsch, D., Scabini, D., Soroker, N., & Knight, R. T. (2007). No disillusions in auditory extinction: Perceiving a melody comprised of unperceived notes. Frontiers in Human Neuroscience 1, 15. Retrieved from https://doi.org/10.3389/neuro.09.015.2007
Deutsch, D. (1974). An illusion with musical scales. Journal of the Acoustical Society of America 56(S1). Retrieved from https://doi.org/10.1121/1.1914084
Diamond, A. (2013). Executive functions. Annual Review of Psychology 64, 135–168.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural substrates of interactive musical improvisation: An fMRI study of "trading fours" in jazz. PLoS ONE 9, e88665.
Ellis, R. J., Bruijn, B., Norton, A. C., Winner, E., & Schlaug, G. (2013). Training-mediated leftward asymmetries during music processing: A cross-sectional and longitudinal fMRI analysis. NeuroImage 75, 97–107.
Escoffier, N., & Tillmann, B. (2008). The tonal function of a task-irrelevant chord modulates speed of visual processing. Cognition 107(3), 1070–1083.
Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience 8(9), 700–711.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11, 127–138.
Fujioka, T., Ross, B., & Trainor, L. (2015). Beta-band oscillations represent auditory beat and its metrical hierarchy in perception and imagery. Journal of Neuroscience 35(45), 15187–15198.
Goldenberg, J., Mazursky, D., & Solomon, S. (1999). Creative sparks. Science 285(5433), 1495–1496.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Gromko, J. E., & Poorman, A. S. (1998). The effect of music training on preschoolers' spatial-temporal task performance. Journal of Research in Music Education 46(2), 173–181.


Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated auditory processing in children engaged in music training. Developmental Cognitive Neuroscience 21, 1–14.
Hafter, E. R., & Saberi, K. (2001). A level of stimulus representation model for auditory detection and attention. Journal of the Acoustical Society of America 110, 1489. Retrieved from https://doi.org/10.1121/1.1394220
Hafter, E. R., Sarampalis, A., & Loui, P. (2008). Auditory attention and filters. In W. Yost (Ed.), Auditory perception of sound sources (pp. 115–142). Dordrecht: Springer.
Hafter, E. R., Schlauch, R. S., & Tang, J. (1993). Attending to auditory filters that were not stimulated directly. Journal of the Acoustical Society of America 94, 743–747. Retrieved from https://doi.org/10.1121/1.408203
Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing. Frontiers in Psychology 5, 1052. Retrieved from https://doi.org/10.3389/fpsyg.2014.01052
Henry, M. J., Herrmann, B., Kunke, D., & Obleser, J. (2017). Aging affects the balance of neural entrainment and top-down neural modulation in the listening brain. Nature Communications 8, 15801. doi:10.1038/ncomms15801
Henry, M. J., Herrmann, B., & Obleser, J. (2015). Selective attention to temporal features on nested time scales. Cerebral Cortex 25(2), 450–459.
Hollinger, A., Steele, C., Penhune, V., Zatorre, R., & Wanderley, M. (2007). fMRI-compatible electronic controllers. In Proceedings of the 7th international conference on New Interfaces for Musical Expression (pp. 246–249). New York: ACM. doi:10.1145/1279740.1279790
James, T., Przysinda, E., Sampaio, G., Woods, K. J. P., Hewett, A., Morillon, B., & Loui, P. (2017). Acoustic effects on oscillatory markers of sustained attention. Presentation at the International Conference on Auditory Cortex, Banff, Canada.
James, W. (1890). The principles of psychology. New York: Henry Holt.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83(5), 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Justus, T., & List, A. (2005). Auditory attention to frequency and time: An analogy to visual local-global stimuli. Cognition 98(1), 31–51.
Kleinmintz, O. M., Goldstein, P., Mayseless, N., Abecasis, D., & Shamay-Tsoory, S. G. (2014). Expertise in musical improvisation and creativity: The mediation of idea evaluation. PLoS ONE 9, e101568.
Koelsch, S., Gunter, T. C., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: Nonmusicians are musical. Journal of Cognitive Neuroscience 12(3), 520–541.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11, 599–605.
Lamb, S. J., & Gregory, A. H. (1993). The relationship between music and reading in beginning readers. Educational Psychology 13(1), 19–27.
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. In Proceedings of the 10th International Conference on Digital Audio Effects (pp. 237–244). Bordeaux, France. Retrieved from http://dafx.labri.fr/main/papers/p237.pdf
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI study of jazz improvisation. PLoS ONE 3, e1679.
List, A., Justus, T., Robertson, L. C., & Bentin, S. (2007). A mismatch negativity study of local–global auditory processing. Brain Research 1153, 122–133.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi


Chapter 13

Neural Correlates of Music and Emotion
Patrik N. Juslin and Laura S. Sakka

Introduction

When it comes to explaining the universal attraction of music as a human phenomenon, few aspects loom larger than the emotional responses it arouses. Music listeners may experience anything from startle reflexes and changes in arousal to discrete emotions such as happiness, sadness, interest, and nostalgia—as well as profound aesthetic emotions (Juslin, 2019). Such experiences are the “driving force” behind most people’s engagement with music, and might have far-reaching implications for their well-being and health (e.g., MacDonald, Kreutz, & Mitchell, 2012; Thaut & Wheeler, 2010).

When systematic studies of music and emotion finally took off, around the millennium (Juslin & Sloboda, 2001), it was inevitable that neuropsychological research would play a role in that trend. While imaging studies could constrain psychological theorizing, psychological theories could guide imaging studies and help to organize their findings. Coinciding with a reappraisal of the role of emotion in human behavior in the neurosciences (Damasio, 1994), the end of the 1990s saw the first brain imaging studies focusing on emotions in music (Blood, Zatorre, Bermudez, & Evans, 1999).

Mapping the neural correlates of emotional responses to music turned out to be more difficult than initially expected, however. Even such a seemingly delimited domain as emotion appears to involve a wide range of subcortical and cortical areas, distributed across the brain (Koelsch, 2014); and unfortunately, the relevant brain regions do not come in neat little packages that can be interpreted easily by researchers. Hence, to account for the neural correlates of musical emotions could turn out to be one of the great challenges in the neuroscience of music.


The goal of this chapter is to offer a theoretical and empirical review of studies of the neural correlates of emotional responses to music, carried out over the last thirty-five years. The remainder of the chapter is structured as follows: First, we provide basic definitions and distinctions for the field of musical affect. Second, we present a theoretical framework, which could serve to organize the field. Third, we review seventy-eight empirical studies, published between 1982 and 2016. We distinguish different empirical approaches in these studies and draw general conclusions based on their results. Finally, we consider the implications of these findings and offer some methodological recommendations for future studies.

Musical Affect: Definitions and Distinctions

Emotions belong to the field of affect, which covers a range of phenomena. The common and defining feature is valence (i.e., the evaluation of an object, person, or event as being positive or negative). Most researchers also require a certain degree of arousal, in order to distinguish affect from purely cognitive judgments. Accordingly, musical affect could comprise anything from preference (e.g., liking a piece) and mood (a mild, objectless, and long-lasting affective state, e.g., feeling gloomy after hearing sad music in the background all morning) to aesthetic judgment (e.g., rating a composition as valuable as “art”). Most brain studies to date, however, have arguably focused on emotions, as defined by Juslin (2011, p. 114):

    Emotions are relatively brief, intense, and rapidly changing reactions to potentially important events (subjective challenges or opportunities) in the external or internal environment—often of a social nature—which involve a number of subcomponents (cognitive changes, subjective feelings, expressive behavior, and action tendencies) that are more or less ‘synchronized’ during an emotional episode.

Changes in the intensity, quality, and complexity of an emotion could occur from moment to moment, and such changes can be captured in terms of shifts along such emotion dimensions as arousal and valence (Russell, 1980). However, emotions may also be analyzed in terms of qualitatively distinct categories (e.g., joy, sadness, awe, nostalgia), which remain throughout an episode (Izard, 1977). Both categorical and dimensional approaches receive some support in empirical studies (e.g., Harmon-Jones, Harmon-Jones, & Summerell, 2017), though we agree with Zentner’s (2010) view that dimensional models are ultimately unable to do justice to the richness or specificity of emotional responses to music. Most researchers in the domain seem to agree that music can influence emotions (for reviews, see Juslin & Sloboda, 2010), so the primary aim of current research is rather to


understand the nature of this process—how it “works.” In the following section, we describe a framework that can serve to organize and guide research. First, we need to make a distinction between perception and induction of emotions: We may simply perceive (or recognize) an emotion expressed in the music, or we may actually feel an emotion in ourselves. The distinction is important, because different psychological processes—and hence different neural substrates—may be involved, depending on the type of process. Whenever practically feasible, it is advisable to measure multiple emotion components (self-reported feeling, expression, psychophysiology) in order to draw more valid conclusions about the occurrence of an aroused emotion. (If researchers do not find a coherent response in multiple emotion components, there is reason to suspect that “only” perception of emotion has occurred.)

Psychological Mechanisms: A Theoretical Framework

To explain emotional responses to music, we need to uncover the psychological mechanisms that produce perceived or induced emotion. Broadly speaking, a mechanism refers to the causal processes through which an outcome is brought into being. In the present context, this involves a functional (i.e., psychological) description of what the brain is “doing” in principle (e.g., retrieving a memory). Such a process description at the psychological level must not be confused with the separate question of where in the brain the process is implemented, or with the phenomenological experience it seeks to explain (Dennett, 1987). Several authors have proposed possible mechanisms underlying perception and induction of emotions in music, typically involving one or a few possibilities (see Berlyne, 1971; Clynes, 1977; Juslin, 2001; Langer, 1957; Meyer, 1956; Scherer & Zentner, 2001; Sloboda & Juslin, 2001). Space limitations prevent us from reviewing previous work here, but a parsimonious way to organize current theory is provided by the ICINAS-BRECVEMAC framework, fully described in Juslin (2019) and briefly summarized below.

Emotion Perception

The first part of the acronym ICINAS-BRECVEMAC stands for Iconic-Intrinsic-Associative, and refers to three ways in which music carries emotional meaning. Although the case can be made that emotion perception is a more straightforward process than emotion induction, even perceived emotions may need to be decomposed into different subprocesses.


[Figure 1 appears here: a layered diagram whose labels include iconic, intrinsic, and associative coding; emotions expressed, ranging from basic to complex; a communal–personal dimension; and cross-cultural specificity, ranging from universal to culture-specific.]

Figure 1. Multiple-layer conceptualization of musical expression of emotions. Reproduced from Patrik N. Juslin, What does music express? Basic emotions and beyond, Frontiers in Psychology: Emotion Science 4(596), Figure 2, doi: 10.3389/fpsyg.2013.00596 © 2013 Juslin. This work is licensed under the Creative Commons Attribution License (CC BY 3.0). It is attributed to the author Patrik N. Juslin.

Accordingly, based on the seminal distinction made by Dowling and Harwood (1986), Juslin (2013b) proposes that there are three distinct “layers” of musical expression of emotion. Each layer corresponds to a specific type of coding of emotional meaning (see Fig. 1). The core layer is based on iconically coded basic emotions. Icon refers to how music carries emotional meaning based on a formal resemblance between the music and other events that have an emotional tone (such as emotional speech and gesture). This core layer may explain findings of cross-modal parallels (Juslin & Laukka, 2003) and universal recognition of basic emotions (i.e., sadness, happiness, anger, fear, and love/tenderness) in both speech (Bryan & Barrett, 2008) and music (Fritz et al., 2009). The core layer may be extended, qualified, and even modified by two additional layers based on intrinsic and associative coding, respectively, which also enable listeners to perceive more complex or ambiguous emotions. The two additional layers are less cross-culturally invariant and depend more on the context and the listener’s individual learning (Juslin, 2019). Intrinsic coding refers to how music carries meaning based on syntactic relationships within the music itself—how one part of the music may “refer” to another part of the music, thus contributing to shifting levels of stability, tension, or arousal (“affective trajectories”; e.g., Spitzer, 2013). Associative coding, finally, refers to how music carries emotional meaning based on a more arbitrary association (e.g., temporal or spatial contiguity); a piece of music can be perceived as expressive of an emotion just because something in the music (e.g., a melodic theme) has been repeatedly linked with other emotionally meaningful events in the past—either through chance or by design (e.g., Wagner’s “Leitmotif” strategy; see Dowling & Harwood, 1986).
To illustrate this further in a musical piece, the overall emotion category or broad “emotional tone” (e.g., sadness) might be specified by iconically coded features (e.g., slow tempo, minor mode, low and often falling pitch contour, legato articulation);


this basic emotion category is given “expressive shape” by intrinsically coded features (e.g., local structural features such as syncopations, dissonant intervals, and melodic appoggiaturas), creating “tension” and “release,” which contribute to more time-dependent and complex nuances of the same emotion category (e.g., sadness vs. hopelessness); to this we add the final and more personal layer of expression (e.g., that the listener associates the piece with a particular person, event, or physical location). It appears plausible that the three sources of perceived emotions—which might occur alone or in combination—involve partly different neural correlates (Juslin, 2019).

Emotion Induction

Our main focus in this chapter will be on induced emotion, which appears to be more complex in terms of its neural substrates. Here, a multi-mechanism framework is clearly called for. The second part of the ICINAS-BRECVEMAC acronym refers to nine psychological mechanisms for induction of emotions (listed below), which may be activated by music (and other stimuli). An evolutionary perspective on human perception of sounds suggests that the survival of our ancient ancestors depended on their ability to detect patterns in sounds, derive meaning from them, and adjust their behavior accordingly (Juslin, 2013a; cf. Hodges & Sebald, 2011). This behavioral function can be achieved in a multitude of ways, reflecting the phylogenetic origin of our emotions. The human brain did not develop from scratch. It is the result of a long evolutionary process, during which newer brain structures were gradually imposed on older structures (Gärdenfors, 2003). Brain circuits are laid out like the concentric layers of an onion, functional layer upon functional layer. One consequence of this arrangement, which is the result of natural selection rather than design, is that emotion can be evoked at multiple levels of the brain (Juslin, 2019). Hence, the first author of this chapter has postulated a set of induction mechanisms involving (more or less) distinct brain networks, which have developed gradually and in a specific order during evolution—from simple reflexes to complex judgments. Different mechanisms rely on different kinds of mental representation (e.g., associative, analogical, sensorimotoric), which serve to guide future action. All mechanisms have in common that they can be triggered by a “musical event” (broadly defined as music, listener, and context).
The mechanisms are:

• Brainstem reflex, a hard-wired attention response to subjectively “extreme” values of basic acoustic features, such as loudness, speed, and timbre (e.g., Davis, 1984); you may become startled and surprised by the loud beginning of a rock song during a live concert.

• Rhythmic entrainment, a gradual adjustment of an internal body rhythm, such as heart rate, towards an external rhythm in the music (e.g., Harrer & Harrer, 1977); you may experience excitement when your heart rate is becoming gradually


synchronized with a captivating and slightly faster rhythm in a piece of techno music at a nightclub.

• Evaluative conditioning, a regular pairing of a piece of music and other positive or negative stimuli leading to a conditioned association (e.g., Blair & Shimp, 1992); you may feel happy when you happen to hear a song that has repeatedly occurred in festive contexts previously.

• Contagion, an internal “mimicry” of the perceived voice-like emotional expression of the music (e.g., Juslin, 2001); you may experience sadness when you hear a slow, quiet, low-pitched performance of a classical piece on the cello, featuring much vibrato and rubato.

• Visual imagery, inner images of an emotional character conjured up by the listener through a metaphorical mapping of the musical structure (Osborne, 1980); you may become relaxed when you indulge in mental images of a landscape suggested by a piece of “new-age” music.

• Episodic memory, a conscious recollection of a particular event from the listener’s past that is “triggered” by the music (Baumgartner, 1992); you may experience nostalgia when a song evokes a vivid personal memory from the specific time you met your current partner in life.

• Musical expectancy, a response to the gradual unfolding of the syntactical structure of the music, and its expected or unexpected continuations (Meyer, 1956); you may feel anxious due to uncertainty created by phrases without a clear tonal center in an “avant-garde” piece.

• Aesthetic judgment, a subjective evaluation of the aesthetic value of the music, based on an individual set of weighted criteria (Juslin, 2013a); you may take pleasure in the exceptional beauty of a Bach composition, or may admire the exceptional skills of a great performer.

In addition to these eight mechanisms, music can also arouse emotions through the default mechanism for induction of emotions: cognitive goal appraisal (Scherer, 1999).
You may become annoyed when a neighbor plays music late at night, blocking your goal of going to sleep. Cognitive appraisal appears less important in musical settings, however (Juslin, Liljeström, Västfjäll, Barradas, & Silva, 2008). For further elaboration and predictions for each mechanism, see Juslin (2019). One implication of the framework is that before one can understand an emotion in any given situation, it is necessary to know which of these mechanisms is in operation. This is because each mechanism has its own process characteristics, in terms of information focus, key brain regions, degree of cultural impact and learning, ontogenetic development, induced emotions, induction speed, availability to consciousness, dependence on musical structure, and so forth. Armed with these theoretical principles of music and emotion, we are ready to take a look at the empirical work carried out to date. Our review will be restricted to studies


that explicitly focus on musical affect. (Aesthetic responses are reviewed in Chapter 15, this volume.)

Review of Empirical Studies

General Overview

In this section, we summarize seventy-eight neuropsychological studies, published between 1982 and 2016 (see Appendix table). Studies have been grouped with regard to methodology: PET/fMRI (38 studies, 49 percent), EEG (22 studies, 28 percent), lesions (16 studies, 20 percent), and dichotic listening (2 studies, 3 percent). They are described in terms of listeners, musical stimuli, contrast/design, method, main findings, and type of affect (e.g., measuring induced vs. perceived emotions; categories, dimensions, preferences). The categorization concerning induced vs. perceived emotion is not entirely straightforward, because brain studies do not always distinguish the processes in the design. (Previous reviews of the field have tended to inter-mix studies that focus on different aspects, induced vs. perceived emotion.) Sample size varies depending on method—PET/fMRI (M = 16.31), EEG (M = 32.00), lesions (M = 14.44), and dichotic listening (M = 18.00)—but tends to be relatively small overall. Note that PET/fMRI and EEG studies have focused mostly on induced emotion, whereas lesion and dichotic listening studies have focused mostly on perceived emotion. Blood flow studies have used mostly fMRI (as opposed to PET) and “real” (as opposed to synthesized) music, and have mostly adopted dimensional (66 percent) as opposed to discrete (34 percent) approaches to emotion. EEG studies have also used mostly “real” music—but have adopted dimensional (34 percent) and discrete (31 percent) approaches to a roughly equal degree. Lesion studies have (in contrast to other studies) used mainly synthesized music, and have mostly studied discrete emotions (75 percent), rather than dimensions (38 percent). Such differences between studies that use different methods should clearly be kept in mind when interpreting the overall results.

Empirical Approaches

The “contrast” and “emotion” columns in the appendix table are suggestive of the kind of empirical approach adopted in the study. Some early studies tended to use an open-ended exploratory approach, which simply presents listeners with supposedly “emotional” music, to see which regions might be affected. Although such an approach was


defensible in the early stages, it makes it difficult to interpret the results (e.g., “It is not possible to disentangle the different subcomponents of the activation due to limitations of this experimental design,” Alfredson, Risberg, Hagberg, & Gustafson, 2004, p. 165). Thus, for instance, it may not be clear whether the study has measured perceived or induced emotion, in the absence of control conditions or converging measures. We identify at least five possible approaches in the neuropsychological study of emotions, which can serve different aims. These have been adopted, implicitly or explicitly, in music studies to highly varying degrees. We briefly summarize these approaches, before looking closer at the actual data.

1. A first approach appears to serve mainly to demonstrate that stimuli do arouse emotions by comparing the results to previous studies of emotions. Although most musicians and listeners would seem to take the emotional powers of music for granted, it has been a matter of some controversy whether music really evokes emotions (Kivy, 1990). A landmark study by Blood and Zatorre (2001) revealed—for the first time—that pleasurable responses to music influence “core” regions of the brain already linked to emotion, such as the amygdala, the hippocampus, and the ventral striatum. The demonstration of blood-flow changes in such regions appeared to make musical emotions more “real,” in the eyes of some observers. But data of this kind were sometimes oversold: a lot was made of the finding that enjoyment of music involves the same “reward circuits” in the brain as other forms of pleasure such as food, sex, and drugs (e.g., the nucleus accumbens); yet this discovery is not that surprising. It would have been far more surprising to discover unique “reward circuits” only for music.
The major conclusion of this approach is that “the brain areas affected by emotions to music are similar to those reported in other brain studies of emotion.”

2. A second approach speaks to the previously discussed distinction between perceived and induced emotions. A meta-analysis of PET and fMRI studies of perception and induction of emotion in general (outside music) by Wager et al. (2008) concluded that the two processes involve peak activations of different brain regions, supporting the idea that these are distinct processes. Some authors argue that the processes can be distinguished in terms of prefrontal activation, such that perceived emotion activates mainly the right hemisphere (regardless of the emotion), whereas evoked emotion is lateralized according to valence: positive emotions in the left hemisphere, negative in the right (e.g., Blonder, 1999; Davidson, 1995). To the best of our knowledge, no music study thus far has directly contrasted perception and induction of emotion, but attempts to interpret data along those lines have been made (Juslin & Sloboda, 2001, p. 456). We review further evidence below. The preliminary conclusion of this approach is that “perception and induction of emotions may involve different patterns of brain activation.”


3. A third approach, already hinted at above, aims mainly to discriminate neural patterns of affective responses with regard to their valence (positive/negative). This approach has been adopted by several studies in the general emotion field. For instance, Chikazoe and colleagues (Chikazoe, Lee, Kriegeskorte, & Anderson, 2014) were able to find particular patterns with significant correlations to the degree of positive or negative valence experienced by subjects. A similar approach is often used in music. In fact, in our estimation, the use of an explicit (“positive vs. negative,” “pleasant vs. unpleasant”) or implicit (“happy vs. sad,” “consonant vs. dissonant”) valence dimension is the most common approach in blood-flow studies. Several studies indicate that positive affect is handled in the left hemisphere, whereas negative affect is handled in the right (see Altenmüller, Schurmann, Lim, & Parlitz, 2002; Daly et al., 2014; Flores-Gutiérrez et al., 2007; Schmidt & Trainor, 2001; Tsang, Trainor, Santesso, Tasker, & Schmidt, 2001). Not all of the studies seem to follow this pattern, however—at least at first view. A problem is that in some cases, it is difficult to know for sure whether a study has measured perceived or evoked emotion, since multi-component indices were not used. For instance, it is an open question whether ratings of pleasantness of music in some studies are just that (ratings of the stimuli) or whether they index feelings of pleasure. If there is insufficient control over which process is actually elicited in studies, this can explain the mixed findings. We submit that the results suggest some degree of specificity in terms of valence, but the nature of these patterns and their interpretation remain contested. Yet, a preliminary conclusion of this approach is that “neural correlates can distinguish the valence of musically aroused affect.”

4. A fourth approach seeks to obtain links between discrete emotions and neural structures. This is part of an ongoing debate about whether there is emotion-specificity in responding more generally. Some neuroscientists claim to have been able to distinguish neural activity in terms of discrete emotions (see Damasio et al., 2000; Kassam, Markey, Cherkassky, Loewenstein, & Just, 2013; Murphy, Nimmo-Smith, & Lawrence, 2003; Saarimäki et al., 2016). We should clearly acknowledge, however, that the hypothesis of emotion-specific activation remains controversial. A recent review failed to obtain any evidence that discrete emotions can be consistently localized to distinct brain regions (cf. Clark-Polner, Wager, Satpute, & Barrett, 2016). This is sometimes cited as evidence against a discrete emotions approach. However, the very same review also failed to obtain evidence of specific regions linked with dimensions such as valence! Hence, the authors argue that the localization hypothesis for affective states—whether discrete or dimensional—is flawed in general. Previous neuropsychological studies may, indeed, have been too eager to localize particular emotions in specific parts of the brain. Some tendencies in a similar direction may be found in music research also—for instance, linking the amygdala to fear


294   patrik n. juslin and laura s. sakka

perception (Peretz, 2001), and the hippocampus to tender emotions (Koelsch, 2014), although both these structures are clearly involved in a much wider range of emotions. There is a risk here that neuroscientists "claim" certain areas as "music-specific" or "emotion-specific" when, in fact, they are neither. In our view, both the proponents and critics of the emotion-specificity approach have tended to confuse causal mechanisms with affective outcomes: there is no reason to assume emotion specificity in the former (e.g., a "memory area" may be active across emotions), even though there is specificity in the felt emotions (nostalgia vs. awe). In a meta-analysis, Lindquist and colleagues (Lindquist, Wager, Kober, Bliss-Moreau, & Barrett, 2012) observed a set of interacting brain regions commonly involved in basic psychological operations of both an emotional and non-emotional nature during emotion experience, across a range of discrete emotion categories. The authors argue that this finding is consistent with a "constructive" approach to emotion (Barrett, 2017). However, it is equally consistent with the BRECVEMAC framework presented earlier. The major conclusion of this approach, then, is that "although there may be some limited level of emotion specificity in regions linked to conscious emotional experience, most areas involve domain-general processes (such as memory) which are active not only during emotions." This, then, leads us to the fifth and final approach.

5. The fifth approach focuses on underlying psychological processes or brain functions; that is, mechanisms (e.g., Cabeza & Nyberg, 2000). By carefully isolating distinct psychological processes in the experimental design, one can link neural correlates to mental functions. For example, episodic memories might involve a partly distinct brain network from conditioned responses.
This approach is the “essence” of neuropsychology and has been successful in the neurosciences more generally. Yet this approach is still rare in the music field (Janata, 2009; Steinbeis, Koelsch, & Sloboda, 2006). Over time, one may discern a change from basic lateralization studies (e.g., dichotic listening) and a search for individual brain structures to a consideration of more complex and distributed networks. But we are not aware of any study of neural correlates so far that contrasts different psychological mechanisms. (We will consider such an approach later in the chapter.) Thus, the increasing awareness of the role of mechanisms has not yet translated into concrete designs. This becomes clear when we take a closer look at the findings.

Summary of Brain Imaging Data

At the current stage, the data that are potentially most informative when it comes to pinpointing neural correlates of musical emotion come from the 38 brain imaging studies conducted to date. Tables 1 and 2 summarize the main findings for perceived and induced


Table 1. Summary of main findings of brain imaging studies focusing on perceived emotions (N = 11)

General area / Specific area | N | % | Studies
Amygdala | 2 | 18 | 17, 29
Insula | 3 | 27 | 15, 32, 37
Cerebellum | 3 | 27 | 28, 29, 32
Thalamus | 2 | 18 | 26, 32
PHG | 3 | 27 | 13, 28, 37
Substantia nigra | 1 | 9 | 37
Striatum
  Dorsal striatum/caudate | 1 | 9 | 17
  Putamen | 1 | 9 | 32
  TOTAL (dorsal striatum) | 2 | 18 | 17, 32
  Striatum (general) | 1 | 9 | 37
  TOTAL | 3 | 27 | 17, 32, 37
Cingulate cortex
  ACC | 3 | 27 | 11, 13, 28
  Anterior cingulate gyrus | 1 | 9 | 37
  dACC | 1 | 9 | 26
  TOTAL (ACC) | 5 | 45 | 11, 13, 26, 28, 37
  Posterior cingulate/precuneus | 2 | 18 | 16, 32
  Ventral posterior cingulate gyrus | 1 | 9 | 28
  TOTAL | 7 | 64 | 11, 13, 16, 26, 28, 32, 37
Temporal lobe/Auditory cortex
  TP | 2 | 18 | 28, 36
  STS | 1 | 9 | 36
  STG | 3 | 27 | 11, 15, 28
  Transverse temporal gyrus/HG | 2 | 18 | 11, 28
  FG | 1 | 9 | 15
  MTG | 1 | 9 | 37
  AC | 2 | 18 | 11, 28
  TOTAL | 4 | 36 | 11, 15, 28, 37
Motor areas
  SMA | 1 | 9 | 28
  Precentral gyrus | 3 | 27 | 15, 28, 32
  TOTAL | 3 | 27 | 15, 28, 32
Frontal cortex
  Frontal gyrus
    IFG | 2 | 18 | 26, 28
    SFG | 1 | 9 | 11
    Middle frontal gyrus | 2 | 18 | 13, 15
    TOTAL | 5 | 45 | 11, 13, 15, 26, 28
  PfC
    PfC | 1 | 9 | 28
    mPfC | 1 | 9 | 13
    TOTAL | 2 | 18 | 13, 28
  mfC: medial frontal cortex | 1 | 9 | 11
  OfC: orbitofrontal cortex | 1 | 9 | 16
  aMFC | 1 | 9 | 36
  Inferior frontal lobule | 1 | 9 | 37
  Dorsolateral frontal cortex | 1 | 9 | 16
  TOTAL (incl. frontal gyrus & PfC) | 8 | 73 | 11, 13, 15, 16, 26, 28, 36, 37
Brainstem
  Brainstem | 1 | 9 | 29
  Dorsomedial midbrain | 1 | 9 | 37
  TOTAL | 2 | 18 | 29, 37
Inferior parietal lobe | 3 | 27 | 28, 32, 37
Supramarginal gyrus | 1 | 9 | 28
Angular gyrus | 1 | 9 | 28
Retrosplenial cortex | 1 | 9 | 29

Note: Numbers in the right-most column indicate studies, as listed in the Appendix table.
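The N and % columns in Table 1 are simple tallies: a region's percentage is the share of the 11 perceived-emotion studies that reported it (e.g., the amygdala in studies 17 and 29, 2/11, about 18 percent). A minimal Python sketch of that bookkeeping, with only a few rows of the table transcribed:

```python
# Tally how many of the 11 perceived-emotion studies reported each region,
# and derive the % column as in Table 1 (rounded to whole percent).
N_STUDIES = 11

# Abridged from Table 1: region -> study numbers (Appendix-table numbering).
reports = {
    "amygdala":   {17, 29},
    "insula":     {15, 32, 37},
    "cerebellum": {28, 29, 32},
    "ACC":        {11, 13, 28},
}

def percent(region: str) -> int:
    """Share of studies reporting the region, as a rounded percentage."""
    return round(100 * len(reports[region]) / N_STUDIES)

# Rebuild (N, %) pairs for the abridged rows.
table = {region: (len(studies), percent(region))
         for region, studies in reports.items()}
print(table)  # e.g., "amygdala" maps to (2, 18), "insula" to (3, 27)
```

The same counting reproduces the Table 2 percentages with N_STUDIES = 27.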

emotion, respectively, in terms of broad brain areas for which blood-flow changes have been reported. Ideally, the interpretation should be made in terms of “networks” (Bressler & Menon, 2010), rather than “isolated” regions, but current results do not yet enable such interpretations. Some broad conclusions can be drawn based on the findings. First, music listening can cause changes in blood flow in “core” regions for emotional processing. Second, as noted by Peretz (2010, p. 119), “there is not a single, unitary emotional system underlying all emotional responses to music.” On the contrary, a fairly broad range of cortical and subcortical brain regions seem to be linked to musical emotions. Most of these belong to the (extended) limbic system and include the amygdala, the hippocampus, the striatum (including nucleus accumbens), the cingulate cortex, the insula, the prefrontal and orbitofrontal cortex, the cerebellum, the frontal gyrus, the parahippocampal gyrus, and various brainstem structures. The data in Tables 1 and 2 also enable us to compare induced and perceived emotions. As may be seen, there is some overlap between the brain regions reported. This could reflect two things: (a) that there is some extent of overlap in the neural correlates of these processes or (b) that studies have not sufficiently distinguished between the processes— such that some studies that ostensibly focus on induced emotion have measured perceived emotion and vice versa; or that some studies measure both processes at the same time—leading to “noisy” data. Few studies have measured multiple components of emotion so as to enhance the validity of conclusions about induced emotions (discussed at the beginning of this chapter). In any case, note that there are certain differences in the findings for the two processes: Only for induction of emotion have several studies reported changes in the amygdala, the striatum (including nucleus accumbens), and the hippocampus. 
At least some of these areas may thus distinguish induced emotions from mere perception, though


Table 2. Summary of main findings of brain imaging studies focusing on induced emotions (N = 27)

General area / Specific area | N | % | Studies
Amygdala | 17 | 63 | 1, 3, 4, 6, 9, 10, 18, 19, 20, 22, 23, 25, 27, 30, 31, 34, 35
Insula | 13 | 48 | 3, 4, 7, 8, 9, 12, 18, 20, 21, 24, 25, 30, 38
Hippocampus | 14 | 52 | 3, 4, 7, 8, 10, 18, 19, 25, 27, 30, 31, 34, 35, 38
Cerebellum | 7 | 26 | 3, 4, 8, 21, 24, 25, 38
Thalamus | 3 | 11 | 4, 24, 31
Hypothalamus | 1 | 4 | 12
Parahippocampus
  Parahippocampus | 1 | 4 | 3
  PHC | 1 | 4 | 38
  PHG | 6 | 22 | 5, 6, 9, 18, 25, 38
  TOTAL | 7 | 26 | 3, 5, 6, 9, 18, 25, 38
Basal ganglia | 1 | 4 | 21
Substantia nigra | 1 | 4 | 14
Claustrum | 1 | 4 | 6
Striatum
  Ventral striatum
    V. Striatum (general) | 8 | 30 | 4, 18, 20, 25, 27, 30, 33, 38
    V. Striatum – NAc | 6 | 22 | 7, 21, 24, 27, 31, 33, 34
    TOTAL | 12 | 44 | 4, 7, 18, 20, 21, 24, 25, 27, 30, 33, 34, 38
  Dorsal striatum
    Caudate nucleus | 5 | 19 | 6, 8, 22, 25, 38
    Putamen | 2 | 7 | 6, 8
    TOTAL | 5 | 19 | 6, 8, 22, 25, 38
  Striatum (general) | 1 | 4 | 3
  TOTAL (incl. dorsal & ventral) | 17 | 63 | 3, 4, 6, 7, 8, 18, 20, 21, 22, 24, 25, 27, 30, 31, 33, 34, 38
Cingulate cortex
  ACC
    ACC | 7 | 26 | 4, 7, 24, 25, 31, 34, 38
    dACC | 1 | 4 | 9
    PfAC | 1 | 4 | 7
    sgACC | 1 | 4 | 35
    Anterior cingulum | 1 | 4 | 21
    vAC | 1 | 4 | 9, 14
    TOTAL | 11 | 41 | 4, 7, 9, 14, 21, 24, 25, 31, 34, 35, 38
  Cingulate cortex | 1 | 4 | 20
  SCG | 1 | 4 | 7
  Pre-genual cingulate cortex | 1 | 4 | 25
  Sub-genual cingulate cortex | 1 | 4 | 38
  Posterior cingulate/precuneus | 8 | 30 | 3, 4, 5, 8, 9, 14, 25, 38
  SC | 1 | 4 | 5
  TOTAL | 15 | 56 | 3, 4, 5, 7, 8, 9, 14, 20, 21, 24, 25, 31, 34, 35, 38
Temporal lobe/Auditory cortex
  HG | 1 | 4 | 18
  FG | 2 | 7 | 3, 9
  STG | 7 | 26 | 5, 14, 25, 30, 31, 34, 38
  Planum polare | 1 | 4 | 31
  mTL | 1 | 4 | 3
  TP | 4 | 15 | 7, 12, 18, 31
  Temporal regions | 2 | 7 | 8, 10
  Temporal lobe | 2 | 7 | 2, 21
  AC | 9 | 33 | 6, 7, 8, 12, 20, 25, 27, 30, 34
  TOTAL | 19 | 70 | 2, 3, 5, 6, 7, 8, 9, 10, 12, 14, 18, 20, 21, 25, 27, 30, 31, 34, 38
Frontal cortex
  Frontal gyrus
    IFG | 7 | 26 | 6, 8, 12, 18, 22, 24, 34
    mFG | 2 | 7 | 6, 25
    VMFG | 1 | 4 | 3
    SFG | 1 | 4 | 14
    TOTAL | 10 | 37 | 3, 6, 8, 12, 14, 18, 22, 24, 25, 34
  PfC
    mPfC | 4 | 15 | 7, 8, 9, 14
    VMPfC | 4 | 15 | 4, 33, 34, 38
    VPfC | 1 | 4 | 23
    PfC | 2 | 7 | 8, 10
    TOTAL | 10 | 37 | 4, 7, 8, 9, 10, 14, 23, 33, 34, 38
  VMfC | 1 | 4 | 3
  OfC | 5 | 19 | 4, 5, 24, 35, 38
  Frontopolar areas | 2 | 7 | 8, 35
  TOTAL (incl. frontal gyrus & OfC) | 19 | 70 | 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 18, 22, 23, 24, 25, 33, 34, 35, 38
Motor areas
  Somatosensory cortex | 1 | 4 | 20
  Premotor regions | 1 | 4 | 8
  Sensorimotor area | 1 | 4 | 2
  Precentral gyrus (motor area) | 2 | 7 | 3, 6
  SMA | 1 | 4 | 4
  MNS | 2 | 7 | 9, 35
  Ventral lateral thalamic nucleus | 1 | 4 | 14
  Rolandic operculum | 1 | 4 | 18
  TOTAL | 10 | 37 | 2, 3, 4, 6, 8, 9, 14, 18, 20, 35
Brainstem
  Brainstem | 2 | 7 | 3, 24
  LC | 1 | 4 | 23
  VTA | 2 | 7 | 8, 24
  TOTAL | 4 | 15 | 3, 8, 23, 24
Occipital cortex | 1 | 4 | 21
Retrosplenial cortex | 1 | 4 | 7
Supramarginal gyrus | 1 | 4 | 8

Note: Numbers in the right-most column indicate studies, as listed in the Appendix table.


studies that directly contrast the two processes under controlled conditions are clearly required to confirm this hypothesis.¹

Beyond these simple and relatively trivial conclusions, interpretations of the findings tend to become more difficult and "impressionistic" in nature. Given a general lack of "process-pure" manipulations of mechanisms, researchers have to rely on "informed speculations" about the possible role of different brain structures and networks. These are typically based on general knowledge of the brain, but tend to be relatively vague. This is because the analyses involve very broad brain areas which have been proposed to be involved in a wide range of different psychological processes; that is, they have poor "selectivity" (Poldrack, 2006) when it comes to "revealing" specific psychological processes. Koelsch (2014, p. 172) submits that observed changes in the amygdala "could be because music is perceived as a stimulus with social significance owing to its communicative properties." This is, indeed, one possibility—but we really do not know. And even if this notion is correct, it does not offer very precise information about the functional role of the amygdala. An additional problem is that this form of "reverse inference" about cognitive process is not deductively valid (Poldrack, 2006). Normally, we would infer from brain imaging data that "when cognitive process X is active, then brain area Y is active"—not the other way around.

Let us be clear: This is not a matter of competence. Informed speculations and interpretations by distinguished neuroscientists like Stefan Koelsch or Isabelle Peretz are as good as they get. The problem is rather that in the absence of process-specific experimental manipulation in the field as a whole, theoretical interpretations are rendered difficult for a number of reasons. A first problem is that brain imaging "cannot disentangle correlation from causation" (Peretz, 2010, p. 114); a related problem is that results from imaging studies tend to be "overinclusive" (Peretz, 2010, p. 114); therefore, "it is not always easy to determine if the activity is related to emotional or non-emotional processing of the musical structure" (Peretz, 2010, p. 112). Indeed, the same brain structure can serve different roles both within and across domains (Kreutz & Lotze, 2007). In addition, as implied by the ICINAS-BRECVEMAC framework, cognition and emotion are not neatly separated in the brain: specific cognitive processes may be involved depending on the mechanism responsible for the perceived or induced emotion. The specific listener task (self-report of felt affect, ratings of melodies, or mere listening) may also affect the patterns of brain activation/deactivation, and so may differences with respect to the music stimuli ("real" vs. "synthesized" music, "familiar" vs. "unfamiliar," "self-selected" vs. "experimenter-selected"). All of these issues conspire to make interpretations of findings from brain imaging studies problematic. This has not prevented researchers from suggesting how to organize the findings with regard to the processes of perception and induction, respectively.

¹ The dopaminergic mesolimbic reward pathway involving nucleus accumbens is a prime suspect when it comes to positive emotions. The hippocampus and the amygdala are both considered key areas for emotional memories, which will presumably not be involved as much during comparatively "neutral" perception of emotions.
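Poldrack's (2006) point about reverse inference can be made concrete with Bayes' rule: the probability that a cognitive process was engaged, given activation in an area, depends on how selective that area is for the process. The numbers below are purely illustrative assumptions, not estimates from the literature:

```python
# Reverse inference via Bayes' rule: P(process | activation).
# If an area activates almost as often without the process as with it
# (poor selectivity), observing activation tells us little.

def posterior(p_act_given_proc: float,
              p_act_given_no_proc: float,
              prior: float) -> float:
    """P(process | activation) from the two likelihoods and a prior."""
    num = p_act_given_proc * prior
    den = num + p_act_given_no_proc * (1.0 - prior)
    return num / den

# Illustrative numbers only: activation of a region under "emotion" vs. not.
selective   = posterior(0.8, 0.1, prior=0.5)  # region rarely active otherwise
unselective = posterior(0.8, 0.6, prior=0.5)  # region broadly responsive

print(round(selective, 2), round(unselective, 2))  # 0.89 0.57
```

With an unselective region, activation barely moves the posterior above the 0.5 prior, which is exactly why broad limbic areas "reveal" little about the underlying process.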

Perception of Emotions

Double brain dissociations between emotional judgments and melody recognition (Peretz & Gagnon, 1999), and between emotional judgments and basic music perception (Peretz, Gagnon, & Bouchard, 1998), initially led Peretz (2001) to postulate an "emotion module," dedicated to perception of emotion in music. Subsequently, she proposed that a more distributed network, originally evolved to process vocal emotions, has been "invaded" by music, such that emotional speech and emotional music will share neural resources (Peretz, 2010). This idea has received some support (Escoffier, Zhong, Schirmer, & Qiu, 2013) and is in line with documented parallels in emotions between speech and music (Juslin & Laukka, 2003). Studies on emotions in speech suggest a network of areas primarily in the (right) frontal and parietal lobes, including the inferior frontal gyrus (Schirmer & Kotz, 2006). The possibility of cross-modal parallels can be explored in the context of the present results (Table 1). For perceived emotions, the most frequently reported regions are frontal areas (73 percent of studies) and the frontal gyrus (45 percent). Note that Escoffier et al. (2013) found that tracing of emotions in both speech and music was related to activity in the medial SFG. Moreover, Nair, Large, Steinberg, and Kelso (2002) discovered that listening to expressive (as compared to "mechanical") music performances increased activity in the right inferior frontal gyrus. These findings seem consistent with the "shared-resources hypothesis" (further evidence of a shared neural code was recently reported by Paquette, Takerkart, Saget, Peretz, & Belin, 2018). There are some additional brain regions implicated in emotion-perception studies. Curiously, there are three studies (27 percent) that report changes in the cerebellum during perceived emotion, and five studies (45 percent) that report changes in the anterior cingulate cortex (which occurs also in evoked emotion; cf.
Table 2). We return to these findings later. It has further been argued that the perception of dissonance is linked to the parahippocampal gyrus (Blood et al., 1999). This notion receives support from lesion studies showing that this basic ability suffers after damage to the parahippocampal gyrus (Gosselin et al., 2006). Only two of eleven studies (18 percent) report changes in the amygdala (see Table 1)—though it has been found that recognition of “scary” music suffers after damage to the amygdala (Gosselin et al., 2005). It cannot be completely ruled out that the two studies really measured evoked emotion rather than just perceived (since they featured unpleasant stimuli that may have evoked some negative emotion). In summary, the most consistent results are that perception of emotions in music involves the frontal cortex and the frontal gyrus—and, perhaps, some right hemisphere lateralization (Bryden, Ley, & Sugarman, 1982).


Induction of Emotions

For induction of emotions, a larger number of brain regions have been reported (Table 2). The most frequently reported areas include the amygdala (63 percent of studies), the frontal cortex (70 percent; PfC: 37 percent), the ventral striatum/NAc (44 percent), the hippocampus (52 percent), the insula (48 percent), and the anterior cingulate cortex (41 percent). However, note that the results vary considerably from study to study, in ways that are not easy to explain. For example, it may be seen in Table 2 that there are numerous additional regions that were reported in only one or a few studies. These include the parahippocampus, the thalamus, the basal ganglia, the cerebellum, motor regions, and the brainstem. One approach to this problem is to look for areas that are consistently activated across studies in the hope that this will reveal an emotion network that is invariably involved in the process. Thus, for instance, Koelsch and colleagues (Koelsch, Siebel, & Fritz, 2010) argue that a network consisting of the amygdala, the hippocampus, the parahippocampus, the temporal poles, and the pregenual cingulate cortex may play a consistent role in emotional processing of music. But is there support for the idea of a set of brain regions that are consistently activated? Close inspection of Table 2 reveals that few brain regions are reported in more than about half of the studies which purported to measure induced emotions. If areas are not consistently found to be influenced, how is this to be interpreted? Some of this variability is surely due to methodological problems and consequent measurement error. This could include differences in how regions of interest (ROI) are defined, or in the assumptions made in the analysis. But assuming that limbic regions were "prime suspects" in the analyses, the variability is still too large to be accounted for by (only) this factor.
In principle, one may argue that if these studies have tried to measure emotion and the listed regions are not consistently activated across studies, then either these areas are not related to emotions, or these studies have not consistently managed to induce any emotion. However, a different interpretation suggested by the BRECVEMAC framework (and supported by meta-analyses of “general” emotion findings; cf. Lindquist et al., 2012) is that the variability is due to different psychological mechanisms being activated in different investigations (depending on the musical stimuli, the listeners, and the situation, as well as the experimental procedure). This possibility is elaborated in the following section.
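The consistency criterion used above (which regions are reported in more than about half of the 27 induction studies) can be applied directly to the percentages summarized in Table 2; the 50 percent cutoff is our illustrative choice:

```python
# Regions reported for induced emotion, with % of the 27 studies
# (figures as summarized in the text for Table 2).
pct_of_studies = {
    "amygdala": 63, "frontal cortex": 70, "temporal/auditory": 70,
    "hippocampus": 52, "insula": 48, "ventral striatum/NAc": 44,
    "anterior cingulate": 41, "cerebellum": 26, "thalamus": 11,
}

# Keep only regions exceeding a (chosen) consistency threshold of 50%.
consistent = {region for region, pct in pct_of_studies.items() if pct > 50}
print(sorted(consistent))
```

Only a handful of regions survive even this lenient criterion, which is the pattern the BRECVEMAC interpretation above seeks to explain.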

Towards a More Principled Approach

If neuropsychology "aims to relate neural mechanisms to mental functions" (Peretz, 2010, p. 99), and most previous studies have not tried to manipulate mechanisms that involve distinct mental functions (discussed earlier), it is hard to resist the conclusion


that studies in this field have somehow attempted to do neuropsychology, although without the psychology. There is one exception: Janata (2009) focused specifically on the process of autobiographical memory and found that dorsal regions of the medial prefrontal cortex responded to the relative degree of autobiographical salience of musical stimuli (rated post-hoc). We believe that a more principled approach, which aims to target specific mechanisms, might lead to more interpretable results (Juslin, Barradas, & Eerola, 2015; Juslin, Harmat, & Eerola, 2014). Based on the assumptions that most studies of musical emotion have lacked the needed specificity, in terms of stimulus manipulation and procedures, to separate different underlying mechanisms, and that neuroscience studies in general psychology have reached a higher level of theoretical sophistication, we propose hypotheses from various sub-domains (e.g., memory, imagery, language). These might be tested in designs that manipulate specific mechanisms, in a humble attempt to uncover more mechanism-specific brain networks (Juslin, 2019). Emotional responses to music can be expected to involve three general types of brain regions: (1) brain regions always involved during music perception (e.g., the primary auditory cortex), (2) regions always involved in the conscious experience of emotion, regardless of the "source" of the emotion (candidates may include the rostral anterior cingulate and the medial prefrontal cortex; see, e.g., Lane, 2000, pp. 356–358), and (3) regions involved in information-processing that differs depending on the mechanism that caused the emotion. The last category of regions may involve processes (e.g., syntactic processing, episodic memory) that do not in themselves imply that emotions have been aroused: They may also occur in the absence of emotions (e.g., Pessoa, 2013).
Based on these notions, we propose the following (preliminary) hypotheses for emotion induction. (Neural correlates of aesthetic judgments are discussed in Chapter 15, this volume.) Brainstem reflexes involve the reticulospinal tract, which travels from the reticular formation of the brainstem, and the intralaminar nuclei of the thalamus (Davies, 1984; Kinomura, Larsson, Gulyás, & Roland, 1996). "Alarm signals" to auditory events can be emitted as early as at the level of the inferior colliculus of the brainstem (Brandão, Melo, & Cardoso, 1993), producing startle reflexes and increased arousal. Studies show that the reticulospinal tract is required for the acoustic startle response, because lesions in this tract abolish the response (Boulis, Kehne, Miserendino, & Davis, 1990). Yet, although the neural circuitry that "mediates" the acoustic startle is located entirely within the brainstem, the system can be modulated by higher neural tracts (Miserendino, Sananes, Melia, & Davis, 1990). Rhythmic entrainment has been less examined, but could involve neural oscillation patterns to rhythmic stimulation in early auditory areas, motor areas (sensorimotor cortex, supplementary motor area), the cerebellum, and the basal ganglia (see Fujioka, Trainor, Large, & Ross, 2012; Tierney & Kraus, 2013; Trost et al., 2014), perhaps primed early on by reticulospinal pathways in the brainstem (Rossignol & Melvill Jones, 1976). The cerebellum could be particularly important in "active" entrainment (coordination


of a motor response; e.g., Grahn, Henry, & McAuley, 2011), whereas the caudate nucleus of the basal ganglia could be the crucial area during "passive" entrainment to auditory stimulation (Trost et al., 2014). Evaluative conditioning (EC) involves particularly the lateral nucleus of the amygdala and the interpositus nucleus of the cerebellum (e.g., Fanselow & Poulos, 2005; Johnsrude, Owen, White, Zhao, & Bohbot, 2000; Sacchetti, Scelfo, & Strata, 2005). Hippocampal activation may also occur, if the EC depends strongly on the context, but only the amygdala seems to be required for EC to occur (LeDoux, 2000). The timing of the delivery of the CS and US used in conditioning is important, which may explain why the cerebellum is active in conditioning (like another time-dependent process—rhythmic entrainment). We argue that the amygdala is mainly involved in the evaluation of the stimulus whereas the cerebellum is involved in the timing of the response (Cabeza & Nyberg, 2000). Emotional contagion from music will presumably include brain regions for the perception of emotions from the voice (and, hence, presumably of emotions from voice-like characteristics of music), mainly right-lateralized inferior frontal areas (including the frontal gyrus) and the basal ganglia (Adolphs, Damasio, & Tranel, 2002; Paulmann, Ott, & Kotz, 2011; Schirmer & Kotz, 2006), and also "mirror neurons" in premotor regions, in particular regions involved in perceiving emotional vocalizations (e.g., Paquette et al., 2018; Warren et al., 2006; cf. Koelsch, Fritz, von Cramon, Müller, & Friederici, 2006). Visual imagery involves visual representations in the occipital lobe that are spatially mapped and activated in a "top-down" manner during imagery (Charlot, Tzourio, Zilbovicius, Mazoyer, & Denis, 1992; Goldenberg, Podreka, Steiner, Franzén, & Deecke, 1991).
This requires the intervention of an attention-demanding process of image generation, which appears to have a left temporo-occipital localization (e.g., Farah, 2000). Self-reported imagery vividness correlates with activation of the visual cortex in imaging studies (Cui, Jeter, Yang, Montague, & Eagleman, 2007), which may also be activated during music listening (e.g., Thornton-Wells et al., 2010). Episodic memory can be divided into various stages (e.g., encoding, retrieval). The conscious experience of recollection of an episodic memory seems to involve the medial temporal lobe, especially the hippocampus (e.g., Nyberg, McIntosh, Houle, Nilsson, & Tulving, 1996), and the medial prefrontal cortex (Gilboa, 2004; for similar results in music, see Janata, 2009). Additional areas correlated with episodic memory retrieval include the precuneus (Wagner, Shannon, Kahn, & Buckner, 2005), the entorhinal cortex (Haist, Gore, & Mao, 2001), and the amygdala (in the case of emotional memories; Dolcos, LaBar, & Cabeza, 2005).


complexity (Caplan, Alpert, & Waters, 1998; Stromswold, Caplan, Alpert, & Rauch, 1996; for music, see Maess, Koelsch, Gunter, & Friederici, 2001). Musical expectancy also involves monitoring of conflicts between expected and actual music sequences. This may recruit parts of the anterior cingulate (Botvinick, Cohen, & Carter, 2004) or orbitofrontal cortex (Koelsch, 2014). It should be noted that nearly all of the brain regions proposed above have been reported in at least one imaging study of music listening; and many have been reported frequently. Detailed predictions for neural correlates of emotion perception, based on the ICINAS-BRECVEMAC framework (Juslin, 2019), have not been proposed earlier, but the reported blood-flow changes are at least consistent with sources of perceived emotions in terms of iconic similarity with emotional speech (e.g., the right frontal gyrus), intrinsically coded tension in musical structure (e.g., the anterior cingulate cortex), and associative coding based on classical conditioning (e.g., the cerebellum; see Table 1). Overlapping brain areas between evoked and perceived emotion (Tables 1 and 2) could reflect similar processes—such as emotion perception (prefrontal brain areas, involved in the induction mechanism contagion) and conflict monitoring (the anterior cingulate cortex, which is involved in both intrinsic sources of perceived emotions and the expectancy mechanism for emotion induction). We emphasize, however, that all "post-hoc" speculations of this type must be treated with caution: The relevant distinctions between processes must be made at the stage of experimental design (Juslin et al., 2014, 2015), rather than in the interpretations afterwards.
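For reference, the preliminary mechanism-to-region hypotheses above can be collected into a simple lookup table (region names abridged from the text; the dictionary is merely an organizational device, not part of the BRECVEMAC framework itself):

```python
# Preliminary mechanism -> candidate-region hypotheses, abridged from the text.
hypotheses = {
    "brainstem reflex": ["reticulospinal tract", "intralaminar thalamic nuclei",
                         "inferior colliculus"],
    "rhythmic entrainment": ["early auditory areas", "sensorimotor cortex",
                             "SMA", "cerebellum", "basal ganglia (caudate)"],
    "evaluative conditioning": ["lateral amygdala",
                                "interpositus nucleus (cerebellum)"],
    "contagion": ["right inferior frontal areas", "basal ganglia",
                  "premotor 'mirror' regions"],
    "visual imagery": ["occipital (visual) cortex"],
    "episodic memory": ["hippocampus/medial temporal lobe",
                        "medial prefrontal cortex", "precuneus"],
    "musical expectancy": ["left perisylvian cortex", "Broca's area",
                           "anterior cingulate", "orbitofrontal cortex"],
}

# Example query: which mechanisms implicate the cerebellum?
cerebellar = [m for m, regions in hypotheses.items()
              if any("cerebellum" in r for r in regions)]
print(cerebellar)
```

Such a mapping makes the design implication explicit: two mechanisms predicting disjoint region sets are candidates for a contrastive experiment.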

Concluding Remarks: A Field in Need of an Agenda?

Nearly a decade ago, Peretz (2010) observed that the neuropsychology of music and emotion was in its infancy. Yet she seemed optimistic: "It is remarkable how much progress has been accomplished over the last decade" (Peretz, 2010, p. 119). We take a slightly more pessimistic view on the current state of the art: the field may have become a "toddler," but the results are fragmented. Most studies seem to make sense, when considered on their own, but the different studies do not add up to a consistent "big picture." When it comes to understanding which brain regions are involved in music and emotion, and their respective role in the underlying processes, it is not obvious that the field has advanced much, as compared to the seminal studies carried out nearly twenty years ago (Blood & Zatorre, 2001). Yes, some brain areas have been (more or less) consistently reported across different studies—but we still do not know which roles they play.


We suggest that this reflects the lack of a systematic research program, which truly attempts to link specific psychological processes to brain networks. The previously outlined ICINAS-BRECVEMAC framework provides one promising way to address this issue. We recognize, however, that there may be other ways of "slicing the pie." The important thing is that we do not try to eat the pie randomly, because that is bound to get messy. We argue that future research designs need to become increasingly sensitive to psychological process distinctions. To this end, we propose three ways of enhancing progress in the domain: (1) actively manipulating and contrasting different psychological mechanisms in the same experimental design (cf. Juslin et al., 2014); (2) employing convergent measures to support conclusions about the engagement of each mechanism (see Juslin et al., 2015, Table 7) and about whether perception or induction of emotion has occurred (e.g., Lundqvist, Carlsson, Hilmersson, & Juslin, 2009); (3) analyzing sets of regions as networks (as opposed to analyzing single regions), in order to increase the selectivity of response in the brain region of interest (Poldrack, 2006; cf. Koelsch, Skouras, & Lohmann, 2018). We also recommend the use of more systematic control conditions (i.e., contrasts) to rule out "alternative interpretations"—contrasting different mechanisms not only with one another, but also with "non-emotional" music listening, listening to "mere sounds," to silence, etc., in order to isolate the brain networks that are selectively involved in: music listening per se; emotions in general; specific emotion categories and dimensions; psychological mechanisms; and more domain-general cognitive processes (e.g., attention).
One hitherto unexplored possibility is to use transcranial magnetic stimulation (Pascual-Leone, Davey, Rothwell, Wassermann, & Puri, 2002) to disrupt brain activity at crucial times and locations, preventing mechanisms from becoming activated by musical events. Some scholars argue that an understanding of musical emotions is important for a better understanding of emotions in general (Koelsch et al., 2010). Indeed, because music engages so fully with our emotions, it can sometimes reveal the nature of our “emotional machinery” more clearly than the stimuli normally used to study emotions. The fact that music appears so “abstract” (our “post-hoc” rationalizations for emotions cannot easily be made to fit) may help us think more clearly about the “true” causes of our emotions (Juslin, 2019). Current theory of music and emotion suggests that responses are mediated by a wide range of mechanisms, rather than by cognitive appraisal alone. However, there is an unfortunate disconnect between theory in the field and empirical studies of the neural correlates, which prevents brain studies from realizing their full potential; when psychological theory becomes reflected in the experimental design of brain imaging studies, that is when things are bound to get exciting.

No.

Study

Listeners

Music

Contrast/design

Method/lesion area

Main findings

Similar amygdala response to consonant & dissonant music (activation in basolateral & deactivation in superficial and centromedial amygdala) Take home message: different sub-regions of amygdala react differently

1

Ball et al., 2007

N = 14

Original (consonant) & electronically manipulated (dissonant)

1. Consonance (pleasant) 2. Dissonance (unpleasant)

fMRI

2

Alfredson, Risberg, Hagberg, & Gustafson, 2004

Elderly N = 12

Participant-selected emotional music & experimenter-selected

1. Emotional music (Em) 2. Non-emot. music (NEm) 3. Silence (Si)

133Xenon inhalation technique

3

Baumgartner, Lutz, Schmidt, & Jäncke, 2006

Females N = 9

Classical music

1. Picture alone 2. Combined (picture & music)

fMRI

4

Blood & Zatorre, 2001

Musicians N = 10

Participant-selected emotional music

Degree of chills intensity

PET

Emotion P/E

Em vs. Si → increased rCBF in right temporal lobe; temporal lobe asymmetry (right > left)
Combined → increased activation in most structures of the ventral system for emotion processing (amygdala, VMfC, striatum, insula, brainstem, mTL memory system, hippocampus, parahippocampus, cerebellum, fusiform gyrus)

Chills → rCBF increases in left ventral striatum and dorsomedial midbrain, insula, right OfC, thalamus, ACC, SMA, cerebellum, & decreases in amygdala, left hippocampus, & VMPfC

V/A/D

E

V/A

E

A (level of intensity)

E

D (happy, sad, angry)

E

Chills


Appendix table.  Characteristics of reviewed studies

Blood, Zatorre, Bermudez, & Evans, 1999

N = 10

Synthesized music

1. Consonance 2. Dissonance

PET

6

Brattico et al., 2011

N = 15

Participant-selected

1. Lyrics 2. No lyrics a. Happy b. Sad

fMRI

7

Brown, Martinez, & Parsons, 2004

N = 10

Unfamiliar instrumental pleasant music

1. Music 2. Rest (silence)

PET

8

Caria, Venuti, & de Falco, 2011

ASD N = 8 Controls N = 14

“Real” music (classical pieces & participant-selected) & random tone sequences

1. ASD 2. Controls a. Happy music (favorite – standard) b. Sad music (favorite – standard) c. Random tones

fMRI

Dissonance → increased right PHG & precuneus activity; decreased OfC, SC & frontal polar cortex activity

Happy vs. sad music → activation in left thalamus & right caudate
Sad vs. happy music → left hemispheric secondary & associative AC, including insula
Music → activation in limbic and paralimbic structures, mostly left (SCG, PfAC, retrosplenial cortex, hippocampus, anterior insula, NAc, ACC)

ASD in happy vs. random → left supramarginal gyrus, primary AC, right auditory association area, IFG, cerebellum, right insula, the putamen, caudate nucleus Favorite vs. standard happy → mPFC, posterior cingulate cortex and precuneus, the left posterior insula, the ventromedial & frontopolar cortices & the lingual gyrus, and AC Sad vs. random → right cerebellum Favorite vs. standard sad → temporal regions, IFG, cerebellum, right supramarginal gyrus, right VTA, right hippocampus, left insula, left precuneus, right medial and frontopolar PfC & the right premotor regions

E

V (pleasant/ unpleasant)

E

V/D (happy/ sad)

a

E A

E

V/D (happy/ sad)

(continued)


5

Appendix table.  Continued No.

Study

Listeners

Music

Contrast/design

Chapin, Jantzen, Scott Kelso, Steinberg, & Large, 2010

“Deep” listeners N = 21

Natural expressive classical piano performance (EP) & mechanical (synthesized) performance (MP)

1. Expressive (EP) 2. Mechanical (MP) performance a. Experienced (EL) b. Inexperienced (IL) listeners

fMRI

10

Eldar, Ganor, Admon, Bleich, & Hendler, 2007

N = 14

“Real” pop music (positive) & computer-generated pieces (neutral & negative), instrumental

1. Neutral film (F) 2. Valenced music (M) 3. Combined (FM)

fMRI

11

Escoffier, Zhong, Schirmer, & Qiu, 2013

N = 16

Unfamiliar non-vocal music from popular genres

1. trace Pitch 2. trace Emotion in a. Music (M) b. Voices (V)

fMRI

12

Flores-Gutiérrez et al., 2007 b

N=6

Unfamiliar “real” instrumental music masterpieces

1. Pleasant music (Pl): 2. Unpleasant music (Un) 3. Noise (control)

fMRI

9

Main findings

EP → activation of PHG, amygdala, ventral ACC, & dorsal mPfC, FG, inferior parietal lobule, IFG, & precuneus
Tempo of EP (unpredictability) correlated with activation of motor-related areas
Emotional arousal in the EL → dACC activity

FM with negative music vs. F → additional activity in amygdala, anterior hippocampus, lateral PfC & lateral temporal regions M alone did not activate these areas M & V → left HG M vs. V → stronger right HG activation Tracing emotions in both V & M → medial SFG & ACC Emotion specific effects in primary and secondary AC & mfC, for both V & M

Pl → left metabolic activation & coherent activity among primary auditory, posterior temporal, inferior parietal, & prefrontal regions Un → right frontopolar & paralimbic areas Un vs. Pl → IFG & insula Music vs. noise → AC, left TP, IFG & frontopolar area

Emotion P/E

V/A/D

E

V/A

E

V/A

P

V (continuous: sad–happy)

E

D (19 emotions)


Method/lesion area

Green et al., 2008

N = 21

Unfamiliar piano melodies (digital)

1. Major 2. Minor

fMRI

14

Janata, 2009

N = 13

“Real” music

Degree of familiarity, autobiographical salience, and valence

fMRI

15

Jeong et al., 2011

N = 15

Instrumental music

1. Music 2. Faces 3. Combined music & faces a. Happy b. Sad (CongruentIncongruent)

fMRI

16

Khalfa, Schon, Anton, & Liégeois-Chauvel, 2005

N = 13

“Real” instrumental classical

1. fast 2. slow 3. silence a. major b. minor

fMRI

Minor (sad) vs. Major (happy) → limbic activity (left PHG; bilateral ventral ACC; left mPfC)

Autobiographical salience → increased activity in dorsal regions of mPfC & SFG
Familiar, autobiographic & positively valenced music → posterior cingulate
Degree of positive affect correlated with activity in left vAC, substantia nigra & ventral lateral thalamic nucleus
Simple music listening task: STG, AC, HG, VLPfC

Happy music → bilateral STG & precentral gyrus, & right lingual gyrus Sad music → bilateral STG, & left middle frontal gyrus, lingual gyrus, & insula Happy vs. sad music → higher STG activity Congruent → greater STG activity Happy music & faces vs. sad music & faces → larger STG activity Incongruent → diminished STG activity & greater signal change in bilateral FG

Minor (sad) music → left orbito- and mid-dorsolateral frontal cortex

P V/D (happy/sad)

E V (pleasing/ displeasing)

P V/D (happy/sad)

P

V (continuous from sad to happy)

(continued)


13

Appendix table.  Continued Study

Listeners

Music

Contrast/design

Method/lesion area

Main findings

Irregular (unpleasant) vs. regular (pleasant) chords → increased bilateral amygdala activity

17

Koelsch, Fritz, & Schlaug, 2008

fMRI: musicians: N = 10 non-mus.: N = 10

Synthesized chord sequences

Chord: 1. Regular (pleasant) 2. Irregular (unpleasant)

fMRI

18

Koelsch, Fritz, v. Cramon, Müller, & Friederici, 2006

N = 11

Original music (consonant) & electronically manipulated (dissonant) instrumental

1. Consonance (pleasant) 2. Dissonance (unpleasant)

fMRI

19

Koelsch et al., 2007 (experiment 3)

N = 17 (Exp. 3a) N = 24 (Exp. 3b)

Original music (consonant) & electronically manipulated (dissonant) instrumental

1. Consonance (pleasant) 2. Dissonance (unpleasant)

fMRI

20

Koelsch et al., 2013

N = 18

“Real” music (joy)
Manipulated soundtrack music (fear)
Computer-generated sequences of isochronous tones (neutral)

Stimuli: 1. Fearful 2. Joyful 3. Neutral

21

Kreutz, Ott, & Wehrum, 2006

N = 25

Instrumental classical & contemporary music

Listened to pieces representing joy, fear, sadness, anger, peacefulness

Emotion P/E

V/A/D

P

V (pleasant/ unpleasant)

E

V (pleasant/ unpleasant)

Dissonant (unpleasant) vs. ­consonant (pleasant) music → stronger left amygdala & right hippocampal formation activity

E

V (pleasant/ unpleasant)

fMRI

AC & SF activity increased for joy & decreased for fear, vs. neutral Fear → right primary somatosensory cortex Emotion-specific functional connectivity of AC with insula, cingulate cortex, and visual & parietal attentional structures

E

V, A, D (fear/joy)

fMRI

Positive emotions (positive valence, peacefulness, and joy) activated anterior cingulum, basal ganglia, insula, NAc

E

V/D

Dissonance (unpleasant) → increased PHG, amygdala, ­hippocampus, and TP activation Consonance (pleasant) → IFG, anterior superior insula, ventral striatum, HG, and Rolandic operculum activation


No.

Lehne, Rohrmeier, & Koelsch, 2014

N = 25

“Real” music (piano pieces) and manipulated versions

Degree of music-evoked tension

fMRI

23

Lerner, Papo, Zhdanov, Belozersky, & Hendler, 2009

N = 15

Unfamiliar instrumental music

1. Emotional (neg.) music 2. Neutral music a. Eyes open b. Eyes closed

fMRI

24

Menon & Levitin, 2005

N = 13

Digitized classical music

1. Original music (pleasant) 2. Scrambled music (unpleasant sounds)

fMRI

25

Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007

N = 16

Classical musical pieces

1. Happy 2. Sad 3. Neutral (non-emotion evoking)

fMRI

26

Mizuno & Sugishita, 2007

N = 18

Synthesized piano sounds

1. Major 2. Minor 3. Neutral

fMRI

Tension → BOLD signal increases in left pars orbitalis of IFG
Tension increase → increases in right SF
Emotional (negative) music & closed eyes → higher ratings of emotion and greater amygdala activation, leading to increased activations in LC & VPfC

Pleasant music vs. unpleasant sounds → activation of mesolimbic structures involved in reward processing (NAc, VTA, hypothalamus, IfC, left OfC, ACC, cerebellar vermis & brainstem) Connectivity: dynamic VTA-mediated interactions between NAc & hypothalamus, insula, & OfC

Happy → ventral & dorsal striatum, left ACC, left precuneus, left PHG, left medial frontal gyrus, left posterior cingulate, primary AC, left caudate nucleus
Sad → right hippocampus & right amygdala, left cerebellum, AC, posterior cingulate gyrus, left medial frontal gyrus
Neutral → left auditory association area & left posterior insula
Contrasts Major–Neutral & Minor–Neutral showed activation in bilateral IFG, medial thalamus, & dACC

E

A (tension)

E?

A

E

V

E

V (sad, neutral, happy)

P

V (cheerful, sad, neutral)

c

(continued)


22

Appendix table.  Continued No.

Study

Listeners

Music

Contrast/design

27

Mueller et al., 2015

N = 23

Instrumental pieces (“real” and manipulated)

1. Backward (B) 2. Forward (F) a. Consonant (C) b. Dissonant (D)

fMRI

28

Nair, Large, Steinberg, & Kelso, 2002

Musicians N=4

“Real” piece (expressive performance) & computer-generated classical piece

1. Expressive performance (EP) 2. Mechanical performance (MP)

fMRI

29

Pallesen et al., 2005

Musicians N = 11 Non-mus. N = 10

Piano chords (major, minor, dissonant)

1. Major 2. Minor 3. Dissonant a. Passive listening b. Memory task

fMRI

30

Perani et al., 2010

Healthy newborns N = 18

Western tonal classical (original & manipulated)

1. Original music 2. Manipulated (dissonant)

fMRI

Main findings

Greater pleasantness in music → greater involvement of the ventral striatum/NAc and amygdala
Hippocampus activation was related to music pleasantness
Pleasant vs. unpleasant music → stronger activity in AC

EP vs. MP → right ACC, right TP, right IFG, inferior parietal lobe & STG (not amygdala) MP vs. EP → cerebellum, PHG, SMA & dorsolateral PfC Minor & dissonant vs. major chords → enhanced responses in amygdala, retrosplenial cortex, brain stem, & cerebellum (passive listening task – all participants)

Original music → right activations in STG, insula & amygdala-hippocampal complex
Dissonant music (= emotion?) → reduced activity in right AC, increased activity in left inferior frontal cortex and limbic structures
Original music → BOLD signal increase in right amygdala-hippocampal complex
Dissonant music → BOLD signal increase in left hippocampal/entorhinal cortex (possibly amygdala)

Emotion P/E

V/A/D

E

V (pleasant/ unpleasant)

P

?

P

D–V (happy/sad & pleasant/ unpleasant)

E

V

d


Method/lesion area

Pereira et al., 2011

N = 14

Personal “real” music (based on familiarity & liking)

1. Familiar (F) 2. Unfam. (U) a. Liked (L) b. Disl. (D)

fMRI

32

Petrini, Crabbe, Sheridan, & Pollick, 2011

N = 16

Musical ­improvisations by saxophonist

1. Audiovisual 2. Visual (musician movements) 3. Audio

fMRI

33

Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011

N=8

Instrumental “real” music

Music: 1. Neutral 2. Chill-inducing

PET fMRI

34

Salimpoor et al., 2013

N = 19

“Real” music (unfamiliar)

Degree of music’s reward value

fMRI

35

Singer et al., 2016

N = 44

“Real” music

fMRI

g

F vs. U → ACC, amygdala, thalamus, putamen, left hippocampus, left TP, left frontal orbital cortex, right NAc
D vs. L → planum polare & STG

Insula & left thalamus → consistent activation for visual, auditory & audiovisual emotional information & increased activation for emotionally mismatching displays vs. emotionally matching displays right thalamus → activation for audiovisual emotional displays & similar activation for emotionally matching and mismatching displays

E

e

V

P

D (­happiness, sadness, surprise)

E

Chills – Extreme pleasure

Reward value was best predicted by activation in NAc, and by functional connectivity between NAc with AC, amygdala, & VMPfC

E

V

Two processing levels underlying the unfolding of common music emotionality: (1) a shared core-affective process, confined to a limbic network (amygdala, hippocampus, sgACC & OFC) & mediated by temporal regularities in music, and (2) an experience (musical training) based process that is rooted in a left fronto-parietal network that may involve functioning of the “mirror-neuron system”

E

D (GEMS)

Chills vs. neutral → dopamine release in dorsal & ventral striatum Anticipation of reward → activation of caudate Experience of emotion → activation of NAc

f

(continued)


31

Appendix table.  Continued Study

Listeners

Music

Contrast/design

Method/lesion area

Main findings

Composed vs. computer: Increased activity in the network dedicated to mental state attribution (aMFC, left & right STS, left & right TP) aMFC activity correlated with participants’ perception of expressed intention/ emotion in composed music

36

Steinbeis & Koelsch, 2009

N = 12

“Real” music

1. Music composed by human 2. Computer-generated music

fMRI

37

Suzuki et al., 2008

Males N = 13

Chords: major (happy), minor (sad), consonant (beautiful), & dissonant (ugly)

1. Major (Ma) 2. Minor (Mi) a. Consonant (C) b. Dissonant (D)

PET

38

Trost, Ethofer, Zentner, & Vuilleumier, 2012

N = 15

Instrumental music (control: computer-generated atonal melodies)

1. Positive (P) 2. Negative (N) a. High-arousing (H) b. Low-arousing (L)

fMRI

39

Altenmüller, Schürmann, Lim, & Parlitz, 2002

N = 16

Jazz, rock-pop, classical

1. Positive 2. Negative

EEG

Emotion P/E

MiC vs. MaC → activity in right striatum
MaC vs. MiC → activity in left MTG
MiC vs. MiD → activity in dorsomedial midbrain, right inferior frontal lobule, & anterior cingulate gyrus

P-H (vitality) → left striatum & insula
Tension → STG, right PHG, motor & premotor areas, cerebellum, right caudate nucleus
P-L (sublimity) → right striatum, ACC, hippocampus & OfC
N-L (sad) → ACC, hippocampus
H → sensory and motor areas
L → VMPfC & hippocampus
All except P-H → right PHC

Lateralization towards left temporal activation for positive (like) and towards right FTC for negative valence (dislike)

V/A/D

P

V (pleasant/ unpleasant)

P

V

E

V/A D (GEMS)

E

h

V (like/ dislike)


No.

Baumgartner, Esslen, & Jäncke, 2005

Females N = 24

Classical music

1. Pictures 2. Music 3. Combined

EEG

Global cortical brain activation was highest during emotional responses for the combined condition (lowest Alpha-Power-Density) and lowest for music

E

D (happy/sad/ anger)

41

Daly et al., 2014

N = 31

“Real” music

Correlations of EEG activity with induced emotions

EEG

Asymmetry in the pre-frontal cortex relating to a number of induced emotions Valence related with asymmetry in the beta & gamma bands Results support the valence hypothesis of hemispheric lateralization

E

D, V/A 8 discrete emotions analyzed as V, A (energy) & A (tension)

42

Daly et al., 2015

N = 31 (same as Daly et al., 2014)

“Real” music

Prediction of emotions by EEG activity

EEG

E

D, V/A 8 discrete emotions analyzed as V, A (energy) & A (tension)

43

Field et al., 1998

Depressed female adolescents N = 28

Rock music

1. Music condition (N = 14) 2. Control condition (N = 14) a. Baseline b. Session c. Post-session

EEG

Valence → predicted by delta & beta bands, band-powers over right frontal cortex & by prefrontal asymmetry in the beta band Energy arousal → predicted by beta & gamma bands over the center of the frontal cortex & left motor cortex Tension arousal → predicted by delta, beta & gamma bands over right motor cortex & parietal cortex, & by alpha asymmetry

Music condition during session & post-session (1.b & 1.c) → attenuated right frontal EEG activity, although self-reported mood didn’t change

E

i

Depressed mood

(continued)


40

Appendix table.  Continued Study

Listeners

Music

Contrast/design

Method/lesion area

44

Flores-Gutiérrez et al., 2009

N = 14

Unfamiliar “real” instrumental music master­pieces

1. Pleasant emotions (Pl) 2. Unpleasant (Un) a. Female (F) b. Male (M)

EEG

45

Flores-Gutiérrez et al., 2007 j

N=6

Unfamiliar “real” instrumental music masterpieces

1. Pleasant music (Pl) 2. Unpleasant music (Un) 3. Noise (control)

EEG

46

Goydke, Altenmüller, Möller, & Münte, 2004 Experiment 1

N = 12

Single tones played by violin, recorded digitally

1. Standard tones 2. Deviant tones (in terms of emotional expression)

EEG

47

Kamiyama, Abla, Iwanaga, & Okanoya, 2013

Original (“real”) & computer-generated instrumental Western music

Emotionally congruent vs. incongruent music–face pair

EEG

N = 24

Main findings

Pl vs. Un → upper alpha couplings linking left anterior and posterior regions
Un vs. Pl → posterior midline coherence exclusively in the right hemisphere in men and bilateral in women
All music → bilateral oscillations among posterior sensory and predominantly left association areas in women

Emotion P/E

V/A/D

E

D (19 emotions) analyzed as V/A (pleasant, unpleasant, activation)

E

D (19 emotions)

Deviant tones generated mismatch negativity (MMN) responses

P

V/D (happy/sad)

Music elicited larger N400 components in incongruent compared to congruent condition

P

V/D (happy/sad)

Pl → left metabolic activation & coherent activity among primary auditory, posterior temporal, inferior parietal, & prefrontal regions Un → right frontopolar & paralimbic areas Un vs. Pl → IFG & insula Music vs. noise → AC, left TP, IFG & frontopolar area


No.

Koelsch, Kilches, Steinbeis, & Schelinski, 2008

N = 20

Original (“real”) & computer-generated piano sonatas

1. Unexpected chords (Un) 2. Expected chords (Ex) a. Emotionally expressive (EmE) b. Non-expressive (NonE)

EEG

49

Lin et al., 2010

N = 26

Excerpts from film soundtracks

Association between EEG dynamics and music induced emotions

EEG

50

Logeswaran & Bhattacharya, 2009

N = 46

Instrumental music excerpts

1. Happy face 2. Sad face 3. Neutral face a. Happy music b. Sad music

EEG

51

Sammler, Grigutsch, Fritz, & Koelsch, 2007

N = 18

Instrumental “real” consonant (pleasant: Pl) & electronic dissonant (unpleasant: Un)

1. Consonant –pleasant (Pl) 2. Dissonant – unpleasant (Un)

EEG

52

Schmidt & Hanslmayr, 2009

N = 16

Instrumental “real” (classical & rock)

1. Neutral music 2. Positive 3. Negative a. right-active participants b. left-active

EEG

Un vs. Ex → elicited an ERAN and an N5
Expressivity did not influence the ERAN
EmE vs. NonE → larger N5

E

V, A & surprise

P

D (joy, anger, sadness, pleasure)

P

V (continuous scale)

E

V (continuous pleasantness rating)

Left-active (with relatively greater alpha power over right electrode sites) vs. right-active individuals → higher enjoyment for negative and positive music – no difference for neutral

E

V (preference)

Left-active vs. right-active individuals → rated all music as more positive

P

V

Spectral power asymmetry across multiple frequency bands was a sensitive metric for characterizing brain dynamics in response to emotional states Features extracted from frontal & parietal lobes provide discriminative information associated with emotion processing

Neutral face with happy vs. sad music → N1 component in fronto-central & mid-frontal areas

Pl vs. Un → increase of frontal midline theta power No difference in the alpha range, no hemispheric lateralization

(continued)


48

Appendix table.  Continued No.

Study

Listeners

Music

Contrast/design

53

Schmidt & Trainor, 2001

N = 59

4 orchestral excerpts

1. Positive (P) 2. Negative (N) a. Intense (I) b. Calm (C)

54

Schmidt, Trainor, & Santesso, 2003

3 (N = 33), 6 (N = 42), 9 (N = 52), & 12 (N = 40) month old infants

3 orchestral excerpts, expressing fear, joy, and sadness

A. Valence 1. Baseline (B) 2. Music (M) a. Frontal (F) b. Parietal (P) i. Left hemisphere (L) ii. Right hemisphere (R)

EEG

55

Shahabi & Moghimi, 2016

N = 19

“Real” classical & Iranian music

1. Joyful (J) 2. Melancholic (M) 3. Neutral (N)

56

Singer et al., 2016

N = 44

“Real” music

m

EEG

EEG

EEG

Main findings

P → greater left frontal EEG activity
N → greater right frontal EEG activity
I → greater overall frontal EEG activity

M vs. B (3 months) → more activation across all sites F vs. P. (9 & 12 months) → more activation M vs. B (12 months) → less activation L vs. R (12 months) → more activation V → no difference in EEG activity

J → increased connectivity in the frontal and frontal-parietal regions
Perceived valence → positive correlation with frontal inter-hemispheric flow, & negative correlation with parietal bilateral connectivity
Two processing levels underlying the unfolding of common music emotionality: (1) a shared core-affective process, confined to a limbic network (amygdala, hippocampus, sgACC & OFC) & mediated by temporal regularities in music, and (2) an experience (musical training) based process that is rooted in a left fronto-parietal network that may involve functioning of the “mirror-neuron system”

Emotion P/E

V/A/D

E

k

D-V/A fear (NI), joy (PI), happy (PC), sad (NC)

E

l

D

P

V/A

E

D (GEMS)


Method/lesion area

Spreckelmeyer et al., 2013

N = 10

Brief melodic violin phrases

Deviant tones (3 happy & 1 sad or 3 sad & 1 happy)

EEG

58

Spreckelmeyer, Kutas, Urbach, Altenmüller, & Münte, 2006

N = 14

Syllable ‘ha’ sung by opera singers

1. Sad 2. Happy a. Pictures b. Voice

EEG

59

Steinbeis, Koelsch, & Sloboda, 2006

Musicians: N = 12 non-mus.: N = 12

3 matched versions of 6 Bach chorales (1 original & 2 manipulated)

1. Expected harmonies (EH) 2. Unexpected (UH) 3. Very unexpected (VUH) a. Musicians b. Non-mus.

EEG

60

Tsang, Trainor, Santesso, Tasker, & Schmidt, 2001

N = 55

Classical music manipulated regarding mode & tempo

1. Slow (S) 2. Fast (F) a. Minor (Mi) b. Major (Ma)

EEG

61

Gosselin, Peretz, Hasboun, Baulac, & Samson, 2011

Patients N = 16 Controls n N = 15–16

Computer-generated music

1. Faces 2. Music a. Patients b. Controls

Lesion (unilateral anteromedial temporal excision)

62

Gosselin, Peretz, Johnsen, & Adolphs, 2007 Experiment 1

Patient (S.M.) N=1 Controls N=4

Computer-generated music

Lesion (complete bilateral amygdala damage)

1. S.M. 2. Controls

deviant tone → MMN response

P

V/D (happy/ sad)

P

V/D (happy/ sad)

E, P

A (felt intensity & perceived tension)

E

V/D (happy/ sad)

Patients → impaired recognition of scary music and faces

P

Patient → impaired recognition of scary and sad music

D (happy/ sad/fear/ peaceful)

P

V/D (happy/sad & pleasant/ unpleasant)

Sad congruent pairs → LPP Happy congruent pairs → enlargement of the P2 component (indicating early integration for happy)

UH & VUH vs. EH → Felt emotional intensity increased UH & VUH vs. EH → increased perceived tension UH & VUH → early negativity (over the whole scalp), earlier for musicians VUH→ an early fronto-central & a later parietal positivity (P300), for musicians

Changes in tempo & mode towards happy →greater left frontal activation Changes in tempo & mode towards sad → greater right frontal activation

(continued)


57

Appendix table.  Continued Study

Listeners

Music

Contrast/design

Method/lesion area

63

Gosselin, Peretz, Johnsen, & Adolphs, 2007 Experiment 2

Patient (S.M.) N=1 Controls N=7

Pre-existing music, manipulated in mode and tempo

1. Original music 2. Mode 3. Tempo a. S.M. b. Controls

Lesion (complete bilateral amygdala damage)

64

Gosselin et al., 2005

Patients N = 16 controls N = 16

Computer-generated

1. Patients 2. Controls

Lesion (amygdala resection)

65

Gosselin et al., 2006

Epileptic patients N = 17 Controls N = 19

Synthesized instrumental classical music

1. Dissonance – 2. Consonance a. Patients b. Controls

Lesion (PHC resection)

66

Griffiths, Warren, Dean, & Howard, 2004

Lesion patient N=1

Personal music

Case study (effect of lesion)

Focal lesion (left insula, frontal lobe, & amygdala)

67

Hsieh, Hornberger, Piguet, & Hodges, 2012

SD patients N = 11 AD patients N = 12 Controls N = 20

Computer-generated melodies in piano timbre

1. Music 2. Faces a. AD b. SD c. Controls

Volumetric MRI VBM

Main findings

Patient → no emotion recognition impairment when mode and tempo are manipulated

Emotion P/E

V/A/D

P

V/D (happy/sad)

P

D (happy/ sad/fear/ peaceful)

P

V/D (happy/sad & pleasant/ unpl.)

Selective loss of emotional responding to music – normal recognition of emotion

E



Emotion recognition in music & faces → correlated with atrophy in right TP, amygdala, & insula Emotion recognition in music only → correlated with atrophy in left anterior & inferior temporal lobe SD & AD vs. Controls → greater emotions recognition impairment in music than faces SD vs. AD → greater impairment

P

D (happy/ peaceful/ scary/sad)

Patients → impaired recognition of scary music

Patients vs. controls → rated dissonant music as slightly pleasant


No.

Music vs. tones → enhanced gamma band activity in AC
AC asymmetry in emotional processing in music
Left vs. right AC → stronger high gamma fluctuations
Right vs. left AC → emotion-modulated gamma oscillatory activity
Happy, sad & peaceful music → right-lateralized functional coupling between amygdala & AC
Angry (& disliked) music → de-correlation between all the structures

P→

Auditory agnosia/ bitemporal injury

Music listening remained an emotionally rewarding experience, despite impaired music perception (inability to recognize melody and timbre, to name highly familiar tunes, to recognize expressed emotion, & to reproduce an overlearned rhythm)

E



Right temporo-parietal malacic area, involving the plica curva & supramarginal gyrus

Music anhedonia Right temporoparietal malacic area

Incapacity to discriminate between melodic patterns, timbre, & pitch, qualitative alterations in emotional involvement in the music





1. SD 2. AD 3. Controls

SD: focal degenerative pathology

Impaired emotion recognition in SD but not AD

P

D (happy/sad/ anger/fear)

1 epilepsy patient (implanted with chronic depth electrodes)

“Real” instrumental classical

1. Attentive music listening 2. Passive listening to pure tones

Intra-cerebral EEG in AC & amygdala

69

Matthews, Chang, De May, Engstrom, & Miller, 2009

1 patient

Several tasks measuring perception of sounds and music

Auditory agnosia/ bitemporal injury

70

Mazzoni, Moretti, Pardossi, Vista, & Muratorio, 1993

1 anhedonic participant



71

Omar, Hailstone, Warren, Crutch, & Warren, 2010 (Experiment 3)

musicians SD: N = 1 AD: N = 1 Controls: N=6

Western classical canon and film scores

E→

D (happy/ sad/ angry/ peaceful) V (like/dislike)

(continued)


Liégeois-Chauvel et al., 2014

68

Appendix table.  Continued Study

Listeners

Music

Contrast/design

Method/lesion area

72

Omar et al., 2011

FTLD patients N = 26 Controls N = 21

Western classical canon and film scores

1. Patients 2. Controls a. Music b. Faces c. Non-verbal vocal sounds

VBM FTLD Neuro­ degenerative disease

73

Peretz & Gagnon, 1999

Music agnosia patient N=1 Controls N=4

Computer-generated melodies

1. Patient 2. Controls a. Melody recognition b. Emotion recognition

Music agnosia bilateral cerebral damage

74

Peretz, Gagnon, & Bouchard, 1998 Experiment 1

Patient N=1 Controls N=4

Computer-generated & original instrumental Western music

1. Patient 2. Controls

Lesions in temporal lobes & frontal areas

75

Peretz, Gagnon, & Bouchard, 1998 (Exp. 1–6)

Patient N=1 Controls N=4

Computer-generated and original instrumental Western music

1. Patient 2. Controls

Lesions in temporal lobes & frontal areas

76

Satoh, Nakase, Nagata, & Tomimoto, 2011

N=1



Case study

Music anhedonia: Infarction in right parietal lobe

Main findings

FTLD vs. controls → impaired recognition of emotions in music, associated with gray matter loss in insula, OfC, anterior cingulate and medial PfC, anterior temporal & posterior temporal and parietal cortices, amygdala, & the subcortical mesolimbic system

Emotion P/E

V/A/D

P

D (happy/sad/ anger/fear)

P

V/D (happy/sad)

Patient → Unaffected ability to perceive musical expression

P

V/D (happy/sad)

Patient → Preserved emotional appreciation of music, despite an impaired recognition/knowledge

P

V/D (happy/sad)

E



Patient vs. controls → selective sparing of the processing of music’s emotional tone, despite music agnosia (i.e., deficits in discrimination & recognition of familiar & unfamiliar melodies)

Impaired experienced emotion in music, but not visual experienced emotion, nor perceived emotion in music


No.

77

Bryden, Ley, & Sugarman, 1982

N = 20

7-note tonal sequences

1. Left ear 2. Right ear

Dichotic listening

78

Gagnon & Peretz, 2000

For this task: N = 16 (total N = 32)

Computer-generated tonal and atonal short melodies

1. Tonal 2. Atonal a. non-affective instruction group b. affective instruction o group

Dichotic listening

Left ear (indicating right hemisphere) advantage in emotion perception Pleasant → left hemisphere Unpleasant → right hemisphere

P

V(positive/ neutral/ negative)

P

V (pleasantness)

Notes:
a. "Pleasant feelings."
b. Also included EEG for measuring functional relations.
c. Stimulus pleasantness and reward value (as determined from pilot study).
d. Inferred pleasantness–unpleasantness due to dissonance–consonance.
e. Music that is familiar-liked; emotional reactions are presumed to recruit these areas.
f. Reward value of music indicated by participants' willingness to purchase the music.
g. Also measured EEG.
h. Preference.
i. Self-reported mood.
j. Also included fMRI for measuring brain sites.
k. Participants were instructed to feel the mood that the music expresses.
l. Music that was expected to induce the emotion it expresses.
m. Also measured fMRI.
n. 15 participants for the musical task and 16 for the facial task.
o. Participants were instructed to judge whether the melody was pleasant or unpleasant.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

Emotion: P = perceived; E = evoked/induced; V = valence; A = arousal; D = discrete emotions, GEMS = Geneva Emotional Music Scale. In contrast/design column: numbers indicate levels of factor 1; letters indicate levels of factor 2; Latin letters indicate levels of factor 3.


324   patrik n. juslin and laura s. sakka

References (Articles marked * are included in the empirical review)

Adolphs, R., Damasio, H., & Tranel, D. (2002). Neural systems for recognition of emotional prosody: A 3-D lesion study. Emotion 2, 23–51.
*Alfredson, B. B., Risberg, J., Hagberg, B., & Gustafson, L. (2004). Right temporal lobe activation when listening to emotionally significant music. Applied Neuropsychology 11, 161–166.
*Altenmüller, E., Schurmann, K., Lim, V. K., & Parlitz, D. (2002). Hits to the left, flops to the right: Different emotions during listening to music are reflected in cortical lateralisation patterns. Neuropsychologia 40, 2242–2256.
*Ball, T., Rahm, B., Eickhoff, S. B., Schulze-Bonhage, A., Speck, O., & Mutschler, I. (2007). Response properties of human amygdala subregions: Evidence based on functional MRI combined with probabilistic anatomical maps. PLoS ONE 2, e307.
Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Boston: Houghton Mifflin Harcourt.
Baumgartner, H. (1992). Remembrance of things past: Music, autobiographical memory, and emotion. Advances in Consumer Research 19, 613–620.
*Baumgartner, T., Esslen, M., & Jäncke, L. (2005). From emotion perception to emotion experience: Emotions evoked by pictures and classical music. International Journal of Psychophysiology 60, 34–43.
*Baumgartner, T., Lutz, K., Schmidt, C. F., & Jäncke, L. (2006). The emotional power of music: How music enhances the feeling of affective pictures. Brain Research 1075, 151–164.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton Century Crofts.
Blair, M. E., & Shimp, T. A. (1992). Consequences of an unpleasant experience with music: A second-order negative conditioning perspective. Journal of Advertising 21, 35–43.
Blonder, L. X. (1999). Brain and emotion relations in culturally diverse populations. In A. L. Hinton (Ed.), Biocultural approaches to the emotions (pp. 275–296). Cambridge: Cambridge University Press.
*Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98, 11818–11823.
*Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience 2, 382–387.
Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex. Trends in Cognitive Sciences 8, 539–546.
Boulis, N. M., Kehne, J. H., Miserendino, M. J. D., & Davis, M. (1990). Differential blockade of early and late components of acoustic startle following intrathecal infusion of 6-cyano-7-nitroquinoxaline-2,3-dione (CNQX) or D,L-2-amino-5-phosphonovaleric acid (AP-5). Brain Research 520, 240–246.
Brandao, M. L., Melo, L. L., & Cardoso, S. H. (1993). Mechanisms of defense in the inferior colliculus. Behavioral Brain Research 58, 49–55.
*Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M. (2011). A functional MRI study of happy and sad emotions in music with and without lyrics. Frontiers in Psychology 2, 308. Retrieved from https://doi.org/10.3389/fpsyg.2011.00308
Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: Emerging methods and principles. Trends in Cognitive Sciences 14, 277–290.


neural correlates of music and emotion   325

Brown, C. M., Hagoort, P., & Kutas, M. (2000). Postlexical integration processes in language comprehension: Evidence from brain-imaging research. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 881–895). Cambridge, MA: MIT Press.
*Brown, S., Martinez, M. J., & Parsons, L. M. (2004). Passive music listening spontaneously engages limbic and paralimbic systems. Neuroreport 15, 2033–2037.
Bryant, G. A., & Barrett, H. C. (2008). Vocal emotion recognition across disparate cultures. Journal of Cognition and Culture 8, 135–148.
*Bryden, M. P., Ley, R. G., & Sugarman, J. H. (1982). A left-ear advantage for identifying the emotional quality of tonal sequences. Neuropsychologia 20, 83–87.
Cabeza, R., & Nyberg, L. (2000). Imaging cognition II: An empirical review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience 12, 1–47.
Caplan, D., Alpert, N., & Waters, G. (1998). Effects of syntactic structure and propositional number on patterns of regional cerebral blood flow. Journal of Cognitive Neuroscience 10, 541–542.
*Caria, A., Venuti, P., & de Falco, S. (2011). Functional and dysfunctional brain circuits underlying emotional processing of music in autism spectrum disorders. Cerebral Cortex 21, 2838–2849.
*Chapin, H., Jantzen, K., Kelso, J. S., Steinberg, F., & Large, E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5, e13812.
Charlot, V., Tzourio, N., Zilbovicius, M., Mazoyer, B., & Denis, M. (1992). Different mental imagery abilities result in different regional cerebral blood flow activation patterns during cognitive tasks. Neuropsychologia 30, 565–580.
Chikazoe, J., Lee, D. H., Kriegeskorte, N., & Anderson, A. K. (2014). Population coding of affect across stimuli, modalities, and individuals. Nature Neuroscience 17, 1114–1122.
Clark-Polner, E., Wager, T. D., Satpute, A. B., & Barrett, L. F. (2016). Neural fingerprinting: Meta-analysis, variation, and the search for brain-based essences in the science of emotion. In L. F. Barrett, M. Lewis, & J. M. Haviland-Jones (Eds.), Handbook of emotions (4th ed., pp. 146–165). New York: Guilford Press.
Clynes, M. (1977). Sentics: The touch of emotions. New York: Doubleday.
Cui, X., Jeter, C. B., Yang, D., Montague, P. R., & Eagleman, D. M. (2007). Vividness of mental imagery: Individual variability can be measured objectively. Vision Research 47, 474–478.
*Daly, I., Malik, A., Hwang, F., Roesch, E., Weaver, J., Kirke, A., . . . Nasuto, S. J. (2014). Neural correlates of emotional responses to music: An EEG study. Neuroscience Letters 573, 52–57.
*Daly, I., Williams, D., Hallowell, J., Hwang, F., Kirke, A., Malik, A., . . . Nasuto, S. J. (2015). Music-induced emotions can be predicted from a combination of brain activity and acoustic features. Brain and Cognition 101, 1–11.
Damasio, A. (1994). Descartes' error: Emotion, reason, and the human brain. New York: Avon Books.
Damasio, A. R., Grabowski, T. J., Bechara, A., Damasio, H., Ponto, L. L. B., Parvizi, J., & Hichwa, R. D. (2000). Subcortical and cortical brain activity during the feeling of self-generated emotions. Nature Neuroscience 3, 1049–1056.
Davidson, R. J. (1995). Cerebral asymmetry, emotion, and affective style. In R. J. Davidson & K. Hugdahl (Eds.), Brain asymmetry (pp. 361–387). Cambridge, MA: MIT Press.
Davis, M. (1984). The mammalian startle response. In R. C. Eaton (Ed.), Neural mechanisms of startle behavior (pp. 287–342). New York: Plenum Press.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.


Dolcos, F., LaBar, K. S., & Cabeza, R. (2005). Remembering one year later: Role of the amygdala and the medial temporal lobe memory system in retrieving emotional memories. Proceedings of the National Academy of Sciences 102, 2626–2631.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. New York: Academic Press.
*Eldar, E., Ganor, O., Admon, R., Bleich, A., & Hendler, T. (2007). Feeling the real world: Limbic response to music depends on related content. Cerebral Cortex 17, 2828–2840.
*Escoffier, N., Zhong, J., Schirmer, A., & Qiu, A. (2013). Emotions in voice and music: Same code, same effect? Human Brain Mapping 34, 1796–1810.
Fanselow, M. S., & Poulos, A. M. (2005). The neuroscience of mammalian associative learning. Annual Review of Psychology 56, 207–234.
Farah, M. J. (2000). The neural bases of mental imagery. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 965–974). Cambridge, MA: MIT Press.
*Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N. A., & Schanberg, S. (1998). Music shifts frontal EEG in depressed adolescents. Adolescence 33, 109–116.
*Flores-Gutierrez, E. O., Diaz, J.-L., Barrios, F. A., Favila-Humara, R., Guevara, M. A., del Rio-Portilla, Y., & Corsi-Cabrera, M. (2007). Metabolic and electric brain patterns during pleasant and unpleasant emotions induced by music masterpieces. International Journal of Psychophysiology 65, 69–84.
*Flores-Gutiérrez, E. O., Díaz, J.-L., Barrios, F. A., Guevara, M. Á., del Río-Portilla, Y., & Corsi-Cabrera, M. (2009). Differential alpha coherence hemispheric patterns in men and women during pleasant and unpleasant musical emotions. International Journal of Psychophysiology 71, 43–49.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., . . . Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19, 1–4.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32, 1791–1802.
*Gagnon, L., & Peretz, I. (2000). Laterality effects in processing tonal and atonal melodies with affective and nonaffective task instructions. Brain & Cognition 43, 206–210.
Gärdenfors, P. (2003). How homo became sapiens: On the evolution of thinking. Oxford: Oxford University Press.
Gilboa, A. (2004). Autobiographical and episodic memory: One and the same? Evidence from prefrontal activation in neuroimaging studies. Neuropsychologia 42, 1336–1349.
Goldenberg, G., Podreka, I., Steiner, M., Franzén, P., & Deecke, L. (1991). Contributions of occipital and temporal brain regions to visual and acoustic imagery: A SPECT study. Neuropsychologia 29, 695–702.
*Gosselin, N., Peretz, I., Hasboun, D., Baulac, M., & Samson, S. (2011). Impaired recognition of musical emotions and facial expressions following anteromedial temporal lobe excision. Cortex 47, 1116–1125.
*Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia 45, 236–244.
*Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S. (2005). Impaired recognition of scary music following unilateral temporal lobe excision. Brain 128, 628–640.
*Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., . . . Peretz, I. (2006). Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex. Brain 129, 2585–2592.


*Goydke, K. N., Altenmüller, E., Möller, J., & Münte, T. (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research 21, 351–359.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54, 1231–1243.
*Green, A. C., Baerentsen, K., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., & Vuust, P. (2008). Music in minor activates limbic structure: A relationship with dissonance? Neuroreport 19, 711–715.
*Griffiths, T. D., Warren, J. D., Dean, J. L., & Howard, D. (2004). "When the feeling's gone": A selective loss of musical emotion. Journal of Neurology, Neurosurgery & Psychiatry 75, 344–345.
Haist, F., Gore, J. B., & Mao, H. (2001). Consolidation of human memory over decades revealed by functional magnetic resonance imaging. Nature Neuroscience 4, 1139–1145.
Harmon-Jones, E., Harmon-Jones, C., & Summerell, E. (2017). On the importance of both dimensional and discrete models of emotion. Behavioral Sciences 7, 66.
Harrer, G., & Harrer, H. (1977). Music, emotion, and autonomic function. In M. Critchley & R. A. Henson (Eds.), Music and the brain: Studies in the neurology of music (pp. 202–216). London: William Heinemann Medical Books.
Hodges, D., & Sebald, D. (2011). Music in the human experience: An introduction to music psychology. New York: Routledge.
*Hsieh, S., Hornberger, M., Piguet, O., & Hodges, J. R. (2012). Brain correlates of musical and facial emotion recognition: Evidence from the dementias. Neuropsychologia 50, 1814–1822.
Izard, C. E. (1977). The emotions. New York: Plenum Press.
*Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19, 2579–2594.
*Jeong, J.-W., Diwadkar, V. A., Chugani, C. D., Sinsoongsud, P., Muzik, O., Behen, M. E., . . . Chugani, D. C. (2011). Congruence of happy and sad emotion in music and faces modifies cortical audiovisual activation. NeuroImage 54, 2973–2982.
Johnsrude, I. S., Owen, A. M., White, N. M., Zhao, W. V., & Bohbot, V. (2000). Impaired preference conditioning after anterior temporal lobe resection in humans. Journal of Neuroscience 20, 2649–2656.
Juslin, P. N. (2001). Communicating emotion in music performance: A review and a theoretical framework. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 309–337). Oxford: Oxford University Press.
Juslin, P. N. (2011). Music and emotion: Seven questions, seven answers. In I. Deliège & J. Davidson (Eds.), Music and the mind (pp. 113–135). Oxford: Oxford University Press.
Juslin, P. N. (2013a). From everyday emotions to aesthetic emotions: Toward a unified theory of musical emotions. Physics of Life Reviews 10, 235–266.
Juslin, P. N. (2013b). What does music express? Basic emotions and beyond. Frontiers in Psychology: Emotion Science 4, 596.
Juslin, P. N. (2019). Musical emotions explained. Oxford: Oxford University Press.
Juslin, P. N., Barradas, G., & Eerola, T. (2015). From sound to significance: Exploring the mechanisms underlying emotional reactions to music. American Journal of Psychology 128, 281–304.
Juslin, P. N., Harmat, L., & Eerola, T. (2014). What makes music emotionally significant? Exploring the underlying mechanisms. Psychology of Music 42, 599–623.


Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin 129, 770–814.
Juslin, P. N., Liljeström, S., Västfjäll, D., Barradas, G., & Silva, A. (2008). An experience sampling study of emotional reactions to music: Listener, music, and situation. Emotion 8, 668–683.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: Oxford University Press.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2010). Handbook of music and emotion: Theory, research, applications. Oxford: Oxford University Press.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31, 559–575.
*Kamiyama, K. S., Abla, D., Iwanaga, K., & Okanoya, K. (2013). Interaction between musical emotion and facial expression as measured by event-related potentials. Neuropsychologia 51, 500–505.
Kassam, K. S., Markey, A. R., Cherkassky, V. L., Loewenstein, G., & Just, M. A. (2013). Identifying emotions on the basis of neural activation. PLoS ONE 8, e66032.
*Khalfa, S., Schon, D., Anton, J. L., & Liégeois-Chauvel, C. (2005). Brain regions involved in the recognition of happiness and sadness in music. Neuroreport 16, 1981–1984.
Kinomura, S., Larsson, J., Gulyás, B., & Roland, P. E. (1996). Activation by attention of the human reticular formation and thalamic intralaminar nuclei. Science 271, 512–515.
Kivy, P. (1990). Music alone: Reflections on a purely musical experience. Ithaca, NY: Cornell University Press.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15, 170–180.
*Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unexpected chord functions during music listening. Neuroreport 19, 1815–1819.
*Koelsch, S., Fritz, T., von Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping 27, 239–250.
*Koelsch, S., Kilches, S., Steinbeis, N., & Schelinski, S. (2008). Effects of unexpected chords and of performer's expression on brain responses and electrodermal activity. PLoS ONE 3, e2631.
*Koelsch, S., Remppis, A., Sammler, D., Jentschke, S., Mietchen, D., Fritz, T., . . . Siebel, W. A. (2007). A cardiac signature of emotionality. European Journal of Neuroscience 26, 3328–3338.
Koelsch, S., Siebel, W. A., & Fritz, T. (2010). Functional neuroimaging. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 313–344). Oxford: Oxford University Press.
*Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M. (2013). The roles of superficial amygdala and auditory cortex in music-evoked fear and joy. NeuroImage 81, 49–60.
Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13, e0190057.
Kreutz, G., & Lotze, M. (2007). Neuroscience of music and emotion. In W. Gruhn & F. Rauscher (Eds.), Neurosciences in music pedagogy (pp. 143–167). New York: Nova.
*Kreutz, G., Ott, U., & Wehrum, S. (2006). Cerebral correlates of musically-induced emotions: An fMRI-study. In M. Baroni et al. (Eds.), Proceedings of the Ninth International Conference on Music Perception and Cognition (ICMPC). Bologna, August 22–26.


Lane, R. D. (2000). Neural correlates of conscious emotional experience. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion (pp. 345–370). Oxford: Oxford University Press.
Langer, S. K. (1957). Philosophy in a new key. Cambridge, MA: Harvard University Press.
LeDoux, J. E. (2000). Cognitive-emotional interactions: Listen to the brain. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion (pp. 129–155). Oxford: Oxford University Press.
*Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9, 1515–1523.
*Lerner, Y., Papo, D., Zhdanov, A., Belozersky, L., & Hendler, T. (2009). Eyes wide shut: Amygdala mediates eyes-closed effect on emotional experience with music. PLoS ONE 4, e6230.
*Liégeois-Chauvel, C., Bénar, C., Krieg, J., Delbé, C., Chauvel, P., Giusiano, B., & Bigand, E. (2014). How functional coupling between the auditory cortex and the amygdala induces musical emotion: A single case study. Cortex 60, 82–93.
*Lin, Y.-P., Wang, C.-H., Jung, T.-P., Wu, T.-L., Jeng, S.-K., Duann, J.-R., & Chen, J.-H. (2010). EEG-based emotion recognition in music listening. IEEE Transactions on Biomedical Engineering 57, 1798–1806.
Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis of emotion: A meta-analytic review. Behavioral and Brain Sciences 35, 121–143.
*Logeswaran, N., & Bhattacharya, J. (2009). Crossmodal transfer of emotion by music. Neuroscience Letters 455, 129–133.
Lundqvist, L.-O., Carlsson, F., Hilmersson, P., & Juslin, P. N. (2009). Emotional responses to music: Experience, expression, and physiology. Psychology of Music 37, 61–90.
MacDonald, R., Kreutz, G., & Mitchell, L. (Eds.). (2012). Music, health, and well-being. Oxford: Oxford University Press.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca's area: A MEG study. Nature Neuroscience 4, 540–545.
*Matthews, B. R., Chang, C.-C., De May, M., Engstrom, J., & Miller, B. L. (2009). Pleasurable emotional response to music: A case of neurodegenerative generalized auditory agnosia. Neurocase 15, 248–259.
*Mazzoni, M., Moretti, P., Pardossi, L., Vista, M., & Muratorio, A. (1993). A case of music imperception. Journal of Neurology, Neurosurgery & Psychiatry 56, 322–324.
*Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28, 175–184.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Miserendino, M. J. D., Sananes, C. B., Melia, K. R., & Davis, M. (1990). Blocking of acquisition but not expression of conditioned fear-potentiated startle by NMDA antagonists in the amygdala. Nature 345, 716–718.
*Mitterschiffthaler, M. T., Fu, C. H., Dalton, J. A., Andrew, C. M., & Williams, S. C. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping 28, 1150–1162.
*Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-related emotional contents. Neuroreport 18, 1651–1655.
*Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., . . . Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79.


Murphy, F. C., Nimmo-Smith, I., & Lawrence, A. D. (2003). Functional neuroanatomy of emotions: A meta-analysis. Cognitive, Affective, & Behavioral Neuroscience 3, 207–233.
*Nair, D. G., Large, E. W., Steinberg, F., & Kelso, J. A. S. (2002). Perceiving emotion in expressive piano performance: A functional MRI study. In K. Stevens et al. (Eds.), Proceedings of the 7th International Conference on Music Perception and Cognition, July 2002 (CD-ROM). Adelaide, Australia: Causal Productions.
Nyberg, L., McIntosh, A. R., Houle, S., Nilsson, L.-G., & Tulving, E. (1996). Activation of medial-temporal structures during episodic memory retrieval. Nature 380, 715–717.
*Omar, R., Hailstone, J. C., Warren, J. E., Crutch, S. J., & Warren, J. D. (2010). The cognitive organization of music knowledge: A clinical analysis. Brain 133, 1200–1213.
*Omar, R., Henley, S., Bartlett, J. W., Hailstone, J. C., Gordon, E., Sauter, D. A., . . . Warren, J. D. (2011). The structural neuroanatomy of music emotion recognition: Evidence from frontotemporal lobar degeneration. NeuroImage 56, 1814–1821.
Osborne, J. W. (1980). The mapping of thoughts, emotions, sensations, and images as responses to music. Journal of Mental Imagery 5, 133–136.
*Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Paquette, S., Takerkart, S., Saget, S., Peretz, I., & Belin, P. (2018). Cross-classification of musical and vocal emotions in the auditory cortex. Annals of the New York Academy of Sciences 1423, 329–337.
Pascual-Leone, A., Davey, N. J., Rothwell, J., Wassermann, E. M., & Puri, B. K. (Eds.). (2002). Handbook of transcranial magnetic stimulation. Oxford: Oxford University Press.
Paulmann, S., Ott, D. V. M., & Kotz, S. A. (2011). Emotional speech perception unfolding in time: The role of the basal ganglia. PLoS ONE 6, e17694.
*Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., . . . Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107, 4758–4763.
*Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and emotions in the brain: Familiarity matters. PLoS ONE 6, e27241.
Peretz, I. (2001). Listen to the brain: A biological perspective on musical emotions. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 105–134). Oxford: Oxford University Press.
Peretz, I. (2010). Towards a neurobiology of musical emotions. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 99–126). Oxford: Oxford University Press.
*Peretz, I., & Gagnon, L. (1999). Dissociation between recognition and emotional judgment for melodies. Neurocase 5, 21–30.
*Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants, immediacy, and isolation after brain damage. Cognition 68, 111–141.
Pessoa, L. (2013). The cognitive-emotional brain: From interactions to integration. Cambridge, MA: MIT Press.
*Petrini, K., Crabbe, F., Sheridan, C., & Pollick, F. E. (2011). The music of your emotions: Neural substrates involved in detection of emotional correspondence between auditory and visual music actions. PLoS ONE 6, e19165.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences 10, 59–63.


Rossignol, S., & Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and its possible role on rhythmic movements synchronized to sound. Electroencephalography and Clinical Neurophysiology 41, 83–92.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology 39, 1161–1178.
Saarimäki, H., Gotsopoulos, A., Jaaskelainen, I. P., Lampinen, J., Vuilleumier, P., Hari, R., . . . Nummenmaa, L. (2016). Discrete neural signatures of basic emotions. Cerebral Cortex 26, 2563–2573.
Sacchetti, B., Scelfo, B., & Strata, P. (2005). The cerebellum: Synaptic changes and fear conditioning. The Neuroscientist 11, 217–227.
*Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14, 257–262.
*Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340, 216–219.
*Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electrophysiological correlates of the processing of pleasant and unpleasant music. Psychophysiology 44, 293–304.
*Satoh, M., Nakase, T., Nagata, K., & Tomimoto, H. (2011). Musical anhedonia: Selective loss of emotional experience in listening to music. Neurocase 17, 410–417.
Scherer, K. R. (1999). Appraisal theories. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 637–663). Chichester: Wiley.
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford University Press.
Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences 10, 24–30.
*Schmidt, B., & Hanslmayr, S. (2009). Resting frontal EEG alpha-asymmetry predicts the evaluation of affective musical stimuli. Neuroscience Letters 460, 237–240.
*Schmidt, L. A., & Trainor, L. J. (2001). Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition & Emotion 15, 487–500.
*Schmidt, L. A., Trainor, L. J., & Santesso, D. L. (2003). Development of frontal encephalogram (EEG) and heart rate (ECG) responses to affective musical stimuli during the first 12 months of post-natal life. Brain and Cognition 52, 27–32.
*Shahabi, H., & Moghimi, S. (2016). Toward automatic detection of brain responses to emotional music through analysis of EEG effective connectivity. Computers in Human Behavior 58, 231–239.
*Singer, N., Jacoby, N., Lin, T., Raz, G., Shpigelman, L., Gilam, G., . . . Hendler, T. (2016). Common modulation of limbic network activation underlies musical emotions as they unfold. NeuroImage 141, 517–529.
Sloboda, J. A., & Juslin, P. N. (2001). Psychological perspectives on music and emotion. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 71–104). Oxford: Oxford University Press.
Spitzer, M. (2013). Sad flowers: Affective trajectory in Schubert's Trockne Blumen. In T. Cochrane, B. Fantini, & K. R. Scherer (Eds.), The emotional power of music (pp. 7–21). Oxford: Oxford University Press.


*Spreckelmeyer, K. N., Altenmüller, E., Colonius, H., & Münte, T. F. (2013). Preattentive processing of emotional musical tones: A multidimensional scaling and ERP study. Frontiers in Psychology 4, 656. doi:10.3389/fpsyg.2013.00656
*Spreckelmeyer, K. N., Kutas, M., Urbach, T. P., Altenmüller, E., & Münte, T. F. (2006). Combined perception of emotion in pictures and musical sounds. Brain Research 1070, 160–170.
*Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19, 619–623.
*Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of musical structure in emotion: Investigating neural, physiological, and subjective emotional responses to harmonic expectancy violations. Journal of Cognitive Neuroscience 18, 1380–1393.
Stromswold, K., Caplan, D., Alpert, N., & Rauch, S. (1996). Localization of syntactic comprehension by positron emission tomography. Brain and Language 52, 452–473.
*Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., . . . Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8, 126–131.
Thaut, M. H., & Wheeler, B. L. (2010). Music therapy. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 819–848). Oxford: Oxford University Press.
Thornton-Wells, T. A., Cannistraci, C. J., Anderson, A. W., Kim, C. Y., Eapen, M., Gore, J. C., . . . Dykens, E. M. (2010). Auditory attraction: Activation of visual cortex by music and sound in Williams syndrome. American Journal on Intellectual and Developmental Disabilities 115, 172–189.
Tierney, A., & Kraus, N. (2013). The ability to move to a beat is linked to the consistency of neural responses to sound. Journal of Neuroscience 33, 14981–14988.
*Trost, W., Ethofer, T., Zentner, M. R., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22, 2769–2783.
Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014). Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage 103, 55–64.
*Tsang, C. D., Trainor, L. J., Santesso, D. L., Tasker, S. L., & Schmidt, L. A. (2001). Frontal EEG responses as a function of affective musical features. Annals of the New York Academy of Sciences 930, 439–442.
Wager, T. D., Barrett, L. F., Bliss-Moreau, E., Lindquist, K. A., Duncan, S., Kober, H., & Mize, J. (2008). The neuroimaging of emotion. In M. Lewis, J. M. Haviland-Jones, & L. F. Barrett (Eds.), Handbook of emotions (3rd ed., pp. 249–267). New York: Guilford Press.
Wagner, A. D., Shannon, B. J., Kahn, I., & Buckner, R. L. (2005). Parietal lobe contributions to episodic memory retrieval. Trends in Cognitive Sciences 9, 445–453.
Warren, J. E., Sauter, D. A., Eisner, F., Wiland, J., Dresner, M. A., Wise, R. J., . . . Scott, S. K. (2006). Positive emotions preferentially engage an auditory-motor "mirror" system. Journal of Neuroscience 26, 13067–13075.
Zentner, M. R. (2010). Homer's prophecy: An essay on music's primary emotions. Music Analysis 29, 102–125.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

Chapter 14

Neurochemical Responses to Music

Yuko Koshimori

Introduction

Music is used in various settings in everyday life and can modulate our mood, emotion, arousal, motivation, and movement. These music-induced effects can be objectively assessed using neuroimaging techniques and peripheral biomarkers. For example, functional neuroimaging studies have demonstrated that listening to music alters activity in various brain regions in the mesocorticolimbic pathways, such as the anterior cingulate cortex, orbitofrontal cortex, insula, amygdala, hippocampus, and ventral striatum, which are implicated in reward, motivation, and emotional behaviors, as well as in regions in the motor pathways, such as the premotor and supplementary motor areas, thalamus, basal ganglia, and cerebellum. These neuroimaging studies reveal important anatomical information that also allows us to infer which brain functions music can modulate. However, different neuroreceptors are expressed within the same anatomical region. Knowledge of neurochemical functions can uncover more specific effects of music on various brain functions and help us better understand the effects of music on brain pathology. Neurochemical functions can be measured using positron emission tomography (PET). PET is a nuclear medicine imaging technique that quantifies the chemical/biological processes of molecules in vivo by injecting radiolabeled molecules (i.e., radioligands; Venneti, Lopresti, & Wiley, 2013). Radioligands typically resemble endogenous biological molecules and bind specific biological targets, which allows for mapping the distribution of those molecules in the brain. A radioligand is synthesized by labeling a precursor molecule with a short-lived radionuclide such as carbon-11 (t1/2 = 20.4 min) or fluorine-18 (t1/2 = 109.8 min). After the radioligand is injected intravenously, it enters the bloodstream, crosses the blood–brain barrier, and binds target receptors or proteins in the brain.
The radioligands can be agonists that induce downstream signaling in a manner similar to the endogenous molecules, or antagonists that block the receptor and prevent it from being available to the endogenous molecules (Gunn, Slifstein, Searle, & Price, 2015). Multiple radioligands have been developed for different targets, such as dopamine (DA), serotonin (5-HT), norepinephrine (NE), opioid, acetylcholine (ACh), and others (Gunn et al., 2015). PET imaging employing these radioligands allows for uncovering music-induced neurotransmissions that bind target neuroreceptors. On the other hand, major limitations of PET imaging are cost, invasiveness, and limited accessibility. Because of these limitations, there have been few studies using PET imaging to investigate music-induced neurochemical changes. Accordingly, the research findings discussed in this chapter are primarily based on the molecular concentrations or secretion rates of peripheral biomarkers in blood (e.g., plasma and platelets), saliva, and urine. It should be noted that some central and peripheral chemicals serve different functions (e.g., norepinephrine), and whether some peripheral measures reflect the central measures is debatable (e.g., oxytocin), as discussed later in this chapter. This chapter covers neurotransmitters including DA, 5-HT, NE, and ACh; neuropeptides such as beta (β)-endorphin, oxytocin (OT), and arginine vasopressin (AV), as well as their receptors and associated genes; steroid hormones such as cortisol; and peripheral immune biomarkers.
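To give a concrete sense of the half-lives quoted above, the fraction of radioligand activity remaining after a given time follows standard exponential decay, N(t)/N0 = 2^(−t/t½). The sketch below is purely illustrative (the 60-minute scan window is a hypothetical value, not part of any cited protocol):

```python
def remaining_fraction(minutes_elapsed: float, half_life_min: float) -> float:
    """Fraction of initial radioactivity remaining after `minutes_elapsed`,
    by standard exponential decay: N(t)/N0 = 2 ** (-t / t_half)."""
    return 2.0 ** (-minutes_elapsed / half_life_min)

# Half-lives quoted in the chapter
C11_HALF_LIFE_MIN = 20.4    # carbon-11
F18_HALF_LIFE_MIN = 109.8   # fluorine-18

# Activity remaining after a hypothetical 60-minute scan window
print(round(remaining_fraction(60, C11_HALF_LIFE_MIN), 2))  # ≈ 0.13
print(round(remaining_fraction(60, F18_HALF_LIFE_MIN), 2))  # ≈ 0.68
```

The calculation makes plain why carbon-11 radioligands must be synthesized on site and used within roughly an hour, whereas fluorine-18 tracers leave a more practical working window.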

Dopamine Systems

Dopamine (DA) is synthesized in the cytosol of catecholaminergic neurons in the ventral tegmental area (VTA) and in the substantia nigra pars compacta (SNpc) of the brain. The VTA DA neurons send projections to the ventral striatum/nucleus accumbens (NAc), amygdala, and hippocampus, as well as to the medial prefrontal cortex, including the orbitofrontal and anterior cingulate cortices, whereas the SNpc DA neurons send projections to the dorsal striatum (i.e., putamen and caudate nucleus). These DA pathways are commonly referred to as the mesocorticolimbic and nigrostriatal dopamine pathways, respectively. The former pathway is associated with emotional/motivational functions whereas the latter is more involved in executive/cognitive and sensorimotor functions (Solís & Moratalla, 2018). There are five dopamine receptor subtypes, which are classified based on their functional properties and subdivided into D1-like and D2-like families. The D1-like family consists of D1 and D5 receptors and the D2-like family consists of D2, D3, and D4 receptors. DA is also synthesized in the adrenal medulla and acts as a hormone along with other catecholamines (see section “Norepinephrine Systems”). Numerous functional neuroimaging studies have demonstrated that music alters brain activity in the DA pathways associated with reward/motivation (Menon & Levitin, 2005; Salimpoor, van den Bosch, Kovacevic, McIntosh, & Dagher, 2013), emotion/pleasure (Koelsch, 2014; Mueller et al., 2015; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011), as well as motor functions (Grahn & Rowe, 2009). However, to date there has been only one study that investigated dopaminergic transmission in the ventral
and dorsal striatum associated with musical pleasure (Salimpoor et al., 2011). This study employed PET with [11C]raclopride, a D2/D3 receptor antagonist, and found that greater DA release occurred in the bilateral dorsal and ventral striatum, most notably in the right caudate nucleus and the right NAc, during listening to pleasurable music selected by the participants compared to neutral music. Furthermore, greater DA release in the right caudate nucleus was associated with a greater number of peak pleasure experiences or “chills,” whereas greater DA release in the right NAc was associated with more intense chills. These anatomically distinct roles of the striatal subregions in music listening were further elucidated by analyzing their temporal activation using functional magnetic resonance imaging (fMRI). Increased activity in the right caudate nucleus occurred several seconds prior to the pleasurable peak, while enhanced activity in the right NAc occurred during the pleasurable moments. The authors interpreted these findings as indicating that the former structure is involved in the anticipation and prediction of pleasure and the latter in experiencing pleasure. This study demonstrated that musical pleasure is associated with DA release in the ventral striatum, particularly in the NAc. However, as individuals who regularly experience “chills” during music listening were selected to participate, further research is needed to investigate whether DA is also released during listening to pleasurable music in those who have never experienced “chills.” In addition to musical pleasure, the role of DA in music perception and auditory-motor entrainment has been investigated in Parkinson’s disease (PD), which is primarily characterized by loss of dopaminergic neurons in the SNpc, resulting in depletion of dopaminergic input to the dorsal striatum.
The involvement of DA in these functions was investigated when people with PD were on and off dopaminergic medication. One study showed that dopaminergic medication improved music perception (Cameron, Pickett, Earhart, & Grahn, 2016). However, it is unknown whether the improvement was due to a practice effect, which was also observed in the healthy participants, or to the medication. Another study using PET with [11C]-DTBZ, which binds the vesicular monoamine transporter 2 (VMAT2), did not find a strong association between dopaminergic denervation and auditory-motor task performance (Miller et al., 2013). However, when the people with PD were grouped based on the similarity of dopaminergic denervation, auditory-motor synchronization accuracy paralleled the pattern of denervation. On the other hand, two fMRI studies did not implicate DA in auditory-motor entrainment (Elsinger et al., 2003; Jahanshahi et al., 2010). In addition to these central dopaminergic functions, dopamine-associated gene expression and peripheral dopaminergic levels have also been investigated. The expression of alpha-synuclein (SNCA), which maintains DA neuronal homeostasis (Murphy, Rueter, Trojanowski, & Lee, 2000; Oczkowska, Kozubski, Lianeri, & Dorszewska, 2014), was upregulated in professional musicians after a two-hour concert performance compared to after a music-free control condition (Kanduri, Kuusi, et al., 2015), as well as in listeners with more than 10 years of musical experience and those with high musical aptitude after listening to 20 minutes of classical music (Kanduri, Raijas, et al., 2015). The latter study also reported that the upregulation was absent after a music-free control condition or in
listeners with no significant musical experience (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). A few studies investigated dopaminergic levels in peripheral samples, as well as associated psychological measures, using an auditory stimulus or music. Two studies reported decreased dopaminergic levels: one in urine samples following daily listening to binaural beats for 60 days in healthy adults, who also reported decreased trait anxiety (Wahbeh, Calabrese, & Zwickey, 2007), and the other in plasma samples following a 12-week dance movement therapy (DMT) that combines music, light exercise, and sensory stimulation in female adolescents with mild depression, whose psychological distress was also reduced (Jeong et al., 2005). One study reported no change in the plasma dopaminergic level following listening to music (high-uplifting or low-uplifting music) in healthy adults who had performed a stress-inducing task (Hirokawa & Ohira, 2003). However, it is unknown whether the stress-inducing task affected the plasma dopaminergic level before listening to music. To summarize, there is some evidence that music enhances dopaminergic function. Listening to pleasurable music induces dopamine release, and music upregulates SNCA expression, which may facilitate dopaminergic neurotransmission. However, these responses may occur only in specific groups (i.e., those with extensive musical training or those who regularly experience “chills” when listening to pleasurable music). Further studies are needed to investigate these effects in individuals with varying levels of music education/training and different listening habits and experiences, as well as using different genres of music. Music may also be able to reduce peripheral DA levels and psychological disturbances. Future studies including both clinical and healthy control participants are needed to clarify these effects.
In addition, some PD studies suggest a role for DA in music perception and auditory cuing. Further pharmacological studies in people with PD are needed to address the limitations of previous work.

Endogenous Opioid Systems

Central endogenous opioid systems (EOS) comprise three opioid peptides, β-endorphin, enkephalins, and dynorphin, and their three receptors, mu (μ), delta (δ), and kappa (κ) (Benarroch, 2012). Neurons containing β-endorphin are localized in two main areas, the arcuate nucleus of the hypothalamus and the nucleus tractus solitarius of the brainstem, which send widespread projections to the rest of the brain. Enkephalin and dynorphin are primarily located in local neurons. Opioid receptors are widely but differentially expressed throughout the central nervous system (CNS): the μ receptor is the most abundant receptor in the cerebral cortex, amygdala, thalamus, brainstem, dorsal horn, and dorsal root ganglion (DRG) neurons; the δ receptor is mostly expressed in the olfactory system, striatum, limbic cortex, dorsal horn, and DRG neurons; and the κ receptor in the claustrum, striatum, hippocampus, hypothalamus, brainstem, and dorsal
horn. The EOS is involved in various functions such as reward, pain modulation, stress responses, and autonomic control. In the previous section, the study by Salimpoor et al. (2011) showed that musical pleasure was associated with DA release in the ventral striatum. However, the central EOS likely plays a primary role in the positive affect or pleasure induced by music, acting synergistically with DA (Chanda & Levitin, 2013). Both endogenous and exogenous opioids can activate DA neurons in the VTA (Hjelmstad, Xia, Margolis, & Fields, 2013), which innervates the NAc. Both the EOS and dopamine systems are involved in reward mechanisms, which consist of liking/pleasure (core reactions to the hedonic experience of receiving a reward), wanting (the motivational aspect), and learning (associations and cognitive representations) (Berridge & Kringelbach, 2015), but animal research favors the notion that the EOS, not the DA system, generates pleasure. Specifically, stimulation of μ opioid receptors in the rostrodorsal part of the medial shell of the NAc, the posterior ventral pallidum, as well as the anterior orbitofrontal cortex and posterior insula enhances pleasure (Castro & Berridge, 2017). In fact, two studies reported that blocking opioid receptors attenuated musical thrills (Goldstein, 1980) and pleasure (Mallik, Chanda, & Levitin, 2017) in response to participant-selected music. In addition to the role of the central EOS in musical pleasure, several studies investigated plasma levels of β-endorphin, which is released from the anterior pituitary gland (see also the section “Neuroendocrine Systems II”), acts as a hormone associated with stress (Kreutz, Murcia, & Bongard, 2012), and therefore functions differently from β-endorphin in the CNS (Veening, Gerrits, & Barendregt, 2012).
Listening to techno music increased the plasma β-endorphin level, accompanied by increases in other psychophysiological measures as well as changes in emotional states, whereas listening to classical music did not affect them (Gerra et al., 1998). Interestingly, this study also showed an association between β-endorphin responses to music and personality traits: higher β-endorphin responses were associated with lower novelty seeking. In contrast to these enhanced EOS responses, a decrease in the plasma concentration of β-endorphin was reported in response to experimenter-selected relaxation music, accompanied by a reduction in worries, fear, and blood pressure in coronary patients (Vollert, Störk, Rose, & Möckel, 2003), as well as after a single one-hour singing session in choirs affected by cancer, including carers, bereaved carers, and patients (Fancourt, Williamon, et al., 2016). In this latter study, the β-endorphin level showed negative correlations with the levels of immune biomarkers and a positive correlation with another stress biomarker. Another study reported a decrease in response to experimenter-selected classical music combined with imagery, but not to music only or imagery only (McKinney, Tims, Kumar, & Kumar, 1997). In summary, musical pleasure is associated with the central EOS, and music induces changes in the plasma concentration of β-endorphin. As the EOS plays an important role in various functions and has two different functional systems (i.e., central and peripheral), more research is needed to replicate and extend the existing literature. Suggested future studies include PET studies that investigate both the EOS and dopamine
systems associated with musical pleasure/reward, studies that investigate the effects of different genres of music and personality traits on the release of β-endorphin, and studies that investigate the effects of music on the EOS associated with pain modulation (i.e., μ opioid receptors in the central pain network; Benarroch, 2012) and stress regulation, including both healthy participants and those with pain and stress. When cerebrospinal fluid (CSF) or peripheral β-endorphin levels are assessed, diurnal fluctuations should be taken into account. Moreover, it should be noted that plasma β-endorphin responses reflect those in the CSF only weakly, although they are not entirely independent (Veening et al., 2012).
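Where repeated peripheral samples are collected across a session, one common way such hormone time courses are summarized is the trapezoidal area under the curve. The sketch below is a hypothetical illustration only; the sampling times and values are invented, not drawn from any study cited in this chapter:

```python
def auc_ground(times_min, levels):
    """Area under the curve with respect to ground (trapezoidal rule),
    a common summary for repeated hormone samples across a session."""
    if len(times_min) != len(levels) or len(times_min) < 2:
        raise ValueError("need matching time/level series of length >= 2")
    total = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]          # interval width (min)
        total += dt * (levels[i] + levels[i - 1]) / 2  # trapezoid area
    return total

# Hypothetical samples: time in minutes, hormone level in pg/mL
times = [0, 30, 60]
levels = [10.0, 14.0, 8.0]
print(auc_ground(times, levels))  # 690.0
```

A single AUC value per participant sidesteps some of the sampling-time issues raised above, though it still cannot separate central from peripheral release.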

Serotonin Systems

Serotonin (5-HT) is synthesized in the raphe nuclei of the brainstem. Some 5-HT neurons project to the dorsal cochlear nucleus (DCN) and others send ascending projections to the inferior colliculus (IC), in which auditory neurons express multiple subtypes of 5-HT receptors (Hurley & Sullivan, 2012). A few studies demonstrated that pharmacological stimulation of 5-HT receptors altered auditory perception and thereby subjective feelings in healthy participants. Specifically, a 5-HT receptor 2A (5-HT2A) agonist altered the neural response to both participant-selected, personally meaningful music and experimenter-selected non-meaningful music (Barrett, Preller, Herdener, Janata, & Vollenweider, 2017), enhanced the emotion induced by experimenter-selected music (Kaelen et al., 2015), and enhanced subjective experiences (mental imagery) accompanied by greater brain connectivity during listening to experimenter-selected music (Kaelen et al., 2016). These studies suggest that variance in neuroreceptor expression may play a role in subjective musical experiences. Several studies investigated genetic associations between 5-HT systems and musical ability/behavior using conventional genetic approaches such as genome-wide linkage scans, association studies, copy number variation studies, and candidate gene association studies. However, the associations are weak and inconclusive. In a small sample, musical traits were associated with the protocadherin-alpha gene (Ukkola-Vuoti et al., 2013), which is important for the maturation of serotonergic projections (Katori et al., 2009), and with the galactose mutarotase gene, which plays a role in 5-HT release and in membrane trafficking of the 5-HT transporter (Djurovic et al., 2009).
The serotonin transporter gene (SLC6A4), which regulates 5-HT supply to the receptors, has been associated with musical memory (Granot et al., 2007) and choir participation (Morley et al., 2012), whereas it showed weak associations with musical aptitude (Ukkola, Onkamo, Raijas, Karma, & Järvelä, 2009) and no association with active music listening (Ukkola-Vuoti et al., 2011). Serotonin is also implicated in behavioral states such as stress and emotional behavior (Hurley & Sullivan, 2012), as well as in various psychiatric and neurologic disorders such as depression, anxiety disorders, obsessive-compulsive disorder, dementia, and post-traumatic stress disorder (Bandelow et al., 2017). One study measured the platelet
content of 5-HT as a model of neural biochemistry and reported a decrease in response to experimenter-selected unpleasant music compared to pleasant music, suggesting that unpleasant music induces emotional stress or negative emotions, which lead to 5-HT release and decreased intracellular 5-HT content of serotonergic neurons, as reflected by the 5-HT content of platelets (Evers & Suhr, 2000). Another study reported that the plasma serotonin concentration increased in female adolescents with mild depression who underwent DMT, whereas it decreased in those who had no intervention (Jeong et al., 2005). However, the 5-HT levels did not significantly differ between these two groups after the experimental session. Two further studies reported no 5-HT changes following music interventions (Kumar et al., 1999; Wahbeh et al., 2007). In summary, the literature shows weak evidence for associations between 5-HT systems and music. However, music modulates activity in brain regions associated with emotion (Koelsch, 2014), and musical activities can influence social behavior and interaction. Similarly, the 5-HT systems play an important role in emotional behavior and social interaction (Hurley & Sullivan, 2012). In addition, 5-HT closely interacts with the neuropeptides oxytocin and arginine vasopressin, which are implicated in social behavior and social reward (Albers, 2015; Dölen, Darvishzadeh, Huang, & Malenka, 2013) and have been associated with social aspects of music and musical activities. Therefore, more studies are needed to fully understand the relationships between central 5-HT systems and music.

Neuroendocrine Systems I (Posterior Pituitary)

The two neuropeptides released from the posterior pituitary are oxytocin (OT) and arginine vasopressin (AV). They are highly conserved across species (Johnson & Young, 2017) and modulate social behaviors (Bachner-Melman & Ebstein, 2014), including social cognition (Donaldson & Young, 2008) and social affiliation (Insel, 2010), as well as reproductive behaviors. They are also implicated in psychiatric disorders such as autism spectrum disorder (Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008). OT and AV are collectively called nonapeptides because they are composed of nine amino acid residues (Acher & Chauvet, 1995). They are predominantly synthesized in the magnocellular neurons of the hypothalamic supraoptic and paraventricular nuclei and released centrally as well as peripherally into the circulation through the posterior pituitary (Johnson & Young, 2017), thereby acting as neuromodulators or neurohormones (Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008). Although several nonapeptide receptors have been identified in the brain, the OT receptor (OTR), vasopressin receptor 1a (V1aR), and vasopressin receptor 1b (V1bR) have been the major focus of investigation. These nonapeptide receptors are expressed throughout the auditory and mesolimbic pathways (Johnson & Young, 2017).


Oxytocin

Several studies investigated peripheral OT responses to musical activities. A single 30-minute singing lesson increased the serum OT level compared to the baseline level in both professional and amateur singers (Grape, Sandgren, Hansson, Ericson, & Theorell, 2003). Compared to a chatting group, a singing group showed an increase in the salivary OT level and improved psychological well-being (Kreutz et al., 2012). In another study, the plasma OT level increased in a small sample of four singers after improvised singing, but did not change after pre-composed singing (Keeler et al., 2015). Furthermore, a group of boys with mild emotional disturbance aged between 8 and 12 years showed an increased level of salivary OT in a free session of group drumming compared to a practice session, which was not observed in girls of the same age or in an older group of boys (Yuhi et al., 2017). In contrast to these findings, two studies reported that group singing reduced OT levels. One study found a decrease in the salivary OT level after choir singing (Schladt et al., 2017); this change was not observed after solo singing in the same participants, after which the OT level instead increased. Another study reported that singing in a single 70-minute choir rehearsal was associated with a decrease in the salivary OT level across three populations affected by cancer (Fancourt, Williamon, et al., 2016). In addition to these studies of musical activities, the effect of passive listening was investigated in two studies. An elevated plasma OT level was reported in cardiac surgery patients who listened passively to experimenter-selected “soothing” music (soft and relaxing, 60–80 beats per minute, at a volume of 50–60 dB) for 30 minutes one day after surgery, but not in those who rested without listening to music (Nilsson, 2009).
An elevated plasma OT level was also observed in participants with Williams syndrome (WS) who listened to their favorite music, which elicited positive emotions (Dai et al., 2012).

Arginine Vasopressin

The arginine vasopressin receptor 1A gene (AVPR1A) is one of the main genes that have been associated with musical activities and related behaviors in genome-wide linkage and association studies (Bachner-Melman et al., 2005; Granot et al., 2007; Mariath et al., 2017; Ukkola et al., 2009; Ukkola-Vuoti et al., 2011). AVPR1A microsatellites have been associated with musical working memory (Granot et al., 2007; but see Granot, Uzefovsky, Bogopolsky, & Ebstein, 2013), musical aptitude (Ukkola et al., 2009), active music listening (Ukkola-Vuoti et al., 2011), and a wide range of musical abilities (e.g., abilities associated with tempo, rhythm, dynamics, vocality, and pitch, as well as creativity and the development of musical ideas and accompaniment) (Mariath et al., 2017). Apart from these genetic studies, the relationships between AV and music have been little explored. The only study that has measured the AV level is the study by Dai et al. (2012) mentioned above, which also found an increase in the AV level in participants with WS.


In summary, music induces peripheral OT responses and there is some genetic association between AVPR1A and music. However, the direction of change is not consistent among the OT studies. Elevated OT levels are generally implicated in positive social experiences (Chanda & Levitin, 2013). However, OT is also released in response to various kinds of stress (Brown, Cardoso, & Ellenbogen, 2016; de Jong et al., 2015; Pierrehumbert et al., 2010). The reduction in the OT level may reflect lower arousal and stress during choir singing (Schladt et al., 2017). Taking blood samples may itself cause stress and increase the OT level in some participants at the baseline measurement, confounding the findings. Alternatively, the inconsistent findings may partly derive from different sampling methods: some studies measured OT levels in plasma and others measured salivary OT. These peripheral levels are used as a proxy for central OT levels. However, there are no strong correlations between central (CSF) and peripheral measures, or between the peripheral measures themselves (Carson et al., 2015; Hoffman, Brownley, Hamer, & Bulik, 2012; Javor et al., 2014; Lefevre et al., 2017; Valstad et al., 2017). Other possible factors explaining the inconsistencies include the influence of gonadal steroids (Insel, 2010) and subjective experiences of the musical activities employed in the study (Yuhi et al., 2017). A baseline measurement in a healthy control group is also needed to understand directional changes and interactions when clinical populations are studied. To date, there has been only one study that investigated the effects of music on both neuropeptides in a clinical population, and none in healthy participants.
For future studies measuring both neuropeptides, it should be noted that OT and AV show similar directional changes for some social behaviors such as pair bonding (Caldwell, 2017), whereas they show different effects in other cases, and opposite effects for aggression (Ferris, 1992; MacLean et al., 2017), anxiety and stress (Bachner-Melman & Ebstein, 2014; Heinrichs, von Dawans, & Domes, 2009), and social approach (Thompson & Walton, 2004). In fact, several electrophysiological experiments have revealed their differential regulation of excitatory projections in the limbic system (Campbell-Smith, Holmes, Lingawi, Panayi, & Westbrook, 2015; Huber, Veinante, & Stoop, 2005; Lubin, Elliot, Black, & Johns, 2003; Numan et al., 2010). Furthermore, animal research suggests that the neuroanatomical distribution of their receptors may be critical for determining function.

Neuroendocrine Systems II (Anterior Pituitary)

A neuroendocrine system commonly referred to as the hypothalamic-pituitary-adrenal (HPA) axis releases cortisol as its main effector hormone (Spencer, Chun, Hartsock, & Woodruff, 2018). Cortisol plays an important role in circadian and stress regulation. Basal cortisol levels fluctuate in a circadian fashion in the absence of stressors, and the levels rise in response to acute physical or psychological stressors as well as to circadian
entrainment. Circadian and stress-induced cortisol secretion is driven by the neurohormone corticotropin-releasing factor (CRF), produced in and secreted from the medial paraventricular nucleus of the hypothalamus. In response to CRF, the anterior pituitary produces and secretes adrenocorticotropic hormone (ACTH) and β-endorphin (see also the section “Endogenous Opioid Systems”). Triggered by ACTH, cortisol is synthesized in the adrenal cortex. It passively diffuses into the adrenal vein and is carried throughout the circulatory system. In addition to CRF, vasopressin (AVP) is also involved in the secretion process. The cortisol levels discussed in this section are salivary cortisol unless mentioned otherwise; salivary cortisol is a valid and reliable measure of the unbound hormone in blood (Kirschbaum & Hellhammer, 1994). Cortisol has been the most studied stress biomarker in response to music (Chanda & Levitin, 2013; Fancourt, Ockelford, & Belai, 2014; Hodges, 2010). There is a general consensus that relaxing music, regardless of whether it is experimenter- or participant-selected, reduces cortisol levels (Beaulieu-Boire et al., 2013; Chanda & Levitin, 2013; Chen, Sung, Lee, & Chang, 2015; Fancourt et al., 2014; Hodges, 2010; Jayamala, Lakshmanagowda, Pradeep, & Goturu, 2015; Kreutz et al., 2012; Mejía-Rubalcava, Alanís-Tavira, Mendieta-Zerón, & Sánchez-Pérez, 2015; but see null findings by Chen et al., 2015; Chlan, Engeland, & Savik, 2013; Good et al., 2013; Tan, McPherson, Peretz, Berkovic, & Wilson, 2014). However, when experimenter-selected relaxing music and participant-selected music from a choice of genres were compared, participant-selected music was more effective in reducing the cortisol level, showing a prolonged effect post-surgery (Leardi et al., 2007).
Another study suggested that the sound of rippling water may be more effective than relaxing music in reducing the cortisol level (Thoma et al., 2013); however, neither was significantly different from a control condition without acoustic stimulation. In addition to the effect of relaxing music, several studies investigated the effect of stimulating music on cortisol levels. Several studies reported that stimulating music also reduced cortisol levels: in female adolescents with chronic depression (Field et al., 1998), in surgical patients (Koelsch et al., 2011), in participants with hypertension (Möckel et al., 1995), in dancers (Quiroga Murcia, Kreutz, Clift, & Bongard, 2010), in participants with a lung infection (le Roux, Bouic, & Bester, 2007), and in healthy males (Ooishi, Mukai, Watanabe, Kawato, & Kashino, 2017), whereas other studies found an increase in healthy participants (Brownley, McMurray, & Hackney, 1995; Gerra et al., 1998; Hébert, Béland, Dionne-Fournelle, Crête, & Lupien, 2005; Karageorghis et al., 2017). These results suggest that stimulating music can either attenuate or enhance the cortisol level, which may depend on participant characteristics and/or music preferences. Furthermore, some studies suggest that music in general may reduce cortisol levels. For example, both participant-selected chill-inducing music and music the participants disliked significantly reduced cortisol levels in both male and female participants (Fukui & Toyoshima, 2013). Additionally, listening to music regardless of genre (Mozart, Strauss, and ABBA) led to a significant reduction in serum cortisol concentrations, which were also significantly lower than those in the silence condition
neurochemical responses to music    343

(Trappe & Voit, 2016). Furthermore, another study reported that both repetitive drumming and instrumental meditation music decreased cortisol levels (Gingras, Pohler, & Fitch, 2014). Taken together, cortisol appears to be responsive to music in general. Cortisol responses to music have also been investigated in surgical patients before, during, and after surgery. Listening to participant-selected or experimenter-selected music during and after surgery prevented cortisol from increasing and/or decreased cortisol post-surgery (Graversen & Sommer, 2013; Nilsson, 2009; Schneider, Schedlowski, Schürmeyer, & Becker, 2001; Tabrizi, Sahraei, & Rad, 2012; but also see Lin, Lin, Huang, Hsu, & Lin, 2011). Comparing music listening across different time periods, one study reported that listening to experimenter-selected relaxing music after surgery was most effective in reducing serum cortisol relative to listening in the pre- and peri-operative periods (Nilsson, Unosson, & Rawal, 2005). Altogether, these studies suggest that listening to music benefits surgical patients by reducing cortisol levels. Schneider et al. (2001) reported that most patients in the music group attributed the beneficial effect of music to distraction. Thus, how music exerts its beneficial effect on cortisol and other behavioral measures in surgical patients needs further investigation. The counteracting effect of music on elevated cortisol levels induced by acute stressors has also been studied in younger healthy participants. Experimenter-selected relaxing music helped reduce cortisol immediately following a psychological stressor, whereas the silence condition led to an increase during the same recovery period (Khalfa, Dalla Bella, Roy, Peretz, & Lupien, 2003), suggesting that relaxing music facilitates faster recovery from the stressor.
On the other hand, experimenter-selected relaxing music did not lower cortisol levels after exposure to a psychological stressor, although it prevented stress-induced increases in heart rate, systolic blood pressure, and anxiety compared to the silence condition (Knight & Rickard, 2001). Another study used an acute physiological stressor and demonstrated that tapping to experimenter-selected positive music post-stressor was associated with more positive mood and stronger cortisol responses (i.e., an increase) compared to tapping to neutral music (Koelsch et al., 2016). Positive mood was also associated with a greater cortisol response to the acute stressor in the music group. The authors interpreted these findings as indicating that the stronger cortisol response may reflect an early immune-enhancing response to the acute stressor rather than a higher stress level, because the music group overall had a more positive mood (Koelsch et al., 2016). There was no effect of music on ACTH levels in this study. The inconsistent findings of these studies may be partly due to the different types of stressor and the different ways music was applied. Moreover, the effects of group musical activities on endocrine responses have been studied. Singing was associated with a reduction in endocrine responses (Fancourt, Aufegger, & Williamon, 2015; Fancourt, Williamon, et al., 2016; Schladt et al., 2017). Cortisol reduction was greater for choir singing than for solo singing, accompanied by a reduction in salivary OT (Schladt et al., 2017). In addition, the effect of group singing on endocrine responses was modulated by the performance conditions (Fancourt et al., 2015). More specifically, reduced levels of cortisol and cortisone were observed only in the low-stress condition (singing without an audience), not in the high-stress condition (singing in a live concert). On the other hand, no endocrine changes were found following a single session of group drumming (Bittman et al., 2001) or multiple sessions of group drumming (Fancourt, Perkins, et al., 2016) in healthy participants. Music therapy, in which musical and other activities are led by a therapist, has shown mixed results. Following guided imagery and music (GIM) therapy, which combines relaxation techniques with listening to classical music, cortisol was reduced relative to the silence condition in healthy participants (McKinney et al., 1997) and in individuals on sick leave (Beck, Hansen, & Gold, 2015). On the other hand, there were no endocrine responses to individualized music therapy in older healthy adults (Suzuki, Kanamori, Nagasawa, Tokiko, & Takayuki, 2007); to individualized music therapy or a multisensory stimulation environment including auditory stimulation in older adults with severe dementia (Valdiglesias et al., 2017); or to movement music therapy in older healthy adults (Shimizu et al., 2013). One study investigated the social effect of music listening on cortisol (Linnemann, Strahler, & Nater, 2016). Listening to music in the presence of others (mostly friends), but not listening alone, attenuated cortisol secretion. However, the presence of others by itself significantly explained the variance in cortisol levels. In addition, the findings of this study should be interpreted with caution, since the time intervals between music listening and cortisol measurement were unknown.
Interestingly, listening to music for relaxation was associated with significant reductions in subjective stress and in cortisol concentration in healthy participants (Linnemann, Ditzen, Strahler, Doerr, & Nater, 2015). Moreover, the reduction in cortisol was not associated with the perception of the music as relaxing. The authors emphasized the importance of non-musical, contextual factors such as the reasons for music listening. It would be interesting to compare the cortisol response to a non-musical control activity for relaxation. This study also showed that listening to music for distraction increased the stress level, which contrasts with the findings in surgical patients (Schneider et al., 2001); this may be due to differences in participants' characteristics and/or circumstances. The effects of music on endocrine measures may be moderated by sex. For example, testosterone showed opposite responses in men and women: music decreased testosterone in men but increased it in women (Fukui & Toyoshima, 2013; Fukui & Yamashita, 2003). In addition, music may have differential effects on men and women. In one study, after strenuous exercise men and women showed different cortisol trajectories during a recovery period with music, regardless of musical tempo (Karageorghis et al., 2017). In another study, cortisol decreased more steeply in men than in women in both choir and solo singing (Schladt et al., 2017). In contrast, other studies did not find any sex effect on cortisol levels associated with music listening (Fukui & Yamashita, 2003; Nater, Abbruzzese, Krebs, & Ehlert, 2006).


The studies discussed above included adult participants. Several studies have investigated cortisol responses to music in younger age groups. In schoolchildren, additional two-hour musical activities (singing, movement, dancing, or playing instruments) during a school year resulted in a reduction in afternoon cortisol levels measured at the end of the school year; however, this result reached statistical significance only when a one-tailed t-test was used (Lindblad, Hogmark, & Theorell, 2007). In preterm infants, exposure to live instrumental music reduced cortisol along with improvements in measures of oxygen desaturation, apnea, and pain (Schwilling et al., 2015). On the other hand, recorded lullabies did not affect cortisol levels or sleep–wake behavior (Dorn et al., 2014). Another study likewise found no effect of a recorded lullaby combined with touch on cortisol (Qiu et al., 2017); it did show, however, that blood β-endorphin increased significantly following the intervention, accompanied by decreased pain responses. To summarize, the evidence that music reduces cortisol levels is relatively consistent. The beneficial effects of music may be associated with distraction from aversive states (Chanda & Levitin, 2013) in the context of acute stressors (Linnemann, Kappert, et al., 2015) and/or with the listener's reasons for listening (Linnemann, Ditzen, et al., 2015). Further studies are needed to clarify how music exerts beneficial effects on stress biomarkers. Endocrine responses have been studied primarily in relation to stress, where multiple factors can affect the findings, for example whether the stressor is acute or chronic (Koelsch et al., 2016) and whether it is psychological, physiological, or physical. In addition, the appropriate stress response differs with circadian phase (Spencer et al., 2018).
Therefore, further studies are warranted to elucidate the effect of music on stress responses.
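The one-tailed caveat attached to the Lindblad et al. (2007) finding is worth making concrete: for a symmetric test statistic, the one-tailed p-value is half the two-tailed one when the effect lies in the predicted direction, which is how a result can clear α = .05 one-tailed yet miss it two-tailed. A minimal sketch with an assumed, illustrative z statistic (not the study's actual data):

```python
from statistics import NormalDist

# Hypothetical standardized effect for a cortisol *reduction*; the value
# 1.80 is an assumption for illustration, not taken from Lindblad et al.
z = 1.80

# Two-tailed: probability of an effect this large in either direction.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
# One-tailed: only the predicted (reduction) direction counts.
p_one_tailed = p_two_tailed / 2

print(round(p_two_tailed, 4), round(p_one_tailed, 4))  # → 0.0719 0.0359
```

Here the same data are non-significant two-tailed (p ≈ .072) but significant one-tailed (p ≈ .036), which is why the directional hypothesis should be specified in advance.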

Norepinephrine Systems

Norepinephrine (NE) neurons are located in the brainstem, primarily in the locus coeruleus (LC), whose axons project widely to the cerebral cortex, limbic regions, thalamus, and cerebellum as well as to the spinal cord. The major NE projection from the LC is thought to play an important role in stress responses and various psychiatric disorders (Hurley, Flashman, Chow, & Taber, 2010). In addition, NE neurons located in the caudal pons and medulla are involved in the function of the sympathetic nervous system (SNS), regulating the autonomic responses of heart rate, blood pressure, and respiration. Activation of the SNS by physical or psychological stressors releases NE, which stimulates the adrenal glands to synthesize and secrete hormonal norepinephrine, epinephrine, and dopamine (Kreutz et al., 2012). Music has been studied as an intervention to reduce stress and normalize SNS activity. A single therapeutic session using relaxing music reduced the plasma epinephrine level in critically ill patients, accompanied by reductions in the amount of sedative drug required, in blood pressure, and in heart rate (Conrad et al., 2007).


Similarly, another study reported that music therapy sessions using familiar music lowered plasma NE in elderly patients with dementia and cardiovascular disease compared to those without music therapy (Okada et al., 2009). The patients in the music therapy group also showed improvement in other SNS measures and a reduction in episodes of congestive heart failure. On the other hand, music therapy sessions increased both NE and epinephrine levels in males with Alzheimer's disease (Kumar et al., 1999); the increased levels had normalized at a six-week follow-up. Two studies demonstrated differential effects of stimulating and relaxing music. Experimenter-selected slow-rhythm (classical) music decreased plasma NE, whereas experimenter-selected fast-rhythm music (from action movies) increased plasma epinephrine in healthy male participants (Yamamoto et al., 2003); these changes did not affect subsequent exercise performance. Similarly, using salivary alpha-amylase as a surrogate biomarker for SNS activity, energizing music increased the activity whereas relaxing music decreased it in healthy participants (Linnemann, Ditzen, et al., 2015). One study showed effects of music, and of music genre, on NE and epinephrine levels only in patients, not in healthy participants (Möckel et al., 1995): hypertensive participants who selected modern classical music from a choice of preselected genres showed a reduction in NE, whereas those who selected meditative music showed reduced epinephrine. Participants in both groups also showed reductions in other stress biomarkers. Another study showed that experimenter-selected religious Islamic music reduced plasma NE whereas classical music increased it in Muslim participants listening during a dental procedure, accompanied by changes in the same direction in systolic blood pressure.
The differences in NE levels, as well as in systolic blood pressure, between pre- and post-dental procedure in the religious music group also differed significantly from the classical music and no-music groups (Maulina, Djustiana, & Shahib, 2017). These studies suggest that the personal significance of the music may play an important role in its positive effects on hormonal and physiological measures. In contrast, there were no catecholaminergic changes in response to experimenter-selected stimulating music (Hirokawa & Ohira, 2003) or experimenter-selected relaxing music in healthy participants (Gerra et al., 1998; Hirokawa & Ohira, 2003); experimenter-selected relaxing music in post-operative critically ill patients (Conrad et al., 2007); participant-selected music from a list, from a choice of genres, or according to their preferences in preoperative patients (Lin et al., 2011; Schneider et al., 2001; Wang, Kulkarni, Dolev, & Kain, 2002) or in those receiving ventilator support (Chlan, Engeland, & Anthony, 2007); or participant-selected music from a choice of genres in patients under general anesthesia (Migneault et al., 2004) or in post-operative patients (Lin et al., 2011). Furthermore, experimenter-selected positive music from various genres and with varying tempo, chosen to evoke feelings of pleasure and happiness, did not change NE levels compared to neutral auditory stimuli following an acute physiological stressor (Koelsch et al., 2016).


To summarize, the literature shows conflicting results on peripheral catecholaminergic responses to music. Music tends to decrease catecholamine levels in some individuals with medical conditions. The tempo/rhythm of music may be an important factor influencing these responses; this may reflect the fact that the auditory nuclei in the brainstem and midbrain encoding auditory temporal information (Griffiths, Uppenkamp, Johnsrude, Josephs, & Patterson, 2001) are innervated by the NE system from the LC (Levitt & Moore, 1979; Thompson, 2003). The beneficial effects observed when Muslim participants listened to religious Islamic music (Maulina et al., 2017) suggest top-down regulation of the NE system and SNS by music. The underlying mechanisms for the effects of music on the SNS are also discussed in review papers (Fancourt et al., 2014; Juslin & Västfjäll, 2008).

Peripheral Immune System

The immune system protects and defends the body against infection and damage from foreign organisms and toxins, while maintaining checks and balances to prevent self-reactivity. It has two branches: innate and adaptive immunity. Innate immune responses occur immediately following an insult and are the first component of the immune system to be activated against invasion (Turvey & Broide, 2010); they include activation of immune cells such as granulocytes and monocytes/macrophages, and secretion of pro-inflammatory cytokines such as interleukin-1β (IL-1β), IL-6, tumor necrosis factor alpha (TNF-α), and interferon-gamma (IFN-γ) to upregulate the acute inflammatory response. In contrast, the adaptive immune system, consisting of B cells and T cells, is slower acting, with responses occurring days to weeks after exposure. Unlike the innate immune system, the adaptive immune system is capable of memory and is able to adjust in response to pathogens. Anti-inflammatory cytokines, including IL-1Ra, IL-4, IL-6, IL-10, IL-11, IL-13, and TNF-β, modulate the inflammatory immune response to prevent the harmful effects of prolonged immune system activation. It should be noted that some immune cells, such as natural killer (NK) cells and dendritic cells, cannot be clearly classified as innate or adaptive, and that some cytokines have both pro- and anti-inflammatory properties depending on the amount expressed, the length of time they are expressed, or which form of receptor they activate (Rainville, Tsyglakova, & Hodes, 2018).

Immune Cells

Group drumming led to an increase in NK cell levels in healthy adults (Bittman et al., 2001) and increased CD4+ T cell and memory T cell counts in older, but not younger, adults (Koyama et al., 2009). In contrast, listening to experimenter-selected relaxing music during surgery decreased NK cell levels, an effect not observed in patients who chose their music from a preselected set (Leardi et al., 2007).



Cytokines

Among cytokines, IL-6 (which has both pro- and anti-inflammatory properties) has been the most researched in association with music. Music therapy sessions using relaxing music reduced IL-6 levels, accompanied by reductions in SNS biomarkers, in surgical patients (Conrad et al., 2007) and in elderly patients with cerebrovascular disease and dementia (Okada et al., 2009). Experimenter-selected classical music also decreased IL-6 among older adults who liked the genre, accompanied by an increase in the expression of μ opioid receptors (Stefano, Zhu, Cadet, Salamon, & Mantione, 2004), whereas it did not change the levels of other cytokines. On the other hand, group drumming exercises increased IL-6, along with pro-inflammatory IFN-γ, in older adults but not in younger adults (Koyama et al., 2009). Because Koyama et al. (2009) also reported increased CD4+ T cell and memory T cell counts only in older adults, the increased IL-6 may be anti-inflammatory. Although IL-6 once appeared to show "the greatest levels of responsiveness" (Fancourt et al., 2014, p. 18), more recent studies have shown otherwise (Beaulieu-Boire et al., 2013; Fancourt, Perkins, et al., 2016; Fancourt, Williamon, et al., 2016; Koelsch et al., 2016). Further research is needed to determine whether IL-6 is a sensitive immune biomarker in response to music. Other cytokines have also shown responses to music. Anti-inflammatory IL-1 increased, along with a cortisol reduction, in response to participant-selected music compared to control conditions (Bartlett, Kaufman, & Smeltekop, 1993). In another study, anti-inflammatory IL-4 increased, accompanied by a reduction in the pro-inflammatory marker monocyte chemoattractant protein (MCP), in response to multiple group drumming sessions (Fancourt, Perkins, et al., 2016).
Another study reported increased inflammatory markers, including pro-inflammatory IL-2 and soluble IL-2 receptor α, anti-inflammatory IL-4, and IL-17 (which displays both pro- and anti-inflammatory profiles), along with improved affect and reductions in cortisol, β-endorphin, and OT levels, in response to a single session of singing in choirs of people affected by cancer (Fancourt, Williamon, et al., 2016). One study found that Mozart, but not Beethoven or Schubert, downregulated anti-inflammatory IL-4, IL-10, and IL-13 and upregulated pro-inflammatory cytokines such as IFN-γ and IL-12, which was also associated with alleviated allergic skin responses (Kimata, 2003). The findings of Kimata (2003) may reflect a music-induced enhancement of pro-inflammatory responses, similar to the increased cortisol responses following an acute physiological stressor (Koelsch et al., 2016).

Immunoglobulin A

Along with these peripheral immune biomarkers, immunoglobulin A (IgA) is one of the most commonly studied immune biomarkers associated with music. Immunoglobulin A is a major serum immunoglobulin that is predominantly produced in the bone marrow and mediates various protective functions through interaction with specific receptors and immune mediators (Woof & Ken, 2006). Immunoglobulin A is also a principal antibody class in the external secretions that bathe the vast mucosal surfaces of the gastrointestinal, respiratory, and genitourinary tracts, and it plays an important role in first-line immune protection. Secretory and serum IgA have different biochemical and immunochemical properties and are produced by cells with different organ distributions; therefore, different methods of immunization can induce secretory IgA responses, serum IgA responses, or a combination of both. In general, research has yielded consistent results: music increases the concentration or secretion rate of secretory IgA (S-IgA) (Chanda & Levitin, 2013; Fancourt et al., 2014; Hodges, 2010), suggesting that music enhances immunity in healthy individuals. Furthermore, the S-IgA increase was greater when engaging in group singing compared to passive listening (Beck, Cesario, Yousefi, & Enamoto, 2000; Kreutz, Bongard, Rohrmann, Hodapp, & Grebe, 2004; Kuhn, 2002). Another study showed that S-IgA increased only in response to "designer music" that evokes positive feelings, not to relaxing (new age) or rock music (McCraty, Atkinson, & Rein, 1996). However, a few studies have reported no changes in IgA levels. In two studies, serum IgA did not change in patients who listened to experimenter-selected calming music post-surgery (Nilsson et al., 2005) or joyful music (described to the patients as "relaxing" acoustic stimulation to reduce noise) before, during, or after surgery (Koelsch et al., 2011).
The absence of a music effect on plasma IgA concentrations may be due to the effects of local anesthetic infiltration (Nilsson et al., 2005) or to differences between S-IgA and serum IgA in their response to music (Woof & Ken, 2006). Furthermore, two studies reported no changes following stressors such as eating allergy-provoking food (Kejr et al., 2010) or a stressful cognitive task (Hirokawa & Ohira, 2003); the immune-enhancing effect of music may be limited to healthy individuals without exposure to stressors. To summarize, there is some evidence that music induces changes in immune biomarkers. S-IgA appears to respond most consistently and robustly to music in healthy individuals, and the music-induced increase of S-IgA is interpreted as immunoenhancement. Future studies could investigate how long this effect lasts and whether musical experience and habits modulate it. Although music induces responses in other immune biomarkers, interpretation can be challenging because of inconsistency in the directional changes of cytokines with different inflammatory properties. An interesting observation from animal research is that individual differences in the peripheral immune system influence the development of stress susceptibility, demonstrated by higher circulating levels of IL-6 and leukocytes in susceptible mice compared to resilient and control mice (Hodes et al., 2014; Rainville et al., 2018). It may therefore be useful to stratify participants by baseline levels of immune biomarkers. Furthermore, as immune biomarkers are closely connected with hormones (Yovel, Shakhar, & Ben-Eliyahu, 2001), sex may need to be accounted for in the study design.



Cholinergic Systems

Cholinergic neurons are localized in the basal forebrain, the pedunculopontine tegmental nucleus (PPT), and the laterodorsal tegmental nucleus (LDT). The latter two nuclei are collectively termed the pontomesencephalic tegmentum (PMT) nuclei. The PMT nuclei send widespread projections to the spinal cord, thalamus, basal forebrain, and frontal cortex. Major acetylcholine (ACh) receptors include nicotinic (nAChR) and muscarinic (mAChR) receptors, which are expressed in the auditory system (Metherate, 2011; Morley & Happe, 2000). Cholinergic modulation of auditory function is well studied, and animal research has demonstrated that cholinergic neurons respond to simple auditory stimuli such as pure tones (Koyama, Jodo, & Kayama, 1994) and clicks (Reese, Garciarill, & Skinner, 1995a, b). However, whether acoustic information, including music, induces cholinergic responses in the human brain is unknown. In animal research, some neurons in the primary auditory cortex send direct glutamatergic projections to the superior olivary complex, as well as to the PMT, which innervates the inferior colliculus (IC) and the auditory thalamus (Motts & Schofield, 2010). These observations suggest that auditory stimuli activating the primary auditory cortex may be able to affect the activity of cholinergic neurons in the PMT, influencing functions such as arousal, the sleep–wake cycle, motor control, and motivation and reward behavior (Schofield, 2010). Cholinergic PMT cells are connected with dopaminergic neurons in the VTA (Chen, Nakamura, Kawamura, Takahashi, & Nakahara, 2006; Pan & Hyland, 2005), and these connections are likely involved in reward behavior (Pan & Hyland, 2005), while connections from the PPT to the basal ganglia are associated with motor functions. The cholinergic PPT neurons responsive to clicks (Reese et al., 1995a, b) may underlie part of the mechanisms for auditory-motor entrainment.
There is also a network in which the mediodorsal nucleus of the thalamus projects to cholinergic and non-cholinergic neurons in the globus pallidus, which in turn project to the auditory cortex (Moriizumi & Hattori, 1992); this network may also be associated with auditory-motor functions.

Discussion and Future Directions

Research demonstrates that music induces responses in neurochemicals as well as peripheral hormones and immune biomarkers, along with concomitant functional changes. Some of these are extensively studied and yield relatively consistent responses (e.g., a reduction in cortisol and an increase in S-IgA); others are little studied and/or show inconsistent results. In addition, few studies have directly investigated CNS responses. As the neuroscientific study of music and the clinical application of music are of growing interest, more studies are needed to elucidate and confirm the neurochemical responses to music and acoustic information by employing more rigorous study designs. Future studies need to consider participant characteristics such as age, sex, trait and state levels of depression and anxiety, baseline neurochemical levels, polymorphisms associated with music ability, music education/training levels, and music listening habits and preferences. At the same time, more studies are needed to investigate the effects of these individual characteristics on neurochemical responses, in order to determine important confounding variables in music studies. Moreover, studies of clinical populations or older healthy adults need to include control groups of participants without the medical conditions, or younger healthy adults, to determine whether the target group differs from the control group in baseline neurochemical measures and how the groups react differently to the music intervention. The existing literature has used different methods to evaluate molecular responses to music: some studies simply compared levels pre- and post-intervention, while others additionally included silent control conditions; some used passive listening and others group musical activities. To isolate the specific effects of music, future studies need to include control conditions well matched to the music conditions in terms of attention, engagement, and interaction (e.g., passive listening to music versus passive listening to an audiobook, as suggested by Chanda & Levitin, 2013). In general, research shows that participant-selected music elicits greater responses than experimenter-selected music.
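The design point above — a pre/post change alone cannot separate a music effect from the mere passage of time or context — can be expressed as a difference-in-differences contrast against a matched control condition. The sketch below is illustrative only; the function name and all cortisol values are assumptions, not data from any study cited here:

```python
# Hedged sketch: contrast the pre-to-post change under the music condition
# against the change under a matched control (e.g., audiobook listening,
# as Chanda & Levitin suggest). All values are hypothetical group means.

def diff_in_diff(pre_music: float, post_music: float,
                 pre_control: float, post_control: float) -> float:
    """Change under music minus change under control; a negative value
    means cortisol fell more with music than with the control activity."""
    return (post_music - pre_music) - (post_control - pre_control)

# Hypothetical salivary cortisol means (nmol/L)
effect = diff_in_diff(pre_music=12.0, post_music=9.0,
                      pre_control=12.2, post_control=11.8)
print(round(effect, 2))  # → -2.6
```

A simple pre/post comparison in the music group alone (-3.0 nmol/L here) would overstate the music-specific effect, since cortisol also drifted down slightly under the control condition.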
When experimenter-selected music is used in experimental studies or clinical settings, participants' ratings of the selected music on emotional dimensions and liking may help explain the findings and account for some of the inconsistency and variance in responses, although subjective and objective hedonic reactions do not always coincide (Berridge & Kringelbach, 2015). More specific descriptions of the music may also help clarify which components of music are important for inducing such responses. Furthermore, concomitant measures of other relevant biomarkers and of physiological, emotional, and behavioral data are useful for determining whether observed neurochemical responses are beneficial, and demonstrated correlations may help in interpreting the findings and the underlying mechanisms. Moreover, findings based on peripheral measures used to infer brain function should be interpreted with caution unless the measures are well-validated proxies for central measures. In addition, the timing of measurement is important for some biomarkers, so multiple measurements over a period of time may capture more distinct responses. To date, there is only one study directly addressing neurochemical changes associated with music listening (Salimpoor et al., 2011), which used PET with a D2/D3 receptor antagonist, [11C]raclopride (see section "Dopamine Systems" for details). However, D2/D3 receptor agonists such as [11C]-(+)-PHNO (Rabiner & Laruelle, 2010; Willeit et al., 2006) may be more advantageous for investigating functional changes in the ventral striatum, because [11C]-(+)-PHNO is more sensitive to competition from endogenous dopamine following administration of dopamine-releasing stimuli than the antagonist [11C]raclopride (Narendran et al., 2010; Shotbolt et al., 2012; Willeit et al., 2008), and it shows up to 20-fold higher affinity for D3 over D2 receptors, providing higher sensitivity and allowing better quantification of the D3 receptor subtype in the ventral striatum (Graff-Guerrero et al., 2008; Narendran et al., 2006). Three radioligands have been developed for opioid receptors: [11C]-carfentanil, targeting µ opioid receptors (Frost et al., 1989); [11C]-diprenorphine, a non-selective opioid receptor ligand (Jones et al., 1994); and the more recent [11C]-LY2795050, targeting κ opioid receptors (Naganawa et al., 2015). For the neuroimmune system, a number of radioligands have been developed targeting translocator protein (TSPO), which is localized to the outer mitochondrial membrane of glial cells and has been used as a biomarker for the neuroimmune system and neuroinflammation in normal aging and in various diseases and disorders (Gunn et al., 2015). In addition, radioligands for 5-HT and cholinergic receptor subtypes, as well as for NE, have been developed. When PET studies are conducted, demographic characteristics such as age, sex, and body mass are important confounding variables to take into account (Gunn et al., 2015). Polymorphisms can also have a great impact on binding; for example, TSPO polymorphism produces three different binding phenotypes (Owen et al., 2011). Along with these neuroreceptors and proteins that can be studied using PET imaging, the recent development of proton magnetic resonance spectroscopy at high magnetic field strengths allows more reliable estimation of the amino acids glutamine, glutamate, and gamma-aminobutyric acid (Ciurleo, Di Lorenzo, Bramanti, & Marino, 2014). This neuroimaging technique may be able to shed light on cortico-cortical interactions and top-down modulation by music.
Moreover, pharmacological studies in a double-blind, placebo-controlled crossover design, combined with the more accessible fMRI, would help elucidate the role of neurochemicals in complex music-associated functions such as cognition and emotional behavior.
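The crossover design recommended here is analyzed within subjects: each participant serves as their own control, so the drug effect is estimated from paired condition differences. A minimal sketch of that comparison, using entirely hypothetical ratings and a hand-rolled paired t statistic:

```python
import math

# Illustrative sketch with hypothetical numbers: the within-subject comparison
# behind a double-blind placebo-controlled crossover design. Each participant
# is measured under both conditions, so the treatment effect is estimated from
# paired differences, removing stable between-subject variability.

def paired_t(condition_a, condition_b):
    """Paired t statistic for within-subject differences (a - b)."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)  # mean difference over its standard error

# Hypothetical music-evoked pleasure ratings from six participants,
# one session under the drug and one under placebo (order counterbalanced).
drug = [7.1, 6.4, 8.0, 5.9, 7.5, 6.8]
placebo = [6.2, 6.0, 7.1, 5.5, 6.9, 6.1]

t = paired_t(drug, placebo)
print(f"paired t({len(drug) - 1}) = {t:.2f}")
```

The resulting statistic is compared against a t distribution with n − 1 degrees of freedom; counterbalancing session order guards against carry-over effects.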

References

Acher, R., & Chauvet, J. (1995). The neurohypophysial endocrine regulatory cascade: Precursors, mediators, receptors, and effectors. Frontiers in Neuroendocrinology 16(3), 237–289.
Albers, H. E. (2015). Species, sex and individual differences in the vasotocin/vasopressin system: Relationship to neurochemical signaling in the social behavior neural network. Frontiers in Neuroendocrinology 36, 49–71.
Bachner-Melman, R., Dina, C., Zohar, A. H., Constantini, N., Lerer, E., Hoch, S., . . . Ebstein, R. P. (2005). AVPR1a and SLC6A4 gene polymorphisms are associated with creative dance performance. PLoS Genetics 1(3), 394–403. Retrieved from https://doi.org/10.1371/journal.pgen.0010042
Bachner-Melman, R., & Ebstein, R. P. (2014). The role of oxytocin and vasopressin in emotional and social behaviors. Handbook of Clinical Neurology 124, 53–68.
Bandelow, B., Baldwin, D., Abelli, M., Bolea-Alamanac, B., Bourin, M., Chamberlain, S. R., . . . Riederer, P. (2017). Biological markers for anxiety disorders, OCD and PTSD: A consensus statement. Part II: Neurochemistry, neurophysiology and neurocognition. World Journal of Biological Psychiatry 18(3), 162–214.

neurochemical responses to music    353

Barrett, F. S., Preller, K. H., Herdener, M., Janata, P., & Vollenweider, F. X. (2017). Serotonin 2A receptor signaling underlies LSD-induced alteration of the neural response to dynamic changes in music. Cerebral Cortex (December), 1–12. Retrieved from https://doi.org/10.1093/cercor/bhx257
Bartlett, D., Kaufman, D., & Smeltekop, R. (1993). The effects of music listening and perceived sensory experience on the immune system as measured by interleukin-1 and cortisol. Journal of Music Therapy 30(4), 194–209.
Beaulieu-Boire, G., Bourque, S., Chagnon, F., Chouinard, L., Gallo-Payet, N., & Lesur, O. (2013). Music and biological stress dampening in mechanically-ventilated patients at the intensive care unit ward: A prospective interventional randomized crossover trial. Journal of Critical Care 28(4), 442–450.
Beck, B. D., Hansen, A. M., & Gold, C. (2015). Coping with work-related stress through guided imagery and music (GIM): Randomized controlled trial. Journal of Music Therapy 52(3), 323–352.
Beck, R. J., Cesario, T. C., Yousefi, A., & Enamoto, H. (2000). Choral singing, performance perception, and immune system changes in salivary immunoglobulin A and cortisol. Music Perception: An Interdisciplinary Journal 18(1), 87–106.
Benarroch, E. E. (2012). Endogenous opioid systems: Current concepts and clinical correlations. Neurology 79, 807–814.
Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664.
Bittman, B., Berk, L., Felten, D., Westengard, J., Simonton, O., Pappas, J., & Ninehouser, M. (2001). Composite effects of group drumming music therapy on modulation of neuroendocrine-immune parameters in normal subjects. Alternative Therapies 7(1), 38–47.
Brown, C. A., Cardoso, C., & Ellenbogen, M. A. (2016). A meta-analytic review of the correlation between peripheral oxytocin and cortisol concentrations. Frontiers in Neuroendocrinology 43, 19–27.
Brownley, K. A., McMurray, R. G., & Hackney, A. C. (1995). Effects of music on physiological and affective responses to graded treadmill exercise in trained and untrained runners. International Journal of Psychophysiology 19(3), 193–201.
Caldwell, H. K. (2017). Oxytocin and vasopressin: Powerful regulators of social behavior. Neuroscientist 23(5), 517–528.
Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic medication on beat-based auditory timing in Parkinson's disease. Frontiers in Neurology 7, 1–8. Retrieved from https://doi.org/10.3389/fneur.2016.00019
Campbell-Smith, E. J., Holmes, N. M., Lingawi, N. W., Panayi, M. C., & Westbrook, R. F. (2015). Oxytocin signaling in basolateral and central amygdala nuclei differentially regulates the acquisition, expression, and extinction of context-conditioned fear in rats. Learning & Memory 22(5), 247–257.
Carson, D. S., Berquist, S. W., Trujillo, T. H., Garner, J. P., Hannah, S. L., Hyde, S. A., . . . Parker, K. J. (2015). Cerebrospinal fluid and plasma oxytocin concentrations are positively correlated and negatively predict anxiety in children. Molecular Psychiatry 20(9), 1085–1090.
Castro, D. C., & Berridge, K. C. (2017). Opioid and orexin hedonic hotspots in rat orbitofrontal cortex and insula. Proceedings of the National Academy of Sciences 114(43), E9125–E9134.

Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–191.
Chen, C. J., Sung, H. C., Lee, M. S., & Chang, C. Y. (2015). The effects of Chinese five-element music therapy on nursing students with depressed mood. International Journal of Nursing Practice 21(2), 192–199.
Chen, J., Nakamura, M., Kawamura, T., Takahashi, T., & Nakahara, D. (2006). Roles of pedunculopontine tegmental cholinergic receptors in brain stimulation reward in the rat. Psychopharmacology 184(3–4), 514–522.
Chlan, L. L., Engeland, W. C., & Anthony, A. (2007). Influence of music on the stress response in patients receiving mechanical ventilatory support: A pilot study. American Journal of Critical Care 16(2), 141–146.
Chlan, L. L., Engeland, W. C., & Savik, K. (2013). Does music influence stress in mechanically ventilated patients? Intensive and Critical Care Nursing 29(3), 121–127.
Ciurleo, R., Di Lorenzo, G., Bramanti, P., & Marino, S. (2014). Magnetic resonance spectroscopy: An in vivo molecular imaging biomarker for Parkinson's disease? BioMed Research International 2014, 519816. Retrieved from https://doi.org/10.1155/2014/519816
Conrad, C., Niess, H., Jauch, K., Bruns, C., Hartl, W., & Welker, L. (2007). Overture for growth hormone: Requiem for interleukin-6? Critical Care Medicine 35(12), 2709–2713.
Dai, L., Carter, C. S., Ying, J., Bellugi, U., Pournajafi-Nazarloo, H., & Korenberg, J. R. (2012). Oxytocin and vasopressin are dysregulated in Williams syndrome, a genetic disorder affecting social behavior. PLoS ONE 7(6), e38513. Retrieved from https://doi.org/10.1371/journal.pone.0038513
de Jong, T. R., Menon, R., Bludau, A., Grund, T., Biermeier, V., Klampfl, S. M., . . . Neumann, I. D. (2015). Salivary oxytocin concentrations in response to running, sexual self-stimulation, breastfeeding and the TSST: The Regensburg Oxytocin Challenge (ROC) study. Psychoneuroendocrinology 62, 381–388.
Djurovic, S., Le Hellard, S., Kähler, A. K., Jönsson, E. G., Agartz, I., Steen, V. M., . . . Andreassen, O. A. (2009). Association of MCTP2 gene variants with schizophrenia in three independent samples of Scandinavian origin (SCOPE). Psychiatry Research 168(3), 256–258.
Dölen, G., Darvishzadeh, A., Huang, K. W., & Malenka, R. C. (2013). Social reward requires coordinated activity of nucleus accumbens oxytocin and serotonin. Nature 501(7466), 179–184.
Donaldson, Z. R., & Young, L. J. (2008). Oxytocin, vasopressin, and the neurogenetics of sociality. Science 322(5903), 900–904. Correction (2009): Science 323(5920), 1429.
Dorn, F., Wirth, L., Gorbey, S., Wege, M., Zemlin, M., Maier, R. F., & Lemmer, B. (2014). Influence of acoustic stimulation on the circadian and ultradian rhythm of premature infants. Chronobiology International 31(9), 1062–1074.
Elsinger, C. L., Rao, S. M., Zimbelman, J. L., Reynolds, N. C., Blindauer, K. A., & Hoffmann, R. G. (2003). Neural basis for impaired time reproduction in Parkinson's disease: An fMRI study. Journal of the International Neuropsychological Society 9(7), 1088–1098.
Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3), 144–147.
Fancourt, D., Aufegger, L., & Williamon, A. (2015). Low-stress and high-stress singing have contrasting effects on glucocorticoid response. Frontiers in Psychology 6, 1–5. Retrieved from https://doi.org/10.3389/fpsyg.2015.01242

Fancourt, D., Ockelford, A., & Belai, A. (2014). The psychoneuroimmunological effects of music: A systematic review and a new model. Brain, Behavior, and Immunity 36, 15–26.
Fancourt, D., Perkins, R., Ascenso, S., Carvalho, L. A., Steptoe, A., & Williamon, A. (2016). Effects of group drumming interventions on anxiety, depression, social resilience and inflammatory immune response among mental health service users. PLoS ONE 11(3), 1–16. Retrieved from https://doi.org/10.1371/journal.pone.0151136
Fancourt, D., Williamon, A., Carvalho, L. A., Steptoe, A., Dow, R., & Lewis, I. (2016). Singing modulates mood, stress, cortisol, cytokine and neuropeptide activity in cancer patients and carers. Ecancermedicalscience 10, 1–13. Retrieved from https://doi.org/10.3332/ecancer.2016.631
Ferris, C. (1992). Role of vasopressin in aggressive and dominant/subordinate behaviors. Annals of the New York Academy of Sciences 652, 212–226.
Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N., & Schanberg, S. (1998). Music shifts frontal EEG in depressed adolescents. Adolescence 33(129), 109–116.
Frost, J. J., Douglass, K. H., Mayberg, H. S., Dannals, R. F., Links, J. M., Wilson, A. A., . . . Wagner, H. N. (1989). Multicompartmental analysis of [11C]-carfentanil binding to opiate receptors in humans measured by positron emission tomography. Journal of Cerebral Blood Flow & Metabolism 9(3), 398–409.
Fukui, H., & Toyoshima, K. (2013). Influence of music on steroid hormones and the relationship between receptor polymorphisms and musical ability: A pilot study. Frontiers in Psychology 4, 1–8. Retrieved from https://doi.org/10.3389/fpsyg.2013.00910
Fukui, H., & Yamashita, M. (2003). The effects of music and visual stress on testosterone and cortisol in men and women. Neuro Endocrinology Letters 24(3–4), 173–180.
Gerra, G., Zaimovic, A., Franchini, D., Palladino, M., Giucastro, G., Reali, N., . . . Brambilla, F. (1998). Neuroendocrine responses of healthy volunteers to "techno-music": Relationships with personality traits and emotional state. International Journal of Psychophysiology 28, 99–111.
Gingras, B., Pohler, G., & Fitch, W. T. (2014). Exploring shamanic journeying: Repetitive drumming with shamanic instructions induces specific subjective experiences but no larger cortisol decrease than instrumental meditation music. PLoS ONE 9(7). Retrieved from https://doi.org/10.1371/journal.pone.0102103
Goldstein, A. (1980). Thrills in response to music and other stimuli. Physiological Psychology 8(1), 126–129.
Good, M., Albert, J. M., Arafah, B., Anderson, G. C., Wotman, S., Cong, X., . . . Ahn, S. (2013). Effects on postoperative salivary cortisol of relaxation/music and patient teaching about pain management. Biological Research for Nursing 15(3), 318–329.
Graff-Guerrero, A., Willeit, M., Ginovart, N., Mamo, D., Mizrahi, R., Rusjan, P., . . . Kapur, S. (2008). Brain region binding of the D2/3 agonist [11C]-(+)-PHNO and the D2/3 antagonist [11C]raclopride in healthy humans. Human Brain Mapping 29(4), 400–410.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Granot, R. Y., Frankel, Y., Gritsenko, V., Lerer, E., Gritsenko, I., Bachner-Melman, R., . . . Ebstein, R. P. (2007). Provisional evidence that the arginine vasopressin 1a receptor gene is associated with musical memory. Evolution and Human Behavior 28(5), 313–318.
Granot, R. Y., Uzefovsky, F., Bogopolsky, H., & Ebstein, R. P. (2013). Effects of arginine vasopressin on musical working memory. Frontiers in Psychology 4, 1–12. Retrieved from https://doi.org/10.3389/fpsyg.2013.00712

Grape, C., Sandgren, M., Hansson, L., Ericson, M., & Theorell, T. (2003). Does singing promote well-being? An empirical study of professional and amateur singers during a singing lesson. Integrative Physiological & Behavioral Science 38(1), 65–74.
Graversen, M., & Sommer, T. (2013). Perioperative music may reduce pain and fatigue in patients undergoing laparoscopic cholecystectomy. Acta Anaesthesiologica Scandinavica 57(8), 1010–1016.
Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001). Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience 4(6), 633–637.
Gunn, R. N., Slifstein, M., Searle, G. E., & Price, J. C. (2015). Quantitative imaging of protein targets in the human brain with PET. Physics in Medicine and Biology 60(22), R363–R411.
Hébert, S., Béland, R., Dionne-Fournelle, O., Crête, M., & Lupien, S. J. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences 76(20), 2371–2380.
Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social behavior. Frontiers in Neuroendocrinology 30(4), 548–557.
Hirokawa, E., & Ohira, H. (2003). The effects of music listening after a stressful task on immune functions, neuroendocrine responses, and emotional states in college students. Journal of Music Therapy 40(3), 189–211.
Hjelmstad, G. O., Xia, Y., Margolis, E. B., & Fields, H. L. (2013). Opioid modulation of ventral pallidal afferents to ventral tegmental area neurons. Journal of Neuroscience 33(15), 6454–6459.
Hodes, G. E., Pfau, M. L., Leboeuf, M., Golden, S. A., Christoffel, D. J., Bregman, D., . . . Russo, S. J. (2014). Individual differences in the peripheral immune system promote resilience versus susceptibility to social stress. Proceedings of the National Academy of Sciences 111(45), 16136–16141.
Hodges, D. (2010). Psychophysiological measures. In P. Juslin & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 279–312). Oxford: Oxford University Press.
Hoffman, E. R., Brownley, K. A., Hamer, R. M., & Bulik, C. M. (2012). Plasma, salivary, and urinary oxytocin in anorexia nervosa: A pilot study. Eating Behaviors 13(3), 256–259.
Huber, D., Veinante, P., & Stoop, R. (2005). Vasopressin and oxytocin excite distinct neuronal populations in the central amygdala. Science 308(5719), 245–248.
Hurley, L. M., & Sullivan, M. R. (2012). From behavioral context to receptors: Serotonergic modulatory pathways in the IC. Frontiers in Neural Circuits 6, 1–17. Retrieved from https://doi.org/10.3389/fncir.2012.00058
Hurley, R. A., Flashman, L. A., Chow, T. W., & Taber, K. H. (2010). The brainstem: Anatomy, assessment, and clinical syndromes. Journal of Neuropsychiatry and Clinical Neuroscience 22(1), 2–6. Retrieved from https://doi.org/10.1176/appi.neuropsych.23.2.121
Insel, T. R. (2010). The challenge of translation in social neuroscience: A review of oxytocin, vasopressin, and affiliative behavior. Neuron 65(6), 768–779.
Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., . . . Lees, A. J. (2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in Parkinson's disease. Brain 133(3), 727–745.
Javor, A., Riedl, R., Kindermann, H., Brandstatter, W., Ransmayr, G., & Gabriel, M. (2014). Correlation of plasma and salivary oxytocin in healthy young men: Experimental evidence. Neuro Endocrinology Letters 35(6), 470–473.

Jayamala, A. K., Lakshmanagowda, P. B., Pradeep, G. C. M., & Goturu, J. (2015). Impact of music therapy on breast milk secretion in mothers of premature newborns. Journal of Clinical and Diagnostic Research 9(4), CC04–CC06.
Jeong, Y. J., Hong, S. C., Myeong, S. L., Park, M. C., Kim, Y. K., & Suh, C. M. (2005). Dance movement therapy improves emotional responses and modulates neurohormones in adolescents with mild depression. International Journal of Neuroscience 115(12), 1711–1720.
Johnson, Z. V., & Young, L. J. (2017). Oxytocin and vasopressin neural networks: Implications for social behavioral diversity and translational neuroscience. Neuroscience & Biobehavioral Reviews 76, 87–98.
Jones, A. K. P., Cunningham, V. J., Ha-Kawa, S. K., Fujiwara, T., Liyii, Q., Luthra, S. K., . . . Jones, T. (1994). Quantitation of [11C]diprenorphine cerebral kinetics in man acquired by PET using presaturation, pulse-chase and tracer-only protocols. Journal of Neuroscience Methods 51(2), 123–134.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575; discussion 575–621.
Kaelen, M., Barrett, F. S., Roseman, L., Lorenz, R., Family, N., Bolstridge, M., . . . Carhart-Harris, R. L. (2015). LSD enhances the emotional response to music. Psychopharmacology 232(19), 3607–3614.
Kaelen, M., Roseman, L., Kahan, J., Santos-Ribeiro, A., Orban, C., Lorenz, R., . . . Carhart-Harris, R. (2016). LSD modulates music-induced imagery via changes in parahippocampal connectivity. European Neuropsychopharmacology 26(7), 1099–1109.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 1–7. Retrieved from https://doi.org/10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015). The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from https://doi.org/10.7717/peerj.830
Karageorghis, C. I., Bruce, A. C., Pottratz, S. T., Stevens, R. C., Bigliassi, M., & Hamer, M. (2017). Psychological and psychophysiological effects of recuperative music postexercise. Medicine & Science in Sports & Exercise 50(4), 739–746.
Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., . . . Yagi, T. (2009). Protocadherin-α family is required for serotonergic projections to appropriately innervate target brain areas. Journal of Neuroscience 29(29), 9137–9147.
Keeler, J. R., Roth, E. A., Neuser, B. L., Spitsbergen, J. M., Waters, D. J. M., & Vianney, J.-M. (2015). The neurochemistry and social flow of singing: Bonding and oxytocin. Frontiers in Human Neuroscience 9, 1–10. Retrieved from https://doi.org/10.3389/fnhum.2015.00518
Kejr, A., Gigante, C., Hames, V., Krieg, C., Mages, J., König, N., . . . Diel, F. (2010). Receptive music therapy and salivary histamine secretion. Inflammation Research 59(Suppl. 2), 217–218.
Khalfa, S., Dalla Bella, S., Roy, M., Peretz, I., & Lupien, S. J. (2003). Effects of relaxing music on salivary cortisol level after psychological stress. Annals of the New York Academy of Sciences 999, 374–376.
Kimata, H. (2003). Listening to Mozart reduces allergic skin wheal responses and in vitro allergen-specific IgE production in atopic dermatitis patients with latex allergy. Behavioral Medicine 29(1), 15–19.
Kirschbaum, C., & Hellhammer, D. H. (1994). Salivary cortisol in psychoneuroendocrine research: Recent developments and applications. Psychoneuroendocrinology 19(4), 313–333.

Knight, W. E. J., & Rickard, N. S. (2001). Relaxing music prevents stress-induced increase in subjective anxiety, systolic blood pressure, and heart rate in healthy males and females. Journal of Music Therapy 34(4), 254–272.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180.
Koelsch, S., Boehlig, A., Hohenadel, M., Nitsche, I., Bauer, K., & Sack, U. (2016). The impact of acute stress on hormones and cytokines, and how their recovery is affected by music-evoked positive mood. Scientific Reports 6, 1–11. Retrieved from https://doi.org/10.1038/srep23008
Koelsch, S., Fuermetz, J., Sack, U., Bauer, K., Hohenadel, M., Wiegel, M., . . . Heinke, W. (2011). Effects of music listening on cortisol levels and propofol consumption during spinal anesthesia. Frontiers in Psychology 2, 1–9. Retrieved from https://doi.org/10.3389/fpsyg.2011.00058
Koyama, M., Wachi, M., Utsuyama, M., Bittman, B., Hirokawa, K., & Kitagawa, M. (2009). Recreational music-making modulates immunological responses and mood states in older adults. Journal of Medical and Dental Sciences 56, 79–90.
Koyama, Y., Jodo, E., & Kayama, Y. (1994). Sensory responsiveness of "broad-spike" neurons in the laterodorsal tegmental nucleus, locus coeruleus and dorsal raphe of awake rats: Implications for cholinergic and monoaminergic neuron-specific responses. Neuroscience 63(4), 1021–1031.
Kreutz, G., Bongard, S., Rohrmann, S., Hodapp, V., & Grebe, D. (2004). Effects of choir singing or listening on secretory immunoglobulin A, cortisol, and emotional state. Journal of Behavioral Medicine 27(6), 623–635.
Kreutz, G., Murcia, C., & Bongard, S. (2012). Psychoneuroendocrine research on music and health: An overview. In R. MacDonald, G. Kreutz, & L. Mitchell (Eds.), Music, health, and wellbeing (pp. 457–476). Oxford: Oxford University Press.
Kuhn, D. (2002). The effects of active and passive participation in musical activity on the immune system as measured by salivary immunoglobulin A (SIgA). Journal of Music Therapy 39(1), 30–39.
Kumar, A., Tims, F., Cruess, D., Mintzer, M., Ironson, G., Loewenstein, D., . . . Kumar, M. (1999). Music therapy increases serum melatonin levels in patients with Alzheimer's disease. Alternative Therapies in Health Medicine 5(9), 49–57.
Leardi, S., Pietroletti, R., Angeloni, G., Necozione, S., Ranalletta, G., & Del Gusto, B. (2007). Randomized clinical trial examining the effect of music therapy in stress response to day surgery. British Journal of Surgery 94(8), 943–947.
le Roux, F., Bouic, P., & Bester, M. (2007). The effect of Bach's Magnificat on emotions, immune, and endocrine parameters during physiotherapy treatment of patients with infectious lung conditions. Journal of Music Therapy 44(2), 156–168.
Lefevre, A., Mottolese, R., Dirheimer, M., Mottolese, C., Duhamel, J. R., & Sirigu, A. (2017). A comparison of methods to measure central and peripheral oxytocin concentrations in human and non-human primates. Scientific Reports 7(1), 17222. Retrieved from https://doi.org/10.1038/s41598-017-17674-7
Levitt, P., & Moore, R. Y. (1979). Origin and organization of brainstem catecholamine innervation in the rat. Journal of Comparative Neurology 186(4), 505–528.
Lin, P. C., Lin, M. L., Huang, L. C., Hsu, H. C., & Lin, C. C. (2011). Music therapy for patients receiving spine surgery. Journal of Clinical Nursing 20(7–8), 960–968.
Lindblad, F., Hogmark, Å., & Theorell, T. (2007). Music intervention for 5th and 6th graders: Effects on development and cortisol secretion. Stress and Health 23(1), 9–14.

Linnemann, A., Ditzen, B., Strahler, J., Doerr, J. M., & Nater, U. M. (2015). Music listening as a means of stress reduction in daily life. Psychoneuroendocrinology 60, 82–90.
Linnemann, A., Kappert, M. B., Fischer, S., Doerr, J. M., Strahler, J., & Nater, U. M. (2015). The effects of music listening on pain and stress in the daily life of patients with fibromyalgia syndrome. Frontiers in Human Neuroscience 9, 1–10. Retrieved from https://doi.org/10.3389/fnhum.2015.00434
Linnemann, A., Strahler, J., & Nater, U. M. (2016). The stress-reducing effect of music listening varies depending on the social context. Psychoneuroendocrinology 72, 97–105.
Lubin, D., Elliot, J., Black, M., & Johns, J. (2003). An oxytocin antagonist infused into the central nucleus of the amygdala increases maternal aggressive behavior. Behavioral Neuroscience 117(2), 195–201.
McCraty, R., Atkinson, M., & Rein, G. (1996). Music enhances the effect of positive emotional states on salivary IgA. Stress Medicine 12, 167–175.
McKinney, C. H., Tims, F. C., Kumar, A. M., & Kumar, M. (1997). The effect of selected classical music and spontaneous imagery on plasma β-endorphin. Journal of Behavioral Medicine 20(1), 85–99.
MacLean, E. L., Gesquiere, L. R., Gruen, M. E., Sherman, B. L., Martin, W. L., & Carter, C. S. (2017). Endogenous oxytocin, vasopressin, and aggression in domestic dogs. Frontiers in Psychology 8. Retrieved from https://doi.org/10.3389/fpsyg.2017.01613
Mallik, A., Chanda, M. L., & Levitin, D. J. (2017). Anhedonia to music and mu-opioids: Evidence from the administration of naltrexone. Scientific Reports 7, 1–8. Retrieved from https://doi.org/10.1038/srep41952
Mariath, L. M., da Silva, A. M., Kowalski, T. W., Gattino, G. S., De Araujo, G. A., Figueiredo, F. G., . . . Schuch, J. B. (2017). Music genetics research: Association with musicality of a polymorphism in the AVPR1A gene. Genetics and Molecular Biology 40(2), 421–429.
Maulina, T., Djustiana, N., & Shahib, M. N. (2017). The effect of music intervention on dental anxiety during dental extraction procedure. The Open Dentistry Journal 11(1), 565–572.
Mejía-Rubalcava, C., Alanís-Tavira, J., Mendieta-Zerón, H., & Sánchez-Pérez, L. (2015). Changes induced by music therapy to physiologic parameters in patients with dental anxiety. Complementary Therapies in Clinical Practice 21(4), 282–286.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184.
Metherate, R. (2011). Functional connectivity and cholinergic modulation in auditory cortex. Neuroscience & Biobehavioral Reviews 35(10), 2058–2063.
Migneault, B., Girard, F., Albert, C., Chouinard, P., Boudreault, D., Provencher, D., . . . Girard, D. C. (2004). The effect of music on the neurohormonal stress response to surgery under general anesthesia. Anesthesia & Analgesia 98(2), 527–532.
Miller, N. S., Kwak, Y., Bohnen, N. I., Müller, M. L. T. M., Dayalu, P., & Seidler, R. D. (2013). The pattern of striatal dopaminergic denervation explains sensorimotor synchronization accuracy in Parkinson's disease. Behavioural Brain Research 257, 100–110.
Möckel, M., Störk, T., Vollert, J., Röcker, L., Danne, O., Hochrein, H., . . . Frei, U. (1995). Stress reduction through listening to music: Effects on stress hormones, hemodynamics and mental state in patients with arterial hypertension and in healthy persons. Deutsche Medizinische Wochenschrift 120(21), 745–752.
Moriizumi, T., & Hattori, T. (1992). Choline acetyltransferase-immunoreactive neurons in the rat entopeduncular nucleus. Neuroscience 46(3), 721–728.

Morley, A. P., Narayanan, M., Mines, R., Molokhia, A., Baxter, S., Craig, G., . . . Craig, I. (2012). AVPR1A and SLC6A4 polymorphisms in choral singers and non-musicians: A gene association study. PLoS ONE 7(2), 2–8. Retrieved from https://doi.org/10.1371/journal.pone.0031763
Morley, B. J., & Happe, H. K. (2000). Cholinergic receptors: Dual roles in transduction and plasticity. Hearing Research 147(1–2), 104–112.
Motts, S. D., & Schofield, B. R. (2010). Cholinergic and non-cholinergic projections from the pedunculopontine and laterodorsal tegmental nuclei to the medial geniculate body in guinea pigs. Frontiers in Neuroanatomy 4, 1–8. Retrieved from https://doi.org/10.3389/fnana.2010.00137
Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., . . . Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79.
Murphy, D. D., Rueter, S. M., Trojanowski, J. Q., & Lee, V. M. (2000). Synucleins are developmentally expressed, and alpha-synuclein regulates the size of the presynaptic vesicular pool in primary hippocampal neurons. Journal of Neuroscience 20(9), 3214–3220.
Naganawa, M., Zheng, M.-Q., Henry, S., Nabulsi, N., Lin, S.-F., Ropchan, J., . . . Huang, Y. (2015). Test-retest reproducibility of binding parameters in humans with 11C-LY2795050, an antagonist PET radiotracer for the κ opioid receptor. Journal of Nuclear Medicine 56(2), 243–248.
Narendran, R., Mason, N. S., Laymon, C. M., Lopresti, B. J., Velasquez, N. D., May, M. A., . . . Frankle, W. G. (2010). A comparative evaluation of the dopamine D(2/3) agonist radiotracer [11C](-)-N-propyl-norapomorphine and antagonist [11C]raclopride to measure amphetamine-induced dopamine release in the human striatum. Journal of Pharmacology and Experimental Therapeutics 333(2), 533–539.
Narendran, R., Slifstein, M., Guillin, O., Hwang, Y., Hwang, D. R., Scher, E., . . . Laruelle, M. (2006). Dopamine (D2/3) receptor agonist positron emission tomography radiotracer [11C]-(+)-PHNO is a D3 receptor preferring agonist in vivo. Synapse 60(7), 485–495.
Nater, U. M., Abbruzzese, E., Krebs, M., & Ehlert, U. (2006). Sex differences in emotional and psychophysiological responses to musical stimuli. International Journal of Psychophysiology 62(2), 300–308.
Nilsson, U. (2009). Soothing music can increase oxytocin levels during bed rest after open-heart surgery: A randomised control trial. Journal of Clinical Nursing 18(15), 2153–2161.
Nilsson, U., Unosson, M., & Rawal, N. (2005). Stress reduction and analgesia in patients exposed to calming music postoperatively: A randomized controlled trial. European Journal of Anaesthesiology 22(2), 96–102.
Numan, M., Bress, J. A., Ranker, L. R., Gary, A. J., DeNicola, A. L., Bettis, J. K., & Knapp, S. E. (2010). The importance of the basolateral/basomedial amygdala for goal-directed maternal responses in postpartum rats. Behavioural Brain Research 214(2), 368–376.
Oczkowska, A., Kozubski, W., Lianeri, M., & Dorszewska, J. (2014). Mutations in PRKN and SNCA genes important for the progress of Parkinson's disease. Current Genomics 14(8), 502–517.
Okada, K., Kurita, A., Takase, B., Otsuka, T., Kodani, E., Kusama, Y., . . . Mizuno, K. (2009). Effects of music therapy on autonomic nervous system activity, incidence of heart failure events, and plasma cytokine and catecholamine levels in elderly patients with cerebrovascular disease and dementia. International Heart Journal 50(1), 95–110.
Ooishi, Y., Mukai, H., Watanabe, K., Kawato, S., & Kashino, M. (2017). Increase in salivary oxytocin and decrease in salivary cortisol after listening to relaxing slow-tempo and exciting fast-tempo music. PLoS ONE 12(12), 1–16. Retrieved from https://doi.org/10.1371/journal.pone.0189075
Owen, D. R. J., Gunn, R. N., Rabiner, E. A., Bennacef, I., Fujita, M., Kreisl, W. C., . . . Parker, C. A. (2011). Mixed-affinity binding in humans with 18-kDa translocator protein ligands. Journal of Nuclear Medicine 52(1), 24–32.
Pan, W. X., & Hyland, B. I. (2005). Pedunculopontine tegmental nucleus controls conditioned responses of midbrain dopamine neurons in behaving rats. Journal of Neuroscience 25(19), 4725–4732.
Pierrehumbert, B., Torrisi, R., Laufer, D., Halfon, O., Ansermet, F., & Beck Popovic, M. (2010). Oxytocin response to an experimental psychosocial challenge in adults exposed to traumatic experiences during childhood or adolescence. Neuroscience 166(1), 168–177.
Qiu, J., Jiang, Y.-F., Li, F., Tong, Q.-H., Rong, H., & Cheng, R. (2017). Effect of combined music and touch intervention on pain response and β-endorphin and cortisol concentrations in late preterm infants. BMC Pediatrics 17(1), 1–7. Retrieved from https://doi.org/10.1186/s12887-016-0755-y
Quiroga Murcia, C., Kreutz, G., Clift, S., & Bongard, S. (2010). Shall we dance? An exploration of the perceived benefits of dancing on well-being. Arts & Health 2(2), 149–163.
Rabiner, E. A., & Laruelle, M. (2010). Imaging the D3 receptor in humans in vivo using [11C](+)-PHNO positron emission tomography (PET). International Journal of Neuropsychopharmacology 13(3), 289–290.
Rainville, J. R., Tsyglakova, M., & Hodes, G. E. (2018). Deciphering sex differences in the immune system and depression. Frontiers in Neuroendocrinology (August). Retrieved from https://doi.org/10.1016/j.yfrne.2017.12.004
Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995a). Auditory input to the pedunculopontine nucleus: I. Evoked potentials. Brain Research Bulletin 37(3), 257–264.
Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995b). Auditory input to the pedunculopontine nucleus: II. Unit responses. Brain Research Bulletin 37(3), 265–273.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–262.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.
Schladt, T. M., Nordmann, G. C., Emilius, R., Kudielka, B. M., de Jong, T. R., & Neumann, I. D. (2017). Choir versus solo singing: Effects on mood, and salivary oxytocin and cortisol concentrations. Frontiers in Human Neuroscience 11, 1–9. Retrieved from https://doi.org/10.3389/fnhum.2017.00430
Schneider, N., Schedlowski, M., Schürmeyer, T. H., & Becker, H. (2001). Stress reduction through music in patients undergoing cerebral angiography. Neuroradiology 43(6), 472–476.
Schofield, B. R. (2010). Projections from auditory cortex to midbrain cholinergic neurons that project to the inferior colliculus. Neuroscience 166(1), 231–240.
Schwilling, D., Vogeser, M., Kirchhoff, F., Schwaiblmair, F., Boulesteix, A. L., Schulze, A., & Flemmer, A. W. (2015). Live music reduces stress levels in very low-birthweight infants. Acta Paediatrica 104(4), 360–367.
Shimizu, N., Umemura, T., Hirai, T., Tamura, T., Sato, K., & Kusaka, Y. (2013). Effects of movement music therapy with the naruko clapper on psychological, physical and physiological indices among elderly females: A randomized controlled trial. Gerontology 59(4), 355–367.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi
Chapter 15

The Neuroaesthetics of Music: A Research Agenda Coming of Age

Elvira Brattico

Introduction

Historically, the study of music, of how it is perceived and appreciated and how it is created (composed) and produced (performed), has been approached in two broadly distinct ways. On one hand, music has been studied as a succession of compositions and composers and of how these have been acclaimed in different epochs. This "humanistic" approach uses the descriptive methods of history, sociology, and philosophy, and it is often identified with musicology proper. Within this approach, the philosophical aesthetics of music finds its place (Scruton, 1999): the goal is to describe the change of musical taste over time, namely the explicit or unspoken principles that tacitly govern the consensus on what is considered musically acceptable and admirable ("beautiful") and what is not. The peculiarity of this "humanistic" approach is its attention to the work of a single composer or musician, narrated so as to evidence the uniqueness and exceptionality of his or her work and its non-replicable contribution to humanity (Zeki, 2014).

On the other hand, music has also been studied analytically, with methods resembling the natural sciences more than the humanities. Music theory in primis and systematic musicology in secundis have made explicit the conventions that underlie music composition, namely the recipes for creating music, derived from the work of generally recognized composers, and the constant laws of perception that govern how music is understood and appreciated. With the advent of cognitive science, this "systematic" approach, grounded in the scientific method, has been inspired by the computer metaphor in the search for universal rules that govern how we perceive, appreciate, and produce music (Sloboda, 1985). The search for the laws of music perception and cognition has profited from neurological findings, in which patients with brain lesions in auditory temporal areas showed a
loss of musical perceptual abilities accompanied by preservation of other auditory perceptual skills (Peretz & Zatorre, 2005). These studies, when supported by the opposite findings, namely those showing a double dissociation between music and language perception, provided the grounds for the initial influential models of music perception and production, which listed a set of modules, each dedicated to encapsulated and automatic subskills (Peretz & Coltheart, 2003). This line of research, bridging systematic musicology with brain-lesion studies, reached its climax in the 1980s and early 1990s.

The 1990s, called the "decade of the brain," also witnessed a surge of interest in answering epistemological (perception- and cognition-related) questions with experiments on healthy volunteers, using methods borrowed from neurophysiology and neurology. New brain scanning devices such as magnetoencephalography (MEG, measuring the magnetic fields around ion currents produced by neurotransmission) and functional magnetic resonance imaging (fMRI, measuring neuron-activity-dependent hemoglobin changes in blood flow in the brain) opened the study of musical brain functions to a broader group of researchers, without the need to recruit rare brain-lesion patients. Healthy volunteers could increasingly be measured during music tasks without any harm to them, apart from the short-lasting discomfort of the experimental session. This variation of the systematic approach peaked in the 2000s and is called "cognitive neuroscience of music" (Levitin & Tirovolas, 2009; Peretz & Zatorre, 2003; Samson, Dellacherie, & Platel, 2009) or, more simply, the "neurosciences of music" (Altenmüller et al., 2012; Bigand & Tillmann, 2015).
According to these accounts, music corresponds to a biological function, involving universal features that are shared by all humans ontogenetically (since birth) and phylogenetically (since the appearance of Homo sapiens). More complex models of music perception, cognition, and emotion then started to emerge, incorporating findings that pointed to shared rather than modular neural resources dedicated to music, in relation to other auditory functions (Früholz, Trost, & Kotz, 2016; Koelsch & Siebel, 2005; Patel, 2008). Hence, in the cognitive neuroscience of music the main goals have been, and still are, the search for brain specializations for music (as opposed to speech), the determination of the neural foundations of music perception, emotion, and production, and the identification of the effects of music on other brain functions. Overall, the predominant topics and models within the cognitive neuroscience of music leave little space for aesthetic processes such as evaluative judgments, appreciation, and taste formation.

In recent years, though, we have been witnessing a paradigm shift within the "systematic" approach, centered on a revised conceptualization of music, that might ultimately reconcile this approach with the traditional "humanistic" one. This shift was initiated by studies that focused on the subjective experience of music listening rather than on its objective, physical attributes. In these studies, experimental participants were asked to bring their own music to the laboratory, and their individual reactions to the music heard became the focus of investigation, irrespective of which object induced those reactions (Blood & Zatorre, 2001; Brattico et al., 2016). This experience is referred to as aesthetic when it originates in association with an artistic, human-made object without clear utilitarian functions. In several philosophical conceptualizations, what matters in art and music is the phenomenological content of the individual experience.
The scientific method applied to the study of this experience is called empirical aesthetics, when mainly behavioral methods are used, or neuroaesthetics, when brain research techniques are also applied. In empirical aesthetics and neuroaesthetics, researchers strive to fragment the aesthetic experience into subprocesses or stages that can be studied separately and that, when replicated, can produce a predictable outcome. Since the human mind possesses an embodied craving for beauty, harmony, and symmetry, some artistic-object features that generate an aesthetic experience occur more frequently than others (Chatterjee & Vartanian, 2016; Conway & Rehding, 2013; Pearce et al., 2016; Smith, 2005). Indeed, art and music are forms of human expression that are as old as our species (Aubert et al., 2014; Curtis, 2006). Hence, the aesthetic experience of music (and of the other arts) must be a biological as well as a cultural phenomenon. This point of view does not in any way downplay the act of creation, but rather emphasizes the fact that an aesthetic experience has aspects that are amenable to analysis within biological frameworks. A recent cross-cultural study (Savage, Brown, Sakai, & Currie, 2015) provides further support for the view that music elicits aesthetic experiences common to all humans. This study showed that the well-studied, statistically predominant perceptual and cognitive features of music (pitch: use of discrete pitches, small intervals, and melodic contours; rhythm: isochronous beat and multiples of beats; form: short phrases lasting less than 9 seconds) are accompanied by other features that have thus far been marginalized in scientific investigation, namely instrumentation (concurrent use of voice and instruments), performance style (chest voice), and social context (performance in groups and by males).
These features relate to aspects of music that are relevant in an aesthetic experience and that have thus far been related mainly to cultural transmission rather than to biology: for instance, mastering the style of the music is often a prerequisite for reaching a positive aesthetic outcome, and the type of social context is also a determinant of a musical aesthetic experience. In line with this, a meta-analysis has summarized the reasons for listening to music (Schafer, Sedlmeier, Stadtler, & Huron, 2013), illustrating, from the subjective experiential viewpoint, how music can be addressed by scientific investigation. Among the 129 reasons surveyed from the literature, three main factors emerged: social relatedness, self-awareness, and mood regulation/arousal. The last factor supports previous claims that music listening behavior is explained by the emotional and aesthetic impact of music. The other two factors have been less studied with neuroscience methods, partly owing to the limitations intrinsic to the experimental setup.

In the present chapter, I first describe the general framework of the neuroaesthetics of art that has inspired the advocated paradigm shift from music neuroscience to music neuroaesthetics, and then provide some putative reasons for the slow emergence of this field of research, as opposed to the neuroaesthetics of the visual arts. I then list some of the main findings obtained within music neuroaesthetics, as organized in the few models existing in the literature. The discussion is dedicated to the frontiers in the study of the intra-subject neural interactions between brain areas that give rise to aesthetic responses, and to the latest attempts to capture the neural attributes of inter-subject interactions during musical performance.
The Need to Study the Neural Determinants of Art

The term neuroaesthetics was first coined by Semir Zeki almost two decades ago (Zeki, 1999) to indicate a multidisciplinary field of research, focused at first on visual art, that merges a long history of philosophical and empirical aesthetics with the methodology of the cognitive and affective neurosciences (Chatterjee, 2011; Chatterjee & Vartanian, 2014, 2016; Conway & Rehding, 2013; Nadal & Pearce, 2011; Pearce et al., 2016). Neuroaesthetics seeks to understand the neural principles underlying the different processes that compose a human aesthetic experience with an artistic object (Livingstone & Hubel, 2002). An aesthetic experience has been defined as a psychological state determined by interaction with an object to which we intend to attribute (evaluate/appraise) positive or negative qualities according to perceptual, cognitive, affective, or cultural criteria. It is intrinsically different from other affective experiences owing to a special attitude (also referred to as focus, stance, or pre-classification) toward the object. According to a Kantian notion, this aesthetic stance is often characterized as disinterested, distanced from the primary emotional needs of the organism (Leder, Gerger, Brieber, & Schwarz, 2014). According to a somewhat tautological definition, an aesthetic experience is "an experience of a particular kind of object that has been designed to produce such an experience" (Bundgaard, 2015, p. 788). In this conceptualization, an aesthetic experience arises when, through perceptual-representational processes, we attribute to the stimulus a meaning based on aesthetic evaluation.
While there exist some universal laws of preference for certain stimulus configurations (e.g., according to Gestalt laws, humans tend to like symmetry, equilibrium, and order owing to the organizational function of the organism; Cupchik, 2007; Eysenck, 1942), the stimulus alone is not by itself the source of an aesthetic experience. Rather, it is the intentional relation and attitude that the subject has toward the stimulus. Because of this, subjectivity is intrinsic to aesthetic responses. A stimulus that is aesthetically appealing to one person can be repulsive to another. These variations derive both from the internal state, including the personal experience of previous encounters with the stimulus, and from the attitudes toward the stimulus, the current mood, and the innate biological predispositions for processing the stimulus and for having an aesthetic experience as a whole (Pelowski, Markey, Forster, Gerger, & Leder, 2017). Following this conceptualization, the research field of neuroaesthetics is dedicated to studying how the brain facilitates the human capacity for experiencing phenomena as "aesthetic" and for creating objects that evoke such experiences. To pursue these aims, one can choose between two possible directions of investigation, as also conceptualized by Brattico (2015), Cupchik and colleagues (Cupchik, Vartanian, Crawley, & Mikulis, 2009), Jacobsen and Beudt (2017), and Pelowski et al. (2017): on one hand, the bottom-up perceptual facilitation of aesthetic responses based on the physical properties of an
artwork, and, on the other hand, the feedback and feedforward relationship between the top-down, intentional orientation of attention and the artwork. Following Redies (2015), this dualism in how aesthetic phenomena are studied can be represented as a dichotomy between formalist and contextual theories. Formalist theories propose that the aesthetic experience relies on formal properties of the stimulus (e.g., symmetry, sensual beauty), which are considered to be universal and based on human brain physiology. Often in these theories, aesthetic responses to art are described as automatic and independent of conscious control (Zeki, 2013). In turn, in contextual theories the aesthetic experience depends on the intention of the artist and on the circumstances under which the artwork has been created and is displayed. Some of these theories focus on contemporary abstract art, characterized by a lesser role given to sensory features (Jacobsen, 2014; Leder, Belke, Oeberst, & Augustin, 2004; Pelowski, Markey, Lauring, & Leder, 2016). Some proposals also attempt a reconciliation between the two opposing stances, modeling the impact of top-down and bottom-up factors depending on the type of artistic stimulus at hand. For instance, in the model by Redies (2015), external information, meaning the stimulus features and the context in which the stimulus is displayed, is distinguished from internal representation, meaning the beholder's subjective representation of and reaction to the stimulus. In this particular model, the aesthetic experience is reached only with favorable encoding and cognitive mastering of the stimulus.
In most proposals, mainly focused on visual art (Pearce et al., 2016; Pelowski et al., 2016), the aesthetic experience emerges from the interaction of cognitive, affective, and evaluative processes, involving at least three different brain processes: (a) an enhancement of low-level sensory processing; (b) high-level top-down processing and activation of cortical areas involved in evaluative judgment; and (c) an engagement of the reward circuit, including cortical and subcortical regions. The initial efforts within the neuroaesthetics of visual art involved measuring subjects' brain activity while they evaluated the beauty or preference of artistic versus natural pictures (e.g., Vartanian & Goel, 2004), while they rated the beauty or correctness of abstract visual patterns (e.g., Jacobsen & Höfel, 2003), or while they viewed abstract, still life, landscape, or portrait pictures classified as beautiful, ugly, or neutral prior to the brain scanning session (e.g., Kawabata & Zeki, 2004). After these inspiring works, a great number of publications using neuroimaging and neurophysiological techniques have followed. Current neuroaesthetic research has fractionated human responses to art into the main outcomes of aesthetic emotions (e.g., pleasure, being moved, interest), preference (e.g., conscious liking), and judgment (e.g., beauty), associating with each of them a replicable and reliable pattern of neural and physiological activity (Brattico et al., 2016; Brattico, Bogert, & Jacobsen, 2013; Brattico & Pearce, 2013; Chatterjee & Vartanian, 2014, 2016; Istok, Brattico, Jacobsen, Ritter, & Tervaniemi, 2013; Jacobsen, 2014; Leder, Markey, & Pelowski, 2015; Nieminen, Istok, Brattico, Tervaniemi, & Huotilainen, 2011; Pearce et al., 2016; Pelowski et al., 2016; Reybrouck & Brattico, 2015). In these proposals, aesthetic emotions are the subjective feelings elicited by an artistic object, whereas aesthetic judgments are defined as subjective evaluations based on an individual set
of criteria. Moreover, several factors affecting the aesthetic experience have been targeted by neuroscientific investigation: environment, intentions, familiarity, expertise, and attitudes. In the latest overarching proposal, called the Vienna Integrated Model of Art Perception or VIMAP (Pelowski et al., 2017), bottom-up processing of low-level artwork-derived features, comprising perceptual analysis, implicit memory integration, and explicit classification, is conjoined with top-down factors. Among the latter, cognitive mastery, namely the matching of all information collected in previous processing stages to existing predictions and schemata, plays a central role and leads to the creation of meaning and associations. Brain substrates of the different stages of the visual aesthetic experience have also been identified: particularly, the visual cortices for feature analysis, the dorsolateral prefrontal cortex for cognitive mastery, default-mode network regions, error-monitoring regions of the anterior cingulate cortex, limbic regions (particularly the insula and amygdala) for controlling emotions, and the orbitofrontal cortex for integrating signals from cognitive and emotional brain regions and issuing aesthetic judgments. While the initial and the majority of efforts have concentrated on visual art (paintings), researchers keen on the neuroaesthetics approach have lately expanded their interest from visual art toward several other artistic domains, such as sculpture (Di Dio, Macaluso, & Rizzolatti, 2007), architecture (Coburn, Vartanian, & Chatterjee, 2017), dance (Calvo-Merino, Glaser, Grezes, Passingham, & Haggard, 2005; Calvo-Merino, Jola, Glaser, & Haggard, 2008), and poetry (Wassiliwizky, Koelsch, Wagner, Jacobsen, & Menninghaus, 2017).
In the past few years, the field has seen rapid growth, with several special issues of journals and books (e.g., Huston, Nadal, Mora, Agnati, & Cela Conde, 2015; Martindale, Locher, & Petrov, 2007), reviews (Chatterjee, 2011; Chatterjee & Vartanian, 2014, 2016; Leder & Nadal, 2014; Nadal et al., 2008; Pearce et al., 2016; Pelowski et al., 2016, 2017), and conferences (e.g., Nadal & Pearce, 2011). While critiques do exist (Tallis, 2008, 2011), and are indeed welcome for a healthy scientific debate, in the past two years the status of neuroaesthetics, especially for the visual arts, has changed from that of a contingent or trendy topic to that of a mature discipline (Chatterjee, 2011; Leder & Nadal, 2014; Pearce et al., 2016).

Neuroaesthetics: A Research Agenda also for Music

Similar to other artistic domains, music is phylogenetically universal: it has existed across all human cultures and epochs. It might even be older than our species, Homo sapiens: a flute with two holes carved in a bear bone was found in 1996 in a cave in Slovenia that had been inhabited by Neanderthals (Aubert et al., 2014; Seghdi & Brattico, in press). Music is also ontogenetically universal, considering that it is the first form of
communication between a newborn and a parent and the last one to disappear when all other cognitive functions have been worn away by neurodegenerative decay (Golden et al., 2017; Jacobsen et al., 2015; Matrone & Brattico, 2015). Music shares all these aspects, namely universality, evolutionary functions, emotional impact, and expressivity, with other forms of art. Moreover, music is characterized by responses that are aesthetic in nature, since they involve a variety of emotional processes that typically are associated with, and temporally precede, evaluative (subjective) decisions to consciously like the music heard, to attribute to it (objective) properties of beauty, mastery, or interest, and to seek the same experience again. These processes form a motivational learning loop that ultimately generates a set of preferences and habits called musical taste. For instance, the top reasons why we listen to music (Laukka, 2007; McDonald & Stewart, 2008), and even why we become musicians (Juslin & Laukka, 2004; Sloboda, 1992), are related to the aesthetic responses that music evokes: enjoyment, being moved, entertainment, and beauty. Also, when asked to name the adjectives that best describe the aesthetic value of music, hundreds of university students indicated "beautiful" as the most common word (Istok et al., 2009). Hence, cognitive neuroscience can regard music as a form of expressive art, rather than as an auditory domain to be contrasted with the other auditory domains of speech and language, as proposed in a first essay dedicated to the emerging field (Brattico & Pearce, 2013), thereby aligning itself with the recent progress of neuroaesthetics.
Along this line of thought, already in the late 1800s the Austrian philosopher Eduard Hanslick (1825–1904) underlined the strong links between music and aesthetics, as opposed to the utilitarian function of speech: "Speech and music have their centres of gravity at different points, around which the characteristics of each are grouped: and while all specific laws of music will centre in its independent forms of beauty, all laws of speech will turn upon the correct use of sounds as a medium of expressing ideas" (Hanslick, 1954, pp. 94–95).

In a second essay dedicated to music neuroaesthetics (Hodges, 2016), the field was described as comprising two distinct research agendas. The first is a "broad" agenda that studies music perception, cognition, and emotion without explicit reference to aesthetics or to any aesthetic concept, and which can be identified with the broader field of the cognitive neuroscience of music. The second is a research agenda of "narrow" scope that can be identified as the "core" neuroaesthetics of music, since it deals primarily with aesthetic processes and explicitly refers to preference, aesthetic emotions, and beauty (or other aesthetic) judgments. The increasing number of studies under the umbrella of the "core" neuroaesthetics of music often do not explicitly refer to any specific model of the musical aesthetic experience, but they typically contain the word "aesthetic" when describing findings. The goal of the "core" neuroaesthetics of music is to determine how the neuronal processing of multisensory signals leads to aesthetic responses during music listening and performance. Aesthetic responses include emotions (such as sensory and conscious pleasure or enjoyment, and being moved), liking or preference, and aesthetic judgment. The present chapter aims at identifying the main themes that separate music neuroaesthetics from the broader cognitive neurosciences of music (see Fig. 1).
[Figure 1: overlapping fields labeled "Motor neuroscience," "Music neuroscience," "Social, cognitive, affective neuroscience," "Neuroaesthetics," and "Neuroaesthetics of music."]

Figure 1.  Diagram illustrating the standing of the field of neuroaesthetics of music within broader human cognitive neuroscience studies.

Existing Models of the Musical Aesthetic Experience

Even if the past few years have witnessed several studies on aesthetic-related phenomena during music listening, the scientific questions asked have often been addressed without any explicit reference to overarching aesthetic frameworks, differently from what happens in visual art research (Brattico & Pearce, 2013; Hodges, 2016). A critical integrative analysis of thirty-one empirical aesthetic studies conducted between 1990 and 2015 (out of the 1,450 references initially retrieved) (Tiihonen, Brattico, Maksimainen, Wikgren, & Saarikallio, 2017) noted that scientific investigations of pleasure, one of the main subjective aesthetic responses to any artwork, have been contextualized within aesthetic frameworks and concepts in the visual modality (studies using stimuli from the figurative arts, such as paintings or sculptures), whereas they have been linked to the basic neuroscientific literature on primary pleasure (or the absence of it) in the music modality. This analysis confirms that visual empirical aesthetics and neuroaesthetics are active fields counting a number of established and well-recognized frameworks, whereas research on music is dominated by sensory and basic emotion models.


372   elvira brattico

The current situation can be attributed to the scarcity of brain-based models of aesthetic processes in music, which has left few overarching interpretations of the individual neuroscientific findings obtained. One of these models (illustrated in Fig. 2) is characterized by a chronometric distinction of the information processing stages leading to aesthetic responses. This model and its further developments by the same authors establish a distinction between pre-attentive, low-level perceptual and emotional stages, and reflective processes involving cognitive control (Brattico, 2015; Brattico et al., 2013; Brattico & Pearce, 2013; Nieminen et al., 2011; Reybrouck & Brattico, 2015). These stages lead to the three main outcomes of an aesthetic experience, namely emotion, preference, and judgment (Brattico, 2015; Brattico & Pearce, 2013). These accounts combine a locationist view with a temporal information processing description of the brain mechanisms involved in the aesthetic experience of music: each temporally evolving stage depends on a distinct set of specific brain structures, and the final outcomes of the aesthetic experience require the succession of all previous stages in order to materialize. For instance, according to Brattico et al. (2013), conscious liking judgments can be issued after the brainstem, thalamus, and limbic regions have quickly reacted to salient features of the sound, and after the frontotemporal cortex has encoded and integrated those sound features with learned cognitive schemata, using parietal and action observation neural resources to attribute emotional connotations to the sounds (see Fig. 2). If all these stages are successfully completed, and if limbic, prefrontal, and mentalizing brain regions are conspicuously activated, then a liking judgment, possibly accompanied by a beauty verdict, is issued.
Other influential models that inform research on music neuroaesthetics, although not explicitly referring to the aesthetic experience as a whole, have targeted either music-induced emotions or mainly pleasure (irrespective of other emotions). The most influential model of music-induced emotions was first proposed by Juslin and Västfjäll (2008) and identifies six main mechanisms that are supposed to explain the induction of any musical emotion: brainstem reflexes (automatic reactions to salient, potentially important features of sounds), evaluative conditioning (deriving from the repeated pairing of music with positive or negative stimuli), emotional contagion (when music mimics a bodily or vocal emotional expression), visual imagery (association with visual images during listening), episodic memory (elicitation of a memory for a particular event), and musical expectancy. This last mechanism has been strongly linked with two important forces accounting for a rewarding musical experience: predictability and surprise (Huron, 2006). During listening, we use our former encounters with music and our implicit knowledge of musical conventions to consciously or implicitly anticipate the outcomes of the musical "paths," wondering where they might lead us (Huron, 2006, 2009). According to Huron (2006, 2009), Imagination, Tension, Prediction, Reaction, and Appraisal (ITPRA) create a loop leading to musical pleasure: anticipating future events in music through imagination creates both physiological and psychological tension, and both unconscious and conscious predictions for specific features are formed; the final outcome is a reaction leading to a conscious appraisal response (whether the outcome is good, bad, or something in between). In a summarizing effort, Vuust & Kringelbach (2010)



[Figure 2 content, recovered from the original diagram: an external context (environment, peers, setting) and an internal context (expertise, mood, attention) frame a time course running from feature analysis in the brainstem and sensory cortices (ABR ~20 ms; P50, N1, P2 ~50–100 ms), through feature integration, perceptual invariances, hierarchical structure, and temporal succession together with early emotional reactions in sensory cortices, parahippocampal gyrus, and amygdala (MMN/ERAN ~200 ms), and cognitive processing of rules in non-primary sensory cortices and prefrontal cortex (P300 ~300–600 ms), to discrete emotional states (amygdala, striatum, cingulate, insula), aesthetic emotions (medial prefrontal cortex, motor areas, ventral striatum, orbitofrontal cortex), conscious liking (anterior cingulate, orbitofrontal cortex, motor areas, ventral striatum, insula), and aesthetic judgment (anterior cingulate, premotor cortex, orbitofrontal cortex; late LPP ~1000 ms).]
Figure 2.  A schematic representation of a previous framework concerning the timing, localization, and effects of neural processes contributing to aesthetic experience (modified from Brattico et al., 2013). The lower block shows how the various processes evolve as a function of time, beginning from the first sensory analyses to the main outcomes of aesthetic emotions, preference, and judgments. The upper block illustrates their rough anatomic locations and connections in the human brain. ABR = auditory brainstem response; LPP = late positive potential.


identified a dichotomy between extra-musical mechanisms that rely, for example, on associations with past events or other emotional sounds, and the intra-musical mechanism of anticipation. According to this latter mechanism, a musical structure would be aesthetically pleasing when it optimally challenges learned predictions for incoming events (Vuust & Witek, 2014; Witek, Clarke, Wallentin, Kringelbach, & Vuust, 2014; Witek, Kringelbach, & Vuust, 2015). In the brain, dopaminergic neurotransmission between the ventral tegmental area, ventral striatum (including the nucleus accumbens), amygdala, and insula up to the orbitofrontal cortex is associated with desire for a reward, especially when it arrives unexpectedly (prediction error), whereas the dorsal striatum and opioid neurotransmission seem to be related to the actual pleasurable reaction (Berridge & Kringelbach, 2015; Kringelbach & Berridge, 2017; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011). To the initial six emotion-inducing mechanisms proposed by Juslin and Västfjäll (2008), another two were added by Juslin (2013). The first was rhythmic entrainment. This mechanism is particularly interesting from the neuroscience perspective, as it has been linked to the neural mechanism that synchronizes the firing frequency of neuronal assemblies to the pulse of the music heard (Large & Snyder, 2009), although not at tempi below 1 note per second (Doelling & Poeppel, 2015). In some cases, this neuronal entrainment can be observed even in the spectral domain. For instance, a dissonant sound seems to elicit neuronal activity that periodically oscillates at the same frequency as the beats (amplitude modulations) of the sound (Fishman et al., 2001; Pallesen et al., 2015).
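The frequency-tagging logic behind such entrainment findings can be sketched in a few lines: if neural activity synchronizes to a beat, spectral power concentrates at the beat frequency. The signal below is synthetic, and every parameter (sampling rate, the 2 Hz beat, the noise level) is hypothetical, chosen only for illustration.

```python
import numpy as np

def power_at_frequency(signal, fs, target_hz):
    """Fraction of total spectral power in the FFT bin nearest target_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    idx = np.argmin(np.abs(freqs - target_hz))
    return spectrum[idx] / spectrum.sum()

# Synthetic "neural" trace: an oscillation entrained to a 2 Hz beat, plus noise.
fs, beat_hz = 250, 2.0
t = np.arange(fs * 20) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * beat_hz * t) + 0.5 * rng.standard_normal(t.size)

# Power concentrates at the beat frequency, not at an off-beat control frequency.
assert power_at_frequency(signal, fs, beat_hz) > power_at_frequency(signal, fs, 3.7)
```

With real EEG/MEG data one would compare power at the stimulation frequency against neighboring bins, which is the essence of the frequency-tagging approach used in entrainment studies.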
The other added mechanism was aesthetic judgment (Juslin, 2013), which begins when a special aesthetic attitude is adopted and is based on a set of individual criteria determining the preference for, or rejection of, a particular musical piece. Aesthetic judgment, according to Juslin (2013), accounts for the special nature of music-induced emotions that distinguishes them from mundane emotions (such as sadness over a sudden loss), as well as for the common incidence of mixed emotions induced by music that represents negative emotions yet produces pleasurable feelings of enjoyment. Notably, this model distinguishes between emotions, preference, and judgments, similarly to Brattico et al. (2013), although it differs in placing less emphasis on a temporal succession of neurally distinct processes. The most comprehensive accounts of the aesthetic experience (Brattico et al., 2013; Hargreaves & North, 2010; Hodges, 2016; Juslin, 2013) also cover the context, namely the external physical environment surrounding the individual during a musical activity. The listening experience changes depending on whether it is consumed alone or with peers, in a concert hall or at home. The listener's internal state (attention, intention, attitude, motivation, personality) cannot be omitted either (Brattico et al., 2013; Brattico, 2015; Hargreaves & North, 2010; Hodges, 2016; Reybrouck & Brattico, 2015); a given internal state can make music consumption merely incidental, as for a distracted person with no intention of seeking musical exposure, or enable a full aesthetic experience with positive responses, as for the avid concertgoer.



Main Brain Structures Related to Aesthetic Responses to Music

In the information processing models of the aesthetic experience presented above, the extraction of acoustic features in the brainstem, thalamus, and sensory cortices is the first necessary stage. In music (but also in visual art, according to some models), emotional responses, described also as reactions or reflexes, occur already at an early stage and are closely predicted by the physical content of the stimulus (Brattico et al., 2013; Pearce, 2015; Reybrouck & Brattico, 2015). For instance, a rough, dissonant chord can by itself excite neuronal assemblies in the limbic system, such as the amygdala and parahippocampal gyrus (Blood, Zatorre, Bermudez, & Evans, 1999; Gosselin et al., 2006; Pallesen et al., 2005). In the case of early emotional reactions to sounds, causing immediate sensory pleasure, limbic regions can be activated even without the involvement of higher-order brain areas. A dissociation between fast and slow routes for pleasure (described in Brattico, 2015; Kringelbach & Berridge, 2017) is visible in studies involving tasks that distract subjects from the deliberate evaluation of sounds. For instance, in Bogert et al. (2016), limbic regions were activated in response to emotionally stereotypical music clips only when subjects were focusing their attention on descriptive aspects of the sounds, whereas these regions were downregulated when subjects had to direct their conscious attention to the emotions expressed by the music. An intermediate stage of the aesthetic experience, explicitly mentioned in Brattico et al. (2013) and Juslin (2013) as well as in several visual models (Pelowski et al., 2016, 2017), includes the integration of features and their modulation by existing cultural knowledge. This stage requires the involvement of the lateral prefrontal cortex, particularly the inferior frontal gyrus, and premotor areas.
These brain regions have been repeatedly implicated in the detection of incongruous sound events that violate expectations based on previous knowledge of musical conventions. The predictive coding theory of brain function suggests that, in both auditory and frontal regions, prior predictions are continuously applied top-down to the incoming signal; when an error occurs between priors and the actual signal, predictions are updated through a bottom-up feedback loop so as to minimize free energy (Friston, 2005; Vuust, Ostergaard, Pallesen, Bailey, & Roepstorff, 2009). These prediction errors can be measured with the event-related potential (ERP) technique by focusing on brain responses such as the N100, the mismatch negativity (MMN), or the early right anterior negativity (ERAN) (Koelsch, 2011; Koelsch & Siebel, 2005), which track the information content of sounds (based probabilistically on their occurrence in the preceding context) or their subjective expectancy (Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010). During the intermediate stage, the discrete emotions expressed by music are perceived and possibly even induced. While in Juslin's (2013) model emotions are considered an outcome of the different psychological and neural mechanisms activated during a listening experience, in Brattico et al.'s (2013) model emotions are perceived and felt


before other aesthetic outcomes occur. Support for this view comes from studies showing the independence of conscious, thought-related aesthetic processes from emotional processes (Bogert et al., 2016; Brielmann & Pelli, 2017; Liu et al., in press). A recent meta-analysis of fMRI studies on musical emotions highlights a set of brain regions that form the core of the functional network processing musical emotions, namely the nucleus accumbens, amygdala, hippocampus, insula, cingulate cortex, orbitofrontal cortex, and temporal pole (Koelsch, 2014). One kind of emotional response to music is conscious pleasure or enjoyment, closely related to liking and preference. In existing models, enjoyment and conscious liking are described as aesthetic outcomes, since they require a deliberate decision and an evaluative act deriving from the integration of the preceding cognitive and emotional information processing stages (Brattico et al., 2013; Juslin, 2013). From the brain perspective, conscious pleasure and liking, often accompanied by the bodily response of chills, have been consistently associated with activity in mesolimbic brain regions of the reward circuit, including the nucleus accumbens, the ventral tegmental area, the amygdala, the insula, the orbitofrontal cortex, and the ventromedial prefrontal cortex, which rely on the neurotransmitter dopamine (Blood & Zatorre, 2001; Blum et al., 2010; Chanda & Levitin, 2013; Koelsch, 2014; Salimpoor et al., 2013; Zatorre, 2015). A third kind of aesthetic outcome is aesthetic judgment ("this music is beautiful"). As Table 1 shows, only a few studies have analyzed aesthetic judgments, even though beauty is the criterion mentioned most often when listeners freely associate words with musical aesthetic value (Istok et al., 2009).
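The information-content measure referenced above (Pearce et al., 2010) can be illustrated with a toy sketch. A first-order transition model over pitch classes (a drastic simplification of the variable-order models actually used in that work; the corpus and all values below are hypothetical) assigns each note the surprisal -log2 P(note | previous note), so that rare continuations score high.

```python
import math
from collections import defaultdict

def train_bigram(sequences):
    """Count pitch-to-pitch transitions across a corpus of melodies."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def information_content(counts, prev, nxt, alpha=1.0, vocab=12):
    """Surprisal -log2 P(nxt | prev), add-alpha smoothed so that unseen
    transitions get finite but high values."""
    total = sum(counts[prev].values()) + alpha * vocab
    return -math.log2((counts[prev][nxt] + alpha) / total)

# Toy corpus of pitch-class melodies (hypothetical):
corpus = [[0, 2, 4, 5, 7], [0, 2, 4, 2, 0], [7, 5, 4, 2, 0]]
model = train_bigram(corpus)

# A frequent continuation carries less information than a never-heard one.
assert information_content(model, 0, 2) < information_content(model, 0, 11)
```

High information content in this sense is exactly the condition under which enlarged MMN/ERAN responses are expected.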
A series of studies contrasted aesthetic and cognitive responses to the same musical stimuli in order to establish the specificity and chronometry of the neural mechanisms governing aesthetic processes during music listening. The first of these studies (Brattico, Jacobsen, De Baene, Glerean, & Tervaniemi, 2010) was conducted using electroencephalography (EEG): subjects judged the same 180 musical sequences, either deciding whether the sequences sounded correct or incorrect (descriptive task) or deciding whether they liked them or not (evaluative task). Results showed larger frontal negativities for the evaluative than for the descriptive task, indicating that more neural resources are involved in "aesthetic" listening. In terms of brain structures, the orbitofrontal cortex is repeatedly found active in association with beauty judgments of music (similarly to beauty judgments of visual art) (Brattico et al., submitted; Ishizu & Zeki, 2011).

Present Advances: Neural Interactions for Music Aesthetic Responses

Recent years have seen a change in the way brain physiology is described, from a locationist view, where each structure subserves one or a few main functions, to a distributed view, where the brain is described as a complex dynamic system and where the interactions


Table 1.  Selected published research on the neuroaesthetics of music. The list includes studies dealing with the aesthetic experience of music in general, or with any of its main outcomes (emotions, pleasure, liking, beauty), in relation to the underlying neural mechanisms. Studies focused only on psychological mechanisms are not considered here.

Authors | Date | Title | Topic | Method
Alluri & Toiviainen | 2015 | Musical expertise modulates functional connectivity of limbic regions during continuous music listening | Pleasure | fMRI
Berns et al. | 2010 | Neural mechanisms of the influence of popularity on adolescent ratings of music | Liking | fMRI
Blood & Zatorre | 2001 | Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion | Pleasure | PET
Brattico | 2015 | From pleasure to liking and back: Bottom-up and top-down neural routes to the aesthetic enjoyment of music | Pleasure and liking | Review
Brattico & Pearce | 2013 | The neuroaesthetics of music | Aesthetic experience | Review
Brattico et al. | 2009 | The origins of the aesthetic enjoyment of music: A review of the literature | Aesthetic experience | Review
Brattico et al. | 2010 | Cognitive vs. affective listening modes and judgments of music: An ERP study | Liking, attitude | ERP
Brattico et al. | 2013 | Towards a neural chronometry for the aesthetic experience of music | Aesthetic experience, model | Review
Brattico et al. | 2016 | It's sad but I like it: The neural dissociation between musical emotions and liking in experts and laypersons | Liking and emotions | fMRI
Brattico et al. | 2017 | Global sensory qualities and aesthetic experience of music | Aesthetic experience | Review
Brown et al. | 2011 | Naturalizing aesthetics: Brain areas for aesthetic appraisal across sensory modalities | Pleasure | Meta-analysis
Chapin et al. | 2010 | Dynamic emotional and neural responses to music depend on performance expression and listener experience | Emotions | fMRI
Früholz et al. | 2016 | The sound of emotions: Towards a unifying neural network perspective of affective sound processing | Emotions | Review
Ishizu & Zeki | 2011 | Toward a brain-based theory of beauty | Beauty | fMRI
Juslin | 2013 | From everyday emotions to aesthetic emotions: Towards a unified theory of musical emotions | Aesthetic experience, model | Review
Koelsch | 2014 | Brain correlates of music-evoked emotions | Emotions | Meta-analysis
Kornysheva et al. | 2010 | Tuning-in the beat: Aesthetic appreciation of musical rhythms correlates with a premotor activity boost | Beauty | fMRI
Kühn & Gallinat | 2012 | The neural correlates of subjective pleasantness | Liking | Meta-analysis
Lehne & Koelsch | 2015 | Tension-resolution patterns as a key element of aesthetic experience: Psychological principles and underlying brain mechanisms | Aesthetic experience, model | Review
Liu et al. | 2017 | Towards tunable consensus clustering for studying functional brain connectivity during affective processing | Liking and emotions | fMRI
Liu et al. | In press | Effect of explicit evaluation on neural connectivity related to listening to unfamiliar music | Liking | fMRI
Martínez-Molina et al. | 2016 | Neural correlates of specific musical anhedonia | Pleasure | fMRI
Mas-Herrero et al. | 2018 | Modulating musical reward sensitivity up and down with transcranial magnetic stimulation | Pleasure | TMS
Menon & Levitin | 2005 | The rewards of music listening: Response and physiological connectivity of the mesolimbic system | Pleasure | fMRI
Molnar-Szakacs & Overy | 2006 | Music and mirror neurons: From motion to "e"motion | Aesthetic experience, model |
Montag et al. | 2011 | How one's favorite song activates the reward circuitry of the brain: Personality matters! | Pleasure | fMRI
Müller et al. | 2010 | Aesthetic judgments of music in experts and laypersons: An ERP study | Beauty | ERP
Nieminen et al. | 2011 | The development of aesthetic responses to music and their underlying neural and psychological mechanisms | Aesthetic experience, model | Review
Pearce | 2015 | Effects of expertise on the cognitive and neural processes involved in musical appreciation | Aesthetic experience | Review
Pereira et al. | 2011 | Music and emotions in the brain: Familiarity matters | Pleasure | fMRI
Sachs et al. | 2016 | Brain connectivity reflects human aesthetic responses to music | Pleasure | fMRI
Salimpoor et al. | 2011 | Anatomically distinct dopamine release during anticipation and experience of peak emotion to music | Pleasure | fMRI and PET
Salimpoor et al. | 2013 | Interactions between the nucleus accumbens and auditory cortices predict music reward value | Pleasure | fMRI
Salimpoor & Zatorre | 2013 | Neural interactions that give rise to musical pleasure | Pleasure | Review
Steinbeis & Koelsch | 2009 | Understanding the intentions behind man-made products elicits neural activity in areas dedicated to mental state attribution | Intention | fMRI
Suzuki et al. | 2008 | Discrete cortical regions associated with the musical beauty of major and minor chords | Beauty | fMRI
Trost et al. | 2012 | Mapping aesthetic musical emotions in the brain | Emotions | fMRI
Trost et al. | 2014 | Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness | Pleasure | fMRI
Trost et al. | 2015 | Temporal dynamics of musical emotions examined through intersubject synchrony of brain activity | Emotions | fMRI
Vuust & Kringelbach | 2010 | The pleasure of making sense of music | Pleasure, model | Review
Wilkins et al. | 2014 | Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem | Liking and genre preference | fMRI
between its components govern cognitive functions (Bassett & Sporns, 2017; Medaglia, Lynall, & Bassett, 2015). This novel view derives from the technological and scientific progress of network neuroscience, namely the marriage between network science and cognitive neuroscience (Bassett & Sporns, 2017). Network techniques are mathematical tools for describing complex systems organized as networks that change over time (dynamics) (Medaglia et al., 2015; Newman, 2010). Previous overviews of the music neuroaesthetics field (Brattico & Pearce, 2013; Hodges, 2016) made little mention of network neuroscience studies; indeed, most studies on functional connectivity have been published in the past two years. For instance, it has recently been found that functional connectivity between the superior temporal gyrus (where the auditory cortex is located), the inferior frontal cortex (where hierarchical predictions for sounds are computed), and reward regions determines the pleasurable rewarding responses to music, or their absence (Martínez-Molina, Mas-Herrero, Rodriguez-Fornells, Zatorre, & Marco-Pallares, 2016; Sachs, Ellis, Schlaug, & Loui, 2016; Salimpoor et al., 2013; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014). In one such study, where subjects had to decide how much money they would spend to buy songs, the connections between the nucleus accumbens and its surrounding regions (the amygdala and the hippocampus) predicted how much a participant would spend on each song (Salimpoor et al., 2013). The importance of the neural interactions between the nucleus accumbens and the auditory cortex in determining aesthetic pleasure in music has also been highlighted by


studies aiming to identify the neural sources of individual differences in pleasurable reactions to musical sounds (Keller et al., 2013; Martínez-Molina et al., 2016; Sachs et al., 2016). These studies build on the recent empirical observation that music is not universally liked and appreciated; rather, individuals vary greatly in their sensitivity to musical reward, ranging from musicophiles, characterized by an acute craving for music and heightened responsiveness and interest in musical sounds (Sacks, 2007), to musical anhedonics, who are totally indifferent to music (Mas-Herrero, Marco-Pallares, Lorenzo-Seva, Zatorre, & Rodriguez-Fornells, 2013; Mas-Herrero, Zatorre, Rodriguez-Fornells, & Marco-Pallares, 2014). A recent study using diffusion tensor imaging (DTI) showed that the white-matter tracts between the posterior portion of the superior temporal lobe and emotion- and reward-processing regions such as the anterior insula and the medial prefrontal cortex explain individual differences in reward sensitivity to music (Sachs et al., 2016). In that study, reward sensitivity was quantified as the number of chills experienced by each individual combined with the degree of physiological change (heart rate and skin conductance response) while listening to chill-inducing versus neutral music. Another study (Martínez-Molina et al., 2016) used the newly developed Barcelona Music Reward Questionnaire (BMRQ) to identify music-specific anhedonic, hedonic, and hyperhedonic subjects. These subjects were measured with fMRI during a music listening task, in which they rated the pleasantness of music excerpts, and a gambling task, in which they either won or lost a symbolic amount of money.
Results showed decreased regional activity in the ventral striatum (including the nucleus accumbens) in anhedonics and increased regional activity in hyperhedonics, as well as reduced functional connectivity between this area and the right superior temporal gyrus in anhedonics. These results were obtained only for pleasantness responses to the music, not for the gambling task. Such findings are not confined to receptive pleasure during listening but also relate to the desire to move to rhythmic aspects of the music. A study by Witek et al. (forthcoming) found local changes in directed effective connectivity between motor (dorsomedial prefrontal) and reward (striatal) networks during a maximal rhythm-induced pleasurable urge to move. In addition, they showed that the maximal pleasurable desire to move to sound was predicted by a meta-stable brain network organization, namely a neural organization lying between an ordered and a disordered state (computed as the whole-brain shuffling speed of effective connectivity matrices) (Deco, Kringelbach, Jirsa, & Ritter, 2017). These and other studies compellingly demonstrate that functional connectivity between the superior temporal gyrus (where the auditory cortex is located), the inferior frontal cortex (where hierarchical predictions for sounds are computed), and reward regions of the brain is linked with pleasurable rewarding responses to music, or their absence (Martínez-Molina et al., 2016; Sachs et al., 2016; Salimpoor et al., 2013; Wilkins et al., 2014). Notably, the neural transmission between these brain areas is regulated by the monoamine neurotransmitter dopamine, which has been linked to incentive salience and motivation for acting, namely to the "wanting" phase of the reward cycle (Kringelbach & Berridge, 2017). A very recent investigation has discovered


a molecular link between affective sensitivity to (musical) sounds and dopamine functionality (Quarto et al., 2017): a functional variation in a dopamine receptor gene modulates the impact of sounds on mood states and on emotion-related prefrontal and striatal brain activity. The studies reviewed above, while having the important merit of revealing the complex architecture subserving the rewarding experience of music listening, have not examined whether this experience can occur spontaneously, even during casual listening, or whether it requires focused attention and a particular attitude (sometimes referred to as the aesthetic stance). A recent study (Liu et al., in press) contrasted conditions varying in the type of focused attentional involvement with the music requested of subjects. Consistent with previous findings (Bogert et al., 2016; Brattico et al., 2016; Liu, Abu-Jamous, Brattico, & Nandi, 2017), the study observed co-activation of a network of mesiotemporal limbic structures, including the nucleus accumbens, in response to the liked musical stimuli, irrespective of whether subjects were focusing on making a conscious liking evaluation. Functional connectivity within prefrontal and parieto-occipital regions was instead observed for the liking judgments.
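Functional connectivity, as used in the studies above, is most simply operationalized as the pairwise Pearson correlation between regional activity time series. The sketch below uses synthetic data and hypothetical region labels purely to illustrate the computation.

```python
import numpy as np

def functional_connectivity(timeseries):
    """Pairwise Pearson correlations between the rows of a
    (n_regions, n_timepoints) array."""
    return np.corrcoef(timeseries)

# Synthetic BOLD-like signals: two "regions" driven by a shared source, one independent.
rng = np.random.default_rng(1)
shared = rng.standard_normal(200)
region_a = shared + 0.3 * rng.standard_normal(200)  # e.g., auditory cortex (hypothetical)
region_b = shared + 0.3 * rng.standard_normal(200)  # e.g., nucleus accumbens (hypothetical)
region_c = rng.standard_normal(200)                 # unrelated control region
fc = functional_connectivity(np.vstack([region_a, region_b, region_c]))

# Coupled regions show high connectivity; the control region does not.
assert fc[0, 1] > fc[0, 2]
```

In the cited work such matrices are computed per subject and compared across conditions or groups; directed (effective) connectivity of the kind reported by Witek et al. requires model-based methods rather than plain correlation.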

Future Challenges and Promises

Until now, the musical experience has been analyzed from the point of view of the subject. Yet music (like other arts) can represent a means of communication between the judgmental intentions of the perceiver and the meaning-making intentions of the composer/artist. The act of meaning attribution, which is essential to an aesthetic experience, as argued, for example, by Chatterjee and Vartanian (2014), Pearce et al. (2016), Leder et al. (2004), and Menninghaus et al. (2017), cannot exist without the assignment of an intention to the agent producing the artistic object (Acquadro, Congedo, & De Ridder, 2016). Modern neuroscience offers unprecedented opportunities to capture the essence of such aesthetic processes thanks to the hyperscanning approach, namely synchronized brain recordings of two or more persons doing an experimental task together (Hari, Henriksson, Malinen, & Parkkonen, 2015; Konvalinka & Roepstorff, 2012; Zhdanov et al., 2015; Zhou, Bourguignon, Parkkonen, & Hari, 2016). Even if "core" neuroaesthetics of music does not presently account much for motor production, the mirror neuron or action observation system (a set of neurons in fronto-parietal regions of the brain that respond when watching others perform a motor action; Freedberg & Gallese, 2007; Gallese & Freedberg, 2007; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996) has been proposed as a key mechanism allowing aesthetic responses to music in an interactive situation (Molnar-Szakacs & Overy, 2006). According to one model (Molnar-Szakacs & Overy, 2006), music is described as a hierarchically organized sequence of motor acts, synchronous with auditory information, that activates both the auditory cortex and motor regions of the action observation network in the posterior inferior frontal gyrus (BA 44) and adjacent


premotor cortex. In this model, the anterior insula serves to evaluate the internal visceral changes derived from music and to relay these changes to the limbic system, which is ultimately responsible for the complex affective experiences originating from music listening. The co-activation of the same motor systems in musician and perceiver is supposed to allow the co-representation and sharing of emotions during an aesthetic musical experience. Hence, future studies using hyperscanning techniques might measure the aesthetic value of a musical interaction and determine the responsible neural mechanisms. Initial investigations measuring the inter-subject coupling of electroencephalographic signals (especially in the beta frequency range) from guitarists playing in a duet demonstrate the feasibility of this approach (Lindenberger, Li, Gruber, & Müller, 2009; Müller, Sanger, & Lindenberger, 2013). To conclude, by addressing questions related to intra- and inter-subjectivity during a musical activity, the agenda of the neuroaesthetics of music comes close to the essence of music and of what we are as humans. While there is still the risk of "biologism," researchers working under the music neuroaesthetics umbrella reach out to the "humanistic" approach, since they strive to explain how "musical appreciation is dependent on culture, memory, mood and many other factors such as personal taste" (Tallis, 2011, p. 54).

References

Acquadro, M. A., Congedo, M., & De Ridder, D. (2016). Music performance as an experimental approach to hyperscanning studies. Frontiers in Human Neuroscience 10, 242. Retrieved from https://doi.org/10.3389/fnhum.2016.00242
Alluri, V., & Toiviainen, P. (2015). Musical expertise modulates functional connectivity of limbic regions during continuous music listening. Psychomusicology: Music, Mind, and Brain 25(4), 443–454.
Altenmüller, E., Demorest, S. M., Fujioka, T., Halpern, A. R., Hannon, E. E., Loui, P., . . . Zatorre, R. J. (2012). Introduction to the neurosciences and music IV: Learning and memory. Annals of the New York Academy of Sciences 1252, 1–16.
Aubert, M., Brumm, A., Ramli, M., Sutikna, T., Saptomo, E. W., Hakim, B., . . . Dosseto, A. (2014). Pleistocene cave art from Sulawesi, Indonesia. Nature 514(7521), 223–227.
Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364.
Berns, G. S., Capra, C. M., Moore, S., & Noussair, C. (2010). Neural mechanisms of the influence of popularity on adolescent ratings of music. NeuroImage 49(3), 2687–2696.
Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664.
Bigand, E., & Tillmann, B. (2015). Introduction to the neurosciences and music V: Cognitive stimulation and rehabilitation. Annals of the New York Academy of Sciences 1337, vii–ix.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.
Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience 2(4), 382–387.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

the neuroaesthetics of music: a research agenda coming of age   383

Blum, K., Chen, T. J., Chen, A. L., Madigan, M., Downs, B. W., Waite, R. L., . . . Gold, M. S. (2010). Do dopaminergic gene polymorphisms affect mesolimbic reward activation of music listening response? Therapeutic impact on Reward Deficiency Syndrome (RDS). Medical Hypotheses 74(3), 513–520.
Bogert, B., Numminen-Kontti, T., Gold, B., Sams, M., Numminen, J., Burunat, I., . . . Brattico, E. (2016). Hidden sources of joy, fear, and sadness: Explicit versus implicit neural processing of musical emotions. Neuropsychologia 89, 393–402.
Brattico, E. (2015). From pleasure to liking and back: Bottom-up and top-down neural routes to the aesthetic enjoyment of music. In M. Nadal, J. P. Huston, L. Agnati, F. Mora, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 303–318). Oxford: Oxford University Press.
Brattico, E., Bogert, B., Alluri, V., Tervaniemi, M., Eerola, T., & Jacobsen, T. (2016). It’s sad but I like it: The neural dissociation between musical emotions and liking in experts and laypersons. Frontiers in Human Neuroscience 9, 676. Retrieved from https://doi.org/10.3389/fnhum.2015.00676
Brattico, E., Bogert, B., & Jacobsen, T. (2013). Toward a neural chronometry for the aesthetic experience of music. Frontiers in Psychology 4, 206. Retrieved from https://doi.org/10.3389/fpsyg.2013.00206
Brattico, E., Brattico, P., & Jacobsen, T. (2009). The origins of the aesthetic enjoyment of music: A review of the literature. Musicae Scientiae 13(2), 15–39.
Brattico, E., Brusa, A., Fernandes, H. M., Jacobsen, T., Gaggero, G., Toiviainen, P., Vuust, P., & Proverbio, A. M. (submitted). The beauty and the brain: Investigating the neural correlates of musical beauty during a realistic listening experience.
Brattico, E., Jacobsen, T., De Baene, W., Glerean, E., & Tervaniemi, M. (2010). Cognitive vs. affective listening modes and judgments of music: An ERP study. Biological Psychology 85(3), 393–409.
Brattico, E., & Pearce, M. T. (2013). The neuroaesthetics of music. Psychology of Aesthetics, Creativity, and the Arts 7, 48–61.
Brattico, P., Brattico, E., & Vuust, P. (2017). Global sensory qualities and aesthetic experience of music. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00159
Brielmann, A. A., & Pelli, D. G. (2017). Beauty requires thought. Current Biology 27(10), 1506–1513.e3.
Brown, S., Gao, X., Tisdelle, L., Eickhoff, S. B., & Liotti, M. (2011). Naturalizing aesthetics: Brain areas for aesthetic appraisal across sensory modalities. NeuroImage 58(1), 250–258.
Bundgaard, H. (2015). Feeling, meaning, and intentionality: A critique of the neuroaesthetics of beauty. Phenomenology and the Cognitive Sciences 14(4), 781–801.
Calvo-Merino, B., Glaser, D. E., Grezes, J., Passingham, R. E., & Haggard, P. (2005). Action observation and acquired motor skills: An fMRI study with expert dancers. Cerebral Cortex 15(8), 1243–1249.
Calvo-Merino, B., Jola, C., Glaser, D. E., & Haggard, P. (2008). Towards a sensorimotor aesthetics of performing art. Consciousness and Cognition 17(3), 911–922.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–193.
Chapin, H., Jantzen, K., Kelso, J. A., Steinberg, F., & Large, E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5(12), e13812.
Chatterjee, A. (2011). Neuroaesthetics: A coming of age story. Journal of Cognitive Neuroscience 23(1), 53–62.


Chatterjee, A., & Vartanian, O. (2014). Neuroaesthetics. Trends in Cognitive Sciences 18(7), 370–375.
Chatterjee, A., & Vartanian, O. (2016). Neuroscience of aesthetics. Annals of the New York Academy of Sciences 1369, 172–194.
Coburn, A., Vartanian, O., & Chatterjee, A. (2017). Buildings, beauty, and the brain: A neuroscience of architectural experience. Journal of Cognitive Neuroscience 29(9), 1521–1531.
Conway, B. R., & Rehding, A. (2013). Neuroaesthetics and the trouble with beauty. PLoS Biology 11, e1001504.
Cupchik, G. C. (2007). A critical reflection on Arnheim’s Gestalt theory of aesthetics. Psychology of Aesthetics, Creativity, and the Arts 1(1), 16–24.
Cupchik, G. C., Vartanian, O., Crawley, A., & Mikulis, D. J. (2009). Viewing artworks: Contributions of cognitive control and perceptual facilitation to aesthetic experience. Brain and Cognition 70(1), 84–91.
Curtis, G. (2006). The cave painters. New York: Anchor Books.
Deco, G., Kringelbach, M. L., Jirsa, V. K., & Ritter, P. (2017). The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Scientific Reports 7, 3095. doi:10.1038/s41598-017-03073-5
Di Dio, C., Macaluso, E., & Rizzolatti, G. (2007). The golden beauty: Brain response to classical and renaissance sculptures. PLoS ONE 11, 1–9.
Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences 112(45), E6233–E6242.
Eysenck, H. J. (1942). The experimental study of the “good Gestalt”: A new approach. Psychological Review 49(4), 344–364.
Fishman, Y. I., Volkov, I. O., Noh, M. D., Garell, P. C., Bakken, H., Arezzo, J. C., . . . Steinschneider, M. (2001). Consonance and dissonance of musical chords: Neural correlates in auditory cortex of monkeys and humans. Journal of Neurophysiology 86(6), 2761–2788.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in Cognitive Sciences 11(5), 197–203.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences 360(1456), 815–836.
Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68, 96–110.
Gallese, V., & Freedberg, D. (2007). Mirror and canonical neurons are crucial elements in esthetic response. Trends in Cognitive Sciences 11(10), 411.
Golden, H. L., Clark, C. N., Nicholas, J. M., Cohen, M. H., Slattery, C. F., Paterson, R. W., . . . Warren, J. D. (2017). Music perception in dementia. Journal of Alzheimer’s Disease 55(3), 933–949.
Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., . . . Peretz, I. (2006). Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex. Brain 129(10), 2585–2592.
Hanslick, E. (1954). On the musically beautiful. Indianapolis: Hackett (English translation from the 8th ed. 1891).
Hargreaves, D. J., & North, A. C. (2010). Experimental aesthetics and liking for music. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 515–546). Oxford: Oxford University Press.


Hari, R., Henriksson, L., Malinen, S., & Parkkonen, L. (2015). Centrality of social interaction in human brain function. Neuron 88(1), 181–193.
Hodges, D. A. (2016). The neuroaesthetics of music. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 247–262). Oxford: Oxford University Press.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Huron, D. (2009). Aesthetics. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 151–159). Oxford: Oxford University Press.
Huston, J. P., Nadal, M., Agnati, L., Mora, F., & Cela Conde, C. J. (Eds.). (2015). Art, aesthetics and the brain. Oxford: Oxford University Press.
Ishizu, T., & Zeki, S. (2011). Toward a brain-based theory of beauty. PLoS ONE 6, e21852.
Istok, E., Brattico, E., Jacobsen, T., Krohn, K., Mueller, M., & Tervaniemi, M. (2009). Aesthetic responses to music: A questionnaire study. Musicae Scientiae 13, 183–206.
Istok, E., Brattico, E., Jacobsen, T., Ritter, A., & Tervaniemi, M. (2013). “I love rock ’n’ roll”: Music genre preference modulates brain responses to music. Biological Psychology 92(2), 142–151.
Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chetelat, G., La Joie, R., & Turner, R. (2015). Why musical memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450.
Jacobsen, T. (2014). Domain specificity and mental chronometry in empirical aesthetics. British Journal of Psychology 105(4), 471–473.
Jacobsen, T., & Beudt, S. (2017). Domain generality and domain specificity in aesthetic appreciation. New Ideas in Psychology 47, 97–102.
Jacobsen, T., & Höfel, L. (2003). Descriptive and evaluative judgment processes: Behavioral and electrophysiological indices of processing symmetry and aesthetics. Cognitive, Affective, & Behavioral Neuroscience 3(4), 289–299.
Juslin, P. N. (2013). From everyday emotions to aesthetic emotions: Towards a unified theory of musical emotions. Physics of Life Reviews 10(3), 235–266.
Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research 33(3), 217–238.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575.
Kawabata, H., & Zeki, S. (2004). Neural correlates of beauty. Journal of Neurophysiology 91(4), 1699–1705.
Keller, J., Young, C. B., Kelley, E., Prater, K., Levitin, D. J., & Menon, V. (2013). Trait anhedonia is associated with reduced reactivity and connectivity of mesolimbic and paralimbic reward pathways. Journal of Psychiatric Research 47(10), 1319–1328.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology 2, 110. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15, 170–180.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences 9(12), 578–584.
Konvalinka, I., & Roepstorff, A. (2012). The two-brain approach: How can mutually interacting brains teach us something about social interaction? Frontiers in Human Neuroscience 6, 215. Retrieved from https://doi.org/10.3389/fnhum.2012.00215


Kornysheva, K., von Cramon, D. Y., Jacobsen, T., & Schubotz, R. I. (2010). Tuning-in the beat: Aesthetic appreciation of musical rhythms correlates with a premotor activity boost. Human Brain Mapping 31(1), 48–64.
Kringelbach, M. L., & Berridge, K. C. (2017). The affective core of emotion: Linking pleasure, subjective well-being, and optimal metastability in the brain. Emotion Review 9(3), 191–199.
Kühn, S., & Gallinat, J. (2012). The neural correlates of subjective pleasantness. NeuroImage 61(1), 289–294.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57.
Laukka, P. (2007). Uses of music and psychological well-being among the elderly. Journal of Happiness Studies 8(2), 215–241.
Leder, H., Belke, B., Oeberst, A., & Augustin, D. (2004). A model of aesthetic appreciation and aesthetic judgements. British Journal of Psychology 95(4), 489–508.
Leder, H., Gerger, G., Brieber, D., & Schwarz, N. (2014). What makes an art expert? Emotion and evaluation in art appreciation. Cognition and Emotion 28, 1137–1147.
Leder, H., Markey, P. S., & Pelowski, M. (2015). Aesthetic emotions to art: What they are and what makes them special. Comment on “The quartet theory of human emotions: An integrative and neurofunctional model” by S. Koelsch et al. Physics of Life Reviews 13, 67–70.
Leder, H., & Nadal, M. (2014). Ten years of a model of aesthetic appreciation and aesthetic judgments: The aesthetic episode—developments and challenges in empirical aesthetics. British Journal of Psychology 105(4), 443–464.
Lehne, M., & Koelsch, S. (2015). Tension-resolution patterns as a key element of aesthetic experience: Psychological principles and underlying brain mechanisms. In J. P. Huston, M. Nadal, F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 285–302). Oxford: Oxford University Press.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music. Annals of the New York Academy of Sciences 1156, 211–231.
Lindenberger, U., Li, S. C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical phase synchronization while playing guitar. BMC Neuroscience 10, 22. Retrieved from https://doi.org/10.1186/1471-2202-10-22
Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering for studying functional brain connectivity during affective processing. International Journal of Neural Systems 27(2), doi:10.1142/S0129065716500428
Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (in press). Effect of explicit evaluation on the neural connectivity related to listening to unfamiliar music. Frontiers in Human Neuroscience. Retrieved from https://doi.org/10.3389/fnhum.2017.00611
Livingstone, M., & Hubel, D. H. (2002). Vision and art: The biology of seeing. New York: Harry N. Abrams.
McDonald, C., & Stewart, L. (2008). Uses and functions of music in congenital amusia. Music Perception 25(4), 345–355.
Martindale, C., Locher, P., & Petrov, V. M. (2007). Evolutionary and neurocognitive approaches to aesthetics, creativity and the arts. Amityville, NY: Baywood Publishing.
Martínez-Molina, N., Mas-Herrero, E., Rodriguez-Fornells, A., Zatorre, R. J., & Marco-Pallares, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113, E7337–E7345.


Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2018). Modulating musical reward sensitivity up and down with transcranial magnetic stimulation. Nature Human Behaviour 2, 27–32.
Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A. (2013). Individual differences in music reward experiences. Music Perception 31(2), 118–138.
Mas-Herrero, E., Zatorre, R. J., Rodriguez-Fornells, A., & Marco-Pallares, J. (2014). Dissociation between musical and monetary reward responses in specific musical anhedonia. Current Biology 24(6), 699–704.
Matrone, C., & Brattico, E. (2015). The power of music on Alzheimer’s disease and the need to understand the underlying molecular mechanisms. Journal of Alzheimer’s Disease and Parkinsonism 5. doi:10.4172/2161-0460.1000196
Medaglia, J. D., Lynall, M. E., & Bassett, D. S. (2015). Cognitive network neuroscience. Journal of Cognitive Neuroscience 27(8), 1471–1491.
Menninghaus, W., Wagner, V., Hanich, J., Wassiliwizky, E., Jacobsen, T., & Koelsch, S. (2017). The distancing-embracing model of the enjoyment of negative emotions in art reception. Behavioral and Brain Sciences 40, 1–58.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184.
Molnar-Szakacs, I., & Overy, K. (2006). Music and mirror neurons: From motion to “e”motion. Social Cognitive and Affective Neuroscience 1(3), 235–241.
Montag, C., Reuter, M., & Axmacher, N. (2011). How one’s favorite song activates the reward circuitry of the brain: Personality matters! Behavioural Brain Research 225(2), 511–514.
Müller, V., Höfel, L., Brattico, E., & Jacobsen, T. (2010). Aesthetic judgments of music in experts and laypersons: An ERP study. International Journal of Psychophysiology 76(1), 40–51.
Müller, V., Sanger, J., & Lindenberger, U. (2013). Intra- and inter-brain synchronization during musical improvisation on the guitar. PLoS ONE 8, e73852.
Nadal, M., Munar, E., Capo, M. A., Rossello, J., & Cela-Conde, C. J. (2008). Towards a framework for the study of the neural correlates of aesthetic preference. Spatial Vision 21(3–5), 379–396.
Nadal, M., & Pearce, M. T. (2011). The Copenhagen neuroaesthetics conference: Prospects and pitfalls for an emerging field. Brain and Cognition 76(1), 172–183.
Newman, M. E. J. (2010). Networks: An introduction. Oxford: Oxford University Press.
Nieminen, S., Istok, E., Brattico, E., Tervaniemi, M., & Huotilainen, M. (2011). The development of aesthetic responses to music and their underlying neural and psychological mechanisms. Cortex 47(9), 1138–1146.
Pallesen, K. J., Bailey, C. J., Brattico, E., Gjedde, A., Palva, J. M., & Palva, S. (2015). Experience drives synchronization: The phase and amplitude dynamics of neural oscillations to musical chords are differentially modulated by musical expertise. PLoS ONE 10, e0134211.
Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Patel, A. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Pearce, M. T. (2015). Effects of expertise on the cognitive and neural processes involved in musical appreciation. In J. P. Huston, M. Nadal, F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 319–338). Oxford: Oxford University Press.
Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., & Bhattacharya, J. (2010). Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. NeuroImage 50(1), 302–313.


Pearce, M. T., Zaidel, D. W., Vartanian, O., Skov, M., Leder, H., Chatterjee, A., & Nadal, M. (2016). Neuroaesthetics: The cognitive neuroscience of aesthetic experience. Perspectives on Psychological Science 11(2), 265–279.
Pelowski, M., Markey, P. S., Forster, M., Gerger, G., & Leder, H. (2017). Move me, astonish me . . . delight my eyes and brain: The Vienna Integrated Model of top-down and bottom-up processes in Art Perception (VIMAP) and corresponding affective, evaluative, and neurophysiological correlates. Physics of Life Reviews 21, 80–125.
Pelowski, M., Markey, P. S., Lauring, J. O., & Leder, H. (2016). Visualizing the impact of art: An update and comparison of current psychological models of art experience. Frontiers in Human Neuroscience 10, 160. doi:10.3389/fnhum.2016.00160
Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and emotions in the brain: Familiarity matters. PLoS ONE 6(11), e27241.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6, 688–691.
Peretz, I., & Zatorre, R. (Eds.). (2003). The cognitive neuroscience of music. Oxford: Oxford University Press.
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114.
Quarto, T., Fasano, M. C., Taurisano, P., Fazio, L., Antonucci, L. A., Gelao, B., . . . Brattico, E. (2017). Interaction between DRD2 variation and sound environment on mood and emotion-related brain activity. Neuroscience 341, 9–17.
Redies, C. (2015). Combining universal beauty and cultural context in a unifying model of visual aesthetic experience. Frontiers in Human Neuroscience 9, 218. Retrieved from https://doi.org/10.3389/fnhum.2015.00218
Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3(2), 131–141.
Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891.
Sacks, O. (2007). Musicophilia: Tales of music and the brain. New York: Vintage.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14, 257–262.
Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.
Salimpoor, V. N., & Zatorre, R. J. (2013). Neural interactions that give rise to musical pleasure. Psychology of Aesthetics, Creativity, and the Arts 7, 62–75.
Samson, S., Dellacherie, D., & Platel, H. (2009). Emotional power of music in patients with memory disorders: Clinical implications of cognitive neuroscience. Annals of the New York Academy of Sciences 1169, 245–255.
Savage, P. E., Brown, S., Sakai, E., & Currie, T. E. (2015). Statistical universals reveal the structures and functions of human music. Proceedings of the National Academy of Sciences 112, 8987–8992.
Schafer, T., Sedlmeier, P., Stadtler, C., & Huron, D. (2013). The psychological functions of music listening. Frontiers in Psychology 4, 511. Retrieved from https://doi.org/10.3389/fpsyg.2013.00511


Scruton, R. (1999). The aesthetics of music. Oxford: Oxford University Press.
Seghdi, N., & Brattico, E. (in press). The phylogenetic roots of music. Biokulturelle Menneske.
Sloboda, J. A. (1985). The musical mind. Oxford: Oxford University Press.
Sloboda, J. A. (1992). Empirical studies of emotional response to music. In M. R. Jones & S. Holleran (Eds.), Cognitive bases of musical communication (pp. 33–46). Washington, DC: American Psychological Association.
Smith, C. U. (2005). Evolutionary neurobiology and aesthetics. Perspectives in Biology and Medicine 48(1), 17–30.
Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19(3), 619–623.
Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., . . . Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8(2), 126–131.
Tallis, R. (2008). The limitations of a neurological approach to art: Review of Neuroarthistory: From Aristotle and Pliny to Baxandall and Zeki by John Onians (Yale University Press, 2008). Lancet 372, 19–20.
Tallis, R. (2011). Reflections of a metaphysical flaneur. London and New York: Routledge.
Tiihonen, M., Brattico, E., Maksimainen, J., Wikgren, J., & Saarikallio, S. (2017). Constituents of music and visual-art related pleasure: A critical integrative literature review. Frontiers in Psychology 8, 1218. Retrieved from https://doi.org/10.3389/fpsyg.2017.01218
Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22(12), 2769–2783.
Trost, W., Frühholz, S., Cochrane, T., Cojan, Y., & Vuilleumier, P. (2015). Temporal dynamics of musical emotions examined through intersubject synchrony of brain activity. Social Cognitive and Affective Neuroscience 10(12), 1705–1721.
Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014). Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage 103, 55–64.
Vartanian, O., & Goel, V. (2004). Neuroanatomical correlates of aesthetic preference for paintings. Neuroreport 15(5), 893–897.
Vuust, P., & Kringelbach, M. L. (2010). The pleasure of making sense of music. Interdisciplinary Science Reviews 35(2), 166–182.
Vuust, P., Ostergaard, L., Pallesen, K. J., Bailey, C., & Roepstorff, A. (2009). Predictive coding of music: Brain responses to rhythmic incongruity. Cortex 45(1), 80–92.
Vuust, P., & Witek, M. A. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved from https://doi.org/10.3389/fpsyg.2014.01111
Wassiliwizky, E., Koelsch, S., Wagner, V., Jacobsen, T., & Menninghaus, W. (2017). The emotional power of poetry: Neural circuitry, psychophysiology and compositional principles. Social Cognitive and Affective Neuroscience 12(8), 1229–1240.
Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4, 6130. doi:10.1038/srep06130
Witek, M. A., Clarke, E. F., Wallentin, M., Kringelbach, M. L., & Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS ONE 9, e94446.
Witek, M. A., Gilson, M., Clarke, E. F., Wallentin, M., Deco, G., Kringelbach, M. L., & Vuust, P. (forthcoming). The brain dynamics of musical groove: Whole-brain modelling of effective connectivity reveals increased metastability of reward and motor networks. Nature Communications.
Witek, M. A., Kringelbach, M. L., & Vuust, P. (2015). Musical rhythm and affect: Comment on “The quartet theory of human emotions: An integrative and neurofunctional model” by S. Koelsch et al. Physics of Life Reviews 13, 92–94.
Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New York Academy of Sciences 1337, 202–211.
Zeki, S. (1999). Inner vision: An exploration of art and the brain. Oxford: Oxford University Press.
Zeki, S. (2013). Clive Bell’s “Significant Form” and the neurobiology of aesthetics. Frontiers in Human Neuroscience 7, 730. Retrieved from https://doi.org/10.3389/fnhum.2013.00730
Zeki, S. (2014). Neurobiology and the humanities. Neuron 84(1), 12–14.
Zhdanov, A., Nurminen, J., Baess, P., Hirvenkari, L., Jousmaki, V., Makela, J. P., . . . Parkkonen, L. (2015). An internet-based real-time audiovisual link for dual MEG recordings. PLoS ONE 10, e0128485.
Zhou, G., Bourguignon, M., Parkkonen, L., & Hari, R. (2016). Neural signatures of hand kinematics in leaders vs. followers: A dual-MEG study. NeuroImage 125, 731–738.

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

Chapter 16

Music and Language

Daniele Schön and Benjamin Morillon

Introduction

While music and language may differ in terms of their structures and functions, they share the distinctive feature of being dynamically organized in time; the information they carry is intrinsically contained in the temporal dimension. A frequently asked question is whether music and language are processed by similar or different brain regions, neural networks, or cortical oscillatory processes, and to what extent this brain circuitry is specialized compared to that for other stimuli. In order to tackle these issues, it is worth keeping some principles in mind. Nikolaas Tinbergen and David Marr described different levels of analysis that must, in their view, be taken into account if one wants to understand behavior and complex systems (Marr, 1982; Tinbergen, 1963). Marr’s three levels of analysis (computational, algorithmic, and implementational) are particularly well suited to the study of brain functions. Because music and language differ in terms of surface acoustic features and serve different purposes, the computations needed to process them differ. At the implementation level, on the other hand, the same organ and a myriad of cells process both music and language. The key program of modern cognitive neuroscience is thus to tackle the algorithmic level (Poeppel, 2012): Are similar or different algorithms involved in the processing of music and language? And what are they?

In this chapter, we will begin with a historical perspective, in which the human brain is described from a phrenological viewpoint. We will then describe the common functions and operations in music and language and the methodological limitations of current approaches, and portray the resource-sharing hypothesis. We will then describe the interdependency between music and language, notably how musical training improves language skills, before trying to bridge music and language in a single context.
We will conclude by describing a promising avenue: studies that adopt a dynamical standpoint to understand music and language.


392    daniele schön and benjamin morillon

On Modularity of Music and Language

From a historical perspective, the comparative study of music and language brain functions dates back to early observations of deficits acquired following a brain lesion. Since then, language and musical disorders have been described with different terms: aphasia and amusia. This distinction comes along with a deeper division between the language and musical domains, which by the end of the nineteenth century had been the object of structural or historical formalization following very different paths. Language was analyzed as a formal system of different elements, while music was viewed, in a historical perspective, as an artistic behavior. Language and music were thus seen as two highly distinct human domains. In this context, the observation of selective impairment of language or musical abilities fits in very well, and also complies with the idea that different functions are implemented in different brain regions.

The birth of cognitive science was strongly influenced by this vision of language as a specific and uniquely human function with dedicated neural structures, and of music as a different, “artistic” human function. At the end of the 1950s, Noam Chomsky was convinced that the principles underlying language structure are biologically determined: every individual has the same language potential because it is genetically transmitted, independently of socio-cultural differences. This scientific and political view of language development has had a tremendous impact on the fields of linguistics, cognitive science, and neuroscience. It stands in clear contrast with that of another giant of psychology, B. F. Skinner, who considered the mind a tabula rasa on which only experience could add knowledge. The two giants faced each other in an intellectual duel. Chomsky’s (1959) most famous attack is the poverty-of-the-stimulus argument: a child exposed to a limited amount of linguistic stimuli is able to generalize to new linguistic constructions using the rules acquired on the initial set. According to Chomsky, the trial-and-error learning mechanisms defended by the behaviorists would not be an appropriate model of language acquisition, since language is acquired by listening to correct sentences. This observation, as well as the fact that a confined brain lesion such as one in Broca’s area may induce a specific language deficit (agrammatism), led to Chomsky’s suggestion that syntactic knowledge may be partly innate. Curiously, Chomsky did not remark that music acquisition follows principles very similar to those of language acquisition: early acquisition, generativity, and learning from correct structures.

Chomsky’s work strongly inspired that of Jerry Fodor, who in the early 1980s wrote The modularity of mind (1983). The mind (and the brain) would be organized in independent modules with specific functions. Again, Fodor’s view was strongly influenced by, and in turn reinforced, the results of the neuropsychological literature, which dug deeper and deeper into specific deficits following focal brain lesions. The functioning of the brain seemed quite simple: every region has a specific functional role, and a lesion causes a deficit that may be very specific, for instance independently affecting the processing of nouns and verbs (Hillis & Caramazza, 1995).

OUP CORRECTED PROOF – FINAL, 07/09/2019, SPi

music and language   393

It is within this context that the field of the neuropsychology of music developed, beyond previous anecdotal accounts. As in every new field, the desire to gain identity and acknowledgment was strong. Music was thus studied as a special human faculty with dedicated brain areas. This vision was also constrained by some intrinsic limitations of the field. First, research on musical skills in brain-lesioned patients requires a neurologist or neuropsychologist with a musical background. Indeed, while testing language skills may appear a simple task, assessing musical abilities definitely requires special skills, even more so in the era of "pencil and paper." The second limitation is the Western idea that music is the prerogative of a few people, called musicians, and that it therefore only makes sense to assess musical abilities in experienced musicians such as composers, conductors, or performers with musical education (Basso & Capitani, 1985; Luria, Tsvetkova, & Futer, 1965). Altogether, this gives access to a limited amount of data, strongly influenced by the modular approach, with musical functions clearly distinct from other human abilities. This vision is well summarized in the article entitled "Modularity of music processing" (Peretz & Coltheart, 2003), in which several single case studies are used to defend not only the hypothesis of the modularity of music and language, but also the modularity of different levels of music processing. However, focusing on a single function, all the more so when using a single methodological approach (for instance, brain lesions), will systematically lead toward a modular interpretation of reality. In other words, focusing only on syntactic processing in Broca's aphasics will necessarily lead one to conclude that the left inferior frontal operculum is involved (or not) in syntactic processing.
This may in turn be interpreted in a modular perspective: syntax is independently and specifically processed in the left frontal operculum. By contrast, a comparative approach gives a broader and more complex picture. Patel (2003), considering linguistic and musical syntax, claims that, while these may seem very different, they share several commonalities, such as the need to build an integrated flow of information that takes into account a certain number of rules. Here we can clearly see the power of the comparative approach, which requires us to go beyond a circular definition of a cognitive function (e.g., syntax is syntax) in order to compare apparently different functions (e.g., syntax and harmony) that can possibly be redefined in terms of a more elementary function with greater psychobiological validity. In the case of syntax and harmony, finding common substrates requires one to redefine the object of study (i.e., some elementary operation common to both). With the advent of the neuroimaging era, while the first two decades were dominated by a modular approach, the last decade has put the accent on the importance of networks and their connections. The cognitive neurosciences have also gained access to the functioning of non-pathological brains using highly sophisticated experimental designs. This has allowed a breakdown of both language and music processing into more elementary operations. If the search for biomarkers has somewhat consolidated the innatist model, several major criticisms have been developed. For instance, studies on the zebra finch, a bird species well known for its ability to learn new songs, showed that it is the learning process that alters the neuronal circuits: the maturation of synaptic inhibition onto premotor neurons is correlated with learning but not with age (Vallentin, Kosche, Lipkind, & Long, 2016). This shows that even in a species wherein one could think that


394    daniele schön and benjamin morillon

the rules governing song acquisition are genetically encoded, the environment plays an important role. Of course, putting a zebra finch in a cage with a cat (and assuming that the cat did not eat the bird) would not allow the bird to learn how to meow; that is to say, the genes do, of course, play a major role. When considering the case of language and music, two extremely refined forms of communication, while their human specificity is certainly genetically encoded, this does not imply that whatever allows the development of language is specific to language and not shared with music. In other words, if language and music are specific to humans in their capacity to convey an extraordinary amount of information, one should not misinterpret this in terms of different evolutionary or developmental trajectories of language and music. The psychology of music and the neurosciences of music are recent fields of research. A major limitation of new disciplines (and of humans) is their strong desire to build their own identity, which often occurs to the detriment of considering neighboring disciplines (and identities). Our field has also yielded to this temptation by building musical cognitive models that initially ignored other potentially inspiring and similar domains, such as language. We will now see that music shares several cognitive operations with language.

Common Functions and Operations in Music and Language

Both language and music serve a highly sophisticated communicative function. While we will refrain here from giving a definition of what language is and what music is, it is important to keep in mind that both require a huge number of different perceptual and cognitive operations. To perceive either music or language, one of the first operations that needs to be implemented is the ability to discriminate sounds. The two phonemes [d] and [t] are quite similar but need to be distinguished, as is the case for a C and a B in music, or for the same pitch played by an oboe or a bassoon. Sounds can be characterized in terms of a limited number of spectral features, and these features are relevant to both musical and linguistic sounds. The analysis of the acoustic features of sounds takes place in the cochlea and in several subcortical relays up to the primary auditory cortex. There is a suggestion that the auditory cortices may be asymmetric in the temporal windows of analysis they use, with the left auditory cortex preferring short windows of integration and the right auditory cortex preferring longer windows (Giraud et al., 2007; Poeppel, 2003; Zatorre, Belin, & Penhune, 2002). This hypothesis has been used to defend the idea that language, requiring short windows of analysis to discriminate consonants, is preferentially processed in the left hemisphere, while music, requiring longer windows of analysis to discriminate pitch, is preferentially processed in the right hemisphere. While the debate is still open, one should keep in mind that language perception is not just consonant discrimination,
but also requires us to take into account other features, such as pitch in lexical tone or stress, that require longer windows of analysis. On the other side, music is often considered, in Western society and by non-musicians, as mostly relying on pitch discrimination. However, any good musician will claim that an extremely important feature of music is sound quality, which, unlike pitch, is not stationary and requires short analysis windows. The scenario is thus more complicated than it is often depicted, and the idea of the cortex performing parallel processing on any acoustic input, yielding the extraction of complementary pieces of information, seems necessary to overcome the simplistic monolithic distinction between language and music. Generating a different pattern of neuronal responses to every sound would yield, in everyday life, an infinite number of sound representations. This is why sounds are categorized. Two acoustically different tokens of [b] will thus be perceived as a unique [b]. Two different high Es of the violin will be perceived as an E, even if one is slightly lower than the other; an A and a C note of a piano will both be perceived as "piano" sounds. Categorization is necessary and common to both language and music, and it allows us to make sense of the world by reducing its intrinsic variety to a finite and limited number of categories. Categorical representations of sounds are possibly distributed across neuronal populations within the human auditory cortices, including primary auditory areas (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Rauschecker & Scott, 2009; Rauschecker & Tian, 2000; Staeren, Renvall, De Martino, Goebel, & Formisano, 2009), although motor regions also seem to play a role in representing, for instance, phonemic acoustic features (Cheung, Hamilton, Johnson, & Chang, 2016). We rarely perceive sounds in isolation, but rather in a complex flow.
This requires us to build a structure that evolves in time, taking into account the different phonemes of a sentence or tones of a melody. Building such a structure requires at the very least a working memory capacity that allows the manipulation of sound representations. Sounds are grouped into larger units, and this grouping depends upon our previous experience with these sounds. In other words, we take advantage of our previous experience with the world and build multiple statistical distributions of sounds. Different distributions will account for different grouping strategies: for instance, streaming a specific voice or musical instrument at a cocktail party or in a musical ensemble (Elhilali & Shamma, 2008), or grouping phonemes or tones together to build words or melodies according to the transitional probabilities of phonemes or tones (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999; Schön et al., 2008). These statistical distributions are built on the memory traces of what we have previously perceived and strongly influence our upcoming perception of the world. In fact, from these statistical distributions several rules may emerge that allow us to simplify even further the complex and continuous auditory flow. Importantly, in both language and music, the distributions can also be computed over symbolic units. These distributions, or internal models at different feature levels, have two major consequences. The first, cited earlier, is that they allow us to generate new sequences having similar statistical properties, that is, new sentences or melodies complying with
the rules of the musical or linguistic system. The second is that they allow us to make accurate predictions about upcoming events. Listening to a person speaking or playing, we are able to anticipate, to a certain degree, what is going to be said or played, and when. Considering the very fast and changing nature of the auditory flow, this ability is of utmost importance, and it explains why sounds (phonemes or tones) missing from a speech or musical signal can be restored by the brain and appear to be heard (DeWitt & Samuel, 1990). In this respect music is particularly challenging, because it may require us to anticipate several distinct streams of features simultaneously. For instance, when listening to a symphony orchestra or a string quartet, several melodic lines take place at the same time and need to be anticipated in order to perceive a sense of continuity in the music. Overall, language and music are characterized by a limited set of acoustic features, categorized by the human brain into a limited set of representations, and subject to similar rules of statistical learning.
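The statistical-learning mechanism invoked above can be made concrete with a small sketch. The following is a minimal illustration, not a model of the actual experiments: it computes forward transitional probabilities over a made-up syllable stream (the syllables, the three "words," and the 0.75 boundary criterion are all arbitrary choices) and places word boundaries where the probability dips, in the spirit of Saffran et al. (1996).

```python
from collections import Counter

def transitional_probabilities(stream):
    """Forward transitional probability P(next | current) for adjacent syllables."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(stream, threshold=0.75):
    """Place a word boundary wherever the transitional probability dips below threshold."""
    tp = transitional_probabilities(stream)
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:          # low predictability marks a word boundary
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Made-up "artificial language": three words concatenated without pauses,
# then split into their two-letter syllables.
words = "badi kupo tigo badi tigo kupo badi kupo tigo".split()
stream = [w[i:i + 2] for w in words for i in (0, 2)]

print(segment(stream))
# → ['badi', 'kupo', 'tigo', 'badi', 'tigo', 'kupo', 'badi', 'kupo', 'tigo']
```

Here the within-word transitional probabilities are all 1.0, while the between-word ones are at most 2/3, so a fixed threshold recovers every word; on real or larger streams a relative criterion (a boundary at each local dip) is more robust than a fixed cut-off.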

Overlap and Resource Sharing

Since most research in cognitive neuroscience has been guided by the assumption that brain regions are specialized for a given function, studies on music and language have addressed the question of whether music and language share common neural substrates. This is often referred to as the notion of overlap (Patel, 2011). The idea is simple: if one could show that there is a strong overlap between music and language processing, this would go against a modular and domain-specific view. However, there are more problems with this approach than one might imagine. We review them briefly in the following section, together with some neuroimaging findings. The first problem is of a purely methodological order. Indeed, many published works using fMRI, including those comparing music and speech processing, use a subtraction logic: results are a statistical contrast that only allows us to see which areas show a greater signal in one condition than in another. This is referred to as the tip-of-the-iceberg problem. It may well be that by contrasting a language and a music task one finds a peak in a given region. This is then interpreted as an area specifically dedicated to language (or to music, depending upon the direction of the subtraction; see for instance Rogalsky & Hickok, 2011). However, this completely ignores the possibility that there is a large common substrate that is invisible in the subtraction (e.g., 100 and 101 share 100, but 101 − 100 only shows 1). This approach is Manichean and suffers from its lack of quantitative description. These studies therefore have a methodological bias toward highlighting differences rather than commonalities. A second series of problems concerns the experimental designs that have been used. Indeed, only a few studies have directly used the same participants and the same experiment to compare music and language processing.
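The tip-of-the-iceberg problem lends itself to a toy numerical illustration. This is a deliberately simplistic sketch (the region names and activation values are invented): a pairwise subtraction shows only the small residual differences, while a conjunction-style criterion (active in both conditions relative to baseline) makes the large shared substrate visible.

```python
# Invented per-region activation levels against a silent baseline (arbitrary units).
language = {"STG": 101, "IFG": 95, "V1": 2}
music = {"STG": 100, "IFG": 96, "V1": 2}

# Subtraction logic: the language-minus-music contrast shows only tiny differences.
subtraction = {region: language[region] - music[region] for region in language}
print(subtraction)  # → {'STG': 1, 'IFG': -1, 'V1': 0}

# Conjunction-style logic: regions active in BOTH conditions reveal the shared substrate.
threshold = 10  # arbitrary activation criterion
shared = sorted(r for r in language if language[r] > threshold and music[r] > threshold)
print(shared)  # → ['IFG', 'STG']
```

The subtraction never reports STG or IFG as jointly engaged, even though both regions respond strongly to both materials; only the conjunction-style test recovers that commonality.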
Comparing results across studies will also tend to show differences that may be due not to brain computations but to differences in populations, acquisition, or analysis pipelines. Even when assessing music and language processing
in the same participant, there remains the challenge of comparing comparable conditions. This goes beyond the fact that speech and music stimuli are by nature acoustically different, insofar as if this were the only difference it should affect only the primary auditory cortex. The real challenge is to define the proper elementary operation and to balance the difficulty level of the task across linguistic and musical stimuli. Defining the operation is already quite challenging, because it requires a "good" model of what to compare. Of course, comparing music and language as wholes does not make any sense, because there is no such thing as a single function for music in the brain. Thus, music and language need to be reduced to more elementary functions, as described earlier. But even comparing syntactic processing is not trivial. Indeed, one needs to choose which syntactic level to compare in language (syntactic embedding and gender agreement do not imply the same operations) and to find a good analogy in music. Then the researcher is still left with a complicated issue, that of the difficulty level. For instance, in comparing the role of pitch in music and in language prosody, one should ascertain that the difficulty level of the task is comparable across materials, rather than using a fixed criterion (e.g., detect a 15 percent pitch change) that may be trivial with music but not with speech (Schön, Magne, & Besson, 2004). Another important issue is raised by Peretz and colleagues:

It is important to keep in mind that neural overlap does not necessarily entail neural sharing. The neural circuits established for musicality may be intermingled or adjacent to those used for a similar function in language and yet be neurally separable. For example, mirror neurons are interspersed among purely motor-related neurons in pre-motor regions of the macaque cortex (Rizzolatti & Craighero, 2004).
Similarly, the neurons responsible for the computation of some musical feature may be interspersed among neurons involved in similar aspects in speech. (Peretz, Vuvan, Lagrois, & Armony, 2015, p. 3)

The problem raised here is the scale problem of human anatomy. Historically, there was a very rough distinction between music and language in terms of hemispheric dominance, which led many people to believe that language is processed by the left hemisphere and music by the right hemisphere. We now clearly know that this is not the case (Lindell, 2006; Vigneau et al., 2011). Then there were more specific claims that the left Broca's area would be language specific, but this has also been falsified, by showing for instance that musical harmony (Koelsch et al., 2002; Maess, Koelsch, Gunter, & Friederici, 2001) and rhythm processing (Herdener et al., 2012) are mediated by the same regions that process language syntax (Friederici & Kotz, 2003). Further work based on multivariate pattern analysis has shown that, within overlapping regions, distinct patterns of brain responses to linguistic and musical sounds can be found (Abrams et al., 2010; Fedorenko, McDermott, Norman-Haignere, & Kanwisher, 2012). However, these differences could be accounted for in terms of differences in the stimulus manipulation or in the task. For instance, Abrams et al. (2010) compared scrambled versions of music and speech to normal music and speech, using a fixed scrambling window of 350 ms. As the authors acknowledge, it could be that music and speech have inherently different acoustical regularities and structures, rendering one material more "scrambled" than the other.
Also, different patterns of activation in common brain areas may result from the same neural population reacting differently to music and language (Kunert & Slevc, 2015). The argument raised by Peretz advocates the possibility of music-dedicated neurons adjacent to language-dedicated neurons. While this is for the moment a non-falsifiable hypothesis, one should not think of music or language as wholes, but in terms of precisely defined elementary operations. If these operations are required by both language and music material, then there would be no reason for the brain to build two extremely intermingled networks computing the same algorithm. On the other side, it is clear that the rules determining gender agreement and those governing tonal modulations are necessarily represented in different neural networks. Thus, claiming that differences may always subsist at a smaller scale is a recursive argument that does not really add much to the debate (besides the fact that at a quantum level, music and language can be described by the same equations). In our view, the major advances will not come from single-unit recordings showing neurons specific to the last chord of a particular Haydn piano sonata, but rather from neurocomputational models precisely describing which operations are subtended by a given neural network when listening to speech and to music. A more promising approach, it seems to us, is to study whether two different levels of music and language processing interact. Indeed, interaction is a measure of the extent to which two processes influence each other, and as such it can be used to infer that one process is not independent of the other. Several studies have tackled this issue by using interference paradigms.
For instance, Slevc and colleagues (Slevc, Rosenberg, & Patel, 2009) measured the reading time of garden-path sentences and found that it was influenced by the simultaneous presentation of task-irrelevant, harmonically unexpected chords, while it was not affected by timbrally unexpected chords (e.g., a different instrument). These results have been interpreted as evidence for shared music–language resources for processing structural (syntactic) relations: the task-irrelevant music, being processed automatically, uses some of these resources, resulting in suboptimal processing of the linguistic syntactic relations. Other studies have used this approach to show an interaction between melodic and syntactic processing (Fedorenko, Patel, Casasanto, Winawer, & Gibson, 2009), between harmonic and syntactic but not semantic processing (Hoch, Poulin-Charronnat, & Tillmann, 2011), and between harmonic processing and word recall (Fiveash & Pammer, 2014). This approach has also been coupled with electrophysiological measures, confirming that melodically or harmonically unexpected events affect the syntax-related left anterior negativity (Carrus, Pearce, & Bhattacharya, 2013; Koelsch, Gunter, Wittfoth, & Sammler, 2005). Interestingly, Sammler et al. (2013), using intracranial recordings, showed a co-localization of early components elicited by musical and linguistic syntactic deviations. Surprisingly, few neuroimaging studies have exploited the possibly most natural setting for comparing music and language, namely a stimulus that combines both speech and music: song. The use of songs has the clear advantage of solving the problem of using different stimuli in the language and music tasks. Schön et al. (2010) used an interference paradigm based on sung sentences and showed that the processing demands of melodic and lexical/phonological processing interact in a large network including the bilateral


[Figure 1: plot of the number of suprathreshold voxels (y-axis) against the t values of the LM interaction (x-axis); panel title: "Phonological main effect as a function of the interaction."]

Figure 1.  Number of surviving voxels for the main effect of the lexical/phonological dimension as a function of the threshold of the interaction between the phonological and melodic dimensions. The dotted vertical line indicates the p-value of 0.05 for the mask. The right edge corresponds to a very conservative p-value (adapted from Schön et al., 2010).

temporal cortex and the left inferior frontal cortex. Importantly, most voxels sensitive to the lexical/phonological manipulation are also sensitive to the interaction between the lexical/phonological and melodic dimensions. In other words, there seem to be very few voxels that are involved in lexical/phonological processing and are not influenced by melodic structure (see Fig. 1). Similarly, Sammler et al. (2010), using an adaptation paradigm, showed a strong integration between the melodic and phonological levels of song in the dorsal pathway, with the degree of integration decaying toward anterior regions of the left STS, possibly as a result of the processing of word meaning. This integration of the melodic and phonological dimensions is also in line with the finding that a sung language is more easily learned than a spoken language (Schön et al., 2008). Kunert and colleagues (Kunert, Willems, Casasanto, Patel, & Hagoort, 2015) showed an effect of musical harmonic deviancy on language syntax processing in the left inferior frontal gyrus. Notably, this effect was not present when the deviancy in the musical stimulus was limited to the acoustic level (a louder sound). Interestingly, the authors also showed, in a behavioral study, an effect of the syntactic structure of sentences on performance in a musical harmonic judgment task, confirming the idea of shared resources. One may wonder how to reconcile these data suggesting shared resources with the "older" data from neuropsychological studies, which point rather to the specificity and independence of several levels of language and music processing. However, very few studies have tried to systematically assess the co-existence of language and musical deficits, even for the most studied language deficit, that following a lesion in Broca's area.


Ani Patel was the first to investigate brain-damaged individuals, and more specifically aphasic individuals with grammatical comprehension problems in language, in order to see whether they also have a deficit in processing structural musical relations (Patel, Iversen, Wassenaar, & Hagoort, 2008). Broca's aphasic patients and controls had to judge whether a set of sentences contained a grammatical or semantic error. A similar task was used with harmonic errors introduced into musical chord sequences. In a second experiment, participants were tested using an implicit harmonic priming procedure. Both experiments showed that the aphasic patients had impaired musical syntactic processing. Importantly, this occurred in the absence of low-level deficits, and with preserved short-term memory for pitch patterns. This scenario is complicated by the fact that not all agrammatic patients necessarily show a musical deficit (Slevc, Faroqi-Shah, Saxena, & Okada, 2016). Along similar lines, Sammler and colleagues (Sammler, Koelsch, & Friederici, 2011) showed a reduction or extinction of the typical electrophysiological marker of musical syntax processing in agrammatic patients with a lesion in the left inferior frontal cortex. These results are consistent with the hypothesis that Broca's area computes a rather domain-general "syntactic" processing, but a huge amount of work remains to be done with brain-lesioned patients.

Music Training and Language Skills

We have seen that the approach of studying the brain correlates of music and language is limited by a number of methodological problems that render the interpretation of the results, in terms of whether resources are shared or not, rather complex. Another way to address the resource-sharing hypothesis is to investigate whether music training affects the way the brain processes language, and vice versa. The reasoning is the following. Musical expertise requires intense training, often starting at an early age. As a result of this learning, all the operations required by music perception and production will be affected by the training and become more efficient. If some of these operations are also required by language perception and production, then one should be able to observe more efficient processing whenever the appropriate language processing levels are investigated. In contrast with the approach described above, the validation of this hypothesis does not necessarily require brain imaging data, insofar as behavioral differences can be taken as evidence that resource sharing exists. Psychologists and some neuroscientists often use the term "transfer of learning." This term is, however, rather vague, as it seems to point to some sort of magic transfer of learning from one domain or function to another without specifying how this transfer would actually take place. An alternative explanation is to hypothesize that these so-called transfer effects are simply due to an elementary function that is shared by both music and language
processing. According to this view there is no transfer taking place, but only a sharing of functions and resources. Importantly, while there is no clear way of showing how transfer could possibly be implemented, shared elementary operations can be defined via careful experimental manipulations. Considering the early steps of sound analysis helps to clarify this point. The group of Nina Kraus has studied for many years the effect of music training on sound perception in general, including speech. Using EEG and focusing on high-frequency (>200 Hz) neural responses, possibly principally occurring at the subcortical level, this group has shown that, compared to non-musicians, musicians have a stronger representation of several features of speech sounds, including the fundamental frequency (Wong, Skoe, Russo, Dees, & Kraus, 2007), the harmonics (Kraus & Chandrasekaran, 2010), and the rapid transients that may be important in distinguishing consonants (Parbery-Clark, Tierney, Strait, & Kraus, 2012). Overall, the correlation between the neural response and the stimulus is greater in musicians than in non-musicians, independently of whether the stimulus is a musical or a speech sound (Musacchia, Sams, Skoe, & Kraus, 2007). Most importantly, this correlation is more resistant to acoustic noise in musicians; in other words, musicians seem to be able to filter out noise better than non-musicians (Parbery-Clark, Skoe, & Kraus, 2009). Interestingly, some of these differences can be observed in adults who had a few years of music training during childhood, showing that these changes last in time and do not necessarily require long-lasting and intense training (Skoe & Kraus, 2012). Moreover, these differences induced by music training are not simply due to a better processing of any sound feature.
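The stimulus-to-response correlation measure used in this line of work can be sketched in a few lines. The simulation below is purely illustrative (the sampling rate, gain, and noise values are invented, and a real analysis would correlate the stimulus waveform with a recorded frequency-following response rather than a synthetic one): a response that encodes the stimulus more strongly relative to internal noise yields a higher Pearson correlation with it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stimulus: a 100-Hz fundamental plus one harmonic, standing in
# for the periodic part of a speech sound (200 ms at 16 kHz).
t = np.arange(0, 0.2, 1 / 16000)
stimulus = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

def simulated_response(stim, encoding_gain, noise_sd):
    """Neural response modeled as a scaled copy of the stimulus plus internal noise."""
    return encoding_gain * stim + rng.normal(0, noise_sd, stim.size)

def fidelity(stim, resp):
    """Pearson correlation between stimulus and response waveforms."""
    return np.corrcoef(stim, resp)[0, 1]

# A stronger encoding gain (as reported for musicians) yields a higher
# stimulus-to-response correlation than a weaker one, at equal noise.
strong = fidelity(stimulus, simulated_response(stimulus, encoding_gain=1.0, noise_sd=0.5))
weak = fidelity(stimulus, simulated_response(stimulus, encoding_gain=0.3, noise_sd=0.5))
print(f"strong encoding r = {strong:.2f}, weak encoding r = {weak:.2f}")
```

The same logic underlies the noise-resistance result: adding background noise to the stimulus lowers this correlation less when the encoding gain is high.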
Indeed, the results of a recent experiment show that music training can facilitate the selective processing of certain relevant features of speech. In this study, Intartaglia and colleagues (Intartaglia, White-Schwoch, Kraus, & Schön, 2017) compared French and American participants listening to an American English phoneme that does not exist in French. The comparison of the neural signatures showed that American participants had a more robust representation than French participants. The differences concerned the high formant frequencies that are necessary to encode the specific features of consonants and vowels. The authors then tested French musicians, and the differences with the Americans disappeared. In other words, music training seems to allow a better encoding of the relevant features of speech sounds, even when these sounds are not familiar. When interpreting these results, one should keep in mind that two possible, non-exclusive explanations co-exist. First, the subcortical relays may be more efficient in sound processing due to massive bottom-up processing. In this case one can clearly see that there is no need to advocate a transfer effect: there is a dedicated auditory subcortical network that processes both musical and linguistic sounds, and if this network becomes more efficient via intensive musical training, then speech processing will also benefit from the enhanced efficiency. Second, the cortical regions are known to send efferent signals to the subcortical relays, and these modulatory top-down signals may play a role in enhancing the representation of certain features of sounds or in reducing
the noise (Strait, Kraus, Parbery-Clark, & Ashley, 2010; Tenenbaum, Kemp, Griffiths, & Goodman, 2011). In this perspective, the changes are possibly due to an enhanced connectivity that allows a finer modulation of subcortical activity by the cortex. Independently of whether these enhanced subcortical representations reflect bottom-up or top-down modulation, these results are important for interpreting differences that may be observed at a more integrated level. Indeed, differences observed at the phonological, syntactic, or prosodic level may result from a cascade effect of early auditory processing differences. The studies on prosody and phoneme perception in musicians are particularly sensitive to this issue. Indeed, pitch is important in speech at the supra-segmental level, signaling the emotional content of an utterance (Kotz et al., 2003), the linguistic structure (Steinhauer, Alter, & Friederici, 1999), and certain syntactic features, such as whether the utterance is a question or not (Astésano, Besson, & Alter, 2004). Pitch contour also plays a role at the segmental level in tone languages, where it serves a linguistically contrastive function. Musicians are more accurate in detecting subtle pitch variations in both music and speech prosody. These variations in speech prosody are detected earlier by musicians' brains and elicit event-related potentials that are more clearly distinguishable from those to normal speech (Schön et al., 2004). This has been replicated with 8-year-old musician children (Magne, Schön, & Besson, 2006). Music lessons also seem to promote sensitivity to the emotions conveyed by speech prosody: musically trained adults perform better than untrained adults in the discrimination and identification of emotional prosody (Thompson, Schellenberg, & Husain, 2004).
Finally, musicians are more accurate at identifying, reproducing, or discriminating Mandarin tones (Gottfried & Riester, 2000; Gottfried, Staby, & Ziemer, 2004; Marie, Delogu, Lampis, Belardinelli, & Besson, 2011). However, as previously stated, it is difficult to know to what extent these differences are due to cortical or subcortical plasticity. Considering that anatomical differences have been observed in the auditory cortex (Benner et al., 2017; Kleber et al., 2016; Schlaug, Jäncke, Huang, Staiger, & Steinmetz, 1995; Shahin, Bosnyak, Trainor, & Roberts, 2003), it seems reasonable to believe that the whole auditory network is modified by music training, thus affecting speech processing at multiple levels. Interestingly, previous studies provided evidence for a positive relationship between the function or anatomy of the planum temporale and performance in syllable categorization (Elmer, Hänggi, Meyer, & Jäncke, 2013). More recently, Elmer and colleagues (Elmer, Hänggi, & Jäncke, 2016) provided evidence for a relationship between planum temporale connectivity, musicianship, and phonetic categorization: they found increased connectivity between the left and right plana temporalia in musicians compared to non-musicians, and this increased connectivity correlated positively with performance in a phonetic categorization task as well as with musical aptitude. Indeed, music training seems to sharpen sensitivity to acoustic features that are important for the categorization of syllables, in particular temporal features such as voice-onset time (Chobert, Marie, François, Schön, & Besson, 2011; Zuk et al., 2013).

Very few studies have examined whether musical expertise influences the processing of the temporal structure of speech. While isochrony is absent in speech, several nested
temporal hierarchies are present in speech (Cummins & Port, 1998; Ghitza, 2011; Giraud & Poeppel, 2012). Musicians outperform non-musicians when asked to judge the lengthening of a syllable in a sentence (Marie, Magne, & Besson, 2011). Also, independently of whether musicians direct attention to the temporal or semantic content, they are more sensitive than non-musicians to subtle changes in the temporal structure of speech (Magne et al., 2006; Marie, Delogu, et al., 2011). Milovanov et al. (2009) reported a positive correlation between musical aptitude and sensitivity to syllable discrimination in children.

In artificial language learning, speech segmentation results from the capacity to parse a continuous stream of syllables and to build and maintain probabilistic relationships among the different elements (syllables) that compose words. François and Schön (2011) showed that musicians have better segmentation skills than non-musicians: when listening to a new stream of an artificial language, they are faster and more accurate at segmenting the continuous stream. After only one year of music training, children already show an improvement in speech segmentation (François, Chobert, Besson, & Schön, 2012). This ability, namely discovering word boundaries in the continuous stream of natural speech, is of utmost importance during language learning in the first years of life (Saffran et al., 1996).

The evidence concerning an effect of music training at the semantic and syntactic levels of language is rather scarce. One study showed that music training seems to influence semantic aspects of language processing (Dittinger et al., 2016). However, in this study, French participants had to learn new words taken from Thai, so the differences may be due to the perceptual difficulty of discriminating Thai tokens that differed in pitch or vowel length.
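The statistical-learning computation behind this kind of artificial-language segmentation, building transitional probabilities between adjacent syllables and positing word boundaries where the probability dips, can be sketched in a few lines. The toy trisyllabic lexicon and the 0.7 boundary threshold below are illustrative assumptions, not parameters from the cited studies.

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """P(next syllable | current syllable) for each adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold=0.7):
    """Posit a word boundary wherever the transitional probability dips."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:      # low predictability -> word boundary
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy "artificial language": a continuous stream of three trisyllabic words,
# in the spirit of Saffran et al. (1996); the syllables are made up.
random.seed(0)
lexicon = [["tu", "pi", "ro"], ["go", "la", "bu"], ["da", "ko", "ti"]]
stream = [syll for _ in range(200) for syll in random.choice(lexicon)]
print(segment(stream)[:4])
```

Within-word transitional probabilities in this toy stream are exactly 1, while a word-final syllable is followed by any of the three word onsets with probability near 1/3, so a single threshold cleanly recovers the lexicon from the unsegmented stream.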
At the neural level, results indicate increased functional connectivity in the ventral and dorsal streams of the left hemisphere during retrieval of novel words in musicians compared to non-musicians (Dittinger, Valizadeh, Jäncke, Besson, & Elmer, 2018). An effect of musical expertise on syntactic processing was shown by Jentschke and Koelsch (2009), with earlier and larger evoked responses to syntactic errors in children with musical training. However, others reported that differences are absent at the behavioral level and that musical expertise does not modulate the amplitude of responses evoked by syntactic violations, only their topographical distribution (Fitzroy & Sanders, 2013). Thus, the evidence that music training affects semantic and syntactic processing in language is not yet compelling, and further studies are needed.

Overall, while the theoretical framework of transfer of learning remains uncertain, a rather large body of data points to an improvement induced by music training at different levels of speech and language processing. Patel (2014) has tried to formalize the conditions under which music training may benefit speech processing. In the OPERA hypothesis (Overlap, Precision, Emotion, Repetition, and Attention) he suggests that, for music training to enhance speech processing, music and speech need to share sensory or cognitive processing mechanisms, and music must place higher demands on these mechanisms than speech does. These mechanisms are tightly bound to the emotional reward system engaged by music (Salimpoor et al., 2013). The last ingredients of music-induced and speech-related neural plasticity would be the fact that
music training requires repeating sound patterns and gestures over an enormous amount of time under conditions of highly focused attention.

Bridging Music and Language

When considering the effects of music training on speech and language abilities, one should keep in mind that most of the studies described here compared adult professional musicians to a group of adult non-musicians. This comparison has two methodological weaknesses. The first concerns the possibility of pre-existing differences, namely that musicians already differed from non-musicians before starting to make music. The second is that music training is a complex activity, often involving individual lessons, group activities, theory classes, and so on. This makes it impossible to know which factors in music training had an impact on speech and language abilities. Both criticisms can be addressed by running longitudinal studies that assess the absence of differences before the beginning of music training (Chobert, François, Velay, & Besson, 2012; François et al., 2012) and compare the music-training group with a control group involved in an activity with a similar setting (e.g., visual arts, theater). However, this approach is time-consuming and costly, insofar as it requires following two groups of children for a long period of time (often one year), testing them at least twice, and coordinating the two training programs.

There is an alternative methodological approach that lies somewhere between the interference or interaction approach and the group comparison described earlier. The idea is to test the effect of music stimulation on speech perception. This has proven particularly successful in the temporal domain. Indeed, the structures of speech and music have a similar hierarchical temporal scaffolding (Haegens & Golumbic, 2018; Schön & Tillmann, 2015). A series of studies has shown that priming the temporal structure of speech with a musical rhythmic prime can induce a speech processing benefit (Cason, Astésano, & Schön, 2015; Cason & Schön, 2012; Chern, Tillmann, Vaughan, & Gordon, 2018; Przybylski et al., 2013).
These studies showed a benefit of rhythmic priming both in phoneme detection and in a grammaticality judgment task. This approach has been particularly effective with language-impaired populations. For instance, passive listening to a rhythmically regular prime improved performance in a grammaticality judgment task in children with dyslexia or specific language impairment (SLI; Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016; Przybylski et al., 2013) and in patients with a basal ganglia lesion (Kotz, Gunter, & Wonneberger, 2005). While these results support the importance for language processing of temporal predictions, that is, the ability to anticipate the timing of upcoming events, it is not clear whether the benefit at the grammatical level is mediated by a selective effect at the syntactic level or by improved speech perception. For instance, Cason and colleagues (Cason, Hidalgo, Isoard, Roman, & Schön, 2015) have shown that priming the temporal structure of a sentence with music improved phoneme perception in hearing-impaired children.
Most of these studies have used a passive listening approach. However, an active approach, requiring the engagement of the audio-motor network, seems to have a stronger effect than passive listening (Cason, Astésano, & Schön, 2015; Morillon & Baillet, 2017; Morillon, Schroeder, & Wyart, 2014). An interesting avenue for the future is to test the effect of a single session of music training on several levels of speech and language processing. This seems to us a good compromise between all the above-mentioned approaches, insofar as it avoids the criticism of pre-existing differences and allows strict control of the content of the training session without "reducing" music to passive listening to an isochronous metronome. Recently, Hidalgo and colleagues (Hidalgo, Falk, & Schön, 2017) used this type of approach to investigate temporal adaptation in speech interaction in hearing-impaired children. They showed that a 30-minute session of active rhythmic training facilitated access to the temporal structure of verbal interactions and improved performance in a simple turn-taking task.

One of the factors prompting research in the domain of music and language is the possibility of using music to remediate language impairment. Fundamental research thus supports the therapeutic use of music to recover impaired functions by defining which aspects of music training benefit language processing, and at which levels of processing. While it is not the aim of this chapter to review this literature (see Chapter 29 by Lee, Thaut, and Santoni, this volume), it is important to note that the underlying neuroscientific models supporting the use of music in language rehabilitation have changed.
For instance, the development of melodic intonation therapy to recover language function in non-fluent aphasic patients was partly driven by the idea that patients can learn a new way to speak through singing by recruiting the right hemisphere (Albert, Sparks, & Helm, 1973; Zumbansen, Peretz, & Hébert, 2014). Forty years later, our knowledge of the spatiotemporal dynamics underlying music and language, on the one hand, and of the pathophysiology of language disorders, on the other, has been considerably refined. Concerning non-fluent aphasia, Stahl and colleagues (Stahl, Kotz, Henseler, Turner, & Geyer, 2011) have shown that rhythm, rather than melody, may be the most relevant aspect of the musical intervention, especially when patients present a lesion of the basal ganglia, a subcortical structure involved in motor coordination and the processing of temporal information (Kotz & Schwartze, 2010). Interestingly, several recent studies on the use of music for language rehabilitation point to an important role of the rhythmic aspect of music. More precisely, musical training targeted toward improving rhythmic perception and production resulted in improved phonological and reading skills (Bhide, Power, & Goswami, 2013; Cogo-Moreira, de Avila, Ploubidis, & de Jesus Mari, 2013; Flaugnacco et al., 2015; Moore, Branigan, & Overy, 2017; Overy, 2000). These results suggest a shared substrate and point to temporal processing as playing a major role in language processing. This fits with the temporal sampling framework proposed by Goswami (2011) for dyslexia and, by extension, for SLI. Building on the neural resonance theory, which posits internal oscillators guiding attention over time (Large & Jones, 1999), Goswami suggests that deficits in syllabic segmentation and other sequential processes may result from impaired rhythmic entrainment leading to difficulties in sampling information over time. Along a similar line, Tierney and Kraus (2014)
proposed the precise auditory timing hypothesis (PATH), which suggests that neural entrainment in auditory and motor cortices, and the interaction between them, underlies many of the behavioral aspects of both language and music processing. We will now describe music and language with a temporal focus.

A Temporal Focus into Music and Language

To date, the most promising approach to understanding information processing seems to us to be the adoption of a dynamical emphasis, focusing on the temporal dimension. This perspective can be operationalized in complementary ways, ranging from characterizing temporal regularities within sensory inputs to investigating time-resolved neural patterns of activity implicated in sensory processing, both in terms of frequency-resolved neural oscillations and of neural network dynamics. The underlying motivation is to describe information processing at the algorithmic (or representational) level, as first proposed by David Marr (Marr, 1982; Poeppel, 2012); in other words, to understand how the system does what it does, and more precisely what representations it uses, how they emerge, and how they are manipulated. Describing the time constants or the temporal profile of activity of each of these neural algorithms constitutes a preliminary stage toward this ultimate goal. While this approach can be carried out separately for music and language, a direct comparison of the two is also useful to delineate general processing steps from more specific ones.

In the speech domain, David Poeppel has theorized this approach in the "asymmetric sampling in time" hypothesis (Giraud & Poeppel, 2012; Poeppel, 2003). Basically, speech can be described as a multi-timescale signal, with a hierarchical organization composed of phonemic, syllabic, and prosodic information (among others). At the neural level, both parallel and sequential processing occur, with gamma (~30 Hz), theta (~5 Hz), and delta (~2 Hz) oscillations being specifically engaged by these multi-timescale, quasi-rhythmic properties of speech, and tracking its dynamics. Giraud and Poeppel argue that such neural oscillations "are foundational in speech and language processing, 'packaging' incoming information into units of the appropriate temporal granularity" (Giraud & Poeppel, 2012, p. 511).
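The idea of characteristic modulation rates at these different timescales can be made concrete with a toy computation: take an amplitude envelope, remove its mean, and read off the frequency of its largest Fourier component. The two synthetic envelopes below (a ~2 Hz delta-rate fluctuation and a ~5 Hz syllabic-rate fluctuation) are stand-ins for real recordings, and the method is far cruder than the filterbank-based modulation analyses used in the literature.

```python
import numpy as np

fs = 100                      # envelope sampling rate (Hz); toy assumption
t = np.arange(0, 10, 1 / fs)  # 10 s of signal

def peak_modulation_rate(envelope, fs):
    """Dominant temporal modulation rate (Hz) of an amplitude envelope."""
    env = envelope - envelope.mean()          # discard the DC component
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), 1 / fs)
    return freqs[np.argmax(spectrum)]

# Toy "delta-rate" envelope: fluctuation at 2 Hz (i.e., 120 bpm)
delta_env = 1 + np.cos(2 * np.pi * 2 * t)
# Toy "syllabic-rate" envelope: fluctuation at 5 Hz
syllabic_env = 1 + np.cos(2 * np.pi * 5 * t)

print(peak_modulation_rate(delta_env, fs))     # ~2 Hz
print(peak_modulation_rate(syllabic_env, fs))  # ~5 Hz
```

With 10 s of signal the frequency resolution is 0.1 Hz, ample to separate a 2 Hz from a 5 Hz dominant modulation.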
Interestingly, music is also characterized by a multi-timescale structure, with rhythm and meter hierarchically organized (Vuust & Witek, 2014). However, in an acoustic characterization of the temporal modulations in music and speech, Ding and colleagues (2017) recently highlighted that their temporal modulation rates differ. While the main tempo of music is around 2 Hz (120 bpm), speech is primarily characterized by a temporal modulation around 5 Hz, which corresponds to the syllabic rate. At least two complementary avenues can be drawn from this result.

First, the distinction between the modulation properties of music and speech could be at the origin of some of their computational differences. In a fascinating paradigm,
Oded Ghitza showed that the intelligibility of time-compressed speech can be greatly enhanced if periods of silence of the appropriate duration are inserted (Ghitza, 2011; Ghitza & Greenberg, 2009). Oscillation-based models of speech perception best explain these data: optimal intelligibility is achieved when the syllabic rhythm is within the range of the theta-frequency brain rhythms (~4–10 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech. Follow-up experiments were performed in the music domain, where participants had to identify the musical key of time-compressed short melodic sequences (Farbood, Marcus, & Poeppel, 2013; Farbood, Rowland, Marcus, Ghitza, & Poeppel, 2015). These showed that the insertion of silent gaps was beneficial to performance, in line with the speech experiments, providing compelling clues about possible oscillatory mechanisms underlying the segmentation of auditory information. However, the two experiments in the music domain were not conclusive with regard to the preferred rate of processing, observed at 2–3 Hz or 5–7 Hz, respectively. While the former result would be compatible with the fact that the main tempo of music is around 2 Hz, suggesting that the distinct acoustic modulation properties of music and speech are a determining feature of their respective perceptual analyses, the latter would be compatible with the idea that the auditory cortex parses information at the theta rate, and that such sampling operates rather independently of the nature of the acoustic signal (music or speech).

Second, the most shared characteristic of music and language acoustic signals is that both have strong temporal constraints (i.e., a salient main modulation rate, at ~2 and 5 Hz, respectively), leading to strong temporal predictions.
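Ghitza's repackaging manipulation can be caricatured in a short sketch: compress the signal, then chop it into brief packets separated by silent gaps so that packets recur at a theta-compatible rate. Plain decimation stands in here for the pitch-preserving time compression used in the actual experiments, and all durations are illustrative assumptions.

```python
import numpy as np

def repackage(signal, fs, compress=3, chunk_ms=40, silence_ms=80):
    """Crude 'repackaging': time-compress, then reinsert silent gaps so that
    information packets recur near a theta-compatible rate.
    (Real studies use pitch-preserving compression; decimation is a stand-in.)"""
    compressed = signal[::compress]                    # naive 3x speed-up
    chunk = int(fs * chunk_ms / 1000)                  # packet length (samples)
    gap = np.zeros(int(fs * silence_ms / 1000))        # silent gap (samples)
    pieces = []
    for start in range(0, len(compressed), chunk):
        pieces.append(compressed[start:start + chunk])
        pieces.append(gap)
    return np.concatenate(pieces)

fs = 16000
speech = np.random.randn(fs)       # 1 s noise stand-in for a speech waveform
out = repackage(speech, fs)
# 40 ms packet + 80 ms silence = 120 ms period, i.e., packets at ~8.3 Hz,
# back inside the theta range, while total duration returns toward the original.
```

Note that the total duration of the repackaged signal approaches that of the original, even though the speech material itself remains three times faster.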
Temporal predictions are believed to play a fundamental role in the way we sample sensory information, particularly in the auditory domain (Jones, 1976; Nobre & van Ede, 2018; Schroeder & Lakatos, 2009). Behavioral experiments show that anticipating the moment of occurrence of an upcoming event optimizes its processing by improving the quality of sensory information (Jaramillo & Zador, 2011; Morillon, Schroeder, Wyart, & Arnal, 2016; Rohenkohl, Cravo, Wyart, & Nobre, 2012). Current theories and empirical findings suggest that this enhancement is achieved by the entrainment of low-frequency neuronal oscillations, which temporally modulates the excitability of task-relevant neuronal populations (Cravo, Rohenkohl, Wyart, & Nobre, 2013; Large & Jones, 1999; Schroeder & Lakatos, 2009). Such entrainment, principally observed in sensory cortices (Besle et al., 2011; Lakatos et al., 2013), would be made possible by the downward propagation of temporal prediction signals, recently shown to originate in the motor system (Morillon & Baillet, 2017). These signals would be responsible for the predictive alignment of the excitability phase of ongoing oscillations in sensory cortex with upcoming events, possibly through top-down phase-reset (e.g., Park, Ince, Schyns, Thut, & Gross, 2015; Stefanics et al., 2010). A recent proposition by Arnal and colleagues (Rimmele, Morillon, Poeppel, & Arnal, submitted) is that time estimation relies on the neural recycling of action circuits (Coull, 2011) and is implemented by internal, non-conscious "simulation" of movements in most ecological situations (Arnal, 2012; Arnal & Giraud, 2012; Schubotz, 2007). On this view, temporal predictions correspond to a covert form of active sensing (Morillon, Hackett,
Kajikawa, & Schroeder, 2015; Schroeder, Wilson, Radman, Scharfman, & Lakatos, 2010). In other words, the efferent motor signals that are generated when we synchronize our actions to predictable events are also generated during the passive perception of such regularities (Arnal, 2012; Patel & Iversen, 2014). When temporal regularities occur on the timescale of natural actions and movements, the motor system is recruited (Chen, Penhune, & Zatorre, 2008; Du & Zatorre, 2017; Grahn & Rowe, 2012; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Teki, Grube, Kumar, & Griffiths, 2011; Zatorre, Chen, & Penhune, 2007). The great richness of our repertoire of motor schemes (gestures) makes it possible to simulate (and predict) the occurrence of sensory events with great accuracy and to process them with greater precision (Morillon et al., 2016; Schubotz, 2007), offering a flexible tool for precisely predicting "when" and selecting relevant information in time. Given the finesse of our motor expertise and the complexity of our repertoire of actions, this means that we can use internal simulation of action to anticipate temporal trajectories. This conception is compatible with various forms of "motor theories" of speech perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967), in which the covert simulation of actions can lead to a given sensory configuration.

The role of temporal predictions, while critical in both music and language, differs in several ways. First, music is much more rhythmic than speech, hence predictions are more precise. Second, while temporal predictions have a primarily contextual role in language, helping to optimize the extraction of relevant information, they serve a much more fundamental purpose in music. Indeed, musical rhythm has a remarkable capacity to move our minds and bodies.
This is because rhythm is part of the informational content of music itself, rather than a contextual cue (as in language). In a compelling review article, Vuust and Witek (2014) hypothesize that music exploits general principles of brain functioning, notably its organization as a Bayesian, predictive system, to optimize our pleasure and desire to move. In any case, these distinctions highlight that music stimulates the dorsal auditory stream much more than language does, as this pathway is involved in audio-motor transformation (Hickok & Poeppel, 2007) and temporal information processing (Morillon & Baillet, 2017). As a consequence, musical training or musical stimulation strengthens the connectivity between auditory and motor cortices, which has beneficial effects on speech comprehension (Falk, Lanzilotti, & Schön, 2017), especially in noisy conditions (Du & Zatorre, 2017), and on phonological and reading skills in children (Flaugnacco et al., 2015), as described earlier. Overall, while music and language differ in both structure and function, they share the characteristic of being temporal in essence. Adopting a dynamical approach thus seems the most promising avenue to understand how the human brain interacts with this type of multisensory environment.
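The entrainment account sketched in this section, an internal low-frequency oscillation whose high-excitability phase aligns with predicted event times, can be caricatured in a toy simulation. The 2 Hz rhythm and the cosine excitability profile are illustrative assumptions, not a model taken from the cited work.

```python
import numpy as np

fs = 1000                      # simulation rate (Hz)
f = 2.0                        # entrained delta-band rhythm (Hz)
t = np.arange(0, 4, 1 / fs)    # 4 s of simulated time

# Toy excitability: peaks (phase 0) at every predicted event time
excitability = np.cos(2 * np.pi * f * t)

def mean_gain(event_times):
    """Average excitability sampled at the given event times."""
    idx = (np.asarray(event_times) * fs).astype(int)
    return excitability[idx].mean()

on_beat = np.arange(0.0, 4.0, 1 / f)    # events exactly at predicted moments
off_beat = on_beat + 0.25               # events a quarter-cycle too late

print(mean_gain(on_beat))    # ~1.0: sampled at maximal excitability
print(mean_gain(off_beat))   # ~-1.0: sampled at the oscillation's trough
```

Events falling at the predicted phase are sampled at peak excitability, while identical events a quarter-cycle away land in the trough, which is the sense in which entrainment "optimizes" processing of temporally predictable input.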

References

Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., & Menon, V. (2010). Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cerebral Cortex 21(7), 1507–1518.
Albert, M. L., Sparks, R. W., & Helm, N. A. (1973). Melodic intonation therapy for aphasia. Archives of Neurology 29, 130–131.
Arnal, L. (2012). Predicting "when" using the motor system's beta-band oscillations. Frontiers in Human Neuroscience 6, 225. Retrieved from https://doi.org/10.3389/fnhum.2012.00225
Arnal, L., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive Sciences 16(7), 390–398.
Astésano, C., Besson, M., & Alter, K. (2004). Brain potentials during semantic and prosodic processing in French. Cognitive Brain Research 18(2), 172–184.
Basso, A., & Capitani, E. (1985). Spared musical abilities in a conductor with global aphasia and ideomotor apraxia. Journal of Neurology, Neurosurgery & Psychiatry 48(5), 407–412.
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical primes facilitate subsequent syntax processing in children with specific language impairment. Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00245
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature 403(6767), 309–312.
Benner, J., Wengenroth, M., Reinhardt, J., Stippich, C., Schneider, P., & Blatow, M. (2017). Prevalence and function of Heschl's gyrus morphotypes in musicians. Brain Structure and Function 222(8), 1–17.
Besle, J., Schevon, C. A., Mehta, A. D., Lakatos, P., Goodman, R. R., McKhann, G. M., . . . Schroeder, C. E. (2011). Tuning of the human neocortex to the temporal dynamics of attended events. Journal of Neuroscience 31(9), 3176–3185.
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–123.
Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200.
Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming and audio-motor training affect speech perception. Acta Psychologica 155, 43–50.
Cason, N., Hidalgo, C., Isoard, F., Roman, S., & Schön, D. (2015). Rhythmic priming enhances speech production abilities: Evidence from prelingually deaf children. Neuropsychology 29(1), 102.
Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech. Neuropsychologia 50(11), 2652–2658.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chern, A., Tillmann, B., Vaughan, C., & Gordon, R. L. (2018). New evidence of a rhythmic priming effect that enhances grammaticality judgments in children. Journal of Experimental Child Psychology 173, 371–379.
Cheung, C., Hamilton, L. S., Johnson, K., & Chang, E. F. (2016). The auditory representation of speech sounds in human motor cortex. eLife 5, e12577.
Chobert, J., François, C., Velay, J. L., & Besson, M. (2012). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cerebral Cortex 24(4), 956–967.
Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active processing of syllables in musician children. Journal of Cognitive Neuroscience 23(12), 3874–3887.
Chomsky, N. (1959). A review of B. F. Skinner's Verbal Behavior. Language 35(1), 26–58.
Cogo-Moreira, H., de Avila, C. R. B., Ploubidis, G. B., & de Jesus Mari, J. (2013). Effectiveness of music education for the improvement of reading skills and academic achievement in young poor readers: A pragmatic cluster-randomized, controlled clinical trial. PloS ONE 8(3), e59984.
Coull, J. T. (2011). Discrete neuroanatomical substrates for generating and updating temporal expectations. In S. Dehaene & E. Brannon (Eds.), Space, time and number in the brain: Searching for the foundations of mathematical thought (pp. 87–101). Amsterdam: Elsevier.
Cravo, A. M., Rohenkohl, G., Wyart, V., & Nobre, A. C. (2013). Temporal expectation enhances contrast sensitivity by phase entrainment of low-frequency oscillations in visual cortex. Journal of Neuroscience 33(9), 4002–4010.
Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics 26(2), 145–171.
DeWitt, L. A., & Samuel, A. G. (1990). The role of knowledge-based expectations in music perception: Evidence from musical restoration. Journal of Experimental Psychology: General 119(2), 123–144.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(B), 181–187.
Dittinger, E., Barbaroux, M., D'Imperio, M., Jäncke, L., Elmer, S., & Besson, M. (2016). Professional music training and novel word learning: From faster semantic encoding to longer-lasting word representations. Journal of Cognitive Neuroscience 28(10), 1584–1602.
Dittinger, E., Valizadeh, S. A., Jäncke, L., Besson, M., & Elmer, S. (2018). Increased functional connectivity in the ventral and dorsal streams during retrieval of novel words in professional musicians. Human Brain Mapping 39(2), 722–734.
Du, Y., & Zatorre, R. J. (2017). Musical training sharpens and bonds ears and tongue to hear speech better. Proceedings of the National Academy of Sciences 5, 201712223. Retrieved from https://doi.org/10.1073/pnas.1712223114
Elhilali, M., & Shamma, S. A. (2008). A cocktail party with a cortical twist: How cortical mechanisms contribute to sound segregation. Journal of the Acoustical Society of America 124(6), 3751–3771.
Elmer, S., Hänggi, J., & Jäncke, L. (2016). Interhemispheric transcallosal connectivity between the left and right planum temporale predicts musicianship, performance in temporal speech processing, and functional specialization. Brain Structure and Function 221(1), 331–344.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex 49(10), 2812–2821.
Falk, S., Lanzilotti, C., & Schön, D. (2017). Tuning neural phase entrainment to speech. Journal of Cognitive Neuroscience 29(8), 1378–1389.
Farbood, M. M., Marcus, G., & Poeppel, D. (2013). Temporal dynamics and the identification of musical key. Journal of Experimental Psychology: Human Perception & Performance 39(4), 911–918.
Farbood, M. M., Rowland, J., Marcus, G., Ghitza, O., & Poeppel, D. (2015). Decoding time for the identification of musical key. Attention, Perception, & Psychophysics 77(1), 28–35.
Fedorenko, E., McDermott, J. H., Norman-Haignere, S., & Kanwisher, N. (2012). Sensitivity to musical structure in the human brain. Journal of Neurophysiology 108(12), 3289–3300.
Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–9.
Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of syntactic violations in language. Frontiers in Psychology 3, 603. Retrieved from https://doi.org/10.3389/fpsyg.2012.00603
Fiveash, A., & Pammer, K. (2014). Music and language: Do they draw on similar syntactic working memory resources? Psychology of Music 42(2), 190–209.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.
François, C., Chobert, J., Besson, M., & Schön, D. (2012). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043.
François, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and linguistic structures. Cerebral Cortex 21(10), 2357–2365.
Friederici, A. D., & Kotz, S. A. (2003). The brain basis of syntactic processes: Functional imaging and lesion studies. NeuroImage 20, S8–S17.
Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00130
Ghitza, O., & Greenberg, S. (2009). On the possible role of brain rhythms in speech perception: Intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126.
Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56(6), 1127–1134.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience 15(4), 511–517.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in Cognitive Sciences 15(1), 3–10.
Gottfried, T. L., & Riester, D. (2000). Relation of pitch glide perception and Mandarin tone identification. Journal of the Acoustical Society of America 108(5), 2604.
Gottfried, T. L., Staby, A. M., & Ziemer, C. J. (2004). Musical experience and Mandarin tone discrimination and imitation. Journal of the Acoustical Society of America 115(5), 2545.
Grahn, J. A., & Rowe, J. B. (2012). Finding and feeling the musical beat: Striatal dissociations between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921.
Haegens, S., & Golumbic, E. Z. (2018). Rhythmic facilitation of sensory processing: A critical review. Neuroscience & Biobehavioral Reviews 86, 150–165.
Herdener, M., Humbel, T., Esposito, F., Habermeyer, B., Cattapan-Ludewig, K., & Seifritz, E. (2012). Jazz drummers recruit language-specific areas for the processing of rhythmic structure. Cerebral Cortex 24(3), 836–843.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8, 393–402.
Hidalgo, C., Falk, S., & Schön, D. (2017). Speak on time! Effects of a musical rhythmic training on children with hearing loss. Hearing Research 351, 11–18.
Hillis, A. E., & Caramazza, A. (1995). Representation of grammatical categories of words in the brain. Journal of Cognitive Neuroscience 7(3), 396–407.
Hoch, L., Poulin-Charronnat, B., & Tillmann, B. (2011). The influence of task-irrelevant music on language processing: Syntactic and semantic structures. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00112
Intartaglia, B., White-Schwoch, T., Kraus, N., & Schön, D. (2017). Music training enhances the automatic neural processing of foreign speech sounds. Scientific Reports 7(1), 12631.
Jaramillo, S., & Zador, A. M. (2011). The auditory cortex mediates the perceptual effects of acoustic temporal expectation. Nature Neuroscience 14, 246–251.


412    daniele schön and benjamin morillon

Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. NeuroImage 47(2), 735–744.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83(5), 323–355.
Kleber, B., Veit, R., Moll, C. V., Gaser, C., Birbaumer, N., & Lotze, M. (2016). Voxel-based morphometry in opera singers: Increased gray-matter volume in right somatosensory and auditory cortices. NeuroImage 133, 477–483.
Koelsch, S., Gunter, T. C., Cramon, D. Y. V., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966.
Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10), 1565–1577.
Kotz, S. A., Gunter, T. C., & Wonneberger, S. (2005). The basal ganglia are receptive to rhythmic compensation during auditory syntactic processing: ERP patient data. Brain and Language 95(1), 70–71.
Kotz, S. A., Meyer, M., Alter, K., Besson, M., von Cramon, D. Y., & Friederici, A. D. (2003). On the lateralization of emotional prosody: An event-related functional MR investigation. Brain and Language 86(3), 366–376.
Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcorticocortical framework. Trends in Cognitive Sciences 14(9), 392–399.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605.
Kunert, R., & Slevc, L. R. (2015). A commentary on: “Neural overlap in processing music and speech.” Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00330
Kunert, R., Willems, R. M., Casasanto, D., Patel, A. D., & Hagoort, P. (2015). Music and language syntax interact in Broca’s area: An fMRI study. PLoS ONE 10(11), e0141069.
Lakatos, P., Musacchia, G., O’Connell, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The spectrotemporal filter mechanism of auditory selective attention. Neuron 77, 750–761.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review 74, 431–461.
Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T., & Medler, D. A. (2005). Neural substrates of phonemic perception. Cerebral Cortex 15(10), 1621–1631.
Lindell, A. K. (2006). In your right mind: Right hemisphere contributions to language processing and production. Neuropsychology Review 16(3), 131–148.
Luria, A. R., Tsvetkova, L. S., & Futer, D. S. (1965). Aphasia in a composer. Journal of the Neurological Sciences 2(3), 288–292.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience 18(2), 199–211.


Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience 23(10), 2701–2715.
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of Cognitive Neuroscience 23(2), 294–305.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. doi:10.1098/rstb.2014.0093
Milovanov, R., Huotilainen, M., Esquef, P. A., Alku, P., Välimäki, V., & Tervaniemi, M. (2009). The role of musical aptitude and language skills in preattentive duration processing in school-aged children. Neuroscience Letters 460(2), 161–165.
Moore, E., Branigan, H., & Overy, K. (2017). Exploring the role of auditory-motor synchronisation in the transfer of music to language skills in dyslexia. Outstanding Poster Award talk at Neurosciences and Music VI conference.
Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences 114(42), E8913–E8921.
Morillon, B., Hackett, T. A., Kajikawa, Y., & Schroeder, C. E. (2015). Predictive motor control of sensory dynamics in auditory active sensing. Current Opinion in Neurobiology 31, 230–238.
Morillon, B., Schroeder, C. E., & Wyart, V. (2014). Motor contributions to the temporal precision of auditory attention. Nature Communications 5, 5255.
Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal prediction in lieu of periodic stimulation. Journal of Neuroscience 36(8), 2342–2347.
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proceedings of the National Academy of Sciences 104(40), 15894–15898.
Nobre, A. C., & van Ede, F. (2018). Anticipated moments: Temporal structure in attention. Nature Reviews Neuroscience 19, 34–38.
Overy, K. (2000). Dyslexia, temporal processing and music: The potential of music as an early learning aid for dyslexic children. Psychology of Music 28(2), 218–229.
Parbery-Clark, A., Skoe, E., & Kraus, N. (2009). Musical experience limits the degradative effects of background noise on the neural processing of sound. Journal of Neuroscience 29(45), 14100–14107.
Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned neural distinction of speech syllables. Neuroscience 219, 111–119.
Park, H., Ince, R. A. A., Schyns, P. G., Thut, G., & Gross, J. (2015). Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners. Current Biology 25(12), 1649–1653.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6(7), 674–681.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. doi:10.3389/fpsyg.2011.00142
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hearing Research 308, 98–108.
Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience 8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057


Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology 22(7–8), 776–789.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–691.
Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140090.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time.” Speech Communication 41(1), 245–255.
Poeppel, D. (2012). The maps problem and the mapping problem: Two challenges for a cognitive neuroscience of speech and language. Cognitive Neuropsychology 29(1–2), 34–55.
Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., . . . Tillmann, B. (2013). Rhythmic auditory stimulation influences syntactic processing in children with developmental language disorders. Neuropsychology 27(1), 121–131.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12, 718–724.
Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences 97(22), 11800–11806.
Rimmele, J. M., Morillon, B., Poeppel, D., & Arnal, L. H. (submitted). The proactive and flexible sense of timing.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience 27, 169–192.
Rogalsky, C., & Hickok, G. (2011). The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience 23(7), 1664–1680.
Rohenkohl, G., Cravo, A. M., Wyart, V., & Nobre, A. C. (2012). Temporal expectation improves the quality of sensory information. Journal of Neuroscience 32(24), 8424–8428.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition 70(1), 27–52.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.
Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578.
Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H. J., . . . Friederici, A. D. (2013). Co-localizing linguistic and musical syntax with intracranial EEG. NeuroImage 64, 134–146.
Sammler, D., Koelsch, S., & Friederici, A. D. (2011). Are left fronto-temporal brain areas a prerequisite for normal music-syntactic processing? Cortex 47(6), 659–673.
Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia 33(8), 1047–1055.
Schön, D., Boyer, M., Moreno, S., Besson, M., Peretz, I., & Kolinsky, R. (2008). Songs as an aid for language acquisition. Cognition 106(2), 975–983.


Schön, D., Gordon, R., Campagne, A., Magne, C., Astésano, C., Anton, J. L., & Besson, M. (2010). Similar cerebral networks in language, music and song perception. NeuroImage 51(1), 450–461.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology 41(3), 341–349.
Schön, D., & Tillmann, B. (2015). Short- and long-term rhythmic interventions: Perspectives for language rehabilitation. Annals of the New York Academy of Sciences 1337, 32–39.
Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences 32(1), 9–18.
Schroeder, C. E., Wilson, D. A., Radman, T., Scharfman, H., & Lakatos, P. (2010). Dynamics of active sensing and perceptual selection. Current Opinion in Neurobiology 20, 172–176.
Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences 11(5), 211–218.
Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. Journal of Neuroscience 23(13), 5545–5552.
Skoe, E., & Kraus, N. (2012). A little goes a long way: How the adult brain is shaped by musical training in childhood. Journal of Neuroscience 32(34), 11507–11510.
Slevc, L. R., Faroqi-Shah, Y., Saxena, S., & Okada, B. M. (2016). Preserved processing of musical structure in a person with agrammatic aphasia. Neurocase 22(6), 505–511.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin & Review 16(2), 374–381.
Staeren, N., Renvall, H., De Martino, F., Goebel, R., & Formisano, E. (2009). Sound categories are represented as distributed patterns in the human auditory cortex. Current Biology 19(6), 498–502.
Stahl, B., Kotz, S. A., Henseler, I., Turner, R., & Geyer, S. (2011). Rhythm in disguise: Why singing may not hold the key to recovery from aphasia. Brain 134(10), 3083–3093.
Stefanics, G., Hangya, B., Hernadi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of Neuroscience 30(41), 13578–13585.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience 2(2), 191–196.
Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). Musical experience shapes top-down auditory mechanisms: Evidence from masking and auditory attention performance. Hearing Research 261(1), 22–29.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science 331(6022), 1279–1285.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion 4(1), 46–64.
Tierney, A., & Kraus, N. (2014). Auditory-motor entrainment and phonological skills: Precise auditory timing hypothesis (PATH). Frontiers in Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00949
Tinbergen, N. (1963). On aims and methods of ethology. Ethology 20, 410–433.


Vallentin, D., Kosche, G., Lipkind, D., & Long, M. A. (2016). Inhibition protects acquired song segments during vocal learning in zebra finches. Science 351(6270), 267–271.
Vigneau, M., Beaucousin, V., Hervé, P. Y., Jobard, G., Petit, L., Crivello, F., . . . Tzourio-Mazoyer, N. (2011). What is right-hemisphere contribution to phonological, lexico-semantic, and sentence processing? Insights from a meta-analysis. NeuroImage 54(1), 577–593.
Vuust, P., & Witek, M. A. G. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved from https://doi.org/10.3389/fpsyg.2014.01111
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences 6(1), 37–46.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
Zuk, J., Ozernov-Palchik, O., Kim, H., Lakshminarayanan, K., Gabrieli, J. D. E., Tallal, P., & Gaab, N. (2013). Enhanced syllable discrimination thresholds in musicians. PLoS ONE 8(12), e80546.
Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future research. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00007

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

SECTION V

MUSICIANSHIP AND BRAIN FUNCTION



Chapter 17

Musical Expertise and Brain Structure: The Causes and Consequences of Training

Virginia B. Penhune

Introduction

Over the past twenty years, brain imaging studies have demonstrated that music training can change brain structure, predominantly in the auditory-motor network that underlies music performance. These studies have also shown that brain structural variation is related to performance on a range of musical tasks, and that even short-term training can result in brain plasticity. In this chapter, we will argue that the observed differences in brain structure between experts and novices derive from at least four sources. First, there may be pre-existing individual differences in structural features supporting specific skills that predispose people to undertake music training. Second, lengthy and consistent training likely produces structural change in the brain networks tapped by performance through repeated cycles of prediction, feedback, and error-correction that drive learning. Third, the timing of practice during specific periods of development may result in brain changes that do not occur at other times, and which may promote future learning and plasticity. Fourth, both the rewarding nature of music itself, as well as the reward value of practice and accurate performance, may make music training a particularly effective driver of brain plasticity.



Structural Brain Differences in Adult Musicians

There is now a relatively large body of brain imaging data showing differences in gray- (GM) and white-matter (WM) architecture between musicians and non-musicians (see Fig. 1). In adults, all of these studies are cross-sectional, and typically compare music students or professionals with controls selected to have very little music training. One of the most common and expected findings is that music training is associated with enhancements in auditory regions, particularly Heschl’s gyrus (HG), the region of primary auditory cortex. These studies have found that musicians commonly show greater gyrification of HG (Schneider et al., 2002, 2005), and greater GM volume or cortical thickness (CT) in this region (Bermudez, Lerch, Evans, & Zatorre, 2009; Foster & Zatorre, 2010; Gaser & Schlaug, 2003; Karpati, Giacosa, Foster, Penhune, & Hyde, 2017; Schneider et al., 2002, 2005). These differences have been shown to be


Figure 1.  Regions of the dorsal auditory pathway affected by music training. Illustrates brain regions found to show structural changes in musicians compared to non-musicians. These include the auditory (superior temporal gyrus, STG), parietal, premotor cortex (PMC), and inferior frontal gyrus (IFG) regions in the dorsal auditory pathway, as well as the connecting fibers of the arcuate fasciculus. Also pictured are the cerebellum and corticospinal tract (CST). Regions not shown are the corpus callosum and basal ganglia.


related to indices of music proficiency (Schneider et al., 2002, 2005), hours of music practice (Foster & Zatorre, 2010), variations in EEG and MEG responses to auditory signals (Schneider et al., 2002, 2005), and performance on melody discrimination and rhythm reproduction tasks (Foster & Zatorre, 2010; Karpati et al., 2017).

The second most common finding is enhancement in motor regions of the brain, including GM in primary motor, premotor, and parietal regions, as well as the cerebellum and basal ganglia. In addition, consistent increases have been observed in white-matter pathways, including the corpus callosum, descending motor tracts, and sensorimotor connections. One of the first studies in this domain found that the length of the central sulcus, and by inference the size of the motor cortex (M1), was larger in trained musicians, and that earlier onset of training was related to greater length (Amunts et al., 1997). This finding has been replicated in subsequent studies using whole-brain analysis techniques (Bermudez et al., 2009; Gaser & Schlaug, 2003).

Differences between musicians and non-musicians have also been observed in the corpus callosum (CC), the primary white-matter pathway connecting the two hemispheres. In another early investigation, it was found that the surface area of the anterior half of the CC was larger in musicians, and that this difference was greatest for those who began training before age 7 (Schlaug, Jäncke, Huang, Staiger, & Steinmetz, 1995). Musicians have also been found to have greater white-matter integrity in the CC as measured using diffusion tensor imaging (DTI), with these measures being related to hours of practice (Bengtsson et al., 2005), as well as to age of start and performance on a sensory-motor synchronization task (Steele, Bailey, Zatorre, & Penhune, 2013).
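Relationships of this kind (for example, a white-matter measure in the CC versus accumulated practice hours) are typically quantified as correlations across participants. The sketch below illustrates only the general form of such an analysis with simulated data; the sample size, variable names, and effect size are illustrative assumptions, not values from the studies cited.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40                                        # hypothetical sample of musicians
practice_hours = rng.uniform(500, 15000, n)   # estimated lifetime practice hours
# Simulated fractional anisotropy (FA) in the CC, loosely tied to practice.
fa_cc = 0.45 + 2e-6 * practice_hours + rng.normal(scale=0.01, size=n)

# Pearson correlation between the structural measure and practice.
r = np.corrcoef(practice_hours, fa_cc)[0, 1]
print(f"r(practice, FA) = {r:.2f}")
```

In the real studies, such correlations are computed on measured DTI data and corrected for covariates such as age; the simulation only shows the shape of the computation.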
In the descending motor pathways, changes in DTI measures have been observed to be related to hours of practice in childhood (Bengtsson et al., 2005). Changes in subcortical structures have also been observed, with a recent study reporting that musicians have greater gray-matter volume in the putamen (Vaquero et al., 2016), and others showing enhancements in cerebellar gray- (Gaser & Schlaug, 2003; Hutchinson, Lee, Gaab, & Schlaug, 2003) and white-matter (Abdul-Kareem, Stancak, Parkes, Al-Ameen, et al., 2011). However, a more recent study from our laboratory using cerebellar-specific segmentation techniques found no differences in either gray- or white-matter volumes between musicians and non-musicians, but that musicians who began training before age 7 had reduced volumes in cerebellar regions specifically related to motor timing (Baer et al., 2015).

Other regions found to differ between musicians and non-musicians are in frontal and parietal cortex, including regions important for language (pars opercularis and triangularis; areas 44 and 45) and working memory (dorsolateral: 9/46; and ventrolateral prefrontal cortex: 47/12). Enhanced GM density has been observed in areas 44/45 that is related to years of music experience (Abdul-Kareem, Stancak, Parkes, & Sluming, 2011; James et al., 2014; Sluming et al., 2002), and to performance on a test of absolute pitch (Bermudez et al., 2009). Importantly, musicians have also been found to have greater white-matter integrity as measured with DTI in the arcuate fasciculus, the pathway connecting auditory, parietal, and inferior frontal regions (Halwani, Loui, Ruber, & Schlaug, 2011). Musicians have also been reported to have greater cortical thickness in DLPFC;


and interregional variability in cortical thickness is correlated across a broader range of auditory and motor regions in musicians compared to controls (Bermudez et al., 2009). Finally, several studies have reported greater gray-matter volume in parietal regions (Foster & Zatorre, 2010; Gaser & Schlaug, 2003; James et al., 2014), which are engaged in sensorimotor transformations and planning that are relevant for playing a musical instrument (Andersen & Cui, 2009; Gogos et al., 2010; Rauschecker, 2011). In particular, Foster and Zatorre (2010) found that both gray-matter volume and cortical thickness were related to performance on a test of melodic discrimination in a group of people with varying levels of music experience.

Taken together, cross-sectional studies in adult musicians provide evidence that long-term practice produces structural changes in regions of the dorsal auditory-motor network that has been shown in functional imaging studies to be recruited during playing (Brown, Zatorre, & Penhune, 2015; Chen, Penhune, & Zatorre, 2008; Herholz & Zatorre, 2012; Novembre & Keller, 2014).

Developmental Impacts on Training-Related Plasticity

Studying effects of music training in childhood is important because that is when lessons typically begin, but also because we know that sensorimotor experience during early sensitive periods in development can have differential impacts on long-term brain plasticity. The first longitudinal study in children examined the effects of 15 months of piano training in 6- to 8-year-olds (Hyde et al., 2009). Longitudinal studies are critical because they allow us to establish more direct causal connections between training and any observed changes in the brain. This study found that children who received training did not differ from untrained children at baseline, but showed gray-matter enhancements in auditory and motor cortex, as well as enlargement of the corpus callosum. Most importantly, the volume of auditory cortex was found to be related to performance on tests of melody and rhythm discrimination, and the volume of motor cortex was found to be related to performance on a test of fine-motor skill.

These results are supported by a second longitudinal study, which found that 6- to 8-year-old children participating in a music training program had greater WM integrity in the CC after two years (Habibi et al., 2017). There was also some evidence of reduced cortical thinning in right compared to left posterior auditory cortex.

Taken together, these longitudinal results indicate that even relatively short-term training in childhood can produce changes in behavior and brain structure. Most importantly, changes occurred in the same regions of the auditory-motor network—auditory cortex, M1, and the CC—that have been shown to differ after long-term training in adults. The parallel between longitudinal changes in childhood and cross-sectional findings in


adults supports the inference that the structural differences observed in adults are indeed the result of training.

The only other anatomical study in children found that in a large group of 8- to 10-year-olds, the volume of HG was larger in those who practiced more, and was associated with measures of music aptitude, as well as behavioral and MEG measures of auditory processing (Seither-Preisler, Parncutt, & Schneider, 2014). This is consistent with a longitudinal EEG study in children showing enhancements of auditory evoked responses to musical features (Putkinen, Tervaniemi, Saarikivi, Ojala, & Huotilainen, 2014). Interestingly, however, no changes in HG volume were observed when examining possible longitudinal effects after 13 months of additional training. Further, hierarchical regression models predicting HG volume found that aptitude accounted for a greater proportion of the variance than practice time. The authors interpreted these last two findings as indicating that anatomical predispositions make a greater contribution to musical outcomes than training. However, it is also possible that training-related plastic changes had already occurred in the period preceding the study: most children began lessons between 6 and 7 years old, and thus had already been playing for one to two years.

The issue of whether predispositions or training contribute most to observed structural differences between musicians and non-musicians has long been debated, with little data that can directly settle the argument.
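The hierarchical-regression logic at issue here can be illustrated with a short simulation. This is a sketch of the general method, not the published analysis: the data are simulated, and the variable names and effect sizes are arbitrary assumptions. Step 1 enters aptitude alone; step 2 adds practice time; the increment in R-squared estimates the variance practice explains beyond aptitude.

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 60                                            # hypothetical sample of children
aptitude = rng.normal(size=n)                     # standardized aptitude score
practice = 0.4 * aptitude + rng.normal(size=n)    # practice correlates with aptitude
hg_volume = 0.6 * aptitude + 0.2 * practice + rng.normal(scale=0.5, size=n)

r2_step1 = r_squared(aptitude, hg_volume)                                 # aptitude alone
r2_step2 = r_squared(np.column_stack([aptitude, practice]), hg_volume)    # add practice
delta_r2 = r2_step2 - r2_step1        # unique variance attributable to practice
print(f"R2 aptitude: {r2_step1:.2f}; aptitude + practice: {r2_step2:.2f}; delta: {delta_r2:.2f}")
```

Because predictors are entered in a fixed order, the same data can yield different-looking conclusions depending on which variable enters first, which is one reason such results are open to the competing interpretations discussed above.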
As will be discussed further in this chapter, some data from untrained adults show that individual differences in specific anatomical features are related to performance or learning of musical tasks, providing indirect evidence that pre-existing anatomical features may mediate the potential to acquire musical skills (Foster & Zatorre, 2010; Li et al., 2014; Paquette, Fujii, Li, & Schlaug, 2017; Schneider et al., 2005). The finding described earlier, that HG volume is larger in children who practice more but does not change over time, can also be considered evidence for a pre-existing structural feature associated with musical skill (Seither-Preisler et al., 2014).

Work with twins has shown that the propensity to practice is heritable, and that genes appear to account for a large portion of the variance in music abilities (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). However, a very recent study from this same group compared brain structure in monozygotic twins discordant for music practice. They found that the twins who played had greater cortical thickness in auditory and motor regions, as well as WM enhancements in the corpus callosum, compared to those who did not (de Manzano & Ullén, 2018). These findings provide the most definitive support yet for a causal effect of music training on brain structure.

In an effort to synthesize these apparently opposing results, the authors have proposed a gene–environment interaction model of musical skill and its impact on the brain (Ullén, Hambrick, & Mosing, 2016). This model proposes that multiple genetic predispositions subserving specifically musical skills, such as auditory and motor abilities, as well as non-specific cognitive and personality factors, contribute to the likelihood that someone will engage in training. They also hypothesize that environmental factors interact with genetic predispositions to either promote or discourage persistence.
We would further propose that the timing of music experience interacts with both




Figure 2.  Gene–maturation–environment interactions. Illustrates the interaction between genes, brain maturation, and specific training. Genetic variation leads to individual differences in brain structures for musical aptitudes such as auditory perception and motor dexterity. Genetic variation also regulates other non-specific aptitudes, such as cognitive skills and personality factors, including openness and the propensity to practice. Maturation produces normative changes that peak at different times depending on the brain region. Experience, such as music training, then interacts with both pre-existing individual differences, and normative maturation to change brain structure and plasticity. Experience also feeds back on genes through gene–environment interactions that can further enhance or limit plasticity.

predispositions and normative brain maturation to influence long-term behavioral and brain plasticity (see Fig. 2).

The Interaction between Development and Training

A very important question in understanding the effect of music training on brain structure is how training interacts with brain development. Anecdotal evidence from the lives of famous musicians suggests that an early start of training can promote the development of extraordinary skill in adulthood (Jorgensen, 2011). Evidence from animal and human studies also shows that early experience, such as specific auditory exposure (Chang & Merzenich, 2003; de Villers-Sidani, Chang, Bao, & Merzenich, 2007), or enriched sensorimotor environments (Kolb et al., 2012), can have long-term effects on behavior and the brain. Two important early studies provided suggestive evidence that the impact of music training on brain structure was related to the age of start, with those who begin earlier showing greater enhancements in the size of M1 (Amunts et al., 1997) or the surface area of the corpus callosum (Schlaug et al., 1995). However, without specific controls, the age of start of training is typically confounded with the total years of training, making it impossible to attribute the observed differences to the age at which training began.
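This confound can be made concrete with a small simulation: when age of start and total years of training are strongly correlated, a naive correlation misattributes a training-duration effect to starting age, whereas regressing out years of training removes it. This is an illustrative sketch on simulated data (all values are hypothetical), not an analysis from any of the studies cited.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60

# Hypothetical sample: earlier starters accumulate more years of training,
# so the two predictors are strongly correlated (the confound).
age_of_start = rng.uniform(4, 16, n)
years_training = (22 - age_of_start) + rng.normal(0, 1.5, n)

# Simulated brain measure driven ONLY by years of training.
volume = 0.5 * years_training + rng.normal(0, 1.0, n)

# A naive correlation wrongly suggests an age-of-start effect.
naive_r = np.corrcoef(age_of_start, volume)[0, 1]

# Regressing out years of training removes the spurious association.
X = np.column_stack([np.ones(n), years_training])
resid = volume - X @ np.linalg.lstsq(X, volume, rcond=None)[0]
partial_r = np.corrcoef(age_of_start, resid)[0, 1]

print(f"naive r = {naive_r:.2f}, partial r = {partial_r:.2f}")
```

The naive correlation is strongly negative even though starting age has no causal role in the simulation; the residualized (partial) correlation is near zero, which is why the studies described next match groups on years of experience.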


musical expertise and brain structure   425

In addition, these studies did not link the observed neuroanatomical differences to relevant behavior. To address these issues, a series of studies compared behavior and brain structure in early- (ET; before age 7) and late-trained (LT; after age 7) musicians (see Fig. 3; see also Baer et al., 2015; Bailey & Penhune, 2010, 2012, 2013; Bailey, Zatorre, & Penhune, 2014; Steele et al., 2013; Vaquero et al., 2016). In these studies, we matched ET and LT groups on important potential confounding variables, including years of music experience, years of formal training, and hours of current practice. In addition, we assessed cognitive measures, such as non-verbal IQ and auditory working memory, that might be thought to be related to the capacity for early training. Most importantly, we assessed performance on relevant musical skills, such as rhythm reproduction and melody discrimination. The age-7 cut-off for the ET and LT groups was initially drawn from the study by Schlaug et al. (1995) and was essentially arbitrary. However, using a large sample of behavioral data, we have been able to show that the likely age range where early training has its strongest effect is between 7 and 9 (Bailey & Penhune, 2013). Behaviorally, our studies have shown that adult musicians who begin training before age 7 outperform those who begin later on rhythm reproduction and melody discrimination tasks (Bailey & Penhune, 2010, 2012). Drawing on this work, we collected a large sample of ET and LT musicians with behavioral, T1, and DTI data. Analysis using deformation-based morphometry on the T1 data found that ET musicians show enlargement in the region of the ventral premotor cortex (vPMC), and that the volume of this region is related to performance on the rhythm synchronization task (Bailey et al., 2014).
These findings are consistent with fMRI studies showing that the vPMC is active when both musicians and non-musicians perform the same rhythm task (Chen et al., 2008). In the same sample, DTI measures showed that ET musicians also had enhanced WM integrity in the posterior mid-body of the corpus callosum, the location of fibers connecting M1 and PMC across the two hemispheres (Steele et al., 2013). We interpreted these findings based on data about normative maturation in these regions, and on the relative contribution of genes and environment to their variability. A large, cross-sectional developmental sample showed that GM volume in anterior motor regions, including M1 and PMC, has its peak period of growth between 6 and 8 years of age (Giedd et al., 1999). Similarly, the size of the anterior region of the CC shows its peak increase at the same time (Westerhausen et al., 2011), and variability of this region is more strongly influenced by environmental than genetic factors (Chiang et al., 2009). Based on these data, we can hypothesize that early training at the time of peak maturational change in motor regions and the CC may enhance brain plasticity. In addition, the relatively stronger contribution of environment to the size of the anterior CC in adults suggests that it might be more susceptible to the impact of music training. We interpreted these findings as demonstrating a scaffold, or metaplastic, effect in which early training promotes brain plasticity that is sustained or augmented by later practice (Steele et al., 2013). Our findings in the PMC and CC appear to tell a straightforward story in which early training produces enlargement or enhancement of brain structure. However, more recent findings make it clear that reality is not so simple. Using the same sample described earlier, we examined GM and WM volumes in the cerebellum using a novel multi-atlas segmentation technique that labels all thirteen lobules in both hemispheres (Baer et al., 2015).

Figure 3.  Findings from studies examining structural differences between early- (ET; before age 7) and late-trained (LT; after age 7) musicians. Panel A on the left is taken from Bailey et al., 2014 and shows GM enhancement in the ventral premotor cortex (vPMC) in ET musicians. Panel A on the right is taken from Steele et al., 2013 and shows enhanced FA in the posterior midbody of the corpus callosum. Panel B on the left is taken from Vaquero et al., 2016 and shows reduced GM in the putamen in ET musicians. Panel B on the right is taken from Baer et al., 2015 and shows reduced volume of left cerebellar lobule VIIIa. The graphs at the bottom of each panel show the relationship of volume changes with the age of onset of training.

In addition, we tested these musicians and controls on a classic auditory-motor tapping and continuation task (Repp, 2005). The cerebellum has been linked to a range of sensory and motor timing functions that are likely to be relevant for music training and performance (Koziol et al., 2014; Sokolov, Miall, & Ivry, 2017). As described earlier, previous work had found greater cerebellar GM volume in trained musicians (Gaser & Schlaug, 2003; Hutchinson et al., 2003). However, the results of our study showed that ET musicians had smaller volumes of cerebellar lobules IV, V, and VI compared to LT musicians. Strikingly, earlier age of start, greater music experience, and better timing performance were all correlated with smaller cerebellar volumes. Better timing performance was specifically associated with smaller volumes of right lobule VI, which has been functionally linked to perceptual and motor timing (E, Chen, Ho, & Desmond, 2014; Ivry, Spencer, Zelaznik, & Diedrichsen, 2002). This is consistent with another recent study, which found that early-trained pianists had smaller GM volume in the right putamen and lower timing variability when playing scales (Vaquero et al., 2016). So, why does training affect the cerebellum differently than the cortex, and how do these findings challenge our understanding of the effects of early experience? There are several features of cerebellar anatomy that may explain this result. First, developmental studies show that peak growth in the cerebellum occurs much later than in most of the cortex, between the ages of 12 and 18 (Tiemeier et al., 2010). Thus early experience may have a different effect on cerebellar plasticity, such that experience leads to greater efficiency and reduced expansion. Second, the cerebellum is unique in being structurally homogeneous, with identical cytoarchitecture and input–output circuitry throughout (Schmahmann, 1997).
In the motor system, cerebellar circuits are known to play a role in error correction and optimization. Because these circuits are uniform across the structure, it is hypothesized that they perform the same optimizing role for the wide variety of functions in the regions to which the cerebellum is connected (Balsters, Whelan, Robertson, & Ramnani, 2013; Koziol et al., 2014; Sokolov et al., 2017). The cerebellar regions that are smaller in ET musicians in our study are connected to frontal motor and association regions, including M1, PMC, and prefrontal cortex (Diedrichsen, Balsters, Flavell, Cussans, & Ramnani, 2009; Kelly & Strick, 2003). Based on this information, it is possible that training-related skills and cortical expansion might be supported by greater optimization and reduced expansion in the cerebellum. If this is true, then cortical and cerebellar changes with training should be inversely related.
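For context, performance on the synchronization–continuation task mentioned above is commonly summarized by the variability of the inter-tap intervals produced after the pacing metronome stops. A minimal sketch of that scoring on simulated tap times (the function name and the 500 ms target interval are illustrative assumptions, not details from the studies reviewed):

```python
import numpy as np

def tapping_stats(tap_times_ms):
    """Summarize continuation-phase tapping: mean inter-tap interval (ITI),
    ITI standard deviation, and coefficient of variation (CV)."""
    itis = np.diff(tap_times_ms)          # intervals between successive taps
    mean_iti = itis.mean()
    sd_iti = itis.std(ddof=1)
    return mean_iti, sd_iti, sd_iti / mean_iti

# Hypothetical taps around a 500 ms target interval (simulated, not study data).
rng = np.random.default_rng(1)
taps = np.cumsum(np.concatenate([[0.0], rng.normal(500.0, 20.0, 30)]))
mean_iti, sd_iti, cv = tapping_stats(taps)
print(f"mean ITI = {mean_iti:.1f} ms, CV = {cv:.3f}")
```

Lower CV indicates more precise timing, which is the behavioral measure that correlated with smaller right lobule VI volumes in the study described above.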

Aptitude and Short-Term Training

Differences in brain structure between musicians and non-musicians have generally been attributed to long and intensive training. However, it is more likely that they result from an interaction between training-induced plasticity and pre-existing individual differences in the brain that predispose certain people to engage in music (see Fig. 2). While there is little direct evidence for specific brain features that predispose an individual to become a musician, evidence from studies of individual differences in music ability and response to training can provide some clues. Individual differences in auditory and motor regions of untrained individuals have been linked to performance on specific musical tasks, and to the ability to learn to play an instrument. GM concentrations in auditory regions and the amygdala were found to be correlated with interval discrimination in a large sample unselected for music training (Li et al., 2014). Similarly, in a sample selected to have a range of musical experience, GM concentration and cortical thickness in auditory and parietal regions were found to be related to the ability to discriminate melodies that had been transposed (Foster & Zatorre, 2010). Finally, a recent study found that cerebellar volumes were related to beat perception in musicians (Paquette et al., 2017). Individual differences in WM tracts connecting auditory and motor regions, and in motor output pathways, have been found to be related to faster learning of short melodies (Engel et al., 2014). Further, WM integrity in the left arcuate fasciculus and the temporal segment of the CC has been found to predict individual differences in auditory-motor synchronization (Blecher, Tal, & Ben-Shachar, 2016).
Findings showing that brain structural features can predict musical skills are consistent with results in related domains: the volume of auditory cortex was found to be associated with the ability to learn linguistic pitch discrimination (Wong et al., 2008), and the volumes of both the auditory cortex (Golestani, Molko, Dehaene, LeBihan, & Pallier, 2007; Golestani, Paus, & Zatorre, 2002) and the arcuate fasciculus have been found to be related to foreign language sound learning (Vaquero, Rodriguez-Fornells, & Reiterer, 2017). Very importantly, however, aptitude for music training likely relies on more than pure auditory or motor skill. Heritability studies show that the propensity to practice appears to be genetically transmitted (Mosing et al., 2014), and that personality variables such as "openness to experience" are also associated with lifetime practice (Butkovic, Ullén, & Mosing, 2015). Thus, an individual with exceptional pre-existing skills must also have the right personality characteristics to undertake long-term training, and the openness to engage with new people, places, and ideas. A talented individual who does not like to practice, or who hates stress, travel, and challenge, is unlikely to become a professional musician.

Bringing It All Together

Taken together, the current data on brain structure in musicians suggest that there may be pre-existing structural features—likely in the auditory-motor network supporting musical skill—that predispose individuals to pursue music training. Once training begins, the long-term effects on behavior and brain structure depend on the age of start, and thus on the interaction between training and the maturational trajectories of these regions and their connections. Early training may produce a type of scaffold or metaplasticity effect. Metaplasticity is a term that originates from studies of hippocampal learning mechanisms, and denotes the idea that experience can change the potential for plasticity of a synapse (for review, see Altenmüller & Furuya, 2016; Herholz & Zatorre, 2012). Applied to the context of music, it is the idea that training during specific phases of brain development can have long-term effects on how those regions change in response to future experience. Evidence for metaplastic effects resulting from music training comes from studies showing that musicians have enhanced learning of sensory and motor skills (Herholz, Boh, & Pantev, 2011; Ragert, Schmidt, Altenmüller, & Dinse, 2004; Rosenkranz, Williamon, & Rothwell, 2007), and greater increases in M1 activity during learning (Hund-Georgiadis & von Cramon, 1999). Thus we can think of early training as a scaffold on which later training can build (Bailey et al., 2014; Steele et al., 2013). Along with these training-specific metaplastic effects, evidence from heritability studies indicates that skills and abilities not specific to music may also contribute to promoting or limiting plasticity; these include the propensity to practice (Mosing et al., 2014), as well as personality and cognitive variables that can support training (Butkovic et al., 2015).

Why Is Music Such an Effective Driver of Brain Plasticity?

Why does music training produce such robust changes in brain structure? One very obvious answer is practice—lots of practice. For the studies reviewed here, the average length of training for musicians was 15–20 years, the equivalent of thousands of hours of practice across a large portion of the person's life. While the idea that simply practicing long enough will result in expertise has been largely debunked (for review, see Mosing et al., 2014), long-term, consistent practice is strongly associated with expertise in a range of domains (Macnamara, Hambrick, & Oswald, 2014). Further, in the studies reviewed here, the length of training is typically strongly related to both structural brain differences and task performance. The impact of practice on brain organization is supported by studies in animals showing that practice on new motor tasks is associated with expanded representations in motor areas (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Nudo, Milliken, Jenkins, & Merzenich, 1996), changes in MR measures of gray and white matter (Scholz, Allemang-Grand, Dazai, & Lerch, 2015; Scholz, Niibori, Frankland, & Lerch, 2015), and increased numbers of synapses and dendritic spines (Kleim, Barnaby, et al., 2002; Kleim, Freeman, et al., 2002; Kleim et al., 2004). Neuronal changes in gray matter that are related to learning include neurogenesis, synaptogenesis, and changes in neuronal morphology. In white matter, learning-related changes include increases in the number of axons, axon diameter, packing density of fibers, and myelination (Zatorre, Fields, & Johansen-Berg, 2012). A second reason that music training may be particularly effective in driving brain plasticity is the highly specific nature of practice.
The majority of musicians are experts on a single instrument; thus they perform millions of repetitions of the same movements, and listen attentively to an even larger number of associated sounds. When practicing, a musician imagines and plans a precise sequence of sounds and the movements required to produce them. Once the plan is set in motion, they use auditory and somatosensory information to detect subtle deviations in sound and movement, implementing adjustments to enhance performance. Practice is therefore a repeated prediction, feedback, and error-correction cycle. Auditory-motor prediction is thought to be a central function of the dorsal stream, particularly of the premotor cortex. Brain imaging studies have shown increased activity in the PMC when people listen to melodies that they have learned to play (Chen, Rae, & Watkins, 2012; Lahav, Saltzman, & Schlaug, 2007), and recent work from our laboratory has shown that transcranial magnetic stimulation (TMS) over dorsal PMC disrupts learning of auditory-motor associations (Lega, Stephan, Zatorre, & Penhune, 2016). Feedback and error-correction are key components of motor learning (Diedrichsen, Shadmehr, & Ivry, 2010; Sokolov et al., 2017; Wolpert, Diedrichsen, & Flanagan, 2011), and studies of both motor and sensory learning show that functional and structural changes in the brain are driven by decreases in error and improved precision. For example, learning to juggle (Scholz, Klein, Behrens, & Johansen-Berg, 2009), to balance on a tilting board (Taubert et al., 2010), or to perform a complex visuomotor task (Lakhani et al., 2016; Landi, Baguear, & Della-Maggiore, 2011) has been shown to produce changes in gray- or white-matter architecture that were related to decreases in error with learning. Thus error-driven learning, particularly during periods of high developmental plasticity, may be an important contributor to the structural brain changes measured in adult musicians. Another reason that music training may be so successful in producing brain plasticity is that it is inherently multisensory. To produce music, performers must learn to link sounds to actions, but they must also link visual, somatosensory, and proprioceptive feedback to these sounds and actions.
As described earlier, training is a prediction–feedback–error-correction cycle in which musicians use all their sensory resources to produce the perfect sound. Sounds are linked to actions relatively rapidly, as shown by changes in the strength of motor activity during passive listening to learned melodies after short-term training (Bangert et al., 2006; D'Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Lega et al., 2016; Stephan, Brown, Lega, & Penhune, 2016). In particular, it was shown that learning to play a melody resulted in greater changes in the activity of auditory cortex than learning to remember the melody by listening alone (Lappe, Herholz, Trainor, & Pantev, 2008). This may be based partly on strong intrinsic connections between the auditory and motor systems (Chen et al., 2012; Poeppel, 2014; Zatorre, Chen, & Penhune, 2007). But it can also be hypothesized that co-activation of circuits deriving from multiple senses may drive plasticity even more strongly than input from a single sense (Lee & Noppeney, 2011, 2014). A final feature of music training that is likely crucial in promoting plasticity is the rewarding nature of performance. There are three aspects of reward that may stimulate plasticity: first, the rewarding nature of music itself, experienced through playing; second, the intrinsic reward of performing, both for the player and through the acclaim it may bring; and finally, the potentially rewarding nature of practice and the pleasure of accurate performance. The intrinsic pleasure derived from music appears to be common to most people (Mas-Herrero, Marco-Pallares, Lorenzo-Seva, Zatorre, & Rodriguez-Fornells, 2011), and is hypothesized to be based on the same dopamine-modulated, predictive systems that regulate reward in other domains with direct biological consequences, including drugs, food, sex, and money (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). Thus learning to produce a rewarding stimulus, such as music, is likely to be rewarding to the player. We also know that learning and brain plasticity are strongly affected by the reward value of what is learned. Animal studies show that brain plasticity associated with auditory learning is greater when the information to be learned is rewarded, or behaviorally relevant. For example, the responses of neurons in the auditory cortex of ferrets were modulated by the reward value of stimuli (David, Fritz, & Shamma, 2012). Further, pairing a tone with stimulation of dopamine circuits in the brainstem increased the selectivity of responding in auditory neurons tuned to the same tone (Bao, Chan, & Merzenich, 2001). Importantly, dopamine has been shown to modulate motor learning in both humans and animals (Floel et al., 2005; Tremblay et al., 2009, 2010), possibly through the reinforcement and habit-formation circuitry of the striatum (Graybiel & Grafton, 2015; Haith & Krakauer, 2013). Thus, if the output of practice, a beautiful piece of music, is rewarding and stimulates dopamine release, then playing such a piece should promote learning. It is also likely that the social benefits of playing music add to this type of reward. Finally, humans seem to have a strong internal motivation to practice and perfect many skills, even when those skills do not have immediate physiological, psychological, or social outcomes. In addition to music, people spend hours perfecting their golf swing, playing video games, or baking elaborate cakes. All of these skills require practice, and the outcome of practice is often not immediate.
Thus we hypothesize that practice itself may be rewarding, and that the prediction–feedback–error-correction cycle that is important for learning may be motivating across a range of domains. When musicians are learning a new and challenging piece, or perfecting an old one, they know exactly what they want it to sound like. This representation is translated into a motor plan, and both the imagined outcome and the plan become predictions against which they will measure their performance. When musicians attempt to play the piece, they will likely make errors, which lead to corrections and learning; but when they play the piece as imagined, they experience the reward of accurate performance. Because error feedback and reward are so important for learning, these mechanisms seem like strong candidates for promoting brain plasticity, but they have been little explored.
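The prediction–feedback–error-correction loop described here can be caricatured with the linear phase-correction model from the sensorimotor synchronization literature: each planned tap is shifted to cancel a fraction of the previous timing error. This is an illustrative sketch under assumed parameter values, not a model proposed in this chapter.

```python
import numpy as np

def simulate_asynchronies(n_taps=50, alpha=0.5, motor_sd=10.0, seed=2):
    """Linear phase correction: the next tap's asynchrony retains the
    uncorrected fraction (1 - alpha) of the current error, plus motor noise."""
    rng = np.random.default_rng(seed)
    asyn = np.zeros(n_taps)
    for k in range(1, n_taps):
        asyn[k] = (1.0 - alpha) * asyn[k - 1] + rng.normal(0.0, motor_sd)
    return asyn

corrected = simulate_asynchronies(alpha=0.5)    # error correction engaged
uncorrected = simulate_asynchronies(alpha=0.0)  # no correction: errors accumulate
print(corrected.std(), uncorrected.std())
```

With error correction switched off (alpha = 0), asynchronies follow a random walk and drift without bound; with correction engaged, they remain tightly bounded around zero, which is the computational signature of the cycle described above.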

Where Do We Go From Here?

Bringing together the data from this review, we suggest three directions for future research. (1) Currently, most studies examine GM and WM differences separately, or do not directly link them through analysis. Analyses typically target differences in individual regions, when it is very likely that plasticity changes occur at the network level. Additionally, groups are defined a priori rather than through data-driven approaches using participant characteristics such as training duration or age of start. Implementing these kinds of analyses requires large samples with multiple imaging measures. This implies a multi-center, data-sharing approach in which standard behavioral and imaging protocols are implemented to allow aggregation of results. (2) A related goal for music neuroscientists in the next ten years should be the establishment of standardized test batteries with age-based norms that can be administered across locations. A number of groups have been working on the development of tests aimed at children and adults (Dalla Bella et al., 2017; Ireland, Parker, Foster, & Penhune, in press; Mullensiefen, Gingras, Musil, & Stewart, 2014; Peretz et al., 2013). Important features of such batteries are availability, standardized administration, and up-to-date norms. (3) Studies targeting gene–maturation–environment interactions are needed to allow us to understand the complex interplay between pre-existing individual differences in ability and the type and timing of music training. Music-specific databases and standard instruments would contribute to the feasibility of such work.
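The data-driven grouping suggested in point (1) could be as simple as clustering musicians on age of start rather than imposing a fixed cutoff. A minimal illustration with simulated, bimodal ages (the helper function and all values are hypothetical, not data from the studies reviewed):

```python
import numpy as np

def kmeans_1d(x, k=2, iters=25, seed=3):
    """Tiny 1-D k-means, used here to define early/late groups from the data
    rather than from an a priori age-7 cutoff (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        # Assign each observation to its nearest center, then update centers.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, np.sort(centers)

# Hypothetical, bimodal ages of start: an early and a late mode.
rng = np.random.default_rng(4)
ages = np.concatenate([rng.normal(5.5, 1.0, 30), rng.normal(11.0, 1.5, 30)])
labels, centers = kmeans_1d(ages)
print(centers)  # two cluster centers, one per mode
```

The recovered centers fall near the two modes, so the early/late boundary emerges from the sample itself; with real multi-center data, richer features (training duration, practice hours) could be clustered the same way.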

References

Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., Al-Ameen, M., Alghamdi, J., Aldhafeeri, F. M., . . . Sluming, V. (2011). Plasticity of the superior and middle cerebellar peduncles in musicians revealed by quantitative analysis of volume and number of streamlines based on diffusion tensor tractography. Cerebellum 10(3), 611–623.
Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., & Sluming, V. (2011). Increased gray matter volume of left pars opercularis in male orchestral musicians correlate positively with years of musical performance. Journal of Magnetic Resonance Imaging 33(1), 24–32.
Altenmüller, E., & Furuya, S. (2016). Brain plasticity and the concept of metaplasticity in skilled musicians. Advances in Experimental Medicine and Biology 957, 197–208.
Amunts, K., Schlaug, G., Jancke, L., Steinmetz, H., Schleicher, A., Dabringhaus, A., & Zilles, K. (1997). Motor cortex and hand motor skills: Structural compliance in the human brain. Human Brain Mapping 5(3), 206–215.
Andersen, R. A., & Cui, H. (2009). Intention, action planning, and decision making in parietal-frontal circuits. Neuron 63(5), 568–583.
Baer, L., Park, M., Bailey, J., Chakravarty, M., Li, K., & Penhune, V. (2015). Regional cerebellar volumes are related to early musical training and finger tapping performance. NeuroImage 109, 130–139.
Bailey, J. A., & Penhune, V. B. (2010). Rhythm synchronization performance and auditory working memory in early- and late-trained musicians. Experimental Brain Research 204(1), 91–101.
Bailey, J. A., & Penhune, V. B. (2012). A sensitive period for musical training: Contributions of age of onset and cognitive abilities. Annals of the New York Academy of Sciences 1252, 163–170.
Bailey, J. A., & Penhune, V. B. (2013). The relationship between the age of onset of musical training and rhythm synchronization performance: Validation of sensitive period effects. Frontiers in Neuroscience 7, 227. Retrieved from https://doi.org/10.3389/fnins.2013.00227

Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2014). Early musical training is linked to gray matter structure in the ventral premotor cortex and auditory-motor rhythm synchronization performance. Journal of Cognitive Neuroscience 26(4), 755–767.
Balsters, J. H., Whelan, C. D., Robertson, I. H., & Ramnani, N. (2013). Cerebellum and cognition: Evidence for the encoding of higher order rules. Cerebral Cortex 23(6), 1433–1443.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., . . . Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage 30(3), 917–926.
Bao, S., Chan, V. T., & Merzenich, M. M. (2001). Cortical remodelling induced by activity of ventral tegmental dopamine neurons. Nature 412(6842), 79–83.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9), 1148–1150.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex 19(7), 1583–1596.
Blecher, T., Tal, I., & Ben-Shachar, M. (2016). White matter microstructural properties correlate with sensorimotor synchronization abilities. NeuroImage 138, 1–12.
Brown, R. M., Zatorre, R. J., & Penhune, V. B. (2015). Expert music performance: Cognitive, neural, and developmental bases. Progress in Brain Research 217, 57–86.
Butkovic, A., Ullén, F., & Mosing, M. A. (2015). Personality-related traits as predictors of music practice: Underlying environmental and genetic influences. Personality and Individual Differences 74, 133–138.
Chang, E. F., & Merzenich, M. M. (2003). Environmental noise retards auditory cortical development. Science 300(5618), 498–502.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208.
Chiang, M. C., Barysheva, M., Shattuck, D. W., Lee, A. D., Madsen, S. K., Avedissian, C., . . . Thompson, P. M. (2009). Genetics of brain fiber architecture and intellectual performance. Journal of Neuroscience 29(7), 2212–2224.
D'Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958.
Dalla Bella, S., Farrugia, N., Benoit, C. E., Begel, V., Verga, L., Harding, E., & Kotz, S. A. (2017). BAASTA: Battery for the Assessment of Auditory Sensorimotor and Timing Abilities. Behavior Research Methods 49(3), 1128–1145.
David, S. V., Fritz, J. B., & Shamma, S. A. (2012). Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proceedings of the National Academy of Sciences 109(6), 2144–2149.
de Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
de Villers-Sidani, E., Chang, E. F., Bao, S., & Merzenich, M. M. (2007). Critical period window for spectral tuning defined in the primary auditory cortex (A1) in the rat. Journal of Neuroscience 27(1), 180–189.

Diedrichsen, J., Balsters, J. H., Flavell, J., Cussans, E., & Ramnani, N. (2009). A probabilistic MR atlas of the human cerebellum. NeuroImage 46(1), 39–46.
Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2010). The coordination of movement: Optimal feedback control and beyond. Trends in Cognitive Sciences 14(1), 31–39.
E, K. H., Chen, S. H., Ho, M. H., & Desmond, J. E. (2014). A meta-analysis of cerebellar contributions to higher cognition from PET and fMRI studies. Human Brain Mapping 35(2), 593–615.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Engel, A., Hijmans, B. S., Cerliani, L., Bangert, M., Nanetti, L., Keller, P. E., & Keysers, C. (2014). Inter-individual differences in audio-motor learning of piano melodies and white matter fiber tract architecture. Human Brain Mapping 35(5), 2483–2497.
Floel, A., Breitenstein, C., Hummel, F., Celnik, P., Gingert, C., Sawaki, L., . . . Cohen, L. G. (2005). Dopaminergic influences on formation of a motor memory. Annals of Neurology 58(1), 121–130.
Foster, N. E., & Zatorre, R. J. (2010). Cortical structure predicts success in performing musical transformation judgments. NeuroImage 53(1), 26–36.
Gaser, C., & Schlaug, G. (2003). Brain structure differences between musicians and nonmusicians. Journal of Neuroscience 23(27), 9240–9245.
Giedd, J., Blumenthal, J., Jeffries, N., Castellanos, F., Liu, H., Zijdenbos, A., . . . Rapoport, J. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience 2(10), 861–863.
Gogos, A., Gavrilescu, M., Davison, S., Searle, K., Adams, J., Rossell, S. L., . . . Egan, G. F. (2010). Greater superior than inferior parietal lobule activation with increasing rotation angle during mental rotation: An fMRI study. Neuropsychologia 48(2), 529–535.
Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure predicts the learning of foreign speech sounds. Cerebral Cortex 17(3), 575–582.
Golestani, N., Paus, T., & Zatorre, R. (2002). Anatomical correlates of learning novel speech sounds. Neuron 35, 997–1010.
Graybiel, A. M., & Grafton, S. T. (2015). The striatum: Where skills and habits meet. Cold Spring Harbor Perspectives in Biology 7(8), a021691. doi:10.1101/cshperspect.a021691
Habibi, A., Damasio, A., Ilari, B., Veiga, R., Joshi, A. A., Leahy, R. M., . . . Damasio, H. (2017). Childhood music training induces change in micro and macroscopic brain structure: Results from a longitudinal study. Cerebral Cortex, 1–12. doi:10.1093/cercor/bhx286
Haith, A. M., & Krakauer, J. W. (2013). Model-based and model-free mechanisms of human motor learning. Advances in Experimental Medicine and Biology 782, 1–21.
Halwani, G. F., Loui, P., Ruber, T., & Schlaug, G. (2011). Effects of practice and experience on the arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in Psychology 2, 156. Retrieved from https://doi.org/10.3389/fpsyg.2011.00156
Herholz, S. C., Boh, B., & Pantev, C. (2011). Musical training modulates encoding of higher-order regularities in the auditory cortex. European Journal of Neuroscience 34(3), 524–529.
Herholz, S. C., & Zatorre, R. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502.
Hund-Georgiadis, M., & von Cramon, D. (1999). Motor-learning-related changes in piano players and non-musicians revealed by functional magnetic-resonance signals. Experimental Brain Research 125(4), 417–425.


musical expertise and brain structure   435
Hutchinson, S., Lee, L. H., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral Cortex 13(9), 943–949.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Ireland, K., Parker, A., Foster, N., & Penhune, V. (in press). Rhythm and melody tasks for school-aged children with and without musical training: Age-equivalent scores and reliability. Frontiers in Auditory Cognitive Neuroscience.
Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences 978, 302–317.
James, C. E., Oechslin, M. S., Van De Ville, D., Hauert, C. A., Descloux, C., & Lazeyras, F. (2014). Musical training intensity yields opposite effects on grey matter density in cognitive versus sensorimotor networks. Brain Structure & Function 219(1), 353–366.
Jorgensen, H. (2011). Instrumental learning: Is an early start a key to success? British Journal of Music Education 18(3), 227–239.
Karpati, F. J., Giacosa, C., Foster, N. E. V., Penhune, V. B., & Hyde, K. L. (2017). Dance and music share gray matter structural correlates. Brain Research 1657, 62–73.
Kelly, R., & Strick, P. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a non-human primate. Journal of Neuroscience 23(23), 8432–8444.
Kleim, J. A., Barnaby, S., Cooper, N., Hogg, T., Reidel, C., Remple, M., & Nudo, R. (2002). Motor learning-dependent synaptogenesis is localized to functionally reorganized motor cortex. Neurobiology of Learning and Memory 77(1), 63–77.
Kleim, J. A., Freeman, J. H., Jr., Bruneau, R., Nolan, B. C., Cooper, N. R., Zook, A., & Walters, D. (2002). Synapse formation is associated with memory storage in the cerebellum. Proceedings of the National Academy of Sciences 99(20), 13228–13231.
Kleim, J. A., Hogg, T., VandenBerg, P., Cooper, N., Bruneau, R., & Remple, M. (2004). Cortical synaptogenesis and motor map reorganization occur during late, but not early, phase of motor skill learning. Journal of Neuroscience 24(3), 628–633.
Kolb, B., Mychasiuk, R., Muhammad, A., Li, Y., Frost, D. O., & Gibb, R. (2012). Experience and the developing prefrontal cortex. Proceedings of the National Academy of Sciences 109(Suppl. 2), 17186–17193.
Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., . . . Yamazaki, T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1), 151–177.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2), 308–314.
Lakhani, B., Borich, M. R., Jackson, J. N., Wadden, K. P., Peters, S., Villamayor, A., . . . Boyd, L. A. (2016). Motor skill acquisition promotes human brain myelin plasticity. Neural Plasticity 2016, 7526135. doi:10.1155/2016/7526135
Landi, S. M., Baguear, F., & Della-Maggiore, V. (2011). One week of motor adaptation induces structural changes in primary motor cortex that predict long-term memory one year later. Journal of Neuroscience 31(33), 11808–11813.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience 28(39), 9632–9639.


Lee, H., & Noppeney, U. (2011). Long-term music training tunes how the brain temporally binds signals from multiple senses. Proceedings of the National Academy of Sciences 108(51), E1441–E1450.
Lee, H., & Noppeney, U. (2014). Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music. Frontiers in Psychology 5, 868. Retrieved from https://doi.org/10.3389/fpsyg.2014.00868
Lega, C., Stephan, M. A., Zatorre, R. J., & Penhune, V. (2016). Testing the role of dorsal premotor cortex in auditory-motor association learning using transcranial magnetic stimulation (TMS). PLoS ONE 11(9), e0163380.
Li, X., De Beuckelaer, A., Guo, J., Ma, F., Xu, M., & Liu, J. (2014). The gray matter volume of the amygdala is correlated with the perception of melodic intervals: A voxel-based morphometry study. PLoS ONE 9(6), e99889.
Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in music, games, sports, education, and professions: A meta-analysis. Psychological Science 25(8), 1608–1618.
Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A. (2011). Individual differences in music reward experiences. Music Perception 31(2), 118–138.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803.
Mullensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS ONE 9(2), e89642.
Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603. Retrieved from https://doi.org/10.3389/fnhum.2014.00603
Nudo, R., Milliken, G., Jenkins, W., & Merzenich, M. (1996). Use-dependent alterations of movement representations in primary motor cortex of adult squirrel monkeys. Journal of Neuroscience 16(2), 785–807.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval discrimination. NeuroImage 163, 177–182.
Peretz, I., Gosselin, N., Nan, Y., Caron-Caplette, E., Trehub, S. E., & Beland, R. (2013). A novel tool for evaluating children’s musical abilities across age and culture. Frontiers in Systems Neuroscience 7, 30. Retrieved from https://doi.org/10.3389/fnsys.2013.00030
Poeppel, D. (2014). The neuroanatomic and neurophysiological infrastructure for speech and language. Current Opinion in Neurobiology 28, 142–149.
Putkinen, V., Tervaniemi, M., Saarikivi, K., Ojala, P., & Huotilainen, M. (2014). Enhanced development of auditory change detection in musically trained school-aged children: A longitudinal event-related potential study. Developmental Science 17(2), 282–297.
Ragert, P., Schmidt, A., Altenmüller, E., & Dinse, H. (2004). Superior tactile performance and learning in professional pianists: Evidence for meta-plasticity in musicians. European Journal of Neuroscience 19(2), 473–478.
Rauschecker, J. (2011). An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hearing Research 271, 16–25.
Repp, B. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin and Review 12(6), 969–992.


Rosenkranz, K., Williamon, A., & Rothwell, J. C. (2007). Motorcortical excitability and synaptic plasticity is enhanced in professional musicians. Journal of Neuroscience 27(19), 5200–5206.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Schlaug, G., Jancke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia 33(8), 1047–1055.
Schmahmann, J. (1997). The cerebrocerebellar system. In J. Schmahmann (Ed.), The Cerebellum and Cognition (Vol. 41, pp. 31–55). San Diego, CA: Academic Press.
Schneider, P., Scherg, M., Dosch, H., Specht, H., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience 5(7), 688–694.
Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., . . . Rupp, A. (2005). Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference. Nature Neuroscience 8(9), 1241–1247.
Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is associated with rapid volumetric brain changes in adult mice. NeuroImage 109, 190–198.
Scholz, J., Klein, M. C., Behrens, T. E., & Johansen-Berg, H. (2009). Training induces changes in white-matter architecture. Nature Neuroscience 12(11), 1370–1371.
Scholz, J., Niibori, Y., Frankland, P. W., & Lerch, J. P. (2015). Rotarod training in mice is associated with changes in brain structure observable with multimodal MRI. NeuroImage 107, 182–189.
Seither-Preisler, A., Parncutt, R., & Schneider, P. (2014). Size and synchronization of auditory cortex promotes musical, literacy, and attentional skills in children. Journal of Neuroscience 34(33), 10937–10949.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra musicians. NeuroImage 17(3), 1613–1622.
Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for movement and cognition. Trends in Cognitive Sciences 21(5), 313–332.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-matter plasticity in the corpus callosum: Evidence for a sensitive period. Journal of Neuroscience 33(3), 1282–1290.
Stephan, M. A., Brown, R., Lega, C., & Penhune, V. (2016). Melodic priming of motor sequence performance: The role of the dorsal premotor cortex. Frontiers in Neuroscience 10, 210. Retrieved from https://www.frontiersin.org/articles/10.3389/fnins.2016.00210
Taubert, M., Draganski, B., Anwander, A., Muller, K., Horstmann, A., Villringer, A., & Ragert, P. (2010). Dynamic properties of human brain structure: Learning-related changes in cortical areas and associated fiber connections. Journal of Neuroscience 30(35), 11670–11677.
Tiemeier, H., Lenroot, R. K., Greenstein, D. K., Tran, L., Pierson, R., & Giedd, J. N. (2010). Cerebellum development during childhood and adolescence: A longitudinal morphometric MRI study. NeuroImage 49(1), 63–70.
Tremblay, P. L., Bedard, M. A., Langlois, D., Blanchet, P. J., Lemay, M., & Parent, M. (2010). Movement chunking during sequence learning is a dopamine-dependent process: A study conducted in Parkinson’s disease. Experimental Brain Research 205(3), 375–385.


Tremblay, P. L., Bedard, M. A., Levesque, M., Chebli, M., Parent, M., Courtemanche, R., & Blanchet, P. J. (2009). Motor sequence learning in primate: Role of the D2 receptor in movement chunking during consolidation. Behavioural Brain Research 198(1), 231–239.
Ullén, F., Hambrick, D. Z., & Mosing, M. A. (2016). Rethinking expertise: A multifactorial gene–environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Vaquero, L., Hartmann, K., Ripolles, P., Rojo, N., Sierpowska, J., Francois, C., . . . Altenmüller, E. (2016). Structural neuroplasticity in expert pianists depends on the age of musical training onset. NeuroImage 126, 106–119.
Vaquero, L., Rodriguez-Fornells, A., & Reiterer, S. M. (2017). The left, the better: White-matter brain integrity predicts foreign language imitation ability. Cerebral Cortex 27(8), 3906–3917.
Westerhausen, R., Luders, E., Specht, K., Ofte, S. H., Toga, A. W., Thompson, P. M., . . . Hugdahl, K. (2011). Structural and functional reorganization of the corpus callosum between the age of 6 and 8 years. Cerebral Cortex 21(5), 1012–1017.
Wolpert, D. M., Diedrichsen, J., & Flanagan, J. R. (2011). Principles of sensorimotor learning. Nature Reviews Neuroscience 12(12), 739–751.
Wong, P. C., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., & Zatorre, R. J. (2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex 18(4), 828–836.
Zatorre, R. J., Chen, J., & Penhune, V. (2007). When the brain plays music: Sensory-motor interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
Zatorre, R. J., Fields, R. D., & Johansen-Berg, H. (2012). Plasticity in gray and white: Neuroimaging changes in brain structure during learning. Nature Neuroscience 15(4), 528–536.


chapter 18

Genomics Approaches for Studying Musical Aptitude and Related Traits

Irma Järvelä

Genomic Approaches to Study Human Traits

Each cell in the human body contains 46 chromosomes, made up of ~3 billion nucleotides comprising about 20,000 individual genes (Dixon-Salazar & Gleeson, 2010). Of these 20,000 genes, the functions of about 4,000 have been uncovered to date (http://www.omim.org/). About 1.5 percent of the genome encodes proteins, the amino acid chains that form the building blocks of human tissues and organs. The human cerebral cortex is made up of ~20 billion neurons, each of which makes an average of 7,000 synaptic contacts (Dixon-Salazar & Gleeson, 2010). The human brain exhibits higher expression of genes for synaptic transmission and plasticity, and higher energy metabolism, than the brains of other primates (Cáceres et al., 2003). Genomic approaches enable the study of biological phenomena in an unbiased and hypothesis-free fashion, without prior knowledge of the biological background of the phenotype of interest (Lander, 2011). Molecular genetic analyses can thus be applied to human traits on the basis of their molecular properties rather than anatomic regions. Next generation sequencing technology has made the identification of individual genetic variants (“genetic selfies”) possible at decreasing cost (Lindor, Thibodeau, & Burke, 2017). This has been exemplified in medical research, where thousands of genes that cause inherited diseases or predispose to common diseases have been identified.



Figure 1.  The mode of inheritance of human traits spans from monogenic, that is, caused by a single gene, to multifactorial inheritance caused by numerous predisposing variants and environmental factors. Based on genetic and genomics studies, musical aptitude is inherited as a multifactorial trait for which both predisposing genetic variants and exposure to music as an environmental factor are needed (Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008).

Molecular genetic studies are based on Mendelian rules: children inherit half of their genes from their mother and half from their father, and the inherited variants remain the same throughout life. This is the unique strength of DNA studies in the identification of genetic variants associated with human traits. Using statistical methods, genetic loci and alleles associated with the trait under study can be identified in the human genome. Genes located in the associated regions, together with their pathways, are the candidate genes whose functions can explain the biological characteristics of the trait. Environmental factors (lifestyle) can affect the expression and regulation of genes; these effects can be studied, for example, by RNA and microRNA sequencing in humans and model organisms. Methods of genomics and bioinformatics can then be applied to combine the data to identify genes and alleles, their regulation, and the pathways linked to musical aptitude and music-related behavioral traits (e.g., music education, listening, performing, and creating music; see Fig. 1).
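The statistical step described above can be sketched in miniature. A single-marker association test asks whether an allele is more common among individuals with the trait than among controls; the simplest version is a 2x2 chi-square on allele counts. The counts below are entirely hypothetical, invented for illustration, and do not come from any cited study.

```python
# A toy single-marker association test with made-up allele counts: does the
# frequency of allele A differ between high and low scorers? Computed as a
# 2x2 chi-square using only the standard library.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for obs, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = rows[i] * cols[j] / n
        stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical allele counts at one marker:
#                 allele A  allele a
# high scorers       180       120
# low scorers        130       170
stat = chi_square_2x2(180, 120, 130, 170)
print(round(stat, 2), stat > 3.84)  # 3.84 is the 5% critical value at df = 1
```

A genome-wide scan repeats such a test at hundreds of thousands of markers, which is why stringent multiple-testing corrections are required.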

Musical Aptitude as a Biological Trait

Musical practices represent distinctive cognitive abilities of humans. In biological (genetic) terms, musical aptitude is a complex cognitive trait that involves the auditory pathway (inner ear, brainstem, auditory cortex) and several other brain regions. Music is sound that is recognized by hair cells in the inner ear.


These sounds are transmitted as electrical signals through the midbrain to the auditory cortex. About 1 percent of all human genes have a function in hearing; at least 80 of them are known to cause hearing loss (http://hereditaryhearingloss.org/) (Atik, Bademci, Diaz-Horta, Blanton, & Tekin, 2017). The brain is naturally very sensitive to environmental exposure to music (Perani et al., 2010) and to music training (see, e.g., Herholz & Zatorre, 2012; Koelsch, 2010). This sensitivity is age-dependent, as it is for language (Penhune & de Villers-Sidani, 2014; White, Hutka, Williams, & Moreno, 2013) and for vocal learning in songbirds (Rothenberg, Roeske, Voss, Naguib, & Tchernichovski, 2014). The sensitivity may be linked to the emotional content characteristic of musical sounds, which has effects on human body functions (Nakahara, Masuko, Kinoshita, Francis, & Furuya, 2011; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011). However, the molecular mechanisms and biological pathways mediating the effects of music remain largely unknown. The ability to detect musical sounds may serve as a prerequisite for appreciating music; this ability is called musical aptitude in this chapter. Musical aptitude can include, for example, the abilities to perceive and understand intensity, pitch, timbre, and tone duration, and the rhythm and structure they form in music. Carl Seashore developed a battery of six subtests that measure pitch, intensity, time, consonance, tonal memory, and rhythm (Seashore, Lewis, & Saetveit, 1960). The Seashore tests for pitch (SP) and for time (ST) consist of pair-wise comparisons of the physical properties of sound, and are used to measure simple sensory capacities such as the ability to detect small differences in tone pitch or length.
Karma (1994) developed a music test (KMT) that measures the perception of music structure, including recognition of melodic contour, grouping, and relational pitch processing. Auditory structuring ability can be defined as the ability to detect sound patterns in time (Karma, 1994). A similar kind of pattern recognition, resembling gestalt principles in the recognition of music structure, is found in many other fields such as sport and poetry (comprising language and speech) (Justus & Hutsler, 2005). In zebra finches, the acoustic features of song syllables (pitch and timbre) and the species-typical gap durations (rhythm) between song syllables are detected by different neural cells (Araki, Bandi, & Yazaki-Sugiyama, 2016). Temporal coding of inter-syllable silent gaps seems to be preserved when birds are exposed to different song environments, suggesting that temporal gap coding is innate and species-specific whereas syllable morphology coding is more experience-dependent (Araki et al., 2016). The detection of gaps resembles the detection of pauses in music structure in humans. This is related to understanding tones in time, which evokes anticipatory responses because of the cognitive expectations and prediction cues involved in listening to music (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). Notably, combined music test scores (KMT, SP, ST) were normally distributed among participants with no specific music education (Oikkonen & Järvelä, 2014), suggesting that the ability to detect pitch, time, and sound patterns is common in populations with no music training. Abilities that are exhibited without the need for training are referred to as innate traits. The possession of a natural musical ability may explain why musical practices are common and present in all societies.
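A normal distribution of combined test scores is exactly what a multifactorial model (Fig. 1) predicts. The toy simulation below (my own, with arbitrary parameters, not a model from the cited studies) shows why: a score built from many small, independent allele effects is approximately Gaussian by the central limit theorem.

```python
# A toy polygenic simulation: each of 100 hypothetical variants contributes
# a small additive effect, and the resulting score is approximately normal
# (central limit theorem), mirroring the distribution of combined test scores.
import random

random.seed(1)  # reproducible

def polygenic_score(n_variants=100, effect=1.0, freq=0.5):
    """Sum over variants of allele count (0, 1, or 2) times a small effect."""
    return sum(effect * sum(random.random() < freq for _ in range(2))
               for _ in range(n_variants))

scores = [polygenic_score() for _ in range(5000)]
mean = sum(scores) / len(scores)
var = sum((s - mean) ** 2 for s in scores) / len(scores)
print(round(mean, 1), round(var, 1))  # close to the theoretical 100 and 50
```

With 100 variants of frequency 0.5, the theoretical mean is 100 (2pq-based variance 50); histogramming `scores` would show the familiar bell shape.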


Figure 2.  Parental music education is related to children’s music education. High music education is common among parents of professional musicians (n = 100). Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and Music 6, p. 2, Figure 1, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of Sciences.

It has been observed that musicianship clusters in families. How much of this aggregation is due to genetic and/or environmental factors, such as exposure to music? Several studies have analyzed the inheritance of musical traits. In a twin study using the Distorted Tunes Test (DTT), in which subjects had to recognize wrong tones incorporated into simple popular melodies, the correlation between test scores was 0.67 in monozygotic and 0.44 in dizygotic twins (Drayna, Manichaikul, de Lange, & Snieder, 2001). The heritability (defined as the proportion of the total variance of the phenotype that is genetic, h2 = VG/VP, where VG is the genetic variance and VP is the overall variance of the phenotype) of the auditory structuring ability test (Karma Music Test, KMT) was 0.46 in the Finnish families examined (Oikkonen et al., 2015). Carl Seashore’s subtests of pitch (SP) and time discrimination (ST) measure the ability to detect small differences between two sequentially presented tones; their heritabilities were 0.68 and 0.21, respectively. The heritability of the combined KMT, SP, and ST score (COMB) was 0.60 (Oikkonen et al., 2015). The heritability of a pitch perception test (PPA) based on singing was 40 percent (Park, Lee, Kim, & Ju, 2012). A genetic component has also been demonstrated in rare music phenotypes


such as congenital amusia (Peretz, Cummings, & Dube, 2007) and absolute pitch (AP) (Baharloo, Service, Risch, Gitschier, & Freimer, 2000). Congenital amusia, often referred to as “tone deafness,” is a disorder in which a subject’s ability to perceive or produce music is disturbed. A family aggregation study estimated the sibling relative risk (λS) at 10.8, which suggests a genetic contribution to the trait (Peretz et al., 2007). Another extreme trait is absolute pitch (AP), the ability to identify and name pitches without a reference pitch; its sibling relative risk (λS) has been estimated to range from 7.8 to 15.1 (Baharloo et al., 2000). Music perception thus belongs to a class of human cognitive abilities that is highly familial. In the Finnish families, 52 percent of the professional musicians had one or both parents who were also professional musicians (Fig. 2).
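Both familiality statistics used above can be computed directly. Falconer's formula, h2 = 2(rMZ - rDZ), is a standard back-of-envelope heritability estimator from twin correlations (the cited studies fitted fuller variance-component models), and the sibling relative risk is the sibling recurrence risk divided by the population prevalence. In the λS example, the prevalence figures are hypothetical, chosen only to reproduce the reported magnitude.

```python
# Back-of-envelope calculations for the two statistics discussed above.

def falconer_h2(r_mz, r_dz):
    """Heritability estimate: twice the MZ-DZ correlation difference."""
    return 2 * (r_mz - r_dz)

def sibling_relative_risk(sibling_risk, population_risk):
    """lambda_S: recurrence risk in siblings over population prevalence."""
    return sibling_risk / population_risk

# Twin correlations reported for the Distorted Tunes Test (Drayna et al., 2001):
h2 = falconer_h2(0.67, 0.44)
print(round(h2, 2))  # 0.46

# Hypothetical prevalences chosen only to match the order of magnitude
# reported for congenital amusia (lambda_S ~ 10.8):
lam = sibling_relative_risk(0.43, 0.04)
print(round(lam, 2))  # 10.75
```

Note that the Falconer estimate from the DTT correlations, 2(0.67 - 0.44) = 0.46, happens to coincide with the KMT heritability reported for the Finnish families.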

Evolution of Musical Aptitude

Evolution acts on genetic alleles that are transmitted through the generations. Music cultures can develop in diverse directions, but they are linked to alleles in the human genome, and these alleles are responsible for biologically determined human traits. Favorable alleles are enriched in the gene pool, showing high allele frequencies associated with the beneficial trait, whereas damaging alleles that cause harmful effects tend to disappear from the gene pool. The universality of music across societies suggests that beneficial alleles underlie music-related behavior. However, it is not known what distinguishes humans from other primates with regard to musical ability, or what the biological determinants underlying artistic cognitive traits are. It is notable that modern humans have an auditory center that functions identically to that of the first primates that lived millions of years ago (Parker et al., 2013). Adaptive convergent sequence evolution has also been found in echolocating bats and dolphins (Montealegre-Z, Jonsson, Robson-Brown, Postles, & Robert, 2012), implying that numerous genes are linked not only to hearing but also to vision. Interestingly, several birdsong genes were shown to be upregulated during listening to and performing music (Guo et al., 2007; Horita et al., 2012; Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Pfenning et al., 2014). These data suggest that the machinery that facilitates the hearing of sounds is highly conserved; it enables communication via sounds, which is important for the survival of humans and other species. Vocal learning in songbirds shows features similar to those found in humans (Araki et al., 2016). Recent studies have shown that two different types of brain cells in the songbird auditory cortex register song syllables in zebra finches (Araki et al., 2016).
One type identifies the acoustic features of song syllables (pitch and timbre), which are affected more by the environment, whereas the other detects the species-typical gap durations (rhythm) between song syllables, which are preserved (Araki et al., 2016). Advanced cognitive abilities are characteristic of humans and are likely the result of recent positive selection (Sabeti et al., 2006). For example, FOXP2 has been


implicated in human speech and language and has been under positive selection during recent human evolution (Enard et al., 2002). As genetic evolution is much slower than cultural evolution, we and others (Honing, Ten Cate, Peretz, & Trehub, 2015) hypothesize that the genetic variants associated with musical aptitude have played a pivotal role in the development of music culture. In songbirds, by comparison, the evolution of song culture is a multigenerational process in which song develops by vertical transmission in a species-specific fashion, suggesting genetic constraints (Lipkind & Tchernichovski, 2011). This emphasizes the importance of the selection of parental singing skills, and of their genetic background, in evolution. According to Mendelian rules, half of a parent’s genes are transmitted directly to the offspring. In fact, the genetic component is larger, because the half of the genes that are not transmitted to the children still shape parental behavior and thereby affect the children’s development. Concordantly, Hambrick et al. (2014) have shown that training accounts for about 30 percent of music performance in professional musicians, implying that other factors, including genes, have a larger effect. In a Swedish twin study, the willingness to practice music was found to be an independent personality trait with high heritability (40–70 percent) (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). These results point to a substantial and independent role of genetic factors in music perception and practice. Genomic approaches can be used to identify regions of positive selection in the human genome. Variation in the music test scores of auditory structuring ability (Karma Music Test; KMT) and Carl Seashore’s subtests of pitch (SP) and time discrimination (ST) suggests that the underlying alleles may have been targets of selection.
When three selection tests, the haplotype-based methods haploPS and XP-EHH and the allele frequency-based method FST, were applied to the combined phenotype of the three music test scores described earlier (COMB), hundreds of genes were found in the selection regions (Liu et al., 2016). Several of them are known to be involved in auditory perception and inner ear development (DICER1, FGF20, CUX1, SPARC, KIF3A, TGFB3, LGR5, GPR98, PAX8, COL11A1, USH2A, PROX1). The findings are consistent with the convergent evolution of genes related to auditory processes and communication in other species (Montealegre-Z et al., 2012; Parker et al., 2013; Zhang et al., 2014). Some genes are known to affect cognition and memory (e.g., GRIN2B, IL1A, IL1B, RAPGEF5) or reward mechanisms (RGS9). Interestingly, several genes are linked to song perception and production in songbirds (e.g., FOXP1, RGS9, GPR98, GRIN2B, VLDLR). Of these, GPR98, which is expressed in the song control nuclei of the vocalizing zebra finch, has been found to be under positive selection in the songbird lineage (Pfenning et al., 2014). Some hypotheses can be constructed from previous biological knowledge about the identified genes. FOXP2 has been implicated in an inherited language disorder (Lai, Fisher, Hurst, Vargha-Khadem, & Monaco, 2001) that disturbs the ability to detect timing (rhythm), but not pitch, in music (Alcock, Passingham, Watkins, & Vargha-Khadem, 2000). This is concordant with the different brain cells that are


responsible for pitch and timing in songbirds (Araki et al., 2016). FOXP1 and another candidate gene, VLDLR, a direct target of human FOXP2 (Ayub et al., 2013; Vernes et al., 2007), belong to the singing-regulated gene networks in the zebra finch. VLDLR, the very-low-density lipoprotein receptor, is a member of the Reelin pathway, which affects learned vocalization (Hilliard, Miller, Fraley, Horvath, & White, 2012). GRIN2B is associated with learning, brain plasticity, and cognitive performance in humans (Kauppi, Nilsson, Adolfsson, Eriksson, & Nyberg, 2011) and is among the ten prioritized genes in a convergent analysis of musical traits in animals and humans (Oikkonen, Onkamo, Järvelä, & Kanduri, 2016). RGS9 is expressed in the striatum and belongs to the regulator of G-protein signaling (RGS) gene family, which plays a key role in regulating the intracellular signaling of G-protein coupled receptors, such as dopamine receptors. The data support previous findings on the role of the dopaminergic pathway, and its link to the reward mechanism, as a molecular determinant in the positive selection of music (Salimpoor et al., 2011). This preliminary study identified a large number of functionally relevant candidate genes that may underlie the evolution of music. Further studies may give a more accurate picture once methods to analyze polygenic selection become available (Qian, Deng, Lu, & Xu, 2013).
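Of the three selection statistics named above, FST is the simplest to illustrate. A minimal two-population, single-marker version is sketched below; the allele frequencies are hypothetical, and the cited scan of course used genome-wide implementations across many populations.

```python
# Wright's FST for one marker in two equal-sized populations:
# FST = (HT - HS) / HT, where HT is the expected heterozygosity of the
# pooled population and HS the mean within-population heterozygosity.
# Allele frequencies below are hypothetical.

def fst_two_pops(p1, p2):
    """FST from the frequency of one allele in each of two populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop
    return (h_t - h_s) / h_t

print(fst_two_pops(0.5, 0.5))            # 0.0: no differentiation
print(round(fst_two_pops(0.9, 0.2), 3))  # 0.495: strong differentiation
```

In a selection scan, markers whose FST is extreme relative to the genome-wide distribution are flagged as candidate targets of local positive selection.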

Genome-Wide Linkage and Association Analyses of Musical Traits

Assigning genetic markers to a trait such as musical aptitude requires a definition of the phenotype. As musical aptitude is a complex cognitive trait, it is likely that its individual components have distinct molecular backgrounds. Each of these components (subphenotypes) can be analyzed separately, and they can also be combined. In a genome-wide study of musical aptitude, nearly 800 family members were phenotyped for auditory structuring ability (Karma Music Test, KMT; Karma, 1994), for the perception of pitch and time in music (Seashore et al., 1960), and for a combined score of the three aforementioned tests (COMB). When the family material was analyzed for 660,000 genetic markers, several genetic loci were found in the human genome (Oikkonen et al., 2015). The identified loci contained candidate genes that affect inner ear development and neurocognitive processes, which are necessary traits for music perception. The highest probability of linkage was obtained at 4q22 (Oikkonen et al., 2015). Earlier, chromosome 4q22 had been identified in a smaller family sample using a microsatellite marker scan (Pulli et al., 2008). The strongest association (in unrelated subjects) was found upstream of GATA binding protein 2 (GATA2) at chromosome 3q21.3. GATA2


446   irma järvelä

is a relevant candidate gene, as it regulates the development of cochlear hair cells (Haugas, Lilleväli, Hakanen, & Salminen, 2010) and of the inferior colliculus (IC) (Lahti, Achim, & Partanen, 2013), which is important in tonotopic mapping, that is, the processing of sounds of different frequencies in the brain. Interestingly, GATA2 is abundantly expressed in dopaminergic neurons (Scherzer et al., 2008), which release dopamine during emotional arousal to music (Salimpoor et al., 2011). Several plausible candidate genes were located at 4p14, with the highest probability of linkage in the family study (Oikkonen et al., 2015). Pitch perception accuracy (SP) was linked next to the protocadherin 7 gene (PCDH7), expressed in the cochlear (Lin et al., 2012) and amygdaloid (Hertel, Redies, & Medina, 2012) complexes. PCDH7 is a relevant candidate gene for pitch perception, functioning in the hair cells of the cochlea that recognize pitches (Gosselin, Peretz, Johnsen, & Adolphs, 2007). The amygdala is the emotional center of the human brain affected by music (Koelsch, 2010). Interestingly, the homologous gene PCDH15 also affects hair cell sensory transduction and, together with cadherin 23 (CDH23), another candidate gene at chromosome 16, forms a tip-link in sensory hair cells (Sotomayor, Weihofen, Gaudet, & Corey, 2012). Moreover, the Pcdha gene cluster was found in the CNV study of musical aptitude (Ukkola-Vuoti et al., 2013). Platelet-derived growth factor receptor alpha polypeptide (PDGFRA) is expressed in the hippocampus (Di Pasquale et al., 2003), a region associated with learning and memory. The potassium channel tetramerisation domain containing 8 gene (KCTD8) is expressed in the spiral ganglion of the cochlea (Metz, Gassmann, Fakler, Schaeren-Wiemers, & Bettler, 2011).
KCTD8 also interacts with the GABA receptor subunits GABRB1 and GABRB2; of these, the GABRB1 protein is reduced in schizophrenia, bipolar disorder, and major depression, diseases that severely affect human cognition and mood regulation (Fatemi, Folsom, Rooney, & Thuras, 2013). The cholinergic receptor, nicotinic, alpha 9 (neuronal) gene (CHRNA9) (Katz et al., 2004) and the paired-like homeobox 2b gene (PHOX2B) (Ousdal et al., 2012) on chromosome 4 also affect inner ear development. In addition, PHOX2B increases amygdala activity and autonomic functions (blood pressure, heart rate, and respiration) that are reported to be affected by music (Blood & Zatorre, 2001). Genome-wide analyses performed on Mongolian families using a pitch perception accuracy (PPA) test identified a partly shared genetic region on chromosome 4q (Park et al., 2012). The statistically most significant locus found in a genome-wide linkage study of absolute pitch (AP) is located at 8q24.21 (Theusch, Basu, & Gitschier, 2009). The results suggest that musical aptitude is an innate ability affected by several predisposing genetic variants (Fig. 1). Genome-wide copy number variation (CNV) analysis revealed that regions containing candidate genes for neuropsychiatric disorders were associated with musical aptitude (Ukkola-Vuoti et al., 2013). A deletion covering the protocadherin-alpha gene cluster 1–9 (PCDHA 1–9) was associated with low music test scores (COMB) in both familial and sporadic cases. PCDHAs affect synaptogenesis and the maturation of serotonergic projections in the brain, and Pcdha mutant mice show abnormalities in learning and memory (Katori et al., 2009).
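The association side of such a scan can be sketched as a toy example under simplified assumptions: the studies above used family-based linkage and dedicated genome-wide software, whereas this sketch merely scores a single hypothetical marker in unrelated subjects by regressing the phenotype (e.g., a COMB score) on allele dosage (0, 1, or 2 copies of the alternate allele).

```python
from statistics import mean

def additive_association(genotypes, phenotypes):
    """Score one SNP under the additive model: phenotype ~ allele dosage.

    genotypes: allele dosages (0/1/2), one per unrelated subject
    phenotypes: quantitative trait values (e.g., a music test score)
    Returns (beta, r2): regression slope and fraction of variance explained.
    """
    gbar, ybar = mean(genotypes), mean(phenotypes)
    sxx = sum((g - gbar) ** 2 for g in genotypes)
    syy = sum((y - ybar) ** 2 for y in phenotypes)
    sxy = sum((g - gbar) * (y - ybar) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx              # estimated effect of each extra allele copy
    r2 = sxy * sxy / (sxx * syy)  # trait variance explained by this marker
    return beta, r2

# Hypothetical marker where each extra allele adds one point to the score
beta, r2 = additive_association([0, 1, 2, 0, 1, 2], [10, 11, 12, 10, 11, 12])
```

In the real studies, the strongest signals (e.g., upstream of GATA2) emerge only after testing hundreds of thousands of such markers against stringent genome-wide significance thresholds and accounting for family structure.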



The Effect of Music Perception and Performance on the Human Transcriptome

Music acts as an environmental trigger. Numerous studies have shown that listening to and performing classical music have effects on the human body (Blood & Zatorre, 2001; Salimpoor et al., 2011). When genome-wide RNA expression profiles were compared before and after listening to classical music, and after a “music-free” control session, the activity of genes involved in dopamine secretion and transport (SNCA, RTN4, and SLC6A8) and in learning and memory (SNCA, NRGN, NPTN, RTN4) was enhanced (Kanduri, Raijas, et al., 2015). Of these genes, SNCA (George, Jin, Woods, & Clayton, 1995), NRGN (Wood, Olson, Lovell, & Mello, 2008), and RGS2 affect song learning and singing in songbirds (Clayton, 2000), suggesting a shared evolutionary background of sound perception between vocalizing birds and humans. It is noteworthy that the effect of music was only detectable in musically experienced listeners. The lack of an effect in novices could be explained by differences in the amount of exposure to music, which is known to affect brain structure and function (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Gaser & Schlaug, 2003), by unfamiliarity with the music (Salimpoor, Benovoy, Longo, Cooperstock, & Zatorre, 2009), or by musical anhedonia (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016). In addition, listening to music increased the expression of the target genes of the dopaminoceptive neuronal glucocorticoid receptor (NR3C1), which increases the synaptic concentration of dopamine linked to rewarding and reinforcing properties (Ambroggi et al., 2009). It is of note that NR3C1 is also a key molecule in the regulation of addictive behavior. Music performance by professional musicians involves a wide spectrum of cognitive and multisensory motor skills, whose molecular basis is largely unknown.
The effect of music performance on the genome-wide peripheral blood transcriptome of professional musicians was analyzed by collecting RNA samples before and after a two-hour concert performance and after a “music-free” control session. The upregulated genes were found to affect dopaminergic neurotransmission, motor behavior, neuronal plasticity, and neurocognitive functions including learning and memory. Specifically, performance of music by professional musicians increased the expression of FOS, DUSP1, and SNCA, and of genes involved in catecholamine biosynthesis and dopamine metabolism (Kanduri, Kuusi, et al., 2015). Interestingly, SNCA, FOS, and DUSP1 are involved in song perception and production in songbirds. Thus, listening to and performing music partially shared the same genes as those affected in songbird singing. It is noteworthy that although the brains of songbirds are small, they have about double the neuronal density of primate brains of the same mass. Thus, the large number of neurons can contribute to the neural basis of cognitive capacity (Enard, 2016).
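The before/after design shared by the listening and performance studies can be sketched as a per-gene fold-change comparison. This is a simplification with made-up expression values (HK1 stands in for an unchanged control gene); the published analyses used dedicated differential-expression pipelines with replication and multiple-testing correction.

```python
import math

def log2_fold_changes(before, after):
    """Per-gene log2 fold change between paired expression profiles.

    before, after: dicts mapping gene symbol -> mean normalized expression
    """
    return {g: math.log2(after[g] / before[g]) for g in before}

# Hypothetical normalized expression values before/after a listening session
before = {"SNCA": 100.0, "NRGN": 80.0, "HK1": 50.0}
after = {"SNCA": 210.0, "NRGN": 165.0, "HK1": 51.0}

lfc = log2_fold_changes(before, after)
# Flag genes more than two-fold upregulated (log2 fold change > 1)
upregulated = sorted(g for g, v in lfc.items() if v > 1.0)
```

A real analysis would compute such statistics across tens of thousands of transcripts and many subjects, then rank genes by moderated test statistics rather than raw fold change.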


In both listening to and performing music (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015), one of the strongest activations was detected in the alpha-synuclein gene (SNCA), which has a physiological role in the development of nerve cells and in the release of neurotransmitters, especially dopamine, from presynaptic cells. Dopamine is responsible for motor functions; the same studies also detected activation of genes known to affect the growth and plasticity of nerve cells and inactivation of genes affecting neurodegeneration (Kanduri, Raijas, et al., 2015). SNCA is located in the best linkage region of musical aptitude on chromosome 4q22.1 and is regulated by GATA2, residing at 3q21, the region with the most significant association with musical aptitude, thus linking the results of the GWA study and the transcriptional profiling studies to the same locus (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Oikkonen et al., 2015) (Fig. 3). GATA2 is abundantly expressed in dopaminergic neurons and binds to intron-1 of endogenous neuronal SNCA to regulate its expression. The results are in agreement with neurophysiological studies in which increases in endogenous dopamine have been detected in the striatum when listening to music (Blood & Zatorre, 2001). Interestingly, SNCA is a causative gene for Parkinson’s disease (with disturbed dopamine metabolism) (Petrucci, Ginevrino, & Valente, 2016), and variations in SNCA predispose to Lewy-body dementia (Peuralinna et al., 2008). Listening to music and music performance had partially different effects on gene expression. Some genes, such as ZNF223 and PPP2R3A, were downregulated after music listening but upregulated after music performance (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). ZNF223 is a zinc-finger transcription regulator similar to an immediate early response gene (IEG), ZNF225 (also known as ZENK, EGR1), that regulates


Figure 3.  The results of DNA- and RNA-studies of music-related traits converge at chromosome 4q22. The alpha-synuclein gene (SNCA) upregulated by listening to music (Kanduri, Raijas, et al., 2015) and music performance by professional musicians (Kanduri, Kuusi, et al., 2015) is located at the most significant region of musical aptitude (Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008) and regulated by GATA2, associated with musical aptitude (Oikkonen et al., 2015). Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and Music 6, p. 4, Fig. 2, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of Sciences.


the song control system of songbirds (Dong & Clayton, 2008). PPP2R3A, abundantly expressed in the striatum, is known to integrate the effects of dopamine and other neurotransmitters (Ahn et al., 2007). Other IEGs, such as FOS and DUSP1, known to be active in the song control nuclei of songbirds, were upregulated only after music performance (Kanduri, Kuusi, et al., 2015), but not after music listening (Kanduri, Raijas, et al., 2015). Many other genes related to song perception in songbirds, such as RGS2, were differentially regulated after listening to music, but not after music performance (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). The differences are plausibly due, for example, to the different types of musical activity and the different study subjects. At the molecular level, auditory perception processes have been shown to exhibit convergent evolution across species (Sotomayor et al., 2012; Zhang et al., 2014). Among them is protocadherin 15 (PCDH15), also found in a human genome-wide association study of musical aptitude (Oikkonen et al., 2015). Also, gene expression specializations have been detected in the regions of the brain that are essential for auditory perception and production, both in humans and in songbirds (Pfenning et al., 2014; Salimpoor et al., 2011; Whitney et al., 2014).

Convergent Analysis

Integration of data from various species helps to prioritize the genes most relevant to the phenotype. A rich literature exists about genes affecting vocal learning in different species, especially songbirds (Clayton, 2013; Pfenning et al., 2014), and recently, data have been gathered about candidate genes associated with human musical traits (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Liu et al., 2016; Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008). When the candidate genes found for musical aptitude, music listening, and music performance are ranked together with genes identified in vocalizing animal species, data about brain- and tissue-specific molecules and pathways can be utilized in a way that is not possible in human studies alone. Convergent analysis of genes identified in vocalizing animals and in human music-related traits revealed that the most common candidate genes were activity-dependent immediate early genes (IEGs), including EGR1, FOS, ARC, BDNF, and DUSP1 (Oikkonen et al., 2016). IEGs respond to sensory and motor stimuli in the brain. Of these, EGR1 is widely expressed in brain regions that affect cognition, emotional response, and sensitivity to reward in the rat (Duclot & Kabbaj, 2017). EGR1 is upregulated by song perception and production in songbirds (Avey, Kanyo, Irwin, & Sturdy, 2008; Drnevich et al., 2012). Interestingly, EGR1 is the only highly ranked gene across all the human phenotypes: music listening, music performance, and musical aptitude. In contrast, PHIP, noradrenalin, and NR4A2 were ranked among the top molecules in the whole sample as well as within music listening studies, but not within music performance (e.g., singing) related studies, whereas DUSP1, PKIA, and DOPEY2 were the top genes specifically in


music practice. These results support at least partially different molecular backgrounds for music-related processes. FOS and DUSP1 were activated when professional musicians played a concert (Kanduri, Kuusi, et al., 2015). Other candidate genes, like FOXP2 and GRIN2B, have been shown to be critical for vocal communication in songbirds (Haesler et al., 2004) and for cognitive development, including speech, in humans (Hu, Chen, Myers, Yuan, & Traynelis, 2016), and they are located in the selection regions for musical aptitude (Liu et al., 2016). There are still limitations in comparative studies, as avian genomes contain only ~70 percent of the number of human genes (Zhang et al., 2014). Convergent evidence for genes involved in functions like cognition, learning, and memory has been reported in music-related activities (Oikkonen et al., 2016). Several pathways were identified that describe the interaction and function of the identified genes. Among them, the CDK5 signaling pathway regulates cognitive functions in the brain. Interestingly, the MEK gene, a member of the CDK5 signaling pathway, is necessary for song learning in songbirds (London & Clayton, 2008). There is a partially shared genetic predisposition for musical abilities and general cognition (Mosing et al., 2014; Mosing, Madison, Pedersen, & Ullén, 2016). Human cognitive capacity has evolved rapidly; therefore, it is highly likely that human-specific pathways and genes underlie human musical abilities. Obviously, cognition-related genes are a plausible group of candidate genes for elucidating the more recent evolution of music-related traits.
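The convergent-analysis idea of prioritizing genes by how many independent lines of evidence implicate them can be sketched as a simple cross-study tally. This is only an illustration: the gene lists below are abbreviated from those mentioned in the text, and the actual procedure of Oikkonen et al. (2016) involved weighted ranking across many more datasets.

```python
from collections import Counter

def convergent_rank(evidence):
    """Rank genes by the number of independent studies implicating them.

    evidence: mapping of study label -> collection of candidate genes
    Returns (gene, count) pairs sorted by count (descending), then name.
    """
    counts = Counter(g for genes in evidence.values() for g in set(genes))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Abbreviated, illustrative gene lists drawn from the chapter text
evidence = {
    "songbird_song_studies": {"EGR1", "FOS", "DUSP1", "SNCA", "RGS2"},
    "music_listening": {"EGR1", "SNCA", "RGS2", "NRGN"},
    "music_performance": {"EGR1", "FOS", "DUSP1", "SNCA"},
}
ranking = convergent_rank(evidence)
top_gene, top_count = ranking[0]  # EGR1 is supported by every list here
```

Even in this toy tally, a gene supported by all phenotypes (as EGR1 is in the actual analysis) rises to the top, while genes specific to one activity rank lower.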

Biological Background of Creative Activities in Music

Creativity in music is an essential part of the development of music culture and of the music industry. Creative activity in music, whether composing, improvising, or arranging, is common (Oikkonen et al., 2016). Some evidence for the biological basis of creativity in music has been obtained from brain imaging studies, where composing (Brown, Martinez, & Parsons, 2006) and improvising musical pieces have been shown to engage several brain regions, such as the medial prefrontal cortex, premotor areas, and the auditory cortex (Dietrich & Kanso, 2010; Limb & Braun, 2008; Liu et al., 2012). Listening to music has been shown to increase dopamine in a human PET study (Salimpoor et al., 2011). So far, genomics approaches have rarely been applied to creative activities in general, or specifically to musical activities. Dopaminergic genes appeared to be upregulated in genomic studies of musical aptitude and related traits (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Oikkonen et al., 2015), and some of them, such as FOS and FOXP2, have also been found in songbirds (Murugan, Harward, Scharff, & Mooney, 2013; Nordeen, Holtzman, & Nordeen, 2009). The dopamine D4 receptor gene (DRD4) is an interesting candidate for creativity in music. It mediates dopamine signaling at neuronal synapses. Two of its signaling variants (7R and 2R) have been associated not


only with novelty seeking and altruism, but also with financial risk taking and heavy drinking; this kind of behavior can be seen as a sensitivity toward influences from the environment (Kitayama, King, Hsu, Liberzon, & Yoon, 2016). It may be that carriers of the 7R/2R variants have the capacity to adopt new ways of behavior, such as creating new music (Kitayama et al., 2016) (Fig. 4). This may serve as an example of how genetic variants can affect cultural evolution (Kim & Sasaki, 2014). In fact, many composers are known to have composed music describing actual occurrences in society. Based on a large epidemiological study from Sweden, individuals in creative professions were more likely to suffer from bipolar disorder (Kyaga et al., 2011), and creative professions were overrepresented among the first-degree relatives of patients with neuropsychiatric disorders (e.g., schizophrenia and bipolar disorder), indicating familial co-segregation of creativity and neuropsychiatric disorders (Kyaga et al., 2013). When the known genes and alleles associated with neuropsychiatric disorders (Lee, Ripke, Neale, & Cross-Disorder Group, 2013; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) were analyzed among artistic professions, including musicians, the risk alleles were more prevalent in the artistic professions (Power et al., 2015). When professional musicians played a traditional classical concert, several genes reported to be mutated in neuropsychiatric or neurodegenerative diseases were affected (Kanduri, Kuusi, et al., 2015). This finding may reflect creative activities plausibly linked to music performance. Thus, molecular genetic studies give evidence that artistic creativity and neuropsychiatric disorders partially share the same predisposing genetic variants. Creativity is likely rewarding, whereas diseases cause suffering.
It is currently not known which of the numerous risk alleles of neuropsychiatric disorders are required, or which individual, familial, and environmental protective or risk factors underlie complex phenotypes like creativity and neuropsychiatric disorders.

Figure 4.  Several music-related traits are found to be linked to dopaminergic metabolism. The figure links alpha-synuclein (SNCA) to musical aptitude, the singing of songbirds, listening to music, the performance of professional musicians, Parkinson’s disease, and frontotemporal dementia (hallucinations); dopamine to motor functions, neuronal plasticity, reward and aversion, and improved creative thinking and goal-directed working; DRD2 to neuropsychiatric diseases and drug addiction; and DRD4 to sensitivity to environmental influence, novelty seeking, and risk taking.



Conclusion

Empirical research on the biological background of music-related human traits has been introduced using genomics methods. Genes affecting inner ear development, dopaminergic systems, learning, and memory were found as candidate genes for musical aptitude and for listening to and performing music. In addition, several genes previously known to affect vocal learning in songbirds were identified as candidate genes for music perception and practice. Activity-dependent immediate early genes (IEGs) were the genes most commonly shared between humans and songbirds in the convergent analysis. IEGs like EGR1 are critical mediators of gene–environment interactions, characterized by rapid and dynamic responses to neuronal activity and reward-related synaptic plasticity (Duclot & Kabbaj, 2017), which have also been reported in music-related studies (Salimpoor et al., 2011; Schneider et al., 2002). IEGs could thus serve as plausible candidate genes mediating the effects of music as an environmental factor. Replication studies and studies using epigenomics methods are warranted to further elucidate the biological background of music-related traits.

References

Ahn, J. H., Sung, J. Y., McAvoy, T., Nishi, A., Janssens, V., Goris, J., . . . Nairn, A. C. (2007). The B''/PR72 subunit mediates Ca2+-dependent dephosphorylation of DARPP-32 by protein phosphatase 2A. Proceedings of the National Academy of Sciences 104(23), 9876–9881.
Alcock, K. J., Passingham, R. E., Watkins, K., & Vargha-Khadem, F. (2000). Pitch and timing abilities in inherited speech and language impairment. Brain & Language 75(1), 34–46.
Ambroggi, F., Turiault, M., Milet, A., Deroche-Gamonet, V., Parnaudeau, S., Balado, E., . . . Tronche, F. (2009). Stress and addiction: Glucocorticoid receptor in dopaminoceptive neurons facilitates cocaine seeking. Nature Neuroscience 12(3), 247–249.
Araki, M., Bandi, M. M., & Yazaki-Sukiyama, Y. (2016). Mind the gap: Neural coding of species identity in birdsong prosody. Science 354(6317), 1282–1287.
Atik, T., Bademci, G., Diaz-Horta, O., Blanton, S. H., & Tekin, M. (2017). Whole-exome sequencing and its impact in hereditary hearing loss. Genetics Research 97, e4. doi:10.1017/S001667231500004X
Avey, M. T., Kanyo, R. A., Irwin, E. L., & Sturdy, C. B. (2008). Differential effects of vocalization type, singer and listener on ZENK immediate early gene response in black-capped chickadees (Poecile atricapillus). Behavioural Brain Research 188(1), 201–208.
Ayub, Q., Yngvadottir, B., Chen, Y., Xue, Y., Hu, M., Vernes, S. C., . . . Tyler-Smith, C. (2013). FOXP2 targets show evidence of positive selection in European populations. American Journal of Human Genetics 92(5), 696–706.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics 67(3), 755–758.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.


Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A PET study of the generation of melodies and sentences. European Journal of Neuroscience 23(10), 2791–2803.
Cáceres, M., Lachuer, J., Zapala, M. A., Redmond, J. C., Kudo, L., Geschwind, D. H., . . . Barlow, C. (2003). Elevated gene expression levels distinguish human from non-human primate brains. Proceedings of the National Academy of Sciences 100(22), 13030–13035.
Clayton, D. F. (2000). The genomic action potential. Neurobiology of Learning and Memory 74(3), 185–216.
Clayton, D. F. (2013). The genomics of memory and learning in songbirds. Annual Review of Genomics and Human Genetics 14, 45–65.
Di Pasquale, G., Davidson, B. L., Stein, C. S., Martins, I., Scudiero, D., Monks, A., & Chiorini, J. A. (2003). Identification of PDGFR as a receptor for AAV-5 transduction. Nature Medicine 9, 1306–1312.
Dietrich, A., & Kanso, R. (2010). A review of EEG, ERP, and neuroimaging studies of creativity and insight. Psychological Bulletin 136(5), 822–848.
Dixon-Salazar, T. J., & Gleeson, J. G. (2010). Genetic regulation of human brain development: Lessons from Mendelian diseases. Annals of the New York Academy of Sciences 1214, 156–167.
Dong, S., & Clayton, D. F. (2008). Partial dissociation of molecular and behavioral measures of song habituation in adult zebra finches. Genes, Brain and Behavior 7(7), 802–809.
Drayna, D., Manichaikul, A., de Lange, M., & Snieder, H. (2001). Genetic correlates of musical pitch recognition in humans. Science 291(5510), 1969–1972.
Drnevich, J., Replogle, K. L., Lovell, P., Hahn, T. P., Johnson, F., Mast, T. G., . . . Clayton, D. F. (2012). Impact of experience-dependent and -independent factors on gene expression in songbird brain. Proceedings of the National Academy of Sciences 109(Suppl. 2), 17245–17252.
Duclot, F., & Kabbaj, M. (2017). The role of Early Growth Response 1 (EGR1) in brain plasticity and neuropsychiatric disorders. Frontiers in Behavioral Neuroscience 11, 35. Retrieved from https://doi.org/10.3389/fnbeh.2017.00035
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Enard, W. (2016). The molecular basis of human brain evolution. Current Biology 26(20), R1109–R1117.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., . . . Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900), 869–872.
Fatemi, S. H., Folsom, T. D., Rooney, R. J., & Thuras, P. D. (2013). Expression of GABAA α2-, β1- and ε-receptors are altered significantly in the lateral cerebellum of subjects with schizophrenia, major depression and bipolar disorder. Translational Psychiatry 3, e303. doi:10.1038/tp.2013.64
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245.
George, J. M., Jin, H., Woods, W. S., & Clayton, D. F. (1995). Characterization of a novel protein regulated during the critical period for song learning in the zebra finch. Neuron 15, 361–372.
Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia 45(2), 236–244.


Guo, Y. P., Sun, X., Li, C., Wang, N. Q., Chan, Y. S., & He, J. (2007). Corticothalamic synchronization leads to c-fos expression in the auditory thalamus. Proceedings of the National Academy of Sciences 104(28), 11802–11807.
Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D., & Scharff, C. (2004). FoxP2 expression in avian vocal learners and non-learners. Journal of Neuroscience 24(13), 3164–3175.
Hambrick, D. Z., Oswald, F. L., Altmann, E. M., Meinz, E. J., Gobet, F., & Campitelli, G. (2014). Deliberate practice: Is that all it takes to become an expert? Intelligence 45, 34–45.
Haugas, M., Lilleväli, K., Hakanen, J., & Salminen, M. (2010). Gata2 is required for the development of inner ear semicircular ducts and the surrounding perilymphatic space. Developmental Dynamics 239(9), 2452–2469.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502.
Hertel, N., Redies, C., & Medina, L. (2012). Cadherin expression delineates the divisions of the postnatal and adult mouse amygdala. Journal of Comparative Neurology 520(17), 3982–4012.
Hilliard, A. T., Miller, J. E., Fraley, E. R., Horvath, S., & White, S. A. (2012). Molecular microcircuitry underlies functional specification in a basal ganglia circuit dedicated to vocal learning. Neuron 73(3), 537–552.
Honing, H., Ten Cate, C., Peretz, I., & Trehub, S. E. (2015). Without it no music: Cognition, biology and evolution of musicality. Philosophical Transactions of the Royal Society of London B: Biological Sciences 370(1664), 20140088. doi:10.1098/rstb.2014.0088
Horita, H., Kobayashi, M., Liu, W.-C., Oka, K., Jarvis, E. D., & Wada, K. (2012). Specialized motor-driven dusp1 expression in the song systems of multiple lineages of vocal learning birds. PLoS ONE 7, e42173.
Hu, C., Chen, W., Myers, S. J., Yuan, H., & Traynelis, S. F. (2016). Human GRIN2B variants in neurodevelopmental disorders. Journal of Pharmacological Sciences 132(2), 115–121.
Justus, T., & Hutsler, J. J. (2005). Fundamental issues in the evolutionary psychology of music: Assessing innateness and domain specificity. Music Perception 23(1), 1–27.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 9506. doi:10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015). The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from https://doi.org/10.7717/peerj.830
Karma, K. (1994). Auditory and visual temporal structuring: How important is sound to musical thinking? Psychology of Music 22(1), 20–30.
Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., . . . Yagi, T. (2009). Protocadherin-alpha family is required for serotonergic projections to appropriately innervate target brain areas. Journal of Neuroscience 29(29), 9137–9147.
Katz, E., Elgoyhen, A. B., Gómez-Casati, M. E., Knipper, M., Vetter, D. E., Fuchs, P. A., & Glowatski, E. (2004). Developmental regulation of nicotinic synapses on cochlear inner hair cells. Journal of Neuroscience 24(36), 7814–7820.
Kauppi, K., Nilsson, L.-G., Adolfsson, R., Eriksson, E., & Nyberg, L. (2011). KIBRA polymorphism is related to enhanced memory and elevated hippocampal processing. Journal of Neuroscience 31, 14218–14222.
Kim, H. S., & Sasaki, J. Y. (2014). Cultural neuroscience: Biology of the mind in cultural contexts. Annual Review of Psychology 65, 487–514.


genomics approaches for studying musical aptitude   455 Kitayama, S., King, A., Hsu, M., Liberzon, I., & Yoon, C. (2016). Dopamine-system genes and cultural acquisition: The norm sensitivity hypothesis. Current Opinion in Psychology 8, 167–174. Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences 14(3), 131–137. Kyaga, S., Landén, M., Boman, M., Hultman, C. M., Långström, N., & Lichtenstein, P. (2013). Mental illness, suicide and creativity: 40-year prospective total population study. Journal of Psychiatric Research 47(1), 83–90. Kyaga, S., Lichtenstein, P., Boman, M., Hultman, C., Långström, N., & Landén, M. (2011). Creativity and mental disorder: Family study of 300,000 people with severe mental disorder. British Journal of Psychiatry 199(5), 373–379. Lahti, L., Achim, K., & Partanen, J. (2013). Molecular regulation of GABAergic neuron differentiation and diversity in the developing midbrain. Acta Physiologica (Oxford) 207(4), 616–627. Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., & Monaco, A. P. (2001). A forkheaddomain gene is mutated in a severe speech and language disorder. Nature 413 (6855), 519–523. Lander, E.  S. (2011). Initial impact of the sequencing of the human genome. Nature 470, 187–197. Lee, S.  H., Ripke, S., Neale, B.  M., & Cross-Disorder Group of the Psychiatric Genomics Consortium (2013). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genetics 45(9), 984–994. Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An FMRI study of jazz improvisation. PLoS ONE 3, e1679. Lin, J., Yan, X., Wang, C., Guo, Z., Rolfs, A., & Luo, J. (2012). Anatomical expression patterns of delta-protocadherins in developing chicken cochlea. Journal of Anatomy 221(6), 598–608. Lindor, N.  M., Thibodeau, S., & Burke, W. (2017). Whole-genome sequencing in healthy people. Mayo Clinic Proceedings 92(1), 159–172. 
Lipkind, D., & Tchernichovski, O. (2011). Colloquium paper: Quantification of developmental birdsong learning from the subsyllabic scale to cultural evolution. Proceedings of the National Academy of Sciences 108(Suppl. 3), 15572–15579.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., . . . Braun, A. R. (2012). Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2, 834. doi:10.1038/srep00834
Liu, X., Kanduri, C., Oikkonen, J., Karma, K., Raijas, P., Ukkola-Vuoti, L., . . . Järvelä, I. (2016). Detecting signatures of positive selection associated with musical aptitude in the human genome. Scientific Reports 6, 21198. doi:10.1038/srep21198
London, S. E., & Clayton, D. F. (2008). Functional identification of sensory mechanisms required for developmental song learning. Nature Neuroscience 11(5), 579–586.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113(46), E7337–E7345.
Metz, M., Gassmann, M., Fakler, B., Schaeren-Wiemers, N., & Bettler, B. (2011). Distribution of the auxiliary GABAB receptor subunits KCTD8, 12, 12b, and 16 in the mouse brain. Journal of Comparative Neurology 519(8), 1435–1454.
Montealegre-Z, F., Jonsson, T., Robson-Brown, K. A., Postles, M., & Robert, D. (2012). Convergent evolution between insect and mammalian audition. Science 338(6109), 968–971.


456   irma järvelä

Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803.
Mosing, M. A., Madison, G., Pedersen, N. L., & Ullén, F. (2016). Investigating cognitive transfer within the framework of music practice: Genetic pleiotropy rather than causality. Developmental Science 19(3), 504–512.
Murugan, M., Harward, S., Scharff, C., & Mooney, R. (2013). Diminished FoxP2 levels affect dopaminergic modulation of corticostriatal signaling important to song variability. Neuron 80(6), 1464–1476.
Nakahara, H., Masuko, T., Kinoshita, H., Francis, P. R., & Furuya, S. (2011). Performing music can induce greater modulation of emotion-related psychophysiological responses than listening to music. International Journal of Psychophysiology 81(3), 152–158.
Nordeen, E. J., Holtzman, D. A., & Nordeen, K. W. (2009). Increased Fos expression among midbrain dopaminergic cell groups during birdsong tutoring. European Journal of Neuroscience 30(4), 662–670.
Oikkonen, J., Huang, Y., Onkamo, P., Ukkola-Vuoti, L., Raijas, P., Karma, K., . . . Järvelä, I. (2015). A genome-wide linkage and association study of musical aptitude identifies loci containing genes related to inner ear development and neurocognitive functions. Molecular Psychiatry 20(2), 275–282.
Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11), 1102–1108.
Oikkonen, J., Onkamo, P., Järvelä, I., & Kanduri, C. (2016). Convergent evidence for the molecular basis of musical traits. Scientific Reports 6, 39707. doi:10.1038/srep39707
Ousdal, O. T., Anand Brown, A., Jensen, J., Nakstad, P. H., Melle, I., Agartz, I., . . . Andreassen, O. A. (2012). Associations between variants near a monoaminergic pathways gene (PHOX2B) and amygdala reactivity: A genome-wide functional imaging study. Twin Research and Human Genetics 15(3), 273–285.
Park, H., Lee, S., Kim, H. J., & Ju, Y. S. (2012). Comprehensive genomic analyses associate UGT8 variants with musical ability in a Mongolian population. Journal of Medical Genetics 49(12), 747–752.
Parker, J., Tsagkogeorga, G., Cotton, J. A., Liu, Y., Provero, P., Stupka, E., & Rossiter, S. J. (2013). Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470), 228–231.
Penhune, V., & de Villers-Sidani, E. (2014). Time for new thinking about sensitive periods. Frontiers in Systems Neuroscience 8, 55. Retrieved from https://doi.org/10.3389/fnsys.2014.00055
Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., . . . Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763.
Peretz, I., Cummings, S., & Dube, M. P. (2007). The genetics of congenital amusia (tone deafness): A family-aggregation study. American Journal of Human Genetics 81(3), 582–588.
Petrucci, S., Ginevrino, M., & Valente, E. M. (2016). Phenotypic spectrum of alpha-synuclein mutations: New insights from patients and cellular models. Parkinsonism & Related Disorders 22(Suppl. 1), S16–S20.
Peuralinna, T., Oinas, M., Polvikoski, T., Paetau, A., Sulkava, R., Niinistö, L., . . . Myllykangas, L. (2008). Neurofibrillary tau pathology modulated by genetic variation of alpha-synuclein. Annals of Neurology 64(3), 348–352.


Pfenning, A. R., Hara, E., Whitney, O., Rivas, M. V., Wang, R., Roulhac, P. L., . . . Jarvis, E. D. (2014). Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346(6215), 1256846.
Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M. M., . . . Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar disorder predict creativity. Nature Neuroscience 18(7), 953–955.
Pulli, K., Karma, K., Norio, R., Sistonen, P., Göring, H. H., & Järvelä, I. (2008). Genome-wide linkage scan for loci of musical aptitude in Finnish families: Evidence for a major locus at 4q22. Journal of Medical Genetics 45(7), 451–456.
Qian, W., Deng, L., Lu, D., & Xu, S. (2013). Genome-wide landscapes of human local adaptation in Asia. PLoS ONE 8, e54224.
Rothenberg, D., Roeske, T. C., Voss, H. U., Naguib, M., & Tchernichovski, O. (2014). Investigation of musicality in birdsong. Hearing Research 308, 71–83.
Sabeti, P. C., Schaffner, S. F., Fry, B., Lohmueller, J., Varilly, P., Shamovsky, O., . . . Lander, E. S. (2006). Positive natural selection in the human lineage. Science 312(5780), 1614–1620.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–262.
Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R., & Zatorre, R. J. (2009). The rewarding aspects of music listening are related to degree of emotional arousal. PLoS ONE 4, e7487.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Scherzer, C. R., Grass, J. A., Liao, Z., Pepivani, I., Zheng, B., Eklund, A. C., . . . Schlossmacher, M. G. (2008).
GATA transcription factors directly regulate the Parkinson's disease-linked gene alpha-synuclein. Proceedings of the National Academy of Sciences 105(31), 10907–10912.
Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510), 421–427.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl's gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience 5(7), 688–694.
Seashore, C., Lewis, D., & Saetveit, J. (1960). Seashore measures of musical talents. New York: Psychological Corporation.
Sotomayor, M., Weihofen, W. A., Gaudet, R., & Corey, D. P. (2012). Structure of a force-conveying cadherin bond essential for inner-ear mechanotransduction. Nature 492(7427), 128–132.
Theusch, E., Basu, A., & Gitschier, J. (2009). Genome-wide study of families with absolute pitch reveals linkage to 8q24.21 and locus heterogeneity. American Journal of Human Genetics 85(1), 112–119.
Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., . . . Järvelä, I. (2013). Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music. PLoS ONE 8, e56356.
Vernes, S. C., Spiteri, E., Nicod, J., Groszer, M., Taylor, J. M., Davies, K. E., . . . Fisher, S. E. (2007). High-throughput analysis of promoter occupancy reveals direct neural targets of FOXP2, a gene mutated in speech and language disorders. American Journal of Human Genetics 81(6), 1232–1250.


White, E. J., Hutka, S. A., Williams, L. J., & Moreno, S. (2013). Learning, neural plasticity and sensitive periods: Implications for language acquisition, music training and transfer across the lifespan. Frontiers in Systems Neuroscience 7, 90. Retrieved from https://doi.org/10.3389/fnsys.2013.00090
Whitney, O., Pfenning, A. R., Howard, J. T., Blatti, C. A., Liu, F., Ward, J. M., . . . Jarvis, E. D. (2014). Core and region-enriched networks of behaviorally regulated genes and the singing genome. Science 346(6215), 1256780.
Wood, W. E., Olson, C. R., Lovell, P. V., & Mello, C. V. (2008). Dietary retinoic acid affects song maturation and gene expression in the song system of the zebra finch. Developmental Neurobiology 68(10), 1213–1224.
Zhang, G., Li, C., Li, Q., Li, B., Larkin, D. M., Lee, C., . . . Wang, J. (2014). Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346(6215), 1311–1320.


Chapter 19

Brain Research in Music Performance

Eckart Altenmüller, Shinichi Furuya, Daniel S. Scholz, and Christos I. Ioannou

Introduction

Music performance is based on extensive training and playing experience. It provides an excellent model for studying changes in brain functions and structures along with increasing expertise, a phenomenon usually referred to as plasticity of the human brain. Especially in professional musicians, the demands placed on the nervous system by music performance are very high, and performing provides a uniquely rich multisensory and motor experience to the player. As confirmed by neuroimaging studies, playing music depends on a strong coupling of perception and action mediated by sensory, motor, and multimodal integration areas distributed throughout the brain. A pianist, for example, must draw on a whole set of complex skills, including translating visual analysis of musical notation into motor actions, coordinating multisensory information with bimanual motor activity, developing fine motor skills in both hands coupled with metric precision, and monitoring auditory feedback to fine-tune a performance as it progresses.

In this chapter, we summarize research on the effects of musical training on brain function, brain connectivity, and brain structure. First, we address factors inducing and continuously driving brain plasticity in dedicated musicians, arguing that prolonged goal-directed practice, multisensory–motor integration, high arousal, and emotional and social rewards contribute to these plasticity-induced brain adaptations. Subsequently, we briefly review the neuroanatomy and neurophysiology underpinning musical activities by focusing on the perception of sound, integration of sound and movement, and the physiology of motor planning and motor control. Further down, we review the literature on functional changes in brain activation and brain connectivity along with the acquisition of musical skills. In the following section, we focus on structural adaptations


in the gray matter of the brain and in fiber tract density associated with music learning. We critically discuss the findings that structural changes are mostly seen when starting musical training after the age of 7 years, whereas functional optimization is more effective before this age. Finally, we briefly address the phenomenon of de-expertise, reviewing studies which provide evidence that intensive music-making can induce dysfunctional changes which are accompanied by a degradation of skilled motor behavior, also termed "musician's dystonia" (see Peterson & Altenmüller, this volume). This condition, which is frequently highly disabling, mainly affects male classical musicians with a history of compulsive working behavior, anxiety disorder, or chronic pain. We conclude with a concise summary of the role of brain plasticity, meta-plasticity, and maladaptive plasticity in the acquisition and loss of musicians' expertise.

Performing Music as a Driver of Brain Plasticity

Performing music at a professional level is one of the most demanding and fascinating human experiences. Singing and playing an instrument involve the precise execution of very fast and, in many instances, extremely complex movements that must be structured and coordinated with continuous auditory, somatosensory, and visual feedback. Furthermore, it requires retrieval of musical, motor, and multisensory information from both short-term and long-term memory, and relies on continuous planning of an ongoing performance in working memory. The consequences of motor actions have to be anticipated, monitored, and adjusted almost in real time (Brown, Penhune, & Zatorre, 2015). At the same time, music should be expressive, requiring the performance to be enriched with a complex set of innate and acculturated emotional gestures.

Practice is required to develop all of these skills and to execute these complex tasks. Ericsson and colleagues (Ericsson, Krampe, & Tesch-Römer, 1993) undertook one of the most influential studies on practice, with students at the Berlin Academy of Music. They considered not only time invested in practice but also quality of practice, and proposed the concept of "deliberate practice" as a prerequisite for attaining excellence. Deliberate practice combines goal-oriented, structured, and effortful practicing with motivation, resources, and focused attention. Ericsson and colleagues argued that a major distinction between professional and amateur musicians, and generally between more successful versus less successful learners, is the amount of deliberate practice undertaken during the many years required to develop instrumental skills to a high level (Ericsson & Lehmann, 1996).
Extraordinarily skilled musicians therefore exert a great deal more effort and concentration during their practice than less skilled musicians, and are more likely to plan, imagine, monitor, and control their playing by focusing their attention on what they are practicing and how it can be improved. Furthermore, they can be eager to build up a network of supportive peers, frequently involving family and friends.
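The cumulative practice totals discussed above become more tangible as a daily average. A minimal sketch (the 10-year/10,000-hour figure is the one reported for German conservatory entrants by Ericsson and colleagues; the helper function itself is our own illustration):

```python
def daily_practice_hours(total_hours: float, years: float) -> float:
    """Average daily practice implied by a cumulative practice total."""
    return total_hours / (years * 365)

# Roughly 10,000 hours over 10 years (Ericsson et al., 1993):
print(round(daily_practice_hours(10_000, 10), 1))  # → 2.7 hours per day
```

Even this back-of-the-envelope figure makes clear why deliberate practice at this level requires sustained motivation and a supportive environment over many years.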


The concept of deliberate practice has been refined since it became clear that not only the amount of deliberate practice, but also the point in life at which intense goal-directed practice begins, are important variables. In the auditory domain, for example, critical periods ("windows of opportunity") exist for the acquisition of so-called "absolute" or "perfect" pitch. Absolute pitch denotes the ability to name pitches without a reference pitch. It is mediated by auditory long-term memory and is strongly linked to intense early musical experience, usually before the age of 7 years (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Miyazaki, 1988; Sergeant, 1968). However, genetic predisposition may play a role, since absolute pitch is more common in certain East Asian populations and may run in families (Baharloo, Service, Risch, Gitschier, & Freimer, 2000; Gregersen, Kowalsky, Kohn, & Marvin, 2001). In the sensorimotor domain, early practice before age 7 years leads to optimized and more stable motor programs (Furuya, Klaus, Nitsche, Paulus, & Altenmüller, 2014) and to smaller yet more efficient neuronal networks, compared to practice commencing later in life (Vaquero et al., 2016). This means that for specific sensorimotor skills, such as fast and independent finger movements, sensitive periods exist during development and maturation of the central nervous system, comparable to those for auditory and somatosensory skills (Ragert, Schmidt, Altenmüller, & Dinse, 2003).

The issue of nature vs. nurture, or genetic predisposition vs. environmental influences and training in musical skills, is complex, since the success of training is itself subject to genetic variability. General observation suggests that outcomes will not be identical for all individuals receiving the same amount of training.
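The ability described above, naming a pitch with no reference tone, can be stated precisely with the standard equal-tempered mapping from frequency to note name. A minimal sketch (the function and note spellings are our own illustration, not drawn from the studies cited):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def name_pitch(freq_hz: float, a4: float = 440.0) -> str:
    """Map a frequency to the nearest equal-tempered pitch name."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))
    octave = midi // 12 - 1  # MIDI note 60 corresponds to C4 (middle C)
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(name_pitch(440.0))   # → A4
print(name_pitch(261.63))  # → C4 (middle C)
```

The computation is trivial; what distinguishes absolute-pitch possessors is that they perform this labeling perceptually, without any external reference.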
Evidence supporting the contribution of pre-existing individual differences comes from a large Swedish twin study showing that the propensity to practice is partially heritable (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). Corrigall and colleagues investigated the contribution of cognitive and personality variables to music training, showing that those who engage in music perform better on cognitive tasks, have better educated parents, and describe themselves as more "open to experience" on personality scales (Corrigall, Schellenberg, & Misura, 2013). Findings are also beginning to accumulate in the music performance domain, indicating that learning outcomes can be predicted in part based on pre-existing structural or functional brain features (Herholz, Coffey, Pantev, & Zatorre, 2016). A convincing example of dysfunctional genetic predisposition is the inability to acquire auditory skills in congenital amusia, a hereditary condition characterized by absent or highly deficient pitch perception (Gingras, Honing, Peretz, Trainor, & Fisher, 2015). In the sensorimotor domain, musician's dystonia, the loss of motor control in skilled movements while playing an instrument, has a strong genetic background in about one-third of affected musicians (Schmidt et al., 2009).

On the other hand, training is clearly necessary for musical expertise, with a large number of researchers reporting that the length of musical experience is strongly correlated with performance on a range of musical tasks, as well as with brain function and structure (Amunts et al., 1997; Bengtsson et al., 2005; Bermudez, Lerch, Evans, & Zatorre, 2008; Chen, Penhune, & Zatorre, 2008a; Oechslin, Imfeld, Loenneker, Meyer, & Jäncke, 2010). Predispositions and experience contribute to musical expertise, and the relative


balance between the two factors may differ in specific aspects of the many different musical subskills. Furthermore, it seems that there exist early sensitive periods during which musical stimulation or training of subskills has to take place in order to establish fertile ground for growing extraordinary expertise later in life. This is best illustrated by the scaffold metaphor (Steele, Bailey, Zatorre, & Penhune, 2013). An early start to training develops the "scaffold" for building a "skyscraper-like" level of expertise later in life, whereas a late start of training allows only for moderate results even after long and intense training. Of course, these scaffolds may differ from one domain to the next. For example, an outstanding virtuoso like the legendary pianist Lang Lang, known for his breathtaking finger dexterity, may require both highly relevant inherited traits and intense early sensorimotor training. Other musicians, such as the late French singer Edith Piaf, known for her emotional expressivity but somehow lacking in technique, may have started technical exercises late in life but had genetic and biographical conditions allowing her to build up emotional depth, a character trait we feel and value despite the difficulty in operationalizing it for precise study.

Performing music at a professional level relies on a range of subskills, which are represented in different, though overlapping, brain networks. Auditory skills such as the abovementioned perfect pitch, sensitivity to timing variations (e.g., "groove") and to micro-pitches (e.g., the tuning of a violin), or auditory long-term memory (e.g., memorizing a 12-tone series), are mainly processed in the temporal lobes of both hemispheres with a right-hemisphere bias (Zatorre, 2001). However, signs of auditory and musical expertise can already be detected in the ascending auditory pathway at the brainstem level (Skoe & Kraus, 2013).
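Sensitivity to micro-pitches of the kind mentioned above is conventionally quantified in cents, 1200 per octave, so that deviations far smaller than a semitone (100 cents) can be expressed. A minimal sketch (the example frequencies are made up for illustration):

```python
import math

def cents(f_ref: float, f_obs: float) -> float:
    """Deviation of an observed frequency from a reference, in cents (1200 per octave)."""
    return 1200 * math.log2(f_obs / f_ref)

print(cents(220.0, 440.0))            # → 1200.0 (exactly one octave)
print(round(cents(440.0, 442.0), 1))  # → 7.9 (an A string tuned slightly sharp of A4)
```

A deviation of a few cents, as in the second example, is far below the 100-cent semitone step, which is why such discriminations count as micro-pitch skills.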
Sensorimotor skills, such as low two-point discrimination thresholds (the ability to discern that two nearby objects touching the skin are two distinct points) and high tactile sensitivity (e.g., of the left fifth finger in professional violinists), bimanual or quadrupedal coordination (e.g., for piano and organ playing), fast finger movements (e.g., right-hand arpeggios on the classical guitar), or complex hand postures (e.g., of the left hand on the electric guitar), are represented in premotor, motor, and parietal cortical areas, and in subcortical brain structures such as the basal ganglia and the cerebellum (Altenmüller & Furuya, 2015). Emotional and performance skills are supported by individualized prefrontal and orbitofrontal cortical regions and by the limbic system. Self-monitoring, anticipation of the consequences of one's actions, motivation, and focusing attention (all contributing to goal-directed "deliberate" practice) recruit a highly diverse network, including lateral prefrontal cortices, parietal cortices, limbic structures, and particularly motivational pathways, including the accumbens nucleus, and memory structures such as the hippocampus (Zatorre & Salimpoor, 2013). All of these regions and the interconnecting nerve fibers are subject to modifications in function and structure in association with musical practice, a phenomenon which is based on brain plasticity.

Brain plasticity denotes the general ability of our central nervous system to adapt throughout the lifespan to changing environmental conditions, body biomechanics, and new tasks. Brain plasticity is most typically observed for complex tasks with high behavioral relevance, activating circuits involved in emotion, motivation, and reward. The continued activities of accomplished musicians are ideal for providing the prerequisites of


brain plasticity (for a review see Schlaug, 2015). In musical expertise, the abovementioned processes are accompanied by changes in the function of the brain's neuronal networks, as a result of a strengthening of synaptic connections, and by changes in its gross structure. With respect to mechanisms and microstructural effects of plasticity, our understanding of the molecular and cellular processes underlying these adaptations is far from complete.

Brain plasticity may occur on different time scales. For example, the efficiency and size of synapses may be modified in a time window of seconds to minutes, while the growth of new synapses and dendrites may require hours to days. An increase in gray matter density, which mainly reflects an enlargement of neurons due to increased metabolism, needs at least several weeks. White matter density also increases as a consequence of musical training. This effect is primarily due to an enlargement of the myelin cells which wrap around the nerve fibers (axons) and dendrites, greatly contributing to the velocity of the electrical impulses traveling along them. Under conditions requiring rapid information transfer and high temporal precision, these myelin cells adapt by growing, and as a consequence nerve conduction velocity increases. Finally, brain regions involved in specific tasks may be enlarged after long-term training due to the growth of structures supporting nervous function, for example, the blood vessels that are necessary for oxygen and glucose transportation (for a comprehensive review see Taubert, Villringer, & Ragert, 2012).

There are four main reasons why we believe that these effects on brain plasticity are more pronounced in music performance than in other skilled activities.
First, the intensity of goal-directed training is extremely high; students admitted to a German state conservatory have spent an average of 10 years and 10,000 hours of deliberate practice in order to pass the demanding entrance examinations (Ericsson et al., 1993). Second, and related to the above, musical training in those individuals who later become professional musicians usually starts very early, sometimes before age 6 years, when the adaptability of the central nervous system is at its highest. Third, musical activities are strongly linked to conditions of high arousal and positive emotions, but also to stressors such as music performance anxiety. Neuroactive hormones such as adrenalin (arousal), endorphins (joy), dopamine (rewarding experience), and stress hormones (fear of failure) support neuroplastic adaptations. Fourth, performing music in public is frequently accompanied by strong social feelings, best described as a sense of connectedness and meaning. As a consequence, increased release of oxytocin and serotonin will similarly enhance plastic adaptations (Zatorre & Salimpoor, 2013).

However, we should be careful in claiming that music produces more prominent plastic adaptations in the brain compared to other skilled activities, as the methodology of group comparisons in brain plasticity research might produce a bias. For example, group investigations of professional classical pianists compared to "non-musicians," such as in our study by Vaquero et al. (2016), might be influenced by differences in sample homogeneity. As opposed to many skilled activities, such as playing golf or other sports, or other creative professions, such as writing or painting, classical pianists experience from a very young age a similar acculturation and take part in highly homogeneous activities due to the canonical nature of their training. The latter study similar


etudes of Hanon, Czerny, and Chopin for many years, and this may well produce more uniform brain adaptations, which dominate any individual changes. In other pursuits, such as the visual arts, creative writing, architecture, jazz improvisation, and music composition, individualized training may produce more diverse effects that are masked in group statistics.
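The masking effect described above can be made concrete with a toy calculation (purely illustrative numbers, not data from Vaquero et al. or any other study): given the same mean training effect, a homogeneous group yields a much larger two-sample test statistic than a heterogeneous one, because individual diversity inflates the variance term.

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

# Same +5-point mean "training effect", different individual spread (made-up data):
controls = [98, 101, 99, 102, 100, 100]
homogeneous = [103, 105, 104, 106, 105, 107]   # canonical, uniform training
heterogeneous = [93, 118, 99, 112, 104, 104]   # individualized training

print(round(welch_t(homogeneous, controls), 2))    # → 6.12
print(round(welch_t(heterogeneous, controls), 2))  # → 1.35
```

Both fictitious "trained" groups outscore the controls by five points on average, yet only the homogeneous group produces a clearly detectable group difference, which is the bias the chapter cautions against.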

Brain Regions Involved in Performing Music: A Quick Overview

Playing a musical instrument or singing at a professional level requires highly refined auditory, sensorimotor, and emotional-communicative skills that are acquired over many years of extensive training, and that have to be stored and maintained through further regular practice. Auditory feedback is needed to improve and perfect performance, and activity of emotion-related brain areas is required to render a performance vivid and touching. Performance-based music-making therefore relies primarily on a highly developed auditory–motor–emotion integration capacity, which is reflected on the one hand in increased neuronal connectivity and on the other hand in functional and structural adaptations of brain areas supporting these activities. In the following, we give a quick overview of the many brain regions involved in making music (for a review see Brown et al., 2015).

Music perception involves primary and secondary auditory areas (A1, A2) and auditory association areas (AA) in the two temporal lobes. The primary auditory area, localized in the upper portion of the temporal lobe in Heschl's gyrus, receives its main input from the inner ears via the ascending auditory pathway. It is mainly involved in basic auditory processing such as pitch and loudness perception, perception of time structures, and spectral decomposition. The left primary auditory cortex is specialized in the rapid analysis of time structures, such as differences in voice onset times when articulating "da" or "ta." The right, on the other hand, deals primarily with the spectral decomposition of sounds. The secondary auditory areas surround the primary area in a belt-like formation; more complex auditory features such as timbre are processed there (Koelsch, 2011). Finally, in the auditory association areas, auditory gestalt perception takes place.
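The spectral decomposition mentioned above is, computationally, what a Fourier transform performs. A minimal numpy sketch on a synthetic complex tone (the frequencies and amplitudes are made up, and this is of course a signal-processing analogy, not a model of cortical processing):

```python
import numpy as np

fs = 8000                 # sampling rate in Hz
n = fs                    # one second of signal -> 1 Hz per FFT bin
t = np.arange(n) / fs

# A 220 Hz fundamental plus two harmonics at decreasing amplitude:
signal = (1.0 * np.sin(2 * np.pi * 220 * t)
          + 0.5 * np.sin(2 * np.pi * 440 * t)
          + 0.25 * np.sin(2 * np.pi * 660 * t))

spectrum = np.abs(np.fft.rfft(signal))

# With a 1 Hz bin spacing, a bin's index equals its frequency in hertz;
# the three strongest bins recover the tone's partials:
partials_hz = sorted(int(k) for k in np.argsort(spectrum)[-3:])
print(partials_hz)  # → [220, 440, 660]
```

Decomposing a sound into such frequency components is the kind of operation the text ascribes, at a much more abstract level, to right auditory cortex.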
Auditory gestalts can be understood, for example, as pitch-time patterns like melodies and words. In right-handers, and in about 95 percent of all left-handers, Wernicke's area in the left posterior portion of the upper temporal lobe is specialized in language decoding (Kraus, McGee, & Koch, 1998).

In contrast to the early auditory processing of simple acoustic structures, listening to music is a far more complex task. Music is experienced not only as an acoustic structure over time, but also as patterns, associations, emotions, expectations, and so on. Such experiences rely on a complex set of perceptive, cognitive, and emotional operations. Integrated over time, and frequently linked to biographic memories, they enable us to


brain research in music performance   465

experience strong emotions, processed in structures of the limbic system such as the ventral tegmental area of the mesencephalon or the nucleus accumbens in the basal forebrain (Salimpoor et al., 2013). Memories and social emotions evoked during music listening and playing involve the hippocampus, deep in the temporal lobe, and the dorsolateral prefrontal cortex, mainly in the right hemisphere.

Making music relies on voluntary skilled movements involving four cortical regions in both hemispheres: the primary motor area (M1), located in the precentral gyrus directly in front of the central sulcus; the supplementary motor area (SMA), located anterior to M1 in the frontal lobe and extending onto the inner (medial) side of the cortex; the cingulate motor area (CMA), below the SMA and above the corpus callosum on the inner (medial) side of the hemisphere; and the premotor area (PMA), adjacent to the lateral aspect of the primary motor area (see Fig. 1). The SMA, PMA, and CMA can be described as secondary motor areas, because they process movement patterns rather than simple movements. In addition to these cortical regions, the motor system includes the subcortical structures of the basal ganglia and the cerebellum.

Steady kinaesthetic feedback is also required to control any guided motor action; it comes from the primary somatosensory area (S1), behind the central sulcus in the parietal lobe. This lobe is involved in many aspects of movement processing and is an area where information from multiple sensory regions converges. In the posterior parietal area, body coordinates in space are monitored and calculated, and visual information is transformed into these coordinates. In musicians, this area is prominently activated during tasks involving multisensory integration, for example, during sight-reading, the playing of complex pieces of music (Haslinger et al., 2005), and the transformation of musical pitch information into movement coordinates (Brown et al., 2013) and of musical notation into corresponding motor actions (Stewart et al., 2003).

Figure 1.  Brain regions involved in sensory and motor music processing. (The abbreviation "a." stands for "area.") The left hemisphere is shown in the foreground (lower right); the right hemisphere in the background (upper left). The numbers refer to the respective Brodmann areas, a labeling of cortical regions according to the fine structure of the nervous tissue. Labeled structures include the primary motor area (M1), supplementary motor area (SMA), premotor area (PMA), cingulate motor area (CMA), primary somatosensory area (S1), posterior parietal area, prefrontal area, primary auditory area (A1), secondary auditory area (A2), and auditory association area (AA), with Brodmann areas 4, 3/1/2, 6, 8, 5, and 7 marked across the frontal and parietal lobes.

OUP CORRECTED PROOF – FINAL, 07/10/2019, SPi

466    eckart altenmüller et al.

The primary motor area (M1) represents the movements of body parts distinctly, in systematic order. The representation of the leg is located at the top and on the inner side of the hemisphere, the arm in the upper portion, and the hand and mouth in the lower portion of M1. This representation of distinct body parts in corresponding brain regions is called "somatotopic" or "homuncular" order. Just as the motor homunculus is represented upside-down, so too is the sensory homunculus on the other side of the central sulcus. The proportions of both the motor and the sensory homunculi are markedly distorted, since they are determined by the density of motor and sensory innervation of the respective body parts. For example, control of fine movements of the tongue requires far more nerve fibers than control of the muscles of the back. Therefore, the hand, lips, and tongue require almost two-thirds of the neurons in this area (Roland & Zilles, 1996). However, as further explained below, the relative representation of body parts may be modified by usage. Moreover, the primary motor area does not simply represent individual muscles; multiple muscular representations are arranged in a complex way so as to allow the execution of simple types of movements rather than the activation of a specific muscle. This arrangement is a consequence of the fact that a two-dimensional array of neurons in M1 has to code for three-dimensional movements in space (Gentner & Classen, 2006). Put more simply, our brain does not represent muscles but rather movements.
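The idea that map area is allocated in proportion to innervation density, rather than body size, can be made concrete with a toy calculation. The weights below are purely illustrative assumptions (not measured values from Roland & Zilles, 1996); they simply show how densely innervated effectors come to dominate the homuncular map.

```python
# Hypothetical innervation-density weights (illustrative numbers only).
# Cortical representation is allocated in proportion to innervation density,
# so small but densely innervated parts (hand, lips, tongue) dominate the map.
innervation = {"hand": 28, "lips": 18, "tongue": 14, "arm": 12, "leg": 10, "trunk": 8}

total = sum(innervation.values())
cortical_share = {part: weight / total for part, weight in innervation.items()}

# Share of the map claimed by the fine-motor effectors.
fine_motor_share = sum(cortical_share[p] for p in ("hand", "lips", "tongue"))
# With these assumed weights, hand + lips + tongue claim about two-thirds
# of the map, mirroring the distortion of the motor homunculus.
```

With these assumed weights, `fine_motor_share` comes out at roughly 0.67, echoing the two-thirds figure cited in the text; the point is the proportionality rule, not the particular numbers.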
The supplementary motor area (SMA) is mainly involved in the sequencing of complex movements and in the triggering of movements based on internal cues. It is particularly engaged when the execution of a sequential movement depends on internally stored and memorized information. It is therefore important for both rhythm and pitch processing, because of its role in sequencing and the hierarchical organization of movement (Hikosaka & Nakamura, 2002). Skilled musicians and non-musicians engage the SMA both when performing music and when imagining listening to or performing music (de Manzano & Ullén, 2012; Herholz & Zatorre, 2012). This finding suggests that the SMA may be crucial for experts' ability to plan music segment by segment during performance.

The premotor area (PMA) is primarily engaged when the motor system has to react to external stimuli, such as acoustic or visual prompts. Anticipation, planning, and preparation of movement patterns in response to visual cues have been attributed to the PMA (Stetson & Anderson, 2015). It is involved in the learning, execution, and recognition of limb movements and seems to be particularly concerned with the integration of visual information needed for movement planning. The PMA is also responsible for processing complex rhythms (Chen, Penhune, & Zatorre, 2008b).

The function of the cingulate motor area (CMA) is still under debate. Electrical stimulation and brain imaging studies demonstrate its involvement in movement selection in situations where movements are critical to obtaining reward or avoiding punishment. This fact points towards close links between the cingulate gyrus and the emotion processing


limbic system. The CMA may therefore play an important role in mediating cortical cognitive and limbic-emotional functions, for example, in error processing during a musical performance (Herrojo-Ruiz, Jabusch, & Altenmüller, 2009).

The basal ganglia, located deep inside the cerebral hemispheres, are reciprocally interconnected via the thalamus with the motor and sensory cortices, constituting a loop of information flow between cortical and subcortical areas. They are indispensable for any kind of voluntary action and play a crucial role in organizing sequences of motor actions. The basal ganglia are therefore the structures mainly involved in the automation of skilled movements such as sequential finger movements (Seger, 2006). Their special function consists of selecting appropriate motor actions and comparing the goal and course of those actions with previous experience. The middle putamen in particular seems to be involved in storing fast, automated movement programs, and it is subject to plastic adaptation in professional musicians. Furthermore, information flow between the cortex and the limbic emotional systems, in particular the amygdala and the nucleus accumbens, converges in the basal ganglia. It is therefore assumed that the basal ganglia process and control the emotional evaluation of motor behavior in terms of expected reward or punishment (for a review see Haber, 2003).

The cerebellum is an essential contributor to the timing and accuracy of fine-tuned movements. It is thought to play a role in correcting errors and in learning new skills. The cerebellum has been hypothesized to be part of a network, including the parietal and motor cortices, that encodes predictions in the form of internal models of these skills. The term "internal model" refers to a neural process that simulates the response of the motor system in order to estimate the outcome of a motor command.
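The forward-model idea just described can be sketched in a few lines of code. Everything here is an illustrative assumption: a linear "plant" stands in for the real motor system, and a slightly miscalibrated predictor stands in for the internal model. This is a conceptual toy from computational motor control, not a model of actual cerebellar circuitry.

```python
# Toy sketch of a forward internal model: predict the sensory outcome of a
# motor command before feedback arrives, then compute the prediction error
# once the actual outcome is known. The linear plant and all numbers are
# illustrative assumptions.

def plant(command: float, gain: float = 1.0, bias: float = 0.05) -> float:
    """The real motor system: maps a command to the actual outcome."""
    return gain * command + bias

def forward_model(command: float, estimated_gain: float = 0.95) -> float:
    """Internal simulation: predicts the outcome of the same command."""
    return estimated_gain * command

command = 2.0
predicted = forward_model(command)     # available immediately
actual = plant(command)                # available only after delayed feedback
prediction_error = actual - predicted  # signal that could drive error
                                       # correction and skill learning
```

The usefulness of the scheme is that `predicted` is available before sensory feedback returns, so rapid movements can be monitored and corrected online; the residual `prediction_error` is what learning would minimize.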
The cerebellum is connected to almost all regions of the brain, including those important for memory and higher cognitive functions. It has been proposed that this structure serves as a universal control system that contributes to learning, and to optimizing a range of functions across the brain (Ramnani, 2014).

The Effects of Musical Training on Brain Function

With advanced neuroimaging techniques, brain function can be assessed precisely. Activity changes in brain networks, connectivity measures between brain areas at small and large scales, and even the number of nerve cells activated in response to musical stimuli can be estimated (for a review of methodology see Altenmüller, Münte, & Gerloff, 2004).

The neural bases of refined auditory processing in musicians are well understood. In 1998, Pantev and colleagues provided a first indication that extensive musical training can plastically alter receptive functions (Pantev et al., 1998). Equivalent current dipole strength, a measure of mass neuronal activation, was computed from evoked magnetic fields generated in the auditory cortex in response to piano tones and to pure tones of equal


fundamental frequency and loudness. In musicians, the responses to piano tones (but not to pure tones) were ~25 percent larger than in non-musicians. In a study of violinists and trumpeters, this effect was most pronounced for tones from each musician's own type of instrument (Hirata, Kuriki, & Pantev, 1999). In a similar way, evoked neural responses to subtle alterations in rhythm or pitch are much more pronounced in musicians than in non-musicians (Münte, Nager, Beiss, Schroeder, & Altenmüller, 2003).

Even functions such as sound localization, which operate on basic acoustic properties, show effects of plasticity and expertise among different groups of musicians. A conductor, more than any other musician, is likely to depend on spatial localization for successful performance; for example, he might need to direct his attention to a particular player in a large orchestra. In one study, professional conductors were found to be better than pianists and non-musicians at separating adjacent sound sources in the periphery of the auditory field. This behavioral selectivity was paralleled by modulation of evoked brain responses, which were selective for the attended source in conductors, but not in pianists or non-musicians (Münte, Kohlmetz, Nager, & Altenmüller, 2001). These functional adaptations are not restricted to the auditory cortex but can also be observed in subcortical areas of the ascending auditory pathway: musically trained individuals have enhanced brainstem representations of musical sound waveforms (Wong, Skoe, Russo, Dees, & Kraus, 2007).

Refined somatosensory perception constitutes another basis of high-level performance. The kinaesthetic sense is especially important: it allows for control and feedback of muscle and tendon tension as well as joint positions, enabling continuous monitoring of finger, hand, and lip position within the frames of body and instrument coordinates (e.g., the keyboard, the mouthpiece).
Intensive musical training has also been associated with an expansion of the functional representation of finger or hand maps, as demonstrated in magnetoencephalography (MEG) studies. For example, the somatosensory representation of the left fifth digit in string players was found to be larger than that of non-musicians (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995). Musicians who had begun training early in life (